[Freeipa-users] replication mess
Robert Story
2017-03-24 00:25:32 UTC

We have 2 auth servers with a replication agreement. It turns out that auth-2
had network issues that went unnoticed for some time after a reboot. This
wasn't discovered until after a yum update on auth-1 this morning. Now my
logfile is filling up with this message:

[23/Mar/2017:10:33:58.923454036 -0400] NSMMReplicationPlugin - changelog program - agmt="cn=masterAgreement1-auth-2.XXX-pki-tomcat" (auth-2:389): CSN 586175b0000000600000 not found, we aren't as up to date, or we purged

I'm not quite sure how to proceed. The network on auth-2 has been fixed, and
auth-2 has been yum updated as well. Here are the replication error messages
on auth-1 from today. You can see where auth-1 came up after the yum update
around 08:56, and where auth-2 came up around 10:33.

[23/Mar/2017:08:56:13.006916824 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: disordely shutdown for replica dc=XXX. Check if DB RUV needs to be updated
[23/Mar/2017:08:56:13.107849258 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: disordely shutdown for replica o=ipaca. Check if DB RUV needs to be updated
[23/Mar/2017:08:56:17.107916747 -0400] NSMMReplicationPlugin - agmt="cn=meToauth-2.XXX" (auth-2:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[23/Mar/2017:08:56:17.222567755 -0400] NSMMReplicationPlugin - agmt="cn=masterAgreement1-auth-2.XXX-pki-tomcat" (auth-2:389): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[23/Mar/2017:09:42:22.306319176 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max CSN [58d3852e000000600000] from RUV [database RUV] is larger than the max CSN [58d381ab000000600000] from RUV [changelog max RUV] for element [{replica 96 ldap://auth-1.XXX:389} 585cae49000000600000 58d3852e000000600000]
[23/Mar/2017:09:42:22.336995007 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica o=ipaca does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[23/Mar/2017:09:42:54.126984585 -0400] NSMMReplicationPlugin - agmt="cn=meToauth-2.XXX" (auth-2:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[23/Mar/2017:09:44:43.187606945 -0400] NSMMReplicationPlugin - changelog program - _cl5NewDBFile: PR_DeleteSemaphore: /var/lib/dirsrv/slapd-NETSEC/cldb/509e3886-c88911e6-bead9c0e-906bed50.sema; NSPR error - -5943
[23/Mar/2017:09:45:13.525102119 -0400] NSMMReplicationPlugin - changelog program - _cl5NewDBFile: PR_DeleteSemaphore: /var/lib/dirsrv/slapd-NETSEC/cldb/f377a685-c8cb11e6-bead9c0e-906bed50.sema; NSPR error - -5943
[23/Mar/2017:09:45:13.971420939 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: disordely shutdown for replica dc=XXX. Check if DB RUV needs to be updated
[23/Mar/2017:09:45:14.024029592 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: disordely shutdown for replica o=ipaca. Check if DB RUV needs to be updated
[23/Mar/2017:09:45:19.314736866 -0400] NSMMReplicationPlugin - agmt="cn=masterAgreement1-auth-2.XXX-pki-tomcat" (auth-2:389): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[23/Mar/2017:09:46:30.253821850 -0400] NSMMReplicationPlugin - agmt="cn=meToauth-2.XXX" (auth-2:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[23/Mar/2017:09:48:39.269006200 -0400] NSMMReplicationPlugin - changelog program - _cl5NewDBFile: PR_DeleteSemaphore: /var/lib/dirsrv/slapd-NETSEC/cldb/509e3886-c88911e6-bead9c0e-906bed50.sema; NSPR error - -5943
[23/Mar/2017:09:49:26.639767435 -0400] NSMMReplicationPlugin - changelog program - _cl5NewDBFile: PR_DeleteSemaphore: /var/lib/dirsrv/slapd-NETSEC/cldb/f377a685-c8cb11e6-bead9c0e-906bed50.sema; NSPR error - -5943
[23/Mar/2017:09:49:26.762324568 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: disordely shutdown for replica dc=XXX. Check if DB RUV needs to be updated
[23/Mar/2017:09:49:26.813931624 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: disordely shutdown for replica o=ipaca. Check if DB RUV needs to be updated
[23/Mar/2017:09:49:37.397494832 -0400] NSMMReplicationPlugin - agmt="cn=meToauth-2.XXX" (auth-2:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[23/Mar/2017:09:49:37.756217644 -0400] NSMMReplicationPlugin - agmt="cn=masterAgreement1-auth-2.XXX-pki-tomcat" (auth-2:389): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[23/Mar/2017:09:51:06.555004134 -0400] NSMMReplicationPlugin - agmt="cn=masterAgreement1-auth-2.XXX-pki-tomcat" (auth-2:389): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[23/Mar/2017:09:51:06.616444861 -0400] NSMMReplicationPlugin - agmt="cn=meToauth-2.XXX" (auth-2:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[23/Mar/2017:10:27:26.076130103 -0400] NSMMReplicationPlugin - agmt="cn=masterAgreement1-auth-2.XXX-pki-tomcat" (auth-2:389): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
[23/Mar/2017:10:27:26.208080067 -0400] NSMMReplicationPlugin - agmt="cn=meToauth-2.XXX" (auth-2:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[23/Mar/2017:10:33:47.546474913 -0400] NSMMReplicationPlugin - agmt="cn=masterAgreement1-auth-2.XXX-pki-tomcat" (auth-2:389): Replication bind with SIMPLE auth resumed
[23/Mar/2017:10:33:47.588128814 -0400] NSMMReplicationPlugin - agmt="cn=meToauth-2.XXX" (auth-2:389): Replication bind with GSSAPI auth resumed
[23/Mar/2017:10:33:50.852781071 -0400] NSMMReplicationPlugin - [S] Schema agmt="cn=masterAgreement1-auth-2.XXX-pki-tomcat" (auth-2:389) must not be overwritten (set replication log for additional info)
[23/Mar/2017:10:33:51.089308587 -0400] NSMMReplicationPlugin - [S] Schema agmt="cn=meToauth-2.XXX" (auth-2:389) must not be overwritten (set replication log for additional info)
[23/Mar/2017:10:33:53.444495512 -0400] NSMMReplicationPlugin - changelog program - agmt="cn=masterAgreement1-auth-2.XXX-pki-tomcat" (auth-2:389): CSN 586175b0000000600000 not found, we aren't as up to date, or we purged
[23/Mar/2017:10:33:53.501394903 -0400] NSMMReplicationPlugin - agmt="cn=masterAgreement1-auth-2.XXX-pki-tomcat" (auth-2:389): Data required to update replica has been purged from the changelog. The replica must be reinitialized.
[23/Mar/2017:10:33:58.923454036 -0400] NSMMReplicationPlugin - changelog program - agmt="cn=masterAgreement1-auth-2.XXX-pki-tomcat" (auth-2:389): CSN 586175b0000000600000 not found, we aren't as up to date, or we purged
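
For anyone triaging a similar log, here is a minimal Python sketch that groups the NSMMReplicationPlugin lines above by agreement and lists the agreements the server says must be reinitialized. The function names are made up for illustration, and the regular expression only assumes the agmt="..." field exactly as it appears in the lines above.

```python
import re
from collections import Counter

# Matches the agmt="..." field as it appears in the log lines above.
AGMT_RE = re.compile(r'agmt="([^"]+)"')


def count_by_agreement(lines):
    """Count NSMMReplicationPlugin log lines per replication agreement."""
    counts = Counter()
    for line in lines:
        if "NSMMReplicationPlugin" not in line:
            continue
        match = AGMT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def needs_reinit(lines):
    """Return the agreements whose log lines say a reinitialize is required."""
    flagged = set()
    for line in lines:
        match = AGMT_RE.search(line)
        if match and "must be reinitialized" in line:
            flagged.add(match.group(1))
    return flagged
```

Both functions accept any iterable of lines, so an open file handle on the errors log can be passed directly. Run against the lines above, needs_reinit would report only the masterAgreement1-auth-2.XXX-pki-tomcat agreement.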

I tried to re-initialize auth-2:

auth-2 # ipa-replica-manage re-initialize --from=auth-1.XXX
Directory Manager password:

ipa: INFO: Setting agreement cn=meToauth-2.XXX,cn=replica,cn=dc\=XXX,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meToauth-2.XXX,cn=replica,cn=dc\=XXX,cn=mapping tree,cn=config
Update in progress, 6 seconds elapsed
Update succeeded

but the errors continue on auth-1.

Any suggestions on how to fix this would be greatly appreciated.

