Discussion:
[Freeipa-users] ipa-replica-manage failing to delete a node
Linder, Rolf
2017-03-28 09:30:39 UTC
Hello

First, we would really like to thank the developers and the community for the great work they are doing on FreeIPA!

At our company, we're running a CentOS 7 based FreeIPA installation (uspidm01 as primary and uspidm02 as replica), and it worked like a charm for the last couple of months. Last week we suffered a severe outage (DNS related) and are still recovering from it. We have an issue similar to the ones reported in:

https://bugzilla.redhat.com/show_bug.cgi?id=826677 (upstream https://pagure.io/freeipa/issue/2797)
https://www.redhat.com/archives/freeipa-users/2013-May/msg00034.html
https://www.redhat.com/archives/freeipa-users/2012-June/msg00382.html

Mainly, our synchronization stopped, with uspidm02 (the replica) logging:

"[27/Mar/2017:11:57:39.756880208 +0200] NSMMReplicationPlugin - agmt="cn=meTouspidm01.[domainname].[tld]" (uspidm01:389): Data required to update replica has been purged from the changelog. The replica must be reinitialized."

We tried to re-initialize using "ipa-replica-manage re-initialize --from uspidm01.[domain].[tld]", but this failed. After this we went for a "clean" remove-then-re-add approach (knowing that we would temporarily lose the replica and lose any unsynchronized changes). We followed the Red Hat documentation (see below) for this.
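
For reference, the remove-and-re-add sequence we attempted looks roughly like this (example.com stands in for our real domain; exact options as per the linked Red Hat guide):

  # on uspidm02: uninstall the IPA software and local data
  ipa-server-install --uninstall

  # on uspidm01: remove the replica from the topology
  ipa-replica-manage del uspidm02.example.com

  # on uspidm01: prepare a new replica file for the re-add
  ipa-replica-prepare uspidm02.example.com

  # on uspidm02: reinstall from the replica file
  ipa-replica-install /var/lib/ipa/replica-info-uspidm02.example.com.gpg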

Unfortunately, the "ipa-replica-manage list" command still lists both servers (uspidm01 and uspidm02). The error given by a forced removal using "ipa-replica-manage del --no-lookup --force --cleanup uspidm02.[domain].[tld]" is

Cleaning a master is irreversible.
This should not normally be require, so use cautiously.
Continue to clean master? [no]: yes
unexpected error: This entry already exists

We then tried to debug the Python code behind ipa-replica-manage further and, using pdb, could identify that the function "replica_cleanup" in "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py" complains about duplicate entries:


/usr/lib/python2.7/site-packages/ipaserver/install/replication.py(1203)replica_cleanup()
-> self.conn.delete_entry(entry)
(Pdb) n
DuplicateEntry: Duplicat...exists',)
/usr/lib/python2.7/site-packages/ipaserver/install/replication.py(1203)replica_cleanup()
-> self.conn.delete_entry(entry)
(Pdb) n
/usr/lib/python2.7/site-packages/ipaserver/install/replication.py(1204)replica_cleanup()
-> except errors.NotFound:
(Pdb) n
/usr/lib/python2.7/site-packages/ipaserver/install/replication.py(1206)replica_cleanup()
-> except Exception, e:
...
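
For anyone who wants to reproduce this debugging session, it was essentially the following (the line number 1203 matches our package version and may differ on other builds):

  python -m pdb /usr/sbin/ipa-replica-manage del --no-lookup --force --cleanup uspidm02.example.com
  (Pdb) b /usr/lib/python2.7/site-packages/ipaserver/install/replication.py:1203
  (Pdb) c
  ...then step with "n" until the DuplicateEntry shows up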

Using ldapsearch we can confirm there are still entries for the ghost/offline server uspidm02 (which seems to be the reason why ipa-replica-manage still lists it), but we cannot pinpoint where exactly the duplicate entry is. As long as there are entries for this host, it cannot be added again (an IPA server cannot be removed using "ipa host-del", and adding a new one also fails).
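
The searches were something along these lines (dc=example,dc=com stands in for our real base DN):

  # leftover master entry for the removed replica
  ldapsearch -x -D "cn=Directory Manager" -W \
    -b "cn=masters,cn=ipa,cn=etc,dc=example,dc=com" "(cn=uspidm02*)"

  # leftover replication agreements in the DS configuration
  ldapsearch -x -D "cn=Directory Manager" -W \
    -b "cn=mapping tree,cn=config" "(objectClass=nsds5ReplicationAgreement)" nsDS5ReplicaHost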

Our situation for now is that we have a "read-only" IdM solution, since any modification (password change, adding new servers, ...) fails. Adding a new replica (with a new name) also fails. We suspect that if we could clean up the ghost replica entries, we would be able to restore IdM and replication again.

Any help would be greatly appreciated!!

Best regards,
Rolf

Documentation used:
Uninstallation: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Linux_Domain_Identity_Authentication_and_Policy_Guide/replica-uninstall.html
New installation: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Linux_Domain_Identity_Authentication_and_Policy_Guide/creating-the-replica.html

Versions in use: initially both servers were updated to ipa-server-4.4.0-14.el7.centos.6.x86_64; uspidm01 was rolled back to ipa-server-4.2.0-15.0.1.el7.centos.19.x86_64 (to rule out any upgrade issues).
Jochen Hein
2017-03-28 16:13:02 UTC
Post by Linder, Rolf
"[27/Mar/2017:11:57:39.756880208 +0200] NSMMReplicationPlugin -
agmt="cn=meTouspidm01.[domainname].[tld]" (uspidm01:389): Data
required to update replica has been purged from the changelog. The
replica must be reinitialized."
We tried to re-initialize using "ipa-replica-manage re-initialize
--from uspidm01.[domain].[tld]", but this failed. After this we went
for a "clean" remove-then-re-add approach (knowing that we would
temporarily lose the replica and lose any unsynchronized changes).
I had these messages too, and also failed to get it running with
ipa-replica-manage. But later I realized that the failing replica
was a CA replica, and using ipa-csreplica-manage I could reinitialize
the replica.
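
Concretely, on the replica that needs reinitializing, something like
(hostname is of course a placeholder):

  ipa-csreplica-manage re-initialize --from uspidm01.example.com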

I also found it mildly confusing which server was OK and which server
needed its replica reinitialized. Could we add a hint to the log
message about what the admin needs to do? Something like:

,----
| Data required to update replica has been purged from the
| changelog. The replica <hostname> must be reinitialized.
| Use "ipa-(ca)?replica-manage ... --from <hostname>" on "<host>".
`----

Jochen
--
This space is intentionally left blank.
Linder, Rolf
2017-03-29 14:19:59 UTC
Thanks, Jochen, for your response!
So far we could identify quite well which server is the master and which the replica, and how and where we should re-initialize.

Still, there is good news on our side: we identified a further issue, and after fixing that (see below) we could also remove the replica and reinstall it. We had to "isolate" the second server (it was still reachable by ICMP ping) and were then able to simply execute "ipa-replica-manage del uspidm02.[domain].[tld] --force --cleanup" and afterwards add it again.

After fixing a small duplicate-RUV issue (documented at https://access.redhat.com/solutions/2741521) we're now up again with a running IdM setup.
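
For anyone hitting the same thing: the fix essentially boils down to listing the RUVs and cleaning the stale one (the replica ID below is just an example; list-ruv shows the real IDs):

  ipa-replica-manage list-ruv
  ipa-replica-manage clean-ruv 7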

Still, there's one question left at our end: we now have different passwords for the "admin" user and for the Directory Manager. Is this normal, or do we have a broken setup now?


Best regards,
Rolf

PS: here's what we did to fix our issue:

1. Copied uspidm01 and ran isolated (offline) tests => this way we could verify that the primary itself was fine.
2. After having already rebooted uspidm02, disconnected that server and removed it on uspidm01 via ipa-replica-manage.
3. Through this we identified an error in the /etc/hosts file on uspidm01 (it listed uspidm02 with a wrong IP address, conflicting with the DNS information; see the quick check below).
4. Reinstalled uspidm02 according to the documentation from Red Hat.
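
The hosts/DNS mismatch from step 3 shows up quickly when comparing the NSS and DNS answers (example.com is a placeholder):

  getent hosts uspidm02.example.com    # answer via /etc/hosts + NSS
  dig +short uspidm02.example.com      # answer via DNS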
