Discussion:
[Freeipa-users] Fwd: Marking subdomain offline
m***@chinewalking.com
2017-04-06 17:21:01 UTC
Hi,

My IPA<->AD trust setup experiences intermittent failures during login
events. The AD subdomain goes into an inactive/offline state and users
logging in are put into a 'delayed authentication' queue. Logging in
again after a minute or so usually succeeds, as the subdomain is reset and
the user is cached for subsequent events. At all times getent/id and
kinit are successful, even with a purged sssd cache.
SRV records are correctly resolved, except for _kerberos-master.

I have not been able to troubleshoot the intermittent failures any further.
Traffic captures show no strange behaviour, yet the sssd_domain log
clearly shows AD to be unreachable at times. All AD servers are W2012.
Masking the _ldap and _kerberos DNS records to single nodes, to factor out
any faulty Windows configs, has so far had no effect (would it?).

In sssd's data_provider_fo.c, be_fo_reset_svc() calls fo_get_service(),
which returns EOK. I'm not yet familiar with the variables at play;
would adding debug statements here reveal faults that may cause this?

Any pointers are very much appreciated.

Mike


[sssd[be[unix.foo.local]]] [ipa_srv_ad_acct_lookup_step] (0x0400):
Looking up AD account
[sssd[be[unix.foo.local]]] [ipa_srv_ad_acct_lookup_done] (0x0080):
Sudomain lookup failed, will try to reset sudomain..
[sssd[be[unix.foo.local]]] [ipa_server_trusted_dom_setup_send] (0x1000):
Trust direction of subdom foo.local from forest foo.local is: one-way
inbound: local domain trusts the remote domain
[sssd[be[unix.foo.local]]] [ipa_server_trusted_dom_setup_1way] (0x0400):
Will re-fetch keytab for foo.local
[sssd[be[unix.foo.local]]] [ipa_getkeytab_send] (0x0400): Retrieving
keytab for UNIX$@FOO.local from ipa01.unix.foo.local into
/var/lib/sss/keytabs/foo.local.keytab6AXxWV using ccache
/var/lib/sss/db/ccache_UNIX.FOO.local
[sssd[be[unix.foo.local]]] [child_handler_setup] (0x2000): Setting up
signal handler up for pid [6242]
[sssd[be[unix.foo.local]]] [child_handler_setup] (0x2000): Signal
handler set up for pid [6242]
[sssd[be[unix.foo.local]]] [sdap_process_result] (0x2000): Trace:
sh[0x7f71cd9ddb80], connected[1], ops[(nil)], ldap[0x7f71cd9e65a0]
[sssd[be[unix.foo.local]]] [sdap_process_result] (0x2000): Trace: end of
ldap_result list
[sssd[be[unix.foo.local]]] [ad_online_cb] (0x0400): The AD provider is
online
[sssd[be[unix.foo.local]]] [be_ptask_online_cb] (0x0400): Back end is
online
[sssd[be[unix.foo.local]]] [be_ptask_enable] (0x0080): Task [Subdomains
Refresh]: already enabled
Keytab successfully retrieved and stored in:
/var/lib/sss/keytabs/foo.local.keytab6AXxWV
[sssd[be[unix.foo.local]]] [child_sig_handler] (0x1000): Waiting for
child [6242].
[sssd[be[unix.foo.local]]] [child_sig_handler] (0x0100): child [6242]
finished successfully.
[sssd[be[unix.foo.local]]] [ipa_getkeytab_recv] (0x2000): ipa-getkeytab
status 0
[sssd[be[unix.foo.local]]] [ipa_server_trust_1way_kt_done] (0x0400):
Keytab successfully retrieved to
/var/lib/sss/keytabs/foo.local.keytab6AXxWV
[sssd[be[unix.foo.local]]] [ipa_server_trust_1way_kt_done] (0x2000):
Keytab renamed to /var/lib/sss/keytabs/foo.local.keytab
[sssd[be[unix.foo.local]]] [ipa_server_trust_1way_kt_done] (0x0400):
Keytab /var/lib/sss/keytabs/foo.local.keytab6AXxWV contains the expected
principals
[sssd[be[unix.foo.local]]] [ipa_server_trust_1way_kt_done] (0x0400):
Established trust context for foo.local
[sssd[be[unix.foo.local]]] [unique_filename_destructor] (0x2000):
Unlinking [/var/lib/sss/keytabs/foo.local.keytab6AXxWV]
[sssd[be[unix.foo.local]]] [unlink_dbg] (0x2000): File already removed:
[/var/lib/sss/keytabs/foo.local.keytab6AXxWV]
[sssd[be[unix.foo.local]]] [ipa_srv_ad_acct_retried] (0x0400): Sudomain
re-set, will retry lookup
[sssd[be[unix.foo.local]]] [be_fo_reset_svc] (0x1000): Resetting all
servers in service foo.local
[sssd[be[unix.foo.local]]] [be_fo_reset_svc] (0x0080): Cannot retrieve
service [foo.local]
[sssd[be[unix.foo.local]]] [ipa_srv_ad_acct_lookup_step] (0x0400):
Looking up AD account
[sssd[be[unix.foo.local]]] [be_mark_dom_offline] (0x1000): Marking
subdomain foo.local offline
[sssd[be[unix.foo.local]]] [ipa_srv_ad_acct_lookup_done] (0x0040):
ipa_get_*_acct request failed: [1432158270]: Subdomain is inactive.
[sssd[be[unix.foo.local]]] [ipa_subdomain_account_done] (0x0040):
ipa_get_*_acct request failed: [1432158270]: Subdomain is inactive.
[sssd[be[unix.foo.local]]] [dp_reply_std_set] (0x0080): DP Error is OK
on failed request?
[sssd[be[unix.foo.local]]] [dp_req_done] (0x0400): DP Request [Account
#4]: Request handler finished [0]: Success
--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project
Jakub Hrozek
2017-04-06 18:18:42 UTC
Could you paste a bit more context? I think what would work is to trim
the logs (truncate --size 0), then reproduce the issue and search for
the first occurrence of a "NOT_WORKING" message from any of the fo_*
functions.
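
The search described above can be sketched in Python. This is only a rough illustration: it assumes each log message sits on a single line in the "[function] (0xNNNN):" layout seen in the excerpts in this thread, and it takes the marker text "NOT_WORKING" from the suggestion above; the exact wording may differ between SSSD versions. The helper name first_failover_hit is made up for this example.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches the "[function_name] (0xNNNN):" part of an SSSD debug line,
# following the layout of the log excerpts quoted in this thread.
FUNC_RE = re.compile(r"\[(?P<func>\w+)\] \(0x[0-9a-fA-F]+\):")


def first_failover_hit(
    lines: Iterable[str], marker: str = "NOT_WORKING"
) -> Optional[Tuple[int, str]]:
    """Return (line number, line) of the first fo_* message containing marker.

    The marker default is an assumption taken from the advice above; pass a
    different string if your SSSD version words the message differently.
    """
    for lineno, line in enumerate(lines, start=1):
        match = FUNC_RE.search(line)
        # Only messages logged by functions whose name starts with "fo_" count.
        if match and match.group("func").startswith("fo_") and marker in line:
            return lineno, line.rstrip("\n")
    return None
```

To use it, open the freshly truncated domain log after reproducing the failure and pass the file object to first_failover_hit(); the returned line number is a reasonable place to start reading.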
m***@chinewalking.com
2017-04-06 19:39:16 UTC
After truncating the logs I noticed a comparable error that was fixed
earlier today. I created a number of existing groups (sudo, app, etc)
with low GIDs during initial deployment of IPA. One group caused issues
and I deleted it earlier on. Now another group triggered exactly the
same sequence of errors:

[{"CODE_FILE=src/providers/ipa/ipa_id.c",
36}{"CODE_FUNC=ipa_initgr_get_overrides_step"{"The group
name=***@unix.FOO.local,cn=groups,cn=unix.foo.local,cn=sysdb has no
UUID attribute objectSIDString, error!\n"
[{"CODE_FILE=src/providers/ipa/ipa_subdomains_id.c",
47}{"CODE_FUNC=ipa_id_get_groups_overrides_done", 42}{"IPA resolve user
groups overrides failed [22].\n"
[{"CODE_FUNC=be_mark_dom_offline", 29}{"Marking subdomain foo.local
offline\n"

With all these troublesome groups removed I have not been able to
reproduce the issues. I will test further with different users and
mapped groups. I guess the main fault was my handling of the logs:
with multiple logins interleaved, I overlooked the real error and saw
only the mentions of offline AD backends and subdomains.
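
One way to avoid overlooking the real error is to print the few messages that precede each "Marking subdomain ... offline" line. The following is a minimal sketch, assuming one log message per line; the function name context_before_offline and the default window of five lines are arbitrary choices for illustration.

```python
from collections import deque
from typing import Iterable, List


def context_before_offline(lines: Iterable[str], before: int = 5) -> List[List[str]]:
    """Collect the lines leading up to each 'Marking subdomain ... offline'.

    Each returned group holds up to `before` preceding lines followed by the
    offline message itself, so the triggering error is visible next to it.
    """
    window = deque(maxlen=before)  # rolling buffer of the most recent lines
    hits = []
    for raw in lines:
        line = raw.rstrip("\n")
        if "Marking subdomain" in line and "offline" in line:
            hits.append(list(window) + [line])
        window.append(line)
    return hits
```

In the case described above, the group printed for the offline message would have included the ipa_initgr_get_overrides_step error about the missing objectSIDString.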

I am not sure why these Posix groups had no objectSIDString while others
did.

Thank you,

Mike
Chris Dagdigian
2017-04-06 18:39:02 UTC
I see similar things in our environment where IPA is used as "glue"
between AD Forests that have a 1-way trust relationship. We believe that
the root cause has something to do with the 30+ domain controllers the
IPA client tries to make contact with (in seemingly random order) across
the AD Forest. Very hard to reproduce but the "subdomain marked
offline" problem is one we see often in the sssd logs. We think that
there are some AD servers in our sprawling environment that we either
can't reach properly over the network (firewalls, etc.) or are just
plain not configured to talk properly to us. Login success depends on
hitting a happy domain controller.
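
A quick way to spot controllers that cannot be reached at all is to probe the Kerberos (88) and LDAP (389) TCP ports of each one. The sketch below is only a basic reachability test under that assumption: it says nothing about whether a controller is configured correctly, and the function names and any host names passed in are placeholders, not taken from a real environment.

```python
import socket
from typing import Dict, Iterable, Tuple


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def probe(hosts: Iterable[str], ports: Tuple[int, ...] = (88, 389)) -> Dict[Tuple[str, int], bool]:
    """Check every host/port pair; defaults are the Kerberos and LDAP ports."""
    return {(host, port): port_reachable(host, port) for host in hosts for port in ports}
```

Running probe() against the full list of domain controllers from an IPA master and from a client may show whether the failures correlate with particular controllers.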

We are VERY interested in the recent updates to IPA server that seem to
indicate we can "pin" clients to certain specific AD controllers and
from my understanding we just need to wait until the SSSD software gets
broad support for this feature as well. Once we can do that we plan to
pin our clients to named controllers and see if that helps with any of
the intermittent login problems.

One workaround we've started to use for power users is collecting public
SSH keys and hosting them in the IPA server -- as long as IPA knows that
the user "exists" in AD and has a roughly complete group membership list,
then logging in with an SSH key instead of the AD password bypasses the
transient password checking failures and is very quick.

Chris
Jakub Hrozek
2017-04-07 06:34:21 UTC
Post by Chris Dagdigian
I see similar things in our environment where IPA is used as "glue" between
AD Forests that have a 1-way trust relationship. We believe that the root
cause has something to do with the 30+ domain controllers the IPA client
tries to make contact with (in seemingly random order) across the AD Forest.
When an AD user logs in to an IPA client, there are actually two actions --
a user lookup and the authentication.

The user lookup is in fact done by the SSSD instance running on one of
the IPA masters: the clients just talk to the masters, while the SSSD on
the master talks to one of the AD DCs.

Authentication is done directly against one of the AD DCs.
Post by Chris Dagdigian
Very hard to reproduce but the "subdomain marked offline" problem is one we
see often in the sssd logs. We think that there are some AD servers in our
sprawling environment that we either can't reach properly over the network
(firewalls, etc.) or are just plain not configured to talk properly to us.
Login success depends on hitting a happy domain controller.
We are VERY interested in the recent updates to IPA server that seem to
indicate we can "pin" clients to certain specific AD controllers and from my
understanding we just need to wait until the SSSD software gets broad
support for this feature as well. Once we can do that we plan to pin our
clients to named controllers and see if that helps with any of the
intermittent login problems.
I don't think there are any changes needed to the IPA server (maybe
some management framework), but in general you're looking for this
feature:
https://docs.pagure.org/SSSD.sssd/design_pages/subdomain_configuration.html

(after we migrated the upstream projects from fedorahosted to pagure,
our documentation is still in a bit of a flux, but we're migrating the
docs gradually..)

As the design page says, you will be able to set up the AD DCs the IPA
masters talk to using the subdomain configuration, but the DCs the
clients authenticate to must currently be set in krb5.conf on the
clients until https://pagure.io/SSSD/sssd/issue/3336 is implemented.
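
For the client side, the krb5.conf part can be illustrated with a small Python helper that renders a [realms] stanza listing explicit KDCs. This is only a sketch: the realm name and host names in the usage note are placeholders, the helper name render_realm_stanza is made up for this example, and whether further krb5.conf settings are needed depends on the environment.

```python
from typing import Sequence


def render_realm_stanza(realm: str, kdcs: Sequence[str]) -> str:
    """Render a krb5.conf [realms] stanza with one 'kdc =' line per host.

    The realm and the KDC host names are supplied by the caller; nothing
    here is discovered from DNS or from SSSD.
    """
    if not kdcs:
        raise ValueError("at least one KDC host is required")
    lines = ["[realms]", f"  {realm} = {{"]
    lines += [f"    kdc = {host}" for host in kdcs]
    lines.append("  }")
    return "\n".join(lines) + "\n"
```

For example, render_realm_stanza("FOO.LOCAL", ["dc1.foo.local", "dc2.foo.local"]) returns a stanza with two kdc lines for those hypothetical controllers, which could then be merged into the client's krb5.conf by hand or by configuration management.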
Post by Chris Dagdigian
One workaround we've started to use for power users is collecting public SSH
keys and hosting them in the IPA server -- as long as IPA knows that the
user "exists" in AD and has a roughly complete group membership list, then
logging in with an SSH key instead of the AD password bypasses the transient
password checking failures and is very quick.
Another workaround (for the IPA masters at least) would be to put the
reachable AD DCs into a site and assign the IPA masters to this site.