Discussion:
[Freeipa-users] Servers intermittently losing connection to IPA
Jeff Hallyburton
2016-04-15 01:53:23 UTC
We're seeing the following issue with our jump servers in a client
environment:

One (sometimes both) jump servers will fall back to local logins at regular
intervals. This seems to happen for a brief period every 10 - 15 minutes.
Once IPA access is restored the only indication of a problem in the logs is:

Apr 14 18:09:25 jump01 [sssd[krb5_child[24814]]]: Generic error (see e-text)
Apr 14 18:09:25 jump01 [sssd[krb5_child[24814]]]: Generic error (see e-text)

(Fri Apr 8 01:06:25 2016) [sssd[be[example.com]]] [krb5_auth_store_creds] (0x0010): unsupported PAM command [249].
(Fri Apr 8 01:06:25 2016) [sssd[be[example.com]]] [krb5_auth_store_creds] (0x0010): password not available, offline auth may not work.


This doesn't shed much light on what's going on. Do you have any
suggestions for troubleshooting?

Jeff Hallyburton
Strategic Systems Engineer
Bloomip Inc.
Web: http://www.bloomip.com

Engineering Support: ***@bloomip.com
Billing Support: ***@bloomip.com
Customer Support Portal: https://my.bloomip.com <http://my.bloomip.com/>
Sumit Bose
2016-04-15 07:14:41 UTC
Post by Jeff Hallyburton
We're seeing the following issue with our jump servers in a client
One (sometimes both) jump servers will fall back to local logins at regular
intervals. This seems to happen for a brief period every 10 - 15 minutes.
Apr 14 18:09:25 jump01 [sssd[krb5_child[24814]]]: Generic error (see e-text)
Apr 14 18:09:25 jump01 [sssd[krb5_child[24814]]]: Generic error (see e-text)
(Fri Apr 8 01:06:25 2016) [sssd[be[example.com]]] [krb5_auth_store_creds] (0x0010): unsupported PAM command [249].
(Fri Apr 8 01:06:25 2016) [sssd[be[example.com]]] [krb5_auth_store_creds] (0x0010): password not available, offline auth may not work.
At least the messages from krb5_auth_store_creds() are unrelated. I will
write a patch to silence these messages.

I would expect that SSSD switches to offline mode for some reason. If
you run SSSD with debug_level 8 or higher in the [domain/...] section
you should see messages like 'Going offline!' which indicate the
switching into the offline mode. The log lines before should help to
identify the reason.
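
To illustrate the log search described above, here is a minimal Python sketch that prints the lines leading up to each 'Going offline!' message. The function name and the context size of 20 lines are arbitrary choices for this example, not part of SSSD.

```python
from collections import deque


def lines_before_offline(log_lines, context=20, marker="Going offline!"):
    """Yield (marker_line, preceding_lines) for every line containing marker.

    log_lines can be any iterable of strings, for example an open log file.
    Only the last `context` lines seen before each hit are kept.
    """
    recent = deque(maxlen=context)
    for line in log_lines:
        line = line.rstrip("\n")
        if marker in line:
            # Copy the buffer so later appends do not change this result.
            yield line, list(recent)
        recent.append(line)
```

It could be fed an open file handle for the domain log (on many installs that is /var/log/sssd/sssd_<domain>.log, but check your own setup), printing the preceding lines for each hit.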

HTH

bye,
Sumit
--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project
Jeff Hallyburton
2016-04-20 18:18:28 UTC
Sumit,

Raised the debug level to 10 and let it run for about 24 hours. Uploading
the last ~2000 lines of the sssd_domain.com.log. Thanks for your help!

https://pastebin.com/MD6N1Dj7

On Tue, Apr 19, 2016 at 1:14 PM, Jeff Hallyburton <
Sumit,
Raised the debug level to 10 and let it run for about 24 hours. Uploading
the full sssd_domain.com.log. Thanks for your help!
Jeff
Unfortunately the domain log and the krb5_child log do not relate to
each other.
(Fri Apr 15 20:10:46 2016) [sssd[be[example.com]]] [child_handler_setup] (0x2000): Setting up signal handler up for pid [32382]
....
(Fri Apr 15 20:32:47 2016) [[sssd[krb5_child[32731]]]] [k5c_setup_fast] (0x0100): SSSD_KRB5_FAST_PRINCIPAL is set to [host/
...
(Fri Apr 15 20:32:47 2016) [[sssd[krb5_child[32731]]]] [get_and_save_tgt] (0x0400): krb5_get_init_creds_password returned [-1765328324] during pre-auth.
Can you shed any light on this?
In the domain log, the child with PID 32382 is started to run a
pre-authentication request. This request is needed to find out which
authentication types are available for the user, e.g. password or
2-factor authentication with an OTP token. The request in the child
with PID 32731 looks like a real authentication request, which returns
error code -1765328324. That code just means 'Generic error', but it
might have caused SSSD to go offline.
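
For anyone wondering how -1765328324 maps to 'Generic error': MIT Kerberos error codes are offsets from a fixed table base, and the offset is the protocol error number from RFC 4120. A small Python sketch of the arithmetic follows; the function name is made up for this example, and the lookup table is only a short excerpt of the RFC's error list.

```python
# ERROR_TABLE_BASE_krb5 as defined in MIT krb5's krb5.h
KRB5_ERROR_TABLE_BASE = -1765328384

# Short excerpt of RFC 4120 error numbers, not the complete table.
RFC4120_ERRORS = {
    6: "KDC_ERR_C_PRINCIPAL_UNKNOWN",
    24: "KDC_ERR_PREAUTH_FAILED",
    37: "KRB_AP_ERR_SKEW",
    60: "KRB_ERR_GENERIC",
}


def decode_krb5_code(code):
    """Return (offset, name) for a libkrb5 error code from the krb5 table."""
    offset = code - KRB5_ERROR_TABLE_BASE
    return offset, RFC4120_ERRORS.get(offset, "not in this excerpt")
```

With the code from the log above, the offset is 60, i.e. KRB_ERR_GENERIC, which is why the number alone says little about the real cause.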
I would like to ask you to run the test again with debug_level=10 in the
[domain/...] section of sssd.conf, which enables some low-level
Kerberos tracing messages that might help to understand what kind of
'Generic error' was hit here. Additionally, I would like to ask you to send
the full log files as an attachment or in an archive, which would help me
navigate through them better.
bye,
Sumit
Sumit Bose
2016-04-21 11:47:04 UTC
Post by Jeff Hallyburton
Sumit,
Raised the debug level to 10 and let it run for about 24 hours. Uploading
the last 2000~ lines of the sssd_domain.com.log. Thanks for your help!
Can you send the related krb5_child log file as well?

bye,
Sumit
Jeff Hallyburton
2016-04-21 13:44:47 UTC
Sumit,

We found a resolution for this and I'm dropping it here for posterity.
After some digging, it turns out that our ipa server and ipa replica were
returning different IPs for systems in the environment in DNS requests (one
returned internal results, one returned external results).

After resolving this our intermittent connectivity issue went away. So it
seems that in some cases, the incorrect IP was being returned for LDAP
requests.
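
For posterity, a rough Python sketch of how such a mismatch could be spotted: ask each IPA DNS server for the same names and flag any name where the answers differ. This assumes the `dig` utility is installed; the function names are invented for this example, and the server and host names you pass in are your own.

```python
import subprocess


def parse_dig_short(output):
    """Turn `dig +short` output into a set of non-empty answer lines."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def query(server, name):
    """Ask one DNS server for the A records of name (requires dig)."""
    result = subprocess.run(
        ["dig", f"@{server}", name, "A", "+short"],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return parse_dig_short(result.stdout)


def find_mismatches(answers):
    """answers maps name -> {server: set_of_ips}.

    Returns only the names for which the servers gave different answers.
    """
    return {
        name: per_server
        for name, per_server in answers.items()
        if len({frozenset(ips) for ips in per_server.values()}) > 1
    }
```

Building the `answers` mapping is a matter of calling `query()` once per server and name. Note that `dig +short` also prints CNAME targets, so names behind a CNAME may need a closer look by hand.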

One additional item found here, it seems that the timeout to resolve an
address (from the sssd logs) is 6 seconds. Can this be raised?

Thanks,

Jeff

Lukas Slebodnik
2016-04-21 14:03:57 UTC
Post by Jeff Hallyburton
One additional item found here, it seems that the timeout to resolve an
address (from the sssd logs) is 6 seconds. Can this be raised?
man sssd.conf -> dns_resolver_timeout

LS
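
As a hedged illustration of where that option would go: per the sssd.conf man page, dns_resolver_timeout is set in the [domain/...] section. The Python sketch below only reads an sssd.conf-style text and reports the relevant settings per domain. The sample config, the value 10, and the function name are illustrative assumptions, not recommendations.

```python
import configparser

# Illustrative sample only; the domain name and values are assumptions.
SAMPLE_CONF = """\
[sssd]
domains = example.com

[domain/example.com]
debug_level = 10
dns_resolver_timeout = 10
"""


def domain_options(conf_text, options=("debug_level", "dns_resolver_timeout")):
    """Return {section: {option: value_or_None}} for each [domain/...] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    report = {}
    for section in parser.sections():
        if section.startswith("domain/"):
            report[section] = {
                opt: parser.get(section, opt, fallback=None) for opt in options
            }
    return report
```

An option that comes back as None is simply not set in the file, in which case SSSD uses its built-in default.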
Petr Spacek
2016-04-21 15:05:09 UTC
Post by Jeff Hallyburton
Sumit,
We found a resolution for this and I'm dropping it here for posterity.
After some digging, it turns out that our ipa server and ipa replica were
returning different IPs for systems in the environment in DNS requests (one
returned internal results, one returned external results).
After resolving this our intermittent connectivity issue went away. So it
seems that in some cases, the incorrect IP was being returned for LDAP
requests.
It would be interesting to see logs from the named daemon running on these servers
(around the time of failure).

I hope it helps.

Petr^2 Spacek
Sumit Bose
2016-04-21 15:17:18 UTC
Post by Jeff Hallyburton
Sumit,
We found a resolution for this and I'm dropping it here for posterity.
After some digging, it turns out that our ipa server and ipa replica were
returning different IPs for systems in the environment in DNS requests (one
returned internal results, one returned external results).
After resolving this our intermittent connectivity issue went away. So it
seems that in some cases, the incorrect IP was being returned for LDAP
requests.
Thank you for the feedback.

bye,
Sumit