Discussion:
[Freeipa-users] DNS forwarding configuration randomly breaks and stops working
n***@nathanpeters.com
2015-10-01 21:07:00 UTC
We have a FreeIPA domain running IPA server 4.1.4 on CentOS 7.

We have no per zone forwarding enabled, only a single global forwarder.
This seems to work fine, but then after a while (several weeks, I think) it
randomly stops working.

We had this issue several weeks ago on a different IPA domain (identical
setup) in our production network but it was ignored because a server
restart fixed it.

This issue then re-surfaced in our development domain today (different
network, different physical hardware, same OS and IPA versions).

I received a report today from a developer that he could not ping a
machine in another domain, so I verified network connectivity and
everything was fine. When I tried to resolve the name from the IPA DC
using ping, it failed, but an nslookup directly against the forwarder
worked fine.

ipactl showed no issues, and only after I restarted the server did the
lookups start working again.

Console log below:

Using username "myipausername".
Last login: Thu Oct 1 16:36:51 2015 from 10.5.5.57
[***@dc1 ~]$ sudo su -
Last login: Tue Sep 29 19:03:39 UTC 2015 on pts/3

ATTEMPT FIRST PING TO UNRESOLVABLE HOST
=======================================
[***@dc1 ~]# ping artifactory.externaldomain.net
ping: unknown host artifactory.externaldomain.net

CHECK IPA STATUS
================
[***@dc1 ~]# ipactl status
Directory Service: RUNNING
krb5kdc Service: RUNNING
kadmin Service: RUNNING
named Service: RUNNING
ipa_memcached Service: RUNNING
httpd Service: RUNNING
pki-tomcatd Service: RUNNING
smb Service: RUNNING
winbind Service: RUNNING
ipa-otpd Service: RUNNING
ipa-dnskeysyncd Service: RUNNING
ipa: INFO: The ipactl command was successful

ATTEMPT PING OF GLOBAL FORWARDER
================================
[***@dc1 ~]# ping 10.21.0.14
PING 10.21.0.14 (10.21.0.14) 56(84) bytes of data.
64 bytes from 10.21.0.14: icmp_seq=1 ttl=64 time=0.275 ms
64 bytes from 10.21.0.14: icmp_seq=2 ttl=64 time=0.327 ms
^C
--- 10.21.0.14 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.275/0.301/0.327/0.026 ms

MANUAL NSLOOKUP OF DOMAIN ON GLOBAL FORWARDER FROM IPA DC
=========================================================
server 10.21.0.14
Default server: 10.21.0.14
Address: 10.21.0.14#53
artifactory.externaldomain.net
Server: 10.21.0.14
Address: 10.21.0.14#53

Non-authoritative answer:
artifactory.externaldomain.net canonical name =
van-artifactory1.externaldomain.net.
Name: van-artifactory1.externaldomain.net
Address: 10.20.10.14

RE-ATTEMPT PING, SINCE WE KNOW THAT NAME RESOLUTION (at least via nslookup)
IS WORKING FROM THIS MACHINE
===========================================================================
[***@dc1 ~]# ping artifactory.externaldomain.net
ping: unknown host artifactory.externaldomain.net
[***@dc1 ~]# ping van-artifactory1.externaldomain.net
ping: unknown host van-artifactory1.externaldomain.net

RESTART IPA SERVICES
====================
[***@dc1 ~]# ipactl restart
Restarting Directory Service
Restarting krb5kdc Service
Restarting kadmin Service
Restarting named Service
Restarting ipa_memcached Service
Restarting httpd Service
Restarting pki-tomcatd Service
Restarting smb Service
Restarting winbind Service
Restarting ipa-otpd Service
Restarting ipa-dnskeysyncd Service
ipa: INFO: The ipactl command was successful
[***@dc1 ~]# ipa dnsconfig-show
ipa: ERROR: did not receive Kerberos credentials
[***@dc1 ~]# kinit myipausername
Password for ***@ipadomain.NET:

OUTPUT GLOBAL FORWARDER CONFIG FOR TROUBLESHOOTING
==================================================
[***@dc1 ~]# ipa dnsconfig-show
Global forwarders: 10.21.0.14
Allow PTR sync: TRUE

PING NOW WORKS BECAUSE IPA SERVICES WERE RESTARTED
==================================================
[***@dc1 ~]# ping artifactory.externaldomain.net
PING van-artifactory1.externaldomain.net (10.20.10.14) 56(84) bytes of data.
64 bytes from 10.20.10.14: icmp_seq=1 ttl=60 time=3.00 ms
64 bytes from 10.20.10.14: icmp_seq=2 ttl=60 time=1.42 ms
64 bytes from 10.20.10.14: icmp_seq=3 ttl=60 time=2.39 ms
^C
--- van-artifactory1.externaldomain.net ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 1.420/2.274/3.004/0.653 ms
[***@dc1 ~]#
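
In hindsight, a quicker way to narrow this down, without restarting anything,
would have been to compare the local resolver against the forwarder directly.
Illustrative dig commands only, using the same name and forwarder address as
above:

dig @127.0.0.1 artifactory.externaldomain.net A
dig @10.21.0.14 artifactory.externaldomain.net A

If the second query answers but the first does not, the problem is inside
named on the IPA DC rather than in the forwarder or the network path.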

Here are some strange entries from my /var/log/messages relating to errors
from today:

Oct 1 20:39:31 dc1 named-pkcs11[15066]: checkhints: unable to get root NS
rrset from cache: not found
Oct 1 20:39:17 dc1 named-pkcs11[15066]: error (network unreachable)
resolving 'pmdb1.ipadomain.net/A/IN': 2001:500:2f::f#53
Oct 1 20:39:17 dc1 named-pkcs11[15066]: error (network unreachable)
resolving 'pmdb1.ipadomain.net/AAAA/IN': 2001:500:2f::f#53

Looking at the log entries, it appears that there may have been a network
connectivity 'blip' (maybe a switch or router was restarted) at some point,
and even after connectivity was restored, global forwarding kept failing
because the "we can't contact our forwarder" state seemed to get stuck in
memory.
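
If that theory is right, flushing named's cache should be enough next time
instead of a full ipactl restart - assuming rndc is configured on the IPA DNS
server, something like:

rndc dumpdb -cache    # optional: dump the cache to named_dump.db to see what is stuck
rndc flush            # drop everything named has cached, including negative answers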
n***@nathanpeters.com
2015-10-02 23:47:28 UTC
This issue has occurred again and I am once again trying to troubleshoot it.

show forwarder
--------------
-bash-4.2$ ipa dnsconfig-show
Global forwarders: 10.21.0.14
Allow PTR sync: TRUE

attempt ping
------------
-bash-4.2$ ping stash.externaldomain.net
ping: unknown host stash.externaldomain.net

attempt nslookup
-----------------
-bash-4.2$ nslookup
stash.externaldomain.net
Server: 127.0.0.1
Address: 127.0.0.1#53

** server can't find stash.externaldomain.net: NXDOMAIN

*comment*: strange, it doesn't work against localhost. Let's make sure
that localhost lookups work at all:

-bash-4.2$ nslookup
google.com
Server: 127.0.0.1
Address: 127.0.0.1#53

Non-authoritative answer:
Name: google.com
Address: 216.58.216.142

*comment*: yup, I can resolve google.com when talking to localhost...

Now let's try to talk to the forwarder configured in the global settings
-----------------------------------------------------------------------
server 10.21.0.14
Default server: 10.21.0.14
Address: 10.21.0.14#53
stash.externaldomain.net
Server: 10.21.0.14
Address: 10.21.0.14#53

Non-authoritative answer:
stash.externaldomain.net canonical name = git1.externaldomain.net.
Name: git1.externaldomain.net
Address: 10.20.10.30

more troubleshooting
--------------------
I ran Wireshark to see what the FreeIPA server was sending back to the
client:

3 0.000393 10.178.0.99 10.178.21.2 DNS 163 Standard query response 0x12ca
No such name CNAME git1.externaldomain.net

I've never seen a 'No such name' response carrying a CNAME answer before.
Let's look at the contents of the packet:


Frame 4: 163 bytes on wire (1304 bits), 163 bytes captured (1304 bits)
Ethernet II, Src: Vmware_b7:09:c6 (00:50:56:b7:09:c6), Dst: HewlettP_3c:f9:48 (2c:59:e5:3c:f9:48)
Internet Protocol Version 4, Src: 10.178.0.99 (10.178.0.99), Dst: 10.178.21.2 (10.178.21.2)
User Datagram Protocol, Src Port: 53 (53), Dst Port: 57374 (57374)
Domain Name System (response)
[Request In: 1]
[Time: 0.000414000 seconds]
Transaction ID: 0x12ca
Flags: 0x8183 Standard query response, No such name
1... .... .... .... = Response: Message is a response
.000 0... .... .... = Opcode: Standard query (0)
.... .0.. .... .... = Authoritative: Server is not an authority
for domain
.... ..0. .... .... = Truncated: Message is not truncated
.... ...1 .... .... = Recursion desired: Do query recursively
.... .... 1... .... = Recursion available: Server can do recursive
queries
.... .... .0.. .... = Z: reserved (0)
.... .... ..0. .... = Answer authenticated: Answer/authority
portion was not authenticated by the server
.... .... ...0 .... = Non-authenticated data: Unacceptable
.... .... .... 0011 = Reply code: No such name (3)
Questions: 1
Answer RRs: 1
Authority RRs: 1
Additional RRs: 0
Queries
stash.externaldomain.net: type A, class IN
Name: stash.externaldomain.net
[Name Length: 21]
[Label Count: 3]
Type: A (Host Address) (1)
Class: IN (0x0001)
Answers
stash.externaldomain.net: type CNAME, class IN, cname
git1.externaldomain.net
Name: stash.externaldomain.net
Type: CNAME (Canonical NAME for an alias) (5)
Class: IN (0x0001)
Time to live: 22483
Data length: 20
CNAME: git1.externaldomain.net
Authoritative nameservers
externaldomain.net: type SOA, class IN, mname
van-dns1.externaldomain.net
Name: externaldomain.net
Type: SOA (Start Of a zone of Authority) (6)
Class: IN (0x0001)
Time to live: 518
Data length: 38
Primary name server: van-dns1.externaldomain.net
Responsible authority's mailbox: tech.externaldomain.net
Serial Number: 2015092101
Refresh Interval: 10800 (3 hours)
Retry Interval: 900 (15 minutes)
Expire limit: 604800 (7 days)
Minimum TTL: 86400 (1 day)
Petr Spacek
2015-10-05 08:03:09 UTC
Post by n***@nathanpeters.com
Looking at the log entries, it appears that there may have been a network
connectivity 'blip' (maybe a switch or router was restarted) at some point
and even after connectivity was restored, the global forwarding was
failing because the "we can't contact our forwarder" status seemed to get
stuck in memory.
Most likely.
Post by n***@nathanpeters.com
Global forwarders: 10.21.0.14
Allow PTR sync: TRUE
This means that you are using the default forward policy, which is 'first',
i.e. the BIND daemon on the IPA server tries the forwarder first and, when
that fails, falls back to asking servers on the public Internet.

I speculate that the public servers know nothing about the name you were
asking for, and this negative answer got cached. This is default behavior in
BIND and IPA did not change it.

A workaround for the network problems could be
$ ipa dnsconfig-mod --forward-policy=only
which prevents BIND from falling back to the public servers.
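
For reference, this roughly corresponds to the standard BIND forwarding
options (illustrative snippet only, forwarder address taken from your
dnsconfig-show output; IPA manages the actual configuration for you):

options {
        forwarders { 10.21.0.14; };
        forward first;   // default: fall back to full recursion when the forwarder fails
        // forward only; // never fall back; return failure instead of asking public servers
};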

Anyway, you should solve network connectivity problems, too :-)

I hope this helps.
--
Petr^2 Spacek
n***@nathanpeters.com
2015-10-05 19:57:24 UTC
Post by Petr Spacek
A workaround for the network problems could be
$ ipa dnsconfig-mod --forward-policy=only
which prevents BIND from falling back to the public servers.

Ok, we managed to figure out what was happening here, but I still think
there is a bug somewhere in the FreeIPA DNS components that is
exacerbating the issue.

We have split DNS in our company. We have a public copy of our DNS
records, which contains only A records, and an internal copy of our DNS
records, which contains a bunch of CNAME records.

When we use nslookup to query the IPA server for stash.externaldomain.net,
nslookup returns that stash.externaldomain.net is a CNAME and returns both
the CNAME and the associated A record.

When we query FreeIPA through a DNS client, FreeIPA returns that stash is a
CNAME and does not return the associated A record. It seems that at that
point, instead of staying in 'forward' mode and forwarding the request for
the CNAME record to the forwarding server, FreeIPA switches into recursive
mode and begins the lookup by querying the root servers for
externaldomain.net. Since that query goes out to the general internet, it
gets back answers from our public-facing DNS servers (remember, we use split
DNS).

The answer it gets is (quite rightly) that there is no such CNAME record
on the public DNS server.
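
The difference is easy to see by asking the internal forwarder and a public
resolver the same question (illustrative commands; 8.8.8.8 is just an example
of a public resolver):

dig @10.21.0.14 stash.externaldomain.net A +short   # internal view: CNAME git1.externaldomain.net + 10.20.10.30
dig @8.8.8.8 stash.externaldomain.net A +short      # public view: no such name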

So I have a couple questions about how the forward first policy is
supposed to work...

1. In forward first mode, if the forward server returns a CNAME, is
FreeIPA supposed to ask the same forwarding server for the A record
associated with the CNAME, or is it supposed to then flip into recursive
mode and go to the root servers to find the CNAME? I would expect #1, but
it seems like #2 is always happening no matter what. I would only expect
it to attempt recursion if the initial query actually failed, not just
returned something other than an A record.

2. We did some more network packet capture and noticed that in forward
first mode the FreeIPA server always sent out both a forward request to
the forwarding server and an additional simultaneous request to the root
name servers (recursive mode). It got back responses to both the
forwarded and recursive queries it had performed. The recursive query
failed due to split DNS, and the forwarded query succeeded because it went
to an internal server which had the correct records. Strangely enough,
the IPA server ignored the successful forwarded answer and sent the
'failed' answer it had gotten through recursion back to the requesting
client. What is the behavior supposed to be in this situation, and why is
the server always sending out the recursive request even when it gets a
valid answer from the forwarded request?
Petr Spacek
2015-10-06 11:48:40 UTC
Post by n***@nathanpeters.com
So I have a couple questions about how the forward first policy is
supposed to work...
Here I would say that IPA does not do anything special - IPA just configures
BIND and all the rest is standard BIND behavior. I.e. when in doubt, consult
respective BIND docs.
Post by n***@nathanpeters.com
1. In forward first mode, if the forward server returns a CNAME, is
FreeIPA supposed to ask the same forwarding server for the A record
associated with the CNAME, or is it supposed to then flip into recursive
mode and go to the root servers to find the CNAME? I would expect #1, but
it seems like #2 is always happening no matter what. I would only expect
it to attempt recursion if the initial query actually failed, not just
returned something other than an A record.
Your expectation #1 is correct, but there can be multiple reasons why it fails.

Did you try to set forward policy = only as I advised in the previous
e-mail? Forward policy 'first' does not make sense when split-DNS is
involved, because you can end up with a mixture of records from different
views in one cache, which obviously results in a mess.
Post by n***@nathanpeters.com
2. We did some more network packet capture, and noticed that in forward
first mode, the FreeIPA server, always sent out both a forward request to
the forwarding server, and an additional simultaneous request to the root
name servers (recursive mode). It got back responses to both the
forwarded and recursive queries it had performed. The recursive query
failed due to split DNS and the forwarded query succeeded due to it going
to an internal server which had the correct records. Strangely enough...
the IPA server ignored the successful forwarded answer, and sent back the
'failed' answer it had gotten through recursion back to the requesting
client. What is the behavior supposed to be in this situation and why is
the server always sending out the recursive request, even when it gets a
valid answer from the forwarded request?
This is weird, but again, it can have multiple reasons. Do you see anything
in the BIND logs? Does it e.g. complain about DNSSEC validation failures?
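
One quick way to tell whether validation is the culprit is to repeat the query
with checking disabled (illustrative, names taken from your capture):

dig @127.0.0.1 stash.externaldomain.net A            # normal query, validation applies
dig @127.0.0.1 stash.externaldomain.net A +cdflag    # checking disabled; if only this one works, DNSSEC validation is failing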

Petr^2 Spacek
n***@nathanpeters.com
2015-10-06 16:57:58 UTC
Post by Petr Spacek
Your expectation #1 is correct, but there can be multiple reasons why it fails.
Did you try to set forward policy = only as I advised you in the previous
e-mail? Forward policy 'first' does not make sense when split-DNS is involved
because you can end up with mixture of records from different views in one
cache, which obviously results in a mess.
Yes, we ended up having to use the forward-only policy to get this
working. That is unfortunate, because if our forwarding server ever goes
down or gets rebooted, we essentially lose the ability to resolve external
internet domain names. It would be nice to have recursion as a fallback,
but it seems to kick in too often to be useful in our split-DNS situation.
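
For redundancy we will probably just configure more than one forwarder; as far
as I can tell dnsconfig-mod accepts the option repeatedly, along these lines
(second address is hypothetical):

ipa dnsconfig-mod --forwarder=10.21.0.14 --forwarder=10.21.0.15 --forward-policy=only
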
Post by Petr Spacek
This is weird, but again - it can have multiple reasons. Do you see something
in BIND logs? Does it e.g. complain about DNSSEC validation failures?
Petr^2 Spacek
Yes, we actually were getting DNSSEC validation failures. We had to
disable DNSSEC to get the forward-only policy to work. With DNSSEC turned
on, forward-only would not work because DNSSEC validation still tried to
contact the root servers directly.
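
Disabling validation in BIND is typically a one-line change in named's
options, along these lines (illustrative only; exact file and settings may
differ per setup):

options {
        dnssec-validation no;   // turn off DNSSEC validation of forwarded answers
};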
Petr Spacek
2015-10-07 10:05:04 UTC
Post by n***@nathanpeters.com
Yes, we actually were getting DNSSEC validation failures. We had to
disable DNSSEC to get the forward only policy to work. With DNSSEC turned
on, forward only would not work because DNSSEC still tried to directly
contact root servers.
It is very likely that this was caused by some misconfiguration in your DNS
views. Could you share the error messages from the BIND logs? We could use
them to improve the detection logic so we can warn users early instead of
leaving them to tedious debugging.

BTW, what version of IPA do you use? We added checks to catch common
misconfigurations in version 4.2.
--
Petr Spacek @ Red Hat
--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project