Discussion:
[Freeipa-users] any tips or horror stories about automating dynamic enrollment and removal of IPA clients?
Chris Dagdigian
2017-04-13 12:05:41 UTC
Hi folks,

I've got a high performance computing (HPC) use case that will need AD
integration for user identity management. We've got a working IPA server
in AWS that has 1-way trusts going to several remote AD forests and
child domains. It works fine, but so far all of the enrolled clients are
largely static/persistent boxes.

The issue is that the HPC cluster footprint is going to be elastic by
design. We'll likely keep 3-5 nodes in the grid online 24x7 but the vast
majority of the compute node fleet (hundreds of nodes quite likely) will
be fired up on demand as a mixture of spot, RI and hourly-rate EC2
instances. The cluster will automatically shrink in size as well when
needed.

I'm trying to decide which method to use for managing users (mainly
UID and GID values) on the compute fleet:

[Option 1] Script the enrollment and uninstall actions via the existing
hooks we have for running scripts at "first boot" as well as
"pre-termination". This seems technically pretty straightforward, but
I'm not sure I really need to stuff our IPA server with host information
for boxes that are considered anonymous and disposable. We don't really
care about them and don't need to implement RBAC controls on them. I'm
also slightly worried that a large-scale enrollment or uninstall action
may bog down the server or (worse) only partially complete, leading to
an HPC grid where jobs flow into a bad box and die en masse because
"user does not exist..."
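
Roughly what I have in mind for the two hooks (untested sketch; the
enrollment principal, domain and password handling are placeholders):

  #!/bin/bash
  # first-boot hook: enroll this instance into IPA
  ipa-client-install --unattended --mkhomedir \
      --domain=example.internal \
      --hostname="$(hostname -f)" \
      -p enroller -w "$ENROLL_PASSWORD"

  #!/bin/bash
  # pre-termination hook: leave the realm and clean up our host entry
  ipa-client-install --uninstall --unattended
  kinit enroller <<< "$ENROLL_PASSWORD" && ipa host-del "$(hostname -f)"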

[Option 2] Steal from the HPC ops playbook and minimize the network
services that can cause failures by distributing static files to the
worker fleet: bind the 24x7 persistent systems to the IPA server and
require all HPC users to provide a public SSH key. Then use commands
like "id <username>" and the getent utilities to dump the
username/UID/GID values so that we can manufacture static /etc/passwd,
/etc/shadow and /etc/group files that can be pushed out to the compute
node fleet. The main win here is that we can maintain consistent
IPA-derived UID/GID/username/group data cluster-wide while totally
removing the need for an elastic set of anonymous boxes to be
individually enrolled in and removed from IPA all the time.
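
On one of the enrolled 24x7 nodes that would boil down to something like
this (rough sketch; the user list and output file names are made up, and
since SSSD doesn't enumerate by default we'd walk an explicit user list):

  #!/bin/bash
  # Build static passwd/group fragments from IPA-resolved identities.
  # hpc-users.txt is a hypothetical one-username-per-line list.
  : > passwd.hpc ; : > group.hpc
  while read -r u; do
      getent passwd "$u" >> passwd.hpc
      # pull in every group the user belongs to
      for g in $(id -Gn "$u"); do
          getent group "$g" >> group.hpc
      done
  done < hpc-users.txt
  sort -u -o group.hpc group.hpc   # drop duplicate group lines
  # push passwd.hpc/group.hpc to the compute fleet and merge them into
  # /etc/passwd and /etc/group there; /etc/shadow entries can be stubbed
  # out since users only authenticate with SSH keys.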

Right now I'm leaning towards Option #2 but would love to hear
experiences regarding moderate-scale automatic enrollment and removal of
clients!

-Chris
Gerald-Markus Zabos
2017-04-13 13:21:20 UTC
Post by Chris Dagdigian
Right now I'm leaning towards Option #2 but would love to hear
experiences regarding moderate-scale automatic enrollment and removal of
clients!
-Chris
Hi Chris,

we're facing a similar use case day to day, but we moved from AWS to
another cloud provider. Our setup works on both, so I am referring to
AWS here.

We decided...

...to use SGE for our HPC infrastructure
...to recycle network ranges for 100 static IP addresses + 100 static
hostnames
...to use scripts, cron jobs & Ansible (depending on "qstat" and "qhost"
output) on the cluster head node to determine how many additional
cluster nodes have to be created as a reserve for
"What-if-we-need-more-nodes?" scenarios
...to create cluster nodes on AWS via ansible-playbook from a
pre-defined image, do software installation & configuration via
ansible-playbook, and do the IPA domain join via ansible-playbook
("ipa-client-install --domain=<DOMAIN> --mkhomedir
--hostname=<FreeIPA-Client>.<DOMAIN> --ip-address=<FreeIPA-Client IP
address> -p <Join User> -w <Join User's password> --unattended")
...to destroy cluster nodes in two steps: 1) ansible-playbook running
"ipa-client-install --uninstall", 2) ansible-playbook destroying the
cluster node on AWS via the API (both steps are roughly sketched below)
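
The two teardown steps are roughly (instance id is a placeholder):

  # step 1 (on the node, via ansible-playbook): leave the IPA domain
  ipa-client-install --uninstall --unattended
  # optionally remove the host entry on the IPA server afterwards:
  #   ipa host-del <FreeIPA-Client>.<DOMAIN>

  # step 2 (from the control host, via ansible-playbook): destroy the
  # node through the provider API, e.g. with the AWS CLI:
  aws ec2 terminate-instances --instance-ids <instance-id>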

(Right now, I am working on a bulk creation script for IPA users/groups
to expand our single HPC cluster into several, where we have the same
set of users (~65-100) with a differing suffix in the username,
e.g. "it_ops01", "it_ops20", etc. -- a rough sketch is below.)

We're using 2x IPA servers (ESXi VMs, 4GB RAM, 2 CPUs) in replication
with another 2x IPA servers (same dimensions) in our main physical
datacenter. We didn't see much impact on the IPA servers during
enrollment/removal of domain hosts. In three months of operation so far,
we had several "bad box" scenarios, all of them because of problems with
SGE. We solved those manually, by removing/adding cluster nodes via SGE
commands.

As you can see, I tend towards [Option 1], since it does all the magic
with pre-defined software commands (SGE, Ansible, the IPA CLI) instead
of additional scripts doing work that can be done by "built-in"
commands. For us, this works best.

Regards,

Gerald
--
Gerald-Markus Zabos <***@googlemail.com>
Web: http://www.gmzgames.de
Chris Dagdigian
2017-04-13 14:44:51 UTC

Simo Sorce
2017-04-13 13:31:31 UTC
Post by Chris Dagdigian
[...]
Right now I'm leaning towards Option #2 but would love to hear
experiences regarding moderate-scale automatic enrollment and removal of
clients!
One option could also be to keep a (set of) keytab(s) you can copy onto
the elastic hosts and preconfigure their sssd daemon. At boot you copy
the keytab onto the host and start sssd, and everything should magically
work. They are all basically the same identity, so using the same key for
all of them may be acceptable.
From the IPA side it will look as if the same host suddenly has multiple
IP addresses and is opening one connection from each of them, but that
is ok.
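
A rough sketch of what I mean (hostnames, realm and paths are
placeholders):

  # once, on an admin host: create a generic host entry and pull its
  # keytab, to be baked into the image or copied in at boot
  ipa host-add hpc-node.example.internal --force
  ipa-getkeytab -s ipa.example.internal \
      -p host/hpc-node.example.internal -k /srv/hpc-node.keytab

  # at boot, on each elastic node:
  install -m 600 /path/to/hpc-node.keytab /etc/krb5.keytab
  # sssd.conf preconfigured in the image with an ipa domain, e.g.:
  #   [domain/example.internal]
  #   id_provider = ipa
  #   ipa_domain = example.internal
  #   ipa_server = _srv_, ipa.example.internal
  #   ipa_hostname = hpc-node.example.internal
  systemctl start sssd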

Simo.
--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
Alexander Bokovoy
2017-04-13 14:16:19 UTC
Post by Simo Sorce
One option could also be to keep a (set of) keytab(s) you can copy onto
the elastic hosts and preconfigure their sssd daemon. At boot you copy
the keytab onto the host and start sssd, and everything should magically
work. They are all basically the same identity, so using the same key for
all of them may be acceptable.
It would be better to avoid using Kerberos authentication here at all.

Multiple hosts authenticating with the same key would cause a lot of
updates in the LDAP entry representing this principal. This is going to
break replication if this is the only key that is used by multiple hosts
against multiple IPA masters.
--
/ Alexander Bokovoy
Simo Sorce
2017-04-13 14:25:45 UTC
Post by Alexander Bokovoy
It would be better to avoid using Kerberos authentication here at all.
Multiple hosts authenticating with the same key would cause a lot of
updates in the LDAP entry representing this principal. This is going to
break replication if this is the only key that is used by multiple hosts
against multiple IPA masters.
If replication is an issue we should probably mask those attributes from
replication as well, just like we do for the failed-auth attributes.
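
For reference, the masking I mean lives in the replication agreements'
nsDS5ReplicatedAttributeList (rough example; the value is quoted from
memory):

  # inspect the current exclusion list on the replication agreements
  ldapsearch -D "cn=Directory Manager" -W -b "cn=config" \
      "(objectclass=nsds5ReplicationAgreement)" nsDS5ReplicatedAttributeList
  # a typical value looks something like:
  #   nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE memberof
  #     entryusn krbLastSuccessfulAuth krbLastFailedAuth krbLoginFailedCount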

Simo.
--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc