
So, I have this host group, which consists of 35k VMs, and I need to run a playbook over it.
If it matters: the playbook is just a call to a community role that installs node_exporter.

But I'm having a hard time trying to run it at full scale.
I know that running it as-is on such a huge host group will definitely cause an OOM, so I made a bunch of attempts to make it both reliable (make sure it finishes and doesn't get killed) and fast (though in reality it is neither).

So this is what I'm doing (a consolidated sketch of the resulting play header follows the list):

  1. Using strategy: free
  2. Using serial: 350
  3. Collecting only the facts I need:
  gather_facts: true
  gather_subset:
    - "default_ipv4"
    - "system"
    - "service_mgr"
    - "pkg_mgr"
    - "os_family"
    - "selinux"
    - "user"
    - "mounts"
    - "!all"
    - "!min"
  4. Using -f 350 when calling the playbook, to make it run over 350 machines simultaneously.
  5. Using the persistent-connection settings to make it hold SSH connections open:
use_persistent_connections = True

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=1800s -o PreferredAuthentications=publickey -o ForwardAgent=yes

[connection]
ansible_pipelining = True

[persistent_connection]
connect_timeout = 1800
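
Putting those pieces together, the play header looks roughly like this (the group and role names are placeholders; the real role is the community node_exporter role):

- hosts: all_vms            # placeholder name for the 35k-host group
  strategy: free
  serial: 350
  gather_facts: true
  gather_subset:
    - "default_ipv4"
    - "system"
    - "service_mgr"
    - "pkg_mgr"
    - "os_family"
    - "selinux"
    - "user"
    - "mounts"
    - "!all"
    - "!min"
  roles:
    - node_exporter         # placeholder for the community role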

And, well... it doesn't work. The biggest problem I see is that it's not really spawning 350 forks at a time. All I see is ~3-5 processes running something on a remote host (the max I've seen was maybe 20 of them?), so it's painfully slow. Running it on 350 hosts takes ~1.5h, which is insane, as calling this playbook/role on 30 machines takes around 3-4 minutes to complete.

Plus, it's OOMing anyway at some point. I'm running it on a 32-core / 64 GB RAM VM dedicated to running only this one playbook, and it still OOMs, which is insane.
From my understanding the serial setting should prevent that, as it would free up some memory after every batch. But it seems it doesn't; memory just grows constantly.

Right now I'm running it using a bash script which builds batches of machines and then calls the playbook with -l "machine1:machine2:.....:machine350", but that is completely wrong.

So my questions here are: why am I not able to run the role/playbook on the whole host group at once, why is it so slow, why is it OOMing, and how do I prevent that?

TIA for all the help!

  • From my experience the solution would be something like Ansible Tower Instance Groups. Depending on how the playbook is written or designed, it should be easy to run it on, for example, 10 different nodes (instances) in parallel, so every node would have to deal with 3.5k remote/target nodes instead of 35k. In other words, don't let one single node do all the work and manage that load on its own; distribute the workload instead.
    – U880D
    Commented Mar 15 at 18:42
  • Thanks @U880D, I honestly don't even understand why it's not actually spawning 350 forks as it should according to the flag. It's really confusing. Commented Mar 16 at 13:31
  • Look at all the running ansible (python) and ssh processes, count them, study their command lines, and see what patterns you can find. Note that if you have serial at less than the default 100%, you may have a remainder of hosts finishing before the next batch can start. Unfortunately there aren't really tools to profile Ansible internals (executor, strategy, worker threads) at this scale; most users' plays are on a far smaller set of hosts. Decide whether to throw hardware resources, or your time, at this problem. Commented Mar 21 at 14:17

2 Answers


Increase forks

You have significant memory and CPU, so a few hundred forks aka worker threads is reasonable even though they are heavy on resources. In ansible.cfg:

[defaults]
forks = 350

serial is the batch size of the play: Ansible runs that many hosts at once, each batch through to the end of the play, before starting the next batch. Delete serial to get back the default of 100% if you intend only to increase worker threads. serial has other effects, most notably that if all the hosts in a batch fail, the play will stop.

Bad comparison: imagine you have a very large project to compile. serial is like dividing it up into smaller targets and chunks, but it's still running with make --jobs=5, so the parallelism is limited. Ansible forks set the upper limit on worker threads.
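
As a sketch, a play that relies on forks alone would simply omit serial (group and role names below are placeholders standing in for the question's setup):

# no serial keyword, so the default of 100% applies and forks (350)
# is the only cap on how many hosts run concurrently
- hosts: all_vms            # placeholder for the 35k-host group
  strategy: free
  roles:
    - node_exporter         # placeholder for the community role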

Measure memory use

Find all the processes started on the Ansible controller, estimate their memory use, and find out how that is angering the virtual memory system. You didn't say which operating system you're on, and detailed performance analysis is very platform specific.

For example, if you use a systemd Linux, systemd-cgtop -m will show all sessions and services. Find out the total memory use and if this comes up against cgroup limits.
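
As a rough sketch of that kind of measurement with standard Linux tools (nothing Ansible-specific, adjust to taste):

# count the Ansible workers and ssh children currently running
pgrep -c -f ansible-playbook
pgrep -c -x ssh

# total resident memory (kB) of all ssh processes
ps -o rss= -C ssh | awk '{sum += $1} END {print sum " kB"}'

# per-session/per-service memory, sorted by usage (systemd systems)
systemd-cgtop -m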

Ansible runs other programs, not just python. Probably ssh for connections, at one per host per task, which is a lot. In theory these are short lived, but connection lifecycle brings us to the next topic:

use_persistent_connections

Confusingly, use_persistent_connections is not intended for POSIX hosts, so do not bother setting it to true. It is for the libssh-style persistent connections used for network gear, not the OpenSSH ssh connection plugin used for Unix/Linux hosts.

In contrast, ssh_args is used by the ssh connection plugin. The ControlPersist option (Ansible adds one by default) tells ssh to keep a master connection going, so subsequent low-level ssh connections to the same host skip connection setup and auth. That normally speeds things up. However, it adds to the number of ssh processes running, so if you quickly cycle through 30k hosts that is a lot of ssh processes.

Consider altering ssh_args to remove the ControlPersist options. You take a hit on per-connection overhead, but you don't have quite so many ssh processes running.
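
A sketch of what that could look like, based on the ssh_args from the question with the control-socket options dropped:

[ssh_connection]
pipelining = True
# no ControlMaster/ControlPersist: each task pays the connection cost,
# but no per-host master ssh processes linger on the controller
ssh_args = -o PreferredAuthentications=publickey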

Check that your maximum number of processes or pids is quite large, maybe 60000.
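
A few places to check those limits on a Linux controller (which ones apply depends on your distribution and whether cgroup limits are in play):

ulimit -u                                        # max user processes in this shell
sysctl kernel.pid_max                            # system-wide pid limit
systemctl show user-$(id -u).slice -p TasksMax   # systemd task limit for your session slice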

Smaller groups

35k hosts is not the largest size inventory I've heard about, but it is big. Ansible is heavy in many ways, so you may struggle to get plays done fast enough by scaling up.

Consider running playbooks on smaller sets of hosts at a time. --limit can target groups as well, which is much less tedious than providing thousands of hosts on the command line.

You could make your inventory smart enough to tag hosts in various ways and generate groups from that: data center region, availability zone, VM host, hardware generation. Or make up your own group names and slice the inventory into smaller groups, for example as sketched below.
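
A sketch of an INI inventory sliced into child groups (all names and ranges here are made up):

# inventory.ini
[all_vms:children]
dc1_vms
dc2_vms

[dc1_vms]
vm-dc1-[0001:9999].example.com

[dc2_vms]
vm-dc2-[0001:9999].example.com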

With smaller groups, you can run multiple ansible-playbook --limit programs in parallel, possibly with xargs or GNU parallel. Or split the runs between different controller hosts.
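
For instance, with GNU parallel or xargs (the playbook name and group names are placeholders matching the inventory sketch above):

# two slices at a time, each in its own ansible-playbook run
parallel -j 2 ansible-playbook site.yml --limit {} ::: dc1_vms dc2_vms

# equivalent with xargs
printf '%s\n' dc1_vms dc2_vms | xargs -P 2 -I{} ansible-playbook site.yml --limit {}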

push option

The default Ansible concept is running against many remote hosts from a central controller. However, managed hosts that have Python installed can run Ansible themselves. So you could install Ansible on every managed host and have it run against itself, from cron or whatever.

The ansible-pull script included with Ansible is an example of this: it downloads the playbook from version control and automatically applies --limit to the local host.
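
A minimal sketch of what each managed host could run from cron (repository URL, checkout directory, and playbook name are placeholders):

# fetch the repo and apply local.yml against this host only
ansible-pull -U https://git.example.com/infra/ansible.git -d /var/lib/ansible-pull local.yml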

This is a very different method of operation, and might not work with what you want to run on managed hosts. But it is an option.

  • Thanks for your answer! But I'm already using forks, it just doesn't work. The -f flag on the playbook run is literally the same thing, no? The system is plain Ubuntu 22.04; there's nothing running there at all but Ansible. Memory consumption, when I was using 4-5 parallel ansible-playbook runs, was ~40 GB, which is insane. There's no limitation on forks either; I've tested it with a perl script and the number of processes it spawned was really huge, much more than 60k. Commented Mar 11 at 8:32
  • I also tried it on smaller groups: my 35k group is the parent, and there are a few smaller child groups. I tried running my playbook on them in separate screens; memory consumption was ~40 GB and was constantly growing. Commented Mar 11 at 8:37
  • I missed your --forks command line option when quickly reviewing your post, assuming everything was in ansible.cfg; it is the same thing. Ansible execs a fork of a scripting runtime (Python) per host, so 30k hosts will be very heavyweight. If you must run 30k concurrently, you might need more memory. Please edit your question with whatever performance analysis you have done: what indicates memory pressure, whether you are CPU bound, how many ssh processes and children of the ansible-playbook process are going at once, and how it compares to a more trivial play that doesn't collect facts and just pings the hosts. Commented Mar 12 at 14:00

You should try using Mitogen for Ansible, which replaces the host communication layer in Ansible with a different approach and, in their words, increases execution speed 1.25x-7x and decreases CPU usage by half.
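
If you give it a try, enabling it is typically a small ansible.cfg change; the extracted path below is a placeholder, so check the Mitogen documentation for the current instructions:

[defaults]
strategy_plugins = /path/to/mitogen/ansible_mitogen/plugins/strategy
strategy = mitogen_linear

There is also a mitogen_free strategy if you want to keep the free strategy from the question.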

I have used it for years in my projects and I haven't had any issues with it.

  • Thanks for the advice! Probably will give it a try! Commented Mar 11 at 8:43
  • Note that mitogen is a fork of Ansible due to modifications to connection code that upstream disagrees with. You may be stuck with an older version, and with limited support options. Commented Mar 12 at 13:39

