Building a MongoDB Cluster with Vagrant and Ansible

If you’re interested in Vagrant, Ansible and MongoDB, you might like this repo that a colleague and I have been working on. The Vagrant setup launches five CentOS 7 machines, and the Ansible playbook uses them to create a five-node replica set on MongoDB v3.0 using the new WiredTiger storage engine.

https://github.com/a-h/ansible-mongodb-cluster

I forked it from the ansible-examples repository [0], but that base is stale, and we’ve also modified it quite a bit to match our requirements.

https://github.com/ansible/ansible-examples [0]

Upgrading from CentOS 6 to CentOS 7

Aside from the basics, like updating repository locations, the init scripts needed reworking to become systemd service files.

I used the example service file from here: [1]

http://tom-chapman.uk/2012/12/28/installing-mongodb-on-a-linux-distro-using-systemd-instead-of-inittab/ [1]
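To give an idea of what replaces the old init script, here is a minimal sketch of a unit file, written out by a small shell helper. It is not the exact file from the repo: the config path /etc/mongod.conf, the mongod user and group, the PID file location, and the helper name write_mongod_unit are all assumptions for illustration.

```sh
# Sketch: write a minimal systemd unit for mongod into a target directory.
# Paths, user and PID file location are assumptions; adjust to your install.
write_mongod_unit() {
  local unit_dir="${1:-/etc/systemd/system}"
  cat > "$unit_dir/mongod.service" <<'EOF'
[Unit]
Description=MongoDB database server
After=network.target

[Service]
User=mongod
Group=mongod
# Assumes mongod is configured to fork, so systemd tracks it via the PID file.
Type=forking
PIDFile=/var/run/mongo/mongod.pid
ExecStart=/usr/bin/mongod --config /etc/mongod.conf

[Install]
WantedBy=multi-user.target
EOF
}
```

Run as root, `write_mongod_unit && systemctl daemon-reload && systemctl enable mongod` would install and enable the unit.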

CentOS 6 uses iptables, whereas CentOS 7 uses firewalld, so the firewall configuration also needed reworking. One thing we noticed is that some time after firewalld starts, it temporarily refuses or drops network connections. Ansible then generates this error:

{'msg': 'FAILED: [Errno 61] Connection refused', 'failed': True}

After talking through ideas with a colleague and discarding the “pause” option, I found the easiest thing to do was to move starting and enabling firewalld out of the Ansible playbook and into the Vagrant base box provisioning. This stopped our Ansible playbook from collapsing partway through the first run with a connection error, but then magically working fine the second time around.
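In a shell-based provisioning step, that move amounts to something like the sketch below. The function name provision_firewalld is hypothetical, and it assumes the script runs as root while the base box is being built, so firewalld has settled long before Ansible connects.

```sh
# Sketch of a base-box provisioning step (assumed to run as root).
# Enabling makes firewalld start on every boot; starting brings it up now.
provision_firewalld() {
  systemctl enable firewalld
  systemctl start firewalld
}
```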

Running On Virtual Machines

The base Ansible playbook was designed to run all of the MongoDB instances on a single box, but I wanted to test power cycling machines, so I added a Vagrantfile setup to run each MongoDB node on a VM and cut down the complexity of running all of the different instances on non-standard ports.

One thing to remember with VMs is to enable ntpd on the base boxes, otherwise the base box’s time is set to when it was created. If you destroy and recreate a box, its time will be totally out of line with the other machines in the cluster. Plus, it’s just confusing when the logs are not in sync with the current time.
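On CentOS 7 this is only a few commands in the base box provisioning. The sketch below assumes the stock ntp package and a root shell; the function name provision_ntpd is made up for illustration.

```sh
# Sketch: install and enable ntpd on a CentOS 7 base box (assumed to run as root).
provision_ntpd() {
  yum install -y ntp
  systemctl enable ntpd
  systemctl start ntpd
}
```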

MongoDB Configuration

I simplified the cluster configuration to remove sharding support, because I don’t need it for what I’m doing. This removes the need to maintain a mongoc cluster.

Authentication

It’s important to carry out the MongoDB setup in the correct order. For example, you need to start the first node without authentication enabled, add a root user, enable authentication and security.keyFile in the configuration file, and then restart the server. At that point, you can bring up more nodes and add them to the replica set. This took me a few runs of the playbook to get right.

One thing that caught me out is that enabling the security.keyFile feature, which ensures that only known servers can join the replica set, implicitly also enables the authentication setting. While this fact is present in the documentation, it took me a while to work out why I couldn’t set up the first node in the replica set using the localhost exception.
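Written out by hand, the ordering looks roughly like the sketch below. This is not how the playbook does it: the function names, the password CHANGE_ME and the key file path are placeholders, and the key file is generated with head and base64 purely to keep the example dependency-free.

```sh
# Sketch of the ordering described above; all names and values are placeholders.

# Step 1: generate a shared key file (base64 text, readable only by its owner).
generate_keyfile() {
  local keyfile="$1"
  head -c 741 /dev/urandom | base64 > "$keyfile"
  chmod 600 "$keyfile"
}

# Step 2: on the first node, started WITHOUT security.keyFile or authorization,
# create a root user while the localhost exception still applies.
create_root_user() {
  mongo admin --eval 'db.createUser({user: "root", pwd: "CHANGE_ME", roles: ["root"]})'
}

# Step 3: add security.keyFile to the config on every node and restart mongod.

# Step 4: from the primary, authenticate as root and add each remaining node.
add_member() {
  mongo admin -u root -p CHANGE_ME --authenticationDatabase admin \
    --eval "rs.add('$1')"
}
```

For example, `generate_keyfile /etc/mongod.key` would be followed by copying the same file to every node before enabling security.keyFile.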

Replication Setup

On first trying to set up replication with rs.initiate(), I got an error like this:

No host described in new configuration 1 for replica set mongo_replication maps to this node

To fix this, your hosts file should have a localhost alias, looking something like:

127.0.0.1 localhost mongo1

You can see that change at: [2]

https://github.com/a-h/ansible-mongodb-cluster/commit/d1d819b119db9a3a756b3305c3366b191626dd5a [2]
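If you are not templating the hosts file, an idempotent way to add the alias from a shell script might look like this sketch. The function name add_localhost_alias is hypothetical, and the hosts file path is a parameter so you can try it on a copy first.

```sh
# Sketch: append the localhost alias line unless that exact line already exists.
add_localhost_alias() {
  local host_name="$1"
  local hosts_file="${2:-/etc/hosts}"
  local line="127.0.0.1 localhost $host_name"
  # -x matches the whole line, -F treats the line as a fixed string.
  if ! grep -qxF "$line" "$hosts_file"; then
    echo "$line" >> "$hosts_file"
  fi
}
```

Calling `add_localhost_alias mongo1` twice leaves a single alias line in place.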

Disabling Transparent Hugepages

WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'. We suggest setting it to 'never'

If you google this warning, you’ll find suggestions to change grub command lines etc., but MongoDB’s documentation for CentOS 7 is detailed here [3] and worked fine for me.

https://docs.mongodb.org/manual/tutorial/transparent-huge-pages/ [3]

You can see how this was implemented in Ansible at the following commit:

https://github.com/a-h/ansible-mongodb-cluster/commit/392b871995d1549fbc590778ded49bacac2c0006
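The core of the fix is writing "never" to the two kernel settings before mongod starts. A minimal sketch of that step follows; it is not the content of the commit above, the function name disable_thp is made up, and the directory is a parameter only so the logic can be tried against a scratch directory.

```sh
# Sketch: set transparent hugepages to "never" (needs root on a real system).
disable_thp() {
  local thp_dir="${1:-/sys/kernel/mm/transparent_hugepage}"
  local setting
  for setting in enabled defrag; do
    # Only touch settings that exist and are writable on this kernel.
    if [ -w "$thp_dir/$setting" ]; then
      echo never > "$thp_dir/$setting"
    fi
  done
}
```

The change does not survive a reboot, so it has to be run on every boot before mongod starts.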

Setting Readahead

WARNING: Readahead for /data is set to 2048KB We suggest setting it to 256KB (512 sectors) or less http://dochub.mongodb.org/core/readahead

Originally, when I set the cluster up, I used the MMAPv1 storage engine, because it’s the default in MongoDB 3.0 (it’s changing to WiredTiger in 3.2). If you don’t adjust the readahead value, then you’ll get this warning on startup.

Reading through the Production Notes gives you the command line, but you then need to run it at the right point in the boot process. To do this, I had to create a “oneshot” systemd service and slot it in before the mongod.service:

https://github.com/a-h/ansible-mongodb-cluster/commit/132c2a5ad7e9175d38670982fd5902f282169379
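A oneshot unit of that kind could look like the sketch below. It is an illustration rather than the file from the commit: the unit name mongod-readahead.service, the helper write_readahead_unit and the example device /dev/sdb are assumptions, and you would pass whichever block device backs /data.

```sh
# Sketch: write a oneshot unit that sets readahead before mongod.service starts.
write_readahead_unit() {
  local device="$1"
  local unit_dir="${2:-/etc/systemd/system}"
  cat > "$unit_dir/mongod-readahead.service" <<EOF
[Unit]
Description=Set readahead on $device for MongoDB
Before=mongod.service

[Service]
Type=oneshot
RemainAfterExit=yes
# 512 sectors of 512 bytes = 256KB, the upper bound named in the warning.
ExecStart=/sbin/blockdev --setra 512 $device

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, `write_readahead_unit /dev/sdb` followed by `systemctl enable mongod-readahead` would apply the setting on each boot.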

One thing I learned here is how systemd handles the PID file of a forking process. The base Ansible example created a directory at /var/run/mongo and gave the mongod process permission to place its mongod.pid file there, but this directory was lost on reboot. This stopped the cluster from automatically healing after a power cycle. To fix that, I used the systemd-tmpfiles command to make sure that the directory was present, with the following configuration file:

d /var/run/mongo 0755 mongod mongod
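To show where such a file lives and how it is applied, here is a small sketch. The file name mongo.conf and the helper write_mongo_tmpfiles are assumptions; systemd reads any *.conf file in /etc/tmpfiles.d at boot.

```sh
# Sketch: install a tmpfiles.d entry that recreates /var/run/mongo on boot.
write_mongo_tmpfiles() {
  local conf_dir="${1:-/etc/tmpfiles.d}"
  # Fields: type (d = create directory), path, mode, owner, group.
  echo 'd /var/run/mongo 0755 mongod mongod' > "$conf_dir/mongo.conf"
}
```

Run as root, `write_mongo_tmpfiles && systemd-tmpfiles --create /etc/tmpfiles.d/mongo.conf` creates the directory immediately without waiting for a reboot.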

Benchmarking

With the cluster in place, it’s possible to do some benchmarking using YCSB. Here are a couple of example results from a run on my 2015 MacBook Pro.

Loading Data

```sh
./bin/ycsb load mongodb -p recordcount=100000 -P workloads/workloada -p "mongodb.url=mongodb://mongo1:27017,mongo2:27017/ycsb?replicaset=mongo_replication&w=majority"
```

```
[OVERALL], RunTime(ms), 194789.0
[OVERALL], Throughput(ops/sec), 513.3760119924636
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 3713.0
[CLEANUP], MinLatency(us), 3712.0
[CLEANUP], MaxLatency(us), 3713.0
[CLEANUP], 95thPercentileLatency(us), 3713.0
[CLEANUP], 99thPercentileLatency(us), 3713.0
[INSERT], Operations, 100000.0
[INSERT], AverageLatency(us), 1931.10877
[INSERT], MinLatency(us), 544.0
[INSERT], MaxLatency(us), 3397631.0
[INSERT], 95thPercentileLatency(us), 4347.0
[INSERT], 99thPercentileLatency(us), 8351.0
[INSERT], Return=0, 100000
```

### Workload A: Update heavy workload

This workload has a mix of 50/50 reads and writes. An application example is a session store recording recent actions.

```sh
./bin/ycsb run mongodb -threads 16 -p recordcount=100000 -P workloads/workloada -p "mongodb.url=mongodb://mongo1:27017,mongo2:27017/ycsb?replicaset=mongo_replication&w=majority"
```

```
[OVERALL], RunTime(ms), 2752.0
[OVERALL], Throughput(ops/sec), 363.3720930232558
[READ], Operations, 505.0
[READ], AverageLatency(us), 38837.51683168317
[READ], MinLatency(us), 1022.0
[READ], MaxLatency(us), 342271.0
[READ], 95thPercentileLatency(us), 127999.0
[READ], 99thPercentileLatency(us), 247039.0
[READ], Return=0, 505
[CLEANUP], Operations, 16.0
[CLEANUP], AverageLatency(us), 902.6875
[CLEANUP], MinLatency(us), 1.0
[CLEANUP], MaxLatency(us), 14423.0
[CLEANUP], 95thPercentileLatency(us), 8.0
[CLEANUP], 99thPercentileLatency(us), 14423.0
[UPDATE], Operations, 495.0
[UPDATE], AverageLatency(us), 34045.058585858584
[UPDATE], MinLatency(us), 1333.0
[UPDATE], MaxLatency(us), 353791.0
[UPDATE], 95thPercentileLatency(us), 106815.0
[UPDATE], 99thPercentileLatency(us), 201087.0
[UPDATE], Return=0, 495
```