Single point of truth

31 Aug 2016, 21:29

automation

Something I’ve been pushing (and this is pretty much a truism amongst anyone who’s looked at “Cloud”) is the idea of automation. It doesn’t matter if you’re just treating the cloud as an outsourced datacenter or if you’re doing full 12-factor dynamically scalable apps. Automation is the key to consistency and control.

So, ideally, this means your automation system is the “single point of truth” for your estate. Whether you use ansible or chef or (saints preserve us) cfengine, your configuration file explicitly defines your target state. You can learn everything from that.

But is this true?

It’s nice in theory but, as is always the case, practice may be different.

Your source of truth may contradict itself.

With cfengine this is easy to see: one promise could say “X is true” and another promise could say “!X is true”. cfengine will complain that these rules don’t converge (assuming anyone reads the logs) and your server is left in an unknown state. That case is simple.

But there’s a more subtle failure mode.

Let’s say we use ansible to build our environment. The build process calls a sequence of playbooks to take your machine from raw state through to final configuration. So far, so good.

Now let’s say each playbook should be in its own git repo; after all, the playbook that installs and configures apache doesn’t really need to impact the playbook for postfix. It makes sense to separate these playbooks into different areas: different teams may be responsible, and different access controls can be applied (you don’t want the SMTP team to impact your web servers).

OK, that’s a contrived example, but you can see how it goes; the team building out your Postgres database automation shouldn’t necessarily have the ability to change the configuration of your OpenLDAP servers.

But here’s where things get complicated…

Sometimes there is overlap. Your apache automation may configure the addresses of your single sign-on servers. Your nginx configuration may require the same data. If they’re in different repos, then how do you ensure consistency?

Your single point of truth (“this is the single sign-on server”) may not be consistent.
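One way to at least notice this kind of drift is to compare the variables each repo defines. What follows is only a rough sketch, not something taken from a real toolchain: the repo names, the `sso_server` variable and the `find_conflicts` helper are all made up for illustration, and it assumes you have already loaded each repo’s variables into a plain dictionary.

```python
from collections import defaultdict


def find_conflicts(repo_vars):
    """Return {variable: {value: [repos]}} for variables whose values differ.

    repo_vars maps a repo name to a dict of the variables it defines.
    Values must be hashable (strings, numbers) for this simple sketch.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for repo, variables in repo_vars.items():
        for name, value in variables.items():
            seen[name][value].append(repo)
    # Keep only the variables that have more than one distinct value.
    return {
        name: dict(values)
        for name, values in seen.items()
        if len(values) > 1
    }


# Hypothetical data: two playbook repos that disagree about the SSO server.
repo_vars = {
    "apache-playbook": {"sso_server": "sso1.example.com", "listen_port": 443},
    "nginx-playbook": {"sso_server": "sso2.example.com"},
}

conflicts = find_conflicts(repo_vars)
for name, values in conflicts.items():
    print(f"{name} is defined inconsistently: {values}")
```

A check like this only catches variables that happen to share a name; if one repo calls it `sso_server` and another `login_host`, nothing will flag it.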

Iteration

There’s no simple answer to this. How you factor your code repositories, how you factor your automation and how you build systems will all evolve over time. But be aware: if you define a variable (“single sign-on server”) for one playbook, maybe it’s also useful elsewhere. Perhaps you need to define a global namespace?

Laziness

I spotted this in my own tooling. I have a script that builds my DNS and DHCP configuration. Given an entry in a config file, it builds the A, AAAA and PTR records for the machine.
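To give a flavour of what “one entry in, several records out” looks like, here is a minimal sketch. It is not my actual script; the `build_records` function, the hostname and the addresses (taken from the documentation ranges) are purely illustrative, and the DHCP side is left out entirely.

```python
import ipaddress


def build_records(hostname, addresses):
    """Return (name, type, value) tuples for one host.

    Each address produces a forward record (A for IPv4, AAAA for IPv6)
    and a matching PTR record, so forward and reverse DNS come from the
    same config entry and cannot disagree.
    """
    records = []
    for text in addresses:
        ip = ipaddress.ip_address(text)
        rtype = "A" if ip.version == 4 else "AAAA"
        records.append((hostname, rtype, str(ip)))
        # reverse_pointer gives the in-addr.arpa / ip6.arpa name
        # (without a trailing dot).
        records.append((ip.reverse_pointer, "PTR", hostname))
    return records


# Hypothetical config entry: one host with an IPv4 and an IPv6 address.
records = build_records("web1.example.com.", ["192.0.2.10", "2001:db8::10"])
for name, rtype, value in records:
    print(f"{name} IN {rtype} {value}")
```

The point is that the hostname and address are typed once; anything entered by hand in the zone file sits outside this loop and has to be remembered separately.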

I noticed today that one of my domains isn’t controlled this way. It has an A and an AAAA record that are hard-coded. I’m sure this will bite me in the bum down the line (when the primary server fails and I need to fail over to a secondary). Will I remember this? Or should I fix my automation? The answer is obvious…

or is it?

Iteration

Laziness