Reducing the number of personnel with access to the production environment and cardholder data minimizes risk and helps ensure that access is limited to those individuals with a business need to know. The intent of this requirement is to separate development and test functions from production functions. For example, a developer may use an administrator-level account with elevated privileges in the development environment, and have a separate account with user-level access to the production environment.
One of the “hot” things around today is the concepts of Site Reliability Engineering (SRE). I’m gonna be slightly provocative and state that this is not a new thing; we were doing this 30 years ago. Indeed, these concepts go back to where we were when I started out in this industry.
Although, to be fair, there is one new factor.
History Now I’ll be the first to say that my take on history is very much biased by my personal experiences, and how I worked.
One of the exciting parts of the “new world” of cloud is the ability to green field solutions. We don’t have the legacy requirements and so we’re free to do what we want.
Or so the evangelists would have you believe.
The past lingers on The reality is that many people are closer to a brown field environment. The organisation their team is embedded into has a tonne of reporting (“is your machine patched?
Even after all this time I hear statements like “Oh, we can just run our code in the cloud”. This is the core of the lift and shift school of cloud usage.
And these people are perfectly correct; they can just run their stuff in the cloud. But it won’t work so well.
I’ve previously written about lift and shift issues, but here I want to focus on the “resiliency” issue.
“To summarise the summary of the summary; people are a problem” - Douglas Adams, The Restaurant At The End Of The Universe
The above quote is one of my favourite jokes (I’ve used it in a previous post); it highlights how people can complicate any situation. We can try to avoid this by automating as much as possible but, at the end of the day, there’s always a human involved somewhere; even if it’s the team that manages the automation!
Identity and Access Management (IAM) historically consists of the three A’s
Authentication What acccount is being accessed? Authorization Is this account allowed access to this machine? Access Control What resources are you allowed to use? Companies spend a lot of time and effort on the Authentication side of the problem. Single signon solutions for web apps, Active Directory for servers (even Unix machines), OAuth for federated access to external resources, 2 Factor for privileged access… there’s a lot of solutions around and many companies know what they should be doing, here.
A typical cloud engagement has a dual responsibility model. There’s stuff that can be considered “below the line” and is the responsibility of the cloud service provider (CSP) and there’s stuff above the line, which is the responsibility of the customer.
Amazon have a good example for their IaaS:
Where the line lives will depend on the type of engagement; the higher up the abstraction tree (IaaS->PaaS->SaaS) the more the CSP has responsibility.
Something I’ve been pushing (and this is pretty much a truism amongst anyone who’s looked at “Cloud”) is the idea of automation. It doesn’t matter if you’re just treating the cloud as an outsourced datacenter or if you’re doing full 12-factor dynamically scalable apps. Automation is the key to consitency and control.
So, ideally, this means your automation system is the “single point of truth” for your estate. Whether you use ansible or chef or (saints preserve us) cfengine, your configuration file explicitly defines your target state.