The core problem with a public cloud is “untrusted infrastructure”. We could get a VM from Amazon; that’s easy. What now? The hypervisor isn’t trusted (non-company staff have access to it and could use that to bypass OS controls). The storage isn’t trusted (non-company staff could access it). The network isn’t trusted (non-company…).
So could we store Personal Identifying Information in the cloud? Could a bank store your account data in a public cloud?
Could we even store public data in the cloud? Imagine if $MEGACORP stored their TV promotional videos on Amazon; they become the official promotional URLs. And then someone replaces them with porn. Umm…
So there are different security concerns when operating in a public cloud environment, and different mitigation strategies required (some of which may be contractual, with $100mm indemnity clauses; some of which may be technical; some of which may just be restrictions on what we allow to run there).
And then once we’ve got the “cloud management” aspect under control, at the end of the day we still just have a VM. Many companies are already pretty good at building VMs. Automation tools can do this in minutes. But then we have the megaton of layered stuff that takes a month of manual work. It needs to hook into the authorization tools, the inventory management tools, the security scanning tools, the change management tools…
You were planning on those integrations, right? You weren’t just going to run a production workload on uncontrolled machines. Heck, you weren’t going to run a DEV workload on a machine that can tunnel through your security perimeter…
Building servers is easy. Controlling servers is hard.
So using Amazon just for “replacement IaaS” (Infrastructure as a Service) doesn’t really buy us much. Theoretically, many large companies can create and run VMs cheaper than Amazon, especially for servers that run 24x7.
Now Amazon IaaS is great for “short-running, high-compute” apps. Let’s say we have a monthly reconcile process that takes 80 physical machines and runs for 24 hours a month. That’d be cheaper to run in Amazon; we don’t need to pay for 30 days of server we’re not using, just for the one day. Neat! But for a 24x7 server that’s always up? We can (and should!) do cheaper. (Hint: many internal service costs include layered tools and monitoring and network and stuff; we’d have to pay additional licensing fees and network usage charges and so on, on top of the base Amazon EC2 charge.)
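Back-of-the-envelope, the difference is stark. A minimal sketch of the arithmetic, assuming a made-up on-demand rate (the $0.40/hour figure and the 730-hour month are illustrative assumptions, not real pricing):

```python
# Rough cost comparison for the monthly reconcile example.
# HOURLY_RATE is an assumed on-demand price, not a real quote.

HOURLY_RATE = 0.40       # assumed $/hour for one suitable instance
MACHINES = 80            # machines the reconcile job needs
JOB_HOURS = 24           # hours the job actually runs each month
HOURS_PER_MONTH = 730    # average hours in a month

pay_per_use = MACHINES * JOB_HOURS * HOURLY_RATE
always_on = MACHINES * HOURS_PER_MONTH * HOURLY_RATE

print(f"pay only while the job runs: ${pay_per_use:,.0f}/month")   # ~$768
print(f"same fleet left up 24x7:     ${always_on:,.0f}/month")     # ~$23,360
```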
But this “run a server only when we need it” process requires a change in how we deploy things. This ‘month of manual work’ just won’t do. Existing monitoring, control and inventory management systems are all based around persistent, consistent data (your server is flagged red because your license scanning tool can never reach the server, because the VM isn’t running!). So now our existing security and audit and access management tools need to be reconsidered.
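For concreteness, the lifecycle itself is trivial: launch the fleet, run the job, tear it down. A minimal sketch using boto3, where the AMI ID, instance type and region are placeholders and the actual job submission is elided:

```python
# Sketch of the "servers exist only while the job runs" pattern.
# ImageId, InstanceType and region are placeholder assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch the reconcile fleet just before the monthly run.
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder: image pre-baked with the app
    InstanceType="c5.xlarge",          # placeholder size
    MinCount=80,
    MaxCount=80,
)
ids = [i["InstanceId"] for i in resp["Instances"]]
ec2.get_waiter("instance_running").wait(InstanceIds=ids)

# ... submit the reconcile job and wait for it to complete ...

# Tear it all down; for the other 29 days there is nothing for a scanner to reach.
ec2.terminate_instances(InstanceIds=ids)
```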
How do change management and patching work in a world where the server isn’t ever up? So now our vulnerability management processes need to be reconsidered.
Solutions we’ve built for traditional datacenters, based around relatively static compute (inventory changes take place over days, rather than minutes) may not be suitable for a cloud environment.
Then we have regulators; their requirements aren’t necessarily cloud-friendly either. Can we meet regulatory controls and demands in this world?
To me, this says “lift and shift” of existing apps from a corporate datacenter into a cloud environment may not be the best option. But if we spend the time and effort… if we build out the data structures and controls; if we automate application deployment; if we move away from “hands on keyboard” for operations staff (“No, do not ssh into the server; have tools and processes to collect data upfront and then analyse the data, fix the problem and then redeploy the fixed version”). Basically streamline the development process via automated testing/deployment tools (CI/CD FTW)…
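What does “collect data upfront” look like in practice? One common approach (just a sketch of the idea, not a prescription) is to have the app emit structured events to stdout and let whatever log shipper your platform runs — an assumption about your environment — collect them centrally, so nobody needs to ssh in to find out what happened:

```python
# Emit structured JSON log events to stdout; a central log pipeline
# (assumed to exist in your platform) collects them for later analysis.
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

log = logging.getLogger("reconcile")
log.info("batch finished, 12345 rows processed")   # analysed off-host, not via ssh
```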
Now we can change our security posture. Do we need central authentication in a world where no one can log in? Do we need inventory license scanning in a world where we state what apps are running where? Do we need application and config discovery when server configuration is defined programmatically and never changes?
Not all of these changes are dependent on the public cloud. Some of these things we should also drive in the traditional compute environment. Think how much more productive your code release cycle would be with full end-to-end automation. Think how much more secure your apps would be if no one could log in. You might gain many of the perceived benefits of a public cloud internally just by adopting the same tooling.
These operational changes then change our security stance, simply by taking people out of the equation. We don’t discover the state of the world, we define it and automation ensures it.
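A toy sketch of that “define it and let automation converge” idea (all names are illustrative; real tools like Puppet, Ansible or Terraform do this at scale):

```python
# Desired state is declared in code; automation converges reality toward it,
# instead of people discovering drift and fixing it by hand.

DESIRED_SERVICES = {"nginx", "reconcile-agent"}   # declared, version-controlled

def observed_services() -> set[str]:
    """Stand-in for querying the host; a real tool would ask systemd, a cloud API, etc."""
    return {"nginx", "telnetd"}                   # pretend this is what we found

def converge() -> None:
    observed = observed_services()
    for svc in DESIRED_SERVICES - observed:
        print(f"start {svc}")                     # a real tool would start the unit
    for svc in observed - DESIRED_SERVICES:
        print(f"stop and remove {svc}")           # ...and remove anything undeclared

converge()
```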
Why do we need a detailed keystroke log of change activity if no one logs in to do changes? Why do we need a central authentication system when no one can log in?
The controls move out to the code management environment. How does code get promoted through dev/test/qa/prod? What automated scanning tools are deployed? Do you use a CI/CD process? Can you use test-driven processes (check code in; it automatically builds, runs all tests, and deploys)?
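A minimal sketch of one such control point: a promotion gate that runs the automated checks and blocks the artifact if any fail. The specific tools here (pytest, bandit, pip-audit) are just example choices; substitute whatever your pipeline actually mandates:

```python
# Promotion gate: code only moves to the next environment when every
# automated check passes; no human pushes artifacts along by hand.
# The tools listed are example choices, not requirements.
import subprocess
import sys

CHECKS = [
    ["pytest", "-q"],            # unit/integration tests
    ["bandit", "-r", "src/"],    # static security analysis
    ["pip-audit"],               # known-vulnerable dependency scan
]

def gate() -> int:
    for cmd in CHECKS:
        print("running:", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            print("check failed; promotion blocked")
            return 1
    print("all checks passed; artifact may be promoted")
    return 0

if __name__ == "__main__":
    sys.exit(gate())
```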
This changes how companies report their security to external auditors and regulators. It changes how we deploy security technology. And it can actually reduce “time to market” delays!