In my spare time I’ve been playing on Unix StackExchange, and I’ve found the old song “There’s a Hole in My Bucket” going through my head. It’s a conversation between Henry and Liza; Henry has a problem and is asking Liza for help.
In summary:
H: There's a hole in my bucket
L: Mend it!
H: How?
L: With straw.
H: But it's too long
L: So cut it
H: How?
L: With an axe
H: But the axe is blunt
L: Sharpen it
H: How?
L: With a stone
H: The stone is dry
L: So make it wet
H: How?
L: With water
H: How do I get the water?
L: In a bucket
H: But there's a hole in my bucket
As problem solvers we’re typically in the Liza role. Someone comes to us with a problem that seems simple (you have a hole in your bucket? Well, fix it!). It is frustrating to have to go through really basic remediation steps (surely you know how to cut straw!), and it’s only after 10 times around this loop that you start to learn what the real problem is.
This can be even more frustrating if the communication is high latency (e.g. the other person is in a different timezone, such as between New York and Singapore; each part of the conversation could take a whole day!). What should be a simple problem now takes a week to solve.
In an attempt to solve this you write detailed documentation; you try to foresee all possible failure modes, write them down, and create a decision tree with steps to follow. And thus begins the disconnect between “engineering” and “operations”. If this goes too far then the ops teams will refuse to support anything that isn’t fully documented, leading to 100-page handover docs. Ops turn into deskilled button pushers.
This can now lead to security problems; the operations teams don’t fully understand what they are doing, make the wrong choices, take a path of the decision tree that almost but not quite matches their problem and so implement the wrong solution (they put the straw in the bucket at an angle so it doesn’t fill the hole properly; water leaks out).
(This is one reason why “reliability engineering” is becoming popular: it is an attempt to reverse the trend by taking the button pushing out of the ops role and re-skilling the role to the level it should be.)
As problem solvers we need to listen and engage more with our operations teams; we need to remove barriers between teams. This isn’t necessarily a “devops” model (in some cases there are regulatory separation-of-duties requirements that prevent this), but it does mean higher-bandwidth communication channels, co-located engineering/operations teams, shared responsibility for problems and so on. Encourage a collegial atmosphere between the teams, rather than the adversarial one that can form when barriers are created (finger pointing between ops and engineering is a common failure mode).
Now we can make sure the right solution is applied to the problem, and the security hole isn’t opened… hopefully!
Next up: how you are the security Itsy Bitsy Spider trying to climb the drain pipe of security and the rain drops are the attackers washing away your work.