
Animal Farm

17 July, 2020 - 4 min read

For some reason, many concepts in modern DevOps are explained using animals: Kubernetes is a zoo, services are either pets or cattle, and if you really want to, you can dig deep into animal characteristics to determine which spirit animal each of your deployments has. All of these animal stories tend to push a single paradigm: building your system as a set of loosely coupled deployments, each easy to scale and easy to replace. Mainly, you want your deployments to be like cattle: easy to manage, easy to replace and easy to move around. Infused with open source technologies, this makes for a rather beautiful dream - whatever you need, it's just a single deployment to your container zoo away.

If anything, this dream is alive and kicking inside our team. We take pride in building a setup that could run anywhere - be it Azure, AWS or a bunch of old servers in our basement (admittedly, we tend to prefer the former two). If our deployments are animals indeed, then we've got a bunch of happy little farm animals living together in ever-more-complex harmony. For fulfilling functional requirements in particular, this approach works pretty well. There's a catch though: with the array of non-functional requirements for modern enterprise software, this setup can grow to be a substantial burden on your organisation. Take security for example - it's clear that it needs to be done right, but anyone wishing to handle encryption, customer-controlled key management, identity management and many other aspects in-house will find that implementing everything themselves comes at a cost.

At this point, you'll typically discover that your cloud provider has built-in functionality to help you out with pretty much any of these challenges - in fact, any of the cloud giants can typically help you wrap your functionality to provide security, reliability, performance... For any company wishing to move fast, the advantage is undeniable: why would you implement per-customer API request throttling yourself when Google has Apigee and Amazon offers API Gateway? The reduced complexity and cost make this decision feel like a no-brainer, and aside from fulfilling your essential requirements, you get hundreds of additional features as a bonus. Most of these X-as-a-Service offerings are even based on open standards and plug right into your existing setup. Don't be mistaken though: the goal of this openness is to welcome you into the ecosystem with minimal friction, but you won't find yourself leaving it as easily. As soon as you start using one of these services, even in a way that still lets you swap it for an alternative offering relatively easily, you'll find that the extended offering is constantly pulling you towards deeper integration. Why would you pay to keep a server running behind your API Gateway if you can use an AWS Lambda function? Why would you manage your own model training when Google can turn training and deployment into a simple one-click pipeline? There's a fallacy hard at work here: if you're already using the base service, you feel like leveraging an additional feature can't hurt. However, with each additional feature you adopt, you'll find yourself straying further away from the ideal of having a flexible setup where things are easy to replace. Your actual functionality might still be cattle, but everything around it gets bolted down very firmly.

As long as you're fine staying with your cloud provider forever, you probably won't feel the disadvantage of this situation too much. However, vendor lock-in is never risk-free, and if you take it upon yourself to avoid it, you'll find yourself fighting the urge to optimise locally every step of the way.