Published on 2023-06-18 by GCH

Devops: the funeral

Devops is dead, they say. But the death of Devops is no more than the death of a word. A word – like agile or microservices – that is a tad too open for interpretation. Nobody owns it, therefore everybody can have opinions on it. So can I. Especially now that it’s dead and won’t come back to sue me.

Devops was primarily about reproducibility. One only had to read the classic Snowflake Server text to get into the spirit. The Pets vs Cattle metaphor, from the same era, revolves around similar ideas.

Why is reproducibility even an issue? Well, because real world requirements are not about installing Ubuntu with the default configuration so that hello-world.py can be executed. Reproducibility is an issue because real world systems are hard to reproduce. Manual configuration of non-trivial systems is slow and error-prone; documentation is hard to write and prone to ambiguity; tracking the evolution of configurations is complicated if the configuration is not code based. All of these tasks were part of a subject known as System Administration, which has been practiced at different levels of quality, with different amounts of adhoc automation.

The Devops response to varying sysadmin practices was the use of code based configuration with resort to tools such as Puppet and Chef. This enabled the reproducibility of systems, the tracking of configurations, and a substantial reduction of the amount of required documentation. With configuration stored as code, software development techniques could be applied (versioning, tickets, branches, pull requests, …). By supporting direct management of operating system resources (files, directories, permissions, users, packages, …), Puppet and Chef were the structured replacement of adhoc bash scripts previously used by competent sysadmins. Proper use of these tools allowed for complexity to be packaged and for systems to be managed at a low cognitive load. The effort shifted from repetitive operational tasks to creative automation tasks, thus “dev” ops. This shift meant that systems could finally be managed at scale: once the automated configuration process had been developed, the effort associated to the creation of multiple systems became negligible.

A comparison of effort versus size in traditional IT and Devops

Fast forward a couple of years to the cloud hype era and what do you find? Perhaps the same phenomena I've been finding in multiple projects over the last couple of years: cloud sysadmins who think they are doing Devops; project managers who think that developers with no infrastructure experience can deliver high quality operations; team leaders who think that a “Devops” team of 20 people is the way to go, and that “Ops” can be decided upon by consensus; deployment pipelines that break every 3 weeks; nicely spread across directories, Terraform code which must be applied in a sequence of multiple steps following a specific, undocumented order; creation processes dependent on deployment pipelines themselves dependent on names of cloud resources created by the very creation processes!

The result: unique, irreproducible cloud snowflakes that nobody wants to touch. Just what Devops intended to avoid.

Let’s be clear: if an environment can’t be created/deleted with a single command, and normal changes to its configuration (ex: size) can’t be applied in a simple way, you are not doing Devops. You are a sysadmin and you are living before 2012. No amount of money paid to the yellow cloud provider will change that.

Ah... one only had to read the classic Snowflake Server text to get into the spirit. But why would someone read such an old text if the world is changing so fast?

Maybe because this state of affairs is the norm rather than the exception. Present day tech influencers have decided that Devops is dead and that what matters at this point is Platform Engineering or Site Reliabiliy Engineering. I find the latter particularly interesting: how many times do people have to screw up operations til we need to write reliability before engineering? Since when do engineers aim for anything unreliable? What’s next? Evidence based Science?

Let’s see if Platform and Reliability magically succeed where Dev and Ops have failed; or if, due to the persistence of the same practices, we will soon attend the funeral of another two words.

I’ll be reading the classic Snowflake Server text once again.