IT is NOT ops

At the end of DevopsCon I participated in a panel discussing the future of Ops and DevOps. One thread of discussion at that panel followed a statement by Ben Kepes which basically amounted to “in the future, there won’t be any IT Ops guys anymore” (I can’t remember the exact wording, apologies). I was very disturbed by that statement; not because of fear for my profession, but rather because of the lack of dismay in the crowd. To me, this reaction indicated a widespread misconception in the Web industry - the identification of two very different disciplines as one.

Systems engineering is a methodical, disciplined approach for the design, realization, technical management, operations, and retirement of a system. A "system" is a construct or collection of different elements that together produce results not obtainable by the elements alone. The elements, or parts, can include people, hardware, software, facilities, policies, and documents; that is, all things required to produce system-level results

NASA Systems Engineering Handbook [source]

Looking at that NASA quote, it seems clear to me that SE (Systems Engineering) can’t possibly be equated with IT. Yes, hardware is in there, but so are people, software and policies. SE is about “the big picture”: it’s about interconnecting systems with different modes of operation, concepts and flows, and making them work reliably. It’s about defining, supervising, validating and improving processes. It’s a crosscutting discipline that is completely useless when limited to only one aspect of the system.

In summary, the systems engineer is skilled in the art and science of balancing organizational and technical interactions in complex systems. However, since the entire team is involved in the systems engineering approach, in some ways everyone is a systems engineer. Systems engineering is about tradeoffs and compromises, about generalists rather than specialists. Systems engineering is about looking at the "big picture" and not only ensuring that they get the design right (meet requirements) but that they get the right design

NASA Systems Engineering Handbook [source]

Moreover, this applies to Ops and RE (Reliability Engineering) as well. These disciplines are closely related to SE and are sometimes considered a subset of it or overlapping with it. My personal experience is that people often mistake IT and NOC (Network Operations Center) personnel for Ops, RE and SE. This is not so surprising if you consider that most Israeli web developers work in small start-up companies or medium-sized ex-startups and have never worked with, or even seen, an SE. Plus, the high tech industry sees itself as an elite and rarely looks to other industries. And here’s the catch: Web developers are found mostly in, well, web companies; in contrast, Ops, RE and SE are everywhere! Every major industry has them, exactly because their function is general and crosscutting. Wherever you have systems, you have SE. Everything that moves needs Ops and everything that can fail needs RE. From NASA and aerospace, through power plants and infrastructure, all the way to supermarkets: wherever machines are built, wherever components are assembled and systems operated. The Web industry is no different. It too needs SE, despite delusions to the contrary, because as systems grow in complexity and automation, the same issues experienced by every other industry appear. These issues are not bugs in your code or broken hardware. They are fundamental flaws in processes and design, amplified and revealed by complexity, feedback between components and neglected hidden dependencies.

For example, consider conflicting naming schemes in your system. Puppet has a “service” resource, but it is very different from Nagios’s service, and both disagree with the developer’s perception of what a “service” is. Doesn’t sound like a big issue, right? In reality, conflicting concepts and notations cause major damage. I have seen this in Web Ops many times, but let’s again refer to NASA as an example, this time in a less flattering way: the units mixup. Yep, NASA lost a 125 million dollar Mars orbiter because one piece of software reported values in English units while another expected metric units (Mars Climate Orbiter Mishap Investigation Board Phase I Report). But you can bet it won’t happen to them again, because their processes have been fixed, their validations now test for this and their standards have been amended. This is far from being an isolated incident; at the root of so many disasters and technical failures lie malformed processes, hidden dependencies and bad interfaces between sub-systems. Don’t take my word for it, take Richard Feynman’s: while investigating the Challenger disaster as part of the Rogers Commission, he traced the origins of the disaster to bad design processes (top-down design of completely new technology) and disconnected management.
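To make the Mars Climate Orbiter mixup a bit more concrete, here is a minimal sketch of one way an interface between two sub-systems could refuse unit-less numbers. The mishap report describes impulse data produced in pound-force seconds and consumed as if it were newton-seconds; everything else here (the `Impulse` type, the `record_thruster_impulse` function, the unit strings) is hypothetical and made up for illustration, not taken from NASA's software.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value that always carries its unit (hypothetical type)."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        # Convert explicitly; unknown units are an error, never a guess.
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


def record_thruster_impulse(impulse: Impulse) -> float:
    """Hypothetical consumer that works in SI and rejects bare numbers."""
    if not isinstance(impulse, Impulse):
        raise TypeError("expected an Impulse with an explicit unit")
    return impulse.to_newton_seconds()
```

Calling `record_thruster_impulse(Impulse(1.0, "lbf*s"))` converts the value before use, while `record_thruster_impulse(1.0)` fails loudly instead of silently treating the number as metric. The code itself is not the point; the point is that the agreement about units lives in the interface, where a process can validate it, rather than in people's heads.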

My take on DevOps is that it’s a hyped name for the renaissance of SE and Ops. For more than a decade, software companies completely ignored well-known systems engineering principles - just read The Mythical Man-Month. The principles of DevOps are way older than you might think: measure everything, tools as a way of empowering users, holistic view, shared responsibility, end-to-end involvement, software engineers touching reality? Nothing new here. The only new thing is that someone managed to sneak SE and Ops in through the back door and convince Devs and IT to get involved. Way overdue if you ask me.

So no, Ops and systems engineering will not go away. Ever. As long as you have processes, as long as you build complex systems, as long as humans are involved and as long as systems are running in production, you will have Ops and systems engineering. I hope to see Ops-minded devs and SE practices as an integral part of the industry. It will mean more people doing Ops, not fewer.

Interestingly enough, the crowd at DevopsCon was composed mainly of developers. There were very few IT or Ops people there, and if you take a look at the speakers you’ll see that of 16 speakers only 3 are Ops/SE/IT - one from Wonga (based in Ireland) and two from Fewbytes (Israel). Unfortunately, this reflects the state of Ops and systems engineering in Israel. This chronic condition is also why DevOps in Israel tends to be misunderstood as NoOps - in the sense of “let’s get rid of these annoying IT/Ops guys”. It is also why DevOps meetups in Israel can be collectively described as “devs teaching themselves a bit of Ops”.

For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.

Richard Feynman, Appendix F - Personal observations on the reliability of the Shuttle; The Rogers commission report [source]