Autonomation in web systems

Yes, I got the spelling right. Autonomation is a manufacturing methodology originally developed by Toyota (part of the famed Toyota Production System). The idea is roughly this: a machine on a production line is supervised by a human and feeds its output to another machine further down the line. Occasionally something goes wrong; perhaps the machine produces a defective product. The machine stops itself immediately, and the human fixes or adjusts the machine (not the product, the machine!) before restarting it.

Let's break it down into principles:

  1. Every machine must monitor itself or have a monitor attached
  2. Every product is validated
  3. Stop immediately after finding an abnormality/defect
  4. Systematically fix problems as they occur
  5. Never push a potential problem down the line
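To make the five principles concrete in software terms, here is a minimal Python sketch. It assumes a "line" is just a list of stages, each of which transforms an item and carries its own validator. The names `Stage`, `LineStopped` and `run_line` are hypothetical and exist only for illustration. The point is that a stage raises on a defect instead of passing it on, so nothing downstream ever sees it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


class LineStopped(Exception):
    """Raised when a stage finds a defect: the whole line halts (principle 3)."""

    def __init__(self, stage: str, item: Any, reason: str) -> None:
        super().__init__(f"{stage} stopped on {item!r}: {reason}")
        self.stage = stage
        self.item = item
        self.reason = reason


@dataclass
class Stage:
    name: str
    process: Callable[[Any], Any]             # the "machine"
    validate: Callable[[Any], Optional[str]]  # its attached monitor (principle 1)

    def run(self, item: Any) -> Any:
        output = self.process(item)
        reason = self.validate(output)        # every product is validated (principle 2)
        if reason is not None:
            # Never push a potential problem down the line (principle 5):
            # raise instead of returning, so no later stage sees the defect.
            raise LineStopped(self.name, output, reason)
        return output


def run_line(stages: List[Stage], items: Iterable[Any]) -> List[Any]:
    """Feed each item through every stage in order; any defect stops everything."""
    finished = []
    for item in items:
        for stage in stages:
            item = stage.run(item)
        finished.append(item)
    return finished


# Usage: the second item makes the first stage stop the line.
double = Stage("double", lambda x: x * 2, lambda y: None if y < 100 else "too large")
label = Stage("label", lambda x: f"#{x}", lambda s: None if s.startswith("#") else "bad label")

print(run_line([double, label], [1, 2, 3]))  # ['#2', '#4', '#6']
try:
    run_line([double, label], [1, 60, 3])
except LineStopped as stop:
    # Principle 4: a human now fixes the stage itself, not this one product.
    print(stop)  # double stopped on 120: too large
```

Raising an exception, rather than logging and continuing, is what separates this from mere monitoring. The line cannot keep running until someone changes the stage.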

Sounds simple, right? In fact, it is psychologically very hard. We usually apply a band-aid to the product and note the systematic problem for “future” remediation. We validate and monitor only the end products, and our QA usually starts long after the code was written. Bad data propagates through the system, and we do not stop production, EVER.

So is autonomation wrong for the web industry? Personally, I think there is a lot to learn from it, and I believe it should be adopted, at the very least, in infrastructure automation. Take a closer look at how autonomation affects our work cycle, and perhaps you will be convinced:

  • Autonomation relieves the human supervisor from constantly watching over the machine
  • Stopping production forces us to fix the problem systematically; we cannot defer it to a future that may never come. It thus blocks technical debt from forming.
  • Catching problems early increases the reliability of the end product and reduces wasted production resources.
  • Systematic improvement is built into every machine.

For automated infrastructure, this means we prevent automated control systems from spiraling out of control through feedback loops. It also drives quality improvements in parts of the system we would otherwise ignore because we don’t consider them important (scripts, anyone?).
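Below is one way an infrastructure controller could apply principles 1 and 3 to itself. This minimal Python sketch assumes the controller's only signal of trouble is its own action rate. `GuardedController`, `max_actions` and `window_s` are hypothetical names. A rate limit is also only one possible self-monitor; a real system might instead check whether the metric it is trying to fix actually improves after each action.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class AutomationHalted(Exception):
    """Raised when the controller stops itself and waits for a human."""


class GuardedController:
    """Wraps an automated action (e.g. "add a server") with its own monitor.

    More than `max_actions` actions inside `window_s` seconds is treated as a
    sign of a runaway feedback loop: the controller halts and stays halted
    until a human calls reset(). It does NOT resume on its own.
    """

    def __init__(self, action: Callable[[], None], max_actions: int,
                 window_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.action = action
        self.max_actions = max_actions
        self.window_s = window_s
        self.clock = clock
        self.recent: Deque[float] = deque()
        self.halted_reason: Optional[str] = None

    def act(self) -> None:
        if self.halted_reason is not None:
            raise AutomationHalted(self.halted_reason)
        now = self.clock()
        # Forget actions that fell out of the sliding window.
        while self.recent and now - self.recent[0] > self.window_s:
            self.recent.popleft()
        if len(self.recent) >= self.max_actions:
            self.halted_reason = (
                f"{len(self.recent)} actions within {self.window_s}s; "
                "possible feedback loop, human review required")
            raise AutomationHalted(self.halted_reason)
        self.recent.append(now)
        self.action()

    def reset(self) -> None:
        """Called by the operator after fixing the root cause."""
        self.halted_reason = None
        self.recent.clear()


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
servers_added = []
ctl = GuardedController(lambda: servers_added.append("srv"), max_actions=3,
                        window_s=60, clock=lambda: fake_now[0])
for t in (0, 10, 20, 30):
    fake_now[0] = t
    try:
        ctl.act()
    except AutomationHalted as halt:
        print(f"t={t}: {halt}")
print(len(servers_added))  # 3
```

The halt is deliberately sticky. Once tripped, the controller stays stopped even after the window has passed, so a human has to look at the cause before calling `reset()`.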

The method can also be applied to our business code, within limits, since we can’t stop production entirely. Think about writing your code so that every piece of data is verified when it is read from the database, every web transaction response is validated for sanity, and every error results in a critical bug assigned to a human operator. The initial overhead will be large, but gradual improvement in quality is ensured: no more 0.5% of weird, slow, or erroneous transactions lingering in the system for ages.
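Here is a minimal sketch of verify-on-read. It uses Python's built-in `sqlite3` module so that it runs standalone. The `orders` table, the `Order` record, its sanity rules and the `file_critical_bug` callback are all hypothetical. In practice the callback would open a ticket in whatever bug tracker you use, and the same pattern would apply to validating a web response before it is sent.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Order:
    id: int
    quantity: int
    unit_price_cents: int


class DataDefect(Exception):
    """Raised instead of handing a bad record to the rest of the system."""


def validate_order(order: Order) -> List[str]:
    """Sanity rules for one record; returns a list of problems (empty if OK)."""
    problems = []
    if order.quantity <= 0:
        problems.append("quantity must be positive")
    if order.unit_price_cents < 0:
        problems.append("unit price must not be negative")
    return problems


def load_order(conn: sqlite3.Connection, order_id: int,
               file_critical_bug: Callable[[str], None]) -> Order:
    """Read an order and verify it on the way out of the database."""
    row = conn.execute(
        "SELECT id, quantity, unit_price_cents FROM orders WHERE id = ?",
        (order_id,),
    ).fetchone()
    if row is None:
        raise KeyError(order_id)
    order = Order(*row)
    problems = validate_order(order)
    if problems:
        # Stop this transaction and hand the defect to a human, rather than
        # letting the bad value propagate into later computations.
        file_critical_bug(f"order {order.id}: " + "; ".join(problems))
        raise DataDefect(problems)
    return order


# Usage against an in-memory database; bugs.append stands in for a ticketing hook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, quantity INTEGER, unit_price_cents INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 2, 500), (2, 0, 500)])
bugs: List[str] = []
print(load_order(conn, 1, bugs.append))  # Order(id=1, quantity=2, unit_price_cents=500)
try:
    load_order(conn, 2, bugs.append)
except DataDefect:
    pass
print(bugs)  # ['order 2: quantity must be positive']
```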