Sep 11, 2011
 

Last February I developed an Ear, Nose and Throat (ENT) problem that placed me squarely in the category of “interesting patient” (as one of the physicians I saw told me with a wry grin). Just at the point the number of medical specialists I had to consult grew to the level that my medical insurance started suspecting fraud, I reached the conclusion that while nothing is too wrong with any single organ, I am probably struggling with some form of a system problem. Since then I have been known to quip that henceforth Jerry Weinberg will be the only “physician” whose help I would seek.

Imagine my delight at getting the thoughts captured below from Ernest Mueller. Drawing upon the research of Richard Cook, M.D., Ernest emphasizes the similarities between medicine and computer science. With Ernest and Richard supporting me, I felt totally vindicated in my carefully thought-through strategy for choosing physicians…

On a more serious note, Ernest nicely captures the real nature of IT Operations nowadays, viewing it as an integral part of an indivisible spectrum from user needs to fulfillment. In the course of so doing, Ernest breaks a few metaphorical walls and shatters some myths that have hurt the IT industry for a long time.

Here is Ernest:

There is a common fallacy I often hear regarding IT operations work. People describe development as the “creative end” of technology work, and say that operations in the perfect world would be completely automated. It’s all simply rote work, and that is what computers were created to obsolete in the first place.  I consider this view to be quite incorrect.

I contend that it takes originality and ingenuity to keep a complex system running. It’s a false “Jetsons” inspired view of the future to think that all complex activity can be reduced to pushing a button.

Early in my tenure at National Instruments (NI), an application architect was trying to get my systems team to work on a pet project of his, which was not prioritized, leading to some contention. As the conversation became heated, the developer said “Well, I don’t know why your group exists anyway.  All you do is move files around all day!” And this is what some people think operations is, just moving files around all day. Needless to say a developer’s job can be similarly trivialized. Someone else’s job always seems like it should be simple to one who doesn’t do it.

It reminds me of the apocryphal story about the engineer who thought a moment and marked the problem area with a chalk “X” and submitted an invoice for $10,000, and when asked to itemize the invoice, allotted $1 for making the mark and $9,999 for knowing where to place it.

Some of the issue, I think, is that certain tasks are done by different people in different shops. In some shops, all systems engineering work is done by “the dev organization” and Ops is relegated to answering tickets and taking pages. In others, the devs are hapless functional-code-writers only, and testing, application performance management, security, and architecture fall to Ops. (NI was like this when I started there.)

But even if you remove all the systems engineering work and just leave pure troubleshooting, the work of a modern Operations person begins to appear a lot like that of a medical doctor. The complexity of these systems defies a purely procedural approach, and invention and intuition are extremely valuable in curing the patient. As an illustration, take the seminal essay “How Complex Systems Fail,” by Richard Cook, M.D., a medical doctor with the University of Chicago. This essay was reprinted in the O’Reilly Web Operations book because its insights on caring for the complex system of patient safety are directly relevant to the practice of Web operations as well! “Complex systems require substantial human expertise in their operation and management,” contends the essay. Caring for a complex system is a continuous exercise in risk management and the continuous creation of safety (in our case, uptime) can require “novel combinations or de novo creations of new approaches.”

You certainly can try to perform operational tasks without any ingenuity or creative thought. And the fact that many organizations promote that way of thinking explains why in general the state of operations remains unsophisticated in many ways.  I personally have spent many years leading operations teams with highly skilled individuals continuously striving to apply new and ingenious solutions to the problems that face complex systems.

In the end, the problem is false assumptions about the nature of things. It’s one of those theoretical scientist vs. engineer mindset differences – developers tend to “assume a frictionless environment” due to their largely theoretical CS backgrounds. The default state of a complex system is not “safe.” To quote Dr. Cook again, “Complex systems are intrinsically hazardous systems.” Changes aren’t safe to make by their nature; they are risky. You don’t have any idea what a complex system is doing without thoughtful instrumentation and analysis of what those instruments tell you. You cannot correct problems without a sound scientific method – hypotheses and tests. All these activities are an exercise in human ingenuity, just as engineering and medicine are.

My team has certainly labored to automate all the routine work – environment provisioning, code deployment – but that has simply freed us up to innovate even more.  Cloud computing, for example, has made system provisioning a lot easier, more like the goal of “waving your magic wand,” but it is naive to think that level of convenience is the default state of the universe instead of the result of continuous and sustained innovation.

Bio: Ernest Mueller (ernest.mueller@ni.com) has 18 years of IT industry experience in large and small organizations. He has been with National Instruments (http://www.ni.com) for the last nine of those years, and in 2009 moved into the LabVIEW R&D group at NI to serve as the Web systems architect for its new cloud-based SaaS products. He is active in the DevOps movement and helped found the Austin Cloud Computing Users Group (http://groups.google.com/group/austin-cug) and the Austin chapter of OWASP (https://www.owasp.org/index.php/Austin). He also blogs with several of his colleagues at the agile admin (http://theagileadmin.com/). Ernest resides with his daughter, Aoife, in Round Rock, Texas.

Israel Gat

Israel Gat is Director of Cutter Consortium's Agile Product & Project Management practice and a Fellow of the Lean Systems Society. He is recognized as the architect of the Agile transformation at BMC Software. Under his leadership, BMC software development increased Scrum users from zero to 1,000 in four years. Dr. Gat's executive career spans top technology companies, including IBM, Microsoft, Digital, and EMC.

Discussion

  One Response to “Originality and Operations”

  1. I’m highly motivated to say “right on” to Ernest Mueller’s observations about the all-too-typical expectations of IT Operations …by not only IT Development organizations but also by business users of IT. IT Operations is definitely in a squeeze position being sandwiched between those who develop the application software solutions and those who actually use the solutions …and carrying responsibility (…but little authority) to ensure satisfaction.

    It seems common sense that what is introduced by the development community into the “IT system” is only going to increase the complexity of that system and that unpredictable things are going to happen. Yet the behavior of both development and business is that it’s an IT operational problem …and has nothing to do with the inappropriate usage by the business users or the inadvertent complexity introduced by the developer.

    Sometimes a picture comes to mind of an elegant residence (software application) that’s in danger of collapsing and the wealthy dowager (business user) who owns the property and the creative architect (application developer) are demanding that the builder “just fix it” …without recognizing that the unreasonable expectations of the owner pushed the architect into designing something that jeopardizes the inherent structure of the house (IT system) …and that the builder needs assistance/involvement from both owner and architect to resolve the dangers.
