I have spent most of my professional life on projects for telecommunications companies. Although both telecommunications and IT are technology-intensive industries, they differ in a fundamental way. Telecommunications services are end products, and customers pay for them. IT services are a means of supporting the products delivered to customers, and customers pay for the product, not for the IT component included in it. This is why service assurance practice is much better developed and established in the telecommunications business. But the world is changing, and IT-based services are increasingly becoming end products themselves. Practices for IT-based service assurance can gain a lot if we pattern them on telecommunications practices.
The latest developments in structuring IT organizations, namely DevOps, strive to integrate quality assurance (QA) within a joint development and operational environment. If DevOps is to deliver on its promise of faster, cheaper, and better, it cannot afford the separation of various QA tasks. DevOps needs an architectural foundation to integrate these tasks. A service assurance architecture pattern, described in a recent Executive Report, is such an architectural foundation.
Compared to telecommunications, the complexity of the IT environment is a relatively recent development. Over time, IT development and operations adopted different views on IT, which separated the two functions. Luckily, this separation gave us many concepts and components, which we can use for service assurance. Unfortunately, it also left many gaps, which means that we deliver services and applications, but we manage systems and infrastructure. We can fill these gaps with a little help from the telecommunications field.
Through long-lasting discussions, we have developed an understanding of what an IT service is. We can conceptualize it to the point where customers can see the value of an IT component in the service and can blame this IT component when the service does not perform as it should. Software development practices are accustomed to developing services. Unfortunately, under IT operations, most of the focus on services tends to disappear. In general, we operate IT systems, not services.
There are many initiatives, usually married to ITIL, that strive to somehow “measure” services, but in my experience they are reactive at best, relying on weekly or even monthly reports of service performance statistics. We lack the ability to validate service performance in real time, and we have very limited means of being proactive.
One of the main concerns in the DevOps space is that, if such a structure is to provide benefits, there has to be a way to measure something that bears business meaning and is actionable. DevOps promises to be faster and more accurate, but it needs those metrics as a basis for its priorities. A concept of service and service performance could settle the discussions on what to measure, but the measurement must still be done in real time. Knowing that a certain service's average availability last month was 95% is fundamental information, but it is not enough in today’s business circumstances; our customers expect us to be able to act on symptoms of service quality degradation.
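To make the difference between a monthly average and real-time validation concrete, here is a minimal sketch in Python. It assumes a service is probed at regular intervals and each probe simply succeeds or fails; the class name `RollingAvailability`, the helper `first_alert_index`, the window size, and the threshold are all hypothetical choices for illustration, not part of any particular monitoring product.

```python
from collections import deque


class RollingAvailability:
    """Availability of a service over its most recent `window` probes (hypothetical helper)."""

    def __init__(self, window: int, threshold: float):
        self.samples = deque(maxlen=window)  # True means the probe succeeded
        self.threshold = threshold           # alert when availability drops below this

    def record(self, ok: bool) -> None:
        self.samples.append(ok)

    def availability(self) -> float:
        if not self.samples:
            return 1.0  # assumption: no data yet is treated as healthy
        return sum(self.samples) / len(self.samples)

    def degraded(self) -> bool:
        return self.availability() < self.threshold


def first_alert_index(probes, window, threshold):
    """Return the index of the first probe at which the rolling check raises an alert."""
    monitor = RollingAvailability(window, threshold)
    for i, ok in enumerate(probes):
        monitor.record(ok)
        if monitor.degraded():
            return i
    return None


# 100 probes: 95 successes followed by 5 failures, so the period average is 95%.
probes = [True] * 95 + [False] * 5
overall = sum(probes) / len(probes)
alert_at = first_alert_index(probes, window=10, threshold=0.9)
```

In this example the period average still reads 95%, while the rolling check flags the degradation at the second failed probe, which is the kind of symptom we would want to act on before the monthly report arrives.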
The main area of difference between development and operations is the view of assurance activities. Development is usually organized into projects focused on delivering results, while operations is almost entirely process-based, focusing on stability. Since, in reality, there is not always enough time for a transition between these different viewpoints, a knowledge gap is created: development focuses on applications while operations concentrates on infrastructure. And the gap grows with each delivery: the more development delivers, the less stability operations perceives, so operations looks for stability by retreating to infrastructure. Development, constrained in time and costs, focuses on the delivery of applications’ core functionality, while operations, responsible for continuity, focuses on infrastructure stability. The more each side is pushed, the more it “entrenches” itself in its extreme position. As a result, during development we test functionality, while during operations we monitor infrastructure. These activities have hardly anything in common. From the IT user's point of view, such a situation looks confusing at best. During software development, we want to convince users that we aspire to deliver business value. But when they can finally use the delivered software, they see only megabytes and gigahertz. It looks like an unfulfilled promise.
Although many software development methodologies stress it, in reality we lack a common view of quality and service assurance activities that is consistently applied in both the development and operations areas. The subject of assurance should be brought up during the development period, together with an agreed understanding of what assurance means and how to measure it. The same QA and service areas and their metrics should then be validated in tests during the development period and monitored during the operations period, with appropriate actions taken. I believe this would close, or at least narrow, the transition knowledge gap currently being created.
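One way to picture this shared view is a single metric definition that both sides consume. The sketch below is only an illustration under my own assumptions: the `ServiceObjective` class, the `order-submission` service, the `p95_latency_ms` metric, and the 800 ms limit are all hypothetical, and the measured values would in practice come from a test harness or a monitoring probe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceObjective:
    """A business-level objective agreed on during development (hypothetical)."""
    service: str
    metric: str
    limit: float  # upper bound; a measured value above it is a breach

    def is_met(self, measured: float) -> bool:
        return measured <= self.limit


# Assumed example objective; the numbers are illustrative only.
ORDER_LATENCY = ServiceObjective("order-submission", "p95_latency_ms", 800.0)


def acceptance_test(objective: ServiceObjective, measured_in_test: float) -> None:
    """Development side: fail the test run when the objective is not met."""
    if not objective.is_met(measured_in_test):
        raise AssertionError(
            f"{objective.service}: {objective.metric}={measured_in_test} "
            f"exceeds limit {objective.limit}"
        )


def operations_check(objective: ServiceObjective, measured_live: float) -> Optional[str]:
    """Operations side: return an alert message on a breach, otherwise None."""
    if objective.is_met(measured_live):
        return None
    return (
        f"ALERT {objective.service}: {objective.metric}={measured_live} "
        f"exceeds limit {objective.limit}"
    )
```

The point of the design is that `acceptance_test` and `operations_check` share one definition of the objective, so what was promised in development is exactly what is watched in operations.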
There are many components capable of taking part in an assurance architecture, but, for the same reasons that a knowledge gap is created, they exist in the separate spaces of development and operations. There are test frameworks to perform certain classes of tests. We can brilliantly automate unit tests, GUI tests, performance tests, and so on. There are many tools dedicated to system monitoring. We can, out of the box, monitor hardware, operational platforms, and application platforms with these tools. Despite this great wealth of tools, any initiative for service monitoring represents a major effort and investment.
Does this mean that we need some new tools? No, at least not entirely. What we lack is a vital architectural point of view, one that is missing from almost every software development project: the maintainability view. Although they are supposed to serve business services, almost none of the IT systems developed can be monitored at the level of their business services. Although built purposely for monitoring, the available tools focus on infrastructure parameters, with only marginal support for the business-service level. The maintainers’ view, although present in what currently exists of service assurance architecture, is not implemented in IT systems.