What About Ops?

Josh Kite
Apr 29, 2019

Poor Operations. Regardless of the industry, the Operations group often seems to be the forgotten stepchild of the organization. Perhaps this is why the Chief Operations Officer position was first created: Ops was forgotten so often while bearing so much of the burden that everyone finally agreed it deserved a seat at the table. And it has one for the business operations part of the organization, which typically provides customer service and similar functions. Unfortunately, IT Operations always seems to struggle to get the same respect while Delivery (Development) receives all of the attention. Some organizations are even cutting out IT Operations altogether without a clear understanding of whether this is the right thing for them to do.

To think about what to do with IT Ops, we first need to think about what's required to get software code into production in the first place. Let's consider a typical Sprint cycle and its corresponding Waterfall counterparts. During the previous Sprint the team worked with the Product Owner to refine the Backlog (we used to refer to this as grooming, but that term is long outdated). In a Waterfall world this is gathering the requirements. Then we move on to Planning, followed by the Dev/QA cycle. Once the work is accepted, it moves to Deployment and the Release.

The Scrum Delivery Model ©2019 PIVOT Agile

But what then? In the days of giant single-shot projects (necessitated by project-based finance and long approval periods), the Solution is thrown over the wall to Operations/Support, which is responsible for fixing defects (that it did not create) and possibly capturing a list of enhancement requests. Ops "may" address those requests if it has enough time, or they get passed along to the Project Sponsor, who hopefully will bundle them into a package big enough to support the next Business Case, which just might get approved so the Solution can be upgraded. The Delivery team that built the Solution is, of course, disbanded and no longer available to assist with any problems it created when it built the Solution.

But what if there was another way?

Suppose that instead of funding Project after Project the organization instead funded a set of Delivery Teams that stayed together for a long time. Their primary purpose could then be to build new Solutions while also providing capacity to enhance or repair the previous Solutions they delivered.

In this scenario Operations still exists, because there is still a need for common Tier 1 Support like a Help Desk to ask the classic question.

IT Crowd - have you tried turning it off and on again - https://giphy.com/gifs/the-it-crowd-chris-odowd-F7yLXA5fJ5sLC

And it still makes sense to have a Solution-Oriented Tier 2 Support Team that can perform deeper-level troubleshooting and some configuration resolution. Depending on the nature of the Solution, it might make sense to have additional levels of Support (Networking and Infrastructure, "How-to," etc.).

But what happens in the Scrum world if that last level of Support cannot resolve the problem? The answer is relatively simple - it returns to the Team that created the problem to begin with! That team can resolve the problem more easily because they know the code, AND it builds in accountability since they are troubled by their own defects.

And since Scrum has this idea of a standing team with a prioritized Backlog, it also makes sense to proactively Monitor the use of the Solution, both through automated telemetry AND through regular conversations with the users, to see what enhancement ideas come up that should be put into that Backlog. (Remember that a Backlog Item is nothing more than a promise for a future conversation; the Backlog should be regularly pruned of ideas that aren't worth the investment.)

Full Agile Lifecycle Model - ©2019 PIVOT Agile

At this point the question usually comes up, "What is the process for Severity 1 issues?" The nice thing about this approach is that the process is exactly the same. That is, if something comes in to Level 1 through Level N support and it truly meets the Sev 1 criteria (which you have defined, right?), then it goes into the team's Backlog. The difference is that everyone involved has agreed up front that Sev 1 issues automatically take precedence over anything currently in development, so the Team quickly reprioritizes their work and immediately does what is necessary to restore the Solution to a working state. (This level of prioritization absolutely requires clear criteria for each category of Problem; otherwise the teams will never get anything done.) All other issues are brought to the attention of the Product Owner, who may decide that a Sev 2 issue is important enough to disrupt the current Sprint or may decide that it can wait a few days until the next one. Obviously the Sev 3 and 4 issues may sit in the Backlog for one or several Sprints or may never be addressed at all. The Product Owner recognizes that resolving Defects is not free (even when the work is outsourced, there is always a cost, even if it is just in terms of Change Management on the Business side), and he or she can determine whether new functionality is higher priority than addressing a defect that may affect one user or is merely cosmetic.
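The routing rule described above is simple enough to write down as a few lines of Python. This is only a sketch under stated assumptions: the names `Severity`, `Issue`, and `route_issue`, the outcome labels, and the `po_interrupts_sprint` flag (standing in for the Product Owner's judgment call on a Sev 2) are all hypothetical and do not come from any particular ticketing tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; real criteria must be agreed up front."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


@dataclass
class Issue:
    title: str
    severity: Severity


def route_issue(issue: Issue, po_interrupts_sprint: bool = False) -> str:
    """Decide where an issue escalated from Level 1..N support lands.

    po_interrupts_sprint models the Product Owner's decision and only
    matters for Sev 2 issues.
    """
    if issue.severity == Severity.SEV1:
        # Pre-agreed rule: Sev 1 preempts anything currently in development.
        return "interrupt-now"
    if issue.severity == Severity.SEV2:
        # The Product Owner chooses between this Sprint and the next one.
        return "current-sprint" if po_interrupts_sprint else "next-sprint"
    # Sev 3 and 4 wait in the Backlog and may never be worked at all.
    return "backlog"
```

The point of the sketch is that only the Sev 1 branch is automatic. Everything else passes through a human decision, which is why the severity criteria have to be unambiguous.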

So, we should keep Operations, right? Yes, but we should right-size it through the classic criteria of automating anything that is dumb, dirty, or dangerous.

• Manual deployments are dumb AND dangerous. There is no reason for Operations to babysit a deployment, and manual deployments are historically rife with errors
• Unversioned code is dangerous because Developers can easily re-introduce bugs, some of which may be critical or security related
• Merging code and code builds is dirty work; a good Continuous Integration / Continuous Deployment (CI/CD) Pipeline takes a lot of the dirt out of the entire process
• In today's world manually building environments is all of the above because building virtual machines is boring, connecting virtual machines in an environment is dirty, and manually building anything is a security nightmare
• Having humans read logs or watch dashboards also fits all three categories: the work is so dull that something is bound to be overlooked, and in many legacy environments, just gathering the information takes more work than should be required

These types of changes are part of the basis for advocating for DevOps, which is generally understood as the integration of Development and Operations. A lot of the work listed above can be "shifted left" by moving up the Development Value Stream from Operations to Development.
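To make the log-reading item a little more concrete, here is a minimal Python sketch of the kind of check a script can do instead of a person. It assumes a made-up log format of `<timestamp> <LEVEL> <message>`, with no spaces in the timestamp, and the function names and the `max_errors` threshold are hypothetical. A real environment would more likely lean on its existing monitoring tooling.

```python
from collections import Counter
from typing import Iterable


def count_levels(lines: Iterable[str]) -> Counter:
    """Tally log levels, assuming '<timestamp> <LEVEL> <message>' lines."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2:  # skip blank or malformed lines
            counts[parts[1]] += 1
    return counts


def needs_attention(lines: Iterable[str], max_errors: int = 5) -> bool:
    """Flag the log when ERROR lines exceed an (assumed) threshold."""
    return count_levels(lines)["ERROR"] > max_errors
```

Nothing here is sophisticated, and that is rather the point: work this dull is exactly what should be handed to a script so that people only look when something is flagged.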

DevOps - https://medium.com/@neonrocket/devops-is-a-culture-not-a-role-be1bed149b0

The shift to DevOps might require Operations to give up some of its best people to the Delivery organization in return for a leaner Ops team that can focus on aspects that make Delivery more efficient (like more effective troubleshooting in the Level 1-N support Teams). DevOps is really more of a culture than merely a set of practices, and it takes time to develop and mature, but every organization can learn from it and determine which aspects to adopt now. And all of this should make the IT Operations organization much happier.