These days I think most of us are sold on continuous integration as a great way to find both code and process defects earlier, so we can resolve them before they grow out of control. But did you know that by applying the Lean concept of Stop the Line manufacturing to the build pipeline there are even further gains we can take advantage of?
First, some history
Stop the Line manufacturing is a technique introduced by Taiichi Ohno (of Toyota Production System fame) in which every employee on the assembly line has a responsibility to push a big red button that stops everything whenever they notice a defect on the assembly line. When this was first introduced people couldn’t wrap their heads around it; it was part of manufacturing dogma that the best thing you could do as a plant manager was to keep your assembly lines running full steam as many hours of the day as possible to maximize throughput. His idea, however, was that by fixing inefficiencies and problems as they occur, you are not just maximizing your existing process but proactively building a better one.
When he put this system into practice he found that some of his managers took his advice and some didn’t. The managers who implemented Stop the Line had their productivity drop by a shocking amount; they were spending much of their time fixing defects on the line rather than actually producing any goods. The managers who hadn’t listened thought this was a great victory for them, and I can just imagine them feeling sorry for poor Taiichi Ohno who would be ruined for having come up with such a horrible and wasteful idea.
Before long, however, something strange started to happen. Slowly but surely the managers who had spent so much time fixing defects instead of producing goods started producing their goods faster, cheaper, and more reliably than their counterparts, to the point where they caught up with and then exceeded the lines that hadn’t made improvements. The initial investment in improved process and tools had paid off and Toyota went on to be quite successful using this method. Even today their engineers and managers share a cultural belief that their job is not actually to manufacture cars but instead to learn to manufacture cars better than anyone else.
How does this relate to Continuous Integration?
Continuous integration is a technique that allows us to run the build process as if it were a continuously running assembly line; fresh code goes in one end and (after a series of assembly steps) a build that’s ready for a human to test comes out the other. On its own CI is a big win for a software team over an old style “daily build”, but by adding the concept of Stop the Line manufacturing to our continuous integration process we can really take things to the next level.
Some real world examples
A typical reason a release engineer might stop the line is that build or deployment times are slowing down. The difficulty for the release engineer trying to speed things back up is that there are quite a few elements outside of his or her strict control, but by using a Stop the Line approach you can say: OK, we’ve passed a critical threshold here, and as an organization we need to stop what we’re doing to figure out how to fix this new constraint. Let’s get our SDETs looking at improving unit test speed, let’s get our developers working with the release engineer to reduce the number of configurations we’re building and to find some general compile-time optimizations, let’s get our systems engineers looking at our build and deployment machines, and let’s get our DBAs looking at the database deployment and servers to see what they can do. This is a much more proactive and holistic approach to managing this problem than simply expecting your release engineer to manage it on his or her own, or worse yet not dealing with it until you really do need fast iterations.
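To make that "critical threshold" concrete, here is a minimal Python sketch of the kind of check a team might agree on. Everything in it is an assumption for illustration: the function name should_stop_line, the 900 second limit, the five-build window, and the sample durations are all hypothetical, and a real pipeline would pull its timings from whatever CI server you use.

```python
from statistics import median


def should_stop_line(build_seconds, threshold_seconds, window=5):
    """Return True when the median of the most recent builds exceeds the threshold.

    build_seconds is a list of build durations, oldest first (hypothetical data).
    Using the median of a window keeps one unusually slow build from stopping the line.
    """
    recent = build_seconds[-window:]
    if len(recent) < window:
        return False  # not enough history yet to call it a trend
    return median(recent) > threshold_seconds


# Hypothetical durations in seconds for the last eight builds.
durations = [610, 640, 700, 910, 950, 990, 1020, 1100]

if should_stop_line(durations, threshold_seconds=900):
    print("Stop the line: build time has crossed the agreed threshold")
```

The numbers matter less than the fact that the threshold is agreed on ahead of time, so that crossing it is a signal for the whole organization and not one more thing for the release engineer to quietly absorb.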
Aside from build times, how many times have you come in in the morning to a broken build that has been firing off failures all night thanks to a late night, last minute check-in? One approach that a lot of teams take to handling this is to embarrass the person who broke the build; he gets to wear a goofy hat, gets called out in a daily meeting, or something along those lines. In actuality, however, these types of defects are almost always systemic to the way people are working. With a Stop the Line approach you’re not just accepting that Joe Schmoe is a slacker who broke the build; instead you’re getting a focused strike team together to look at what really caused the problem. Maybe there’s no easy way for a developer to run a quick set of tests as a sanity check before check-in. Maybe Joe is being asked to work on a component he’s not familiar with and needs some guidance. Maybe the component itself is overly complicated and could use some refactoring. Whatever the case may be, it is unlikely that the long term solution is to just blame Joe and call it a day. If you stop the line to look at the situation in depth you can really take steps to understand what is happening at the systems level.
The key with Stop the Line is that when you find a defect, you stop the build pipeline and gather the important stakeholders together to look for the root cause. In too many companies build system defects will continue to pile up for far too long as people work around them, only coming to a head when build time and reliability become so outrageously bad that they’re affecting productivity organizationally, or when you’re late cycle and trying to push as many builds as you can through your now highly dysfunctional pipeline. What adopting Stop the Line allows you to say as a manager is that you’re not going to tolerate that slow, silent accumulation of issues that are going to come back to bite you later; that you’re building an organization that isn’t just building software but is committed from the beginning to actively learning how to build software better.
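If it helps to picture the mechanics, stopping the line can be as simple as a gate that refuses new work until someone records that the root cause has been dealt with. The Python sketch below is a toy, in-memory model under my own assumptions; BuildLine, LineStopped, and the change names are hypothetical and not part of any real CI product.

```python
class LineStopped(Exception):
    """Raised when a change is submitted while the line is stopped."""


class BuildLine:
    """Toy model of a build pipeline with a big red button (hypothetical)."""

    def __init__(self):
        self.stop_reason = None  # None means the line is running
        self.built = []          # changes that made it through the pipeline

    def stop(self, reason):
        # Anyone who notices a defect can stop the line and say why.
        self.stop_reason = reason

    def resume(self):
        # Called only once the stakeholders have addressed the root cause.
        self.stop_reason = None

    def submit(self, change):
        if self.stop_reason is not None:
            raise LineStopped(f"line stopped: {self.stop_reason}")
        self.built.append(change)


line = BuildLine()
line.submit("change-1")
line.stop("nightly build has been failing since the last check-in")

try:
    line.submit("change-2")
except LineStopped as err:
    print(err)  # new work waits until the defect is understood

line.resume()
line.submit("change-2")
```

The point of a hard gate like this is that nobody can quietly work around the defect; the cost of the problem is visible right away, which is exactly what gets the right people in the room.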
In closing I want to share a couple of caveats with you. It is important to adopt the Stop the Line mentality early so that you can reap the rewards of your work later when you really need them. It’s much easier to stop everything and fix the pipeline when it isn’t already completely broken and you’re not yet in danger of slipping. Also, you can’t be afraid to dedicate some of your most talented people to do the systemic root cause analysis; they’re going to be able to identify and fix the problem faster and more effectively since they have the skills and experience and are empowered to do so.
In my next article I’ll be discussing how to use value streams to analyze your build pipeline. Value streams are a great way to figure out where you’re spending your time when you’ve hit that critical iteration time constraint, stopped the line, and need to figure out how to get back on track.