You are currently browsing the monthly archive for May 2009.

One easy way to get more value out of your build system is to improve one of its fundamental outputs: the build number.  Many of us have been using the basic major, minor, revision, increment format for years (I know I did!), but most build systems these days, especially ones you’ve built yourself, can use any format you like.  A good one that I’ve used in the past is the following:

Template: (Major).(Minor)_(Branch Name)_(Datestamp).(DailyIncrement)

Build Number: 3.4_Prod_20090529.3

Translation: Production Branch, 3.4 release, built on 05/29/09, was the third build of the day.

This provides far more information than the basic format.  Just from seeing the build number you know when it was built, what major/minor release it is associated with, and what branch it was built out of.  The only piece of information you’re losing versus the old method is the running total number of builds you’ve created since you started the project, which is of questionable value anyway.

One issue you may run into if you are using Windows is that it expects the “File Version” value for assemblies to be in the 0,0,0,0 format.   A good way to get around this is to put major, minor, date (with the year removed), and increment in that field and then use the “Product Version” field for your expanded build number.  So, with the example above, you’d end up with a file version of 3.4.529.3 which works great with the file version field.
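As a rough sketch of the scheme above, here is how both strings could be generated in Python.  The function names are made up for illustration, and the range check assumes that each of the four file version fields is a 16-bit number (0 to 65535), which is why the date has to lose its year to fit.

```python
from datetime import date


def build_number(major, minor, branch, build_date, daily_increment):
    """Expanded build number: (Major).(Minor)_(Branch Name)_(Datestamp).(DailyIncrement)."""
    return f"{major}.{minor}_{branch}_{build_date:%Y%m%d}.{daily_increment}"


def file_version(major, minor, build_date, daily_increment):
    """Four-part numeric version: major, minor, date with the year removed, increment."""
    month_day = build_date.month * 100 + build_date.day  # 05/29 becomes 529
    parts = (major, minor, month_day, daily_increment)
    # Assumption: every field has to fit in 16 bits.
    if any(not 0 <= part <= 0xFFFF for part in parts):
        raise ValueError(f"version field out of range: {parts}")
    return ".".join(str(part) for part in parts)


built_on = date(2009, 5, 29)
print(build_number(3, 4, "Prod", built_on, 3))  # goes in "Product Version"
print(file_version(3, 4, built_on, 3))          # goes in "File Version"
```

Note that dropping the year means file versions repeat from one year to the next, so the expanded build number is the one to rely on for identifying a build.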


Looks like Eric is going to be speaking live on Lean Startups at the HP Campus in Cupertino; this is apparently his last scheduled Bay Area speech.

If you had a chance to check out the Lean Startup presentations I blogged about earlier you know this is going to be a can’t-miss event.  If anyone ends up going, let me know; I’d love to stop and chat for a few minutes.

http://startuplessonslearned.blogspot.com/2009/05/last-chance-to-register-for-lean.html

In this article we’re going to dig into using value stream mapping to find inefficiencies in our build and release pipelines.   A value stream map is a Lean method for visualizing and understanding your processes and is a great starting point for finding and weeding out inefficiencies.

How to create a value stream map

The simplified form of value stream map that I’ll be showing here is actually pretty straightforward to create:

  • Start with the entry point into your process and generate a workflow that ends up with value created for your customers.   Be sure that each of your workflow steps is real work – someone should be actively doing something for each step (waiting for a build, or really any kind of waiting, is not a work step in a value stream because nothing is being produced.)  
  • Next, notate the average time spent doing each step as well as the average time spent in between each step.  Estimates are OK, but if you actually do the experiment and collect some real world data you will probably find some pretty surprising numbers.  If you’re using a full-fledged workflow tool you may be able to get these timings out of your reporting system, but if you’re not tracking this in software then the best way I’ve found to get this data is to use Excel to track all the timings for one or two instances of the process per day for a week or two.  After you’re finished you can average your timings to come up with reasonably accurate estimates.   Be sure to also count the number of times you rework issues and what rework steps are associated with doing so.
  • Finally, go back and notate any rework loops and any other ways to exit the process aside from creating value (e.g., cancelling a feature would be an early exit to a feature development process that does not create value.) Possible flows where no value gets created are huge potential time sinks, especially in cases where you rework an issue or feature many, many times only to leave it on the cutting room floor.
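If you’d rather not do the averaging by hand in Excel, the same bookkeeping is easy to sketch in Python.  The step names and hours below are made-up sample data, not measurements from a real pipeline; the idea is just to record one (step, hours) pair per observation and average them per step.

```python
from collections import defaultdict
from statistics import mean


def average_timings(observations):
    """Average the observed hours for each step name.

    observations: iterable of (step_name, hours) pairs, one per tracked instance.
    """
    by_step = defaultdict(list)
    for step, hours in observations:
        by_step[step].append(hours)
    return {step: mean(values) for step, values in by_step.items()}


# Hypothetical observations collected over a couple of weeks.
samples = [
    ("write bug report", 1.0),
    ("write bug report", 1.5),
    ("wait for triage", 14.0),
    ("wait for triage", 18.0),
]

averages = average_timings(samples)
for step, hours in averages.items():
    print(f"{step}: {hours:.2f} hours on average")
```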

A real world example

Below is an example value stream map I’ve created for a typical build pipeline.  You’ll notice that I’ve mapped an entire defect resolution process from discovery through resolution, and that’s because it’s very important to always look at your systems within their broader context; the CM pipeline doesn’t exist within a vacuum and looking at your systems holistically is really the best way to approach solving systems level problems.

  1. After 3 hours of build testing a defect is found by QA.
  2. An hour goes by waiting as the QA team discusses the issue internally and verifies it is reproducible.
  3. QA spends an hour writing up the official bug report which includes all information needed to fix the issue.
  4. Sixteen hours go by (on average) as we wait for the next change control board meeting where the bug is assigned to a developer.
  5. The developer spends two hours making the fix and checking it in (this includes any unit testing or other pre check-in requirements.)
  6. An hour goes by as the continuous integration system monitors for and pools check-ins.
  7. The build runs for an hour.
  8. QA only tests one build per day, so we wait one day for the fix to be ready for deployment.
  9. The deployment runs for an hour.
  10. After about a half day QA gets to the test in question.
  11. QA spends three hours testing.
  12. QA spends an hour updating the (hopefully) fixed issue with updated information and adding it to their regression suite.  This is where the value is created in this example – the end product has one less critical defect.

For this example rework happens on average once on four out of five bugs submitted.

The value stream map

Mapped out using Visio, the value stream map looks like this:

[Figure: CM value stream map]

Metrics

Once you’ve got your value stream mapped out you can find some interesting overall metrics. These are great to keep track of over time so that you can see how you’re doing at the macro level, especially since within configuration management it’s easy to get lost in the details.

You can find an overall ideal efficiency by dividing the number of hours spent doing real work by the total number of hours in the entire process, without including reworks.  In this value stream map there are 13 hours of real work happening (for the ideal situation where the defect is fixed on the first attempt) and 43 hours of total cycle time.   This gives us an ideal efficiency of roughly 30%.  A good target I’ve found to shoot for with build systems is around 20-25% – you never want to get too close to 100% because at that point any new work added to the system will cause thrashing, and work coming into the build pipeline is spiky even on the best of days.
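The ideal efficiency calculation is a single division, sketched here in Python with the figures from this value stream map.  The function name is just an illustration.

```python
def ideal_efficiency(work_hours, total_hours):
    """Fraction of first-pass cycle time spent doing real work (reworks excluded)."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return work_hours / total_hours


# 13 hours of real work out of 43 hours of total cycle time.
efficiency = ideal_efficiency(13, 43)
print(f"Ideal efficiency: {efficiency:.1%}")
```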

Another interesting value you can pull out of this is rework cost, which can be found by adding up the total number of hours spent on each rework cycle.  In this example we add 21 total hours each time we come to a fix failed state.  Since we found we average one rework cycle on four out of five bugs our rework average is 80%.  Rework metrics are great resources to get a feel for quality coming into the system and amount of waste being produced based on quality issues.  
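To see where the 21 hours comes from, here is a small Python sketch that adds up steps 5 through 12 of the example, which make up the rework loop.  It assumes an 8-hour working day, so the one day wait counts as 8 hours and the half day as 4.  The expected rework hours per bug is my own derived figure rather than a standard metric, and it assumes a bug goes through the loop at most one extra time.

```python
# Steps 5-12 of the example: (description, hours).
# Assumption: an 8-hour working day, so "one day" = 8 and "a half day" = 4.
REWORK_LOOP = [
    ("developer makes and checks in the fix", 2),
    ("CI monitors for and pools check-ins", 1),
    ("build runs", 1),
    ("wait for the daily QA build", 8),
    ("deployment runs", 1),
    ("wait for QA to reach the test", 4),
    ("QA tests", 3),
    ("QA updates the issue", 1),
]


def rework_cost(loop):
    """Total hours added by one pass through the rework loop."""
    return sum(hours for _, hours in loop)


def expected_rework_hours(cost, rework_average):
    """Average rework hours per bug, assuming at most one rework cycle each."""
    return cost * rework_average


cost = rework_cost(REWORK_LOOP)
print(f"Rework cost: {cost} hours per cycle")
print(f"Expected rework per bug: {expected_rework_hours(cost, 0.8):.1f} hours")
```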

Next steps

One of the great wastes that value stream maps can highlight is time spent in rework cycles.   Because in our example we lose 21 hours per rework cycle (and we’re sitting at an 80% rework average) anything we can do to bring that iteration time or rework average down will be a huge win.  If we can increase development time, for example, in order to reduce our rework average then we’ve made a significant overall improvement through making a change that in isolation would have appeared to be counterproductive.  Keep in mind that any improvements made during steps in a rework cycle have the potential to be gained every time that rework cycle happens, so if you’ve got a high rework average then spending time reducing rework cost is going to give you a lot of bang for your buck, but if your rework average is very low you may be able to accept a higher rework cost in exchange for a better ideal efficiency.
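To make that trade-off concrete, here is a rough Python comparison.  The "current" numbers are the ones from the example; the "proposed" scenario is purely hypothetical (two extra hours of development time, which sits inside the rework loop, in exchange for halving the rework average), and the formula assumes a bug goes through at most one rework cycle.

```python
def expected_cycle_hours(first_pass_hours, rework_cost, rework_average):
    """Average hours per bug: one first pass plus the expected share of a rework cycle."""
    return first_pass_hours + rework_average * rework_cost


# Current process from the example: 43 hours first pass, 21 hours per rework, 80% rework.
current = expected_cycle_hours(43, 21, 0.8)

# Hypothetical change: the developer spends 2 more hours per fix, so both the
# first pass and the rework loop grow by 2, but only 40% of bugs need rework.
proposed = expected_cycle_hours(45, 23, 0.4)

print(f"Current:  {current:.1f} hours per bug")
print(f"Proposed: {proposed:.1f} hours per bug")
```

Under these assumed numbers the slower development step still comes out ahead overall, which is exactly the kind of change that looks counterproductive when viewed in isolation.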

A great place to start to look for optimizations is by analyzing your “waiting steps.”  These are pure waste (they are idle inventory in Lean terms) and are prime targets for efficiency gains from simply reducing the time spent there.  In our example the most egregious waiting periods are the ones where we are spending two days for the original defect to be scheduled for work, a day for QA to pick up the fix in the next deployment, and four hours for a QA engineer to get a chance to take a look at the bug in the test environment.  Because there’s no real work happening here it’s often easy and cheap to find ways to reduce these waiting periods.
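Picking out the worst waiting periods is just a sort.  This Python sketch lists the five waiting steps from the example (again assuming an 8-hour working day) and ranks them from longest to shortest.

```python
# Waiting steps from the example: (description, hours).
# Assumption: an 8-hour working day, so "one day" = 8 and "a half day" = 4.
WAITS = [
    ("QA discusses and reproduces the issue", 1),
    ("wait for the change control board meeting", 16),
    ("CI monitors for and pools check-ins", 1),
    ("wait for the daily QA build", 8),
    ("wait for QA to reach the test", 4),
]


def rank_waits(waits):
    """Return the waiting steps ordered from longest to shortest."""
    return sorted(waits, key=lambda wait: wait[1], reverse=True)


for description, hours in rank_waits(WAITS)[:3]:
    print(f"{hours:>2} hours: {description}")
```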

Process steps are also good potential candidates for making improvements, but here things are more complicated because someone is actually doing some kind of productive work and you’ll need to do some investigation into the details to find out what’s happening there.  In this example there aren’t any process steps that jump out as particularly slow, but if our build and deployment times started creeping up it would definitely be worth spending some engineering resources to bring that back down to a level that makes more sense.

Going further, the next steps would be to try to think about your map overall:  

  • What other kinds of timing improvements can you find?  
  • What other teams could you bring on board to improve iteration speed?  
  • Do you really need to do every step?  
  • Does this workflow even make sense now that you’ve written it out?  
  • Is there a way to reduce efficiency in one area that increases overall efficiency?  
  • Are there any more hidden rework cycles in here that can be looked at?  
  • Are there any possible flows where value is never created, and if so, how can these be avoided?  

Summary

Taking the time to analyze your processes using value stream maps can be a great way to look at your existing problems in a new way.  I hope that I’ve given you some ideas on ways to get started looking at your systems in terms of value stream maps and that they can be as useful for you as they have been for me.

If you find that creating your value stream creates more questions than answers then you’re absolutely doing it right.  Value streams are a great method for highlighting waste and inefficiencies, but they aren’t a method for determining how to fix the problems.  In order to find the way to fix any problems you find you’re going to need to do some root cause analysis, which is a subject I hope to get into in more detail in a future article.

I came across another great video from Eric Ries about Lean Startups.  The build and release pipeline can be the heartbeat of any fast iterating software team (especially one such as a startup), and I highly recommend watching this one in addition to the one I posted yesterday.

In this presentation he goes into detail on some very interesting concepts including split testing, continuous deployment (that’s right, deployment), and using the “five whys” to control throughput of your build system.

For those of us who have ridden the rollercoaster of a startup that looks like it’s going up, up, and away only to crash back down to Earth when the reality of the marketplace hits, I think you’ll find the presentation on Lean startups by Eric Ries (of the fantastic blog Startup Lessons Learned) to be an enlightening video to watch.   If you’ve been there like I have you’ll laugh, you’ll cry, and you’ll also learn a lot about how to take the lessons of Lean and apply them better to your next venture.

The presentation can be viewed here: The Lean Startup at Web 2.0 Expo

The big conference in Miami on Lean and Kanban just finished up.  The best coverage I have been able to find is over at Leading Agile.  Check it out!

These days I think most of us are sold on continuous integration as a great way to find both code and process defects earlier so we can resolve them before they grow out of control, but did you know that by applying the Lean concept of Stop the Line manufacturing to the build pipeline there are even further gains we can take advantage of? 

First, some history

Stop the Line manufacturing is a technique introduced by Taiichi Ohno (of Toyota Production System fame)  in which every employee on the assembly line has a responsibility to push a big red button that stops everything whenever they notice a defect on the assembly line.  When this was first introduced people couldn’t wrap their heads around it; it was part of manufacturing dogma that the best thing you could do as a plant manager was to keep your assembly lines running full steam as many hours of the day as possible so that you’re maximizing throughput.  His idea, however, was that by fixing inefficiencies and problems as they occur what you’re doing instead of maximizing your existing process is actually proactively building a better one. 

When he put this system into practice he found that some of his managers took his advice and some didn’t.   The managers who implemented Stop the Line had their productivity drop by a shocking amount; they were spending much of their time fixing defects on the line rather than actually producing any goods.  The managers who hadn’t listened thought this was a great victory for them, and I can just imagine them feeling sorry for poor Taiichi Ohno who would be ruined for having come up with such a horrible and wasteful idea. 

Before long, however, something strange started to happen.  Slowly but surely the managers that had spent so much time fixing defects instead of producing goods started producing their goods faster, cheaper, and more reliably than their counterparts, to the point where they caught up with and then exceeded the lines that hadn’t made improvements.  The initial investment in improved process and tools had paid off and Toyota went on to be quite successful using this method.  Even today their engineers and managers share a cultural belief that their job is not actually to manufacture cars but instead to learn to manufacture cars better than anyone else. 

How does this relate to Continuous Integration?

Continuous integration is a technique that allows us to run the build process just like it was a continuously running assembly line; fresh code goes in one end and (after a series of assembly steps) a build that’s ready for a human to test comes out the other.   On its own CI is a big win for a software team over an old style “daily build”, but by adding the concept of Stop the Line manufacturing to our continuous integration process we can really take things to the next level.

Some real world examples

A typical reason a release engineer might stop the line is because build or deployment times are slowing down.  The difficulty for the build engineer trying to speed things back up is that there are quite a few elements outside of his or her strict control, but by using a Stop the Line approach you can say OK, we’ve passed a critical threshold here and as an organization we need to stop what we’re doing to figure out how to fix this new constraint.  Let’s get our SDETs looking at improving unit test speed, let’s get our developers with the build engineer to look at reducing the number of configurations we’re building and to find some general compile-time optimizations, let’s get our systems engineers looking at our build and deployment machines, and let’s get our DBAs looking at the database deployment and servers to see what they can do.  This is a much more proactive and holistic approach to managing this problem than simply expecting your release engineer to manage this on his or her own, or worse yet not dealing with it until you really do need fast iterations.
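The "critical threshold" can be something as simple as an agreed-upon limit on average build time.  Here is a minimal Python sketch of such a check; the function name, the five-build window, and the 60-minute threshold are all assumptions chosen for illustration, not part of any particular CI tool.

```python
from statistics import mean


def should_stop_the_line(recent_build_minutes, threshold_minutes, window=5):
    """Return True when the average of the last `window` builds exceeds the threshold.

    Averaging over a window (an assumed choice) keeps one unusually slow build
    from stopping the line on its own.
    """
    recent = recent_build_minutes[-window:]
    if not recent:
        return False
    return mean(recent) > threshold_minutes


# Hypothetical build durations in minutes, oldest first.
durations = [55, 58, 61, 66, 70, 74]
if should_stop_the_line(durations, threshold_minutes=60):
    print("Build times have crossed the threshold: stop the line and find the root cause.")
```

A check like this only raises the flag.  The important part is still the organizational response described above.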

Aside from build times, how many times have you come in in the morning to a broken build that has been firing off failures all night thanks to a late night, last minute check-in?  One approach that a lot of teams take to handling this is to embarrass the person who broke the build; he gets to wear a goofy hat, gets called out in a daily meeting, or something along those lines.  In actuality, however, these types of defects are almost always systemic to the way people are working.  With a Stop the Line approach you’re not just accepting that Joe Schmoe is a slacker who broke the build; instead you’re getting a focused strike team together to look at what really caused the problem.  Maybe there’s no easy way for a developer to run a quick set of tests as a sanity check before check-in.  Maybe Joe is being asked to work on a component he’s not familiar with and needs some guidance.  Maybe the component itself is overly complicated and could use some refactoring.  Whatever the case may be, it is unlikely that the long term solution is to just blame Joe and call it a day.  If you stop the line to look at the situation in depth you can really take steps to understand what is happening at the systems level.

Summary

The key with Stop the Line is that when you find a defect you stop the build pipeline and gather the important stakeholders together to look for the root cause.    In too many companies build system defects will continue to pile up for far too long as people work around them, only coming to a head when build time and reliability become so outrageously bad that they’re affecting productivity across the organization, or when you’re late in the cycle and trying to push as many builds as you can through your now highly dysfunctional pipeline.  What adopting Stop the Line allows you to say as a manager is that you’re not going to tolerate that slow, silent accumulation of issues that are going to come back to bite you later; that you’re building an organization that isn’t just building software but is committed from the beginning to actively learning how to build software better. 

In closing I want to share a couple of caveats with you.  It is important to adopt the Stop the Line mentality early so that you can reap the rewards of your work later when you really need it.  It’s much easier to stop everything and fix the pipeline when it isn’t already completely broken and you aren’t in danger of slipping.   Also, you can’t be afraid to dedicate some of your most talented people to do the systemic root cause analysis; they’re going to be able to identify and fix the problem faster and more effectively since they have the skills and experience and are empowered to do so.

In my next article I’ll be discussing how to use value streams to analyze your build pipeline.  Value streams are a great way to figure out where you’re spending your time when you’ve hit that critical iteration time constraint, stopped the line, and need to figure out how to get back on track.