How Bad Is Your Architecture? Measure it the Agile Way!

It happens to developers all the time. After working on some piece of code for a while, they discover how inflexible, counter-intuitive and generally unhelpful its architecture really is. Only miracles and truly brilliant developer insight are capable of making new features happen, and just barely so. Fixing bugs feels like whack-a-mole, and only wonder-coder Joe, who has been with the company for 6 years, knows how things really work down there.

But then management is not always excited to pursue larger architectural overhauls just because developers say so. Hey, that piece of code has been working just fine for the last 8 years, so why touch it now? And anyway, why on earth should we spend time on something old that delivers no visible value instead of working on all the new features we promised our customers?

If re-architecting that piece of code would take well beyond the regular 2-3 day refactoring job and would likely go on for weeks, a developer may need arguments that go beyond “but it would be so much nicer if…” to convince everyone that more work needs to be invested there.

For the remainder of this post I want to present a practical way of measuring the drag factor of suboptimal architecture. We recently came up with it at my company and are in the process of making it work for us. The focus is on larger architectural issues, usually ones that routine refactoring is not enough to fix. Although the discussion is in the context of Extreme Programming, it can easily be extended to non-agile environments given the right team dynamics.

The main idea is fairly simple. Upon completion of every story in the XP iteration, an estimate is given to reflect how much the story was “slowed down” by the presence of “bad” or suboptimal architecture. We call this the Legacy Architecture Cost (LAC). If more than one component or area of the product is suspected of needing re-architecting, the LAC is broken down by area/component. After a certain period of time, say 6 months, the LAC for each component or area is totaled and analyzed on a Return on Investment (ROI) basis.
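
To make the bookkeeping concrete, here is a minimal sketch in Python of how the per-story estimates could be recorded and totaled per component. The `Story` class, the `lac_days` field, the component names and all the numbers are made up for illustration; they are not part of any tool we actually use.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Story:
    """One completed story plus the LAC estimates given at its completion."""
    name: str
    # Hypothetical layout: component name -> days lost to that component's architecture.
    lac_days: Dict[str, float] = field(default_factory=dict)


def total_lac_by_component(stories: List[Story]) -> Dict[str, float]:
    """Add up the LAC recorded against each component across all stories."""
    totals: Dict[str, float] = defaultdict(float)
    for story in stories:
        for component, days in story.lac_days.items():
            totals[component] += days
    return dict(totals)


# Made-up stories, for illustration only.
stories = [
    Story("Export invoices as PDF", {"billing": 1.5}),
    Story("Add SSO login", {"auth": 0.5, "billing": 0.5}),
    Story("New dashboard widget"),  # no slowdown recorded for this story
]

lac_totals = total_lac_by_component(stories)
print(lac_totals)
```

A story that touches several suspect components simply carries one estimate per component, which is what lets the totals be compared component by component later on.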

For example, if it turns out that over 6 months the team has spent 20 man days because of the suboptimal design of some crusty component that would have taken 10 days to re-architect, that component is an obvious candidate for redesign. However, if the LAC for another component is 5 days and it would have taken more than a month to fix, maybe we shouldn’t roll up our sleeves just yet.

A natural priority list then emerges. Components with the highest ratio of LAC to estimated fix time percolate to the top of the list of things that need to be re-architected first. Of course, the usefulness of the priority list depends on the length of the LAC collection period as well as on how statistically representative it is of the overall software development activity in the team.
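
As a rough sketch of that ranking, the snippet below computes the LAC-to-fix-cost ratio for the two components from the example above and sorts them. The function name and component names are hypothetical, and "more than a month" is taken to be 22 working days, which is purely an assumption made to get a number.

```python
def rank_by_lac_ratio(lac_days, fix_days):
    """Return (component, ratio) pairs, highest LAC-to-fix-cost ratio first.

    Components without a positive fix estimate are skipped, since no ratio
    can be computed for them.
    """
    ratios = {
        component: lac_days[component] / fix_days[component]
        for component in lac_days
        if fix_days.get(component, 0) > 0
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# LAC accumulated over the 6-month collection period, in days.
lac_days = {"component_a": 20, "component_b": 5}
# Estimated re-architecting cost in days; 22 stands in for "more than a month" (assumption).
fix_days = {"component_a": 10, "component_b": 22}

ranking = rank_by_lac_ratio(lac_days, fix_days)
for component, ratio in ranking:
    print(f"{component}: {ratio:.2f}")
```

A ratio above 1 means the component cost the team more during the collection period than the fix is estimated to take; a ratio well below 1 suggests waiting.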

Just blindly using the LAC ratio would probably be over-zealous too. Other things can influence the decision of which components to re-architect first, such as current priorities and expectations of upcoming work (e.g. are the next few months going to be mostly backend work?).

The idea is to give some empirical and objective value to the drag factor of suboptimal architecture and move the re-design discussion into less subjective territory.

Some likely objections, and how we think about them:

  1. What if the estimate for the fix cost is not known? Well, if it is not known how to fix the design of the component, how can you know for sure that the current design is not actually the optimal one? You just can’t be sure. So unless there is a strategy that has a reasonable assurance of producing a better design, developers shouldn’t be griping about how bad the component is! To get a better estimate of the fix cost, a special research story can be added to an iteration, or a spike can be done to produce a quick proof of concept.
  2. The LAC numbers are too subjective. The idea behind measuring LAC is to produce a rough ROI argument in favor of re-design. Does it really matter if the team spent 31 days instead of 30 being held down by a component that would have taken 10 days to fix? Probably not. A high level of precision may not be needed at the end of the day.
  3. Developers may inflate LAC numbers. Developers usually like building and rebuilding stuff, so they may have a conscious or subconscious inclination to up the numbers and make the case for a re-design look more compelling. This could be a bit of a problem, although it may not be that big of an issue if the relationship between developers and management is a healthy one.
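
One way to look at the last two points is to ask how far off the LAC numbers would have to be before the conclusion changes. The sketch below does that in Python. The decision rule it uses (redesign when the corrected LAC exceeds the fix cost) and both function names are assumptions for illustration, not a rule we have settled on.

```python
def still_worth_fixing(lac_days, fix_days, inflation=0.0):
    """Check whether LAC still exceeds the fix cost after correcting for
    suspected inflation (inflation=0.25 means LAC was overstated by 25%)."""
    corrected_lac = lac_days / (1.0 + inflation)
    return corrected_lac > fix_days


def break_even_inflation(lac_days, fix_days):
    """Inflation level at which the corrected LAC would equal the fix cost."""
    return lac_days / fix_days - 1.0


# The 30-vs-31-day case from point 2: one day either way changes nothing.
print(still_worth_fixing(30, 10), still_worth_fixing(31, 10))

# How much would 30 recorded days have to be overstated to cancel a 10-day fix?
print(break_even_inflation(30, 10))

# A borderline case (made-up numbers) where modest inflation does matter.
print(still_worth_fixing(12, 10, inflation=0.25))
```

When the break-even inflation is large, imprecise or somewhat optimistic estimates do not change the decision; when it is small, the numbers deserve a closer look before anyone commits weeks of work.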

I will make more posts as we continue to use this technique. This is all work in progress for us, so any feedback or tips are highly appreciated!

Related links:

  • Measuring the cost of suboptimal architecture is closely related to the concept of Technical Debt that Martin Fowler describes and talks about on his bliki.
  • Johanna Rothman gives an interesting approach to paying off testing technical debt in this article.
  • Technical Debt and the Death of Design, Part 1. An excellent article by Kane Mar further describing the nature of technical debt.

2 comments so far

  1. Sergiu Rata on

    Thanks for the article, it’s really insightful. We currently face the same problem – the architecture is old and rusty, but we need better metrics to prove the developer’s point.

  2. Chris Balavessov on

    Sergiu,

    Let us know how you guys end up using this methodology and how it works out for you – any modifications you end up doing to fit your circumstances, etc.

    This is a novel approach that we are rolling out, so it would be great to hear other people’s experiences and get a better feel of how it all holds up in practice. My hope is to generate a good discussion and produce a good, reliable practice that the development community out there can use.

