Posts Tagged “Agile”

Generally, the longer a defect remains undetected in a system, the more costly it is to fix. I’ve seen this proven true over and over, but you don’t have to take my word for it: ask Steve McConnell.

I’ve always assumed this was well understood, yet many organisations adopt processes, approaches and structures that guarantee certain kinds of defects will go undiscovered for substantial periods of time. One of the more common faults is the separation of Development and Operations.

Each side has its own view of what’s important and what they’re responsible for:

  • Operations more often than not seeks to own non-functional aspects (performance, stability, scalability etc).
  • Development more often than not seeks to own the functional aspects (features).

Such a mindset often leads to a classic process mistake: issues with the functional aspects get dealt with early, while those linked to the non-functional and operational aspects stay hidden until last-minute grand testing regimes (P&C, User Acceptance Testing) dig them out or, worse, until they are discovered at the point of release into production.

The warning signs are usually there if only we paid attention to them:

  1. Developers work in isolation, building, deploying and configuring the components they develop in ways that suit them. It follows that deployment and configuration are not optimised for production and do not account for any hard-won operational experience.
  2. Operations staff demand that huge handover documents be written by developers and passed over with the product. Inevitably the documentation fails to account for operational concerns (what would a developer know about operations?).
  3. There are separate environments for the purpose of validating the correctness and accuracy of handover documents. After all, developers can’t be trusted to get the documentation right, so it must be checked.
  4. The development environments are ad hoc with no resemblance to production (certainly they aren’t a scale unit of production). This leads to large numbers of problems at release time: files can’t be found, configurations are broken and various versioning issues present themselves.

The antidote is relatively straightforward: all development activity should be performed in a production-like situation. For example:

  1. Deployment and configuration of software components under development should be routinely performed by operational staff. The result is early knowledge transfer and the documentation can now be written by those best able to produce it (operations staff, not developers).
  2. Development environments should contain appropriate network topology. Often production setups contain segregated networks for security or availability reasons. Ensuring developers are exposed early to these issues means software is more likely to account for these demands.
  3. Monitoring and logging infrastructure should be as per production and used routinely for debugging and for capturing data relevant to testing (performance, failure etc).
  4. Development environments should be scale units of production. This permits early production-like performance testing. This should be backed up with routine robustness testing e.g. to identify memory leaks early.
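
As a rough illustration of the routine robustness testing mentioned in point 4, the sketch below uses Python's standard tracemalloc module to flag memory that keeps growing across repeated calls. The function names (memory_growth, leaky, tidy) and the 1 KB allocation are invented for the example; a real check would exercise actual components in a production-like environment.

```python
import tracemalloc

def memory_growth(fn, iterations=1000):
    """Return the bytes of traced memory still held after calling fn repeatedly."""
    tracemalloc.start()
    try:
        fn()  # warm up so one-off allocations are not counted as growth
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before

_cache = []  # hypothetical module-level state that is never cleared

def leaky():
    # Hypothetical bug: every call keeps another 1 KB alive.
    _cache.append(bytearray(1024))

def tidy():
    # The buffer is released as soon as the call returns.
    bytearray(1024)
```

Run routinely, a check like this would show leaky growing by roughly a megabyte per thousand calls while tidy stays close to flat, which is the kind of signal worth catching long before release.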

A typical reaction is for development and operations staff to say this cannot possibly work and will slow development to a crawl. They aren’t actually wrong but they’re missing a key insight:

If development has slowed to a crawl it’s an early warning of future production troubles.

For example, if deployment is taking too much effort and time, something needs tweaking, simplifying or automating. What we’ve done is best summarised by a proverb from Toyota (via Eric Ries):

“Stop production so that production never has to stop”.

We’ve created a feedback loop that highlights defects spanning all concerns (functional, non-functional and operational) early, which keeps costs down.

Clearly, delivering a given feature will take a little longer as we must account for all aspects from functional through non-functional and operational. That’s acceptable because if we don’t cover all these aspects we’re asking for trouble in many forms including:

  • If we cannot adequately monitor the performance of a newly delivered feature, there’s a direct impact on customer experience. Customers will know before we do that something is broken, which leads to irate phone calls, lost revenue etc.
  • If we cannot adequately track the effect of a new feature on customer behaviours, we cannot evolve it appropriately.
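
One lightweight way to make a new feature observable from day one is to wrap its entry point so that every call records a duration and any failure. The sketch below is only an illustration: the in-memory stores, the monitored decorator and the "search" feature are all hypothetical, and a real system would publish these figures to its production monitoring infrastructure.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory stores standing in for real monitoring infrastructure.
timings = defaultdict(list)
failures = defaultdict(int)

def monitored(feature):
    """Record the duration of every call and count the calls that raise."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                failures[feature] += 1
                raise
            finally:
                # Runs for successes and failures alike.
                timings[feature].append(time.perf_counter() - start)
        return wrapper
    return decorator

@monitored("search")  # hypothetical feature name
def search(term):
    if not term:
        raise ValueError("empty search term")
    return [term.upper()]
```

With this in place, a rising failure count or lengthening durations for "search" show up in our own data rather than in a customer's phone call.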

Needless to say, developing features in this fashion fits well with lean and agile approaches.

So the antidote is relatively straightforward and there are development approaches that fit well with what needs to be done. The toughest challenge remains, though: effecting the necessary mindset shift to get it done. It ought to be a little easier with the rise of DevOps, but notably there are early signs of trouble, as has been seen with lean and agile adoption.

There are many who claim to know and practise each of these disciplines, but most are paying only lip service, picking out the bits of process, mindset or tooling that suit them and ignoring the rest.

Sporting Index is right in the middle of making this tricky jump from Dev and Ops to DevOps; I’ll let you know how we get on.

 


We’ve all seen it: customers change their requirements, add a few more features and yet expect the project deadline to stay the same even though there are no additional resources.

For some reason they act as if a software team has infinite, cost-free capacity. The psychology that drives this behaviour is somewhat unclear because there are various potential motivators such as political ambition, naivety or willful ignorance.

One might expect to see this problem occurring in waterfall projects, but it can also plague early agile projects. Typically the backlog grows and grows, the customer has a desired release date in mind and expresses horror when it becomes clear that the whole backlog cannot possibly be implemented in the timeframe (accompanied by cries of “but I followed the process”).

It shouldn’t be possible to make this mistake given real-world experiences. For example:

We put our car in for an oil change and get a quote for the cost and an estimate of how long the work will take. We drop the car in at the garage and then a little later phone up and request additional work such as fixing the air-conditioning, replacing two tires, sorting the exhaust and swapping out the brake pads. Not for a second do we entertain the idea that the cost and time for the work will be the same as originally quoted.

Yet we still persist in the notion that a software development team is a bottomless pit of resource.


It’s tempting when trying to be customer-centric to focus on delivering lots of functionality quickly. Supposedly features win the race and can increase revenue, but is that all that matters? Evidence such as the troubles Twitter have had in the past and this anecdote from Google about search time suggests there are other qualities of our website that matter like:

  • Service charges
  • Responsiveness
  • Availability
  • Quality of interaction

Whilst these qualities are all about the customer experience, success in maintaining them at an appropriate level is related to how well a company performs internally:

  • It’s undesirable to charge excessively to cover development inefficiencies caused, for example, by a tightly coupled architecture that makes even a small change a multi-month death-march.
  • A service that runs slowly at peak times, because our architecture and code paid insufficient attention to performance and scaling, appears sluggish or even down, which can drive customers away.
  • Prolonged outages, where trivial problems take operational staff excessive time to fix because of poor monitoring and diagnostic tools, will hurt customer satisfaction.
  • If we routinely roll back upgrades, or they’re brittle or bug-ridden, we will damage the quality of interaction.

Thus being more customer-centric requires a company to quantify its performance and work to improve it. In the case of the examples above, things like response time, site downtime, number of failed upgrades, time to perform a release, bug counts and feature count against cost of delivery can be used as metrics to indicate how we’re doing in our mission to make the customer happy. Methods for improving these metrics, though not always easy to apply, are relatively well understood and include:

  • Ensuring architecture/design includes well-defined interfaces, avoiding integration via databases etc.
  • Considering scalability: how many machines can be thrown at a problem and are they used efficiently? Essentially, balancing horizontal-scale and straight-line optimisation.
  • Removing computation from the critical path to generating a user-response e.g. by using asynchronous methods.
  • Publishing software and hardware telemetry, gathering it all up (using the right infrastructure) and performing appropriate analysis via tools etc.
  • Focusing on simplicity, isolation of components, failure tolerance, in-live testing, versioning and the ability to roll back rapidly.
  • Applying an appropriate testing regime.
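
To make the point about removing computation from the critical path a little more concrete, here is a minimal sketch using only Python's standard queue and threading modules. The request handler replies as soon as the slow work has been queued, and a background worker performs it later. Every name here (run_demo, handle_request, send_receipt) is hypothetical, and a production system would more likely use a durable message queue than an in-process one.

```python
import queue
import threading

def run_demo(order_ids):
    """Accept each order immediately; do the slow work on a background thread."""
    jobs = queue.Queue()
    receipts_sent = []  # stands in for the slow side effect (hypothetical)

    def send_receipt(order_id):
        # Placeholder for slow work the user does not need to wait for.
        receipts_sent.append(order_id)

    def worker():
        # Drain the queue until the None sentinel arrives.
        while True:
            order_id = jobs.get()
            if order_id is None:
                break
            send_receipt(order_id)

    def handle_request(order_id):
        # Critical path: enqueue and respond without waiting for the receipt.
        jobs.put(order_id)
        return {"order_id": order_id, "status": "accepted"}

    thread = threading.Thread(target=worker)
    thread.start()
    responses = [handle_request(i) for i in order_ids]
    jobs.put(None)  # ask the worker to stop once the queue is drained
    thread.join()   # the demo waits here; a real request handler would not
    return responses, receipts_sent
```

The trade-off is that the user is told "accepted" before the receipt has actually gone out, so failures in the background work need their own monitoring and retry handling.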

Ultimately, everything a company does internally has implications for customers. This includes decisions that are normally notoriously subjective, such as technology selection. In this particular case we ought to test the technology and assess its effect on relevant metrics to verify that it does provide meaningful benefits. Also, as most technology has its downsides, we can quantify these too and ensure there’s an appropriate trade-off.
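
As a sketch of what assessing a technology against a metric might look like, the snippet below compares response-time samples from a baseline and a candidate at a chosen percentile. The sample figures are invented purely for illustration, the helper names are hypothetical, and the nearest-rank percentile is just one simple convention among several.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def compare(baseline_ms, candidate_ms, pct=95):
    """Summarise how the candidate's response time differs from the baseline's."""
    b = percentile(baseline_ms, pct)
    c = percentile(candidate_ms, pct)
    return {"baseline": b, "candidate": c, "change_pct": round((c - b) / b * 100, 1)}

# Invented response times in milliseconds, for illustration only.
baseline = [120, 130, 125, 140, 135, 150, 128, 132, 138, 200]
candidate = [100, 105, 110, 108, 112, 115, 109, 111, 107, 160]
```

Calling compare(baseline, candidate) on this made-up data reports a 20% reduction at the 95th percentile. The same comparison can be repeated for any downside metric (cost, memory, release time) so that the trade-off is quantified rather than argued.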
