Author Archive

Generally, the longer a defect remains undetected in a system, the more costly it will be to fix. I’ve seen this proven true over and over, but you don’t have to take my word for it: ask Steve McConnell.

I’ve always assumed this was well understood, yet many organisations adopt processes, approaches and structures that guarantee certain kinds of defects will go undiscovered for substantial periods of time. One of the more common faults is the separation of Development and Operations.

Each side has its own view of what’s important and what they’re responsible for:

  • Operations more often than not seeks to own non-functional aspects (performance, stability, scalability etc).
  • Development more often than not seeks to own the functional aspects (features).

Such a mindset often leads to a classic process mistake: issues with the functional aspects get dealt with early, while those linked to the non-functional and operational aspects are left unsurfaced until last-moment grand testing regimes (P&C, User Acceptance Testing) dig them out or, worse, until they are discovered at the point of release into production.

The warning signs are usually there if only we paid attention to them:

  1. Developers work in isolation building, deploying and configuring the components they develop in ways that suit them. It follows that deployment and configuration are not optimised for production and do not account for any hard-won operational experience.
  2. Operations staff demand huge handover documents be written by developers and passed over with the product. Inevitably the documentation fails to account for operational concerns (what would a developer know about operations?).
  3. There are separate environments for the purposes of validating correctness and accuracy of handover documents. After all, developers can’t be trusted to get the documentation right so it must be checked.
  4. The development environments are ad hoc with no resemblance to production (certainly they aren’t a scale unit of production). This leads to large numbers of problems at release time: files can’t be found, configurations are broken and various versioning issues present themselves.

The antidote is relatively straightforward: all development activity should be performed in a production-like situation. For example:

  1. Deployment and configuration of software components under development should be routinely performed by operational staff. The result is early knowledge transfer and the documentation can now be written by those best able to produce it (operations staff, not developers).
  2. Development environments should contain appropriate network topology. Often production setups contain segregated networks for security or availability reasons. Ensuring developers are exposed early to these issues means software is more likely to account for these demands.
  3. Monitoring and logging infrastructure should be as per production and used routinely for debugging and capture of data relevant to testing (performance, failure etc).
  4. Development environments should be scale units of production. This permits early production-like performance testing. This should be backed up with routine robustness testing e.g. to identify memory leaks early.
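The routine robustness testing mentioned in the last point can start very small. The following is a minimal sketch, assuming a Python component and the standard library's `tracemalloc` module; `leaky_handler` and `clean_handler` are hypothetical stand-ins for real components, and the idea is simply to run an operation many times and see how much memory is still held afterwards.

```python
import tracemalloc


def memory_growth(operation, iterations=1000):
    """Return the net bytes still allocated after running `operation` repeatedly."""
    tracemalloc.start()
    try:
        operation()  # warm up so one-off allocations are not counted
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            operation()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Hypothetical component with a leak: every call retains its payload.
_retained = []


def leaky_handler():
    _retained.append(bytearray(1024))


# Hypothetical well-behaved component: the payload is released on return.
def clean_handler():
    payload = bytearray(1024)
    return len(payload)


if __name__ == "__main__":
    print("leaky growth (bytes):", memory_growth(leaky_handler))
    print("clean growth (bytes):", memory_growth(clean_handler))
```

Run as part of the everyday build, a check like this flags steady growth long before a grand testing phase would, which is the point of doing it in development rather than at release time.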

A typical reaction is for development and operations staff to say this cannot possibly work and will slow development to a crawl. They aren’t actually wrong but they’re missing a key insight:

If development has slowed to a crawl it’s an early warning of future production troubles.

For example, if deployment is taking too much effort and time, something needs tweaking, simplifying or automating. What we’ve done is best summarised by a proverb from Toyota (via Eric Ries):

“Stop production so that production never has to stop”.

We’ve created a feedback loop that highlights defects spanning all concerns (functional, non-functional and operational) early, which keeps costs down.

Clearly, delivering a given feature will take a little longer as we must account for all aspects from functional through non-functional and operational. That’s acceptable because if we don’t cover all these aspects we’re asking for trouble in many forms including:

  • If we cannot adequately monitor the performance of a newly delivered feature there’s a direct impact on customer experience. They will know before we do that something is broken, which leads to irate phone calls, lost revenue etc.
  • If we cannot adequately track the effect of a new feature on customer behaviours, we cannot evolve it appropriately.
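To make the monitoring point concrete, here is a minimal sketch of building measurement into a feature from the start. Everything in it is an assumption for illustration: the in-memory `latencies_ms` store stands in for whatever monitoring infrastructure production actually uses, and `quote_price` is a hypothetical feature.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory stand-in for a real metrics backend (an assumption for this sketch).
latencies_ms = defaultdict(list)


def monitored(feature_name):
    """Record the wall-clock duration of each call under `feature_name`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even when the call raises, so failures are visible too.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                latencies_ms[feature_name].append(elapsed_ms)
        return wrapper
    return decorator


@monitored("quote_price")  # hypothetical feature name
def quote_price(market, stake):
    return round(stake * 1.05, 2)


def slow_calls(feature_name, threshold_ms):
    """Return the recorded durations that exceeded `threshold_ms`."""
    return [ms for ms in latencies_ms[feature_name] if ms > threshold_ms]
```

The feature is not considered delivered until calls to it show up in the same place operations staff already look, so we hear about a slow feature before the customer does.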

Needless to say developing features in this fashion fits well with lean and agile approaches.

So the antidote is relatively straightforward and there are development approaches that fit well with what needs to be done. The toughest challenge remains, though: effecting the necessary mindset shift to get it done. It ought to be a little easier with the rise of DevOps, but there are already early signs of trouble, as has been seen with lean and agile adoption.

There are many who claim to know and practise each of these disciplines but most are paying only lip service, picking out the bits of process, mindset or tooling that suit them and ignoring the rest.

Sporting Index is right in the middle of making this tricky jump from Dev and Ops to DevOps. I’ll let you know how we get on.

 


A frequent problem I observe when reviewing system designs is that they are built atop one or more libraries, frameworks or products that are poorly suited to the intended task. Fitting the design to these underpinnings warps it in undesirable ways, incurring all sorts of costs:

  • It takes an increasing number of staff just to deploy and run the system.
  • Customers face an increasingly bad experience in terms of interaction, performance and stability.
  • One spends more time refactoring than developing new features – although in many cases developers will simply not bother with this effort, which accelerates the drop in quality for customers.
  • The level of coupling increases, impacting the integrity of the design and making future change more difficult.

I call this the “design-by-product anti-pattern”. There are a couple of things that cause it to manifest:

  1. Absence of a real design prior to product/framework/library selection – most of those given the remit for design cannot construct proper abstractions that are adequately divorced from implementation. That is, they do not understand the core entities and operations that exist within the domain they are building a system for. Thus when products/libraries/frameworks are selected there is limited structure to assist in evaluating their appropriateness.
  2. These products are used because they are on the list of “company approved technologies”. The justification for the existence of such a list is that it “reduces cost” which it might well do if all one accounts for is licenses and product support. Unfortunately, the cost equation is not nearly so simple (see above re: costs).
  3. A related problem to “company approved technologies” is hot or favourite technologies preferred by the development team regardless of their appropriateness for use in any particular design situation.

Any product/library/framework is created by an individual who has their own view of how their customers design their systems and builds APIs accordingly. In the worst cases these individuals design APIs in total isolation, focused on making them theoretically perfect (for some definition of perfect). If we as customers create designs that do not align well with the views of these individuals, the result will be costly as we force the two designs together. The cost is magnified for each additional conflicting product/library/framework design.

Loose coupling as the result of proper definition of roles and responsibilities is the only tool we have to allow for future design evolution. Poor selection of products/libraries/frameworks erodes this property and should be avoided; otherwise a death march awaits.
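One way to keep a product from warping the design is to define the role first and confine the product to a single adapter. The sketch below is illustrative only: `PriceSource`, `VendorFeedClient` and the payload shape are hypothetical names invented for the example, not any real product's API.

```python
from typing import Dict, Protocol


class PriceSource(Protocol):
    """Role: supply the current price for a market. Nothing else."""

    def current_price(self, market_id: str) -> float: ...


class VendorFeedClient:
    """Hypothetical stand-in for a third-party product's API."""

    def fetch(self, topic: str) -> dict:
        return {"topic": topic, "px": "101.5"}


class VendorPriceSource:
    """Adapter: the only place that knows the vendor's topics and payloads."""

    def __init__(self, client: VendorFeedClient) -> None:
        self._client = client

    def current_price(self, market_id: str) -> float:
        payload = self._client.fetch(f"prices/{market_id}")
        return float(payload["px"])


class FixedPriceSource:
    """A second implementation, e.g. for tests or a replacement product."""

    def __init__(self, prices: Dict[str, float]) -> None:
        self._prices = prices

    def current_price(self, market_id: str) -> float:
        return self._prices[market_id]


def exposure(source: PriceSource, market_id: str, units: int) -> float:
    """Domain logic depends on the role, not on any product."""
    return source.current_price(market_id) * units
```

Because `exposure` sees only the role, an ill-fitting product costs one adapter to replace rather than a rewrite of everything that touches prices. It also gives some structure for evaluating a candidate product: how awkward is the adapter it would need?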


There are some design basics that development teams routinely fail to account for:

  1. Roles
  2. Responsibilities
  3. Coupling

Role

The basic justification for the existence of some API, interface or class. A summary of what it’s for. Just as importantly, the role defines what a particular entity is not for.

Responsibility

The things that some entity can do/knows in support of a role.

Coupling

An expression of the dependencies between roles. This property tells us a lot about the state of our design.

Two things that are heavily dependent upon each other might well be serving individual parts of a single role and thus should be consolidated. If everything ends up in a single role, it can suggest that the current approach to classifying behaviours is missing some factors.

Coupling can be temporal such that, for example, one entity cannot dispatch its responsibilities without the presence of another at the same time. This might indicate the need for some work on handling availability issues in a distributed system.

Limited coupling is a sign of cohesion and of clarity in roles and responsibilities, which can be indicative of a clean, maintainable design.

Platform Neutral

These basics apply regardless of the platform one chooses to develop upon. Roles, responsibilities and coupling apply just as well to service architectures, databases (tables and associated triggers and packages) and applications in Java, Scala, Clojure, C# or any other programming environment.

Warning Signs

It is very common for individual developers or development teams to allocate additional functions to existing elements of a design unthinkingly, thus eroding its quality. This manifests in many ways including:

  1. Some element of the system becomes the source of all information in respect of e.g. configuration or the entirety of customer data.
  2. A single cache contains all data regardless of its nature (e.g. customer, account details, market price).
  3. Some element of the system must always be running otherwise nothing else works.
  4. Some element of the system has functions that span many different bits of data (e.g. customer, account, market price).

Rule of Thumb

Any entity within a system should do only one thing and it should do it well (often credited to the Unix philosophy). This applies to everything from applications and products to services and individual classes.
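As a small illustration of the rule of thumb, consider the "single cache for everything" warning sign. The sketch below splits it into two entities with one role each; the class names and the expiry policy are assumptions made for the example, not a prescription.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str


class CustomerDirectory:
    """Role: look up customers by id. Not for prices, not for configuration."""

    def __init__(self) -> None:
        self._customers: Dict[str, Customer] = {}

    def add(self, customer: Customer) -> None:
        self._customers[customer.customer_id] = customer

    def find(self, customer_id: str) -> Optional[Customer]:
        return self._customers.get(customer_id)


class PriceCache:
    """Role: hold recent market prices and expire stale ones."""

    def __init__(self, max_age_seconds: float) -> None:
        self._max_age = max_age_seconds
        # market_id -> (price, time stored)
        self._prices: Dict[str, Tuple[float, float]] = {}

    def put(self, market_id: str, price: float, now: Optional[float] = None) -> None:
        stored_at = time.monotonic() if now is None else now
        self._prices[market_id] = (price, stored_at)

    def get(self, market_id: str, now: Optional[float] = None) -> Optional[float]:
        entry = self._prices.get(market_id)
        if entry is None:
            return None
        price, stored_at = entry
        current = time.monotonic() if now is None else now
        if current - stored_at > self._max_age:
            del self._prices[market_id]  # stale prices are dropped, not served
            return None
        return price
```

Prices go stale in seconds while customer records do not, so the two kinds of data need different responsibilities. Forcing both into one cache means either customers expire needlessly or stale prices get served.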


Design is not rules, it’s not patterns, it’s not technological choices or indeed code. Design is tradeoffs, driven by data where possible and gut instinct. It’s about identifying the core challenges of a problem domain (which might ultimately be one or many systems) and addressing them through creation of appropriate abstractions. These abstractions embody:

  • Functions to be performed
  • Data to be discovered, consumed and produced
  • Non-functionals (e.g. SLAs)

The abstractions are then rendered into the real-world using appropriate hardware, technologies, patterns and languages. A good design:

  • Exhibits few exception cases
  • Has logic and/or data located neatly and predictably
  • Applies a small set of core constructs repeatedly
  • Addresses operational needs
  • Considers cost versus value delivered
  • Is as simple as possible
  • Makes the minimum of implementation assumptions

There are several key failing points in the design process:

  • No adjustment in the face of implementation feedback – No design is complete or perfect. There will always be missed details leading to brittle code, complex corner cases or convoluted solutions. It is critical that we monitor our progress and adapt the design accordingly.
  • No up front design – Design is the skeleton upon which we hang technology choices and code structure. In its absence we rapidly descend into a world of difficult-to-navigate code and costly constraints set by uninformed product choices.
  • No care in following the design – A key element of design is to place the right things in the right places. Failing to do this at code time increases coupling, makes maintenance difficult and can impact both performance and scalability. Similar effects occur as the result of poor technology selection.

Design and implementation go hand in hand yet many of us lack awareness of where the boundary between these two elements lies. We don’t understand how these elements interact with each other or appreciate the impact of decisions we make in respect of one element on the other.

 


Point the average development team at a problem and in very little time:

  • IDEs have been fired up and code is being cranked out
  • The same well-worn non-process is being followed as before
  • The testers and developers don’t talk to each other

The average development manager (and by implication their superiors) stands behind the team exhorting them to hit the keyboards and crank it out regardless:

  • They actively or passively encourage late night coding
  • There’s no concern over quality of development environment
  • Getting purchases signed-off takes far too long

There’s an endless parade of vendors promising this or that coding acceleration product that is always cheap and easy to integrate:

  • Swiss army knife frameworks
  • Automated metrics
  • Do-it-all database solutions
  • Testing tools
  • Code generators

The result is a self-reinforcing, software disaster generator endlessly thrashing around the same old cycle producing the same poor results but always with the expectation that “this time things will be different”. I’m pretty sure that falls under at least one definition of insanity.

I call this “code myopia” and believe it to be at the root of many ongoing industry problems including:

  • The failure to focus on customer value – why are we developing at all? What’s the minimum we need to deliver? What do our customers actually need?
  • Spiralling costs – if we’re intent on delivering value to our customers can we do it practically?
  • Operational nightmares – endless production rollbacks, painful deployment, no anticipation of production issues and slow resolution.
  • The absence of real design – the shape of a system is entirely dictated by favourite or already licensed technologies and maintenance is a nightmare.
  • Minimal advancement – focus instead is on poor reinventions of decades-old algorithms or designs, because few do their research or because they simply prefer to reinvent for intellectual entertainment.
  • Management of process – one manages people, the environment and work, not process.

Let’s be clear: The best code we can write is no code at all. We want maximum customer value for least effort and best possible profit. When that requires us to deliver code:

  • We want to leverage past experience
  • We want sustainable design
  • We want minimal code be it our own or vendors’
  • Nothing beats real-world testing
  • If we are to make mistakes, they should be new ones
  • We want products that work for us not our vendors
  • We want to be operationally effective
  • We want to get pragmatic about deadlines
  • We want active management
  • We want to help our customers innovate

That’s quite a challenge for all concerned (developers, testers, operational staff, management, architects) yet, in a twist of irony, say the above to most people and they run back to what they know; those same people complain that their jobs are mostly hassle and there’s no real challenge! If, however, you’re up for it, drop me a line.
