A frequent problem I observe when reviewing system designs is that they are built atop one or more libraries, frameworks or products that are poorly suited to the intended task. Fitting the design to these underpinnings warps it in undesirable ways, incurring all sorts of costs:
- It takes an increasing number of staff just to deploy and run the system.
- Customers face an increasingly bad experience in terms of interaction, performance and stability.
- One spends more time refactoring than developing new features – although in many cases developers will simply not bother with this effort, which accelerates the drop in quality for customers.
- The level of coupling increases, impacting the integrity of the design and making future change more difficult.
I call this the “design-by-product anti-pattern”. There are a couple of things that cause it to manifest:
- Absence of a real design prior to product/framework/library selection – most of those given the remit for design cannot construct proper abstractions that are adequately divorced from implementation. That is, they do not understand the core entities and operations that exist within the domain they are building a system for. Thus, when products/libraries/frameworks are selected, there is limited structure to assist in evaluating their appropriateness.
- These products are used because they are on the list of “company approved technologies”. The justification for the existence of such a list is that it “reduces cost” which it might well do if all one accounts for is licenses and product support. Unfortunately, the cost equation is not nearly so simple (see above re: costs).
- A related problem to “company approved technologies” is hot or favourite technologies preferred by the development team regardless of their appropriateness for use in any particular design situation.
Any product/library/framework is created by an individual who has their own view of how their customers design their systems and builds APIs accordingly. In the worst cases these individuals design APIs in total isolation, focused on making them theoretically perfect (for some definition of perfect). If we as customers create designs that do not align well with the views of these individuals, the result will be costly as we force the two designs together. The cost is magnified for each additional conflicting product/library/framework design.
Loose coupling as the result of proper definition of roles and responsibilities is the only tool we have to allow for future design evolution. Poor selection of products/libraries/frameworks erodes this property and should be avoided, otherwise a death march awaits.
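To make the point about abstractions concrete, here is a minimal sketch of defining a role in domain terms before any product is chosen. Everything in it is hypothetical and invented for illustration: the `Order` entity, the `OrderStore` role, the `InMemoryOrderStore` implementation and the `place_order` operation do not come from any particular system or vendor API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Order:
    """A core domain entity (hypothetical example)."""
    order_id: str
    total_pence: int


class OrderStore(Protocol):
    """A role defined by the domain, not by any product's API."""

    def save(self, order: Order) -> None: ...

    def find(self, order_id: str) -> Optional[Order]: ...


class InMemoryOrderStore:
    """One candidate implementation of the role.

    An adapter wrapping a vendor product would have to satisfy the same
    two methods, which gives a concrete yardstick for evaluating it.
    """

    def __init__(self) -> None:
        self._orders: Dict[str, Order] = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def find(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)


def place_order(store: OrderStore, order: Order) -> Order:
    """A domain operation that depends only on the OrderStore role."""
    if order.total_pence <= 0:
        raise ValueError("order total must be positive")
    store.save(order)
    return order
```

Because `place_order` knows only about the role, a candidate product can be judged by how awkward its adapter would be to write, and swapped later without touching the domain code.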
Point the average development team at a problem and in very little time:
- IDEs have been fired up and code is being cranked out
- The same well-worn non-process is being followed as before
- The testers and developers don’t talk to each other
The average development manager (and by implication their superiors) stands behind the team exhorting them to hit the keyboards and crank it out regardless:
- They actively or passively encourage late night coding
- There’s no concern over quality of development environment
- Getting purchases signed-off takes far too long
There’s an endless parade of vendors promising this or that coding acceleration product that is always cheap and easy to integrate:
- Swiss army knife frameworks
- Automated metrics
- Do it all database solutions
- Testing tools
- Code generators
The result is a self-reinforcing, software disaster generator endlessly thrashing around the same old cycle producing the same poor results but always with the expectation that “this time things will be different”. I’m pretty sure that falls under at least one definition of insanity.
I call this “code myopia” and believe it to be at the root of many ongoing industry problems including:
- The failure to focus on customer value – why are we developing at all? What’s the minimum we need to deliver? What do our customers actually need?
- Spiralling costs – if we’re intent on delivering value to our customers can we do it practically?
- Operational nightmares – endless production rollbacks, painful deployment, no anticipation of production issues and slow resolution.
- The absence of real design – the shape of a system is entirely dictated by favourite or already licensed technologies and maintenance is a nightmare.
- Minimal advancement – focus instead is on poor reinventions of decades-old algorithms or designs, because few do their research or they simply prefer to reinvent for intellectual entertainment.
- Management of process – one manages people, the environment and work, not process.
Let’s be clear: The best code we can write is no code at all. We want maximum customer value for least effort and best possible profit. When that requires us to deliver code:
- We want to leverage past experience
- We want sustainable design
- We want minimal code, be it our own or vendors’
- Nothing beats real-world testing
- If we are to make mistakes, they should be new ones
- We want products that work for us not our vendors
- We want to be operationally effective
- We want to get pragmatic about deadlines
- We want active management
- We want to help our customers innovate
That’s quite a challenge for all concerned (developers, testers, operational staff, management, architects) yet in a twist of irony, say the above to most people and they run back to what they know; those same people complain that their jobs are mostly hassle and there’s no real challenge! If however, you’re up for it, drop me a line.
How big does a website have to get before custom infrastructure becomes necessary? When a website reaches this stage, what infrastructure gets built? Before trying to answer these questions we must have some means of measuring the size of a website. I’ve settled on the number of machines as a reasonable approximation because:
- As a codebase grows it must be split up along functional boundaries, and spread across multiple processes. More code equals more processes and more machines to run them on.
- More customers means more load and requires more machines to handle it.
- More data means more storage and more processors to chew through it.
Now let’s see how many machines some of the big players are running and what infrastructure they’re talking about:
TicketMaster have at least 3000 machines and have built Spine to help them manage configuration of their infrastructure.
eBay have built a custom deployment tool (Roller), logging infrastructure, configuration management for their software services, messaging software and more. They’re running around 15000 machines across four geographical locations.
Microsoft have built a custom deployment, configuration and monitoring infrastructure called Autopilot focused on many thousands of machines. In fact we’re talking hundreds of thousands.
Google are dealing in a million or more machines and expending effort on software to handle staged, automatic upgrades. Of course they’ve already built GFS, Chubby etc.
Twitter have moved beyond the half-dozen or so machines they used to have to “a lot of servers” (hundreds?) and are seemingly still hiring operations staff but have built a custom queue server.
Facebook have at least 10000 webservers, 800 MemcacheD instances and 1800 MySQL instances. They’ve built a custom configuration-serving infrastructure, management and monitoring tools. They also contribute to MemcacheD and have built Cassandra and Thrift. They also appear to be busy building their own optimized webservers and a replacement for squid.
Amazon have tens of thousands of servers (surely more?) and have constructed Dynamo, S3, EC2, SQS etc.
A few tentative conclusions:
- It would seem that by the time a website has moved into the thousands of boxes it will have had to address configuration and monitoring, which suggests that development efforts started before this threshold (perhaps at a couple of hundred boxes?).
- As the machine count moves towards the tens of thousands, automated deployment becomes essential and there’s a need to develop more service-specific infrastructure.