The longer one holds onto the single-shared-memory, multi-core, big-box approach, the harder and more costly it gets to shift to a distributed one.

Every time we buy a bigger box to handle increased load, we’re wasting money come the day there isn’t a bigger box to buy (something that is looking increasingly likely for many of us). All that money would have been better spent on racks of smaller boxes. We may be able to recover some of our losses by repurposing that big iron via virtualization rather than throwing it away (like all our previous big boxes) but, of course, if that box dies it takes an awful lot of VMs with it.

Every time we assume we can keep all our data in a single memory or database (even if it’s a cluster) we’re embedding assumptions into our software that will be broken come the day we must partition across multiple memories or databases.

Each time we choose an algorithm that doesn’t easily partition or assumes a single memory/database we’re storing up trouble in our data and computational models.
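To make the partitioning point concrete, here is a minimal sketch of a computation that splits cleanly across several memories: a mean built from per-partition partial results. The function names (`partition_for`, `partial_sum_count`, `merge`, `partitioned_mean`) and the sample records are hypothetical, made up for illustration. Something like an exact median would not decompose this simply, which is the kind of trouble the paragraph above has in mind.

```python
import hashlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash.

    Python's built-in hash() is salted per process for strings, so it is
    unsuitable when several machines must agree on the placement.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partial_sum_count(values):
    """Work done locally on one partition; needs no other partition's data."""
    return (sum(values), len(values))


def merge(partials):
    """Combine the per-partition (sum, count) pairs into a global mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


def partitioned_mean(records, num_partitions):
    """Spread (key, value) records over partitions, then merge the results."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append(value)
    return merge([partial_sum_count(v) for v in partitions.values()])


# Hypothetical sample data: ten users with values 0..9.
records = [("user-%d" % i, i) for i in range(10)]
result = partitioned_mean(records, 3)
```

The answer is the same whether the records live in one partition or many, because only small (sum, count) pairs ever cross partition boundaries.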

In big monolithic systems it’s possible to create (by force) a never-fails environment which allows developers to ignore various edge cases. The move to a system built out of many separate parts makes failure almost impossible to avoid. This requires us to adjust our system design to take account of all those edge cases we previously ignored.

The time we spend gaining experience in building big monolithic systems has limited application when we switch to building distributed systems. We must learn new habits and adopt new modes of thought and that costs time.

In the worst cases, an organization’s processes, tools and departmental structure become so heavily optimized for managing these big monolithic software and hardware systems that they need serious revision to cope with the move to horizontal, many-box scaling. Typical problem areas include:

  1. Monitoring – suddenly there’s a much greater number of machines to gather stats from. Existing GUI representations mightn’t cope with such a large number.
  2. Diagnosis – no longer does a single timestamp imply an order on events, because each machine has its own clock. That makes analysis of logging information and root cause identification harder.
  3. Deployment – previous methods simply break as the level of automation provided is inadequate for the number of machines and software components involved.
  4. Testing – existing testing practices where everything can live on the developer’s desktop or in a single VM are no longer viable. There are too many moving parts and the convenience of isolation provided by testing at the desktop or in a single VM is lost.
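On the diagnosis point in the list above, one well-known way to recover an ordering of events without trusting wall-clock timestamps is a Lamport logical clock. The sketch below is only an illustration of that technique, not something the post prescribes; the class name, the two processes "A" and "B" and the logged events are all hypothetical.

```python
class LamportClock:
    """Logical clock: a counter that respects the order of message passing."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return the new stamp."""
        self.time += 1
        return self.time

    def send(self):
        """Stamp an outgoing message; sending counts as a local event."""
        return self.tick()

    def receive(self, remote_time):
        """Jump past the sender's stamp so the receipt is ordered after the send."""
        self.time = max(self.time, remote_time) + 1
        return self.time


# Two hypothetical processes, each with its own clock and no shared wall clock.
a = LamportClock()
b = LamportClock()
log = []

log.append((a.tick(), "A", "write config"))
stamp = a.send()
log.append((stamp, "A", "send request"))
log.append((b.receive(stamp), "B", "receive request"))
log.append((b.tick(), "B", "apply config"))

# Sorting by logical stamp gives an order consistent with cause and effect.
ordered = sorted(log)
```

Sorting the merged log by logical stamp keeps every receive after its send, which a sort on unsynchronized machine timestamps cannot promise.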

I doubt threads will ever go away but learning to build and manage systems constructed in any of the following ways might be worthwhile:

  1. Multiple communicating reliable processes on a reliable bus
  2. Multiple communicating unreliable processes on a reliable bus
  3. Multiple communicating unreliable processes on an unreliable bus

[ Where bus is typically a backplane or a network ]
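As a rough feel for what the unreliable bus in the list above demands of the software, here is a small simulation, assuming a sender that retries until acknowledged and a receiver that ignores duplicates. `LossyBus`, `Receiver`, `send_with_retry` and the fixed list of fates are hypothetical names invented for this sketch; a real network loses messages unpredictably rather than to a script.

```python
class LossyBus:
    """Simulated bus that loses messages or acks following a fixed script."""

    def __init__(self, fates):
        # Each fate is "ok", "lose_message" or "lose_ack"; the list repeats.
        self.fates = list(fates)
        self.attempts = 0

    def deliver(self, receiver, message):
        """Return the receiver's ack, or None if the message or ack was lost."""
        fate = self.fates[self.attempts % len(self.fates)]
        self.attempts += 1
        if fate == "lose_message":
            return None
        ack = receiver.handle(message)
        if fate == "lose_ack":
            return None  # the receiver did the work, but the sender cannot know
        return ack


class Receiver:
    """Applies each message at most once by remembering message ids."""

    def __init__(self):
        self.seen = set()
        self.balance = 0

    def handle(self, message):
        msg_id, amount = message
        if msg_id not in self.seen:  # a retried duplicate must not count twice
            self.seen.add(msg_id)
            self.balance += amount
        return msg_id


def send_with_retry(bus, receiver, message, max_attempts=5):
    """Resend until acknowledged; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if bus.deliver(receiver, message) == message[0]:
            return attempt
    raise TimeoutError("no ack after %d attempts" % max_attempts)


bus = LossyBus(["lose_message", "lose_ack", "ok"])
receiver = Receiver()
attempts_used = send_with_retry(bus, receiver, ("m1", 10))
```

The second attempt is the interesting one: the receiver applied the message but the ack was lost, so the sender had to resend and the receiver had to recognise the duplicate. That is exactly the sort of edge case a never-fails monolith lets developers ignore.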

4 Responses to “Painted Into A Corner”
  1. [...] wrong assumptions too early.  Having to rearchitect those under fire is a recipe for disaster.  Don’t paint yourself into an architectural corner where scaling is concerned, says Dan Cresswell.  I’ve worked at three companies now that [...]

  2. [...] Don’t paint your architecture into a multicore corner [...]

  3. mind says:

    this post sucks. you say very vague things

    using something like quicksort (single thread, single memory, etc) right now is completely justified because your parallelization happens at a higher level. if your main goal is to sort a large dataset, then you wouldn’t choose quicksort, but i’m guessing for 99% of applications, this is not the case.

    you’re trying to extrapolate the multicore ‘crisis’ (trend) into other areas. software is always built upon assumptions, and frankly the assumption of one big shared memory isn’t that big. memory can _always_ get bigger (we can always put more chips in a circuit. eventually the decoding logic would become the limiting factor, but that would be way off). memory bandwidth is what you should really be worried about. and when it comes down to it, nonscaling memory bandwidth is really the same kind of limit as nonscaling single core cpus.

    furthermore, when you decompose software into message sends (like erlang and scala actors), you’re making enough of an abstraction for each sequential processor to have its own memory (a message send would then pass the message into the other piece of memory, and that could even be helped by dedicated hardware)

    btw, we’re already writing software for scenario 4 -> a varying number of hostile processes communicating on a hostile bus

  4. Dan Creswell says:

    “this post sucks. you say very vague things”

    Uh huh – might mean I’m stupid and don’t have anything useful to say or maybe I was trying to move some readers down a particular alley of thought or perhaps I was writing it for myself, a snapshot of some part of my thought processes or maybe there was some other reason entirely. Maybe you believe that anything vague or subjective has no value or place in the world, who knows.

    “but i’m guessing for 99% of applications”

    See now that’s interesting – you’ve switched from being very definitive in your first statement to apparently a guess in the follow up. Having come out so strongly I’d have expected you to continue in that vein and rather than guess, present some data to back yourself up.

    “btw, we’re already writing software for scenario 4 -> a varying number of hostile processes communicating on a hostile bus”

    Hmmm, you’re maybe alluding to Byzantine systems – dealing in those might make you smart by some measures or it could just be that your mind works a particular way suited to that problemspace. However the above is merely a reference so there’s no way I can make a judgement on how clever you might be.

    So maybe I’m uninformed or stupid and don’t know about such things so I didn’t mention “scenario 4” in my posting (and the fact you did makes you superior to me) or maybe I was deliberately leaving it out for another posting or maybe I didn’t feel that referencing it helped in what I was trying to say or …..
