Archive for July 30th, 2007

Distributed systems practitioners get really excited by swarm theory because it holds the promise of being able to assert a state change in a system without centralized control or a global interaction (e.g. two-phase commit). Bees, ants and the like manage to conduct a co-ordinated life within a co-operative using only local interactions. They don’t have to communicate with the entire colony to agree a good place to nest or a good source of raw materials. It should be no surprise then that distributed systems practitioners also get excited about epidemic behaviours given that they possess similar qualities.

These natural systems contain a good measure of resilience as well. Each bee for example carries a little bit of state with it that it communicates (gossips) with other bees it meets. This state is the result of something it’s encountered in it’s environment. Should this bee die losing the state it’s unlikely to matter as another bee will probably encounter the same environmental conditions and thus the state will be recovered. Victory through weight of numbers.

So nature has provided us a mechanism that can make a binding decision using very loose co-ordination and eventual consistency, whilst functioning entirely off local interactions - no centralized control. Naturally more scalable. What we don’t get is a predictable time for the point at which this decision will be made (concurrent transactional systems aren’t entirely predictable in this respect either). In addition for certain kinds of decision, it’s entirely possible to have race conditions leading to a less than optimal choice. For some systems this doesn’t matter (nature is happy to be a little sloppy) but in cases where it is important there are solutions available.

One key challenge in building these systems is that much of our existing communications middleware is either point-to-point (e.g. RPC or RMI) with fixed addresses or broadcast (message queues or multicast) neither of which is well suited to the random one-to-one nature of gossip driven approaches such as those described above. What’s needed is some form of registry which can cope with dynamically changing membership that takes account of locality of nodes and an efficient means of directed inter-node communication algorithm which mimics the relatively random properties of gossiping.

Technorati Tags: , , , ,

Comments Comments Off

I was reading this from Bill which follows on from Joe. And had a couple of thoughts:

  1. Google seems to have applied N > 1 to everything, not just storage - that’s pure distributed thinking, not the norm for the majority of software heads.
  2. Eventual consistency might be kinda like concurrent programming for most people - i.e. many like to program sequentially and with the certainty that x has been completed immediately, in order and within a known timespan. Concurrency, eventual consistency and friends aren’t terribly amenable to this programming approach and it consequently melts many a brain.
  3. We need for a lot more people to understand CAP.

[Note for Bill should he read this: it seems like your comments are broken, I'm seeing server errors when I hit post]

Comments Comments Off