Jul 30
Riding the Swarm
Distributed systems practitioners get really excited by swarm theory because it holds the promise of being able to assert a state change in a system without centralized control or a global interaction (e.g. two-phase commit). Bees, ants and the like manage to conduct a co-ordinated life within a co-operative using only local interactions. They don’t have to communicate with the entire colony to agree a good place to nest or a good source of raw materials. It should be no surprise then that distributed systems practitioners also get excited about epidemic behaviours given that they possess similar qualities.
These natural systems contain a good measure of resilience as well. Each bee for example carries a little bit of state with it that it communicates (gossips) with other bees it meets. This state is the result of something it’s encountered in it’s environment. Should this bee die losing the state it’s unlikely to matter as another bee will probably encounter the same environmental conditions and thus the state will be recovered. Victory through weight of numbers.
So nature has provided us a mechanism that can make a binding decision using very loose co-ordination and eventual consistency, whilst functioning entirely off local interactions - no centralized control. Naturally more scalable. What we don’t get is a predictable time for the point at which this decision will be made (concurrent transactional systems aren’t entirely predictable in this respect either). In addition for certain kinds of decision, it’s entirely possible to have race conditions leading to a less than optimal choice. For some systems this doesn’t matter (nature is happy to be a little sloppy) but in cases where it is important there are solutions available.
One key challenge in building these systems is that much of our existing communications middleware is either point-to-point (e.g. RPC or RMI) with fixed addresses or broadcast (message queues or multicast) neither of which is well suited to the random one-to-one nature of gossip driven approaches such as those described above. What’s needed is some form of registry which can cope with dynamically changing membership that takes account of locality of nodes and an efficient means of directed inter-node communication algorithm which mimics the relatively random properties of gossiping.
Technorati Tags: biology, distributed systems, epidemic, swarm, theory