Archive for June 20th, 2007

The debate about all these many core processors continues to circle the blogosphere. Tim Bray had this to say which set me thinking (always a bad thing):

Any time we have a piece of state that needs to be accessed concurrently we hit problems. One can hide this problem using messaging (or similar) but the key aspect in these solutions is that we can partition operations into streams against discrete elements of data (a discrete element could be a group of things) that don’t interfere with each other. Partitioning however can be problematic:

  1. Our data has to be amenable to partitioning via hashing or some other method.
  2. It gets tricky when we need to deal with availability and disaster recovery.
  3. Getting the correct granularity of partitioning can be challenging.

Which is interesting because whilst we’ve eliminated the concurrency issue, we’re now faced with a different one (partitioning) which could be just as hard to cope with and requires just as much thought from a developer and/or architect. Coincidentally, Werner Vogels (Amazon) is going to be talking about an internal data store (HASS) at the Google Scability Conference and specifically the problems of partitioning and consistent hashing (my original interest with respect to this talk was in the context of the CAP conjecture).

Another means of avoiding all these concurrency issues is to push them somewhere else. More often than not this becomes an exercise in creating a supposedly stateless system which in reality simply puts all the state in one place, usually the database. The argument is that this is acceptable because it’s only the likes of databases that should deal with these hard issues.

The rub with having the database handle it is that the concurrency model it uses will only scale across so many processors (more if you’re read mostly, less if your not) and cope with so many concurrent accesses from the stateless component. Once again to get our database layer to scale, we’ll need to partition our data into shards across multiple databases (an approach adopted by a number of top-line websites) or find some other way to reduce concurrent load on the database instance.

The act of partitioning can mean we reach a point where we can no longer expect to have atomic updates because the mechanisms for achieving it (e.g. two-phase commit) stop us scaling. When this happens we must construct complex or at least exotic solutions such as that proposed by Pat Helland.

Okay we got rid of our concurrency problem and swapped it for a partitioning problem which then turned into something of an exotic problem. Are we any better off? It seems no matter which way we go we end up with some tough problems to solve.

Perhaps there’s a sweet-spot tradeoff where the combination of a CMT box, with data partitioned across a number of processes and each process containing a simple concurrency model covers most situations. Even if that’s the case it seems developers will have to learn a few new tricks.

Technorati Tags: , , ,

Update: A good comment over on Reddit.

Comments 2 Comments »