The debate about all these many core processors continues to circle the blogosphere. Tim Bray had this to say which set me thinking (always a bad thing):

Any time we have a piece of state that needs to be accessed concurrently we hit problems. One can hide this problem using messaging (or similar) but the key aspect in these solutions is that we can partition operations into streams against discrete elements of data (a discrete element could be a group of things) that don’t interfere with each other. Partitioning however can be problematic:

  1. Our data has to be amenable to partitioning via hashing or some other method.
  2. It gets tricky when we need to deal with availability and disaster recovery.
  3. Getting the correct granularity of partitioning can be challenging.

Which is interesting because whilst we’ve eliminated the concurrency issue, we’re now faced with a different one (partitioning) which could be just as hard to cope with and requires just as much thought from a developer and/or architect. Coincidentally, Werner Vogels (Amazon) is going to be talking about an internal data store (HASS) at the Google Scability Conference and specifically the problems of partitioning and consistent hashing (my original interest with respect to this talk was in the context of the CAP conjecture).

Another means of avoiding all these concurrency issues is to push them somewhere else. More often than not this becomes an exercise in creating a supposedly stateless system which in reality simply puts all the state in one place, usually the database. The argument is that this is acceptable because it’s only the likes of databases that should deal with these hard issues.

The rub with having the database handle it is that the concurrency model it uses will only scale across so many processors (more if you’re read mostly, less if your not) and cope with so many concurrent accesses from the stateless component. Once again to get our database layer to scale, we’ll need to partition our data into shards across multiple databases (an approach adopted by a number of top-line websites) or find some other way to reduce concurrent load on the database instance.

The act of partitioning can mean we reach a point where we can no longer expect to have atomic updates because the mechanisms for achieving it (e.g. two-phase commit) stop us scaling. When this happens we must construct complex or at least exotic solutions such as that proposed by Pat Helland.

Okay we got rid of our concurrency problem and swapped it for a partitioning problem which then turned into something of an exotic problem. Are we any better off? It seems no matter which way we go we end up with some tough problems to solve.

Perhaps there’s a sweet-spot tradeoff where the combination of a CMT box, with data partitioned across a number of processes and each process containing a simple concurrency model covers most situations. Even if that’s the case it seems developers will have to learn a few new tricks.

Technorati Tags: , , ,

Update: A good comment over on Reddit.

2 Responses to “Dodging the Concurrency Bullet”
  1. Kaizyn says:

    Erlang solved this problem largely by using ‘green threads’ and letting the language interpreter deal with being multi-threaded and/or load-balancing across however many processors you make available to the system. I suspect it won’t be too difficult for other languages to adopt the same sort of architecture.

  2. Dan Creswell says:

    Possibly, however I think the very essence of the Erlang language itself is what allows it to perform this magic. It may be that one could turn this essence into a general systems design pattern that could be applied across platforms and languages. However I’d be surprised to see this essence widely adopted in existing languages because I think it would change their nature significantly (this is a similar argument to Java gaining all sorts of features that were never really in it’s design envelope leading to various kludges).

    I think Erlang is cool but there’s only so much magic that can be done without the programmer helping through correct design. A remote processor conceptually might look the same as a local one but it’s going to exhibit significant differences in terms of I/O etc. For a certain class of application (the likes of MapReduce) this won’t matter and naive design will suffice but in many other cases that’s not true. It’s these other cases where the programmer will need to help out with quality design and that’s where we need some progress.