It’s well known that abstracting away network failure in inter-process communication is a bad thing. but there are other similarly harmful abstractions one might adopt when handling networks such as assuming uniformity.
In recent times there’s been a resurgence of interest in using messaging between processes as a mechanism for taming concurrency rather than the (possibly) more conventional approach of using threads and locking. This model is very appealing in it’s simplicity and some variations even allow for process failure (though I think there’s still some interesting discussion to be had around being certain that a process has failed rather than become partitioned away by network failure – split brain scenarios etc).
Some are wondering if this messaging approach could be extended beyond concurrent programming across multiple cores in a single box to deal with concurrent programming across networked machines. I think there’s maybe a small fly in the ointment – latency. If all processes are communicating via messaging inside of a single SMP box we will likely have at least approximately uniform latency between processes which is reasonably easy to manage. The same cannot be said of messaging across processes in a NUMA system or on a network. Things get still more tricky if one has processes running on a mix of SMP and NUMA machines all living on a network and messaging each other.
Managing such a mix is difficult – one must consider carefully where to deploy things and the nature of the messages you send (what you’d consider moving around an SMP system’s bus is probably not the same sort of payload you’d want to place on a network). When a process fails, one potentially cannot start up a replacement anywhere rather it must be placed carefully and appropriately.
Technorati Tags: concurrency, distributed systems, messaging, networks

Entries (RSS)