Archive for June 10th, 2008

There are many distributed algorithms and they vary in lots of ways including:

Communication Method: Possibilities include shared memory, point-to-point or broadcast messages etc.

Failure Model: Perhaps the algorithm assumes complete reliability. Perhaps it copes with some types of processor failure (including stop, transient failure or byzantine where the processor behaves arbitrarily). It might cope with problems in it’s communications layers (including message loss and duplication).

Timing Model: The algorithm might require computation and communication to progress in lock-step (synchronous) or it might cope with steps in arbitrary order with arbitrary speed (asynchronous). In between these two extremes exists an area of algorithms that have partial timing information (e.g. processors can access partially synchronised clocks). Asynchronous/Synchronous is independently applied to processors and communication channels.

The easiest to program are the synchronous algorithms. Asynchronous algorithms are harder to program because the order of happenings is uncertain however they have the advantage of needing no consideration of timing. Asynchronous algorithms also present some unique challenges for consensus which can be addressed by means of a failure detector. Many distributed systems provide stronger guarantees in respect of timing than is assumed in the asynchronous model thus we get to the partially synchronous model which perhaps surprisingly is the most difficult to program. Algorithms in this class are potentially efficient and the most realistic but care must be taken to ensure the timing assumptions they make are not violated (perhaps by failing to arrange for some aspect of process behaviour to act within the assumptions).

Such a classification helps us choose algorithms appropriate to our network environment (which should include consideration of how often manual intervention will be required), A popular leader election algorithm simply requires each process to broadcast its UID across the network and maintenance of a lease. If a process doesn’t receive a UID higher than its own it can assume it is the leader. This algorithm works in a synchronous network with no failures. It can also be adapted to work in an asynchronous network with reliable FIFO channels and no failures. However it can fail in the presence of a network partition or packet loss leading to split brain behaviour which would need to be addressed with manual action or additional fault handling in other parts of the system.

  • Share/Bookmark

Comments Comments Off