The release of the Dynamo paper has generated a lot of interest around the net. That’s more than appropriate because I don’t think there can be any doubt that Dynamo is a great piece of work.
It seems there might be a further bonus that’s largely gone unmentioned (even Greg seems to have missed it) but has been hinted at by Werner at various points in the past. Read carefully and you’ll find some details of a custom invocation infrastructure:
“Both get and put operations are invoked using Amazon’s infrastructure-specific request processing framework over HTTP. There are two strategies that a client can use to select a node: (1) route its request through a generic load balancer that will select a node based on load information, or (2) use a partition-aware client library that routes requests directly to the appropriate coordinator nodes. The advantage of the first approach is that the client does not have to link any code specific to Dynamo in its application, whereas the second strategy can achieve lower latency because it skips a potential forwarding step.”
Notice how they have support for both smart and dumb clients with the smart client setup being somewhat akin to a pattern that’s been seen in Google’s software including Chubby. The choice to reuse http would give them an option to leverage many a load balancer’s capability to apply custom routing by URL which would assist in service invocation routing.
Other interesting tidbits include:
- A mention at Google Scalability Conference of a lightweight rendering engine that might invoke upwards of 150 requests per page. Given some of the latencies discussed in the dynamo paper I am wondering if this custom framework might have some support for making collections of requests in parallel.
- Common service types are stateless aggregator services that can perform a lot of caching (wondering how much the use of http helps here) or stateful services.
- A statement from a past interview with Vogels:
“The first category is the services that make up the Amazon platform. There we use interface specifications such as WSDL but we use optimized transport and marshalling technology to ensure efficient use of CPU and network resources.”.
See the mention of the custom framework again but also a possible hint that they make use of a variety of interface specifications (perhaps including something homebrew).
Food for thought?
Technorati Tags: distributed systems, performance, amazon, scalability
Comments Off
It’s well known that abstracting away network failure in inter-process communication is a bad thing. but there are other similarly harmful abstractions one might adopt when handling networks such as assuming uniformity.
In recent times there’s been a resurgence of interest in using messaging between processes as a mechanism for taming concurrency rather than the (possibly) more conventional approach of using threads and locking. This model is very appealing in it’s simplicity and some variations even allow for process failure (though I think there’s still some interesting discussion to be had around being certain that a process has failed rather than become partitioned away by network failure – split brain scenarios etc).
Some are wondering if this messaging approach could be extended beyond concurrent programming across multiple cores in a single box to deal with concurrent programming across networked machines. I think there’s maybe a small fly in the ointment – latency. If all processes are communicating via messaging inside of a single SMP box we will likely have at least approximately uniform latency between processes which is reasonably easy to manage. The same cannot be said of messaging across processes in a NUMA system or on a network. Things get still more tricky if one has processes running on a mix of SMP and NUMA machines all living on a network and messaging each other.
Managing such a mix is difficult – one must consider carefully where to deploy things and the nature of the messages you send (what you’d consider moving around an SMP system’s bus is probably not the same sort of payload you’d want to place on a network). When a process fails, one potentially cannot start up a replacement anywhere rather it must be placed carefully and appropriately.
Technorati Tags: concurrency, distributed systems, messaging, networks
2 Comments »
Distributed systems practitioners get really excited by swarm theory because it holds the promise of being able to assert a state change in a system without centralized control or a global interaction (e.g. two-phase commit). Bees, ants and the like manage to conduct a co-ordinated life within a co-operative using only local interactions. They don’t have to communicate with the entire colony to agree a good place to nest or a good source of raw materials. It should be no surprise then that distributed systems practitioners also get excited about epidemic behaviours given that they possess similar qualities.
These natural systems contain a good measure of resilience as well. Each bee for example carries a little bit of state with it that it communicates (gossips) with other bees it meets. This state is the result of something it’s encountered in it’s environment. Should this bee die losing the state it’s unlikely to matter as another bee will probably encounter the same environmental conditions and thus the state will be recovered. Victory through weight of numbers.
So nature has provided us a mechanism that can make a binding decision using very loose co-ordination and eventual consistency, whilst functioning entirely off local interactions – no centralized control. Naturally more scalable. What we don’t get is a predictable time for the point at which this decision will be made (concurrent transactional systems aren’t entirely predictable in this respect either). In addition for certain kinds of decision, it’s entirely possible to have race conditions leading to a less than optimal choice. For some systems this doesn’t matter (nature is happy to be a little sloppy) but in cases where it is important there are solutions available.
One key challenge in building these systems is that much of our existing communications middleware is either point-to-point (e.g. RPC or RMI) with fixed addresses or broadcast (message queues or multicast) neither of which is well suited to the random one-to-one nature of gossip driven approaches such as those described above. What’s needed is some form of registry which can cope with dynamically changing membership that takes account of locality of nodes and an efficient means of directed inter-node communication algorithm which mimics the relatively random properties of gossiping.
Technorati Tags: biology, distributed systems, epidemic, swarm, theory
Comments Off
A little known fact in distributed systems is that once you make them resilient in the face of failure, handling upgrades is relatively simple. Why is this?
To make a distributed system resilient in the face of failure requires that we eliminate all single sources of truth. Truth must be maintained by and sourced from a collective which can maintain all relevant state in the face of nodes joining, leaving and failing.
There’s an additional subtlety to resilience which is that it requires scalability and in particular we need to be able to spread load across the collective whilst the members change otherwise the loss of a node will potentially mean we can no longer handle the current load in our system (we’d no longer have enough processors to cope). Likely as not, to make this work will require us to relax system constraints enough to allow truth to propagate asynchronously.
How does all this help with upgrading though? Because the typical upgrade cycle for a node looks like this:
- Shut down a node.
- Upgrade software on node.
- Restart node.
Such a sequence of events when considered from the point of view of other nodes looks like failure – the node disappears, later returns and needs to be re-knitted into the collective.
Sadly, nothing comes for free so to make upgrading work we need to ensure a level of backward compatibility between versions of the software on our nodes and we also need to account for this in our communication protocols, however there are plenty of examples to help us here such as http and DNS.
Some references:
- Tom Limoncelli of Google mentions this in a talk at Lisa ’06.
- Bill Clementson in a recent blog about Erlang but it’s largely relevant to any kind of distributed system.
Technorati Tags: availability, design, distributed systems, versioning
Comments Off
When we write programs one of the things we seek to do is encapsulate our data so as to allow us to manage our dependencies and keep our code clean. Most languages OO or otherwise provide mechanisms to support this way of working.
The thing about the average database is that it doesn’t really encourage similar behaviour. It is all too tempting (and easy) to just allow everyone to access everything. Whilst we confine ourselves to a single application using the database, the problem is to some extent contained but often what we actually do is allow multiple applications access to the same database. The exact way in which this is done varies:
- Sometimes we bundle all our middle tier code together even though it has separate roles and responsibilities and integrate all of it via a single database.
- Sometimes we have multiple applications each running in a different process.
With each application we put on top of the database the problem gets worse increasing the number of invisible dependencies tying unrelated elements of code together by virtue of accessing a shared schema.
What’s happening is we’re sharing too much intimate knowledge across our system, something we’re all taught to fear. The solution is as always to prevent direct access to this intimate knowledge by interposing layers of abstraction. One way to do this is by requiring access to data to be wrapped up behind an interface. Historically we’ve done this by having a system own the database and expose interfaces that other systems can use to get the data.
Unfortunately there is a well-known issue with this approach which is that the level of granularity is wrong and these additional integration interfaces rapidly balloon into complex beasts. What we need is a a database wrapping entity that has a finer level of granularity than an entire system. Then the integration interfaces will be simpler because there will naturally be a less complex schema underpinning this more limited functionality.
What are we talking about? Services. We end up with a system of lots of discrete services each wrapping up their own data storage.
There are other benefits to this approach:
- Each service can utilize the most appropriate storage option for it’s contained data whilst having zero impact on other services that might have different needs.
- Each service is an independent entity that can be managed (monitored, deployed etc) separately.
- Centralized access patterns are more easily broken down which is useful in cases where we deploy across multiple data-centres.
Who would do such a thing?
Technorati Tags: architecture, database, distributed systems, enterprise
3 Comments »