<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Pragmatic Dictator &#187; Distributed Systems</title>
	<atom:link href="http://www.dancres.org/blitzblog/category/distributed-systems/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dancres.org/blitzblog</link>
	<description></description>
	<lastBuildDate>Fri, 04 Jun 2010 11:58:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Time Marches On</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F09%2F24%2Ftime-marches-on%2F&amp;seed_title=Time+Marches+On</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F09%2F24%2Ftime-marches-on%2F&amp;seed_title=Time+Marches+On#comments</comments>
		<pubDate>Wed, 24 Sep 2008 14:49:41 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Distributed Systems]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[development]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=233</guid>
		<description><![CDATA[Those specifying requirements often express them without consideration for the passing of time, assuming that actions are instantaneous. A naive development team with limited experience in distributed systems will then make the classic mistake of attempting to implement those requirements to the letter. This can lead to a bunch of undesirable outcomes including: Brittleness in [...]]]></description>
			<content:encoded><![CDATA[<p>Those specifying requirements often express them without consideration for the passing of time, assuming that actions are instantaneous. A naive development team with limited experience in distributed systems will then make the classic mistake of attempting to implement those requirements to the letter.  This can lead to a bunch of undesirable outcomes including:</p>
<ul>
<li>Brittleness in the face of failure.</li>
<li>High cost solutions.</li>
<li>Poor scaling properties.</li>
<li>Disappointment as the expectations of the requirements source aren&#8217;t met.</li>
</ul>
<p>Consider a system where we have two (network) hops to an observer and one hop to the initiator of an action (assuming uniform network latency for each hop).  Potentially for every two actions there will be a single observation.  Thus each observation of the system is out of date by the time it reaches the observer.</p>
<p>Administrative actions can suffer similar problems, in that it could take several hops for the request to arrive at the system.  A user may be only one hop away and could be performing many operations in the time it takes for one of our actions to reach the system.  For example if we wish to block a user, whilst our request is in transit they might perform several operations.</p>
<p>Things are made worse by network failures which can further delay or prevent execution of an action and slow down the rate of updates to an observer.</p>
<p>How then do we account for these troubles when specifying requirements? By qualifying them with appropriate SLAs.  In the example above, appropriate SLAs might include:</p>
<ul>
<li>Time for propagation of an administrative action.</li>
<li>Maximum acceptable time after the action is triggered for a user to be blocked.</li>
</ul>
<p>SLAs such as the above:</p>
<ol>
<li>Help us to identify appropriate solutions (e.g. do we need to pay for multiple independent routes between data-centres).</li>
<li>Allow us to make appropriate use of asynchronous operations and eventual consistency.</li>
</ol>
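<p>To make the two example SLAs a little more concrete, here is a minimal Python sketch of how they might be checked against measured timestamps. The names (<code>BlockSla</code>, <code>sla_breaches</code>) and the idea of comparing three timestamps are assumptions for illustration only, not part of any real system.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockSla:
    """Hypothetical SLA for blocking a user; both limits are in seconds."""
    max_propagation: float    # action triggered -> received by the system
    max_time_to_block: float  # action triggered -> user actually blocked


def sla_breaches(sla, triggered_at, received_at, blocked_at):
    """Return the names of any limits that were exceeded (empty list if none)."""
    breaches = []
    if received_at - triggered_at > sla.max_propagation:
        breaches.append("propagation")
    if blocked_at - triggered_at > sla.max_time_to_block:
        breaches.append("time_to_block")
    return breaches
```

<p>The point of the sketch is that the requirement is expressed as a bound on elapsed time rather than as an instantaneous effect, which leaves room for asynchronous delivery.</p>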
<p>Since SLAs have a significant impact on the way in which a requirement will be implemented, it is essential to perform appropriate expectation management, discussing and communicating the implications with the requirements source. SLAs cannot be solely the domain of techies.  Remember also that in many situations customers <a href="http://blogs.msdn.com/pathelland/archive/2008/09/01/confidence-in-the-cloud.aspx">prefer availability over consistency</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F09%2F24%2Ftime-marches-on%2F&amp;seed_title=Time+Marches+On/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mindset</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F07%2F10%2Fmindset%2F&amp;seed_title=Mindset</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F07%2F10%2Fmindset%2F&amp;seed_title=Mindset#comments</comments>
		<pubDate>Thu, 10 Jul 2008 21:48:13 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Distributed Systems]]></category>
		<category><![CDATA[development]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=229</guid>
		<description><![CDATA[Neglecting to account for failure is an age old problem. Consider this common error (Purify anybody?): #include &#60;stdio.h&#62; #include &#60;stdlib.h&#62; struct rhubarb { int aVal; int anotherVal; char* aString; }; ...... struct rhubarb* mystruct; mystruct = malloc(sizeof(struct rhubarb)); mystruct->aVal = 55; ...... Of course the following code should have been included after the malloc: /* [...]]]></description>
			<content:encoded><![CDATA[<p>Neglecting to account for failure is an age old problem. Consider this common error (Purify anybody?):</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
struct rhubarb {
  int aVal;
  int anotherVal;
  char* aString;
};
......
  struct rhubarb* mystruct;
  mystruct = malloc(sizeof(struct rhubarb));
  mystruct->aVal = 55;
......
</pre>
<p>Of course the following code should have been included after the malloc:</p>
<pre>
/*
  If memory wasn't allocated, do something appropriate.
*/
if (mystruct == NULL) {
  .....
}
</pre>
<p>An equivalent mistake is easily possible when building a distributed system in http or RMI by ignoring error codes or exceptions that are designed to communicate failures that we ought to handle.  It&#8217;s similarly easy to ignore latency, or implement brittle and dumb retry logic or assume something is reliable (like a message queue) when it isn&#8217;t.  Many have managed to concoct systems with http that breach the idempotent &#8220;constraints&#8221; of REST and whilst Erlang provides link() and receive timeouts, we&#8217;re not forced to use them.</p>
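<p>As a counterpoint to brittle retry logic, here is a minimal Python sketch of a bounded retry with capped, jittered backoff. <code>TransientError</code> and <code>call_with_retry</code> are hypothetical names, and the sketch assumes the operation being retried is idempotent; if it isn&#8217;t, retrying blindly is itself a bug.</p>

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(operation, attempts=4, base_delay=0.1, max_delay=2.0,
                    sleep=time.sleep):
    """Call operation(); retry on TransientError with capped, jittered backoff.

    Any other exception propagates immediately, and the last TransientError is
    re-raised once the attempts are exhausted, so failure is never swallowed.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))  # jitter avoids synchronised retries
```

<p>The caller still has to decide what to do when the final attempt fails; the sketch only makes sure that decision can&#8217;t be skipped silently.</p>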
<p>In essence there is no way to ensure developers do the right thing in a single-process or distributed context. No technology, tool or design approach can prevent developers from making poor implementation decisions, which limits the value in re-hashing (<a href="http://service-architecture.blogspot.com/2008/07/convenience-over-correctness.html">Steve</a>, <a href="http://steve.vinoski.net/blog/2008/07/01/convenience-over-correctness/">Steve</a> and <a href="http://www.stucharlton.com/blog/archives/000553.html">Stu</a>) RPC rights and wrongs.</p>
<p>I believe the best chance we have for doing distributed right is not by providing some de-facto standard toolset, rather it&#8217;s through education<sup>[1]</sup> and mentoring to encourage the correct mindset.  Such a mindset allows a developer building a distributed system to choose the most appropriate tools and use them right.</p>
<p>[1] Material to be covered would be substantially broader than the fallacies, failure handling and latency, and should probably include: <a href="http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf">logical time</a>, <a href="http://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf">FLP</a>, <a href="http://citeseer.ist.psu.edu/356748.html">failure detectors</a>, <a href="http://www.cs.wustl.edu/~kjg/CS333_SP97/snapshot.html">global snapshots</a> and <a href="http://research.microsoft.com/users/lamport/pubs/lamport-paxos.pdf">Paxos</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F07%2F10%2Fmindset%2F&amp;seed_title=Mindset/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Corruption</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F07%2F01%2Fcorruption%2F&amp;seed_title=Corruption</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F07%2F01%2Fcorruption%2F&amp;seed_title=Corruption#comments</comments>
		<pubDate>Tue, 01 Jul 2008 11:29:22 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Distributed Systems]]></category>
		<category><![CDATA[availability]]></category>
		<category><![CDATA[networks]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2008/07/01/corruption/</guid>
		<description><![CDATA[Amazon has had a few problems of late, one of the more interesting ones being something S3 users encountered. It took Amazon a little while to identify the root cause: We&#8217;ve isolated this issue to a single load balancer that was brought into service at 10:55pm PDT on Friday, 6/20. It was taken out of [...]]]></description>
			<content:encoded><![CDATA[<p>Amazon has had a <a href="http://news.zdnet.co.uk/internet/0,1000000097,39431423,00.htm">few</a> <a href="http://news.cnet.com/8301-10784_3-9963164-7.html">problems</a> of late, one of the more interesting ones being something S3 users <a href="http://developer.amazonwebservices.com/connect/thread.jspa?threadID=22709">encountered</a>. It took Amazon a little while to identify <a href="http://developer.amazonwebservices.com/connect/message.jspa?messageID=93408#93408">the root cause</a>:</p>
<blockquote>
<p>We&#8217;ve isolated this issue to a single load balancer that was brought into service at 10:55pm PDT on Friday, 6/20. It was taken out of service at 11am PDT Sunday, 6/22. While it was in service it handled a small fraction of Amazon S3&#8242;s total requests in the US. Intermittently, under load, it was corrupting single bytes in the byte stream.</p>
</blockquote>
<p>Perhaps they had anticipated this scenario as the S3 API features explicit support for software-level check-summing via MD5:</p>
<blockquote>
<p>For all PUT requests, Amazon S3 computes its own MD5, stores it with the object, and then returns the computed MD5 as part of the PUT response code in the ETag. By validating the ETag returned in the response, customers can verify that Amazon S3 received the correct bytes even if the Content MD5 header wasn&#8217;t specified in the PUT request. Because network transmission errors can occur at any point between the customer and Amazon S3, we recommend that all customers use the Content-MD5 header and/or validate the ETag returned on a PUT request to ensure that the object was correctly transmitted. This is a best practice that we&#8217;ll emphasize more heavily in our documentation to help customers build applications that can handle this situation.</p>
</blockquote>
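<p>The client-side half of that recommendation can be sketched in a few lines of Python. Nothing here talks to S3; <code>content_md5_header</code> and <code>etag_matches</code> are hypothetical helpers, and the ETag format (hex MD5 in double quotes) is an assumption based on the plain PUT described in the quote above.</p>

```python
import base64
import hashlib


def content_md5_header(body: bytes) -> str:
    """Value for the Content-MD5 request header: base64 of the raw MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def etag_matches(body: bytes, etag: str) -> bool:
    """Compare a returned ETag with our own MD5 of the bytes we meant to send.

    Assumes the ETag is the hex MD5 wrapped in double quotes, as the quoted
    text describes for a plain PUT; other upload modes may differ.
    """
    return etag.strip('"').lower() == hashlib.md5(body).hexdigest()
```

<p>A client would send the first value with the PUT and treat a failed <code>etag_matches</code> check as a failed upload to be retried.</p>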
<p>Some developers were surprised that any of this was necessary, expecting TCP/UDP checksums to be sufficient. However, Stevens points out in <a href="http://www.amazon.com/TCP-Illustrated-Protocols-Addison-Wesley-Professional/dp/0201633469/">TCP/IP Illustrated Vol I</a>:</p>
<div style="margin-left: 4em">
<p>Also, if your data is valuable, you might not want to trust the UDP or the TCP checksum, since these are simple checksums and were not meant to catch all possible errors.</p>
</div>
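<p>One way to see what &#8220;simple&#8221; means here: the checksum used by TCP and UDP is a 16-bit ones&#8217; complement sum (described in RFC 1071), and a sum doesn&#8217;t care about order. The Python sketch below is an illustration of that property only; it says nothing about what actually went wrong inside the load balancer.</p>

```python
import hashlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34\x56\x78"
reordered = b"\x56\x78\x12\x34"  # same 16-bit words, swapped

# The simple checksum cannot tell the two payloads apart...
same_checksum = internet_checksum(original) == internet_checksum(reordered)
# ...whereas an MD5 digest can.
same_md5 = hashlib.md5(original).digest() == hashlib.md5(reordered).digest()
```

<p>Here <code>same_checksum</code> is true and <code>same_md5</code> is false, which is the gap an application-level digest is there to close.</p>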
<p>Takeaways:</p>
<ol>
<li>Not all types of failure are binary &#8211; working or not working.</li>
<li>Leaving the responsibility of data-safety to software layers further down the stack may not be best.</li>
<li>Mechanisms for failure handling must be embedded in APIs.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F07%2F01%2Fcorruption%2F&amp;seed_title=Corruption/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Taste Is Everything</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F06%2F12%2Ftaste-is-everything%2F&amp;seed_title=Taste+Is+Everything</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F06%2F12%2Ftaste-is-everything%2F&amp;seed_title=Taste+Is+Everything#comments</comments>
		<pubDate>Thu, 12 Jun 2008 21:05:24 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Distributed Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2008/06/12/taste-is-everything/</guid>
		<description><![CDATA[It seems it&#8217;s generally accepted[1] that SOA means breaking up your system into a set of co-operating components partitioned by business process. If you&#8217;re not doing that, you&#8217;re not doing SOA. It never ceases to amaze me how we get so zealous about fixed methods for architecting a system. I suspect it&#8217;s because we&#8217;d like [...]]]></description>
			<content:encoded><![CDATA[<p>It seems it&#8217;s generally accepted<sup>[1]</sup> that SOA means breaking up your system into a set of co-operating components partitioned by business process. If you&#8217;re not doing that, you&#8217;re not doing SOA. It never ceases to amaze me how we get so zealous about fixed methods for architecting a system. I suspect it&#8217;s because we&#8217;d like to believe that architecture (and much of the act of development) can be done with fixed rules, cookie cutter style, get your catalog of patterns and technology, apply them &#8211; job done. The ultimate embodiment of this behaviour is deployment of a piece of technology in the belief that once the integration is complete the system has radically shifted in terms of its architecture (e.g. deploying an ESB suddenly makes your system SOA).</p>
<p>So if the fixed methods of SOA are thrown out and technology is not the solution, how do we build a system? Let&#8217;s first consider some of the things we&#8217;d like from our architecture:</p>
<ol>
<li>Avoid <a href="http://www.dancres.org/blitzblog/2007/07/11/the-siren-call-of-the-database/">integration via the database</a> &#8211; otherwise data coupling will cripple us</li>
<li>Support for granular updates &#8211; taking down the whole system is not desirable</li>
<li>Fast rollback of changes &#8211; in case an update breaks</li>
<li>In-production testing &#8211; there&#8217;s no substitute for real traffic in tests</li>
<li>Minimal shared resources such as storage &#8211; so should there be an outage, impact is minimised</li>
<li>Horizontal scaling &#8211; more boxes equals more power</li>
<li>Support for scalable development &#8211; dev teams should be able to act in isolation most of the time</li>
<li>Support for appropriate <a href="http://citeseer.ist.psu.edu/544596.html" title="CAP Theorem">CAP</a> tradeoffs &#8211; making everything consistent can be bad for availability</li>
</ol>
<p>Although we wish to avoid coupling via the database, the reality is that our code still requires access to the data in some form or another. The best we can do under this circumstance is to limit the amount of code that directly accesses the data. We achieve this by vertically slicing (as opposed to horizontal sharding) our data and consolidating the code that is most closely related to it (e.g. performs updates) into a single encapsulated unit. All other access to the data must go via the code element of its associated unit (note that one needn&#8217;t always go to a unit for the data, it&#8217;s perfectly acceptable to cache).</p>
<p>In this way we limit the impact of data-schema changes to the associated unit; other parts of the system need not be concerned. There&#8217;s still some work to do, though. If the code within a unit were to be co-located within all processes containing code that wishes to make use of it, we&#8217;d need to restart all those processes when we wish to deploy a new version of that code (for whatever reason). Such a deployment model also encourages several bad habits:</p>
<ol>
<li>Ignoring the remoteness of the data &#8211; it&#8217;s hidden behind some form of interface and it&#8217;s tempting to attempt to hide failure behind that interface</li>
<li>Focus on synchronous method calls &#8211; it&#8217;s natural for a developer to write synchronous method calls when the code being called looks local (note that method calls can support asynchronous behaviours)</li>
</ol>
<p>To avoid these issues, we deploy each unit in its own process, accessed via some network endpoint that dependants use to interact with it. Thus:</p>
<ul>
<li>Each unit can now easily be allocated its own independent storage, apply its own sharding policy etc.</li>
<li>The network endpoint can support multiple protocol versions or we can opt to terminate multiple network endpoints onto a unit, a powerful primitive for supporting several versions of a remote interface simultaneously.</li>
<li>The network endpoint can be terminated onto some form of load balancer or custom routing implementation (which might be part of the code within the unit itself perhaps because it&#8217;s P2P based) facilitating horizontal scaling, hot upgrades, A/B testing, in-production tests etc.</li>
<li>Each unit can be assigned to a development team and much work can be done independently of development efforts elsewhere, making for less contention in development.</li>
<li>Each unit can implement whatever CAP tradeoff makes sense.</li>
</ul>
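<p>A minimal Python sketch of the shape of such a unit follows. <code>InventoryUnit</code>, its two protocol versions and the request dictionaries are all invented for illustration; a real unit would sit behind an actual network endpoint and own real storage.</p>

```python
class InventoryUnit:
    """Hypothetical unit: owns its data and is the only code that touches it."""

    def __init__(self):
        self._stock = {}  # private storage; a real unit might own a database
        # One handler per protocol version, so old and new callers coexist.
        self._handlers = {1: self._handle_v1, 2: self._handle_v2}

    def add_stock(self, item, quantity):
        self._stock[item] = self._stock.get(item, 0) + quantity

    def handle(self, version, request):
        """Entry point a network endpoint would call for each request."""
        handler = self._handlers.get(version)
        if handler is None:
            return {"error": "unsupported protocol version"}
        return handler(request)

    def _handle_v1(self, request):
        # v1 callers only ever asked for a bare count.
        return {"count": self._stock.get(request["item"], 0)}

    def _handle_v2(self, request):
        # v2 adds an availability flag without breaking v1 callers.
        count = self._stock.get(request["item"], 0)
        return {"item": request["item"], "count": count, "in_stock": count > 0}
```

<p>Because callers only ever see <code>handle</code>, the storage layout behind it can change without anyone outside the unit noticing.</p>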
<p>If we arrange for the network endpoint of each unit to be <a href="http://www.dancres.org/blitzblog/2008/02/27/dns-games/">discovered dynamically at runtime</a> we gain the ability to move our units around (e.g. for DR reasons) and have means for our system to dynamically knit itself together reducing configuration issues. Such an arrangement can also make it easier to deal with ordered startup issues (where some set of things must be available before others).</p>
<p>Of course it&#8217;s not all good news, we will have to manage our desire for <a href="http://en.wikipedia.org/wiki/ACID">ACID</a> guarantees because many of the mechanisms (such as <a href="http://en.wikipedia.org/wiki/Two-phase-commit_protocol">two-phase commit</a>) for achieving this in a distributed system are <a href="http://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf">fraught with problems</a>. Fortunately, people have been <a href="http://www.addsimplicity.com/adding_simplicity_an_engi/2006/12/avoiding_two_ph.html">thinking</a> <a href="http://www.allthingsdistributed.com/2007/12/eventually_consistent.html">about</a> <a href="http://blogs.msdn.com/pathelland/archive/2007/05/16/link-to-life-beyond-distributed-transactions-an-apostate-s-opinion.aspx">this</a> for a while. We&#8217;ll also have to take care of <a href="http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing">the fallacies</a> but even this has some positive aspects as failure and upgrade in some cases <a href="http://armstrongonsoftware.blogspot.com/2007/07/scalable-fault-tolerant-upgradable.html">can be considered the same</a> (noting that abstractions for message passing, failure detectors and the like can be implemented in many languages, not just Erlang).</p>
<p>So what remoting approaches might we use? REST/http, WS-*, RMI, CORBA, messages, custom protocol &#8211; whatever is suitable for our situation (noting that some choices impact the means by which we can handle evolution of protocols etc). What guidelines might we follow in determining how to split our code and data? There are a number of different approaches including:</p>
<ol>
<li>Considering similarities in consistency, availability and partitioning (<a href="http://citeseer.ist.psu.edu/544596.html" title="CAP Theorem">CAP</a>) requirements</li>
<li>Data access localities</li>
<li>Data relationships</li>
<li>Jurisdictional requirements</li>
<li>Roles and responsibilities (at coarser level than OO)</li>
<li>Features (e.g. recommendations)</li>
<li>Business processes</li>
<li>Constituent elements of an overall business process</li>
</ol>
<p>Most systems likely require a combination of these rather than one fixed approach, <a href="http://www.artima.com/intv/tasteP.html">taste and gut instinct</a> count for a lot. And what might we call these units I speak of? I prefer to call them services as do a few <a href="http://blogs.msdn.com/pathelland/archive/2007/05/20/soa-and-newton-s-universe.aspx">other</a> <a href="http://www.acmqueue.org/modules.php?name=Content&amp;pa=showpage&amp;pid=403">people</a> but there&#8217;s no doubt that&#8217;ll be confusing, have to think of something else&#8230;&#8230;.</p>
<p>[1] I know that Steve <a href="http://service-architecture.blogspot.com/2008/04/how-you-know-its-soa.html">might well argue otherwise</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F06%2F12%2Ftaste-is-everything%2F&amp;seed_title=Taste+Is+Everything/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DNS To The Rescue</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F02%2F27%2Fdns-games%2F&amp;seed_title=DNS+To+The+Rescue</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F02%2F27%2Fdns-games%2F&amp;seed_title=DNS+To+The+Rescue#comments</comments>
		<pubDate>Tue, 26 Feb 2008 22:14:58 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Distributed Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2008/02/27/dns-games/</guid>
		<description><![CDATA[I mentioned a while back that one could exploit DNS to ease some of the common static configuration issues around hostnames, ports etc. What follows is a simple outline solution, we&#8217;ve moved a long way beyond this at Betfair but the details will have to remain secret for now (sorry). Let&#8217;s assume that we have [...]]]></description>
			<content:encoded><![CDATA[<p>I mentioned a while back that one could <a href="http://www.dancres.org/blitzblog/2008/01/21/static-shock/">exploit DNS</a> to ease some of the common static configuration issues around hostnames, ports etc.  What follows is a simple outline solution; we&#8217;ve moved a long way beyond this at Betfair but the details will have to remain secret for now (sorry).</p>
<p>Let&#8217;s assume that we have several different releases in testing at any one time such that we wish to segment our development/testing systems into separate enclaves (each handling a separate release) and may wish to add more enclaves over time.  Assume also that production is an enclave in its own right.</p>
<p>Firstly we define a set of logical hostnames that refer to the significant components of our system such as databases, file servers etc.  Other elements such as webservers are probably independent and not referenced from other parts of the system and thus do not need names.  These logical hostnames are what feature in our configuration files and do not need to change from enclave to enclave because we are going to use DNS to map from these logical hostnames to real physical machines.</p>
<p>Thus what we want is a separate namespace for hosts in each of these enclaves so as to prevent leakage.  To that end we map each namespace onto a separate domain within our DNS setup.</p>
<p>[Note our DNS setup would typically consist of a set of servers that maintain records for our own internal domains and possibly forward other requests, for say external web addresses, to other servers.]</p>
<p>Each enclave therefore has:</p>
<ol>
<li>A separate namespace represented as a unique domain</li>
<li>A set of services deployed onto physical machines</li>
<li>A mapping from logical machine names to physical machine names (or IP addresses)</li>
<li>A collection of configuration files all referencing logical machine names</li>
</ol>
<p>Each domain (namespace) contains the logical to physical mapping of machines for its associated enclave.  Each domain can be a separate zone and is thus kept in a separate file read by our DNS master.  This allows us to maintain a template file which can be quickly edited to create a new domain (namespace).  Thus whenever we wish to create a new enclave we set up a new zone, containing the definition of a new domain which is the namespace for that enclave.</p>
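<p>The template idea might look something like the Python sketch below. The domain name, serial and timer values are made up, and the output is an illustrative fragment rather than a validated zone (for instance, the name server would need its own address record in <code>hosts</code>).</p>

```python
# Hypothetical zone template; {domain}, {serial} and {records} are filled in
# per enclave. Timer values are placeholders, not recommendations.
ZONE_TEMPLATE = """\
$ORIGIN {domain}.
$TTL 300
@ IN SOA ns1.{domain}. hostmaster.{domain}. ( {serial} 3600 600 86400 300 )
@ IN NS ns1.{domain}.
{records}
"""


def render_zone(domain, serial, hosts):
    """Fill the template for one enclave; hosts maps logical name -> IP address."""
    records = "\n".join(
        f"{name} IN A {address}" for name, address in sorted(hosts.items())
    )
    return ZONE_TEMPLATE.format(domain=domain, serial=serial, records=records)
```

<p>Creating a new enclave then amounts to calling <code>render_zone</code> with a new domain and that enclave&#8217;s logical to physical mapping.</p>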
<p>To actually resolve a logical hostname we must ensure that it is concatenated with the domain appropriate to the enclave&#8217;s namespace.  Before discussing options, note that each machine will be allocated to an enclave and must be configured accordingly which we can exploit to our advantage:</p>
<ol>
<li>Simple configuration &#8211; ensure that the application has access to the domain to concatenate.  This could be done via command-line argument but better is to source it from a well-known file on the machine which could be set up as part of allocating it to an enclave.</li>
<li>Default search domain &#8211; any name not fully qualified has the default search domain appended to it.  This default is typically part of the resolver configuration of the operating system and again can be set up as part of allocating a machine to an enclave.</li>
</ol>
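<p>The first option can be sketched in a few lines of Python. The file path a caller would pass in and the helper names (<code>read_enclave_domain</code>, <code>qualify</code>) are assumptions for illustration; with the second option the operating system&#8217;s resolver does the equivalent work itself.</p>

```python
from pathlib import Path


def read_enclave_domain(path):
    """Read the enclave's domain from a well-known file (one line of text)."""
    return Path(path).read_text().strip().strip(".")


def qualify(logical_name, enclave_domain):
    """Turn a logical hostname into a fully qualified name for this enclave."""
    if logical_name.endswith("."):
        return logical_name  # already fully qualified; leave untouched
    return f"{logical_name}.{enclave_domain}."
```

<p>The configuration files keep saying <code>db</code>; only the one well-known file differs from enclave to enclave.</p>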
<p>Missing from the above is the handling of <a href="http://en.wikipedia.org/wiki/TCP_and_UDP_port">ports</a> which might change from one enclave to the next.  This can be tackled with a similar logical/physical mapping approach but must be based on the use of DNS <a href="http://en.wikipedia.org/wiki/SRV_record">SRV</a> records rather than simple hostname mappings.  The JDK provides little help out of the box for querying these records so something like <a href="http://www.dnsjava.org/">dnsjava</a> will be required.</p>
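<p>Whatever library fetches the records, the logic around them is small. The Python sketch below assumes the SRV records have already been retrieved and only shows how the query name is formed and how an endpoint might be picked; <code>SrvRecord</code>, <code>srv_name</code> and <code>pick_endpoint</code> are hypothetical names.</p>

```python
from collections import namedtuple

# Fields mirror the data carried by an SRV record as defined in RFC 2782.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def srv_name(service, proto, domain):
    """Build the query name, e.g. _ldap._tcp.example.com, per RFC 2782."""
    return f"_{service}._{proto}.{domain}"


def pick_endpoint(records):
    """Pick (target, port) from already-fetched records.

    Simplification: lowest priority wins and ties go to the highest weight;
    RFC 2782 asks for a weighted random choice within a priority.
    """
    if not records:
        raise LookupError("no SRV records returned")
    best = min(records, key=lambda r: (r.priority, -r.weight))
    return best.target, best.port
```

<p>The port now comes from DNS along with the host, so neither needs to appear in a configuration file.</p>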
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F02%2F27%2Fdns-games%2F&amp;seed_title=DNS+To+The+Rescue/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Static Shock</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F01%2F21%2Fstatic-shock%2F&amp;seed_title=Static+Shock</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F01%2F21%2Fstatic-shock%2F&amp;seed_title=Static+Shock#comments</comments>
		<pubDate>Mon, 21 Jan 2008 21:45:57 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Distributed Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2008/01/21/static-shock/</guid>
		<description><![CDATA[Why do people still use static addressing in configuration files? Fixed hostnames or worse IP addresses? These things make one&#8217;s life a nightmare when moving from one environment to another e.g. desktop to QA, QA to staging or staging to production. With each transition, one must wade through all the relevant configuration files, find all [...]]]></description>
			<content:encoded><![CDATA[<p>Why do people still use static addressing in configuration files?  Fixed hostnames or worse IP addresses?</p>
<p>These things make one&#8217;s life a nightmare when moving from one environment to another e.g. desktop to QA, QA to staging or staging to production.</p>
<p>With each transition, one must wade through all the relevant configuration files, find all these addresses and edit them.  This creates many an opportunity for error such as missing one configuration variable or mistyping an address.  It&#8217;s also a nightmare to maintain accurate documentation for all these scattered settings.</p>
<p>And yet this is so unnecessary if one exploits the abilities of <a href="http://en.wikipedia.org/wiki/Domain_name_system">DNS</a> (and maybe <a href="http://en.wikipedia.org/wiki/Bonjour_(software)">Bonjour</a>) properly.  Just look at some of <a href="http://tools.ietf.org/html/rfc2136">the</a> <a href="http://tools.ietf.org/html/rfc2782">cool</a> <a href="http://tools.ietf.org/html/rfc3927">stuff</a> <a href="http://files.dns-sd.org/draft-sekar-dns-llq.txt">one</a> <a href="http://files.dns-sd.org/draft-sekar-dns-ul.txt">can</a> <a href="http://files.dns-sd.org/draft-cheshire-nat-pmp.txt">do</a>.  Better still most (all?) of it is supported in <a href="http://www.isc.org/index.pl?/sw/bind/index.php">BIND</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F01%2F21%2Fstatic-shock%2F&amp;seed_title=Static+Shock/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Dark Skies</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F12%2F23%2Fdark-skies%2F&amp;seed_title=Dark+Skies</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F12%2F23%2Fdark-skies%2F&amp;seed_title=Dark+Skies#comments</comments>
		<pubDate>Sun, 23 Dec 2007 18:11:25 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Distributed Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2007/12/23/dark-skies/</guid>
		<description><![CDATA[Much is being made of a comment from Subodh Bapat especially in conjunction with further words from Greg Papadopoulos. It&#8217;s believable that many a company will choose to host in a so-called &#8220;megacentre&#8221; but that doesn&#8217;t have to mean disaster come the day one of these fails. One can only get so much power into [...]]]></description>
			<content:encoded><![CDATA[<p>Much is <a href="http://blogs.zdnet.com/SAAS/?p=430">being</a> <a href="http://smoothspan.wordpress.com/2007/12/15/to-rule-the-clouds-takes-software-why-amazon-simpledb-is-a-huge-next-step/">made</a> of a <a href="http://www.news.com/8301-10784_3-9828570-7.html">comment</a> from Subodh Bapat especially in conjunction with <a href="http://blogs.zdnet.com/BTL/?p=7231">further words</a> from Greg Papadopoulos.</p>
<p>It&#8217;s believable that many a company will choose to host in a so-called &#8220;megacentre&#8221;, but that doesn&#8217;t have to mean disaster come the day one of these fails.  One can only get so much power, cooling and so on into one place.  Then there are latency challenges: if you&#8217;re hosted in the wrong place, your customers will be displeased with the performance of your system.  Which is a long-winded way of saying that, whilst one might expect to see consolidation of cloud providers, they&#8217;ll still need an awful lot of data-centres to hold all the kit required and provide the appropriate speed-of-light tradeoffs for those they host.</p>
<p>What about resilience?  We know that to solve a useful class of problem (Byzantine failure) one requires n > 3f, i.e. a minimum of 3f + 1 nodes, where f is the number of faulty nodes one wishes to tolerate and n is the total number of nodes.  If we lower our sights a little, the minimum needed to handle a data-centre failure is an active-passive approach with remote replication.  Some companies, however, are moving to active-active models to solve problems of data-centre outage, in recognition of the fact that simpler approaches work but mean significant downtime whilst the DR (disaster recovery) site is brought online.</p>
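<p>To put numbers on the n > 3f bound, a small sketch of the arithmetic follows.  The function names are mine, purely for illustration.</p>

```python
def min_nodes(f):
    """Smallest n satisfying n > 3f, i.e. 3f + 1."""
    if 0 > f:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faults(n):
    """Largest f for which n > 3f still holds."""
    if 1 > n:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


for f in range(1, 4):
    print("tolerate", f, "faulty node(s): need at least", min_nodes(f))
```

<p>So tolerating a single Byzantine node already takes four nodes, and going from four nodes to six buys no extra tolerance; it takes a seventh to survive two faults.</p>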
<p>Why, if there are techniques available that address these nastier classes of failure, are <a href="http://valleywag.com/tech/breakdowns/truck-driver-in-texas-kills-all-the-websites-you-really-use-321881.php">we</a> <a href="http://radar.oreilly.com/archives/2007/07/mistakes_will_b.html">losing</a> so many &#8220;big&#8221; sites when we lose data-centres?  Because most software houses (enterprise, web or otherwise) assume that failure can be prevented using backup network providers, clusters, replicated disk networks etc., i.e. hardware-based approaches that allow our software writers to pretend that nothing ever breaks, leaving them to just write the important business logic.</p>
<p>To allow for data-centre failure, the clouds of the future will require us to make considerably fewer assumptions in our software: network addresses might change, storage can become unavailable, processes might move and weaker consistency models must be exploited.  One such cloud has already arrived in the form of Amazon, and it&#8217;s notable that many developers are struggling with the new model it offers (they can&#8217;t, for example, find a suitable traditional database solution).</p>
<p>The challenges of the cloud are not in data-centre failure or consolidation of hosting solutions but in our own ability to write software that runs in these environments.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F12%2F23%2Fdark-skies%2F&amp;seed_title=Dark+Skies/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>In Public</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F11%2F10%2Fin-public%2F&amp;seed_title=In+Public</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F11%2F10%2Fin-public%2F&amp;seed_title=In+Public#comments</comments>
		<pubDate>Sat, 10 Nov 2007 16:35:20 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Distributed Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2007/11/10/in-public/</guid>
		<description><![CDATA[Going to be at the Google London Open Source Jam on Thursday 29th November 2007, 6pm &#8211; 9.30pm for an evening of distributed systems hackery.]]></description>
			<content:encoded><![CDATA[<p>Going to be at the Google London Open Source Jam on Thursday 29th November 2007, 6pm &#8211; 9.30pm for <a href="http://osjam.truemesh.com/?r">an evening of distributed systems hackery</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F11%2F10%2Fin-public%2F&amp;seed_title=In+Public/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Amazon&#039;s Custom Service Invocation Infrastructure</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F10%2F11%2Famazons-custom-service-invocation-infrastructure%2F&amp;seed_title=Amazon%26%23039%3Bs+Custom+Service+Invocation+Infrastructure</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F10%2F11%2Famazons-custom-service-invocation-infrastructure%2F&amp;seed_title=Amazon%26%23039%3Bs+Custom+Service+Invocation+Infrastructure#comments</comments>
		<pubDate>Thu, 11 Oct 2007 19:02:55 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Distributed Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2007/10/11/amazons-custom-service-invocation-infrastructure/</guid>
		<description><![CDATA[The release of the Dynamo paper has generated a lot of interest around the net. That&#8217;s more than appropriate because I don&#8217;t think there can be any doubt that Dynamo is a great piece of work. It seems there might be a further bonus that&#8217;s largely gone unmentioned (even Greg seems to have missed it) [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html">release</a> of the Dynamo paper has generated a lot of interest around the net.  That&#8217;s more than appropriate because I don&#8217;t think there can be any doubt that Dynamo is a great piece of work.</p>
<p>It seems there might be a further bonus that&#8217;s largely gone unmentioned (even Greg seems to have <a href="http://glinden.blogspot.com/2007/10/highly-available-distributed-hash.html">missed it</a>) but has been hinted at by <a href="http://www.allthingsdistributed.com/">Werner</a> at various points in the past.  Read carefully and you&#8217;ll find some details of a custom invocation infrastructure:</p>
<p><em>&#8220;Both get and put operations are invoked using Amazon’s infrastructure-specific request processing framework over HTTP. There are two strategies that a client can use to select a node: (1) route its request through a generic load balancer that will select a node based on load information, or (2) use a partition-aware client library that routes requests directly to the appropriate coordinator nodes. The advantage of the first approach is that the client does not have to link any code specific to Dynamo in its application, whereas the second strategy can achieve lower latency because it skips a potential forwarding step.&#8221;</em></p>
<p>Notice how they support both smart and dumb clients, with the smart client setup being somewhat akin to a pattern that&#8217;s been seen in Google&#8217;s software including <a href="http://labs.google.com/papers/chubby.html">Chubby</a>.  The choice to reuse HTTP would give them the option of leveraging many a load balancer&#8217;s ability to apply custom routing by URL, which would assist in service invocation routing.</p>
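<p>To make the two strategies from the quote concrete, here is a minimal sketch.  It is emphatically not Amazon&#8217;s framework: the class names and node names are hypothetical, the smart client assumes a simple consistent-hash ring keyed by MD5 with one position per node, and round-robin stands in for whatever load information a real balancer would use.</p>

```python
import bisect
import hashlib


def ring_position(value):
    """Map a string to a position on the ring using MD5 (128-bit integer)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class PartitionAwareClient:
    """Strategy (2): the client works out the coordinator itself."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        # One ring position per node; virtual nodes are omitted for brevity.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def coordinator_for(self, key):
        # First node clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._ring[idx % len(self._ring)][1]


class LoadBalancedClient:
    """Strategy (1): any node will do; it forwards if it is not the coordinator."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._next = 0

    def entry_node(self):
        # Round-robin stands in for "based on load information".
        node = self._nodes[self._next % len(self._nodes)]
        self._next += 1
        return node


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
smart = PartitionAwareClient(nodes)
dumb = LoadBalancedClient(nodes)
print("smart client goes straight to", smart.coordinator_for("cart:42"))
print("dumb client enters via", dumb.entry_node())
```

<p>The trade-off in the quote shows up directly: the dumb client needs no knowledge of the ring but may pay for a forwarding hop, whilst the smart client must link in the hashing logic and keep its view of the ring current.</p>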
<p>Other interesting tidbits include:</p>
<ol>
<li>A mention at the Google Scalability Conference of a lightweight rendering engine that might invoke upwards of 150 requests per page.  Given some of the latencies discussed in the Dynamo paper, I am wondering if this custom framework might have some support for making collections of requests in parallel.</li>
<li>Common service types are stateless aggregator services that can perform a lot of caching (wondering how much the use of HTTP helps here) or stateful services.</li>
<li>A statement from a past interview with Vogels:
<p>
<em>&#8220;The first category is the services that make up the Amazon platform.  There we use interface specifications such as WSDL but we use optimized transport and marshalling technology to ensure efficient use of CPU and network resources.&#8221;</em></p>
<p>See the mention of the custom framework again but also a possible hint that they make use of a variety of interface specifications (perhaps including something homebrew).</li>
</ol>
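<p>On the first point above, a quick sketch of why parallel invocation matters when a page fans out to 150 requests.  This is speculation about the shape of the problem, not a description of Amazon&#8217;s framework: <code>call_service</code> is a stand-in that merely sleeps for 50ms, and the service names and pool size are invented.</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_service(name):
    """Stand-in for one remote call; the sleep simulates network latency."""
    time.sleep(0.05)
    return name, "ok"


def render_page(service_names, max_workers=32):
    """Issue all calls concurrently and collect results in request order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_service, service_names))


names = ["service-%d" % i for i in range(150)]
started = time.perf_counter()
results = render_page(names)
elapsed = time.perf_counter() - started
print(len(results), "responses in %.2fs" % elapsed)
```

<p>Made one after another, 150 calls at 50ms each would cost 7.5 seconds; issued concurrently, the page waits only for a handful of round trips.</p>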
<p>Food for thought?</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F10%2F11%2Famazons-custom-service-invocation-infrastructure%2F&amp;seed_title=Amazon%26%23039%3Bs+Custom+Service+Invocation+Infrastructure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Abstracting the Network Still Harmful</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F10%2F10%2Fabstracting-the-network-still-harmful%2F&amp;seed_title=Abstracting+the+Network+Still+Harmful</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F10%2F10%2Fabstracting-the-network-still-harmful%2F&amp;seed_title=Abstracting+the+Network+Still+Harmful#comments</comments>
		<pubDate>Wed, 10 Oct 2007 19:32:43 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Distributed Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2007/10/10/abstracting-the-network-still-harmful/</guid>
		<description><![CDATA[It&#8217;s well known that abstracting away network failure in inter-process communication is a bad thing, but there are other similarly harmful abstractions one might adopt when handling networks such as assuming uniformity. In recent times there&#8217;s been a resurgence of interest in using messaging between processes as a mechanism for taming concurrency rather than the [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s <a href="http://research.sun.com/techrep/1994/abstract-29.html">well known</a> that abstracting away network failure in inter-process communication is a bad thing, but there are other similarly harmful abstractions one might adopt when handling networks, such as assuming uniformity.</p>
<p>In recent times there&#8217;s been a resurgence of interest in using messaging between processes as a mechanism for taming concurrency rather than the (possibly) more conventional approach of using threads and locking.  This model is very appealing in its simplicity and some variations even allow for process failure (though I think there&#8217;s still some interesting discussion to be had around being certain that a process has failed rather than become partitioned away by network failure &#8211; split brain scenarios etc).</p>
<p>Some are wondering if this messaging approach could be extended beyond concurrent programming across multiple cores in a single box to deal with concurrent programming across networked machines.  I think there&#8217;s maybe a small fly in the ointment &#8211; latency.  If all processes are communicating via messaging inside of a single <a href="http://en.wikipedia.org/wiki/Symmetric_multiprocessing">SMP</a> box we will likely have at least approximately uniform latency between processes which is reasonably easy to manage.  The same cannot be said of messaging across processes in a <a href="http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access">NUMA</a> system or on a network.  Things get still more tricky if one has processes running on a mix of SMP and NUMA machines all living on a network and messaging each other.</p>
<p>Managing such a mix is difficult &#8211; one must consider carefully where to deploy things and the nature of the messages one sends (what you&#8217;d consider moving around an SMP system&#8217;s bus is probably not the same sort of payload you&#8217;d want to place on a network).  When a process fails, one cannot necessarily start up a replacement just anywhere; rather, it must be placed carefully and appropriately.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F10%2F10%2Fabstracting-the-network-still-harmful%2F&amp;seed_title=Abstracting+the+Network+Still+Harmful/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
