<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Pragmatic Dictator &#187; Architecture</title>
	<atom:link href="http://www.dancres.org/blitzblog/tag/architecture/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dancres.org/blitzblog</link>
	<description></description>
	<lastBuildDate>Sat, 31 Dec 2011 19:08:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Foundation</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2011%2F02%2F15%2Ffoundation%2F&#038;seed_title=Foundation</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2011%2F02%2F15%2Ffoundation%2F&#038;seed_title=Foundation#comments</comments>
		<pubDate>Tue, 15 Feb 2011 12:26:41 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[design]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=365</guid>
		<description><![CDATA[There are some design basics that development teams routinely fail to account for: Roles Responsibilities Coupling Role The basic justification for the existence of some api, interface or class. A summary of what it&#8217;s for. Just as importantly, the role defines what a particular entity is not for. Responsibility The things that some entity can [...]]]></description>
			<content:encoded><![CDATA[<p>There are some design basics that development teams routinely fail to account for:</p>
<ol>
<li>Roles</li>
<li>Responsibilities</li>
<li>Coupling</li>
</ol>
<h2>Role</h2>
<p>The basic justification for the existence of some API, interface or class. A summary of what it&#8217;s for. Just as importantly, the role defines what a particular entity is <em>not</em> for.</p>
<h2>Responsibility</h2>
<p>The things that some entity can do/knows in support of a role.</p>
<h2>Coupling</h2>
<p>An expression of the dependencies between roles. This property tells us a lot about the state of our design.</p>
<p>Two things that are heavily dependent upon each other might well be serving individual parts of a single role and thus should be consolidated. If everything ends up in a single role, it can suggest that the current approach to classifying behaviours is missing some factors.</p>
<p>Coupling can be temporal such that, for example, one entity cannot dispatch its responsibilities without the presence of another at the same time. This might indicate the need for some work on handling availability issues in a distributed system.</p>
<p>Limited coupling is a sign of cohesion, clarity in roles and responsibilities which can be indicative of a clean, maintainable design.</p>
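<p>As a rough illustration of reading coupling from dependencies, the sketch below takes a hand-written map of roles to the roles they depend on. It reports pairs that depend on each other (candidates for consolidation into a single role) and how many roles depend on each one. The role names and the <code>DEPENDENCIES</code> map are hypothetical, invented purely for the example.</p>

```python
from itertools import combinations

# Hypothetical dependency map: each role lists the roles it depends on.
DEPENDENCIES = {
    "OrderEntry": {"Pricing", "CustomerStore"},
    "Pricing": {"OrderEntry", "MarketFeed"},
    "CustomerStore": set(),
    "MarketFeed": set(),
}


def mutual_pairs(deps):
    """Return sorted pairs of roles that depend on each other."""
    return sorted(
        (a, b)
        for a, b in combinations(sorted(deps), 2)
        if b in deps[a] and a in deps[b]
    )


def fan_in(deps):
    """Count how many roles depend on each role."""
    counts = {role: 0 for role in deps}
    for targets in deps.values():
        for target in targets:
            counts[target] += 1
    return counts


if __name__ == "__main__":
    print(mutual_pairs(DEPENDENCIES))
    print(fan_in(DEPENDENCIES))
```

<p>A mutually dependent pair is only a prompt for a closer look, not a verdict: the two roles may genuinely belong together, or the dependency may be accidental.</p>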
<h2>Platform Neutral</h2>
<p>These basics apply regardless of the platform one chooses to develop upon. Roles, responsibilities and coupling apply just as well to service architectures, databases (tables and associated triggers and packages) and applications in Java, Scala, Clojure, C# or any other programming environment.</p>
<h2>Warning Signs</h2>
<p>It is very common for individual developers or development teams to allocate additional functions to existing elements of a design unthinkingly, thus eroding its quality. This manifests in many ways including:</p>
<ol>
<li>Some element of the system becomes the source of all information in respect of e.g. configuration or the entirety of customer data.</li>
<li>A single cache contains all data regardless of its nature (e.g. customer, account details, market price).</li>
<li>Some element of the system must always be running otherwise nothing else works.</li>
<li>Some element of the system has functions that span many different bits of data (e.g. customer, account, market price). </li>
</ol>
<h2>Rule of Thumb</h2>
<p>Any entity within a system should do only one thing and it should do it well (often credited as <a href="http://en.wikipedia.org/wiki/Unix_philosophy">Unix Philosophy</a>). This applies to everything from applications and products to services and individual classes.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2011%2F02%2F15%2Ffoundation%2F&#038;seed_title=Foundation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unaware</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2011%2F01%2F03%2Funaware%2F&#038;seed_title=Unaware</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2011%2F01%2F03%2Funaware%2F&#038;seed_title=Unaware#comments</comments>
		<pubDate>Mon, 03 Jan 2011 14:44:44 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=358</guid>
		<description><![CDATA[Design is not rules, it&#8217;s not patterns, it&#8217;s not technological choices or indeed code. Design is tradeoffs, driven by data where possible and gut instinct. It&#8217;s about identifying the core challenges of a problem domain (which might ultimately be one or many systems) and addressing them through creation of appropriate abstractions. These abstractions embody: Functions [...]]]></description>
			<content:encoded><![CDATA[<p>Design is not rules, it&#8217;s not patterns, it&#8217;s not technological choices or indeed code. Design is tradeoffs, driven by data where possible and gut instinct. It&#8217;s about identifying the core challenges of a problem domain (which might ultimately be one or many systems) and addressing them through creation of appropriate abstractions. These abstractions embody:</p>
<ul>
<li>Functions to be performed</li>
<li>Data to be discovered, consumed and produced</li>
<li>Non-functionals (e.g. SLAs)</li>
</ul>
<p>The abstractions are then rendered into the real-world using appropriate hardware, technologies, patterns and languages. A good design:</p>
<ul>
<li>Exhibits few exception cases</li>
<li>Has logic and/or data located neatly and predictably</li>
<li>Applies a small set of core constructs repeatedly</li>
<li>Addresses operational needs</li>
<li>Considers cost versus value delivered</li>
<li>Is as simple as possible</li>
<li>Has the minimum of implementation assumption</li>
</ul>
<p>There are several key failing points in the design process:</p>
<ul>
<li>No adjustment in the face of implementation feedback &#8211; No design is complete or perfect. There will always be missed details leading to brittle code, complex corner cases or convoluted solutions. It is critical that we monitor our progress and adapt the design accordingly.</li>
<li>No up front design &#8211; Design is the skeleton upon which we hang technology choices and code structure. In its absence we rapidly descend into a world of difficult-to-navigate code and costly constraints set by uninformed product choices.</li>
<li>No care in following the design &#8211; A key element of design is to place the right things in the right places. Failing to do this at code time increases coupling, makes maintenance difficult and can impact both performance and scalability. Similar effects occur as the result of poor technology selection.</li>
</ul>
<p>Design and implementation go hand in hand yet many of us lack awareness of where the boundary between these two elements lies. We don&#8217;t understand how these elements interact with each other or appreciate the impact of decisions we make in respect of one element on the other.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2011%2F01%2F03%2Funaware%2F&#038;seed_title=Unaware/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performing</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2010%2F06%2F04%2Fperforming%2F&#038;seed_title=Performing</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2010%2F06%2F04%2Fperforming%2F&#038;seed_title=Performing#comments</comments>
		<pubDate>Fri, 04 Jun 2010 00:14:15 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Systems]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=337</guid>
		<description><![CDATA[My current company has for obvious business reasons got a serious interest in delivering a quality website experience during the World Cup and thus I&#8217;ve been spending a lot of time focused on our own performance and capacity management of late. P&#38;C is one of those 80/20 tradeoffs. There&#8217;s always more one can do or [...]]]></description>
			<content:encoded><![CDATA[<p>My <a href="http://www.sportingindex.com/">current company</a> has for obvious business reasons got a serious interest in delivering a quality website experience during the World Cup and thus I&#8217;ve been spending a lot of time focused on our own performance and capacity management of late.</p>
<p>P&amp;C is one of those 80/20 tradeoffs. There&#8217;s always more one can do or measure or test, but equally, getting the basics in place will deliver substantial benefit. I&#8217;d go further and argue that without a solid grasp of the basics, one cannot easily determine what else beyond that might be required. Here then are the basics that I&#8217;ve found myself repeating over and over:</p>
<ul>
<li>Have an enquiring mind &#8211; anomalies are not to be ignored or dismissed on the basis of pure speculation. Determining root cause is essential to prevent surprises in production. Some recent examples:
<ol>
<li>In one test we noticed that every so often we&#8217;d get a substantial blip in disk I/O on servers that should be processing entirely out of memory. Along with that blip there&#8217;d be a corresponding reduction in throughput. We could have ignored it (after all, things sorted themselves out relatively quickly) but we chose to investigate. All these servers were periodically running a cleanup job the developers were unaware of and had not factored into their capacity calculations. The implications for production would have been a regularly overloaded, badly performing website. We&#8217;ve since tuned the jobs, adjusted their schedules and increased our capacity to ensure we can always spread the load around enough to accommodate them.</li>
<li>An examination of the distribution of load on the boxes behind our load-balancers revealed a higher than expected amount of variance in CPU and connections. A review of the application revealed that any particular user&#8217;s traffic is sticky to one box, which is unfortunate as the application is stateless, so it is time for a code change. We also spent time looking at the monitoring infrastructure and discovered that in certain cases we&#8217;d get false reports of 100% CPU utilisation; that one will be fixed with an OS patch.</li>
</ol>
</li>
<p></p>
<li>Gather the right data &#8211; there&#8217;s no value in allowing oneself to be limited by what is easily available via some set of tools people are comfortable with. One tool we were using had an unreasonably low ceiling on the number and rate of samples it could handle such that any graphs it produced showed hardly anything of the true profile of e.g. CPU utilisation, memory consumption or I/O. Forming any opinion about system behaviour in respect of load was going to be an exercise in speculation. We junked the tool and are looking for a replacement. In the meantime we&#8217;ve fallen back to making use of low level performance counters which we can sample local to the machine and whack onto disk for later analysis via scripts, opensource tools etc.</li>
<p></p>
<li>Design tests that support reasoning &#8211; One should indeed try and replicate production load behaviours to judge overall system behaviour. The challenge of such testing is that it can be difficult to relate performance data back to exactly what was going on during some period of a test and make a diagnosis or be confident of an improvement. There are a number of things we can do to improve the situation:
<ol>
<li>Ensure tests are deterministic such that any given run can be compared against other runs. This isn&#8217;t as simple as it looks when e.g. you wish to gradually increase load at a fixed rate that is being produced by more than one box.</li>
<li>Have tests produce sufficient logging that one can easily identify what was going on at particular points in the sampled data. Logging of course can actually affect test behaviour and that isn&#8217;t always desirable.</li>
<li>Build additional tests that target particular user journeys through the system. Doing this for all possible journeys can be costly so it makes sense to focus on testing those which are most popular with users. These kinds of tests restrict the reasoning tree, making analysis, diagnosis and solution identification much easier.</li>
</ol>
</li>
<p></p>
<li>Measure what customers care about &#8211; they don&#8217;t care about CPUs, I/O or memory, they worry about things like response times. It is important to focus on maintaining a quality user experience not endlessly improving system efficiency. Considering user factors such as response times stops us expending huge effort on CPU utilisation when we should be focusing on say, network I/O, browser performance or reducing the amount of data we push to the browser before a page can render.</li>
<p></p>
<li>Beware of averages &#8211; it is very tempting to combine datasets via the use of averaging. Unfortunately, such a practice can easily hide spikes that might be indicative of a problem. On more than one occasion an engineer has presented a graph that tracks the average CPU and a table that summarises min, avg and max. They have then pronounced load testing a success, yet have no explanation for why the average is never more than 50% but the max is 100%, or whether this is good or bad.</li>
<p><br/></p>
<li>More than load &#8211; excessive focus on measuring the effect of a particular load can make us blind to another important metric, resource cost per unit of work. This is the collection of tests and analysis that helps us understand what to tune, and by how much, to keep our appetite for boxes and bandwidth reasonable. One simple thing teams can do per sprint (assuming you&#8217;re agile, why wouldn&#8217;t you be?) is point a profiler at each component and look for the low hanging fruit that is poor algorithm selection or inefficient code (e.g. repeated scanning of lists where a hashmap would be better or repeatedly computing something that could be cached).</li>
</ul>
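<p>The point about averages can be made concrete with a small sketch. It summarises a made-up series of CPU samples by mean, nearest-rank 95th percentile, maximum and the fraction of samples at or above a saturation threshold. The sample values, the 95% threshold and the function name <code>summarise</code> are assumptions chosen for illustration, not data from any real test.</p>

```python
import statistics


def summarise(samples, saturation=95):
    """Return mean, nearest-rank p95, max and saturated fraction of CPU samples."""
    ordered = sorted(samples)
    # Nearest-rank percentile: ceil(0.95 * n), written as integer ceiling division.
    rank = max(1, -(-95 * len(ordered) // 100))
    saturated = sum(1 for s in ordered if s >= saturation)
    return {
        "mean": statistics.mean(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
        "saturated_fraction": saturated / len(ordered),
    }


# Hypothetical CPU utilisation samples (%): mostly quiet with two saturated spikes.
cpu = [40] * 18 + [100] * 2
summary = summarise(cpu)

if __name__ == "__main__":
    print(summary)
```

<p>Here the mean sits below 50% while a tenth of the samples are saturated, which is exactly the sort of gap that needs an explanation before a test run is declared a success.</p>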
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2010%2F06%2F04%2Fperforming%2F&#038;seed_title=Performing/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Sustainable Architecture</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2009%2F10%2F06%2Fsustainable-architecture%2F&#038;seed_title=Sustainable+Architecture</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2009%2F10%2F06%2Fsustainable-architecture%2F&#038;seed_title=Sustainable+Architecture#comments</comments>
		<pubDate>Tue, 06 Oct 2009 21:50:45 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Engineering]]></category>
		<category><![CDATA[evolution]]></category>
		<category><![CDATA[refactoring]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=302</guid>
		<description><![CDATA[I&#8217;ve spent a significant amount of my career helping to unpick messed up architectures and wondering how they ever come to be. Certainly it can&#8217;t be because they&#8217;re appealing to work with: Making changes becomes increasingly expensive &#8211; make one small change and it spiders into changes across many other areas and gets into corners [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve spent a significant amount of my career helping to unpick messed up architectures and wondering how they ever come to be. Certainly it can&#8217;t be because they&#8217;re appealing to work with:</p>
<ol>
<li>Making changes becomes increasingly expensive &#8211; make one small change and it spiders into changes across many other areas and gets into corners one least expects.</li>
<li>Replacing components of the system because for example they&#8217;re no longer supported, don&#8217;t perform adequately or can&#8217;t scale requires significant reverse engineering to understand dependencies etc.</li>
<li>It only takes one piece of the system failing to bring everything to its knees.</li>
<li>Isolating the root cause of a bug takes significant amounts of effort because it&#8217;s difficult to quickly eliminate large chunks of the system.</li>
</ol>
<p>More often than not it&#8217;s believed (I&#8217;m guilty) these systems come into being through incompetence or indiscipline on the part of the developers involved, but I think there&#8217;s maybe another contributory factor: much of the advice on design and architecture is couched in terms of design from scratch, and there&#8217;s less guidance in regard to working with an existing architecture.</p>
<p>The result is that when developers start out building a system they have a lot of advice they can apply but as it grows, it becomes more difficult to apply the advice and discern what changes are appropriate, so the architecture unravels. Is there a way to avoid this unravelling? I believe there is and it&#8217;s derived from the process for fixing up an errant architecture.</p>
<p>These architectures have smells equivalent to the code-level examples Fowler discusses in his <a href="http://books.google.co.uk/books?id=1MsETFPD3I0C&#038;lpg=PP1&#038;dq=martin%20fowler%20refactoring&#038;pg=PP1#v=onepage&#038;q=&#038;f=false">book</a> on refactoring such as:</p>
<ol>
<li>Some area of the system is too tightly coupled, making changes harder.</li>
<li>Some part of the system contains an assumption that there is only one resource of some type (e.g. a database) limiting scaling.</li>
<li>Many components of the system are reliant upon one key component being constantly available such that if it fails, nothing works.</li>
</ol>
<p>Having identified these smells we need to perform appropriate cleanup which, for the list of examples above might include:</p>
<ol>
<li>Placing additional APIs (interfaces) within the tightly coupled area of the system to reduce shared implementation knowledge and create well-bounded islands of data.</li>
<li>Introducing a resource discovery pattern to abstract away the assumption of a single resource at a single address.</li>
<li>Introducing concepts like acceptable staleness of data which allows caching for a period of time, eventual consistency which supports making updates and resolving the outcome at a later date or asynchronous operations.</li>
</ol>
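<p>To illustrate the third cleanup, here is a minimal sketch of acceptable staleness: a cache that serves a value for up to <code>max_age</code> seconds before going back to the key component. The class name <code>StalenessCache</code>, the <code>fetch</code> callable and the injectable clock are all hypothetical, and a real system would also need eviction and concurrency control.</p>

```python
import time


class StalenessCache:
    """Cache that serves values up to max_age seconds old before refetching."""

    def __init__(self, fetch, max_age, clock=time.monotonic):
        self._fetch = fetch        # callable that queries the key component
        self._max_age = max_age    # acceptable staleness in seconds
        self._clock = clock        # injectable so the sketch can be tested
        self._entries = {}         # key -> (value, time it was fetched)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            value, fetched_at = entry
            if now - fetched_at <= self._max_age:
                # Still within the agreed staleness window: no remote call.
                return value
        value = self._fetch(key)
        self._entries[key] = (value, now)
        return value
```

<p>While an entry is within its staleness window, callers do not touch the key component at all, so a brief outage of that component need not stop everything else.</p>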
<p>It&#8217;s important to realise that in any substantial system we will be unable to eradicate a smell completely in a single update because it&#8217;s too risky. There will be many places in the code we might forget to patch up, a high likelihood we&#8217;ll miss something in testing, low probability we&#8217;ll get API designs exactly right etc. We must gradually introduce modifications over a period of time (months or even years) rather than perform significant rewrites. This isn&#8217;t as bad as it seems because no architecture is perfect for very long once it&#8217;s exposed to users. It also suggests that perhaps we need to focus on documenting techniques for gradual evolution of an architecture.</p>
<p>If we were to get better at spotting these architectural smells early (slight odour as opposed to horrific stench) and working to address them sooner than later it might be possible to avoid having a system&#8217;s architecture unravel, leading to something more sustainable.</p>
<p><strong>Updated:</strong> to include additional commentary on APIs and perfection.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2009%2F10%2F06%2Fsustainable-architecture%2F&#038;seed_title=Sustainable+Architecture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Clouded Vision</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2009%2F01%2F25%2Fcutting-corners%2F&#038;seed_title=Clouded+Vision</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2009%2F01%2F25%2Fcutting-corners%2F&#038;seed_title=Clouded+Vision#comments</comments>
		<pubDate>Sun, 25 Jan 2009 18:01:10 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[cloud computing]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=241</guid>
		<description><![CDATA[Cloud computing platforms offer many benefits including: Cheaper operational costs. Dynamic scaling in response to load spikes. Roll-on, roll-off deployments for e.g. newspaper archive processing. These platforms exist as the result of the investment of companies such as Amazon, Google and Microsoft in developing cost-effective infrastructure with system to administrator ratios of 2500:1 (whilst the [...]]]></description>
			<content:encoded><![CDATA[<p>Cloud computing platforms offer many benefits including:</p>
<ol>
<li>Cheaper operational costs.</li>
<li>Dynamic scaling in response to load spikes.</li>
<li>Roll-on, roll-off deployments for e.g. <a href="http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/">newspaper archive processing</a>.</li>
</ol>
<p>These platforms exist as the result of the investment of companies such as Amazon, Google and Microsoft in developing cost-effective infrastructure with <a href="http://mvdirona.com/jrh/TalksAndPapers/JamesRH_Expedia.pdf">system to administrator ratios of 2500:1</a> (whilst the average enterprise manages around 150:1 and inefficient properties manage maybe 10:1).</p>
<p>Key to allowing these infrastructures to be efficient and in turn deliver the benefits above is having applications architected such that:</p>
<ol>
<li>They don&#8217;t require masses of administrator intervention when they go wrong.</li>
<li>They can be installed with minimal administrator effort because there&#8217;s no need to worry about tweaking URLs, IP addresses, database connections etc.</li>
<li>They readily support horizontal scaling e.g. because they contain an abstraction that can support sharding of data-storage.</li>
</ol>
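<p>One possible shape for the sharding abstraction mentioned in the last point is sketched below: callers ask a router which shard owns a key instead of holding a connection to a single database. The class name <code>ShardRouter</code> and the shard labels are hypothetical, and simple modulo hashing is used only to keep the example short.</p>

```python
import hashlib


class ShardRouter:
    """Maps a key to one of a fixed list of shards via a stable hash."""

    def __init__(self, shards):
        if not shards:
            raise ValueError("at least one shard is required")
        self._shards = list(shards)

    def shard_for(self, key):
        # A cryptographic digest gives the same answer on every machine and run,
        # unlike the built-in hash() which is randomised per process for strings.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]


if __name__ == "__main__":
    router = ShardRouter(["db-0", "db-1", "db-2"])
    print(router.shard_for("customer-42"))
```

<p>With modulo hashing, changing the number of shards remaps most keys, so a system that expects to grow would likely want consistent hashing or a lookup table behind the same interface.</p>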
<p>In essence an application must be <a href="http://mvdirona.com/jrh/talksAndPapers/JamesRH_Lisa.pdf">designed</a> for zero administrator intervention and <a href="http://research.microsoft.com/apps/pubs/default.aspx?id=64604">fully automated deployment</a>.  It should also have a <a href="http://community.citrix.com/blogs/citrite/chrisfl/2008/10/13/Cloud%20Economics%20101%20Part%202%20-%20Premise%20Plus%20Cloud">variable workload component that magnifies the savings</a> of the architectural properties above.</p>
<p>Strange then that many <a href="http://www.scripting.com/stories/2008/10/27/microsoftsCloudStrategy.html">a developer expects</a> to move their existing application, full of enterprise DNA (static configuration, vertical clusters, no horizontal scaling, high administration costs) to such an offering with minimal change. They even complain when it proves difficult because all those &#8220;enterprise features&#8221; aren&#8217;t present. Why does this happen?</p>
<p>I believe it&#8217;s because these developers have fundamentally misunderstood how cloud computing delivers its benefits. They see the cheap prices but don&#8217;t stop to consider where the cost saving comes from. Some of it is achieved by cloud platform vendors getting large discounts on huge hardware orders but a significant proportion comes from the fact that they don&#8217;t need to provide (via human resources or APIs) the sysadmin functions required for conventional hosting solutions.</p>
<p>Quite simply, typical applications, their architectures and associated administration practices are not set up for cloud platforms. Some of them may be able to run on these platforms with sufficient hackery, brute force and associated cost. However, if the motivation for a move to the cloud is merely to reduce kit costs, one might well be better off looking for a cheaper conventional hosting solution.</p>
<p>In summary, making the best of the cloud requires that we take an architectural view, <a href="http://www.kavistechnology.com/blog/?p=440">something that we&#8217;ve proven remarkably bad at</a> over and over.  Simply deploying an application unchanged to the cloud is unlikely to deliver much benefit.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2009%2F01%2F25%2Fcutting-corners%2F&#038;seed_title=Clouded+Vision/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Remodelling</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F12%2F02%2Fremodelling%2F&#038;seed_title=Remodelling</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F12%2F02%2Fremodelling%2F&#038;seed_title=Remodelling#comments</comments>
		<pubDate>Tue, 02 Dec 2008 19:24:37 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Engineering]]></category>
		<category><![CDATA[software development]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=238</guid>
		<description><![CDATA[The codebase of a subsystem or maybe the whole system has turned into a big ball of mud. It&#8217;s claimed too brittle, too complex and too costly to continue developing. It&#8217;s at this point that a grand rewrite is proposed accompanied by statements of how things will be different: We&#8217;ll eliminate static wiring using Spring [...]]]></description>
			<content:encoded><![CDATA[<p>The codebase of a subsystem or maybe the whole system has turned into a <a href="http://www.laputan.org/mud/mud.html">big ball of mud</a>.  It&#8217;s claimed too brittle, too complex and too costly to continue developing.  It&#8217;s at this point that a grand rewrite is proposed accompanied by statements of how things will be different:</p>
<ul>
<li>We&#8217;ll eliminate static wiring using Spring</li>
<li>We&#8217;ll model everything as a service</li>
<li>We&#8217;ll adopt test-driven development and make use of jMock</li>
<li>We&#8217;ll build everything using a RESTful approach</li>
<li>We&#8217;ll avoid using RPC in favour of messaging</li>
<li>&#8230;&#8230;.</li>
</ul>
<p>Things will be so much better in this brave new world but&#8230;&#8230;they won&#8217;t.  The reason the codebase has got into a mess is that we failed to execute on important principles such as:</p>
<ol>
<li>Take account of coupling and cohesion.</li>
<li>Be clear about people&#8217;s roles and responsibilities to avoid unqualified or inappropriate decision making.</li>
<li>Keep the roles and responsibilities of design elements clear and simple.</li>
<li>Maintain modular, well-isolated code and <a href="http://en.wikipedia.org/wiki/The_Mythical_Man-Month#Conceptual_Integrity">conceptual integrity</a>.</li>
<li>Avoid shared data-schemas or integration via the database.</li>
<li>Make the software testable and maintain the tests.</li>
<li>Select technology based on appropriate design work.</li>
<li><a href="http://www.artima.com/intv/fixitP.html">No broken windows</a>.</li>
<li>Track and maintain appropriate metrics.</li>
<li>Review projects to identify and disseminate useful lessons to developers, architects and customers.</li>
<li>Account for the operational aspects of our software in requirements and design.</li>
<li>Review to ensure code aligns with appropriate design principles.</li>
<li>Surface, balance and mitigate risks.</li>
</ol>
<p>It&#8217;s these principles and others that enable superior engineering which in turn delivers a good-quality, maintainable codebase. Any rewrite will end up a ball of mud just like its predecessor unless the style of engineering is adapted to incorporate principles such as these.</p>
<p>Some propose that frameworks can prevent mistakes, ensure a quality design and deliver testable code. I think experience suggests otherwise as we routinely (by accident or design) bend frameworks to fit some problem they weren&#8217;t really designed for leading to ugly, broken, poorly designed, brittle code. What would stop us doing it with new frameworks delivered as part of a grand re-write?</p>
<p>Should we successfully revise our engineering practices would we then have sufficient leverage to restructure our ball of mud into something nicer to work with?  Maybe, maybe not but we might be better equipped to answer the question: re-write or re-factor?</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F12%2F02%2Fremodelling%2F&#038;seed_title=Remodelling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sooner Than Later</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F10%2F01%2Fsooner-than-later%2F&#038;seed_title=Sooner+Than+Later</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F10%2F01%2Fsooner-than-later%2F&#038;seed_title=Sooner+Than+Later#comments</comments>
		<pubDate>Wed, 01 Oct 2008 16:29:00 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[operations]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=235</guid>
		<description><![CDATA[When building systems, there are some operational elements that it pays to get to grips with sooner than later: Deployment Packaging Configuration Monitoring Logging Failing to address these elements is detrimental to core aspects of what we need to do from day one: Get changes out &#8211; ship a new feature, deploy an urgent bug-fix [...]]]></description>
			<content:encoded><![CDATA[<p>When building systems, there are some operational elements that it pays to get to grips with sooner than later:</p>
<ul>
<li>Deployment</li>
<li>Packaging</li>
<li>Configuration</li>
<li>Monitoring</li>
<li>Logging</li>
</ul>
<p>Failing to address these elements is detrimental to core aspects of what we need to do from day one:</p>
<ul>
<li>Get changes out &#8211; ship a new feature, deploy an urgent bug-fix or make a tweak to handle a load-spike.</li>
<li>Determine if things have started up and configured properly.</li>
<li>Be sure things are still running right.</li>
<li>Identify and react to problems quickly.</li>
<li>Obtain data important to future architectural decisions.</li>
</ul>
<p>Even in light of the above, many of us are still tempted into leaving this until later, by which time:</p>
<ol>
<li>Our software will have grown substantially, making it difficult and expensive to adapt when we do decide to address the operational issues.</li>
<li>We&#8217;ll be losing inordinate amounts of time on manual trouble-shooting and dealing with the consequences of human error (a <a href="http://research.microsoft.com/~gray/papers/TandemTR85.7_WhyDoComputersStop.doc">key contributor to downtime</a> and other problems).</li>
<li>Operations will likely have become tightly bound to whatever our software currently looks like such that when we start addressing the issues, we&#8217;ll break all their assumptions (and the tooling they built around them).</li>
</ol>
<h2>Some Specifics</h2>
<p>Having configuration buried inside your binaries where it cannot be easily managed is an inconvenience.  We don&#8217;t really want to have to do a whole new build just to change configuration settings (though one might want to do a re-deploy of the whole lot together to allow for audit-trails and have half a chance of having all boxes configured similarly at the same time).</p>
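<p>As a minimal sketch of keeping configuration outside the binaries, the Python below layers built-in defaults, an external JSON file and environment overrides. The file format, the <code>APP_</code> prefix and the setting names are assumptions made for illustration, not anything a particular system prescribes.</p>

```python
import json
import os
from pathlib import Path

# Hypothetical settings; a real service would define its own.
DEFAULTS = {"listen_port": 8080, "log_level": "INFO"}


def load_config(path, environ=None):
    """Merge defaults, an external JSON file and environment overrides."""
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)

    # Settings live in a file beside the binaries, not inside them.
    config_file = Path(path)
    if config_file.exists():
        config.update(json.loads(config_file.read_text()))

    # Assumed convention: APP_LISTEN_PORT overrides "listen_port".
    for key in config:
        override = environ.get("APP_" + key.upper())
        if override is not None:
            # Coerce to the type of the default (str for unknown keys).
            config[key] = type(DEFAULTS.get(key, ""))(override)
    return config
```

<p>The same build can then be promoted between environments unchanged, with only the file or the environment differing.</p>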
<p>When it comes to deployment and packaging it pays to adopt something akin to the <a href="http://en.wikipedia.org/wiki/XCOPY_deployment">xcopy install approach</a>. Everything required is contained inside of the distribution with minimal external dependencies (necessary external dependencies should ideally be satisfied dynamically at runtime rather than with static configuration).  Such an approach for desktop software would be unattractive but with servers and an imperative to automate installation it&#8217;s very attractive.</p>
<p>What about all those existing packaging systems such as rpm? Many of these mechanisms have a design assumption around a single version of something on a machine. This can inhibit fast rollback because, rather than stopping one process and starting another, one has to (in simple terms):</p>
<ol>
<li>Stop a process.</li>
<li>Uninstall its binaries and dependencies.</li>
<li>Install the binaries and dependencies for the old process.</li>
<li>Start the old process up.</li>
</ol>
<p>In some cases it will also be necessary to perform further configuration (did we back it up?). Suddenly it&#8217;s looking like a lot of work to buy ourselves appropriate risk-mitigation for broken upgrades.</p>
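<p>One way to avoid the uninstall and reinstall dance is to keep several versions installed side by side and switch between them by repointing a link. The sketch below assumes a POSIX filesystem and a hypothetical layout of one directory per version under a releases directory, with a <code>current</code> symlink naming the live one; rolling back is then stop, repoint, start.</p>

```python
import os
from pathlib import Path


def activate(releases_dir, version):
    """Atomically point releases_dir/current at releases_dir/<version>."""
    releases = Path(releases_dir)
    target = releases / version
    if not target.is_dir():
        raise FileNotFoundError(f"release {version} is not installed")

    # Build the new link under a temporary name first.
    staging = releases / "current.tmp"
    if staging.is_symlink() or staging.exists():
        staging.unlink()
    # A relative link keeps the whole tree relocatable (xcopy style).
    os.symlink(version, staging)
    # Renaming over the old link is atomic on POSIX systems.
    os.replace(staging, releases / "current")


def active_version(releases_dir):
    """Return the version name the current link points at."""
    return os.readlink(Path(releases_dir) / "current")
```

<p>Because the old version is never uninstalled, going back is a matter of calling <code>activate</code> with the previous version name and restarting the process.</p>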
<p>Monitoring often requires an amount of configuration, which can make for a bootstrap problem: one needs monitoring to detect a configuration issue but the monitoring isn&#8217;t configured yet.  Thus it can be useful to have some very simple monitoring based on a primitive that can run without explicit configuration, such as multicast.</p>
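<p>A heartbeat over multicast might look something like the following sketch. The group address, port and JSON message shape are assumptions for illustration; the point is that neither sender nor listener needs to be told where the other lives.</p>

```python
import json
import socket
import time

# Hypothetical group and port, picked from the administratively scoped range.
GROUP, PORT = "239.255.42.99", 45454


def encode_heartbeat(service, status, now=None):
    """Serialise a heartbeat as a small UTF-8 JSON datagram."""
    stamp = time.time() if now is None else now
    message = {"service": service, "status": status, "ts": stamp}
    return json.dumps(message).encode("utf-8")


def decode_heartbeat(datagram):
    """Parse a datagram produced by encode_heartbeat."""
    return json.loads(datagram.decode("utf-8"))


def send_heartbeat(service, status, ttl=1):
    """Send one heartbeat to the group; ttl=1 keeps it on the local segment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(encode_heartbeat(service, status), (GROUP, PORT))


def open_listener():
    """Join the group on the default interface and return the socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # ip_mreq: group address followed by the local interface (any).
    membership = socket.inet_aton(GROUP) + socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
    return sock
```

<p>A process would call <code>send_heartbeat("orders", "ok")</code> on a timer, and any box on the segment can read datagrams from <code>open_listener()</code> and pass them to <code>decode_heartbeat</code>; multicast must of course be permitted on the network in question.</p>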
<h2>Important Step</h2>
<p>These key operational elements should be accounted for early on in the design of a system and grown alongside other functional aspects.<sup>*</sup> There&#8217;s plenty of information on this topic publicly available including:</p>
<ul>
<li>Randy Shoup &#8211; <a href="http://www.infoq.com/presentations/shoup-ebay-architectural-principles">eBay Marketplace Architecture</a>.</li>
<li>Dan Pritchett &#8211; <a href="http://www.infoq.com/presentations/operational-manageability">Architecture Quality: Operational Manageability</a>.</li>
<li>Wayne Fenton &#8211; <a href="http://www.infoq.com/presentations/Operational-Scalability-Wayne-Fenton">Operational Stability in The Next Generation Web World</a> (a variation of the talk above).</li>
<li>Michael Isard &#8211; <a href="http://research.microsoft.com/users/misard/abstracts/osr2007.html">Autopilot: Automatic Data Centre Management</a>.</li>
<li>James Hamilton &#8211; <a href="http://mvdirona.com/jrh/talksAndPapers/JamesRH_Lisa.pdf">Designing and Deploying Internet Scale Services</a>.</li>
</ul>
<p>* Initially implementation can be simple scripts but at some point it becomes necessary to take a more serious approach in respect of tools and infrastructure development.  This means investing in properly skilled architects and engineers, performing appropriate testing etc.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F10%2F01%2Fsooner-than-later%2F&#038;seed_title=Sooner+Than+Later/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Time Marches On</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F09%2F24%2Ftime-marches-on%2F&#038;seed_title=Time+Marches+On</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F09%2F24%2Ftime-marches-on%2F&#038;seed_title=Time+Marches+On#comments</comments>
		<pubDate>Wed, 24 Sep 2008 14:49:41 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Distributed Systems]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[development]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=233</guid>
		<description><![CDATA[Those specifying requirements often express them without consideration for the passing of time, assuming that actions are instantaneous. A naive development team with limited experience in distributed systems will then make the classic mistake of attempting to implement those requirements to the letter. This can lead to a bunch of undesirable outcomes including: Brittleness in [...]]]></description>
			<content:encoded><![CDATA[<p>Those specifying requirements often express them without consideration for the passing of time, assuming that actions are instantaneous. A naive development team with limited experience in distributed systems will then make the classic mistake of attempting to implement those requirements to the letter.  This can lead to a bunch of undesirable outcomes including:</p>
<ul>
<li>Brittleness in the face of failure.</li>
<li>High cost solutions.</li>
<li>Poor scaling properties.</li>
<li>Disappointment as the expectations of the requirements source aren&#8217;t met.</li>
</ul>
<p>Consider a system where we have two (network) hops to an observer and one hop to the initiator of an action (assuming uniform network latency for each hop).  Potentially for every two actions there will be a single observation.  Thus each observation of the system is out of date by the time it reaches the observer.</p>
<p>Administrative actions can suffer similar problems, in that it could take several hops for the request to arrive at the system.  A user may be only one hop away and could be performing many operations in the time it takes for one of our actions to reach the system.  For example if we wish to block a user, whilst our request is in transit they might perform several operations.</p>
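<p>To put rough numbers on the blocking example, the sketch below estimates how many operations a user could land before a block takes effect. It assumes uniform per-hop latency (as above), a user who issues operations one after another, and function and parameter names that are purely illustrative.</p>

```python
def operations_before_block(admin_hops, user_hops, hop_latency_ms, think_time_ms=0):
    """Estimate user operations that arrive before a block request does.

    All times are whole milliseconds; the model ignores failures and retries.
    """
    # Time for the administrative request to reach the system.
    propagation_ms = admin_hops * hop_latency_ms
    # Each user operation costs a trip over the user's hops plus think time.
    per_operation_ms = user_hops * hop_latency_ms + think_time_ms
    if per_operation_ms <= 0:
        raise ValueError("each operation must take a positive amount of time")
    return propagation_ms // per_operation_ms
```

<p>With 50ms hops, an administrative request crossing four hops and a user one hop away, <code>operations_before_block(4, 1, 50)</code> gives 4. Network failures only lengthen the propagation time and so push the figure up.</p>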
<p>Things are made worse by network failures which can further delay or prevent execution of an action and slow down the rate of updates to an observer.</p>
<p>How then do we account for these troubles when specifying requirements? By qualifying them with appropriate SLAs.  In the example above, appropriate SLAs might include:</p>
<ul>
<li>Time for propagation of an administrative action.</li>
<li>Maximum acceptable time after the action is triggered for a user to be blocked.</li>
</ul>
<p>SLAs such as the above:</p>
<ol>
<li>Help us to identify appropriate solutions (e.g. do we need to pay for multiple independent routes between data-centres?).</li>
<li>Allow us to make appropriate use of asynchronous operations and eventual consistency.</li>
</ol>
<p>Since SLAs have significant impact on the way in which a requirement will be implemented, it is essential to perform appropriate expectation management, discussing and communicating the implications with the requirements source; they cannot be solely the domain of techies.  Remember also that in many situations customers <a href="http://blogs.msdn.com/pathelland/archive/2008/09/01/confidence-in-the-cloud.aspx">prefer availability over consistency</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F09%2F24%2Ftime-marches-on%2F&#038;seed_title=Time+Marches+On/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Vital Statistics</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F08%2F14%2Fvital-statistics%2F&#038;seed_title=Vital+Statistics</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F08%2F14%2Fvital-statistics%2F&#038;seed_title=Vital+Statistics#comments</comments>
		<pubDate>Thu, 14 Aug 2008 18:35:09 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Systems]]></category>
		<category><![CDATA[infrastructure]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=231</guid>
		<description><![CDATA[How big does a website have to get before custom infrastructure becomes necessary? When a website reaches this stage, what infrastructure gets built? Before trying to answer these questions we must have some means of measuring the size of a website. I&#8217;ve settled on the number of machines as a reasonable approximation because: As a [...]]]></description>
			<content:encoded><![CDATA[<p>How big does a website have to get before custom infrastructure becomes necessary?  When a website reaches this stage, what infrastructure gets built?  Before trying to answer these questions we must have some means of measuring the size of a website.  I&#8217;ve settled on the number of machines as a reasonable approximation because:</p>
<ul>
<li>As a codebase grows it must be split up along functional boundaries, and spread across multiple processes.  More code equals more processes and more machines to run them on.</li>
<li>More customers means more load and requires more machines to handle it.</li>
<li>More data means more storage and more processors to chew through it.</li>
</ul>
<p>Now let&#8217;s see how many machines some of the big players are running and what infrastructure they&#8217;re talking about:</p>
<p><em>TicketMaster</em> have at least <a href="http://code.google.com/p/spine-mgmt/">3000 machines and have built Spine</a> to help them manage configuration of their infrastructure.</p>
<p><em>eBay</em> have built a custom deployment tool (<a href="http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf">Roller</a>), logging infrastructure, configuration management for their software services, messaging software <a href="http://www.infoq.com/presentations/Operational-Scalability-Wayne-Fenton">and more</a>.  They&#8217;re running around <a href="http://www.ecommercetimes.com/rsstory/63922.html">15000 machines across four geographical locations</a>.</p>
<p><em>Microsoft</em> have built a custom deployment, configuration and monitoring infrastructure called <a href="http://research.microsoft.com/users/misard/abstracts/osr2007.html">Autopilot</a> focused on many thousands of machines.  <a href="http://perspectives.mvdirona.com/2008/04/02/FirstContainerizedDataCenterAnnouncement.aspx">In fact we&#8217;re talking hundreds of thousands</a>.</p>
<p><em>Google</em> are <a href="http://news.cnet.com/8301-10784_3-9955184-7.html">dealing</a> <a href="http://perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx">in</a> a million or more machines and expending effort on software <a href="http://www.pmg.csail.mit.edu/iris/ajmani03scheduling-abstract.html">to</a> <a href="http://www.pmg.csail.mit.edu/pubs/ajmani06modular-abstract.html">handle</a> staged, automatic upgrades.  Of course they&#8217;ve already built <a href="http://labs.google.com/papers/gfs.html">GFS</a>, <a href="http://labs.google.com/papers/chubby.html">Chubby</a> etc.</p>
<p><em>Twitter</em> have moved beyond the half-dozen or so machines they used to have to <a href="http://www.akitaonrails.com/2008/6/17/chatting-with-blaine-cook-twitter">&#8220;a lot of servers&#8221;</a> (hundreds?) and are seemingly <a href="http://twitter.com/help/jobs">still hiring</a> operations staff but have built a <a href="http://rubyforge.org/projects/starling/">custom queue server</a>.</p>
<p><em>Facebook</em> have at least <a href="http://www.paragon-cs.com/wordpress/2008/04/16/scaling-mysql-up-or-out-panel-uc/">10000 webservers, 800 MemcacheD instances and 1800 MySQL instances</a>.  They&#8217;ve built a <a href="http://lists.danga.com/pipermail/memcached/2007-May/004098.html">custom configuration-serving infrastructure, management and monitoring tools</a>.  They <a href="http://developers.facebook.com/opensource.php">also</a> contribute to MemcacheD and have built Cassandra and Thrift.  They also appear to be <a href="http://blog.facebook.com/blog.php?post=2406207130">busy building</a> their own optimized webservers and a replacement for squid.</p>
<p><em>Amazon</em> have <a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html">tens of thousands of servers</a> (surely more?) and have constructed Dynamo, S3, EC2, SQS etc.</p>
<p>A few tentative conclusions:</p>
<ol>
<li>It would seem that by the time a website has moved into the thousands of boxes it will have had to address configuration and monitoring, which suggests development efforts started before this threshold (perhaps at a couple of hundred boxes?).</li>
<li>As the machine count moves towards the tens of thousands, automated deployment becomes essential and there&#8217;s a need to develop more service-specific infrastructure.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F08%2F14%2Fvital-statistics%2F&#038;seed_title=Vital+Statistics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Taste Is Everything</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F06%2F12%2Ftaste-is-everything%2F&#038;seed_title=Taste+Is+Everything</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F06%2F12%2Ftaste-is-everything%2F&#038;seed_title=Taste+Is+Everything#comments</comments>
		<pubDate>Thu, 12 Jun 2008 21:05:24 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Distributed Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2008/06/12/taste-is-everything/</guid>
		<description><![CDATA[It seems it&#8217;s generally accepted[1] that SOA means breaking up your system into a set of co-operating components partitioned by business process. If you&#8217;re not doing that, you&#8217;re not doing SOA. It never ceases to amaze me how we get so zealous about fixed methods for architecting a system. I suspect it&#8217;s because we&#8217;d like [...]]]></description>
			<content:encoded><![CDATA[<p>It seems it&#8217;s generally accepted<sup>[1]</sup> that SOA means breaking up your system into a set of co-operating components partitioned by business process. If you&#8217;re not doing that, you&#8217;re not doing SOA. It never ceases to amaze me how we get so zealous about fixed methods for architecting a system. I suspect it&#8217;s because we&#8217;d like to believe that architecture (and much of the act of development) can be done with fixed rules, cookie cutter style, get your catalog of patterns and technology, apply them &#8211; job done. The ultimate embodiment of this behaviour is deployment of a piece of technology in the belief that once the integration is complete the system has radically shifted in terms of its architecture (e.g. deploying an ESB suddenly makes your system SOA).</p>
<p>So if the fixed methods of SOA are thrown out and technology is not the solution, how do we build a system? Let&#8217;s first consider some of the things we&#8217;d like from our architecture:</p>
<ol>
<li>Avoid <a href="http://www.dancres.org/blitzblog/2007/07/11/the-siren-call-of-the-database/">integration via the database</a> &#8211; otherwise data coupling will cripple us</li>
<li>Support for granular updates &#8211; taking down the whole system is not desirable</li>
<li>Fast rollback of changes &#8211; in case an update breaks</li>
<li>In-production testing &#8211; there&#8217;s no substitute for real traffic in tests</li>
<li>Minimal shared resources such as storage &#8211; so should there be an outage, impact is minimised</li>
<li>Horizontal scaling &#8211; more boxes equals more power</li>
<li>Support for scalable development &#8211; dev teams should be able to act in isolation most of the time</li>
<li>Support for appropriate <a href="http://citeseer.ist.psu.edu/544596.html" title="CAP Theorem">CAP</a> tradeoffs &#8211; making everything consistent can be bad for availability</li>
</ol>
<p>Although we wish to avoid coupling via the database, the reality is that our code still requires access to the data in some form or another. The best we can do under this circumstance is to limit the amount of code that directly accesses the data. We achieve this by vertically slicing (as opposed to horizontal sharding) our data and consolidating the code that is most closely related to it (e.g. performs updates) into a single encapsulated unit. All other access to the data must go via the code element of its associated unit (note that one needn&#8217;t always go to a unit for the data, it&#8217;s perfectly acceptable to cache).</p>
<p>In this way we limit the impact of data-schema changes to the associated unit; other parts of the system need not be concerned, but there&#8217;s still some work to do. If the code within a unit were to be co-located within all processes containing code that wishes to make use of it, we&#8217;d need to restart all those processes when we wish to deploy a new version of that code (for whatever reason). Such a deployment model also encourages several bad habits:</p>
<ol>
<li>Ignoring the remoteness of the data &#8211; it&#8217;s hidden behind some form of interface and it&#8217;s tempting to attempt to hide failure behind that interface</li>
<li>Focus on synchronous method calls &#8211; it&#8217;s natural for a developer to write synchronous method calls when the code being called looks local (note that method calls can support asynchronous behaviours)</li>
</ol>
<p>To avoid these issues, we deploy each unit in its own process, accessed via some network endpoint that dependants use to interact with it, thus:</p>
<ul>
<li>Each unit can now easily be allocated its own independent storage, apply its own sharding policy etc.</li>
<li>The network endpoint can support multiple protocol versions or we can opt to terminate multiple network endpoints onto a unit, a powerful primitive for supporting several versions of a remote interface simultaneously.</li>
<li>The network endpoint can be terminated onto some form of load balancer or custom routing implementation (which might be part of the code within the unit itself perhaps because it&#8217;s P2P based) facilitating horizontal scaling, hot upgrades, A/B testing, in-production tests etc.</li>
<li>Each unit can be assigned to a development team and much work can be done independently of development efforts elsewhere, making for less contention in development.</li>
<li>Each unit can implement whatever CAP tradeoff makes sense.</li>
</ul>
<p>If we arrange for the network endpoint of each unit to be <a href="http://www.dancres.org/blitzblog/2008/02/27/dns-games/">discovered dynamically at runtime</a> we gain the ability to move our units around (e.g. for DR reasons) and have means for our system to dynamically knit itself together reducing configuration issues. Such an arrangement can also make it easier to deal with ordered startup issues (where some set of things must be available before others).</p>
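<p>A toy version of such runtime discovery is sketched below: units announce their endpoints under a lease and dependants look them up when needed, so a unit that moves simply announces from its new location and the stale entry expires. The registry class, lease length and endpoint strings are assumptions for illustration; a real deployment might use DNS or a lookup service instead.</p>

```python
import time


class Registry:
    """In-memory registry of service endpoints with expiring leases."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (service, endpoint) -> lease expiry time

    def announce(self, service, endpoint, lease_s=30.0):
        """Register or renew an endpoint; it lapses unless re-announced."""
        self._entries[(service, endpoint)] = self._clock() + lease_s

    def lookup(self, service):
        """Return the endpoints for a service whose leases are still live."""
        now = self._clock()
        return sorted(
            endpoint
            for (name, endpoint), expiry in self._entries.items()
            if name == service and expiry > now
        )
```

<p>A dependant that finds no live endpoint can simply retry later, which is one way of softening ordered startup problems.</p>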
<p>Of course it&#8217;s not all good news: we will have to manage our desire for <a href="http://en.wikipedia.org/wiki/ACID">ACID</a> guarantees because many of the mechanisms (such as <a href="http://en.wikipedia.org/wiki/Two-phase-commit_protocol">two-phase commit</a>) for achieving this in a distributed system are <a href="http://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf">fraught with problems</a>. Fortunately, people have been <a href="http://www.addsimplicity.com/adding_simplicity_an_engi/2006/12/avoiding_two_ph.html">thinking</a> <a href="http://www.allthingsdistributed.com/2007/12/eventually_consistent.html">about</a> <a href="http://blogs.msdn.com/pathelland/archive/2007/05/16/link-to-life-beyond-distributed-transactions-an-apostate-s-opinion.aspx">this</a> for a while. We&#8217;ll also have to take care of <a href="http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing">the fallacies</a> but even this has some positive aspects as failure and upgrade in some cases <a href="http://armstrongonsoftware.blogspot.com/2007/07/scalable-fault-tolerant-upgradable.html">can be considered the same</a> (noting that abstractions for message passing, failure detectors and the like can be implemented in many languages, not just Erlang).</p>
<p>So what remoting approaches might we use? REST/http, WS-*, RMI, CORBA, messages, custom protocol &#8211; whatever is suitable for our situation (noting that some choices impact the means by which we can handle evolution of protocols etc). What guidelines might we follow in determining how to split our code and data? There are a number of different approaches including:</p>
<ol>
<li>Considering similarities in consistency, availability and partition-tolerance (<a href="http://citeseer.ist.psu.edu/544596.html" title="CAP Theorem">CAP</a>) requirements</li>
<li>Data access localities</li>
<li>Data relationships</li>
<li>Jurisdictional requirements</li>
<li>Roles and responsibilities (at coarser level than OO)</li>
<li>Features (e.g. recommendations)</li>
<li>Business processes</li>
<li>Constituent elements of an overall business process</li>
</ol>
<p>Most systems likely require a combination of these rather than one fixed approach, <a href="http://www.artima.com/intv/tasteP.html">taste and gut instinct</a> count for a lot. And what might we call these units I speak of? I prefer to call them services as do a few <a href="http://blogs.msdn.com/pathelland/archive/2007/05/20/soa-and-newton-s-universe.aspx">other</a> <a href="http://www.acmqueue.org/modules.php?name=Content&amp;pa=showpage&amp;pid=403">people</a> but there&#8217;s no doubt that&#8217;ll be confusing, have to think of something else&#8230;&#8230;.</p>
<p>[1] I know that Steve <a href="http://service-architecture.blogspot.com/2008/04/how-you-know-its-soa.html">might well argue otherwise</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&#038;feed=Articles+%28RSS2%29&#038;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F06%2F12%2Ftaste-is-everything%2F&#038;seed_title=Taste+Is+Everything/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

