<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Pragmatic Dictator &#187; Systems</title>
	<atom:link href="http://www.dancres.org/blitzblog/category/systems/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dancres.org/blitzblog</link>
	<description></description>
	<lastBuildDate>Fri, 04 Jun 2010 11:58:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Performing</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2010%2F06%2F04%2Fperforming%2F&amp;seed_title=Performing</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2010%2F06%2F04%2Fperforming%2F&amp;seed_title=Performing#comments</comments>
		<pubDate>Fri, 04 Jun 2010 00:14:15 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Systems]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=337</guid>
		<description><![CDATA[My current company has for obvious business reasons got a serious interest in delivering a quality website experience during the World Cup and thus I&#8217;ve been spending a lot of time focused on our own performance and capacity management of late. P&#38;C is one of those 80/20 tradeoffs. There&#8217;s always more one can do or [...]]]></description>
			<content:encoded><![CDATA[<p>My <a href="http://www.sportingindex.com/">current company</a> has, for obvious business reasons, a serious interest in delivering a quality website experience during the World Cup, and thus I&#8217;ve been spending a lot of time of late focused on our own performance and capacity management.</p>
<p>Performance and capacity (P&amp;C) management is one of those 80/20 tradeoffs. There&#8217;s always more one can do or measure or test; equally, getting the basics in place will deliver substantial benefit. I&#8217;d go further and argue that without a solid grasp of the basics, one cannot easily determine what else might be required. Here then are the basics that I&#8217;ve found myself repeating over and over:</p>
<ul>
<li>Have an enquiring mind &#8211; anomalies are not to be ignored or dismissed on the basis of pure speculation. Determining root cause is essential to prevent surprises in production. Some recent examples:
<ol>
<li>In one test we noticed that every so often we&#8217;d get a substantial blip in disk I/O on servers that should be processing entirely out of memory. Along with that blip there&#8217;d be a corresponding reduction in throughput. We could have ignored it (after all, things sorted themselves out relatively quickly) but we chose to investigate. It turned out that all these servers were periodically running a cleanup job the developers were unaware of and had not factored into their capacity calculations. The implication for production would have been a regularly overloaded, badly performing website. We&#8217;ve since tuned the jobs, adjusted their schedules and increased our capacity to ensure we can always spread the load around enough to accommodate them.</li>
<li>An examination of the distribution of load on the boxes behind our load-balancers revealed a higher than expected amount of variance in CPU and connections. A review of the application revealed that any particular user&#8217;s traffic is sticky to one box, which is unfortunate as the application is stateless; time for a code change. We also spent time looking at the monitoring infrastructure and discovered that in certain cases we&#8217;d get false reports of 100% CPU utilisation; that one will be fixed with an OS patch.</li>
</ol>
</li>
<p></p>
<li>Gather the right data &#8211; there&#8217;s no value in allowing oneself to be limited by what is easily available via some set of tools people are comfortable with. One tool we were using had an unreasonably low ceiling on the number and rate of samples it could handle, such that the graphs it produced showed hardly anything of the true profile of e.g. CPU utilisation, memory consumption or I/O. Forming any opinion about system behaviour under load was going to be an exercise in speculation. We junked the tool and are looking for a replacement. In the meantime we&#8217;ve fallen back to low-level performance counters, which we can sample local to the machine and write to disk for later analysis via scripts, open-source tools etc.</li>
<p></p>
<li>Design tests that support reasoning &#8211; One should indeed try and replicate production load behaviours to judge overall system behaviour. The challenge of such testing is that it can be difficult to relate performance data back to exactly what was going on during some period of a test and make a diagnosis or be confident of an improvement. There are a number of things we can do to improve the situation:
<ol>
<li>Ensure tests are deterministic such that any given run can be compared against other runs. This isn&#8217;t as simple as it looks when e.g. you wish to gradually increase load at a fixed rate that is being produced by more than one box.</li>
<li>Have tests produce sufficient logging that one can easily identify what was going on at particular points in the sampled data. Logging of course can actually affect test behaviour and that isn&#8217;t always desirable.</li>
<li>Build additional tests that target particular user journeys through the system. Doing this for all possible journeys can be costly, so it makes sense to focus on testing those which are most popular with users. These kinds of tests restrict the reasoning tree, making analysis, diagnosis and solution identification much easier.</li>
</ol>
</li>
<p></p>
<li>Measure what customers care about &#8211; they don&#8217;t care about CPUs, I/O or memory; they worry about things like response times. It is important to focus on maintaining a quality user experience, not on endlessly improving system efficiency. Considering user factors such as response times stops us expending huge effort on CPU utilisation when we should be focusing on, say, network I/O, browser performance or reducing the amount of data we push to the browser before a page can render.</li>
<p></p>
<li>Beware of averages &#8211; it is very tempting to combine datasets by averaging. Unfortunately, such a practice can easily hide spikes that might be indicative of a problem. On more than one occasion an engineer has presented a graph that tracks the average CPU and a table that summarises min, avg and max, and then pronounced load testing a success. Yet they have no explanation for why the average is never more than 50% but the max is 100%, or whether this is good or bad.</li>
<p><br/></p>
<li>More than load &#8211; excessive focus on measuring the effect of a particular load can make us blind to another important metric: resource cost per unit of work. This covers the collection of tests and analysis that help us understand what to tune, and by how much, to keep our appetite for boxes and bandwidth reasonable. One simple thing teams can do per sprint (assuming you&#8217;re agile, why wouldn&#8217;t you be?) is point a profiler at each component and look for the low-hanging fruit of poor algorithm selection or inefficient code (e.g. repeated scanning of lists where a hashmap would be better, or repeatedly computing something that could be cached).</li>
</ul>
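<p>To make the point about averages concrete, here is a minimal Python sketch using made-up CPU samples. The numbers and the <code>summarise</code> helper are illustrative assumptions, not data from our tests. It shows how a mean can sit well below 50% while the maximum and a high percentile reveal periods of saturation:</p>

```python
import statistics


def summarise(samples):
    """Return mean, max and 95th percentile for a list of utilisation samples."""
    ordered = sorted(samples)
    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    rank = max(1, -(-95 * len(ordered) // 100))  # integer ceiling division
    return {
        "mean": statistics.mean(ordered),
        "max": ordered[-1],
        "p95": ordered[rank - 1],
    }


# Hypothetical per-second CPU utilisation (%): mostly quiet, with a burst at 100%.
cpu = [20] * 90 + [100] * 10

summary = summarise(cpu)
print(summary)
```

<p>With these samples the mean looks healthy while the 95th percentile sits at the ceiling, which is exactly the kind of discrepancy that needs an explanation before a test is declared a success.</p>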
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2010%2F06%2F04%2Fperforming%2F&amp;seed_title=Performing/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Concurrency</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2009%2F10%2F10%2Fconcurrency%2F&amp;seed_title=Concurrency</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2009%2F10%2F10%2Fconcurrency%2F&amp;seed_title=Concurrency#comments</comments>
		<pubDate>Sat, 10 Oct 2009 10:03:11 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Systems]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[fallacies]]></category>
		<category><![CDATA[network]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=321</guid>
		<description><![CDATA[Building a concurrent system ultimately boils down to: Partitioning the data into chunks that can be separately acted upon Applying computations against those chunks to produce results The smaller or more fine-grained the chunks, the more concurrent activity will be possible. In theory the closer one can get to one chunk per core the better [...]]]></description>
			<content:encoded><![CDATA[<p>Building a concurrent system ultimately boils down to:</p>
<ol>
<li>Partitioning the data into chunks that can be separately acted upon</li>
<li>Applying computations against those chunks to produce results</li>
</ol>
<p>The smaller or more fine-grained the chunks, the more concurrent activity will be possible. In theory, the closer one can get to one chunk per core the better. In reality, it&#8217;s rare (a function of throughput and size of calculation) that one needs to compute across all chunks simultaneously, so a core can be assigned many chunks and dispatch operations against any one of them at a given moment.</p>
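<p>As a rough illustration of the two steps, the following Python sketch partitions a dataset into many more chunks than there are workers and lets a small pool work through them. The names <code>partition</code> and <code>process_chunk</code>, the chunk count and the use of a sum as the computation are all assumptions made for the example:</p>

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, chunk_count):
    """Split data into chunk_count roughly equal, independently processable chunks."""
    return [data[i::chunk_count] for i in range(chunk_count)]


def process_chunk(chunk):
    """Stand-in computation: any function that touches only its own chunk."""
    return sum(chunk)


data = list(range(1000))
chunks = partition(data, chunk_count=16)  # many more chunks than workers

# Four workers stand in for four cores; each takes whichever chunk is next.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(process_chunk, chunks))

total = sum(partial_results)
print(total)
```

<p>Threads keep the sketch short; in CPython a process pool would be needed for genuinely parallel CPU-bound work, but the partition-then-dispatch shape stays the same.</p>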
<p>There are many solutions for building concurrent systems but those that provide some abstraction which makes request routing easy to implement are likely to work best as it makes re-balancing of computation easier. One shouldn&#8217;t immediately assume that message passing is the answer as there are many ways to achieve routing (e.g. via DNS).</p>
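<p>One possible shape for such a routing abstraction is sketched below in Python. The <code>Router</code> class, the node names and the choice of a hash over a fixed number of partitions are assumptions for illustration only; the point is that callers ask where a key lives and never hold on to the answer, so a partition can be moved without touching them:</p>

```python
import hashlib


class Router:
    """Maps a key to a partition, and a partition to whichever node currently owns it."""

    def __init__(self, partition_count, nodes):
        self.partition_count = partition_count
        # Initial assignment: spread partitions across nodes round-robin.
        self.owner = {p: nodes[p % len(nodes)] for p in range(partition_count)}

    def partition_for(self, key):
        # Stable hash so the same key always lands in the same partition.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.partition_count

    def route(self, key):
        return self.owner[self.partition_for(key)]

    def move(self, partition, node):
        """Re-balance by reassigning one partition; callers are unaffected."""
        self.owner[partition] = node


router = Router(partition_count=8, nodes=["box-a", "box-b"])
before = router.route("user-42")
router.move(router.partition_for("user-42"), "box-c")
after = router.route("user-42")
print(before, after)
```

<p>The same lookup could just as well be answered by DNS or a load-balancer; the table here is simply the smallest thing that demonstrates the idea.</p>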
<p>Any solution represents a transparency tradeoff. If for example routing is hidden inside of the solution, this can make it easy to get something up and running but we might find it difficult to transition from one box to a multi-box deployment. There are many tradeoffs to be made and for any case where control is given to the developer/architect it&#8217;s likely there will be libraries/frameworks to ease the initial implementation burden; programming languages alone will not be enough (<a href="http://www.scala-lang.org/">Scala</a> makes such a differentiation quite difficult given its language extension capabilities).</p>
<p>One aspect discussed less often is the difference between processing on a set of cores all in one box versus processing across a set of cores on many boxes. The latter brings the following challenges all related to the <a href="http://blogs.sun.com/jag/resource/Fallacies.html">fallacies</a> of distributed computing:</p>
<ol>
<li>Cores are more likely to become inaccessible</li>
<li>The latency of an operation can become substantially more variable</li>
<li>Any centralised functions (e.g. job scheduler or watchdogs) are more vulnerable to becoming isolated from the resources they manage such that processing ceases.</li>
</ol>
<p>The latency factor is particularly challenging as few concurrent approaches make it sufficiently explicit that developers/architects are encouraged to be appropriately mindful.</p>
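<p>One way an approach could make latency explicit is to refuse to offer a call without a deadline. The Python sketch below is an assumption about what that might look like, not a description of any particular framework: <code>remote_compute</code> simulates a call to another box, and <code>call_with_deadline</code> forces the caller to say how long it will wait and to handle the case where the answer does not arrive in time.</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def remote_compute(x, delay):
    """Stand-in for a call to another box; delay simulates network latency."""
    time.sleep(delay)
    return x * 2


def call_with_deadline(pool, fn, args, timeout_s):
    """Every call must state how long it is prepared to wait.

    Returns (True, result) on success or (False, None) if the deadline passes.
    """
    future = pool.submit(fn, *args)
    try:
        return True, future.result(timeout=timeout_s)
    except FutureTimeout:
        return False, None


with ThreadPoolExecutor(max_workers=2) as pool:
    fast = call_with_deadline(pool, remote_compute, (21, 0.0), timeout_s=1.0)
    slow = call_with_deadline(pool, remote_compute, (21, 0.5), timeout_s=0.05)

print(fast, slow)
```

<p>Returning an explicit success flag rather than silently blocking means the variable latency of a multi-box deployment shows up in the calling code, where the developer has to decide what to do about it.</p>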
<p>Thus far, as has been the case throughout our history, the solutions are polarising into those that work within the confines of a single box and those that work across multiple boxes with the emphasis on the former. I fully expect developers and architects to fall into the old trap of using a single-box solution to solve a multi-box problem with all the associated issues. Of the solutions that work across multiple boxes, very few account fully for the impact of the network.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2009%2F10%2F10%2Fconcurrency%2F&amp;seed_title=Concurrency/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Vital Statistics</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F08%2F14%2Fvital-statistics%2F&amp;seed_title=Vital+Statistics</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F08%2F14%2Fvital-statistics%2F&amp;seed_title=Vital+Statistics#comments</comments>
		<pubDate>Thu, 14 Aug 2008 18:35:09 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Systems]]></category>
		<category><![CDATA[infrastructure]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/?p=231</guid>
		<description><![CDATA[How big does a website have to get before custom infrastructure becomes necessary? When a website reaches this stage, what infrastructure gets built? Before trying to answer these questions we must have some means of measuring the size of a website. I&#8217;ve settled on the number of machines as a reasonable approximation because: As a [...]]]></description>
			<content:encoded><![CDATA[<p>How big does a website have to get before custom infrastructure becomes necessary?  When a website reaches this stage, what infrastructure gets built?  Before trying to answer these questions we must have some means of measuring the size of a website.  I&#8217;ve settled on the number of machines as a reasonable approximation because:</p>
<ul>
<li>As a codebase grows it must be split up along functional boundaries, and spread across multiple processes.  More code equals more processes and more machines to run them on.</li>
<li>More customers means more load and requires more machines to handle it.</li>
<li>More data means more storage and more processors to chew through it.</li>
</ul>
<p>Now let&#8217;s see how many machines some of the big players are running and what infrastructure they&#8217;re talking about:</p>
<p><em>TicketMaster</em> have at least <a href="http://code.google.com/p/spine-mgmt/">3000 machines and have built Spine</a> to help them manage configuration of their infrastructure.</p>
<p><em>eBay</em> have built a custom deployment tool (<a href="http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf">Roller</a>), logging infrastructure, configuration management for their software services, messaging software <a href="http://www.infoq.com/presentations/Operational-Scalability-Wayne-Fenton">and more</a>.  They&#8217;re running around <a href="http://www.ecommercetimes.com/rsstory/63922.html">15000 machines across four geographical locations</a>.</p>
<p><em>Microsoft</em> have built a custom deployment, configuration and monitoring infrastructure called <a href="http://research.microsoft.com/users/misard/abstracts/osr2007.html">Autopilot</a> focused on many thousands of machines.  <a href="http://perspectives.mvdirona.com/2008/04/02/FirstContainerizedDataCenterAnnouncement.aspx">In fact we&#8217;re talking hundreds of thousands</a>.</p>
<p><em>Google</em> are <a href="http://news.cnet.com/8301-10784_3-9955184-7.html">dealing</a> <a href="http://perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx">in</a> a million or more machines and expending effort on software <a href="http://www.pmg.csail.mit.edu/iris/ajmani03scheduling-abstract.html">to</a> <a href="http://www.pmg.csail.mit.edu/pubs/ajmani06modular-abstract.html">handle</a> staged, automatic upgrades.  Of course they&#8217;ve already built <a href="http://labs.google.com/papers/gfs.html">GFS</a>, <a href="http://labs.google.com/papers/chubby.html">Chubby</a> etc.</p>
<p><em>Twitter</em> have moved beyond the half-dozen or so machines they used to have to <a href="http://www.akitaonrails.com/2008/6/17/chatting-with-blaine-cook-twitter">&#8220;a lot of servers&#8221;</a> (hundreds?) and are seemingly <a href="http://twitter.com/help/jobs">still hiring</a> operations staff but have built a <a href="http://rubyforge.org/projects/starling/">custom queue server</a>.</p>
<p><em>Facebook</em> have at least <a href="http://www.paragon-cs.com/wordpress/2008/04/16/scaling-mysql-up-or-out-panel-uc/">10000 webservers, 800 MemcacheD instances and 1800 MySQL instances</a>.  They&#8217;ve built a <a href="http://lists.danga.com/pipermail/memcached/2007-May/004098.html">custom configuration-serving infrastructure, management and monitoring tools</a>.  They <a href="http://developers.facebook.com/opensource.php">also</a> contribute to MemcacheD and have built Cassandra and Thrift.  They also appear to be <a href="http://blog.facebook.com/blog.php?post=2406207130">busy building</a> their own optimized webservers and a replacement for squid.</p>
<p><em>Amazon</em> have <a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html">tens of thousands of servers</a> (surely more?) and have constructed Dynamo, S3, EC2, SQS etc.</p>
<p>A few tentative conclusions:</p>
<ol>
<li>It would seem that by the time a website has moved into the thousands of boxes it will have had to address configuration and monitoring, which suggests development efforts started before this threshold (perhaps at a couple of hundred boxes?).</li>
<li>As the machine count moves towards the tens of thousands, automated deployment becomes essential and there&#8217;s a need to develop more service-specific infrastructure.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F08%2F14%2Fvital-statistics%2F&amp;seed_title=Vital+Statistics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Hindsight</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F01%2F04%2Fhindsight%2F&amp;seed_title=Hindsight</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F01%2F04%2Fhindsight%2F&amp;seed_title=Hindsight#comments</comments>
		<pubDate>Fri, 04 Jan 2008 11:30:53 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2008/01/04/hindsight/</guid>
		<description><![CDATA[In general, writing software is made hard by our inability to predict the future. We&#8217;re always caught between the two stools of building what we need now and what we might need in the future. Writing infrastructure is even harder because it&#8217;s an expression of common patterns and challenges of implementation in a specific context [...]]]></description>
			<content:encoded><![CDATA[<p>In general, writing software is made hard by our inability to predict the future.  We&#8217;re always caught between the two stools of building what we need now and what we might need in the future.  Writing infrastructure is even harder because it&#8217;s an expression of common patterns and challenges of implementation in a specific context such as one company&#8217;s systems or services.  The broader the context (e.g. all enterprises), the harder it gets to account for all possible permutations of usage.</p>
<p>So to a core trouble with building infrastructure: common patterns and challenges of implementation can only be established by reviewing history.  Basically until some systems have been built, we can&#8217;t tell with certainty what the infrastructure should look like.</p>
<p>It&#8217;s all too easy to believe that some problem we&#8217;re currently faced with is generic and therefore best tackled by:</p>
<ul>
<li>Custom infrastructure or &#8230;</li>
<li>A third-party technology stack or &#8230;</li>
<li>Defining some standard format, code convention etc.</li>
</ul>
<p>How do we know something is generic?  Experience.  To be confident something is generic we must have seen it across our systems universe (that is, the collection of systems in our domain of concern).  The danger is that multiple teams adopt their own solution to the same problem, but this isn&#8217;t necessarily a bad thing.  Each team is doing valuable investigation work that will help identify the most promising options for a solution.  The trick of course is to have sufficient cross-team discussion about architecture so as to avoid excessive proliferation of independent solutions.</p>
<p>I have a little mantra that I use to remind myself and others of this thorny issue: <em>Experience leads infrastructure</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2008%2F01%2F04%2Fhindsight%2F&amp;seed_title=Hindsight/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Indebted&#8230;</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F11%2F03%2Findebted%2F&amp;seed_title=Indebted%26%238230%3B</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F11%2F03%2Findebted%2F&amp;seed_title=Indebted%26%238230%3B#comments</comments>
		<pubDate>Sat, 03 Nov 2007 11:03:52 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2007/11/03/indebted/</guid>
		<description><![CDATA[&#8230;to Steve McConnell for distilling out the key issues around technical debt: &#8220;The reason most often cited by technical staff for avoiding debt altogether is the challenge of communicating the existence of technical debt to business staff and the challenge of helping business staff remember the implications of the technical debt that has previously been [...]]]></description>
			<content:encoded><![CDATA[<p>&#8230;to Steve McConnell for <a href="http://blogs.construx.com/blogs/stevemcc/archive/2007/11/01/technical-debt-2.aspx">distilling out the key issues</a> around technical debt:</p>
<p>&#8220;The reason most often cited by technical staff for avoiding debt altogether is the challenge of communicating the existence of technical debt to business staff and the challenge of helping business staff remember the implications of the technical debt that has previously been incurred. Everyone agrees that it&#8217;s a good idea to incur debt late in a release cycle, but business staff can sometimes resist accounting for the time needed to pay off the debt on the next release cycle. The main issue seems to be that, unlike financial debt, technical debt is much less visible, and so people have an easier time ignoring it.&#8221;</p>
<p>I would quibble with the &#8220;easier to ignore&#8221; aspect though, as I think for the most part both kinds of debt attract the same behaviour &#8211; sticking our heads in the sand and allowing things to get worse&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F11%2F03%2Findebted%2F&amp;seed_title=Indebted%26%238230%3B/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ugly Reality</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F11%2F01%2Fugly-reality%2F&amp;seed_title=Ugly+Reality</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F11%2F01%2Fugly-reality%2F&amp;seed_title=Ugly+Reality#comments</comments>
		<pubDate>Thu, 01 Nov 2007 21:06:34 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2007/11/01/ugly-reality/</guid>
		<description><![CDATA[So we&#8217;re building an Internet service not a Web service, decide on an address, allocate a port, write our daemon (yes, I like Unix) job done. One might think so but there&#8217;s a killer deployment issue lurking in the background &#8211; the Firewall. The average corporate security policy really doesn&#8217;t like opening ports on external [...]]]></description>
			<content:encoded><![CDATA[<p>So we&#8217;re building an <a href="http://en.wikipedia.org/wiki/Internet">Internet</a> service, not a <a href="http://en.wikipedia.org/wiki/World_Wide_Web">Web</a> service: decide on an address, allocate a port, write our daemon (yes, I like Unix), job done.  One might think so, but there&#8217;s a killer deployment issue lurking in the background &#8211; the firewall.</p>
<p>The average corporate security policy really doesn&#8217;t like opening ports on external firewalls (and often the same goes for internal ones).  Best case we&#8217;ll have to wade through masses of red tape, worst case we&#8217;ll be given a flat no to our request for an open port.  What to do?</p>
<p>Find an open port, find a way to tunnel through it.  Which is the port most likely to be open?  80, and we all know which protocol runs over that.  Sure as night follows day, we end up building a solution that tunnels over HTTP.</p>
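<p>In its crudest form, tunnelling means treating HTTP as an envelope for our own protocol&#8217;s messages. The Python sketch below shows that framing and nothing more; the <code>/tunnel</code> path, the host name and the helper names are hypothetical, and a real tunnel would also have to deal with proxies, responses and connection reuse:</p>

```python
def wrap_in_http(payload, host, path="/tunnel"):
    """Frame an opaque protocol message as the body of an HTTP/1.1 POST."""
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/octet-stream",
        f"Content-Length: {len(payload)}",
        "",
        "",
    ]
    return "\r\n".join(headers).encode("ascii") + payload


def unwrap_from_http(request):
    """Recover the original message: everything after the first blank line."""
    _head, _sep, body = request.partition(b"\r\n\r\n")
    return body


message = b"\x01\x02our-own-protocol-frame"
request = wrap_in_http(message, host="service.example.com")
print(request)
```

<p>The service behind the firewall strips the envelope with <code>unwrap_from_http</code> and carries on as if the bytes had arrived on its own port.</p>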
<p>Do we want to be &#8220;of the web&#8221;?  No.</p>
<p>Would we by choice tunnel over HTTP?  No. It&#8217;s not designed for that purpose and it&#8217;ll likely let us know come implementation time.</p>
<p>Should we re-design our service to fit with <a href="http://en.wikipedia.org/wiki/Resource_oriented_architecture">ROA</a>?  No.  Hopefully we did the research and looked at this option before choosing to implement an Internet service as opposed to a Web service.</p>
<p>So Internet and Web services are different but can end up looking similar enough to lead to confusion.  Ain&#8217;t reality ugly? Tradeoffs must be made; the results are often less than pretty and there might well be a lot of complaining.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F11%2F01%2Fugly-reality%2F&amp;seed_title=Ugly+Reality/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Flip Side</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F10%2F29%2Fflip-side%2F&amp;seed_title=Flip+Side</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F10%2F29%2Fflip-side%2F&amp;seed_title=Flip+Side#comments</comments>
		<pubDate>Mon, 29 Oct 2007 21:42:23 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2007/10/29/flip-side/</guid>
		<description><![CDATA[Everybody&#8217;s pitch is about the great things that can be done with their technology, method, architectural approach etc. They&#8217;ll sing it from the roof tops, put it up on websites, spam us with email and so on. But they rarely (if ever) discuss the flip-side &#8211; what their stuff is not good for. Sometimes: selecting [...]]]></description>
			<content:encoded><![CDATA[<p>Everybody&#8217;s pitch is about the great things that can be done with their technology, method, architectural approach etc.  They&#8217;ll sing it from the roof tops, put it up on websites, spam us with email and so on.</p>
<p>But they rarely (if ever) discuss the flip-side &#8211; what their stuff is not good for.  Sometimes:</p>
<ul>
<li>selecting an appropriate solution is more about what something is bad at than good at</li>
<li>the easiest way to understand something is to know it in terms of what it can&#8217;t do</li>
</ul>
<p>It&#8217;s amazing how often asking someone about the negatives of their stuff results in silence.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F10%2F29%2Fflip-side%2F&amp;seed_title=Flip+Side/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Incoherent Ramblings</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F09%2F12%2Fincoherent-ramblings%2F&amp;seed_title=Incoherent+Ramblings</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F09%2F12%2Fincoherent-ramblings%2F&amp;seed_title=Incoherent+Ramblings#comments</comments>
		<pubDate>Wed, 12 Sep 2007 20:21:42 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2007/09/12/incoherent-ramblings/</guid>
		<description><![CDATA[There&#8217;s a lot of noise about transactional memory, so I thought I should do a bit of research. Having read a number of papers, I&#8217;m left wondering just what all the racket is for. At least for me, the benefits are unclear. Let&#8217;s consider this paper which discusses, amongst other things, &#8220;transactifying&#8221; Berkeley Db (a piece of [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a lot of noise about transactional memory, so I thought I should do a bit of research.  Having read a number of papers, I&#8217;m left wondering just what all the racket is for.  At least for me, the benefits are unclear.</p>
<p>Let&#8217;s consider this <a href="http://research.sun.com/scalable/pubs/ASPLOS2006.pdf">paper</a> which discusses, amongst other things, &#8220;transactifying&#8221; <a href="http://www.oracle.com/database/berkeley-db/index.html">Berkeley Db</a> (a piece of software I know quite well).  It contains a comparison of the original version of Db&#8217;s locking system (which used a global lock) and the authors&#8217; modified version.  The initial change was to replace all uses of the global lock with a set of transactions.  A test was run and the transactional version was worse all round than the original &#8211; oops.</p>
<p>The root cause boiled down to three issues:</p>
<ol>
<li>False sharing &#8211; a problem which occurs when variables accessed by different threads happen to fall in the same cache line &#8211; this was solved with a traditional approach known as padding.</li>
<li>Statistics collection &#8211; Db collected a bunch of statistics, keeping them accurate by using the global lock.  Rather than address what is surely a common problem, the authors simply turned this feature off.</li>
<li>Object pooling &#8211; the pooling associated with lock descriptors and their related objects had to be changed from singly linked lists to collections of linked lists to improve the potential for concurrent access.</li>
</ol>
<p>The tests were re-run and, beyond a certain level of scale, the transactional memory version was now better.  But wait, there&#8217;s a problem.  Notice that all the work being done to make the transactional version better is broadly the same as the work one would do to make the locking version better.  How much of the scalability gain is due to better concurrent structure and how much is down to transactional memory?  Is the work we&#8217;ve just done any simpler than what we already have to do for conventional thread/lock based systems?</p>
<p>Another under-discussed factor across many papers in this area is the assertion that transactional memory is better than locking because it is more efficient in the non-conflict case.  However, many modern lock primitives are now <a href="http://blogs.sun.com/dave/entry/lets_say_you_re_interested">also</a> <a href="http://blogs.sun.com/dagastine/entry/java_synchronization_optimizations_in_mustang">optimized</a> for this circumstance.</p>
<p>What about the fact that one must still make sure to correctly isolate the atomic actions in a system and bound them appropriately with transactions, just as one currently does with locking?  We still have to make sure we do that consistently across the entire system or risk the usual concurrency debugging nightmares.</p>
<p>Many of the transactional memory systems appear to be based on optimistic approaches &#8211; does that make sense for all algorithms and systems we might build? Other transactional systems have evolved to provide both optimistic and pessimistic options (in an attempt to cover all design possibilities) and the programmer must make the appropriate choice for their application.  Will transactional memory systems also need to move this way and if so, how will the programmer work with that?</p>
<h2>Asserting Order</h2>
<p>I&#8217;m not going to write off transactional memory, but it seems that even should it turn out to be more scalable than the conventional lock-based approaches we use:</p>
<ol>
<li>It&#8217;s really not much simpler to program with.</li>
<li>It&#8217;s no use in the distributed case.</li>
</ol>
<p>Meanwhile:</p>
<ol>
<li>There are other approaches around that do work across both multi-core and multi-box/distributed cases with little change (some would argue the amount of change is zero but I don&#8217;t buy that).</li>
<li>Dealing with concurrency <a href="http://www.dancres.org/blitzblog/2007/06/20/dodging-the-concurrency-bullet/">is about much more</a> than whether you use locks or transactions.</li>
</ol>
<p><!-- technorati tags start --></p>
<p style="text-align:right;font-size:10px;">Technorati Tags: <a href="http://www.technorati.com/tag/concurrency" rel="tag">concurrency</a>, <a href="http://www.technorati.com/tag/distributed systems" rel="tag">distributed systems</a>, <a href="http://www.technorati.com/tag/scalability" rel="tag">scalability</a>, <a href="http://www.technorati.com/tag/systems" rel="tag">systems</a></p>
<p><!-- technorati tags end --></p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F09%2F12%2Fincoherent-ramblings%2F&amp;seed_title=Incoherent+Ramblings/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>SOA: The Truth Is Out There</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F08%2F22%2Fsoa-the-truth-is-out-there%2F&amp;seed_title=SOA%3A+The+Truth+Is+Out+There</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F08%2F22%2Fsoa-the-truth-is-out-there%2F&amp;seed_title=SOA%3A+The+Truth+Is+Out+There#comments</comments>
		<pubDate>Wed, 22 Aug 2007 09:50:19 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2007/08/22/soa-the-truth-is-out-there/</guid>
		<description><![CDATA[The question is, can we see it? If this article is anything to go by the answer would be no. SOA is an approach to building systems; it certainly couldn&#8217;t be called a style (much to the annoyance of some), but it sure isn&#8217;t a technology. And this is the problem &#8211; so many view [...]]]></description>
			<content:encoded><![CDATA[<p>The question is, can we see it?  If <a href="http://www.infoworld.com/archives/emailPrint.jsp?R=printThis&#038;A=/article/07/08/20/soa-report_1.html">this article</a> is anything to go by the answer would be no.</p>
<p>SOA is an approach to building systems; it certainly couldn&#8217;t be called a <a href="http://www.ics.uci.edu/~fielding/pubs/dissertation/net_arch_styles.htm">style</a> (much to the annoyance of some), but it sure isn&#8217;t a technology.</p>
<p>And this is the problem &#8211; so many view everything to do with building systems as being about deploying the right technologies rather than adopting an approach and driving technology selection from there.</p>
<p>No surprise then that SOA adoption is isolated to small parts of various businesses &#8211; that&#8217;s the maximum level of use that can be achieved whilst it is treated as a technological shift.  Change across the entire business is essential for SOA to get real traction &#8211; systems shouldn&#8217;t be viewed as necessary evils that cost; rather, they should be considered a means of delivering enhanced business value.  Processes and culture are the real challenge, not hardware and software.</p>
<p>IT/Systems Development needs to be considered a first-class citizen within an organization rather than simply the poor cousin that mops the floors and cleans the toilets.  We need fewer conversations like &quot;here&#8217;s what we want&quot; and more discussion around &quot;here&#8217;s what we&#8217;re trying to do, how can you help?&quot;.  Switched-on readers may notice an interesting parallel with Web vs Enterprise&#8230;</p>
<p>Web is (amongst other things) about enabling users and agents to do interesting (interconnected/social) stuff more effectively, whilst Enterprise is often treated as little more than automating laborious tasks with strict controls.  Loosely coupled versus tightly coupled, granular and cooperative versus monolithic and uncooperative.</p>
<p><!-- technorati tags start --></p>
<p style="text-align:right;font-size:10px;">Technorati Tags: <a href="http://www.technorati.com/tag/enterprise" rel="tag">enterprise</a>, <a href="http://www.technorati.com/tag/soa" rel="tag">soa</a>, <a href="http://www.technorati.com/tag/technology" rel="tag">technology</a>, <a href="http://www.technorati.com/tag/web" rel="tag">web</a></p>
<p><!-- technorati tags end --></p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F08%2F22%2Fsoa-the-truth-is-out-there%2F&amp;seed_title=SOA%3A+The+Truth+Is+Out+There/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dodging the Concurrency Bullet</title>
		<link>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F06%2F20%2Fdodging-the-concurrency-bullet%2F&amp;seed_title=Dodging+the+Concurrency+Bullet</link>
		<comments>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F06%2F20%2Fdodging-the-concurrency-bullet%2F&amp;seed_title=Dodging+the+Concurrency+Bullet#comments</comments>
		<pubDate>Wed, 20 Jun 2007 20:26:07 +0000</pubDate>
		<dc:creator>Dan Creswell</dc:creator>
				<category><![CDATA[Systems]]></category>

		<guid isPermaLink="false">http://www.dancres.org/blitzblog/2007/06/20/dodging-the-concurrency-bullet/</guid>
		<description><![CDATA[The debate about all these many-core processors continues to circle the blogosphere. Tim Bray had this to say, which set me thinking (always a bad thing): Any time we have a piece of state that needs to be accessed concurrently we hit problems. One can hide this problem using messaging (or similar) but the [...]]]></description>
			<content:encoded><![CDATA[<p>The debate about all these many-core processors continues to circle the blogosphere.  Tim Bray had <a href="http://www.tbray.org/ongoing/When/200x/2007/06/07/Concurrency">this</a> to say, which set me thinking (always a bad thing):</p>
<p>Any time we have a piece of state that needs to be accessed concurrently we hit problems.  One can hide this problem using messaging (or similar) but the key aspect in these solutions is that we can partition operations into streams against discrete elements of data (a discrete element could be a group of things) that don&#8217;t interfere with each other.  Partitioning however can be problematic:</p>
<ol>
<li>Our data has to be amenable to partitioning via hashing or some other method.</li>
<li>It gets tricky when we need to deal with availability and disaster recovery.</li>
<li>Getting the correct granularity of partitioning can be challenging.</li>
</ol>
<p>Which is interesting because, whilst we&#8217;ve eliminated the concurrency issue, we&#8217;re now faced with a different one (partitioning) which could be just as hard to cope with and requires just as much thought from a developer and/or architect.  Coincidentally, Werner Vogels (Amazon) is going to be talking about an internal data store (HASS) at the Google Scalability Conference and specifically the problems of partitioning and consistent hashing (my original interest with respect to this talk was in the context of the <a href="http://citeseer.ist.psu.edu/544596.html">CAP conjecture</a>).</p>
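<p>To make the partitioning-by-hashing point a little more concrete, here&#8217;s a minimal sketch in Python.  The names <code>partition_for</code> and <code>moved_keys</code> and the <code>user-N</code> keys are purely illustrative assumptions of mine and don&#8217;t reflect how any real data store works.  It uses plain modulo hashing (not consistent hashing) and simply counts how many keys change partition when the partition count goes from four to five:</p>
<pre>
```python
import zlib


def partition_for(key: str, partitions: int) -> int:
    """Map a key to a partition index using a stable hash (CRC32 modulo N)."""
    # zlib.crc32 gives the same value in every process, unlike the
    # built-in hash() for strings, which is randomised per interpreter run.
    return zlib.crc32(key.encode("utf-8")) % partitions


def moved_keys(keys, old_count: int, new_count: int) -> int:
    """Count the keys whose partition changes when the partition count changes."""
    return sum(
        1
        for key in keys
        if partition_for(key, old_count) != partition_for(key, new_count)
    )


# Hypothetical key set, purely for illustration.
keys = [f"user-{i}" for i in range(1000)]
print("keys that move going from 4 to 5 partitions:", moved_keys(keys, 4, 5))
```
</pre>
<p>With modulo hashing, a change in partition count typically relocates a large share of the keys, which is one reason techniques such as consistent hashing get attention, and a small illustration of why getting the granularity right up front matters.</p>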
<p>Another means of avoiding all these concurrency issues is to push them somewhere else.  More often than not this becomes an exercise in creating a supposedly stateless system which in reality simply puts all the state in one place, usually the database.  The argument is that this is acceptable because it&#8217;s only the likes of databases that should deal with these hard issues.</p>
<p>The rub with having the database handle it is that the concurrency model it uses will only scale across so many processors (more if you&#8217;re read-mostly, less if you&#8217;re not) and cope with so many concurrent accesses from the stateless component.  Once again, to get our database layer to scale, we&#8217;ll need to partition our data into shards across multiple databases (an approach adopted by a number of top-line websites) or find some other way to reduce concurrent load on the database instance.</p>
<p>The act of partitioning can mean we reach a point where we can no longer expect to have atomic updates because the mechanisms for achieving them (e.g. two-phase commit) stop us scaling.  When this happens we must construct complex or at least exotic solutions such as that <a href="http://blogs.msdn.com/pathelland/archive/2007/05/16/link-to-life-beyond-distributed-transactions-an-apostate-s-opinion.aspx">proposed</a> by <a href="http://blogs.msdn.com/pathelland/">Pat Helland</a>.</p>
<p>Okay, we got rid of our concurrency problem and swapped it for a partitioning problem, which then turned into something of an exotic problem.  Are we any better off?  It seems no matter which way we go we end up with some tough problems to solve.</p>
<p>Perhaps there&#8217;s a sweet-spot tradeoff where the combination of a <a href="http://en.wikipedia.org/wiki/Simultaneous_multithreading">CMT</a> box, with data partitioned across a number of processes and each process containing a simple concurrency model, covers most situations.  Even if that&#8217;s the case, it seems developers will have to learn a few new tricks.</p>
<p><!-- technorati tags start --></p>
<p style="text-align: right; font-size: 10px">Technorati Tags: <a rel="tag" href="http://www.technorati.com/tag/architecture">architecture</a>, <a rel="tag" href="http://www.technorati.com/tag/concurrency">concurrency</a>, <a rel="tag" href="http://www.technorati.com/tag/software">software</a>, <a rel="tag" href="http://www.technorati.com/tag/systems">systems</a></p>
<p><!-- technorati tags end --></p>
<p><strong>Update:</strong> A <a href="http://programming.reddit.com/info/202w2/comments/c020389">good comment</a> over on <a href="http://www.reddit.com">Reddit</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dancres.org/feeder/?FeederAction=clicked&amp;feed=Articles+%28RSS2%29&amp;seed=http%3A%2F%2Fwww.dancres.org%2Fblitzblog%2F2007%2F06%2F20%2Fdodging-the-concurrency-bullet%2F&amp;seed_title=Dodging+the+Concurrency+Bullet/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
