Archive for September, 2007

Planning and estimation discussions always come back to:

  1. Agreeing what will be done
  2. Agreeing how much it will cost
  3. Agreeing a deadline by which the what will be done

The trouble is that answering these questions requires that one knows everything down to the deepest detail, that all possible happenings are known (which means knowing when people will be ill and for how long, how much time they’ll need to take off to deal with family troubles, problems with the plumbing, etc.), and that the risks are mitigated such that they absolutely will not affect the project in unpredictable ways.

That’s not to say that one can’t set deadlines, but one has to expect to trade features away, adjust resourcing, etc. Of course none of this is news, and yet so many places claim to be agile whilst continuing to have the “what, how much, when” discussion.


Colin McRae
1968 – 2007


There’s a lot of noise about transactional memory, so I thought I should do a bit of research. Having read a number of papers, I’m left wondering just what all the racket is for. At least for me, the benefits are unclear.

Let’s consider this paper, which discusses, amongst other things, “transactifying” Berkeley DB (a piece of software I know quite well). It contains a comparison of the original version of DB’s locking system (which used a global lock) and the authors’ modified version. The initial change was to replace all uses of the global lock with a set of transactions. A test was run and the transactional version was worse all round than the original – oops.

The root cause boiled down to three issues:

  1. False sharing – a problem which occurs when variables accessed by different threads happen to fall in the same cache line. This was solved with a traditional approach known as padding.
  2. Statistics collection – DB collected a bunch of statistics, keeping them accurate by using the global lock. Rather than address what is surely a common problem, the authors simply turned this feature off.
  3. Object pooling – the pooling associated with lock descriptors and their related objects had to be changed from single linked lists to collections of linked lists to improve the potential for concurrent access.
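To make the third item concrete, here is a minimal sketch of the general idea of splitting one pool into several independently locked ones. It is not the paper's code or Berkeley DB's; `ShardedPool`, its `factory` argument and the shard count are all names I've made up for illustration, and it uses ordinary locks rather than transactions.

```python
import threading


class ShardedPool:
    """Free-list pool split into several independently locked shards."""

    def __init__(self, factory, shards=8):
        self._factory = factory
        # One (lock, free list) pair per shard instead of one global list.
        self._shards = [(threading.Lock(), []) for _ in range(shards)]

    def _shard(self):
        # Spread threads over shards by thread id, so two threads
        # usually contend for different locks.
        return self._shards[threading.get_ident() % len(self._shards)]

    def acquire(self):
        lock, free = self._shard()
        with lock:
            if free:
                return free.pop()
        return self._factory()  # shard empty: build a fresh object

    def release(self, obj):
        lock, free = self._shard()
        with lock:
            free.append(obj)


pool = ShardedPool(factory=dict, shards=4)
descriptor = pool.acquire()
pool.release(descriptor)
assert pool.acquire() is descriptor  # same thread, same shard: reused
```

Note that nothing here depends on whether the per-shard critical section is a lock or a transaction; the restructuring is the same job either way.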

The tests were re-run and, beyond a certain level of scale, the transactional memory version was now better. But wait, there’s a problem. Notice that all the work being done to make the transactional version better is broadly the same as the work one would do to make the locking version better. How much of the scalability gain is due to better concurrent structure and how much is down to transactional memory? Is the work we’ve just done any simpler than what we already have to do for conventional thread/lock based systems?

Another under-discussed factor across many papers in this area is the assertion that transactional memory is better than locking because it is more efficient in the non-conflict case. However, many modern lock primitives are now also optimized for this circumstance.

What about the fact that one must make sure to correctly isolate the atomic actions in a system and bound them appropriately with transactions, just as one currently does with locking? We still have to make sure we do that consistently across the entire system or risk the usual concurrency debugging nightmares.

Many of the transactional memory systems appear to be based on optimistic approaches – does that make sense for all algorithms and systems we might build? Other transactional systems have evolved to provide both optimistic and pessimistic options (in an attempt to cover all design possibilities) and the programmer must make the appropriate choice for their application. Will transactional memory systems also need to move this way and if so, how will the programmer work with that?
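For anyone unsure of the distinction, the two styles can be sketched side by side. This is a toy, not how any real transactional memory system is built: `VersionedCell` and its methods are hypothetical names, and a short lock stands in for the atomic snapshot and commit steps.

```python
import threading


class VersionedCell:
    """One value, updatable pessimistically or optimistically."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value
        self._version = 0
        self.conflicts = 0  # optimistic attempts that had to be retried

    def update_pessimistic(self, fn):
        # Hold the lock for the whole read-compute-write sequence.
        with self._lock:
            self._value = fn(self._value)
            self._version += 1
            return self._value

    def update_optimistic(self, fn):
        # Compute outside the lock, then commit only if nobody else
        # committed in the meantime; otherwise throw the work away and retry.
        while True:
            with self._lock:
                seen_version, seen_value = self._version, self._value
            new_value = fn(seen_value)
            with self._lock:
                if self._version == seen_version:
                    self._value = new_value
                    self._version += 1
                    return new_value
                self.conflicts += 1

    def read(self):
        with self._lock:
            return self._value


cell = VersionedCell(10)
cell.update_pessimistic(lambda v: v + 1)
cell.update_optimistic(lambda v: v * 2)
assert cell.read() == 22
```

The optimistic path wins when conflicts are rare and `fn` is cheap to redo. When conflicts are common or `fn` is expensive, the retries pile up, which is exactly why the choice can't be made once for every algorithm.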

Asserting Order

I’m not going to write off transactional memory, but it seems that even should it turn out to be more scalable than the conventional lock-based approaches we use:

  1. It’s really not much simpler to program with.
  2. It’s no use in the distributed case.

Meanwhile:

  1. There are other approaches around that do work across both multi-core and multi-box/distributed cases with little change (some would argue the amount of change is zero but I don’t buy that).
  2. Dealing with concurrency is about much more than whether you use locks or transactions.


The longer one holds onto the single shared memory, multi-core, big box approach, the harder and more costly it gets to shift to distributed.

Every time we buy a bigger box for increased load we’re wasting money come the day there isn’t a bigger box to buy (something that is looking increasingly likely for many of us). All that money would have been better spent on buying racks of smaller boxes. It’s possible we can recover some of our losses by repurposing that big iron via virtualization rather than throwing it away (like all our previous big boxes) but of course, if that box dies it takes an awful lot of VMs with it.

Every time we assume we can keep all our data in a single memory or database (even if it’s a cluster) we’re embedding assumptions into our software that will be broken come the day we must partition across multiple memories or databases.

Each time we choose an algorithm that doesn’t easily partition or assumes a single memory/database we’re storing up trouble in our data and computational models.
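A small sketch shows where the single-memory assumption bites. `PartitionedStore` is a made-up class with in-process dicts standing in for separate machines or databases, and the hash-modulo routing is just the simplest scheme I could pick, not a recommendation.

```python
import hashlib


class PartitionedStore:
    """Key-value store spread over several independent partitions."""

    def __init__(self, partitions):
        # Each dict stands in for a separate memory or database.
        self._parts = [dict() for _ in range(partitions)]

    def _index(self, key):
        # Stable hash so a key always routes to the same partition.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._parts)

    def put(self, key, value):
        self._parts[self._index(key)][key] = value

    def get(self, key, default=None):
        return self._parts[self._index(key)].get(key, default)

    def count(self):
        # A whole-dataset question now has to visit every partition.
        return sum(len(part) for part in self._parts)


store = PartitionedStore(partitions=4)
store.put("user:1", "alice")
store.put("user:2", "bob")
assert store.get("user:1") == "alice"
assert store.count() == 2
```

Single-key operations partition easily. Anything that wants to see all the data at once, like `count()` here, or to update two keys atomically, is where the embedded assumptions start to break.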

In big monolithic systems it’s possible to create (by force) a never-fails environment which allows developers to ignore various edge cases. The move to a system built out of many separate parts makes failure almost impossible to avoid. This requires us to adjust our system design to take account of all those edge cases we previously ignored.

The time we spend gaining experience in building big monolithic systems has limited application when we switch to building distributed systems. We must learn new habits and adopt new modes of thought and that costs time.

In the worst cases, an organization’s processes, tools and departmental structure become so heavily optimized for managing these big monolithic software and hardware systems that they need serious revision to cope with the move to horizontal, many-box scaling. Typical problem areas include:

  1. Monitoring – suddenly there’s a much greater number of machines to gather stats from. Existing GUI representations mightn’t cope with such a large number.
  2. Diagnosis – no longer does a single timestamp imply an order on events, making analysis of logging information and root cause identification harder.
  3. Deployment – previous methods simply break as the level of automation provided is inadequate for the number of machines and software components involved.
  4. Testing – existing testing practices where everything can live on the developer’s desktop or in a single VM are no longer viable. There are too many moving parts and the convenience of isolation provided by testing at the desktop or in a single VM is lost.
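On the diagnosis point, one well-known way to recover an order on events without a shared wall clock is a Lamport logical clock. It isn't something the list above prescribes; I'm swapping it in as an illustration, and the class, node names and log messages below are all invented.

```python
class LamportClock:
    """Logical clock: orders events by causality, not by wall time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance our own counter.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned stamp travels with the message.
        return self.tick()

    def receive(self, sender_time):
        # Jump ahead of whatever the sender had seen, then count the event.
        self.time = max(self.time, sender_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
log = []
log.append((a.tick(), "a", "request accepted"))
stamp = a.send()
log.append((stamp, "a", "message sent to b"))
log.append((b.receive(stamp), "b", "message received"))
log.append((b.tick(), "b", "work done"))

# Sorting by (logical time, node) never puts an effect before its cause.
assert sorted(log) == log
```

Each machine's log lines carry the logical stamp, so merged logs can be sorted sensibly even when the machines' real clocks disagree. Events with no causal link still get an arbitrary relative order, so it helps analysis rather than solving it.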

I doubt threads will ever go away but learning to build and manage systems constructed in any of the following ways might be worthwhile:

  1. Multiple communicating reliable processes on a reliable bus
  2. Multiple communicating unreliable processes on a reliable bus
  3. Multiple communicating unreliable processes on an unreliable bus

[ Where bus is typically a backplane or a network ]
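To give a flavour of the third case, here is a toy sketch of a sender retrying over a bus that loses both messages and acknowledgements, with the receiver discarding duplicates. Every name in it (`LossyBus`, `Receiver`, `send_with_retry`) is hypothetical and the losses are scripted rather than random, so it is a thought experiment, not a protocol.

```python
class LossyBus:
    """Bus that loses requests or acknowledgements on scripted attempts."""

    def __init__(self, drop_requests=(), drop_acks=()):
        self._drop_requests = set(drop_requests)
        self._drop_acks = set(drop_acks)
        self._attempt = 0

    def send(self, receiver, msg_id, payload):
        self._attempt += 1
        if self._attempt in self._drop_requests:
            return False  # request lost before reaching the receiver
        receiver.handle(msg_id, payload)
        if self._attempt in self._drop_acks:
            return False  # delivered, but the sender never hears about it
        return True


class Receiver:
    def __init__(self):
        self.seen = set()
        self.applied = []

    def handle(self, msg_id, payload):
        if msg_id in self.seen:  # duplicate caused by a retry: ignore
            return
        self.seen.add(msg_id)
        self.applied.append(payload)


def send_with_retry(bus, receiver, msg_id, payload, max_attempts=5):
    # The sender cannot tell a lost request from a lost acknowledgement,
    # so it resends and relies on the receiver to discard duplicates.
    for attempt in range(1, max_attempts + 1):
        if bus.send(receiver, msg_id, payload):
            return attempt
    raise TimeoutError(f"no acknowledgement for message {msg_id}")


bus = LossyBus(drop_requests={1}, drop_acks={2})
receiver = Receiver()
attempts = send_with_retry(bus, receiver, "m1", "debit 10")
assert attempts == 3
assert receiver.applied == ["debit 10"]  # applied once despite the resend
```

None of this machinery is needed in the never-fails monolith, which is rather the point: the edge cases become the design.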


One got addicted and the other ran away
Some settle down a familiar place
One lets go the wheel while the other one steers

One got the money that the other put away
Some held the world and the others couldn’t stay
A few just follow their dreams while the others stood clear
After all these years
After all these years

“All These Years” by Adema, from Kill the Headlights
