Archive for May, 2007

IMHO, this is terrible for the sport. If one wishes to ban team orders, one must ruthlessly enforce the rules; otherwise a gray area is left open to exploitation. Ron Dennis stated that his reasons for issuing orders were to ensure his drivers finished the race. Trouble is, a convenient side effect of this action is that Alonso takes the ten points, which neatly helps to clean up the “issues” around the championship lead. Herein lies the problem: one cannot be certain that Ron Dennis had pure intentions. We must simply believe him, which leads to a bunch of possible, not-so-nice interpretations of the ruling:

  1. Lewis Hamilton brings a lot of extra interest to F1 and we wouldn’t wish to poison that with a team orders embarrassment.
  2. The stakeholders in the Monte Carlo race might not want its reputation tarnished.
  3. Nobody wants another Ferrari win, anything else is preferable.

So here we are then. Ferrari could be judged to have taken action all those years ago to give themselves the best chance of having one of their drivers win the championship, and that was punished. McLaren could be judged to have done exactly the same thing and weren’t punished. Just where is the line, then? Perhaps it would be better to accept that the team orders ban cannot be effectively applied and junk it, so as to avoid this kind of mess?


Intel wades in on the “we can’t do any more magic concurrency for software” issue.

It’s been debated often enough and always seems to come down to the claim that the average programmer isn’t able to cope with concurrency and needs higher levels of abstraction to do it for them. The thing is, we already have such abstractions (transactions, for example) and we know they can only take us so far. Worse, there are other abstractions out there, such as blackboard systems, which these average programmers either can’t or won’t try to cope with.

So what is to be done? Well, if the last couple of decades are anything to go by, absolutely nothing! Why? It’s the talent limit. How many chip designers are there in the world? How many motherboard designers? How many car designers? How many developers? I’m willing to bet that there are considerably more people in the developer category than any of the others. This is because we’ve lowered the bar in terms of developer quality to cope with the wide demand for bums on seats and, let’s face it, it’s unlikely that the average enterprise is going to change its policies in this respect. It’s interesting to note that it’s much harder to lower the bar in, for example, chip design: a chip either works or it doesn’t, whilst software is almost expected to be flaky these days.

Until we decide to clean house in software land, we’ll not get progress on these thorny issues, because they only matter to the few. When all is said and done, there is a school of thought that suggests software is good enough; concurrency, efficiency, quality and greenness be damned.

Update: An example of how challenging it can be to make concurrency easy can be found here. On the surface we’ve done some good things and yet we are still open to the simplest of errors (in this case a missing synchronized clause).
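To make that failure mode concrete, here is a minimal sketch in Python rather than Java. It is not the code from the linked example; `Counter`, `increment_unsafe`, `increment_safe` and `hammer` are names invented for illustration, and the lock stands in for the role a synchronized clause would play.

```python
import threading


class Counter:
    """Hypothetical shared counter used only for this illustration."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self):
        # Read-modify-write with no lock: two threads can interleave here,
        # so an update may be lost.
        current = self.value
        self.value = current + 1

    def increment_safe(self):
        # The lock makes the read-modify-write a single critical section.
        with self._lock:
            current = self.value
            self.value = current + 1


def hammer(increment, threads=8, per_thread=10_000):
    """Call `increment` from several threads and wait for them all."""
    def work():
        for _ in range(per_thread):
            increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


if __name__ == "__main__":
    safe, unsafe = Counter(), Counter()
    hammer(safe.increment_safe)
    hammer(unsafe.increment_unsafe)
    print("safe:  ", safe.value)    # always threads * per_thread
    print("unsafe:", unsafe.value)  # may fall short, depending on scheduling
```

The two methods differ by one line, which is rather the point: the abstraction looks tidy and the omission is easy to miss in review.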


I’m a firm believer in making the minimum number of mistakes, and one of the most effective ways to achieve this is to learn from the history of others, particularly those leading the field. In distributed systems, one of the leaders is Amazon. It could be argued that Amazon is unlike anything else out there and thus its experience is not applicable. However, it’s surely a mistake not to look at Amazon and see if there’s anything we might find useful, simply because if we are successful we could face the same problems. Here then is my perspective on some of the more interesting published data-points:

Amazon.com started 10 years ago as a monolithic application, running on a Web server, talking to a database on the back end. This application, dubbed Obidos, evolved to hold all the business logic, all the display logic, and all the functionality that Amazon eventually became famous for: similarities, recommendations, Listmania, reviews, etc. For years the scaling efforts at Amazon were focused on making the back-end databases scale to hold more items, more customers, more orders, and to support multiple international sites. This went on until 2001 when it became clear that the front-end application couldn’t scale anymore…….

……..The many things that you would like to see happening in a good software environment couldn’t be done anymore; there were many complex pieces of software combined into a single system. It couldn’t evolve anymore. The parts that needed to scale independently were tied into sharing resources with other unknown code paths. There was no isolation and, as a result, no clear ownership.

At the same time, there was continued difficulty in the back-end database scaling effort. Databases—and by that time we were using several databases—were a shared resource, which made it very hard to scale-out the overall business. So both the front-end and back-end processes were restricted in their evolution because they were shared by many different teams and processes.

Notice how the monolithic, single-database architecture hasn’t just constrained scalability and performance but also the speed with which new features could be added. An enterprise might certainly argue that Amazon’s level of scale is irrelevant to it, but the ability to add features or change? That sounds like something we should all be interested in. Here’s how they changed things:

We went through a period of serious introspection and concluded that a service-oriented architecture would give us the level of isolation that would allow us to build many software components rapidly and independently. By the way, this was way before service-oriented was a buzzword. For us service orientation means encapsulating the data with the business logic that operates on the data, with the only access through a published service interface. No direct database access is allowed from outside the service, and there’s no data sharing among the services.

Over time, this grew into hundreds of services and a number of application servers that aggregate the information from the services. The application that renders the Amazon.com Web pages is one such application server, but so are the applications that serve the Web-services interface, the customer service application, the seller interface, and the many third-party Web sites that run on our platform.

If you hit the Amazon.com gateway page, the application calls more than 100 services to collect data and construct the page for you….

…….It depends a bit on what kind of page you visit—whether it is a product page, a checkout page, etc. It also depends on how effective caching is for the objects on that page, as well as some other factors.

Many are deeply concerned with avoiding remoteness whenever possible. Notice that Amazon don’t run away screaming; rather, they engineer pragmatically, using caching and similar techniques.
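As a rough sketch of that pragmatism (and emphatically not Amazon’s implementation), the following shows a page being assembled from several remote services, each fronted by a small time-to-live cache. `CachedService`, `render_page`, the service names and the 60-second TTL are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class CachedService:
    """Wraps a (hypothetical) remote call with a simple TTL cache."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, str]] = {}
        self.remote_calls = 0  # counts how often we actually go remote

    def get(self, key: str) -> str:
        now = time.monotonic()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: no remote call needed
        self.remote_calls += 1
        value = self._fetch(key)
        self._cache[key] = (now, value)
        return value


def render_page(product_id: str, services: Dict[str, CachedService]) -> Dict[str, str]:
    """Aggregate one fragment per service into a page model."""
    return {name: svc.get(product_id) for name, svc in services.items()}


if __name__ == "__main__":
    services = {
        "reviews": CachedService(lambda pid: f"reviews for {pid}", ttl_seconds=60),
        "similar": CachedService(lambda pid: f"similar to {pid}", ttl_seconds=60),
    }
    render_page("book-1", services)
    render_page("book-1", services)  # second render is served from cache
    print({name: svc.remote_calls for name, svc in services.items()})
```

The remoteness is still there; the cache simply decides how often the page has to pay for it.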

The first and foremost lesson is a meta-lesson: If applied, strict service orientation is an excellent technique to achieve isolation; you come to a level of ownership and control that was not seen before. A second lesson is probably that by prohibiting direct database access by clients, you can make scaling and reliability improvements to your service state without involving your clients. Other lessons are related to how you access services: If you want to be able to aggregate services easily, if you want to insert advanced infrastructure techniques such as decentralized request routing or distributed request tracking, you need a single unified service-access mechanism.

Another lesson we’ve learned is that it’s not only the technology side that was improved by using services. The development and operational process has greatly benefited from it as well. The services model has been a key enabler in creating teams that can innovate quickly with a strong customer focus. Each service has a team associated with it, and that team is completely responsible for the service—from scoping out the functionality, to architecting it, to building it, and operating it.

There is another lesson here: Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view. The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.

There’s a hint here of how Amazon do what they do technically, and notice how the benefits of this approach also touch on process and quality. For example, development teams can make more independent progress than they might were they all forced to work in lockstep via centralized approaches.

There is quite a bit of development happening in Eclipse, but IntelliJ’s IDEA is also popular for Java development. Some development happens in Visual Studio. Developers of our services can use any tools they see fit to build their services. Developers themselves know best which tools make them most productive and which tools are right for the job. If that means using C++, then so be it. Whatever tools are necessary, we provide them, and then get the hell out of the way of the developers so that they can do their jobs…..

…….I think part of the chaotic nature—the emerging nature—of Amazon’s platform is that there are many tools available, and we try not to impose too many constraints on our engineers. We provide incentives for some things, such as integration with the monitoring system and other infrastructure tools. But for the rest, we allow teams to function as independently as possible. Developers are like artists; they produce their best work if they have the freedom to do so, but they need good tools. As a result of this principle, we have many support tools that are of a self-help nature. The support environment around the service development should never get in the way of the development itself.

Here’s further evidence of how Amazon “decouple” their development teams. The teams are empowered to choose the tools for the job that make them most effective but at the same time they are expected to follow some guidelines in respect of monitoring and other infrastructure.

We have a very good understanding of how customers interact with the site as is. When we expose new features we measure how they change the customer’s behavior. For example, does it take the customer fewer steps to find what he or she needs? This is hard because you are measuring human behavior; there are some things that customers are delighted about immediately and there are other things that they have to get used to…..

…..We measure whether or not a new feature is successful in terms of customer satisfaction: Do people find things more easily? If we can improve the convenience of shopping on Amazon, then we have booked a major success. If we can help them find things that they might not have thought of before, that is also excellent. Customers tend to vote with their wallets, so if there is a clear negative result, we know what to do with that service.

Measurement is king and we’re not just talking performance stats!

First thing, I think there’s a whole list of good practices that we have in terms of design, in terms of architecture, in terms of building. And one of those points — one of the bullets on that list is that you have to design for failure, meaning that failure of components, whether they’re hardware, software, humans, is a fact of life, and you have to architect as if they are continuously happening to you. And if you do that and you happen to hit a good streak, then you’re fine. But failure in any large-scale system is the normal case, not the exception. So build, for example, for fast recovery. That’s an essential part. You know, stuff fails, comes back up, and you have to make sure that it can be inserted back into the functioning set as soon as possible.

Unreliability of all sorts means the same thing: no service. It’s not just about network failure or machine failure but also problems in software at multiple levels. Tackling these issues whilst maintaining service is challenging and typically requires the application to co-operate in an active fashion. This runs counter to the established norm, which is to write the application to be naive of such issues and run it on top of a cluster.

I think if you talk to anybody in industry that is responsible for running a very large-scale, geographically distributed, distributed system, such as Amazon is, relying on third parties, on vendors, to actually deliver this availability for you is very dangerous. We’ve seen that there are a number of vendors out there that are exceptional in providing highly available systems in very contained environments. There aren’t that many systems out there of the scale of Amazon. The problems that we have-I won’t say the problems-the challenges that we have in delivering this very highly available system…there are not that many others that have these kinds of challenges. And so third party software is clearly not geared to meeting our challenges there.

Great success brings with it great challenge. If you are a large, successful service-provider (be that web, enterprise, mobile etc) there may come a time when the vendors simply don’t cater for you. You have moved outside of their target market and the scope of both their product and their experience. Migrating away from vendors cleanly is going to be a heck of a hairy thing to achieve.


The author of Code Complete and my favourite project management book Rapid Development is blogging over at Construx.

This should be good…..


Dan Pritchett presents some good insight in respect of large scale systems.

I’m left to ponder the fact that whilst Dan is deliberately creating these systems….

I find myself looking at nondeterministic systems a lot lately. Many solutions for the challenges of extreme scale involve relaxing constraints and coping with the ensuing chaos.

….there’s many a techie out there building large systems unaware of the fact that they can’t assert order on such a beast. All they can do is hold back the wave of chaos whilst having to clear up the odd drop that made it over the levee (e.g. consequences of a network outage or operation ordering problems). They are a long way from appreciating that chaos is a given, let alone actively managing it.

In a further twist of irony, for all that chaos seems to introduce complexity, if you accept its existence and work with it you often end up with something simpler than was ever possible with “old world” thinking.


All seats are taken for what should be a really interesting day’s worth of interaction. But fear not, I’m reliably informed that the presentations will all be posted up on Google Video afterwards.

Nice one Google, many other conferences aren’t that willing to share.


There’s truth in your lies
doubt in your faith
what you build you lay to waste
there’s truth in your lies
doubt in your faith
all I’ve got’s what you didn’t take

So I
I won’t be the one
be the one to leave this in pieces
and you
you will be alone
alone with all your secrets
your regrets
don’t lie

Linkin Park - Minutes to Midnight


I realized a while ago that I have this bunch of material I keep to hand at all times and thought it might be worth sharing. It’s in the sidebar on the right. I’ll be adding to it over time; for a start, there’s some stuff over on my website that should be in there.

Enjoy.

Update: Added some more links and fixed all the ones the stupid HTML editor didn’t correctly tag, mutter.


So many times the hardest part of moving things forward is effecting change in culture and habitual behaviours.

Habitual behaviours are the worst because one can accept the need for change, even be making it but also be completely unaware of subconscious behaviours from the old world getting in the way or continuing to drive us.

Self-awareness, then (consciously knowing what we are doing), is very important if we are to make key cultural change. It requires mentoring, questioning and motivating. Alas, some people simply won’t be able to make the change. Most important of all, these changes happen sloooooowly, although the more people who “get it”, the more amplified the change gets and, hopefully, the quicker the remainder learn new habits.

So I’m left to ponder: just how often have we really changed things? How often have we fooled ourselves into believing we’ve achieved a revolution when actually we’ve managed little more than a slight evolution? If everyone can understand something so quickly, is that not because it fits with existing understanding and behaviours? If everyone gets to carry over their creature comforts (gets what they are used to), have we not re-asserted old world thinking?


…is that it takes forever to achieve and by the time you deliver it, the world you built it for is gone.

Better to make a quick best guess, ship it and then iterate it in concert with the changing of the world.


That’s right I’m stupid, I must be cos I learn something new and significant every day.

And I’m not talking about some API detail or some new framework. I mean I’m changing my thinking, I’m changing me, I’m seeing a bigger picture. I want topsight[1], not nitty gritty detail. Detail is easy to cope with if you can see the higher view, how things fit and interact.

If I were clever, there’d be nothing more to learn and I’d have answers to all questions. Knowledge is all very fine but it comes through learning and that comes through experiencing things and that’s what makes you smarter. Having a big brain isn’t enough.

[1] Topsight - a term used by Gelernter in Mirror Worlds, meaning the ability to view the whole system rather than small details.


How many times does something like this have to happen before we understand that:

1. Business isn’t about ethics.
2. Many a lawyer feeds off point (1).
3. The cost of (1) and (2) is a human cost.

And when will we all stand up, demand a change and make sure something is done about it? Next time your tax bill rises or you get bad service from a company who will you blame? Business or yourself for silently putting up with it?


Dare has this to say about API versioning. His basic approach can be summarized as don’t change anything at all and separate your URL space using some version moniker. This is a good piece of advice but I’m more wary of other aspects of the posting such as the suggestion that we should support all versions over time to maintain backward compatibility.
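A minimal sketch of what “separate your URL space using some version moniker” might look like follows. It is not taken from Dare’s post: the `/v1/contacts` and `/v2/contacts` paths, the handler names and the payload shapes are all invented for illustration.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[str], dict]


def contacts_v1(user: str) -> dict:
    # Original (hypothetical) shape: contacts as plain strings.
    return {"user": user, "contacts": ["alice"]}


def contacts_v2(user: str) -> dict:
    # Later shape lives under a new prefix; v1 is left untouched.
    return {"user": user, "contacts": [{"name": "alice"}]}


# Each (version, resource) pair maps to its own frozen handler.
ROUTES: Dict[Tuple[str, str], Handler] = {
    ("v1", "contacts"): contacts_v1,
    ("v2", "contacts"): contacts_v2,
}


def dispatch(path: str, user: str) -> dict:
    """Route '/<version>/<resource>' to the handler for that version."""
    parts = path.strip("/").split("/")
    if len(parts) != 2:
        raise LookupError(f"unrecognised path: {path}")
    handler = ROUTES.get((parts[0], parts[1]))
    if handler is None:
        raise LookupError(f"no handler for: {path}")
    return handler(user)


if __name__ == "__main__":
    print(dispatch("/v1/contacts", "bob"))
    print(dispatch("/v2/contacts", "bob"))
```

Note what the routing table implies: every entry is a handler somebody has to keep alive, which is where the worry about supporting all versions over time comes from.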

I’m tempted to suggest that this is a classic Microsoft (and enterprise) mindset: pile up a stack of legacy and backward-compatibility requirements, which can potentially lead to a mess of complexity and brittleness underneath the APIs. I feel the reasoning around client migration also needs re-examination, because Live Messenger doesn’t live in a browser. It is a native desktop binary and therefore has completely different characteristics from a genuine web application in terms of dispersal, upgrading and updates.

Consider that for many a browser-based application, upgrades are but a reload away. Further, many a web service that is mashed up as part of another offering will ultimately present to a user as a browser-based UI. That is, we don’t need all our desktops to upgrade; we need to upgrade the integration point in the backend services of the mashups that use the web service. That’s at least potentially considerably fewer entities needing to upgrade than is normal in the desktop world.

Dare’s posting also seems to have an underlying current of belief that one can define an API perfectly in the first place. We have known for a long time that such a feat of prediction is often very difficult though it can be made more manageable via functional partitioning.

Limited lifetime is one of the best antidotes we have to the complexity of legacy and the brittleness of code. We resist it far too often, leading to the mess we have in many an enterprise:

  1. Instability
  2. Slower Feature Delivery
  3. Difficult maintenance

The web, the human body and many real-world systems (cars, for example) exploit the concept of limited lifetime successfully. Perhaps this philosophy should be more widely applied in software?
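One way limited lifetime could be applied to a versioned API, sketched under assumptions of my own: each version carries a sunset date, after which requests are answered with HTTP 410 Gone instead of being served forever. The `SUNSET` table, the dates and `status_for` are hypothetical.

```python
from datetime import date
from typing import Dict, Optional

# Hypothetical sunset dates; None means the version has no end date yet.
SUNSET: Dict[str, Optional[date]] = {
    "v1": date(2008, 1, 1),
    "v2": None,
}


def status_for(version: str, today: date) -> int:
    """Return the HTTP status a request for `version` would get on `today`."""
    if version not in SUNSET:
        return 404  # never existed
    sunset = SUNSET[version]
    if sunset is not None and today >= sunset:
        return 410  # Gone: the version has reached the end of its lifetime
    return 200


if __name__ == "__main__":
    print(status_for("v1", date(2007, 5, 31)))
    print(status_for("v1", date(2008, 6, 1)))
```

The design choice is that retirement is declared up front, so clients know from day one that a version will not live forever.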


Patrick and Fuzzy jumped on something from Mark Baker about Apollo and whether or not it’s built on top of the web.

Mark’s argument essentially amounts to “make it work in the existing browser” as opposed to “extend beyond the browser”. There’s a fine dividing line here because the current environment provided by the browser is limited.

There are a multitude of ways to deal with the fact that browsers don’t support local file access etc. One might be to properly “web-enable” the local computer’s operating system having it provide access to resources etc via 127.0.0.1 in a browser friendly fashion.

Another would be something like Apollo or Java or …..

More interesting for me is that whilst the web is supposed to be this de-centralized thing (and it is indeed in terms of services and combining them together) we’re not really de-centralizing the implementation of individual services such that we are able to push substantial logic into the client-side.

The current model, as I’ve said before, can make maintenance of services easier because new code for the browser is but a page refresh away. I think we could maintain a good deal of this constraint whilst getting closer to something like Apollo.

[ And how Jini like is the web-deployment model? Seems very similar - push the impl to the client just when it needs it. ]

So maybe Apollo isn’t the solution, maybe the existing browser isn’t the solution, maybe we need to consider the browser to be a work in progress, in need of extension so we can build better “on top of the web”.

Who knows?

Why do we want this kind of logic in the client anyway? Well, because whilst statelessness is great and all, the reality is that if the state is not held inside a human being’s head, it’s going to be in the browser. Manipulating that state efficiently (not pushing it all back to the server in every call) dictates a fair amount of client-side intelligence. We might also want to batch things up and so on, to make better use of the network. Lastly, we might want enhanced user interaction, but eye-candy is sweet: you can never get enough and your browser could end up obese.
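The batching idea can be sketched in a few lines. This is an illustration only, written in Python rather than anything a browser would run; `BatchingClient`, `record`, `flush` and the `max_batch` threshold are invented names, and `send` stands in for whatever single network call the client would make.

```python
from typing import Callable, List


class BatchingClient:
    """Holds state changes locally and ships them to the server in groups."""

    def __init__(self, send: Callable[[List[dict]], None], max_batch: int = 3):
        self._send = send            # one call here = one network round trip
        self._max_batch = max_batch
        self._pending: List[dict] = []

    def record(self, change: dict) -> None:
        # Keep the change client-side until there is enough to justify a call.
        self._pending.append(change)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        # Send whatever is pending as a single request, if anything.
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)


if __name__ == "__main__":
    sent: List[List[dict]] = []
    client = BatchingClient(sent.append, max_batch=3)
    for n in range(4):
        client.record({"n": n})
    client.flush()
    print(len(sent), "round trips for 4 changes")
```

A real client would also flush on a timer or before the page unloads; that is left out to keep the sketch small.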

[For more on state in web apps, see Dave Orchard's note]


Leave cement lying still for too long and it sets. Once it sets, making adjustments becomes exceedingly difficult. You’ll need pneumatic drills to break it down and bulldozers to remove it. Then you’ll need to bring in a whole new batch to be re-cast in accordance with your wishes.

So it is that software left unstirred for too long becomes brittle and will demand wholesale replacement. One can build a certain amount on such a foundation but eventually it will need a complete reconstruction affecting everything built atop. In this world stirring is refactoring.

Cement has other properties that determine just how quickly it sets, and it is the same for software, in the form of dos and don’ts like loose coupling and separation of concerns. It’s fascinating to note that we spend a massive amount of time focusing on making software malleable at compile/build time (Spring, anyone?) but considerably less effort on ensuring similar flexibility post-deployment, making for brittleness in the face of failure, upgrade, configuration changes, scaling etc.

Some aspects of this analogy (not sure what the equivalent of stirring would be) can also be applied to organizational behaviours. For example, if one focuses a business heavily on tactical operation for an extended period of time, adapting to more strategic operations will be impeded by its structure, modus operandi and culture. Necessary changes might include hiring, firing, reorganization (with significant impact on existing hierarchy), process revision etc.

[Blame the child in me for the gratuitous photos of construction tools :) ]
