As I’ve said in a previous posting, I’m not a fan of infrastructure-level clustering. It basically comes down to the fact that this kind of approach achieves resilience and scale through centralization and strict control of the environment. Whilst that level of control and centralization might be possible in certain well-defined, small-scale circumstances, it gets much more difficult across a network of any size and with any more than a few machines.

There are other problems too, such as the fact that clustering in this fashion is infrastructure-specific, not application-specific. Basically, the infrastructure will offer you configuration options that fit with what it can support or implement generically, in an application-unaware fashion. You must then fit your application around what the infrastructure offers, which can mean design or performance compromises, or extra maintenance load (hand-holding by system admins etc).

There can be no argument that many adopt this approach and accept the compromises and, for a certain class of system, as I’ve said, it’s probably acceptable. However, most systems grow and evolve over time, such that the compromises you made are no longer acceptable or viable. They become an albatross around the architect’s neck. Witness the number of people entrenched in battles to get scaling out of application servers or to add new services etc.

This approach to clustering often goes hand in hand with denial of the realities of networking. It’s often assumed that nothing breaks, there’s always enough bandwidth etc. Everything is coded as if it runs in one JVM, creating a single, difficult-to-manage, monolithic piece of code which is deliberately naive of its underpinnings. I find that ironic, given that the people who adopt this approach quite often talk about loose coupling, service-oriented architecture, networked applications etc.

In an ironic twist, the people that choose to build things this way then throw their hands up in horror and complain when things become difficult to scale, or they haven’t got the level of control they want, or they don’t like the way their code has turned out, or whatever. What did they expect? They turned all these responsibilities over to some vendor’s software, stating “I don’t want to deal with that, you handle it, just tell me what to do”, and then in the very next breath said “why on earth did you tell me to do it that way?”

So if I don’t want to suffer this, what might I do? Start building more modular, network-aware components which can be dynamically interlinked at runtime. And find a way to achieve resilience and scale at the application level using simple, unreliable components. I said in my previous posting:

“I can’t help but feel that there must be a better way…”

Yep, there is, and I have a prototype to prove it. And, in case you’re wondering: yes, the prototype does have persistent state, and no, it doesn’t require RAID or any other cool hardware. A bunch of blades on a network is all that’s needed. And of course, it uses Jini.
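
To make “resilience at application level using simple unreliable components” a little more concrete, here is a minimal Java sketch. It is not the prototype and it uses no Jini APIs. FailoverClient and the replica suppliers are made-up names for illustration, and each replica is assumed to be nothing more than a call that may throw when its machine or the network is down. The point is only that the application, not the infrastructure, decides what to do when a component fails.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper: tries each replica in turn until one answers. */
final class FailoverClient<T> {
    private final List<Supplier<T>> replicas;

    FailoverClient(List<Supplier<T>> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("need at least one replica");
        }
        this.replicas = List.copyOf(replicas);
    }

    /** Returns the first successful result, or throws if every replica failed. */
    T call() {
        RuntimeException last = null;
        for (Supplier<T> replica : replicas) {
            try {
                return replica.get();
            } catch (RuntimeException e) {
                // Failure is expected here, not exceptional: remember it and move on.
                last = e;
            }
        }
        throw new IllegalStateException("all " + replicas.size() + " replicas failed", last);
    }
}

final class FailoverDemo {
    public static void main(String[] args) {
        // Stand-ins for remote calls; in a real system these would go over the network.
        Supplier<String> dead = () -> { throw new RuntimeException("connection refused"); };
        Supplier<String> alive = () -> "state from blade-2";

        FailoverClient<String> client = new FailoverClient<>(List.of(dead, alive));
        System.out.println(client.call()); // prints "state from blade-2"
    }
}
```

A real system would also need discovery of replicas at runtime and some way of keeping their state in step, which this sketch deliberately leaves out.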
