Archive for the “Architecture” Category

Alright, we’ve previously established that at least some enterprises have a substantial software investment outside of the classic business process arena. We’ve also seen an example of advice that fails to take account of this class of enterprise. Now it’s time to talk architecture.

The more traditional enterprises (those that only need software for business process automation and support) made a mistake in their past which needs undoing: they focused on building applications, not systems. Each application was designed to tackle some individual aspect of their business processes, which caused much pain when the applications needed to be integrated. The result has been a trend (SOA) to break up all those application silos into a collection of shared services on top of which appropriate applications can be built by the enterprises themselves or by others. In the latter case, shared services must in some way be exposed to others outside the firewall.

Creating services as above is merely one method of partitioning code and data. This method does not apply so well outside of business processes, so we must find another model. ROA presents one such model, one that is data-centric and works well in the world of the Web where we desire ad-hoc (chaotic) assembly of resources (mashups). The nature of the Web is such that it can be difficult to know who your users are (there are too many) and to manage transitions from one version of an API to another. Basically, you don’t know what your dependencies are, and it can therefore be difficult to measure the impact of changes. Some have suggested that it’s not appropriate to retire old APIs or resources, but keeping them carries a significant maintenance cost, so at least some organisations do deprecate old APIs and retire them eventually. ROA, like any other architecture, has its limitations.

ROA is of course derived from REST. REST includes a set of constraints which are essentially just useful architectural patterns. Some would claim they deliver scalability but I prefer to state that they are “scalability enabling”: they don’t inhibit scaling. However, it’s important to realise that behind the Web layer one must still build scalable infrastructure. It is building this infrastructure the right way (architecturally) that yields scalability, not REST itself. Many websites rely on running multiple copies of their web application and scaling via their database and caching solution, which will often be more than enough.

However there is a class of enterprise website for which this approach fails, because the consistency models provided by databases and the like actually can’t scale as far as is required. A further complication is that managing everything as a single web application becomes impossible. Each part of the application has its own unique demands in respect of tuning, configuring, monitoring and maintenance:

  • Tuning for one part of the application has adverse effects on other parts.
  • Configuration becomes a nightmare because there are so many different settings to worry about. Something that works well in testing isn’t appropriate for production, leading to separate profiles that must be maintained and kept in sync, which in turn leads to forgotten changes etc.
  • Monitoring produces so much of a mixture of data that it becomes a major exercise to filter out just what you need.
  • Maintenance becomes an exercise in chasing down long chains of dependencies to make a simple change.

It becomes necessary to break up the application, storage, caching and so on into more manageable pieces that can run separately as a distributed system. Each element provides a service but not as would generally fit with the “classic” definition of SOA. Our requirements for partitioning are driven by multiple forces and thus the decision as to how exactly to break up the application must be determined on a case by case basis. Such a decision could be driven by amongst other things business model (web applications are still surrounded by business processes), scaling needs, specific storage requirements of underlying data or provision of a specific feature for the website.

One might choose to expose such a service using WS-*, messaging, a Web/Resource approach, CORBA or even some form of custom service invocation layer. Perhaps surprisingly there’s a growing number of examples where the custom service invocation layer option is used. I believe this is because all other approaches represent a compromise achieved through limitation of architectural options in such a manner as to be inappropriate for these demanding cases.

We cannot call this architectural approach SOA, ROA, EDA or anything else; it is simply about creating isolated, independent elements and minimising dependencies. It is something we’ve been doing inside of our programs for years. It also allows us to construct a working, manageable system at large scale. It is common sense. CSA anyone?


Check out this article from Computing. It is apparently good advice for SOA implementation but, as mentioned in my previous post, something has been forgotten – some enterprises provide software as a web app that is their product and revenue generator. This software could be rendered into services behind the firewall, yet is not about business processes and must be treated differently.

A quote from the article:

Mistake No. 3: Leaving SOA to the techies

When the SOA process is left mostly with the IT side of the organisation, services risk being designed to optimise software performance and reliability, but may not fully reflect the business requirements.

Clarity of business interfaces is essential for cross-application integration or multi-organisation use.

What about an interface that provides a specific website feature and is a service in its own right? Such an interface is unlikely to be exposed across organizations because it provides a business-specific feature we do not wish to share with others. Further, such an interface probably has few business requirements, though the underlying service may need to support auditing or customer tracking tools.

A further quote:

Mistake No. 1: Irrational SOA exuberance

Excessive numbers of services - those that cannot be readily matched to the business model of the application - are a sign of an SOA environment where applications need to be checked as they are completed.

Such environments may feature repositories full of services, volumes of documentation and an impressive collection of new tools and middleware, but what they will not have is agility, incremental software versioning or reuse.

Again let’s consider a service that provides a website feature such as recommendations. How much does it have to match the business model? One might argue that SOA is only concerned with business processes but surely we can model other things as services?

So what exactly is a service, what is SOA and where does REST fit in? I’ll cover that next….


One designs or architects solutions to problems – technologies, be they ESBs, CORBA etc., are merely implementations of (parts of) an architectural solution. Note that one might compromise one’s architecture slightly because it’s expedient to use an off-the-shelf product. Regardless, the important point is that architecture comes first, not last, and yet that’s rarely the case these days. Instead, technology rules…

Consider how prevalent the use of frameworks is within our industry and think about the fact that in many cases one simply writes a POJO or two and leaves the rest to the framework. The framework makes life easier, it solves the big problems but it also exerts force on the design of our software as after all we must write it to follow the appropriate conventions, implement the appropriate methods etc.

The very worst example of the framework trend is seen in the decision to purchase a mammoth framework offering that provides everything in one box as an “integrated solution”: a huge stack that gets connected into everything and exerts massive gravity on our architecture. Everything becomes an exercise in warping aspects of our system to fit with this stack and the assumptions of its creators. Essentially we’ve bought “architecture in a box”.

We’ve been doing this sort of thing for a long time and there’s even a business case to go with it. Enterprises want commodity developers. These chaps are not trusted to take on the bigger challenges rather it’s deemed appropriate to use frameworks (in the form of middleware) to address these big issues and confine the developers to the task of simply implementing the business logic.

There’s an entire industry of analysts and others devoted to producing endless tech comparisons to determine just which of the myriad of frameworks will be the single, final silver bullet solution that allows one to implement an entire system in a matter of weeks. This stuff is gobbled up by the commodity programmer brigade.

Changing these behaviours is challenging: Architecture RIP? Possibly, but a recent comment on ESBs from Ron Schmelzer means there’s still hope. And there is at least one part of the developer universe that sees value in architecture.


The longer one holds onto the single shared memory, multi-core, big box approach, the harder and more costly it gets to shift to distributed.

Every time we buy a bigger box for increased load we’re wasting money come the day there isn’t a bigger box to buy (something that is looking increasingly likely for many of us). All that money would have been better spent on buying racks of smaller boxes. It’s possible we can recover some of our losses by repurposing that big iron via virtualization rather than throwing it away (like all our previous big boxes) but of course, if that box dies it takes an awful lot of VMs with it.

Every time we assume we can keep all our data in a single memory or database (even if it’s a cluster) we’re embedding assumptions into our software that will be broken come the day we must partition across multiple memories or databases.

Each time we choose an algorithm that doesn’t easily partition or assumes a single memory/database we’re storing up trouble in our data and computational models.
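As one illustration of what "easily partitions" can mean, a mean can be computed from small per-partition summaries that are merged afterwards, whereas something like an exact median cannot be combined so simply. The following is a minimal sketch, assuming the data is already split across partitions; the function names `partial_mean` and `combine_means` are hypothetical and chosen only for this example.

```python
from typing import Iterable, List, Tuple


def partial_mean(values: Iterable[float]) -> Tuple[float, int]:
    """Summarise one partition as (sum, count); this can run where the data lives."""
    total, count = 0.0, 0
    for value in values:
        total += value
        count += 1
    return total, count


def combine_means(partials: Iterable[Tuple[float, int]]) -> float:
    """Merge per-partition summaries into a global mean."""
    parts = list(partials)  # materialise so we can make two passes
    total = sum(s for s, _ in parts)
    count = sum(c for _, c in parts)
    if count == 0:
        raise ValueError("no data in any partition")
    return total / count


# Three partitions standing in for three separate memories or databases.
partitions: List[List[float]] = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
overall = combine_means(partial_mean(p) for p in partitions)
```

Only the small (sum, count) pairs cross partition boundaries, so nothing in the calculation assumes the full data set fits in one memory.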

In big monolithic systems it’s possible to create (by force) a never-fails environment which allows developers to ignore various edge cases. The move to a system built out of many separate parts makes failure almost impossible to avoid. This requires us to adjust our system design to take account of all those edge cases we previously ignored.

The time we spend gaining experience in building big monolithic systems has limited application when we switch to building distributed systems. We must learn new habits and adopt new modes of thought and that costs time.

In the worst cases, an organization’s processes, tools and departmental structure become heavily optimized for managing these big monolithic software and hardware systems such that it needs serious revision to cope with the move to horizontal, many box scaling. Typical problem areas include:

  1. Monitoring – suddenly there’s a much greater number of machines to gather stats from. Existing GUI representations mightn’t cope with such a large number.
  2. Diagnosis – no longer does a single timestamp imply an order on events making analysis of logging information and root cause identification harder.
  3. Deployment – previous methods simply break as the level of automation provided is inadequate for the number of machines and software components involved.
  4. Testing – existing testing practices where everything can live on the developer’s desktop or in a single VM are no longer viable. There are too many moving parts and the convenience of isolation provided by testing at the desktop or in a single VM is lost.
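On the diagnosis point, one well-known way to recover an ordering of events without trusting wall-clock timestamps is a Lamport logical clock. The post does not name a technique, so what follows is an illustrative sketch rather than a recommendation from the original text; the class and method names are assumptions.

```python
class LamportClock:
    """Logical clock: a counter that only moves forward, per process."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Record a local event and return its logical timestamp."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, remote_time: int) -> int:
        """Advance past both our own time and the sender's time."""
        self.time = max(self.time, remote_time) + 1
        return self.time


# Two processes, each with its own clock and no shared wall clock.
a, b = LamportClock(), LamportClock()
t_send = a.send()            # process a sends a message
b.tick()                     # process b does some unrelated local work
t_recv = b.receive(t_send)   # b receives a's message
```

If each log line carries its logical timestamp, a receive is always stamped later than the matching send, which gives a usable partial order for root cause analysis even when machine clocks disagree.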

I doubt threads will ever go away but learning to build and manage systems constructed in any of the following ways might be worthwhile:

  1. Multiple communicating reliable processes on a reliable bus
  2. Multiple communicating unreliable processes on a reliable bus
  3. Multiple communicating unreliable processes on an unreliable bus

[ Where bus is typically a backplane or a network ]


There’s been some renewed discussion on the relative merits of push and pull for circulating changes.

What I find fascinating is how there’s often a tendency to polarize solutions one way or the other – either we’re entirely push (with failover support etc because we absolutely cannot afford for it to fail) or we must be entirely pull (and worry about what speed of polling to use and build infrastructure that can scale with it etc).

The Good and Bad of Push

Push allows for timely delivery of information updates. If the rate is high enough it makes sense to batch updates together for more efficient delivery. Significantly from the perspective of most, push ensures that we burn CPU cycles as and when there’s something worth doing in contrast to pull where we can waste cycles (though some can be saved with e.g. appropriate use of caching) finding out nothing has changed.

The downside to push comes when clients can’t receive their updates due to network partition or their own downtime (failure, running out of battery power, whatever). When this happens, if we stay push focused we must build appropriate mechanisms for tracking what messages a client has or has not received and hold on to them which can get messy/complex.

And how do we know the client is back? Because it will reconnect, it will pull if you will…..

The Need to Pull

Pull allows a client to dictate when it receives its updates and can be particularly attractive in the case of slow update rates. Pull also allows us to recover from various lost event scenarios like:

  1. Delivery failure – given a rough idea of rate of event delivery and a period of silence (that is no event has been received) we can perform a check for lost events by performing a pull. And a failed pull tells us quite clearly something is broken.
  2. Client offline or dead for some period of time.

Recovery is performed by going back to the "event archive" and finding all the events we missed (we can easily do this so long as we have noted the last event we’ve seen, this works really nicely if we do batching of events) after which we can return to the push mode of operation.

We can limit the size of the archive somewhat by bounding the maximum amount of time a client can be down for whilst still being able to restore itself.

To make this work requires that we provide some way to identify each event uniquely and the ability to page through the "event archive" efficiently.
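The recovery scheme described above can be sketched as follows. This is a minimal in-memory illustration, assuming event ids are unique and monotonically increasing; `EventArchive`, `page_after` and `catch_up` are hypothetical names, and a real archive would sit behind whatever storage or protocol is in use.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    event_id: int  # assumed unique and monotonically increasing
    payload: str


class EventArchive:
    """In-memory stand-in for the "event archive"."""

    def __init__(self) -> None:
        self._ids: List[int] = []
        self._events: List[Event] = []

    def append(self, payload: str) -> Event:
        event = Event(len(self._events) + 1, payload)
        self._ids.append(event.event_id)
        self._events.append(event)
        return event

    def page_after(self, last_seen_id: int, limit: int) -> List[Event]:
        """Return up to `limit` events with an id greater than last_seen_id."""
        start = bisect.bisect_right(self._ids, last_seen_id)
        return self._events[start:start + limit]


def catch_up(archive: EventArchive, last_seen_id: int,
             page_size: int = 2) -> Tuple[List[Event], int]:
    """Pull every missed event page by page; return them and the new last-seen id."""
    missed: List[Event] = []
    while True:
        page = archive.page_after(last_seen_id, page_size)
        if not page:
            return missed, last_seen_id
        missed.extend(page)
        last_seen_id = page[-1].event_id


archive = EventArchive()
for payload in ["a", "b", "c", "d", "e"]:
    archive.append(payload)

# A client that last saw event 2 before going offline pulls the rest.
missed, last_seen = catch_up(archive, last_seen_id=2)
```

Once `catch_up` returns an empty page the client holds everything up to `last_seen` and can return to the push mode of operation.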

The Best Of Both

Rather than focus solely on either approach in isolation, I think the best solution is to use a combination. This has several advantages:

  1. Clients can potentially use whichever method is more appropriate for them.
  2. It provides significant opportunity for fine tuning.
  3. It provides a nice simple recovery model.
  4. Responsibility is balanced throughout the system keeping complexity down.

[ I'm not alone in this belief as Bill describes exactly such a hybrid approach from the perspective of his favourite technologies (I quite like them too). What I wanted to do was describe the underpinning patterns because I believe this allows us to be technology agnostic and build a working system in whatever environment we're faced with (for example JavaSpace05 could be used as a substrate). ]

Update: A variation on the scheme allows a client to pull some base state and a set of events from the archive after which it resumes listening to events. The size of the archive can then be managed by every so often updating the base state and storing events since then – basically we’re checkpointing.
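A rough sketch of this checkpointing variation, assuming the state is a simple key/value map and each event sets one key. All names here (`CheckpointedArchive`, `restore`, the event tuple layout) are hypothetical and exist only to illustrate the idea.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]
Event = Tuple[int, str, int]  # (event_id, key, new_value), an assumed layout


def apply_event(state: State, event: Event) -> None:
    _, key, value = event
    state[key] = value


class CheckpointedArchive:
    def __init__(self) -> None:
        self.base_state: State = {}
        self.base_event_id = 0          # id of the last event folded into base_state
        self.events: List[Event] = []   # events recorded since the last checkpoint

    def append(self, key: str, value: int) -> Event:
        last_id = self.events[-1][0] if self.events else self.base_event_id
        event = (last_id + 1, key, value)
        self.events.append(event)
        return event

    def checkpoint(self) -> None:
        """Fold stored events into the base state so the archive can shrink."""
        for event in self.events:
            apply_event(self.base_state, event)
            self.base_event_id = event[0]
        self.events.clear()

    def snapshot(self) -> Tuple[State, int, List[Event]]:
        """What a restoring client pulls: base state, its event id, later events."""
        return dict(self.base_state), self.base_event_id, list(self.events)


def restore(archive: CheckpointedArchive) -> Tuple[State, int]:
    """Rebuild current state from base state plus events since the checkpoint."""
    state, last_id, events = archive.snapshot()
    for event in events:
        apply_event(state, event)
        last_id = event[0]
    return state, last_id


archive = CheckpointedArchive()
archive.append("x", 1)
archive.append("y", 2)
archive.checkpoint()      # archive now holds base state only
archive.append("x", 3)    # one event since the checkpoint
state, last_id = restore(archive)
```

After `restore`, the client resumes listening for pushed events with ids greater than `last_id`.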
