Archive for the “Architecture” Category

I commented in my previous post on the fact that working with message brokers can lead to tensions where we’re forced into letting the broker do more than we’d like.

I suspect that the bigger the list of features provided in a single broker the greater the probability we’ll be forced to comply with (and attempt to work around) undesirable models and behaviours that aren’t core to our requirements. Thus under certain circumstances a specific solution that addresses one core problem whilst asserting a minimum of additional constraints can be more attractive than a jack-of-all-trades solution.

It also occurred to me that brokers might not be the only piece of software to exhibit this characteristic. RPC systems tend to suffer similar problems, having a tendency to tightly bind endpoint addressing, transport and argument marshalling. I also wonder if these issues contribute to some of the backlash against the behemoth that is an RDBMS implementation. For example, whilst we might like our RDBMS to look after our data, we might not wish to put up with the querying, concurrency, deployment and management models it also asserts, even with a myriad of configuration options.


Say the word messaging to a subset of developers and for some reason the immediate knee-jerk is to assume that means using some kind of message broker (Tibco, ActiveMQ or whatever). Utter the term “asynchronous communication” and that is typically equated to messaging and thus also implies use of a message broker.

I find this strange because messaging is possible via a myriad of methods including carrier pigeon, pony express, TCP, multicast, HTTP and many other transports. As for supporting asynchronous operations, well, that’s governed by the API(s) provided by the transport. In fact this is only partially true because a lower-level synchronous API can be wrapped in an additional layer to produce an asynchronous API. Transports and layering are often seen together: layering allows us, for example, to introduce support for reliable delivery if the underlying transport is not robust (often it isn’t).
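To make the layering idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `FlakyTransport` is a hypothetical synchronous transport that fails a few times, `reliable` is one layer that adds retries, and `asynchronous` is a second layer that turns the blocking call into a coroutine using a worker thread. No broker is involved.

```python
import asyncio
from typing import Callable


class FlakyTransport:
    """Hypothetical synchronous transport that fails its first few sends."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.delivered = []

    def send(self, payload: bytes) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("send failed")
        self.delivered.append(payload)


def reliable(send: Callable[[bytes], None], attempts: int) -> Callable[[bytes], None]:
    """Layer 1: retry a synchronous send up to `attempts` times."""

    def send_with_retry(payload: bytes) -> None:
        for attempt in range(attempts):
            try:
                send(payload)
                return
            except ConnectionError:
                # Give up only once the final attempt has failed.
                if attempt == attempts - 1:
                    raise

    return send_with_retry


def asynchronous(send: Callable[[bytes], None]):
    """Layer 2: expose a blocking send as a coroutine via a worker thread."""

    async def send_async(payload: bytes) -> None:
        await asyncio.to_thread(send, payload)

    return send_async
```

The layers compose in whichever order suits the design, for example `asynchronous(reliable(transport.send, attempts=3))`. Note that simple retries like these give at-least-once behaviour at best, so a real layer would also need some way to deal with duplicates.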

There’s no denying that all (or most) of these features (asynchronous APIs, messaging, reliable delivery) are provided in various of the message broker implementations but rarely in the form of a set of composable layers. The preferred approach is usually a myriad of configuration options leaving us at the mercy of the vendor to provide and support just the right combination of configuration possibilities to match our design challenge.

The indivisible nature of these brokers hampers us in other ways too. It can be difficult to use the broker purely for messaging without being forced to work with say its models for routing control and security. I’m sure this is appealing to the vendors but it doesn’t seem like such a great deal to me.


Bill ponders:

“Sometimes I wonder how a deployed n-tiered scale up monolith can be gradually refactored to a scale out model or run on scale out infrastructure. It’s well documented that companies like Amazon and Ebay have done just that, only how they did it tends to get left out of the slide decks. I suspect it involves thinking quite differently about what ‘good’ application code is….”

I think one reason for the absence of this information in slide decks is that it’s a competitive advantage, a barrier to entry. Mastering scale-out allows one company to handle more load and more customers whilst offering more function than another. Why teach other companies how to compete? For me though the real issue is that the transition is a major effort that defies easy explanation via slideware.

Makeup of a Monolith

The rough form of a monolithic system is a logical (maybe physical) n-tier with simple scale out in the places where it’s easy e.g. a stateless web tier. The remainder is typically a single database containing all data which is fronted by some kind of cache (some systems will also talk to legacy systems via various mechanisms and some will have an extra database or two). Some may view their database replication strategy/cluster to be horizontal scaling but to be the real deal would require explicit partitioning or sharding. All code has access to all of the data in the database and it’s typically not well controlled which leads to what I term the “integration via database” anti-pattern.

In essence to move to a scale out architecture requires us to partition up all the state we have held in the database (admitting there’s incentive to break up the middle tier to avoid it getting overly large and unwieldy). Sounds easy but of course is anything but because there’s just so much beyond “thinking quite differently about what ‘good’ application code is” to learn and change. One might expect that we build things up in horizontal slices, create some new infrastructure, port to the new infrastructure, the usual “upgrade the middleware” type approach but this doesn’t work due to lack of hindsight and the sheer size of the challenge. The antidote to this is to take one little step at a time.

Basic Approach

Instead of the “big scary re-architecture” we take vertical slices through the monolith, separating them out so that we can evolve each with a limited amount to consider. We push these pieces all the way through into deployment so as to expose all our processes and tools to this new type of entity. This is not easy but:

  1. At least it’s being done against a small vertical slice which makes it a more manageable problem.
  2. The transition cannot be managed in one big step, it will be necessary to manage old and new alongside each other so we need to get used to that.

Another option for learning how to work outside of the monolith is to implement some new set of features in this new style and drive that through the delivery chain. Obviously whilst this develops knowledge it’s not breaking down the existing monolith, so let’s move on to consider the basic steps (order is not fixed) for extracting a vertical element:

  1. Identify some reasonable sized function or chunk of data to partition.
  2. Define an interface which will wrap this function and/or data.
  3. Move code and data behind the interface.
  4. Modify other code to access code and data via the interface rather than e.g. direct access to the database.

Step (1) often requires us to examine both data and function. Ideally the interface we introduce in step (2) would look like a remote service. It may for some period of time still be part of the monolith and therefore local. In some cases we can’t make the interface remote initially because the refactor would be too complex so we must later come back and recast the interface as a remote service. What form does this new interface take? Ignoring religious issues it could be WS-*, REST, some other form of remote invocation (sometimes custom but not fine-grained method calls) or messaging.
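As a rough illustration of steps (2) to (4), the sketch below wraps some customer data behind a coarse-grained interface. The names (`CustomerService`, `LocalCustomerService`, `send_receipt`) and the `customers` table are hypothetical, and SQLite merely stands in for the monolith’s shared database. The implementation is still local, but callers only ever see the interface, so it can later be recast as a remote service.

```python
import sqlite3
from typing import Optional, Protocol


class CustomerService(Protocol):
    """Coarse-grained interface, shaped like a remote service (hypothetical)."""

    def get_email(self, customer_id: int) -> Optional[str]: ...

    def change_email(self, customer_id: int, email: str) -> None: ...


class LocalCustomerService:
    """First step: still in-process and still backed by the shared database."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self._db = db

    def get_email(self, customer_id: int) -> Optional[str]:
        row = self._db.execute(
            "SELECT email FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None

    def change_email(self, customer_id: int, email: str) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "UPDATE customers SET email = ? WHERE id = ?", (email, customer_id)
            )


def send_receipt(customers: CustomerService, customer_id: int) -> str:
    """Caller after step (4): no SQL here, only the interface."""
    email = customers.get_email(customer_id)
    if email is None:
        raise LookupError(f"unknown customer {customer_id}")
    return f"receipt sent to {email}"
```

Because `send_receipt` depends only on `CustomerService`, swapping `LocalCustomerService` for a remote client later should not require touching the caller.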

The arrival of this new interface allows us to introduce behaviours such as asynchronous operation and eventual consistency into a controlled area of our codebase. It’s possible that the data we’ve wrapped behind the interface is still being drawn from the original database so we can also consider moving this data to its own separate storage mechanism (which may or may not be another database) and introduce sharding and partitioning.
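One simple way to picture sharding behind such an interface is to route each key to a shard by hashing it. The sketch below is only an assumption about how that might look: `ShardedStore` is a hypothetical class and plain dictionaries stand in for separate storage nodes.

```python
import hashlib


class ShardedStore:
    """Hypothetical key-value store split across a fixed number of shards."""

    def __init__(self, shard_count: int) -> None:
        # Each dict stands in for a separate storage node.
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key: str) -> dict:
        # A stable hash, so the same key always lands on the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key: str, value) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str):
        return self._shard_for(key).get(key)

    def shard_sizes(self) -> list:
        return [len(shard) for shard in self._shards]
```

Callers of `put` and `get` never learn which shard holds their data, which is the point of hiding it behind the interface. Modulo routing like this moves most keys when the shard count changes, so a real system would probably want something more forgiving.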

Challenges

Some developers will fall back on “remoteness dogma” stating that remoteness is a performance inhibitor. Indeed they are right but scale is at least (if not more) important and becomes a key focus when we can’t buy a bigger box to make the monolith perform. For me the real issue at hand is the corruption of the mind that comes from thinking transactionally. In this world we focus on minimising the amount of time a transaction takes for fear of lock contention and blocking threads too long. We become obsessed with consistency, feeling compelled to do all work ahead of the actual transaction. This thinking naturally leads to optimising the execution paths heavily and eschewing remoteness. Key to tackling these issues is socialising use of asynchronous techniques and the fact that consistency is in the eye of the beholder (and determined by how often they can observe the relevant data).

During the recasting of a vertical slice into a standalone element it’s usually necessary to ensure it can cope with existing load (and a bit more). Establishing what the current load looks like can be challenging as monitoring and statistics are often centred around information available from the database, application servers, load-balancers and operating systems i.e. the infrastructure. This data is certainly useful but says little about what’s actually happening in the application code. A good way to address this problem and improve the lot of the operational teams is to introduce a programme of application instrumentation that will deliver appropriate statistics and high-level diagnostics.
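Application instrumentation need not be elaborate to be useful. The following is a minimal sketch, assuming a hypothetical in-process `STATS` table and an `instrumented` decorator that records call counts, error counts and elapsed time per named operation; `add_item` is just an example function to decorate.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process statistics table, keyed by operation name.
STATS = defaultdict(lambda: {"calls": 0, "errors": 0, "seconds": 0.0})


def instrumented(name: str):
    """Record calls, errors and total elapsed time for the wrapped function."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            entry = STATS[name]
            started = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                entry["errors"] += 1
                raise
            finally:
                # Runs on both success and failure.
                entry["calls"] += 1
                entry["seconds"] += time.perf_counter() - started

        return wrapper

    return decorator


@instrumented("basket.add_item")
def add_item(basket: list, item: str) -> int:
    if not item:
        raise ValueError("empty item")
    basket.append(item)
    return len(basket)
```

The numbers are named after application operations rather than infrastructure, which is what tells us about the load a vertical slice must cope with. This sketch is not thread-safe and keeps everything in memory; a real programme would ship the statistics somewhere the operational teams can see them.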

Inevitably we will be adopting and/or building some new infrastructure but we must be careful how much we try to acquire and/or custom build in advance of real-world experience (see the hindsight link above). Fortunately there’s little that’s actually required up front other than some mechanisms for service location and statistics gathering so the impact of mistakes is usefully limited. Follow-on candidates might include messaging, deployment and security. Remember also that many vendors are still producing infrastructure suited to big-iron monolithic development and single data-centre environments (which can make resilience in the face of certain kinds of failure difficult).

Static configuration (per-machine or in tools) can be extremely troublesome, containing a lot of URLs, server addresses, machine names and database references. All of these items need changing at each stage from development, through testing, QA, staging and production. The move toward a more distributed approach will only make this worse as it creates a need to copy and tweak more configuration on more machines. It’s important from the early stages to focus on eliminating as much of this as possible. In an ideal world a machine would be configured with at most its designated duty and environment (testing, development, production), obtaining everything else it needs from services in the environment.
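A sketch of that ideal might look like the following. All names are assumptions for illustration: the machine knows only two values, here read from hypothetical `MACHINE_DUTY` and `MACHINE_ENV` variables, and asks a `Registry` (an in-memory stand-in for a real service-location service) for everything else.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """The only static configuration a machine carries."""

    duty: str
    environment: str


def identity_from_env(env=os.environ) -> Identity:
    # MACHINE_DUTY and MACHINE_ENV are hypothetical variable names.
    return Identity(duty=env["MACHINE_DUTY"], environment=env["MACHINE_ENV"])


class Registry:
    """In-memory stand-in for a service-location service."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, environment: str, service: str, address: str) -> None:
        self._entries[(environment, service)] = address

    def locate(self, environment: str, service: str) -> str:
        try:
            return self._entries[(environment, service)]
        except KeyError:
            raise LookupError(f"no {service!r} in {environment!r}") from None
```

A machine would then call something like `registry.locate(identity.environment, "orders-db")` at startup instead of reading an address from a per-machine file, so promoting a build from testing to production changes one value rather than dozens.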

Wrap Up

To go horizontal is extremely challenging and cannot be addressed by the typical re-architecting initiatives many companies indulge in. There’s too much to learn and change such that the only option is a slow, step by step, learn as you go transition that gradually chips away at the monolith. For Amazon it seems this transition has taken at least five years and over at eBay it looks like they started the transition sometime around 1999/2000.


Alright, we’ve previously established that at least some enterprises have a substantial software investment outside of the classic business process arena. We’ve also seen an example of advice that fails to take account of this class of enterprise. Now it’s time to talk architecture.

The more traditional enterprises (those that only need software for business process automation and support) made a mistake in their past which needs undoing. They focused on building applications, not systems. Each application was designed to tackle some individual aspect of their business processes, which caused much pain when the applications needed to be integrated. The result has been a trend (SOA) to break up all those application silos into a collection of shared services on top of which appropriate applications can be built by the enterprises themselves or others. In the latter case shared services must in some way be exposed to others outside the firewall.

Creating services as above is merely one method of partitioning code and data. This method does not apply so well outside of business processes so we must find another model. ROA presents one such model, one that is data centric and works well in the world of the Web where we desire ad-hoc (chaotic) assembly of resources (mashups). The nature of the Web is such that it can be difficult to know who your users are (there are too many) and to manage transitions from one version of an API to another. Basically you don’t know what your dependencies are and it can therefore be difficult to measure the impact of changes. Some have suggested that it’s not appropriate to retire old APIs or resources but keeping them can have a significant impact in terms of maintenance, such that at least some organisations do deprecate old APIs and retire them eventually. ROA, like any other architecture, has its limitations.

ROA is of course derived from REST. REST includes a set of constraints which are essentially just useful architectural patterns. Some would claim they deliver scalability but I prefer to state that they are “scalability enabling”, they don’t inhibit scaling. However it’s important to realise that behind the Web layer one must still build scalable infrastructure. Building this infrastructure the right way (architecturally) yields scalability not REST itself. Many websites rely on running multiple copies of their web application, scaling via their database and caching solution which will often be more than enough.

However there is a class of enterprise website for which this approach fails, because the consistency models provided by databases and the like actually can’t scale as far as is required. A further complication is that managing everything as a single web application becomes impossible. Each part of the application has its own unique demands in respect of tuning, configuring, monitoring and maintenance:

  • Tuning for one part of the application has adverse effects on other parts.
  • Configuration becomes a nightmare because there are so many different settings to worry about. Something that works well in testing isn’t appropriate for production, so separate profiles must be maintained and kept in sync, which leads to forgotten changes etc.
  • Monitoring produces so much of a mixture of data that it becomes a major exercise to filter out just what you need.
  • Maintenance becomes an exercise in chasing down long chains of dependencies to make a simple change.

It becomes necessary to break up the application, storage, caching and so on into more manageable pieces that can run separately as a distributed system. Each element provides a service but not as would generally fit with the “classic” definition of SOA. Our requirements for partitioning are driven by multiple forces and thus the decision as to how exactly to break up the application must be determined on a case by case basis. Such a decision could be driven by amongst other things business model (web applications are still surrounded by business processes), scaling needs, specific storage requirements of underlying data or provision of a specific feature for the website.

One might choose to expose such a service using WS-*, messaging, a Web/Resource approach, CORBA or even some form of custom service invocation layer. Perhaps surprisingly there’s a growing number of examples where the custom service invocation layer option is used. I believe this is because all other approaches represent a compromise achieved through limitation of architectural options in such a manner as to be inappropriate for these demanding cases.

We cannot call this architectural approach SOA, ROA, EDA or anything else, it is simply about creating isolated, independent elements and minimising dependencies. It is something we’ve been doing inside of our programs for years. It also allows us to construct a working, manageable system at large scale. It is common sense. CSA anyone?


Check out this article from Computing. It is apparently good advice for SOA implementation but, as mentioned in my previous post, something has been forgotten: some enterprises provide software as a web app that is their product and revenue generator. This software could be rendered into services behind the firewall, yet is not about business processes and must be treated differently.

A quote from the article:

Mistake No. 3: Leaving SOA to the techies

When the SOA process is left mostly with the IT side of the organisation, services risk being designed to optimise software performance and reliability, but may not fully reflect the business requirements.

Clarity of business interfaces is essential for cross-application integration or multi-organisation use.

What about an interface that provides a specific website feature and is a service in its own right? Such an interface is unlikely to be exposed across organizations because it provides a business-specific feature we do not wish to share with others. Further, such an interface probably has few business requirements, though the underlying service may need to support auditing or customer tracking tools.

A further quote:

Mistake No. 1: Irrational SOA exuberance

Excessive numbers of services - those that cannot be readily matched to the business model of the application - are a sign of an SOA environment where applications need to be checked as they are completed.

Such environments may feature repositories full of services, volumes of documentation and an impressive collection of new tools and middleware, but what they will not have is agility, incremental software versioning or reuse.

Again let’s consider a service that provides a website feature such as recommendations. How much does it have to match the business model? One might argue that SOA is only concerned with business processes but surely we can model other things as services?

So what exactly is a service, what is SOA and where does REST fit in? I’ll cover that next….
