When we write programs, one of the things we seek to do is encapsulate our data so that we can manage our dependencies and keep our code clean. Most languages, OO or otherwise, provide mechanisms to support this way of working.

The thing about the average database is that it doesn’t really encourage similar behaviour. It is all too tempting (and easy) to just allow everyone to access everything. Whilst we confine ourselves to a single application using the database, the problem is to some extent contained, but often what we actually do is allow multiple applications access to the same database. The exact way in which this is done varies:

  1. Sometimes we bundle all our middle tier code together even though it has separate roles and responsibilities and integrate all of it via a single database.
  2. Sometimes we have multiple applications each running in a different process.

With each application we put on top of the database the problem gets worse: the number of invisible dependencies grows, tying unrelated elements of code together simply because they access a shared schema.

What’s happening is that we’re sharing too much intimate knowledge across our system, something we’re all taught to fear. The solution is, as always, to prevent direct access to this intimate knowledge by interposing layers of abstraction. One way to do this is to require that access to data be wrapped up behind an interface. Historically we’ve done this by having a system own the database and expose interfaces that other systems can use to get the data.

Unfortunately, there is a well-known issue with this approach: the level of granularity is wrong, and these additional integration interfaces rapidly balloon into complex beasts. What we need is a database-wrapping entity that has a finer level of granularity than an entire system. Then the integration interfaces will be simpler, because there will naturally be a less complex schema underpinning this more limited functionality.

What are we talking about? Services. We end up with a system of lots of discrete services each wrapping up their own data storage.
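
To make that concrete, here is a minimal sketch of one such service in Python. Everything in it is hypothetical and chosen purely for illustration: the `CustomerService` and `Customer` names, the `cust` table, and the use of an in-memory SQLite database. The point is only that the schema stays private, and callers see nothing but the service's methods and the `Customer` shape it hands back.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Customer:
    """Data as callers see it; deliberately not the table layout."""
    customer_id: int
    name: str


class CustomerService:
    """Hypothetical service: the only code that knows about its schema."""

    def __init__(self) -> None:
        # Private storage. No other application gets this connection.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE cust (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)"
        )

    def register(self, name: str) -> Customer:
        cur = self._db.execute("INSERT INTO cust (full_name) VALUES (?)", (name,))
        self._db.commit()
        return Customer(customer_id=cur.lastrowid, name=name)

    def find(self, customer_id: int) -> Optional[Customer]:
        row = self._db.execute(
            "SELECT id, full_name FROM cust WHERE id = ?", (customer_id,)
        ).fetchone()
        # Translate the row into the public shape before it leaves the service.
        return Customer(customer_id=row[0], name=row[1]) if row else None
```

If the `full_name` column is later renamed or split in two, only `CustomerService` has to change; nothing outside it ever saw the column.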

There are other benefits to this approach:

  1. Each service can utilize the most appropriate storage option for its contained data whilst having zero impact on other services that might have different needs.
  2. Each service is an independent entity that can be managed (monitored, deployed, etc.) separately.
  3. Centralized access patterns are more easily broken down, which is useful in cases where we deploy across multiple data-centres.
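
The first benefit can be sketched roughly as follows. This is an illustration under assumptions rather than a recommended design: `Store`, `MemoryStore`, `FlatFileStore` and `PreferenceService` are made-up names, and the JSON flat file simply stands in for "whatever storage suits this service". Callers only ever talk to `PreferenceService`, so its storage can be swapped without anyone else noticing.

```python
import json
import os
from typing import Dict, Optional, Protocol


class Store(Protocol):
    """What the service needs from its storage, and nothing more."""
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class MemoryStore:
    """Keeps everything in a dict; fine for tests or throwaway data."""
    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._items[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._items.get(key)


class FlatFileStore:
    """Keeps everything in a single JSON file on disk."""
    def __init__(self, path: str) -> None:
        self._path = path

    def _load(self) -> Dict[str, str]:
        if not os.path.exists(self._path):
            return {}
        with open(self._path, encoding="utf-8") as fh:
            return json.load(fh)

    def put(self, key: str, value: str) -> None:
        items = self._load()
        items[key] = value
        with open(self._path, "w", encoding="utf-8") as fh:
            json.dump(items, fh)

    def get(self, key: str) -> Optional[str]:
        return self._load().get(key)


class PreferenceService:
    """Hypothetical service; other services see only these two methods."""
    def __init__(self, store: Store) -> None:
        self._store = store

    def set_language(self, user: str, language: str) -> None:
        self._store.put(user, language)

    def language_for(self, user: str) -> str:
        # Fall back to a default when nothing has been stored for the user.
        return self._store.get(user) or "en"
```

Constructing `PreferenceService(MemoryStore())` or `PreferenceService(FlatFileStore("prefs.json"))` gives callers exactly the same behaviour; the choice of storage stays a private decision of the service.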

Who would do such a thing?

3 Responses to “The Siren Call of the Database”
  1. [...] Dan Creswell) This entry was written by iand and posted on 11 July 2007 at 9:32 pm and filed under rdf. [...]

  2. Udi Dahan says:

    I’m agreeing with everything up to the conclusion:

    “We end up with a system of lots of discrete services each wrapping up their own data storage.”

    My experience has been slightly different. I find the result to contain relatively few services, each one being of a business-level granularity. I’d also submit that designing services database-first (while not explicitly stated in the post, but is implied) has a high probability of leading us to a poor business-level service decomposition, not the least due to the duplication of schema elements (or master data definitions if you will) across databases.

    What are your thoughts?

  3. Dan Creswell says:

    “I find the result to contain relatively few services, each one being of a business-level granularity”

    Mmmm well that’s a granularity argument isn’t it? What is a service for you? Is it the same as mine? For me a business-level service might well be built out of a bunch of smaller services (how small those services are might be worth further discussion). Does that fit with your view?

    “I’d also submit that designing services database-first (while not explicitly stated in the post, but is implied)…..”

    Hmmm I wasn’t meaning to imply any such thing. Clearly schema is dictated by what you want to do and is design driven. It would be instructive for me (always looking to improve my communications) if you could point out what I said that caused you to see such an implication?

    “not the least due to the duplication of schema elements (or master data definitions if you will) across databases.”

    Hmmmm, do you believe strongly in normalized databases? There are lots of real world cases where we duplicate elements for speed, but in this case I’m not advocating that behaviour. In line with my comment re: services underlying business services above, I would expect some service to own a chunk of data, and that chunk of data would be accessed by other services via the owning service’s APIs (whatever form they take).

    i.e. we’re divorcing how we store the data, which will require one schema, from how we pass the data between services, which can be another schema. Thus there’s no reason why a service couldn’t internally use a flat file system for storage of data.

    There’s lots more to discuss here but I’d like to get a better grip on what you’re thinking first….
