JavaSpaces and Databases
Posted by: Dan Creswell in Distributed Systems, JavaSpaces, Jini
I’ve been meaning to tackle this subject for a while. With a hectic week of coding behind me, a day focused on communication ahead and a number of Google queries hitting my blog/site on this subject, it seems like it’s time to do this.
Right, so it’s often said that JavaSpaces are all about flows of objects, hence the API being the way it is. What does this actually mean in real terms?
- Most objects in the space are transient - that is they temporarily reside in the space before heading elsewhere.
- Some objects remain in the space forever because they represent “bootstrap state” for clients.
As an aside, it’s worth noting at this stage that leases are orthogonal to the above classification. Bootstrap state might need refreshing or might become stale, in which case having old state clean itself up automatically is helpful. Temporary state is to be used by some operation somewhere, and that operation may need to be timed out, in which case it’s again useful if the state automatically cleans itself up.
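To make the lease point concrete, here is a minimal sketch of state that cleans itself up. It is a toy in-memory model, not the real JavaSpaces API: the class name `LeasedSpaceSketch`, its methods and the explicit `now` clock parameter are all assumptions made for illustration (the clock is passed in so the behaviour is deterministic). Note that both the bootstrap entry and the temporary entry carry a lease; only the duration differs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model only: a stand-in for the idea of leases, not the real JavaSpaces API.
class LeasedSpaceSketch {
    // One stored value plus the (hypothetical) clock time at which its lease runs out.
    private static final class Slot {
        final String value;
        final long expiresAt;
        Slot(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final List<Slot> slots = new ArrayList<>();

    // Store a value under a lease of leaseMillis, starting at time 'now'.
    void write(String value, long leaseMillis, long now) {
        slots.add(new Slot(value, now + leaseMillis));
    }

    // Return the values still live at 'now', discarding any whose lease has expired.
    List<String> liveAt(long now) {
        List<String> live = new ArrayList<>();
        for (Iterator<Slot> it = slots.iterator(); it.hasNext();) {
            Slot s = it.next();
            if (s.expiresAt <= now) {
                it.remove();          // expired state cleans itself up
            } else {
                live.add(s.value);
            }
        }
        return live;
    }
}
```

Writing "bootstrap-config" with a long lease and "work-item" with a short one means the work item disappears first, and the bootstrap entry also goes away unless somebody refreshes it.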
Right, so what’s in a JavaSpace at any particular time? It’s a snapshot of a set of conversations between multiple senders and recipients. Each conversation is going to have a small amount of state, and that state is only relevant for a short period of time because the conversation will naturally move on to other things. This material, then, isn’t really query’able: there’s not much structure around, not even much data. This is in marked contrast to an RDBMS, which tends to contain everything and the kitchen sink. When you store everything you need a good mechanism for locating the things of interest - an advanced query language. When you store only a little, locating things is much easier and the querying that much simpler, JavaSpaces simple one might say.
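For a feel of what "JavaSpaces simple" querying looks like, here is a rough sketch of template matching: you hand over a partially filled-in entry, and fields left as null act as wildcards. This is a toy in-memory model written for this post, not the real net.jini.space.JavaSpace interface; `ConversationSpaceSketch`, `Message` and the field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model only: illustrates equality-based template matching, nothing more.
class ConversationSpaceSketch {
    // Hypothetical entry type; a null field in a template means "don't care".
    static final class Message {
        final String conversationId;
        final String step;
        final String payload;
        Message(String conversationId, String step, String payload) {
            this.conversationId = conversationId;
            this.step = step;
            this.payload = payload;
        }
    }

    private final List<Message> entries = new ArrayList<>();

    void write(Message m) {
        entries.add(m);
    }

    private static boolean fieldMatches(String wanted, String actual) {
        return wanted == null || wanted.equals(actual);
    }

    // An entry matches when every non-null template field is equal to the entry's field.
    static boolean matches(Message tmpl, Message e) {
        return fieldMatches(tmpl.conversationId, e.conversationId)
            && fieldMatches(tmpl.step, e.step)
            && fieldMatches(tmpl.payload, e.payload);
    }

    // Non-blocking take: remove and return the first match, or null if nothing matches.
    Message take(Message tmpl) {
        for (Iterator<Message> it = entries.iterator(); it.hasNext();) {
            Message e = it.next();
            if (matches(tmpl, e)) {
                it.remove();
                return e;
            }
        }
        return null;
    }
}
```

A recipient in conversation "c2" would take with `new Message("c2", null, null)` and get its message back; there are no joins, ranges or ordering, which is fine when each conversation only has a handful of entries in flight.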
Now, there is a class of application that doesn’t fit this description and does indeed have the JavaSpace holding a lot of state. It’s typically a form of the blackboard pattern, caching or some other form of shared state. Now, caching tends to be performed on entities with unique keys and thus fits cleanly with the JavaSpaces API. Other forms of shared state don’t fit so well - why is this?
If we go back to LINDA, we see that the tuplespace concept was conceived as a tool for simplifying concurrent access to state within what was a single SMP machine (which might be somewhat distributed in the form of a hypercube or a NUMA system, anyone remember transputers?). There was no notion of remoteness in this design. When you add remoteness to LINDA you get JavaSpaces or something similar (tuplespaces plus leases, new kinds of exception and, in some cases, code movement). And it’s this addition of remoteness that makes these other forms of shared state difficult (though not impossible) to handle, typically because the amount of state is large but for network efficiency we want to transfer only a little of it. That forces us down the route of granular data representations and query languages - sounds like an RDBMS?
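The network-efficiency argument can be sketched with a toy simulation. Nothing here talks to a real space or a real network: `TransferCostSketch`, `Quote` and the `transferred` counter are made up for illustration, and the counter simply stands in for "entries that would have crossed the wire". A keyed lookup is an equality match the space can do for you, whereas a range condition has no equivalent in equality matching, so in this model the client has to pull everything and filter locally.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation only: counts how many entries a client would have to fetch.
class TransferCostSketch {
    // Hypothetical cached entity with a unique key (symbol).
    static final class Quote {
        final String symbol;
        final int price;
        Quote(String symbol, int price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    private final List<Quote> stored = new ArrayList<>();
    int transferred = 0; // entries that would have crossed the network

    void write(Quote q) {
        stored.add(q);
    }

    // Equality match on the key happens "inside" the space; only the hit is shipped.
    Quote readByKey(String symbol) {
        for (Quote q : stored) {
            if (q.symbol.equals(symbol)) {
                transferred++;
                return q;
            }
        }
        return null;
    }

    // No range predicate available: every entry is shipped and filtered client-side.
    List<Quote> readPricedAbove(int limit) {
        List<Quote> hits = new ArrayList<>();
        for (Quote q : stored) {
            transferred++;
            if (q.price > limit) {
                hits.add(q);
            }
        }
        return hits;
    }
}
```

With a thousand quotes in the space, the keyed read moves one entry while the range search moves all thousand to find a couple of hits, which is exactly the pressure that pushes you towards a query language.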
So, does that mean we can’t use JavaSpaces to handle shared state type problems? No, but if you try to solve this problem entirely within the JavaSpace you’re making a mistake: whilst JavaSpaces are great for solving some parts of this problem, they aren’t good for other aspects.
In summary, databases handle large amounts of shared state and provide query languages to assist with state location and updating. They don’t provide tools for remote co-ordination - this is the domain of the JavaSpace and it becomes supremely powerful once you mix in the simple concurrency model and the ability to move code and have it be secured along with the data. There is some crossover between these two technologies but they’re going in very different directions. Which suits your problem is determined by which direction is closer to that of your system.
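As a rough illustration of the co-ordination side, here is a sketch of a requester and a worker that never reference each other and only meet through a shared space. It covers just the "simple concurrency model" part (a blocking take), not code movement or security, and it runs in a single JVM with threads rather than across a network. `CoordinationSketch`, `Item` and `squareViaSpace` are hypothetical names for a toy model, not the real JavaSpaces API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model only: a blocking take used to co-ordinate two threads.
class CoordinationSketch {
    // Hypothetical entry: 'kind' separates tasks from results; null in a template is a wildcard.
    static final class Item {
        final String kind;
        final Integer value;
        Item(String kind, Integer value) {
            this.kind = kind;
            this.value = value;
        }
    }

    private final List<Item> items = new ArrayList<>();

    private static boolean matches(Item tmpl, Item i) {
        return (tmpl.kind == null || tmpl.kind.equals(i.kind))
            && (tmpl.value == null || tmpl.value.equals(i.value));
    }

    synchronized void write(Item item) {
        items.add(item);
        notifyAll(); // wake any takers waiting for a match
    }

    // Blocks until a matching item arrives or the timeout elapses (then returns null).
    synchronized Item take(Item tmpl, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            for (Iterator<Item> it = items.iterator(); it.hasNext();) {
                Item i = it.next();
                if (matches(tmpl, i)) {
                    it.remove();
                    return i;
                }
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null;
            }
            wait(remaining);
        }
    }

    // The requester writes a task and waits for a result; the worker does the reverse.
    static int squareViaSpace(int n) {
        CoordinationSketch space = new CoordinationSketch();
        Thread worker = new Thread(() -> {
            try {
                Item task = space.take(new Item("task", null), 2000);
                if (task != null) {
                    space.write(new Item("result", task.value * task.value));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            space.write(new Item("task", n));
            Item result = space.take(new Item("result", null), 2000);
            worker.join();
            if (result == null) {
                throw new IllegalStateException("timed out waiting for result");
            }
            return result.value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the design is that neither side knows who is on the other end or how many workers there are; adding more workers is just more threads (or machines) doing the same take.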
June 13th, 2006 at 2:22 am
At least one vendor has built a query facility, via vendor specific extensions, on top of the JavaSpaces API.
This muddies the purity of the statement that a JavaSpace is a “snapshot of a set of conversations between multiple senders and recipients” because one could use the “space” as a database, relying on a persistent space with an LRU eviction policy to keep the “working set” reasonable.
I have been tempted to “abuse” spaces in just that way so that both an RDBMS and a space are not needed. What do you think?
Also, I hope you will reveal more about your non-clustered HA solution!
thanks again,
June 13th, 2006 at 10:13 am
Hi Steve,
Cool questions:
Okay so, I’m aware some people have put a query facility on top of the JavaSpaces API and it does indeed muddy the waters. I guess their philosophy is to make a JavaSpace a “jack of all trades”, which is the opposite of mine, hence my original scribblings on the differences. IMHO, many a Java framework attempts to be “jack of all trades” and the result is horrible APIs, horrible configuration, horrible complexity in general.
Random thoughts:
LRU policy - you don’t have a guarantee that any space implementation actually does have such a policy or even uses caching.
Querying - most RDBMS’en have advanced query optimizers to accelerate performance - JavaSpaces don’t really yield to the same approach. In the case of the vendor I think you’re talking about, they’ve actually built their implementation on top of an SQL database with its associated advantages and disadvantages.
Do one thing well - I wouldn’t use a database as a substitute for a JavaSpace or vice versa, nor would I try and “have it all” in one thing. Especially in the context of SOA but in decent architecture in general, the “do one thing well” mantra yields better results. “Have it all” often leads to a compromise which loses you the key benefits of each and messes up your architecture as your system gains features over time. However, I have actually combined RDBMS and JavaSpaces as discrete components for certain kinds of system. And of course, I’m not going to say that my way is the only way!
Request for feedback:
I’d be interested to hear why you would like to avoid having both a JavaSpace and an RDBMS - is it architecture, license cost, machine cost or something else?
Non clustered HA solution:
Bits of this are being tested out with several of my customers in different kinds of deployment. I’m busy figuring out the best way to package it all up and make it available.
I could just supply it as a framework but getting the documentation etc. right is difficult. I’m tempted to express it as a collection of design patterns - whether that’d be in book form or something else, dunno. I’m also thinking about providing it as part of some Web 2.0 style offering.