JavaSpaces and Databases
Posted by: Dan Creswell in Distributed Systems, JavaSpaces, Jini
Been meaning to tackle this subject for a while. With a hectic week of coding behind me, a day focused on communication ahead, and a number of Google queries hitting my blog/site on this subject, it seems like it’s time to do this.
Right, so it’s often said that JavaSpaces are all about flows of objects, hence the API being the way it is. What does this actually mean in real terms?
- Most objects in the space are transient - that is they temporarily reside in the space before heading elsewhere.
- Some objects remain in the space forever because they represent “bootstrap state” for clients.
As an aside, it’s worth noting at this stage that leases are orthogonal to the above classification. Bootstrap state might need refreshing or become stale, in which case having old state clean itself up automatically is helpful. Temporary state is to be used by some operation somewhere, and that operation may need to be timed out, in which case it’s again useful if the state automatically cleans itself up.
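To make the lease point concrete, here’s a minimal sketch in plain Java. It is a toy in-memory stand-in, not the real JavaSpace interface: `ToyLeasedSpace`, `write`, `contains` and `take` are names made up for illustration, and the clock is passed in as a `now` argument purely so the behaviour is easy to follow. The only thing it tries to show is that both kinds of state, bootstrap and transient, can carry a lease and vanish on their own once it runs out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory stand-in for a space. NOT the real JavaSpace interface:
// ToyLeasedSpace and its methods are hypothetical names for illustration.
class ToyLeasedSpace {

    // One stored object plus the absolute time (ms) at which its lease runs out.
    private static final class Slot {
        final String payload;
        final long expiresAt;

        Slot(String payload, long expiresAt) {
            this.payload = payload;
            this.expiresAt = expiresAt;
        }
    }

    private final List<Slot> slots = new ArrayList<>();

    // Store a payload at time `now`; it stays visible for `leaseMillis`.
    synchronized void write(String payload, long now, long leaseMillis) {
        slots.add(new Slot(payload, now + leaseMillis));
    }

    // Drop every object whose lease has run out by time `now`.
    private void purgeExpired(long now) {
        slots.removeIf(slot -> slot.expiresAt <= now);
    }

    // True if a live copy of the payload is in the space at time `now`.
    synchronized boolean contains(String payload, long now) {
        purgeExpired(now);
        for (Slot slot : slots) {
            if (slot.payload.equals(payload)) {
                return true;
            }
        }
        return false;
    }

    // Remove and return a live copy of the payload, or null if there is none.
    synchronized String take(String payload, long now) {
        purgeExpired(now);
        for (int i = 0; i < slots.size(); i++) {
            if (slots.get(i).payload.equals(payload)) {
                return slots.remove(i).payload;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyLeasedSpace space = new ToyLeasedSpace();
        space.write("config-v1", 0, 60_000);  // bootstrap state, 60 s lease
        space.write("job-42", 0, 1_000);      // transient state, 1 s lease

        System.out.println(space.contains("job-42", 500));       // lease still running
        System.out.println(space.contains("job-42", 2_000));     // lease ran out, cleaned up
        System.out.println(space.contains("config-v1", 2_000));  // longer lease, still there
        System.out.println(space.contains("config-v1", 61_000)); // stale config gone too
    }
}
```

Nobody ever calls a delete here: whether an object is long-lived or short-lived is decided by the lease it was written with, not by which of the two categories it falls into.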
Right, so what’s in a JavaSpace at any particular time? It’s a snapshot of a set of conversations between multiple senders and recipients. Each conversation is going to have a small amount of state, and that state is only relevant for a short period of time because the conversation will naturally move on to other things. This material, then, isn’t really queryable: there’s not much structure around, and not even much data. This is in marked contrast to an RDBMS, which tends to contain everything and the kitchen sink. When you store everything, you need a good mechanism for locating the things of interest: an advanced query language. When you store only a little, locating things is much easier and the querying that much simpler; JavaSpaces simple, one might say.
Now, there is a class of application that doesn’t fit this description and does indeed have the JavaSpace holding a lot of state. It’s typically a form of the blackboard pattern, caching or some other form of shared state. Now, caching tends to be performed on entities with unique keys and thus fits cleanly with the JavaSpaces API. Other forms of shared state don’t fit so well - why is this?
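As a rough illustration of why keyed caching fits, here’s a sketch of template matching in plain Java. It is a toy, not the real JavaSpace interface: `CachedItem`, `ToyCacheSpace` and `put` are invented names. The one rule borrowed from JavaSpaces is that a template field left as `null` acts as a wildcard while a non-null field must match exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cache entry with public object fields, in the style of a space entry.
class CachedItem {
    public String key;    // unique key of the cached entity
    public String value;  // cached payload

    CachedItem(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Toy in-memory stand-in for a space; names and methods are illustrative only.
class ToyCacheSpace {
    private final List<CachedItem> items = new ArrayList<>();

    // A null template field is a wildcard; a non-null field must be equal.
    private static boolean matches(CachedItem template, CachedItem candidate) {
        return (template.key == null || template.key.equals(candidate.key))
            && (template.value == null || template.value.equals(candidate.value));
    }

    synchronized void write(CachedItem item) {
        items.add(item);
    }

    // Return the first matching item without removing it, or null.
    synchronized CachedItem read(CachedItem template) {
        for (CachedItem item : items) {
            if (matches(template, item)) {
                return item;
            }
        }
        return null;
    }

    // Remove and return the first matching item, or null.
    synchronized CachedItem take(CachedItem template) {
        CachedItem found = read(template);
        if (found != null) {
            items.remove(found);
        }
        return found;
    }

    // Cache update: take the old copy for this key (if any), then write the new one.
    synchronized void put(String key, String value) {
        take(new CachedItem(key, null));
        write(new CachedItem(key, value));
    }

    public static void main(String[] args) {
        ToyCacheSpace cache = new ToyCacheSpace();
        cache.put("user:7", "Ada");
        cache.put("user:7", "Ada L.");  // replaces the earlier copy

        // Lookup by unique key: set the key, leave everything else as a wildcard.
        CachedItem hit = cache.read(new CachedItem("user:7", null));
        System.out.println(hit.value);
    }
}
```

Lookup by exact key is a one-liner. What a template can’t express is something like “every key starting with user:” or “values greater than X”, and that’s roughly where the other forms of shared state start to hurt.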
If we go back to LINDA, we see that the tuplespace concept was conceived as a tool for simplifying concurrent access to state within what was a single SMP machine (which might be somewhat distributed in the form of a hypercube or a NUMA system, anyone remember transputers?). There was no notion of remoteness in this design. When you add remoteness to LINDA you get JavaSpaces or something similar (tuplespaces plus leases, new kinds of exception and, in some cases, code movement). And it’s this addition of remoteness that makes these other forms of shared state difficult (though not impossible) to handle. Typically this is because the amount of state is large but, for network efficiency, we want to transfer only a little of it, which forces us down the route of granular data representations and query languages. Sounds like an RDBMS?
So, does that mean we can’t use JavaSpaces to handle shared-state type problems? No, but if you try to solve this problem entirely within the JavaSpace you’re making a mistake: whilst JavaSpaces are great for solving some parts of this problem, they aren’t good for others.
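One way to picture such a split, and this is only a sketch under my own assumptions rather than a recommended design: keep the bulky, queryable state in a store and pass just a small key through the space for co-ordination. Everything below is a stand-in. `SplitSketch` is a made-up name, a `HashMap` plays the database, and a plain queue plays the space (so there’s no template matching, leasing or remoteness here at all).

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: bulk state in a "store", small work tokens in a "space".
class SplitSketch {
    // Stand-in for the database: holds the large, queryable shared state.
    private final Map<String, String> store = new HashMap<>();
    // Stand-in for the space: holds only small tokens that drive the conversation.
    private final Queue<String> space = new ArrayDeque<>();

    // Producer: persist the details, then announce the work with just its key.
    void submit(String orderId, String details) {
        store.put(orderId, details);
        space.add(orderId);
    }

    // Worker: take the next token, then fetch and update the state it refers to.
    // Returns the key that was processed, or null if no work is waiting.
    String processNext() {
        String orderId = space.poll();
        if (orderId == null) {
            return null;
        }
        store.put(orderId, store.get(orderId) + " [processed]");
        return orderId;
    }

    String lookup(String orderId) {
        return store.get(orderId);
    }

    int pending() {
        return space.size();
    }

    public static void main(String[] args) {
        SplitSketch system = new SplitSketch();
        system.submit("order-1", "2 widgets");
        system.submit("order-2", "5 gadgets");

        System.out.println(system.processNext());      // key of the order just handled
        System.out.println(system.lookup("order-1"));  // its state, as held by the store
        System.out.println(system.pending());          // tokens still waiting in the space
    }
}
```

The token moving through the space stays tiny no matter how much detail sits behind it, which is the part of the problem the space is good at.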
In summary, databases handle large amounts of shared state and provide query languages to assist with state location and updating. They don’t provide tools for remote co-ordination - this is the domain of the JavaSpace, and it becomes supremely powerful once you mix in the simple concurrency model and the ability to move code and have it be secured along with the data. There is some crossover between these two technologies but they’re going in very different directions. Which suits your problem is determined by which direction is closer to that of your system.
