Archive for the “JavaSpaces” Category

I’m currently working on getting 2.0-alpha3 out the door which means all the usual release procedures and in particular a lot of thumb-twiddling whilst various tests run. People often ask about the testing that goes into a Blitz release and so, given that I’m suffering from test-induced thumb-twiddling it seems like this would be a good time to write something down.

Blitz is basically a layered Jini service with remote wrappers cleanly separated from the core JavaSpace implementation. Thus we can thrash the core without needing to do a full remote deployment which provides many benefits such as eliminating a lot of network I/O that can hide race conditions and deadlocks.

My test machine is a dual processor dual core machine (2 x 2.6Ghz Xeons with HyperThreading, 2Gb, 2x70Gb SCSI Disks) and it’s main duties are to run the long term soak tests for long periods of time (unit tests get run as part of day-to-day development on my PowerMac Dual G5). There are two basic types of soak test that get run regularly they are:

  1. TxnStress – a multi-threaded test which fills a Blitz instance with a fixed number of entry’s and then randomly takes and re-writes one of those Entry’s with each take/write pair done under a separate transaction.
  2. Stream – a multi-threaded test which writes a sequence of Entry’s into Blitz whilst another thread takes them. Writers and Takers are run in pairs up to a configured number.

Whilst these tests run I typically have jconsole hooked up plotting memory consumption so leaks can be detected and fixed. In addition, Blitz’s in-built statistics API is used to check queue sizes, number of entry’s etc (this is the same API used by the remote dashboard). Finally, each test has code that ensures we are not leaking Entry’s or missing them. Blitz also has a debug-mode where one can simply connect to a pre-configured socket and trigger the dumping of statistics to the console at any time.

These tests get run against Blitz in both transient and persistent mode (which is a mere flip of a configuration variable).

Stream is just about done, so I’ll be shipping a release to SourceForge shortly.

Technorati Tags: , ,

  • Share/Bookmark

Comments Comments Off

Yep, there is a specification for JavaSpace bulk operations but it’s clear from recent discussions it’s still not well known. It was released in Jini 2.1 and provides:

  1. Bulk write
  2. Bulk take
  3. Bulk read
  4. New notify method

Bulk take and write are straightforward, bulk read less so because due to the nature of reads it has to provide a streaming style of interface. And the new notify method allows you to get a copy of the Entry which triggered the event but also provides facilities for Entry’s becoming available/visible again as the result of transaction aborts etc.

The JavaDoc for JavaSpace05 is here.

Technorati Tags: , , ,

  • Share/Bookmark

Comments 8 Comments »

Been meaning to tackle this subject for a while. With a hectic week of coding behind me, a day focused on communication ahead and a number of google queries hitting my blog/site on this subject, it seems like it’s time to do this.

Right so it’s often said that JavaSpaces are all about flows of objects hence the API being the way it is. What does this actually mean in real terms?

  1. Most objects in the space are transient – that is they temporarily reside in the space before heading elsewhere.
  2. Some objects remain in the space forever because they represent “bootstrap state” for clients.

As an aside, it’s worth noting at this stage that leases are orthogonal to the above classification. Bootstrap state might need refreshing or become stale in which case, having old state clean itself up automatically is helpful. Temporary state is to be used by some operation somewhere and that may need to be timed out in which case, again it’s useful if the state automatically cleans itself up.

Right, so what’s in a JavaSpace at any particular time? It’s a snapshot of a set of conversations between multiple senders and recipients. Each conversation is going to have a small amount of state and it’s only relevant to a conversation for a short period of time because the conversation will naturally move on to other things. This material then, isn’t really query’able there’s not much structure around, not even much data. This is in marked contrast to an RDBMS which tends to contain everything and the kitchen sink. When you store everything you need a good mechanism for locating the things of interest – an advanced query language, when you store only a little, locating things is much easier and the querying that much simpler, JavaSpaces simple one might say.

Now, there is a class of application that doesn’t fit this description and does indeed have the JavaSpace holding a lot of state. It’s typically a form of the blackboard pattern, caching or some other form of shared state. Now, caching tends to be performed on entities with unique keys and thus fits cleanly with the JavaSpaces API. Other forms of shared state don’t fit so well – why is this?

If we go back to LINDA, we see that the tuplespace concept was conceived as a tool for simplifying concurrent access to state within what was a single SMP machine (which might be somewhat distributed in the form of a hypercube or a NUMA system, anyone remember transputers?). There was no concept of remoteness present in this concept. When you add remoteness to LINDA you get JavaSpaces or something similar (tuplespaces plus leases, new kinds of exception and in some cases, code movement). And it’s this addition of remoteness that makes these other forms of shared state difficult (though not impossible) to handle. Typically because the amount of state is large but for network efficiency we want to transfer only a little of it which forces us down the route of granular data representations and query languages – sounds like RDBMS?

So, does that mean we can’t use JavaSpaces to handle shared state type problems? No but if you try and solve this problem entirely within the JavaSpace you’re making a mistake because whilst they’re great for solving some parts of this problem, they aren’t good for other aspects.

In summary, databases handle large amounts of shared state and provide query languages to assist with state location and updating. They don’t provide tools for remote co-ordination – this is the domain of the JavaSpace and it becomes supremely powerful once you mix in the simple concurrency model and the ability to move code and have it be secured along with the data. There is some crossover between these two technologies but they’re going in very different directions. Which suits your problem is determined by whch direction is closer to that of your system.

  • Share/Bookmark

Comments 2 Comments »

Many people ask me the following:

What’s the difference between JMS and JavaSpaces?

JMS is essentially a “one-way” comunication technology – fire and forget. Hope that someone on the other side accepts the message and does the right thing with it. This is okay for many things in particular certain aspects of financial systems.

JavaSpaces can certainly support this “one-way” pattern but they are designed to be more generic which means you can use them for JMS-type applications but they might not be the best choice. The key difference comes down to the fact that JavaSpaces can support a more “interactive” style of communication, typically two or multi-party, single or multi round bi-directional “conversations” which explains why they are a favourite for master/worker patterns. JavaSpaces can also be a potential solution for cases where a JMS approach is suffering from “topic explosion”.

Many people get hung up on JavaSpace.notify() because it is “not reliable”. This needs careful explanation – most JavaSpaces will endeavour to deliver all appropriate events to a client. If the client becomes permanently unavailable (NoSuchObjectException) further events will not be delivered to it.

A client that is temporarily unavailable might not receive all events with this typically occurring due to network partition. Those applications that can’t tolerate such loss tend to be deployed on reliable networks with backup infrastructure to ensure no events are lost. It has to be done this way because even JMS implementations have finite storage and if a network is down too long, there is potential for overflow leading to lost messages.

As of JavaSpace05 a client can figure out what events it missed by re-registering for notify and then using contents to locate any messages not received. Typically, such messages are leased so the client must return within the leased time period but that could be as much as 24 hours. The reason for leasing the messages is to automatically clean them up after a period of time.

Clients of JavaSpace.notify() have an additional option which is to have all their events routed to an EventMailbox located (in network hop terms) close to the JavaSpace which they can then receive events from (and as of Jini 2.1 you can have those events pulled by your client).

Update: Updated the last paragraph, as of Jini 2.1, clients can pull events – shame on me for getting that wrong.

  • Share/Bookmark

Comments Comments Off

Updated to include additional commentary on speed of persistence

I often get asked the question “What kind of performance can I get with a JavaSpace?

For anyone who knows me, you won’t be surprised to hear that I respond with “it depends“. But if you’re reading this entry you’ll want more than that right? :)

There are a multitude of factors involved – some of them are user related and some of them are implementation related. This sounds complicated but we can essentially derive all we need to know from examining the user factors and relating them to the underlying implementation constraints:

  • How big is the typical Entry?
  • How many Entry’s do you plan to hold?
  • Typical search patterns
  • How many concurrent access do you expect?
  • How much use do you make of notify?
  • Do you need persistence?
  • Use of transactions

Before we start lets define a couple of terms. JavaSpaces implementations tend to support two basic modes of operation (some support others as well which lie in between these two extremes). Persistent mode means that as soon as an Entry is written it is guarenteed to be available after a crash. Transient mode is the opposite, Entry’s are typically only held in memory and will be lost after a crash.

The size and constituents of an Entry have a direct relationship to the number of network packets you’re likely to incur transporting them back and forth. In many cases you can tune your TCP/IP stack accordingly. I haven’t mentioned RMI yet have I? That’s because owing to the power of Jini and the use of smart proxies, it is possible for a JavaSpace to use a custom protocol for communication with the server rather than RMI. JERI (Jini’s RMI replacement) allows for some serious customization at the invocation layers. How about serialization costs? Well, they are there but there are various techniques for accelerating this which some JavaSpaces implementations exploit.

The number of Entry’s and number of types you have in your application stresses several aspects of a JavaSpaces engine. First and foremost, large numbers of Entry’s may not fit in memory so you’ll be hitting on the caching algorithm the engine uses if any (some implementations can’t swap and either always hit a database for access or attempt to retain everything in memory blowing up if you overfill them). Basically, just like databases, the more memory you can give a JavaSpace the better. The number of different types determines how easily the engine can partition Entry’s. It can also affect the number of database tables and size of database footprint if any (this is particularly relevant to persistent JavaSpaces but also relevant for transient JavaSpaces that swap to disk if memory is exhausted). Finally, the number of Entry’s stresses the indexing algorithms the engine uses and this leads us into the discussion about search patterns.

Some JavaSpaces applications adopt a flow or streaming type approach where only a few Entry’s are present at any one time and are rapidly taken by some process. Often these processes use general templates which are wholly wildcard matches (all their fields are null). Such behaviours can’t really be accelerated by indexes and the optimal Entry storage mechanism here is a linked list which is linearly scanned. However, linked lists are slow for random access and this brings us to the other behaviour where an application typically searches for Entry’s using specific values in fields of a template (think primary key if you like). Under this circumstance, large numbers of Entry’s will penalize searches along linked lists and various of the JavaSpaces implementations use some variant of hash-based searching to accelerate these searches.

No discussion of access patterns is complete without some discussion of FIFO. Basically, FIFO has all sorts of nasty side effects – it tends to penalize indexing, renders various concurrency optimizations useless and tends to incur more disk searching in cases where swapping is required because the number of Entry’s is larger than available memory – it’s basically cache defeating (you actually want the least recently accessed Entry and that’s the first element to be swapped to disk by most decent caching algorithms). One can change cache policies etc but this can slow down other forms of search in cases where you want random access on top of the FIFO behaviour.

The amount of concurrent access stresses the transport. nio is the basic solution adopted here and those JavaSpaces running a custom socket protocol or using JERI (which has an nio option) are at an advantage. Concurrency also exaggerates issues associated with swapping particularly disk thrashing and the effectiveness of the cache policy. It can cause lock contention on caches (particularly in the presence of swapping) and other areas of the engine particularly in the implementation of the behaviour where a blocking take/read is awoken by the arrival of a new Entry from a write.

The number of notify requests dictates the amount of latent template matching the engine has to do for newly written Entry’s. High rates of Entry arrival can overflow queues and stack up behind delivery of the events to remote clients. JavaSpaces implementations typically combat this by applying throttling and using multiple threads to process new Entry’s and deliver events. There’s a certain amount of advantage in using indexing/hashing but it’s mostly a brute force exercise based around thread pools. More threads and more CPU’s means faster notify processing but the network and transport will also be a potential bottleneck. Essentially, notify is a push mechanism whilst take and read are pull mechanisms. Notify requires extra CPU effort because it has to deliver the payload to the client as well as determine matching whilst take and read take some of the delivery load off the engine.

The performance of persistent mode is largely dictated by the performance of the disk subsystem. Specifically it’s governed by the speed with which the OS and disk subsystem can work together to get a forced sync of some data onto the disk platters. The forced sync is essential to guarentee persistence post crash. We can improve performance here with battery-backed write buffers etc which come as part of the more advanced disk infrastructures from the likes of EMC (or something like this). There are various techniques for improving throughput to the disk which amount to reducing the amount of head activity (some JavaSpaces implementations support options in this area, others don’t). If your application is going to require the JavaSpaces implementation to swap to disk (assuming it supports that mode of operation) it’s best to place the database component of the JavaSpaces engine on one disk and the logs on another as these two components tend to have conflicting disk access patterns which can slow things down substantially.

Anyone used to tuning databases will recognise much of what I say above but there’s one other thing I need to mention. Many databases have some aggressive optimizations which trade some level of recovery guarentee (i.e. they may lose the last few updates) for a boost in performance. They still use logs but they buffer updates to the logs in memory for some period of time before flushing them to disk. Some JavaSpaces implementations support this option (this is the intermediate point between full persistence and transient modes) and may even use it as the default. They can give substantial performance improvement but you can lose Entry’s. If you absolutely cannot afford to lose Entry’s make sure you’ve disabled this option if it’s present.

Note that there are some more exotic methods of doing logging which involve passing log entry’s over the network (typically in batches) for holding on another machine. If one sends the log entry’s to several machines, they can each keep copies in memory and not touch disk giving a reliable but fast log assuming your network can cope with the concurrent load of logging and traffic for takes, writes etc.

Transactions need to be co-ordinated by a transaction manager and typically use a two-phase commit protocol, even with a single participant (one JavaSpace) you incur a number of additional network roundtrips. It’s worth knowing that the default settings for Mahalo are not optimal for many high throughput situations. Certain JavaSpaces implementations also optimize the common case of one transaction against one JavaSpace to reduce the roundtrips (there are several ways to do this).

Okay, so the above is the why’s and the wherefore’s, I’ll leave you with a few figures:

  1. On 100 Mbit/sec ethernet, an RMI call with a reasonable sized payload (2k or so when I last tested) will take about 2ms
  2. Blitz’s core engine is well capable of sub-millisecond writes and takes even in persistent mode where most of the cost is in the disk logging activity).

If you still have questions, feel free to post a comment.

  • Share/Bookmark

Comments 1 Comment »