Archive for the “Jini” Category

Many people ask me the following:

What’s the difference between JMS and JavaSpaces?

JMS is essentially a “one-way” comunication technology – fire and forget. Hope that someone on the other side accepts the message and does the right thing with it. This is okay for many things in particular certain aspects of financial systems.

JavaSpaces can certainly support this “one-way” pattern but they are designed to be more generic which means you can use them for JMS-type applications but they might not be the best choice. The key difference comes down to the fact that JavaSpaces can support a more “interactive” style of communication, typically two or multi-party, single or multi round bi-directional “conversations” which explains why they are a favourite for master/worker patterns. JavaSpaces can also be a potential solution for cases where a JMS approach is suffering from “topic explosion”.

Many people get hung up on JavaSpace.notify() because it is “not reliable”. This needs careful explanation – most JavaSpaces will endeavour to deliver all appropriate events to a client. If the client becomes permanently unavailable (NoSuchObjectException) further events will not be delivered to it.

A client that is temporarily unavailable might not receive all events with this typically occurring due to network partition. Those applications that can’t tolerate such loss tend to be deployed on reliable networks with backup infrastructure to ensure no events are lost. It has to be done this way because even JMS implementations have finite storage and if a network is down too long, there is potential for overflow leading to lost messages.

As of JavaSpace05 a client can figure out what events it missed by re-registering for notify and then using contents to locate any messages not received. Typically, such messages are leased so the client must return within the leased time period but that could be as much as 24 hours. The reason for leasing the messages is to automatically clean them up after a period of time.

Clients of JavaSpace.notify() have an additional option which is to have all their events routed to an EventMailbox located (in network hop terms) close to the JavaSpace which they can then receive events from (and as of Jini 2.1 you can have those events pulled by your client).

Update: Updated the last paragraph, as of Jini 2.1, clients can pull events – shame on me for getting that wrong.

Comments Comments Off

As I’ve said in a previous posting, I’m not a fan of infrastructure level clustering. It basically comes down to the fact that this kind of approach to resilience and scale is achieved through centralization and strict control of the environment. Whilst such a level of control and centralization might be possible in certain well-defined, small scale circumstances, it gets much more difficult across a network of any size and with any more than a few machines.

There are other problems too like the fact that clustering in this fashion is infrastructure specific not application specific. Basically, the infrastructure will offer you configuration options that fit with what the infrastructure can support/implement generically in an application unaware fashion. You must then fit your application around what the infrastructure will offer which can mean design or performance compromises or maybe maintenance load (hand-holding by system admins etc).

There can be no argument that many adopt this approach and accept the compromises and for a certain class of system, as I’ve said, it’s probably acceptable. However, the typical profile for most systems is that they grow and evolve over time such that those compromises you made are no longer acceptable or viable. They become an albatross around the architects neck. Witness the number of people entrenched in battles to get scaling out of application servers or to add new services etc.

This approach to clustering often goes hand in hand with the denial of the realities of networking. It’s often assumed nothing breaks, there’s always enough bandwidth etc. And everything is coded like it runs in one JVM creating a single difficult to manage monolithic piece of code (something I find ironic given that the people that adopt this approach quite often talk about loose-coupling, service oriented architecture, networked applications etc) which is deliberately naive of it’s under-pinnings.

In an ironic twist, the people that choose to build things this way then throw their hands up in horror and complain when things become difficult to scale, or they haven’t got the level of control they want or they don’t like the way their code has turned out or whatever. What did they expect? They turned all these responsibilities over to some vendor’s software stating “I don’t want to deal with that, you handle it, just tell me what to do” and then in the very next breath said “why on earth did you tell me to do it that way?”

So if I don’t want to suffer this, what might I do? Start looking at building more modular, network aware modules which can be dynamically interlinked at runtime. And find a way to achieve resilience/scale at application level using simple unreliable components. I said in my previous posting:

“I can’t help but feel that there must be a better way……..”

Yep, there is – and I have a prototype to prove it. And, in case you’re wondering, yes the prototype does have persistent state and no it doesn’t require RAID or any other cool hardware, a bunch of blades on a network is all that’s needed. And of course, it uses Jini.

Comments Comments Off

Updated to include additional commentary on speed of persistence

I often get asked the question “What kind of performance can I get with a JavaSpace?

For anyone who knows me, you won’t be surprised to hear that I respond with “it depends“. But if you’re reading this entry you’ll want more than that right? :)

There are a multitude of factors involved – some of them are user related and some of them are implementation related. This sounds complicated but we can essentially derive all we need to know from examining the user factors and relating them to the underlying implementation constraints:

  • How big is the typical Entry?
  • How many Entry’s do you plan to hold?
  • Typical search patterns
  • How many concurrent access do you expect?
  • How much use do you make of notify?
  • Do you need persistence?
  • Use of transactions

Before we start lets define a couple of terms. JavaSpaces implementations tend to support two basic modes of operation (some support others as well which lie in between these two extremes). Persistent mode means that as soon as an Entry is written it is guarenteed to be available after a crash. Transient mode is the opposite, Entry’s are typically only held in memory and will be lost after a crash.

The size and constituents of an Entry have a direct relationship to the number of network packets you’re likely to incur transporting them back and forth. In many cases you can tune your TCP/IP stack accordingly. I haven’t mentioned RMI yet have I? That’s because owing to the power of Jini and the use of smart proxies, it is possible for a JavaSpace to use a custom protocol for communication with the server rather than RMI. JERI (Jini’s RMI replacement) allows for some serious customization at the invocation layers. How about serialization costs? Well, they are there but there are various techniques for accelerating this which some JavaSpaces implementations exploit.

The number of Entry’s and number of types you have in your application stresses several aspects of a JavaSpaces engine. First and foremost, large numbers of Entry’s may not fit in memory so you’ll be hitting on the caching algorithm the engine uses if any (some implementations can’t swap and either always hit a database for access or attempt to retain everything in memory blowing up if you overfill them). Basically, just like databases, the more memory you can give a JavaSpace the better. The number of different types determines how easily the engine can partition Entry’s. It can also affect the number of database tables and size of database footprint if any (this is particularly relevant to persistent JavaSpaces but also relevant for transient JavaSpaces that swap to disk if memory is exhausted). Finally, the number of Entry’s stresses the indexing algorithms the engine uses and this leads us into the discussion about search patterns.

Some JavaSpaces applications adopt a flow or streaming type approach where only a few Entry’s are present at any one time and are rapidly taken by some process. Often these processes use general templates which are wholly wildcard matches (all their fields are null). Such behaviours can’t really be accelerated by indexes and the optimal Entry storage mechanism here is a linked list which is linearly scanned. However, linked lists are slow for random access and this brings us to the other behaviour where an application typically searches for Entry’s using specific values in fields of a template (think primary key if you like). Under this circumstance, large numbers of Entry’s will penalize searches along linked lists and various of the JavaSpaces implementations use some variant of hash-based searching to accelerate these searches.

No discussion of access patterns is complete without some discussion of FIFO. Basically, FIFO has all sorts of nasty side effects – it tends to penalize indexing, renders various concurrency optimizations useless and tends to incur more disk searching in cases where swapping is required because the number of Entry’s is larger than available memory – it’s basically cache defeating (you actually want the least recently accessed Entry and that’s the first element to be swapped to disk by most decent caching algorithms). One can change cache policies etc but this can slow down other forms of search in cases where you want random access on top of the FIFO behaviour.

The amount of concurrent access stresses the transport. nio is the basic solution adopted here and those JavaSpaces running a custom socket protocol or using JERI (which has an nio option) are at an advantage. Concurrency also exaggerates issues associated with swapping particularly disk thrashing and the effectiveness of the cache policy. It can cause lock contention on caches (particularly in the presence of swapping) and other areas of the engine particularly in the implementation of the behaviour where a blocking take/read is awoken by the arrival of a new Entry from a write.

The number of notify requests dictates the amount of latent template matching the engine has to do for newly written Entry’s. High rates of Entry arrival can overflow queues and stack up behind delivery of the events to remote clients. JavaSpaces implementations typically combat this by applying throttling and using multiple threads to process new Entry’s and deliver events. There’s a certain amount of advantage in using indexing/hashing but it’s mostly a brute force exercise based around thread pools. More threads and more CPU’s means faster notify processing but the network and transport will also be a potential bottleneck. Essentially, notify is a push mechanism whilst take and read are pull mechanisms. Notify requires extra CPU effort because it has to deliver the payload to the client as well as determine matching whilst take and read take some of the delivery load off the engine.

The performance of persistent mode is largely dictated by the performance of the disk subsystem. Specifically it’s governed by the speed with which the OS and disk subsystem can work together to get a forced sync of some data onto the disk platters. The forced sync is essential to guarentee persistence post crash. We can improve performance here with battery-backed write buffers etc which come as part of the more advanced disk infrastructures from the likes of EMC (or something like this). There are various techniques for improving throughput to the disk which amount to reducing the amount of head activity (some JavaSpaces implementations support options in this area, others don’t). If your application is going to require the JavaSpaces implementation to swap to disk (assuming it supports that mode of operation) it’s best to place the database component of the JavaSpaces engine on one disk and the logs on another as these two components tend to have conflicting disk access patterns which can slow things down substantially.

Anyone used to tuning databases will recognise much of what I say above but there’s one other thing I need to mention. Many databases have some aggressive optimizations which trade some level of recovery guarentee (i.e. they may lose the last few updates) for a boost in performance. They still use logs but they buffer updates to the logs in memory for some period of time before flushing them to disk. Some JavaSpaces implementations support this option (this is the intermediate point between full persistence and transient modes) and may even use it as the default. They can give substantial performance improvement but you can lose Entry’s. If you absolutely cannot afford to lose Entry’s make sure you’ve disabled this option if it’s present.

Note that there are some more exotic methods of doing logging which involve passing log entry’s over the network (typically in batches) for holding on another machine. If one sends the log entry’s to several machines, they can each keep copies in memory and not touch disk giving a reliable but fast log assuming your network can cope with the concurrent load of logging and traffic for takes, writes etc.

Transactions need to be co-ordinated by a transaction manager and typically use a two-phase commit protocol, even with a single participant (one JavaSpace) you incur a number of additional network roundtrips. It’s worth knowing that the default settings for Mahalo are not optimal for many high throughput situations. Certain JavaSpaces implementations also optimize the common case of one transaction against one JavaSpace to reduce the roundtrips (there are several ways to do this).

Okay, so the above is the why’s and the wherefore’s, I’ll leave you with a few figures:

  1. On 100 Mbit/sec ethernet, an RMI call with a reasonable sized payload (2k or so when I last tested) will take about 2ms
  2. Blitz’s core engine is well capable of sub-millisecond writes and takes even in persistent mode where most of the cost is in the disk logging activity).

If you still have questions, feel free to post a comment.

Comments 1 Comment »

For a number of applications, the standard JavaSpace interface’s single Entry operations are sufficient to construct a scalable and simple solution.

However, there are certain cases where perhaps we’d like to write or take a large batch of Entry’s. With the existing interface this is possible but would incur a significant number of network round trips which can hurt performance. Fortunately, with the release of JINI 2.1 we gained an additional interface JavaSpace05 which provides bulk write/take and some other goodies (which I’ll mention later).

Not only do these bulk operations permit transfer of multiple Entry’s but they also have a provision for the use of multiple templates using an OR-based matching strategy. i.e. take an entry if it matches template A or template B or template C. As with the original operations you get timeouts, leasing and transactions.

Other additions include a second notify method (registerForAvailabilityEvent) which provides more detail concerning the lifetime of an Entry. For example, one can now receive an event to indicate the availability of an Entry as the result of an aborted take.

Lastly we have contents which allows for a form of multiple read against multiple templates. This is treated slightly differently from multiple take in recognition of it’s non-destructive nature. Some developers like to view this feature as an iterator but there are some key differences such as the ability for iteration to never end (due to a constant stream of new writes) and remoteness.

Currently Outrigger (the example JavaSpaces implementation in the Jini Starter Kit) and Blitz provide complete implementations of JavaSpace05. GigaSpaces provide their own equivalents but as yet haven’t announced compatible support for this new standard extension.

Comments Comments Off

11/01/06 – Updated to include more commentary on tradeoffs and reliability

3/01/06 – Updated to include ifExists issues

Plenty of people have asked me to add clustering to Blitz and I’ve certainly been spending time looking at that idea but it’s time for a confession:

I dislike clustering intensely!

Especially when I consider it in the context of JINI philosophy. See, clustering is an attempt to hide and handle partial failure seamlessly allowing clients to imagine that all is well. It offends me on other levels as well – clustering is not simple to implement, not simple to deploy and not simple to manage. Not exactly a great example of the KISS principle and it flys in the face of the “Recovery Oriented Computing” approach taken by the likes of Google and Amazon.

Here are some of the other things that cause me to pause for thought:

(1) Data partitioning – how does one partition Entry’s across multiple JavaSpaces? There has to be some key field that we differentiate on. Or we just differentiate on Entry type which means all Entry’s of one type will end up in one place – not good for scaling. Okay, so which field is key? How do we express that? Do you want it in the Entry itself or in a configuration file? Do you want to change that dynamically or shut everything down, reconfigure and restart. And make no mistake moving data around as the result of changing configuration is going to be slow and painful!

(2) Load balancing – If one uses a master-slave approach, there’s no need for load balancing, there’s only one machine. But if you’re using multiple active machines you need to select a collection of potential query nodes on the basis of the partitioning information above after which you can consider load at each node to determine a choice. But how do you measure that load? Queries per second? Network traffic? CPU time? Disk load? And there’s another nasty factor that kicks around in this mix………..

(3) Replication – if you have multiple active machines serving queries for the same set of Entry’s, each machine has to inform others of changes to state via something like two-phase commit, paxos or whatever. So more nodes sharing Entry’s means more network traffic as they arbitrate over state changes. i.e. Just because you have several nodes providing access to shared data doesn’t mean you’ll scale because they must arbitrate in the same way as SMP processor boxes must. This multiple active machines approach works really well for read-mostly loads but JavaSpaces are update-mostly because take and write are the common operations.

(4) Performance through asynchronous replication – you can’t do this and be an official JavaSpace implementation because part of the JavaSpaces specification (under Operation Ordering) says:

Operations on a space are unordered. The only view of operation order can be a thread’s view of the order of the operations it performs. A view of inter-thread order can be imposed only by cooperating threads that use an application-specific protocol to prevent two or more operations being in progress at a single time on a single JavaSpaces service. Such means are outside the purview of this specification.

For example, given two threads T and U, if T performs a write operation and U performs a read with a template that would match the written entry, the read may not find the written entry even if the write returns before the read. Only if T and U cooperate to ensure that the write returns before the read commences would the read be ensured the opportunity to find the entry written by T (although it still might not do so because of an intervening take from a third entity).

Pay particular attention to that statement about co-operating threads and write/take. Note that when the write completes, it must be visible to the take immediately. Were we to use asynchronous replication, we would not be compliant with this requirement. I suspect given the way one should use take this shouldn’t matter in practice but it has implications for timeliness which might be inconvenient.

(5) Network partitioning – to handle these problems between client and cluster-member requires multiple network routes into the cluster. One cannot select another cluster-member if they are all accessed through a single network pipe. The same principles apply to intra-cluster communications. Whilst there are algorithms to tolerate these problems the ability to make progress under updates will be inhibited and clients may be repeatedly attempting to act on out-of-date state only to be aborted at transaction commit time as the algorithm determines that there is no viable resolution at this time. And even once the network is fixed, there will be a period of instability as things come back into sync.

(6) ifExists – fundamentally, the ifExists variants peer inside of transactions. In a master slave configuration this is not an issue but when one has multiple active partitions, there is a requirement to co-ordinate actions across all nodes which may have a possible match and a blocking transaction. Worse is that you can’t optimize this in any fashion, when the ifExists is issued, one must query all relevant bits of the cluster, arrange for any necessary callbacks and block accordingly.

Whilst it is possible to tackle all of these issues in a clustered implementation, a developer would only have so much control and it mightn’t be what is required for a particular application. There are all sorts of issues which may occur that have strange or undesirable manifestations from an application perspective and are not easily cured. The benefits of such a system are dubious in my opinion given the typical operation profile of a JavaSpaces application.

I can’t help but feel that there must be a better way……..

Other Observations

Many talk about a need for clustering when what they actually need is reliability. That is to say they want their systems to continue to make progress in the face of failures. There are many implementation options available for achieving this and clustering is but one. These same people then make things worse by also placing the requirements for load balancing, partitioning etc onto the cluster solution. This makes things considerably more complex by inter-mingling all sorts of complex and conflicting requirements. This results in either a clustering solution that is horribly difficult to configure or a less configurable solution that has a fixed set of trade-offs assumed for these inter-mingled requirements.

“Continuing to make progress in the face of failures” is a very foggy statement from an engineering perspective. Do we make progress across all work regardless of number of failures. Do we mean make progress with most work in the face a fixed number of failures or some other variation? If you want progress for all work regardless of number of failures, you better have big pockets for a lot of kit. Further, your performance ceiling will be limited because solutions for this problem require a lot of network traffic between members of the partition responsible for the data. More members means more resilience but more traffic and therefore less performance.

Comments 3 Comments »