“Make as much stateless as possible” is the mantra, but I wonder if we’re being a little over-zealous in applying it. Consider this note in Fielding’s REST thesis:
“Like most architectural choices, the stateless constraint reflects a design trade-off. The disadvantage is that it may decrease network performance by increasing the repetitive data (per-interaction overhead) sent in a series of requests, since that data cannot be left on the server in a shared context. In addition, placing the application state on the client-side reduces the server’s control over consistent application behavior, since the application becomes dependent on the correct implementation of semantics across multiple client versions.”
Thus while statelessness is often claimed to achieve scalability, in certain applications that may not be the case due to the resultant load on the network.
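To put rough numbers on the overhead Fielding describes, here is a back-of-the-envelope sketch. The function name and every figure in it (1,000 requests, a 200-byte payload, 2,000 bytes of application context) are assumptions picked purely for illustration, not measurements from any real system.

```python
def bytes_on_wire(requests, payload_bytes, context_bytes, stateless):
    """Total bytes one client sends over a series of requests."""
    if stateless:
        # Every request repeats the full application context.
        return requests * (payload_bytes + context_bytes)
    # Stateful: the context is sent once, then left on the server.
    return context_bytes + requests * payload_bytes


# Hypothetical figures chosen only for illustration.
stateless_total = bytes_on_wire(1000, 200, 2000, stateless=True)
stateful_total = bytes_on_wire(1000, 200, 2000, stateless=False)
print(stateless_total, stateful_total)
```

With these made-up figures the stateless series sends roughly ten times as much data. The ratio depends entirely on how large the repeated context is relative to the payload, and the stateful variant pays for its saving by holding that context on the server.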
Our pursuit of statelessness leads us to behaviours such as making a single entity responsible for the maintenance of all state. Often it’s a database that becomes a black hole sucking up hardware, network bandwidth, admin time and endless tuning effort. It also becomes the focus of our reliability concerns, with a need for clustering, RAID arrays etc. Stand around long enough and you’ll hear terrified utterings from staff such as “if we ever lose the database….”
Making some single thing responsible for all these aspects of our system is asking for trouble. Having all these heavyweight concerns squeezing down on a single element ultimately leads to breakage.
History shows that we aren’t entirely happy with this “single point of responsibility for all state”. We have cookies in browsers, local storage in browsers, thin clients that rely on servers to store all state and so on.
Perhaps we’re ignoring an underlying message: maintenance of state is a responsibility shared across a system. We should place that responsibility in appropriate places at appropriate times, be much more aware of responsibility boundaries and, where appropriate, share that responsibility amongst components.
Generally we consider TCP to be responsible for ensuring that state makes it to the other end of the connection. We hand some data to the TCP layer and expect that it will ensure the data reaches the recipient. But is this true? What happens if we suffer a power outage before TCP transmits the data? When the machine restarts, is TCP going to restart and resend all that unsent data? Clearly not. Whoever delegated responsibility for this data to TCP will now need to take steps to recover the situation.
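One way a sender might keep its share of the responsibility is to hold on to each message until the receiving application confirms it, rather than forgetting it once it has been handed to the transport. The sketch below is a minimal illustration under stated assumptions: the `Outbox` class, `deliver_pending`, the JSON file layout and the `send` callable (which stands in for a real network call and returns `True` only when the peer application has acknowledged the message) are all hypothetical names invented for this example.

```python
import json
import os
import tempfile


class Outbox:
    """Durable record of messages the sender is still responsible for."""

    def __init__(self, path):
        self.path = path
        self.pending = {}
        if os.path.exists(path):
            # After a restart, reload whatever was never acknowledged.
            with open(path, "r", encoding="utf-8") as f:
                self.pending = json.load(f)

    def _flush(self):
        # Write to a temp file, fsync, then rename, so a crash
        # cannot leave a half-written outbox behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.pending, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def record(self, msg_id, payload):
        self.pending[msg_id] = payload
        self._flush()

    def acknowledge(self, msg_id):
        self.pending.pop(msg_id, None)
        self._flush()


def deliver_pending(outbox, send):
    """Attempt every pending message; drop only the acknowledged ones."""
    for msg_id, payload in list(outbox.pending.items()):
        if send(msg_id, payload):  # hypothetical transport call
            outbox.acknowledge(msg_id)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "outbox.json")
        first = Outbox(path)
        first.record("m1", "hello")
        first.record("m2", "world")
        # Simulate an outage: nothing is ever acknowledged.
        deliver_pending(first, lambda msg_id, payload: False)

        # Simulate the restart: a fresh process reads the same file.
        restarted = Outbox(path)
        after_restart = sorted(restarted.pending)
        deliver_pending(restarted, lambda msg_id, payload: True)
        after_delivery = sorted(Outbox(path).pending)
        return after_restart, after_delivery
```

Calling `demo()` shows both messages surviving the simulated restart and the outbox emptying once delivery is acknowledged. Because a message is resent whenever its acknowledgement is lost, the receiver may see duplicates, so it has to take on part of the responsibility too, for example by ignoring message IDs it has already processed.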
What about a message queue? Typically we place some data in the queue and demand that it absolutely must deliver that message and not lose it in the meantime. That’s an awful lot of responsibility for a single component to carry! As an aside we’re also potentially making that queue a performance bottleneck of the future.
Then there’s the Web which in many cases puts responsibility on the client for maintenance of state. This is achieved through retries, restoring backups, re-entering details etc. Notably, this is the case even if the client “fails” e.g. your home router goes down or the PC overheats. There’s a certain amount of illusion here too where we believe the responsibility for state maintenance has been placed elsewhere e.g. Flickr. Ideally they don’t want to lose all your precious pictures but if they do, who will have to restore all that information?
I think it’s interesting that placing such responsibility with some single entity is perceived as the easy solution but it has a lot of hidden costs like redundant hardware, clustering, strict data-centre environment control, backups etc.
Spreading responsibility might ultimately be easier and fit with our desire for utility computing, but it’s not commonplace and thus we’re lacking well-documented patterns, software components etc. We are seeing some examples, however. I would speculate that S3’s API is the way it is precisely because it relies on spreading responsibility for state across a co-operative shared-nothing system rather than placing it all in a single shared-everything cluster.
I’ve written up a TAG finding on State in Web applications at http://www.w3.org/2001/tag/doc/state. Your comments would be appreciated.