Over the past 10 months I’ve been involved in a number of challenging enterprise projects that have, to put it simply, had to replace standard asynchronous messaging architectures (like JMS) and Space-based implementations in order to stay in business (meet SLAs etc). On one project in particular, the team had invested over 11 months of their lives in a Space-based implementation, only to find that the vendor-supplied architecture did not scale, lost transactions, failed to meet SLAs and ultimately did not make it into production (but that’s another story). That experience led me to believe that the way we understand, implement and teach scalable system design is, well… broken (or heavily dependent on the concept of a Transaction, and transactions aren’t as scalable as people seem to assume).
Rather than talking about “what went wrong”, which I’ll be doing over time on this blog (and at Javapolis in December 2007), I thought it might be interesting to reflect on the fundamental challenges of messaging systems, message-based architectures and how they are implemented.
Given my five or so years of engineering experience with trading exchanges and automated trading systems (by no means an expert, by no means a pup, but I’ve implemented a couple) I can safely make the following observations.
- Most trading systems seem to scale better, and have better performance and availability profiles, than most implementations of JMS and Javaspaces (well, the ones I’ve worked on seem to anyway).
- Almost all successful trading systems seem only to make use of JMS (and Javaspaces if adopted) for system integration, ie: exchange -> front office -> middle office -> back office. These technologies are not integral to the exchange / matching processing as they tend to have very high latencies (seconds not milliseconds). Let me make this very clear – such systems don’t use these technologies in core business logic. Personal experience suggests that even modern messaging/space-based systems have latencies 10x to 1000x higher than is typically required.
- Trading systems get their performance and scalability by using different architectural approaches – eg: avoiding JEE, multi-phase transactions, multi-pass hand-shake protocols, going to disk etc, and relying more on Recoverable Computing.
Why should I care? Why does this matter?
It’s pure personal frustration I guess. There’s nothing wrong with JMS, Javaspaces etc. They have a purpose and that purpose is typically integration or ordering of events.
I guess I’m routinely starting to see that the fundamental premises of stateless-ness and bus-based architectures are failing us as we demand scalability etc.
Every week I work with different projects, companies, architectures and architects, all of which face the challenge of delivering predictably scalable systems (10x to 1000x), with mandatory requirements such as high-availability (sub 1, 2 or 3 second recovery) and high-performance (1 to 2 millisecond response time)… to be delivered tomorrow – or in some cases, that afternoon!
In most circumstances I see that the aforementioned design approaches practically ensure that the requirements mentioned above simply can’t be met. By that I mean systems inter-connected via message-buses (put “enterprise” in front of that if you like) or, in some very rare cases, a space-based approach, whereby messages/entries are placed in a queue/space/topic, written to disk, read out and delivered to a consumer (within a transaction).
So here’s my challenge. If we are so reliant on messaging, why don’t we have implementations that operate in the same manner that we build financial exchanges? Why don’t we learn from the lessons of engineering financial trading systems? On one side you have “sellers” – ie: Producers / Publishers / Writers and on the other side you have “buyers” – Consumers / Takers etc. We then simply “cross” (match) the “buyers” and “sellers”. Simple really. In fact the conditions for message delivery are much simpler than those of financial exchanges / automated trading / matching systems, but they achieve the same result. Buyers buy, Sellers sell, Publishers publish and Consumers consume.
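To make the “crossing” idea a little more concrete, here is a minimal sketch in Java of what matching publishers against takers might look like. It is an illustration under my own assumptions, not how any real exchange or JMS/Javaspaces product works: the class name `CrossingEngine` and its methods `publish` and `take` are hypothetical, everything lives in memory in a single process, and there is no disk write, no transaction and no recovery story.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: crosses published messages with waiting takers, entirely in memory. */
class CrossingEngine<T> {
    // "Sell side": messages published while no taker was waiting.
    private final Deque<T> restingMessages = new ArrayDeque<>();
    // "Buy side": takers that arrived while no message was available.
    private final Deque<Consumer<T>> restingTakers = new ArrayDeque<>();

    /** A producer offers a message; it crosses at once if a taker is resting. */
    public void publish(T message) {
        Consumer<T> taker;
        synchronized (this) {
            taker = restingTakers.pollFirst();
            if (taker == null) {
                restingMessages.addLast(message); // nobody to cross with yet
                return;
            }
        }
        taker.accept(message); // deliver outside the lock
    }

    /** A consumer asks for one message; it crosses at once if a message is resting. */
    public void take(Consumer<T> taker) {
        T message;
        synchronized (this) {
            message = restingMessages.pollFirst();
            if (message == null) {
                restingTakers.addLast(taker); // nothing to cross with yet
                return;
            }
        }
        taker.accept(message); // deliver outside the lock
    }

    public synchronized int restingMessageCount() { return restingMessages.size(); }

    public synchronized int restingTakerCount() { return restingTakers.size(); }

    public static void main(String[] args) {
        CrossingEngine<String> engine = new CrossingEngine<>();
        List<String> delivered = new ArrayList<>();
        engine.publish("order-1");   // rests: no taker yet
        engine.take(delivered::add); // crosses with the resting message
        engine.take(delivered::add); // rests: no message yet
        engine.publish("order-2");   // crosses with the resting taker
        System.out.println(delivered);
    }
}
```

The point of the sketch is only that delivery is a match between two in-memory books, first-come first-served on each side. What happens when the process dies is deliberately left out.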
Perhaps the next JMS / Spaces implementation will be like this… but grid enabled.