Billy Newport (IBM) has a nice post on his blog about the possible demise of Data Grids and XTP given the advances being made in Solid State Disks (SSDs). He raises some very valid points about the potential improvements SSDs deliver over traditional disk systems, in particular performance, something that Data Grids and XTP also try to address. So the question is… is the Data Grid doomed? Well, not exactly. As Billy points out, one of the core issues that Data Grids attempt to address is scalability, and SSDs don’t solve this. No matter how much of the stuff you think you can stuff into a single server (or a collection of servers), the stuff has to be managed, and that essentially means partitioning.
So how does Oracle Coherence fit into this discussion? How is SSD addressed? Well, put simply, Oracle Coherence (formerly Tangosol Coherence) was designed from day one (over seven years ago now) to virtualize storage. In fact, Coherence itself doesn’t care how or where the underlying data is actually stored. Coherence will happily manage and automatically partition data across the configured storage, whether it be disk, RAM or SSD. Essentially, Coherence virtualizes data management.
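To make the “doesn’t care where the data lives” idea a little more concrete, here is a minimal Java sketch of hash partitioning over pluggable storage. It is not the Coherence API and not how Coherence is implemented; `PartitionedStore` and `PartitionDemo` are hypothetical names invented for this illustration, and each partition is just a plain `java.util.Map` standing in for whatever medium (heap, memory-mapped file, SSD) you might plug in.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, for illustration only (not part of Coherence).
class PartitionDemo {
    public static void main(String[] args) {
        // Four partitions; each Map is a stand-in for any storage medium.
        List<Map<String, String>> stores = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            stores.add(new HashMap<String, String>());
        }
        PartitionedStore grid = new PartitionedStore(stores);
        grid.put("trade-1", "IBM 100");
        grid.put("trade-2", "ORCL 250");
        System.out.println(grid.get("trade-1")); // prints "IBM 100"
    }
}

// Routes each key to exactly one partition based on its hash.
// The routing logic never looks at how a partition stores its data.
class PartitionedStore {
    private final List<Map<String, String>> partitions;

    PartitionedStore(List<Map<String, String>> partitions) {
        if (partitions.isEmpty()) {
            throw new IllegalArgumentException("at least one partition is required");
        }
        this.partitions = partitions;
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, String value) {
        partitions.get(partitionOf(key)).put(key, value);
    }

    String get(String key) {
        return partitions.get(partitionOf(key)).get(key);
    }

    // Total number of entries across all partitions.
    int size() {
        int total = 0;
        for (Map<String, String> partition : partitions) {
            total += partition.size();
        }
        return total;
    }
}
```

The point of the sketch is only that the partitioning decision and the storage medium are independent: swap any of the maps for something disk- or SSD-backed and the routing code doesn’t change.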
While most users adopt the standard practice of using out-of-the-box in-memory storage, the storage sub-system of Coherence can be completely customized and actually ships with several non-memory-based alternative mechanisms for managing data, including a scheme that lets you manage data outside the Java heap, say on a RAM disk… SSD here we come! This is not an all-or-nothing option. Individual data storage areas (often called cache regions or domains elsewhere, but called NamedCaches in Coherence terms) can use different schemes, all within the same cluster of applications. Better still, schemes can be composed from other schemes to produce a plethora of further schemes… making storage options virtually limitless (no pun intended :P).
For example, if you want your application to keep recently used data from the grid in your local process, but have the rest of the grid’s data stored directly on disk (say SSD), you can use what’s called the “near-scheme”. The near-scheme typically combines a local-scheme (for your local data) with some other scheme, like an external-scheme (say using off-heap, memory-mapped storage), for the grid data. You could even compose a scheme with out-of-the-box Coherence such that the most recently used data is kept in the Java heap, the next tier on SSD and the tier after that in, say, flat files or on another site.
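The near-scheme composition described above can be sketched in plain Java. To be clear, this is a rough illustration of the idea (a small local front tier sitting over a larger back tier), not Coherence’s implementation or API; `NearCache` and `NearCacheDemo` are hypothetical names, the front tier is assumed to be a simple LRU map, writes are assumed to go through to the back tier, and the back tier is an ordinary `Map` standing in for grid, off-heap or SSD storage.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class, for illustration only (not part of Coherence).
class NearCacheDemo {
    public static void main(String[] args) {
        Map<String, String> back = new HashMap<>(); // stand-in for grid/SSD storage
        NearCache cache = new NearCache(2, back);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3");                  // front is full, so "a" is evicted from it
        System.out.println(cache.inFront("a")); // prints "false"
        System.out.println(cache.get("a"));     // prints "1" (served from the back tier)
    }
}

// A small LRU front tier composed with an arbitrary back tier.
class NearCache {
    private final Map<String, String> front;
    private final Map<String, String> back;

    NearCache(final int frontCapacity, Map<String, String> back) {
        this.back = back;
        // accessOrder = true makes the LinkedHashMap track least-recently-used order.
        this.front = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > frontCapacity;
            }
        };
    }

    // Write through: the back tier always holds the full data set.
    void put(String key, String value) {
        back.put(key, value);
        front.put(key, value);
    }

    // Read from the front tier first; on a miss, load from the back tier
    // and keep a local copy for next time.
    String get(String key) {
        String value = front.get(key);
        if (value == null) {
            value = back.get(key);
            if (value != null) {
                front.put(key, value);
            }
        }
        return value;
    }

    boolean inFront(String key) {
        return front.containsKey(key);
    }

    int frontSize() {
        return front.size();
    }
}
```

Because the back tier is just another map, it could itself be a composed tier (say SSD in front of flat files), which is the same layering idea applied once more.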
Here’s a brief list of some of the standard Coherence schemes: local-scheme, distributed-scheme, replicated-scheme, near-scheme, optimistic-scheme, overflow-scheme, disk-scheme, read-write-backing-map-scheme (usually used for database integration), external-scheme (often file-based), paged-external-scheme, remote-scheme… and so on.
If it somehow turns out that Coherence doesn’t support the way you would like to have data stored, on the devices, technologies or sites that you desire, you can roll your own scheme… using the class-scheme. More information on the caching schemes in Coherence is available here.
Do people actually do this? Yep. All of the time. All of the investment banks in London that use Coherence in some form (and that’s most of them) customize the storage schemes to suit their applications and infrastructure, including a whole bunch that make use of off-heap, memory-mapped files (the next best thing to SSD) for managing data.
Want to know more? If you’re in London drop by my talk at QCON 2008 where I’ll be talking about this kind of stuff: Pimp My Data Grid: New things to do with a Data Grid to deliver better application performance, scalability and resilience.