Category Archives: monitoring

Coherence across a WAN? Push Replication rocks!

We’ve done a lot of work in the past six months to simplify how to use and deploy Coherence Data Grids around the globe.  It’s always been possible to do these things – Coherence provides some great infrastructure like *Extend to do this – but we’ve lacked a concrete framework.  The Push Replication Pattern is making some serious inroads into solving the challenges faced when designing a globally distributed Data Grid.

This pattern has some great advantages:

  1. We’ve provided the complete source code for it.  You can embed, change, enhance it as you like.  No restrictions.
  2. It provides support for completely asynchronous, but guaranteed in-order, updates (with batching) between multiple sites (Data Grids).
  3. It provides a completely pluggable infrastructure layer to programmatically resolve data conflicts between sites.
  4. It avoids the classic problems of other approaches, like explicitly setting up dedicated single points of failure such as Mirror Services, Gateways or routers.  The solution simply embeds in your application.
  5. It’s completely monitorable via JMX.
  6. It supports almost every type of WAN replication/synchronization scenario, including: one-to-one (uni-directional), one-to-one (multi-directional), many-to-one (uni-directional “centralized”), one-to-many (uni-directional “hub”) and many-to-many (multi-directional “mesh”) architectures – with all of the above-mentioned guarantees.
  7. It’s not just a theoretical pattern – it’s in production in several large projects.
  8. We’re constantly enhancing it based on customer demand and feedback.
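
To make point 3 a little more concrete, here is a minimal sketch of what a pluggable conflict resolver could look like.  The names (`Versioned`, `ConflictResolver`, `LastWriterWins`) are hypothetical, invented purely for illustration; they are not the pattern’s actual API.  The policy shown (most recent write wins, ties keep the local value) is also just an assumption, one of many you might plug in.

```java
public class ConflictResolutionSketch {

    /** Hypothetical: a value tagged with its originating site and write time. */
    static final class Versioned<V> {
        final V value;
        final String site;
        final long timestampMillis;

        Versioned(V value, String site, long timestampMillis) {
            this.value = value;
            this.site = site;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Hypothetical plug-in point: decides which version survives when two sites disagree. */
    interface ConflictResolver<V> {
        Versioned<V> resolve(Versioned<V> local, Versioned<V> incoming);
    }

    /** One possible policy: the most recent write wins; ties keep the local value. */
    static final class LastWriterWins<V> implements ConflictResolver<V> {
        @Override
        public Versioned<V> resolve(Versioned<V> local, Versioned<V> incoming) {
            if (local == null) {
                return incoming; // nothing stored locally yet
            }
            return incoming.timestampMillis > local.timestampMillis ? incoming : local;
        }
    }

    public static void main(String[] args) {
        ConflictResolver<String> resolver = new LastWriterWins<>();
        Versioned<String> london = new Versioned<>("price=101", "London", 1_000L);
        Versioned<String> newYork = new Versioned<>("price=102", "New York", 2_000L);

        Versioned<String> winner = resolver.resolve(london, newYork);
        System.out.println(winner.site + " wins with " + winner.value);
    }
}
```

A real deployment would probably want something smarter than wall-clock timestamps (clocks drift between sites), which is exactly why having this as a plug-in point matters.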

I’ll leave you with this quote:

“The [push replication] tools are working very well.  This is very good news for us, as it allows us populate and re-populate caches across the globe in a safe and consistent way.  Thanks…”

QCon: In Finance Exchange (free event)

Ok… I’m a little late in blogging this, but on Wednesday next week (8th of October) I’ll be speaking at QCon: In Finance eXchange on the topic of Patterns for managing order books and reference data on a global basis.  Over the past few years I’ve been involved in numerous commercial projects (predominantly with tier-1 investment banks) that face the challenge of “how to manage and keep multiple globally distributed clusters containing fragmented order books and reference data (interconnected by potentially unreliable wide-area-networks) in sync (close to real time)”. Or similarly, how to keep your Disaster Recovery site(s) in sync and potentially in active+active configuration at all times.

While most of these challenges have been solved in the past using a combination of technologies (including messaging platforms, enterprise service buses, database log shipping and Oracle Coherence), several recent implementations have solely been based on Oracle Coherence, providing a simple, elegant, high-capacity and close-to-real-time solution without the need for additional servers or infrastructure.  In the talk, I’ll cover some of these new patterns.

Even if you’re not interested in this talk, there are two things I think you’ll really like about this event: (a) it’s FREE and (b) the speakers are well known (perhaps not me… it’s a privilege to be invited to talk!).  People like Rod Johnson (of SpringSource), Eric Evans (Domain-Driven Design guru) and many others will be there to share some of their insights.

If you’re in London, work in Financial Services (I hope you’re doing ok!) and have some time free, drop by.

Next Generation Coherence Data Grid Monitoring with Evident ClearStone Live

Monitoring any data grid running on a few physical servers is usually a pretty trivial task.  Most solutions provide some kind of “console” or “gui” that is capable of displaying a few simple values, or it’s usually easy to set up JMX on each JVM and use, say, JConsole.  The real challenge arrives when you scale out a data grid past a few servers (often what people do with Coherence – like 20, 50, 200 or 500+ JVMs) and you need to “visualize” what is happening, especially in real time (i.e. with, say, one-second accuracy).  To be honest, doing this with JConsole just doesn’t cut it for big clusters.  What you might use to monitor a few servers or JVMs often doesn’t work when you have a few hundred.  GUI design alone becomes an issue – how do you visually lay out information so that it’s useful?

One of the nice features of Oracle Coherence is that it provides almost every statistic imaginable through an out-of-the-box clustered implementation of JMX.  What is clustered JMX?  It basically means that you don’t need to hook up a separate JMX console/connection to monitor each JVM in a cluster (a complete pain).  Instead, you connect a single JMX console to a single JVM in the cluster and, via that JVM, access all of the aggregated information about every JVM in the cluster (regardless of how big the cluster is and without having to reconfigure it as the cluster size changes at runtime).  While this makes collecting information about a large cluster just as easy as monitoring a single JVM, visualizing the relationships and potential correlations in a complex clustered system often requires more than something like JConsole.
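
To give a feel for the “one connection, whole cluster” idea, here is a rough sketch using only the standard `javax.management` API.  The class and method names (`ClusterJmxSketch`, `readAll`) are mine.  The object-name pattern `Coherence:type=Node,*` and the attribute `MemoryAvailableMB` are my assumptions about how Coherence names its per-member MBeans, and the service URL depends entirely on how you’ve enabled remote JMX, so check all three against your own cluster.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ClusterJmxSketch {

    /** Reads one attribute from every MBean matching the pattern, over a single connection. */
    static Map<String, Object> readAll(MBeanServerConnection connection,
                                       String pattern,
                                       String attribute) throws Exception {
        Map<String, Object> values = new TreeMap<>();
        for (ObjectName name : connection.queryNames(new ObjectName(pattern), null)) {
            values.put(name.getCanonicalName(), connection.getAttribute(name, attribute));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No cluster to hand: query this JVM's own platform MBeans instead.
            System.out.println(readAll(ManagementFactory.getPlatformMBeanServer(),
                    "java.lang:type=Runtime", "VmName"));
            return;
        }
        // args[0] is the JMX service URL of any one management-enabled member, e.g.
        // service:jmx:rmi:///jndi/rmi://somehost:9999/jmxrmi (host and port are assumptions).
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(args[0]))) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Assumed Coherence naming: one "Node" MBean per cluster member.
            readAll(connection, "Coherence:type=Node,*", "MemoryAvailableMB")
                    .forEach((name, value) -> System.out.println(name + " -> " + value));
        }
    }
}
```

The point is that the loop never needs to know how many members there are: the wildcard query returns whatever is in the cluster at that moment.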

While there are several options available for visualizing JMX information presented by solutions such as Oracle Coherence, including the impressive SL RTV (real time view), Wily Tech Introscope and Oracle’s own Enterprise Manager, a new player has entered the market in the form of ClearStone Live from Evident Software.

As presented at the last Oracle Coherence SIG (in London) by Rob Minaglia and Ivan Ho of Evident Software (an Oracle Partner), ClearStone Live has been explicitly designed from the ground up to manage and visualize large volumes of real-time grid-based information, especially from grids that use Oracle Coherence, in a simple and efficient manner.

While it may seem relatively straightforward to visualize and graph information about a grid, one of the biggest challenges (as explained by Ivan) is how to collect, store and report on that information in real time (say with one-second accuracy).  They had to build infrastructure to cope with these kinds of demands, both for real-time capture and for real-time interactive visualization.  As explained, it’s basically pointless to be performing thousands, tens of thousands or hundreds of thousands of transactions per second if you can only monitor a system in 30-second (or greater) snapshots.

To achieve the kinds of performance and throughput required by customers, Evident Software adopted a novel approach – ClearStone Live uses its own Data Grid (based on Oracle Coherence) to manage data.  That is, ClearStone Live uses (embeds) Oracle Coherence internally to manage and report on up to 24 hours of real-time information.
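
To get a sense of scale, 24 hours at one-second resolution is 86,400 samples per metric.  The sketch below is not how ClearStone Live does it (they use a Coherence grid, and I haven’t seen their code); it’s just a plain in-memory ring buffer, with hypothetical names, showing the basic “keep the last N seconds and overwrite the oldest” idea.

```java
public class RollingWindowSketch {

    /** Keeps the most recent `capacity` one-second samples of a single metric. */
    static final class RollingWindow {
        private final long[] samples;
        private long count; // total samples ever recorded

        RollingWindow(int capacity) {
            if (capacity <= 0) {
                throw new IllegalArgumentException("capacity must be positive");
            }
            this.samples = new long[capacity];
        }

        /** Records the next sample, overwriting the oldest once the window is full. */
        void record(long value) {
            samples[(int) (count % samples.length)] = value;
            count++;
        }

        /** Number of samples currently retained. */
        int size() {
            return (int) Math.min(count, samples.length);
        }

        /** Returns the sample taken `secondsAgo` samples before the latest one (0 = latest). */
        long get(int secondsAgo) {
            if (secondsAgo < 0 || secondsAgo >= size()) {
                throw new IndexOutOfBoundsException("secondsAgo=" + secondsAgo);
            }
            long index = count - 1 - secondsAgo;
            return samples[(int) (index % samples.length)];
        }
    }

    public static void main(String[] args) {
        int secondsPerDay = 24 * 60 * 60; // 86,400 one-second samples
        RollingWindow window = new RollingWindow(secondsPerDay);
        for (long second = 0; second < secondsPerDay + 10; second++) {
            window.record(second); // pretend the metric value is the second it was taken
        }
        System.out.println("retained=" + window.size()
                + ", latest=" + window.get(0)
                + ", oldest=" + window.get(window.size() - 1));
    }
}
```

Multiply that by every metric on every JVM in a few-hundred-node cluster and it’s easy to see why you’d want a data grid holding it rather than a single process.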

Ok that’s cool, but probably the most impressive part in the sneak preview was the extremely rich interface (based on Adobe Flex)… oh and the support for simultaneously monitoring multiple clusters – perfect for multiple grid-based applications or clusters running on multiple/remote sites!

Here are some quick screen shots from the live demonstration of “Live”.

The first visualization is what Evident Software calls “a health visualization” (see below).  It essentially shows a quick view of the number of objects, caches, servers, clients, connections, memory utilization and capacity in a Coherence Cluster.

 

Of course there are a whole bunch of metrics you can select to have displayed in your “health visualization”.

Additionally, you can display the performance characteristics – configurable, of course, from a variety of sources – all in one chart.  The neat feature here (like in most places) was the ability to scroll backwards in time through the collected information and dynamically reconfigure the charts on-the-fly.

But one of my favorite, and possibly the most useful, interactive features is the annotated charts.  They are a bit like the Google Finance charts in that as you scroll backwards through time, ClearStone Live will highlight important events that occurred in the life of the cluster – for example a new server joining – so that you can correlate those events with their impact on other parts of the system.

And last, though certainly not the only other visualization available, was the “heat map”.  This was beautiful in its simplicity in that you could use it to highlight “heat” at either a Cache, Service or JVM level, including the ability to control the color ranges and thus how “hot” things appeared.
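
For the curious, the “control the color ranges” part boils down to something like the following.  This is my own guess at the general idea, not ClearStone Live’s implementation; the method name `heatColour`, the green-to-red scale and the sample heap figures are all made up for illustration.

```java
public class HeatMapSketch {

    /** Maps a metric value onto a green-to-red hex colour, clamped to the range [coolAt, hotAt]. */
    static String heatColour(double value, double coolAt, double hotAt) {
        if (hotAt <= coolAt) {
            throw new IllegalArgumentException("hotAt must be greater than coolAt");
        }
        double heat = (value - coolAt) / (hotAt - coolAt);
        heat = Math.max(0.0, Math.min(1.0, heat)); // clamp to 0..1
        int red = (int) Math.round(255 * heat);
        int green = 255 - red;
        return String.format("#%02X%02X00", red, green);
    }

    public static void main(String[] args) {
        // Hypothetical heap utilisation (percent) for three JVMs.
        double[] heapPercent = {35.0, 60.0, 92.0};
        for (double percent : heapPercent) {
            // Anything at or below 50% is fully "cool", at or above 90% fully "hot".
            System.out.println(percent + "% -> " + heatColour(percent, 50.0, 90.0));
        }
    }
}
```

Narrowing the range (say 50 to 70 instead of 50 to 90) makes the same values look hotter, which is the knob being described above.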

While the product is still in alpha (our viewing at the SIG was only a sneak preview), it was really impressive.

I’m certainly looking forward to being able to visualize both application-specific and custom MBeans, correlated against cluster-wide Java Platform information (like JVM GCs etc.) when this and Coherence 3.4 become publicly available.

But most importantly, it gives those using Coherence yet another option for enterprise-level monitoring.