Not all messaging products deliver the same quality of service (QoS). For example, RabbitMQ does not support fully transactional handling of messages. Sooner or later this leads to lost or duplicate data, which may not be a big deal in a monitoring dashboard where a few missed or duplicate messages make little if any difference, but in the case of financial transactions, health records or other sensitive data it could have serious consequences.
There are many products that provide message exchange capabilities. The most widely used messaging product on the market is IBM MQ*, formerly WebSphere MQ, formerly MQSeries. IBM MQ provides configurable levels of quality of service, including non-persistent, persistent and transactional assured message delivery. Each has its pros and cons, mainly trading performance for reliability. Transactional assured message delivery is slower, but it supports the two-phase commit (2PC) protocol (XA transactions) and can be used in combination with databases or other IBM MQ servers to reliably transfer messages and recover gracefully from software, network and hardware failures. IBM MQ includes its own transaction manager, so when you deal only with MQ and DB2 you do not even need an application server and can use the built-in MQ transaction manager. IBM MQ can also participate in transactions managed by external XA coordinators, such as WebSphere, WebLogic, JBoss, TXSeries, CICS, Tuxedo, Microsoft Transaction Server, etc.
The benefit of true transaction support is its ACID guarantees. This does carry performance overhead because of sync points with the file system, but it vastly simplifies the job of the application developer, because the semantics of working with transactions are very simple:
    start_transaction();
    ... do the work ...
    ... such as put_message() ...
    ... or write() or read() from the database, etc. ...
    ... and you can get_message() from another queue ...
    ... and so you can access any number of 2PC enabled resources
    commit_transaction();
If any of the resources involved between start_transaction() and commit_transaction() above are not available or have failed, the transaction manager enforces the ACID properties with no extra effort on the part of the application developer: it will either commit all of the changes or roll all of them back.
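To make the all-or-nothing behavior concrete, here is a minimal sketch of a two-phase commit coordinator in Python. Everything in it is hypothetical and in-memory: `Resource` stands in for any 2PC-enabled queue or database, and `TransactionManager` is a toy coordinator, not the API of IBM MQ or any real XA implementation. It also leaves out the durable logging and recovery that a real transaction manager needs to survive its own crash.

```python
class Resource:
    """Hypothetical in-memory stand-in for a 2PC-capable queue or database."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulates an unavailable resource
        self.committed = []  # state visible after a successful commit
        self.pending = []    # work staged inside the current transaction

    def write(self, item):
        self.pending.append(item)

    def prepare(self):
        # Phase 1: vote on whether the staged work can be committed.
        return not self.fail_on_prepare

    def commit(self):
        # Phase 2, success path: make the staged work visible.
        self.committed.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        # Phase 2, failure path: discard the staged work.
        self.pending.clear()


class TransactionManager:
    """Toy coordinator: commits every resource or rolls every resource back."""

    def __init__(self, *resources):
        self.resources = resources
        self.outcome = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Commit only if the work raised no exception and every resource votes yes.
        if exc_type is None and all(r.prepare() for r in self.resources):
            for r in self.resources:
                r.commit()
            self.outcome = "committed"
        else:
            for r in self.resources:
                r.rollback()
            self.outcome = "rolled back"
        return False  # never swallow the application's exception


def transfer(queue, database, message):
    """Put a message on a queue and record it in a database as one unit of work."""
    tm = TransactionManager(queue, database)
    with tm:
        queue.write(message)
        database.write(message)
    return tm.outcome
```

Calling `transfer(Resource("queue"), Resource("db", fail_on_prepare=True), "m1")` leaves both resources empty, which is the point: the application code contains no failure handling of its own.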
In contrast, RabbitMQ does not support 2PC distributed transactions. The argument is that transactions are very expensive in terms of performance overhead (and they often are) and that you are better off avoiding them at all costs. In fact, you cannot rely on ACID behavior in RabbitMQ even when you work with a single queue (ouch!).
I agree that 2PC is not needed 90% of the time in systems where messages are not critical, such as performance statistics, logs, dashboard events, clickstreams, chat, message boards, stock quotes, etc.
What can you do if your messaging server does not support true transactions with Atomicity, Consistency, Isolation and Durability? You have to work around it and use different patterns to try to get some of these QoS guarantees back. Some developers implement the compensating transaction pattern, the idempotent message pattern, a duplicate message filter, eventual consistency, and so on, but the burden of developing these often complicated systems falls on the application developer, and the vast majority of them still do not guarantee once-and-only-once delivery. And do not forget that these patterns are not free: there is a cost to implement them, and there is a runtime cost as well. For example, if you filter messages to avoid duplication, you have to rely on a database or a large-scale cache to store a hash of each message, and that means disk or network I/O. Plus, your cache is still not transactional, is it? Do you implement another pattern to work around the lack of transactionality in your cache? And does your cache support the other ACID properties? And so it goes: a chain reaction (or Pandora's box?)…
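The duplicate message filter mentioned above can be sketched in a few lines of Python. This is an illustration under assumptions, not a recommended design: `DuplicateFilter` and `consume` are hypothetical names, and the in-memory `set` stands in for the database or cache that a real deployment would need.

```python
import hashlib


class DuplicateFilter:
    """Hypothetical filter that remembers a hash of every payload it has seen."""

    def __init__(self):
        # Assumption: a real system would keep this in a database or shared cache.
        self.seen = set()

    def is_new(self, payload: bytes) -> bool:
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self.seen:
            return False
        self.seen.add(digest)
        return True


def consume(messages, dedup, handler):
    """Pass each payload to handler at most once; return how many were handled."""
    processed = 0
    for payload in messages:
        if dedup.is_new(payload):
            # The hash is recorded before the handler runs. If the handler
            # crashes here, the message is marked as seen but never processed.
            handler(payload)
            processed += 1
    return processed
```

Note the gap between recording the hash and running the handler. Swapping the two steps only trades a possible lost message for a possible duplicate, because the filter and the handler do not commit together. The sketch also drops two legitimate messages that happen to carry identical payloads.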
The workarounds presented above will work (often only to a limited degree) when you build a new system, but they are much harder to implement when you need to connect existing legacy systems. Here is an example of one of many possible complications. In the post-Edward Snowden era, chances are you are encrypting your messages all the way from the initial publisher to the final consumer. How do you filter and de-duplicate encrypted messages when your messaging system is used as a pipe between legacy systems?
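A small Python experiment shows why encryption gets in the way of hash-based de-duplication. The cipher below is a toy written only for this illustration (a SHA-256 keystream with a random nonce); it is not secure and is not the API of any real library. It shares one relevant property with properly used real ciphers: encrypting the same plaintext twice produces different ciphertext.

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher for illustration only. NOT secure."""
    nonce = os.urandom(16)  # fresh random nonce for every message
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        block = hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        stream += block
        counter += 1
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body  # the receiver needs the nonce to decrypt


def fingerprint(data: bytes) -> str:
    """The hash an intermediary would store in its duplicate filter."""
    return hashlib.sha256(data).hexdigest()


key = b"k" * 32
message = b"pay 100 to account 42"

first = toy_encrypt(key, message)
second = toy_encrypt(key, message)  # a redelivered duplicate, encrypted again

# The plaintexts are identical, but an intermediary only sees the ciphertexts.
same_plaintext = fingerprint(message) == fingerprint(message)
same_ciphertext = fingerprint(first) == fingerprint(second)
```

Here `same_plaintext` is true while `same_ciphertext` is false, so a filter sitting between two legacy systems cannot recognize the duplicate unless the publisher also supplies a stable message identifier outside the encrypted payload.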
In some cases, getting rid of transactions will cause more trouble than it is worth. But I agree that, generally speaking, the application architecture should avoid requiring XA as much as possible; the same goes for REST-based applications. As more and more people realize this, they start building new applications with these stateless, idempotent design principles in mind, and this is one of the core tenets of microservices architecture. The adoption of these principles explains the rising popularity of Apache Kafka, which, like RabbitMQ, has no XA transaction support. Still, I think XA transactions have their place. The cost of avoiding 2PC often ends up being too high, as it includes:
- The human cost (and time) of custom development to “fake” 2PC (but not really) or implement multiple patterns working around the problem;
- Performance overhead of the custom-implemented patterns (from the first item above);
- Cost of a mistake and of lost or duplicate data: what is the value of your data in a single message? In a few hundred messages?
For a certain class of systems, these costs can be significantly higher than the savings in performance or in software licenses for IBM MQ or another reliable transactional messaging system. I would suggest that you pick the right tool for the job.
If you do not have business-critical data and can live with non-transactional messages, you may want to consider other aspects of QoS, such as security, performance, manageability, scalability, etc. But these are topics for future posts.
* Note: According to Gartner’s March 2016 report, IBM MQ has a 66% share of the $1.35B worldwide messaging market. VMware/Pivotal ranked below #15, in “Other vendors”. I suspect that is because Gartner uses license and support revenue to measure market share, and the vast majority of RabbitMQ deployments are either (a) in non-critical systems where support is not needed and thus the free open source RabbitMQ is used, or (b) at internet giants that have dedicated support teams of their own and do not need vendor support.