This article describes the results of the persistent messaging performance tests that I carried out over the last two months using IBM MQ v8.0 (formerly WebSphere MQ) and Apache ActiveMQ 5.11. Both IBM and Apache have released updates to their software in the 12 months since my previous test in 2014 (you can read more about last year's test configuration and results or see the performance test video here), so a new round of tests was in order. In this article I share the complete instructions and methodology of my testing in open source style, including the full set of scripts for installation and load generation, so that readers can run their own benchmark to validate my results.
Despite all of the choices one has for connecting applications and services within and across organization boundaries, message oriented middleware (MOM) remains the primary choice for architects and developers who need to deliver messages reliably. There are a handful of messaging products on the market, including IBM WebSphere MQ, Tibco EMS, Progress SonicMQ, Apache ActiveMQ, Pivotal RabbitMQ, etc. On the surface all of these products appear to do the same thing: deliver a message from point A to point B in a secure and reliable manner. They also offer non-reliable delivery options as well as publish-subscribe models. However, there are significant differences between these products in reliability, security, performance, admin capabilities and cost.
Before you start implementing your enterprise project you need to understand the technical and cost limitations of the software you are going to use. I have described some of the common pitfalls of this decision process in my blog post: “How to NOT buy enterprise software”.
For additional information comparing IBM MQ and Apache ActiveMQ features, such as administration, monitoring, deployment, security, etc., see my presentation from InterConnect 2015.
Performance test configuration
The setup for this test is very similar to the test from 2014, except for the newer hardware and more recent versions of the messaging products from IBM and Apache. Because of the newer hardware, the results of my test from 2014 are not directly comparable to the results from 2015. Please note that in 2014 my server had 3 independent SSD disks, while the new server in the 2015 test had a single SSD. The new server also had a faster CPU, more memory, more network bandwidth and a faster SSD (the IO of this single disk is roughly similar to the combined IO capacity of the 3 SSD disks from the 2014 test).
Many enterprise usage patterns for message oriented middleware are based on reliable and assured delivery. The findings below represent point-to-point persistent messaging tests. I have not had the time to fine-tune and properly evaluate performance characteristics for non-persistent messaging, so I will leave that as an exercise for the reader.
Performance benchmarking can be complex and time consuming. For one thing, there are many different choices for possible configurations of hardware, software, load drivers, test applications, etc. No matter the choice, I am sure there will be folks criticizing my selections. The diagram below shows the configuration of the test environment used for the benchmark.
- Host server used for the benchmark is IBM xSeries x3630 M4 model 7158AC1, 24 cores Intel Xeon CPU E5-2440 0 @ 2.40GHz, 256 GB RAM, single 300 GB SSD drive with IO tested by hdparm at 460 MB/sec.
- Client load drivers and messaging servers ran in VM guests powered by CentOS 6 (kernel 2.6.32-504.3.3.el6.x86_64) under the control of VMware ESX 5.0.0 914586.
- Each VM has 8 processor cores. Each of the two server VMs has 40 GB of RAM and the load generator VM has 20 GB of RAM.
- Messaging software used in the test is IBM MQ v8.0 and Apache ActiveMQ 5.11.0 running on Oracle JDK 8 update 31 and Oracle JDK 1.7.0_76 (there was no measurable difference between running AMQ on JDK 8 vs. JDK 7).
- Load driver used for the benchmark is based on IBM Performance Harness for JMS (see Requestors and Responders in the diagram above).
- Virtual machines were interconnected via a private isolated virtual network using the VMXNET3 virtual adapter, with 13 Gbit/sec bandwidth as measured by the iperf tool.
- You may want to see the video of how all this works in this blog post (this video shows test from 2014, but the concept is the same, so I did not bother to re-record the video).
Persistent messaging performance comparison
The results you can see in the diagram below were obtained by iterative tuning of both Apache ActiveMQ and IBM MQ. I tried different settings on the server and on the client while trying to find the maximum rate of messages per second for both systems. It turned out that for my configuration (your own mileage will vary), the best message rate was obtained with 100 concurrent requestor client threads running IBM Performance Harness for JMS (I ran tests with 1 up to 150 concurrent threads). I tested different message sizes, from 20 bytes up to 10 MB. The message rate for 20 byte messages was very similar to the rate for 256 byte messages, so I did not include it in the graph. If you really need high performance for very small messages, you might be better off using the MQTT protocol, but I did not have the time to experiment with it. On the other hand, the performance of 10 MB messages was severely limited by the network and the disk IO, so I did not include it in the results.
Each server instance had 5 request queues and 5 reply queues, for a total of 20 queues in the benchmark = 2 servers * (5 request Qs + 5 reply Qs). The persistent test was run with transactional JMS messages for both WMQ and AMQ. I experimented with a different number of queues (from 1 to 100) to obtain the best results for both servers, but the relative performance of MQ and AMQ was more or less consistent across different numbers of queues, so for simplicity's sake I am only including results for a configuration of 20 queues.
The best performance was found when running 2 instances of MQ Queue Managers and 2 instances of Apache ActiveMQ Brokers. In this test I only had a single SSD in my server, but you can get more performance if you add more SSDs and have different queue managers write to and read from separate SSDs. Usually you would want IBM MQ to use one SSD for the transaction log file and another SSD for the queue data file. In my test the queue data file shared the SSD with the transaction log. If I had multiple SSD disks, I could have run IBM MQ and ActiveMQ even faster than what you see below, because all transaction logs and queue data could be split across individual disks and I could have had more than two server instances per VM.
I did run several 24 hour tests and encountered issues with AMQ using LevelDB, so I was forced to switch to KahaDB for testing true sustained performance. In any case, KahaDB is the default and recommended persistence engine in ActiveMQ 5.11. Surprisingly, the performance of LevelDB was no better, and in some cases worse, than KahaDB. The Apache ActiveMQ support forum provided no answer to my questions on this subject. Nor did I get any support from the Apache forum on several other questions I posted there on the recommended configuration of LevelDB, configuring JVM options for best performance in an AMQ environment, etc.
I should mention that ActiveMQ works fine with KahaDB when no failover is required, but if you need true high availability in a clustered environment with no lost or duplicate messages, ActiveMQ with KahaDB will not work, as shown in this report and these videos. You would likely need to use the JDBC store for ActiveMQ messages as described in the docs. However, a major drawback is that the JDBC store will run significantly slower than KahaDB. You trade speed for reliability and failover. I have not tested the JDBC store for ActiveMQ and can’t say how much slower it is, but considering the significant additional footprint of the database, networking, setup, security, configuration and maintenance, it seems like a very steep price to pay for the “free” ActiveMQ messaging server – a steep price not only in terms of hardware and licenses for the database, but also in the labor costs to maintain all that setup.
Each test was run at least three times for the period of one hour and the average result is shown below:
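As a minimal sketch of the averaging step described above, the snippet below reduces repeated runs of one test to a single figure. The function name and the sample numbers are made up for illustration; they are not measured results and this is not the author's script.

```python
from statistics import mean

def average_rate(runs):
    """Average messages/sec over repeated runs of the same test.

    Hypothetical helper: the article states each test was run at least
    three times, so fewer than three samples is treated as an error.
    """
    if len(runs) < 3:
        raise ValueError("expected at least three runs per test")
    return mean(runs)

# Made-up sample rates purely for illustration, not benchmark data.
example_runs = [1000.0, 1100.0, 1050.0]
print(average_rate(example_runs))
```

If you rerun the benchmark yourself, it is worth looking at the spread between runs as well as the mean, since a large spread suggests the system had not reached a steady state.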
A production workload would normally have many different message sizes, but it is easy to approximate the performance using the numbers above as a reference. It is possible to change individual tuning options in WMQ and AMQ to make certain message sizes run a bit faster at the cost of slowing down other message sizes. After many hundreds of different permutations of tuning options I settled on the one that gave the overall best performance for both MQ and AMQ, the reason being that a production workload usually has a mix of different message sizes.
As you can see, in my tests IBM MQ 8.0 was 42% to 108% faster than Apache ActiveMQ 5.11 for persistent tests. What does this mean? You would need to use up to 108% more resources to run the same workload on ActiveMQ as you would on IBM MQ, which in turn means for an ActiveMQ configuration:
- up to two times more hardware cost
- up to two times more data center space
- up to two times more cooling
- up to two times more power
- up to two times more software installed (and additional cost if you buy ActiveMQ support)
- up to two times more administration cost to manage all of the above
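The arithmetic behind "up to two times" can be sketched as follows. It assumes throughput scales linearly with added resources, which is a simplification of mine and not something the tests measured; the function name is hypothetical.

```python
def relative_resources(percent_faster):
    """Capacity multiple the slower system needs to match the faster one.

    Assumes linear scaling: a system that is N% faster handles
    (1 + N/100) times the load on the same resources.
    """
    return 1.0 + percent_faster / 100.0

# The two ends of the range reported in the article.
for pct in (42, 108):
    print(f"{pct}% faster -> {relative_resources(pct):.2f}x resources")
```

Under that assumption, the low end of the range corresponds to roughly 1.4 times the resources and the high end to slightly more than double.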
Even if all other functions of IBM MQ and ActiveMQ were equal (and they are not – as you can see in my other post), the mere performance gains may pay for the cost of the IBM MQ licenses and be worth the investment. You do not have to trust my word. I made all of the scripts and tuning options available for all to download and try via this GitHub project.
How credible is this test?
I work for IBM, but I did my best when tuning both IBM MQ and AMQ, and I read the publicly available performance manuals from IBM, Apache and Red Hat. Having said that, there is no guarantee that I achieved optimal performance for either MQ or AMQ. I am sure someone in the IBM UK Hursley Lab or someone who is an Apache ActiveMQ performance expert can make things go a little bit faster. How much faster? Another 10%, 20% – I do not know. But given that I have spent several weeks tuning both systems, I would not expect the average end user of MQ or AMQ to get better results than mine. If you want, feel free to look at my scripts and settings and make suggestions on how to improve upon this work. I am happy to listen and rerun the tests. All of this is 100% in the open for independent review.
The detailed description of the testing methodology can be found in this post.
Installation and setup
In the full spirit of openness I have published all of the configuration settings and tuning options, as well as the automated shell script that generates the load. In a follow-up blog post I will publish detailed installation instructions. For now you can see the installation instructions for the 2014 test in my google doc.
I am lazy (in a good way), so after having worked on this project for a bit, I quickly got tired of manually changing client and server configuration, starting and stopping servers, synchronizing TCP and load driver tuning options across all of my VMs, etc. So I wrote an automated script that does it all in one step. To execute a fully automatic test, all you need to do is log in to the client VM and run this command: ./run.sh. This command will copy the latest configuration files from your client load driver to both servers, start the queue managers, start the responder threads, start the iostat command to log CPU, memory and disk usage into log files on all three VMs, start the requestor threads, iterate over multiple message sizes and tuning settings, and finally consolidate performance results from the multiple output files into a single number per test. Phew, that was a lot of automation 🙂.
For those not interested in reading the install doc, you can simply have a look at the tuning configuration and load scripts in this GitHub project.
PS. I found that writing automated scripts was the biggest fun in this project. I could barely restrain myself from automating ad infinitum – creating a neural network model for automatically finding optimal settings across all of the hundreds of possible tuning variables.