We keep saying that CloverETL has a small footprint and is multiplatform out of the box. So why not put that claim to the test in an extreme situation? Instead of taking just the core engine, we’ll take the full CloverETL Server installation, put it on the smallest machine possible, and see if it can still do something useful. Sound like fun? Let’s do it!
(A quick reminder: CloverETL Server is Java-based, runs inside a servlet container such as Apache Tomcat, and provides runtime, automation, scheduling, monitoring, and parallel execution for the CloverETL rapid data integration platform.)
When I was assigned this task, I immediately thought, “Let’s go hardcore,” and reached for the cheapest computer I know of: the Raspberry Pi. For around $35, the Model B (the version with an Ethernet port) gives you 256 MB of RAM shared with the GPU, a single-core 700 MHz ARM CPU, and an SD card slot in place of a hard drive, all packed into a board the size of a credit card.
Obviously, there is more powerful hardware out there. For example, the $59 Odroid-U3 has four 1.7 GHz ARM Cortex-A9 cores and 2 GB of RAM, but it would not be the smallest platform. And since I happen to have a Raspberry Pi on my desk, let’s try the Pi right now.
First Things First
I installed Raspbian, a full-fledged yet lightweight Linux distribution based on Debian and compiled and optimized for the Raspberry Pi hardware. First, I needed Java. To my surprise, installing the Oracle JDK took a single command. Then I tuned the OS a little to maximize the available RAM: I gave the GPU the smallest possible allocation (16 MB), replaced OpenSSH with Dropbear (saving about 10 MB), disabled the DHCP client, and so on. As you’ll see later, RAM proved to be the main bottleneck.
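To see how much each tuning step actually frees up, it helps to read the kernel’s memory counters before and after the change. The following is a minimal Java sketch, not part of the original setup, that reads /proc/meminfo on Linux and prints a few standard fields; the class name MemInfo is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class MemInfo {
    // Returns the value (in kB) of a /proc/meminfo key such as "MemFree", or -1 if the key is absent.
    static long valueKiB(List<String> lines, String key) {
        for (String line : lines) {
            // Lines look like "MemFree:          123456 kB"
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2 && parts[0].equals(key + ":")) {
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc/meminfo"), StandardCharsets.US_ASCII);
        // Buffers and Cached count as reclaimable, so they matter when judging "usable" RAM.
        for (String key : new String[] {"MemTotal", "MemFree", "Buffers", "Cached"}) {
            long kib = valueKiB(lines, key);
            System.out.println(key + ": " + (kib < 0 ? "n/a" : (kib / 1024) + " MiB"));
        }
    }
}
```

Running it once before and once after a change (for example, after lowering the GPU allocation and rebooting) gives a rough before/after comparison; the same numbers are also available from standard tools such as free.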
Next, I had to decide which application container to use. I ruled out standard Tomcat immediately as too resource-intensive and chose Jetty instead. It has much smaller resource requirements and is also officially supported by CloverETL! Well, it turned out that the only supported version was Jetty 6, which had been obsolete for a few years. Fortunately, it’s still available in Raspbian (thank you, Debian, for being obsolete!). One more command and Jetty was up and running. Deploying CloverETL was not as straightforward as the Installation Guide makes it out to be, but once I installed all the additional libraries Jetty needed, it ran. For the Server’s database, I decided to use MySQL instead of the built-in Derby. MySQL can be tuned to use very few resources and, more importantly, it does not eat into the precious heap memory of the main Java process.
Problems with RAM
The biggest problem, which I had anticipated from the beginning, was indeed the small amount of RAM. The latest revision of the Raspberry Pi (2.0) has twice as much (512 MB), which would have been much, much better. Unfortunately, I was working with the older one. After careful tuning, I got about 220 MB of usable RAM, which is really not much. To avoid swapping too often, I gave the Jetty Java process a heap of only 150 MB (the -Xmx option) and was curious whether that would be enough for CloverETL. I also added a USB flash drive for swap and data files so that swapping would not hit the SD card holding the main system.
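A quick way to confirm that a heap limit such as -Xmx150m actually took effect is to ask the JVM itself. This is a minimal sketch, assuming a standalone test class (the name HeapCheck is hypothetical) run with the same flag as Jetty, e.g. java -Xmx150m HeapCheck; it uses only the standard java.lang.Runtime methods.

```java
public class HeapCheck {
    // Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes), rounding down.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the heap ceiling the JVM will attempt to use; with -Xmx150m it should
        // report roughly 150 MiB (often slightly less, depending on the garbage collector).
        System.out.println("Max heap:            " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory() is the heap currently committed; freeMemory() is the unused part of it.
        System.out.println("Committed heap:      " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("Free (of committed): " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

The budget arithmetic behind the 150 MB choice is simple: of roughly 220 MB usable, about 70 MB remain for everything outside the Java heap, including the JVM’s own non-heap memory (thread stacks, class metadata), MySQL, and the OS. Any heap setting that pushes that remainder toward zero turns into swapping.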
The Moment of Truth
Now, the moment of truth. How did it perform? I was quite surprised that I was able to start the Server, connect to it, and execute some smaller example transformations that come bundled with it. What a blast! CloverETL Server running on this tiny board with just a few chips on it!
As was to be expected, however, more complex graphs started crashing with out-of-memory errors. It was by no means fast either, and its performance was naturally not comparable to a desktop PC, but it worked. The omnipresent and painful lack of memory, and the resulting constant swapping to the USB flash drive, were what killed performance. An SD card typically manages only around 100 IOPS, and its sequential bandwidth of about 20 MB/s isn’t exactly fast either. Based on my measurements, I believe it was the combination of too little memory, swapping, and slow I/O made worse by the virtually nonexistent disk cache that made some transformations run a hundred times slower than on a desktop.
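A rough back-of-the-envelope calculation shows why low IOPS hurts so much more than the sequential figure suggests. This is a minimal sketch using the SD card numbers quoted above as a stand-in for the swap device (assuming a cheap USB flash drive is in the same ballpark), and assuming each random swap I/O moves one 4 KiB page; the class name SwapThroughput is hypothetical.

```java
public class SwapThroughput {
    // Throughput in bytes/s when every I/O operation transfers one block of blockBytes.
    static double randomBytesPerSec(double iops, double blockBytes) {
        return iops * blockBytes;
    }

    public static void main(String[] args) {
        double iops = 100.0;                 // random IOPS quoted in the text
        double blockBytes = 4096.0;          // assumption: one 4 KiB page per swap I/O
        double sequentialBytesPerSec = 20e6; // 20 MB/s quoted in the text (decimal MB assumed)

        double random = randomBytesPerSec(iops, blockBytes);
        System.out.printf("Random 4 KiB I/O: %.2f MB/s%n", random / 1e6);
        System.out.printf("Sequential is about %.0fx faster%n", sequentialBytesPerSec / random);
    }
}
```

Under these assumptions it prints about 0.41 MB/s for random page-sized I/O and a factor of about 49 in favor of sequential access. Real swap traffic is not purely random (the kernel reads some pages in clusters), so this is an order-of-magnitude estimate, not a measurement.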
A Small Footprint Indeed
So what’s the takeaway here? I believe the initial thesis still holds: CloverETL is truly portable and has a very small footprint indeed. Of course, for real-world jobs, CloverETL clearly needs something you could at least remotely call a computer (let alone a “server”): a platform with at least 1 GB of RAM and a CPU faster than the Raspberry Pi’s. On hardware even slightly more powerful, such as the Odroid-U3 mentioned earlier (or the U2) or a similar platform, CloverETL would actually run just fine, provided the workload is light to moderate and the transformations are optimized for low memory usage. That said, I enjoyed this exercise with the Raspberry Pi very much and will keep it in mind as a possible new appliance we could one day sell to a customer. Just kidding.