On Oct. 21 OpenSys released a new version of its ETL tool, CloverETL Designer version 2.8.1. It’s mainly bugfix version but also brings a new component, ParallelReader, that makes delimited data file (CSV) processing faster than ever before.
I decided to make a test and compare ParallelReader’s performance with CloverETL’s UniversalDataReader and also with two ETL competitors Talend Open Studio (3.1.3) and Pentaho Data Integration (3.2.0).
As a testing task I chose simple SQL query and I tried to rewrite it to ETL transformation.
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
where l_shipdate <= date ‘1998-09-03’
group by l_returnflag, l_linestatus
order byl_returnflag, l_linestatus;
This query is one of standard queries used for performance testing of database engines. More info on http://www.tpc.org/tpch/. The dataset for testing was generated by dbgen utility available on tpc.org too. The size of dataset is 725MB.
All transformation was run on my laptop: Intel Core 2 Duo @ 1666 Mhz, 2048 MB RAM, 200GB SATA 5400 RPM, Windows Vista Home Premium 32bit.
Java parameters was set up to
-server -Xmx256m -Xmx1536m.
The results aren’t surprising :
- CloverETL – ParallelReader
- CloverETL – UniversalDataReader
If you don’t trust me you can verify results on your own computer. All transformation graphs and testing dataset are available on rapidshare.com or filefactory.com (200 MB). CloverETL Designer can be downloaded on www.cloveretl.com.
Deeper and more extensive comparison will be published soon. Watch www.cloveretl.com, watch this blog. The latest news about CloverETL are also available on CloverETL linkedin group and CloverETL facebook group. Don’t hesitate and join.