In my previous post I have focused on tips for tweaking the FastSort component – performance sort component available in all commercial CloverETL editions. Today, I would like to touch the original ExtSort component which has been in CloverETL for a while and is available in both commercial and free (Community, opensource engine) editions.
While FastSort is optimized to deliver stunning performance in resource-rich environment, ExtSort is a modest hard worker used to working with limited resources. This proves to be useful especially in tranformations with lots of parallel branches with sorting.
Again I won’t discuss the case where there are only a few records and all sorting can be done in memory. For external sorting, ExtSort organizes its temp data in sorted chunks and places them onto a fixed number of tapes – each tape being a single temporary file. Since ExtSort is single-threaded, multiple ExtSort-s running at once do not consume that much resources as FastSort does.
With ExtSort, there isn’t that much magic as with tweaking FastSort. First what you can do is specify a number of temporary directories where tapes will be stored. This is useful when you’re able to harness multiple physical drives under multiple mount points, otherwise just make sure you have enough free space on the drive.
The only two attributes you can fiddle with is Buffer capacity and Number of tapes. There’s also the Sorter initial capacity but this will be deprecated soon and nowadays acts exactly the same as Buffer capacity. Buffer capacity determines the size of each chunk and thus needs to fit into memory. If you’re familiar with FastSort, Run size is the equivalent parameter there. Increasing Buffer capacity is generally a good idea for boosting ExtSort’s performance, but expect increased memory requirements, although still far from those of FastSort.
Number of tapes defaults to 6 and generally will yield best results, however again, increasing the number slightly does not take too much resources and can help with performance.
Compared to FastSort, ExtSort is pretty easy to configure and will work almost under any conditions. However, there’s always a catch – with ExtSort it is the rather inferior performance compared to its stronger brother.