CloverETL Data Quality – An Introduction to Validator

Before your data enters the ETL process, it's in your best interest to work only with "good" data – that is, data that conforms to its domain rules, ranges, allowed values, and any other unusual restrictions. If it doesn't, you'll want to log and remove all erroneous records so they don't pollute your transformation, and so that you have a means to report and fix the data later on.
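The idea of checking records against domain rules, ranges, and allowed values can be sketched in a few lines of plain Python. This is not the Validator component's API: the `validate` helper, the rule names, the field names, and the allowed values below are all hypothetical. Each record is checked against every rule, and failing records are kept together with the names of the rules they broke, so they can be logged and fixed later.

```python
from typing import Callable

# Hypothetical rule type: a human-readable name plus a predicate over one record (a dict).
Rule = tuple[str, Callable[[dict], bool]]

def validate(records, rules):
    """Split records into valid ones and rejected ones, keeping the names of failed rules."""
    valid, rejected = [], []
    for rec in records:
        failed = [name for name, check in rules if not check(rec)]
        if failed:
            rejected.append((rec, failed))  # keep for logging/reporting later
        else:
            valid.append(rec)
    return valid, rejected

# Hypothetical rules: a range check, an allowed-values check, and a non-empty check.
RULES: list[Rule] = [
    ("age in range 0-120", lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120),
    ("country allowed", lambda r: r.get("country") in {"CZ", "DE", "US"}),
    ("email not empty", lambda r: bool(r.get("email"))),
]

records = [
    {"age": 34, "country": "CZ", "email": "a@example.com"},
    {"age": 150, "country": "FR", "email": ""},
]
valid, rejected = validate(records, RULES)
for rec, failed in rejected:
    print(rec, "->", failed)
```

Keeping the list of failed rule names, rather than a single pass/fail flag, is what makes the rejected records useful for reporting.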

Data Profiling with CloverETL Profiler beta

The process of data integration, data migration, consolidation and other data manipulation projects consists of a variety of steps and tasks. Javlin supports many critical tasks within these projects with a versatile ETL tool that provides technical solutions to transform data and connect different systems and data sources with various data formats.

Address Cleansing and Transliteration with CloverETL and AddressDoctor

Data quality usually goes hand in hand with data integration. The new version, CloverETL 3.1, has enriched its data cleansing capabilities through integration with AddressDoctor. AddressDoctor contains address and geo data for more than 240 countries around the globe. Besides correcting and fixing mailing addresses, AddressDoctor can also be used to transliterate non-Latin writing systems into Latin characters or to enrich addresses with latitude and longitude information.

Data sampling with CloverETL

Testing data transformations is generally not an easy task. When creating and testing a transformation, you might want to take a data sample to check whether your transformation works properly. At this point, a question arises: how do you create a representative sample of the full data set? The easiest way is obviously to read just the first part of the data, but such a sample can be very unreliable. I've prepared a few simple graphs that create a data sample which can be regarded as representative of the full data set.
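One standard way to draw a simple random sample in a single pass, without knowing the size of the data in advance, is reservoir sampling (Algorithm R). The Python sketch below illustrates that technique; it is not necessarily how the graphs in the post build their sample, and the `reservoir_sample` name is hypothetical. Unlike reading the first rows, every record has the same chance of ending up in the sample.

```python
import random

def reservoir_sample(records, k, seed=None):
    """Return a simple random sample of k records from an iterable of unknown length.

    Reservoir sampling (Algorithm R): after n records have been read, each of them
    is in the sample with probability k/n, and only k records are held in memory.
    """
    rng = random.Random(seed)  # seedable, so a test sample can be reproduced
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)  # fill the reservoir with the first k records
        else:
            j = rng.randint(0, i)  # uniform over 0..i, inclusive on both ends
            if j < k:
                sample[j] = rec  # replace a random slot with probability k/(i+1)
    return sample

sample = reservoir_sample(range(1_000_000), 10, seed=42)
print(sample)
```

If the input has fewer than `k` records, the whole input is returned.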

Spell Checking for Better Data Quality – AspellLookupTable

AspellLookupTable is a commercial lookup table that has been available since CloverETL 2.6. Because Aspell is a free software spell checker, you might be wondering what it is used for in CloverETL. In fact, AspellLookupTable does not perform any spell checking at all; it "just" allows you to look up data records with keys similar to the one you provide. This may be useful, e.g., when looking for a street whose name is misspelled to a certain extent.
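The idea of looking up records by a key that is merely similar to the one you provide can be sketched with Python's standard library. This is a rough stand-in, not AspellLookupTable: `difflib.get_close_matches` scores similarity with `SequenceMatcher`, not with Aspell's algorithm, and the `fuzzy_lookup` helper and the street table below are hypothetical.

```python
import difflib

# Hypothetical lookup table: street names mapped to data records.
STREETS = {
    "Main Street": {"id": 1},
    "Maple Avenue": {"id": 2},
    "Oak Lane": {"id": 3},
}

def fuzzy_lookup(key, table, n=3, cutoff=0.6):
    """Return (matched_key, record) pairs whose keys are similar to `key`, best match first.

    `cutoff` is the minimum SequenceMatcher ratio (0.0-1.0) a key needs to count as similar.
    """
    matches = difflib.get_close_matches(key, table.keys(), n=n, cutoff=cutoff)
    return [(m, table[m]) for m in matches]

# A misspelled street name still finds its record.
print(fuzzy_lookup("Main Stret", STREETS))
```

Raising `cutoff` makes the lookup stricter about how misspelled a key may be.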

Data Profiling with CloverETL

Before you start developing any data transformation, you should explore your data (that is, profile it). There are a lot of tools on the market that can help you, but why install and learn another piece of software when you can use a tool you are already familiar with? CloverETL is mainly a data transformation tool, but it can easily be used for data profiling as well, as I will show in this blog post.
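The kind of basic per-column statistics a profiling run typically collects (record count, missing values, distinct values, minimum, maximum, most frequent value, and mean for numeric columns) can be sketched in plain Python. The `profile_column` helper is hypothetical and is not a CloverETL component; missing values are assumed to arrive as `None`.

```python
from collections import Counter

def profile_column(values):
    """Compute basic profiling statistics for one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    stats = {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if present:
        stats["min"] = min(present)
        stats["max"] = max(present)
        # (value, frequency) of the most common non-missing value
        stats["most_common"] = Counter(present).most_common(1)[0]
        if all(isinstance(v, (int, float)) for v in present):
            stats["mean"] = sum(present) / len(present)
    return stats

print(profile_column([3, 7, None, 7, 1]))
print(profile_column(["Prague", "Brno", "Prague"]))
```

Running this for every column of a data sample gives a quick first picture of ranges, gaps, and value distributions before any transformation is written.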