CloverETL 3.5.0

It's a new year and with it comes a new production version of CloverETL! We've put together and tested features and improvements from the previous two milestones, and have also added quite a bit on top of that.

CloverETL 3.5 brings a brand new Data Quality package, including a very useful Validator component. There are also a number of improvements on the Server front - usability and monitoring improvements come along with enhanced management of the Cluster, and there are some neat new perks for administrators as well.


Released Jan 22, 2014
  • Production
Recent Important Releases
3.5.0 Jan 22, 2014
3.4.3 Nov 27, 2013
3.4.0 May 5, 2013
3.3.1 Feb 1, 2013
3.3.0 Oct 23, 2012
 




ETL Developers (This item is most useful for developers; it either brings new functionality for transforming data or brings optimizations)

Administrators (This item is an improvement or feature that helps set up, install, administer, and manage the application)

Support staff (This item helps staff supporting production operations identify or avoid potential problems)

Cluster

Cluster status monitoring Cluster New 3.5.0-M1
In addition to sending its health status over the network, each clustered node now stores its health information in the metadata database. We use this information to better report issues related to unresponsive nodes, network communication, and incorrect node configuration.
Database connection robustness Cluster New 3.5.0-M1
We've added more robust handling of potential connectivity errors between a node and its metadata database. The node will even try to re-establish the connection to the database if it is lost completely. This should prevent failures when the node gets disconnected from the metadata database. Disconnects can occur in various scenarios: when the connection is unreliable (e.g. the database is in a different data center), when the database is under high load, or when there is a network failure between the node and the database.
JGroups port discovery and negotiation Cluster New 3.5.0-M1
To simplify the setup and administration of the Cluster, nodes now automatically negotiate and discover ports for asynchronous messaging over JGroups. Negotiated port values can be found in the Monitoring screen.

Server/Cluster Administration

Configuration migration Server New 3.5.0-M1
The configuration migration allows a Server administrator, support staff, or an ETL developer to export the current configuration from the Administration Console (e.g. users, sandboxes, scheduling) to an XML file. The file can be imported back to the Server to apply the configuration. Users can freely edit the XML before importing, keeping only selected parts of the configuration. The Server additionally supports a dry run of imports to validate modified configurations or test conformance to a given configuration. The exported configuration is useful for several administrative tasks:
- Versioning of Server configuration: The XML file can be placed under a version control system in order to quickly restore the configuration of the Server or Cluster, e.g. users & groups, temp spaces, etc.
- Versioning of project configuration: Project-related configuration can be version controlled together with ETL and jobflow graphs.
- Deployment between environments: The configuration migration allows transferring configuration (schedules, event listeners, launch services) between environments (e.g. from Development to Test, from Test to Production).
- Cluster configuration and validation: In a clustered configuration, the import transparently configures all nodes connected to the Cluster. The dry-run feature can be used to quickly discover differences from the expected configuration; if a dry run shows differences, the cluster node has been unexpectedly modified.
Monitoring UI Server New 3.5.0-M1
The revamped monitoring screen provides a quick overview of the Server or Cluster node's health and performance, as well as key node configuration and environment settings. The monitoring screen benefits multiple types of users:
- Admin/Installer: These users can use the new monitoring screen to verify that nodes are using the correct configuration and that the configuration (e.g. network and database connectivity) works correctly.
- Support staff: The support team can use the monitoring screen to quickly find out about the health status and issues in the Cluster group or individual nodes.
- ETL developers: The developers can use the information from the monitoring screen to find out the current load and memory consumption on each node. This is useful especially when developing and benchmarking partitioned ETL transformations spread across multiple Cluster nodes.

Connectors

Mongo DB Engine New 3.5.0-M1
We've added a dedicated connection and components providing full access to the popular NoSQL database MongoDB: MongoDBReader, MongoDBWriter, and MongoDBExecute – for both the hosted and locally installed versions.

These components allow users to read and write MongoDB collections, invoke aggregation commands (aggregate, count, distinct) and remove commands. The writer supports bulk loading data into MongoDB using the bulk insert API. The executor component executes JavaScript code against a MongoDB database (using the eval command) and retrieves the result or information about errors.
The most common use we see is receiving data from MongoDB and integrating it into one's reporting (decomposing MongoDB JSON into a relational structure, cleansing the data, and loading it into a reporting database). We've also had several cases of migration both from and to MongoDB.
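The "decomposing MongoDB JSON into a relational structure" step can be sketched in plain Python (a minimal illustration outside CloverETL; the document and field names here are invented):

```python
import json

def flatten(doc, parent_key=""):
    """Flatten a nested MongoDB-style document into a single relational row."""
    row = {}
    for key, value in doc.items():
        full_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            row.update(flatten(value, full_key))  # recurse into sub-documents
        else:
            row[full_key] = value
    return row

# Hypothetical document, as it might come back from a MongoDB query
doc = json.loads('{"name": "Ada", "address": {"city": "Prague", "zip": "11000"}}')
print(flatten(doc))
```

In CloverETL this mapping is done visually in MongoDBReader's output mapping; the sketch only shows the shape of the transformation.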
Exasol connectivity Engine New 3.5.0-M2
New JDBC connectivity to the ExaSolution analytical database. Provides support for reading and writing data into Exasol, metadata introspection, invoking built-in analytical functions, as well as interacting with EXAPowerlytics scripts. High-performance bulk loading is supported using the built-in IMPORT command, either from a local server or from a remote CloverETL Server/Cluster over the WebDAV protocol. The new connector helps to quickly establish data interchange between operational systems and the analytical data store in Exasol, perform SQL orchestration using Jobflow ETL and pushdown scenarios, and distribute distilled analytical datasets from Exasol to destination systems.
HTTP Connector - streaming and binary data support Engine Improvement 3.5.0-M2
Binary data access
Besides sending and receiving data in text format, HTTPConnector now also supports binary data (using CloverETL's byte and cbyte fields). Binary access to data is available via the new virtual fields "requestContentByte" and "contentByte". The field "requestContentByte" allows sending an HTTP request in binary format and is accessible via the Input mapping attribute. The field "contentByte" makes the HTTP response available and can be accessed in the component's Output mapping attribute.

Streaming
The HTTPConnector is now able to upload large files using HTTP chunked transfer encoding.
Sending and processing binary data from HTTP endpoints is useful for processing BLOB-like data (images, documents, archives, videos), which must often be transferred without change. The streaming transfer reduces memory allocation when uploading large data files and makes data immediately available for processing on the HTTP endpoint.
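Chunked transfer encoding itself is a simple wire format; the following Python sketch shows how a body is chunk-encoded per RFC 7230 (illustrative only, HTTPConnector handles this internally):

```python
def chunk_encode(parts):
    """Encode an iterable of byte strings as an HTTP chunked message body."""
    out = b""
    for part in parts:
        if part:  # a zero-size chunk would terminate the body early
            # each chunk: hex length, CRLF, data, CRLF
            out += format(len(part), "x").encode("ascii") + b"\r\n" + part + b"\r\n"
    out += b"0\r\n\r\n"  # final zero-size chunk terminates the body
    return out

# Each part is sent as soon as it is produced; the total size
# need not be known up front, which is what enables streaming uploads.
print(chunk_encode([b"Hello, ", b"world!"]))
```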
JSONExtract Engine Improvement 3.5.0-M2
A streaming parser, driven by a graphical UI, capable of processing large volumes of JSON data from files or RESTful services. The parser can generate the JSON structure from sample data or use an existing schema (in XSD format). It supports regular objects, nesting, as well as JSON arrays.
The component can read data directly from HTTP endpoints or from files. If the RESTful service requires advanced communication, JSONExtract can be coupled with HTTPConnector to handle the transport part.
JSON is a popular data exchange format with today's RESTful web services and Cloud APIs. Some of our customers, however, also want to process large JSON files (offline data from social networks, logging events, data from JSON databases), which requires stream-based parsing to avoid high memory consumption. Besides stream-based parsing, the component also offers a convenient graphical UI where even complex JSON messages can be easily decomposed.
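The streaming idea behind JSONExtract can be illustrated with the Python standard library: decode one JSON document at a time from a buffer instead of loading everything into memory (a conceptual sketch, not CloverETL's actual parser):

```python
import json

def iter_json_objects(text):
    """Yield JSON values one by one from a stream of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        while pos < len(text) and text[pos].isspace():
            pos += 1  # skip whitespace between documents
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)  # parse exactly one value
        yield obj  # caller processes each record before the next is parsed

stream = '{"id": 1} {"id": 2} {"id": 3}'
print([o["id"] for o in iter_json_objects(stream)])
```

A real streaming parser works on an incremental byte stream rather than a full string, but the key property is the same: memory use is bounded by one record, not the whole input.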
XMLWriter: Support for CDATA Engine New 3.5.0
When outputting an XML document, it's now possible to create CDATA elements and populate them with data. CDATA elements have lost popularity; however, they can still be used by legacy applications consuming XML, or in cases when one XML document is embedded in another.
XMLReader: Mapping editor recognizes schema from xsi:schemaLocation Engine New 3.5.0
Besides a user-defined schema, the visual editor in XMLReader now automatically loads and displays the structure of an XML document based on the schema specified in the document's xsi:schemaLocation attribute. Loading the schema directly from this attribute simplifies the setup of the visual parser and renders the document structure based on the actual XML schema.
HP Vertica connectivity Engine New 3.5.0
Provides connectivity to the popular HP Vertica analytical database, with full read, write, execute, and metadata introspection access. Bulk loading is possible via the COPY SQL statement. HP Vertica is popular as an analytical datastore, often populated from Hadoop data sources, serving as the store for the reporting layer in data warehousing and analytical scenarios. The new connectivity allows users to load data into Vertica, orchestrate analytical SQL queries, and populate data marts.

Security

Audit log Server New 3.5.0-M1
The audit log is a separate Server log capturing information about users' (administrators') access to and (potentially harmful) activities on a Server instance. The information stored in the audit log can be used for post-mortem or real-time access and security auditing. In the default mode, the audit log contains information about access and operations that change or affect the server configuration. For high-security environments, the audit log can also collect information about any operation retrieving the Server's configuration (note: this significantly reduces Server performance and response time).

The audit log events contain the following information:

- event timestamp
- user name/session key
- user source IP address
- operation name
- operation input parameters
- operation final result/error message
The audit log can be used by an administrator to trace issues and configuration changes back to user activities (post-mortem analysis). The audit log can also be forwarded (e.g. via the log4j syslog appender) to remote machines, where it can serve as a real-time feed to fraud detection systems.
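For example, forwarding the audit log via the log4j syslog appender could look like the following fragment (the logger name `audit` and the host are assumptions; check the Server documentation for the actual audit logger name):

```properties
# Hypothetical log4j.properties fragment: forward the audit log to a remote syslog
log4j.appender.AUDIT_SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.AUDIT_SYSLOG.SyslogHost=siem.example.com
log4j.appender.AUDIT_SYSLOG.Facility=AUTH
log4j.appender.AUDIT_SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.AUDIT_SYSLOG.layout.ConversionPattern=%d %p %m%n
# "audit" is an assumed logger name; adjust to the Server's actual audit logger
log4j.logger.audit=INFO, AUDIT_SYSLOG
```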
SFTP key-based authentication Engine Improvement 3.5.0-M1
In addition to user/password authentication, we also now support authentication using standard SSH keys when accessing files in Reader, Writer, and File Operation components using the sftp:// protocol.
When an ETL or a Jobflow job uses SFTP with key-based authentication, CloverETL automatically searches the ssh-keys directory in the project where the job is located and attempts to authenticate using the key files stored in that directory. If multiple key files are present, we follow the SSH protocol specification and try up to 6 SSH keys before failing to authenticate.
With key-based authentication, it is not necessary to specify user and password in the ETL – this increases security, as the job does not directly contain any sensitive information.
Secure parameters Server New 3.5.0-M2
This Server-only feature allows for storing sensitive information needed in jobs, such as user names, passwords, authentication tokens, or private URLs, in encrypted form instead of plain text. The plain text value of a secure parameter is never displayed by the Server or Designer UI. A parameter value, however, is automatically decrypted during job execution. It's also possible for any developer to obtain the original value of the parameter via a CTL function, so Secure Parameters are not protected from ETL developers.

The encryption and decryption process requires a master password, stored in the CloverETL Server database. Only users with specific permission are allowed to set or modify the master password.
The Secure Parameters allow for storing sensitive information in text files (parameter files, job definitions) in an encrypted form instead of plain text. This is usually one of the basic requirements of IT security. With the current implementation, Secure Parameters provide a lot of flexibility, as they are not restricted to passwords only: they can be used at any place in the job (e.g. URL attributes, CTL mapping) and also accessed programmatically. However, this means that parameter values are fully accessible to ETL developers.
Configurable value encryption (BouncyCastle) Server New 3.5.0-M2
The CloverETL Server allows administrators to configure the algorithm that will be used for Secure Parameter value encryption. Any PBE (Password Based Encryption) algorithm can be used. The algorithm can be one of the PBE algorithms provided by the Java platform, or the Server can be configured with an alternative cryptography provider, such as the Bouncy Castle cryptography library. The default encryption algorithm is PBEWithMD5AndDES, as it is available in all versions of the Java Virtual Machine, does not require any additional dependencies, and can be distributed without limitations (US export regulations). However, this algorithm is cryptographically weak, and some administrators may want to use stronger algorithms. To do so, Oracle Java requires the JCE Unlimited Strength Jurisdiction Policy Files, which must be installed directly into the system's JVM. Another option is to use the Bouncy Castle cryptography library; it only needs to be on the application server classpath, so it does not require additional changes to the JVM.
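To see why PBEWithMD5AndDES is considered weak, here is the PKCS#5 v1.5 (PBES1) key derivation it is based on, sketched with the Python standard library (the salt and iteration count are illustrative values):

```python
import hashlib

def pbes1_md5_derive(password: bytes, salt: bytes, iterations: int):
    """PKCS#5 v1.5 key derivation: iterated MD5 over password||salt."""
    t = hashlib.md5(password + salt).digest()
    for _ in range(iterations - 1):
        t = hashlib.md5(t).digest()
    # The first 8 bytes become the DES key, the next 8 the IV.
    # DES has only a 56-bit effective key, one reason the scheme is weak today.
    return t[:8], t[8:16]

key, iv = pbes1_md5_derive(b"master-password", b"\x01\x02\x03\x04\x05\x06\x07\x08", 1000)
print(key.hex(), iv.hex())
```

Stronger PBE algorithms replace MD5/DES with modern primitives (e.g. SHA-256 and AES), which is what the configurable provider support enables.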
Password encryption in Server configuration files Server New 3.5.0
Server administrators are now able to encrypt passwords (database, JMS, SMTP server) or values of any other sensitive configuration attribute. The value is encrypted using the secure-cfg-tool utility, available for download together with the Server WAR files. NOTE: This is different functionality than Secure Parameters, and it works only within Server configuration files (note the conf# prefix in encrypted values). The values are decrypted on the fly when the Server starts. Encrypting sensitive values in Server configuration files adds an additional level of security, along with protecting the files via permissions, etc.

Data Quality

Data Quality Package + Validator Data Quality Major Feature 3.5.0-M2
The Data Quality package provides essential components for implementing data quality processes in CloverETL - the Data Profiler and the Validator. The Data Profiler helps discover statistical properties of data sets before ingesting them with data integration processes, and can also inspect statistical qualities in-flow via ProfilerProbe components. For continuous measurement and collection, the Data Quality package also contains a license enabling the Data Profiler Reporting Console, where profiling results are automatically uploaded and can be easily inspected by users via a web browser or accessed via an open JSON API.

The Data Quality package also contains a license for the Validator: a component dedicated to complex data validation using business rules. The Validator provides a convenient UI and supports data validation and conversion with built-in business rules, CTL expressions, user-defined rules, as well as grouping and conditional validations. The component also automatically generates error reports containing information about records which did not pass validation, including the location, data, rules, and parameters that led to the record being rejected.
The Data Profiler is a handy tool for quickly inspecting datasets before processing them. Finding value distributions, duplicates, sizes, and minimum and maximum values speeds up development and helps with defining the structure of data sources and targets. The Profiler can also yield critical information such as number, phone, or date formats, text patterns, and special characters. ETL developers can use this information directly in Validator to handle common discrepancies in data or enforce data patterns required by downstream systems.

The Validator can be seen as a "data quality firewall" that detects, removes, and partially corrects invalid data. Any rejected data is conveniently logged with sufficient contextual information, making it easy to correct with automated tools or to send for manual correction by users.
Validator: If-then-else conditional validation Data Quality New 3.5.0
A new construct that allows creating conditional validations (e.g. IF countryCode="US" THEN validatePhone(phone,"US")). Conditional validation can reduce unnecessary validation steps and remove unwanted error messages.
Validator: Output phone number formatting in PhoneNumber rule Data Quality New 3.5.0
The new setting for the PhoneNumber rule allows developers to specify an output formatting of valid phone numbers. When a phone number is stored as a text string, the formatting can improve the number's readability and adherence to international standards.
Validator: Rule descriptions Data Quality New 3.5.0
Any instance of a validation rule used in Validator now contains a separate field for a business description. The field is useful for documentation purposes as well as for helping other team members get acquainted with more complex validation logic.
Validator: CTL return codes Data Quality New 3.5.0
Validator now handles return values from CTL transformation rules:
- a validation rule will now never return INVALID – always NOT_VALIDATED (a neutral validation result)
- transformation exceptions are propagated to the component and abort the whole graph
- returning SKIP does not save the output values
- returning STOP breaks the validation
- getMessage() is used to provide the error message
The return values give developers more control over the flow of the validation. The getMessage() function lets developers set a meaningful error message from CTL rules.

Parameters

Parameter files use XML Engine New Compatibility 3.5.0-M2
Starting with version 3.5, parameter files are stored in XML format instead of the Java properties file format. Parameter files in the old format will be automatically converted to the new XML format upon saving. The new XML format allows us to easily add additional information to a parameter, such as the secure flag or a description field.
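A parameter entry in the new XML format might look roughly like the following (the element and attribute names here are illustrative, not the exact schema):

```xml
<!-- Hypothetical sketch of an XML parameter file entry -->
<GraphParameters>
  <GraphParameter name="DB_PASSWORD" value="ENCRYPTED_VALUE" secure="true">
    <attr name="description">Password for the staging database (encrypted)</attr>
  </GraphParameter>
</GraphParameters>
```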
Parameters have description Engine New Compatibility 3.5.0-M2
Every parameter can have a business description (such as purpose, format, default value), simplifying its administration. The parameter description is visible in the UI, so the support team or other ETL developers can use the information when reusing your job or calling it from a Jobflow.

BigData

Tested CDH 4.4.0 & 4.5.0 Engine 3.5.0
Connectivity to the Hadoop stack, including HDFS, Hive, Impala, and the Map-Reduce JobTracker, was tested with the Cloudera CDH 4.4.0 and CDH 4.5.0 releases. As Hadoop technologies evolve quickly, we test and update our connectivity with recent releases.
Tested Impala (JDBC) Engine 3.5.0
The Impala JDBC connectivity allows for sending SQL queries and receiving responses from Cloudera's Impala service, which is part of the Hadoop ecosystem. In the Cloudera CDH distribution, Impala offers one of the fastest querying mechanisms, with support for a fairly large portion of SQL.

CTL

CTL error reporting Engine Improvement 3.5.0
When any component using CTL encounters an error during its execution, it prints additional information together with the error message, including the offending code and the values that caused the error. Additional information about the error location and offending values helps developers and support teams quickly identify problematic CTL code or illegal data records on the input.
getRawParamValue() Engine 3.5.0
The getRawParamValue() and getRawParamValues() functions provide access to Secure Parameters without decrypting their values. Retrieving the encrypted value can be useful when dynamically creating parameter files while maintaining security, or when storing the value in a secure manner outside CloverETL (e.g. in a database table or a JMS message) for later use.

Known Issues & Compatibility

Changed behavior of string functions (e.g. isNumber) failing on empty or null strings. CLO-945 Compatibility 3.5.0-M1
Changed behavior of base64byte() function CLO-884 Compatibility 3.5.0-M1
Changed record count field to "long" in the Profiler - might produce incompatible integer/long in metadata coming from ProfilerProbe and ExecuteProfileJob. CLO-884 Compatibility 3.5.0-M1
CloverDataReader/Writer data files not compatible between version 3.4 and 3.5 when these components are used in a mix of jobflows and transformation graphs. Read comments in related issue. Jobflows CLO-1382 Compatibility 3.5.0-M2
Functions ceil() and floor() return decimal instead of number for parameters of type decimal CTL2 CLO-2005 Compatibility 3.5.0-M2
Some conversion functions return NULL instead of throwing an exception CTL2 CLO-1586 Compatibility 3.5.0-M2
Some date functions return NULL instead of throwing an exception CTL2 CLO-1584 Compatibility 3.5.0-M2
Null 'case' values in switch() are now allowed CTL2 CLO-737 Compatibility 3.5.0-M2
Function get() for lookups now always returns NULL for keys not found or NULL key, exception if unknown field is requested. Before, there was a difference between compiled and interpreted mode - returning null or exception respectively CTL2 CLO-1582 Compatibility 3.5.0-M2
Fixlen data reading changed with regard to automatic trimming. Read the related issue CLO-1405 Compatibility 3.5.0-M2
Built-in MySQL JDBC driver updated to version 5.1.26 (formerly 5.1.22). The new version is optimized for MySQL 5.6. CLO-1886 Compatibility 3.5.0-M2
FastSort defaults have been changed. Maximum open files is now 1000 by default, Number of sorting threads is 1 CLO-1775 Compatibility 3.5.0-M2
Old version of Designer on Windows is automatically uninstalled when new version is installed (only versions 3.5.0-M2 and newer are uninstalled) Designer CLO-1028 Compatibility 3.5.0-M2
Handling of null/empty values in Validator Data Quality CLO-1162 Compatibility 3.5.0
ParallelReader error metadata no longer uses integer for record number CLO-2509 Compatibility 3.5.0
EmailReader: Fixed inconsistent behavior on POP3 vs. IMAP CLO-2501 Compatibility 3.5.0
Validate names of graph parameters CLO-2235 Compatibility 3.5.0
XMLExtract: explicit mapping to have priority over implicit automap by name CLO-2029 Compatibility 3.5.0
RecordFilter interface declares two isValid() methods CLO-2668 Compatibility 3.5.0
str2bits does not check validity of its input string CLO-2022 Compatibility 3.5.0
