CloverETL Cluster

Parallel data integration, big data, robustness, and reliability

What is the Cluster?

A high-performance offering for big data, parallel data processing, and robustness

CloverETL Cluster Performance Scaling

Parallel Data Integration Platform

CloverETL Cluster delivers the very best in scalability and price/performance. The Cluster serves organizations that must process huge volumes of data within limited time windows or in mission-critical environments.

Whether you have one large dataset or a large number of smaller datasets, CloverETL Cluster aims to provide near-linear scalability as you add servers to your cluster.

Performance

Always meet processing deadlines

Do you have a lot of work to do with your data and only a limited time window? Recurring data processing deadlines are a major constraint for organizations with complex workflows or a need for quick data turnaround.

CloverETL Cluster is designed to scale processing power to match your needs exactly, so that you can meet tight SLAs and get jobs done on time.
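As a rough illustration of how a processing deadline translates into cluster size, the sketch below assumes near-linear scaling. The workload figures (data volume, time window, per-node throughput) are hypothetical, not CloverETL benchmarks, and `nodes_needed` is an illustrative helper, not part of any CloverETL API.

```python
import math

def nodes_needed(volume_gb: float, window_hours: float, per_node_gb_per_hour: float) -> int:
    """Smallest node count that finishes within the window, assuming near-linear scaling."""
    required_rate = volume_gb / window_hours  # throughput the whole cluster must sustain (GB/hour)
    return math.ceil(required_rate / per_node_gb_per_hour)

# Hypothetical workload: 2,000 GB per nightly run, a 4-hour window, 150 GB/hour per node.
print(nodes_needed(2000, 4, 150))  # 4 (500 GB/hour needed; 3 nodes give only 450)
```

Real per-node throughput depends on the transformation and the hardware, so a measured figure should replace the assumed 150 GB/hour before sizing anything.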

Robustness

Failover safety

In a mission-critical environment, you want to make sure that the tools you're using work at 100% all the time. That includes having measures to reduce downtime caused by hardware failure or software malfunction.

Building your infrastructure around CloverETL Cluster lets it automatically rebalance the workload should any node go down, report the problem, and bring the node back online when possible.

Scalability

Future-proof decision

Choosing a tool that serves an entire organization's data needs is not an easy task. Making sure that the toolset you choose can grow with you and your needs is a key part of that decision. On average, an organization sees its data volume increase by 40% to 60% each year.
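To put the 40% to 60% figure in perspective, here is a small compound-growth calculation. It assumes the growth rate stays constant every year; the 10 TB starting volume, the three-year horizon and the `projected_volume` helper are hypothetical.

```python
def projected_volume(current_tb: float, annual_growth: float, years: int) -> float:
    """Compound growth: data volume after `years` at a constant annual growth rate."""
    return current_tb * (1 + annual_growth) ** years

# Hypothetical starting point: 10 TB today, projected three years ahead.
for rate in (0.40, 0.60):
    print(f"{rate:.0%} per year: 10 TB today -> {projected_volume(10, rate, 3):.1f} TB in 3 years")
```

At these rates, volume grows roughly 2.7x to 4.1x within three years.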

With CloverETL Cluster, adding capacity to the setup is a seamless process with no extra cost for redesign or migration.

Big Data Analytics

Reducing data into small, valuable results

With the emergence of affordable hardware and software, big data processing has moved from a niche of highly specialized teams into the hands of the vast majority of businesses, opening new possibilities for getting the most value out of a company's data assets.

CloverETL enables users to embrace this opportunity with long-proven tools and features. CloverETL Cluster is a scalable answer to both the growing volume and complexity of the data and the growing complexity of the required processing.

In a typical big data scenario, huge volumes of data need to be processed in a limited time frame to produce a much smaller data set that can be further analyzed and used, typically for reporting. With the Cluster, you can scale processing power to match your current needs exactly, with the option to add extra nodes as you grow.

Through its extensive connector library, CloverETL can also act as an ideal ingest and transport agent for other big data processing tools such as Apache Hadoop and Hive, Teradata, or modern NoSQL databases.

Processing Huge Data

Transforming huge amounts of data

CloverETL is a full-fledged ETL tool, scalable to huge amounts of data. Not all operations fit into the commonly understood "big data" reduce strategy. Sometimes you just have lots of data and need to transform all of it, not get rid of it. Sometimes there is simply too much work, and it needs to be done on time, with all its complexities.

That's where a powerful ETL tool comes into play. Only a robust and scalable ETL platform can efficiently process large data sets with complex business logic that needs to be documented and easily accessible for future reference, along with process orchestration, monitoring and recovery.

Distributed Data, Distributed Process

Smart performance optimization

In a cluster environment, the network between nodes is a fragile asset. Use it incorrectly, and network congestion can easily kill any performance gained by distributing the process across multiple nodes.

Data Locality Matters

The CloverETL Cluster avoids overloading the network by intelligently selecting where each part of the process takes place. The closer it is to the actual location of the data, the less network overhead it takes to transfer data to its destination.

Partitioned Storage

In the Cluster, you can load and manage files in a partitioned manner: parts of the data set are spread across predefined nodes for further processing. Partitioning can be done either during the load phase or externally, through an API.
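To make partitioned storage concrete, here is a minimal sketch of one common way to spread a data set across a predefined set of nodes: hash partitioning on a key field. This is not CloverETL's API or necessarily its partitioning scheme; the node names, the `partition_records` and `node_for_key` helpers, and the choice of CRC32 hashing are assumptions for illustration.

```python
import zlib
from collections import defaultdict

# Hypothetical predefined partition nodes.
NODES = ["node1", "node2", "node3"]

def node_for_key(key: str, nodes: list) -> str:
    """Pick a node deterministically from the key (CRC32 is stable across runs, unlike hash())."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]

def partition_records(records, key_field, nodes):
    """Group records by the node that should store them."""
    partitions = defaultdict(list)
    for record in records:
        partitions[node_for_key(str(record[key_field]), nodes)].append(record)
    return dict(partitions)

customers = [{"id": i, "name": f"customer-{i}"} for i in range(10)]
parts = partition_records(customers, "id", NODES)
for node in sorted(parts):
    print(node, [r["id"] for r in parts[node]])
```

Because the assignment depends only on the key, records with the same key always land on the same node, which is what later lets each part be processed where it is stored.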

Parallel Processing

When a transformation runs on partitioned data, the Cluster processes each part on the same node where that part is stored. This minimizes network traffic by avoiding unnecessary data exchange. You can also override this and define manually where processing should take place, for example on a more powerful machine for complex data processing. In that case, the Cluster automatically sends data from the remote nodes to that machine.
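The locality rule and the manual override can be sketched as a simple placement planner. This is a hypothetical model, not how CloverETL schedules work internally: `Partition`, `Transfer`, `plan` and the node names are illustrative, and network cost is approximated by the megabytes that would have to be shipped between nodes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Partition:
    name: str
    node: str      # node where this part of the data set is stored
    size_mb: int

@dataclass
class Transfer:
    partition: str
    source: str
    target: str
    size_mb: int

def plan(partitions, override_node: Optional[str] = None):
    """Return (assignments, transfers) for a hypothetical transformation run.

    Default: process each partition on the node that stores it, so nothing is
    shipped. With override_node: process everything on that node and ship every
    remote partition to it.
    """
    assignments, transfers = {}, []
    for p in partitions:
        target = override_node if override_node is not None else p.node
        assignments[p.name] = target
        if target != p.node:
            transfers.append(Transfer(p.name, p.node, target, p.size_mb))
    return assignments, transfers

partitions = [
    Partition("part-0", "node1", 500),
    Partition("part-1", "node2", 700),
    Partition("part-2", "node3", 600),
]

local_plan, local_transfers = plan(partitions)
override_plan, override_transfers = plan(partitions, override_node="node2")  # e.g. the most powerful machine
print(sum(t.size_mb for t in local_transfers))     # 0: every part is processed where it lives
print(sum(t.size_mb for t in override_transfers))  # 1100: part-0 and part-2 are shipped to node2
```

The override trades network traffic for compute power, which only pays off when the chosen machine is fast enough to make up for the transfer time.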

Reliability

Redundancy geared towards high availability

The Cluster is a great way to achieve high availability of the data integration service. The automatic load balancer keeps track of which nodes are available and distributes incoming job requests to balance the load across the pool. Should any node become unavailable, jobs are scheduled on the remaining nodes and failed jobs are reported. There is no fixed master node whose failure could bring down the whole system; the Cluster is a peer-to-peer network that can renegotiate the master worker when it goes down.
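Below is a minimal, hypothetical sketch of least-loaded scheduling with failover reporting and a simple master election. The class and method names are invented, and the "fewest running jobs" rule and the "lowest-named available node becomes master" rule are illustrative choices; CloverETL's actual balancing and election logic is not shown here. Unlike the peer-to-peer Cluster, the sketch keeps all state in one object for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    available: bool = True
    running: list = field(default_factory=list)  # jobs currently executing on this node

class LoadBalancer:
    """Hypothetical least-loaded scheduler with failover reporting (illustration only)."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.failed_jobs = []  # (job, node) pairs reported when a node goes down

    def available_nodes(self):
        return [n for n in self.nodes.values() if n.available]

    def submit(self, job):
        candidates = self.available_nodes()
        if not candidates:
            raise RuntimeError("no available nodes")
        # Fewest running jobs wins; ties are broken by node name so results are deterministic.
        target = min(candidates, key=lambda n: (len(n.running), n.name))
        target.running.append(job)
        return target.name

    def node_down(self, name):
        node = self.nodes[name]
        node.available = False
        # Jobs that were running on the failed node are reported, not silently lost.
        self.failed_jobs.extend((job, name) for job in node.running)
        node.running.clear()

    def node_up(self, name):
        self.nodes[name].available = True

    def elect_master(self):
        # Assumed election rule: the available node with the lowest name acts as master.
        return min(n.name for n in self.available_nodes())

lb = LoadBalancer([Node("node1"), Node("node2"), Node("node3")])
placed = [lb.submit(f"job-{i}") for i in range(6)]
print(placed)             # ['node1', 'node2', 'node3', 'node1', 'node2', 'node3']
lb.node_down("node2")
print(lb.failed_jobs)     # [('job-1', 'node2'), ('job-4', 'node2')]
next_node = lb.submit("job-6")
print(next_node)          # node1 (node2 is skipped while it is down)
lb.node_down("node1")
print(lb.elect_master())  # node3 takes over as master
```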

Arrange a Demo

If you're interested in CloverETL Cluster, feel free to contact one of our representatives and arrange a demo.

CloverETL Server Tiers

Tier: Server Standard | Server Corporate | Cluster
Total CPU cores: 4 | 16 | Inquire
Execution Monitoring / History
The Server automatically collects transformation logs and runtime statistics. Both logs and statistics are easily accessible for email distribution or trend monitoring strategies.
Parallel Execution of Multiple Transformations
Any number of transformations can run simultaneously. The Server also supports starting multiple instances of one transformation, up to a configurable maximum.
Security / LDAP User Management
Access to transformations, data, services and server configuration is protected by a user security module. Communication between the Server and its clients can optionally be secured with HTTPS.
Scheduling
The internal scheduler lets you schedule transformations to run once or periodically at specific times. It supports advanced scheduling expressions similar to those provided by the UNIX cron scheduler.
Jobflows—a Workflow Management Module
Workflows simplify deployment of the Server into an operational environment and integration with your production support architecture. Server workflows let you sequence jobs of several types: transformations, system scripts/batches, JMS messages, email tasks, or internal Groovy scripts.
File Triggers and Message Triggers
File triggers make it easy to implement scenarios where data is uploaded for ETL processing in the form of files. The Server can watch for the arrival or change of a file and then automatically trigger its processing. Message triggers let you run a predefined action (e.g. execute a graph) upon receiving a message through an observed message queue. Ideal for Enterprise Service Bus (ESB) and similar deployments.
Launch Services (Transformation as Web Service)
Launch Services provide infrastructure for SOA-oriented and real-time ETL deployments. Transformations can be configured to be executed as web services, with data and parameters passed dynamically as part of the web service call.
Distributed Execution / Big Data
Server clustering enables multiple Server machines to cooperate in a networked environment, bringing both failover and scalability. It is also suitable for cloud deployments on public clouds (EC2, Rackspace, …) as well as in-house data centers.
Load Balancing
Configurable load balancing rules together with automated performance monitoring lets you tune utilization of individual nodes operating in a clustered environment.
Failover
The Cluster performs automatic health check monitoring of all nodes in the cluster and in case of node failure, redirects processing to remaining active nodes.
Autoscaling
New nodes can be dynamically added to a running cluster on demand. Suitable for operational environments such as Amazon EC2 that allow dynamic allocation of computational resources.


Data integration software and ETL tools provided by the CloverETL platform offer solutions for data management tasks such as data integration, data migration, and data quality. CloverETL is a vital part of enterprise solutions such as data warehousing, business intelligence (BI), and master data management (MDM). CloverETL’s flagship applications are CloverETL Designer and CloverETL Server. CloverETL Designer is a visual data transformation designer that helps define data flows and transformations in a quick, visual, and intuitive way. CloverETL Server is an enterprise ETL runtime environment. It offers a set of enterprise features such as automation, monitoring, user management, real-time ETL, clustering, and cloud data integration.