
Batch processing

Batch processing is a data engineering practice in which data is collected over time and processed in large groups (known as “batches”) at predefined intervals. 

It is used when input data does not need to be processed in real time and is not required for immediate decision-making.
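Processing "at predefined intervals" usually means the job runs on a fixed schedule. The minimal sketch below assumes a hypothetical job that runs once a day at 02:00 and shows how the next start time could be computed. In practice this is usually delegated to a scheduler such as cron or a workflow orchestrator rather than hand-written.

```python
from datetime import datetime, time, timedelta


def next_batch_run(now: datetime, run_at: time) -> datetime:
    """Return the next start time of a daily batch job scheduled at `run_at`.

    Assumes a hypothetical once-a-day schedule and naive (timezone-less) datetimes.
    """
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's run time has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_batch_run(datetime(2024, 5, 1, 14, 30), time(2, 0)))  # 2024-05-02 02:00:00
```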

When should you use batch data processing? 

Companies typically use batch processing software to process large data volumes in the following cases: 

  • The data is not used for real-time decision-making. 
  • The system is designed to process large data volumes (e.g., transaction logs). 
  • There are periods when the system is idle (i.e., it ingests no incoming data). 
  • The data is used for repetitive tasks that do not require human supervision. 

Batch vs stream processing

It is easier to understand the key features of batch processing by examining it side by side with a commonly used alternative, stream processing. 

| Feature | Batch processing | Stream processing |
|---|---|---|
| Data ingestion | Large data volumes collected over time and processed in batches | Continuous data processing in real time |
| Latency | High: data is processed with a significant delay after ingestion | Low: data is processed as soon as it is collected |
| Use cases | Payroll, billing systems, end-of-day reporting, ETL | Fraud detection, sensor data processing, live analytics |
| Implementation complexity | Lower, because datasets are predefined | Higher, because of the pressure to maintain low latency |
| Fault tolerance | Easier recovery, because failed batches can be rerun | Requires complex fault-tolerance mechanisms to keep processing data in real time |
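The latency difference in the table can be illustrated by writing the same computation both ways. In this minimal sketch, which assumes a hypothetical event shape of `(account_id, amount)` pairs, the batch function returns only after every event has been collected, while the streaming generator emits an updated running total as soon as each event arrives.

```python
from collections import defaultdict
from typing import Iterable, Iterator

Event = tuple[str, float]  # hypothetical event shape: (account_id, amount)


def batch_totals(events: list[Event]) -> dict[str, float]:
    """Batch style: process the complete, already-collected dataset in one pass."""
    totals: dict[str, float] = defaultdict(float)
    for account, amount in events:
        totals[account] += amount
    return dict(totals)


def stream_totals(events: Iterable[Event]) -> Iterator[tuple[str, float]]:
    """Stream style: update state and emit a result after every single event."""
    totals: dict[str, float] = defaultdict(float)
    for account, amount in events:
        totals[account] += amount
        yield account, totals[account]


if __name__ == "__main__":
    events = [("acct-1", 10.0), ("acct-2", 5.0), ("acct-1", 2.5)]
    print(batch_totals(events))         # {'acct-1': 12.5, 'acct-2': 5.0}
    print(list(stream_totals(events)))  # [('acct-1', 10.0), ('acct-2', 5.0), ('acct-1', 12.5)]
```

Both functions compute the same totals; the difference is when results become available, which is what drives the latency row above.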

How does batch processing work? 

Batch processing typically comprises four stages. 

  • Data gathering: input data is collected over an extended period of time. 
  • Data loading: the collected data is stored in a system such as a database or a data warehouse. 
  • Processing: the system automatically runs data transformation workflows, typically on a schedule. 
  • Data output: the results are shared with teams in a usable format (e.g., a report or a dashboard). 
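A minimal sketch of the four stages, assuming hypothetical sales records with `region` and `sales` fields and using an in-memory list in place of a real storage system. Because `process` depends only on the loaded data, a failed run can simply be rerun over the same input and produce the same report.

```python
import csv
import io


def gather(source_rows):
    """Stage 1: collect the input records accumulated during the batch window."""
    return list(source_rows)


def load(rows, store):
    """Stage 2: persist the collected records (a plain list stands in for real storage)."""
    store.extend(rows)
    return store


def process(store):
    """Stage 3: transform the loaded data; here, total sales per region."""
    totals = {}
    for row in store:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["sales"]
    return totals


def output(totals):
    """Stage 4: publish the result in a usable format, here a CSV report."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["region", "total_sales"])
    for region in sorted(totals):
        writer.writerow([region, totals[region]])
    return buf.getvalue()


def run_batch(source_rows):
    """Run the four stages in order for one batch."""
    store = load(gather(source_rows), [])
    return output(process(store))


if __name__ == "__main__":
    rows = [
        {"region": "EU", "sales": 20.0},
        {"region": "US", "sales": 5.0},
        {"region": "EU", "sales": 10.0},
    ]
    print(run_batch(rows), end="")
```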

Batch processing use cases

Let’s examine the most common applications of batch processing. 

Finance

Financial teams use batch processing for end-of-day transaction processing, fraud surveillance, and risk management. 
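End-of-day transaction processing typically means aggregating all of a business day's transactions once the day has closed. A minimal sketch, assuming hypothetical transaction records with `account`, `amount`, and `timestamp` keys, that computes the net change per account. It uses `Decimal` rather than `float` so that currency amounts are not subject to binary rounding errors.

```python
from collections import defaultdict
from datetime import date, datetime
from decimal import Decimal


def end_of_day_summary(transactions, business_day: date) -> dict:
    """Net amount per account for all transactions timestamped on `business_day`.

    Each transaction is assumed to be a dict with hypothetical keys
    'account', 'amount' (Decimal), and 'timestamp' (datetime).
    """
    net = defaultdict(Decimal)  # Decimal() starts at Decimal("0")
    for tx in transactions:
        if tx["timestamp"].date() == business_day:
            net[tx["account"]] += tx["amount"]
    return dict(net)


if __name__ == "__main__":
    day = [
        {"account": "ACC-1", "amount": Decimal("100.00"), "timestamp": datetime(2024, 5, 1, 9, 15)},
        {"account": "ACC-1", "amount": Decimal("-40.50"), "timestamp": datetime(2024, 5, 1, 17, 2)},
        {"account": "ACC-2", "amount": Decimal("12.00"), "timestamp": datetime(2024, 5, 2, 8, 0)},
    ]
    print(end_of_day_summary(day, date(2024, 5, 1)))  # {'ACC-1': Decimal('59.50')}
```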

Healthcare

Batch processing is used in multiple areas of healthcare, from computational chemistry and drug research to genomic sequencing and genetic testing. This processing approach allows scientists and clinicians to aggregate large amounts of data that inform research decisions and improve the understanding of biochemical processes. 

Advertising and media

Brands and media companies use batch processing to handle large sets of assets (visuals, files, content metadata), which accelerates content and campaign management, helps scale operations, and automates creative workflows. 


FAQ

What is ETL batch processing?

ETL batch processing is the practice of extracting, transforming, and loading data from various sources in bulk. Batch ETL jobs typically run on a set schedule or are triggered manually. 
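A minimal batch ETL sketch using only the Python standard library: it extracts rows from CSV text, transforms them (trims and capitalizes names, converts amounts, drops malformed rows), and loads the whole batch into SQLite in a single transaction. The `orders` table, its columns, and the sample data are hypothetical. Using `INSERT OR REPLACE` keyed on `order_id` keeps the job idempotent, so rerunning the same batch after a failure does not create duplicate rows.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the third row is deliberately malformed.
RAW_CSV = """order_id,customer,amount
1, Alice ,19.99
2,bob,5
3,carol,not-a-number
"""


def extract(csv_text: str) -> list[dict]:
    """Extract: read every row of the source export at once."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: normalize names, convert amounts, and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a production job might route bad rows to a quarantine table instead
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the whole batch in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    with conn:  # commits if every insert succeeds, rolls back otherwise
        conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)


def run_etl(conn: sqlite3.Connection, csv_text: str) -> None:
    load(conn, transform(extract(csv_text)))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn, RAW_CSV)
    run_etl(conn, RAW_CSV)  # rerunning the same batch does not duplicate rows
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
    # [(1, 'Alice', 19.99), (2, 'Bob', 5.0)]
```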

When to use batch processing systems?

Batch processing is typically preferred to stream processing for high data loads and for workflows that do not require real-time data. In addition, many legacy data tools support only batch processing. 

What is a practical example of batch processing architecture?

Transaction data logs in finance are typically processed in batches. Less common examples include media companies managing content assets for advertising and pharmaceutical companies processing chemical data. 
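Transaction logs are often too large to handle as a single unit, so a batch job typically splits them into fixed-size chunks. A minimal sketch, assuming a hypothetical log format of `transaction_id,amount` lines: `in_batches` chunks any iterable (Python 3.12 adds `itertools.batched` for the same purpose), and `process_log` accepts the number of batches already completed, so a rerun after a failure can skip the work that was already done.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def in_batches(records: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch


def process_log(lines: Iterable[str], size: int, completed: int = 0) -> list[float]:
    """Return the total amount of each batch, skipping the first `completed` batches.

    Assumes hypothetical log lines of the form 'transaction_id,amount'. Passing the
    number of batches finished before a failure lets a rerun resume where it stopped.
    """
    totals = []
    for index, batch in enumerate(in_batches(lines, size)):
        if index < completed:
            continue  # already processed in an earlier run
        totals.append(sum(float(line.split(",")[1]) for line in batch))
    return totals


if __name__ == "__main__":
    log = ["tx1,10.0", "tx2,5.5", "tx3,4.5", "tx4,1.0", "tx5,2.0"]
    print(process_log(log, size=2))               # [15.5, 5.5, 2.0]
    print(process_log(log, size=2, completed=1))  # [5.5, 2.0]
```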

Connect with Our Data & AI Experts

To discuss how we can help transform your business with advanced data and AI solutions, reach out to us at hello@xenoss.io.
