
Batch processing

Batch processing is a data engineering practice in which large volumes of data are collected into groups (known as “batches”) and processed at pre-defined intervals.

Batch processing is used when input data does not need to be processed in real time and is not required for immediate decision-making.

When should you use batch data processing? 

Companies typically use batch processing software to process large data volumes in the following cases.

  • The data is not used for real-time decision-making. 
  • The system is designed to process large data volumes (e.g., transaction logs). 
  • There are periods when the system is idle (i.e., it ingests no incoming data). 
  • The data is used for repetitive tasks that do not require human supervision. 

Batch vs stream processing

It is easier to understand the key features of batch processing by examining it side by side with a commonly used alternative, stream processing. 

| Feature | Batch processing | Stream processing |
|---|---|---|
| Data ingestion | Large data volumes collected over time and processed in batches | Continuous data processing in real time |
| Latency | High, data is processed with a significant delay after the ingestion | Low, the data is processed as soon as it is collected |
| Use cases | Payroll, billing systems, end-of-day reporting, ETL | Fraud detection, sensor data processing, live analytics |
| Implementation complexity | Lower due to predefined datasets | Higher due to the pressure to maintain low latency |
| Fault tolerance | The recovery is easier due to the ability to rerun failed batches | Requires complex fault tolerance mechanisms to keep processing data in real time |
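
To make the comparison concrete, here is a minimal Python sketch that computes the same per-account totals in both styles and shows why rerunning a failed batch is straightforward. The event data and the function names (`batch_totals`, `stream_totals`, `run_with_rerun`) are hypothetical and chosen only for illustration; real systems would rely on a dedicated batch or streaming framework.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Tuple

Event = Tuple[str, float]  # hypothetical shape: (account_id, amount)

# Hypothetical input data, not taken from any real system.
EVENTS: List[Event] = [("a", 10.0), ("b", 5.0), ("a", 2.5), ("b", 1.0)]


def batch_totals(events: List[Event]) -> Dict[str, float]:
    """Batch style: the full dataset exists before processing starts."""
    totals: Dict[str, float] = defaultdict(float)
    for account, amount in events:
        totals[account] += amount
    return dict(totals)  # one result, produced after the whole batch


def stream_totals(events: List[Event]) -> Iterator[Dict[str, float]]:
    """Stream style: update state and emit a result after every event."""
    totals: Dict[str, float] = defaultdict(float)
    for account, amount in events:
        totals[account] += amount
        yield dict(totals)  # a fresh snapshot per event


def run_with_rerun(job: Callable, events: List[Event], attempts: int = 3):
    """Rerun the whole batch on failure; the input is fixed, so this is safe."""
    last_error = None
    for _ in range(attempts):
        try:
            return job(events)
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
    raise RuntimeError("batch failed after all attempts") from last_error


batch_result = run_with_rerun(batch_totals, EVENTS)
stream_snapshots = list(stream_totals(EVENTS))
```

The batch function returns a single result once all events are available, while the streaming generator yields an updated snapshot after each event. The final snapshot matches the batch result; the difference lies in when results become available and how much state must be kept alive.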

How does batch processing work? 

Batch processing typically comprises four stages. 

  • Data gathering: input data is collected over an extended period of time. 
  • Data loading: collected data is stored in a system. 
  • Processing: the system automatically activates data transformation workflows. 
  • Data output: the data is shared with teams in a usable format (e.g. a report or a dashboard). 
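
The four stages above can be sketched as a small Python pipeline. This is an illustrative sketch under stated assumptions, not a reference implementation: the JSON Lines storage format, the `type` field, and all function names are hypothetical, and only the standard library is used.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List


def gather(events: Iterable[dict]) -> List[dict]:
    """Stage 1: accumulate input collected over the batch window."""
    return list(events)


def load(records: List[dict], path: Path) -> None:
    """Stage 2: store the collected records (assumed here: a JSON Lines file)."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def process(path: Path) -> Dict[str, int]:
    """Stage 3: transform the stored data; here, count events per type."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as f:
        for line in f:
            counts[json.loads(line)["type"]] += 1
    return dict(counts)


def output(summary: Dict[str, int]) -> str:
    """Stage 4: present the result in a usable format (a plain-text report)."""
    return "\n".join(f"{name}: {count}" for name, count in sorted(summary.items()))


def run_batch(events: Iterable[dict]) -> str:
    """Run all four stages in order, using a temporary file as the store."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "batch.jsonl"
        load(gather(events), path)
        return output(process(path))


report = run_batch([{"type": "click"}, {"type": "view"}, {"type": "click"}])
```

Keeping each stage as a separate function means a failed stage can be rerun against the stored file without collecting the input again.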

Batch processing use cases

Let’s examine the most common applications of batch processing. 

Finance

Financial teams use batch processing for end-of-day transaction processing, fraud surveillance, and risk management. 

Healthcare

Batch processing is used in multiple areas of healthcare, from computational chemistry and drug research to genomic sequencing and genetic testing. This processing approach allows scientists and clinicians to aggregate large amounts of data that inform research decisions and improve the understanding of biochemical processes. 

Advertising and media

Brands and media companies use batch processing to handle assets (visuals, files, content metadata) in bulk, which helps them accelerate content and campaign management, scale operations, and automate creative workflows. 


FAQ

What is ETL batch processing?

ETL batch processing is the practice of extracting, transforming, and loading data from various sources in bulk. Typically, an ETL batch job runs on a fixed schedule or is triggered manually. 
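
As a rough illustration of an ETL batch job, the sketch below extracts rows from CSV text, drops rows whose amount cannot be parsed, and loads the rest into an in-memory SQLite database. The sample data, the `payments` table, and the function names are assumptions made for this example. In practice a scheduler (for instance, cron) or a manual trigger would call `run_etl`.

```python
import csv
import io
import sqlite3
from typing import List, Tuple

# Hypothetical source data; a real job would read files or query a source system.
RAW_CSV = """id,amount,currency
1,10.50,usd
2,not_a_number,usd
3,7.25,eur
"""


def extract(text: str) -> List[dict]:
    """Extract: read every source row in bulk."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: List[dict]) -> List[Tuple[int, float, str]]:
    """Transform: cast types, normalise currency codes, skip unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        clean.append((int(row["id"]), amount, row["currency"].upper()))
    return clean


def load(conn: sqlite3.Connection, rows: List[Tuple[int, float, str]]) -> int:
    """Load: write the batch in one go; keyed upserts make reruns idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def run_etl(conn: sqlite3.Connection, text: str = RAW_CSV) -> int:
    """Entry point a scheduler or an operator would invoke."""
    return load(conn, transform(extract(text)))


conn = sqlite3.connect(":memory:")
loaded = run_etl(conn)
```

Because rows are written with `INSERT OR REPLACE` against a primary key, running the same batch twice leaves the table unchanged, which is one simple way to make reruns safe.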

When to use batch processing systems?

Batch processing is typically preferred to stream processing for high data loads and workflows that do not require real-time data. In addition, many legacy data tools do not support other types of data processing. 

What is a practical example of batch processing architecture?

Transaction data logs in finance are typically processed in batches. Less common use cases for this approach include managing content assets at advertising and media companies and processing chemical data at pharma companies. 
