How AI fixes 5 manufacturing quality control workflows


Manufacturing organizations run on thin margins and tighter cycles, so making mistakes gets expensive fast. Siemens benchmarking estimates that unplanned downtime now saps about $1.4 trillion in revenue from the world’s 500 largest manufacturers.

Quality failures also continue to dent margins: in the US, average recall costs reach up to $99.9 million per event.

To address systematic error patterns and enforce stricter quality standards, manufacturers are implementing AI-powered quality control systems. While data shows that most of these efforts are early-stage pilots, 96% of manufacturers plan to adopt machine learning organization-wide next year.

The early adopters are already reaping the benefits. 50% of manufacturers report cost savings following AI adoption, and 72% saw a productivity spike in at least one business function.

This analysis examines five manufacturing workflows where human error creates the highest financial and operational risk.

Each section documents a high-profile failure, quantifies business impact, and presents AI implementations that measurably reduce error rates.

The workflows analyzed include supplier material inspection (TSMC case study), fastener torque control (Boeing incident analysis), pharmaceutical batch record review (Curia implementation), IT systems management (Toyota outage, Lenovo solution), and end-of-line quality inspection (Ford computer vision deployment).

Xenoss engineers have supported manufacturing clients across these workflow categories, implementing machine learning systems that reduce defect rates while improving inspection throughput.

Workflow #1. Supplier material inspection: AI-powered quality control for incoming components

Global trade restrictions and tariff adjustments complicate supplier relationship management for manufacturers. Onboarding offshore suppliers has become harder, and maintaining existing relationships requires ongoing regulatory adjustments.

These operational pressures create inspection bottlenecks where quality issues from external suppliers enter production systems undetected.

Product recall rates demonstrate the severity of supplier quality control gaps. European regulators have reported over 3,800 recall instances for three consecutive quarters. In the US, the total number of products recalled in Q1 2025 has grown 25% compared to Q1 2024.

McKinsey analysis quantifies product recall costs in high-impact sectors: automotive manufacturers face up to $600 million per recall event, encompassing direct costs, supply chain disruption, and reputational damage.

Cautionary tale: TSMC, $550-million impact of supplier contamination

Context: Inspection capacity constraints prevented Taiwan Semiconductor Manufacturing Company (TSMC) from identifying contaminated photoresist materials shipped to its Northern Taiwan fabrication facility. TSMC had to scrap over 30,000 low-quality wafers before they reached customers.

Business impact: Industry analysts peg the direct costs of the TSMC contamination incident at $550 million. The mishap also put the company at risk of losing contracts with its biggest clients, including NVIDIA, MediaTek, and HiSilicon, which depend on TSMC for critical semiconductor supply and have minimal tolerance for disruption.

How AI helps get material inspection under control

For manufacturers across many industries, inspecting components from outside suppliers is a manual process. In chip manufacturing, the industry-standard automated optical inspection (AOI) generates thousands of defect images that operators then review manually. This process is both resource-intensive and error-prone.

Chipmakers are turning to AI to improve AOI efficiency. Automated defect classification (ADC) software uses deep learning to recognize defect patterns and detect them in generated images.

What is Automated Defect Classification?

Automated Defect Classification (ADC) is a quality control technology that uses computer vision and machine learning to automatically identify and categorize defects in manufactured products.

Instead of manual inspection, ADC systems analyze images or sensor data to detect and classify anomalies such as cracks, scratches, or dimensional variations according to predefined standards. ADC is widely used in industries like semiconductors, automotive, and electronics to improve inspection speed, consistency, and accuracy while reducing human error and labor costs.

These deep learning models train on labeled defect datasets, learning to distinguish between acceptable variation and quality-impacting defects.

CNN architectures process image features at multiple scales, achieving pattern recognition accuracy that exceeds human baseline performance and maintains consistent judgment across millions of inspection images.

Figure: Differences between manual, automated, and AI-assisted defect classification. AI-based automated defect classification improves both the speed and accuracy of supplier screening.

ADC supports manufacturers in three areas: lowering the impact of human error (typically 40-60% fewer false negatives), reducing the inspection cycle time, and lowering per-unit inspection costs through automation of repetitive classification tasks.

Case study: TSMC hybrid AI-human inspection architecture

TSMC pairs AI-enhanced auto defect classification with human-in-the-loop review to improve supplier quality control.

Self-learning systems are trained on common defect patterns and can accurately recognize them on millions of defect images. TSMC embeds machine learning into workflows in two ways.

Inline edge computing

ADC is embedded in the inspection tool, and defects are flagged during material processing.

The edge deployment approach embeds neural networks on specialized hardware (typically NVIDIA Jetson or similar inference accelerators) co-located with inspection tools.

This architecture enables sub-second defect detection, allowing operators to quarantine suspect materials immediately before they enter production workflows. Edge deployment minimizes latency, critical for inline inspection.
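The inline quarantine decision described above can be sketched as a simple triage rule on top of a model score. This is a minimal illustration, not TSMC's implementation: the `Inspection` type, the `triage` function, and both thresholds are hypothetical, and the score is assumed to be a calibrated defect probability.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would come from validation data.
QUARANTINE_THRESHOLD = 0.80  # at or above this score, hold the material
REVIEW_THRESHOLD = 0.40      # ambiguous scores are routed to a human reviewer


@dataclass
class Inspection:
    lot_id: str
    defect_score: float  # model output in [0, 1], assumed calibrated


def triage(inspection: Inspection) -> str:
    """Map a model score to one of three inline actions."""
    score = inspection.defect_score
    if not 0.0 <= score <= 1.0:
        raise ValueError("defect_score must be in [0, 1]")
    if score >= QUARANTINE_THRESHOLD:
        return "quarantine"
    if score >= REVIEW_THRESHOLD:
        return "human_review"
    return "pass"
```

Keeping a middle "human_review" band is one way to express the human-in-the-loop design: the model acts alone only on clear-cut cases.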

Offline cloud computing

After materials complete initial processing, TSMC runs a second layer of analysis on centralized cloud infrastructure with GPU clusters. This setup handles the heavy computational work that edge devices can’t manage, running larger neural networks with more layers and combining multiple models to catch defects that slipped through initial inspection.

The cloud system does three things: it double-checks what the edge inspection found, it looks for patterns across multiple batches from the same supplier, and it stops problematic materials from moving to the next production stage.
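The cross-batch check could look roughly like the sketch below, which aggregates defect counts per supplier and lists suppliers whose overall rate exceeds a limit. The function names, the record layout, and the 2% limit are assumptions made for illustration, not details of TSMC's system.

```python
from collections import defaultdict


def supplier_defect_rates(results):
    """Aggregate defect rates per supplier.

    results: iterable of (supplier, batch_id, inspected, defective) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # supplier -> [inspected, defective]
    for supplier, _batch_id, inspected, defective in results:
        totals[supplier][0] += inspected
        totals[supplier][1] += defective
    return {s: d / n for s, (n, d) in totals.items() if n > 0}


def suppliers_to_hold(results, max_rate=0.02):
    """Suppliers whose pooled defect rate exceeds max_rate (hypothetical limit)."""
    rates = supplier_defect_rates(results)
    return sorted(s for s, rate in rates.items() if rate > max_rate)
```

Pooling across batches is what lets this layer catch a supplier whose individual batches each look borderline but whose overall trend is poor.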

Running analysis in the cloud also makes it easier to improve the models over time. TSMC can retrain the system on new defect examples without touching the edge equipment on the factory floor.

Figure: TSMC combines inline edge and offline cloud ADC systems to detect defects in materials both during and after semiconductor processing.

Business impact: TSMC reports that deploying ML-assisted auto-defect classification in its packaging fabs, alongside ML-enhanced mask inspection, brought a product quality lift, shorter production cycles, and higher machine productivity.

ADC capabilities helped reduce operator load and escaped defects, protecting yield at advanced nodes and accelerating throughput.

Workflow #2. Fastener torque control

Assembly line fastener failures stem from three common operational issues: torque tools configured to incorrect specifications, over-dependence on manual torque measurement without digital verification, and lack of systems to capture and analyze torque data for quality assurance.

These seemingly minor errors create significant safety and financial risks when fasteners fail in critical applications.

Cautionary tale: Boeing 737 MAX-9 door failure from inadequate fastener control

The Alaska Airlines incident, in which a door plug on a Boeing 737 MAX-9 came off mid-flight and exposed the cabin to open air, was traced to fastening bolts that were not properly installed. Although there were no casualties, the impact of the event was staggering.

The FAA began an investigation into Boeing’s plants. Airlines had Boeing’s 737 MAX-9 airliners grounded because passengers were apprehensive about flying them. The company was banned from expanding production until it satisfied the FAA’s and NTSB’s demands.

Business impact: According to the company’s earnings report, Boeing shed $443 million due to customer doubts over MAX-9 safety. The company had to pay Alaska Airlines a $160 million settlement. Following the incident, Boeing’s stock lost 9% on the market.

Machine learning streamlines fastener control

Finding a way to measure torque data and flag loose bolts would help prevent incidents and reduce the maintenance load on factory workers.

But applying machine learning to fastener control is not trivial.

Assembly tasks are prone to variations in production – these changes create unpredictable forces and alter component reliability. Machine learning models have to consider this variability to estimate and measure torques accurately.

To solve this problem, a team of researchers at the University of Applied Sciences in Munich built a convolutional neural network (CNN) that ingests time-series torque data to identify the error zone based on the shape of the signal graph.

The system analyzes the torque signature, which shows how force changes over time during the fastening process. Each fastener type produces a characteristic curve shape when properly installed. The CNN learns these patterns from correctly installed fasteners, then flags deviations that indicate incorrect torque settings, cross-threading, or missing components.
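To make the idea of a torque signature concrete, here is a deliberately simpler stand-in for the CNN: a per-time-step statistical envelope learned from known-good curves. It is not the researchers' model. The function names, the 3-sigma limit, and the assumption that all curves are resampled to the same length are illustrative choices.

```python
from statistics import mean, stdev


def fit_envelope(good_curves):
    """Per-time-step mean and standard deviation from known-good torque curves.

    Needs at least two curves, all of the same length.
    """
    steps = list(zip(*good_curves))  # transpose: one tuple per time step
    return [mean(s) for s in steps], [stdev(s) for s in steps]


def is_anomalous(curve, envelope, z_limit=3.0, min_sigma=1e-6):
    """Flag a curve if any sample lies more than z_limit sigmas from the mean."""
    means, sigmas = envelope
    if len(curve) != len(means):
        raise ValueError("curve length must match the training curves")
    return any(
        abs(x - m) / max(s, min_sigma) > z_limit
        for x, m, s in zip(curve, means, sigmas)
    )
```

A learned model such as a CNN can pick up shape features (for example, the early torque spike typical of cross-threading) that a point-wise envelope like this one would miss, which is the motivation for the approach described above.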

These models reached 97% accuracy on benchmark tests.

Audi’s AI-powered spot weld inspection system

The auto-maker wanted to increase the speed of spot weld quality checks without compromising inspection accuracy.

Traditionally, Audi teams used ultrasound to monitor spot-weld quality manually. This method limited the factory’s productivity: each vehicle has roughly 5,000 spot welds, so manual checks could cover only a sample of them. The sampling approach created a risk that defective welds in uninspected areas would reach customers.

To ramp up productivity, Audi built an AI platform. First, it runs targeted real-time inspections during the welding process, using sensor data to identify welds that deviate from expected parameters.

Second, it monitors equipment performance over time, tracking patterns that indicate when welding equipment requires maintenance before quality degradation occurs.

This predictive maintenance component prevents systematic defects from poor equipment performance.
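A minimal sketch of the equipment-monitoring idea is a drift check on a rolling average of a sensor reading. This is not Audi's method; the function name, the window size, and the 5% tolerance are hypothetical, and the baseline is assumed to be non-zero.

```python
def drift_alert(readings, baseline, window=5, tolerance=0.05):
    """Return True if the mean of the last `window` readings deviates from
    `baseline` by more than `tolerance` (as a fraction of the baseline)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    if len(readings) < window:
        return False  # not enough data to judge yet
    recent = readings[-window:]
    average = sum(recent) / window
    return abs(average - baseline) / abs(baseline) > tolerance
```

Averaging over a window keeps a single noisy weld from triggering maintenance, while a sustained shift still raises the alert.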

Business impact: The new workflow allows maintenance teams to analyze 1.5 million spot welds on 300 vehicles each shift.

The expanded coverage means every weld is evaluated rather than a statistical sample, reducing the risk of undetected defects reaching customers.

Teams can now identify and address quality issues in real-time rather than discovering problems during final inspection or post-delivery.


Workflow #3. Batch record review

Manufacturers in life sciences have to create specific resources to comply with Good Manufacturing Practice (GMP), a set of quality assurance guidelines approved by the WHO.

One of the GMP requirements is conducting regular batch record reviews. Each batch record documents the manufacturing pipeline and processing steps, materials used for production, and tests conducted for every batch.

It is both a quality assurance document that teams use to streamline internal processes and a legal document that regulators rely on during inspections.

Even as process automation in life sciences is growing at a 14.03% CAGR and is expected to reach over $13 billion by 2030, manual batch record reviews are still a standard practice.

The 2024 Life Science Quality Trends Report found that 42% of manufacturers still use paper documentation for quality processes and have no automation for reviewing batch records.

But the opportunity cost of manual reviews is staggering. An article published in BioPharm International reports that the average review time for a batch record report is 48 hours, with some manufacturers taking up to 500 hours to go through a single batch record.

Human batch review also increases vulnerability to human error. In a Reddit post, a staff member at a chemical manufacturer shared that paper batch records often come with blank spaces (e.g., missing dates) or no verification.

Figure: A Reddit post from a chemical manufacturing worker describes how manual batch record reviews often lead to repeated human errors and accountability gaps.

Without an automation system that flags these errors and promotes accountability in filing records, life sciences manufacturers risk missing critical production errors and making mistakes that ruin product batches and erupt in reputational scandals.

Cautionary tale: Batch record failures halt Johnson & Johnson vaccine production

In 2021, the Emergent BioSolutions plant in Baltimore, which produced both the Johnson & Johnson and AstraZeneca vaccines, mixed up ingredients for the two vaccines.

Adding the ingredients for the AstraZeneca COVID-19 vaccine to the J&J batch destroyed 15 million doses, according to The New York Times, during a period of critical vaccine supply shortages.

After the incident, the FDA investigated the manufacturer’s operations and found several CGMP gaps at the plant. Emergent BioSolutions was issued a Form 483, a document detailing the violations FDA inspectors observe at a manufacturing site.

The inspector’s conclusion flagged batch review practices as “the failure to conduct investigations into unexplained discrepancies”.

Business impact: The plant, projected to ship tens of millions of Johnson & Johnson doses the month following the incident, had to stop the production of the one-dose vaccine while the Food and Drug Administration investigated the error. After the investigation, the FDA told Johnson & Johnson to discard 60 million more vaccine doses.

Machine learning architecture for batch record digitization and compliance verification

Machine learning technologies can reliably support every step of batch record digitization and review.

OCR

Optical character recognition (OCR) helps manufacturers digitize paper records and confirm the accuracy of record data.

For example, an OCR platform will retrieve the table of used materials from a paper record, transform it into a digital document, and cross-check it against a list of approved suppliers, ERP data, and material expiry rules.
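Once the materials table has been digitized, the cross-check itself is straightforward. The sketch below assumes the OCR step has already produced structured rows; the `MaterialLine` type, the field names, and the shape of the approved-supplier lookup are hypothetical rather than taken from any specific platform.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MaterialLine:
    material: str
    supplier: str
    lot: str
    expiry: date


def validate_materials(lines, approved_suppliers, batch_date):
    """Return a list of (lot, issue) pairs; an empty list means no findings.

    approved_suppliers: dict mapping material name -> set of approved suppliers.
    """
    issues = []
    for line in lines:
        allowed = approved_suppliers.get(line.material, set())
        if line.supplier not in allowed:
            issues.append((line.lot, "supplier not approved for material"))
        if line.expiry < batch_date:
            issues.append((line.lot, "material expired before batch date"))
    return issues
```

In practice the approved-supplier lookup and expiry rules would come from ERP or QMS data rather than an in-memory dictionary.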

After the validation is complete, the quality assurance team can be confident that only approved and usable materials went into the batch, avoiding the kind of mix-up that occurred at the Emergent BioSolutions plant.

Real-time data analytics

Real-time data analytics contextualizes this data and helps detect early signs of deviation from best practices.

Electronic batch record review systems use these capabilities to integrate with manufacturing execution systems, quality management systems (QMS), and laboratory information management systems (LIMS) to make sure batch reviews match internal data.

Each incoming batch record review can also be linked to quality control protocols to assess if the company’s production pipeline complies with Good Manufacturing Practices.

Predictive analytics

Predictive analytics facilitates proactive maintenance by examining past batch records and identifying early warning signs that preceded deviations from GMP. These can later be compiled in a checklist for QA teams and connected to the manufacturer’s internal toolset.

Manufacturers who switch to AI-assisted batch record review see improvements in both regulatory compliance and worker productivity. Aizon, an AI startup specializing in digitizing and automatically reviewing batch records, helped chemical manufacturers scale batch review from 10 batches per month to over 1,000 batches per year.

Curia’s AI platform for batch analytics and yield optimization

Curia is one of the largest European contract development and manufacturing companies that specializes in producing small-molecule drugs and biologics. The company currently boasts global biotech partnerships across the US, Europe, and Asia.

Maintaining stable production lines for multiple clients pushes Curia to develop rigorous QA standards and improve its batch record review practices.

Challenge: The company wanted to have a system that would detect variations in chemical reactions and determine how they affect product quality.

Before building an AI stack for batch report reviews, Curia QA technicians used manual records and Excel spreadsheets. Fragmented data came in from multiple sources in different formats, making it impossible to put it all together and generate accurate reports.

Solution: To reduce human error in batch reports, Curia adopted an AI stack for analyzing and comparing batches. The platform ingested, segmented, and cleaned raw data on materials, critical quality attributes (CQAs), critical process parameters (CPPs), and process metrics.

Predictive analytics models helped identify cause-and-effect relationships among production conditions, workflows, and variability across drug batches. Based on material and production data, they generate yield predictions and offer fractionation recommendations that help lift yield.

Business impact: AI-assisted batch report review and analysis increased yield for underperforming batches in the first three months after deployment and reduced the annual cost of goods sold (COGS).

Workflow #4. IT systems management

A reliable connection between ERP, MES, warehouse control, and scheduling systems is vital for uninterrupted production.

If the manufacturer’s ERP is down, on-site teams will no longer be able to trace raw materials and assign them to production.

Likewise, an unresponsive warehouse management system will prevent materials from arriving at needed cells, pushing operators to sit idle even when all equipment is in order.

Silos in a manufacturer’s IT stack increase the risk of downtime, which costs companies millions in productivity.

According to Siemens research, the cost of a lost hour in FMCG is $36,000. In the automotive industry, it can rise to $2.3 million. The trend is even more telling: the economic impact of IT-related downtime has been increasing in most industries for the last five years.

Figure: Unplanned downtime costs have risen across manufacturing sectors in the 2020s, hitting automotive and heavy industry especially hard.

However, IT incidents caused by poor capacity planning and security vulnerabilities are still common. The Q2 2025 Kaspersky analysis reports 135 confirmed events involving the denial of database systems and the leakage of sensitive data.

Figure: In Q2 2025, companies reported 135 security outages; 47% of them affected manufacturers.

Cautionary tale: Database deletes at Toyota stopped car production for 36 hours at 14 plants

Problem: In August 2023, Toyota had to deal with a glitch in its production system that prevented the car manufacturer from ordering new components. Without the parts needed for production, the company could no longer maintain production lines. Toyota shut down operations at 14 factories for 36 hours.

Cause: Internal investigations discovered that the outage was caused by a vulnerability on servers that manage component ordering. During a regular maintenance check the company ran the day before, engineers accidentally deleted database records and triggered an insufficient disk space warning that caused the system to shut down.
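One simple guard against this failure mode is a free-space check that runs before any maintenance job starts. The sketch below is illustrative and not a description of Toyota's systems: the function name and the 1.5x safety factor are assumptions, while `shutil.disk_usage` is the standard-library call that reports total, used, and free bytes for a path.

```python
import shutil


def enough_free_space(path, required_bytes, safety_factor=1.5,
                      usage=shutil.disk_usage):
    """Return True if `path` has at least required_bytes * safety_factor free.

    `usage` is injectable so the check can be tested without a real disk.
    """
    if required_bytes < 0 or safety_factor <= 0:
        raise ValueError("required_bytes must be >= 0 and safety_factor > 0")
    free = usage(path).free
    return free >= required_bytes * safety_factor
```

A maintenance script would call this first and abort, or page an operator, when it returns `False`, rather than discovering the shortage midway through a database operation.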

Business impact: The 36-hour outage froze 28 production lines and halted Toyota’s entire domestic manufacturing and one-third of its global output. The total damage of the outage is estimated at roughly 20,000 delayed vehicles and over $500 million in lost revenue.

Machine learning can monitor sensitive IT systems

It’s already industry practice for teams to use Advanced Planning and Scheduling (APS) software to plan operations and monitor mission-critical systems.

What is Advanced Planning and Scheduling software?

Advanced Planning and Scheduling (APS) software optimizes production by aligning materials, labor, and machine capacity in real time. It integrates with ERP, MES, and WMS systems and synchronizes data across planning, execution, and logistics. Modern APS platforms can also coordinate IT system maintenance: schedule updates or backups during low-load windows, forecast the impact of downtime on production schedules, and automatically replan workflows to prevent disruptions caused by outages.
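Scheduling maintenance in a low-load window reduces to finding the contiguous block of hours with the smallest forecast load. The sketch below uses a sliding window over hypothetical hourly load figures; the function name and inputs are assumptions, and a real APS would weigh many more constraints (shift patterns, order deadlines, dependencies between systems).

```python
def best_maintenance_window(hourly_load, duration_hours):
    """Return (start_hour, total_load) for the contiguous window with the
    lowest total load. Windows do not wrap around the end of the list;
    ties resolve to the earliest window."""
    n = len(hourly_load)
    if not 1 <= duration_hours <= n:
        raise ValueError("duration_hours must be between 1 and len(hourly_load)")
    window = sum(hourly_load[:duration_hours])
    best_start, best_total = 0, window
    for start in range(1, n - duration_hours + 1):
        # Slide the window one hour: add the new hour, drop the oldest one.
        window += hourly_load[start + duration_hours - 1] - hourly_load[start - 1]
        if window < best_total:
            best_start, best_total = start, window
    return best_start, best_total
```

For example, with forecast loads `[5, 3, 1, 1, 4, 6]` and a two-hour job, the quietest window starts at hour 2.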

In the last three years, leading APS providers have been adding machine learning capabilities to these systems to give manufacturers more control over production management.

30% of manufacturers surveyed by IDC reported that AI-powered APS software helped them reach operational KPIs.

These platforms oversee the production schedule and keep track of IT maintenance and orchestration. With generative AI taking care of the bulk of planning and maintenance work, factory team leaders can focus on creative work and team management.

Lenovo’s AI-based APS reduces the time needed to manage critical systems to minutes

Context: Orchestrating factory operations used to be a major bottleneck for Lenovo.

Staff had to manually manage thousands of scheduling variables and more than 40 mission-critical IT systems, which put a significant strain on resources.

Solution: The new machine learning-assisted platform integrates with Lenovo’s IT infrastructure and orchestrates it for production line management. It ingests insights across the company’s tech stack and generates workflow automation recommendations and scheduling suggestions.

Business impact: Lenovo’s AI platform minimizes human involvement in the company’s IT infrastructure, reducing risks of human error-related shutdowns. Machine learning algorithms now autonomously run over 75% of all scheduling and order processes, which has helped free human workers and increase their productivity by 24%. Since adopting the system, the total production volume for Lenovo factories has also risen by 19%.

With a lean team of 10 internal experts, we developed a leading-edge APS solution in just six months. The AI solution is delivering excellent results against several key performance indicators, and we’re anticipating further benefits as we continue the rollout.

Haimin Gan, Senior IT Manager at Lenovo

Workflow #5. End-of-line inspection

Manufacturers are under significant regulatory pressure to deliver safe, functional, and effective final products.

In life sciences, the Food and Drug Administration requires manufacturers to establish clear acceptance procedures. Manufacturers won’t be allowed to release a device until inspections verify that it meets specifications.

In automotive, International Automotive Task Force regulations require functional testing of finished components to make sure they meet OEM customer-specific requirements.

That’s why end-of-line testing is mission-critical to prevent product recalls, warranty claims, and brand damage. It’s also one of the most time- and resource-consuming manufacturing workflows.

Manufacturer surveys report that visual checks at the end of the line consume up to 40% of total production cycle time.

Even with that level of commitment, human error in manual end-of-line inspection remains high.

A 2024 survey on industrial visual inspection notes that manual checks have up to 30% defect miss rates due to inspector fatigue or minor issues, such as poor lighting on the factory floor.

Human error during end-of-line inspection causes multi-million-dollar damage to manufacturers. In the US, product recalls due to poor product quality cost manufacturers up to $99.9 million per event.

Cautionary tale: Poor end-of-line inspection led to massive product recalls

What happened: In September 2025, Hillshire Foods, an FMCG manufacturer, failed to inspect the batch of corn dogs accurately. After the product was released, customers discovered that pieces of wood were mixed into the batter. After a series of customer complaints and reported injuries, the company had to recall the corn dogs voluntarily.

Business impact: The manufacturer was slammed with multiple customer complaints and 5 injury reports.

Later, the company was hit with a class action lawsuit from a frustrated consumer claiming he ate a product “unfit for human consumption” before the company had issued a recall. In total, the product recall led to estimated losses of $58 million.

How AI improves end-of-line inspection

To reduce human error in end-of-line inspection, manufacturers implement machine learning to assist human operators and automate routine workflows.

AI supports factory workers by pointing out defects that inspectors may have missed and ensuring that workflows meet regulatory requirements.
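A back-of-the-envelope calculation shows why a second, automated check matters. The sketch below assumes that human and model misses are statistically independent, which is a strong simplification; the function name and every number other than the 30% manual miss rate cited above are hypothetical.

```python
def escaped_defects(units, defect_rate, human_miss_rate, model_miss_rate=1.0):
    """Expected number of defective units that pass inspection.

    model_miss_rate=1.0 means no model is in the loop. Assumes human and
    model misses are independent, so a defect escapes only if both miss it.
    """
    for name, value in (("defect_rate", defect_rate),
                        ("human_miss_rate", human_miss_rate),
                        ("model_miss_rate", model_miss_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    defects = units * defect_rate
    return defects * human_miss_rate * model_miss_rate
```

With 100,000 units and a 1% defect rate, a 30% manual miss rate lets about 300 defects through; adding a model that independently misses 10% of defects brings that to about 30. Real-world misses are often correlated (poor lighting affects cameras too), so the actual gain would likely be smaller.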

Paired with augmented reality, machine learning also helps onboard new employees by creating personalized step-by-step instructions for inspecting specific types of components.

The introduction of AI in end-of-line inspection rests on three core technologies.

Computer vision helps identify defects and poor assembly, eliminating the need for 2D manuals. Cameras installed at inspection stations help ensure that only high-quality products leave the line.

Generative AI supports factory operators by offering real-time guidance and practical tips to increase the efficiency of end-of-line inspections.

Real-time analytics helps automate reports and dashboards. Team leaders can use this data intelligence to build a one-stop shop for processing end-of-line inspection results.

Ford: Computer vision helps prevent product recalls

Context: Ford’s Dearborn Truck Plant has one of the highest outputs in the automotive industry, producing 300,000 F-150 pickups each year. Quality assurance for a product of this complexity is difficult, and oversights become hard to avoid.

In fact, Ford is the leader among US manufacturers in product recalls, with a track record of 95 recalls in 2025 alone.

Solution: To reduce the strain on human inspectors and make sure smaller wiring, fender, or seat defects don’t slip through the cracks, Ford piloted two in-house machine learning systems: AiTriz and MAIVS. These platforms use real-time computer vision to catch component misalignments and check that all parts are mounted correctly.

Business impact: The company has deployed AiTriz at 35 stations and MAIVS at over 700 stations across the country. New systems, Ford staff told Business Insider, are saving teams a significant amount of time and improving attention to detail in a noisy environment, where subtleties like two wires clicking the wrong way often go unnoticed.

As the vehicle goes through the assembly line, it gets harder and harder to access some of these components. I can’t stress enough how the real-time results are key in saving us time.

Brandon Tolsma, Vision Engineer at Ford MTDC

Bottom line

Compared to other industries, digitization has a slow penetration rate in manufacturing. Companies that maintain manual paper-based workflows have a harder time going digital due to massive ‘data debt’ and a lack of traceable data trails.

Machine learning is not a silver bullet for eliminating accidents and human error. But, for early adopters, it offers one more level of product quality assurance, protection from overreliance on human factors (fatigue or attention to detail), and an uplift in overall staff productivity.