Maria Novikova - CRO, Xenoss

MCP gateway architecture: How to scale AI agent tool access for enterprise

Maria Novikova — Tue, 19 May 2026 16:28:00 +0000

Your engineering team deploys five AI agents. One handles customer support tickets, another monitors infrastructure, a third automates sales outreach, and two more manage internal workflows. Each agent needs access to Slack, Jira, your CRM, two databases, and a handful of internal APIs. That is five agents times eight tools, which means forty individual connections, each with its own credentials, error handling, and retry logic. Now somebody on the security team asks a straightforward question: “Which agent accessed the production database at 2:14 a.m. last Tuesday?” Nobody can answer it.

This is the problem MCP gateways solve. The Model Context Protocol went from Anthropic’s open-source experiment to an industry standard backed by OpenAI, Google, and Microsoft in under two years. The official registry now lists over 9,400 servers, and adoption has crossed 78% among production AI teams. The protocol works, but connecting dozens of agents to hundreds of servers without a central governance layer creates a visibility gap.

This article covers how the MCP gateway architecture works, the three deployment patterns teams are using in production, how Docker and Microsoft Foundry handle it differently, and where managed gateways run out of road for enterprise environments with industrial systems and regulatory requirements.

Summary

An MCP gateway acts as a centralized control plane between AI agents and the MCP servers they call, handling authentication, access control, audit logging, and traffic routing through a single governed endpoint.
Three architecture patterns: reverse proxy (routes traffic, simplest to deploy), aggregation (merges multiple servers behind one endpoint), and multi-tenant (isolates tool access by team or agent identity).
Docker and Microsoft take different approaches. Docker uses container isolation as the security boundary. Microsoft Foundry routes MCP traffic through Azure API Management with Entra ID integration. Cloudflare uses its edge network for Shadow MCP detection.
Managed gateways handle standard SaaS integrations. Custom MCP server engineering is required for SCADA/IoT tool access, legacy system wrappers, and domain-specific compliance policies that no managed platform covers.

What is an MCP gateway?

MCP gateway

is a control plane that manages all communication between AI agents and the MCP servers that those agents use to access tools, databases, APIs, and file systems

Instead of every agent holding its own credentials and managing its own connections to every tool it needs, all requests flow through the gateway. The gateway handles MCP authentication, enforces access policies, logs every tool invocation, and routes requests to the right backend server.

That’s more what an API gateway does for microservices, but designed for the specific communication patterns of AI agents. Agents talk to tools differently than web apps talk to APIs: the connections are stateful, bidirectional, and session-based. An agent might discover available tools, call three of them in sequence while maintaining context, and then close the session. A gateway needs to understand that lifecycle to enforce policies properly.

Why does this matter? 42% of enterprises need their agents to access eight or more data sources. In a direct-connect model, adding one new agent means configuring connections to every tool it needs. Adding one new server means updating every agent that should have access. The complexity grows fast, and with it, the credential management burden, the observability gap, and the security exposure.

Need a gateway architecture tailored to your enterprise security model?

Talk to Xenoss engineers

MCP gateway architecture patterns

Three patterns have emerged in production deployments. Each solves the same core problem (centralizing agent-to-tool governance) but at different levels of sophistication.

MCP gateway architecture replaces the N-by-M connection mesh with a governed hub-and-spoke model

Reverse proxy pattern

The gateway receives MCP requests from agents, validates authentication, logs the invocation, and forwards the request to the target server. It does not modify payloads or combine server responses. This is the simplest pattern and the right starting point for most teams.

Cloudflare’s enterprise MCP architecture follows this approach: MCP Server Portals handle identity verification through Cloudflare Access, while AI Gateway captures logs and metrics for every tool call. Cloudflare also introduced Shadow MCP detection, which flags when employees connect to unregistered MCP servers on the enterprise network.

Aggregation pattern

The aggregation gateway merges multiple MCP servers behind a single endpoint. Agents see one interface that exposes the combined tool catalog of all downstream servers. The gateway handles tool discovery, dispatches invocations to the correct backend, and returns results as if they came from a single server.

Microsoft Foundry Toolboxes work this way: they bundle Web Search, Code Interpreter, Azure AI Search, MCP servers, and OpenAPI tools into one MCP-compatible endpoint.

Composio’s managed gateway does the same with 500+ pre-built integrations and unified authentication. This pattern fits when agents need broad tool access but should not be aware of backend topology.

Multi-tenant pattern

Enterprise environments need to control which teams or agent identities can access which tools. The multi-tenant gateway maps agent identity to tool permissions through integration with enterprise identity providers (Entra ID, Okta, SAML).

A marketing team’s agents might access CRM and analytics tools but not production databases. An engineering team’s agents might have read access to everything but write access only in sandbox environments.

MintMCP implements this through SCIM-driven RBAC, IdP groups, and Virtual MCP Bundles that define per-role tool sets. This is the most complex pattern to deploy but the only one that works for organizations running hundreds of agents with strict access controls.

	Reverse proxy	Aggregation	Multi-tenant
Complexity	Low	Medium	High
Agent view	Agents route to individual servers	Agents see one unified endpoint	Agents see tenant-scoped tool sets
Auth model	Token validation at the gateway	Unified auth with per-server credential brokering	Identity-propagated, per-tenant policies
Best for	Early adoption, small teams	Broad tool access, managed integrations	Enterprise with strict RBAC needs
Production examples	Cloudflare MCP architecture	Composio, Microsoft Foundry Toolboxes	MintMCP, Kong MCP Gateway

Docker MCP server and gateway: Container-based isolation

Docker’s approach treats each MCP server as an isolated container with controlled resource limits, network policies, and filesystem access. The gateway manages container lifecycles and routes agent requests to the right container. Everything runs inside your infrastructure, giving teams full control over data residency, network rules, and runtime configuration.

For teams already comfortable with Docker or Kubernetes, deployment is fast. You define MCP servers as container images, configure resource limits and network access per container, and the gateway handles routing. The isolation model is strong: if one MCP server is compromised, the blast radius stays within that container.

The trade-off is that Docker, rather than being a finished product, provides building blocks. Containerized isolation and routing are covered, but audit logging, identity management, policy enforcement, and centralized monitoring need to be layered on top.

For a small team experimenting with MCP in production, Docker is a solid starting point. For an enterprise that needs SOC 2-compliant audit trails, per-user access policies, and integration with Okta or Entra ID, additional engineering is required on top of Docker’s foundation.

Microsoft MCP gateway: Foundry and Azure API Management

Microsoft’s approach plugs MCP governance into Azure API Management. The Foundry AI Gateway provides a governed entry point where teams can enforce Entra ID authentication, rate limits, IP restrictions, and audit logging without modifying MCP servers or agent code. Every action runs under the signed-in user’s Azure RBAC permissions, so agents cannot exceed the permissions of the human behind them.

Foundry Toolboxes take this further by bundling multiple tools into a single MCP-compatible endpoint. An agent connects to one Toolbox URL and gets access to a curated set of tools (Web Search, Code Interpreter, Azure AI Search, MCP servers, OpenAPI endpoints) governed by a single policy layer. Tenant administrators can apply Conditional Access policies through Azure Policy to control MCP usage organization-wide.

For organizations already on Azure, this is the fastest path to governed MCP. The gateway reuses existing identity, networking, and compliance infrastructure, so there is no new security stack to evaluate.

The limitation is cloud lock-in: outside Azure, Foundry’s governance capabilities drop off significantly. Multi-cloud teams will need a different approach for non-Azure workloads.

MCP server security and authentication at the gateway layer

MCP authentication and security operate across four layers, and skipping any of them creates gaps that agents will eventually exploit, either by accident or through adversarial prompt injection.

Authentication. Every agent-to-gateway connection requires a verified identity. OAuth 2.1 with PKCE is the emerging standard for MCP authentication. Microsoft Foundry uses Entra ID tokens scoped to the MCP endpoint. Managed gateways like Composio handle OAuth flows automatically for 500+ integrations. For custom MCP servers connecting to internal systems, teams typically implement service-to-service auth using mTLS or API keys issued per agent.

Tool-level authorization. Authentication answers “who is this agent?” Authorization answers “what can this agent do?” A gateway must support tool-level granularity: agent A can call “read_customer” but not “delete_customer,” even when both tools live on the same MCP server. Role-based access control, tool allow-lists, and per-identity scoping are the minimum for enterprise deployment.

Audit logging. Every tool invocation needs a record: which agent, which user behind the agent, which tool, what parameters, what response, and when. This is non-negotiable for regulated industries.

The MCP roadmap explicitly calls out audit trails as a required enterprise capability. Gateways that capture this natively (Cloudflare AI Gateway, Microsoft Foundry, MintMCP) save teams from building custom logging infrastructure.

Threat protection. Tool poisoning (a compromised MCP server returning malicious instructions), Shadow MCP usage (employees connecting to unregistered servers), and prompt injection through tool responses are documented attack vectors. Cloudflare’s DLP-based Shadow MCP detection and Lasso Security’s triple-gate pattern (AI layer, MCP layer, API layer) represent current best practices for MCP-specific threat mitigation.

MCP gateway vs API gateway: Three differences that matter

If your organization already runs Kong, Apigee, or AWS API Gateway for microservices, you might assume those can handle MCP traffic too. They can route it. They cannot govern it properly. Three architectural differences explain why a dedicated MCP gateway or an LLM gateway with MCP support is needed.

Sessions, not stateless requests. API gateways treat each HTTP request independently. MCP communication is session-based: an agent opens a connection, discovers tools, invokes several in sequence while maintaining context, and eventually closes the session. Enforcing policies like “this agent can invoke a maximum of five tools per session” or “revoke access if the agent exceeds its context budget” requires session awareness that stateless API gateways don’t provide.

Tool-level granularity, not endpoint-level. API gateways authorize at the URL and HTTP method level. MCP gateways need to parse protocol payloads to understand which specific tool is being invoked within a server. Blocking “delete_records” while allowing “read_records” on the same MCP server endpoint requires protocol-aware inspection that standard API gateways don’t perform.

Agent identity propagation. API gateways authenticate the calling application. MCP gateways need to propagate the agent’s identity and the human user behind the agent all the way to the MCP server, so tool access reflects the user’s permissions. Microsoft handles this with Entra ID on-behalf-of tokens. Other gateways use custom headers or OAuth 2.1 flows. Without identity propagation, agents run with service-level permissions, which violates least-privilege principles.

Where managed MCP gateways need custom engineering

Managed gateways like Composio, MintMCP, and Microsoft Foundry handle the standard integration layer well: connecting agents to Salesforce, Slack, Jira, GitHub, cloud databases, and SaaS APIs. They cover maybe 80% of what enterprise agents need to access. The remaining 20% is where most organizations discover that managed gateways can’t reach.

Industrial and IoT tool access. Manufacturing organizations need agents that can query SCADA systems, pull sensor data from OPC-UA endpoints, or interact with PLCs on the factory floor. No managed MCP gateway ships with connectors for industrial protocols. Bridging the gap between AI agents and operational technology requires custom MCP server development that handles the authentication, latency, and reliability constraints of industrial environments.

Legacy system wrappers. Enterprise agents frequently need to read from mainframes, proprietary ERP instances with custom schemas, or internal tools built on legacy stacks. These systems expose non-standard interfaces (SOAP, custom RPC, file-based protocols) that no managed gateway covers. Wrapping these interfaces in MCP-compliant servers is a custom engineering project that requires understanding both the MCP specification and the legacy system’s behavior.

Domain-specific compliance policies. A healthcare organization’s gateway needs HIPAA-compliant data masking on every tool response containing patient information. A financial institution needs KYC/AML screening before agents can query customer accounts. A defense contractor needs ITAR checks on tool invocations touching export-controlled data. These are not configuration toggles. They are domain-specific policy layers that must be engineered for the specific regulatory environment and tested against real compliance scenarios.

Why this matters: The tools agents need to reach in regulated and industrial environments are the same tools that carry the highest risk. A managed gateway that covers Slack and Jira but cannot govern access to a SCADA system or enforce HIPAA masking on a patient database does not solve the governance problem where it counts.

Build MCP gateway infrastructure for your enterprise systems

Talk to Xenoss engineers

Implementation roadmap for enterprise MCP gateway deployment

Phase 1: Inventory and classify. Map which agents access which tools, tag each connection by sensitivity level (low/medium/high), and identify which tools handle PII, financial data, or regulated information. This is the same access mapping exercise that identity teams run for human users, applied to agent-tool connections.

Phase 2: Deploy a reverse proxy for low-risk tools. Start with the simplest pattern. Route low-sensitivity, read-only tool access through a proxy gateway with authentication and logging. Docker’s container-based approach or Cloudflare’s architecture both work for this. The goal is audit trail coverage and a single point of visibility without complex policy logic.

Phase 3: Add aggregation and identity-based access for high-risk tools. Expand to the aggregation pattern for teams needing unified tool discovery, and add identity-propagated access controls for sensitive tools. Integrate with your existing identity provider so agent access follows the same permission model as human access. Microsoft Foundry or MintMCP add the most value at this phase.

Phase 4: Build custom MCP servers for edge cases. The final phase covers the tools and policies that no managed gateway handles: industrial protocols, legacy system wrappers, and domain-specific compliance logic. These are custom engineering projects that require a deep understanding of both MCP and the systems being connected.

Enterprise MCP gateway deployment follows a phased approach from basic routing to full governance

Bottom line

MCP adoption has reached the point where connecting agents directly to servers without governance is a liability. With 78% of production AI teams using the protocol and over 9,400 servers in the public registry, MCP is an infrastructure. The governance layer around it needs to be just as mature.

An MCP gateway provides centralized authentication, tool-level access control, audit trails, and observability. The architecture pattern (reverse proxy, aggregation, multi-tenant) depends on your scale and security model. The platform (Docker, Microsoft Foundry, Cloudflare, Composio, MintMCP) depends on your existing cloud investments.

For most enterprise environments, the first three deployment phases can be handled by managed platforms. The fourth, connecting agents to industrial systems, legacy infrastructure, and enforcing domain-specific compliance, requires custom engineering. And that fourth phase is where the real governance risk lives.

The post MCP gateway architecture: How to scale AI agent tool access for enterprise appeared first on Xenoss - AI and Data Software Development Company.

Dynamic pricing strategy: How AI-powered pricing drives revenue growth

Maria Novikova — Tue, 03 Mar 2026 10:07:52 +0000

To remain competitive and relevant to customers, organizations need to continuously adjust their prices in line with customer demand and competitors’ growth rates. However, 71% of companies still rely on scattered, limited, and ad-hoc tracking of competitor pricing strategies. Customized AI solutions can analyze large amounts of structured and unstructured data and adjust prices in minutes, freeing up revenue management teams for more value-adding work.

Businesses report up to 16% revenue growth after implementing AI-based dynamic pricing. Buyers are also adapting to the new pricing reality as they begin to see personal benefits, such as usage-based pricing for software and technology, which offers much more flexibility than fixed pricing, allowing users to pay for APIs, specific features, or outcomes.

This guide covers how the artificial intelligence algorithms work, which industries benefit most, and how to implement a system that captures value without triggering price wars or regulatory headaches.

What is AI-driven dynamic pricing optimization?

AI models enable real-time pricing adjustment by gathering comprehensive data on buyer behavior, seasonal changes, product or service demand, market trends, competitor prices, and economic conditions. Algorithms are far more sensitive to even the slightest changes than humans and can help businesses ensure competitive pricing while keeping customers satisfied.

How AI dynamic pricing algorithms drive revenue growth

AI helps businesses increase revenue by performing the following:

Price elasticity optimization: AI calculates the precise point where volume multiplied by margin reaches its maximum. For products with flexible demand, that might mean holding prices higher during periods of higher interest rates. For price-sensitive items, it means finding the floor that still moves inventory levels.
Demand-supply matching: Algorithms prevent the two most common pricing mistakes: leaving money on the table during high demand and decreasing sales velocity during slow periods.
Competitive positioning: Rather than blindly matching competitor prices, AI determines when to undercut, when to hold premium positioning, and when price isn’t the deciding factor at all.

Which algorithms to choose depends on the use case and industry. For instance, reinforcement learning machine learning algorithms work well for real-time optimization, where the system learns from each transaction. Time series models are effective for demand forecasting. And regression models can calculate price elasticity across diverse customer segments.

Integrate AI into your existing pricing strategy to improve price realization, protect margins, and respond to market fluctuations in real time

Explore what we offer

Industries that benefit most from dynamic pricing with AI

Both B2C and B2B industries can benefit equally from AI-driven pricing strategies. Below, we examine different industries and real-life examples of AI implementation to identify what they share and how their approaches differ.

Industry	Primary use case	Key AI application
Retail & e-commerce	Inventory management optimization	Real-time competitor matching
Travel & hospitality	Yield management	Demand-based room/seat pricing
SaaS	Churn reduction	Usage-based tier optimization
Manufacturing & distribution	Quote optimization	Customer-specific contract pricing

Retail and e-commerce

82% of retail executives consider AI adoption the biggest competitive advantage in the coming years. For instance, such retail giants as Amazon reportedly change prices on millions of items multiple times per day. For mid-market retailers, AI pricing can level the playing field and help them target the same customers as Amazon or Walmart.

Example:

AS Watson Group has implemented AI to enable dynamic pricing and ensure steady sales growth.

Dr. Malina Ngai, Group CEO at AS Watson Group, reflects on the results of AI adoption at their company:

We’re using AI for personalized promotions and dynamic pricing. Our recommendation engines suggest products based on customer behavior, which lifts basket size and conversion rates. Hyper-personalization is key. AI curates skincare regimens, sends replenishment reminders, and powers virtual assistants that make online shopping seamless.

Personalized pricing and promotions reinforce one another, as both rely on shared customer insights that businesses can use to enhance the overall shopping experience. In retail environments, AI delivers the greatest value when applied across various touchpoints to improve end-to-end customer engagement and service quality.

Travel and hospitality

An empty hotel room or unsold airline seat means losing revenue for travel and hospitality companies. With the help of AI, these industries optimize booking and increase reservations. For instance, hotels report 20% better forecast accuracy and a 15% revenue uplift after implementing AI-driven pricing strategies.

Example: airBaltic implemented an AI-powered dynamic pricing system to optimize seat assignment fees, replacing static, rule-based pricing with real-time price recommendations driven by customer demand and booking behavior. The airline deployed reinforcement learning models that continuously adjusted prices and were validated through controlled A/B testing against traditional pricing methods.

Within just two months of going live, airBaltic achieved a 6% increase in seat reservation revenue per passenger, surpassing an initial target of 2–3%, while significantly reducing manual pricing effort through automation. The approach enabled more personalized seat offers aligned with traveler preferences, improving both ancillary revenue performance and the customer purchasing experience.

SaaS businesses

In the SaaS industry, AI can optimize pricing tiers, identify behavioral signals of upgrade readiness, and reduce churn by ensuring pricing aligns with perceived value. The recurring revenue model makes even small improvements highly valuable over the customer lifetime.

Example: Zendesk shifted from charging customers for software access or interaction volume to charging only when an AI agent successfully resolves a customer issue without human intervention. Pricing is therefore tied directly to measurable business outcomes rather than system usage, aligning vendor revenue with customer success. Prices begin around $1.50 per successfully resolved interaction, reinforcing the direct link between cost and delivered value. As a result, Zendesk ensured:

Transition from seat-based SaaS monetization to value-based pricing
Clearer ROI visibility for enterprise buyers
Reduced risk perception when adopting AI automation
Pricing scalability aligned with automation performance

Manufacturing and distribution

B2B pricing in the manufacturing industry involves complex matrices, customer-specific terms, volume discounts, and contract negotiations. AI can optimize quotes for sales teams and manage pricing across thousands of SKU-customer combinations that would be impossible to handle manually.

Example: Global logistics and distribution provider UPS has introduced AI into their B2B pricing operations to address the complexity of contract-based shipping services. Instead of relying on manual pricing decisions, UPS implemented an AI-enabled pricing platform that analyzes historical transaction data, customer segments, and past deal outcomes to recommend optimal prices during negotiations.

The company’s AI-enabled Deal Manager platform recommends prices during negotiations, helping sales representatives identify competitive rates while protecting margins. Following implementation, UPS reported a 22 percentage point improvement in win rates in the U.S., alongside stronger revenue quality driven by reduced over-discounting.

Best AI tools for predicting optimal price points

The market demand for AI pricing tools spans several categories, each suited to different organizational needs.

End-to-end pricing platforms: Enterprise suites like PROS, Pricefx, and Zilliant offer built-in AI with broad functionality. They work well for organizations that want packaged solutions.
Cloud ML services: AWS SageMaker, Google Vertex AI, and Azure ML provide infrastructure for building custom pricing models from scratch. They require more technical capability but offer maximum flexibility.
Specialized pricing engines: Solutions like Competera and Intelligence Node focus on specific verticals, often retail. They bring domain expertise but may not fit other industries.
Custom-built systems: When off-the-shelf tools can’t handle proprietary business logic, complex integration requirements, or unique competitive dynamics, custom development becomes the path forward.

For enterprises with high load, real-time requirements, and complex data environments, custom solutions often outperform packaged alternatives, particularly when pricing logic includes specific business rules and exception handling.

AI tools and approaches for predicting optimal price points

Category	Representative tools/platforms	Core strengths	Key use cases	Typical enterprise fit
End-to-end pricing platforms	PROS Pricing • Pricefx • Zilliant	• Out-of-the-box pricing AI & optimization • Demand sensing, price elasticity models • Pricing workflows & governance	• Organizations needing a full pricing suite • Multi-product, multi-market pricing • B2B and B2C pricing operations	Large enterprises/pricing-mature orgs
Cloud ML services	• AWS SageMaker • Google Vertex AI • Azure ML	• Full flexibility to engineer models • Leverage custom features & external signals • Integrate with broader data ecosystem	• Unique pricing strategies • Proprietary signals or advanced econometrics	Tech-savvy teams building bespoke models
Specialized pricing engines	• Competera • Intelligence Node	• Retail-focused dynamic pricing • Competitive price tracking • Category & SKU-level optimisation	• Digital commerce pricing • Competitive index + real-time repricing	Retail/e-commerce & marketplaces
Custom-built systems	Custom ML models & pipelines	• Fully tailored business logic • Integrates deeply with internal systems	• Complex price rules • Non-standard product bundles/market dynamics	Enterprises with niche/proprietary needs

Based on how your current pricing strategy impacts revenue and profitability, choose the appropriate tool or solution. For example, if your margins have consistently fallen below target for several months, investing in custom development may introduce unnecessary risk. However, if budget capacity exists and the projected ROI justifies the investment, custom development can deliver long-term advantages. But you still need to continuously validate the process through structured measurement and controlled experimentation.

Build production-grade AI pricing systems tailored to your data and infrastructure

Tap into Xenoss expertise

Challenges of AI dynamic pricing and how to overcome them

Based on our year-long experience delivering custom AI solutions, we’ve outlined the four challenges below as the most impactful ones in AI infrastructure development.

Data quality and availability

Common data quality and management issues include incomplete transaction histories, inconsistent product categorization, and missing competitor or market data. Mitigation approaches include data enrichment services and, in some cases, synthetic data generation to fill gaps.

Model explainability and trust

Business stakeholders often resist “black box” recommendations. Using interpretable AI techniques and providing transparent pricing logic that explains why the system recommended a specific price builds the confidence needed for adoption.

Integration complexity

Legacy ERP and e-commerce systems weren’t designed for real-time pricing feeds. Modern solutions use middleware, APIs, and event-driven architectures to bridge the gap, but integration work often consumes more project time than model development.

Organizational change management

Pricing teams may view AI as a threat rather than a tool. Training, clear communication about how roles will evolve, and phased rollouts that demonstrate value before full deployment help manage cultural resistance.

The ethical and regulatory landscape of AI pricing

AI-powered dynamic pricing must align with region- and industry-specific regulations, consumer protection laws, and brand risk management practices.

Regulatory momentum in the European Union

In July 2025, the European Commission launched a public consultation under the Digital Fairness Act (DFA), explicitly identifying dynamic pricing as an area requiring stronger consumer safeguards. The commission paid particular attention to practices in which companies advertise attractive entry prices, while algorithms later apply real-time price increases during the purchasing process.

Regulatory expectations became more concrete following the Court of Justice of the European Union’s October 2024 ruling in the Aldi Süd case. After that, the court confirmed that advertised discounts must be calculated against the lowest price offered within the previous 30 days, effectively classifying artificial price increases prior to promotions as a legal risk. As a result, algorithmic pricing systems now fall directly within consumer protection and compliance oversight.

Regulatory developments in the United States

U.S. regulators are focusing primarily on competition and data usage. In July 2024, the Federal Trade Commission (FTC) initiated a Section 6(b) investigation into so-called surveillance pricing, examining how companies use personal and behavioral data to influence prices. This continued in March 2025 when the Department of Justice Antitrust Division submitted a statement of interest addressing risks of algorithmic collusion.

Legislative proposals are also emerging. Senator Amy Klobuchar reintroduced the Preventing Algorithmic Collusion Act in January 2025, seeking amendments to the Sherman Act that would restrict pricing algorithms trained on nonpublic competitor data. At the state level, New York’s S 3008 law, effective July 2025, requires businesses to disclose when algorithmic systems use personal data to determine prices.

The reputational dimension: Transparency over price

Regulation is only one side of the equation. Customer feedback increasingly determines whether dynamic pricing succeeds or fails. The widely criticized Oasis/Ticketmaster ticket pricing episode in 2024, where tickets initially priced at £148 surged to nearly £355, demonstrated that consumer backlash is rarely about price increases alone. The central issue was opacity.

Consumers generally accept surge pricing models, such as those on ride-hailing platforms, because pricing mechanisms are transparent and alternatives are clear. Hidden algorithmic repricing and price gouging, by contrast, create a perception of manipulation, triggering long-term brand damage.

A practical compliance framework for revenue leaders

Successful AI pricing programs share three governance principles:

Transparency by design. Clearly disclose when and why dynamic pricing is applied.
Pricing guardrails. Implement hard price floors and ceilings, and require human approval for significant adjustments.
Data governance and auditability. Maintain traceable records of pricing decisions, particularly when personal or behavioral data informs segmentation.

Responsible implementation is no longer a differentiator but a prerequisite for sustainable AI-driven revenue and price optimization.

How to measure revenue impact from AI pricing

Proving ROI requires controlled experiments and clear attribution. The metrics that you should measure include:

Revenue per transaction. Track changes in average order value. Even small improvements compound across high transaction volumes.
Sales growth. Measure whether optimized pricing increases conversion rates or expands demand without relying on aggressive discounting. Sustained growth indicates that pricing better aligns with customer willingness to pay.
Margin contribution. Measure gross margin improvement. This helps confirm that revenue gains come from smarter pricing decisions rather than higher sales volume alone.
Price realization rate. Compare actual prices achieved to list prices. Improvements typically signal reduced discount leakage and stronger pricing discipline across sales teams or automated channels.
Win rate (B2B). Track quote-to-close conversion. Higher win rates combined with stable margins indicate pricing competitiveness without sacrificing profitability.
Inventory turnover. Measure how pricing affects sell-through and the age of inventory. Faster turnover often reflects better synchronization between demand signals and pricing decisions.
Cost-to-serve reductions. Evaluate whether pricing helps prioritize profitable customers, products, or delivery conditions. AI pricing can reduce operational inefficiencies tied to low-margin transactions.

Without well-established controls, organizations cannot reliably separate AI impact from broader market conditions. For instance, A/B testing against control groups provides the cleanest measurement.

AI-powered dynamic pricing: Implementation takeaways

AI pricing rarely produces dramatic overnight results, and that’s precisely the point. Its value lies in systematically removing revenue leakage that organizations have historically accepted as unavoidable. Over time, better pricing decisions compound into stronger margins, more predictable revenue, and improved operational efficiency.

At Xenoss, we help companies design and implement AI pricing systems that integrate directly into existing sales, data, and operational workflows, ensuring measurable ROI.

The post Dynamic pricing strategy: How AI-powered pricing drives revenue growth appeared first on Xenoss - AI and Data Software Development Company.

Sales automation: How AI transforms B2B sales cycles and improves forecast accuracy

Maria Novikova — Thu, 12 Feb 2026 15:38:06 +0000

B2B sales leaders must keep several plates spinning: hit revenue targets, shorten the deal cycle, and by all means maintain customer trust. And the latter is getting particularly harder every year.

72% of B2B buyers expect one-on-one consultations and personalized, high-touch support. Plus, 67% of sales professionals say that personalization is more important to customers than last year.

But sellers still spend almost 60% of their time on non-selling tasks, which prevent them from actively engaging with clients. AI-powered sales automation software can free up sales teams for building stronger human relationships with customers. Here are some proofs:

94% of sales managers admit that AI agents help them with a better understanding of customers’ needs, and 92% use AI for automating prospecting
64% of Chief Revenue Officers (CROs) plan on integrating AI to automate manual sales tasks
Sales representatives report a 30% increase in win rates thanks to using AI
85% of SDRs use AI to free time for more value-adding work, and 84% apply sales AI tools for training and acquiring new skills

A user on Reddit shares similar excitement for using AI in optimizing the sales cycle and building customer trust:

Where I think AI can make a huge difference is in areas like:

Forecasting and deal health

Analyzing calls and meetings to surface action items, objections, and sentiment

Sales training and roleplaying to get reps ready for real conversations

I don’t think AI will replace sales pros, but I am bullish on AI-augmented reps beating everyone else.

How exactly you can use AI for sales at your company depends on the strengths and weaknesses of your sales team, current revenue goals, and sales pipeline management practices. In this deep-dive analysis, we’ll help you decide on the right AI use cases and implementation patterns, supported by real-life examples and ROI metrics.

AI sales automation: Why revenue leaders need it

Adoption of revenue-specific AI solutions directly correlates with a 13% increase in revenue growth and a 85% higher commercial impact, according to a survey of more than 3,000 revenue and sales leaders.

These gains rarely come from automation alone. Instead, AI strengthens the underlying revenue system by improving signal quality, surfacing deal risk earlier, increasing selling time per rep, and tightening alignment between sales, finance, and RevOps teams.

Justin Shreiber, the CEO and Founder of Terret, offers his point of view on the purpose of AI in the modern sales management process:

AI isn’t replacing sales. It’s forcing revenue teams to become systems thinkers and doubling the value of real human trust.

An AI-driven sales automation tool forces revenue teams to operate like engineered systems rather than collections of individual sellers. Once AI starts:

scoring leads,
flagging deal risk,
automating follow-ups,
predicting outcomes,

any weakness in data quality, process design, or handoffs becomes visible immediately. This pushes CROs to design sales as a repeatable, measurable system. And once these processes start to work like clockwork, real human judgment and credibility don’t disappear, but become twice as valuable, as only people can interpret nuance and build trust in ambiguous situations.

Chris Clement, VP of Sales at EPIC Insights, highlights in his post that this year, CROs will be expected to deliver far beyond monetary value:

In 2026, great CROs are judged on more than numbers:

Sales productivity per head.
Cross-functional alignment with RGM, Finance, and Insights.
Retailer and customer NPS as a measure of partnership quality.

Together, they show three dimensions of modern revenue leadership:

Dimension	What it answers
Productivity	Can we grow efficiently?
Alignment	Can we predict and control growth?
Trust	Will that growth last?

Rather than fearing that AI can replace SDRs or erode human trust, you can leverage AI as an advanced automation tool to increase productivity, help your sales teams engage in valuable conversations with customers, and, consequently, increase revenue. Put simply, AI removes the pressure of managing sales numbers manually and allows sales teams to focus on what drives those numbers: trust, relevance, and real customer interactions.

Deeply integrate AI into your sales strategy to quickly reach revenue targets

Explore what we offer

Sales automation use cases: Lead scoring, forecasting & CRM

96% of revenue teams plan to actively use AI in 2026, and their top priority will be increasing sales reps’ productivity through a number of strategic use cases illustrated below.

AI use case in sales

We’ve grouped those use cases into three categories, and we’ll analyze their impact on SDR productivity and company revenue growth through real-life examples.

AI-powered lead scoring

Traditional CRM sales automation was designed to enforce consistency. It applies predefined rules and workflows to move leads through the sales funnel. For example:

If a lead downloads a whitepaper, add ten points.
If they belong to a target industry, add five more.
If they don’t respond after three emails, mark them as cold.

These systems help standardize processes, but they don’t improve themselves. They rely on assumptions created upfront and rarely adapt to changing customer behavior.

AI-driven lead scoring works differently. Instead of following static logic, it learns from historical outcomes: which leads converted, which deals stalled, which behaviors correlated with closed revenue, and continuously adjusts recommendations based on live data.

Example: Grammarly’s implementation of AI lead scoring solution increased premium plan conversions by roughly 80%, while a machine learning model deployed at a price comparison service drove a 20% jump in lead-to-opportunity conversions.

Sales forecasting models

Only 7% of sales teams achieve at least 90% accuracy in sales forecasting, and 69% of respondents say forecasting has gotten much harder than it was three years ago.

AI-based automated sales tools can be a viable alternative to manual and time-consuming forecasting. Machine learning solutions can provide high predictive accuracy at high speed. For instance, here’s the flow of how AI models can predict the deal win probability by evaluating multiple parameter categories:

Deal-specific factors: Deal size, sales stage, time in stage, discount level, contract terms.
Engagement signals: Email opens, meeting frequency, stakeholder count, response latency.
Customer attributes: Company size, industry, past purchase history, tech stack fit.
External conditions: Budget cycle timing, competitive pressure, and economic indicators.

Example: A leading European food distributor struggled with inaccurate manual sales forecasts, resulting in overstocking, spoilage of perishable goods, and lost revenue during seasonal peaks. To fix this, they developed a custom machine learning forecasting platform that consolidated historical ERP sales data, real-time orders, seasonality trend analysis, and external variables into a centralized predictive analytics model.

The system generated SKU-level demand forecasts and early risk alerts, enabling procurement teams to proactively adjust orders. As a result, the company reduced inventory waste by 34%, improved demand planning accuracy by 29%, and strengthened supplier negotiations while maintaining high product availability during peak demand periods such as Easter and Christmas.

CRM automation and sales activity tracking

CRM data entry is the largest time sink (~8-12 hours weekly) for frontline sales workers who manually log calls, update opportunity stages, and sync calendar activities. Conversation intelligence platforms auto-populate CRM fields by transcribing calls, extracting action items, identifying mentioned competitors, and updating deal stages based on conversation content.

A sales rep on Reddit emphasizes what’s particularly draining for them when working with CRMs:

The biggest time drain isn’t what most people think. Data entry gets all the attention, but the real killer is context switching between CRM tabs to piece together account history before calls. Sales reps spend 12-18 minutes per call just clicking through activity logs, emails, and notes to prep. That’s where automation actually saves hours, not in field updates.

Automate these first for maximum time recovery: Pre-call briefing summaries that pull recent activities into one view, automatic activity logging from email and calendar so reps never manually log touchpoints, and deal stage progression triggers that update fields when specific actions occur. These three alone typically reclaim 6-8 hours per rep per week because they eliminate repetitive navigation and clicks.

Scheduling and preparing for meetings is another tiresome task for salespeople. AI scheduling assistants (e.g., Calendly, Chili Piper integrated with Salesforce) can help sales managers by offering real-time availability, automatically handling time zone conversions, sending prep materials, and rescheduling.

Sales territory planning can be time-consuming when done manually, but it’s crucial for optimizing market coverage.

Noah Berliner, General Manager, Global Head of Sales at Moody’s Analytics, in his interview with Gartner, shares their company’s approach to using generative AI in sales territory planning:

Internally, we built a sales recon tool that provides sellers with all the information they need about their territory. It pulls data from Salesforce, our news data, and company data, showing which products companies in their territory are not buying and what news and sentiment suggest they should buy. It builds a whole territory plan in 10 minutes, something that used to take several weeks.

When choosing appropriate use cases for AI adoption, analyze which sales processes take the most time and effort but yield zero (or almost zero) efficiency for the team. For instance, meeting with clients or visiting them in person can also be time-consuming yet highly efficient. By contrast, daily entering repetitive data in CRMs is both time-consuming and inefficient.

Build AI-powered sales automation platforms to increase customer value and unburden sales teams

Talk to an AI sales expert

Sales automation ROI: Win rates, cycle time & forecast accuracy

ZoomInfo survey revealed the following outcomes from using AI on a daily basis:

How AI impacts sales processes

Plus, 76% of respondents improved their win rates, and 78% decreased their sales cycles. This proves that the best results come from deeply integrated AI systems and their consistent daily use. Infrequent use may not yield the expected results, only discrediting the AI’s value to stakeholders.

Below is a table showing the improvements you can expect from AI sales solutions compared to traditional sales tools and practices.

Capability area	Traditional sales tools	AI-powered sales systems
Lead scoring and prioritization	Rule-based, static point models	Dynamic, behavior-based scoring that learns from real outcomes (engagement patterns, deal history, signals)	• Higher qualified lead conversion (increase by 20–80%) • Reduced SDR time per qualified lead • Improved pipeline quality
Sales forecasting	Manual spreadsheets and rep judgment	Predictive models analyzing engagement signals, deal attributes, sentiment, and lead scoring	• Forecast accuracy (MAPE reduction) • Fewer forecast slippages • Better capacity and revenue predictions
Personalization	Static segmentation (industry, persona)	Real-time personalization at the account & contact level	• Higher response/engagement rates • Higher win/loss ratios • More targeted messaging
Data and CRM hygiene	Manual logging, batch updates	Automated activity capture, CRM enrichment, and error alerts	• Time reclaimed per rep (6–10 hrs/week) • More reliable pipeline data • Reduced administrative cost
Sales execution support	Templates and macros	AI-suggested next steps, call insights, objection detection	• Improved conversation quality • Higher deal progression rates • Reduced coaching cycle time
Deal risk and opportunity insights	Reactive review during pipeline meetings	Proactive alerts on stalled deals, low engagement, and pricing risk	• Fewer late-quarter surprises • Higher win probability forecasting • Better pipeline coverage
Manager/RevOps productivity	Manual reporting, static dashboards	Automated dashboards with predictive signals	• Time saved in reporting • Faster decision cycles • Cross-functional alignment improvement
Training and enablement	Manual role-plays, standard sessions	AI-augmented coaching, scenario simulation, and feedback loops	• Faster ramp time • Higher rep competence scores • Better skill retention

However, to achieve positive outcomes from AI implementation in sales cycles, you’ll need to consider many factors. In the next section, we explain how to get started and finish your AI for sales enablement project efficiently.

B2B sales automation implementation: 6 best practices

After building AI sales systems for B2B organizations across diverse industries, including manufacturing, healthcare, AdTech, and MarTech, we’ve learned what successful AI implementation requires.

1. Audit CRM data quality and sales processes

We start every engagement with a data quality audit and workflow analysis. Our team examines CRM hygiene across multiple criteria: field completion rates (targeting 95% for critical fields like deal stage, close date, and contact roles), stage definition consistency, duplicate record prevalence, and historical data depth. In our experience, organizations typically discover that 30-40% of their CRM records contain incomplete or inconsistent data, undermining model accuracy.

The audit also maps manual bottlenecks, where reps spend time on manual data entry, research, or administrative tasks that AI could handle. For instance, one enterprise client had seven different definitions of “qualified lead” across regional teams. Standardizing that taxonomy before model training prevented garbage-in-garbage-out scenarios.

2. Set sales forecasting accuracy targets

We work with sales leadership to establish specific, measurable AI objectives tied to business outcomes. Rather than vague goals like “improve forecasting,” define success: reducing forecast error from 25% to 10%, increasing pipeline coverage visibility by 30 days, or improving deal win probability accuracy by 15%.

Our standard metrics framework includes forecast accuracy (weighted pipeline vs. actual bookings), monthly mean absolute percentage error (MAPE), pipeline coverage ratios by stage, and changes in deal velocity. We also establish baseline measurements before implementation, so improvement is quantifiable. For one client, we tracked that their manual forecast process had a 32% MAPE. After six months of using a custom AI system, that number dropped to 14%.

3. Select AI models and tools

The build-vs-buy decision depends on the complexity of the sales motion and the uniqueness of the data. We guide clients through this evaluation by analyzing deal-cycle characteristics, product-portfolio complexity, and integration requirements. Off-the-shelf platforms work well for transactional sales with standard motions, short cycles, single-product focus, and straightforward buyer journeys.

Complex selling environments (e.g., multiple products with different sales cycles, enterprise deals with 6-12 month timelines, multi-stakeholder buying committees, or highly customized solutions) typically require custom models trained on proprietary data. The investment in custom development pays off when forecast accuracy directly impacts revenue planning and resource allocation decisions.

4. Integrate with CRM and data infrastructure

Our integration approach connects AI models to the full data infrastructure. We build pipelines that pull from Salesforce, HubSpot, or Microsoft Dynamics, then enrich with marketing automation data (Marketo, Pardot), customer success platforms (Gainsight, ChurnZero), product usage analytics, and finance systems for revenue recognition.

We also implement reverse ETL patterns to sync predictions back to operational systems, ensure deal scores appear in Salesforce opportunity records, recommended actions surface in rep dashboards, and forecast adjustments flow to financial planning tools. One of our manufacturing clients required integration with their ERP system to factor production capacity into deal probability. That bidirectional sync between the data warehouse and six operational systems took three weeks but delivered forecasts that aligned with fulfillment reality.

5. Sales team AI training and governance

AI model accuracy means nothing if reps don’t trust or act on AI recommendations. 85% of sales reps haven’t received any formal training on using AI, yet 78% admit they would like it.

Training programs explain how models generate predictions, what signals drive scores, and when to override AI guidance based on context the model can’t see. You can run workshops where sales managers review deals alongside AI predictions to build intuition about model behavior.

Establish governance frameworks which cover data access controls (who can see which predictions), model update cadences (typically monthly retraining with weekly scoring refreshes), forecast review processes (weekly pipeline reviews with AI-flagged deals), and escalation paths when predictions seem wrong.

You can also implement feedback loops that allow reps to flag incorrect predictions. This human-in-the-loop input improves model accuracy over time. Without change management and clear governance, even accurate AI predictions get ignored.

6. Monitor AI model performance and retrain

Build dashboards to monitor model performance and track it against real-time outcomes. Key metrics include prediction accuracy by deal stage, calibration curves showing whether 70% probability deals close as predicted, and drift detection that identifies when model accuracy degrades due to market changes or process shifts.

For instance, our standard practice includes monthly performance reviews and quarterly model retraining cycles.

Final takeaway

AI in B2B sales doesn’t change the goal. Revenue teams still need to hit targets, shorten cycles, and earn customer trust. The change is in how those results are achieved.

When AI is woven into daily sales workflows, the effects become visible quickly. Reps spend less time navigating CRMs and more time preparing for meaningful conversations. Managers spot risks earlier, rather than reacting at the end of the quarter. Forecasts become clearer, which improves planning across finance, marketing, and operations. The improvements in win rates and cycle time are a natural outcome of that clarity.

The Xenoss team helps you select top sales automation tools. We also design and integrate AI algorithms for lead scoring, forecasting, and sales execution directly into your CRM and data infrastructure, ensuring predictions are reliable, explainable, and aligned with real business metrics.

The post Sales automation: How AI transforms B2B sales cycles and improves forecast accuracy appeared first on Xenoss - AI and Data Software Development Company.

How AI demand forecasting reduces inventory costs and improves accuracy

Maria Novikova — Tue, 10 Feb 2026 19:28:03 +0000

Supply chain teams have spent decades refining demand forecasts, but most still operate with error rates between 20% and 50%. That gap between predicted and actual demand translates directly into excess inventory sitting in warehouses or empty shelves losing sales.

AI-driven forecasting is starting to change this picture. 58% of supply chain executives are prioritizing forecasting and risk management improvements in 2026. And the investment is paying off: 91% of retailers are now actively using or evaluating AI, with 89% reporting measurable revenue increases. Organizations applying machine learning to demand planning typically see error reductions of 20–50% and inventory cost savings in the range of 20–30%.

This article walks through how AI forecasting works, what infrastructure you’ll need, and how to figure out if your organization is ready to make the leap.

AI demand forecasting explained: How machine learning predicts customer demand

AI-powered demand forecasting uses machine learning and predictive analytics to estimate how much product customers will buy. 70% of large organizations will adopt AI-based forecasting by 2030. But many aren’t waiting, 87% of enterprises already use AI for demand forecasting, with companies reporting accuracy improvements of 35% or more.

So what makes AI different from traditional methods? The short answer: scale and adaptability.

AI models can process enormous datasets simultaneously, pulling in historical sales, weather patterns, social media buzz, economic indicators, and more. Traditional statistical methods tend to rely on historical averages and manual adjustments that get updated weekly or monthly. AI forecasts can adjust dynamically as market conditions shift.

AI forecasting systems typically predict:

Demand volume: How many units customers will purchase
Timing: When demand spikes or dips will occur
Geographic distribution: Where demand concentrates across regions
Channel patterns: How demand differs between e-commerce, retail, and wholesale

Why traditional forecasting fails: The case for AI demand forecasting

Limited data processing in spreadsheet-based forecasting

Spreadsheet-based planning tools cannot handle the volume and variety of data modern supply chains generate. Point-of-sale transactions, web traffic, social media signals, weather feeds, and competitor pricing all contain demand signals.

Traditional spreadsheet methods typically work with just 3 to 5 variables, while AI systems can analyze 20 to 50 or more at once. With traditional tools, planners end up working with a narrow slice of what’s available.

How traditional methods miss complex demand patterns

Linear regression and moving averages assume that relationships between variables are fairly straightforward. In practice, demand often follows non-linear patterns. A 10% price cut might boost sales by 5% in one region and 25% in another, depending on local competition and what time of year it is. Traditional methods miss these kinds of interactions entirely.

Slow forecast updates create costly supply chain gaps

Most traditional forecasts update on fixed schedules, usually weekly or monthly. When a competitor launches a flash sale or a viral social media post drives unexpected interest, batch-updated forecasts are already stale.

AI-based systems can adjust forecasts within hours, detecting demand shifts through real-time POS data and external signals. The lag between market changes and forecast updates in traditional systems creates costly misalignment.

Manual forecasting drives high error rates and planner burnout

Demand planners using traditional methods spend significant time on data entry, reconciliation, and manual overrides. Each touchpoint introduces potential for human error and subjective bias. One misplaced decimal or optimistic adjustment can cascade through the entire supply chain.

Factor	Traditional forecasting	AI-driven forecasting
Data sources	Limited historical sales	Internal + external signals
Update frequency	Weekly or monthly batches	Near real-time
Granularity	Category or regional level	SKU-location-day level
Adaptability	Static until manually updated	Continuous learning

Let's discuss your forecasting challenges

Whether you're starting from scratch or optimizing an existing system, Xenoss engineers can help you build AI forecasting that works.

Book a free consultation

How AI improves demand forecasting accuracy

Machine learning pattern recognition for demand signals

Machine learning algorithms identify correlations that human analysts would never spot manually. A model might discover that sales of a particular product spike three days after specific weather patterns in certain zip codes. Combining techniques like LSTM, XGBoost, and Random Forest can reduce forecast error from around 28.76% to 16.43%, a drop of about 42.87%. Those kinds of subtle, multi-dimensional relationships simply aren’t visible through traditional analysis.

AI demand sensing: Using external data to predict shifts early

AI models pull in signals like weather forecasts, economic indicators, social media sentiment, and event calendars to sense demand shifts before they show up in sales data.

This makes a real difference in practice. Unilever’s ice cream division improved forecast accuracy in Sweden by 10% by analyzing weather patterns, enabling it to position inventory before demand spikes.

In key markets, this translated to sales increases of up to 30% within a single year. Demand sensing allows for proactive adjustments rather than reactive scrambling.

SKU-level AI forecasting for precise inventory planning

Rather than forecasting at the category level and allocating downward, AI enables bottom-up forecasting at the individual product-location-day level. This precision lets retailers optimize inventory at the store and customer level rather than at a regional level. This granularity dramatically improves replenishment accuracy and reduces the safety stock buffer needed at each distribution point.

How AI models learn and adapt to changing demand

AI models automatically retrain on incoming data, adapting to evolving consumer behavior without requiring manual intervention. When demand patterns shift due to tariff announcements or geopolitical disruptions, as supply chains experienced throughout 2025’s trade policy volatility, AI systems can detect and adjust within days rather than quarters.

How AI-powered forecasting reduces inventory costs

Lower safety stock requirements with accurate AI forecasts

When forecast confidence improves, planners can carry leaner buffer inventory without risking stockouts. By generating SKU-level forecasts with tighter error bands, these models enable leaner safety stocks that free up working capital previously tied to dormant inventory.

In 2025, packaging manufacturer Novolex reduced excess inventory by 16% and shortened planning cycles from weeks to days by combining historical sales data with external market signals.

Walmart uses AI-powered forecasting to optimize inventory placement decisions across its network, ensuring that safety stock isn’t sitting idle in warehouses while stores face potential shortages.

Unlike static formulas that require manual updates, AI systems continuously adjust safety stock levels based on demand trends, supplier reliability, and market conditions. Businesses using intelligent forecasting reduced excess inventory carrying costs by 20% while simultaneously cutting stockouts by 15%.

Reduced warehousing costs through better demand prediction

Less excess inventory directly reduces warehousing costs, insurance premiums, and material handling expenses. For companies with extensive distribution networks, the savings compound across every facility. Warehousing costs can fall by 5 to 10 percent with AI-driven forecasting in place.

Fewer stockouts: How AI forecasting protects revenue

Better demand sensing prevents out-of-stock situations that send customers to competitors. Lost sales due to stockouts can decrease by up to 65% with AI forecasting. The revenue protection from avoiding stockouts often exceeds the direct cost savings from reduced inventory.

Reducing waste and obsolescence with AI demand planning

Accurate forecasting reduces overproduction and the risk of holding expired or outdated inventory. This matters especially for perishable goods, fashion items, and electronics with short product lifecycles.

Nestlé’s 90-day AI pilot generated $2.3 million in additional revenue while achieving 176% conversion rate improvement, demonstrating how targeted AI can drive both top-line growth and waste reduction.

Core capabilities of AI-driven forecasting systems

Real-time demand sensing and dynamic forecast updates

Streaming data pipelines let models update predictions as new signals arrive, including social media spikes, competitor price drops, or unexpected weather events. 62% of supply chain leaders say AI agents embedded in operational workflows accelerate speed to action. 70% of executives expect their employees to be able to drill deeper into analytics for real-time analysis as AI agents automate operational processes. This represents a fundamental shift from batch systems that wait for scheduled updates.

What-if scenario planning for supply chain decisions

AI platforms let planners model “what-if” scenarios:

What happens to demand if we run a 15% promotion next month?
What if a key supplier faces delays?

67% of companies that deployed agentic AI in supply chain and inventory management in 2025 saw a significant increase in revenue. Scenario planning transforms forecasting from a prediction exercise into a genuine decision-support tool.

Multi-channel inventory optimization across sales channels

AI-driven forecasting supports sophisticated allocation across e-commerce, retail, and wholesale channels. The system can optimize where to position inventory based on predicted demand by channel and location.

Automated reordering connected to AI forecasts

Production-grade systems connect forecasts directly to ERP and ordering systems, automatically generating purchase orders or triggering production schedules. Automation reduces manual effort and speeds the replenishment cycle.

How AI demand forecasting works step by step

1. Data collection and integration

The process begins with aggregating relevant data: historical sales, inventory levels, promotions, and external signals, into a unified data layer. Data quality at this stage determines everything that follows.

2. Feature engineering and preparation

Raw data gets transformed into features the model can actually use: lag variables (past values that help predict future ones), encoded categories, and handled missing values. Feature engineering often consumes more time than model training itself, but it’s where much of the value gets created.

3. Model training and validation

Machine learning models train on historical data, then validate against a holdout period the model hasn’t seen. Validation reveals whether the model generalizes to new situations or merely memorizes patterns from training data.

Current AI models achieve 87% accuracy for 30-day demand forecasts, 76% for 90-day predictions, and 62% for annual planning.

4. Deployment and real-time inference

Validated models deploy to production environments where they generate forecasts on a scheduled or an on-demand basis. The deployment architecture determines whether forecasts update in minutes or hours.

5. Continuous monitoring and retraining

A feedback loop tracks forecast accuracy over time, detecting model drift when performance degrades because market conditions have changed. Fully autonomous forecasting still requires human judgment, which is why continuous monitoring remains essential. Automated retraining on fresh data maintains accuracy as conditions evolve.

Data and infrastructure requirements for AI forecasting

Historical sales and transaction data

Most AI forecasting implementations require two to three years of clean, granular transactional data. The quality and completeness of historical records directly impact model accuracy.

External data sources and APIs

Weather APIs, economic indicators, promotional calendars, and competitor pricing feeds enhance forecast accuracy. The challenge lies in integrating diverse sources reliably and maintaining data freshness.

Real-time data pipeline architecture

Enabling real-time demand sensing requires streaming or micro-batch pipelines built with tools like Apache Kafka, Flink, or managed cloud services. Organizations moving toward autonomous decision-making need infrastructure supporting simultaneous analysis of inventory levels, supplier performance, and market trends. Batch-only architectures limit how quickly you can respond to market changes.

Compute and storage considerations

Training and running AI models at scale requires cloud compute instances, GPU resources for complex models, and scalable storage. Infrastructure costs scale with data volume and model complexity.

How to get started with AI in demand planning

1. Audit your current data quality and sources

Before selecting tools or partners, assess the completeness, accuracy, and accessibility of existing data. A thorough data audit is the most critical first step and often reveals gaps that would undermine any AI initiative.

2. Define forecast granularity and business rules

Determine the level of detail your business requires (SKU, location, day, or hour) and identify constraints the model respects, such as supplier lead times or minimum order quantities.

3. Select build versus buy approach

Evaluate tradeoffs between building custom systems in-house versus purchasing platforms. Consider required flexibility, total cost of ownership, internal expertise, and desired time-to-value.

4. Plan integration with ERP and WMS systems

Create a clear plan for connecting forecast outputs to downstream systems. Key integrations include ERP, order management, warehouse management, and production planning software. By 2030, 50% of cross-functional supply chain solutions will use intelligent agents that operate across these systems autonomously.

5. Establish governance and change management

Develop processes for forecast review, exception handling, and training for demand planners transitioning from manual methods. Technology adoption fails without organizational readiness.

What to look for in an AI forecasting solution

Scalability for high data volumes

The solution handles millions of SKU-location combinations without performance degradation as your business grows. Ask vendors about their largest deployments and how they handle peak loads.

Integration with existing tech stack

Pre-built connectors or flexible APIs for your ERP, WMS, and BI tools prevent data silos. Integration complexity often determines the implementation timeline.

Forecast explainability and transparency

Demand planners trust model outputs when they understand why predictions were made. Look for feature importance explanations, confidence intervals, and anomaly flagging.

Production readiness and ongoing support

Choose enterprise-grade systems built for high uptime and robust monitoring, not prototype-level tools. Ensure the vendor provides ongoing support and model maintenance.

Custom AI forecasting solutions for enterprise supply chains

For organizations that require custom, enterprise-grade AI forecasting systems, partnering with experienced engineers accelerates time-to-value while reducing implementation risk.

Xenoss specializes in building production-ready AI solutions with robust integration, scalability, and domain expertise across CPG, retail, and manufacturing.

Our teams have delivered forecasting systems that integrate seamlessly with existing data infrastructure, connecting real-time pipelines, ERP systems, and analytics platforms into unified decision-support environments.

Book a consultation to discuss your forecasting challenges →

The post How AI demand forecasting reduces inventory costs and improves accuracy appeared first on Xenoss - AI and Data Software Development Company.

Predictive analytics in supply chain management: Implementation roadmap

Maria Novikova — Mon, 02 Feb 2026 18:40:37 +0000

The last decade exposed one of the major structural weaknesses in traditional supply chain management: poor risk visibility and underutilized data.

As Gus Trigos, AI Product Engineer at Nuvocargo, explains:

“Data is abundant, yet siloed across the supply chain. Teams rely on tools built in the 1990s–2010s, designed for manual data entry. This creates bottlenecks, drives errors, and is often ‘solved’ by adding headcount, compounding complexity.”

Traditional statistical forecasting can’t keep pace with consumers’ expectations for delivery speed. 90% of shoppers would like to have items delivered to their doorstep in two to three days, and every third consumer is expecting same-day service.

Meeting these demands puts pressure on supply chain management teams to stay ahead of weather disruptions, supplier risks, and demand shifts.

This is why leaders are turning to predictive analytics.

Key layers of predictive analytics for supply chain management

What is predictive analytics in supply chain management?

Predictive analytics in supply chain management is the use of historical and real-time data, statistical models, and machine-learning techniques to forecast demand, risks, and operational outcomes.

This technology allows organizations to proactively optimize sourcing, inventory, production, and logistics decisions before disruptions or inefficiencies occur.

Predictive analytics platforms enable a consistent flow of accurate predictions and actionable decisions by connecting three structural layers: data sources, machine learning models, and consumption-ready interfaces.

Data layer

To build accurate, timely predictions, data engineering teams combine internal sources: ERPs, WMS systems, sensors, with external feeds.

Internal data includes sales history, inventory levels, lead times, production output, and transportation events.

External signals provide visibility into weather patterns, promotions, market trends, and macroeconomic indicators.

Operationalizing these sources requires a modern data stack: ingestion tools to pull from ERPs, WMS, TMS, and external APIs, a centralized warehouse or lake to store and align data, and transformation tools to clean, validate, and version datasets.

Predictive analytics is only as good as the data behind it.

Xenoss engineers help you extract, reconcile, and structure data across systems, so your models deliver results you can trust.

Explore our data engineering services

Prediction layer

The prediction engine transforms raw data into actionable forecasts and risk signals. It applies statistical and machine-learning models to identify patterns, quantify uncertainty, and estimate outcomes like demand levels, lead-time variability, or disruption risk.

Common approaches include:

Time-series forecasting (ARIMA, exponential smoothing, Prophet) models historical patterns: trend, seasonality, cyclesto project future demand or volumes.
Machine-learning regression (gradient boosting, random forests) captures non-linear relationships between demand and drivers like price, promotions, weather, or channel mix.
Probabilistic models (Monte Carlo simulation) represent uncertainty through ranges of outcomes rather than point forecasts, supporting risk-aware decisions on safety stock and service levels.

Consumption layer

The consumption layer operationalizes through integrations, dashboards, and decision rules.

Integrations into planning systems

Predictions feed back into core systems: ERP, S&OP, replenishment engines, TMS, where they adjust parameters like reorder points, production quantities, or routing priorities.

For example, forecasted demand volatility can dynamically modify safety stock, or predicted port congestion can shift freight allocation.

User-facing dashboards

Dashboards surface key findings for operations managers, translating mathematical forecasts into actionable questions:

Which SKUs risk stockout in the next two weeks?
Which suppliers are likely to miss committed lead times?
Which lanes are trending late against SLA?

Predictive outputs are paired with decision rules that define how the organization responds when risk or opportunity thresholds are crossed, such as dual-sourcing when supplier delay risk exceeds a set probability, or expediting only when cost-to-serve stays below margin limits.

These rules can be automated or semi-automated, depending on criticality and risk:

When decision-making is automated, the system executes predefined actions without intervention, dynamically increasing safety stock when demand volatility spikes, or rerouting shipments when predicted delays breach SLA thresholds.

For semi-automated workflows, predictive insights generate recommendations with quantified trade-offs (cost, service impact, risk), allowing planners to approve, modify, or override decisions where stakes are higher or context matters.

4 high-yield use cases for predictive analytics in supply chain operations

1. Demand forecasting

High market volatility has made reactive planning uncompetitive, pushing organizations to proactively anticipate demand and disruptions.

Marcia D. Williams, founder and managing partner at USM Supply Chain Consulting, argues that predictive analytics and machine learning are becoming essential for demand management.

Marcia D. Williams, founder and managing partner at USM Supply Chain Consulting is seeing predictive analytics become a supply chain management must-have

These tools combine historical sales, real-time signals, and ML models to predict demand shifts and optimize inventory. Compared to traditional statistical methods, predictive demand forecasting delivers long-term value, cutting waste and reducing operational costs by up to 30%.

How Danone improved its supply chain with demand forecasting

The company adopted advanced predictive analytics, integrating historical sales, promotions, media signals, and seasonality patterns into continuous demand forecasts. Previously, Danone relied on statistical averages that couldn’t incorporate real-time market data.

The new approach brought in real-time indicators and cross-functional inputs from supply chain, sales, marketing, and finance, creating forecasts that accounted for demand volatility, reduced forecast errors by 20%, and recovered 30% of previously lost sales.

Predictive analytics tools for demand forecasting in supply chain management

Tool	Key features	Notable clients	Advantages	Disadvantages
Blue Yonder: Demand Planning	- AI/ML demand forecasting - Probabilistic forecasts - Exception-based planning workflows.	PepsiCo deployed Blue Yonder planning capabilities (production planning in a supply chain context).	Strong planning UX, mature supply-chain suite	Enterprise implementation effort can be significant
Kinaxis: RapidResponse (Demand Planning / Maestro)	- Concurrent planning and rapid scenario analysis (“what-if”) - Demand planning application integrated with broader supply planning/execution.	Schneider Electric, Ford, Unilever	Excellent for high-volatility environments where teams need fast replanning across functions; strong scenario capability.	Typically better suited to larger enterprises; cost/implementation overhead can be non-trivial
SAP: Integrated Business Planning (IBP) for Demand	- ML/statistical forecasting - Collaborative demand planning - Integrates tightly with SAP landscapes and planning processes.	Blue Diamond Growers implemented supply chain planning solution based on SAP IBP)	Strong choice if you’re already SAP-heavy; good governance + integration for IBP/S&OP operating models.	Value depends on data quality and process maturity Adoption can feel heavy if you need lightweight forecasting only.
o9 Solutions: Demand Planning	- AI/ML forecasting and demand sensing - Collaborative planning on a unified “digital brain” data model with cross-functional workflows.	o9 states 160+ clients overall (not all demand-forecasting-only), and publishes anonymized demand planning case studies.	Strong for “one plan” alignment across demand/supply/finance; good for complex assortments and frequent business changes.	Customer logos and outcomes are often gated/anonymized; can be overkill if you only need statistical forecasting.
Oracle: Fusion Cloud Demand Management (part of Supply Chain Planning)	- Sense/predict/shape demand; built-in ML - Connects demand insights with supply constraints and stakeholder inputs.	Oracle highlights customer stories for demand management (e.g., BISSELL discussing demand management and forecasting in Oracle programming).	Good fit if you want planning tightly integrated with Oracle cloud apps; ML embedded in planning workflows.	Public pricing is limited; the planning stack can be broad - scope control matters to avoid complexity creep.

2. Supplier risk management

McKinsey classifies suppliers into three tiers based on visibility:

Supplier tiers based on the visibility teams have over them

Tier 1: Direct suppliers - about 95% of firms have visibility into risks at this level.

Tier 2: Secondary or sub-tier suppliers - visibility drops sharply, with only 42% of companies able to see into this tier.

Tier 3 and beyond: Supplier companies have little insight into, creating blind spots in risk detection.

Predictive analytics improves visibility into deeper tiers, helping managers spot problems before they disrupt operations.

These tools continuously analyze supplier performance, delivery patterns, quality trends, and external risk signals to forecast where issues are likely to occur.

With proactive risk evaluation, supply chain teams can reduce late deliveries, quality failures, and supplier instability by adjusting orders or renegotiating terms before disruptions escalate.

How Pietro Agostini, an Italian industrial engineering company, tapped into predictive analytics to vet suppliers

During the COVID-19 pandemic, the Italian industrial engineering company built a quantitative supplier risk model to improve how it evaluated and monitored suppliers. Previously, evaluation was largely qualitative and didn’t allow engineers to anticipate disruptions or prioritize responses.

The team developed a quantitative-qualitative risk scoring methodology based on FMEA (Failure Mode and Effects Analysis) principles, assessing the probability, severity, and detectability of supplier risk factors.

The model generated a data-driven risk profile for each supplier and recommended prioritized actions for procurement teams.

Predictive analytics tools for supplier risk management

Tool	Key features	Notable clients	Advantages	Disadvantages
Interos	- AI-driven supplier/disruption risk monitoring - Multi-tier (sub-tier) mapping - Continuous risk scoring across geopolitical, cyber, financial, operational signals - Scenario impact analysis.	Google, NASA, U.S. Navy, L3Harris (reported); also cited: U.S. DoD, Accenture, Freddie Mac.	Strong for network-level visibility and “who’s connected to whom” risk propagation (useful when a Tier-2 event becomes your Tier-1 problem).	Enterprise onboarding depends heavily on supplier/master-data quality and mapping completeness
Resilince	Supplier risk monitoring + event intelligence; multi-tier supplier mapping; disruption alerts; supplier outreach/workflows; resilience analytics for mitigation planning.	IBM, General Motors, Amgen, Western Digital (examples listed in customer references).	Mature disruption management focus (alerts → workflows → mitigation) with strong “operationalization” for supply chain teams.	Breadth across risk types can vary depending on data feeds and configuration.
Everstream Analytics	Predictive risk intelligence for supply chains (weather, port/transport disruption, geopolitical risk, sub-tier supplier risk); early-warning alerts; risk scoring; integration into procurement/logistics/BCC tooling.	Google, Schneider Electric, Jaguar Land Rover, Vestas, HealthTrust Purchasing Group.	Good fit when you want predictive “risk before it hits” for both supplier and logistics disruption patterns (not just static supplier profiles).	Best value typically requires tight integration into planning/exception workflows
Prewave	AI-based risk detection from external signals; supplier monitoring for ESG/compliance + operational risk; real-time alerts; supplier engagement workflows; focus on regulatory readiness and sustainability risk.	Audi, Porsche, Volkswagen, Yanfeng	Particularly strong where supplier risk is tied to ESG/compliance + reputational exposure and you need continuous monitoring at scale.	Depending on use case category, you may still need complementary tools for deep financial/OTIF performance analytics and internal ERP-based supplier KPIs.
Sphera Supply Chain Risk Management (formerly risk methods)	AI-supported supply chain risk detection; supplier risk scoring; sub-tier visibility; compliance + transparency capabilities; alerting and action planning.	Bosch, Deutsche Telekom, Siemens	Strong for teams that want supplier risk assessment integrated with broader operational risk / ESG / compliance programs under one umbrella.	As a broad risk platform, scope can expand quickly; value realization depends on disciplined use-case definition (risk types, thresholds, response playbooks).

3. Freight management

Poor route planning, last-minute shipping premiums, detention fees, and inefficient routing increase fuel use and drive up logistics costs. Detention alone affects about 40% of loads, costing teams $50–$100 per hour on average.

AI and predictive analytics are helping supply chain teams address these bottlenecks, cutting transportation costs by up to 30% and reducing disruptions by 15%.

These tools operationalize real-time and historical data (weather, traffic patterns, port conditions) to dynamically adjust routes and avoid congestion.

How predictive analytics powers reliable freight management at UPS

The company’s ORION system (On-Road Integrated Optimization and Navigation) uses predictive analytics to recommend the most efficient stop sequences and route choices for drivers.

The model dynamically adjusts based on operational constraints: time windows, pickup/delivery patterns, and facility realities like loading dock availability. After a successful pilot, UPS expanded ORION across tens of thousands of routes and paired it with purpose-built navigation.

Tools that use predictive analytics for freight management

Tool	Key features	Notable clients	Advantages	Disadvantages
Descartes Systems	Advanced route optimization, real-time traffic/conditions, multi-stop sequencing, integration with TMS/warehouse systems. Uses predictive logic to anticipate delays and optimize routes.	Large logistics and retail fleets worldwide (Global supply chain deployments; widely used in manufacturing & distribution).	- Very mature enterprise routing and freight optimization with deep integration - Scalable for global operations.	- Often more expensive than standalone tools - Complexity can require dedicated implementation resources.
FarEye	Predictive delivery and route optimization, exception/ETA forecasting, analytics dashboards, real-time tracking.	Companies in retail, e-commerce and CPG (e.g., global brands adopting intelligent delivery systems).	- Focus on last-mile performance and predictive delivery insights - Strong real-time exception handling.	Best suited for last-mile/parcel contexts: may need complementing for full freight or multimodal planning.
Route4Me	Rapid multi-stop route optimization with predictive suggestion of efficient sequencing and dynamic rerouting.	Small/medium fleets, field service organizations, delivery businesses.	- Very easy to implement - Cost-effective and flexible for mid-size operations.	Less robust predictive analytics than enterprise TMS; best for simpler delivery networks.
Verizon Connect	Predictive routing with telematics integration, real-time route completion forecasting, vehicle performance analytics.	Enterprise fleets (transport, field services, logistics operators).	- Strong telematics and route optimization for large fleets - Real-time operational insights.	Can be pricey; advanced features may require targeted configuration.
Samsara	AI-enabled route planning and traffic prediction paired with IoT sensors, live tracking and predictive ETA/exception alerts.	Large logistics/transport customers and enterprise fleets (manufacturing, distribution).	Combines route prediction with rich sensor data for operational visibility; strong mobile/driver app.	Analytics depth depends on data quality and sensor deployment maturity.

4. Simulating scenarios with predictive digital twins

Embedding predictive analytics into digital twins gives planners a living, data-driven simulation of their entire network that anticipates disruptions, tests “what-if” scenarios, and evaluates outcomes before they occur in the real world.

How do supply chain managers use digital twins?

A digital twin is a virtual replica of physical assets, processes, or networks that continuously synchronizes with real-world data to simulate operations, predict outcomes, and optimize decisions across planning, logistics, and execution.

As Paul Narayanan, Chief Transformation and Digital Officer at KENCO, explains:

“Digital twin technology is transforming the supply chain and logistics industry by creating virtual replicas of physical operations that mirror real-time activities, equipment, and workflows. The result is optimized processes and enhanced efficiency.”

Organizations leading in predictive simulations report significant gains: up to 20% improvement in on-time delivery, 10% reduction in labor costs, and 5% uplift in revenue. Access to live data and predictive modeling helps these teams fine-tune distribution center utilization and fulfillment strategies.

How combining digital twins and predictive analytics helped Aliaxis improve supply chain planning

The global piping and fluid-management manufacturer, operating in 40+ countries, built a digital twin of its European network to run simulations and “what-if” analyses before making real-world decisions.

Teams use the model to test alternative network configurations (e.g., distribution-site consolidation), transportation setups, and make-or-buy options, predicting downstream impacts on cost, stock levels, and service outcomes.

After rollout, Aliaxis reported 9% potential cost reduction in total logistics from network and transportation redesign scenarios. Understanding how consolidation affects stock helped reduce inventory, while the same capability compressed decision cycles from months to days.

Tools that help build digital twins with predictive analytics for simulating operations

Tool	Key features	Notable clients	Advantages	Advantages
anyLogistix (ALX)	- Supply chain digital twin simulation - Real-time data integration - Bottleneck prediction - Scenario analysis - Risk and transportation planning	Used by large manufacturers and supply chain planners (e.g., Infineon, Amazon, GSK in simulation case contexts via AnyLogic/anyLogistix.	Strong supply chain focus, rich scenario testing & risk analytics; integrates with SCM/ERP for predictive insights.	Strong supply chain focus, rich scenario testing & risk analytics; integrates with SCM/ERP for predictive insights.
AnyLogic and AnyLogic Cloud	- General-purpose simulation with digital twin capability supports agent-based, discrete event, system dynamics - Integrates real data for predictive simulation.	Used by consultancies and enterprises for supply chain forecasting (e.g., exercise equipment brand order-to-delivery twin).	Very flexible simulation paradigms; industry use cases across supply chain, logistics, and manufacturing.	Very flexible simulation paradigms; industry use cases across supply chain, logistics, and manufacturing.
RELEX Digital Twin	Integrated digital twin for supply chain forecasting, inventory optimization, scenario planning, demand/replenishment simulation.	Vita Coco built a digital twin for global supply chain optimization.	Deep supply chain planning integration; built-in scenario & inventory predictive modeling.	Deep supply chain planning integration; built-in scenario and inventory predictive modeling.
Siemens Digital Logistics/Digital Twin Solutions	Logistics/supply chain mapping and virtual experimentation with predictive scenario simulation; integrates operational data for planning.	Shared across large industrial/logistics sectors via Siemens digital logistics clients.	Strong integration in manufacturing/industrial ecosystems, combined with IoT data streams.	Strong integration in manufacturing/industrial ecosystems, combined with IoT data streams.
SAP Digital Twin / IBP Extensions	Digital twin concepts embedded in SAP Integrated Business Planning for simulation of network, demand/supply behaviors, and what-if scenarios.	SAP's large-enterprise customer base (retail, manufacturing).	Built into existing SAP landscape; strong governance for planning and predictive simulation.	Built into existing SAP landscape; strong governance for planning & predictive simulation.

Timeline and cost considerations for predictive analytics adoption in supply chain management

Phase 1: Use-case selection

Project timeline: 0-2 months since kick-off

Steps to take: Quantify the cost and impact of supply chain decisions by translating planning outcomes into clear financial consequences using existing data.

For each decision you want to improve: how many SKUs to order, when to expedite, which supplier to choose, start by measuring historical error: how often the decision went wrong and what it caused (excess inventory, stockouts, late deliveries, premium freight).

Then attach unit costs: carrying cost per unit per month, lost margin per stockout, expediting cost per shipment, penalty fees, or wasted labor hours.

To estimate the impact of predictive analytics, model a conservative improvement (e.g., 10–15% reduction in forecast error or fewer late supplier deliveries) and convert that delta into annualized savings or revenue protected.

Cost considerations: Primary costs come from internal time: supply chain leaders, planners, finance, and IT aligning on decisions, data availability, and success metrics, with minimal external spend beyond light advisory support if needed. It’s best to avoid software purchases, large data work, or model development at this stage.

When the phase is successful: Phase 1 is successful if you leave with a clear business case, defined owners, and quantified ROI assumptions, without committing capital prematurely.

Phase 2: Building the data foundation

Project timeline: 2-5 months since kick-off

Steps to take: After selecting a high-yield use case, prepare the data that prediction models will use.

Data engineers pull the required data (order history, inventory positions, lead times, shipment events, etc.) and run basic validation, reconciling mismatches across systems, removing noise (outliers, duplicates, missing periods), and reality-checking against event logs.

To operationalize this data, the team sets up a repeatable pipeline with clear ownership and refresh frequency, ensuring inputs can reliably feed pilots and future scaling without manual intervention.

Cost considerations: Most spending comes from data engineering time to extract, reconcile, and reshape data. Infrastructure costs include cloud storage and compute for repeatable pipelines, plus limited tooling for integration or data quality checks.

When the phase is successful: Phase 2 is complete when you can reliably produce a decision-ready dataset that is updated on schedule, requires no manual work, and accurately reflects business operations.

Phase 3: Modeling and pilot execution

Project timeline: 5-10 months since kick-off

Steps to take: Once the team has validated high-quality data, these inputs are transformed into predictions that leaders can trust and test in the real world.

At this stage, machine learning engineers build or configure predictive models for the chosen use case, train them on historical data, and benchmark performance against business-relevant metrics.

Metrics for assessing predictive model performance

Forecast error: a measure of how far predicted demand or volume deviates from actual outcomes at the decision level (e.g., SKU × location × time), typically expressed as a percentage or absolute difference.

Accuracy of delay-risk predictions: a measure of how well a model correctly identifies shipments or suppliers that will be late, usually assessed by comparing predicted risk scores against actual delays using metrics like precision, recall, or hit rate.

The model is then deployed on a small pilot, limited to a specific region, product set, or lane. Before scaling the model, compare predictions against current planning methods, planner actions, and measure their impact on cost, service, or risk.

Cost considerations: Main expenses include data science and analytics engineering time, compute resources for training and testing, and (if buying rather than building) software licensing for forecasting or ML platforms.

Costs can rise quickly as pilot scope expands, so limit this phase to a clearly defined segment and avoid over-optimizing before business impact is proven.

When the phase is successful: the pilot stage is complete when predictive models consistently outperform current planning methods on real data and demonstrate measurable impact in a live pilot without increasing planner workload.

Cut forecast errors, reduce costs with tailored predictive analytics solutions

Xenoss helps supply chain teams deploy and scale predictive analytics pilots scoped for measurable ROI.

Talk to our team

Phase 4: Scaling the pilot to deliver organization-wide value

Project timeline: 11-15 months since kick-off

Key steps: While small-scale pilots should generate ROI within months of deployment, the true operational impact emerges when model outputs are embedded into core planning and execution systems (ERP, S&OP, replenishment, TMS).

Once predictive analytics is part of the supply chain stack, it influences parameters like reorder points, production quantities, and routing priorities, creating a measurable impact across the flow.

To ensure standardized deployment, define clear automated and semi-automated decision rules that effectively allocate planner time. Make sure to establish governance, monitoring, and KPIs to ensure the system consistently supports new product lines, regions, and use cases.

Cost considerations: At this stage, the largest expenses are tied to connecting predictive models to core systems, building workflows and decision rules, and training teams to trust and act on outputs.

Platform, compute, and model-maintenance costs become recurring.

This phase also delivers the highest ROI because spend is tied directly to operational adoption and scaled impact, not experimentation.

When the phase is successful: a predictive analytics implementation is a success when insights are automatically embedded into daily planning and execution, drive consistent decisions at scale, and require little to no manual oversight.

Bottom line

The companies in this article didn’t transform overnight. They picked one problem, proved predictive analytics could solve it, and scaled from there.

Which supply chain decision is costing you the most when it’s wrong? That’s where to start.

The post Predictive analytics in supply chain management: Implementation roadmap appeared first on Xenoss - AI and Data Software Development Company.

10 AI trends that will shape 2026: market signals, technical predictions, adoption strategies

Maria Novikova — Mon, 12 Jan 2026 18:33:45 +0000

If 2025 taught us anything, it’s that nothing about AI is set in stone. Hardly anyone anticipated the release of DeepSeek and the ripples it sent across the industry.

OpenAI, despite starting the year strong with o3, is now risking losing the LLM market leader title. AI labs shuffled staff, released new models, and made trillions of dollars of investments, hinging on a very uncertain future.

In this post, we are taking a closer look at what that future might look like.

Based on our experience in AI research and development, hundreds of hours in meetings with organization leaders, and our understanding of the market, we defined 10 trends that are set to shape the trajectory of machine learning in 2026.

Enterprise adoption

1. Value of AI generalists in the workplace rises

Why this is likely

LinkedIn members added 177% more AI literacy skills since 2023, nearly 5x faster than overall skills growth.

AI adoption is expanding across organizations: over 60% of companies now use AI in multiple functions, with more than half using it in three or more areas.

AI assistants are blurring the boundaries between workplace functions. Teams that once relied on IT for automation tools or dashboards can now build internal platforms with minimal engineering support.

Creative departments that previously coordinated with regional offices for translations can handle localization themselves.

As these capabilities expand, companies will increasingly prioritize generalists who understand how AI systems work and can deploy agents effectively.

“Generalists aren’t unfocused. They’re integrators, they understand context, connect dots, and help teams move faster with fewer people.”

Liam Darmody, Product Manager at With Curious Growth

2. Orchestration will become a bigger focus area than model intelligence

Why this is likely

65% of enterprises run 2+ paid models plus at least one open-source model, averaging three models concurrently.

Operational controls, not model intelligence, are the main bottleneck in workplace AI adoption. Gartner expects 40%+ of agentic AI projects to fail by late 2027 due to cost, value, or risk management issues.

Early adopters report 20–30% faster workflows with orchestrated multi-agent solutions. In these organizations, insurance claims processing improved by 40% in speed and 15 points in NPS.

Before 2025, the AI community debated whether smarter but slower models were preferable to faster but less capable ones.

Most machine learning engineers favored intelligence, and research followed suit.

Now that state-of-the-art LLMs solve PhD-level math problems and assist world-class programmers, orchestration, not raw capability, has become the bottleneck.

In most organizations, AI tools remain siloed from legacy systems and are poorly integrated.

The focus for 2026 will be on building orchestration layers that unify these tools and combine smaller, energy-efficient models to automate complex, end-to-end workflows, such as invoice processing.

“If you’re following the rise of AI agents, here’s the one idea that separates toy systems from production-grade intelligence. The orchestrator is the real “brain” of a multi-agent system, not the LLM. It decides what to do, when to do it, with which tools, and how each agent’s output flows into the next step.”

Ashish S K, Cloud and AI Architect

AI-powered multi-agent system

Orchestrated invoicing solution that extracts, validates, and processes financial documents across multiple formats, achieving 99% accuracy in automated invoice handling

Read the full success story

3. The Chief AI Officer title will go mainstream

Why this is likely

Chief AI Officer adoption jumped from 11% (2023) to 26% today, with 66% of CAIOs expecting widespread adoption within two years.

33% of organizations now have a CAIO, and 44% believe they should appoint one, indicating rapid formalization of AI leadership.

Major enterprises like General Motors, UBS, and Expedia Group have appointed CAIOs to drive scaled adoption and unified AI strategies.

Executives are no longer content with AI pilots confined to narrow workflows or single departments.

They want to scale new technologies across the entire organization. Proving this point, in December, Accenture partnered with Anthropic to bring Claude to its 30,000+ employees, and companies across industries are following suit.

The challenge ahead is the absence of a dedicated function to guide implementation, ensure secure rollouts, and build AI literacy organization-wide.

These responsibilities currently fall to CTOs, CIOs, and CFOs, but a new role is emerging: Chief AI Officer.

LinkedIn, General Motors, UBS, and other global organizations have already hired CAIOs to help transition operations from AI-assisted to AI-native.

Their core responsibilities typically include developing company-wide AI adoption strategies, identifying high-yield use cases and evaluating ROI, coordinating the pace of adoption across teams while providing learning resources, and establishing security practices and governance playbooks for deploying AI copilots and agents.

Whether Chief AI Officer becomes a permanent title will depend on how committed organizations are to structured, AI-enabled hyperautomation, and not every company will get the balance right.

At its best, the CAIO connects technology, strategy, and ethics. At its worst… it’s a title created so nobody argues about who’s in charge of the chatbot.

Agus Sudjianto, Senior Advisor, McKinsey & Company

Regardless, by the end of 2026, the market will already have an idea of what makes a successful CAIO, which will make pushing the title into mainstream even easier.

4. Google takes over OpenAI as the LLM market leader

Why this is likely

80% of AI developers now consider both Google Gemini and OpenAI GPT/o, with Gemini gaining ground while OpenAI’s consideration remains flat.

Enterprise market share shifted dramatically: OpenAI fell from 50% to 27% (2023–2025), while Google climbed from 7% to 21%.

Gemini referral traffic grew 388% year-over-year versus ChatGPT’s 52%, making Gemini a major web entry point for LLM users.

Despite ChatGPT remaining the most popular LLM, Google is starting 2026 strong and growing faster than OpenAI. In 2025, ChatGPT’s monthly active users grew by 6%, while Gemini’s user base increased by 30%.

OpenAI’s latest GPT-5 releases received a tepid reception, while Gemini 3’s widespread praise prompted Sam Altman to declare a “code red” and refocus resources on the next generation of models.

Gemini 3 had a stronger response from users than GPT-5.2

Google also holds a significant advantage over its rivals: distribution. By integrating Gemini directly into Search and Workspace, Google generates millions of interactions per second. As the model gains experience, its reasoning improves—creating a data flywheel that enhances performance without additional training.

With the underlying technology becoming somewhat undifferentiated, an application war is in store. OpenAI has a lead with ChatGPT, which is nearing 900 million weekly users, but Google has a distribution advantage. At this point, it’s anyone’s fight.

Alex Kantrowitz, founder of Big Technology

With all of this compound advantage, Google is poised to become the LLM market leader by 2026.

5. Agentic web takes shape alongside traditional Internet

Why this is likely

AI platforms drove 1.1B+ referral visits in 2025.

OpenAI’s Operator demonstrated 87% success on live websites and 58% on complex web tasks, proving agents can handle end-to-end web workflows.

AI bot traffic to publishers surged from 1 in 200 visits to 1 in 50 by Q2 2025, with 13% bypassing robots.txt restrictions.

2025 was the year of AI agents. OpenAI and Anthropic released Operator and Claude Code early in the year, proving that LLM-powered agents could successfully navigate browsers and system files.

SaaS companies like Salesforce, Atlassian, and Notion followed with agentic assistants, while enterprises built custom agents to automate internal operations.

The Internet’s evolutionary timeline: from the PC era to the agentic web

Yet despite efforts to standardize how agents interact with data sources through protocols like MCP, their capabilities remain limited by a web designed for humans.

A fully functional “internet for agents” is unlikely by year’s end, but tech companies will take steps in that direction, and may even collaborate on a unified navigation layer.

In practice, the emerging agentic web could work like this:

Humans use AI agents as their gateway to the web rather than switching between sites
Agents navigate website backends through APIs or communication protocols
Agents communicate with each other to automate end-to-end tasks like booking flights or grocery shopping.

This agentic web will eventually evolve into a flatter, more decentralized internet, diminishing the dominance of search engines like Google and superplatforms like WeChat.

For companies like digital media, the shift from humans to agents navigating the web will create the need for engaging audiences through other channels, like social media or widely used apps.

My 2026 prediction for digital publishers: The agentic web will require a massive recalibration of audience strategy around the reality that a growing number of visitors are AI agents/bots, not humans.

Jordan Muller, SEO Editor at Politico

6. The regulatory landscape for AI becomes more organized

Why this is likely

63% of enterprises now have AI-use policies, with 60% integrating AI risks into enterprise risk management. Among those, 79% monitor AI reliability against legal and policy standards.

At the board level, 53% are developing responsible-use policies, and 24% each are conducting regular AI audits or implementing formal AI risk frameworks.

In 2025, legal controversies around AI chatbots shifted from IP disputes with musicians and film studios to murkier territory. In early 2026, Google-owned Character.ai settled a lawsuit with the family of a teenager who used the platform to plan his suicide.

No explicit regulation yet establishes liability for LLMs in such tragedies, but as similar cases draw attention, regulators will face pressure to respond.

Defamation is another unresolved area, or as The New York Times puts it, “Who pays when AI is wrong?”

In the article, Reporter Ken Bensinger covered the case of Wolf River Electric, a Minnesota solar contractor that saw contract cancellations spike after Gemini-powered AI Overviews falsely accused the company of settling a lawsuit over deceptive sales practices.

The founders sued Google for defamation to recover financial and reputational damages.

For now, the US is taking a hands-off approach to AI regulation, but as public adoption expands and stakes rise, tighter legal control seems inevitable. The EU has already scheduled detailed guidelines on high-risk AI applications for early 2026.

The European Commission is set to split the AI Act guidelines on high-risk AI systems, according to a presentation shared with member states today. The guidelines on how to classify AI systems remain on track for publication by Feb. 2, 2026. However, the AI Office is now planning a separate set of guidelines covering high-risk obligations, substantial modifications, and the AI value chain, expected in the second or third quarter of 2026.

Luca Bertuzzi, Chief AI Correspondent at MLex

7. The distinction between “traditional SaaS” and “AI products” will blur

Why this is likely

Budget is shifting toward “AI-native” categories fast. Zylo’s 2025 SaaS Management Index reports AI-native app spending surged 75.2% YoY.

In MENA, 43 existing tech ventures rebranded as “AI startups,” while only 33 major companies were new AI ventures

To capitalize on the rise of generative AI, leading SaaS startups started embedding GPT-like features and agentic assistants into their offerings. That helped big industry names like HubSpot, Salesforce, or Webflow retain their user base, but the growth of native-AI startups like Lovable and Replit has been a lot steeper.

In 2025, AI-native companies accounted for the majority of all raised funding. Big tech companies are also aiming for an AI-native rebranding – not so long ago, Microsoft permanently changed its name from “Microsoft 365” to Microsoft 365 Copilot.

This year, companies still associated with the traditional SaaS market will have a tough choice to make: should they undergo a full revamp towards an AI-native user experience or risk irrelevance as AI-first teams take over the market?

SaaS and agents merge completely in 2026. Every SaaS product becomes an agent platform, and every agent platform builds SaaS features. The ones that don’t adapt die or get bought for pennies.

Gren Isenberg, CEO of a holding company, Late Checkout

Technical predictions

8. Physical AI will become the buzzword of 2026

Why this is likely

Amazon deployed its one millionth robot and launched DeepFleet, a genAI model targeting 10% efficiency gains across 300+ facilities.

Physical AI adoption in manufacturing is set to jump from 9% to 22% within two years, making it a key boardroom theme.

Figure’s $1B+ raise at $39B valuation shows investors treating humanoid robotics as the next major platform category.

In 2025, LLMs got a reality check, and confidence in scaling laws as the path to AGI began to waver. Now the spotlight is shifting to physical AI as the next frontier.

At CES 2026, AI-powered robots had a commanding presence. NVIDIA unveiled Alpamayo, a family of AI models that will support autonomous vehicle training through real-world data loops and integrated simulation.

Hardware leaders Samsung, Hyundai, and LG presented intelligent robots and home assistants capable of everyday tasks like laundry and meal preparation.

Humanoid robots showed significant progress as well. Boston Dynamics announced that its long-awaited Atlas robot is moving from prototype to product and unveiled an improved design.

Like many humanoids presented at the event, Atlas will have an AI brain. Boston Dynamics is partnering with Hyundai and Google DeepMind to build the model powering it.

With both AI and robotics finally reaching consumer-ready thresholds, physical AI solutions may explode by year’s end. McKinsey estimates the general-purpose robotics market will exceed $370 billion by 2040, and experts expect physical AI to be significantly more impactful than run-of-the-mill LLMs.

Think about all the vehicles and machines you see every day. Now imagine all of them being smarter than ChatGPT. AI has already made a huge impact on our daily lives. But that impact is only going to be magnified as intelligence makes it into the physical world.

Qasar Younis, founder, Applied Intuition

9. AI coding agents will be able to run autonomously for over 20 hours

Why this is likely

Anthropic’s Claude Sonnet 4.5 can maintain focus for 30+ hours on complex coding tasks, though this isn’t yet a mainstream capability.

Frontier models now handle tasks with 110-minute completion horizons, with that duration doubling every 7 months since 2019, according to NeurIPS research.

In 2025, large language models made major strides in coding with Anthropic’s Claude Code and OpenAI’s Codex. Yet they remained unreliable over long sessions, accumulating errors faster than a tired human engineer would.

Improving coding agent autonomy is a priority for AI labs—longer unsupervised operation would allow enterprises to automate more complex end-to-end assignments.

The length of tasks AI can handle is doubling approximately every seven months

According to METR, a leading AI evaluation lab, the duration coding agents can run autonomously appears to be doubling every seven months.

As of November 2025, Claude 4.5 could work independently for 4.5 hours. If that pace holds, by late 2026, we could see AI engineers completing up to 20 hours of work with minimal human supervision.

Build intelligent AI agents you can govern, scale, and trust with the help of Xenoss engineers

Explore what we offer

10. AI and data stacks merge into a single unified stack

Why this is likely

Databricks signed a $100 million agreement with OpenAI to natively integrate models into its platform, merging data and AI layers.
Microsoft Fabric’s unified AI-data platform grew 75% YoY to 19,000+ customers in 2025.
Snowflake reports 6,100+ accounts using AI tools weekly, driving 50% of new logos and 25% of use cases.

Despite data being the engine of AI projects, data engineering and machine learning stacks have historically developed separately. The “modern data stack” handled ingestion, storage, transformation, and BI, while the “AI stack” focused on deploying models and agentic applications.

That distinction may soon disappear. In 2025, Fivetran and dbt Labs merged into a single toolset for data transformation and AI modeling. Databricks, now valued at over $1 trillion, has successfully championed a unified data, AI, and governance ecosystem.

By year’s end, data and AI engineers expect more mergers and restructuring among data engineering companies, along with vendors adding AI-specific features like agent observability, tagging, and evals to their platforms.

While the ecosystem feels notably more mature, we’re still in the early days of a truly AI-native data architecture. We’re excited by ways AI can continue to transform multiple parts of the data stack, and we’re beginning to see how data and AI infrastructure are becoming inextricably linked.

Jason Cui, partner at Andreessen Horowitz

Bottom line

Mirroring 2025’s dynamic, we expect AI to develop somewhat unevenly in 2026.

Researchers and frontier labs are likely to keep racing towards smarter and cheaper models, though there may be a shift of attention to hardware-based solutions (in fact, most leading AI companies have prototypes in that area).

On the other hand, enterprise organizations will be slower on the uptake and will prioritize proven ROI over technical innovation.

Broader market trends, perhaps, remain the hardest to predict. It’s unclear when or if the AI bubble bursts, how strong public opposition to widespread AI adoption will be, or what impact the opaque AI regulations we currently have will have on vulnerable populations.

To successfully navigate this landscape, leaders should keep a pragmatic approach and commit to transforming their organizations in the highest-yield areas first, then gradually shift from AI-assisted to AI-native organizational makeup. This way, companies will be able to both harness the value of AI technology and stay protected from possible market turbulence.

The post 10 AI trends that will shape 2026: market signals, technical predictions, adoption strategies appeared first on Xenoss - AI and Data Software Development Company.

2025 in review for AI: Releases, successes, and failures of the year

Maria Novikova — Fri, 19 Dec 2025 13:57:29 +0000

Reflecting on the state of AI in 2025 feels unusual because of the hyper-optimistic view we entered the year with (think about Dario Amodei’s prediction that 90% of code will be AI-generated) and the sober reckoning the AI community experienced in the latter half of 2025.

Technically, LLM capabilities improved across the board. We got smarter coding models, improved data processing, longer focus times, better image generation, and excellent video generation.

AI agents, although not new, have found their place in the enterprise, and more companies now have a vision for specific use cases where AI agents can provide support.

At the same time, halfway through the year, it became clear that “AGI by 2027” predictions are too far-fetched. Despite improvements, models continued to hallucinate and make embarrassing mistakes, making it harder to imagine AI reliably running any complex process end-to-end.

As the AI community had to accept the reckoning, fear started creeping in on whether the global economy is not putting too much stock in the AI bubble and what the world will look like if that bubble collapses.

This review covers what mattered most in 2025: releases, wins, and risks of AI adoption in the enterprise, the state of the talent market, and the global impact of the AI explosion.

1. Anthropic and Google caught up to OpenAI

At the start of the year, OpenAI’s GPT o3 was one of the most powerful chain-of-thought models.

But by the end of the year, OpenAI no longer holds a decisive technical lead. Google and Anthropic caught up to the AI race with powerful models.

At the time of writing, Gemini 3, GPT-5.2, and Claude 4.5. seem to be locked in a stalemate when it comes to agentic task completion, coding, multi-modal generation, and document processing.

On the other hand, Amazon, Meta, and Apple have fallen behind and not made meaningful LLM contributions this year.

The table below recaps the top large language models released by three leading AI labs in 2025 and the impact of each on the development of machine learning.

Date	elease (lab)	What changed	Market impact on GenAI growth
Jan 31	o3-mini (OpenAI)	Cheaper “reasoning-tier” model	Put reasoning into high-volume, cost-sensitive production workloads
Late Jan	R1 (DeepSeek)	Cost-disruptive reasoning baseline	Forced a price/performance reset and intensified “efficiency race” narratives
Feb 19	Grok 3 (xAI)	Frontier entrant + “search/deep research” style workflows	Increased competitive cadence; broadened distribution-driven adoption pressure
Feb 24	Claude 3.7 Sonnet (Anthropic)	Hybrid “fast vs extended thinking” control	Normalized reasoning as a user-controlled dial for coding/analysis workflows
Feb 27	GPT-4.5 (OpenAI)	Compute-heavy flagship iteration	Reinforced frontier pace while highlighting the cost of pure scaling
Feb 27	Hunyuan Turbo S (Tencent)	Latency-first optimization	Strengthened the bifurcation: ultra-fast assistants vs deep reasoning models
Mar 16	ERNIE 4.5 + ERNIE X1 (Baidu)	Multimodal and “deep thinking” lineup	Increased China-side competitive intensity; pushed price/perf competition
Mar 25	Gemini 2.5 Pro (Experimental) (Google)	“Thinking model” positioning	Re-anchored expectations: top-tier models must ship with deliberation modes
Apr 05	Llama 4 (Scout, Maverick) (Meta, open-weight)	Multimodal and MoE at scale	Expanded supply and down-market availability; pressured closed-model pricing
Apr 14	GPT-4.1 (mini, nano) (OpenAI)	Developer-oriented family and smaller tier	Made “model families” (cost/latency tiers) the default procurement pattern
Apr 16	o3 + o4-mini (OpenAI)	Production-grade reasoning and tool use	Raised the baseline for agents: multi-step execution over chat quality alone
May 22	Claude 4 (Opus 4, Sonnet 4) (Anthropic)	Next-gen coding/agent focus	Escalated “agentic coding” competition and sped up enterprise adoption
Jun 17	Gemini 2.5 Pro (GA on Vertex AI) (Google)	Enterprise hardening and cloud distribution	Reduced deployment friction in regulated orgs; accelerated “procure-and-deploy.”
Aug 07	GPT-5 (OpenAI)	Default “adaptive reasoning/router”	Made adaptive reasoning a mainstream expectation (and raised buyer scrutiny)
Nov 12	GPT-5.1 (OpenAI)	Post-flagship iteration	Compressed release cycles; normalized continuous model upgrades as a market norm
Nov 18	Gemini 3 Pro (Google)	Flagship jump and agentic narrative	Rebalanced late-year leadership perceptions; leveraged Google distribution
Nov 24	Claude Opus 4.5 (Anthropic)	High-end “deep work” coding/agents	Tightened the “best model for coding/agents” race; encouraged multi-model stacks
Dec 02	Nova 2 (AWS)	Bedrock-native general models	Strengthened hyperscaler-first buying: models inside existing cloud controls
Dec 11	GPT-5.2 (OpenAI)	Further GPT-5-line iteration	Reinforced frontier models as continuously deployed product lines
Dec 17	Gemini 3 Flash (Google)	Fast/cheap tier with strong baseline	Expanded addressable use cases via latency and cost, intensifying price pressure

It’s fascinating to think about how much the approach AI labs take to building frontier models has changed since the first LLM release of 2025 (o3-mini).

With Claude 3.7 as the trendsetter, LLMs started giving users more control over how long a model should think on a query. Now, AI labs allow users to enable or disable “Extended thinking” that encourages LLMs to think “deeper” about the prompt.

Another area where AI labs have leaped astronomically is context windows. Gemini 3 Pro and Claude 4.5 Sonnet have a context window of 1 million tokens, GPT-5.2 supports up to 400,000 prompt tokens.

Now that there are fewer concerns over LLMs’ capability to digest high data volumes, enterprise teams can train off-the-shelf models on higher volumes of corporate data without necessarily requiring a separate RAG module.

Do large context windows make RAG useless?

Large context windows change the reason teams would use RAG, but do not make it useless. Even with 200k–1M tokens, you still can’t reliably “stuff” an enterprise’s full, fast-changing knowledge base into a prompt, and longer contexts can increase cost and the risk of the model focusing on irrelevant or conflicting passages.

RAG is still a practical way to keep answers grounded in fresh, permissioned, auditable sources while limiting the model’s input to the most relevant evidence.

Another important shift is how significantly the focus on LLM performance has shifted towards engineers. OpenAI’s release of 4.1. was API-only and marketed as an “improved coding model”.

When launching o3 and o4, Sam Altman’s team also focused on math, science, and coding benchmarks to prove the excellence of these models.

In the same vein, Anthropic didn’t implement image and video generation – instead, the company positioned Claude 4 as the “world’s best coding model”, capable of not losing focus on long-running tasks and multi-step agentic workflows.

Google also emphasized improved agentic coding skills in Gemini 3 Pro documentation and increased the context window size to let teams feed entire code repositories to the model.

This positioning tracks with where enterprises see the fastest, most defensible ROI: software delivery, workflow automation, and operational copilots. But it also creates a perception risk. When labs optimize their narratives around engineering benchmarks, non-technical users can read it as a deprioritization of writing quality, creativity, and broader “everyday” usefulness.

The takeaway: By the end of 2025, frontier LLM development looked less like a single-lab advantage and more like convergence across three major players.

Differentiation shifted toward product strategy and distribution, including reasoning modes, cost and latency tiers, context scale, and enterprise deployment controls.

2. Open-source LLMs went mainstream

Before this year, there were only a handful of open models capable of rivaling GPT, Claude, and Gemini in evaluations, with Mistral and Llama model families leading the landscape.

However, after DeepSeek R1 was released on January 20th, 2025, and took over the LLM community, open-source models became so influential that even SOTA AI labs had to admit to being on “the wrong side of history”.

Following high demand from engineers, AWS, Google Cloud, and Microsoft Azure added the model to their offerings, allowing teams to comfortably add it to their AI products.

Throughout the year, the open-source boom continued, mostly led by Chinese AI labs. Out of US-based models, GPT-oss was the most powerful open-source model released in 2025, though the AI community argued it tied Kimi K2 on most benchmarks.

Release date	Model (org)	Type	Notable sizes (as released)	License/weights	Why it mattered
Jan 20	DeepSeek-R1 (DeepSeek)	Reasoning LLM (open-weights)	R1 (family release)	Open-weights (public)	Major “open reasoning” moment that intensified price/perf pressure on closed frontier labs.
Apr 5	Llama 4 Scout / Maverick (Meta)	Natively multimodal, open-weight	Scout, Maverick (Meta “herd”)	Open-weight (Meta license)	Put strong multimodal open weights into builders’ hands and raised the baseline for what “open” can do.
Apr 28–29	Qwen3 family (Alibaba)	Open-source LLM family	Dense: 0.6B–32B; MoE: 30B/235B (A22B) (as listed by project)	Apache 2.0 (open-source)	Scaled open models across many sizes and reinforced open-source as a serious default for production deployments.
Mar 24	Qwen2.5-VL-32B-Instruct (Alibaba)	Vision-language (open-source)	32B	Apache 2.0	Strengthened open multimodal options for doc/vision workflows without relying on closed APIs.
Mar 26	Qwen2.5-Omni-7B (Alibaba)	Multimodal and voice (open-source)	7B	Apache 2.0	Brought “GPT-4o-style” multimodal I/O (incl. audio) into the open-source ecosystem
Jul 23	Qwen3-Coder (Alibaba)	Coding model (open-source)	(Reported as Alibaba’s most advanced open-source coding model)	Open-source release (weights public)	Escalated the open-source coding arms race and increased competitive pressure on closed coding assistants.
Jun 2025	Mistral Small 3.2 (Mistral)	General LLM (open-weight)	Small 3.2	Open-weight	A practical “deploy everywhere” open model tier for enterprise cost/latency constraints.
Dec 2	Mistral Large 3 / Mistral 3 (frontier open-weight family) (Mistral)	Frontier open-weight	Large 3; additional open models (as listed)	Open-weight (per Mistral)	Strengthened Europe’s position in open-weight frontier models and widened enterprise alternatives to US closed vendors.
Dec 15	Nemotron 3 (Nano released first) (NVIDIA)	Open-source model family	Nano (released), larger variants announced	Open-source (as reported)	Added a credible US-based open-source option positioned for efficiency and multi-step tasks, amid demand for “non-China” open models in government/regulated settings

Besides adding variety to the roster of AI models, the open-source explosion shook the standard foundations of generative AI.

Discovery #1: Frontier-level training no longer requires frontier budgets

DeepSeek directly challenged the belief that state-of-the-art performance demands massive teams, proprietary pipelines, and multi-billion-dollar compute clusters. The team reported training costs of approximately $294,000, a negligible figure compared to the estimated $250 billion collectively invested by US-based labs in AI infrastructure in 2025.

Discovery #2: Keeping the codebase private doesn’t help protect AI safety

Before 2025, many AI leaders cautioned against open-sourcing large-language models, arguing that doing so would increase the risk of misuse.

Open models largely undermined that position. Once high-performing weights, fine-tunes, and tooling are widely available, the marginal safety benefit of a single lab keeping its models closed diminishes sharply. Capable systems can be reproduced, adapted, and deployed well outside any one organization’s control.

Closed models can still reduce risk through stronger platform controls and faster patching compared to open-source models, but “closed by default” is no longer a credible standalone safety argument in a world where open alternatives like DeepSeek and Kimi K2 already meet many real-world use cases.

The takeaway: In 2025, open-source LLMs crossed the point of no return: once models like DeepSeek proved that frontier-level performance, low training costs, and cloud-native deployment could coexist, “open” stopped being an alternative and became a default option for builders.

The growth of the open ecosystem put structural pressure on closed labs, and we may be entering the era where capability diffusion, not code secrecy, defines the generative AI landscape.

3. MCP became the number-one agentic connector

In 2024, Anthropic released Model Context Protocol, an open standard that helps connect AI agents to external tools like GitHub, Figma, and others. This year, it went from a niche technology to a universally accepted industry standard.

In March, instead of building a proprietary alternative, OpenAI used MCP to connect its model to external data sources. In April, Google followed suit, and MCP became the universal framework that top models use to connect their agents to other tools.

By the end of the year, MCP adoption surpassed that of tools with a similar purpose (e.g, LangChain).

In 2025, MCP adoption outpaced LangChain, LangGraph, and OpenAI’s API

At the time of writing, Anthropic lists over 10,000 active MCP servers. The protocol is actively adopted by engineers, where the Python SDK now has over 97 million downloads.

On the other hand, as MCP adoption grew, teams became more aware of its limitations. Enterprise companies called out Anthropic for weak authorization capabilities, poor integrations with SSO providers, and high risk of prompt injection.

A recently discovered vulnerability exposed the risks of MCP prompt injection

The takeaway: MCP’s rapid adoption demonstrates how open standards can become infrastructure when ecosystem incentives align. However, its spread exposed critical gaps in enterprise readiness: security, identity, and governance weaknesses that must be addressed before production-scale deployment.

4. GPT-5 fueled a wave of speculation on whether LLMs have “peaked”

On August 7, 2025, OpenAI unveiled GPT-5 with a livestream and a ton of fanfare.

Expectations were unusually high. Among researchers, executives, and the broader public, there was a belief that GPT-5 might represent the next meaningful step toward AGI.

It was not.

During the demo livestream, the plots capturing GPT-5’s superior benchmark performance were mislabeled, and the early rollout was riddled with bugs, ranging from simple math to GPT failing to switch to agent mode.

GPT-5 failed to generate a map of North America and the timeline of US presidents

Despite technically sweeping key benchmarks, the real-world impact of GPT-5 felt a lot less significant than that of other releases we got this year, namely Claude 4.

The reason GPT-5 release still deserves a separate spot on our AI recap is that it changes the way we set expectations for AI models – instead of hoping to reach AGI, teams will be hoping to get well-rounded models that don’t feel “dumb” and drive quantifiable productivity gains.

More releases are going to look like Anthropic’s Claude 4, where the benchmark gains are minor, and the real-world gains are a big step. There are plenty of implications for policy, evaluation, and transparency that come with this. It is going to take much more nuance to understand if the pace of progress is continuing, especially as critics of AI are going to seize the opportunity of evaluations flatlining to say that AI is no longer working.

Nathan Lambert, “GPT-5 and the arc of progress”

The fumbled release of GPT also fueled a different debate: are scaling laws hitting a ceiling?

In 2020, when OpenAI published ‘Scaling Laws for Neural Language Models, ’ the idea that throwing exponentially larger datasets at models would make them exceptionally powerful was quite bold.

However, when OpenAI applied it in practice with GPT-3, and then, even more convincingly, with GPT-4, scaling laws became the guiding principle of LLM training.

Despite throwing more data and compute at newer generations of models with GPT-5, as well as other LLMs, they fail to deliver significant intelligence leaps.

The doubt about the limitations of scaling laws, initially raised by a small group of skeptics (led by Gary Marcus, an AI researcher and author), is becoming mainstream.

Engineering teams are exploring alternative methods for model improvements.

Post-training techniques, reinforcement learning refinements, and fine-tuning strategies that help models better interpret existing data became standard practice. These methods improved reliability and task performance, but none yet matched the transformative impact scaling had earlier in the decade.

The takeaway: Despite significant improvements in coding and math LLMs reached in the beginning of the year, the AI community is looking into 2026 with uncertainty about the future of this technology. It will take a new substantial breakthrough to convince an increasingly skeptical crowd that large-language models are really a bridge to AGI.

5. AI agents became the hottest corporate AI application of 2025

This year, AI agents went from the technology accessible primarily to frontier labs (the technology itself went mainstream in January when OpenAI released Operator) to a practical tool that enterprises adopted to streamline workflows.

The first major agentic releases coming outside of leading AI companies were Agentforce 2dx by Salesforce and Joule Studio by SAP.

Unlike OpenAI’s general-purpose agents, these niche releases cover a smaller list of applications. Salesforce’s agent helps sales, marketing, and customer success manage client tickets and sales pipelines, while SAP Joule Studio offers tools for automating workflows in HR, finance, and supply chain.

By mid-year, it became clear that niche, workflow-specific agents delivered more value to enterprises than general-purpose agents. Constraining scope reduced hallucinations, simplified governance, and made ROI easier to measure.

By December 2025, major Fortune 500 companies will have successfully dabbled in building both internal and user-facing AI agents.

To support growing interest in agentic systems, cloud vendors and data platforms are building an infrastructure to support AI agents.

Databricks empowers enterprise teams with a dedicated toolset for agent development that includes Mosaic AI Agent Framework, Unity Catalog, and built-in evaluation and monitoring tools.

With these services, teams can build agents that safely reason over proprietary data, invoke tools, and operate inside governed production environments.

AWS Bedrock helps enterprises bring agents to production with Amazon Bedrock AgentCore. The platform is a one-stop shop for building, deploying, operating, securing, and monitoring agents at scale. With AgentCore, engineers who host their infrastructure on AWS can connect multi-agent workflows to AWS-native identity, permissions, and data stack.

The takeaway: Agentic systems are still in their early stages, but a powerful infrastructure to help deploy and scale autonomous workflows is developing rapidly.

Companies began seeing first wins from AI agent adoption in increased employee productivity, reduced error rate on manual tasks, and improved cross-department workflow integration.

The next phase will be less about agent novelty and more about disciplined execution, governance, and scaling agents into core business processes.

Build custom AI agents for your business case

Work with our engineers to design, integrate, and deploy agents tailored to your data, workflows, and security requirements

Book a call

6. “Vibe coding” took over no-code and prototyping

When Andrej Karpathy coined the term “vibe coding” in a tweet, he probably anticipated AI-assisted coding to become a trend. Still, it’s unlikely he predicted the speed with which his new term became a buzzword in the AI community.

The concept of “vibe coding” was coined by Andrej Karpathy

In early 2025, tools like Cursor and Microsoft Copilot were already empowering hands-off programming, but the inflection point came in late February, when Anthropic released Claude 3.7 and previewed Claude Code.

Claude Code was no longer just auto-complete. It wrote and read code, edited files, wrote tests, pushed code to GitHub, and used the CLI with minimal human involvement.

Claude Code gave engineers a massive productivity boost, allowing them to build up to four projects at a time, but, at the end of the day, it is still an engineer-facing tool.

Vibe-coding went mainstream when tools like Lovable and Replit gave team managers and entrepreneurs with a layman’s understanding of engineering the power to transform plain-language ideas into ready-to-deploy pilots.

In the year since its release, Lovable has hit 8 million users and has been used by over half of Fortune 500 companies.

Among enterprise companies, tools like Lovable or Replit are rarely deployed for user-facing products or internal tools for organization-wide adoption, but are helpful for prototyping.

I used to bring an idea to a meeting. Now I bring a Lovable prototype.

Sebastian Siemiatkowski, CEO of Klarna

Vibe-coding drives real productivity gains.

As with any trend threatening the status quo of traditional engineering departments, vibe-coding is controversial. Users have reported bugs in their Lovable MVPs and, on one occasion, Replit accidentally deleted a user’s entire database.

Nevertheless, vibe coding is likely to stay because it is already delivering tangible value.

A Forrester Research report found that using agentic coding tools saves enterprise companies up to ~$44.5M in risk-adjusted employee time savings over three years. A different survey showed a 206% ROI uplift following vibe coding adoption and a 50% time-to-market reduction.

According to Lovable’s internal data, a prototype built on the platform saves teams between $50,000 and $90,000 in engineering costs.

The takeaway: Vibe coding was one of the clearest productivity inflection points of 2025, shifting software creation from an engineer-only activity to a rapid, language-driven prototyping capability accessible to managers and founders.

While not production-ready by default, its impact is already measurable in faster time to market, six-figure cost savings per prototype, and enterprise-scale ROI that makes experimentation cheaper, broader, and strategically unavoidable.

7. The MIT study discovered that 95% of enterprise AI applications still bring no impact

In August, MIT-backed NANDA initiative published the “The GenAI Divide: State of AI in Business 2025” report, with one finding particularly standing out.

According to the study, only 5% of enterprise AI pilots bring revenue, while most deliver little to no measurable impact.

You may have seen the MIT study that 95% of generative AI projects fail. I believe this. The challenge isn’t AI itself — it’s the ability to rethink workflows, redesign processes, and operate differently.

Mohamad Ali, SVP and Head at IBM Consulting

It’s a bold number, but the real story is subtler – and in some ways, more damning. The divide isn’t about model quality. It’s about how organisations wrap those models.

On one side sits a shadow economy of employees using ChatGPT, Claude, or Copilot on personal accounts – flexible, cheap, and immediately useful. On the other side sit enterprise AI projects – often custom-built or pricey vendor tools – that collapse under the weight of workflow fit, governance, and brittle, hard-coded logic.

Tony Seale, former Knowledge Graph Architect at UBS, founder of The Knowledge Graph Guys

But not everyone was on board. Several enterprise leaders called the study out on methodology blunders.

Dave Kellogg, Executive in Residence at Balderton Capital, pointed out an overlap of what NANDA presented as the solution to the problem (an “agentic web” for distributed AI with its own focus on building networking agents.

Kevin Werbach, a Wharton professor, highlighted that the 95% claim making headlines was never explicitly mentioned in the study. The closest possible claim is that 5% of respondents successfully implemented custom AI enterprise tools, but that conclusion is not anywhere near as far-reaching as “95% of AI pilots generate zero returns”.

MIT study discovered that 95% of AI pilots don’t deliver tangible outcomes

One of the reasons the MIT study exploded so effectively was that its release overlapped with an underperforming release of GPT-5. As teams were disappointed with the lack of meaningful improvements in the model that marketed itself as a “pocket PhD”, the MIT study further strengthened these concerns.

The takeaway: Regardless of methodological debates, the MIT study succeeded in shifting enterprise AI conversations toward pragmatic deployment strategies. The heightened focus on clear use cases, reliable data infrastructure, and measurable business outcomes represents a healthy correction from earlier hype-driven adoption approaches.

Build enterprise AI products that deliver measurable ROI

We help teams prioritize high-impact use cases, integrate with your stack, and ship production systems that save costs and drive revenue

Explore enterprise AI development capabilities

8. Competition for top-tier AI talent got fierce

This year, AI engineers got celebrity-level treatment, with employment agents, lucrative salary packages, and intense competition from leading AI labs.

Meta’s all-in talent war

Both by the pace of hiring and the pay package generosity, Meta took the lead. In June, Zuckerberg’s team offered up to $100 million in sign-on bonuses to poach OpenAI employees. That same month, Meta acquired a 49% stake in Scale AI at a total price of $19.3 billion and had its founder, Alexander Wang, lead the company’s Superintelligence Labs.

Meta also attempted to acquire The Thinking Machines Lab for $1 billion – Mira Murati, the founder of the startup now valued at over $2 billion, shot down the offer.

Reportedly, Zuckerberg’s key goal was poaching Andrew Tulloch, a former Meta engineer who continued his career first at OpenAI and, eventually, at Murati’s startup. Despite initially turning down Zuckerberg’s offer, in October, Tulloch changed his mind and will be coming back to work on Meta Superintelligence on a $1.5 billion pay package.

If you can’t hire them, acquihire them

Meta was not the only big tech company making waves on the job market, but its competitors took a different strategy.

Instead of poaching top researchers from other AI labs, they strike deals with promising AI startups to add their leading engineers to their teams.

The Google-Windsurf $2.4 billion deal, confirmed in July, was the biggest licensing move of the year. The team behind Windsurf, a vibe-coding agent, was at the time in $3-billion acquisition talks with OpenAI, but the deal fell through.

Google’s counteroffer was not an acquisition but a licensing agreement and a move to poach Varun Mohan and Douglas Chen, the co-founders of Windsurf.

In September 2025, Windsurf was acquired by Cognition and, according to early reports, helped nearly double the company’s ARR.

For big tech, acqui-hiring AI researchers at up-and-coming startups is an intelligent way to keep growing as the AI talent pool is drying up.

But, for enterprise teams looking for reliable AI vendors, the “acquihire boom” unlocked a new fear: “What if the vendor we chose gets acquired?”

Historically, startups struggled to survive after their founders jumped ship. Adept, a robotics company that signed a licensing agreement with Amazon, doesn’t have a product yet and only has four people indicating it as their workplace on LinkedIn.

When shortlisting AI vendors, enterprise companies may need to consider pending acquisition talks. Some startups, like CVector, an industrial AI company, baked “We are not going anywhere” into their positioning and are using stability as a bargaining chip in customer talks.

The takeaway: The 2025 AI talent war turned top engineers into strategic assets, driving unprecedented compensation, aggressive poaching, and a surge in acquihires as big tech competed for a shrinking talent pool.

For enterprises, this shifted vendor risk calculus: technical excellence alone was no longer enough, and organizational stability became a decisive factor in AI partner selection.

9. AI became a national security asset

Now that AI is getting more powerful, world leaders are exploring its impact on defense and global economics.

Steven Adler, a former AI Safety researcher at OpenAI, highlights that AI is on track to become a massive force in the military by helping develop:

New weapon systems: both the US and China are actively exploring autonomous and semi-autonomous military units, often described as intelligent “robot legions”.
Advanced cyber operations: AI-driven attacks capable of targeting high-stakes systems such as power grids, financial infrastructure, or even nuclear command-and-control.
Enhanced intelligence analysis: models that can synthesize fragmented signals intelligence, satellite imagery, and open-source data at speeds beyond human capacity.
Upgrades to existing defense technology: including AI-based image recognition for UAVs, sensor fusion, and stealth optimization for aircraft and naval systems.

In 2025, global world powers took different approaches to integrating AI into global trade and military.

US: continued growth and focus on competition containment

With the release of DeepSeek, Qwen, Kimi-K2, and other Chinese models that now rival SOTA LLMs by performance and reportedly beat them in cost-effectiveness, the American superiority in the AI race started appearing less certain.

To counter the rapid pace of AI research in China, the US government responded with containment strategies and regulations.

In January, a few Chinese AI companies were added to the Entity List to enforce stricter controls over chip export and supply chain intermediation between the countries.

In April, the US tightened controls on the export of NVIDIA H20 chips to China to prevent its number-one geopolitical rival from building state-of-the-art LLMs on American hardware. In December, the US allowed chip licensing but with an added 25% export fee.

Simultaneously, US-based AI labs are closely working with the government to expand AI involvement in security and state management.

In January, the White House issued the Executive Order on Advancing United States Leadership in Artificial Intelligence Infrastructure.

It encourages federal agencies to assist in the development of data centers and energy sources necessary to sustain them, to make sure the US has the resources necessary to build large-scale AI systems.

In June, OpenAI won a $200 million contract for the US Defense Department for building custom models that help solve security challenges in warfighting and supply chain.

Anthropic followed the lead by making Claude available for purchase by federal agencies, launching agreements with national laboratories, and building custom Claude Gov models for national security applications.

China: focus on self-reliance and AI deployment for pragmatic goals

China’s 2025 approach to the AI race is built around ensuring autonomy in core technologies: chips, models, and computing power. The government responded to NVIDIA licensing restrictions with regulations that prioritized domestic AI chipmakers like Cambricon and Huawei over foreign suppliers.

To boost domestic chip manufacturing, China backed several incumbents in the sector (MetaX Integrated Circuits and Moore Threads) with valuation growth and got financial backing from the government and VC firms.

Similar to the US, China also zeroed in on maximizing data center capacity and exploring cheaper compute sources. In December, the government announced the “East Data, West Computing” strategy that plans a state-led build-out of data center clusters and computing hubs in the country’s western regions.

These data centers, coupled with an expanded power grid that enables cheaper electricity, will help process millions of generative AI workflows generated by Eastern China.

Europe: regulation and responsible AI use

Unlike other powers, European leaders decided not to adopt the “move fast” AI development strategy.

Instead, EU nations focused on enforcing hard regulatory milestones under the EU AI Act.

In February 2025, the European Commission issued formal guidelines clarifying prohibited AI uses and followed them up with detailed governance rules and obligations for general-purpose AI (GPAI) models.

Although this cautious stance might help make AI development more sustainable long-term, in the short run, it is hurting European AI innovation.

The State of European Tech survey found that 70% of EU-based founders find the current regulatory environment too restrictive. Others are leaving the region altogether – as was the case for a Dutch messenger company, Bird, that moved most of its business out of Europe due to strict AI regulation.

The takeaway: In 2025, global superpowers realized the need for state participation in AI development, but they are taking different paths to this goal.

In the US and China, governments are actively incentivizing AI development and signing massive agreements to build data centers. In Europe, regulation takes the lead, which helps protect the general population from deep fakes and privacy risks of AI misuse, but it is hindering AI innovation.

10. Concerns about the AI bubble grew stronger

One of the most pressing AI questions that came up in 2025 was: “Are we in a bubble?” Answering this question negatively became harder and harder when Sam Altman himself said he thinks so.

There are indeed multiple signs of the expectations of AI being blown out of proportion, and reasons to worry about what happens when our current technologies do not hit these benchmarks.

Concern #1: Circular financing

Looking into recent investments and partnerships in the AI landscape, it’s clear that billions in financing flows between a small group of companies.

Infrastructure vendors like NVIDIA or Oracle are investing in cloud intermediaries and AI labs like OpenAI, which then reinvest that capital back into chips, compute, and data center capacity. This creates a feedback loop that amplifies market momentum but also concentrates risk.

NVIDIA is wrapping up 2025 as Wall Street’s hottest company, but a closer look at its earnings reveals that 61% of Q3 revenue came from four customers. If these partnerships fall out, NVIDIA is at risk of losing a large fraction of its cash flow and taking millions of shareholders down with it.

Economists have also raised concerns about how this growth is being financed. Morgan Stanley estimates that about 50% of the total $2.9 trillion in AI investment is funded via debt financing. If the bubble bursts, global companies that sign billion-dollar debt contracts can dissolve, as did victims of the 2008 financial crisis.

Concern #2: Adoption lags behind the hype wave

There is a growing expectation-reality gap between the “inevitable AI adoption” agenda AI lab leaders are pushing in media and internal communications and the reality of fairly slow and incremental adoption.

The positive gains of enterprise AI adoption have been widely reported, but they are hardly comparable to the trillions of dollars that tech companies spend on AI infrastructure.

For enterprise customers, scaling AI organization-wide is still a challenge – only 30% of global teams surveyed by McKinsey say they are actively doing so. UBS, one of the leading investment firms in America, has publicly acknowledged this discrepancy, stating that “enterprise AI spend is moving slowly” and “ROI is less clear.”

Right now, market leaders are operating on the hope that the enterprise segment will eagerly adopt the latest technologies, but real-world data is not backing that assumption. Should enterprise demand for AI solutions stay tepid, key AI infrastructure spenders will find themselves between a rock and a hard place when justifying their billion-dollar capex.

Concern #3: Data center ambitions are triggering public concerns

AI labs’ scramble for new energy sources and computing power to keep training the next generation of SOTA models is sending ripples way beyond the AI or data market.

It’s estimated that increased data center build-outs will drive the total US electricity use from roughly 4% to about 12%. Such a steep rise in electricity demand will negatively impact American households, who will shoulder the burden of higher utility bills.

In response to the backlash from local communities, state courts may be forced to pause data center construction projects. In November, the court of Virginia ordered a halt to construction of the Digital Gateway data center. Similar interventions are likely as environmental, zoning, and energy concerns intensify.

Until these tensions are ironed out, the infrastructure spend AI companies are allocating into data centers will be threatened by the uncertainty of political and community-driven friction, further destabilizing the landscape.

The presence of these risks does not mean AI is a dead-end technology. Historically, periods of intense hype often precede durable transformation.

An MIT Technology Review article argues that it’s more accurate to compare the AI bubble to the dot-com era than to the subprime mortgage crisis of 2008. After the dot-com bubble burst, it still left us the Internet and a handful of promising incumbents (Google and Amazon) that defined the modern technological era.

The same may be true for the AI bubble. It’s possible that most AI startups on the market today are not equipped to live through the burst. However, a handful of better-positioned market leaders may become the driving force behind the next age of technological growth.

The takeaway: AI bubble concerns are justified. A: a meaningful share of today’s momentum is being driven by aggressive capital deployment, optimistic timelines, and concentrated bets that can unwind quickly if demand lags.

At the same time, the presence of froth does not negate the underlying trajectory. AI capabilities are already reshaping how software is built and discovered, and the post-correction landscape is still likely to leave durable infrastructure and a new set of “default” interfaces for the future web.

The bottom line

Although the second half of 2025 forced the AI industry to recalibrate its expectations, the year ia still a net positive. The end of GPT dominance in the LLM arena helps level the playing field. It keeps all AI labs focused on improving both technical capabilities and the experience of interacting with models.

The growing penetration of AI agents and vibe coding is the first step towards AI democratization. Though it’s not here yet, we may be looking at a future where building an AI platform will require minimal engineering talent.

There’s uncertainty as to where machine learning as a field should go next if LLMs really hit the ceiling. Researchers already have ideas – world models, neuro-symbolic systems, and cognitive architectures. It’s unclear which of those will power AGI, but ChatGPT itself was the product of a decade of research.

Our takeaway is: while we wait for AI research labs to figure out the path that takes us to AGI, team leaders and employees should focus on making the most out of the tools they have.

Most organizations have barely begun to scratch the surface of custom-made AI agents, intelligent copilots, and predictive analytics. Applying these tools will be transformative for nearly every team, and by the time AI agents in the workplace become commonplace, the next frontier may arrive.

The post 2025 in review for AI: Releases, successes, and failures of the year appeared first on Xenoss - AI and Data Software Development Company.

SVOD, AVOD, or a hybrid model: How streaming platforms can maximize CTV revenue

Maria Novikova — Thu, 04 Dec 2025 17:04:30 +0000

CTV remains one of the fastest-growing revenue channels in digital media. Global CTV (connected TV) ad spend is projected to surpass $42 billion in 2025, and household streaming spend is climbing more than 12% year-over-year.

As spending, viewing hours, and advertiser budgets shift toward CTV, publishers need to choose the right monetization model.

The two dominant CTV revenue paths are:

SVOD (subscription video on demand)
AVOD (ad-supported video on demand).

Each offers massive scale opportunities but comes with operational challenges, retention concerns, and infrastructure requirements.

SVOD continues to expand globally, with households maintaining an average of four paid subscriptions. Markets like MENA are projected to reach $1.5B in streaming revenue by the end of this year.

AVOD is accelerating, too. 90% of European marketers plan to increase AVOD/FAST spending in 2025. Nearly 80% of consumers say they will accept ads if the content is free.

However, neither model is flawless.

SVOD faces rising acquisition friction, declining perceived value, and churn rates reaching 50% among Gen Z and millennials. AVOD deals with fragmentation, measurement gaps, and CTV fraud.

Deloitte Media Trends Survey reports that Americans are spending $70/month on average on streaming services

As publishers aim to balance predictable subscription revenue with scalable ad revenue, the hybrid model is becoming the new standard in streaming.

Netflix, Disney+, and Prime Video have fully integrated AVOD into their streaming experiences, expanding both revenue and user bases.

In this article, we’ll examine how publishers can blend SVOD, AVOD, and hybrid monetization strategies, compare the costs and benefits of both, and offer an actionable roadmap publishers can follow to monetize streaming services.

Why SVOD gives CTV publishers a strategic advantage

For web publishers, a shift to paywalls and subscriptions came with considerable friction. Industry surveys show that only 17% of readers pay for news media, and 83% simply move on to a free source covering the topic when they hit a paywall.

Despite the headwinds, news publishers are committed to subscriptions because the upside is much higher. Even though web publishers report online traffic decline since paywall adoption, 76% still saw higher reader revenue, and the average ARPU rose from $24 to $29.

Streaming services have it easier because subscription-based video-on-demand (SVOD) has been the default business model.

What is SVOD?

Subscription Video-on-Demand (SVOD) is a monetization model where viewers pay a recurring fee to access a library of video content without ads.

Revenue comes directly from subscriber fees rather than from advertising or pay-per-view transactions. Success depends on sustained subscriber acquisition and retention, with key metrics including churn rate, average revenue per user, and customer lifetime value.

In 2025, an average American household is comfortable paying $70/month for streaming services, so SVOD publishers don’t face the same attrition as news media do.

In fact, until a publisher has a wide enough reach and content library to explore ad-supported monetization, SVOD should be the default monetization playbook for a few reasons.

1. SVOD creates stable, predictable revenue

CTV ad spend is growing, but the market is still volatile and relies heavily on macroeconomic trends.

Linear TV is a clear example of how relying purely on ad-based monetization makes publishers more vulnerable to shifts in ad spend. In December 2025, German broadcaster RTL had to lay off 600 staff members due to a dip in ad revenue and a lack of alternative, reliable income sources.

On the other hand, while both Disney and Paramount reported a decline in ad revenue in Q3 2025, both publishers run a SVOD business model, which cushioned the impact of a weaker ad quarter.

Relying on monthly subscription fees on the outset of launching a streaming service helps create a brand-loyal community of viewers that fuels recurring revenue.

Publishers can funnel SVOD returns into expanding the content library, engineering infrastructure, and supply chains on a stable basis before they are ready to layer AVOD as an additional revenue stream.

2. SVOD is the strongest source of first-party data

A SVOD offering encourages publishers to build direct connections with their audiences. These relationships are account-based and authenticated, with viewers logging in, sharing emails and payment details, and building long-term viewing histories tied to a persistent ID.

Over time, SVOD publishers can build a long trail of data on viewing habits, session length, devices, and genre affinity.

Considering that AdTech has been on the edge about cookie deprecation for the last three years, having a robust first-party data library as a backup plan differentiates SVOD publishers from media that rely solely on third-party trackers.

3. SVOD still makes room for branded deals and advertising integrations

Subscription-only platforms typically avoid interruptive advertising, but they can still monetize brand partnerships through:

product placement
branded content
native integrations
co-marketing campaigns

These formats allow publishers to capture high-value brand deals without sacrificing user experience, requiring an in-house AdTech stack or sharing ad revenue (in some cases, up to 50%) with advertising partners.

A well-known example is the Eggo waffles product placement in the Netflix show “Stranger Things”, which brought a 14% sales increase in 2017 and a 9.4% sales uplift in 2018.

We can build a fully functional SVOD streaming platform in months

Xenoss engineers will create the back-end, payments, recommendation algorithms, and a frictionless UI for streaming platforms

Talk to us

The limitations and risks of SVOD

The SVOD industry faces a mounting credibility crisis as consumers increasingly question whether their subscriptions deliver real value.

While 53% of consumers rely on streaming services as their primary paid entertainment source, satisfaction is plummeting.

Customer satisfaction with SVOD streaming plummets because of frequent price hikes from CTV publishers

Now that more SVOD platforms are hitting the market, an average household in the US has to maintain four active streaming services. Having to pay a separate monthly subscription for each of those makes one in two viewers feel like they are spending too much on CTV content.

As a result, SVOD publishers are now facing a harder time acquiring new subscribers and retaining their audiences.

1. Growing customer acquisition costs

In the last three years, SVOD publishers have had a harder time retaining viewers whose attention is dispersed on short-form social media content.

With high-quality video generation models like Sora and Nano Banana, the sheer volume of available video content is growing exponentially, making it harder to cut through the noise.

A Deloitte survey on digital media trends noted that SVOD publishers are falling behind on personalization expectations of younger audiences and are losing viewers to social media, where algorithmic recommendations reflect user interests more accurately.

To continue acquiring new subscribers, SVOD streaming services invest more in:

sophisticated recommendation engines
social media campaigns promoting new releases
bundles, discounts, or extended free trials

These tactics help with acquisition but drive CAC higher every year.

2. Rising customer churn

Even when platforms succeed in attracting new subscribers, retaining them has become significantly harder.

Throughout 2025, subscriber churn has been rising. Deloitte reports that 40% of consumers have cancelled at least one paid streaming service every six months.

The average churn rate among large SVOD publishers, Netflix, Hulu, and Disney+, is at 5.5%, a two-fold rise from 2.9% in 2019.

Churned viewers are not lost forever. 24% of them resubscribe within six months. However, chasing these audiences requires publishers to keep running costly re-acquisition campaigns that erode the bottom line.

The new monetization playbook: adding AVOD to an SVOD service

Viewers reaching the tipping point about the top price they are willing to pay for a streaming service is both a challenge for SVOD providers and an opportunity to explore adding a cheaper ad-supported video on-demand (AVOD) tier.

What is AVOD?

Ad-supported video-on-demand (AVOD) is a revenue model in which streaming video services offer free or low-cost content in exchange for displaying advertisements.

AVOD platforms monetize through targeted ad inventory sold to brands seeking premium, television-quality reach. This revenue model appeals particularly to budget-conscious viewers and brands looking for premium inventory in a fragmented media landscape.

Before Netflix rolled out ad-supported subscriptions, many industry analysts thought that ads would increase subscriber churn by making streaming more similar to linear TV, which it originally branched away from.

However, according to industry signals, viewers no longer mind ads if they can save on subscriptions.

A Marketing Brew survey reported that 80% of consumers would accept ads if video content were completely free.
Two-thirds of consumers surveyed by PwC say they’ll tolerate ads to lower subscription costs
Ad acceptance is rising even among self-proclaimed ‘ad-haters’: 42% of them are now tolerant of ads in streaming platforms.

Audiences primarily want access to more content at lower prices. As households juggle multiple subscriptions, adding AVOD tiers becomes an acceptable, even welcomed, trade-off.

Benefits of expanding SVOD capabilities with AVOD offerings

AdTech is ready for the growth of AVOD inventory

Besides becoming widely accepted by customers, in-streaming ads are heavily sought out by advertisers.

68% marketers now view AVOD CTV channels as “must-buy” items, and demand will likely go up as the programmatic ecosystem for CTV matures.

For now, this growth has been slow; most AdOps teams don’t have dedicated CTV advertising teams, and only 34% of the total CTV inventory is biddable.

But the ecosystem is picking up pace. By early 2026, nearly half of CTV inventory is estimated to be biddable, and 75% of marketers plan to set up internal teams for CTV campaign management by the end of next year.

Both advertiser interest and the rate at which tech capabilities grow are looking good for AVOD publishers.

Build a custom AdTech stack for CTV to get full control of your ad revenue

Explore our CTV capabilities

AVOD is a way to monetize the first-party data SVOD publishers collect

SVOD subscriptions generate high-quality, authenticated first-party data, but the data becomes significantly more valuable when publishers add AVOD capabilities. With both models in place, publishers can use viewer behavior, device usage, genre affinity, and title-level interaction data to create premium audience segments, higher CPMs, direct deals with global brands, and more accurate frequency and reach models.

Real-life example: Disney+ centered its AVOD offering around high-quality first-party data

Disney Advertising has built a suite of high-value ad products on top of its first-party data to attract high-budget advertisers. The publisher’s Audience Graph and Disney Select tools aggregate streaming and other Disney touchpoints into more than 1,000–2,000 first-party behavioural and psychographic segments.

Global advertisers like Chipotle, United Airlines, and T-Mobile tapped into Disney’s metadata and audience graph to insert ads in key emotional moments of Disney content and drive more user attention to their campaigns.

Fueled by growing viewer acceptance, AdTech capabilities, and brand demand, AVOD is becoming the industry standard. Amazon, Disney, Netflix, Paramount, and many other leading streaming services are effectively running ad-supported monetization on top of monthly subscriptions.

Why new publishers should not choose AVOD as their only monetization model

The rise of AVOD may tempt new entrants to skip SVOD entirely and launch as a free, ad-supported service.

In our experience, this is a riskier strategy because building or buying an AdTech stack requires considerable upfront investment, both in engineering capabilities and internal sales teams.

Need for a proprietary AdTech stack

To successfully support AVOD streaming, publishers have to run an ad server in a channel that’s still fragmented and lacks robust AdTech standards.

To appeal to advertisers, publishers also need to circumvent inconsistent CTV measurement, disparate reporting, and a lack of data standardization with custom data pipelines, clean IDs, and cross-screen attribution.

Building a competitive AdTech stack for AVOD will stretch time-to-market and require a considerably higher budget. For a new CTV market entrant, setting up a simple subscription pipeline first and investing all remaining funding into the content library makes more sense in the long term.

Difficulty building engaged audiences

Major SVOD providers who have been experimenting with ad-supported streaming report that ad-supported users watch 22–23 minutes less per day than ad-free homes and churn faster than ad-free tier subscribers.

Not having the support of a more engaged SVOD audience and scaling a streaming service built on less committed viewers exposes publishers to risks in viewership fluctuations and will likely make them less attractive to advertisers compared to services with combined SVOD and AVOD monetization.

How streaming publishers can integrate both SVOD and AVOD monetization

The decision framework for adopting SVOD and AVOD comes from understanding their respective strengths and weaknesses in customer acquisition and content production costs, upfront investment in development, and margins.

Dimension	SVOD (Subscription-focused CTV)	AVOD / FAST (Ad-focused CTV)
CAC (Customer Acquisition Cost)	Medium to high per user, but fully tied to identity Heavy spend on performance marketing, free trials, bundles, and device promos, Each acquisition yields a logged-in, paying account with rich 1P data, enabling predictable MRR/ARPU and strong LTV once churn is under control.	Low to medium per viewer, but weaker identity It’s easier to attract “free” viewers via app store presence, device placement, and channel line-ups. However, many viewers remain anonymous or loosely identified (device-level), so effective CAC per known user is higher than it looks once you adjust for data quality and limited monetization
Cost of content production	High and largely fixed Originals and premium rights are expensive, but subscription cash flows (monthly/annual) give finance teams a clear basis for multi-year content investment. Major streamers explicitly rely on subscriptions to fund high-budget series and films, then use viewing data to optimize future spend.	Medium-high, pressured by CPMs AVOD/FAST can lean more on library content and volume programming, but still faces rising content and rights costs. Because revenue is tied to ad demand and fill rates, there’s less certainty that new content will recoup costs, especially in downturns or when CTV CPMs are under pressure.
Margins on subscription vs ad revenue	Medium–high and more predictable Once a larger scale is reached, incremental subscriptions have high contribution margins. Recurring nature and predictable churn make SVOD publishers attractive to investors as “steady cash flows.	Highly variable Gross ad revenue on CTV can be attractive at high CPMs, but net margin is shaved by rev-share with platforms (e.g., Roku, Amazon, smart-TV OEMs), demand-side fees, data/verification costs, and sales overhead. When ad markets soften, yield compression can sharply erode margin, even if viewership holds.
Engineering costs	Low to medium No ad stack is needed beyond basic marketing analytics. The technical team can focus on product, UX, recommendations, and billing, not advertising infrastructure	High: AdTech is existential for the model AVOD/FAST publishers must invest heavily in SSAI infrastructure, identity resolution (device graphs, household IDs, clean room integrations), and IVT mitigation, because ad fraud and spoofing can directly wipe out revenue and harm demand.
Impact of lower watch time on the bottom line	Moderate impact Lower watch time harms perceived value and increases churn risk, but subscription revenue per user remains partially decoupled from hours watched in the short term. With good retention models, SVOD services can intervene (personalization, promotion, content tweaks) before churn fully hits revenue.	Severe impact Lower watch time immediately reduces ad impression volume, frequency opportunities, and total sellable inventory, slashing revenue almost 1:1. Because AVOD relies on impressions, any drop in engagement directly compresses yield, and there’s no subscription buffer to smooth the hit.
Time to market	Typically faster to deploy A publisher can launch an SVOD app quickly using off-the-shelf OTT platforms. The core needs are content rights, basic apps, billing, and authentication. No ad stack, sales org, or measurement/verification integrations are required to start monetizing; the complexity grows later with scale and bundles.	Typically slower to deploy A credible AVOD/FAST business needs not just content and apps but also SSAI, ad-server/SSP integrations, measurement and fraud partners, sales or programmatic deals, and reporting pipelines. Fully monetizing ad inventory with decent yield takes more time, partners and engineering.

SVOD monetization is easier to build into a streaming platform than an AVOD stack, which is why all leading CTV publishers use it as the default model. It will help lay a strong financial foundation, more predictable retention curves, and a clear playbook for collecting first-party data.

However, in a market where consumer price sensitivity keeps rising and subscription fatigue is accelerating, SVOD is no longer sustainable on its own.

Introducing ad-supported monetization gives SVOD publishers the ability to cut subscription costs and improve user retention while maintaining positive margins and attracting new financial gains through ad revenue.

Five-step framework for SVOD launch and AVOD transition

Drawing from our experience in building CTV solutions, we developed a five-step monetization roadmap that publishers can effectively combine SVOD and AVOD capabilities.

Step 1: Launch with a tight, easy-to-understand subscription offer.

A focused content proposition, simple plans (1–3 tiers at most), and a smooth signup/billing experience across key devices.

Step 2: Instrument data from day one and build a clean first-party data flow.

Require login for all subscribers and track viewing, engagement, churn, and acquisition channels in a unified data model. This first-party data becomes the backbone for later decisions on content, pricing, and, eventually, ad targeting.

Step 3: Stabilize unit economics before touching ads.

Iterate on catalog, recommendations, UX, and pricing until you hit acceptable CAC payback, churn, and LTV/CAC ratios. Only once subscription revenue is predictable and reasonably profitable should you consider adding another monetization layer.

Step 4: Design an ad strategy that complements SVOD.

Introduce an “ad-lite” or AVOD tier as a deliberate segmentation move. Lower price or free with registration, without degrading the value of your flagship ad-free plans. Clearly define which audiences each tier is for and how you’ll move users up the value ladder.

Step 5: Phase in AVOD infrastructure and optimise with SVOD data.

Roll out SSAI, measurement, and IVT/fraud controls incrementally, starting with limited ad loads and a small set of trusted demand partners. Use your rich SVOD first-party data to power targeting, frequency management, and content/ad load optimisation, so ads are a high-yield add-on rather than a structural dependency.

By following these implementation steps, CTV publishers can tap into fast-growing ad budgets without exposing themselves to ad-market whiplash. The services that win this decade will be the ones that continually rebalance the SVOD/AVOD mix, using first-party data, unit economics, and viewer sentiment as their north stars.

The post SVOD, AVOD, or a hybrid model: How streaming platforms can maximize CTV revenue appeared first on Xenoss - AI and Data Software Development Company.

Total cost of ownership for enterprise AI: Hidden costs beyond the API bills

Maria Novikova — Tue, 11 Nov 2025 10:09:48 +0000

Worldwide AI spending will reach 1.5 trillion by the end of 2025. By contrast, the global enterprise software market reached $316.69 billion in 2025. This means enterprises are spending nearly five times more on AI than on the software that runs their core operations.

AI total cost of ownership differs fundamentally from traditional enterprise software economics through technical factors that create hidden cost multipliers:

Computational resource scaling with model parameter growth
Continuous data pipeline processing overhead
Real-time model performance monitoring requirements
Multi-environment deployment complexity
Regulatory compliance automation
Legacy system integration challenges.

Unlike enterprise software expenses, however, AI investments are much more complex. Business leaders often lack a comprehensive understanding of the total cost of ownership (TCO) of developing, deploying, maintaining, and scaling an AI model. That’s why 85% of organizations misestimate AI project costs by more than 10%.

Understanding where your AI budget goes is critical. This guide breaks down AI development cost into six components that determine long-term ROI:

Infrastructure: GPU clusters, auto-scaling, multi-cloud ($200K-$2M+ annually)
Data engineering: Pipeline processing, quality monitoring (25-40% of total spend)
Talent acquisition and retention: Specialized engineers ($200K-$500K+ compensation)
Model maintenance: Drift detection, retraining automation (15-30% overhead)
Compliance and governance: up to 7% revenue penalty risk
Integration complexity: 2-3x implementation premium

By segmenting AI costs, you gain control, visibility, and the ability to make informed trade-offs among speed, accuracy, and efficiency, achieving meaningful enterprise impact through systematic TCO management.

Build vs. partner vs. buy: Decision tree for cost-efficient AI adoption

Before committing to any AI app development cost model, organizations should assess critical factors that determine long-term TCO trajectories:

Is this a proof-of-concept or a long-term goal? Helps you realize the level of commitment required to see a project through.

What’s our acceptable threshold for going over budget on an AI project? Defines a clear stopping point and prevents uncontrolled spending.

Do we have a defined business problem and clear KPIs to measure AI performance? Drives alignment between technical execution and business results, ensuring your AI initiatives remain accountable and measurable.

These assessment questions help you determine whether AI is: a tactical experiment to see where it would be most effective, or a strategic infrastructure investment necessary to solve a budget-draining enterprise problem.

With this understanding, you can decide whether to integrate a ready-made solution, partner with AI vendors, or build an AI solution in-house from scratch.

Architecture strategy comparison

Approach	Initial investment	Ongoing costs	Control level	Time to value	Risk profile
Custom development	High ($500,000 – $2 million)	High (30-40% annually)	Maximum	12–24 months	High technical risk
Strategic partnership	Medium ($100,000–$500,000	Medium (15-25% annually)	Shared	6–12 months	Medium implementation risk
Commercial platform	Low ($50,000–$200,000)	Low-medium (10-20% annually)	Limited	3–6 months	Low technical, high vendor risk

Each decision incurs different costs and AI TCO. A ready-made solution is, of course, cheaper than custom development. However, you shouldn’t focus only on what’s cheaper, but also on how AI prices align with your business goals. For instance, you may realize that although solving your current business problem requires a significant upfront investment, the potential AI ROI is worth it.

The decision tree below shows the pros and cons of each decision, along with the chain of reasoning questions leading to the final decision.

Buy vs. build vs. partner pros and cons

Out-of-the-box purchases work for experiments. Long-term goals require trusted partnerships or in-house expertise.

Once you set your mind in the right direction, you’ll analyze each of the TCO components listed in this article with the clarity of which costs are yours to control and which depend on your partners or vendors.

#1. AI infrastructure stack: Training, inference, and hosting costs

Every respondent to IBM’s study said they had cancelled or postponed at least one of their GenAI projects due to rising compute expenses. To avoid this in the future, 73% of respondents plan to implement centralized monitoring solutions to analyze every aspect of AI computing.

But why do AI infrastructure costs spiral out of control? Because they exhibit non-linear scaling patterns:

Model drift: performance degrades over time, requiring retraining and revalidation and consuming, on average, an additional 15-25% of compute overhead.
Parameter growth and memory scaling: large language models with 70B+ parameters require 140GB+ GPU memory for inference.
Data decay: outdated or low-quality data inflates processing and storage costs.
Hallucination control: continuous monitoring, evaluation, and guardrails add operational load.
Hardware wear and capacity limits: GPU utilization declines as workloads scale and diversify, experiencing 20-40% utilization penalties compared to dedicated deployments.

Each of these factors adds new layers of compute, storage, and monitoring demands that compound month after month.

Where and how you host your models (cloud, on-premises, or hybrid), and whether you run training and inference workloads, determine the long-term cost of implementing AI.

Model training vs. inference costs

Google spends 10 or even 20 times more on inference than on model training. The gap between training and inference costs is so wide because, during model training, you make a high initial investment to ensure enough computing power to run training datasets.

During inference, costs accumulate and increase over time. The more people use the model and the more outputs it produces, the more computing power it needs, and eventually, the more money the company pays.

The cost of training and inference for your business depends on the model type, the number of parameters, and the size of your data. To reduce training costs, you can purchase pre-trained models and expand their capabilities with enterprise knowledge bases based on retrieval-augmented generation (RAG).

Skipping inference won’t be possible, since the model will run in production within your organization. But you can reduce AI software cost by choosing where to host your system.

Cloud vs. on-premises model hosting

Enterprises can host AI solutions in the cloud or on-premises, each with distinct cost implications. Platforms like Amazon Bedrock, Azure AI, and Google Vertex AI simplify deployment by providing managed infrastructure and ready access to pretrained models.

Cloud hosting reduces setup complexity but introduces unpredictable expenses. While spot instances make training and retraining more affordable, inference workloads often drive “cloud bill shocks”, with costs spiking from 5 to 10 times due to idle GPU instances or overprovisioning. As Christian Khoury, CEO of EasyAudit, puts it: “Inference workloads are the real cloud tax; companies jump from $5K to $50K a month overnight.”

A hybrid approach often balances cost and control: running training in the cloud and inference on-premises. Khoury notes: “We’ve helped teams shift to colocation for inference using dedicated GPU servers that they control. It’s not sexy, but it cuts monthly infra spend by 60–80%.”

As of 2025, renting an NVIDIA H100 GPU in the cloud costs $0.58–$8.54 per hour or $5,000–$75,000 per year if used continuously, rivaling the $25,000–$30,000 purchase price of on-premises hardware.

However, on-premises setups require spending on power, cooling, and maintenance, which can add 20–40% to ownership costs unless utilization stays high. The larger and more complex the model, the higher the end-of-the-month bill. But the obvious advantage is that you’re in control of how many GPUs to buy, how many workloads to run, and when to stop or completely change direction without any extra fees.

Cloud vs. on-premises vs. hybrid architecture economics

Deployment model	Training workloads	Inference workloads	Data governance	Scalability	Total cost impact
Public cloud	Optimal for burst capacity	High per-request costs	Limited control	Unlimited	2–4x premium for production scaling
On-premises	High capital investment	Predictable operating costs	Maximum control	Hardware limited	40–60% lower at high utilization
Hybrid architecture	Cloud training and edge inference	Optimized cost structure	Balanced control	Selective scaling	30–50% cost optimization

Research comparing LLaMA models (7B, 13B, 65B) found that less powerful GPUs (like V100s) consume less energy per second but take longer to complete inference. Actual efficiency lies in optimizing both energy use and model performance.

How different models consume energy

#2. Data engineering costs: From manual to automated data processing

The more autonomous you expect your AI system or agent to be, the more data they need. But 49% of organizations view integration with enterprise data as the main bottleneck to AI scaling.

That’s why companies can fall into two extremes, as mentioned in the ISG research:

Boil the ocean: large data transformation projects, costing millions of dollars, to update all data management systems and practices at once.
Bypass the mess: use siloed, unprepared data pipelines to get the project going, but end up with tech debt.

Both extremities are costly and inefficient. The balance is, as always, in between. Data transformation is essential, but only to the extent necessary for a particular AI project and business problem.

Collection, cleaning, and labeling at enterprise scale

On average, up to 13.2% of AI project costs are allocated to data preparation steps, and 43% of chief data officers (CDOs) perceive data quality, completeness, and readiness as the main drivers of AI adoption.

Here’s a cost breakdown of different data preparation stages with real-life use cases:

Data requirement	Description	Real-life use cases	Estimated costs
Data collection & integration	Aggregating information from fragmented internal systems and external APIs to build training datasets. Often includes custom pipeline development, API connectors, and ETL workflows.	A logistics company integrating IoT sensor feeds, warehouse ERP, and shipment tracking APIs into a unified lakehouse for predictive maintenance.	$150K–$500K per year, depending on the number of data sources and pipeline complexity.
Data cleaning & preprocessing	Preparing raw data for analysis, removing duplicates, resolving inconsistencies, enriching metadata, and ensuring schema compatibility.	A retail chain is cleaning millions of sales and inventory records to train demand forecasting models.	$25K–$30K per data analyst annually; 10–20% of total AI budget on preprocessing efforts.
Data labeling & annotation	Tagging text, image, or video data for supervised training or fine-tuning. Costs vary by complexity, domain expertise, and quality assurance.	A healthcare company is labeling 200,000 MRI scans for a diagnostic model.	$0.05–$5 per label (simple → complex); $5K–$10K per project via SaaS platforms like Labelbox, Scale AI, or Labellerr.
Data storage & lifecycle management	Storing structured and unstructured data (including model inputs/outputs) and applying tiered retention policies to control cost.	A pharma company managing high-resolution microscopy and genomic datasets for drug-discovery models, requiring fast retrieval and secure access controls.	$23–$80 per TB/month (cloud storage)
Data governance & compliance	Implementing access control, lineage tracking, and regulatory compliance (GDPR, HIPAA, AI Act). Requires metadata management and policy enforcement.	A financial institution using Databricks Unity Catalog and Collibra to ensure customer data traceability and model audit readiness.	$100K–$300K annually for governance tools and personnel

Costs vary depending on the complexity of the use case and the maturity of the infrastructure. To control them, leaders are choosing synthetic data generation, automated labeling and data ingestion tools, data-quality monitoring systems, and data contract enforcement. Platforms like AWS Glue DataBrew can reduce data preparation time by up to 80%, freeing engineers to focus on model development rather than data cleanup.

By investing in automation and ongoing data-quality control, enterprises cut redundant labor and costs while strengthening the reliability of their AI models.

Data storage and governance

Storing multimodal big data (text, audio, image, and sensor streams) drives continuous spending on compute and memory. Each redundant pipeline increases the total cost of AI development, while poor data lifecycle management results in wasted storage and compliance risks.

Planning for a cost-efficient storage architecture starts with understanding data value over time. Not all data needs to live in the fastest or most expensive tier. Frequently accessed data should be stored in high-performance environments, while historical or low-priority data can be archived in lower-cost tiers.

To ensure AI/ML models have consistent access to relevant structured and unstructured data, enterprises often build consolidated repositories such as data lakes, data warehouses, or hybrid data lakehouses. These architectures simplify data access for analytics and AI pipelines while maintaining scalability.

For example, a global manufacturing company is running predictive maintenance models. The team implemented a tiered storage strategy using Amazon S3 and Glacier: real-time sensor data stayed in S3 Standard for instant access. At the same time, readings older than 90 days were automatically moved to S3 Glacier. This shift cut storage costs by over 60% without affecting model performance.

Architecture type	Cost structure	AI compatibility	Optimization potential
Data warehouse	High ETL processing overhead and rigid schema management	Limited support for unstructured data and constrained to batch processing	20–30% cost reduction possible through warehouse automation, though scalability remains limited
Data lake	Storage-optimized but compute-intensive for large-scale data processing	Excellent support for multimodal and unstructured data with flexible schema evolution	50–80% savings in storage costs are achievable through tiered storage and data processing optimization
Data lakehouse	Balanced storage and compute economics with built-in transactional capabilities	Native integration with ML workflows supporting both real-time and batch processing	Up to 30% reduction in data management costs through a unified architecture that eliminates redundant data movement

Control AI costs without compromising on model performance

Xenoss engineers provide end-to-end AI infrastructure optimization, ensuring every GPU hour delivers measurable business value

Schedule a call

#3. Model maintenance and retraining: The hidden tax of model drift

AI systems require continuous lifecycle management to maintain accuracy, ensure regulatory compliance, and optimize computational efficiency.

Model drift and performance degradation

Over time, AI models tend to lose accuracy as real-world data drifts away from the data they were initially trained on. This divergence, known as model drift, causes models to misinterpret new patterns, “forget” previously learned relationships, and deliver unreliable predictions. Left unchecked, drift can quietly erode ROI by increasing false outputs, compliance risks, and customer dissatisfaction.

One way to mitigate model drift is to fine-tune only some parts of the model rather than invest heavily in full model retraining. The chart below shows what happens when you fine-tune different parts of an AI model:

When you retrain the entire model (far right), it learns the new task well but forgets much of what it knew before.
When you fine-tune only specific layers, such as the self-attention projector (SA Proj.), you still achieve substantial learning gains (+30 points) while losing very little of the original performance.

In business terms, this means not all model updates are worth the cost. Full retraining delivers short-term gains but causes “AI amnesia,” forcing extra rounds of validation, retraining, and maintenance later. Targeted fine-tuning, on the other hand, preserves past accuracy and keeps infrastructure and compute costs lower.

Interdependence between model fine-tuning and its performance

Version control and rollback infrastructure

Maintaining version control for AI models can add another 5-10% to annual maintenance costs. Organizations need a robust MLOps infrastructure to:

Track model versions and their performance metrics
Store model artifacts and training configurations
Enable quick rollbacks when new versions underperform
Manage A/B testing between model versions
Document changes and maintain audit trails

Tools like MLflow, Weights & Biases, and vendor-specific solutions (AWS SageMaker, Azure ML, Google Vertex AI) provide these capabilities but require dedicated resources for setup and maintenance.

Security updates and vulnerability management

As models move into production, every layer of the stack becomes an attack surface: APIs, data pipelines, vector databases, and model endpoints.

The hidden risk lies in how often AI models interact with sensitive or proprietary data. 99% of enterprises inadvertently expose confidential data to AI tools, primarily through third-party integrations and shadow AI usage. Fixing such leaks involves not only technical remediation but also incident response, re-training security teams, and updating governance policies, all of which add both direct and opportunity costs.

Each update, patch, or access policy review adds operational overhead but also reduces the risk of multimillion-dollar compliance fines or reputational damage.

Forward-thinking enterprises are now integrating continuous AI security monitoring, combining model-level access control, encrypted inference, and real-time anomaly detection, to keep systems both compliant and resilient without derailing ROI.

#4. Talent acquisition and team training: The $200k+ per specialist reality

Human capital is among the most essential factors to consider in the AI TCO, encompassing both technical specialists who architect and maintain AI systems and end users who integrate AI capabilities into business workflows

Beyond salary: The full cost of AI teams

According to data from Levels. fyi, compensation by role and experience level for AI specialists in the US is as follows:

Entry-level AI engineers: $150,000-$200,000
Mid-level AI engineers (3-5 years): $200,000–$300,000
Senior AI engineers (7-10 years): $300,000–$500,000
Principal/Staff AI researchers: $500,000–$1,000,000
Applied Research Scientists: $250,000–$350,000

Beyond direct salaries, the true cost of AI teams includes recruitment premiums, retention bonuses, and the ongoing cost of upskilling talent to keep pace with rapidly evolving frameworks and infrastructure.

The competition for senior and applied research talent drives companies to offer equity packages, relocation support, and signing bonuses that can add another 20–30% to total annual spend per employee.

Add to that the cost of turnover, which can reach 50–60% of annual salary when accounting for recruitment, onboarding, and lost productivity. For smaller firms, maintaining a full in-house AI department may be unsustainable without clear ROI metrics or external support partners.

As a result, many enterprises now balance internal expertise with strategic AI partners, outsourcing model optimization, MLOps, or compliance work while retaining only core AI leadership roles in-house. This hybrid staffing model cuts operational costs while preserving technical ownership.

Change management and team training

35% of organizations plan to invest in employee training as a future priority to help employees use AI more efficiently. Expenses on team training span the following areas:

Department-specific training. The Workplace AI institute offers courses for different business functions for $498 each (sometimes available at a 50% discount). Training up to 10 teams can cost around $50,000.
Executive leadership training. An Oxford Management Centre offers a 10-day course for C-suite AI literacy and strategic planning for $11 900.
Cross-departmental integration. AI high performers are 3 times more likely than less progressive enterprises to redesign workflows to maximize AI value. Redesign can involve hiring new people, integrating human-in-the-loop processes, automating tasks, and retraining teams. Overall, costs for cross-departmental AI integration can range from $150,000 to $500,000 (depending on AI system complexity and headcount).
Custom development of AI training materials. To increase AI adoption, you can invest $100,000 – $300,000 in custom training software with gamified or interactive components.

Investments in change management and team training initiatives pay off within 6-12 months as teams get up to speed with AI tools, improve productivity, and contribute to increased company profits.

Save on maintaining an in-house AI team by cooperating with Xenoss experts

We optimize and govern enterprise AI systems with cost efficiency in mind

Request a quote

#5. AI compliance and governance: The 40-80% cost multiplier for regulated industries

Regulations such as HIPAA, GDPR, CCPA, PCI DSS, ISO 27001, and the EU AI Act require that every stage of the AI lifecycle (from data collection and storage to model inference and human oversight) be transparent, traceable, and well-documented. This means extra layers of governance: maintaining data lineage, setting access permissions, logging model decisions, and ensuring the right to explanation for automated outputs.

Here are the differences in violation types and maximum fines of GDPR, EU AI Act, and PCI DSS:

Regulation	Violation type	Maximum fines
GDPR	Breach of data protection obligations (consent, data processing, security, etc.)	Up to €20 million or 4% of global annual turnover, whichever is higher
EU AI Act	Non-compliance with high-risk AI obligations (e.g., lack of transparency, risk management, or human oversight) Use of prohibited AI systems (e.g., social scoring, manipulative surveillance)	Up to €15 million or 3% of global turnover Up to €35 million or 7% of global annual turnover, whichever is higher
PCI DSS	Non-compliance with payment card data security standards, or breach by a non-compliant merchant	$5,000 – $100,000 per month, escalating with time

To minimize regulatory risk, invest early in a unified AI governance framework that balances transparency, data protection, and human oversight. It’s far cheaper to prevent a compliance breach than to pay for one, especially given penalties that can reach up to 7% of global turnover.

#6. Complexity of AI integration into legacy systems

Enterprises still operate on a large number of legacy systems that have been running for decades and are too valuable to abandon for the sake of AI. And then the engineering team faces challenges in integrating modern AI systems with these systems. Completely re-architecting legacy software can make your AI project bill skyrocket.

Rather than rip out core systems, Xenoss advocates for incremental and cost-efficient integration approaches:

Hybrid architecture: Train models in the cloud while deploying inference on-premises to keep data close and latency low.
Middleware layers: Use API gateways or a central AI middleware that links legacy systems and multiple AI services without altering core applications.
Modular AI microservices: Build focused AI capabilities (e.g., document classification, anomaly detection) as independent modules that integrate via standard APIs and leave legacy logic untouched.

These incremental strategies are 2–3x more cost-efficient than full-stack modernization. Instead of spending millions to replace legacy systems, companies can integrate AI into existing systems in stages.

Integrate AI systems into your existing environment without disruption

Request an AI engineering consultation

Measuring AI ROI: From short-term to long-term business impact

After answering: how much does AI cost? The next question is: Is it worth it? The only way to answer is through measurable ROI. Start with small, contained pilots and track how AI impacts key financial, operational, and strategic metrics, from direct savings to decision-making speed.

Category	Metric	What it measures
Financial	Direct savings	Reduction in labor, infrastructure, or operational costs after AI implementation
	Opportunity cost	Lost revenue or efficiency from delayed or failed AI adoption
	Capital efficiency	ROI on infrastructure spending (GPUs, cloud, data stack)
Operational	Time to First Value (TTFV)	How quickly AI delivers its first tangible outcome (e.g., faster reporting, reduced workload)
	Time to Value (TTV)	When AI achieves full operational impact across teams
	Automation rate	Share of workflows or tasks automated by AI
	Decision speed	Time saved in insights, reporting, and execution
	Error reduction	Drop in rework, compliance issues, or output errors
	Employee productivity	Increase in throughput or value delivered per employee
Strategic	Total Economic Impact (TEI)	Overall ROI factoring in flexibility, risk, and payback period
	Customer Lifetime Value (CLV)	Growth in long-term customer revenue due to personalization or retention
	Net Promoter Score (NPS)	Improvement in customer satisfaction and brand loyalty driven by AI-enabled experiences

Together, these metrics form a balanced view of how AI delivers value, from short-term productivity to long-term revenue impact.

Bottom line

Most organizations focus on headline costs like API usage or GPU hours but overlook the dozens of small, recurring expenses that quietly erode ROI over time: data preparation that never ends, retraining cycles caused by model drift, cloud bills inflated by idle instances, or compliance audits that turn into six-figure line items.

This article proves that artificial intelligence cost estimation is predictable when it becomes visible. When you break AI spending into components, it becomes clear that the most expensive part isn’t always the technology itself.

Data engineering, security updates, and people-related costs often outweigh software licensing fees and API bills. For instance, data collection and cleaning can take up over 10–15% of total AI budgets, while high-end AI engineers command $300,000–$500,000 in annual compensation. Meanwhile, maintaining accuracy through retraining and vulnerability patching can add 15–30% to your operational costs each year.

Therefore, the real challenge is sustaining AI use rather than affording the technology. Xenoss provides a detailed estimate of your AI systems’ potential TCO and develops a strategic roadmap to help you stay on budget.

The post Total cost of ownership for enterprise AI: Hidden costs beyond the API bills appeared first on Xenoss - AI and Data Software Development Company.

Real-life digital twins applications in manufacturing and a roadmap for implementation

Maria Novikova — Wed, 05 Nov 2025 09:52:21 +0000

.In fields like automotive and consumer electronics, physical products include digital features.

Users have come to expect regular over-the-air updates and new features.

In automotive, 36% of auto owners want to get over-the-air updates at least once every three years.

With a strained supply chain, a shortage of blue-collar workers, and a turbulent economy, the strain on manufacturers is compounding. Nine in ten manufacturing executives surveyed by McKinsey in 2024reported facing visibility challenges and shortages with their supplier partners.

At the same time, technologies like machine learning (ML) and the Internet of Things (IoT) are becoming easier to implement. AutoML now automates large parts of the ML workflow, reducing manual effort and the need for deep ML expertise.

In IoT, the Matter 1.3. Spec released last year expanded device coverage and can now capture data from a wider range of devices.

Manufacturers are seeking ways to use new technologies to fix operational issues.

In this piece, we will explore how digital twins, platforms that blend AI, IoT, cloud, and advanced analytics, help manufacturers create better, cheaper products, plan operations effectively, and manage multiple facilities from one central hub.

What are digital twins?

Digital twins are the digital representation of the physical world: the factory itself, the final product, equipment, or the supply chain.

Digital twins bridge physical facilities and digital technologies in a unified high-fidelity replica

There’s no single rulebook on what a digital twin should look like. In fact, these platforms come in different shapes and sizes depending on the manufacturer’s needs.

Product twins

Product twins are exact replicas of the final product. They include every part of the physical design and follow the rules of math and physics.

Manufacturers create these systems to help engineers run simulations. These simulations are cheaper and less risky than real-world tests.

Asset twins

Asset twins are models of specific factory assets, like equipment. They connect to IoT sensors and gather data on energy use, device performance, and maintenance needs.

Manufacturers create asset twins to spot early signs of equipment failure. This helps factory operators act before critical machinery stops production.

Factory twins

Factory twins offer a complete view of the factory: its layout, IT systems, and external partnerships with suppliers and distributors.

Manufacturers create these systems to clearly visualize production, optimize planning, and simulate ‘what-if’ scenarios.

As digital twins are adopted in factories and other facilities, they gather more data, which helps streamline operations.

Digital twins are delivering clear ROI in manufacturing

According to McKinsey, three top-of-mind challenges manufacturers face in day-to-day operations are high material costs, labor constraints due to talent gaps, and a lack of end-to-end visibility into factory operations.

Digital twins, especially when augmented with ML and IoT, are growing in popularity as practical and financially feasible workarounds to these hurdles.

Among 100 manufacturing leaders surveyed by McKinsey, 86% believe digital twins can help streamline operations at their factories. 44% already use a digital twin, and 15% are considering implementing one in the near future.

Most executives surveyed by McKinsey recognize the value of digital twins. Nearly half are actively implementing these technologies.

Large language models (LLMs) and enterprise AI agents are now creating new ways for digital twins to improve processes and support business decisions.

Now manufacturers can use predictive analytics capabilities, design simulations powered by ML, or build AI agents that run complex end-to-end workflows with no human supervision.

PwC data proves that AI is making digital twins appear more lucrative to manufacturers. Between 2020 and 2025, digital twin adoption in the sector has grown by over 1,000%.

PwC recorded a +1000% growth of digital twin adoption in manufacturing

How manufacturers apply digital twins: Real-world examples

As factories move towards increased digitization, the value that digital twins can bring to facilities is growing exponentially. McKinsey reports that its clients are using digital twins to fully revamp production schedules and cut monthly costs by up to 7% by compressing overtime requirements.

Digital twins are equally powerful at pinpointing production bottlenecks and supporting facility managers with recommendations on product line sequencing, warehouse storage, and capacity management.

Let’s review four promising digital twin applications and real-world case studies that highlight the impact of this technology.

#1. Improving productivity with real-time production simulations

Manufacturers use digital twins to create digital replicas of factory floors and test responses to production challenges in these environments.

For example, these platforms help factory managers assess the consequences of equipment malfunctions and create response checklists to reduce downtime. Similarly, team leaders can test and measure the impact of changes in maintenance schedules, headcount, and other impactful events.

Real-world impact: BMW’s iFactory is a high-fidelity mirror of real-life facilities and processes. It fully reflects the company’s production pipeline, logistics, and supply chain, and helps simulate the impact of disruption in these areas.

The automaker is now integrating generative AI into the digital twin to simulate a broader range of scenarios and suggest effective troubleshooting strategies based on its up-to-date component catalog, supplier list, quality assurance checklists, and other data.

Building and testing high-fidelity product prototypes

Aerospace or automotive manufacturers working on high-complexity products have limited room to experiment and test real-life components. If a company, like SpaceX, aims to apply the “fail fast” approach to high-stakes manufacturing, R&D costs skyrocket.

The cost of each failed test is estimated at $90-100 million. In 2023, the company spent $2 billion on Starship R&D alone.

Digital twins help large aerospace and automotive manufacturers reduce R&D costs by creating high-fidelity product replicas that enable engineers to test product development decisions and validate new design choices across a range of real-world conditions, from everyday to extreme.

Real-world impact: Airbus engineers use digital twins to simulate how concepts would perform under real-world conditions. The system ingests real-time data from the company’s in-service aircraft and uses it to make predictions.

We’re effectively building each aircraft twice: first in the digital world, and then in the real one.

Airbus Newsroom statement

The digital replicas of Airbus planes enabled the manufacturer to cut time-to-order by spotting quality issues and fixing them before they require time-consuming maintenance interventions.

The company’s digital twin platform, Skywise, now hosts over 12,000 aircraft replicas and helps streamline operations for the company’s 50,000+ employees.

#2. Creating a connected environment in the factory

A manufacturing pipeline is a process with multiple moving parts: component sourcing, product design, equipment maintenance, process scheduling, quality assurance, feedback loops, and more.

Each of these is typically managed by a separate team, supported by dedicated technologies, and frequently spread across multiple facilities. The more complex the process becomes, the more fragmentation and lack of end-to-end visibility become a challenge.

Becoming a centralized control tower that breaks silos and gives teams access to the big-picture view is perhaps the most promising application of digital twins.

A single platform will now give manufacturers access to all relevant data:

Product development: concept designs, preliminary tests, cost, and production time estimates

IT systems: a single access point to computer-assisted design (CAD) software, warehouse management platforms, advanced planning and scheduling software, supplier databases, and other components of the company’s technology stack.

Supply and procurement data: vendor performance metrics, material lead times, purchase order histories, pricing fluctuations, and supplier quality ratings

Distribution and logistics logs: shipment tracking records, delivery times, transportation costs, route optimization data, and warehouse inventory movements

Sales, marketing, and customer service insights: demand forecasts, order patterns, product returns, warranty claims, customer feedback, and market trend analytics

Digital twins connect the company’s internal data and external suppliers

Real-world impact: Digital twins help BASF, the world’s largest chemical manufacturer, break down data silos at its production site in Antwerp.

This is BASF’s second-largest factory in the world, managed by over 3,500 employees and supporting over 50 production pipelines.

For over 20 years, BASF has used a digital twin platform, Smart Sites, to connect data from hundreds of sources, including CAD software, building information modeling (BIM), ERP, and workforce management systems.

The digital mirror of the factory’s structure and operations gives factory teams instant access to data, contributes to faster decision-making, and keeps everyone on the same page.

#3. Generating new data to get insight into production

Besides effectively applying real-world data to accelerate product development, digital twins are a powerful source of synthetic data that drives real-world testing and R&D.

Consider glass melting — an area of manufacturing known for the difficulty of achieving optimal production conditions. The temperature inside a melting furnace is approximately 1600 ℃, which is higher than the melting point of standard silicon sensors.

Digital twins help glass manufacturers simulate the conditions inside the melting furnace without relying on sensor data and create reliable data using physics, mathematics, and, most recently, ML models.

Digital twins help AGC Japan test production conditions in glass furnaces

Real-world impact: AGC, a Japan-based glass manufacturer, piloted COCOA, a digital twin model that generates production-ready synthetic data on glass flow properties based on the melting furnace’s temperature distribution.

AGC technicians use this data for the preliminary studies of production conditions. Before embracing digital twin technology, the company had to bring in simulation specialists and invest additional time and resources to estimate glass flow accurately. Now AGC can get similarly accurate estimates at a fraction of the cost.

AGC plans to expand the system beyond glass flow and furnace temperature estimation and use it for sustainability monitoring and GHG emission reduction.

Digital twin architecture: A combination of data, technology, and processes

As control towers that provide visibility into processes, R&D, and quality assurance, digital twins sit at the intersection of a manufacturer’s data, the other technologies the factory uses, and the processes embedded within the organization.

To accurately represent the factory’s day-to-day operations, digital twins require a multi-layer architecture that combines secure data ingestion and processing, seamless interaction with the rest of the stack, and frictionless embedding into the organization.

1. The data layer

Inventory, production, demand, and other data types are the backbone of digital twin architecture.

A data pipeline supporting digital twins comprises four components: ingestion, transformation, loading, and application.

Data ingestion

Data ingestion is the gateway that validates, transforms, and routes diverse data streams from sensors, machines, enterprise systems, and manual inputs into a unified structure that the digital twin can process.

There are two standard approaches to data ingestion – batch and streaming processing.

Batch vs streaming processing

Batch processing ingests large volumes of data in groups at scheduled intervals. Several chunks, or batches, of data are collected first and then processed together.

Streaming processing ingests data continuously in real-time as it arrives. It enables immediate analysis and response to individual data points or small batches.

Although streaming processing has been gaining traction, in some cases, batch processing is still more optimal.

For example, a pharmaceutical manufacturer might use batch processing to compile end-of-day quality control reports that aggregate test results, environmental conditions, and ingredient lot numbers from all production lines.

Regulatory entities like the FDA, which may request these files during an inspection, will value data completeness over instant access, so batch processing is a better fit here.

On the other hand, streaming data ingestion is critical for real-time monitoring, such as tracking temperature fluctuations in injection molding processes, detecting vibration anomalies in CNC machines, or monitoring conveyor belt speeds. Access to immediate insights gives factory managers the room to intervene rapidly and prevent defects or equipment failures.

Data transformation

Collecting data from multiple sources exposes organizations to fragmentation because suppliers, technology vendors, and factory teams store their logs in different formats.

To ensure these disparate data points can be viewed in an integrated dashboard and applied to business intelligence decisions, data engineers must double down on data normalization, structuring, and cleaning.

Here are the steps data engineering teams should follow to keep high data quality standards.

Data modeling. Engineers define standard schemas and taxonomies that map disparate source systems to a unified data structure, creating consistent naming conventions and hierarchies for equipment from various vendors.

Normalizing units and scales. All measurements should be converted to standard units. It’s also a good practice to align time zones across global facilities to keep accurate production logs.

Implementing validation rules by setting acceptable ranges and thresholds for each data type to automatically flag outliers or impossible values.

Protocols for handling missing data. Teams can choose from several methods to fill data gaps: interpolation, forward filling, flagging for manual review, or rejecting incomplete records.

Documenting data lineage. Track the origin, transformations, and quality scores of each data element so operators understand the context behind digital twin insights

Build a scalable and secure real-time data infrastructure for your digital twin!

See our data engineering capabilities

Data loading

Manufacturers typically load ingested and normalized data from digital twins into a centralized data warehouse or data lake architecture designed for industrial analytics.

These repositories will lay the foundation for advanced analytics, ML models, and business intelligence tools.

Cloud-based platforms like AWS, Azure, or Google Cloud are popular choices for large manufacturers because they offer scalable storage and computing power to handle the massive volumes of time-series data, sensor readings, and operational metrics generated by digital twins.

On the other hand, some manufacturers prefer hybrid storages that keep on-premises data centers for sensitive operational data and use cloud infrastructure for less critical analytics workloads.

In the table below, we recap the benefits of both approaches and optimal use cases for each.

Approach	Key challenges	Main drawbacks	Optimal use cases
Cloud-only	• Dependency on internet connectivity • Data sovereignty and compliance concerns • Latency for real-time operations	• Vulnerable to network outages • Ongoing subscription costs • Less control over data location • Potential security concerns	• Distributed manufacturing sites • Scalable analytics needs • Limited IT infrastructure • Collaboration across locations
Hybrid	• Complex architecture management • Data synchronization between environments • Higher initial investment	• Requires skilled IT staff • Integration complexity • Duplicate infrastructure costs • Higher maintenance overhead	• Sensitive or proprietary data • Mission-critical operations • Strict regulatory requirements • Low-latency control systems • Large enterprises with existing infrastructure

2. The application layer

Vendors get to fully leverage the value of digital twins once they, in addition to enabling it with real-time data access, build a layer of capabilities on top of a robust data pipeline.

After digital twins ingest production data, it should be connected to core capabilities like simulation and predictive analytics

Although it’s a good idea to tailor digital twin capabilities to a manufacturer’s needs and use case, below we offer a blueprint with some high-yield features.

Simulators allow manufacturers to test different scenarios and operational changes in a virtual environment before implementing them in the physical facility. Running preliminary tests on a digital twin reduces the risk of failed real-world tests and cuts down the total cost of adopting organization-wide changes.

Advanced analytics tools process vast streams of real-time data from connected assets to identify patterns, predict equipment failures, and uncover optimization opportunities.

Twin management platforms enable centralized control and orchestration of digital twins across multiple facilities, creating a fully unified control tower for manufacturers.

Low-code capabilities enable domain experts and operators to create and modify twins without extensive programming knowledge. Embedding low-code capabilities, possibly supported by generative AI coding assistants such as Cursor or Microsoft Copilot, into the digital twin platform helps accelerate development and reduce reliance on IT.

Event management tools automatically detect, prioritize, and route alerts from digital twins to the appropriate personnel, enabling faster responses to anomalies and quicker resolution of critical issues before they cause downtime at the production site.

Connect factory data, technology, and operations in one digital twin!

Xenoss engineers will build a digital twin that forecasts risks, tracks productivity, and brings all moving parts together.

Talk to our experts

Stacking these capabilities on top of one another allows manufacturers to build complex, use-case-specific features.

For instance, McKinsey shares an account of a manufacturing team that built a time tracker to monitor how long each step in the pipeline is idle and cut down on such delays.

3. The process layer

Digital twins are by definition separated from the manufacturer’s real-life pipelines. That’s why teams need to put extra effort into bridging the two components.

The two milestones team site leaders should strive to reach are:

Digital twins accurately represent the day-to-day realities of the factory. All data is up-to-date, and employees actively interact with the digital layer to make sure it is basically identical to the factory floor.
On-site teams actively leverage insights that digital twins and simulators supply them with and use them to improve productivity and build higher-quality products.

To reach this state of alignment, factory leaders need to commit to upskilling team members who previously relied on manual work, create checklists documenting how real-world production should be augmented with digital twins, and establish task forces to expand digital twin adoption across the organization.

The table below dives into process-related challenges factory managers face when adopting digital twins and mitigation strategies that facilitate adoption.

Challenge	Description	Mitigation strategies
Lack of standardized workflows	Inconsistent processes across departments create integration difficulties and silos that hinder digital twin effectiveness	- Establish cross-functional governance committees - Document standard operating procedures - Implement change management protocols before deployment
Unclear ROI measurement	Difficulty quantifying benefits makes it hard to justify investments and measure the success of digital twin initiatives	- Define specific KPIs upfront (OEE improvements, downtime reduction, energy savings) - Implement phased pilots with measurable outcomes - Track metrics consistently
Resistance to process changes	Operators and managers are hesitant to modify established workflows and adopt new data-driven decision-making approaches	- Involve frontline workers early in design - Provide comprehensive training - Emphasize how digital twins augment rather than replace human expertise
Integration with existing systems	Connecting digital twins to legacy MES, ERP, and SCADA systems requires extensive customization and process alignment	- Use middleware and APIs for gradual integration - Prioritize systems with the highest impact - Consider phased implementation rather than full replacement
Maintenance of twin fidelity	Keeping digital representations synchronized with physical asset changes and process modifications over time	- Establish update protocols when physical changes occur - Assign ownership responsibilities - Schedule regular validation reviews - Automate synchronization where possible

Bottom line

Successful digital twins deliver tangible ROI for manufacturers like BMW, BASF, and Airbus by increasing factory-floor visibility, running R&D simulations before committing to expensive real-world tests, and predicting the impact of workplace bottlenecks.

Even though executives are showing interest in adopting digital twins, challenges such as poor data validation, employee resistance, and difficulties integrating them with the rest of the tech stack are holding them back.

One way to address these challenges is by building a smaller-scale digital twin prototype with a few data sources and application connectors. Limit its adoption to non-mission-critical processes to iron out their data infrastructure, application layers capabilities, and organizational workflows without risking factory disruption.

Once the simulation starts capturing value, consider gradually expanding the number of data sources and built-in capabilities. High-complexity features like predictive analytics should be introduced once the baseline digital twin is operational and part of factory operations.

The post Real-life digital twins applications in manufacturing and a roadmap for implementation appeared first on Xenoss - AI and Data Software Development Company.

AI hallucinations in production: The problem enterprises can’t ignore

Maria Novikova — Mon, 13 Oct 2025 10:07:35 +0000

A Med-Gemini model once made up a brain part, “basilar ganglia”, by merging two real ones, “basal ganglia” (helps with motor control) and “basilar artery” (transfers blood to the brain). It even diagnosed a patient with a non-existent condition: “basilar ganglia infarct”. If missed, this seemingly minor error could mislead a radiologist, resulting in dangerous treatment or a lack of it.

When using AI for decision-making in customer service, legal, healthcare, or financial industries, frequent AI hallucinations can undermine the value of AI and further investment in this technology. For instance, in legal space, AI hallucination rates can range from 69% to 88% and that’s for highly customized models.

A recent OpenAI study reveals that AI models hallucinate because they guess instead of admitting they don’t know something, a behavior similar to that of students during tests. However, unlike students’ errors, AI hallucinations in business can lead to severe consequences, including compliance violations, brand damage, lawsuits, loss of customer trust, or human health risks.

In 2022, Douglas Hofstadter, an American cognitive scientist, said that “GPT has no idea that it has no idea about what it is saying.” ChatGPT hallucinations are a case of double ignorance, but it can be controlled.

Our in-depth analysis examines what AI hallucinations are in production, their business implications, and potential mitigation strategies. While entirely eliminating hallucinations may be impossible, strong pre-training, training, and post-training validation can lead to near-perfect AI outputs.

Understanding AI hallucinations in enterprise systems

Broadly, AI hallucinations can be divided into two categories: factuality and faithfulness hallucinations. Factuality hallucinations occur when the output differs from verifiable real-world facts, such as claiming that the USA has 52 states instead of 50.

Faithfulness hallucinations occur when the AI model fails to consider the prompt context and deviates from the instructions, such as when an AI assistant, instead of fetching requested data from the CRM, pulls it from an Excel spreadsheet, thereby frustrating the sales team.

In fact, 47% of employees are concerned about the decisions their companies make based on AI outputs. As in the case with the Med-Gemini model, clinicians said that they don’t trust human judgment enough to verify every AI output, as validation also requires experience and time, which some medical workers may lack.

On a more granular level, enterprise teams can encounter the following hallucinations:

AI hallucinations examples and types

Type	Description	Enterprise relevance
Extrinsic vs intrinsic hallucination	Extrinsic: output claims facts not present in the input/knowledge base; Intrinsic: output contradicts itself or internal logic	Particularly dangerous when the model contradicts known policies/data
Contextual or domain hallucination	The model misinterprets domain-specific jargon or context and “hallucinates” domain-specific facts (e.g., inventing a regulation name)	High in regulated industries (finance, healthcare, legal)
Overconfident misstatements	The model expresses certainty about a statement that is incorrect	Users may not question it and propagate errors across the enterprise
Citation or reference hallucination	The model fabricates references, DOIs, court cases, whitepapers, or internal document identifiers that don’t exist	Misleads audits, research, and compliance

How hallucinations differ from traditional software bugs

There is a three-fold approach to understanding AI hallucination examples when compared to traditional software systems:

Origin. Traditional software failures follow predictable patterns. A database query either returns correct results or fails with an error message. By contrast, AI hallucinations generate outputs based on probabilistic patterns, meaning that an LLM estimates the most statistically likely next word in a sentence based on the knowledge it gained from the training data. That’s why an AI system confidently provides incorrect information that looks completely legitimate, as it’s convinced that this output is correct.

Behavior. Traditional systems work and fail predictably, but an AI solution is a black box. Data science teams can impact AI models during pre-training, training, and post-training, but the process of running queries remains a mystery.

Detection. System administrators can debug traditional software using logs, stack traces, and reproducible error conditions. Hallucinations require domain expertise to identify and often slip past technical reviewers who lack subject matter knowledge.

What are the possible business consequences of frequently dealing with AI hallucinations?

Minimize hallucinations in your custom AI systems with our experienced AI engineers

Talk to AI experts

Business risks from AI hallucinations

Beyond the common financial losses that companies often incur due to AI hallucinations, the latter can also lead to regulatory penalties, damage customer relationships, and expose organizations to litigation.

Brand value destruction through hallucination incidents

Market reactions to AI-generated hallucinations demonstrate how quickly fabricated information can destroy enterprise value. Google lost $100 billion in market capitalization within 24 hours after Bard provided incorrect information during a product demonstration.

The way customers see things is more important than just getting the technical details right. Users don’t distinguish between “The AI made an error” and “Your company published false information.” They hold the brand accountable for every piece of content delivered through official channels.

As was the case in the famous incident with Air Canada, when the company sought to avoid responsibility for the false information provided by their chatbot to a customer, claiming that the technology is a “separate legal entity.” However, the British Columbia Civil Resolution Tribunal took a different view and found AI Canada liable for misinformation, awarding a fine.

Recovery from hallucination-driven reputation damage often requires months of remediation efforts, customer communications, and process changes, which can cost significantly more than the original incident.

Compliance exposure in regulated industries

Healthcare and financial services face amplified risks because AI hallucinations can trigger regulatory violations with severe penalties.

For instance, 77% of US healthcare non-profit organizations identify unreliable AI outputs as their biggest obstacle to deployment.

Medical AI hallucinations can lead to incorrect treatment recommendations, diagnostic errors, and patient safety violations.

Financial services companies face similar compliance challenges when AI systems generate incorrect regulatory reports, miscalculate risk exposures, or provide false customer information that violates consumer protection laws and regulations.

The regulatory environment continues to tighten as agencies recognize AI-specific risks and develop enforcement frameworks that hold enterprises accountable for automated decision-making systems.

To address these risks effectively, organizations should treat AI hallucinations seriously and examine the root causes driving unreliable outputs within large language models: from limitations in training data to architectural and operational design choices.

Risk area	Warning signs	Quick fixes
Brand trust	Customer complaints about AI errors	Add HITL reviews + disclaimers
Compliance	AI generates regulated content	Implement RAG + automated fact-checking
Financial/Legal	AI used for contracts/advice	Human validation for all outputs
Operational	AI drives workflows (e.g., CRM)	CoT prompting + flagging uncertain outputs

Root causes behind hallucinations in enterprise LLMs

When deploying enterprise LLMs, organizations need to understand why hallucinations occur to build effective safeguards.

Training data limitations and noise

AI models reproduce and propagate every flaw present in their training data. If you train models on datasets containing biases, errors, inconsistencies, or incomplete information, you’ll see those same problems amplified in production AI outputs.

Static training data creates another business challenge, as models lack up-to-date knowledge after their training cutoff and can produce inaccurate outputs. AI systems show higher reliability and prove more effective when trained on extensive, relevant, and high-quality data.

To the question of what exciting things a Dell team is doing with AI, their CEO, Michael Dell, responded by emphasizing the importance of data:

The fun thing about your question is that almost anything interesting and exciting that you want to do in the world revolves around data. If you want to make an autonomous vehicle or advance drug discovery with mRNA vaccines, or you want to create a new kind of company in the financial sector, everything interesting in the world revolves around data. All of the unsolved problems of the world require more compute power and more data, and this is why I love what we do.

To feed custom AI models with high-quality data, enterprises should implement robust data governance frameworks that include regular auditing for biases and continuous quality monitoring throughout the AI lifecycle. It’s better to identify and address data issues before they manifest as model hallucinations.

Additionally, by implementing real-time data integration pipelines, you can keep models current with the most up-to-date information, particularly in specialized or rapidly changing domains.

Stochastic generation and next-token prediction

LLMs are stochastic in nature, meaning they operate in a world of controlled randomness, where each content generation involves selecting from multiple possible tokens (or words in a sequence). That’s their beauty and curse at the same time. On the one hand, it helps them produce creative, uncommon, and personalized responses. On the other hand, the probability of AI hallucinations increases. That’s why the more sophisticated and verbose AI models get, the higher the chances of hallucinations.

The best solution here is to stop treating LLM outputs as deterministic software responses. Heeki Park, a Solutions Architect with more than 20 years of experience, suggests that you should focus on how best to tackle a problem, whether by prompting a model or by writing code:

When considering whether to write code or to prompt a model within agents, let’s first define the problem space as it pertains to hallucinations, then discuss scenarios when one or the other is appropriate. When leveraging models for reasoning and task execution, remember that the output is non-deterministic. Agent developers could certainly lower certain parameters, like temperature, to reduce how stochastic the response is, but it still has some degree of randomness in the response.

In scenarios where your use case requires absolute determinism, i.e., the same exact output every time with mathematical precision, then it’s likely appropriate to write code for the task or tool, as code execution is deterministic. For example, if you have a dataset on which you want to perform statistical analysis, you should write code with standard analytical packages to do that work. That said, you could certainly use an AI assistant to help you write that code.

On the other hand, if you are conducting work that is fuzzier in its output, e.g., summarizing an academic paper, extracting insights from a financial analysis paper, then this is a scenario where models excel and could be a great tool for knowledge extraction.

Thus, depending on the level of determinism, your current problem needs, you select either a coding (could be with the help of AI) solution or a prompting one.

Temperature settings and prompt ambiguity

Model configuration, such as setting the temperature, can also affect hallucination frequency. The temperature hyperparameter controls randomness in token selection: lower settings (0.2-0.5) produce more predictable outputs, while higher values (1.2-2.0) increase creativity but simultaneously raise hallucination risks.

Ambiguous prompts with unclear terms or missing context also often trigger inconsistent or incorrect responses. This AI hallucination problem compounds when prompts contain negative instructions that introduce “shadow information,” confusing the model. Inaccurate prompts outweigh the temperature setting, as even reducing temperature values shows only minor improvements in handling ambiguous queries.

This presents a dilemma for enterprises: the same creativity settings that make AI outputs engaging also increase the likelihood of producing false information.

It’s essential to strike a balance between temperature settings and prompt details, so as not to overwhelm the model with too much information or deprive it of its creative capabilities.

To achieve this, work with an expert data science team that can perform thorough testing and validation during model training and define those model parameters that work for your business and data.

Lack of grounding in external knowledge sources

LLMs randomly manipulate symbols without a genuine understanding of the physical world. This fundamental limitation produces outputs that appear coherent but may disconnect entirely from reality. Without external verification mechanisms, models cannot validate their generated content against trusted sources.

Knowledge Graph-based Retrofitting (KGR) presents a promising approach, enabling models to ground their responses in external knowledge repositories and reduce factual hallucinations.

Mitigation strategies for reducing generative AI hallucinations

AI hallucinations aren’t inevitable. With the right safeguards, enterprises can reduce errors by 70% or more. Here are the most effective approaches.

Retrieval-Augmented Generation (RAG) integration

Apart from KGR, RAG techniques can also provide LLMs with access to verified knowledge sources, such as external or internal documentation, enabling models to access them in real time.

RAG implementation involves connecting AI systems to enterprise knowledge bases, product catalogs, or regulatory databases. When a query arrives, the RAG system retrieves relevant documents first, then uses that context to generate responses. There are three distinct types of RAG-based LLM architectures: Vanilla RAG, GraphRAG, and Agentic RAG.

Vanilla RAG is effective for simple queries (e.g., “What are the key benefits of our insurance plan?”) with datasets stored in vector databases for simplified retrieval. However, this approach isn’t capable of differentiating between data types, such as sensitive, regulatory, or customer data.

GraphRAG connects disparate data in a unified graph, with clear relationships between datasets, to enable more complex queries, such as multi-hop reasoning queries (e.g., “Which suppliers are linked to vendors involved in delayed shipments last quarter?”).

And Agentic RAG is a multi-agent LLM architecture, where each agent is responsible for a particular set of data, such as regulations, marketing, or customer support, and can provide more precise responses to specialized queries (e.g., “Does our latest marketing email comply with GDPR guidelines?”). These systems are easily scalable, as the more difficult and domain-specific queries become, the more agents an organization can add.

Depending on the complexity of your use cases, data quality, and budget constraints, Xenoss can help you select the most efficient RAG approach.

Chain-of-thought prompting

Step-by-step reasoning processes help models break complex problems into verifiable components and produce more accurate outputs. One example is chain-of-thought (CoT) prompting, which guides AI systems through logical sequences, making reasoning transparent and reducing errors in multi-step calculations. Below is an example of CoT with a simple math task.

Standard prompting compared to CoT prompting. Source: arxiv

For instance, in financial analysis or legal research applications, CoT prompting requires models to show their work in a step-by-step manner: “First, I’ll identify the relevant regulation. Second, I’ll analyze how it applies to this scenario. Third, I’ll determine the compliance requirements.” This approach helps models keep a continuous focus on user instructions.

However, even with CoT, organizations should validate outputs, particularly for high-stakes decisions.

Context engineering

Context engineering is an emerging discipline that extends beyond simple prompt engineering. As Andrej Karpathy notes, it’s the “art and science of filling the context window with just the right information for the next step.”

In practice, context engineering means curating every piece of data the model sees, from task instructions and few-shot examples to retrieved documents, historical state, and tool outputs.

For example, a clinician can make the following prompt: “Summarize a patient’s record in under 100 words”, and include a few examples of correctly formatted summaries for the model to imitate the style and structure. A clinician can also attach their previous human-written summaries (to serve as historical records) for the model to produce the most up-to-date output.

By ensuring the model operates within a precisely framed, verified, and relevant context, organizations can drastically reduce hallucinations caused by missing, outdated, or noisy information.

Unlike generic prompting, which often leaves the model guessing, well-designed context engineering provides AI systems with the right evidence at the right time, thereby improving factual accuracy, model stability, and overall trustworthiness.

However, you should keep in mind that context engineering comes with its flaws, as Heeki Park puts it:

When building agentic applications, context is important for ensuring that agents have the ability to provide responses that are personalized and targeted. However, context engineering is emerging as an important skill to ensure that the agent has just the right amount of context.

There are issues that can arise with context, even in the presence of a memory system, e.g., context poisoning (a hallucination or other error makes it into the context), context distraction (context gets too long), context confusion (superfluous or irrelevant content is used), context clash (information or tools conflict). Memory doesn’t solve those context issues. Context engineering needs to be applied to prune and validate that the appropriate context is maintained for the lifecycle of a session or user interaction.

To avoid these issues and prevent hallucinations, both context and prompts should be thoroughly checked and evaluated.

Human-in-the-loop review workflows

Human-in-the-loop (HITL) validation focuses on factual accuracy, contextual appropriateness, and potential bias issues before AI outputs reach end-users. HITL involves:

Automated flagging of inappropriate or incorrect outputs
A basic human review for edge cases and to catch errors that automated systems may miss
Validation from subject matter experts (SMEs) for domain-specific queries

Financial institutions often require compliance officers to sift through AI-generated outputs, while healthcare organizations require clinical staff to approve AI-assisted diagnoses. The key is matching reviewer expertise to the domain where AI operates.

The combination of all three HITL approaches is the most effective way for a comprehensive evaluation of AI outputs. You can set custom rules as to when each HITL pattern should be triggered (e.g., automated flagging for factual inaccuracies, SME validation for business-critical decisions, and basic human review for ambiguous queries that require human resolution).

Monitoring hallucinations with feedback loops

Continuous monitoring analyzes production conversations, comparing AI responses against known facts and flagging suspicious outputs for review and further investigation.

These feedback loops create learning opportunities. When reviewers correct AI mistakes, those corrections improve the system’s future performance.

All of these mitigation strategies are theoretically sound, but how do different companies apply them in practice to increase AI reliability?

Enhance AI systems with enterprise-specific guardrails to reduce errors

Prepare your data, align context strategy, and ensure every model output meets your business standards.

Request a call

Real-world success patterns

Successful AI deployments mean that business value far outweighs the occurrence of errors or hallucinations. The following companies have developed methods to control hallucination rates, ensuring they don’t undermine the value of AI.

Truist Bank’s approach: Building trust in AI through human validation

Financial institutions process millions of transactions daily while maintaining strict regulatory compliance. To implement AI effectively without incurring brand damage, they should have rigid safeguards in place.

Chandra Kapireddy, head of generative AI and analytics at Truist Bank, shares his reflections on how to restrain AI hallucinations. In particular, their company places a strong emphasis on human oversight for high-stakes decisions.

…whenever we build a GenAI solution, we have to ensure its reliability. We have to ensure there is a human in the loop who is absolutely [checking the] outputs, especially when it’s actually making decisions. We are not there yet. If you look at the financial services industry, I don’t think there is any use case that is actually customer-facing, affecting the decisions that we would make without a human in the loop.

Truist Bank has established a set of rules for employees to employ AI, a cross-company AI policy, and a training program that helps AI users create accurate prompts and understand the flow of output verification.

The company holds their employees accountable for making decisions based on AI without first verifying its output. When everyone in the company is on the same page and understands the consequences of misuse, it’s easier to control AI and prevent financial or reputational damage.

How Johns Hopkins improves AI reliability in critical care decision support

With the increasing volume of medical data and the need for rapid diagnosis and treatment, medical organizations see considerable promise in AI. But hallucinations can pose a risk for the healthcare setting and harm patients.

To avoid such scenarios, researchers at Johns Hopkins Medicine are exploring ways to efficiently use healthcare AI. For instance, to address a pressing issue in predicting delirium in patients in an intensive care unit (ICU), they developed two models: a static and a dynamic model.

A static model provides outputs based on data provided by the patient after admission to the hospital, and a dynamic model works with real-time patient data. As a result, the static model’s accuracy was 75%, while the dynamic model showed a staggering 90%. This proved the effectiveness of feeding models with real-time internal data to increase their reliability and accuracy.

Before launching models into production, the team thoroughly tested and validated their outputs across different datasets.

These implementations demonstrate that hallucination risks can be managed through systematic validation, human oversight, and feedback mechanisms that continuously improve system reliability.

Bottom line

AI hallucinations present enterprise leaders with a clear choice: address the risks proactively or discover them through costly business disruptions.

By addressing the root causes of hallucination with high-quality data ingestion, the right choice of determinism level, and optimal temperature settings, enterprises can prepare to implement near-perfect AI systems. And RAG, prompt and context engineering, HITL, and continuous monitoring are effective strategies for reducing AI hallucinations in production environments and mitigating issues in post-production.

When applied together, all of the above practices create a reliable AI lifecycle. Over time, organizations move from reactive error correction to proactive quality assurance, ensuring AI systems remain trustworthy as they scale.

The post AI hallucinations in production: The problem enterprises can’t ignore appeared first on Xenoss - AI and Data Software Development Company.

Is MCP ready for enterprise adoption? Use cases, security, and implementation challenges

Maria Novikova — Mon, 15 Sep 2025 17:25:42 +0000

Besides OpenAI’s GPT, barely any technology had such a ripple effect on the LLM ecosystem as Anthropic’s Model Context Protocol, or MCP.

At the time of writing, every week, 6.7 million users download the TypeScript MCP SDK, and over 9 million developers download the MCP Python SDK. The GitHub topic ‘model-context-protocol’ lists over 1,100 repositories. There are over 16k active MCP servers, and new ones are created every day.

All leading LLMs, IDEs, and agent-to-agent communication platforms added MCP support. Cloud providers, Azure and AWS, rolled out services that enable building MCP workflows.

All this momentum makes MCP look like it could become the go-to standard for enterprise AI systems.

But just because a technology is popular doesn’t mean it’s ready for enterprise use. Companies need to think carefully about whether it’s actually production-ready, secure enough, and can scale properly.

In this post, we are going to examine how enterprise organizations in finance, media, and tech are building scalable MCP applications.

We will shed light on the shortcomings of the Model Context Protocol that complicate its enterprise adoption and explore the solutions to these problems.

How MCP took over AI protocols

When MCP arrived in late 2024 (and went viral in early 2025), engineers already had workarounds that allowed AI agents to call tools.

LangChain and LangGraph help accomplish the same purpose. OpenAPI is the older implementation of the same principle.

But MCP brought something different to the table. Instead of just describing how to call a tool, it handles the entire process, from connecting to the tool, running commands, and bringing the results back into your AI agent’s context.

MCP adoption is outpacing LangChain, LangGraph, and OpenAI’s API

The developer community has embraced MCP quickly, though it’s still catching up to more established frameworks in terms of overall adoption numbers.

Why MCP is a big deal for AI agents

The goal of MCP is to connect agents with any third-party tool or data.

This means your AI agent can pull data from spreadsheets, access cloud databases, or interact with web APIs without you having to build custom integrations for each one.

Understanding MCP architecture

MCP connects AI agents to tools, services, and documents by bridging three key components: Clients, servers, and data sources.

MCP connects an AI agent (MCP host) with servers that call third-party tools

MCP clients help AI assistants (e.g., Claude) get through to MCP servers. When Claude or Cursor needs to access a spreadsheet or the IDE, they use MCP clients to connect with tools and documents.

Tool-specific MCP servers transform LLM requests into commands that a third-party app or data source can read. MCP servers also redirect agents to appropriate applications (tool discovery), run commands, format app responses in an LLM-understandable way, and manage errors.

Services are the applications or data sources that MCP servers access. They can be both local files on a user’s device or remote cloud databases, web APIs, or SaaS platforms. An MCP server ensures secure and error-free access to a specific service.

The protocol itself defines how the client and servers communicate, interact with services, and communicate results. It uses structured formats (mainly JSON) to keep outputs clean and consistent.

How MCP differs from traditional APIs

Conceptually, Model Context Protocol and APIs are complementary, not mutually exclusive.

An API is a descriptive standard that contains instructions to call a tool.

MCP is an execution standard that lets AI both call the tool and retrieve its data.

Where REST APIs operate via stateless request/response messages, MCP retains session context. It can query or extract data and add it directly to an LLM’s context window.

Other important differences between MCP and traditional APIs are summarized in this table.

Compared to traditional APIs, MCP is tool-agnostic, built to scale, and configurable in real time

Ultimately, it’s more accurate to consider MCP as an adapter that facilitates the orchestration of all types of APIs.

In fact, there’s a growing number of tools that autogenerate MCP connectors from OpenAPIs.

Where MCP wins over LangChain/LangGraph

In 2023, orchestrators were groundbreaking because they helped create multi-step agentic workflows. These frameworks let LLMs search the web, run code, and access system files to look for answers.

But engineers still had to build ad-hoc integrations for every tool AI agents need access to.

Each integration has a tool-specific implementation: some would run via a Python wrapper, others would require JSON outputs.

MCP solved this problem by creating a uniform way for LangChain, LangGraph, and other orchestrators to plug into third-party tools.

Like with APIs, developers can use MCP as both an alternative and an add-on to orchestrators. It’s unlikely that Model Context Protocol will replace LangChain and LangGraph in multi-agent systems. Orchestrators are still helpful in writing the logic of AI agents, and MCP has no such capabilities.

MCP’s promise to “unify and simplify” tool calling can be as groundbreaking as OpenAPI was back in the early days of the API ecosystem or HTTP was in the infancy of the Internet.

To explore the practical value this technology delivers in the enterprise, let’s take a look at the way global teams deploy MCP-enabled agents at scale.

How enterprises are building AI agents with MCP

Although MCP is still an experimental technology and, as we will discuss later on, a security minefield, enterprises are finding ways to deploy it and create agentic workflows that drive business impact.

Three real-world examples of MCP adoption at large enterprises make it clear that MCP-enabled agents are powerful productivity enhancers.

FinTech: Block’s internal AI agent

Block, a global FinTech company behind Square and Cash App, has built an internal AI agent called Goose that runs on MCP architecture. The agent works as both a desktop application and a command-line tool, giving their engineers access to various MCP servers.

What’s interesting about Block’s approach is that they’ve built all their MCP servers in-house rather than using third-party ones. This gives them complete control over security and lets them customize integrations for their specific workflows.

Angie Jones, VP of Engineering at Block, shared a few popular MCP use cases at Block.

In engineering, MCP tools help refactor legacy software, migrate databases, run unit tests, and automate repetitive coding tasks.

Design, product, and customer support teams use MCP-powered Goose to generate documentation, process tickets, and build prototypes.

Data teams rely on MCP to connect with internal systems and get extra context from internal sources.

Block integrated MCP with the company’s go-to engineering and project management tools: Snowflake, Jira, Slack, Google Drive, and internal task-specific APIs.

Business impact: Thousands of Block’s employees use Goose and cut up to 75% of the time spent on daily engineering tasks.

Build and orchestrate AI agents with MCP to automate your enterprise workflows

Xenoss engineers can help

Media: Bloomberg’s development acceleration

At the MCP Developer Summit, Sabhav Kothari, Head of AI Productivity at Bloomberg, focused on how his team utilizes MCP internally to help AI developers reduce the time required to ship demos into production.

Kothari’s engineering team hypothesized that a system enabling AI agents to interact with the company’s entire infrastructure would facilitate shorter feedback loops and accelerate development. In early 2024, they built an MCP-like protocol internally.

After carefully following MCP adoption, Bloomberg engineers decided to adopt the protocol as an organization-wide standard.

Originally, Bloomberg built an internal MCP alternative but switched to Anthropic’s protocol after realizing its groundbreaking potential

“From day one, we closely followed MCP’s progress because we realized this protocol had the same semantic mapping as our internal approach, but it was being built in the open. We quickly recognized that MCP had that same potential”.

Sabhav Kothari, Head of AI Productivity at Bloomberg

Business impact: MCP adoption helped Bloomberg engineers bridge the product development gap and deploy agents faster. The protocol connects AI researchers to an ever-growing toolset. It reduced time-to-production from days to minutes and created a flywheel where all tools and agents interact and reinforce one another.

E-commerce: Amazon’s API–first advantage

In one of The Pragmatic Engineer’s editions, Gergely Orosz talks about Amazon using MCP at scale as part of its API-first culture. Since the mid-2000s, Amazon has required teams to build internal APIs that other teams can use – what they call their “API-first culture.”

This existing API infrastructure has made Amazon a natural fit for MCP adoption. When you already have thousands of internal APIs, adding MCP as a standardized way to connect AI agents to those APIs makes a lot of sense.

Orosz quotes an Amazon SDE saying that “most internal tools already added MCP support”. Now, Amazon employees can create agents to review tickets, reply to emails, process the internal wiki, and use the command-line interface.

Business impact: According to an Amazon engineer mentioned in the newsletter, the MCP integration with Q CLI is gaining popularity internally, and developers are now automating tedious tasks.

Despite enterprises successfully deploying agentic workflows with MCP, the machine learning community is raising concerns about the protocol’s security and architecture shortcomings.

Challenges of adopting MCP at scale for enterprises

While those early success stories sound promising, many enterprise engineers are still cautious about rolling out MCP more broadly. The technology is relatively new, and there’s always a risk-reward calculation when it comes to adopting emerging technologies at scale.

As a Reddit user points out, taking compliance and security risks for yet unproven productivity benefits is usually not a playbook enterprises play by.

I think a lot of places are exploring MCP and trying to keep up with the tech to ensure their business is competitive. BUT, without a compelling benefit – such as cost savings or generating new business – I fail to see how any company would convert a stable platform to one using MCP at this time.

Reddit user on bottlenecks to MCP adoption

Enterprise organizations are typically unwilling to be the early adopters of emerging technologies. Aside from a few leading-edge adopters like Amazon, most are waiting until the technology either exposes significant vulnerabilities or delivers considerable gains.

Speaking of security, that’s where some of the biggest concerns lie.

Challenge #1: MCP’s authorization is not ‘enterprise-friendly’

Before poking at the vulnerabilities of MCP’s current authorization specification with OAuth, let’s quickly examine the reason Anthropic introduced OAuth specifications in the first place.

Originally, setting up MCP involved a 1:1 deployment of a client and an MCP server on a developer’s local machine. This worked fine for individual developers but didn’t scale to enterprise needs.

Over time, the surge of MCP adoption among smaller projects created a ripple effect in the enterprise. Engineering team leaders were interested in setting up remote MCP servers, but to access data on these servers in privacy-compliant ways, they needed authorization.

Anthropic responded with the first set of authorization specifications, released in March 2025.

First specifications: no separation between authentication and resource servers.

The MCP Authorization spec allowed secure access to servers using OAuth 2.1. Now, engineers could set up the protocol on a remote server, but they had new concerns.

The first MCP authorization spec treats an MCP server as both a resource and an authorization server

In the specifications, MCP servers were treated as both resource and authorization servers, which went against enterprise best practices, increased fragmentation, and forced developers to expose metadata discovery URLs.

The latest specification: servers are decoupled, but security issues remain.

In June, after months of active discussions on where the first authorization specifications fell short, Anthropic released an updated version that decoupled authorization and resource servers.

Developers were still unhappy. For one, the revised specification leans on OAuth RFCs – a set of frameworks that grant third-party applications limited access to HTTP services, which is not widely used by identity providers.

Anothropic also relies on MCP clients using dynamic client registration that lets anonymous clients register on MCP servers. Not knowing which client is attempting to connect to the server in advance goes against the need for reliability and the strict security that enterprises operate by.

How enterprises solve this problem

To bypass the uncertainty of dynamic client registration, teams build custom tools that test and validate MCP clients.

An open-source example of such a tool is mcp-inspector, a project for testing and debugging MCP servers. When it registers an MCP host, the tool retrieves the metadata, registers the client, and retrieves an OAuth token.

Challenge #2: MCP does not integrate with enterprise SSO systems

Most enterprise environments rely heavily on single sign-on (SSO) systems to control who can access what applications. As Aaron Parecki, one of the co-authors of the OAuth 2.1 spec, explains:

“This enables the company to manage which users are allowed to use which applications and prevents users from needing to have their own passwords at the applications”.

Aaron Palecki, ‘Enterprise-Ready MCP’

The problem is that MCP doesn’t integrate smoothly with these enterprise SSO systems. Parecki argues that MCP-enabled AI agents should be treated like any other enterprise application – controlled through the company’s identity management system.

At the time of writing, connecting an AI agent like Claude to enterprise tools through SSO involves several frustrating steps.

A user needs to log in to Claude via SSO, access the enterprise IdP, and complete authentication.

Once authenticated, users need to connect external apps to Claude by clicking a button, get redirected to the IdP, authenticate one more time, get directed back to the app, and accept an OAuth request for access.

When the user grants appropriate OAuth permissions, they can come back to Claude and use the AI agent.

This authentication by itself is inconvenient for enterprise multi-agent systems that have to connect to a wider range of applications.

In the current SSO flow for MCP servers, the end user is the one granting the LLM (in this case, Claude) permission to connect to a third-party application

More importantly, in this authentication approach, the user is the one granting permissions, with no visibility at the admin level.

This means there’s no one to oversee access control, and there’s a risk of unchecked interaction between mission-critical systems and unvetted third-party applications.

How enterprises solve this problem

Identity solution providers are already developing workarounds to address the limitations of MCP’s authorization.

Okta, one of the leading independent identity vendors, has unveiled Cross-App Access, a protocol that aims to bring visibility and control to MCP-enabled AI agents. It is scheduled for release in Q3, 2

Okta’s internal communication platform is the unified control station that monitors AI agent connections

Here is how it adds an extra observability layer to MCP connections.

Instead of having users manually grant AI agents access to applications and documents, the agent will connect directly to Okta’s internal communication platform.

The platform determines if the request complies with enterprise policies.

If the access request is approved, Okta issues a token to the AI agent. The agent presents the token to the communication platform and gets access to the needed tool.

This sign-on gives enterprise admins visibility into access logs and prevents unchecked interactions between teams, AI agents, and internal tools.

Worried about MCP’s authorization vulnerabilities causing data leaks and security breaches?

Get a detailed roadmap for building compliant, scalable, and secure MCP-powered agents

Book a free consultation

Challenge #3: MCP’s default ‘server’ approach does not blend well with serverless architectures

Over 95% of Fortune 500 companies are embedded in the Azure ecosystem that relies on serverless architectures. These infrastructures are poorly suited to MCP implementations, since Anthropic’s protocol is currently deployed as a Docker-packaged server.

Building and managing MCP servers on top of already stable serverless architectures increases maintenance overhead and adds to infrastructure costs in the long run.

MCP developers have released workarounds like streamable HTTP transport via FastMCP with FastAPI to support serverless deployment.

However, engineers who tried deploying serverless MCP in practice say it leaves a lot to be desired.

Ran Isenberg, a Solutions Architect and an opinion leader in serverless architecture, tried setting up an MCP agent in AWS Lambda and hit a few roadblocks on the way.

Cold start delays of up to 5 seconds made the system too slow for any time-sensitive workflows – imagine waiting 5 seconds every time your AI agent needed to access a tool.

Developer experience issues plagued the setup. As Isenberg put it, the process was “confusing, inconsistent, and far from intuitive.” There wasn’t a clear guide for how to set everything up properly.

Infrastructure complexity meant figuring out all the pieces manually, since there was no standard Infrastructure-as-Code template to follow.

Logging problems arose because FastAPI and FastMCP use different logging systems, and they didn’t play well with AWS Lambda’s standard monitoring tools.

Testing difficulties required manual VS Code configuration since there weren’t any streamlined tools for testing MCP server interactions in a serverless environment.

Isenberg’s conclusion about serverless MCP architectures was that they were “doable but far from seamless”.

Before these concerns are addressed in a frictionless, standardized, and reliable way, the proponents of serverless architecture deployed on AWS Lambda, Azure Functions, or Google Cloud Functions will be reluctant to embed MCP into internal systems.

How enterprises are solving this problem

As Nayan Paul, Chief Azure Architect at Accenture, put it in his blog, ‘unless MCP evolves to support serverless deployment options, I’ll likely keep building around it instead of inside it’.

Instead, he recommends battle-tested multi-agent system setups in LangChain and LangGraph built on top of Azure Functions or other serverless environments.

Accenture’s own agentic platform, AI Foundry, is built entirely in Azure Functions and is modular, cost-efficient, and easier to maintain than MCP servers.

Challenge #4: Tool poisoning

In April 2025, Invariant Labs discovered that MCP is vulnerable to tool poisoning, a type of attack where a prompt with malicious instructions is launched at the LLM.

Poisoned context instructs AI agents to complete malicious actions in a way that’s unintelligible to humans

The instructions are not visible to humans but understandable to the AI agent. Thus, a model, now armed with access to internal tools and data, can perform malicious actions, like:

Extracting and sharing sensitive data like configuration files, databases, or SSH keys.
Sharing private conversations with third parties
Manipulate data so that any tool using it starts making wrong predictions.

Later, Invariant Labs followed up on the exploit by sharing a practical example of MCP-enabled tool poisoning. An attacker was able to extract a user’s WhatsApp message history by accessing WhatsApp’s MCP server and altering a seemingly innocent get_fact_of_the_day() tool.

Here are the instructions that the attacker ‘fed’ the LLM.

These instructions, hidden from the visible prompt, guided the agent to retrieve WhatsApp conversation histories

And here’s how they appear in Cursor: a large amount of white space before the message.

In Cursor, the stolen data appeared as white space and was hard for humans to catch

How enterprises are solving this problem

As Sam Willison points out in his blog post on this vulnerability, despite prompt injection being around for over 2 years, machine learning engineers still don’t have a single best way to deal with it.

He encourages engineering teams to follow MCP specifications and make sure there’s a human in the loop between the agent and the tools it uses.

AI agents should also be designed with transparency in mind, which means:

Have a clear UI that clarifies which tools are exposed to AI
Provide notifications or other indicators whenever an agent invokes a service
Ask users for confirmation on mission-critical actions like data manipulation or extraction to adhere to HITL principles.

Invariant Labs, the team that discovered the exploit, also built an MCP security scanner – an open-source project that scans MCP servers and prompts for code vulnerabilities and hidden instructions.

Enterprise organizations should consider foolproofing their MCP architectures with similar off-the-shelf systems or building an in-house alternative.

Challenge #5. Multi-tenancy and scalability gaps

The majority of MCP servers are still single-user machines running locally on a developer’s machine or a single endpoint.

MCP servers supporting multiple agents and concurrent users are fairly recent and have architecture gaps, like authorization gaps, explored in this post.

To support enterprise-grade scale, MCP servers will have to be deployed as a microservice that serves many agents at a time.

That type of architecture creates a new layer of considerations:

A server should be capable of handling concurrent requests
It needs to separate data contexts
There should be a rate limit per client for better resource management

Enterprise-ready MCP servers that meet multi-tenancy requirements are still a weaker part of the ecosystem, although it is maturing rapidly.

How enterprises are solving this problem

Engineering teams are experimenting with MCP Gateways, endpoints that aggregate several MCP servers. This orchestration layer enables multi-tenancy, helps enforce policies like rate limits or access tracking, and orchestrates tool selection by routing the agent to the most relevant server.

Addy Osmani, an engineer currently working on Google Chrome, also expects enterprise teams to build internal tool discovery platforms and registries.

Whenever an AI agent needs to act, it consults this catalog and chooses the best available server.

The bottom line on MCP’s enterprise readiness

Like any new technology, the Model Context Protocol is not perfect. Its ecosystem is still maturing, standardization is lacking, and security exploits are discovered on the fly.

But even these shortcomings do not take away from MCP’s brilliance as a concept and its transformative impact on enterprise operations. If the protocol keeps up its current growth streak, it will likely become the technology that helps AI agents go mainstream.

In 2-3 years, we are looking at enterprise companies where AI agents are full-on “virtual co-workers” and are treated as first-class citizens, with separate workflows, tasks, and KPIs.

Once MCP’s security and large-scale deployments are ironed out, it will be the driver of composable and adaptable workflows that automate nearly 100% of routine tasks, allowing employees to focus on strategic “heavy lifting” that is both more rewarding for the company and fulfilling for teams.

For now, MCP works best for organizations that have the technical expertise to build custom solutions and can accept some risk in exchange for early-mover advantages in AI automation.

The post Is MCP ready for enterprise adoption? Use cases, security, and implementation challenges appeared first on Xenoss - AI and Data Software Development Company.