<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Vladislav Kushka - Delivery manager, Xenoss</title>
	<atom:link href="https://xenoss.io/blog/author/vladislav-kushka/feed" rel="self" type="application/rss+xml" />
	<link>https://xenoss.io/blog/author/vladislav-kushka</link>
	<description></description>
	<lastBuildDate>Mon, 23 Mar 2026 12:43:21 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://xenoss.io/wp-content/uploads/2020/10/cropped-xenoss4_orange-4-32x32.png</url>
	<title>Vladislav Kushka - Delivery manager, Xenoss</title>
	<link>https://xenoss.io/blog/author/vladislav-kushka</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Data lake architecture: Design patterns for AI-ready enterprise data infrastructure</title>
		<link>https://xenoss.io/blog/data-lake-architecture-design-patterns</link>
		
		<dc:creator><![CDATA[Vlad Kushka]]></dc:creator>
		<pubDate>Mon, 23 Mar 2026 12:40:30 +0000</pubDate>
				<category><![CDATA[Software architecture & development]]></category>
		<category><![CDATA[Data engineering]]></category>
		<guid isPermaLink="false">https://xenoss.io/?p=14033</guid>

					<description><![CDATA[<p>The 2026 State of Data Engineering survey of 1,101 data professionals identified that 44% still rely on cloud data warehouses as their primary paradigm, while 27% have moved to lakehouse architectures. The remaining teams use a mix of both, and 25% name legacy systems and technical debt as their biggest bottleneck. For organizations stuck in [&#8230;]</p>
<p>The post <a href="https://xenoss.io/blog/data-lake-architecture-design-patterns">Data lake architecture: Design patterns for AI-ready enterprise data infrastructure</a> appeared first on <a href="https://xenoss.io">Xenoss - AI and Data Software Development Company</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><span style="font-weight: 400;">The </span><a href="https://joereis.github.io/practical_data_data_eng_survey/"><span style="font-weight: 400;">2026 State of Data Engineering survey</span></a><span style="font-weight: 400;"> of 1,101 data professionals identified that 44% still rely on cloud data warehouses as their primary paradigm, while 27% have moved to lakehouse architectures. The remaining teams use a mix of both, and 25% name legacy systems and technical debt as their biggest bottleneck. For organizations stuck in that last group, the root cause is almost always the same: the data lake was built as a storage project instead of an architecture project.</span></p>
<p><span style="font-weight: 400;">The storage itself is rarely the issue. S3 is cheap, ADLS scales well, GCS is reliable. Where data lake architecture breaks down is in the decisions made (or not made) before the first byte lands: </span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">how zones are structured</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">which open table format governs transactions</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">whether a catalog exists to make data discoverable. </span></li>
</ul>
<p><span style="font-weight: 400;">Skip any of those three, and the lake drifts toward a swamp, regardless of how much you spent on compute.</span></p>
<p><span style="font-weight: 400;">This article focuses on the architectural decisions: open table format selection, catalog and metastore strategy, AI-specific zone design, and the concrete triggers for evolving a lake into a </span><a href="https://xenoss.io/blog/modern-data-platform-architecture-lakehouse-vs-warehouse-vs-lake"><span style="font-weight: 400;">lakehouse</span></a><span style="font-weight: 400;">. If you already know what a data lake is, this is the article about how to build one that holds up in production.</span></p>
<h2><b>Summary</b></h2>
<ul>
<li style="font-weight: 400;" aria-level="1"><b>Data lake architecture fails when teams treat it as a storage problem.</b><span style="font-weight: 400;"> Three decisions made before ingestion determine success: zone structure, open table format, and metadata catalog.</span></li>
<li style="font-weight: 400;" aria-level="1"><b>Open table formats (Iceberg, Delta Lake, Hudi) are now essential.</b><span style="font-weight: 400;"> The 2026 State of Data Engineering survey found that 27% of data professionals already use lakehouse architectures built on these formats.</span></li>
<li style="font-weight: 400;" aria-level="1"><b>AI workloads need specific architectural patterns.</b><span style="font-weight: 400;"> Feature store integration, unstructured data pipelines, and model training data lineage require purpose-built zones that traditional lake designs don&#8217;t include.</span></li>
<li style="font-weight: 400;" aria-level="1"><b>Governance cannot be an afterthought.</b><span style="font-weight: 400;"> 25% of data professionals cite legacy systems and technical debt as their biggest bottleneck. Most of that debt accumulates from deferred governance decisions.</span></li>
</ul>
<h2><b>What is data lake architecture?</b></h2>
<p><span style="font-weight: 400;"><div class="post-banner-text">
<div class="post-banner-wrap post-banner-text-wrap">
<h2 class="post-banner__title post-banner-text__title">Data lake architecture</h2>
<p class="post-banner-text__content">Is a system design for storing raw, semi-structured, and unstructured data at scale, using schema-on-read to defer structure decisions until query time.</p>
</div>
</div></span></p>
<p><span style="font-weight: 400;">Unlike </span><a href="https://xenoss.io/blog/building-vs-buying-data-warehouse"><span style="font-weight: 400;">data warehouses</span></a><span style="font-weight: 400;"> that enforce schema-on-write, data lakes accept data in its original format, making them well-suited for exploratory analytics, log processing, and training machine learning models. The architecture encompasses ingestion pipelines, storage layers, processing engines, metadata catalogs, and governance frameworks that work together to keep data accessible, trustworthy, and queryable.</span></p>
<h2><b>Core data lake design patterns</b></h2>
<h3><b>Medallion architecture (bronze, silver, gold)</b></h3>
<p><span style="font-weight: 400;">The medallion pattern, popularized by Databricks, organizes data into three quality tiers. </span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The bronze layer holds raw, unprocessed data exactly as ingested. </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Silver applies cleaning, deduplication, and schema enforcement. </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Gold serves curated, business-ready datasets optimized for analytics and reporting. </span></li>
</ol>
<p><span style="font-weight: 400;">This works well when different teams need data at different stages of refinement. Data scientists might query bronze for raw signals, while finance teams rely on gold for reconciled numbers. The </span><a href="https://xenoss.io/blog/modern-data-platform-architecture-lakehouse-vs-warehouse-vs-lake"><span style="font-weight: 400;">medallion architecture</span></a><span style="font-weight: 400;"> also simplifies debugging, because every transformation step is preserved and replayable.</span></p>
<h3><b>Data lake zones (landing, raw, curated, sandbox)</b></h3>
<p><span style="font-weight: 400;">Zone-based architecture organizes the lake by access patterns and data maturity rather than quality tiers. </span></p>
<p><span style="font-weight: 400;">A typical layout includes:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">a landing zone (temporary staging for incoming data)</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">a raw zone (immutable, append-only storage)</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">a curated zone (governed, validated datasets)</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">a sandbox zone (experimental space for data science teams). </span></li>
</ul>
<p><span style="font-weight: 400;">Zones enforce different security and governance rules: the raw zone might restrict access to </span><a href="https://xenoss.io/capabilities/data-engineering"><span style="font-weight: 400;">data engineering</span></a><span style="font-weight: 400;"> teams only, while the sandbox zone allows broader access with reduced governance overhead. The key decision is how many zones to create. Xenoss engineers recommend starting with three or four and expanding only when a clear business need arises. Over-engineering zones adds complexity without adding value.</span></p>
<h3><b>Lambda and kappa architectures</b></h3>
<p><span style="font-weight: 400;">Lambda architecture runs batch and real-time processing in parallel, merging results in a serving layer. It handles historical reprocessing well, but creates maintenance overhead because teams maintain two codebases. </span></p>
<p><span style="font-weight: 400;">Kappa architecture simplifies this by treating all data as a stream, replaying historical data through the same streaming pipeline when reprocessing is needed. </span></p>
<p><span style="font-weight: 400;">For enterprise use cases in 2026, kappa-influenced designs (stream-first, with batch as a fallback) are gaining traction. </span><a href="https://xenoss.io/blog/what-is-a-data-pipeline-components-examples"><span style="font-weight: 400;">Apache Kafka</span></a><span style="font-weight: 400;"> and Confluent Cloud support this pattern natively, and platforms like Databricks unify batch and streaming under a single API.</span></p>
<h2><b>Three decisions to make before your first ingestion pipeline runs</b></h2>
<p><span style="font-weight: 400;">Across Xenoss client engagements, data lakes that succeed share one trait: the team made three explicit architectural decisions before ingesting data. Each decision, if deferred or skipped, creates compounding problems as the lake grows.</span></p>
<figure id="attachment_14034" aria-describedby="caption-attachment-14034" style="width: 1376px" class="wp-caption alignnone"><img fetchpriority="high" decoding="async" class="size-full wp-image-14034" title="Three decisions to make before your first ingestion pipeline runs" src="https://xenoss.io/wp-content/uploads/2026/03/freepik__img1-img2-img3-create-a-clean-enterprise-infograph__72359.png" alt="Three decisions to make before your first ingestion pipeline runs" width="1376" height="768" srcset="https://xenoss.io/wp-content/uploads/2026/03/freepik__img1-img2-img3-create-a-clean-enterprise-infograph__72359.png 1376w, https://xenoss.io/wp-content/uploads/2026/03/freepik__img1-img2-img3-create-a-clean-enterprise-infograph__72359-300x167.png 300w, https://xenoss.io/wp-content/uploads/2026/03/freepik__img1-img2-img3-create-a-clean-enterprise-infograph__72359-1024x572.png 1024w, https://xenoss.io/wp-content/uploads/2026/03/freepik__img1-img2-img3-create-a-clean-enterprise-infograph__72359-768x429.png 768w, https://xenoss.io/wp-content/uploads/2026/03/freepik__img1-img2-img3-create-a-clean-enterprise-infograph__72359-466x260.png 466w" sizes="(max-width: 1376px) 100vw, 1376px" /><figcaption id="caption-attachment-14034" class="wp-caption-text">Three decisions to make before your first ingestion pipeline runs</figcaption></figure>
<p><span style="font-weight: 400;">The sequence matters: zones define the physical structure, the open table format defines transactional behavior within those zones, and the catalog makes everything discoverable. Skipping any of the three means the next one cannot function properly.</span></p>
<h2><b>Open table formats: Choosing between Iceberg, Delta Lake, and Hudi</b></h2>
<p><span style="font-weight: 400;">Open table formats bring warehouse-grade capabilities (ACID transactions, time travel, schema evolution) to data lake storage. </span></p>
<p><a href="https://joereis.github.io/practical_data_data_eng_survey/"><span style="font-weight: 400;">27% of data professionals</span></a><span style="font-weight: 400;"> now use lakehouse architectures, up significantly from prior years. Three formats dominate the space.</span></p>

<table id="tablepress-168" class="tablepress tablepress-id-168">
<thead>
<tr class="row-1">
	<th class="column-1">Format</th><th class="column-2">Best for</th><th class="column-3">Strengths</th><th class="column-4">Considerations</th>
</tr>
</thead>
<tbody class="row-striping row-hover">
<tr class="row-2">
	<td class="column-1">Apache Iceberg</td><td class="column-2">Multi-engine environments (Spark, Trino, Flink, Presto) and teams avoiding vendor lock-in</td><td class="column-3">Engine-agnostic design, hidden partitioning, strong community momentum across AWS, Snowflake, Databricks</td><td class="column-4">Newer ecosystem, fewer mature tooling integrations than Delta Lake</td>
</tr>
<tr class="row-3">
	<td class="column-1">Delta Lake</td><td class="column-2">Databricks-centric environments and teams already on Spark</td><td class="column-3">Tight Spark integration, mature tooling, strong documentation, built-in optimization (Z-ordering, liquid clustering)</td><td class="column-4">Historically tighter coupling to Databricks, though open-source compatibility is improving</td>
</tr>
<tr class="row-4">
	<td class="column-1">Apache Hudi</td><td class="column-2">Streaming-heavy workloads with frequent upserts and CDC</td><td class="column-3">Record-level upserts, incremental processing, designed for streaming-first architectures</td><td class="column-4">Smaller community than Iceberg or Delta. Best suited for specific ingestion patterns</td>
</tr>
</tbody>
</table>
<p><span style="font-weight: 400;">In practice, the market is converging toward </span><a href="https://xenoss.io/blog/apache-iceberg-delta-lake-hudi-comparison"><span style="font-weight: 400;">Apache Iceberg</span></a><span style="font-weight: 400;"> as the default for new deployments. </span><a href="https://aws.amazon.com/marketplace/seller-profile?id=seller-t6vmse2zrcbck"><span style="font-weight: 400;">AWS</span></a><span style="font-weight: 400;">, </span><a href="https://xenoss.io/blog/snowflake-vs-redshift-data-warehouse-decision"><span style="font-weight: 400;">Snowflake</span></a><span style="font-weight: 400;">, and Databricks all now support Iceberg REST catalogs, and the format&#8217;s engine-agnostic design aligns with the multi-cloud direction most enterprises are moving toward. For teams already invested in Databricks, Delta Lake remains a strong choice. Hudi is best suited for teams with heavy CDC and streaming upsert requirements.</span></p>
<p><b>Why this matters: </b><span style="font-weight: 400;">Choosing a table format after data is already in the lake means migrating terabytes of files and rewriting transformation logic. The format decision should be locked before the first ingestion pipeline runs.</span></p>
<p><span style="font-weight: 400;"><div class="post-banner-cta-v2 no-desc js-parent-banner">
<div class="post-banner-wrap post-banner-cta-v2-wrap">
	<div class="post-banner-cta-v2__title-wrap">
		<h2 class="post-banner__title post-banner-cta-v2__title">Build an AI-ready data lake with Xenoss data engineers.</h2>
	</div>
<div class="post-banner-cta-v2__button-wrap"><a href="https://xenoss.io" class="post-banner-button xen-button">Contact us</a></div>
</div>
</div></span></p>
<h2><b>Data lake vs lakehouse: When to evolve your architecture</b></h2>
<p><span style="font-weight: 400;">The lakehouse concept merges the flexibility of data lakes with the transactional guarantees of data warehouses. In the </span><a href="https://joereis.github.io/practical_data_data_eng_survey/"><span style="font-weight: 400;">2026 State of Data Engineering survey</span></a><span style="font-weight: 400;">, 44% of respondents still use cloud data warehouses as their primary paradigm, while 27% have adopted lakehouse architectures. The remaining teams use a mix of both.</span></p>
<p><span style="font-weight: 400;">A pure data lake makes sense when the primary consumers are data scientists and ML engineers who need raw, flexible access to diverse data types. A lakehouse becomes necessary when business analysts, BI tools, and governance requirements enter the picture. The lakehouse adds structure without losing flexibility.</span></p>
<p><span style="font-weight: 400;">The practical trigger for migration is usually the moment when a team needs to run both SQL analytics and ML training on the same data. In a pure lake, maintaining separate ETL pipelines for each use case is required. In a lakehouse, both workloads read from the same governed, transactionally consistent tables.</span></p>
<p><b>Why this matters: </b><span style="font-weight: 400;">Premature lakehouse adoption adds complexity without business value. But delaying it too long means accumulating technical debt in the form of duplicated datasets, inconsistent metrics, and ungoverned ML training data. Xenoss engineers recommend evaluating the transition when the </span><a href="https://xenoss.io/capabilities/data-pipeline-engineering"><span style="font-weight: 400;">data pipeline</span></a><span style="font-weight: 400;"> count exceeds 50 or when more than three teams consume the same datasets for different purposes.</span></p>
<h2><b>Architecting data lakes for AI and ML workloads</b></h2>
<p><a href="https://www.dremio.com/newsroom/why-data-lakehouses-are-poised-for-major-growth-in-2025/"><span style="font-weight: 400;">85% of Lakehouse users</span></a><span style="font-weight: 400;"> are either developing AI models or plan to. At the same time, 36% cite governance as a major challenge for AI-driven analytics. Teams are pushing AI workloads onto data lakes that were designed for dashboards and batch reporting. The architecture gaps only become visible when the first ML pipeline goes to production.</span></p>
<p><span style="font-weight: 400;">AI workloads place four specific demands on data lake architecture that traditional designs don&#8217;t address.</span></p>
<ol>
<li><b> Feature store integration. </b><span style="font-weight: 400;">ML models consume features, not raw tables. A feature store (such as Feast, Tecton, or Databricks Feature Store) sits between the curated zone and the training pipeline, providing versioned, point-in-time correct feature sets. The data lake must support the feature store&#8217;s read patterns, which typically involve large sequential scans for training and low-latency lookups for inference.</span></li>
<li><b> Unstructured data pipelines. </b><span style="font-weight: 400;">Text documents, images, audio, sensor readings, and log files are increasingly valuable for AI use cases. The data lake needs a dedicated zone for unstructured data with its own ingestion and cataloging pipeline. Parquet and Iceberg work well for structured features, but unstructured data often requires object-level metadata tagging and separate indexing.</span></li>
<li><b> Training data lineage. </b><span style="font-weight: 400;">Regulatory and compliance requirements increasingly demand traceability from model predictions back to training data. The catalog must track which datasets were used to train which model version, including the specific time-travel snapshot. Without this lineage, models in regulated industries (banking, healthcare, insurance) cannot pass an audit.</span></li>
<li><b> Data versioning and reproducibility. </b><span style="font-weight: 400;">ML experiments require reproducing exact training conditions. Open table formats with time-travel support (Iceberg, Delta Lake) enable this by letting teams query the lake as it existed at any point in time (see the sketch after this list). The architecture must preserve historical snapshots long enough to support experiment reproducibility, which means retention policies need to account for ML workflows, not just analytics use cases.</span></li>
</ol>
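<p><span style="font-weight: 400;">As a sketch of the versioning and lineage requirements, the snippet below pins a training run to a specific Iceberg snapshot and records that snapshot ID with the model version. Catalog, table, and metadata names are hypothetical:</span></p>
<pre><code class="language-python"># Hedged sketch: pin training data to an Iceberg snapshot for reproducibility.
# Assumes a configured SparkSession and an Iceberg table lake.curated.features.
latest = spark.sql(
    "SELECT snapshot_id FROM lake.curated.features.snapshots "
    "ORDER BY committed_at DESC LIMIT 1"
).first()["snapshot_id"]

train_df = (
    spark.read
    .option("snapshot-id", latest)  # read exactly this table state
    .format("iceberg")
    .load("lake.curated.features")
)

# Store the snapshot id beside the model version in your experiment tracker,
# so predictions remain traceable to training data even after table updates.
model_metadata = {"model_version": "v14", "training_snapshot_id": latest}
</code></pre>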
<p><b>Why this matters: </b><span style="font-weight: 400;">The data lake is increasingly the foundation for AI, not just analytics. Architectures that don&#8217;t account for ML-specific requirements will need expensive retrofitting as AI adoption scales.</span></p>
<h2><b>Data lake governance: Three failure patterns and how to avoid them</b></h2>
<p><span style="font-weight: 400;">One in two </span><a href="https://www.gartner.com/doc/reprints?__hstc=81614408.70ec33dd6327b05fa51c21f8c2df014e.1760896946410.1760896946410.1760896946410.1&amp;__hssc=81614408.1.1760896946410&amp;__hsfp=1159134056&amp;id=1-2LIY0X6L&amp;ct=250724&amp;st=sb&amp;submissionGuid=30131aa2-9f42-443c-ac09-55ae3c2eee6a"><span style="font-weight: 400;">Chief Data and Analytics Officers</span></a><span style="font-weight: 400;"> now considers optimizing the technology landscape a primary responsibility. That urgency exists because governance failures compound faster than most teams expect. Data lakes degrade through three specific patterns.</span></p>
<p><b>Missing metadata. </b><span style="font-weight: 400;">Without a catalog that describes what each dataset contains, who owns it, and when it was last updated, the lake becomes unsearchable. Teams create duplicate copies of the same data rather than finding the authoritative source. Storage costs grow while data utility shrinks.</span></p>
<p><b>Absent ownership. </b><span style="font-weight: 400;">When no team is accountable for a dataset&#8217;s quality, accuracy degrades silently. Stale records, schema drift, and broken pipelines go unnoticed until a downstream report produces wrong numbers. Data mesh principles (domain ownership, data-as-a-product) solve this by assigning clear accountability to the team closest to the data source.</span></p>
<p><b>Deferred governance decisions. </b><span style="font-weight: 400;">The most common mistake is treating governance as a future initiative. Teams plan to add access controls, quality monitoring, and retention policies &#8220;later,&#8221; after the lake is operational. </span></p>
<p><span style="font-weight: 400;">By the time &#8220;later&#8221; arrives, the lake holds terabytes of ungoverned data, and retroactive governance becomes a multi-month remediation project. 25% of data professionals cite legacy systems and technical debt as their single biggest bottleneck. Much of that debt originates from governance decisions that were deferred during the initial build.</span></p>
<p><span style="font-weight: 400;"><div class="post-banner-cta-v2 no-desc js-parent-banner">
<div class="post-banner-wrap post-banner-cta-v2-wrap">
	<div class="post-banner-cta-v2__title-wrap">
		<h2 class="post-banner__title post-banner-cta-v2__title">Govern your data lake before it becomes a data swamp.</h2>
	</div>
<div class="post-banner-cta-v2__button-wrap"><a href="https://xenoss.io" class="post-banner-button xen-button">Talk to Xenoss engineers</a></div>
</div>
</div></span></p>
<h2><b>Bottom line</b></h2>
<p><span style="font-weight: 400;">Data lake architecture is a solved problem in the sense that the design patterns are well understood. Medallion zones, open table formats, and metadata catalogs have been validated across thousands of enterprise deployments. The architecture fails when teams skip the foundational decisions.</span></p>
<p><span style="font-weight: 400;">The practical checklist is short: define your zone structure before ingesting data, select an open table format before building pipelines, and deploy a metadata catalog before granting access. These three decisions, made upfront, prevent the governance drift that turns data lakes into swamps.</span></p>
<p><span style="font-weight: 400;">For teams preparing to serve AI workloads, the architecture needs to go further: feature store integration, unstructured data zones, training data lineage, and experiment-grade versioning. These are not future requirements. With 82% of data professionals already using AI tools daily, they are current ones.</span></p>
<p>The post <a href="https://xenoss.io/blog/data-lake-architecture-design-patterns">Data lake architecture: Design patterns for AI-ready enterprise data infrastructure</a> appeared first on <a href="https://xenoss.io">Xenoss - AI and Data Software Development Company</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Fine-tuning LLMs at scale: Cost optimization strategies</title>
		<link>https://xenoss.io/blog/fine-tuning-llm-cost-optimization</link>
		
		<dc:creator><![CDATA[Vlad Kushka]]></dc:creator>
		<pubDate>Tue, 10 Feb 2026 12:36:54 +0000</pubDate>
				<category><![CDATA[Software architecture & development]]></category>
		<guid isPermaLink="false">https://xenoss.io/?p=13763</guid>

					<description><![CDATA[<p>Fine-tuning a large language model can run anywhere from $300 for a small 2.7B model with LoRA to over $35,000 for full fine-tuning on a 40B+ parameter model. Most engineering teams figure out this cost spectrum the hard way, after blowing past their initial compute budget on the first few training runs. The difference between [&#8230;]</p>
<p>The post <a href="https://xenoss.io/blog/fine-tuning-llm-cost-optimization">Fine-tuning LLMs at scale: Cost optimization strategies</a> appeared first on <a href="https://xenoss.io">Xenoss - AI and Data Software Development Company</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><span style="font-weight: 400;">Fine-tuning a large language model can run anywhere from </span><a href="https://learningdaily.dev/what-is-the-cost-of-fine-tuning-llms-f5801c00b06d"><span style="font-weight: 400;">$300 for a small 2.7B model</span></a><span style="font-weight: 400;"> with LoRA to over $35,000 for full fine-tuning on a 40B+ parameter model. Most engineering teams figure out this cost spectrum the hard way, after blowing past their initial compute budget on the first few training runs. The difference between staying on budget and overspending usually traces back to one decision: which fine-tuning technique you pick before writing any training code.</span></p>
<p><span style="font-weight: 400;">This guide breaks down the techniques that keep fine-tuning costs under control: parameter-efficient training methods like LoRA and QLoRA, smarter infrastructure choices, and the MLOps practices that prevent wasted </span><a href="https://xenoss.io/blog/ai-infrastructure-stack-optimization"><span style="font-weight: 400;">GPU</span></a><span style="font-weight: 400;"> hours without sacrificing model quality.</span></p>
<h2><b>Why LLM fine-tuning costs escalate in production</b></h2>
<p><span style="font-weight: 400;">Most enterprises are still transitioning from LLM experimentation to production, </span><a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai"><span style="font-weight: 400;">only about one-third have scaled</span></a><span style="font-weight: 400;"> beyond piloting, and are discovering that fine-tuning costs can spiral quickly. Without deliberate optimization, GPU compute, data preparation, and iteration cycles compound into budgets that exceed initial projections by 2-5x.</span></p>
<p><b>Cost-efficient LLM fine-tuning</b><span style="font-weight: 400;"> typically involves Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA and QLoRA, selecting smaller base models in the 7B-13B parameter range, and using high-quality curated datasets to reduce training time. </span><a href="https://thebiggish.com/news/llm-fine-tuning-shifts-to-peft-methods-as-enterprises-chase-efficiency"><span style="font-weight: 400;">PEFT methods now dominate enterprise LLM adaptation strategies</span></a><span style="font-weight: 400;">, precisely because they cut compute requirements by orders of magnitude compared to full fine-tuning.</span></p>
<h3><b>GPU memory costs for LLM training</b></h3>
<p><a href="https://xenoss.io/capabilities/fine-tuning-llm"><span style="font-weight: 400;">Full fine-tuning</span></a><span style="font-weight: 400;"> loads every model weight into GPU memory at once. A 70B parameter model needs roughly 140GB of VRAM just to hold the weights in FP16 precision, and that&#8217;s before you add optimizer states and gradients. </span></p>
<p><span style="font-weight: 400;">For fine-tuning at FP16, expect around </span><a href="https://www.arsturn.com/blog/ram-vram-for-70b-ai-model-ultimate-guide"><span style="font-weight: 400;">200GB of VRAM</span></a><span style="font-weight: 400;">, which pushes teams toward multi-GPU clusters or cloud instances running H100s at</span><a href="https://www.gmicloud.ai/blog/how-much-does-the-nvidia-h100-gpu-cost-in-2025-buy-vs-rent-analysis"> <span style="font-weight: 400;">$2.50 to $4.50 per GPU-hour</span></a><span style="font-weight: 400;"> depending on the provider.</span></p>
<p><span style="font-weight: 400;">Scaling up model size means scaling up hardware spend, and the jumps aren&#8217;t gradual. Going from a 7B model (which fits on a single 24GB consumer GPU) to a 70B model means jumping from one RTX 4090 to a cluster of two or more H100s. You&#8217;re paying for an entirely different class of infrastructure.</span></p>
<h3><b>Data preparation and quality bottlenecks</b></h3>
<p><a href="https://xenoss.io/blog/total-cost-of-ownership-for-enterprise-ai"><span style="font-weight: 400;">Hidden costs</span></a><span style="font-weight: 400;"> often live in data preparation: cleaning, formatting, annotation, and validation cycles that precede any training run. When your dataset has labeling errors or formatting inconsistencies, you end up re-running training multiple times, each run burning GPU hours without improving the final model.</span></p>
<p><span style="font-weight: 400;">Teams frequently underestimate this phase. A dataset that looks ready for training often reveals formatting inconsistencies, label errors, or distribution imbalances only after the first failed training run, challenges that</span><a href="https://xenoss.io/blog/data-pipeline-best-practices"> <span style="font-weight: 400;">strategic pipeline practices</span></a><span style="font-weight: 400;"> can help mitigate.</span></p>
<h3><b>Experiment tracking and iteration costs</b></h3>
<p><span style="font-weight: 400;">Hyperparameter sweeps, architecture experiments, and A/B testing eat GPU hours fast. Every failed experiment costs money without producing anything you can ship. Teams running dozens of training runs across different learning rates, batch sizes, and LoRA ranks can spend more on experimentation than on the final production training job.</span></p>
<p><span style="font-weight: 400;">Without disciplined experiment tracking, teams end up re-running the same configurations without realizing it. Duplicate experiments are more common than most leads want to admit. Setting up proper logging with tools like </span><a href="https://wandb.ai/site/"><span style="font-weight: 400;">Weights &amp; Biases</span></a><span style="font-weight: 400;"> or MLflow before the first training run pays for itself quickly by preventing wasted reruns.</span></p>
<h3><b>Catastrophic forgetting: Why retraining costs spike</b></h3>
<p><b>Catastrophic forgetting</b><span style="font-weight: 400;"> happens when fine-tuning on a new task erases what the model knew before. A model trained to analyze legal contracts might suddenly struggle with basic questions it handled fine out of the box. The new task knowledge crowds out the original capabilities.</span></p>
<p><span style="font-weight: 400;">When this happens, the fix is often a full retraining cycle from scratch instead of a quick incremental update. For teams that hit this problem repeatedly, retraining costs can balloon well beyond original projections. Techniques like Elastic Weight Consolidation (EWC) and careful learning rate schedules help preserve base model knowledge during fine-tuning, but they require planning upfront.</span></p>
<h2><b>Parameter-efficient fine-tuning: LoRA, QLoRA, and AdaLoRA</b></h2>
<p><span style="font-weight: 400;">PEFT methods freeze most of a model&#8217;s weights and train only a tiny fraction, typically 0.1% to 1% of the total parameters. PEFT techniques reduce memory requirements by </span><a href="https://introl.com/blog/fine-tuning-infrastructure-lora-qlora-peft-scale-guide-2025"><span style="font-weight: 400;">10 to 20x</span></a><span style="font-weight: 400;"> compared to full fine-tuning while retaining 90-95% of the quality. For teams that would otherwise need multi-GPU clusters, that tradeoff changes the economics entirely.</span></p>
<h3><b>LoRA fine-tuning: How it works</b></h3>
<p><b>Low-Rank Adaptation (LoRA)</b><span style="font-weight: 400;"> works by injecting small, trainable low-rank matrices into transformer layers while keeping the original model weights frozen. Instead of updating a weight matrix W directly, you add BA, where B and A are much smaller matrices with a low rank (typically 8 to 64).</span></p>
<p><span style="font-weight: 400;">When you pick the </span><a href="https://thinkingmachines.ai/blog/lora/"><span style="font-weight: 400;">right learning rate</span></a><span style="font-weight: 400;"> for each setting, LoRA training progresses almost identically to full fine-tuning across Llama 3 and Qwen3 models. The typical result would be that you train 0.1% of the parameters and get </span><a href="https://michielh.medium.com/lora-fine-tuning-for-dummmies-4af64f096b4d"><span style="font-weight: 400;">95-99% of full fine-tuning</span></a><span style="font-weight: 400;"> performance.</span></p>
<p><span style="font-weight: 400;">The infrastructure savings are substantial. A 7B model that needs </span><a href="https://introl.com/blog/fine-tuning-infrastructure-lora-qlora-peft-scale-guide-2025"><span style="font-weight: 400;">100-120GB VRAM</span></a><span style="font-weight: 400;"> for full fine-tuning can run on a single 24GB RTX 4090 with LoRA. Training time drops proportionally. And because LoRA produces small adapter files (typically 10-100MB rather than gigabytes), you can version them in Git, store dozens of task-specific adapters cheaply, and swap between them at inference time without reloading the base model.</span></p>
<h3><b>QLoRA: Fine-tuning on consumer GPUs</b></h3>
<p><b>QLoRA</b><span style="font-weight: 400;"> takes LoRA further by quantizing the base model to 4-bit precision while keeping the LoRA adapters in higher precision (typically 16-bit). The frozen weights compress to roughly 25% of their original size, but gradients still flow through them during training.</span></p>
<p><span style="font-weight: 400;">QLoRA used only </span><a href="https://medium.com/@birla2006/llm-fine-tuning-showdown-full-fine-tuning-vs-lora-vs-qlora-which-method-should-you-choose-b876c76ab86e"><span style="font-weight: 400;">17% of A100 GPU</span></a><span style="font-weight: 400;"> memory compared to full fine-tuning while actually outperforming standard LoRA on accuracy (94.48% vs 93.79%). The 4-bit quantization appears to act as a form of regularization.</span></p>
<p><span style="font-weight: 400;">This technique opened fine-tuning to teams without enterprise-grade hardware budgets, </span><a href="https://arxiv.org/abs/2509.12229"><span style="font-weight: 400;">proven feasible on 8GB VRAM GPUs</span></a><span style="font-weight: 400;">, demonstrating that consumer GPUs can handle parameter-efficient training for models up to 1.5B parameters. </span></p>
<p><span style="font-weight: 400;">For larger models, a single RTX 4090 ($1,500) can fine-tune a </span><a href="https://introl.com/blog/fine-tuning-infrastructure-lora-qlora-peft-scale-guide-2025"><span style="font-weight: 400;">7B model</span></a><span style="font-weight: 400;"> that would otherwise require roughly $50,000 in H100 hardware. With tools like </span><a href="https://unsloth.ai/"><span style="font-weight: 400;">Unsloth</span></a><span style="font-weight: 400;">, teams can fine-tune </span><a href="https://medium.com/@matteo28/qlora-fine-tuning-with-unsloth-a-complete-guide-8652c9c7edb3"><span style="font-weight: 400;">3B parameter</span></a><span style="font-weight: 400;"> models on 8GB cards by combining QLoRA with gradient checkpointing and 8-bit optimizers.</span></p>
<h3><b>Adaptive Low-Rank Adaptation for variable budgets</b></h3>
<p><b>AdaLoRA</b><span style="font-weight: 400;"> builds on LoRA by dynamically allocating the parameter budget across layers based on their importance during training. The underlying insight is that not all transformer layers contribute equally to task-specific adaptation.</span> <span style="font-weight: 400;">Top layers (</span><a href="https://arxiv.org/abs/2303.10512"><span style="font-weight: 400;">10, 11, 12 in a 12-layer model</span></a><span style="font-weight: 400;">) often matter more for fine-tuning than bottom layers. </span></p>
<p><span style="font-weight: 400;">AdaLoRA uses singular value decomposition to score each layer&#8217;s importance and prunes low-value parameters automatically, concentrating capacity where it drives the most improvement.</span></p>
<p><span style="font-weight: 400;">AdaLoRA proves most valuable when you&#8217;re working with tight parameter budgets on complex tasks. For teams experimenting with different rank configurations or running hyperparameter sweeps, AdaLoRA removes one variable from the search space by handling rank allocation automatically. The </span><a href="https://arxiv.org/abs/2409.10673"><span style="font-weight: 400;">sensitivity-based importance scoring</span></a><span style="font-weight: 400;"> works, though simpler magnitude-based approaches can match performance in some cases.</span></p>

<table id="tablepress-153" class="tablepress tablepress-id-153">
<thead>
<tr class="row-1">
	<th class="column-1">Method</th><th class="column-2">Memory reduction</th><th class="column-3">Training speed</th><th class="column-4">Best use sase</th>
</tr>
</thead>
<tbody class="row-striping row-hover">
<tr class="row-2">
	<td class="column-1">LoRA</td><td class="column-2">~90%</td><td class="column-3">Fast</td><td class="column-4">General-purpose fine-tuning</td>
</tr>
<tr class="row-3">
	<td class="column-1">QLoRA</td><td class="column-2">~95%</td><td class="column-3">Moderate</td><td class="column-4">Memory-constrained environments</td>
</tr>
<tr class="row-4">
	<td class="column-1">AdaLoRA</td><td class="column-2">~90% (variable)</td><td class="column-3">Moderate</td><td class="column-4">Complex tasks requiring dynamic allocation</td>
</tr>
</tbody>
</table>
<p><span style="font-weight: 400;"><div class="post-banner-cta-v1 js-parent-banner">
<div class="post-banner-wrap">
<h2 class="post-banner__title post-banner-cta-v1__title">Reduce your fine-tuning costs by 90% without sacrificing model quality</h2>
<p class="post-banner-cta-v1__content">Xenoss engineers build production-grade fine-tuning pipelines using LoRA, QLoRA, and optimized infrastructure</p>
<div class="post-banner-cta-v1__button-wrap"><a href="https://xenoss.io/#contact" class="post-banner-button xen-button post-banner-cta-v1__button">Get a cost assessment</a></div>
</div>
</div> </span></p>
<h2><b>Distributed training architectures for large models</b></h2>
<p><span style="font-weight: 400;">When models exceed single-GPU memory capacity, distributed training becomes necessary.</span> <span style="font-weight: 400;">Memory constraints become the </span><a href="https://www.preprints.org/manuscript/202512.2207/v1/download"><span style="font-weight: 400;">primary limiting factor</span></a><span style="font-weight: 400;"> when scaling to models with hundreds of billions of parameters. The complexity increases, but modern frameworks like </span><a href="https://github.com/deepspeedai/DeepSpeed"><span style="font-weight: 400;">DeepSpeed</span></a><span style="font-weight: 400;"> and </span><a href="https://docs.pytorch.org/docs/stable/fsdp.html"><span style="font-weight: 400;">PyTorch FSDP</span></a><span style="font-weight: 400;"> have made distributed training accessible to teams without specialized infrastructure expertise.</span></p>
<h3><b>Data parallelism and gradient accumulation</b></h3>
<p><span style="font-weight: 400;">Data parallelism replicates the entire model across multiple GPUs and splits data batches among them. While pure data parallelism is </span><a href="https://www.sciencedirect.com/science/article/pii/S2949719125000500"><span style="font-weight: 400;">memory-intensive</span></a><span style="font-weight: 400;"> (each GPU needs the full model), techniques like</span><a href="https://www.deepspeed.ai/training/"> <span style="font-weight: 400;">DeepSpeed&#8217;s ZeRO optimizer</span></a><span style="font-weight: 400;"> reduce memory consumption by up to 8x by partitioning optimizer states and gradients instead of replicating them.</span></p>
<p><span style="font-weight: 400;">Gradient accumulation simulates larger batch sizes without additional GPUs by accumulating gradients over several smaller batches before updating weights. Accumulating over K batches </span><a href="https://syhya.github.io/posts/2025-03-01-train-llm/"><span style="font-weight: 400;">reduces synchronization</span></a><span style="font-weight: 400;"> frequency (since you only run all-reduce once per K batches), which cuts communication overhead significantly. A team with 4 GPUs can achieve the effective batch size of 16 GPUs by accumulating across 4 forward passes, though the reduced update frequency may slow convergence slightly.</span></p>
<h3><b>Model parallelism for 70B+ parameter models</b></h3>
<p><span style="font-weight: 400;">Model parallelism splits the model itself across GPUs when the full model cannot fit on a single device. There are two main approaches: pipeline parallelism (splitting by layers, with each GPU handling a segment of the network) and tensor parallelism (splitting individual layers across GPUs).</span><a href="https://engineering.fb.com/2025/10/17/ai-research/scaling-llm-inference-innovations-tensor-parallelism-context-parallelism-expert-parallelism/"><span style="font-weight: 400;"> </span></a></p>
<p><a href="https://engineering.fb.com/2025/10/17/ai-research/scaling-llm-inference-innovations-tensor-parallelism-context-parallelism-expert-parallelism/"><span style="font-weight: 400;">Meta&#8217;s engineering team notes</span></a><span style="font-weight: 400;"> that tensor parallelism improves both model fitting and throughput by sharding attention blocks and MLP layers into smaller blocks executed on different devices. For Llama 3 70B, Meta used 2,000 GPUs with multi-dimensional parallelism combining both approaches.</span></p>
<p><span style="font-weight: 400;">The tradeoff is increased communication overhead between GPUs. Data flows sequentially through layers on different devices, creating potential bottlenecks. Careful optimization of layer placement and communication patterns can minimize this overhead.</span></p>
<h3><b>Mixed precision training: FP16 and BF16</b></h3>
<p><span style="font-weight: 400;">Mixed precision uses FP16 or BF16 for most operations while maintaining FP32 for critical calculations like loss scaling. Memory usage drops by roughly half, and training speed increases significantly on modern GPUs with tensor cores.</span></p>
<p><span style="font-weight: 400;">Most frameworks now support mixed precision with minimal code changes. PyTorch&#8217;s automatic mixed precision (AMP) handles the complexity of deciding which operations run in which precision.</span></p>
<h2><b>Infrastructure strategies for scalable training</b></h2>
<p><a href="https://xenoss.io/blog/ai-infrastructure-stack-optimization"><span style="font-weight: 400;">Infrastructure decisions</span></a><span style="font-weight: 400;"> act as multipliers on training costs. For example, </span><a href="https://intuitionlabs.ai/articles/h100-rental-prices-cloud-comparison"><span style="font-weight: 400;">H100 prices dropped</span></a><span style="font-weight: 400;"> from $8/hour at launch to $2.85-3.50/hour in late 2025, with AWS cutting P5 instance pricing by 44% in June 2025 alone. Teams that locked into high-rate contracts early paid significantly more than those who waited for the market to stabilize. </span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><b>GPU selection:</b><span style="font-weight: 400;"> A100/H100 GPUs offer high memory bandwidth for large models, while L4/T4 instances provide better cost-per-performance for smaller models and QLoRA workflows.</span></li>
<li style="font-weight: 400;" aria-level="1"><b>Spot instances:</b><span style="font-weight: 400;"> Cloud providers offer 60-90% discounts on interruptible compute. Effective use requires fault-tolerant training with frequent checkpointing to resume after interruptions.</span></li>
<li style="font-weight: 400;" aria-level="1"><b>Right-sizing:</b><span style="font-weight: 400;"> Matching GPU count and memory to model parameters prevents both over-provisioning (wasted spend) and under-provisioning (training failures and delays).</span></li>
</ul>
<p><span style="font-weight: 400;">The build-vs-buy decision depends on utilization rate, capital availability, and scaling flexibility.</span> <span style="font-weight: 400;">For </span><a href="https://docs.jarvislabs.ai/blog/h100-price"><span style="font-weight: 400;">one-time training runs</span></a><span style="font-weight: 400;"> or infrequent model updates, cloud compute is up to 12x more cost-effective than hardware purchase. </span></p>
<p><span style="font-weight: 400;">Teams with consistent high utilization (40+ hours/week) often find on-premises infrastructure more economical over 2-3 year horizons, while teams with variable workloads benefit from cloud elasticity. With H100 retail prices around $25,000-30,000 per unit, the break-even calculation requires careful utilization forecasting.</span></p>
<h2><b>Model compression for LLM inference costs</b></h2>
<p><span style="font-weight: 400;">Training is often a one-time cost, but inference runs continuously. At scale, inference costs frequently exceed training costs within months of deployment.</span></p>
<h3><b>Post-training quantization: GPTQ and AWQ</b></h3>
<p><span style="font-weight: 400;">Quantization reduces the numerical precision of model weights from FP32 or FP16 down to INT8 or INT4.</span> <span style="font-weight: 400;">Using 4-bit integer weights yields an </span><a href="https://aws.amazon.com/blogs/machine-learning/accelerating-llm-inference-with-post-training-weight-and-activation-using-awq-and-gptq-on-amazon-sagemaker-ai/"><span style="font-weight: 400;">8x reduction </span></a><span style="font-weight: 400;">in weight memory compared to FP32 (4x compared to FP16). Model size shrinks, inference speeds up, and the accuracy tradeoff depends heavily on the quantization method and calibration approach.</span></p>
<p><span style="font-weight: 400;">GPTQ and AWQ have emerged as the leading approaches for 4-bit quantization.</span> <span style="font-weight: 400;">GPTQ uses layer-wise </span><a href="https://docs.jarvislabs.ai/blog/vllm-quantization-complete-guide-benchmarks"><span style="font-weight: 400;">Hessian-based optimization</span></a><span style="font-weight: 400;"> to minimize output error, while AWQ identifies &#8220;salient&#8221; weights (roughly 1% of total) that carry the most important information and protects them during quantization.</span></p>
<h3><b>Knowledge distillation to smaller models</b></h3>
<p><span style="font-weight: 400;">Knowledge distillation trains a smaller &#8220;student&#8221; model to mimic a larger &#8220;teacher&#8221; model&#8217;s outputs. The student can be 10x smaller while retaining most of the teacher&#8217;s performance on specific tasks.</span></p>
<p><span style="font-weight: 400;">This dramatically reduces inference costs for production deployment. A 7B student model serving the same queries as a 70B teacher uses roughly 10x less compute per request.</span></p>
<p><em><b>Tip:</b><span style="font-weight: 400;"> Consider distillation early in your fine-tuning workflow. Training a student model alongside your primary fine-tuning run adds minimal overhead but creates a cost-efficient deployment option.</span></em></p>
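<p><span style="font-weight: 400;">The core of distillation is the loss function: the student matches the teacher&#8217;s softened output distribution alongside the ground-truth labels. A minimal sketch with illustrative temperature and weighting:</span></p>
<pre><code class="language-python"># Distillation loss sketch: soft targets from the teacher plus hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # the usual T^2 factor keeps gradient magnitudes comparable
    # Hard targets: standard cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
</code></pre>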
<h2><b>Continuous learning systems to avoid retraining costs</b></h2>
<p><span style="font-weight: 400;">Continuous learning systems prevent the costly &#8220;throw it away and start over&#8221; model update pattern that many teams fall into by default. Models left unchanged for 6+ months saw error rates jump </span><a href="https://www.rohan-paul.com/p/ml-interview-q-series-handling-llm"><span style="font-weight: 400;">35%</span></a><span style="font-weight: 400;"> on new data, creating pressure to retrain frequently. Continuous learning offers an alternative: incremental updates that preserve existing capabilities while adding new ones.</span></p>
<h3><b>Elastic Weight Consolidation for knowledge preservation</b></h3>
<p><span style="font-weight: 400;">Elastic Weight Consolidation (EWC) penalizes changes to weights identified as important for previous tasks. The model can learn new information incrementally without overwriting foundational knowledge.</span></p>
<p><span style="font-weight: 400;">This avoids full retraining cycles when adding new capabilities. EWC </span><a href="https://arxiv.org/html/2505.05946v1"><span style="font-weight: 400;">applied to full parameter </span></a><span style="font-weight: 400;">sets of Gemma2, successfully adding Lithuanian language capabilities while mitigating catastrophic forgetting of English performance across seven language understanding benchmarks. </span></p>
<p><span style="font-weight: 400;">The approach works for domain-specific fine-tuning too: a model trained for customer support can later learn product documentation tasks without losing its ability to handle support queries.</span></p>
<h3><b>Drift detection and automated retraining triggers</b></h3>
<p><span style="font-weight: 400;">Model drift occurs when performance degrades as real-world data distributions shift over time. A model trained on 2024 customer queries may perform poorly on 2025 queries as language patterns and topics evolve.</span></p>
<p><span style="font-weight: 400;">Continuous monitoring with threshold-based alerts triggers retraining only when necessary. This approach prevents both unnecessary retraining on arbitrary schedules and undetected performance degradation that erodes user trust.</span></p>
<h2><b>MLOps for LLM fine-tuning: Cost control practices</b></h2>
<p><span style="font-weight: 400;">MLOps provides operational discipline to prevent cost wasteMLOps provides operational discipline to prevent</span><a href="https://xenoss.io/blog/data-tool-sprawl"> <span style="font-weight: 400;">cost waste</span></a><span style="font-weight: 400;"> through visibility, automation, and reproducibility.</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><b>Experiment tracking:</b><span style="font-weight: 400;"> Tools like MLflow and Weights &amp; Biases log every experiment with cost metadata, enabling cost-per-experiment analysis and identification of inefficient patterns.</span></li>
<li style="font-weight: 400;" aria-level="1"><b>Model versioning:</b><span style="font-weight: 400;"> Registries enable quick rollback to stable versions, avoiding wasted debugging time on faulty deployments.</span></li>
<li style="font-weight: 400;" aria-level="1"><b>Cost monitoring:</b><span style="font-weight: 400;"> Integration with cloud cost management tools provides real-time spending visibility with anomaly detection and budget alerts.</span></li>
</ul>
<h2><b>Building production-ready fine-tuning pipelines</b></h2>
<p><span style="font-weight: 400;">An effective end-to-end workflow synthesizes PEFT methods for training efficiency, distributed architectures for scale, compression for inference costs, and MLOps for operational control. Each component reinforces the others, experiment tracking identifies which PEFT configurations work best, while cost monitoring validates that infrastructure choices deliver expected savings.</span></p>
<p><span style="font-weight: 400;">For enterprises seeking to reduce fine-tuning costs while maintaining production reliability, Xenoss engineers bring experience building pipelines that preserve foundational model knowledge while cutting GPU costs significantly.</span></p>
<p><a href="https://xenoss.io/#contact"><span style="font-weight: 400;">Book a consultation</span></a><span style="font-weight: 400;"> to discuss your specific requirements.</span></p>
<p>The post <a href="https://xenoss.io/blog/fine-tuning-llm-cost-optimization">Fine-tuning LLMs at scale: Cost optimization strategies</a> appeared first on <a href="https://xenoss.io">Xenoss - AI and Data Software Development Company</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Best practices for architecting data pipelines in AdTech</title>
		<link>https://xenoss.io/blog/data-pipeline-best-practices-for-adtech-industry</link>
		
		<dc:creator><![CDATA[Vlad Kushka]]></dc:creator>
		<pubDate>Thu, 20 Jun 2024 11:43:33 +0000</pubDate>
				<category><![CDATA[Data engineering]]></category>
		<guid isPermaLink="false">https://xenoss.io/?p=6825</guid>

					<description><![CDATA[<p>Managing data in digital advertising brings a whole other dimension of difficulty compared to other industries. AdTech companies have to maintain ultra-low latency and extremely high processing speeds to accommodate the flood of real-time data streaming in from all ecosystem partners.  The consequences of delayed or incomplete data are high for AdTech: poor attribution, skewed reporting, lost auctions, [&#8230;]</p>
<p>The post <a href="https://xenoss.io/blog/data-pipeline-best-practices-for-adtech-industry">Best practices for architecting data pipelines in AdTech</a> appeared first on <a href="https://xenoss.io">Xenoss - AI and Data Software Development Company</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Managing data in digital advertising brings a whole other dimension of difficulty compared to other industries. AdTech companies have to maintain ultra-low latency and extremely high processing speeds to accommodate the flood of real-time data streaming in from all ecosystem partners. </p>



<p>The consequences of delayed or incomplete data are high for AdTech: poor attribution, skewed reporting, lost auctions, frustrated customers, and reduced revenue. </p>



<p>Every data engineer will tell you that building data pipelines is a tough, time-consuming, and costly process, especially in the AdTech industry, where we&#8217;re dealing with massive volumes of asynchronous, event-driven data. </p>



<p>In this post, we’ll talk about:</p>



<ul>
<li>The complexities of big data in AdTech </li>



<li>Emerging trends and new approaches to data pipeline architecture</li>



<li>System design and development best practices from industry leaders </li>
</ul>



<h2 class="wp-block-heading">Why data in AdTech is complex </h2>



<p>The volume, variety, and velocity of data in AdTech are humongous. TripleLift’s programmatic ad platform, for example, <a href="https://www.datacouncil.ai/talks/the-highs-and-lows-of-building-an-adtech-data-pipeline" target="_blank" rel="noreferrer noopener">processes</a> over 4 billion ad requests and over 140 billion bid requests per day, which translates to 13 million unique aggregate rows in its databases per hour and over 36 GB of new data added to its Apache Druid storage. </p>



<p>The second key characteristic of data in AdTech is its wide variety. The industry processes petabytes of structured and unstructured data generated from user behavior, ad engagement, programmatic ad auctions, and private data exchanges, among other elements in the chain. </p>



<p>In each case, the incoming data can have multiple dimensions. For one ad impression, you need to track multiple parameters like “time window,” “geolocation,” “user ID,” etc. Combined, these parameters create specific measures—analytics on specific events such as click-through rate (CTR), conversion, viewability, revenue, etc. </p>



<p>These events are often distributed in time and happen millions of times per day. In other words, your big data pipeline architecture needs to be designed to process asynchronous data at scale. </p>
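<p>To make the dimensions-versus-measures idea concrete, here is a minimal Python sketch (not from any production system) that rolls invented raw events up into a CTR measure per geo/hour dimension pair; all field names and values are illustrative assumptions:</p>

<pre class="wp-block-code"><code>from collections import defaultdict

# Illustrative raw events; the field names are hypothetical
events = [
    {"geo": "US", "hour": 14, "type": "impression"},
    {"geo": "US", "hour": 14, "type": "click"},
    {"geo": "DE", "hour": 14, "type": "impression"},
]

# Group by the dimensions and count each event type
counts = defaultdict(lambda: {"impression": 0, "click": 0})
for e in events:
    counts[(e["geo"], e["hour"])][e["type"]] += 1

# Derive the measure (CTR) from the grouped counts
for dims, c in counts.items():
    ctr = c["click"] / c["impression"] if c["impression"] else 0.0
    print(dims, f"CTR={ctr:.2%}")
</code></pre>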



<p>Large data volumes also translate to high data storage costs. As big data volumes continue to increase, permanent retention of all information becomes simply unfeasible. So AdTech companies also face the tough choice of <a href="https://xenoss.io/blog/infrastructure-optimization">optimizing their data storage infrastructure</a> to balance data retention against operating costs. </p>



<h2 class="wp-block-heading">Trends in data pipeline architecture for AdTech </h2>



<p>ETL/ELT pipelines have been around since the early days of data analytics. Although many of the best practices in conceptual design are still applicable today, major advances in database design and cloud computing have changed the game. </p>



<p>Over <a href="https://dzone.com/storage/assets/17234502-dz-tr-data-pipelines-2023.pdf" target="_blank" rel="noreferrer noopener">66% of companies</a> use cloud-based data pipelines and data storage solutions, with a third using a combination of both. Cloud-native ETL tools have greater scalability potential and support a broader selection of data sources. Serverless solutions also remove the burden of infrastructure management. </p>





<p>Real-time data processing is also progressively replacing standard batch ingestion. Distributed stream-processing platforms like <a href="https://kafka.apache.org/" target="_blank" rel="noreferrer noopener">Apache Kafka</a> and <a href="https://aws.amazon.com/kinesis/" target="_blank" rel="noreferrer noopener">Amazon Kinesis</a> enable continuous data collection from a firehose of data sources in a standard message format. Data is then uploaded to cloud object stores (data lakes) and made available to query engines.</p>
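<p>As a hedged illustration of that pattern, the sketch below publishes a JSON event with the kafka-python client; the broker address, topic name, and event fields are assumptions made for the example:</p>

<pre class="wp-block-code"><code>import json
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",          # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Hypothetical bid-request event in a standard JSON message format
event = {"auction_id": "a-123", "ts": 1718880000, "geo": "US", "bid_floor": 0.25}
producer.send("bid-requests", event)             # assumed topic name
producer.flush()                                 # block until delivery
</code></pre>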





<p><a href="https://pubmatic.com/" target="_blank" rel="noreferrer noopener">PubMatic</a>, for example, uses Kinetica to enable blazing-fast data ingestion, storage, and processing for its real-time reporting and ad-pacing engine. Thanks to data streaming architecture, Pubmatic can process over a trillion ad impressions monthly with high speed and accuracy. </p>



<p>That said, because most of the information arrives as events, AdTech companies often rely on a combination of real-time and batch data processing. For example, streaming data on ad viewability can be consumed immediately and then enriched with batch data on past inventory performance. </p>





<p>As the data infrastructure expands, AdTech teams also concentrate more effort on data observability and infrastructure monitoring to eliminate costly downtime and expensive pipeline repairs.</p>
<p>To dive deeper into the current trends in AdTech, we&#8217;ve invited Charles Proctor, MarTech Architect at CPMartec and EnquiryLab, to share his insights on real-time processing, AI advancements, cloud solutions, and essential data governance practices. Here&#8217;s what he had to say:</p>
<p><iframe title="Charles Proctor, MarTech Architect at CPMartec, EnquiryLab, on upcoming data pipeline advancements" width="500" height="281" src="https://www.youtube.com/embed/kL4LtAV4kac?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></p>



<h5><span style="font-weight: 400;"><a href="https://www.linkedin.com/in/charles-proctor-marketingauto/">Charles Proctor</a>, MarTech Architect at<a href="https://www.linkedin.com/company/cpmartec/"> CPMartec</a> and EnquiryLab, on upcoming data pipeline advancements and their  impact on AdTech</span></h5>



<h2 class="wp-block-heading">AdTech data pipeline development: Best practices and recommended technologies  </h2>



<p>A data pipeline is a sequence of steps: ingestion, processing, storage, and access. Each of these steps can (and should) be well-architected and optimized for the highest performance levels. </p>



<p>Xenoss <a href="https://xenoss.io/big-data-solution-development">big data engineers</a> put the types of data pipeline architectures in AdTech under the microscope. We evaluated the strengths and weaknesses of the architecture design patterns and toolkits industry leaders use for AdTech analytics and reporting.</p>
<p>Our analysis extended to both the logical and platform levels, providing a comprehensive understanding of the data processing ecosystem. The logical design describes how data is processed and transformed from source to target, ensuring consistent data transformation across environments. The platform design, in contrast, focuses on the specific implementation and tooling required by each environment, whether it&#8217;s GCP, Azure, or AWS. While each platform offers a unique set of tools for data transformation, the goal of the logical design remains the same: efficient and effective data transformation regardless of the provider.</p>



<h3 class="wp-block-heading">Data ingestion </h3>



<p>AdTech data originates from multiple sources — <a href="https://xenoss.io/dsp-demand-supply-platform-development">DSP</a> and <a href="https://xenoss.io/ssp-supply-side-platform-development">SSP partners</a>, <a href="https://xenoss.io/customer-data-platform-development">customer data platforms (CDP)</a>, or even <a href="https://xenoss.io/dooh-advertising-platform-development">DOOH devices</a>. To extract data from a source, you need to make API calls, query the database, or process log files. </p>



<p>The challenge, however, is that in AdTech, you need to ingest multiple streams into the pipeline simultaneously — and that’s no small task. </p>



<p><a href="https://triplelift.com/" target="_blank" rel="noreferrer noopener">TripleLift</a>, for example, needed <a href="http://highscalability.com/blog/2020/6/15/how-triplelift-built-an-adtech-data-pipeline-processing-bill.html" target="_blank" rel="noreferrer noopener">its data pipelines</a> to handle:  </p>



<ul>
<li>Up to 30 billion event logs per day </li>



<li>Normalized aggregation of 75 dimensions and 55 metrics</li>



<li>Over 15 hourly jobs for ingesting and aggregating data into BI tools </li>
</ul>



<p>And all of the above has to happen in a cost-effective manner, with data delivery staying within expected customer SLAs. </p>



<p>The TripleLift team organized all incoming event data streams into 50+ Kafka topics. Events are consumed by Secor (an open-source consumer from Pinterest) and written to AWS S3 in Parquet format. TripleLift uses Apache Airflow to schedule batch jobs and manage dependencies for aggregating data into its data stores and subsequently exposing it to different reporting tools. </p>
<figure id="attachment_10840" aria-describedby="caption-attachment-10840" style="width: 1575px" class="wp-caption alignnone"><img decoding="async" class="size-full wp-image-10840" title="Final TripleLift’s architecture, after resolving scaling issues, replacing VoltDB and implementing Apache Airflow" src="https://xenoss.io/wp-content/uploads/2024/06/3-1.jpg" alt="Final TripleLift’s architecture, after resolving scaling issues, replacing VoltDB and implementing Apache Airflow" width="1575" height="1232" srcset="https://xenoss.io/wp-content/uploads/2024/06/3-1.jpg 1575w, https://xenoss.io/wp-content/uploads/2024/06/3-1-300x235.jpg 300w, https://xenoss.io/wp-content/uploads/2024/06/3-1-1024x801.jpg 1024w, https://xenoss.io/wp-content/uploads/2024/06/3-1-768x601.jpg 768w, https://xenoss.io/wp-content/uploads/2024/06/3-1-1536x1201.jpg 1536w, https://xenoss.io/wp-content/uploads/2024/06/3-1-332x260.jpg 332w" sizes="(max-width: 1575px) 100vw, 1575px" /><figcaption id="caption-attachment-10840" class="wp-caption-text">Final TripleLift’s architecture, after resolving scaling issues by replacing VoltDB and implementing Apache Airflow</figcaption></figure>





<p>Aggregation tasks are done with Apache Spark on Databricks clusters. The data is denormalized into wide tables by joining raw event logs to paint a complete picture of what happened before, during, and after an auction. Denormalized logs are stored in Amazon S3.</p>
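<p>TripleLift&#8217;s actual jobs aren&#8217;t public, but a rough PySpark sketch of such a denormalization step could look like this; the S3 paths, table layout, and join key are hypothetical:</p>

<pre class="wp-block-code"><code>from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("denormalize-auction-logs").getOrCreate()

# Hypothetical raw event logs landed in S3 as Parquet
bids = spark.read.parquet("s3://example-bucket/raw/bid_requests/")
imps = spark.read.parquet("s3://example-bucket/raw/impressions/")
clicks = spark.read.parquet("s3://example-bucket/raw/clicks/")

# Join the event streams into one wide table keyed by auction
# (assumes the three datasets share only the auction_id column name)
wide = (
    bids.join(imps, "auction_id", "left")
        .join(clicks, "auction_id", "left")
)
wide.write.mode("overwrite").parquet("s3://example-bucket/denormalized/auctions/")
</code></pre>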





<p>In such a setup, Kafka helps make the required data streams available to different consumers simultaneously. Thanks to horizontal scaling, you can also maintain high throughput even for extra-large data volumes. You can also configure different retention policies for different Kafka topics to <a href="https://xenoss.io/blog/infrastructure-optimization">optimize cloud infrastructure costs</a>.</p>



<p>Thanks to in-memory data processing, Apache Spark can perform data aggregation tasks at blazing speeds. It’s also a highly versatile tool, supporting multiple file formats, such as Parquet, Avro, JSON, and CSV, which makes it great for handling different data sources.</p>



<p>PubMatic also <a href="https://pubmatic.com/blog/realtime-streaming-ingestion-at-scale/" target="_blank" rel="noreferrer noopener">relies on Apache Spark</a> as the main technology for its data ingestion model. The team opted for Spark Structured Streaming—a fault-tolerant stream processing engine built on the Spark SQL engine—and flatMap to transform its datasets. In PubMatic&#8217;s case, flatMap delivered 25% better performance than mapPartitions (another popular primitive for distributed data transformations). With the new data ingestion module, PubMatic can process 1.5x to 2x more data with the same number of resources.</p>
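<p>PubMatic&#8217;s code isn&#8217;t published, but the flatMap-versus-mapPartitions comparison can be illustrated on a plain Spark RDD; the input records below are invented:</p>

<pre class="wp-block-code"><code>import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("flatmap-demo").getOrCreate()
sc = spark.sparkContext

# Each raw record may contain zero or more nested events
raw = sc.parallelize(['{"events": [1, 2]}', '{"events": []}', '{"events": [3]}'])

# flatMap: one input record yields zero or more output records
events = raw.flatMap(lambda line: json.loads(line)["events"])

# mapPartitions: transform a whole partition with a single iterator pass
events_alt = raw.mapPartitions(
    lambda lines: (e for line in lines for e in json.loads(line)["events"])
)

print(events.collect())      # [1, 2, 3]
print(events_alt.collect())  # [1, 2, 3]
</code></pre>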



<h4 class="wp-block-heading"><strong>Recommended technologies for data ingestion:  </strong></h4>



<ul>
<li><a href="https://kafka.apache.org/">Apache Kafka</a>: An open-source distributed event streaming platform. Kafka&#8217;s high throughput and fault tolerance make it suitable for capturing and processing large volumes of ad impressions and user interactions in real-time, enabling immediate processing and analysis.</li>
<li><a href="https://aws.amazon.com/kinesis/">Amazon Kinesis</a>: A managed framework for real-time video and data streams.A strong choice for AWS users, providing managed, scalable real-time processing with seamless integration into the AWS ecosystem. Kinesis facilitates low-latency data processing and high availability, making it effective for real-time analytics in AdTech environments.</li>
<li><a href="https://flume.apache.org/">Apache Flume</a>:  An open-source data ingestion tool for collection, aggregation, and transportation of log data. Specialized for log data, Flume can be effective in environments requiring robust log data collection and integration with Hadoop for further analysis.</li>
</ul>






<h3 class="wp-block-heading">Data processing</h3>



<p>Ingested AdTech data must then be brought into an analytics-ready state. Depending on your setup, you may automate the following steps (a minimal PySpark sketch follows the list): </p>



<ul>
<li>Schema application</li>



<li>Deduplication</li>



<li>Aggregation  </li>



<li>Filtering</li>



<li>Enriching </li>



<li>Splitting </li>
</ul>
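<p>Here is the promised minimal PySpark sketch chaining several of those steps together; every path, column name, and join key is an assumption made for illustration:</p>

<pre class="wp-block-code"><code>from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adtech-transform").getOrCreate()

raw = spark.read.json("s3://example-bucket/landing/events/")   # schema inferred on read
deduped = raw.dropDuplicates(["event_id"])                     # deduplication
valid = deduped.filter(F.col("event_type").isin("impression", "click"))  # filtering

geo = spark.read.parquet("s3://example-bucket/dim/geo/")       # enrichment source
enriched = valid.join(geo, "ip_prefix", "left")                # enriching

# Splitting: route event types to separate output locations
enriched.write.mode("append").partitionBy("event_type") \
        .parquet("s3://example-bucket/clean/events/")
</code></pre>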



<p>The problem? Data transformation can be complex and expensive if you use outdated ETL technology. </p>



<p>Take it from <a href="https://www.appsflyer.com/" target="_blank" rel="noreferrer noopener">AppsFlyer</a>, whose attribution SDK is installed on 95% of mobile devices worldwide. The company collected ample data, but operationalizing it was an uphill battle. </p>



<p>Originally, AppsFlyer built an in-house ETL tool to channel event data from Kafka to a BigQuery warehouse. Yet, as <a href="https://www.linkedin.com/in/avnerlivne/" target="_blank" rel="noopener">Avner Livne</a>, AppsFlyer Real-Time Application (RTA) Group Lead, <a href="https://www.upsolver.com/case-studies/appsflyer" target="_blank" rel="noreferrer noopener">explained</a>: <em>“Data transformation was very hard. Schema changes were very hard. While [the system] was functional, everything required a lot of attention and engineering”</em>. In fact, one analytics use case cost AppsFlyer over $3,000 per day on BigQuery and over $1.1 million annually.</p>



<p>The team adopted Upsolver, a cloud-native data pipeline development platform, to improve its data ingestion and transformation capabilities. Once the necessary transformations have been performed on the S3 data, Upsolver&#8217;s visual IDE and SQL make the data query-ready via the AWS Glue Data Catalog.</p>





<p>Upsolver’s engine proved to be more cost-effective than the in-house ETL tool. AppsFlyer also substantially improved its visibility into stream log records, which allowed the company to reduce the size of created tables, leading to further cost savings. </p>



<p>At Xenoss, we also frequently see cases where clients’ infrastructure costs spiral out of control—and we specialize in getting them back on track. Among other projects, our team helped the programmatic ad marketplace <a href="https://www.powerlinks.com/">PowerLinks</a> reduce its infrastructure costs from <a href="https://xenoss.io/cases/cutting-infrastructure-costs-by-20x-times-for-a-programmatic-ad-marketplace-with-1b-audience-reach">$200k+ per month to $8k-10k</a> without any performance losses. On the contrary, the volume of inbound traffic grew from 20 to 80 QPS during our partnership, and we built in the ability to scale up to 1 million QPS.</p>



<h4 class="wp-block-heading"><strong>Recommended technologies for data processing:  </strong></h4>



<ul>
<li><a href="https://cloud.google.com/dataflow">Google Cloud Dataflow</a>: A managed streaming analytics service. </li>
<li><a href="https://flink.apache.org/">Apache Flink</a>: A unified stream-processing and batch-processing framework.</li>
<li><a href="https://spark.apache.org/">Apache Spark</a>: A multi-language, scalable data querying engine. </li>
</ul>
<div class="post-banner-cta-v2 no-desc js-parent-banner">
<div class="post-banner-wrap post-banner-cta-v2-wrap">
	<div class="post-banner-cta-v2__title-wrap">
		<h2 class="post-banner__title post-banner-cta-v2__title">Overwhelmed by the complexity of AdTech data? Xenoss specializes in solving AdTech data complexities</h2>
	</div>
<div class="post-banner-cta-v2__button-wrap"><a href="https://xenoss.io/cases" class="post-banner-button xen-button">Discover</a></div>
</div>
</div>








<h3 class="wp-block-heading">Data storage</h3>



<p>All the collected and processed AdTech data needs a “landing pad”—a target storage destination from where it will be queried by different analytics apps and custom scripts. </p>



<p>In most cases, data ends up in either of the following locations:</p>



<ul>
<li><strong>Data lake</strong> (e.g., based on <a href="https://aws.amazon.com/big-data/datalakes-and-analytics/datalakes/">AWS S3</a>, <a href="https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html">Hadoop HDFS</a>, <a href="https://www.databricks.com/">Databricks</a>)</li>
</ul>
<figure id="attachment_6832" aria-describedby="caption-attachment-6832" style="width: 2100px" class="wp-caption alignnone"><img decoding="async" class="size-full wp-image-6832" title="" src="https://xenoss.io/wp-content/uploads/2024/06/flowchart-of-data-pipeline-stages-in-data-lake-design.jpg" alt="Flowchart of data pipeline stages in data lake design" width="2100" height="1438" srcset="https://xenoss.io/wp-content/uploads/2024/06/flowchart-of-data-pipeline-stages-in-data-lake-design.jpg 2100w, https://xenoss.io/wp-content/uploads/2024/06/flowchart-of-data-pipeline-stages-in-data-lake-design-300x205.jpg 300w, https://xenoss.io/wp-content/uploads/2024/06/flowchart-of-data-pipeline-stages-in-data-lake-design-1024x701.jpg 1024w, https://xenoss.io/wp-content/uploads/2024/06/flowchart-of-data-pipeline-stages-in-data-lake-design-768x526.jpg 768w, https://xenoss.io/wp-content/uploads/2024/06/flowchart-of-data-pipeline-stages-in-data-lake-design-1536x1052.jpg 1536w, https://xenoss.io/wp-content/uploads/2024/06/flowchart-of-data-pipeline-stages-in-data-lake-design-2048x1402.jpg 2048w, https://xenoss.io/wp-content/uploads/2024/06/flowchart-of-data-pipeline-stages-in-data-lake-design-380x260.jpg 380w" sizes="(max-width: 2100px) 100vw, 2100px" /><figcaption id="caption-attachment-6832" class="wp-caption-text">Flowchart of data pipeline stages in data lake design</figcaption></figure>



<ul>
<li><strong>Data warehouses </strong>(e.g., <a href="https://aws.amazon.com/redshift/" target="_blank" rel="noreferrer noopener">Amazon Redshift</a>, <a href="https://cloud.google.com/bigquery" target="_blank" rel="noreferrer noopener">BigQuery</a>, <a href="https://hive.apache.org/" target="_blank" rel="noreferrer noopener">Apache Hive</a>, <a href="https://www.snowflake.com/en/" target="_blank" rel="noreferrer noopener">Snowflake</a>)</li>
</ul>





<p>But that’s not the end of the story. You also need suitable analytic database management software to ensure that data gets stored in the right format and can be effectively queried by downstream applications. </p>



<p>That’s where <a href="https://xenoss.io/blog/database-management-systems-for-adtech">database management systems (DBMS)</a> come into play. A well-selected DBMS can automate data provisioning to multiple apps and ensure better data governance and lower operating costs. </p>



<p><a href="https://doubleverify.com/">DoubleVerify</a>, for example, originally relied on a monolithic Python application for AdTech data analysis. Data was hosted in several storage locations, but the most frequent one was the columnar database Vertica, where request logs went. </p>



<p>The team created custom Python functions to orchestrate SQL scripts against Vertica. For fault tolerance, Python code was deployed to two on-premises servers—one primary and one secondary. Using the job scheduling software Rundeck, the code was executed using a cron schedule.</p>



<p>As data volumes increased, the team soon ran into issues with Vertica. According to <a href="https://medium.com/doubleverify-engineering/modernizing-data-pipelines-with-dbt-c2941be74b13" target="_blank" rel="noreferrer noopener">Dennis Levin</a>, Senior Software Engineer at DoubleVerify, jobs on Vertica were taking too long to run, while adding more nodes to Vertica was both time-consuming and expensive. Due to upstream dependencies, the team also had to run most jobs on the long-outdated Python 2.7.</p>



<p>To patch things up, the team designed a new cloud-native data pipeline architecture built with dbt, Airflow, and Snowflake. </p>





<p><a href="https://www.getdbt.com/product/what-is-dbt">DBT</a> is a SQL-first transformation workflow that allows teams to deploy analytics code faster by adding best practices like modularity, portability, and CI/CD. In DoubleVerify’s case, DBT replaced ancient Python code. </p>



<p>The team also replaced Vertica with the cloud-native Snowflake SQL database. Unlike legacy data warehousing solutions, Snowflake can natively store and process both structured (i.e., relational) and semi-structured (e.g., JSON, Avro, XML) data — all in a single system, which is convenient for deploying multiple AdTech analytics use cases.</p>



<p>DoubleVerify also replaced Rundeck with<a href="https://airflow.apache.org/" target="_blank" rel="noreferrer noopener"> Apache Airflow</a> — a modern, scalable workflow management platform. It was configured to run in Google’s data workflow orchestration service, <a href="https://cloud.google.com/composer" target="_blank" rel="noreferrer noopener">Cloud Composer</a> (which is built on Apache Airflow open source project). </p>



<p>Cloud Composer helps author, schedule, and monitor pipelines across hybrid and multi-cloud environments. Since pipelines are configured as directed acyclic graphs (DAGs), the learning curve is low for any Python developer.  </p>
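<p>DoubleVerify&#8217;s DAGs aren&#8217;t public, but a minimal, hypothetical Airflow sketch shows why the learning curve is low for Python developers; the DAG id, task names, and commands are placeholders:</p>

<pre class="wp-block-code"><code>from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hourly_event_aggregation",       # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    ingest = BashOperator(task_id="ingest_from_s3", bash_command="echo ingest")
    aggregate = BashOperator(task_id="aggregate_events", bash_command="echo aggregate")
    publish = BashOperator(task_id="publish_to_bi", bash_command="echo publish")

    ingest >> aggregate >> publish           # dependencies expressed as a DAG
</code></pre>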



<p>To avoid the scalability constraints of SQL databases, some AdTech companies go with non-relational databases instead. NoSQL databases have greater schema flexibility and higher scalability potential. Modern non-relational databases also use in-memory storage and distributed architectures to deliver lower latency and faster processing speeds. </p>



<p>The flip side, however, is that greater scalability often translates to higher operating costs. A poorly configured cloud NoSQL database can easily generate a <a href="https://www.theregister.com/2020/12/10/google_cloud_over_run/" target="_blank" rel="noreferrer noopener">$72,000 overnight bill</a>. One possible solution is using a mix of hot and cold storage for different types of data streams as The Trade Desk does.</p>



<p>TTD receives <a href="https://www.youtube.com/watch?v=lA8MXNZ9uY4&amp;ab_channel=Aerospike" target="_blank" rel="noreferrer noopener">over 100K QPS of data</a> from its partners, which translates to over 200 TDID/segment updates per second. Given the volumes and costs of merging records, TTD needs to pick the “best” elements for analysis if any given record is too large. At the same time, the platform needs to serve only the data on records in use by active campaigns.</p>
<p>To manage this scale, the team uses <a href="https://aerospike.com/" target="_blank" rel="noreferrer noopener">Aerospike</a>—a multi-model, multi-cloud NoSQL database. Aerospike runs on the edge as a hot cache for the real-time bidding system, which processes over 800 billion queries per day. It also serves as a system of record on AWS, managing peak loads of up to 20 million writes per second for its “cold storage” of user profiles. </p>



<p>This way, TTD can:</p>



<ul>
<li>Rapidly serve data required for active campaigns </li>



<li>Refresh hot records within hours of new campaign activation</li>



<li>Forget about any impact of data delivery on bidding system performance </li>



<li>Support advanced analytics scenarios by surfacing cold storage cluster data. </li>
</ul>



<p>Such a data pipeline architecture allows TTD to maintain large-scale, multidimensional data records without burning unnecessary CPU costs and to thaw data in 8 ms for real-time bidding. </p>
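<p>As a hedged sketch of the hot-cache side of such a setup, the snippet below uses the Aerospike Python client; the cluster address, namespace, set, and bin names are invented for the example:</p>

<pre class="wp-block-code"><code>import aerospike  # pip install aerospike

config = {"hosts": [("127.0.0.1", 3000)]}        # assumed cluster address
client = aerospike.client(config).connect()

# (namespace, set, user key): hypothetical layout for hot user profiles
key = ("profiles", "hot", "user:123")

# Write a record with a TTL so stale hot-cache entries expire on their own
client.put(key, {"segments": [101, 202]}, meta={"ttl": 3600})

_, meta, bins = client.get(key)                  # low-latency point read
print(bins["segments"])
client.close()
</code></pre>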



<h4 class="wp-block-heading"><strong>Recommended technologies for data storage: </strong></h4>



<ul>
<li><a href="https://clickhouse.com/" target="_blank" rel="noreferrer noopener">Clickhouse:</a> A cost-effective RDBMS for large-scale AdTech projects. </li>



<li><a href="https://aerospike.com/">Aerospike</a>: A schemaless distributed database with a distinct data model for organizing and storing its data, designed for scalability and high performance.</li>
<li><a href="https://hive.apache.org/">Apache Hive</a>: A distributed, fault-tolerant data warehouse system.</li>
</ul>






<h3 class="wp-block-heading">Data access </h3>



<p>The final step is building an easy data querying experience for users and enabling effective data access to downstream analytics applications. </p>



<p>Query engines help retrieve, filter, aggregate, and analyze the available AdTech data. Modern query engine services support multiple data sources and file formats, making them highly scalable and elastic for processing data within the data lake instead of pushing it into a data warehouse.</p>



<p>That’s the route <a href="https://www.captifytechnologies.com/">Captify</a> — a search intelligence platform — chose for its data pipelines for reporting. According to <a href="https://www.linkedin.com/in/roksolanadiachuk/" target="_blank" rel="noreferrer noopener">Roksolana Diachuk</a>, the platform’s Engineering Manager, the team uses: </p>



<ul>
<li>Amazon S3 to store customer data in various formats (CSV, Parquet, etc.)</li>
<li>Apache Spark for processing the stored data</li>
</ul>






<p>To ensure effective processing, the team built a custom client on top of Amazon S3, called S3 Lister, which filters out historical records so the team doesn&#8217;t need to query them with Spark. Since the data arrives in different formats, Captify applies data partitioning at the end of its data pipeline. Partitioning is based on timestamps (date, time, and hour), as required by the reporting use case. Afterwards, all processed data is loaded into <a href="https://impala.apache.org/" target="_blank" rel="noreferrer noopener">Impala</a>, a query engine built on top of <a href="https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html" target="_blank" rel="noreferrer noopener">Apache HDFS</a>. </p>
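<p>A timestamp-based partitioning step like the one Captify describes can be sketched in PySpark as follows; the paths and column names are assumptions, not Captify&#8217;s actual schema:</p>

<pre class="wp-block-code"><code>from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partition-by-time").getOrCreate()

df = spark.read.parquet("s3://example-bucket/processed/events/")  # hypothetical input

(df.withColumn("date", F.to_date("event_ts"))
   .withColumn("hour", F.hour("event_ts"))
   .write.mode("append")
   .partitionBy("date", "hour")        # partition layout driven by timestamps
   .parquet("s3://example-bucket/reports/events/"))
</code></pre>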





<p>Similar to The Trade Desk, Captify uses a system of hot and cold data caches. Typically, all data streams are saved for 30 months for reporting purposes. However, teams only need data from the past month or so for most reporting use cases. </p>



<p>Therefore, HDFS contains fresh data that is several months old at most. All historical data records, in turn, rest in S3 stores. This way, Captify maintains fast and cost-effective data querying.</p>



<p>That said, SQL querying requires technical expertise, meaning average business users have to rely on data science teams for report generation. For that reason, AdTech players also leverage self-service BI tools. </p>



<p>Tokyo-based <a href="https://www.cyberagent.co.jp/en/" target="_blank" rel="noreferrer noopener">CyberAgent</a>, for example, went with <a href="https://www.tableau.com/" target="_blank" rel="noreferrer noopener">Tableau</a>—a self-service analytics platform. Tableau has pre-made connectors to data sources like Amazon Redshift and Google BigQuery, among others, and helps build analytical models visually to give business users streamlined access to analytics. </p>



<p>CyberAgent stores petabytes of data across Hadoop, Redshift, and BigQuery. Occasionally, they also use data marts to import data from MySQL and CSV files. <a href="https://www.tableau.com/solutions/customer/cyberagent-inc-saves-months-improves-insight-data-tableau" target="_blank" rel="noreferrer noopener">According to Ken Takao</a>, Infrastructure Manager at CyberAgent, the company “uses MySQL to store the master data for most of the products. Then blend the master data on MySQL and data on Hadoop or Redshift to extract”.</p>



<p>Before Tableau, the company’s engineers spent a lot of time figuring out how to obtain the required data before scripting custom SQL queries. Tableau now allows them to extract data directly from the connected sources and make it available to downstream applications. This saves the engineering teams dozens of hours. Business users benefit from readily accessible insights on ad distribution, logistics, and sales volumes for the company’s portfolio of 20 products. </p>



<p>Both Tableau and Looker are popular data visualization solutions, but they have some limitations for AdTech data. In particular, some analytics use cases may require heavy, mostly manual data porting. </p>



<p>Ideally, you should build or look for a solution that supports automatic data collection from multiple systems. Media-specific data visualization solutions often have field normalization, which eliminates the need for manual data mapping and improves the granularity of data presentation. </p>



<h4 class="wp-block-heading"><strong>Recommended technologies to provide effective data access: </strong></h4>



<ul>
<li><a href="https://aws.amazon.com/athena/">Amazon Athena</a>: A serverless, interactive analytics service built on open-source frameworks. </li>
<li><a href="https://prestodb.io/" target="_blank" rel="noopener">Presto</a>: An open-source SQL query engine that allows querying Hadoop, Cassandra, Kafka, AWS S3, Alluxio, MySQL, MongoDB, and Teradata. </li>
<li><a href="https://impala.apache.org/" target="_blank" rel="noopener">Apache Impala</a>: An open source, distributed SQL query engine for Apache Hadoop.</li>
<li><a href="https://www.tableau.com/" target="_blank" rel="noopener">Tableau</a>: A flexible self-service business intelligence platform. </li>
</ul>






<h3 class="wp-block-heading">Data orchestration</h3>



<p>Poor data pipeline management affects almost everything—data quality, processing speed, data governance. The biggest challenge, however, is that AdTech data pipelines have complex, multi-step workflows—and “clogging” at any step can affect the entire system&#8217;s performance. </p>



<p>Moreover, workflows have upstream, downstream, and interdependencies. Without a robust data orchestration system, managing all of these effectively is nearly impossible. </p>



<p>The simplest (and still most used) orchestration method for ETL pipelines is sequential scheduling via cron jobs. While it’s still a workable option for simple analytics use cases, it doesn’t scale well and requires significant developer time for configuration, upkeep, and error handling. </p>



<p>Orchestration is also challenging in data pipelines for streaming data processing. A batch orchestrator relies on idempotent steps in a pipeline, whereas real-life processes are seldom idempotent. Therefore, when you need to roll back or replay a workflow, data quality and integrity issues may arise. </p>



<p>In AdTech, data engineers also often need to enrich events in a stream with batch data to obtain more comprehensive insights. For example, you may need to contextualize an ad click event with user interaction data stored in a database. This requires pipeline synchronization. </p>
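<p>One common way to implement that stream-plus-batch enrichment is a stream-static join in Spark Structured Streaming. The sketch below is a hypothetical example with an invented topic, schema, and paths:</p>

<pre class="wp-block-code"><code>from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("stream-enrichment").getOrCreate()

schema = StructType([
    StructField("user_id", StringType()),
    StructField("ad_id", StringType()),
])

# Streaming side: ad click events from Kafka
clicks = (spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
    .option("subscribe", "ad-clicks")                  # assumed topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*"))

# Batch side: user interaction data stored as a static table
users = spark.read.parquet("s3://example-bucket/dim/user_interactions/")

# Stream-static join: each micro-batch is enriched with the batch table
enriched = clicks.join(users, "user_id", "left")

query = (enriched.writeStream.format("console")
         .outputMode("append").start())
query.awaitTermination()
</code></pre>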



<p>Modern orchestration tools like <a href="https://airflow.apache.org/" target="_blank" rel="noreferrer noopener">Airflow</a> and <a href="https://aws.amazon.com/step-functions/" target="_blank" rel="noreferrer noopener">AWS Step Functions</a>, among others, help deal with the above challenges through the concept of directed acyclic graphs (DAGs). A DAG records tasks as nodes in a graph and task dependencies as edges between those nodes. Thanks to this, the system can execute concurrent tasks more efficiently, while data engineers get better controls for logging, debugging, and job re-runs. </p>



<p>For example, with an Airflow-based orchestration service, <a href="https://business.adobe.com/products/experience-manager/adobe-experience-manager.html" target="_blank" rel="noreferrer noopener">Adobe</a> can now run <a href="https://airflow.apache.org/use-cases/adobe/" target="_blank" rel="noreferrer noopener">over 1,000 concurrent workflows</a> for its Experience Management platform. <a href="https://arpeely.com/" target="_blank" rel="noreferrer noopener">Arpeely</a>, in turn, went with Google <a href="https://cloud.google.com/composer" target="_blank" rel="noreferrer noopener">Cloud Composer</a> and <a href="https://cloud.google.com/scheduler">Cloud Scheduler</a> to automate data workflows for its autonomous media engine solution. </p>



<p>Overall, orchestration services provide a convenient toolkit to streamline and automate complex data processing tasks, allowing your teams to focus on system fine-tuning instead of endless debugging. </p>



<h4 class="wp-block-heading"><strong>Recommended technologies for data orchestration: </strong></h4>



<ul>
<li><a href="https://airflow.apache.org/">Apache Airflow</a>: An open-source workflow management platform for data engineering pipelines, originally developed by Airbnb. </li>
<li><a href="https://aws.amazon.com/step-functions/">AWS Step Functions</a>: A serverless workflow orchestration service offering seamless integration with AWS Lambda, AWS Batch, AWS Fargate, Amazon ECS, Amazon SQS, Amazon SNS, and AWS Glue. </li>
<li><a href="https://github.com/spotify/luigi">Luigi</a>: An open-source orchestration solution from the Spotify team. </li>
<li><a href="https://dagster.io/">Dagster:</a>  A cloud-native orchestrator improving upon Airflow. </li>
</ul>






<h3 class="wp-block-heading">Monitoring  </h3>



<p>Similar to regular pipes, big data pipelines also need servicing. Job scheduling conflicts, data format changes, configuration problems, errors in data transformation logic—a lot of things can cause havoc in your systems. And downtime is costly in AdTech. </p>



<p>High system reliability requires end-to-end observability of the data pipeline, coupled with the ability to proactively manage and optimize the flow of data to avoid bottlenecks, increase resource efficiency, and reduce operating costs. </p>



<p>In particular, AdTech companies should implement:</p>



<ul>
<li>Compute performance monitoring</li>



<li>Data reconciliation processes</li>



<li>Schema and data drift monitoring</li>
</ul>



<p>Fortunately, these tasks can be automated with data observability platforms. <a href="https://pubmatic.com/" target="_blank" rel="noreferrer noopener">PubMatic</a>, for example, <a href="https://www.acceldata.io/blog/pubmatic-leverages-data-observability-platform" target="_blank" rel="noreferrer noopener">uses Acceldata Pulse</a> to monitor the performance of its massive data platform, spanning thousands of nodes and handling hundreds of petabytes of data. </p>



<p>The sheer scale of operations caused frequent performance issues, while Mean Time to Resolve (MTTR) stayed high. Acceldata’s observability platform helped PubMatic’s data engineers locate and isolate data bottlenecks faster, plus automate a lot of infrastructure support tasks. Thanks to the obtained insights, PubMatic also reduced its HDFS block footprint by 30% and consolidated its Kafka clusters, resulting in lower costs. </p>



<p>At Xenoss, we have also built a custom pipeline monitoring stack using <a href="https://prometheus.io/" target="_blank" rel="noreferrer noopener">Prometheus</a> and <a href="https://grafana.com/" target="_blank" rel="noreferrer noopener">Grafana</a>, which allows us to keep a 24/7 watch over all data processing operations and respond rapidly to errors and failures. This is one of the most well-balanced and efficient stacks we&#8217;ve compiled, and we have successfully implemented it across various client businesses.</p>
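<p>For a sense of how such a Prometheus-plus-Grafana setup hooks into a Python pipeline, here is a minimal sketch; the metric names and measured values are illustrative only:</p>

<pre class="wp-block-code"><code>import random
import time

from prometheus_client import Counter, Gauge, start_http_server  # pip install prometheus-client

records_total = Counter("pipeline_records_total", "Records processed", ["stage"])
consumer_lag = Gauge("pipeline_consumer_lag", "Consumer lag in messages")

start_http_server(8000)  # Prometheus scrapes this endpoint; Grafana graphs it

while True:
    records_total.labels(stage="ingest").inc()
    consumer_lag.set(random.randint(0, 100))  # placeholder for a real lag probe
    time.sleep(1)
</code></pre>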



<h4 class="wp-block-heading"><strong>Recommended technologies for data pipeline monitoring: </strong></h4>



<ul>
<li><a href="https://grafana.com/" target="_blank" rel="noopener">Grafana</a>: An open-source, multi-platform service for analytics and interactive visualizations. </li>
<li><a href="https://www.datadoghq.com/">Datadog</a>: A SaaS monitoring and security platform. </li>
<li><a href="https://www.splunk.com/">Splunk</a>: A leader in application management, security, and compliance analytics. </li>
<li><a href="https://www.dynatrace.com/">Dynatrace</a>: An observability, AI, automation, and application security functionality meshed in one platform. </li>
</ul>
<div class="post-banner-cta-v1 js-parent-banner">
<div class="post-banner-wrap">
<h2 class="post-banner__title post-banner-cta-v1__title">Looking for best practices in data pipeline development?</h2>
<p class="post-banner-cta-v1__content">Xenoss provides expert insights and services to refine your data strategy</p>
<div class="post-banner-cta-v1__button-wrap"><a href="https://xenoss.io/big-data-solution-development" class="post-banner-button xen-button post-banner-cta-v1__button">Learn more</a></div>
</div>
</div>



<h2 class="wp-block-heading">Final thoughts </h2>



<p>Data pipelines are the lifeline of the AdTech industry. But building them is hard. There are always trade-offs between cost and performance. </p>



<p>Developing robust, scalable, and high-concurrency data pipelines requires a deep understanding of the AdTech industry and broad knowledge of different technologies and system design strategies. Partnering with a team of <a href="https://xenoss.io/big-data-solution-development">AdTech data engineers</a> is the best way to avoid subpar architecture choices and costly operating mistakes. </p>



<p>Xenoss’ big data engineers have helped architect some of the most robust products in the industry, including a <a href="https://xenoss.io/cases/developing-a-gaming-advertising-platform-with-1-4b-monthly-video-impressions">gaming advertising platform with 1.4 billion monthly video impressions</a> and a <a href="https://xenoss.io/cases/building-performance-oriented-mobile-dsp-with-innovative-user-behavior-prediction-mechanism">performance-oriented mobile DSP</a>, recently acquired by the Verve Group. </p>



<p>We know how to build high-load products for high-ambition teams. Contact us to learn more about our custom software development services. </p>
<p>The post <a href="https://xenoss.io/blog/data-pipeline-best-practices-for-adtech-industry">Best practices for architecting data pipelines in AdTech</a> appeared first on <a href="https://xenoss.io">Xenoss - AI and Data Software Development Company</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What kind of engineers should you hire for AdTech software projects?</title>
		<link>https://xenoss.io/blog/engineers-for-adtech-software-development</link>
		
		<dc:creator><![CDATA[Vlad Kushka]]></dc:creator>
		<pubDate>Thu, 21 Apr 2022 11:22:12 +0000</pubDate>
				<category><![CDATA[Software architecture & development]]></category>
		<category><![CDATA[Product development]]></category>
		<guid isPermaLink="false">https://xenoss.io/?p=2877</guid>

					<description><![CDATA[<p>As the complexity of software development in AdTech increases, it puts more burden on the hiring process. The average interview time for hiring senior software engineers is 40.8 days.  Hiring AdTech software engineers can take even longer due to the need for extensive domain knowledge, and the growing importance of AI/ML competency in this industry.  [&#8230;]</p>
<p>The post <a href="https://xenoss.io/blog/engineers-for-adtech-software-development">What kind of engineers should you hire for AdTech software projects?</a> appeared first on <a href="https://xenoss.io">Xenoss - AI and Data Software Development Company</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><span style="font-weight: 400;">As the complexity of software development in AdTech increases, it puts more burden on the hiring process. The average interview time for hiring senior software engineers is </span><a href="https://www.glassdoor.com/research/time-to-hire-in-25-countries/"><span style="font-weight: 400;">40.8 days</span></a><span style="font-weight: 400;">. </span></p>
<p><span style="font-weight: 400;">Hiring AdTech software engineers can take even longer due to the need for extensive domain knowledge, and the growing </span><a href="https://www.thedrum.com/opinion/2022/02/17/3-ways-artificial-intelligence-will-save-adtech"><span style="font-weight: 400;">importance of AI/ML</span></a><span style="font-weight: 400;"> competency in this industry. </span></p>
<p><span style="font-weight: 400;">In order to reap the benefits of your AdTech software team – faster time to market and increased collaboration across teams and departments – it’s important to understand the industry context in which you operate and which software development specialists will be best suited to it. </span></p>
<p><span style="font-weight: 400;">The modern job market offers a vast range of software engineers with wide-ranging expertise. Besides specialization in programming language or technologies, there is division by so-called generalists, I-shapers, and T-shapers. We are going to talk about who the latter two are, which specialists are best fitted for an AdTech development project, and illustrate the way we at <a href="https://xenoss.io/">Xenoss</a> approach managing and growing T-shaped engineers.  </span></p>
<h2 class="p1">I-shaped vs T-shaped tech specialists</h2>
<p><b>I-shaped specialists</b><span style="font-weight: 400;"> are narrowly specialized professionals, such as designers, software developers, or data engineers. I-shapers get proficient in a particular stack of technology and then only polish this specific expertise. </span></p>
<p><span style="font-weight: 400;">Hiring I-shaped software development specialists can be a good fit for certain long-established, conservative industries, such as healthcare or financial services. </span></p>
<p><span style="font-weight: 400;">These companies have in-house expertise for any set of problems and value deep proficiency in a particular discipline instead of tech outlook and knowledge in the related domains. For instance, a QA engineer is responsible for the testing, but can’t put in the larger product context and won’t be able to perform even the minor tweaks in code or the platform UI. When I-shapers encounter more multidisciplinary tasks, they refer you to a specialist in a different department.</span></p>
<div class="mceTemp"></div>
<p><figure id="attachment_2909" aria-describedby="caption-attachment-2909" style="width: 2100px" class="wp-caption aligncenter"><img decoding="async" class="wp-image-2909 size-full" src="https://xenoss.io/wp-content/uploads/2022/04/example-of-i-shape-expertise-min-1.jpg" alt="Example of I-shape expertise - Xenoss blog - Engineers For AdTech Software Projects" width="2100" height="1036" srcset="https://xenoss.io/wp-content/uploads/2022/04/example-of-i-shape-expertise-min-1.jpg 2100w, https://xenoss.io/wp-content/uploads/2022/04/example-of-i-shape-expertise-min-1-300x148.jpg 300w, https://xenoss.io/wp-content/uploads/2022/04/example-of-i-shape-expertise-min-1-1024x505.jpg 1024w, https://xenoss.io/wp-content/uploads/2022/04/example-of-i-shape-expertise-min-1-768x379.jpg 768w, https://xenoss.io/wp-content/uploads/2022/04/example-of-i-shape-expertise-min-1-1536x758.jpg 1536w, https://xenoss.io/wp-content/uploads/2022/04/example-of-i-shape-expertise-min-1-2048x1010.jpg 2048w, https://xenoss.io/wp-content/uploads/2022/04/example-of-i-shape-expertise-min-1-527x260.jpg 527w, https://xenoss.io/wp-content/uploads/2022/04/example-of-i-shape-expertise-min-1-20x10.jpg 20w" sizes="(max-width: 2100px) 100vw, 2100px" /><figcaption id="caption-attachment-2909" class="wp-caption-text">Functional expertise of an I-shaped QA engineer</figcaption></figure></p>
<p><span style="font-weight: 400;">This is especially true if the company’s workflow is built on by </span><a href="https://www.toolshero.com/information-technology/rational-unified-process-rup/"><span style="font-weight: 400;">RUP (Rational Unified Process)</span></a><span style="font-weight: 400;"> methodology that entails completing one stage of development, clearly recording it in the documentation, before moving on to the next. The first release &#8220;in production&#8221; often occurs after a few months. This works if the external conditions are relatively constant.</span></p>
<p><figure id="attachment_2880" aria-describedby="caption-attachment-2880" style="width: 2100px" class="wp-caption aligncenter"><img decoding="async" class="wp-image-2880 size-full" src="https://xenoss.io/wp-content/uploads/2022/04/rational-unified-process-rup-min.jpg" alt="Rational unified process (RUP) - Xenoss blog - Engineers For AdTech Software Projects" width="2100" height="1094" srcset="https://xenoss.io/wp-content/uploads/2022/04/rational-unified-process-rup-min.jpg 2100w, https://xenoss.io/wp-content/uploads/2022/04/rational-unified-process-rup-min-300x156.jpg 300w, https://xenoss.io/wp-content/uploads/2022/04/rational-unified-process-rup-min-1024x533.jpg 1024w, https://xenoss.io/wp-content/uploads/2022/04/rational-unified-process-rup-min-768x400.jpg 768w, https://xenoss.io/wp-content/uploads/2022/04/rational-unified-process-rup-min-1536x800.jpg 1536w, https://xenoss.io/wp-content/uploads/2022/04/rational-unified-process-rup-min-2048x1067.jpg 2048w, https://xenoss.io/wp-content/uploads/2022/04/rational-unified-process-rup-min-499x260.jpg 499w, https://xenoss.io/wp-content/uploads/2022/04/rational-unified-process-rup-min-20x10.jpg 20w" sizes="(max-width: 2100px) 100vw, 2100px" /><figcaption id="caption-attachment-2880" class="wp-caption-text"><a href="https://www.toolshero.com/information-technology/rational-unified-process-rup/">RUP</a> – agile software development methodology</figcaption></figure></p>
<p><span style="font-weight: 400;">But if the product exists in the industry with a high degree of uncertainty and fast-paced market changes, such as in AdTech, a different development approach is needed. It is vital to focus on feedback from the market rather than focusing on the canonical rules for building development processes.</span></p>
<p><span style="font-weight: 400;">To avoid downtime and increase the speed of delivery, AdTech software development seeks out</span><b> T-shaped specialists</b><span style="font-weight: 400;">. These are people who have their own deeply studied specialization (similar to the I-shaped) and competencies in related areas.</span></p>
<p><figure id="attachment_2888" aria-describedby="caption-attachment-2888" style="width: 1024px" class="wp-caption aligncenter"><img decoding="async" class="wp-image-2888 size-large" src="https://xenoss.io/wp-content/uploads/2022/04/types-1024x537.gif" alt="Types of expertise - Xenoss blog - Engineers For AdTech Software Projects" width="1024" height="537" srcset="https://xenoss.io/wp-content/uploads/2022/04/types-1024x537.gif 1024w, https://xenoss.io/wp-content/uploads/2022/04/types-300x157.gif 300w, https://xenoss.io/wp-content/uploads/2022/04/types-768x403.gif 768w, https://xenoss.io/wp-content/uploads/2022/04/types-495x260.gif 495w, https://xenoss.io/wp-content/uploads/2022/04/types-20x10.gif 20w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption id="caption-attachment-2888" class="wp-caption-text">Different types of professional expertise</figcaption></figure></p>
<p><span style="font-weight: 400;">The concept of T-shaped skills is a metaphor that has been </span><a href="https://corporatefinanceinstitute.com/resources/management/t-shaped-skills/"><span style="font-weight: 400;">used in recruiting since the 90s</span></a><span style="font-weight: 400;"> of the last century. The concept can be represented as two stripes: horizontal and vertical.</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The horizontal bar (Breadth of Knowledge / General Skills) is the ability to interact with experts in other fields and apply their knowledge in areas other than one&#8217;s own.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The vertical bar (I-Shaped / Expert in one thing) is a deep competence in a particular area.</span></li>
</ul>
<p><figure id="attachment_2915" aria-describedby="caption-attachment-2915" style="width: 1050px" class="wp-caption aligncenter"><img decoding="async" class="wp-image-2915 size-full" src="https://xenoss.io/wp-content/uploads/2022/04/expertise-min.gif" alt="Example of T-shaped expertise - Xenoss blog - Engineers For AdTech Software Projects" width="1050" height="560" /><figcaption id="caption-attachment-2915" class="wp-caption-text">Functional expertise of T-shape QA engineer</figcaption></figure></p>
<p><span style="font-weight: 400;">In this scenario, a QA knows everything required to do the job, but also understands UX design, can create unit tests, can perform basic DevOps operations, etc.</span></p>
<p><span style="font-weight: 400;">In sophisticated AdTech projects, very often there is a critical lack of horizontal stripe width. Understanding and knowledge in related fields help to find a “common language” in the team, speed up the creation of a product, and improve its quality.</span></p>
<blockquote>
<p class="p1"><i>A T-shaper, while having a specialized skillset, also understands the product development holistically (different engineering environments, tools, and specifications). This gives the client an enormous advantage: keeping the team small and consequently cost-efficient, the team approaches the development the right way from the beginning. </i></p>
</blockquote>
<p style="text-align: right;"><span style="font-weight: 400;"><a href="https://www.linkedin.com/in/vovakyrychenko/">Vova Kyrychenko</a>, CTO at Xenoss</span></p>
<h2 class="p1">Challenges of AdTech dev team composition and management</h2>
<p><span style="font-weight: 400;">AdTech development teams work in a rapidly shifting market, with changing user preferences, and tough competition, on comprehensive business tasks that have various ways to approach them. AdTech companies need engineers with knowledge in adjacent disciplines and the ability to adapt and synthesize expertise. </span></p>
<p><span style="font-weight: 400;">For instance, to increase conversion rates for a media buying platform (a classical business objective for AdTech projects), data engineers have to take into account lots of factors from the related domains: The competitive landscape, AdOps specifics, potential hardware issues, the platform’s business use cases. </span><span style="font-weight: 400;"> </span></p>
<p><span style="font-weight: 400;">To determine whether T-shapers are the right fit for AdTech projects, let’s talk first about the typical team composition and management challenges. </span></p>
<h3 class="p1"><b>Hiring for a technically demanding project</b></h3>
<p><span style="font-weight: 400;">Assembling the dream team for a complicated and technically demanding project, common for the AdTech market, is a challenge in itself. Hiring narrowly-specialized senior tech talent might put a significant burden on the project due to the steep cost and long time to hire. The scale of necessary expertise might turn out to be smaller, and you’ll overspend on expensive work hours. </span></p>
<p><span style="font-weight: 400;">Instead of hiring people with a narrow skillset, AdTech companies need to prioritize tech specialists with wide knowledge that can approach the problem holistically. </span></p>
<p><span style="font-weight: 400;">Igor Petrenko, Solution Architect at Xenoss, </span><a href="https://www.linkedin.com/posts/xenoss_join-xenoss-activity-6842074798932283394-w-mV?utm_source=linkedin_share&amp;utm_medium=member_desktop_web"><span style="font-weight: 400;">emphasizes</span></a><span style="font-weight: 400;"> the importance of extensive tech and product knowledge for the optimal development of the AdTech software: </span></p>
<blockquote>
<p class="p1"><em>It&#8217;s not just about mastering tools and platforms. In every project, our tech team gains an in-depth understanding of the underlying technologies: the tech stack, internal components, operating systems, and hardware. By diving so deep and optimizing the product&#8217;s core, the solutions we build parallel-process hundreds of thousands of requests and are ready to support the next milestones of the clients’ businesses.</em></p>
</blockquote>
<h3 class="p1">The complexity of the team structure</h3>
<p><span style="font-weight: 400;">Development in companies that rely on I-shaped specialists is predicated on the multiple managers that can merge the expertise of engineers with widely different stacks. To effectively manage the workload, project leads (or team leads/tech leads/managers) have to be familiar with the technical aspect of each specialization and be able to plan for the long haul. The typical structure for such a team can look like this:  </span></p>
<p><figure id="attachment_2911" aria-describedby="caption-attachment-2911" style="width: 2100px" class="wp-caption aligncenter"><img decoding="async" class="wp-image-2911 size-full" src="https://xenoss.io/wp-content/uploads/2022/04/functional-structure-of-the-i-shape-team-1-min.jpg" alt="Functional structure of the I-shape team - Xenoss blog - Engineers For AdTech Software Projects" width="2100" height="1214" srcset="https://xenoss.io/wp-content/uploads/2022/04/functional-structure-of-the-i-shape-team-1-min.jpg 2100w, https://xenoss.io/wp-content/uploads/2022/04/functional-structure-of-the-i-shape-team-1-min-300x173.jpg 300w, https://xenoss.io/wp-content/uploads/2022/04/functional-structure-of-the-i-shape-team-1-min-1024x592.jpg 1024w, https://xenoss.io/wp-content/uploads/2022/04/functional-structure-of-the-i-shape-team-1-min-768x444.jpg 768w, https://xenoss.io/wp-content/uploads/2022/04/functional-structure-of-the-i-shape-team-1-min-1536x888.jpg 1536w, https://xenoss.io/wp-content/uploads/2022/04/functional-structure-of-the-i-shape-team-1-min-2048x1184.jpg 2048w, https://xenoss.io/wp-content/uploads/2022/04/functional-structure-of-the-i-shape-team-1-min-450x260.jpg 450w, https://xenoss.io/wp-content/uploads/2022/04/functional-structure-of-the-i-shape-team-1-min-20x12.jpg 20w" sizes="(max-width: 2100px) 100vw, 2100px" /><figcaption id="caption-attachment-2911" class="wp-caption-text">Software development team structure with I-shaped experts</figcaption></figure></p>
<p><span style="font-weight: 400;">However, in AdTech software development, especially if it is an emerging product or startup, maintaining such a rigid organizational structure is often unsustainable and costly. </span></p>
<h3 class="p1">Budget constraints for new roles</h3>
<p><span style="font-weight: 400;">Besides paying for multiple managerial roles, you might face the need to increase the budget on the go, each time you require some narrow expertise. For example, a few DevOps tasks emerge during the project span. You had no budget allocated for an additional position, but I-shaped back-end engineers cannot substitute for its functions so you’ll have to extend the project’s budget anyway. On the other hand, small teams of senior T-shaped developers, that have more universal tech expertise from the beginning, are more cost-effective in the long run. </span></p>
<h3 class="p1">Understanding the business environment</h3>
<p><span style="font-weight: 400;">Managing business objectives of an evolving AdTech product requires actionable tactical solutions and long term planning – a combination that requires a profound understanding of the domain. </span></p>
<p><span style="font-weight: 400;">The software engineer in this industry needs a thorough understanding of the competitor landscape, privacy regulations, supply chain logic, the typical data, and identity challenges. Even the most skilled I-shaper won’t be able to navigate these treacherous waters. AdTech engineers need a corresponding knowledge of AdOps, data science, and data architecture to comprehend the complexity of the technical solutions they have to devise. </span></p>
<h3><strong>Communication in cross-functional teams </strong></h3>
<p><span style="font-weight: 400;">Establishing a common language for a team of I-shaped specialists can be challenging. It is incredibly difficult to establish a clear feedback loop between developer, architect, tester, and data scientist. Narrow specialists don&#8217;t understand each other well, their vocabularies vary, and they tend to focus on their own well of knowledge. Workflows </span><a href="https://www.lucidchart.com/blog/are-you-ready-to-commit-developing-a-professional-software-engineer-workflow"><span style="font-weight: 400;">are not symmetric</span></a><span style="font-weight: 400;">; with small volumes, it is difficult to plan the work of a deep specialist without allowing downtime. </span></p>
<h2 class="p1">Why T-shaped expertise is indispensable for AdTech software development</h2>
<p><span style="font-weight: 400;">We recommend prioritizing T-shaped specialists in the hiring process since only these specialists are well-equipped for the multidisciplinary nature of AdTech. </span></p>
<p><span style="font-weight: 400;">A squad of T-shapers offers you a great deal of flexibility – with feature and task prioritization, change management based on user feedback, data-driven experimentation, and even resource optimization. Such small, </span><a href="https://www.bairesdev.com/blog/5-elements-of-a-high-performing-agile-team/"><span style="font-weight: 400;">agile teams</span></a><span style="font-weight: 400;"> have already taken over the world little by little, even banks and insurance providers.  </span></p>
<p><span style="font-weight: 400;">A T-shaper is appreciated for several qualities:</span><span style="font-weight: 400;"><br />
</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><b>Outlook. </b><span style="font-weight: 400;">In a modern, competitive business, this property is one of the most valuable. Knowledge of related or distant subjects helps create nonstandard solutions and solutions &#8220;at the junctions.&#8221; </span></li>
<li style="font-weight: 400;" aria-level="1"><b>Universality</b><span style="font-weight: 400;">. A T-shaper can reinforce the development of any part of the project at any stage, providing close to 100% utilization of his working time. </span></li>
<li style="font-weight: 400;" aria-level="1"><b>Interoperability</b><span style="font-weight: 400;">. It saves the manager&#8217;s time on establishing workflow and communications, which helps avoid misunderstandings that result in the </span><span style="font-weight: 400;">waste of the development resource</span><span style="font-weight: 400;">. </span></li>
<li style="font-weight: 400;" aria-level="1"><b>Agility</b><span style="font-weight: 400;">. Such a specialist is a walking backup for some team members. What if a Python developer gets hit by a coronavirus? A T-shaper will be able to pick up the dropped baton and continue the project.</span></li>
</ul>
<p><span style="font-weight: 400;">Due to the complexity of AdTech software projects, knowledge of the domain and technology outlook is absolutely critical to solving the business challenges of this industry. </span></p>
<p><span style="font-weight: 400;">T-shapers are capable of solving the challenges of our complex niche that requires brainstorming with a multi-disciplinary team, experiments, and improvisation for the optimal solution. The team of T-shapers can also help you keep the software development expenses in check; they can optimize when specialized development would just write off the costs. </span></p>
<p>[cta-no-description title="Looking for T-shape experts for your AdTech team?" url="https://xenoss.io/dedicated-development-teams" buttontext="Get in touch"]</p>
<h2 class="p1">Xenoss success case: a T-shaped team for an AdTech platform</h2>
<p><span style="font-weight: 400;">To put into perspective how T-shaped experts reinforce the development of AdTech projects, let’s review a real-world case from our practice</span><span style="font-weight: 400;">.</span></p>
<p><span style="font-weight: 400;">One AdTech solution Xenoss delivered is a customer data platform for mobile apps. Due to the initial focus on T-shaped expertise in the hiring process, we were able to assemble a multifaceted team that could adapt to the changing needs of stakeholders. </span></p>
<p><span style="font-weight: 400;">In the projects, we agree on the quarterly business objectives with the client that are ambitious, and usually concern optimization of specific processes or increasing performance KPIs. Those are tasks typical for startups that operate with a high degree of uncertainty. If the team didn’t  understand the project holistically, we wouldn&#8217;t able to anticipate the software&#8217;s future problems and possible outcomes that allow us to develop optimal solutions. </span></p>
<p><span style="font-weight: 400;">To deliver a solution with utmost efficiency, team members require a comprehensive understanding of the business objective and understand different aspects of its implementation: </span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><b>Business/AdOps.</b><span style="font-weight: 400;"> How the project aligns with business processes and the market landscape. </span></li>
<li style="font-weight: 400;" aria-level="1"><b>Engineering.</b><span style="font-weight: 400;"> Software development, the technical underpinning. </span></li>
<li style="font-weight: 400;" aria-level="1"><b>Data science.</b><span style="font-weight: 400;"> AI models and machine learning algorithms, the core of the solution. </span></li>
<li style="font-weight: 400;" aria-level="1"><b>Product and delivery.</b><span style="font-weight: 400;"> DevOps, automation, and development infrastructure. </span></li>
</ul>
<p><span style="font-weight: 400;">For instance, data scientists on this project need basic AdOps knowledge. Otherwise, they won&#8217;t distinguish different inventory types for optimization and won&#8217;t build realistic models. Effective communication between team members would also be impossible without some understanding of the domain. </span></p>
<p><span style="font-weight: 400;">Before beginning the quarter, the entire team discusses the business objective and decides on the best strategy to approach it. It is much easier for our team of T-shaped professionals to convey ideas and formulate a shared product vision. In the I-shaped team, the company would have to invest more management resources to make those wheels turn.  </span></p>
<p><span style="font-weight: 400;">The project crew holds several sessions where each team member lays out their vision for the project. Then we summarize those outputs in the roadmap. The T-shaped team can quickly reach a consensus and proceed with the development due to the holistic understanding of the project by the entire team.</span></p>
<h2 class="p1">Important notice about a T-shaped engineering team</h2>
<p><span style="font-weight: 400;">Having T-shapers on your team is not a silver bullet against all development constraints. Despite their broad expertise, you cannot expect them to perform </span><span style="font-weight: 400;">exceptionally well in every domain. </span></p>
<p><span style="font-weight: 400;">Product managers frequently expect T-shaped engineers to be full-fledged tech consultants in everything. A </span><span style="font-weight: 400;">T-shaper can adjust and get up to speed with various tech stacks. Yet they still have a main area of expertise.</span></p>
<p><span style="font-weight: 400;">Expecting an engineer to be an expert in data architecture, cloud technologies, and UI design is simply not realistic. Developers can substitute for each other when there is a need and assume different roles throughout the project while acknowledging the strong and weak sides of each team member is essential. You can’t put a generalist in charge of an infrastructure decision that requires years of expertise and a solid track record. </span></p>
<p><span style="font-weight: 400;">Developing T-shaped expertise within your company is also a separate organizational challenge. Working in a cross-functional team can sometimes mean expanding expertise in the corresponding domains is more cumbersome than in a functionally aligned organization. Introducing “guilds,” a.k.a </span><a href="https://medium.com/scaled-agile-framework/exploring-key-elements-of-spotifys-agile-scaling-model-471d2a23d7ea"><span style="font-weight: 400;">communities of practice</span></a><span style="font-weight: 400;">, can facilitate this process, as they do it in Spotify.  </span></p>
<h2 class="p1">How Xenoss grows T-shaped specialists</h2>
<p><figure id="attachment_2912" aria-describedby="caption-attachment-2912" style="width: 1048px" class="wp-caption aligncenter"><img decoding="async" class="wp-image-2912 size-full" src="https://xenoss.io/wp-content/uploads/2022/04/grow-min.gif" alt="How to grow T-shaped specialist - Xenoss blog - Engineers For AdTech Software Projects" width="1048" height="498" /><figcaption id="caption-attachment-2912" class="wp-caption-text">Tips on growing T-shaped specialists in-house</figcaption></figure></p>
<p><span style="font-weight: 400;">While you can sometimes find skilled T-shaped engineers on the market, that&#8217;s not always the case. You can approach this problem by growing such specialists internally. To develop a T-shaped specialist within your ranks, you must create the right environment and conditions for them.</span></p>
<p><b>Autonomy</b><span style="font-weight: 400;">. Everyone must understand their responsibility for what they do, and the individual or team must be able to make their own decisions. In turn, management must provide room for potential mistakes while keeping clear guidance and streamlined control processes in place so that mistakes never reach the client-facing solution. </span></p>
<p><b>Motivational goal.</b><span style="font-weight: 400;"> Each team member must understand the overall goal and be aware of what their contribution brings to the table. Instead of a straightforward task, they should be responsible for solving a business challenge with the creative freedom to choose the technical and tactical approach.</span></p>
<p><b>Space for growth.</b><span style="font-weight: 400;"> Create conditions that allow people to show their best qualities, learn something new, and become the best at something. This is the main prerequisite for nurturing T-shaped specialists: setting problem-oriented tasks, allocating time for research, and letting engineers master new skill sets to create solutions. </span></p>
<p><span style="font-weight: 400;">Our development methodology at Xenoss fully supports these three objectives. The team strategizes with the client for the optimal solution to a given business challenge, collectively sets goals and tactics to achieve them, and bears the responsibility for successful execution. The work is divided into sprints, and for each, a goal or goals are set that brings the team one step closer to achieving the final result. </span></p>
<p><span style="font-weight: 400;">To make this collaborative process work, Xenoss AdTech software engineers pick up the knowledge and skills from the related disciplines and establish effective cross-functional communication to foster a holistic understanding of the project. This stimulates the search for ideas to improve the product. </span></p>
<h2 class="p1">Takeaways</h2>
<p><span style="font-weight: 400;">A T-shaper, capable of picking up knowledge on the fly and establishing proficiency in the related disciplines, is the most valuable asset for the AdTech project, especially with a diverse tech stack. Modern advertising technologies are developing in the direction of syncretism, dense intersection, and even partial mergers of various domains. </span></p>
<p><span style="font-weight: 400;">The team of I-shapers can be a good fit for the company with a rigid managerial structure and a long-term incremental delivery process. For time-sensitive <a href="https://xenoss.io/custom-adtech-programmatic-software-development-services">AdTech software development</a> with a high degree of uncertainty, especially for an emerging product, you need a T-shaper to analyze the problem on several levels and work out a solution. </span></p>
<p><span style="font-weight: 400;">T-shaped specialists are especially valuable for emerging AdTech products, where they quite often have to work in startup mode, adapt to the constantly changing context, and at the same time be able to demonstrate team effectiveness and deliver real business value.</span></p>
<p><span style="font-weight: 400;"><div class="post-banner-cta-v2 no-desc js-parent-banner">
<div class="post-banner-wrap post-banner-cta-v2-wrap">
	<div class="post-banner-cta-v2__title-wrap">
		<h2 class="post-banner__title post-banner-cta-v2__title">Looking for experienced AdTech engineers and integrators for your team?</h2>
	</div>
<div class="post-banner-cta-v2__button-wrap"><a href="https://xenoss.io/dedicated-development-teams" class="post-banner-button xen-button">Learn more</a></div>
</div>
</div>
<p>The post <a href="https://xenoss.io/blog/engineers-for-adtech-software-development">What kind of engineers should you hire for AdTech software projects?</a> appeared first on <a href="https://xenoss.io">Xenoss - AI and Data Software Development Company</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
