
What is Silver Layer?

Silver layer is the intermediate stage in a medallion data architecture where raw, ingested data is cleaned, filtered, and standardized into a queryable format. For scraping pipelines, this is where raw HTML payloads and messy JSON responses are transformed into typed, deduplicated records. It acts as the enterprise source of truth, shielding downstream analytics from upstream schema drift and extraction anomalies.

Medallion Architecture · Data Cleaning · ETL · Data Lakehouse · Deduplication
// 02 — definitions

Cleaned, typed, and trusted.

The bridge between raw pipeline exhaust and business-ready datasets. If your data engineers are querying raw JSON, you don't have a Silver layer.


TL;DR

The Silver layer takes immutable Bronze data (raw fetches) and applies schema validation, type coercion, and deduplication. It provides a standardized, historical view of all extracted entities. DataFlirt's extraction engine validates each record and promotes it from Bronze to Silver automatically, so clients never have to parse raw DOM artifacts or handle missing-value sentinels themselves.

01 Definition & structure
In a medallion architecture, the Silver layer represents data that has been cleaned, conformed, and deduplicated. While the Bronze layer stores data exactly as it was received from the source (often messy, nested, or containing raw HTML), the Silver layer transforms this into a structured, relational format. It acts as the "enterprise view" of an entity—a single, trusted table for products, reviews, or companies that all downstream analytics can rely on.
02 Transformations applied
Promoting data to Silver involves several critical ETL steps:
  • Type Coercion: Converting string prices ("₹1,299") into numeric decimals (1299.00).
  • Date Normalization: Parsing disparate date strings into standard ISO 8601 timestamps.
  • Deduplication: Merging overlapping records from different crawl paths into a single entity state.
  • Null Handling: Imputing missing values with explicit sentinels or defaults.
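Three of the steps above (type coercion, date normalization, and null handling) can be sketched in a few lines of Python. This is a minimal illustration, not DataFlirt's implementation: the function names, the list of accepted date formats, the "UNKNOWN" default, and the rule that timestamps without a timezone are treated as UTC are all assumptions made for the example. Deduplication is left out because it operates on sets of records rather than single values.

```python
from datetime import datetime, timezone
from decimal import Decimal, InvalidOperation
import re


def coerce_price(raw: str) -> Decimal:
    """Strip currency symbols and thousands separators, then parse as a 2-dp Decimal."""
    cleaned = re.sub(r"[^\d.\-]", "", raw)
    try:
        return Decimal(cleaned).quantize(Decimal("0.01"))
    except InvalidOperation as exc:
        raise ValueError(f"cannot coerce price: {raw!r}") from exc


# Hypothetical set of formats seen in raw payloads; a real pipeline would
# maintain this per source.
KNOWN_DATE_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%d/%m/%Y %H:%M")


def normalize_date(raw: str) -> str:
    """Try each known format and return an ISO 8601 timestamp in UTC."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption for this sketch: naive timestamps are already UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).isoformat()
    raise ValueError(f"unrecognized date format: {raw!r}")


def fill_null(value, sentinel="UNKNOWN"):
    """Replace None or empty strings with an explicit sentinel."""
    return sentinel if value is None or value == "" else value
```

Raising on an unparseable value, rather than returning a default, keeps the failure visible so the record can be rejected instead of silently written with a wrong price or date.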
03 Schema enforcement
The defining characteristic of a Silver layer is strict schema enforcement. Unlike Bronze, which accepts any payload, Silver tables have defined contracts. If a scraping job returns a record missing a required primary key, or containing a string where a boolean is expected, the record is rejected. This strictness prevents silent data corruption from propagating to business dashboards.
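A schema contract of this kind can be expressed as a mapping from field name to expected type and required flag. The sketch below is illustrative only: the `Field` class, the `PRODUCT_CONTRACT` fields, and the error message wording are hypothetical, and a production pipeline would more likely rely on the table format's own schema or a validation library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    type_: type
    required: bool = True


# Hypothetical contract for a products table.
PRODUCT_CONTRACT = {
    "product_id": Field(str),
    "domain": Field(str),
    "price": Field(float),
    "in_stock": Field(bool),
    "brand": Field(str, required=False),
}


def validate(record: dict, contract: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the record passes."""
    errors = []
    for name, field in contract.items():
        value = record.get(name)
        if value is None:
            if field.required:
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(value, field.type_):
            errors.append(
                f"{name}: expected {field.type_.__name__}, got {type(value).__name__}"
            )
    return errors
```

A record with `"in_stock": "yes"` fails here because the string is never coerced implicitly; coercion is a separate, explicit step that runs before validation.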
04 How DataFlirt handles it
We automate the Bronze-to-Silver promotion using distributed processing frameworks and open table formats (Iceberg/Delta). Our extraction engine validates every record against a versioned schema contract before writing to Silver. Records that fail validation are automatically routed to a quarantine table, triggering an alert for our data engineers to patch the extraction logic and replay the failed records.
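The validate, quarantine, and replay flow can be approximated as follows. This is a sketch under stated assumptions, not the production engine: `promote`, `require_product_id`, and the shape of a quarantine entry are hypothetical names invented for the example, and in-memory lists stand in for the Silver and quarantine tables.

```python
from typing import Callable


def promote(records, validate: Callable[[dict], list], schema_version: str):
    """Split records into Silver-ready rows and quarantine entries with error metadata."""
    silver, quarantine = [], []
    for record in records:
        errors = validate(record)
        if errors:
            quarantine.append(
                {"record": record, "errors": errors, "schema_version": schema_version}
            )
        else:
            silver.append(record)
    return silver, quarantine


def require_product_id(record: dict) -> list:
    """Toy validator: the only rule is that product_id must be present."""
    return [] if record.get("product_id") else ["missing required field: product_id"]


def replay(quarantine, validate, schema_version: str):
    """Re-run quarantined records through promotion after the extraction logic is patched."""
    return promote([entry["record"] for entry in quarantine], validate, schema_version)
```

Because the quarantine entry keeps the original record untouched, replaying it after a fix is the same operation as the first promotion attempt.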
05 Silver vs Gold distinction
A common mistake is putting business logic in the Silver layer. Silver should represent the reality of the source, just cleaned up. If a product is listed twice on a website with different prices, Silver should record both listings as observed rather than pick a winner. The Gold layer is where you apply business rules (e.g., "take the lowest price" or "aggregate by category"). Keeping Silver objective ensures you don't lose data fidelity.
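To make the split concrete, the sketch below keeps two conflicting listings in Silver and applies the "take the lowest price" rule only when building a Gold view. The row layout, the `listing_id` column, and the function name are assumptions for illustration.

```python
# Silver: both listings are kept exactly as observed (hypothetical rows).
silver_rows = [
    {"listing_id": "l-1", "product_id": "p-1", "price": 1299.00},
    {"listing_id": "l-2", "product_id": "p-1", "price": 1199.00},
    {"listing_id": "l-3", "product_id": "p-2", "price": 450.00},
]


def gold_lowest_price(rows):
    """Gold-layer business rule: one price per product, choosing the lowest listing."""
    lowest = {}
    for row in rows:
        pid = row["product_id"]
        if pid not in lowest or row["price"] < lowest[pid]:
            lowest[pid] = row["price"]
    return lowest
```

If the rule later changes to "take the most recent price", only `gold_lowest_price` needs replacing; the Silver rows stay as they are.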
// 03 — silver metrics

Measuring layer integrity.

A Silver layer is only as useful as its reliability. DataFlirt tracks these metrics per table to ensure downstream Gold aggregations aren't poisoned by upstream extraction failures.

Bronze-to-Silver Yield: Y = records_silver / records_bronze
Tracks drop-off from validation failures. Target > 99.5%. (DataFlirt Pipeline SLO)
Deduplication Ratio: D = 1 − (unique_entities / total_ingested)
High D indicates overlapping crawl paths or pagination loops in Bronze. (Storage Optimization Metric)
Schema Conformance: C = valid_records / (valid_records + quarantined)
Drops below 1.0 trigger automated schema drift alerts. (Data Quality Monitor)
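The three formulas translate directly into code. The functions below are a plain transcription with hypothetical names; how the input counts are collected per table is outside the scope of the sketch.

```python
def bronze_to_silver_yield(records_silver: int, records_bronze: int) -> float:
    """Y = records_silver / records_bronze"""
    return records_silver / records_bronze


def deduplication_ratio(unique_entities: int, total_ingested: int) -> float:
    """D = 1 - (unique_entities / total_ingested)"""
    return 1 - unique_entities / total_ingested


def schema_conformance(valid_records: int, quarantined: int) -> float:
    """C = valid_records / (valid_records + quarantined)"""
    return valid_records / (valid_records + quarantined)
```

As a worked example with illustrative counts, 142,812 valid records and 38 quarantined give C = 142,812 / 142,850, which is roughly 0.9997.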
// 04 — pipeline execution

Promoting Bronze to Silver.

A transformation job processing raw e-commerce product fetches into a clean Silver table, handling type coercion, null imputation, and deduplication.

Apache Iceberg · dbt · type coercion
// read bronze
source: "s3://df-lake/bronze/products/raw_json/"
records_scanned: 142,850

// apply transformations
cast(price_string as decimal) -> price_usd
parse_timestamp(scraped_at) -> ingested_at
coalesce(stock_status, 'UNKNOWN') -> availability

// validation & quarantine
schema_check: 142,812 passed, 38 failed
quarantine_write: 38 records -> s3://df-lake/quarantine/

// deduplication (upsert)
merge_key: [product_id, domain]
inserted: 12,400
updated: 130,412

// commit
target: catalog_silver.products
status: SUCCESS
// 05 — transformation risks

Where Silver promotions fail.

The most common reasons raw data fails to promote to the Silver layer, based on DataFlirt's telemetry across 400+ active extraction pipelines.

Pipelines: 400+ active
Records/day: 850M+
Updated: 2026-05-19
01 Type coercion errors: 38% of failures · String to numeric/date parsing failures
02 Schema drift: 29% of failures · Missing required fields from DOM changes
03 Primary key collisions: 18% of failures · Deduplication logic or composite key failures
04 Encoding corruption: 11% of failures · Malformed UTF-8 or hidden control characters
05 Late-arriving data: 4% of failures · Out-of-order timestamp merges
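Encoding corruption is one of the easier failure classes to catch before validation. The sketch below decodes strictly, so malformed UTF-8 raises instead of being replaced, and then strips control and invisible format characters. The function names are hypothetical, and the choice to drop every character in the Unicode "Cf" category is an assumption that may be too aggressive for text containing emoji joiners.

```python
import unicodedata


def decode_strict(payload: bytes) -> str:
    """Decode as UTF-8 and raise UnicodeDecodeError on malformed bytes."""
    return payload.decode("utf-8")


def strip_control_chars(text: str) -> str:
    """Remove control (Cc) and format (Cf) characters, keeping newlines and tabs."""
    return "".join(
        ch
        for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
```

A payload that fails `decode_strict` is a candidate for quarantine rather than a lossy decode, since replacement characters in a product name are hard to detect later.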
// 06 — architecture

Immutable history, queryable present.

DataFlirt builds Silver layers using modern table formats like Apache Iceberg or Delta Lake. This allows us to perform ACID-compliant upserts on massive scraped datasets without rewriting entire partitions. When a product price changes, we don't just overwrite the row; we append a new version, giving clients a complete time-series history of the entity while keeping the default query pointed at the latest state.
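The append-a-version idea can be shown without any table format at all. The sketch below uses an in-memory list as a stand-in for the versioned table; it does not call Iceberg or Delta Lake APIs, and `append_version` and `state_as_of` are hypothetical names chosen for the example.

```python
from datetime import datetime


def append_version(history: list, key: tuple, price: float, observed_at: datetime) -> None:
    """Append a new version of the entity; existing versions are never modified."""
    history.append({"key": key, "price": price, "observed_at": observed_at})


def state_as_of(history: list, key: tuple, ts: datetime = None):
    """Return the latest version at or before ts (or the latest overall if ts is None)."""
    candidates = [
        v for v in history
        if v["key"] == key and (ts is None or v["observed_at"] <= ts)
    ]
    return max(candidates, key=lambda v: v["observed_at"], default=None)
```

Calling `state_as_of(history, key)` with no timestamp corresponds to the default "latest state" query, while passing a timestamp reads the entity as it looked at that moment.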

Silver Table Metadata

Live metadata for a Silver-tier product catalog table.

table.name: catalog_silver.products
table.format: Apache Iceberg v2
partitioning: day(ingested_at)
row_count: 84,291,005
quarantine_rate: 0.04% (healthy)
last_compaction: 2 hours ago
schema.enforcement: strict


// 07 — FAQ

Common questions.

Common questions about medallion architecture, data cleaning, and how DataFlirt manages Silver layer pipelines.

What is the difference between Bronze, Silver, and Gold layers?
Bronze is raw, immutable data exactly as fetched (e.g., HTML dumps, raw JSON). Silver is cleaned, typed, and deduplicated entity-level data (e.g., a products table). Gold is business-level aggregated data ready for reporting (e.g., daily_average_price_by_category).
Why not just extract directly to Gold?
Extracting directly to Gold tightly couples your scraping logic to your business logic. If a business requirement changes, you have to rewrite the scraper. If a scraper breaks, your business reports fail silently. The Silver layer decouples extraction from analytics, providing a stable, historical foundation that can be re-queried if Gold logic changes.
How do you handle schema changes in the Silver layer?
Through schema evolution and quarantine queues. If a target site adds a new field, the Silver schema evolves to include it. If a site removes a required field, the record fails validation and is routed to a dead-letter queue (quarantine) for engineer review, rather than silently poisoning the Silver table with nulls.
Does DataFlirt deliver data in Silver or Gold format?
We typically deliver Silver-tier data to data engineering teams, as they prefer to build their own Gold aggregations in-house using dbt. For product managers or analysts without data engineering support, we build and deliver Gold-tier materialized views or CSV feeds.
How is deduplication handled during the Bronze-to-Silver merge?
We use upserts (MERGE operations) based on a composite primary key—usually a combination of the target domain and the entity's unique ID. If the record exists, we update the mutable fields (like price or stock) and append a new history row. If it doesn't, we insert it.
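The update-or-insert decision on a composite key looks roughly like this in plain Python. It is a sketch only: a dictionary stands in for the Silver table, `merge` is a hypothetical helper rather than a table-format MERGE statement, and the history-row append is omitted to keep the example short.

```python
def merge(silver: dict, batch: list, key_fields=("product_id", "domain")):
    """Upsert a batch into silver keyed by the composite key; return (inserted, updated)."""
    inserted = updated = 0
    for record in batch:
        key = tuple(record[f] for f in key_fields)
        if key in silver:
            # Existing entity: overwrite mutable fields such as price or stock.
            silver[key].update(record)
            updated += 1
        else:
            # New entity: store a copy so later edits to the batch don't leak in.
            silver[key] = dict(record)
            inserted += 1
    return inserted, updated
```

Including the domain in the key means the same `product_id` scraped from two different sites is treated as two entities rather than overwriting one another.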
What happens to records that fail Silver validation?
They are written to a quarantine table alongside the error metadata (e.g., "Failed cast to numeric on column: price"). This ensures the main Silver table remains pristine while preserving the failed records so they can be reprocessed once the extraction logic is patched.