What is Field Availability Monitoring?

Field availability monitoring is the continuous tracking of non-null values for specific schema attributes across a scraping pipeline's output. It is the primary observability mechanism for detecting silent extraction failures. When a site layout changes, the scraper might still receive a 200 OK and parse the page, but if the price selector breaks, the price field's fill rate silently drops to zero. Monitoring availability catches the drift before empty columns hit your data warehouse.

Data Quality · Schema Drift · Observability · Selector Rot · ETL
// 02 — definitions

Catching the
silent failures.

Why a 200 OK and a successful parse step mean nothing if the actual business payload is returning nulls.

TL;DR

Field availability monitoring tracks the fill rate of specific data attributes across a scraping run. It's the defense layer against selector rot—detecting when a site layout change causes a field to silently drop from 98% presence to 0% without breaking the overall pipeline.

01 Definition & structure

Field availability monitoring is the practice of tracking the percentage of non-null values for specific attributes in a scraped dataset over time. It sits at the extraction layer of a data pipeline, validating the output of CSS selectors, XPath queries, or JSON path navigations.

A typical monitoring setup tracks:

  • Absolute fill rate: The raw percentage of records containing the field.
  • Historical baseline: The trailing average fill rate for that specific field.
  • Drift: The delta between the current batch and the historical baseline.
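
As a minimal sketch of these three quantities, assume records are plain dictionaries and that only a missing key or `None` counts as null (empty strings would need their own rule). The helper names `fill_rates` and `drift` are illustrative and not part of any DataFlirt API:

```python
from typing import Any, Dict, List


def fill_rates(records: List[Dict[str, Any]], fields: List[str]) -> Dict[str, float]:
    """Return the fraction of records with a non-null value for each field."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for record in records if record.get(field) is not None) / total
        for field in fields
    }


def drift(current: float, history: List[float]) -> float:
    """Current fill rate minus the trailing average of earlier fill rates."""
    return current - sum(history) / len(history)


# Illustrative batch: two of four records carry a price.
records = [
    {"title": "Kettle", "price": 24.99},
    {"title": "Toaster", "price": None},
    {"title": "Blender"},
    {"title": "Mixer", "price": 89.0},
]
rates = fill_rates(records, ["title", "price"])

# Hypothetical fill rates from three earlier batches (the historical baseline).
price_drift = drift(rates["price"], [0.98, 0.99, 0.97])
```

Here `rates["price"]` is 0.5 against a baseline of about 0.98, so the drift is roughly -0.48. In a real pipeline the history would come from stored batch metrics rather than a literal list.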

02 The silent failure mode

Most scraping pipelines are heavily monitored for HTTP errors (403s, 429s) and timeout exceptions. But the most dangerous failures are silent. If a target website redesigns its product page and changes the price element's class from .price-tag to .product-price, the scraper doesn't crash. It successfully fetches the page, fails to find .price-tag, returns null, and writes the record anyway.

Without field availability monitoring, this silent failure propagates to the data warehouse, corrupting downstream analytics and pricing models until a human notices the empty columns weeks later.
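
The failure mode can be reproduced in a few lines. This sketch uses only Python's standard-library `HTMLParser`. The `ClassTextExtractor` class and `extract_price` function are hypothetical stand-ins for whatever selector engine a real scraper uses:

```python
from html.parser import HTMLParser
from typing import Optional


class ClassTextExtractor(HTMLParser):
    """Collect the text of the first element carrying a given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.text: Optional[str] = None
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.text is None and self.css_class in classes:
            self._capturing = True

    def handle_data(self, data):
        if self._capturing:
            self.text = data.strip()
            self._capturing = False


def extract_price(html: str, css_class: str = "price-tag") -> Optional[str]:
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    return parser.text  # None when the class is absent; no exception is raised


old_page = '<div><span class="price-tag">24.99</span></div>'
new_page = '<div><span class="product-price">24.99</span></div>'

old_price = extract_price(old_page)  # selector still matches
new_price = extract_price(new_page)  # selector silently misses

# The record is written either way, which is exactly the silent failure.
record = {"url": "https://example.com/p/1", "price": new_price}
```

Nothing in this flow raises or logs an error, so only a check on the share of `None` values would reveal the problem.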

03 Handling sparse and optional fields

Not all fields are expected to be 100% populated. A discount_percentage field might only appear on 10% of products. A secondary_author field might only exist on 5% of articles.

Effective monitoring requires context. Instead of alerting when a field drops below 90%, the system must alert when a field deviates significantly from its historical norm. If discount_percentage drops from 10% to 0%, that is an anomaly worth investigating, even though the absolute fill rate was always low.
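
One way to express "deviation from the historical norm" for sparse fields is a relative drop, sketched below. The function names and the 50% cut-off are assumptions chosen for illustration, and this version only looks at drops, not spikes:

```python
def relative_drop(current: float, baseline: float) -> float:
    """Fraction of the baseline fill rate that was lost (0.0 = none, 1.0 = all)."""
    if baseline == 0:
        return 0.0  # the field is expected to be empty, so nothing can be lost
    return max(0.0, (baseline - current) / baseline)


def is_anomalous(current: float, baseline: float, max_relative_drop: float = 0.5) -> bool:
    """Flag a field whose fill rate lost more than `max_relative_drop` of its baseline."""
    return relative_drop(current, baseline) > max_relative_drop


# discount_percentage: baseline 10%, current batch 0% -> the whole signal is gone.
sparse_field_broken = is_anomalous(current=0.0, baseline=0.10)

# discount_percentage: baseline 10%, current batch 9% -> ordinary fluctuation.
sparse_field_normal = is_anomalous(current=0.09, baseline=0.10)
```

A fixed "alert below 90%" rule would fire on both batches, while a check on the absolute difference alone would see only a 10-point move in the broken case. Scaling by the baseline is what makes the total loss stand out.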

04 How DataFlirt handles it

We enforce field availability as a strict data contract. Every pipeline has defined baselines for every extracted field. During the extraction phase, if a batch violates its availability thresholds, the delivery is automatically halted and the batch is quarantined.

Because we separate the fetch layer from the extraction layer, we don't need to re-scrape the target site to fix the issue. Our engineers update the broken selector, and the pipeline re-processes the cached raw HTML, recovering the missing fields before delivering the final dataset to the client.
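
The re-processing step might look roughly like the sketch below. It assumes the raw HTML is cached in a mapping keyed by URL. `html_cache`, `reprocess`, and `patched_price_extractor` are hypothetical names, and the regular expression is a crude stand-in for a real selector engine:

```python
import re
from typing import Callable, Dict, List, Optional

Extractor = Callable[[str], Optional[str]]


def reprocess(
    records: List[dict],
    html_cache: Dict[str, str],
    field: str,
    extractor: Extractor,
) -> List[dict]:
    """Fill missing values of `field` by re-running `extractor` on cached HTML."""
    repaired = []
    for record in records:
        if record.get(field) is None and record["url"] in html_cache:
            # Build a new dict so the quarantined batch itself is left untouched.
            record = {**record, field: extractor(html_cache[record["url"]])}
        repaired.append(record)
    return repaired


def patched_price_extractor(html: str) -> Optional[str]:
    """Updated extractor that targets the new class name."""
    match = re.search(r'class="product-price"[^>]*>([^<]+)<', html)
    return match.group(1).strip() if match else None


html_cache = {"https://example.com/p/1": '<span class="product-price">24.99</span>'}
records = [
    {"url": "https://example.com/p/1", "price": None},
    {"url": "https://example.com/p/2", "price": "9.50"},
]
repaired = reprocess(records, html_cache, "price", patched_price_extractor)
```

No network request is made at any point: the fix touches only the extraction function, and the fetch results are reused as they were captured.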

05 A/B testing as a false positive

A common cause of partial field availability drops is target site A/B testing. If a site routes 20% of traffic to a new layout, the scraper will successfully extract data from 80% of the pages and return nulls for the 20% hitting the variant.

This manifests in monitoring as a sudden, stable drop in fill rate (e.g., from 99% to 79%). Robust extraction logic must be updated to handle both the control and variant selectors simultaneously until the target site concludes its test.
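
Handling both layouts at once usually comes down to an ordered list of selectors tried in turn. The sketch below assumes exact `class="..."` attribute matches and uses a regular expression in place of a real selector engine. `PRICE_CLASSES` and `extract_with_fallback` are illustrative names:

```python
import re
from collections import Counter
from typing import Optional, Sequence, Tuple

# Control layout first, then the variant seen during the A/B test.
PRICE_CLASSES = ("price-tag", "product-price")


def extract_with_fallback(
    html: str, css_classes: Sequence[str] = PRICE_CLASSES
) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, matched_class), trying each class in order."""
    for css_class in css_classes:
        pattern = r'class="' + re.escape(css_class) + r'"[^>]*>([^<]+)<'
        match = re.search(pattern, html)
        if match:
            return match.group(1).strip(), css_class
    return None, None


pages = [
    '<span class="price-tag">24.99</span>',
    '<span class="product-price">24.99</span>',
    '<span class="price-tag">9.50</span>',
    '<span class="title">Kettle</span>',
]
results = [extract_with_fallback(page) for page in pages]

# Counting which selector matched shows how traffic splits between layouts.
layout_counts = Counter(layout for _, layout in results)
```

Recording which selector matched is a design choice worth keeping: when the variant's share reaches 100% or falls back to 0%, the test has ended and the unused selector can be retired.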

// 03 — the metrics

How we measure
field health.

DataFlirt tracks availability at the field level, comparing current batch fill rates against historical baselines to detect anomalous drops before data delivery.

Field Fill Rate: F = records_with_field / total_records
The share of records with a non-null value for a given attribute. (Standard ETL metric)
Availability Drift: D = F_current − μ(F_historical)
Measures deviation from the 30-day trailing average fill rate for that specific field, in percentage points. (DataFlirt anomaly detection)
Quarantine Threshold: if D < −0.15 → HALT
A sudden drop of 15 percentage points in a required field triggers an automatic delivery halt. (DataFlirt pipeline SLO)
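
Put together, the three metrics reduce to a small gate in front of delivery. This is a sketch under stated assumptions: the comparison is taken to be strictly less than -0.15, the baselines are precomputed trailing averages, and `check_batch` and `QUARANTINE_DRIFT` are hypothetical names rather than DataFlirt's actual implementation:

```python
from typing import Dict, Iterable, Tuple

QUARANTINE_DRIFT = -0.15  # assumed: halt when drift falls below -15 points


def check_batch(
    fill_rates: Dict[str, float],
    baselines: Dict[str, float],
    required: Iterable[str],
) -> Tuple[str, Dict[str, float]]:
    """Return the delivery status and the drift of every violating field."""
    violations = {}
    for field in required:
        field_drift = fill_rates[field] - baselines[field]  # D = F_current - mean(F_historical)
        if field_drift < QUARANTINE_DRIFT:
            violations[field] = round(field_drift, 3)
    status = "QUARANTINED" if violations else "RELEASED"
    return status, violations


# Sample figures for one batch and its historical baselines.
fill_rates = {"title": 1.000, "sku": 0.998, "price": 0.124, "stock": 0.981}
baselines = {"title": 1.000, "sku": 0.999, "price": 0.985, "stock": 0.980}

status, violations = check_batch(fill_rates, baselines, required=fill_rates.keys())
```

With these figures only `price` violates the threshold (a drift of about -0.861), so the batch is held. The small movements in `sku` and `stock` pass untouched.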
// 04 — pipeline observability

A silent drop,
caught at the edge.

Trace of a post-extraction validation step on an e-commerce pipeline. The price selector broke due to an A/B test, but the page still parsed. Availability monitoring caught the anomaly.

schema validation · anomaly detection · quarantine
edge.dataflirt.io — live
CAPTURED
// extraction job complete
job.id: "ext-retail-uk-092"
records.total: 14,200

// field availability check
field.title: 100% (baseline: 100%)
field.sku: 99.8% (baseline: 99.9%)
field.price: 12.4% (baseline: 98.5%)
field.stock: 98.1% (baseline: 98.0%)

// anomaly detection
alert.trigger: field.price drift exceeds threshold (-86.1%)
root_cause_guess: "selector_rot_or_ab_test"

// pipeline action
delivery.status: QUARANTINED
action: pagerduty_alert_dispatched
// 05 — root causes

Why fields
go missing.

The most common reasons a previously stable field suddenly starts returning nulls across a production dataset, ranked by frequency across DataFlirt's managed pipelines.

PIPELINES MONITORED · 300+ active
FIELDS TRACKED · 4,200+
UPDATED · 2026-05-19
01 Selector rot (DOM changes): Target site updated their HTML structure or CSS classes.
02 A/B testing (variant layout): A subset of requests hit a new layout with different selectors.
03 Conditional rendering (state changes): Field only appears if item is in stock or user is logged in.
04 Anti-bot tarpits (poisoned data): Silent blocks returning fake HTML missing key payload data.
05 API schema drift (JSON changes): Backend API renamed a key from 'price_usd' to 'price_current'.
// 06 — DataFlirt's approach

Never deliver,

an empty column.

DataFlirt treats field availability as a strict data contract. We don't just log warnings when a field drops; we halt delivery. If a critical field like price or SKU drops below its historical availability baseline, the batch is automatically quarantined. Our engineering team receives an alert, repairs the selector, backfills the missing data from the raw HTML cache, and releases the batch. Downstream consumers never ingest a silent failure.

extraction.validation.log

Live status of a schema validation check enforcing availability thresholds.

pipeline.id: real-estate-us-04
schema.version: v2.4.1
field.address: 100% fill · pass
field.agent_phone: 82% fill · pass
field.tax_history: 0% fill · drift -94%
contract.status: violated
batch.action: quarantine_and_alert

// 07 — FAQ

Common
questions.

About schema drift, handling optional fields, threshold tuning, and how DataFlirt prevents silent data loss.

What is the difference between field availability and success rate?
Success rate measures HTTP 200s and successful page parses. Field availability measures whether the specific data points you care about (price, author, rating) were actually found in the parsed document. A pipeline can have a 100% success rate while delivering 0% of the required data if the selectors are broken.
How do you monitor fields that are genuinely optional?
We use historical baselines rather than absolute thresholds. If a "discount_price" field is historically present on 15% of products, a 14% fill rate is normal. We only alert if it drops to 2% or spikes to 80%. Context-aware monitoring prevents alert fatigue on sparse fields.
Should thresholds be static or dynamic?
Dynamic. Static thresholds (e.g., "alert if < 90%") fail on seasonal data or sparse fields. Dynamic thresholds calculate a trailing average (e.g., 7-day or 30-day mean) and alert on standard deviation anomalies. This handles natural fluctuations in target site inventory without manual tuning.
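
A dynamic threshold of this kind can be sketched with the standard-library `statistics` module. The z-score cut-off of 3.0 and the `min_sigma` floor are assumptions for illustration, and `is_outlier` is a hypothetical helper:

```python
from statistics import mean, stdev
from typing import Sequence


def is_outlier(
    current: float,
    history: Sequence[float],
    z_threshold: float = 3.0,
    min_sigma: float = 0.005,
) -> bool:
    """Flag a fill rate more than `z_threshold` standard deviations from the trailing mean.

    `history` needs at least two points for a sample standard deviation.
    `min_sigma` stops a perfectly flat history from making every tiny
    wobble look like an anomaly.
    """
    mu = mean(history)
    sigma = max(stdev(history), min_sigma)
    return abs(current - mu) / sigma > z_threshold


# Hypothetical 7-day history for a sparse field that sits near 15%.
history = [0.15, 0.14, 0.16, 0.15, 0.13, 0.15, 0.14]

normal_day = is_outlier(0.14, history)  # within ordinary fluctuation
collapse = is_outlier(0.02, history)    # far below the trailing mean
spike = is_outlier(0.80, history)       # far above the trailing mean
```

Because the test is two-sided, it flags spikes as well as drops. Spikes are worth catching too, since they often mean a selector has started matching the wrong element.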
What happens when DataFlirt quarantines a batch?
The data is held in a staging bucket and an incident is paged to our engineers. We diagnose the root cause (usually a selector change), patch the extraction logic, and re-run the extraction against the cached raw HTML. The client receives the data slightly delayed, but structurally perfect.
Can AI automatically fix broken selectors when availability drops?
Yes, but with caveats. We use LLMs to suggest selector repairs when a field drops, comparing the old DOM structure to the new one. However, we always keep a human in the loop to approve the new selector before it merges to production, ensuring we don't accidentally extract the wrong price element.
What if the target site actually removed the data permanently?
If a field is permanently deprecated by the target, we update the schema contract. We notify the client, mark the field as deprecated in the schema registry, and adjust the expected baseline to 0%. The pipeline resumes, and downstream consumers have an audit trail of why the column is now empty.