What is Error Rate Tracking?

Error rate tracking is the continuous measurement and classification of failures within a scraping pipeline. In web data extraction, errors aren't just HTTP 500s — they include anti-bot challenges, proxy timeouts, missing DOM selectors, and silent schema validation failures. Tracking these rates across dimensions like target domain, proxy ASN, and extraction field is what separates a fragile script from a production-grade data feed.

Observability · Pipeline Health · Telemetry · Anti-bot · SLOs
// 02 — definitions

Measure the breakage.

Pipelines fail constantly. Tracking the rate, type, and velocity of those failures is how you maintain data continuity.

TL;DR

Error rate tracking categorizes pipeline failures into network, target-server, anti-bot, and extraction buckets. By monitoring the ratio of failed to successful records over time, engineering teams can trigger auto-healing routines, rotate proxy pools, or alert on-call engineers before downstream data consumers notice a drop in volume.

01 Definition & structure

Error rate tracking is the systematic logging, categorization, and analysis of failures in a data extraction pipeline. Unlike standard web applications where errors are mostly server-side faults, scraping pipelines face hostile environments. Errors must be bucketed into distinct classes:

  • Network/Proxy: Timeouts, DNS failures, connection resets.
  • Target Server: 5xx HTTP codes indicating the target is down or overloaded.
  • Anti-Bot: 403s, CAPTCHAs, or redirects to challenge pages.
  • Extraction: Missing DOM elements, type coercion failures, or schema drift.
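
The four classes above can be sketched as a small classifier. This is a minimal illustration, not DataFlirt's implementation: the `classify` function, the `ErrorClass` enum, and the `CHALLENGE_MARKERS` strings are hypothetical names, and real challenge pages need vendor-specific detection.

```python
from enum import Enum
from typing import Optional


class ErrorClass(Enum):
    NETWORK = "network"
    TARGET = "target"
    ANTIBOT = "antibot"
    EXTRACTION = "extraction"


# Hypothetical marker strings; real challenge pages differ per vendor.
CHALLENGE_MARKERS = ("captcha", "challenge")


def classify(status: Optional[int], body: str = "",
             transport_error: bool = False,
             schema_ok: bool = True) -> Optional[ErrorClass]:
    """Return the error class for one request, or None if it succeeded."""
    if transport_error or status is None:
        # Timeout, DNS failure, or connection reset: no HTTP response at all.
        return ErrorClass.NETWORK
    if status == 403 or any(m in body.lower() for m in CHALLENGE_MARKERS):
        # Blocked outright or served a challenge page.
        return ErrorClass.ANTIBOT
    if 500 <= status <= 599:
        # The target itself is down or overloaded.
        return ErrorClass.TARGET
    if not schema_ok:
        # Page fetched fine, but the extracted record failed validation.
        return ErrorClass.EXTRACTION
    return None
```

The anti-bot check runs before the status-range checks on purpose in this sketch, so a challenge page served with a 200 is still counted as a block.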
02 The silent failure problem
The most dangerous errors in a scraping pipeline return a 200 OK. If a target site changes its CSS class names, your HTTP client will successfully fetch the page, but your selectors will extract nothing. If you only track HTTP status codes, your error rate will be 0%, but your database will fill with null values. Effective error tracking requires validating the extracted payload against a schema and logging validation failures as critical pipeline errors.
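
A minimal sketch of payload validation, assuming a pricing record with two critical fields. The field names, the `CRITICAL_FIELDS` mapping, and the `validation_errors` helper are illustrative assumptions; a production pipeline would more likely use a dedicated schema library.

```python
from typing import Any, Dict, List

# Hypothetical critical fields and their expected types for a pricing record.
CRITICAL_FIELDS = {"title": str, "price": float}


def validation_errors(record: Dict[str, Any]) -> List[str]:
    """List the schema problems in one extracted record (empty list = valid)."""
    errors = []
    for field, expected in CRITICAL_FIELDS.items():
        value = record.get(field)
        if value is None or value == "":
            # The selector matched nothing: the classic 200 OK silent failure.
            errors.append(f"{field}: missing")
        elif not isinstance(value, expected):
            # A value was extracted but type coercion did not happen.
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors
```

Any record with a non-empty result would be counted as an extraction error, regardless of the HTTP status that produced it.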
03 Dimensional tracking
A global error rate on its own is not actionable. To debug effectively, telemetry must be tagged with dimensions. If the error rate spikes to 15%, you need to group those errors by proxy_asn, target_domain, and worker_id. If the errors are isolated to a single ASN, the IPs you are using from that ASN are likely burned. If they span all proxies but are isolated to one domain, the target has probably deployed a new anti-bot rule. Dimensionality turns logs into root-cause analysis.
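
The grouping step can be sketched in a few lines. This assumes each request is logged as a flat event carrying its dimensions plus a boolean `error` flag; the event shape and the `error_rate_by` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def error_rate_by(events: Iterable[Mapping[str, object]],
                  dimension: str) -> Dict[str, float]:
    """Compute the error rate per distinct value of one dimension."""
    totals: Counter = Counter()
    errors: Counter = Counter()
    for event in events:
        key = str(event[dimension])
        totals[key] += 1
        if event["error"]:
            errors[key] += 1
    # A Counter returns 0 for unseen keys, so clean groups get a rate of 0.0.
    return {key: errors[key] / totals[key] for key in totals}
```

Running the same events through `error_rate_by(events, "proxy_asn")` and `error_rate_by(events, "target_domain")` shows whether the failures cluster on one ASN or spread evenly across domains.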
04 How DataFlirt handles it
We treat error telemetry as the input to our control plane. Every request is tracked via Prometheus and visualized in Grafana. When an error occurs, it is classified instantly. If a worker hits a 403, the session is discarded and the IP is cooled down. If schema validation fails on more than 5% of records in a 5-minute window, the pipeline automatically pauses, quarantines the bad data, and pages our engineering team to patch the selector.
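
The 5%-in-5-minutes rule could look roughly like the rolling window below. This is an illustrative sketch, not the actual control plane: the `SchemaFailureWindow` class and its method names are assumptions, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import deque
from typing import Deque, Tuple


class SchemaFailureWindow:
    """Rolling window of per-record schema validation outcomes."""

    def __init__(self, window_seconds: float = 300.0,
                 threshold: float = 0.05) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events: Deque[Tuple[float, bool]] = deque()

    def record(self, timestamp: float, failed: bool) -> None:
        """Add one record outcome and evict anything older than the window."""
        self._events.append((timestamp, failed))
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def failure_rate(self) -> float:
        if not self._events:
            return 0.0
        failures = sum(1 for _, failed in self._events if failed)
        return failures / len(self._events)

    def should_pause(self) -> bool:
        # Strictly "more than" the threshold, matching the rule in the text.
        return self.failure_rate() > self.threshold
```

A caller would check `should_pause()` after each batch and, when it returns True, stop writing to the sink and quarantine the affected records.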
05 Static thresholds vs anomaly detection
Static alerting thresholds (e.g., "alert if errors > 5%") fail at scale because different targets have different baseline failure rates. A highly protected e-commerce site might naturally run at a 3% block rate, while a government registry runs at 0.1%. Advanced tracking systems use anomaly detection — calculating the trailing 7-day baseline for a specific target and alerting only when the current rate deviates significantly from that specific baseline.
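
One simple way to express "deviates significantly from baseline" is a z-score against the trailing daily rates. The sketch below uses that technique as an assumption; the text does not say which method is used, and the `is_anomalous` name, the threshold of 3 standard deviations, and the `min_std` floor are all illustrative choices.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(baseline_rates: Sequence[float], current_rate: float,
                 z_threshold: float = 3.0, min_std: float = 0.001) -> bool:
    """Flag current_rate if it sits far above the per-target baseline.

    baseline_rates: one error rate per day for the trailing window
    (at least two values, since stdev needs them).
    min_std: floor on the standard deviation so a very stable baseline
    does not turn tiny fluctuations into alerts.
    """
    mu = mean(baseline_rates)
    sigma = max(stdev(baseline_rates), min_std)
    return (current_rate - mu) / sigma > z_threshold
```

With this shape, a 3.2% block rate can be unremarkable for a target whose baseline hovers around 3% and alarming for one that normally runs near 0.1%.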
// 03 — the telemetry

How to quantify pipeline health.

A single global error rate is useless. DataFlirt tracks specific failure dimensions to isolate whether the target is down, our proxies are burned, or the schema drifted.

True Error Rate: E = (HTTP_err + Bot_blocks + Schema_fails) / Requests
Includes silent failures, not just network drops. (DataFlirt Observability Standard)

Proxy Burn Rate: B = 403_responses_per_ASN / Requests_per_ASN
Triggers automatic pool rotation when B > 0.05. (Fleet Management Heuristic)

Extraction Yield: Y = 1 − (Null_critical_fields / Total_records)
Measures schema health independent of HTTP status. (Data Quality SLO)
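
The three formulas translate directly into code. The function and parameter names below are hypothetical, and returning 0.0 for an empty window is an assumption made to avoid a division by zero, not something the formulas specify.

```python
def true_error_rate(http_err: int, bot_blocks: int,
                    schema_fails: int, requests: int) -> float:
    """E = (HTTP_err + Bot_blocks + Schema_fails) / Requests."""
    if requests == 0:
        return 0.0  # assumption: an empty window counts as no errors
    return (http_err + bot_blocks + schema_fails) / requests


def proxy_burn_rate(responses_403: int, requests_for_asn: int) -> float:
    """B = 403 responses / requests, both counted for a single ASN."""
    if requests_for_asn == 0:
        return 0.0
    return responses_403 / requests_for_asn


def extraction_yield(null_critical_fields: int, total_records: int) -> float:
    """Y = 1 - (Null_critical_fields / Total_records)."""
    if total_records == 0:
        return 0.0  # assumption: no records means no usable yield
    return 1 - (null_critical_fields / total_records)


def should_rotate_pool(burn_rate: float, limit: float = 0.05) -> bool:
    """Rotation rule from the text: rotate when B > 0.05."""
    return burn_rate > limit
```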
// 04 — telemetry stream

Classifying errors in real time.

A live observability stream from a DataFlirt worker scraping a major retailer. Notice how errors are tagged, counted, and evaluated against threshold rules.

Prometheus · Grafana · Alertmanager
edge.dataflirt.io — live
CAPTURED
// window: 1m | target: retail-eu
metric.req_total: 14,200
metric.err_network: 12 // 502 Bad Gateway
metric.err_antibot: 415 // 403 Forbidden (DataDome)
metric.err_schema: 0

// evaluating thresholds
rule.antibot_rate: 0.029
rule.antibot_limit: 0.050
status: WARN // approaching rotation threshold

// 2 minutes later...
metric.err_antibot: 1,102
rule.antibot_rate: 0.077
status: CRITICAL // threshold breached

// auto-remediation triggered
action: rotating proxy pool -> residential_eu_2
action: throttling concurrency -> 20 req/s
pipeline.state: recovering
// 05 — failure distribution

Where pipelines actually break.

Based on telemetry across DataFlirt's fleet, here is the distribution of error types that trigger pipeline alerts. Anti-bot interventions and schema drift dominate.

PIPELINES: 300+ active
WINDOW: 30d trailing
UPDATED: 2026-05-19

01 Anti-bot blocks: 45% of alerts · 403s, CAPTCHAs, silent tarpits
02 Schema drift: 32% of alerts · Missing selectors, type coercion fails
03 Target server errors: 14% of alerts · 500, 502, 503, 504 status codes
04 Proxy/Network timeouts: 7% of alerts · Read timeouts, connection resets
05 Delivery/Sink failures: 2% of alerts · S3 write fails, DB deadlocks

// 06 — our observability stack

Don't just log errors, classify and react to them.

At DataFlirt, error rate tracking is an active control loop, not a passive dashboard. Every failure is classified into one of four buckets: Network, Target, Anti-Bot, or Extraction. If the anti-bot error rate spikes, the scheduler automatically rotates the proxy pool and adjusts the browser fingerprint. If extraction errors spike, the pipeline pauses to prevent writing poisoned data to the client's S3 bucket. Observability without automated remediation is just a pager that wakes you up.

Pipeline Telemetry State

Live health metrics for a continuous pricing pipeline.

pipeline.id: retail-pricing-eu
uptime: 99.98% (ok)
rate.success: 98.4% (ok)
rate.antibot: 1.2%
rate.schema_fail: 0.0% (ok)
active_alerts: 0 (clear)
auto_remediations: 14 in 24h

// 07 — FAQ

Common questions.

Questions about error classification, acceptable failure rates, alerting strategies, and how DataFlirt guarantees data delivery despite constant breakage.

What is an acceptable error rate for a scraping pipeline?
It depends entirely on the target. For aggressive anti-bot targets, an initial request failure rate of 1–3% is normal and handled via retries. We aim for a final delivery failure rate of <0.1%. The key metric is retry success, not initial request success.
How do you track silent failures where the HTTP status is 200 OK?
Through schema validation. We track the completeness of critical fields on every record. If a 200 OK response yields a null value for the price field, it is logged as an extraction error, not a success. Relying solely on HTTP status codes guarantees you will deliver bad data.
Why not just retry every error indefinitely?
Retrying a 403 from an anti-bot system with the same IP and fingerprint just burns the IP faster. Retrying a 500 might DDoS a struggling target. You must classify the error before deciding how to retry. Network timeouts get immediate retries; bot blocks require a new identity context.
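
As a rough sketch of that classify-then-retry rule: the `RetryDecision` type and `retry_decision` function are hypothetical, and two parts go beyond what the answer states and are assumptions: exponential backoff for target-server errors, and no retry at all for extraction errors (a broken selector will not fix itself on a second fetch).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryDecision:
    retry: bool
    delay_seconds: float
    new_identity: bool  # rotate IP and fingerprint before retrying


def retry_decision(error_class: str, attempt: int,
                   max_attempts: int = 3) -> RetryDecision:
    """Decide how to retry based on the error class and attempt number."""
    if attempt >= max_attempts:
        return RetryDecision(False, 0.0, False)
    if error_class == "network":
        # Timeouts and resets: retry immediately on the same identity.
        return RetryDecision(True, 0.0, False)
    if error_class == "antibot":
        # Same IP and fingerprint would just burn faster: switch identity.
        return RetryDecision(True, 0.0, True)
    if error_class == "target":
        # Assumption: back off exponentially to avoid hammering a sick server.
        return RetryDecision(True, float(2 ** attempt), False)
    # Assumption: extraction errors need a selector fix, not a retry.
    return RetryDecision(False, 0.0, False)
```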
How does DataFlirt handle sudden spikes in error rates?
Our control plane detects anomalies within 60 seconds using rolling windows. Depending on the error class, it will automatically rotate proxies, throttle concurrency, or pause the job entirely and page an engineer. We never let a failing pipeline spin out of control and burn infrastructure.
Should we alert on absolute error counts or percentages?
Always use percentages over a rolling window. 100 errors in a 10,000 req/min pipeline is background noise. 100 errors in a 150 req/min pipeline is a critical outage. Absolute counts only make sense for fatal errors like database connection drops.
How do you distinguish between a proxy failure and a target server failure?
By tracking error rates across proxy ASNs and geographic regions. If requests through AWS datacenter proxies fail but residential proxies succeed, it's an IP block. If all proxies across all regions fail with 502 Bad Gateway, the target server is down.