
What is Success Rate?

Success rate is the percentage of attempted scraping requests that return a valid, fully extracted data record. It is the ultimate measure of pipeline health, sitting downstream of network reliability, proxy rotation, anti-bot evasion, and schema extraction. A high success rate means your infrastructure is correctly tuned to the target's constraints; a dropping success rate is the first indicator of selector rot, IP bans, or new fingerprinting challenges.

Pipeline Health · SLAs · Extraction · Monitoring · Throughput
// 02 — definitions

The ultimate metric.

Why measuring 200 OKs is a trap, and how to track the actual delivery of structured data.


TL;DR

Success rate isn't just about HTTP status codes. A request that returns a 200 OK but contains a CAPTCHA, a soft block, or an empty JSON payload is a failure. True success rate measures the ratio of valid, schema-compliant records delivered against the total URLs queued for extraction.

01 Definition & structure
Success rate in a scraping context is the ratio of fully extracted, schema-compliant records to the total number of URLs attempted. It is a composite metric that requires the network request to succeed, the anti-bot system to be bypassed, the target server to respond with the correct payload, and the extraction logic to successfully parse the required fields.
02 The false positive problem
The most common mistake in scraping telemetry is equating an HTTP 200 OK with a successful scrape. Modern anti-bot systems frequently serve challenge pages, honeypots, or empty templates with a 200 status code. If your pipeline does not validate the presence of expected data fields before logging a success, your metrics will lie to you, and downstream consumers will receive empty or garbage data.
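A minimal sketch of that validation step, assuming a hypothetical record with `title` and `price` fields and a few illustrative block markers. The names `classify_response`, `REQUIRED_FIELDS`, and `BLOCK_MARKERS` are assumptions made for this example, not part of any real pipeline:

```python
from typing import Optional

# Hypothetical required fields and soft-block markers, for illustration only.
REQUIRED_FIELDS = ("title", "price")
BLOCK_MARKERS = ("captcha", "verify you are human", "access denied")


def classify_response(status: int, body: str, record: Optional[dict]) -> str:
    """Return 'success' only when the extracted record is actually usable."""
    if status != 200:
        return "network_fail"
    lowered = body.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        return "soft_block"  # 200 OK, but the payload is a challenge page
    if not record or any(record.get(f) in (None, "") for f in REQUIRED_FIELDS):
        return "schema_fail"  # 200 OK, but required data is missing
    return "success"
```

Only the `"success"` outcome should increment the success counter; the other three outcomes are failures even when the status code was 200.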
03 Measuring at scale
At scale, success rate must be tracked across multiple dimensions: per target domain, per proxy provider, per proxy subnet, and per extraction schema. A sudden drop in success rate isolated to a specific ASN indicates an IP ban; a drop across all proxies for a specific domain indicates a site layout change or a new global anti-bot deployment.
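One way to slice success rate by dimension is sketched below. The event shape (`domain`, `asn`, `ok`) and the function name `success_by_dimension` are assumptions for illustration; a production setup would more likely use labeled metrics in a monitoring system:

```python
from collections import defaultdict


def success_by_dimension(events, key):
    """Group request outcomes by one dimension and return its success rate."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for event in events:
        totals[event[key]] += 1
        if event["ok"]:
            wins[event[key]] += 1
    return {k: wins[k] / totals[k] for k in totals}


# Made-up sample events: every failure sits on one ASN.
events = [
    {"domain": "shop.example", "asn": "AS-A", "ok": True},
    {"domain": "shop.example", "asn": "AS-B", "ok": False},
    {"domain": "shop.example", "asn": "AS-B", "ok": False},
    {"domain": "news.example", "asn": "AS-A", "ok": True},
]

by_asn = success_by_dimension(events, "asn")        # {"AS-A": 1.0, "AS-B": 0.0}
by_domain = success_by_dimension(events, "domain")  # shop.example is 1/3
```

In this made-up sample the per-domain view only shows that `shop.example` is degraded, while the per-ASN view isolates the failures to `AS-B`, which is the pattern that suggests an IP ban rather than a layout change.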
04 How DataFlirt handles it
We enforce strict schema validation at the edge. A request is only counted as a success if the resulting record passes all type checks and contains all required fields. Failed requests are automatically categorized (e.g., network timeout, CAPTCHA, schema mismatch) and routed to a specialized retry queue that adjusts the request parameters—such as upgrading from a datacenter IP to a residential IP—to maximize the probability of success on the next pass.
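The routing idea can be sketched as follows. This is not DataFlirt's implementation; the category names, the `route_failure` function, and the in-memory `retry_queue` are hypothetical stand-ins chosen to show the shape of the logic:

```python
from collections import deque

# In-memory stand-in for a real retry queue (assumption for this sketch).
retry_queue = deque()


def route_failure(url: str, category: str, proxy_tier: str) -> None:
    """Queue a failed URL for retry, adjusting parameters by failure type."""
    next_tier = proxy_tier
    if category in ("captcha", "ip_ban") and proxy_tier == "datacenter":
        next_tier = "residential"  # upgrade only for block-type failures
    retry_queue.append({"url": url, "reason": category, "proxy_tier": next_tier})


route_failure("https://example.com/p/1", "captcha", "datacenter")
route_failure("https://example.com/p/2", "network_timeout", "datacenter")
```

Recording the failure reason alongside the retry makes it possible to report later which categories the retries actually recover.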
05 Did you know?
Chasing a 100% success rate is usually an anti-pattern. The cost of extracting the final 1% of a difficult target—requiring premium residential proxies, heavy browser rendering, and manual CAPTCHA solving—often exceeds the value of the data. Mature data engineering teams optimize for a 95–98% success rate, accepting a small margin of data loss in exchange for massive reductions in infrastructure spend.
// 03 — the math

How to calculate true success.

Network success and extraction success are entirely different metrics. DataFlirt's telemetry tracks both, but our client SLAs are strictly bound to the final schema-validated output.

True Success Rate = S = records_validated / urls_attempted
The only metric that matters to downstream data consumers. (DataFlirt Pipeline SLO)
False Positive Rate = F = (http_200 - records_validated) / http_200
Measures silent failures: soft blocks, tarpits, and selector rot. (Scraper Forensics)
Effective Throughput = T_eff = concurrency × req_per_sec × S
How fast you are actually acquiring usable data. (Infrastructure Capacity Planning)
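The three metrics translate directly into code. The sketch below reads the false positive rate as the share of 200 OK responses that did not yield a validated record, and assumes `req_per_sec` is a per-worker figure. The sample numbers are invented for illustration:

```python
def true_success_rate(records_validated: int, urls_attempted: int) -> float:
    """S: validated records as a share of all attempted URLs."""
    return records_validated / urls_attempted


def false_positive_rate(http_200: int, records_validated: int) -> float:
    """F: share of 200 OK responses that produced no validated record."""
    return (http_200 - records_validated) / http_200


def effective_throughput(concurrency: int, req_per_sec: float, s: float) -> float:
    """T_eff: usable records per second, assuming req_per_sec is per worker."""
    return concurrency * req_per_sec * s


# Invented example: 1,000 URLs attempted, 980 returned 200 OK, 931 validated.
S = true_success_rate(931, 1_000)          # 0.931
F = false_positive_rate(980, 931)          # 0.05
T_eff = effective_throughput(10, 2.0, S)   # about 18.62 records/sec
```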
// 04 — pipeline telemetry

A pipeline run, measured in real time.

Live metrics from a distributed crawl of a major retail catalog. Notice the divergence between network-level success and extraction-level success.

Prometheus · Schema Validation · Retry Queue
edge.dataflirt.io — live
CAPTURED
// job.init: retail_catalog_US
urls.queued: 250,000
workers.active: 120

// network layer metrics (mid-run)
http.200_ok: 248,105 // 99.2% network success
http.403_forbidden: 1,240
http.5xx_server: 655

// extraction layer metrics
extract.success: 239,400 // 95.7% true success
extract.captcha_hit: 4,205 // returned 200 OK, but blocked
extract.schema_fail: 4,500 // missing price field

// resolution
action.retry_queued: 10,600 // pushing to residential pool
pipeline.status: SLO MAINTAINED
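The figures in the capture above can be rechecked from the raw counts. The variable names are chosen for this sketch; the numbers are copied from the capture:

```python
# Counts copied from the capture above.
urls_queued = 250_000
http_200 = 248_105
http_403 = 1_240
http_5xx = 655
extract_success = 239_400
captcha_hit = 4_205
schema_fail = 4_500

# Network layer: every queued URL got exactly one of the three outcomes.
assert http_200 + http_403 + http_5xx == urls_queued
# Extraction layer: every 200 OK either validated or failed silently.
assert extract_success + captcha_hit + schema_fail == http_200

network_success = http_200 / urls_queued                  # 0.99242
true_success = extract_success / urls_queued              # 0.9576
false_positive = (http_200 - extract_success) / http_200  # about 0.0351
retry_total = http_403 + http_5xx + captcha_hit + schema_fail  # 10,600
```

The gap between the two rates comes from 8,705 responses that returned 200 OK without a usable record. The retry queue holds these plus the 1,895 hard network failures, which matches the 10,600 reported. The capture truncates 0.9576 to 95.7%.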
// 05 — failure modes

What degrades success rates.

The most common reasons a queued URL fails to become a delivered record. Network errors are rarely the primary culprit on mature pipelines.

SAMPLE SIZE: 18M requests
WINDOW: 7d trailing
UPDATED: 2026-05-19
01 Schema drift / selector rot
silent failure · Target updated DOM structure
02 Soft blocks / CAPTCHAs
false positive · 200 OK but no target data
03 Proxy timeouts / IP bans
network drop · Connection refused or dropped
04 Target server 5xx errors
capacity limit · Target infrastructure overload
05 Rate limiting (429s)
throttle hit · Concurrency too high for target
// 06 — our approach

Measure the data, not just the network response.

At DataFlirt, we decouple network success from pipeline success. A residential proxy returning a 200 OK is meaningless if the payload is a Cloudflare challenge page or a 'Product Not Found' redirect. Our telemetry tracks success at the schema level: a request is only marked successful if it yields a record that passes all type assertions and completeness thresholds. Everything else is a failure, and is automatically queued for retry with a different fingerprint profile or proxy tier.

Pipeline Health Dashboard

Real-time success metrics for a high-volume pricing feed.

target.domain b2b-supply-eu.com
network.success 99.8% · nominal
extraction.success 98.5% · nominal
schema.validation 1.5% fail · quarantined
retry.efficiency 92% recovered on pass 2
sla.compliance met


// 07 — FAQ

Common questions.

Common questions about measuring, maintaining, and optimizing scraping success rates at scale.

What is considered a 'good' success rate?
It depends heavily on the target. For a standard surface web catalog, 98–99% is expected. For highly defended targets (e.g., major social networks or flight aggregators), a 90–95% success rate on the first pass is excellent, with retries picking up the remainder. Chasing 100% is usually a waste of compute and proxy bandwidth.
Why is my success rate dropping but my HTTP 200s are stable?
You are hitting soft blocks. Anti-bot systems like DataDome and Cloudflare often return a 200 OK status code alongside a CAPTCHA or a poisoned HTML payload. If your monitoring only looks at HTTP status codes, your pipeline looks healthy while delivering zero actual data.
How does DataFlirt maintain high success rates on hostile targets?
We use dynamic fallback routing. If a datacenter IP gets a 403, the request is automatically retried on a residential IP. If the residential IP gets a CAPTCHA, it's retried with a fully headed browser and a pristine TLS fingerprint. We escalate the cost of the request only when necessary to achieve the extraction.
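A rough sketch of cost-ordered escalation, under the assumption that a `fetch(url, tier)` callable returns a validated record or `None`. The tier names and `fetch_with_escalation` are hypothetical and only illustrate the control flow, not DataFlirt's actual routing:

```python
# Tiers ordered from cheapest to most expensive (names are assumptions).
TIERS = ["datacenter", "residential", "headed_browser"]


def fetch_with_escalation(url, fetch):
    """Try each tier in cost order; stop at the first validated record."""
    for tier in TIERS:
        record = fetch(url, tier)
        if record is not None:
            return tier, record
    return None, None  # every tier failed; the caller decides what to do next


def fake_fetch(url, tier):
    # Stand-in for a real fetch plus validation: only residential succeeds.
    return {"price": 10} if tier == "residential" else None


tier_used, record = fetch_with_escalation("https://example.com/p/1", fake_fetch)
```

Because the loop stops at the first success, the more expensive tiers are only paid for when the cheaper ones fail.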
Should I retry failed requests immediately?
No. Immediate retries usually result in immediate secondary failures, burning through your proxy pool. Implement exponential backoff, and more importantly, change the request parameters (new IP, new user-agent, different TLS fingerprint) before attempting the retry.
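A small sketch of backoff combined with parameter rotation. The jittered delay is one common variant of exponential backoff; `backoff_delay`, `next_attempt`, and the request fields are assumptions made for this example:

```python
import random


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Exponential backoff with full jitter: 0 to base * 2**attempt seconds, capped."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def next_attempt(request: dict, attempt: int, user_agents: list) -> dict:
    """Return a copy of the request with rotated parameters for the retry."""
    retry = dict(request)
    retry["user_agent"] = user_agents[attempt % len(user_agents)]
    retry["rotate_ip"] = True  # signal the proxy layer to pick a new exit IP
    retry["delay_s"] = backoff_delay(attempt)
    return retry


first = {"url": "https://example.com/p/1", "user_agent": "ua-0"}
second = next_attempt(first, attempt=1, user_agents=["ua-0", "ua-1"])
```

The jitter spreads retries out so that a burst of failures does not turn into a synchronized burst of retries against the same target.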
Is it legal to bypass blocks to improve success rate?
Accessing publicly available data is generally lawful, but bypassing authentication or ignoring explicit access controls carries legal risk (e.g., CFAA in the US). We focus on maintaining high success rates through legitimate fingerprint diversity and rate-limit compliance, not through exploiting vulnerabilities. Consult counsel for your specific use case.
How do you calculate success rate for pagination?
Pagination success is measured by yield versus expected count. If the category page says "Showing 1,000 results" and your pipeline extracts 980 records before the pagination loop terminates, your success rate for that job is 98%. Tracking expected vs. actual yield is the only way to catch infinite loops or truncated pagination.
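The yield check can be written as a short helper. The 95% threshold and the status labels are assumptions for this sketch and should be tuned per target:

```python
def pagination_yield(expected_count: int, extracted_count: int) -> float:
    """Ratio of records actually extracted to the count the site advertises."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    return extracted_count / expected_count


def yield_status(expected_count: int, extracted_count: int,
                 threshold: float = 0.95) -> str:
    """Flag truncated pagination (too few) and runaway loops (too many)."""
    ratio = pagination_yield(expected_count, extracted_count)
    if ratio > 1.0:
        return "overrun"    # more records than advertised: loop or duplicates
    if ratio < threshold:
        return "truncated"  # pagination ended early or pages were blocked
    return "ok"


job_yield = pagination_yield(1_000, 980)  # the 98% case described above
```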