
What is Block Rate?

Block rate is the percentage of outbound scraping requests that fail to return the target payload due to active anti-bot countermeasures. It encompasses HTTP 403s, silent tarpits, CAPTCHA challenges, and poisoned HTML responses. For data engineering teams, it is the primary leading indicator of pipeline health—a rising block rate precedes data completeness drops and signals that your proxy pool or browser fingerprint is burning out.

Scraping Performance · Anti-Bot · Observability · Proxy Health · SLOs
// 02 — definitions

The cost of getting caught.

Why counting 200 OKs is a dangerous metric, and how to measure the real friction your pipeline faces at the edge.


TL;DR

Block rate measures the proportion of requests intercepted by anti-bot systems. It is not just HTTP 403s—modern WAFs return 200 OK with CAPTCHAs or fake data. A healthy production pipeline maintains a block rate below 1%; anything above 5% indicates systemic fingerprint or proxy pool failure.

01 Definition & structure

Block rate is the percentage of your scraping requests that are actively intercepted and denied by the target's infrastructure. It is the primary metric for evaluating the effectiveness of your proxy pool and anti-bot bypass mechanisms.

A block can manifest in several ways:

  • Explicit blocks: HTTP 403 Forbidden, 429 Too Many Requests, or TCP connection resets.
  • Interactive challenges: HTTP 200 or 403 responses containing a CAPTCHA (reCAPTCHA, Turnstile, DataDome).
  • Silent blocks: HTTP 200 OK responses that contain fake data, honeypots, or "Access Denied" text instead of the expected DOM.
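
The three categories above can be sketched as a small classifier. This is a minimal illustration, not a complete detector: the marker strings are assumptions chosen for the example (the `g-recaptcha` and `cf-turnstile` widget class names, the "Just a moment..." title, and the "Access Denied" text), and `classify_response` and `BlockKind` are hypothetical names. Real targets need per-site tuning.

```python
from enum import Enum
from typing import Optional


class BlockKind(Enum):
    NONE = "none"
    EXPLICIT = "explicit"
    CHALLENGE = "challenge"
    SILENT = "silent"


# Illustrative marker strings (assumptions); tune these per target.
CHALLENGE_MARKERS = ("just a moment...", "g-recaptcha", "cf-turnstile")
DENIAL_MARKERS = ("access denied",)


def classify_response(status: Optional[int], body: str) -> BlockKind:
    """Classify one response. `status` is None when the connection was
    reset before any HTTP response arrived."""
    if status is None:
        return BlockKind.EXPLICIT
    text = body.lower()
    # Check for challenges first: a CAPTCHA can arrive with a 200 or a 403.
    if any(marker in text for marker in CHALLENGE_MARKERS):
        return BlockKind.CHALLENGE
    if status in (403, 429):
        return BlockKind.EXPLICIT
    if status == 200 and any(marker in text for marker in DENIAL_MARKERS):
        return BlockKind.SILENT
    return BlockKind.NONE
```

String markers only catch block pages you have already seen; fake data and honeypots generally need validation of the extracted fields as well.
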

02 The silent block problem

The biggest mistake data teams make is calculating block rate based solely on HTTP status codes. Modern Web Application Firewalls (WAFs) intentionally return 200 OK for blocked requests to confuse naive scraping scripts. If your pipeline only alerts on 4xx and 5xx errors, a silent block will result in your database filling up with null values or CAPTCHA HTML.

True block rate must be calculated at the extraction layer: if the required schema fields are missing because the page structure is a WAF challenge, it counts as a block.
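
A minimal sketch of counting blocks at the extraction layer follows. The field names `title` and `price` and the helper names are hypothetical. Note one simplification: this version treats any record with a missing required field as a block, so a genuine layout change on the target would also be counted. Separating the two cases requires inspecting the page body as well.

```python
REQUIRED_FIELDS = ("title", "price")  # hypothetical schema for illustration


def is_schema_block(record: dict, required=REQUIRED_FIELDS) -> bool:
    """Return True when any required field is absent, None, or empty."""
    return any(record.get(name) in (None, "") for name in required)


def schema_block_rate(records) -> float:
    """Fraction of extracted records that fail the required-field check."""
    records = list(records)
    if not records:
        return 0.0
    blocked = sum(1 for record in records if is_schema_block(record))
    return blocked / len(records)
```

Every response yields a record here, including the ones that came back as 200 OK, so a challenge page with no price shows up in the rate instead of in the warehouse.
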

03 What causes block rate spikes

Block rates rarely degrade slowly; they spike. The most common causes are:

  • IP Burnout: You pushed too much volume through a specific proxy ASN, and the target blacklisted the subnet.
  • Fingerprint Drift: The target updated its TLS or JavaScript challenge expectations, and your client's signature is now flagged as anomalous.
  • Honeypot Traps: Your crawler followed a hidden link designed to catch bots, resulting in an immediate IP ban.

04 How DataFlirt handles it

We treat block rate as a real-time control signal, not just a reporting metric. Our extraction workers validate the schema of every response. If a worker detects a WAF challenge or poisoned HTML, it immediately flags the response as a block, quarantines the proxy IP, and rotates the TLS fingerprint for the retry.

By catching silent blocks instantly, we prevent poisoned data from reaching the client and keep our fleet-wide true block rate strictly under our 1% SLO.
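
The quarantine step can be illustrated with a small sketch. This is not DataFlirt's implementation: `ProxyPool`, `handle_result`, and the 15-minute quarantine period are assumptions made for the example, and TLS fingerprint rotation is left out because it depends on the HTTP client in use.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ProxyPool:
    proxies: list
    quarantine_seconds: float = 900.0  # assumed cool-off period
    _quarantined_until: dict = field(default_factory=dict)

    def quarantine(self, proxy, now=None):
        """Take a proxy out of rotation for `quarantine_seconds`."""
        now = time.monotonic() if now is None else now
        self._quarantined_until[proxy] = now + self.quarantine_seconds

    def available(self, now=None):
        """Proxies whose quarantine has expired or that were never quarantined."""
        now = time.monotonic() if now is None else now
        return [
            p for p in self.proxies
            if self._quarantined_until.get(p, float("-inf")) <= now
        ]


def handle_result(pool, proxy, blocked, now=None):
    """Keep the proxy on success. On a block, quarantine it and return
    another available proxy for the retry, or None if the pool is exhausted."""
    if not blocked:
        return proxy
    pool.quarantine(proxy, now)
    candidates = pool.available(now)
    return candidates[0] if candidates else None
```

The `now` parameter exists so the logic can be exercised without waiting on a real clock.
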

05 Did you know?

Some advanced anti-bot systems use "tarpitting" to manage bots without explicitly blocking them. Instead of returning a 403, the server accepts the connection but trickles the response back at a rate on the order of 10 bytes per second. A block rate based on status codes stays at 0%, but your pipeline throughput drops toward zero and your workers run out of memory holding open connections. Tracking read timeouts and per-response transfer rates is essential for catching tarpits.
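
One way to catch a tarpit is to watch the transfer rate of each response while it streams. The sketch below is a pure function with assumed thresholds (a 1 KiB/s floor and a 5-second grace period); `is_tarpit` and both defaults are hypothetical and should be tuned to the target's normal response times.

```python
def is_tarpit(bytes_received: int,
              elapsed_seconds: float,
              min_bytes_per_second: float = 1024.0,
              grace_seconds: float = 5.0) -> bool:
    """Flag a response whose average transfer rate is below the floor.

    The grace period avoids flagging responses that are merely slow to
    start (for example, during connection setup)."""
    if elapsed_seconds < grace_seconds:
        return False
    return bytes_received / elapsed_seconds < min_bytes_per_second
```

Called periodically while reading the body, a True result is the cue to abort the connection and record the request as a block rather than waiting for a read timeout.
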

// 03 — the math

How to measure true failure.

Naive block rate only counts explicit HTTP rejections such as 403 and 429. True block rate also includes silent failures: CAPTCHAs, redirects, and schema validation drops caused by poisoned HTML. DataFlirt monitors the latter.

Naive Block Rate = (HTTP_403 + HTTP_429) / Total_Requests
Only measures network-layer rejections. Dangerously optimistic. [Standard HTTP telemetry]

True Block Rate = (Explicit_Blocks + Silent_Blocks) / Total_Requests
Includes 200 OK responses that contain CAPTCHAs or fail schema validation. [DataFlirt extraction layer]

DataFlirt Pipeline Health = Valid_Records / Expected_Records
The ultimate business metric. If this drops, the block rate is usually the culprit. [Internal SLO]
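
The naive and true block rate formulas translate directly into code. In this sketch the function names mirror the formulas, and the counts in the usage note are invented purely to show how far the two figures can diverge.

```python
def naive_block_rate(http_403: int, http_429: int, total_requests: int) -> float:
    """(HTTP_403 + HTTP_429) / Total_Requests"""
    if total_requests == 0:
        return 0.0
    return (http_403 + http_429) / total_requests


def true_block_rate(explicit_blocks: int, silent_blocks: int,
                    total_requests: int) -> float:
    """(Explicit_Blocks + Silent_Blocks) / Total_Requests"""
    if total_requests == 0:
        return 0.0
    return (explicit_blocks + silent_blocks) / total_requests
```

With hypothetical counts of 10,000 requests, 20 403s, 5 429s, and 400 silent blocks, `naive_block_rate(20, 5, 10_000)` gives 0.25% while `true_block_rate(25, 400, 10_000)` gives 4.25%.
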
// 04 — pipeline telemetry

Detecting a silent block in real time.

A trace of a DataFlirt worker hitting a Cloudflare challenge. The network layer reports success, but the extraction layer catches the block and triggers an auto-rotation.

Schema Validation · Auto-Healing · Proxy Rotation
// pipeline: ecom-pricing-eu
worker.id: "w-492-alpha"
req.url: "https://target.com/p/19283"
res.status: 200 OK
res.body.length: 14,201

// extraction phase
schema.price: missing // selector failed
schema.title: "Just a moment..." // Cloudflare challenge detected
event: silent_block_detected

// telemetry & response
metric.block_rate_5m: 0.08 // 8%, threshold breached
action: quarantine_proxy_subnet
action: rotate_tls_fingerprint

// retry with new identity
req.url: "https://target.com/p/19283"
schema.price: "€49.99"
pipeline.status: recovered
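
A metric like `block_rate_5m` in the trace can be approximated with a rolling window. The sketch below is an assumption about how such a metric might be computed, not a description of DataFlirt's telemetry. `RollingBlockRate` is a hypothetical name, and the 5% default threshold is borrowed from the "above 5%" guidance earlier on this page because the trace does not state the threshold it uses.

```python
from collections import deque


class RollingBlockRate:
    def __init__(self, window_seconds: float = 300.0, threshold: float = 0.05):
        self.window_seconds = window_seconds
        self.threshold = threshold  # assumed alerting threshold
        self._events = deque()      # (timestamp, blocked) pairs, oldest first
        self._blocked = 0

    def record(self, timestamp: float, blocked: bool) -> None:
        """Add one request outcome and drop outcomes older than the window."""
        self._events.append((timestamp, blocked))
        self._blocked += int(blocked)
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, was_blocked = self._events.popleft()
            self._blocked -= int(was_blocked)

    def rate(self) -> float:
        return self._blocked / len(self._events) if self._events else 0.0

    def breached(self) -> bool:
        return self.rate() > self.threshold
```

Each worker result feeds `record()`; when `breached()` turns True, that is the point at which actions such as quarantine or rotation would fire.

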
// 05 — failure modes

What drives block rate spikes.

Ranked by their contribution to sudden block rate increases across DataFlirt's monitored pipelines. IP reputation is the most common trigger, but fingerprint mismatches are the hardest to fix.

PIPELINES MONITORED: 300+ active
WINDOW: 90d trailing
UPDATED: 2026-05-19

01 IP Reputation / ASN Burnout
   primary trigger · Target bans the entire datacenter or proxy subnet
02 TLS/JA3 Fingerprint Mismatch
   structural · Client hello doesn't match the advertised User-Agent
03 Request Velocity / Rate Limits
   operational · Crawling faster than the target's WAF threshold allows
04 Header Order Anomalies
   structural · HTTP/2 pseudo-headers sent in non-browser order
05 JavaScript Challenge Failures
   execution · Headless browser fails to solve Turnstile or DataDome JS

// 06 — our approach

Measure the payload, not the HTTP status code.

DataFlirt's observability stack doesn't trust the network layer to report success. We calculate block rate at the extraction layer. If a request returns a 200 OK but fails schema validation because the DOM contains a CAPTCHA or a DataDome block page, it is logged as a block. This prevents poisoned data from silently corrupting your downstream warehouse and triggers automated proxy rotation before the target permanently bans the subnet.

pipeline-telemetry.json

Live block rate telemetry for a high-volume retail pipeline.

pipeline.id          ecom-eu-daily
requests.total       1,240,500
status.403           1,204     [ok]
status.200_captcha   4,192     [warn]
true_block_rate      0.43%     [within SLO]
proxy_pool.health    98.2%     [stable]
auto_rotation        active
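
The reported rate can be checked against the counts in this sample. The working below assumes that `status.403` and `status.200_captcha` are the only block categories involved, since the telemetry lists no others, and that the naive figure counts only the 403s because no 429 count is shown.

```latex
\[
\text{True Block Rate}
  = \frac{1{,}204 + 4{,}192}{1{,}240{,}500}
  = \frac{5{,}396}{1{,}240{,}500}
  \approx 0.0043 \quad (0.43\%)
\]
\[
\text{Naive Block Rate}
  = \frac{1{,}204}{1{,}240{,}500}
  \approx 0.00097 \quad (0.097\%)
\]
```

Under those assumptions, a status-code-only view would understate the block rate by a factor of roughly 4.5.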


// 07 — FAQ

Common questions.

Common questions about measuring block rates, handling silent failures, and keeping pipelines healthy at scale.

What is an acceptable block rate?
For a production pipeline using residential proxies, the true block rate should remain under 1%. For datacenter proxies targeting less aggressive sites, under 3% is typical. If your block rate exceeds 5%, you are burning through your proxy pool and need to adjust your concurrency or fingerprinting strategy immediately.
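
These thresholds can be encoded as a simple status check. The sketch uses the 1%, 3%, and 5% figures from this answer; the function name and the "elevated" label for the band between target and 5% are assumptions for illustration.

```python
# Target ceilings taken from the answer above, keyed by proxy type.
TARGETS = {"residential": 0.01, "datacenter": 0.03}
CRITICAL = 0.05


def block_rate_status(rate: float, proxy_type: str = "residential") -> str:
    """Map a true block rate (as a fraction) to a coarse health label."""
    if rate > CRITICAL:
        return "critical"
    if rate < TARGETS[proxy_type]:
        return "healthy"
    return "elevated"  # hypothetical label for the in-between band
```

For example, a 2% rate is "elevated" on residential proxies but "healthy" on datacenter proxies under these thresholds.

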
Why is my block rate 0% but my data is missing?
You are measuring naive block rate. Modern anti-bot systems (like Cloudflare and PerimeterX) rarely return 403s to sophisticated bots. Instead, they return a 200 OK with a CAPTCHA page, a soft block, or poisoned HTML. If you don't validate the extracted schema, your pipeline will report 100% success while delivering garbage data.
How does proxy rotation affect block rate?
Rotating IPs resets rate-limit counters, which lowers block rates caused by velocity. However, if you rotate into a burned ASN or a proxy pool with poor reputation, your block rate will spike instantly regardless of your request speed. Quality matters more than quantity.
How does DataFlirt keep block rates low?
We use predictive rate limiting based on target behavior, maintain high fingerprint diversity (matching TLS, HTTP/2, and browser profiles perfectly), and route traffic through premium residential ISP pools. More importantly, our extraction layer catches silent blocks instantly, quarantining bad IPs before they drag down the pipeline.
Should I retry blocked requests immediately?
No. Immediate retries with the same IP and fingerprint will result in another block and further damage your IP reputation. Implement exponential backoff, and ensure the retry uses a completely fresh identity (new proxy, new TLS fingerprint, cleared cookies).
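
A minimal sketch of that retry policy follows. Everything here is illustrative: `fetch`, `new_identity`, and `sleep` are injected callables standing in for your own HTTP client, identity factory (new proxy, new TLS fingerprint, empty cookie jar), and clock, and the delay parameters are assumed defaults.

```python
import random


def backoff_delays(max_retries=4, base=1.0, factor=2.0, cap=60.0):
    """Exponential delays in seconds: base, base*factor, ... capped at `cap`."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


def fetch_with_retries(fetch, new_identity, sleep,
                       max_retries=4, jitter=random.random):
    """Try once, then retry after each backoff delay with a fresh identity.

    `fetch(identity)` must return an (ok, payload) tuple, where ok is False
    for any block (explicit or silent). Returns the payload, or None if
    every attempt was blocked."""
    identity = new_identity()
    ok, payload = fetch(identity)
    for delay in backoff_delays(max_retries):
        if ok:
            break
        sleep(delay + jitter())    # jitter spreads out simultaneous retries
        identity = new_identity()  # never reuse the blocked identity
        ok, payload = fetch(identity)
    return payload if ok else None
```

In production `sleep` would typically be `time.sleep`; injecting it keeps the policy testable without real waiting.

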
Does a high block rate mean I'm legally at risk?
A high block rate usually indicates you are hitting technical limits (WAF thresholds), not legal ones. However, aggressively hammering a server and ignoring 429 Too Many Requests or 403 Forbidden responses is poor practice and can be construed as a denial-of-service attempt. Respecting target infrastructure is both legally prudent and operationally necessary.