
What is a Soft Block?

Soft block is an anti-bot mitigation strategy where a server intentionally degrades the client experience rather than dropping the connection outright. Instead of a hard 403 Forbidden, the scraper receives an interactive CAPTCHA challenge, a silent tarpit delay, or a fake 200 OK with poisoned data. It is designed to exhaust pipeline resources, corrupt downstream datasets, and force scraping engineers to spend hours debugging what looks like a successful fetch.

Anti-Bot · CAPTCHA · Tarpitting · Data Poisoning · WAF
// 02 — definitions

The illusion of access.

Why modern WAFs prefer to waste your time and corrupt your data instead of just closing the connection.


TL;DR

A soft block occurs when an anti-bot system flags a request but returns an interactive challenge, a delayed response, or deceptive HTML instead of a hard error. It is the most expensive failure mode in data extraction because it often registers as a successful 200 OK in monitoring dashboards while silently failing to deliver actual data.

01 Definition & structure
A soft block is a defensive posture where a server responds to a suspected bot with a degraded or deceptive experience rather than a definitive HTTP error. Common forms include:
  • Interactive challenges: reCAPTCHA, hCaptcha, or Cloudflare Turnstile.
  • JavaScript challenges: Silent background computations required before the real HTML is served.
  • Tarpits: Holding the connection open and sending data as slowly as one byte per second.
  • Data poisoning: Returning a valid 200 OK but replacing prices or product names with fake data.
The goal is to increase the cost of scraping while providing a fallback mechanism for human users who were mistakenly flagged.
02 How it works in practice
When a request hits the edge (e.g., Akamai or DataDome), the WAF calculates a bot score. As an illustration, a WAF might issue a hard 403 when the score exceeds 0.99 and a soft block when it falls between 0.70 and 0.99; exact thresholds vary by vendor and configuration. The scraper then receives a 200 OK or a 403 carrying a challenge payload. A naive HTTP client (like Python's requests) cannot execute the JS challenge, so it fails to extract data. If the scraper doesn't validate its output, it may silently write nulls to the database.
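The tiered decision can be modelled as a simple mapping from score to action. This is a toy sketch, assuming the illustrative 0.70 and 0.99 cut-offs from the paragraph above; `EdgeAction`, `edge_action`, and the constant names are hypothetical, and no WAF vendor exposes its scoring logic this way.

```python
from enum import Enum


class EdgeAction(Enum):
    ALLOW = "allow"
    SOFT_BLOCK = "soft_block"  # challenge, tarpit, or deceptive 200 OK
    HARD_BLOCK = "hard_block"  # definitive 403


# Illustrative cut-offs only; real WAFs do not publish theirs and tune
# them per site and per customer.
SOFT_BLOCK_FLOOR = 0.70
HARD_BLOCK_FLOOR = 0.99


def edge_action(bot_score: float) -> EdgeAction:
    """Map a bot score in [0, 1] to the tiered edge response."""
    if not 0.0 <= bot_score <= 1.0:
        raise ValueError(f"bot score out of range: {bot_score}")
    if bot_score > HARD_BLOCK_FLOOR:
        return EdgeAction.HARD_BLOCK
    if bot_score >= SOFT_BLOCK_FLOOR:
        return EdgeAction.SOFT_BLOCK
    return EdgeAction.ALLOW


for score in (0.12, 0.85, 0.995):
    print(score, edge_action(score).value)
```

The three sample scores print `allow`, `soft_block`, and `hard_block`. The wide middle band is the point: most grey-area traffic gets a challenge rather than a 403, which is why soft blocks are the common case for scrapers.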
03 The cost of silent failures
Soft blocks are dangerous because they evade standard monitoring. When challenges are served with a 200 status, a dashboard tracking HTTP status codes will show a 100% success rate even if every request is hitting a CAPTCHA page. The result is missing data, schema drift alerts, and downstream pipeline corruption. Debugging a soft block takes significantly more engineering time than handling a straightforward IP ban.
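The gap between what a status-code dashboard reports and what the pipeline actually delivered is easy to quantify once each fetch records how many rows it produced. A minimal sketch, assuming a hypothetical `FetchResult` record that carries the status and the extracted row count:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FetchResult:
    status: int
    records_extracted: int  # rows the parser actually produced


def status_success_rate(results: List[FetchResult]) -> float:
    """What a status-code dashboard reports: the share of 2xx responses."""
    if not results:
        return 0.0
    ok = sum(1 for r in results if 200 <= r.status < 300)
    return ok / len(results)


def data_success_rate(results: List[FetchResult]) -> float:
    """The share of responses that are 2xx AND yielded at least one record."""
    if not results:
        return 0.0
    ok = sum(1 for r in results if 200 <= r.status < 300 and r.records_extracted > 0)
    return ok / len(results)


# Hypothetical batch: every response is a 200, but three were challenge pages.
batch = [FetchResult(200, 48), FetchResult(200, 0), FetchResult(200, 0), FetchResult(200, 0)]
print(f"status-based: {status_success_rate(batch):.0%}")  # status-based: 100%
print(f"data-based:   {data_success_rate(batch):.0%}")    # data-based:   25%
```

Alerting on the second number rather than the first is what turns a silent failure into a visible one.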
04 How DataFlirt handles it
We rely on strict schema validation and response heuristics. Every payload is checked for expected data density and known challenge signatures before it is considered successful. If a soft block is detected, our orchestration layer immediately terminates the connection, marks the proxy IP as burned for that specific target, and retries the request using a higher-tier residential IP with a perfectly aligned TLS and browser fingerprint.
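The retry half of that policy (burn the IP for this target only, then escalate to a higher tier) can be sketched as below. This is not DataFlirt's orchestration code: the tier names, IP addresses, `ProxyPool`, and `fetch_with_escalation` are all hypothetical, and the soft-block detector is passed in as a callable so any content check can be plugged in.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical proxy tiers, cheapest first. Names are illustrative only.
TIERS = ["datacenter", "isp", "residential"]


class ProxyPool:
    """Tracks which proxy IPs are burned for which target host."""

    def __init__(self, ips_by_tier: Dict[str, List[str]]):
        self.ips_by_tier = ips_by_tier
        self.burned: Dict[str, Set[str]] = defaultdict(set)  # target -> burned IPs

    def pick(self, tier: str, target: str) -> Optional[str]:
        """Return the first IP in this tier not yet burned for this target."""
        for ip in self.ips_by_tier.get(tier, []):
            if ip not in self.burned[target]:
                return ip
        return None

    def burn(self, ip: str, target: str) -> None:
        # Burned for this target only; the IP stays usable on other hosts.
        self.burned[target].add(ip)


def fetch_with_escalation(
    url: str,
    target: str,
    pool: ProxyPool,
    fetch: Callable[[str, str], str],      # (url, proxy_ip) -> body; caller-supplied
    is_soft_block: Callable[[str], bool],  # content check, never the status code
) -> Tuple[str, str, str]:
    """Try each tier once; on a soft block, burn the IP and move up a tier."""
    for tier in TIERS:
        ip = pool.pick(tier, target)
        if ip is None:
            continue  # no clean IP left in this tier for this target
        body = fetch(url, ip)
        if not is_soft_block(body):
            return tier, ip, body
        pool.burn(ip, target)
    raise RuntimeError(f"all tiers soft-blocked for {target}")


# Usage with a stubbed fetcher: the datacenter IP gets a challenge page.
def fake_fetch(url: str, ip: str) -> str:
    if ip.startswith("10."):
        return "<title>Just a moment...</title>"
    return "<div class='product'>...</div>"


pool = ProxyPool({"datacenter": ["10.0.0.1"], "residential": ["203.0.113.7"]})
tier, ip, body = fetch_with_escalation(
    "https://example.com/category/electronics", "example.com",
    pool, fake_fetch, lambda b: "Just a moment..." in b,
)
print(tier, ip, sorted(pool.burned["example.com"]))
```

Scoping the burn list per target reflects the "for that specific target" detail: an IP that one WAF distrusts may still be clean elsewhere.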
05 Did you know: Data Poisoning
Some advanced e-commerce targets use soft blocks to actively sabotage competitor intelligence. Instead of blocking a scraper, they return a perfectly formatted page in which every price has been inflated, for example by 15%. If the scraper doesn't have anomaly detection built into its extraction layer, the business will ingest the poisoned data and may automatically misprice its own inventory in response.
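One basic form of that anomaly detection compares each scraped price against the SKU's own history and then asks whether the deviations are coordinated across the batch. A minimal sketch, assuming per-SKU price history is available; the 10% tolerance and 50% batch share are hypothetical settings, not tuned values.

```python
from statistics import median
from typing import Dict, List


def flag_price_anomalies(
    current: Dict[str, float],
    history: Dict[str, List[float]],
    tolerance: float = 0.10,  # assumed: max relative move vs. baseline
) -> List[str]:
    """Return SKUs whose new price deviates from their historical median by > tolerance."""
    flagged = []
    for sku, price in current.items():
        past = history.get(sku)
        if not past:
            continue  # no baseline yet, so nothing to compare against
        baseline = median(past)
        if baseline > 0 and abs(price - baseline) / baseline > tolerance:
            flagged.append(sku)
    return flagged


def batch_looks_poisoned(flagged: List[str], current: Dict[str, float], max_share: float = 0.5) -> bool:
    """A coordinated shift across most SKUs is more suspicious than one outlier."""
    return bool(current) and len(flagged) / len(current) > max_share


history = {"A": [100.0, 101.0, 99.0], "B": [50.0, 50.0, 51.0], "C": [20.0, 20.0, 20.0]}
current = {"A": 115.0, "B": 57.5, "C": 23.0}  # every price up 15%
flagged = flag_price_anomalies(current, history)
print(flagged, batch_looks_poisoned(flagged, current))
```

A genuine sitewide sale would trip the same check, so a flagged batch is better quarantined for review than silently discarded.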
// 03 — the cost model

How soft blocks drain pipelines.

Hard blocks are cheap — you retry or rotate immediately. Soft blocks consume concurrency slots and pollute datasets. DataFlirt models the true cost of soft blocks to optimize our proxy rotation triggers.

Pipeline Drag (Tarpit): D = workers × (tarpit_delay / timeout)
A 30 s tarpit on 100 workers effectively reduces concurrency to near zero. (DataFlirt Infrastructure Model)
Poisoning Impact: P = fake_records / total_extracted
Fake 200 OKs inject nulls or decoy data, destroying dataset integrity. (Data Quality SLO)
DataFlirt Rotation Threshold: R = challenge_rate > 0.02
If more than 2% of requests hit a challenge page, the IP subnet is burned. (Internal Routing Logic)
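The three formulas translate directly into code. The sketch below implements them as written, with one labelled addition: it clamps the tarpit ratio at 1, on the assumption that a worker cannot be held longer than its own timeout. The 30 s timeout in the example is also an assumption; the model above does not state one.

```python
def pipeline_drag(workers: int, tarpit_delay_s: float, timeout_s: float) -> float:
    """D = workers x (tarpit_delay / timeout): workers effectively stalled by a tarpit.

    The ratio is clamped to 1.0 (an assumption, not part of the formula),
    since a worker is released once its own timeout fires.
    """
    return workers * min(tarpit_delay_s / timeout_s, 1.0)


def poisoning_impact(fake_records: int, total_extracted: int) -> float:
    """P = fake_records / total_extracted (0.0 for an empty batch)."""
    return fake_records / total_extracted if total_extracted else 0.0


def should_rotate(challenge_rate: float, threshold: float = 0.02) -> bool:
    """R: burn the subnet once more than 2% of requests hit a challenge page."""
    return challenge_rate > threshold


print(pipeline_drag(100, 30, 30))      # 100.0: every worker is tied up
print(poisoning_impact(150, 10_000))   # 0.015
print(should_rotate(0.0031))           # False
print(should_rotate(0.025))            # True
```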
// 04 — the silent failure

A 200 OK that contains no data.

A scraper hitting a retail target. The WAF suspects automation but isn't certain, so it serves a soft block: a valid HTTP 200 response containing a JavaScript challenge instead of the product catalog.

HTTP 200 · JS Challenge · False Positive
// outbound request
GET /category/electronics HTTP/2
user-agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64)..."

// waf evaluation
ja3_hash: match_chrome
ip_reputation: suspicious_asn
action: issue_managed_challenge

// response
status: 200 OK // deceptive status
content-type: "text/html"
body_size: 14.2 KB // expected ~180 KB

// scraper extraction phase
dom.products: null
dom.title: "Just a moment..."
pipeline.status: silent failure — 0 records extracted
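The captured response above fails two cheap checks that never consult the status code: its title matches a known challenge page, and its body is roughly 14.2 KB against an expected ~180 KB. A minimal sketch of both checks follows. The title list is illustrative rather than exhaustive, and the 25% size floor and `soft_block_reasons` are hypothetical.

```python
import re
from typing import List

# Illustrative challenge-page title prefixes (lowercase); not an exhaustive list.
CHALLENGE_TITLES = ("just a moment", "attention required", "access denied")
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def soft_block_reasons(body: str, expected_bytes: int, min_size_ratio: float = 0.25) -> List[str]:
    """Content-based soft-block check that deliberately ignores the HTTP status.

    Returns human-readable reasons; an empty list means the page passed.
    """
    reasons = []
    match = TITLE_RE.search(body)
    title = match.group(1).strip().lower() if match else ""
    if title.startswith(CHALLENGE_TITLES):  # str.startswith accepts a tuple
        reasons.append(f"challenge title: {title!r}")
    size = len(body.encode("utf-8"))
    if size < expected_bytes * min_size_ratio:
        reasons.append(f"body is {size} bytes, expected ~{expected_bytes}")
    return reasons


challenge = "<html><head><title>Just a moment...</title></head><body></body></html>"
print(soft_block_reasons(challenge, expected_bytes=180_000))
```

Returning reasons rather than a bare boolean makes the retry log explain why a given 200 OK was rejected.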
// 05 — trigger signals

What triggers a managed challenge.

Soft blocks are typically deployed when a request falls into the 'grey area' of a WAF's confidence score. It's not obviously a bot, but it doesn't perfectly resemble a human.

Challenge rate: 0.31% avg
False positives: ~12% of blocks
Updated: 2026-05-19
01 Datacenter IP range: high risk · AWS/DigitalOcean IPs almost always trigger challenges
02 Missing session cookies: medium risk · First-touch requests without tracking cookies
03 High request velocity: rate limit · Spikes in traffic from a single subnet
04 Generic fingerprint: low entropy · Default Playwright or Puppeteer signatures
05 Geographic mismatch: anomaly · US residential IP requesting localized Indian content
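The five signals above can double as a pre-flight lint that runs before a request leaves the scraper. This is a sketch under loud assumptions: the `RequestContext` fields, the 60 requests-per-minute budget, and the `default_` fingerprint naming are all hypothetical. It also checks each signal independently and does not model how a WAF weights them into a single score.

```python
from dataclasses import dataclass
from typing import List

MAX_REQUESTS_PER_MINUTE = 60  # hypothetical per-subnet velocity budget


@dataclass
class RequestContext:
    ip_type: str            # e.g. "datacenter" or "residential"
    has_session_cookies: bool
    subnet_rpm: float       # recent requests per minute from this subnet
    fingerprint: str        # e.g. "default_playwright" or "custom"
    ip_country: str         # ISO code of the exit IP
    content_country: str    # market the target page is localized for


def trigger_signals(ctx: RequestContext) -> List[str]:
    """List which of the five trigger signals this request would raise."""
    signals = []
    if ctx.ip_type == "datacenter":
        signals.append("datacenter IP range")
    if not ctx.has_session_cookies:
        signals.append("missing session cookies")
    if ctx.subnet_rpm > MAX_REQUESTS_PER_MINUTE:
        signals.append("high request velocity")
    if ctx.fingerprint.startswith("default_"):
        signals.append("generic fingerprint")
    if ctx.ip_country != ctx.content_country:
        signals.append("geographic mismatch")
    return signals


ctx = RequestContext("residential", False, 12, "custom", "US", "IN")
print(trigger_signals(ctx))  # ['missing session cookies', 'geographic mismatch']
```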
// 06 — our mitigation

Detect early, rotate immediately.

DataFlirt treats soft blocks as hard failures. Our edge workers inspect the DOM structure and response timing before passing the payload to the extraction layer. If we detect a challenge page, a tarpit delay, or a honeypot injection, we instantly drop the session, flag the IP, and retry the request on a clean residential node. We never attempt to solve CAPTCHAs in real-time — bypassing the trigger is always cheaper and faster than solving the challenge.

Soft Block Detection

Live evaluation of a suspicious 200 OK response.

response.status: 200 OK
response.ttfb: 450ms
dom.title_match: false
dom.has_captcha: true
action.taken: drop_and_rotate
proxy.status: ip_burned
retry.status: success on residential_IN


// 07 — FAQ

Common questions.

About soft blocks, challenge pages, tarpitting, and how DataFlirt maintains extraction quality when targets lie about response status.

What is the difference between a hard block and a soft block?
A hard block is a definitive network or HTTP rejection — a TCP reset, a 403 Forbidden, or a 429 Too Many Requests. A soft block is a deceptive response that keeps the connection alive but denies the actual data. This includes CAPTCHAs, infinite redirects, tarpits (intentionally slow responses), and fake 200 OKs with missing or poisoned data.
Why do WAFs use soft blocks instead of just blocking IPs?
To handle false positives. If a WAF isn't 100% sure a request is a bot, a hard block might ban a legitimate human user. A soft block (like a Cloudflare Turnstile challenge) allows a real human to pass through while stopping naive scripts. Additionally, tarpits are used to tie up a scraper's concurrency limits, making the attack economically unviable.
How does DataFlirt handle CAPTCHAs?
We don't solve them; we avoid them. Solving CAPTCHAs at scale is slow, expensive, and unreliable. Instead, we maintain high-quality residential proxy pools and coherent browser fingerprints so our requests never cross the risk threshold that triggers a CAPTCHA in the first place. If a challenge is served, we drop the IP and rotate.
What is a tarpit response?
A tarpit is a soft block where the server intentionally drips the response back at a glacial pace, sometimes one byte per second. If your scraper doesn't enforce a total deadline on each response, a tarpit can hold your worker threads open indefinitely, causing a complete pipeline stall without throwing a single error. A per-read timeout alone is not enough, because it resets every time a byte arrives.
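A total deadline can be enforced by timing the whole body rather than each read. A minimal sketch, assuming the body arrives as an iterable of byte chunks; `read_with_deadline`, `TarpitSuspected`, and the simulated `dripping_body` are hypothetical names.

```python
import time
from typing import Iterable, Iterator


class TarpitSuspected(Exception):
    """Raised when a response body takes longer than its total deadline."""


def read_with_deadline(chunks: Iterable[bytes], deadline_s: float, max_bytes: int = 10_000_000) -> bytes:
    """Accumulate body chunks, enforcing a wall-clock budget for the whole body.

    A per-read timeout resets whenever a byte arrives, so a 1 byte/second
    drip never trips it; this checks total elapsed time instead. The check
    runs only when a chunk arrives, so it is as tight as chunk delivery allows.
    """
    start = time.monotonic()
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        elapsed = time.monotonic() - start
        if elapsed > deadline_s:
            raise TarpitSuspected(f"{len(buf)} bytes in {elapsed:.1f}s (deadline {deadline_s}s)")
        if len(buf) > max_bytes:
            raise ValueError(f"body exceeds {max_bytes} bytes")
    return bytes(buf)


def dripping_body(n: int, delay_s: float) -> Iterator[bytes]:
    """Simulated tarpit: yields one byte every delay_s seconds."""
    for _ in range(n):
        time.sleep(delay_s)
        yield b"x"


try:
    read_with_deadline(dripping_body(20, 0.05), deadline_s=0.3)
except TarpitSuspected as exc:
    print("tarpit:", exc)
```

With requests, for instance, the `timeout` argument bounds the connection phase and the gap between received bytes, not the total download time. A streamed response (`stream=True`) could be passed through a guard like this via `resp.iter_content(chunk_size=...)`. Because the check only runs when a chunk arrives, a small chunk size and a worker-level time limit remain sensible backstops.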
How do you detect a fake 200 OK?
Through schema validation and DOM heuristics. We don't trust HTTP status codes. Every response is validated against the expected schema. If the target fields are missing, or if the page title matches known challenge signatures (e.g., "Just a moment..."), our pipeline flags it as a soft block and triggers a retry.
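The schema half of that check can be as simple as requiring a minimum record count and a non-null, correctly typed value for every expected field. A minimal sketch, assuming a hypothetical three-field product schema; `PRODUCT_SCHEMA` and `validate_records` are illustrative names, not a real library API.

```python
from typing import Any, Dict, List

# Hypothetical schema for one product listing: field name -> expected type.
PRODUCT_SCHEMA: Dict[str, type] = {"sku": str, "name": str, "price": float}


def validate_records(
    records: List[Dict[str, Any]],
    schema: Dict[str, type] = PRODUCT_SCHEMA,
    min_records: int = 1,
) -> List[str]:
    """Return schema violations; an empty list means the batch passed."""
    errors = []
    if len(records) < min_records:
        # A challenge page typically yields zero records.
        errors.append(f"expected at least {min_records} records, got {len(records)}")
    for i, rec in enumerate(records):
        for field, expected in schema.items():
            value = rec.get(field)
            if value is None:
                errors.append(f"record {i}: missing {field}")
            elif not isinstance(value, expected):
                errors.append(f"record {i}: {field} is {type(value).__name__}, expected {expected.__name__}")
    return errors


print(validate_records([]))
print(validate_records([{"sku": "A1", "name": "TV", "price": None}]))
print(validate_records([{"sku": "A1", "name": "TV", "price": 499.0}]))
```

Any non-empty result is treated as a soft block and triggers a retry rather than a write to the database.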
Is it legal to bypass a soft block?
Bypassing a soft block to access publicly available data is generally treated the same as bypassing a hard block under current US and EU precedents (like hiQ v. LinkedIn). However, if the soft block is protecting authenticated or non-public data, bypassing it may violate the CFAA or equivalent statutes. We only scrape public, indexable surface web data.

hello@dataflirt.com  ·  Bengaluru  ·  IST  ·  typical reply < 4h