
What is an HTTP Status Code?

HTTP status codes are the three-digit integers a server returns to indicate the outcome of a client's request. In web scraping, they are the primary control signal for pipeline orchestration. A 200 usually means you have data to parse, a 429 means you are sending requests faster than the target allows, and a 403 often means your fingerprint or IP has been flagged. Misinterpreting these codes leads to infinite retry loops, poisoned datasets, or permanent IP bans.

Network Layer · Error Handling · Pipeline Control · RFC 9110
// 02 — definitions

The pipeline's
control plane.

Status codes dictate the immediate next action for a scraper worker: parse, retry, rotate proxy, or abort.

TL;DR

HTTP status codes are grouped into five classes. For scrapers, 2xx means success, 3xx requires redirect following, 4xx indicates a client-side issue (often a block or rate limit), and 5xx means the target server is failing. Production pipelines use these codes to trigger automated recovery workflows rather than failing the extraction job.

01 Definition & structure
An HTTP status code is a three-digit integer, defined by RFC 9110, that indicates the result of an HTTP request. Status codes are divided into five classes by their first digit:
  • 1xx (Informational): Request received, continuing process. Rarely seen in scraping.
  • 2xx (Successful): The action was successfully received, understood, and accepted.
  • 3xx (Redirection): Further action must be taken to complete the request.
  • 4xx (Client Error): The request contains bad syntax or cannot be fulfilled (e.g., blocks, rate limits).
  • 5xx (Server Error): The server failed to fulfill an apparently valid request.
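The five classes above follow directly from the first digit of the code. A minimal sketch in Python, where `StatusClass` and `status_class` are illustrative names rather than part of any library:

```python
from enum import Enum


class StatusClass(Enum):
    """The five RFC 9110 status code classes, keyed by first digit."""
    INFORMATIONAL = 1
    SUCCESSFUL = 2
    REDIRECTION = 3
    CLIENT_ERROR = 4
    SERVER_ERROR = 5


def status_class(code: int) -> StatusClass:
    """Return the class of a status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return StatusClass(code // 100)
```

For example, `status_class(429)` returns `StatusClass.CLIENT_ERROR` and `status_class(503)` returns `StatusClass.SERVER_ERROR`.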
02 How it works in practice
The status code arrives at the very start of the response. In HTTP/1.1 it is part of the status line (e.g., HTTP/1.1 200 OK); in HTTP/2 it is carried in the :status pseudo-header, which has no reason phrase. Because it arrives before the response body, it is the fastest way for a scraper to determine whether a request succeeded. High-performance pipelines read the status code and, on a 4xx or 5xx, skip the body by closing the connection or resetting the stream, saving the bandwidth and CPU cost of downloading and parsing an error page.
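One way to act on the status before paying for the body can be sketched with Python's standard `http.client` (HTTP/1.1 only). `fetch_body_if_success` is a hypothetical helper name, and the host, port, and path are whatever the caller supplies:

```python
import http.client
from typing import Optional, Tuple


def fetch_body_if_success(host: str, port: int, path: str,
                          timeout: float = 10.0) -> Tuple[int, Optional[bytes]]:
    """GET a path and read the body only when the status is below 400."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        # getresponse() parses the status line and headers; the body is
        # not read until read() is called.
        resp = conn.getresponse()
        if resp.status >= 400:
            # Leave the error body unread; closing the connection discards it.
            return resp.status, None
        return resp.status, resp.read()
    finally:
        conn.close()
```

Closing the connection gives up keep-alive reuse, so this trade-off only pays off when error bodies are large relative to the cost of a new connection.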
03 The "Fake 200" problem
Relying solely on HTTP status codes is dangerous. Modern anti-bot systems frequently return a 200 OK status code while serving a CAPTCHA page or a silent JavaScript challenge instead of the requested data. If your pipeline assumes 200 means success without validating the DOM payload, you will silently ingest poisoned data. Robust pipelines always pair status code checks with schema validation.
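Pairing the status check with a payload check can be as simple as the sketch below. It uses plain substring matching for brevity; a real pipeline would more likely validate against a parsed schema. The function name, the marker strings, and the returned labels are all assumptions made for illustration:

```python
from typing import Iterable


def classify_response(status: int, html: str,
                      required_markers: Iterable[str],
                      block_markers: Iterable[str]) -> str:
    """Label a response as ok, http_error, soft_block, or schema_mismatch."""
    if not 200 <= status < 300:
        return "http_error"
    lowered = html.lower()
    # A 2xx page containing challenge markers is a fake 200.
    if any(marker.lower() in lowered for marker in block_markers):
        return "soft_block"
    # A 2xx page missing the expected structure is not usable data either.
    if not all(marker.lower() in lowered for marker in required_markers):
        return "schema_mismatch"
    return "ok"
```

Only responses labeled `ok` should reach the parser; `soft_block` can be routed to the same recovery path as a 403.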
04 How DataFlirt handles it
We treat status codes as a real-time control plane. Our orchestration layer maps every non-200 response to an automated recovery workflow. A 403 automatically burns the current proxy IP and requests a new TLS fingerprint. A 429 triggers a dynamic backoff algorithm that throttles the concurrency for that specific target domain. By the time data reaches your S3 bucket, the network chaos has been entirely abstracted away.
05 Did you know?
The HTTP standard includes several esoteric status codes. 418 I'm a teapot was defined in 1998 as an April Fools' joke (RFC 2324) but is occasionally used by developers as an easter egg block page. 451 Unavailable For Legal Reasons is a real status code used when content is blocked due to government censorship or DMCA takedown requests.
// 03 — pipeline metrics

How status codes
drive telemetry.

DataFlirt aggregates status codes across millions of requests to calculate real-time pipeline health, adjust concurrency, and trigger automatic proxy rotation.

Success Rate: S = Σ(2xx) / Σ(All_Requests)
Target > 99.5% for stable pipelines. Drops indicate a new anti-bot deployment or target instability. Selector rot does not show up here, because a page with broken selectors still returns 2xx. (DataFlirt Pipeline SLO)

Block Rate: B = Σ(403 + 401 + 451) / Σ(All_Requests)
Triggers immediate IP cooldown and fingerprint rotation if B > 0.02. (Proxy Management Heuristic)

Retry Rate: R = Σ(429 + 5xx) / Σ(All_Requests)
Measures target instability and concurrency limits. High R triggers automatic backoff. (Orchestration Layer)
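The three ratios above can be computed from a simple tally of status codes. This is a sketch of the arithmetic only, not DataFlirt's implementation; `pipeline_rates` and the sample counts are made up for illustration:

```python
from collections import Counter
from typing import Dict, Mapping

BLOCK_CODES = {401, 403, 451}


def pipeline_rates(status_counts: Mapping[int, int]) -> Dict[str, float]:
    """Compute success (S), block (B), and retry (R) rates from a tally."""
    total = sum(status_counts.values())
    if total == 0:
        return {"success": 0.0, "block": 0.0, "retry": 0.0}
    success = sum(n for code, n in status_counts.items() if 200 <= code < 300)
    block = sum(n for code, n in status_counts.items() if code in BLOCK_CODES)
    retry = sum(n for code, n in status_counts.items()
                if code == 429 or 500 <= code < 600)
    return {"success": success / total,
            "block": block / total,
            "retry": retry / total}


# Hypothetical tally of 1,000 requests.
counts = Counter({200: 950, 403: 30, 429: 15, 503: 5})
rates = pipeline_rates(counts)
```

With these sample counts, S = 0.95, B = 0.03, and R = 0.02, so the block rate is over the 0.02 threshold.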
// 04 — worker execution trace

Navigating a hostile
status code sequence.

A standard DataFlirt worker hits a rate limit, then a WAF block, and finally achieves a successful fetch via automated proxy rotation.

HTTP/2 · auto-retry · proxy rotation
// attempt 1: direct fetch
GET /api/v1/catalog?page=4
status: 429 Too Many Requests
action: backoff_triggered (2000ms)

// attempt 2: retry after backoff
GET /api/v1/catalog?page=4
status: 403 Forbidden // WAF escalated block
action: proxy_rotation_requested

// attempt 3: fresh residential IP + new fingerprint
GET /api/v1/catalog?page=4
status: 200 OK
content_length: 142,850
pipeline: record_extracted
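The trace above can be reproduced with a small retry loop. This is a sketch under stated assumptions: `fetch` and `rotate_proxy` are caller-supplied callables (hypothetical stand-ins for the real worker internals), the backoff is a fixed 2 seconds as in the trace, and the `gave_up` action is invented for the failure case:

```python
import time
from typing import Callable, List, Tuple


def fetch_with_recovery(fetch: Callable[[str], int],
                        rotate_proxy: Callable[[], None],
                        url: str,
                        max_attempts: int = 3,
                        backoff_s: float = 2.0,
                        sleep: Callable[[float], None] = time.sleep
                        ) -> Tuple[int, List[str]]:
    """Retry a fetch, backing off on 429 and rotating the proxy on 403."""
    actions: List[str] = []
    status = 0
    for attempt in range(1, max_attempts + 1):
        status = fetch(url)
        if 200 <= status < 300:
            actions.append("record_extracted")
            break
        if attempt == max_attempts:
            actions.append("gave_up")
            break
        if status == 429:
            sleep(backoff_s)
            actions.append("backoff_triggered")
        elif status == 403:
            rotate_proxy()
            actions.append("proxy_rotation_requested")
        else:
            # Any other status is treated as non-recoverable in this sketch.
            actions.append("gave_up")
            break
    return status, actions
```

Injecting `sleep` keeps the loop testable without real delays; a production version would also cap total elapsed time.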
// 05 — failure distribution

The most common
scraping roadblocks.

Distribution of non-200 status codes across DataFlirt's global scraping fleet. 403s dominate due to aggressive modern anti-bot deployments on surface web targets.

Sample size: 1.2B requests
Window: 7d trailing
Updated: 2026-05-19

01 · 403 Forbidden · 68% of errors · Anti-bot block, WAF rule, bad fingerprint
02 · 429 Too Many Requests · 15% of errors · Concurrency limits, IP rate limiting
03 · 503 Service Unavailable · 8% of errors · Target server overload, Cloudflare tarpit
04 · 404 Not Found · 6% of errors · Dead links, expired product listings
05 · 502 Bad Gateway · 3% of errors · Upstream proxy failure, target timeout
// 06 — DataFlirt's control plane

Never fail on
a recoverable status code.

A naive scraper treats a 403 as a fatal error. DataFlirt treats it as a routing signal. Our orchestration layer maps every HTTP status code to a specific recovery matrix. 429s trigger exponential backoff and concurrency throttling. 403s trigger fingerprint rotation and IP blacklisting. 5xx errors trigger circuit breakers to prevent target DDoS. We abstract the network chaos so your extraction logic only ever sees a 200 OK.

Status Code Recovery Matrix

How our edge workers respond to specific HTTP status codes in real-time.

status.200 parse_payload
status.301 follow_redirect
status.403 rotate_proxy, rotate_fingerprint
status.404 mark_record_dead
status.429 throttle_concurrency, backoff
status.503 circuit_breaker_open
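A matrix like the one above maps naturally onto a lookup table. The sketch below copies the action names from the matrix; the `actions_for` helper and the `abort` fallback for unlisted codes are assumptions, not documented DataFlirt behavior:

```python
from typing import Dict, Tuple

RECOVERY_MATRIX: Dict[int, Tuple[str, ...]] = {
    200: ("parse_payload",),
    301: ("follow_redirect",),
    403: ("rotate_proxy", "rotate_fingerprint"),
    404: ("mark_record_dead",),
    429: ("throttle_concurrency", "backoff"),
    503: ("circuit_breaker_open",),
}


def actions_for(status: int) -> Tuple[str, ...]:
    """Look up recovery actions; unlisted codes fall back to 'abort'."""
    return RECOVERY_MATRIX.get(status, ("abort",))
```

Keeping the mapping as data rather than branching logic makes it easy to review and to override per target domain.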

// 07 — FAQ

Common
questions.

About handling HTTP errors, rate limits, fake 200s, and how DataFlirt ensures data delivery despite hostile network responses.

What is the difference between a 401 and a 403 in scraping?
A 401 Unauthorized means you lack valid authentication credentials for a protected endpoint. A 403 Forbidden means the server understands your request but refuses to fulfill it — in scraping, this almost always means your IP or TLS fingerprint has been flagged by a Web Application Firewall (WAF) or anti-bot system.
Why do I get a 200 OK but the page is empty or shows a CAPTCHA?
This is known as a "soft block" or a fake 200. Sophisticated anti-bot systems (like DataDome or PerimeterX) often return a 200 status code with poisoned HTML or a JavaScript challenge to confuse naive scrapers that only check the HTTP status code. You must validate the DOM payload, not just the network response.
How should a scraper handle a 429 Too Many Requests?
Stop immediately. Read the Retry-After header if present, and implement exponential backoff. Continuing to hammer a server returning 429s will almost always escalate the temporary rate limit into a permanent 403 block or an IP ban.
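A sketch of that policy in Python: honor `Retry-After` when it is present, in either of its two standard forms (delay in seconds or an HTTP-date), and otherwise fall back to exponential backoff with full jitter. The function name, `base`, and `cap` defaults are assumptions chosen for illustration:

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0,
                now: Optional[datetime] = None) -> float:
    """Return how many seconds to wait before retrying after a 429."""
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            # Form 1: delay in seconds, e.g. "120".
            return float(value)
        try:
            # Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return max(0.0, (when - current).total_seconds())
    # No usable header: exponential backoff with full jitter.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

The jitter spreads retries from many workers so they do not all hit the target again at the same instant.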
What are Cloudflare's custom 52x status codes?
Cloudflare uses non-standard codes in the 520–527 range to indicate issues between its edge nodes and the target's origin server. For a scraper, a 521 (Web Server Is Down) or 522 (Connection Timed Out) usually means the origin is down or unreachable from Cloudflare's edge, not that your scraper has been blocked.
How does DataFlirt guarantee data delivery when targets return 5xx errors?
We decouple extraction from delivery. If a target is throwing 503s, our circuit breakers pause the job and queue it for later. Your data delivery SLA is protected by our asynchronous retry queues, which automatically resume the crawl once the target server recovers.
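The circuit-breaker idea can be sketched in a few lines. This is a generic illustration of the pattern, not DataFlirt's implementation; the class name, the threshold of 5 consecutive failures, and the 300-second cooldown are assumptions:

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Pause requests to a target after repeated 5xx responses."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """True when closed, or when the cooldown has elapsed (probe)."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.cooldown_s

    def record(self, status: int) -> None:
        """Count consecutive 5xx responses; any other status closes the breaker."""
        if 500 <= status < 600:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                # Open (or re-open after a failed probe) and restart the cooldown.
                self._opened_at = self._clock()
        else:
            self._failures = 0
            self._opened_at = None
```

While `allow_request()` returns False, pending URLs stay in the queue instead of adding load to a struggling target.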
Is it legal to scrape a site that returns a 403?
A 403 is a technical barrier, not a legal one. However, bypassing technical barriers can intersect with the CFAA (in the US) or equivalent laws depending on jurisdiction and whether the data is public. We only bypass 403s on public, unauthenticated data where the block is based on bot-mitigation heuristics, not access control.