
What is an HTTP Response?

An HTTP response is the message a server returns after a client issues a request. It consists of a status line, headers, and an optional body containing the actual data: HTML, JSON, or binary. In scraping, the response is the raw material of your pipeline. If the headers indicate a cache hit but the body is a silent CAPTCHA challenge, your extraction layer will fail. Parsing the response correctly is the baseline for data quality.

Network Layer · Payload · Status Codes · Headers · Parsing
// 02 — definitions

The server
replies.

The anatomy of the data payload returned by the target server, and why reading the body isn't enough to guarantee a successful scrape.


TL;DR

An HTTP response contains a status code, metadata headers, and the requested content. In production scraping, the response headers are just as critical as the body — they dictate rate limits, cache states, and anti-bot session cookies. Relying solely on a 200 OK status is a rookie mistake; many WAFs return 200s with poisoned or empty bodies.

01 Definition & structure
An HTTP response is the message sent by a server back to a client after processing an HTTP request. It consists of three main parts:
  • Status Line — Contains the protocol version (e.g., HTTP/1.1 or HTTP/2) and a status code (e.g., 200 OK, 404 Not Found).
  • Headers — Metadata about the response, such as Content-Type, Content-Length, caching directives, and cookies.
  • Body — The actual payload. This could be HTML, JSON, an image, or empty (in the case of a HEAD request or a 204 No Content response).
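The three parts listed above can be seen by splitting a raw HTTP/1.1 response by hand. This is a minimal sketch for illustration only: `ParsedResponse` and `parse_raw_response` are hypothetical names, the sample bytes are made up, and repeated headers such as Set-Cookie are deliberately collapsed. In practice an HTTP client library does this for you.

```python
from typing import NamedTuple


class ParsedResponse(NamedTuple):
    version: str
    status_code: int
    reason: str
    headers: dict[str, str]
    body: bytes


def parse_raw_response(raw: bytes) -> ParsedResponse:
    """Split a raw HTTP/1.1 response into status line, headers, and body."""
    # A blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: "<version> <code> <reason>"; the reason phrase may be empty.
    parts = lines[0].split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    # Header names are case-insensitive, so normalise them to lowercase.
    # Simplification: a repeated header overwrites the earlier value.
    headers: dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return ParsedResponse(version, code, reason, headers, body)


# Made-up sample response, used only to exercise the parser.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: 15\r\n"
    b"\r\n"
    b'{"status":"ok"}'
)
resp = parse_raw_response(raw)
print(resp.status_code, resp.headers["content-type"], len(resp.body))
```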
02 How it works in practice
When your scraper sends a request, the network library waits for the server to reply. The headers arrive first, allowing the client to determine how to handle the incoming body (e.g., allocating memory based on Content-Length, or preparing a decompression stream for Gzip). Once the body is fully downloaded, the response object is handed off to your extraction logic. If the connection drops mid-transfer, you get a truncated response.
03 The silent failure of 200 OK
The most dangerous assumption in web scraping is that a 200 OK status means you got the data. Modern anti-bot systems frequently use "soft blocks" — they return a 200 OK status code but serve a CAPTCHA challenge or a generic "Access Denied" HTML page instead of the requested content. If your pipeline only monitors status codes, these failures will silently corrupt your dataset with empty or invalid records.
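A body-level check for soft blocks might look like the sketch below. The marker strings are placeholders chosen for illustration, not real vendor signatures, and `looks_like_soft_block` is a hypothetical helper; a production check would need signatures maintained per target.

```python
# Hypothetical marker strings; real challenge pages vary by vendor and change over time.
SOFT_BLOCK_MARKERS = (b"captcha", b"access denied", b"verify you are human")


def looks_like_soft_block(status_code: int, body: bytes, scan_bytes: int = 4096) -> bool:
    """Return True when a 200 response carries a likely challenge or block page."""
    if status_code != 200:
        return False  # hard failures are already visible in the status code
    if not body.strip():
        return True  # assumption: an empty 200 body is treated as suspicious
    # Only the start of the payload is scanned, case-insensitively.
    head = body[:scan_bytes].lower()
    return any(marker in head for marker in SOFT_BLOCK_MARKERS)


print(looks_like_soft_block(200, b"<html><title>Access Denied</title></html>"))
print(looks_like_soft_block(200, b'{"data": [{"id": 1042}]}'))
```

Substring matching like this is cheap but prone to false positives (a product page that mentions "captcha", for example), so it works best alongside a check that the expected fields were actually extracted.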
04 How DataFlirt handles it
We treat every HTTP response as potentially hostile. Our edge proxies validate the response headers against the expected schema before the body is even fully downloaded. We scan the first few kilobytes of the payload for known WAF signatures (like Cloudflare's Turnstile scripts or DataDome's block pages). If a response fails validation, we immediately rotate the proxy and retry, ensuring that only clean, verified data reaches the extraction layer.
05 Did you know?
A missing Content-Length header isn't an error. In HTTP/1.1 it usually means the server is using Transfer-Encoding: chunked, which lets the server start sending the response before it knows the total size. This is common for dynamically generated pages or streaming APIs. Your HTTP client must read the stream until it receives a zero-length chunk, which signals the end of the response. HTTP/2 has no chunked encoding; the end of the body is signalled at the framing layer instead.
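To make the chunk framing concrete, here is a minimal sketch of reassembling an HTTP/1.1 chunked body that has already been read into memory. `decode_chunked` is a hypothetical helper and the sample bytes are made up; trailers and chunk extensions are ignored, and a real client would read incrementally from the socket.

```python
def decode_chunked(data: bytes) -> bytes:
    """Reassemble an HTTP/1.1 chunked body held fully in memory."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hexadecimal, terminated by CRLF.
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";", 1)[0]  # drop any chunk extension
        size = int(size_field.strip().decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # the zero-length chunk marks the end of the body
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


# Made-up wire bytes: a 7-byte chunk, a 1-byte chunk, then the terminator.
wire = b'7\r\n{"id":1\r\n1\r\n}\r\n0\r\n\r\n'
print(decode_chunked(wire))
```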
// 03 — response metrics

How do we measure
response health?

A successful pipeline doesn't just measure 200 OKs. DataFlirt tracks response latency, payload efficiency, and extraction yield to determine the true health of a target.

True Success Rate = (200s - Soft_Blocks) / Total_Requests
Filters out fake 200 OKs that contain CAPTCHAs or access-denied messages. · DataFlirt pipeline SLO
Payload Efficiency = Extracted_Bytes / Total_Response_Bytes
Measures how much bandwidth is wasted on boilerplate HTML versus actual data. · Network optimization metric
Time to Last Byte (TTLB) = TTFB + (Response_Size / Throughput)
The total time until the response body is fully downloaded and ready for parsing. · Standard network latency model
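The three formulas translate directly into code. The sketch below is illustrative: the function names are hypothetical and the numbers in the example calls are invented, not DataFlirt measurements.

```python
def true_success_rate(ok_responses: int, soft_blocks: int, total_requests: int) -> float:
    """(200s - Soft_Blocks) / Total_Requests."""
    if total_requests == 0:
        return 0.0
    return (ok_responses - soft_blocks) / total_requests


def payload_efficiency(extracted_bytes: int, total_response_bytes: int) -> float:
    """Extracted_Bytes / Total_Response_Bytes."""
    if total_response_bytes == 0:
        return 0.0
    return extracted_bytes / total_response_bytes


def time_to_last_byte(ttfb_s: float, response_size_bytes: int, throughput_bytes_per_s: float) -> float:
    """TTFB + (Response_Size / Throughput), in seconds."""
    return ttfb_s + response_size_bytes / throughput_bytes_per_s


# Invented example inputs: 1,000 requests, 950 returned 200, 50 of those were soft blocks.
rate = true_success_rate(ok_responses=950, soft_blocks=50, total_requests=1000)
efficiency = payload_efficiency(extracted_bytes=12_000, total_response_bytes=480_000)
ttlb = time_to_last_byte(ttfb_s=0.25, response_size_bytes=1_000_000, throughput_bytes_per_s=4_000_000)
print(rate, efficiency, ttlb)
```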
// 04 — raw response trace

Parsing a response
at the edge.

A raw HTTP/2 response from a target server, showing the critical headers and the decompressed JSON payload.

HTTP/2 · JSON · Brotli
// status line
HTTP/2 200

// response headers
date: Tue, 19 May 2026 14:22:10 GMT
content-type: application/json; charset=utf-8
content-encoding: br
cache-control: max-age=300, public
x-ratelimit-remaining: 492 // healthy buffer
set-cookie: session_id=9f8a...; Secure; HttpOnly

// response body (decompressed)
{
"status": "success",
"data": [
{ "id": 1042, "price": 49.99 },
{ "id": 1043, "price": 54.99 }
]
}

pipeline.action: parsed and routed to extraction
// 05 — response anomalies

Where responses
deceive you.

The most common ways an HTTP response can look successful at the network layer but fail at the extraction layer. Ranked by frequency across DataFlirt's monitoring fleet.

RESPONSES SCANNED: 1.2B daily
ANOMALY RATE: 4.1%
UPDATED: 2026-05-19

01 Soft blocks (fake 200 OK) · 38% of anomalies · Status is 200, body is a CAPTCHA or block page
02 Truncated body · 26% of anomalies · Connection drops before body finishes downloading
03 Encoding mismatch · 18% of anomalies · Headers claim UTF-8, body is ISO-8859-1
04 Stale cache hit · 12% of anomalies · CDN returns old data despite cache-busting headers
05 Content-Length mismatch · 6% of anomalies · Header size doesn't match actual downloaded bytes
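The encoding mismatch case can be caught at decode time. The sketch below is one possible approach, not a description of DataFlirt's implementation: `decode_body` is a hypothetical helper that decodes strictly with the declared charset and reports a mismatch when that fails.

```python
def decode_body(body: bytes, declared_charset: str = "utf-8",
                fallback_charset: str = "iso-8859-1") -> tuple[str, bool]:
    """Return (text, mismatch); mismatch is True when the declared charset failed."""
    try:
        return body.decode(declared_charset), False
    except (UnicodeDecodeError, LookupError):
        # LookupError covers charset names Python does not recognise.
        # ISO-8859-1 maps every byte to a character, so this fallback never raises,
        # but it can still be the wrong guess for other legacy encodings.
        return body.decode(fallback_charset), True


latin1_bytes = "café".encode("iso-8859-1")  # b"caf\xe9" is not valid UTF-8
text, mismatch = decode_body(latin1_bytes, declared_charset="utf-8")
print(text, mismatch)
```

Strict decoding matters here: decoding with `errors="replace"` would hide the mismatch and write replacement characters into the dataset.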
// 06 — response validation

Trust the headers,

verify the payload.

DataFlirt doesn't just pass the response body to the extraction layer. We validate the response at the edge. We check the Content-Type against the expected schema, verify the Content-Length matches the downloaded bytes, and scan for known WAF challenge signatures in the first 4KB of the payload. If a response is poisoned, we drop it before it pollutes your dataset. Validation at the network layer prevents garbage data at the application layer.

Response Validation Trace

Edge validation of an incoming JSON response.

status.code 200 OK
content.type application/json
content.length 14,204 bytes · verified
encoding brotli · decompressed
waf.signature none detected
schema.match valid JSON
pipeline.route extraction_queue
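A trace like the one above could be produced by a small validation function. This is a sketch under stated assumptions rather than DataFlirt's actual edge logic: `validate_json_response` and the `retry_queue` route are hypothetical, the headers are assumed to be a dict with lowercase names, and the body is assumed to be uncompressed so that Content-Length can be compared directly.

```python
import json


def validate_json_response(status_code: int, headers: dict[str, str], body: bytes) -> dict[str, object]:
    """Run basic checks on a JSON response and pick a route for it."""
    checks: dict[str, bool] = {}
    checks["status_ok"] = status_code == 200

    # Compare the media type only; parameters such as charset are ignored.
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    checks["content_type_ok"] = media_type == "application/json"

    # Content-Length counts bytes as transferred, so it is compared against the
    # body before any decompression. A missing header is not treated as a failure.
    declared = headers.get("content-length")
    checks["length_ok"] = declared is None or (declared.isdigit() and int(declared) == len(body))

    try:
        json.loads(body)
        checks["valid_json"] = True
    except ValueError:  # includes json.JSONDecodeError and UnicodeDecodeError
        checks["valid_json"] = False

    route = "extraction_queue" if all(checks.values()) else "retry_queue"
    return {"checks": checks, "route": route}


good_body = b'{"status": "success", "data": []}'
good = validate_json_response(
    200,
    {"content-type": "application/json; charset=utf-8", "content-length": str(len(good_body))},
    good_body,
)
print(good["route"])
```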

// 07 — FAQ

Common
questions.

Common questions about handling HTTP responses, parsing payloads, and dealing with deceptive server behavior.

Why do I get a 200 OK but no data?
This is usually a soft block. Anti-bot systems like Cloudflare or DataDome often return a 200 OK status code but replace the actual HTML with a JavaScript challenge or a CAPTCHA page. Your HTTP client sees success, but your scraper finds no data. You must validate the response body, not just the status code.
How do I handle chunked transfer encoding?
When a server sends Transfer-Encoding: chunked, it omits the Content-Length header and sends the body in pieces. Most modern HTTP clients (like httpx or requests) handle this automatically, reassembling the chunks before returning the response object. If you are writing a custom client, you must parse the chunk size markers manually.
What's the difference between TTFB and TTLB?
Time to First Byte (TTFB) measures the latency from sending the request to receiving the first byte of the response headers. Time to Last Byte (TTLB) measures the time until the entire response body is downloaded. For large payloads, TTLB is the metric that actually impacts your pipeline throughput.
How does DataFlirt handle compressed responses?
We advertise support for Brotli, Gzip, and Deflate in our Accept-Encoding headers to minimize bandwidth. Our edge workers decompress the response body transparently before passing it to the extraction layer. Brotli typically yields a 15-20% size reduction over Gzip for HTML payloads, significantly reducing egress costs.
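For readers handling compression themselves, the sketch below dispatches on the Content-Encoding header using only the Python standard library. It is an illustration, not DataFlirt's edge worker code: `decompress_body` is a hypothetical helper, and Brotli (`br`) is rejected here because decoding it needs a third-party package.

```python
import gzip
import zlib


def decompress_body(body: bytes, content_encoding: str) -> bytes:
    """Decompress a response body according to its Content-Encoding value."""
    encoding = content_encoding.strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # "deflate" is normally zlib-wrapped, but some servers send raw deflate.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # Brotli and anything else is out of scope for this stdlib-only sketch.
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")


payload = b'{"id": 1042, "price": 49.99}'
print(decompress_body(gzip.compress(payload), "gzip") == payload)
```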
Is it legal to scrape responses containing personal data?
Scraping personal data introduces significant compliance burdens under GDPR, CCPA, and similar frameworks. Even if the data is publicly accessible, you must establish a lawful basis for processing it. DataFlirt strictly avoids scraping non-public personal data and enforces data minimization principles across all pipelines.
How do you scale response parsing for 100M+ pages?
We decouple fetching from extraction. Responses are streamed directly to distributed object storage (S3) or fast message queues (Kafka) as raw bytes. Asynchronous worker pools then pull these responses, validate the headers, and run the extraction schemas in parallel. This prevents slow parsers from blocking the network I/O threads.
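The decoupling idea can be shown at toy scale with an in-process queue standing in for Kafka or S3. Everything here is an assumption made for illustration: `extraction_worker` and `run_pipeline` are hypothetical names, the "fetcher" is just a loop that enqueues raw bytes, and extraction is reduced to parsing JSON.

```python
import json
import queue
import threading


def extraction_worker(raw_queue: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Pull raw response bodies off the queue and parse them."""
    while True:
        raw = raw_queue.get()
        if raw is None:  # sentinel: no more responses for this worker
            break
        try:
            record = json.loads(raw)
        except ValueError:
            continue  # invalid payloads are dropped, never passed downstream
        with lock:
            results.append(record)


def run_pipeline(raw_responses: list[bytes], worker_count: int = 4) -> list:
    raw_queue: queue.Queue = queue.Queue()
    results: list = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=extraction_worker, args=(raw_queue, results, lock))
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    # The fetch side only enqueues raw bytes; it never waits on a parser.
    for raw in raw_responses:
        raw_queue.put(raw)
    for _ in workers:
        raw_queue.put(None)
    for worker in workers:
        worker.join()
    return results


records = run_pipeline([b'{"id": 1042}', b"<html>blocked</html>", b'{"id": 1043}'])
print(sorted(record["id"] for record in records))
```

Because workers finish in no fixed order, results are sorted before printing; a real pipeline would key records by URL or request ID instead.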