
What is a Response Body?

A response body is the payload of an HTTP response: the HTML, JSON, or binary data the server sends after the headers. In a scraping pipeline, it is the raw material that feeds the extraction layer. Headers describe the message and govern how the connection is handled; the body determines whether your pipeline actually captures value or just burns bandwidth on empty, poisoned, or malformed bytes.

HTTP Payload · Data Extraction · Chunked Transfer · Gzip/Brotli · Network Layer
// 02 — definitions

The payload that matters.

The actual bytes delivered by the target server, and why capturing them reliably at scale is harder than just calling a read function.


TL;DR

The response body contains the target data requested by the client. In production scraping, bodies are often compressed, chunked, or intentionally poisoned by anti-bot systems. Handling them requires robust decompression, encoding normalization, and strict size limits to prevent memory exhaustion across concurrent workers.

01 Definition & structure
The response body is the data payload a web server returns in response to an HTTP request. In HTTP/1.1's text wire format, it is the sequence of bytes that immediately follows the header block and the empty line that terminates it (a double CRLF); HTTP/2 and HTTP/3 carry the same bytes in binary DATA frames instead. Depending on the Content-Type header, the body could be an HTML document, a JSON string, an XML feed, or a binary file such as a PDF or image.
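To make the HTTP/1.1 wire layout concrete, here is a minimal Python sketch that splits a raw response at the first empty line. `split_http_response` is a hypothetical helper for illustration only: it ignores HTTP/2 framing, and the body it returns is still exactly as sent (possibly compressed or chunked).

```python
def split_http_response(raw: bytes) -> tuple[bytes, dict[str, str], bytes]:
    """Split a raw HTTP/1.1 response into status line, headers and body bytes."""
    # The header block ends at the first empty line, i.e. CRLF CRLF.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/body separator found")
    status_line, *header_lines = head.split(b"\r\n")
    headers: dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(b":")
        # Field names are case-insensitive, so normalise them to lower case.
        # Repeated fields simply overwrite earlier ones in this sketch.
        headers[name.strip().lower().decode("ascii")] = value.strip().decode("latin-1")
    # The body is returned untouched: it may still need dechunking or decompression.
    return status_line, headers, body
```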
02 Compression and encoding
To save bandwidth, servers commonly compress response bodies. If your scraper sends an Accept-Encoding: gzip, br header, the server may compress the body with one of those codings and name it in the Content-Encoding response header. Your HTTP client must decompress the byte stream before it can be parsed. The decompressed bytes must then be decoded into characters using the correct character set (usually UTF-8, declared via the charset parameter of Content-Type); otherwise accented characters and currency symbols turn into mojibake.
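The two steps above (undo the content coding, then decode with the declared charset) can be sketched as follows. `decode_body` is a hypothetical helper; it assumes the whole body is already in memory, and its Brotli branch assumes the third-party `brotli` package is installed, since Python's standard library ships only gzip/zlib.

```python
import gzip
import zlib


def decode_body(raw: bytes, content_encoding: str, content_type: str) -> str:
    """Undo Content-Encoding, then decode bytes to text using the declared charset."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    data = raw
    # Codings are listed in the order they were applied, so undo them in reverse.
    for coding in reversed(codings):
        if coding in ("gzip", "x-gzip"):
            data = gzip.decompress(data)
        elif coding == "deflate":
            # HTTP "deflate" is meant to be zlib-wrapped; servers that send raw
            # deflate streams would need zlib.decompress(data, -zlib.MAX_WBITS).
            data = zlib.decompress(data)
        elif coding == "br":
            # Assumption: the third-party `brotli` package is installed.
            import brotli
            data = brotli.decompress(data)
        elif coding != "identity":
            raise ValueError(f"unsupported content coding: {coding}")
    # Take the charset parameter from Content-Type, defaulting to UTF-8.
    charset = "utf-8"
    for param in content_type.split(";")[1:]:
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset" and value.strip():
            charset = value.strip().strip('"')
    # errors="strict" raises on a mismatch instead of emitting silent mojibake.
    return data.decode(charset, errors="strict")
```

Strict decoding is a deliberate choice here: a loud UnicodeDecodeError is easier to route to a fallback than corrupted text that reaches the database.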
03 Chunked transfer encoding
When a server generates a response dynamically, it may not know the total size of the body upfront. Instead of buffering the whole response to calculate a Content-Length, an HTTP/1.1 server can send the body in pieces using Transfer-Encoding: chunked. Each chunk is prefixed with its size in hexadecimal followed by CRLF, and the body ends with a zero-length chunk. (HTTP/2 and HTTP/3 do not use chunked encoding; their framing layer streams the body natively.) Robust HTTP clients decode chunking transparently, but if the connection drops before the terminating zero-length chunk arrives, the client is left with a truncated, unparseable body.
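A minimal sketch of chunked decoding, assuming the full HTTP/1.1 body has already been received into a buffer (real clients decode incrementally from the socket). `decode_chunked` and `TruncatedBody` are hypothetical names. The point it illustrates: truncation is detectable, because a complete body always ends with the zero-length chunk.

```python
class TruncatedBody(Exception):
    """Raised when a chunked body ends before its terminating zero-length chunk."""


def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body that is fully held in memory."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hexadecimal, terminated by CRLF.
        line_end = raw.find(b"\r\n", pos)
        if line_end == -1:
            raise TruncatedBody("connection closed before the next chunk-size line")
        size_field = raw[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.strip(), 16)
        pos = line_end + 2
        if size == 0:
            # Last chunk; any trailer fields that follow are ignored here.
            return bytes(body)
        chunk = raw[pos:pos + size]
        # A short chunk or a missing CRLF after it means the stream was cut off.
        if len(chunk) < size or raw[pos + size:pos + size + 2] != b"\r\n":
            raise TruncatedBody("connection closed mid-chunk")
        body += chunk
        pos += size + 2
```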
04 How DataFlirt handles it
We treat response bodies as untrusted streams. Our fetch layer reads bodies incrementally, enforcing strict maximum size limits defined by the pipeline's schema contract. If a target unexpectedly serves a 500 MB video file instead of a JSON response, the connection is severed after the first few kilobytes rather than after the full download. We also enforce strict timeouts on the read operation, so a server that drips bytes slowly (a tarpit, the client-side mirror of a slowloris attack) cannot tie up our worker nodes.
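The size and time bounds described above can be sketched as a small reader over any iterator of byte chunks (for example, successive socket reads, or `requests`' `Response.iter_content()` with `stream=True`). This is an illustration under those assumptions, not DataFlirt's fetch layer; `read_bounded` and both exception names are hypothetical.

```python
import time
from typing import Callable, Iterable


class BodyTooLarge(Exception):
    """Body grew past the pipeline's byte limit."""


class ReadDeadlineExceeded(Exception):
    """Body took longer than the total read budget to arrive."""


def read_bounded(
    chunks: Iterable[bytes],
    max_bytes: int,
    deadline_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> bytes:
    """Accumulate a streamed body, aborting on a size or total-time overrun.

    The caller is expected to close the connection when this raises.
    """
    start = clock()
    buf = bytearray()
    for chunk in chunks:
        # Total-time budget: defeats servers that drip one byte at a time.
        # It is only checked between chunks, so pair it with a per-read
        # socket timeout to cover a peer that stops sending altogether.
        if clock() - start > deadline_s:
            raise ReadDeadlineExceeded(f"body not received within {deadline_s}s")
        # Check before appending so the buffer never exceeds the limit.
        if len(buf) + len(chunk) > max_bytes:
            raise BodyTooLarge(f"body exceeds {max_bytes} bytes")
        buf += chunk
    return bytes(buf)
```

The injectable `clock` exists only to make the deadline testable; in production the default monotonic clock is what you want, since wall-clock time can jump.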
05 The "poisoned body" trap
Relying solely on HTTP status codes is dangerous. Advanced anti-bot systems (such as Akamai or DataDome) can return a 200 OK status code while replacing the real response body with a JavaScript challenge, a CAPTCHA page, or subtly altered fake data. A pipeline that does not validate the structure of the response body before extraction will silently ingest this poisoned data and corrupt downstream databases.
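A minimal sketch of structural validation before extraction, assuming a JSON API whose schema contract requires certain top-level keys. `validate_json_body` and `PoisonedBody` are hypothetical, and the marker strings are placeholders: real challenge pages vary by vendor and change often. Structural checks catch challenge pages and wrong shapes; subtly altered fake data that still matches the schema needs value-level checks (for example, comparing against historical ranges), which this sketch does not attempt.

```python
import json

# Hypothetical marker strings, for illustration only.
CHALLENGE_MARKERS = (b"captcha", b"challenge-platform", b"access denied")


class PoisonedBody(Exception):
    """The body is present but is not the data the pipeline asked for."""


def validate_json_body(body: bytes, required_keys: frozenset[str]) -> dict:
    """Reject bodies that are structurally not the expected JSON object."""
    # A JSON object API should start with "{"; HTML challenge pages start with "<".
    if not body.lstrip().startswith(b"{"):
        raise PoisonedBody("body is not a JSON object (challenge page?)")
    # Cheap sniff of the first few KB for known challenge wording.
    sniff = body[:4096].lower()
    for marker in CHALLENGE_MARKERS:
        if marker in sniff:
            raise PoisonedBody(f"challenge marker found: {marker.decode()}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise PoisonedBody(f"invalid JSON: {exc.msg}") from exc
    # Schema contract check: the fields extraction depends on must be present.
    missing = required_keys.difference(payload)
    if missing:
        raise PoisonedBody(f"missing fields: {sorted(missing)}")
    return payload
```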
// 03 — payload math

Measuring the byte stream.

Response bodies dictate network egress costs and parser CPU time. DataFlirt tracks compression ratios and parse times to optimize worker allocation per pipeline.

Compression ratio: $C = \text{size}_{\text{raw}} / \text{size}_{\text{compressed}}$
Brotli typically compresses JSON payloads 20-30% better than Gzip. · Network optimization standard
Parse time estimate: $T_{\text{parse}} = \text{body\_bytes} / \text{throughput}_{\text{parser}}$
A JSON parser typically processes a payload far faster than an HTML parser can build the equivalent DOM tree. · Extraction layer SLO
DataFlirt memory budget: $M_{\max} = \text{workers} \times \text{limit}_{\text{body}}$
Strict per-request body size limits keep a single rogue 2 GB PDF from crashing a node. · Internal infrastructure limit
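The three formulas above translate directly into code. In the usage block, the 142,048-byte body comes from the wire trace and the 150,000-byte limit from the worker panel on this page; the 40,000-byte compressed size, 100 MB/s parser throughput, and 64 workers are hypothetical values chosen only to illustrate the arithmetic.

```python
def compression_ratio(size_raw: int, size_compressed: int) -> float:
    """C = size_raw / size_compressed."""
    return size_raw / size_compressed


def parse_time_s(body_bytes: int, parser_throughput_bps: float) -> float:
    """T_parse = body_bytes / throughput_parser, in seconds if throughput is bytes/s."""
    return body_bytes / parser_throughput_bps


def memory_budget_bytes(workers: int, body_limit_bytes: int) -> int:
    """M_max = workers x limit_body: the worst case if every worker hits the cap at once."""
    return workers * body_limit_bytes


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(compression_ratio(142_048, 40_000))
    print(parse_time_s(142_048, 100e6))
    print(memory_budget_bytes(64, 150_000))
```

With these inputs, 64 workers each capped at 150,000 bytes bound body buffers at 9,600,000 bytes (about 9.6 MB). If the cap is applied to compressed bytes, decompressed buffers can exceed that figure, so the limit is safest when enforced on decompressed output.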
// 04 — the wire trace

Receiving a streamed JSON payload.

A live trace of an HTTP/2 response stream. The headers arrive first, followed by the compressed body. HTTP/2 has no chunked transfer coding; the body arrives as a sequence of DATA frames, logged below as chunks.

HTTP/2 · Brotli · Streamed body
edge.dataflirt.io — live
CAPTURED
// inbound HTTP/2 response
:status: 200
content-type: "application/json; charset=utf-8"
content-encoding: "br"

// body stream initiated
stream.chunk_01: 16.4 KB received
stream.chunk_02: 32.8 KB received
stream.chunk_03: 12.1 KB received
stream.state: EOF

// decompression & validation
action.decompress: brotli -> utf-8
body.size_raw: 142,048 bytes
body.json_valid: true

// extraction handoff
pipeline.status: payload captured
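The trace above decompresses Brotli as the data arrives. Python's standard library has no Brotli decoder, so this sketch swaps in gzip via `zlib.decompressobj` to show the same idea: inflate each incoming chunk incrementally and cap the decompressed size, so a small compressed payload cannot expand past the memory budget. `inflate_stream`, `BodyTooLarge`, and `TruncatedBody` are hypothetical names.

```python
import zlib
from typing import Iterable


class BodyTooLarge(Exception):
    """Decompressed body exceeded the configured byte limit."""


class TruncatedBody(Exception):
    """Compressed stream ended before its end-of-stream marker."""


def inflate_stream(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Incrementally decompress a gzip body, bounding the decompressed size."""
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    out = bytearray()
    for chunk in chunks:
        data = chunk
        while data:
            # Ask for at most one byte more than the remaining budget, so an
            # overrun is detected without inflating the rest of the input.
            piece = decomp.decompress(data, max_bytes - len(out) + 1)
            out += piece
            if len(out) > max_bytes:
                raise BodyTooLarge(f"decompressed body exceeds {max_bytes} bytes")
            # Input zlib has not consumed yet because of the output cap.
            data = decomp.unconsumed_tail
    out += decomp.flush()
    if len(out) > max_bytes:
        raise BodyTooLarge(f"decompressed body exceeds {max_bytes} bytes")
    # eof is only set once the complete stream, including the trailer, was read.
    if not decomp.eof:
        raise TruncatedBody("stream ended before the gzip trailer arrived")
    return bytes(out)
```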
// 05 — failure modes

Where response bodies break pipelines.

Ranked by frequency across DataFlirt's extraction layer. A 200 OK status code guarantees nothing about the integrity or usefulness of the response body.

Pipelines monitored: 300+ active
Body failures: per 1M requests
Updated: 2026-05-19
01 · Anti-bot poisoning · silent failure · 200 OK, but body contains a CAPTCHA or fake data
02 · Truncated payloads · parse error · Connection drops mid-transfer, leaving unclosed JSON/HTML
03 · Encoding mismatches · data corruption · Headers declare UTF-8, body is actually Windows-1252
04 · Memory exhaustion · node crash · Unbounded read on an unexpectedly massive file
05 · Malformed syntax · parse error · Target API returns invalid JSON (e.g., trailing commas)
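Several of these failure modes can be told apart from the body bytes alone. The sketch below is a hypothetical triage helper for JSON bodies: it separates empty bodies, encoding mismatches, truncation, and malformed syntax. Anti-bot poisoning and memory exhaustion need the content and size checks described elsewhere on this page. The truncation test is a heuristic of our own (an error at the very end of the input, or an unterminated string, usually means the stream was cut off), not a guarantee.

```python
import json


def classify_body(body: bytes) -> str:
    """Map a JSON response body to a failure mode, or 'ok'."""
    # 200 OK with nothing (or only whitespace) in the body.
    if not body.strip():
        return "empty"
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        # e.g. Windows-1252 bytes served under a UTF-8 declaration.
        return "encoding_mismatch"
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # Heuristic: parsing failed at the end of the input, or inside a
        # string that never closed, so the payload was probably cut short.
        if exc.pos >= len(text.rstrip()) or exc.msg.startswith("Unterminated string"):
            return "truncated"
        return "malformed"
    return "ok"
```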
// 06 — our architecture

Stream the bytes, never buffer the ocean.

Reading a whole response body into memory before inspecting it is a mistake that scales poorly. When a target server hangs or sends a 2 GB file instead of a 20 KB JSON response, a buffering scraper either stalls or crashes with an out-of-memory error. DataFlirt streams all response bodies through a size-bounded, timeout-aware buffer. If the content type is wrong or the size exceeds the pipeline contract, we terminate the socket early, so we stop paying for bytes we never intended to parse.
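The "terminate early" decision can often be made from the headers alone, before any body bytes are read. This is a sketch under that assumption; `gate_response` and `RejectBody` are hypothetical names, and the streamed size cap still applies afterwards because Content-Length may be missing (chunked bodies) and counts encoded rather than decompressed bytes.

```python
class RejectBody(Exception):
    """Headers show the body is not worth downloading."""


def gate_response(headers: dict[str, str], expected_type: str, max_bytes: int) -> None:
    """Raise RejectBody if the headers alone rule the body out."""
    # Header names are case-insensitive.
    h = {name.lower(): value for name, value in headers.items()}
    # Compare only the media type, ignoring parameters such as charset.
    media_type = h.get("content-type", "").split(";", 1)[0].strip().lower()
    if media_type != expected_type:
        raise RejectBody(f"expected {expected_type}, got {media_type or 'nothing'}")
    # Content-Length is an early hint only: it is optional and counts encoded bytes.
    declared = h.get("content-length", "").strip()
    if declared.isdigit() and int(declared) > max_bytes:
        raise RejectBody(f"declared length {declared} exceeds {max_bytes} bytes")
```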

body-stream.worker

Live state of a streaming response reader on a DataFlirt extraction node.

target.url /api/v2/catalog
content.type application/json (expected)
encoding brotli
stream.state reading
bytes.received 45,102 / 150,000 max (safe)
buffer.status healthy


// 07 — FAQ

Common questions.

Common questions about handling HTTP response bodies, compression, and avoiding memory traps at scale.

What is the difference between HTTP headers and the response body?
Headers are the metadata: they tell the client what the data is and how it is compressed, and they can set cookies. The response body is the actual payload: the HTML document, the JSON object, or the image file. In an HTTP/1.1 message, the two are separated by an empty line, i.e. a double CRLF (carriage return, line feed).
Why do I get a 200 OK status but an empty response body?
Some anti-bot systems use this as a tarpit technique. The server accepts your request so your retry logic never fires, but intentionally drops the body to starve your extraction layer. It can also happen when your request headers (such as Accept or Content-Type) are malformed and the target API fails silently.
How does compression affect scraping speed?
Requesting compressed bodies (via the Accept-Encoding header) sharply reduces network transfer time and egress costs, but costs CPU cycles to decompress. At scale, the network savings usually outweigh the CPU cost. Brotli (br) is the modern choice and should be preferred over Gzip when the target supports it, provided your HTTP client can actually decode it; advertising br without a Brotli decoder leaves you holding undecodable bytes.
What happens if a response body is too large for my scraper?
A standard blocking read (such as Python's requests.get(url).text) buffers the entire payload in RAM. If the file is 5 GB, your worker will likely crash. Production scrapers must stream the response and enforce a maximum byte limit, terminating the connection as soon as the limit is breached.
Can I scrape just the headers without downloading the body?
Yes, by sending an HTTP HEAD request instead of a GET. The server should return the same headers it would send for a GET but omit the body entirely; some servers omit or misreport fields such as Content-Length on HEAD, so treat them as hints. This is an efficient way to check cache freshness (via ETag or Last-Modified) or verify file sizes before downloading.
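A minimal standard-library sketch of that HEAD-then-decide pattern. `fetch_head` and `should_download` are hypothetical helpers; the ETag comparison is a plain string match (it ignores the weak/strong distinction), and Content-Length is trusted only as a hint.

```python
import urllib.request
from typing import Optional


def fetch_head(url: str, timeout: float = 10.0) -> dict[str, str]:
    """Send a HEAD request and return the response headers with lower-cased names."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return {name.lower(): value for name, value in resp.headers.items()}


def should_download(headers: dict[str, str], cached_etag: Optional[str], max_bytes: int) -> bool:
    """Decide from HEAD metadata whether a full GET is worthwhile."""
    if cached_etag is not None and headers.get("etag") == cached_etag:
        return False  # unchanged since the copy we already hold
    length = headers.get("content-length", "")
    if length.isdigit() and int(length) > max_bytes:
        return False  # larger than the pipeline is willing to accept
    return True
```

Usage would be `should_download(fetch_head(url), cached_etag, max_bytes)`; the decision logic is kept separate from the network call so it can be tested without a server.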
How does DataFlirt handle encoding mismatches in the body?
We don't blindly trust the Content-Type header. If a payload fails to decode as UTF-8, our extraction layer falls back to chardet-style heuristic analysis on the raw byte stream to determine the actual encoding (e.g., Windows-1252 or ISO-8859-1), normalising it to UTF-8 before it hits the parsers.
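A simplified stand-in for that fallback, using only the standard library: try the declared charset, then UTF-8, then Windows-1252, and finally Latin-1, which maps every byte and therefore cannot fail. This fixed chain is our assumption for illustration, not DataFlirt's heuristic; a statistical detector such as the third-party `chardet` package (`chardet.detect()`) weighs byte frequencies instead of accepting the first codec that happens to succeed. `decode_with_fallback` is a hypothetical name.

```python
def decode_with_fallback(body: bytes, declared: str = "utf-8") -> tuple[str, str]:
    """Decode body bytes, returning (text, encoding actually used)."""
    for encoding in (declared, "utf-8", "cp1252"):
        try:
            return body.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # LookupError covers a declared charset Python does not recognise.
            continue
    # latin-1 maps every byte to a code point, so this last resort cannot fail.
    return body.decode("latin-1"), "latin-1"
```

Normalising to UTF-8 is then just `text.encode("utf-8")`. Note that cp1252 accepts most byte strings, so this chain can mislabel other single-byte encodings; that is the gap a statistical detector fills.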

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off catalogue dump or a continuous feed across millions of records — we scope, build, and operate the pipeline.

hello@dataflirt.com  ·  Bengaluru  ·  IST  ·  typical reply < 4h