
What is Chunked Transfer?

Chunked transfer encoding is an HTTP/1.1 mechanism that allows a server to send a response in a series of discrete blocks, or "chunks," without knowing the total payload size upfront. For data pipelines, it's a double-edged sword: it enables efficient streaming of massive JSONL datasets without memory bloat, but it's also a common vector for anti-bot tarpits designed to bleed your scraper's connection pool dry.

HTTP/1.1 · Streaming · Tarpits · Memory Management · NDJSON
// 02 — definitions

Streaming without
a ceiling.

How HTTP handles payloads that are too large, too slow, or too dynamic to measure before sending.


TL;DR

Instead of sending a Content-Length header, the server sends Transfer-Encoding: chunked and streams the body in blocks, each prefixed with its size in hexadecimal. It's essential for consuming large APIs or NDJSON feeds, but it requires your HTTP client to parse the stream incrementally rather than wait for a single monolithic response.

01 Definition & structure

Chunked transfer encoding is a data transfer mechanism in HTTP/1.1. When a server uses it, it omits the Content-Length header and instead sends the Transfer-Encoding: chunked header.

The payload is then transmitted as a series of chunks. Each chunk begins with its size in hexadecimal, followed by a CRLF (carriage return / line feed), the actual data, and another CRLF. The transmission is terminated by a final chunk of length zero. This allows the server to maintain an open connection and stream dynamically generated content without buffering it first.
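The framing described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production parser: encode_chunked and decode_chunked are hypothetical helper names, trailers are ignored, and chunk extensions are dropped rather than interpreted.

```python
def encode_chunked(blocks):
    """Frame an iterable of byte blocks as an HTTP/1.1 chunked body."""
    out = bytearray()
    for block in blocks:
        if not block:
            continue  # a zero-length chunk would terminate the body early
        size_line = format(len(block), "x").encode("ascii")  # size in hex
        out += size_line + b"\r\n" + block + b"\r\n"
    out += b"0\r\n\r\n"  # terminating chunk, no trailers
    return bytes(out)


def decode_chunked(body):
    """Split a complete chunked body back into its data blocks."""
    blocks = []
    pos = 0
    while True:
        # bytes.index raises ValueError if the size line is missing,
        # which is what a truncated stream looks like here.
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0]  # drop extensions
        size = int(size_field.decode("ascii").strip(), 16)
        pos = line_end + 2
        if size == 0:
            return blocks  # reached the zero-length terminating chunk
        data = body[pos:pos + size]
        if len(data) != size or body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        blocks.append(data)
        pos += size + 2
```

A stream that ends before the zero-length chunk raises ValueError in this sketch, which is how a premature connection drop would surface to the caller.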

02 How it works in practice

In data engineering, chunked transfer is most commonly encountered when consuming large APIs, NDJSON (Newline Delimited JSON) feeds, or Server-Sent Events (SSE). Because the data arrives in pieces, your HTTP client must be configured to yield lines or chunks iteratively.

If you naively call response.text() on a 5GB chunked response, your scraper will attempt to buffer the entire 5GB into RAM, resulting in an Out-of-Memory (OOM) crash. Proper handling requires streaming the response directly to disk or a parsing pipeline.
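As a rough sketch of the iterative approach, the generator below turns a stream of byte blocks into parsed NDJSON records while holding only the current partial line in memory. The name iter_ndjson is hypothetical, and the blocks argument stands in for whatever your HTTP client yields when streaming is enabled. Block boundaries are assumed to fall anywhere, including in the middle of a record.

```python
import json


def iter_ndjson(blocks):
    """Yield one parsed object per NDJSON line from an iterable of byte blocks."""
    pending = b""  # bytes of a line that has not been terminated yet
    for block in blocks:
        pending += block
        # Everything before the last newline is complete; the tail is not.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    # A final record without a trailing newline is parsed as-is. If the
    # stream was cut off mid-record, json.loads raises JSONDecodeError here.
    if pending.strip():
        yield json.loads(pending)
```

Because records are yielded as soon as their newline arrives, memory stays proportional to the longest single line rather than to the total payload.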

03 The tarpit defense

Anti-bot vendors weaponise chunked encoding to create tarpits. When they detect a scraper, instead of returning a 403 Forbidden, they return a 200 OK with chunked encoding. They then send a few bytes of garbage data every 15 seconds.

Because data is technically still flowing, connect timeouts and loose per-read timeouts are never triggered. If your scraper doesn't enforce a strict read timeout (the maximum time allowed between bytes) that is shorter than the drip interval, ideally backed by a hard deadline on the whole response, the connection stays open indefinitely, eventually exhausting your worker pool and halting the entire pipeline.
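One way to bound a drip-fed response is to combine a per-read gap budget with a deadline for the whole stream. The sketch below works on a raw socket using only the standard library. StreamStalled and read_with_deadlines are hypothetical names, and any thresholds you pass in are illustrative and would need tuning per target.

```python
import socket
import time


class StreamStalled(Exception):
    """Raised when a stream exceeds its gap budget or its total deadline."""


def read_with_deadlines(sock, max_gap_s, max_total_s, bufsize=65536):
    """Yield byte blocks from sock until EOF, enforcing two time budgets."""
    started = time.monotonic()
    while True:
        remaining = max_total_s - (time.monotonic() - started)
        if remaining <= 0:
            # A steady drip of bytes never trips the gap timer, so the
            # total deadline is what ends a tarpit that keeps "flowing".
            raise StreamStalled("total deadline exceeded")
        # Wait no longer than the gap budget or what is left of the deadline.
        sock.settimeout(min(max_gap_s, remaining))
        try:
            block = sock.recv(bufsize)
        except socket.timeout:
            raise StreamStalled("no data within the allowed window") from None
        if not block:
            return  # peer closed the connection
        yield block
```

The gap budget catches a peer that goes silent, while the total deadline catches a peer that sends just often enough to stay under the gap budget. A caller would typically close the socket on StreamStalled and retry elsewhere.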

04 How DataFlirt handles it

We use chunked transfer extensively on the delivery side. When clients request massive historical datasets, we stream the records directly from our data warehouse to the client via NDJSON. This ensures instant TTFB and zero memory bloat on our egress nodes.

On the ingestion side, our fetch layer is hardened against tarpits. We enforce strict inter-chunk read timeouts. If a target server stalls mid-stream, we sever the TCP connection, log the anomaly, and automatically retry the request using a different proxy route and fingerprint.

05 HTTP/2 and HTTP/3

Chunked transfer encoding is specific to HTTP/1.1. In HTTP/2 and HTTP/3, the concept of "chunking" is built directly into the protocol's framing layer via DATA frames. You won't see a Transfer-Encoding: chunked header in an HTTP/2 response, but the operational semantics—streaming data without a known total length—are identical.

// 03 — the math

Calculating stream
efficiency.

Chunked transfer shifts the bottleneck from memory to network I/O. DataFlirt monitors chunk arrival variance to detect tarpits before they exhaust worker threads.

Chunk Overhead: O = hex_length_bytes + 4
Every chunk adds a hex size string and two CRLF sequences. Source: RFC 7230.
Tarpit Detection: T = Δt_chunk > timeout_threshold
If the time between chunks exceeds the threshold, drop the connection. Source: DataFlirt network heuristics.
Memory Footprint: M = max(chunk_size) × concurrency
Streaming keeps memory bounded by chunk size, not total payload size. Source: streaming architecture principles.
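To make the overhead and memory formulas concrete, here is a small worked sketch. The function names are hypothetical, and the overhead figure assumes no chunk extensions are present.

```python
def chunk_overhead(chunk_size):
    """Framing bytes for one chunk: hex size digits plus two CRLF pairs."""
    return len(format(chunk_size, "x")) + 4


def overhead_ratio(chunk_size):
    """Fraction of bytes on the wire that are framing rather than payload."""
    framing = chunk_overhead(chunk_size)
    return framing / (chunk_size + framing)


def memory_footprint(max_chunk_size, concurrency):
    """Upper bound on buffered bytes when each stream holds one chunk."""
    return max_chunk_size * concurrency
```

Under these assumptions a 1-byte chunk carries 5 bytes of framing, so most of the wire traffic is overhead, while a 64 KiB chunk carries only 9. With 200 concurrent streams and 64 KiB chunks, the buffered data tops out at 13,107,200 bytes (12.5 MiB) regardless of how large each payload is.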
// 04 — wire trace

Reading a chunked
NDJSON stream.

A raw HTTP/1.1 trace showing the hex-encoded chunk sizes and the zero-length terminating chunk. Notice the absence of a Content-Length header.

HTTP/1.1 · NDJSON · Streaming
edge.dataflirt.io — live
CAPTURED
// Inbound HTTP headers
HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Transfer-Encoding: chunked
Connection: keep-alive

// Chunk 1 (hex size 27 = 39 bytes: 38 bytes of JSON plus the trailing LF)
27
{"id":"A1","price":14.99,"stock":true}

// Chunk 2 (hex size 28 = 40 bytes: 39 bytes of JSON plus the trailing LF)
28
{"id":"A2","price":89.00,"stock":false}

// Terminating chunk
0

// Connection state
stream.status: COMPLETE
// 05 — failure modes

Where chunked
streams break.

Streaming data introduces temporal failure modes that monolithic requests don't face. Ranked by frequency across DataFlirt's ingestion tier.

Pipelines monitored: 300+ active
Protocol: HTTP/1.1
Updated: 2026-05-19
01 Malicious tarpitting
Anti-bot holds connection open with 1-byte chunks.

02 Premature connection drop
TCP reset before the 0-byte terminating chunk.

03 Memory leaks
Client buffers entire stream instead of yielding.

04 Malformed hex framing
Bad proxy or WAF interference corrupts the stream.

05 JSON parse errors
Chunk boundary splits a JSON object mid-string.
// 06 — our stack

Stream everything,

buffer nothing.

DataFlirt's delivery architecture is built entirely on chunked streaming. When a client requests a 50GB dataset, we don't build it in memory or write it to a temp disk. We stream records directly from the database to the client via NDJSON over chunked HTTP. If the client's connection drops, the query halts. Zero memory bloat, instant time-to-first-byte.
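The general shape of such a delivery path can be sketched as a generator that serialises rows lazily. This is an illustration under stated assumptions, not DataFlirt's implementation: ndjson_stream and batch_size are hypothetical, and rows stands in for any lazy iterable such as a database cursor.

```python
import json


def ndjson_stream(rows, batch_size=500):
    """Yield NDJSON-encoded byte blocks, holding at most batch_size rows."""
    batch = []
    for row in rows:
        batch.append(json.dumps(row, separators=(",", ":")))
        if len(batch) >= batch_size:
            yield ("\n".join(batch) + "\n").encode("utf-8")
            batch.clear()
    if batch:  # flush the final partial batch
        yield ("\n".join(batch) + "\n").encode("utf-8")
```

Handing a generator like this to a web server as the response body, without a Content-Length header, typically results in chunked framing on HTTP/1.1. Because rows are pulled only as the consumer asks for more blocks, a slow or disconnected client stops the iteration instead of filling a buffer.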

streaming.delivery.log

Live metrics from a 10M record export.

export.id: exp_992a_b2
transfer.mode: chunked (active)
records.streamed: 8,402,110
memory.usage: 42 MB (stable)
chunk.variance: 12 ms (low)
client.backpressure: detected

// 07 — FAQ

Common
questions.

About streaming protocols, anti-bot tarpits, memory management, and how DataFlirt handles massive payloads.

Why use chunked transfer instead of Content-Length?
When a server generates data dynamically — like querying a massive database or streaming real-time events — it doesn't know the total size upfront. Waiting to calculate the size would require buffering the entire response in memory. Chunked transfer allows the server to start sending data immediately, drastically reducing Time to First Byte (TTFB).
How do anti-bot systems use chunked encoding against scrapers?
They use it to create "tarpits." Instead of blocking your IP, the WAF accepts the connection and sends a valid 200 OK with chunked encoding. It then sends one byte every 10 seconds. If your scraper has no read timeout between bytes, or only a generous total request timeout, it will sit there for hours, exhausting your connection pool and thread limits.
Does HTTP/2 support chunked transfer?
No. Chunked encoding is strictly an HTTP/1.1 mechanism. HTTP/2 and HTTP/3 use their own native framing layers (DATA frames) to stream content. However, the conceptual model — streaming data without a known total length — remains the same, and most HTTP clients abstract the difference away.
How do I parse a chunked JSON response?
Do not use standard JSON parsers that expect a complete string (like json.loads(response.text)). You need a streaming JSON parser (like ijson in Python or JSONStream in Node.js) that can yield objects as they arrive over the wire, keeping your memory footprint flat.
How does DataFlirt handle target tarpits?
We enforce strict read timeouts per chunk, not just total request timeouts. If a target server takes more than 5 seconds to deliver the next chunk, our network layer drops the connection, flags the proxy IP, and retries the request through a different route. We never let a target hold our worker threads hostage.
Can proxies interfere with chunked encoding?
Yes. Poorly configured transparent proxies or cheap residential gateways often attempt to buffer the entire response before forwarding it to the client. This breaks the streaming behavior, causes massive latency spikes, and often results in 502 Bad Gateway errors when the proxy runs out of memory.