
What is Read Timeout?

A read timeout occurs when a scraping client successfully establishes a TCP connection and sends an HTTP request, but the target server fails to return response bytes within the configured time limit. In data pipelines, it is rarely a simple network glitch. It is usually a symptom of an overloaded target database, a slow residential proxy exit node, or an anti-bot tarpit intentionally holding the connection open to exhaust your worker threads.

Network Layer · Concurrency · Tarpitting · Proxy Latency · Error Handling
// 02 — definitions

Waiting for bytes.

The difference between failing to connect and being left on read, and why the latter is far more dangerous to your pipeline's throughput.


TL;DR

A read timeout means the TCP handshake succeeded and the request was sent, but the server didn't respond in time. Unlike connection timeouts, which fail fast, read timeouts can tie up worker threads and proxy connections for the full configured limit (often 30–60 seconds), causing cascading concurrency exhaustion across your scraping fleet.

01 Definition & structure
A read timeout happens during the data transfer phase of an HTTP request. The client has successfully resolved DNS, completed the TCP handshake, negotiated TLS, and sent the request. The connection is open, but the client is still waiting for the server to send something: either the first byte of the response (status line and headers) or the next chunk of the body. In many HTTP clients the limit applies to each individual wait rather than to the response as a whole. If the server takes longer than the configured limit to send the next chunk of data, the client aborts the connection and throws a read timeout error.
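The sequence above can be reproduced locally with nothing but the standard library. The sketch below is illustrative only: `start_silent_server` and `probe` are hypothetical helper names, and the server is a stand-in that accepts the connection, reads the request, and then says nothing. The connect phase succeeds; the failure surfaces only on the first `recv()`.

```python
import socket
import threading

def start_silent_server():
    """Listen on a free local port, accept one client, and never reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)

    def accept_and_stall():
        conn, _ = server.accept()
        conn.recv(1024)            # read the request, then send nothing back
        threading.Event().wait(2)  # hold the connection open for a while
        conn.close()

    threading.Thread(target=accept_and_stall, daemon=True).start()
    return server, server.getsockname()[1]

def probe(port, connect_timeout, read_timeout):
    """Report which phase failed: 'connect failed', 'read timeout', or 'ok'."""
    try:
        sock = socket.create_connection(("127.0.0.1", port), timeout=connect_timeout)
    except OSError:
        return "connect failed"
    with sock:
        sock.settimeout(read_timeout)  # from here on, governs each recv() call
        sock.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        try:
            sock.recv(4096)
        except socket.timeout:
            return "read timeout"
    return "ok"

server, port = start_silent_server()
result = probe(port, connect_timeout=1.0, read_timeout=0.3)
print(result)  # read timeout
server.close()
```

Keeping the connect and read limits as separate parameters mirrors how the two failures behave: one is about reaching the server, the other about how long you are willing to wait once you have.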
02 Connection vs. Read Timeout
A connection timeout is a failure to reach the server at all. It fails fast: a refused connection comes back within milliseconds, and an unreachable host costs only a short connect timeout (typically a few seconds), after which the worker thread is free to try another proxy. A read timeout is a slow bleed. The connection is established, so the worker thread sits idle, waiting for bytes that may never come. Without strict timeout limits, read timeouts will silently consume your entire concurrency pool.
03 Anti-bot tarpitting
Modern anti-bot systems use slow reads as an offensive weapon. Instead of blocking your IP with a 403 Forbidden, which tells you exactly what happened, they route your request to a tarpit. The tarpit accepts the connection and trickles the response back at a few bytes per second. If your HTTP client has no timeout by default, or only a per-read timeout that every trickled byte resets, the request can hang indefinitely, effectively neutralizing your scraping infrastructure without firing a single block alert.
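A per-read timeout alone does not stop a trickle, because each byte that arrives restarts the wait. One common countermeasure is a deadline on the whole response. The sketch below assumes a hypothetical `read_with_deadline` helper that wraps any chunk source; the clock is injectable so the tarpit can be simulated without real waiting.

```python
import time

def read_with_deadline(recv, total_deadline_s, clock=time.monotonic):
    """Read until EOF, aborting once the whole response exceeds total_deadline_s.

    `recv` is any callable returning the next chunk (b"" at EOF), for example
    `lambda: sock.recv(4096)` on a socket that already has a per-read timeout.
    """
    start = clock()
    body = bytearray()
    while True:
        chunk = recv()
        if not chunk:
            return bytes(body)
        body.extend(chunk)
        if clock() - start > total_deadline_s:
            raise TimeoutError(
                f"total read deadline exceeded after {len(body)} bytes"
            )

# Simulated tarpit: one byte every 2 "seconds". A 5 s per-read timeout would
# never fire here, but a 10 s total deadline does.
fake_now = [0.0]

def trickle():
    fake_now[0] += 2.0
    return b"x"

try:
    read_with_deadline(trickle, total_deadline_s=10.0, clock=lambda: fake_now[0])
    outcome = "completed"
except TimeoutError as exc:
    outcome = str(exc)

print(outcome)  # total read deadline exceeded after 6 bytes
```

The deadline is checked after each chunk, so in a real client it should be paired with a per-read timeout; otherwise a server that goes fully silent would still block inside `recv()`.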
04 How DataFlirt handles it
We treat read timeouts as a concurrency threat, not just a network error. Our fetch layer enforces dynamic timeout budgets based on the target's historical Time to First Byte (TTFB). We also enforce minimum bandwidth thresholds. If a connection is open but receiving data at less than 10 KB/s, we assume it is a tarpit or a dying proxy, sever the connection, and retry immediately. This keeps our worker pool highly fluid.
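A minimum-bandwidth check of this kind can be sketched with a sliding window over recent byte counts. This is not DataFlirt's implementation: `StallDetector`, the 2-second window, and the reading of 10 KB/s as 10,000 bytes per second are all assumptions made for illustration.

```python
from collections import deque

class StallDetector:
    """Track bytes received over a sliding window and flag slow transfers."""

    def __init__(self, min_bytes_per_s, window_s):
        self.min_bytes_per_s = min_bytes_per_s
        self.window_s = window_s
        self._samples = deque()   # (timestamp, nbytes) pairs
        self._started_at = None   # time of the first recorded chunk

    def record(self, now, nbytes):
        if self._started_at is None:
            self._started_at = now
        self._samples.append((now, nbytes))

    def is_stalled(self, now):
        # Give the transfer one full window before judging it.
        if self._started_at is None or now - self._started_at < self.window_s:
            return False
        # Drop samples that have aged out of the window.
        while self._samples and self._samples[0][0] <= now - self.window_s:
            self._samples.popleft()
        rate = sum(n for _, n in self._samples) / self.window_s
        return rate < self.min_bytes_per_s

detector = StallDetector(min_bytes_per_s=10_000, window_s=2.0)
detector.record(now=0.0, nbytes=60_000)   # healthy initial burst
healthy = detector.is_stalled(now=1.0)    # still inside the grace window
detector.record(now=2.5, nbytes=100)      # then the stream slows to a trickle
detector.record(now=4.0, nbytes=100)
stalled = detector.is_stalled(now=4.5)
print(healthy, stalled)  # False True
```

A windowed rate reacts to a stream that starts fast and then degrades, which a simple average over the whole transfer would hide.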
05 The proxy layer tax
When using residential proxies, read timeouts are often false positives. The target server may have generated and sent the response instantly, but the residential exit node (a consumer's laptop or phone) lost its cellular connection mid-transfer. The bytes are dropped at the proxy layer, leaving your client waiting. This is why aggressive retries on fresh proxy IPs are the standard mitigation for read timeouts in residential pools.
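Retrying on a fresh exit node can be expressed as a small wrapper around whatever fetch function the pipeline already has. In this sketch `fetch_with_rotation` and `fake_fetch` are hypothetical names, the proxy addresses are placeholders, and the fetch callable is assumed to raise the built-in `TimeoutError` on a read timeout (real HTTP clients often raise their own exception types, so the `except` clause would need adapting).

```python
def fetch_with_rotation(fetch, url, proxies, max_attempts=3):
    """Try `url` through successive proxies, moving on after each read timeout.

    Returns (body, proxy_used). Raises TimeoutError if every attempt times out.
    """
    candidates = list(proxies)[:max_attempts]
    last_error = None
    for proxy in candidates:
        try:
            return fetch(url, proxy), proxy
        except TimeoutError as exc:  # assume the exit node dropped mid-transfer
            last_error = exc
    raise TimeoutError(f"all {len(candidates)} attempts timed out") from last_error

# Stand-in fetcher: the first exit node stalls, the second one delivers.
def fake_fetch(url, proxy):
    if proxy == "proxy-a.example:8000":
        raise TimeoutError("read timed out")
    return b"<html>ok</html>"

body, used = fetch_with_rotation(
    fake_fetch,
    "https://example.com/",
    ["proxy-a.example:8000", "proxy-b.example:8000"],
)
print(used)  # proxy-b.example:8000
```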
// 03 — the timeout model

Calculating the read budget.

Setting a static 30s timeout is a rookie mistake. DataFlirt calculates dynamic read timeout budgets per target based on historical TTFB (Time to First Byte) and expected payload size.

Dynamic read timeout: $T_{\text{read}} = \mu(\text{TTFB}) + 3\,\sigma(\text{TTFB}) + \dfrac{\text{Payload}}{\text{Bandwidth}}$
Baseline response time plus variance, plus time to stream the bytes. (DataFlirt network heuristics)
Worker thread exhaustion: $W_{\text{blocked}} = \text{Req\_Rate} \times T_{\text{read}}$
If $T_{\text{read}}$ is 60 s and you send 10 req/s, you need 600 idle threads just to wait. (Concurrency planning)
DataFlirt fast-fail threshold: $T_{\text{abort}} = p_{99}(\text{TTFB}) + 2.5\,\text{s}$
Kill the connection aggressively if it exceeds the 99th percentile response time. (Internal SLO)
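As a worked example, the three formulas translate directly into a few lines of Python. The function names and the sample TTFB values are hypothetical, and two choices are assumptions rather than anything stated in the text: σ is taken as the population standard deviation, and p99 is computed with the inclusive quantile method so it stays within the observed range.

```python
import statistics

def read_budget_s(ttfb_samples_s, payload_bytes, bandwidth_bytes_per_s):
    """T_read = mean(TTFB) + 3 * stdev(TTFB) + payload / bandwidth."""
    mu = statistics.mean(ttfb_samples_s)
    sigma = statistics.pstdev(ttfb_samples_s)  # assumption: population std dev
    return mu + 3 * sigma + payload_bytes / bandwidth_bytes_per_s

def blocked_workers(req_rate_per_s, t_read_s):
    """W_blocked = request rate * T_read."""
    return req_rate_per_s * t_read_s

def fast_fail_threshold_s(ttfb_samples_s, margin_s=2.5):
    """T_abort = p99(TTFB) + 2.5 s."""
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(ttfb_samples_s, n=100, method="inclusive")[98]
    return p99 + margin_s

# Hypothetical TTFB measurements for one target, in seconds.
samples = [0.8, 0.9, 1.0, 1.1, 1.2]

t_read = read_budget_s(samples, payload_bytes=500_000, bandwidth_bytes_per_s=250_000)
t_abort = fast_fail_threshold_s(samples)
w_blocked = blocked_workers(req_rate_per_s=10, t_read_s=60)

print(f"T_read  = {t_read:.2f} s")
print(f"T_abort = {t_abort:.2f} s")
print(f"W_blocked at 10 req/s with a 60 s timeout = {w_blocked}")  # 600
```

With only a handful of samples the p99 estimate is crude; in practice it would be computed over a much larger trailing window.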
// 04 — the wire trace

A 30-second bleed on a single thread.

Trace of a worker thread hitting a tarpit. The connection succeeds instantly, but the first byte takes 14.5 seconds to arrive and the server then trickles the body, slowing to about 14 bytes per second before the client aborts.

TCP/TLS OK · Tarpit detected · Thread released
edge.dataflirt.io — live
CAPTURED
// connection phase
tcp.handshake: 42ms OK
tls.negotiation: 118ms OK
http.request_sent: 121ms

// waiting for response
ttfb: 14,500ms // unusually slow
http.status: 200 OK

// read phase (tarpit)
bytes_received: 1024 ... [t=15.0s]
bytes_received: 2048 ... [t=22.0s]
bytes_received: 2156 ... [t=29.5s]
read_rate: 14 bytes/sec

// intervention
worker.interrupt: ReadTimeoutError
action: connection_dropped
status: thread_released
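The `read_rate` figure can be checked against the three `bytes_received` samples in the trace. The short calculation below suggests that 14 bytes/sec is the rate over the final interval, not the average over the whole read phase; `interval_rates` is a hypothetical helper.

```python
# (seconds since request start, cumulative bytes) copied from the trace
samples = [(15.0, 1024), (22.0, 2048), (29.5, 2156)]

def interval_rates(points):
    """Bytes per second between each pair of consecutive samples."""
    return [
        (b1 - b0) / (t1 - t0)
        for (t0, b0), (t1, b1) in zip(points, points[1:])
    ]

rates = interval_rates(samples)
print([round(r, 1) for r in rates])  # [146.3, 14.4]
```

The drop from roughly 146 to 14 bytes per second between intervals is the kind of signal a minimum-bandwidth threshold is meant to catch before the absolute timeout expires.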
// 05 — root causes

Why servers stop sending data.

Ranked by frequency across DataFlirt's monitoring of enterprise pipelines. Anti-bot tarpitting is the most destructive, but target infrastructure limits are the most common.

PIPELINES MONITORED · 400+ active
TIMEOUT EVENTS · 30d trailing
UPDATED · 2026-05-19
01 Target database overload
infrastructure · Heavy search queries timing out the backend
02 Anti-bot tarpitting
security · Intentional slow-loris response to bots
03 Residential proxy latency
network · Slow exit nodes dropping packets mid-stream
04 Massive payload generation
data volume · Unpaginated JSON/CSV exports
05 WAF deep inspection
security · Security layer holding response for scanning
// 06 — DataFlirt's approach

Fail fast, recover instantly, protect the thread pool.

A read timeout is a concurrency killer. If you have 100 workers and a target tarpits every request for 60 seconds, all 100 workers are soon stuck waiting, and your pipeline throughput drops to zero until those reads time out. DataFlirt uses predictive TTFB modeling and minimum-bandwidth thresholds to detect stalled reads early. We sever the connection, rotate the proxy, and retry before the standard HTTP client timeout would even trigger.

timeout.policy.config

Dynamic timeout profile for a high-latency e-commerce target.

target.domain api.retailer.com
connect.timeout 3.0s
ttfb.p99_baseline 4.2s
read.absolute_max 15.0s
read.min_bandwidth 50 KB/s
action.on_stall abort_and_rotate
pool.health 99.8% threads active
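One way to read a profile like this is as a small policy object plus a decision function evaluated while a response is in flight. The sketch below is an interpretation, not DataFlirt's engine: `TimeoutPolicy` and `decide` are hypothetical names, the 2.5 s margin on top of the p99 baseline is borrowed from the fast-fail formula, 50 KB/s is read as 50,000 bytes per second, and the 1-second grace period before judging bandwidth is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimeoutPolicy:
    connect_timeout_s: float = 3.0
    ttfb_p99_baseline_s: float = 4.2
    read_absolute_max_s: float = 15.0
    read_min_bandwidth_bps: float = 50_000  # 50 KB/s, read as bytes per second
    action_on_stall: str = "abort_and_rotate"

def decide(policy, elapsed_s, first_byte_at_s, bytes_received,
           ttfb_margin_s=2.5, bandwidth_grace_s=1.0):
    """Return policy.action_on_stall if the read should be cut, else "continue".

    elapsed_s: seconds since the request was sent.
    first_byte_at_s: when the first byte arrived, or None if still waiting.
    """
    if elapsed_s > policy.read_absolute_max_s:
        return policy.action_on_stall
    if first_byte_at_s is None:
        # Still waiting for the first byte: compare against baseline + margin.
        if elapsed_s > policy.ttfb_p99_baseline_s + ttfb_margin_s:
            return policy.action_on_stall
        return "continue"
    streaming_s = elapsed_s - first_byte_at_s
    if streaming_s >= bandwidth_grace_s:
        if bytes_received / streaming_s < policy.read_min_bandwidth_bps:
            return policy.action_on_stall
    return "continue"

policy = TimeoutPolicy()
print(decide(policy, elapsed_s=2.0, first_byte_at_s=None, bytes_received=0))       # continue
print(decide(policy, elapsed_s=7.0, first_byte_at_s=None, bytes_received=0))       # abort_and_rotate
print(decide(policy, elapsed_s=6.0, first_byte_at_s=1.0, bytes_received=20_000))   # abort_and_rotate
print(decide(policy, elapsed_s=3.0, first_byte_at_s=1.0, bytes_received=400_000))  # continue
```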


// 07 — FAQ

Common questions.

About network timeouts, proxy latency, tarpitting, and how DataFlirt protects pipeline concurrency.

What's the difference between a connection timeout and a read timeout?
A connection timeout means your client couldn't establish a TCP handshake with the server, often due to IP bans or dead proxies. A read timeout means the connection succeeded and you asked for the data, but the server took too long to send the response bytes. Connection timeouts fail fast; read timeouts bleed your concurrency slowly.
Why do residential proxies cause more read timeouts?
Residential proxies route traffic through real consumer devices on home Wi-Fi or cellular networks. If the device owner's connection drops or lags while the response is being relayed back to you, your client experiences a read timeout, even if the target server replied instantly. It is a network-layer bottleneck, not a target-layer block.
How does a tarpit use read timeouts against scrapers?
Tarpitting is an anti-bot technique where the server intentionally accepts your connection but sends the response at an agonizingly slow rate, like 1 byte per second. This ties up your scraper's worker threads. If you don't enforce strict read timeouts, a tarpit can paralyze your entire scraping fleet in minutes.
What is a good default read timeout value?
There is no universal default. For static HTML, 10 seconds is generous. For heavy, unpaginated API search queries, 30 to 60 seconds might be required. The best practice is to measure the target's p99 response time and set your timeout just above that threshold to fail fast on anomalies.
How does DataFlirt prevent read timeouts from exhausting concurrency?
We do not rely on static timeouts. Our engine monitors the Time to First Byte (TTFB) and the ongoing byte-receive rate. If a connection drops below a minimum bandwidth threshold, we preemptively kill it, flag the proxy or session, and retry on a fresh route. This keeps our worker pool fluid.
Should I retry immediately after a read timeout?
It depends on the cause. If it is proxy latency, retrying on a new proxy works. If the target server is overloaded and timing out on heavy database queries, immediate retries will only worsen the server load and likely fail again. Implement exponential backoff to give the target room to recover.
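A minimal sketch of exponential backoff, for the overloaded-target case: each retry waits longer than the last, up to a cap. The function name, the default values, and the optional jitter fraction are illustrative assumptions, not recommended settings.

```python
import random

def backoff_delays(base_s=1.0, factor=2.0, cap_s=30.0, attempts=6,
                   jitter=0.0, rng=None):
    """Yield one delay per retry: base * factor**n, capped at cap_s.

    `jitter` is the fraction of each delay that may be randomly shaved off,
    so that many workers do not all retry at the same instant.
    """
    rng = rng or random.Random()
    for n in range(attempts):
        delay = min(cap_s, base_s * factor ** n)
        yield delay * (1 - jitter * rng.random())

delays = list(backoff_delays())
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]

# With jitter, each delay falls somewhere between half and all of its nominal value.
jittered = list(backoff_delays(jitter=0.5, rng=random.Random(0)))
```

For timeouts caused by a dead proxy, the delay can reasonably be skipped in favour of an immediate retry on a new route; the backoff is there to give an overloaded target room to recover.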