
What is Request Timeout?

Request timeout is the maximum duration a client will wait for a server to respond before abandoning the connection. In scraping pipelines, timeouts are not just network failures — they are critical signals of proxy degradation, target rate-limiting, or anti-bot tarpitting. Setting static timeouts across a distributed crawl guarantees either wasted compute on dead connections or dropped data on slow but valid responses.

Network Layer · Concurrency · Proxy Health · Tarpitting · Latency
// 02 — definitions

When waiting
costs money.

The mechanics of connection lifecycles, and why abandoning a slow request is often more profitable than waiting for it to complete.


TL;DR

A request timeout dictates how long a scraper waits for a response. Modern pipelines split this into connection timeouts (DNS/TCP/TLS) and read timeouts (TTFB/download). Misconfigured timeouts cause worker starvation, where concurrency slots are tied up by silent drops or anti-bot tarpits, bringing pipeline throughput to a halt.

01 Definition & structure
A request timeout is a client-side mechanism that aborts a network request if it takes too long. In modern HTTP clients (like httpx or aiohttp), this budget is typically split into distinct phases:
  • connect: time allowed to resolve DNS and establish the TCP connection.
  • tls: time allowed to complete the TLS handshake. Many clients (httpx, for example) count this against the connect timeout rather than exposing a separate setting.
  • read: time allowed to wait for the first byte (TTFB) and for each subsequent data chunk.
Without explicit timeouts, a scraper will wait indefinitely on a dropped packet, permanently locking up that worker thread.
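As a minimal sketch of the connect/read split, the function below uses only Python's standard `socket` module. The function name and defaults are illustrative assumptions, TLS is left out to keep it short, and this is not how httpx or aiohttp implement their timeouts internally.

```python
import socket


def fetch_with_phased_timeouts(host, port, payload,
                               connect_timeout=2.5, read_timeout=4.5):
    """Send payload and read until EOF, with separate connect and read budgets."""
    # Phase 1: bound the TCP handshake. The DNS lookup performed inside
    # create_connection is a blocking call that this timeout does not cover.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    try:
        # Phase 2: from here on, each send/recv call may block for at most
        # read_timeout seconds.
        sock.settimeout(read_timeout)
        sock.sendall(payload)
        chunks = []
        while True:
            chunk = sock.recv(4096)  # raises socket.timeout if the peer stays silent
            if not chunk:            # empty bytes: the peer closed the connection
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sock.close()
```

Keeping the two budgets separate lets a caller tell a host that never answered (timeout during connect) from one that connected and then went quiet (timeout during read).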
02 How it works in practice
When a scraping worker dispatches a request, it starts a timer. If the server (or the proxy in the middle) fails to respond within the threshold, the client raises a timeout exception and forcefully closes the socket. The worker catches the error, logs the failure, and moves on to the next URL in the queue. In a distributed crawl, managing these timers is critical to maintaining high throughput.
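The worker loop described above can be sketched with `asyncio`. Here `fetch` is a hypothetical coroutine supplied by the caller, and `crawl` and `worker` are illustrative names rather than any real scheduler's API.

```python
import asyncio


async def worker(queue, fetch, timeout_s, results, failures):
    """Pull URLs until the queue is empty; never let one request block the loop."""
    while True:
        try:
            url = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        try:
            # wait_for cancels the fetch and raises once timeout_s has elapsed.
            results[url] = await asyncio.wait_for(fetch(url), timeout=timeout_s)
        except asyncio.TimeoutError:
            failures.append(url)  # record the failure, then move on to the next URL


async def crawl(urls, fetch, timeout_s=3.0, concurrency=4):
    """Run `concurrency` workers over `urls`; return (results, timed_out_urls)."""
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results, failures = {}, []
    workers = [worker(queue, fetch, timeout_s, results, failures)
               for _ in range(concurrency)]
    await asyncio.gather(*workers)
    return results, failures
```

A timed-out URL costs its worker at most `timeout_s` seconds, so the slowest target bounds the stall instead of dictating it.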
03 The anti-bot tarpit
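Security vendors know that scrapers rely on high concurrency. Instead of returning a fast 403 Forbidden, they will often route suspected bots to a tarpit. The server accepts the TCP connection, completes TLS, and then sends data at an agonizingly slow rate (e.g., 1 byte every 10 seconds). If your read timeout is set to 60 seconds, a single tarpit can hold your worker hostage for at least a full minute. It can be far longer: many clients apply the read timeout to the gap between chunks rather than to the whole response, so each trickled byte resets the timer unless a total deadline is also enforced.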
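A per-chunk read timeout is reset by every byte that arrives, so a trickle can outlast it. One way to close that gap, sketched here with the standard `socket` module, is to pair the per-read timeout with a total deadline. The function name and parameters are assumptions for illustration.

```python
import socket
import time


def read_with_deadline(sock, per_read_timeout, total_deadline):
    """Read until EOF, bounded by a per-chunk timeout and an overall deadline."""
    deadline = time.monotonic() + total_deadline
    chunks = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Bytes kept arriving, but too slowly: treat it as a tarpit.
            raise TimeoutError("total read deadline exceeded")
        # Never wait longer than what is left of the overall budget.
        sock.settimeout(min(per_read_timeout, remaining))
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            raise TimeoutError("no data within the read window") from None
        if not chunk:  # peer closed the connection: response complete
            return b"".join(chunks)
        chunks.append(chunk)
```

With `per_read_timeout=60` and `total_deadline=15`, a server sending one byte every 10 seconds is dropped after 15 seconds instead of holding the worker indefinitely.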
04 How DataFlirt handles it
We don't use static timeouts. Our scheduler dynamically profiles each target domain during the warmup phase, calculating the 99th percentile latency. We set our read timeouts just above this threshold. If a request exceeds it, we assume the proxy is dead or the IP is tarpitted. We kill the connection immediately, rotate the proxy ASN, and retry. This keeps our worker starvation rate near zero.
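To illustrate the idea of a percentile-based threshold (not DataFlirt's actual scheduler code), the sketch below derives a read timeout from warmup latency samples using the standard `statistics` module. The 20% margin and the 500 ms floor are assumed values.

```python
import statistics


def read_timeout_from_profile(latencies_ms, margin=1.2, floor_ms=500.0):
    """Return a read timeout (ms) set just above the observed 99th percentile."""
    if len(latencies_ms) < 2:
        # quantiles() needs at least two samples; fall back to the floor.
        return floor_ms
    # n=100 yields 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(latencies_ms, n=100)[98]
    return max(floor_ms, p99 * margin)
```

The floor keeps a very fast target from producing a timeout so tight that ordinary jitter triggers it.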
05 Did you know?
TCP keep-alive settings can mask timeout issues. If your client relies solely on OS-level TCP keep-alives without application-layer read timeouts, a silently dropped connection (common with cheap rotating proxies) can take more than two hours to time out on Linux, where the default tcp_keepalive_time is 7200 seconds before the first probe is even sent. Always enforce application-layer timeouts in your scraping code.
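As a hedged sketch, the helper below applies an application-layer read timeout first and then tightens keep-alive probing as a secondary safety net. The helper name and the specific intervals are assumptions. The `TCP_KEEP*` constants are platform-dependent, so they are looked up defensively.

```python
import socket


def harden_socket(sock, read_timeout=15.0, idle_s=30, interval_s=10, probes=3):
    """Set an application-layer timeout and shorten TCP keep-alive probing."""
    # Primary defense: every blocking recv/send is bounded by read_timeout.
    sock.settimeout(read_timeout)
    # Secondary defense: enable keep-alive probes on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
              ("TCP_KEEPINTVL", interval_s),  # seconds between probes
              ("TCP_KEEPCNT", probes))        # failed probes before giving up
    for name, value in tuning:
        option = getattr(socket, name, None)  # not defined on every platform
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

Keep-alive tuning only changes how quickly the kernel notices a dead peer. The `settimeout` call is what actually frees the worker.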
// 03 — timeout math

Calculating the
abandonment threshold.

Static timeouts fail at scale. DataFlirt calculates dynamic timeout thresholds per target, factoring in historical latency, proxy overhead, and the cost of worker starvation.

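Total Timeout: $T_{\text{total}} = T_{\text{conn}} + T_{\text{tls}} + T_{\text{read}}$
The absolute ceiling before the socket is forcibly closed by the client. (Source: standard HTTP client architecture)
Dynamic Read Timeout: $T_{\text{read}} = \mu_{\text{latency}} + 3\,\sigma_{\text{latency}}$
Mean plus three standard deviations of historical response time for the specific target. Under a normal approximation this lands near the 99.9th percentile, slightly above the 99th. (Source: DataFlirt adaptive scheduler)
Worker Starvation Risk: $S = \dfrac{\text{active\_tarpits}}{\text{concurrency\_limit}}$
When $S > 0.8$, the pipeline stalls. Aggressive timeouts prevent this. (Source: DataFlirt fleet telemetry)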
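The two per-target formulas translate directly into code. This is a minimal sketch: the function names are illustrative, and using the population standard deviation (`pstdev`) rather than the sample one is an assumption.

```python
import statistics


def dynamic_read_timeout(latencies_ms):
    """T_read = mean latency + 3 * standard deviation of latency."""
    mu = statistics.fmean(latencies_ms)
    sigma = statistics.pstdev(latencies_ms)  # population std dev (assumed)
    return mu + 3 * sigma


def starvation_risk(active_tarpits, concurrency_limit):
    """S = fraction of concurrency slots currently held by tarpitted requests."""
    if concurrency_limit <= 0:
        raise ValueError("concurrency_limit must be positive")
    return active_tarpits / concurrency_limit
```

For example, samples of 300 ms and 500 ms give a mean of 400 ms and a standard deviation of 100 ms, so the read timeout comes out at 700 ms. With 85 of 100 slots stuck in tarpits, S is 0.85, above the 0.8 stall threshold.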
// 04 — the wire trace

A 15-second wait,
cut short.

Trace of a worker hitting an anti-bot tarpit. The connection succeeds, but the server intentionally trickles bytes to tie up the scraper. The read timeout intervenes.

TCP/TLS success · Read timeout · Worker freed
edge.dataflirt.io — live
CAPTURED
// phase 1: connection
dns_lookup: 42ms OK
tcp_handshake: 115ms OK
tls_negotiation: 280ms OK

// phase 2: request sent
http_req: "GET /api/v1/catalog HTTP/2"
ttfb: 8,400ms WARN // extreme delay

// phase 3: tarpit detected
bytes_received: 14 // trickling 1 byte/sec
read_timer: 15,000ms EXCEEDED
socket.close(): SIGKILL

// outcome
error: "ReadTimeoutError"
worker_status: freed // concurrency slot recovered
// 05 — latency sources

Where the time
actually goes.

Timeouts rarely happen because a server is simply 'slow'. They are usually symptoms of underlying infrastructure or security friction. Ranked by frequency across our fleet.

Sample size: 1.2B requests
Window: 7d trailing
Updated: 2026-05-19
01  Anti-bot tarpitting: 91% of timeouts · Silent drop or byte trickling
02  Proxy node saturation: 74% of timeouts · Residential IP bandwidth overload
03  Target database locks: 52% of timeouts · Heavy search queries timing out upstream
04  TLS negotiation hangs: 38% of timeouts · Cipher mismatch or dropped packets
05  DNS resolution failures: 21% of timeouts · Proxy-side resolver issues
// 06 — our stack

Cut your losses early,

and retry on a cleaner route.

In high-throughput scraping, a slow request is worse than a failed request. A failure frees the worker to try again; a slow request holds a concurrency slot hostage. DataFlirt's scheduler uses adaptive timeouts. If a target's median response time is 800ms, we don't wait 30 seconds for an outlier. We kill the socket at 3 seconds, rotate the proxy ASN, and retry. This aggressive pruning keeps fleet utilization above 95% and prevents cascading pipeline stalls.
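A minimal sketch of the kill-and-retry pattern follows. `fetch(url, proxy, timeout_s)` is a hypothetical callable that raises `TimeoutError` when the deadline passes, and cycling through a flat proxy list stands in for ASN-level rotation, which is not modelled here.

```python
import itertools


def fetch_with_rotation(url, fetch, proxies, timeout_s=3.0, max_attempts=3):
    """Try the request on successive proxies, abandoning each slow attempt."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    pool = itertools.cycle(proxies)
    last_error = None
    for _ in range(max_attempts):
        proxy = next(pool)
        try:
            # A slow attempt is abandoned after timeout_s, not waited out.
            return fetch(url, proxy, timeout_s)
        except TimeoutError as exc:
            last_error = exc  # rotate to the next proxy and retry immediately
    raise last_error
```

With an 800 ms median and a 3-second cut-off, three failed attempts cost about 9 seconds in the worst case, still well under a single 30-second wait.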

timeout.policy.json

Dynamic timeout configuration for a high-concurrency retail pipeline.

target.median_ttfb 412ms
timeout.connect 2500ms
timeout.tls 3000ms
timeout.read 4500ms
action.on_timeout rotate_asn_and_retry
worker.starvation risk: 0.02
pipeline.throughput 1,450 req/s


// 07 — FAQ

Common
questions.

About timeout tuning, tarpits, proxy overhead, and how DataFlirt prevents worker starvation at scale.

What is the difference between a connection timeout and a read timeout?
Connection timeout is how long you wait to establish the TCP/TLS handshake. Read timeout is how long you wait for the server to send data after the request is made. You should always configure both independently. A dead proxy triggers a connection timeout; a tarpit triggers a read timeout.
Why do my requests time out on residential proxies but work locally?
Residential proxies route traffic through consumer devices with unpredictable bandwidth and latency. A 10-second timeout might be fine for a datacenter IP, but residential hops often need 15–20 seconds just to negotiate the connection and handle the initial TLS handshake.
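What is an anti-bot tarpit?
Instead of blocking you with a 403, sophisticated anti-bot systems accept your connection and send data at a crawl, for example 1 byte per second. If you have a 60-second timeout and 100 workers, the tarpit can tie up every concurrency slot within seconds and hold each one for a minute or more. The main defense is an aggressive read timeout backed by a total per-request deadline, since a per-chunk read timeout alone resets on every trickled byte.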
How does DataFlirt handle target sites that are genuinely slow?
We profile the target during the warmup phase to establish a baseline latency distribution. If the site legitimately takes 12 seconds to run a search query, we adjust the read timeout to 15 seconds and lower the concurrency to avoid overloading the target.
Should I use exponential backoff after a timeout?
Yes, but only if the timeout was caused by target server load (for example, it arrives alongside HTTP 503/504 responses from the same target). If the timeout was caused by a dead proxy or a tarpit, backoff just wastes time. Rotate the IP and retry immediately.
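That rule can be sketched as a small delay policy. The cause labels (`"server_load"`, `"dead_proxy"`, `"tarpit"`) are hypothetical, since how a pipeline classifies a timeout is outside this sketch. Exponential backoff with full jitter is one common choice, not necessarily the one DataFlirt uses.

```python
import random


def retry_delay(cause, attempt, base_s=1.0, cap_s=30.0, rng=random.random):
    """Return seconds to wait before retry number `attempt` (0-based)."""
    if cause in ("dead_proxy", "tarpit"):
        # The target is not overloaded; waiting buys nothing. Rotate and retry now.
        return 0.0
    # Server load: exponential backoff, capped, with full jitter so that
    # workers retrying the same target do not fire in lockstep.
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng() * ceiling
```

Passing `rng` as a parameter keeps the policy deterministic under test while defaulting to real randomness in production.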
How do you prevent worker starvation?
By enforcing strict, aggressive read timeouts and monitoring the ratio of active vs. idle workers. If a specific proxy subnet shows a spike in timeouts, our scheduler automatically blacklists the subnet and re-routes pending jobs to healthy nodes.
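One way to picture the subnet check is a small tracker keyed by network prefix. The class name, the /24 grouping, the 50% threshold, and the 20-sample minimum are all assumptions for illustration, and the sketch handles IPv4 addresses only.

```python
import ipaddress
from collections import defaultdict


class SubnetHealth:
    """Track timeout rates per proxy subnet and flag the unhealthy ones."""

    def __init__(self, max_timeout_rate=0.5, min_samples=20, prefix_len=24):
        self.max_timeout_rate = max_timeout_rate
        self.min_samples = min_samples
        self.prefix_len = prefix_len
        self._stats = defaultdict(lambda: [0, 0])  # subnet -> [timeouts, total]

    def _subnet(self, proxy_ip):
        # strict=False masks the host bits, e.g. 10.0.0.7 -> 10.0.0.0/24.
        return ipaddress.ip_network(f"{proxy_ip}/{self.prefix_len}", strict=False)

    def record(self, proxy_ip, timed_out):
        """Record one request outcome for the subnet that proxy_ip belongs to."""
        stats = self._stats[self._subnet(proxy_ip)]
        stats[0] += int(timed_out)
        stats[1] += 1

    def is_blocked(self, proxy_ip):
        """True once the subnet has enough samples and too many timeouts."""
        timeouts, total = self._stats.get(self._subnet(proxy_ip), (0, 0))
        return total >= self.min_samples and timeouts / total > self.max_timeout_rate
```

A scheduler could call `record` after every request and consult `is_blocked` before assigning a proxy. The minimum sample count avoids blocking a subnet over a single unlucky request.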