
What is Connection Reuse Rate?

Connection reuse rate is the percentage of HTTP requests in a scraping pipeline that are dispatched over an already-established TCP/TLS connection rather than requiring a fresh handshake. In high-throughput extraction, performing a new TCP handshake and TLS 1.3 negotiation for every single request burns CPU, exhausts ephemeral ports, and typically adds 50–150ms of pure latency per fetch, depending on the round-trip time to the target or proxy. Maximising reuse is the difference between a crawler that scales linearly and one that chokes on its own socket exhaustion.

TCP/TLS · Keep-Alive · Connection Pooling · Throughput · Socket Exhaustion
// 02 — definitions

Stop shaking hands.

Why establishing a secure connection is the most expensive part of a request, and how connection pooling keeps your pipeline from drowning in network overhead.


TL;DR

Connection reuse rate measures how effectively your crawler leverages HTTP Keep-Alive and connection pooling. A rate below 80% usually means a large share of your infrastructure's time and CPU goes to TCP and TLS handshakes rather than to actually transferring HTML. High reuse drastically lowers CPU load, reduces proxy billing, and prevents ephemeral port exhaustion.

01 Definition & structure
Connection reuse rate is the share of HTTP requests that use an existing, warm TCP/TLS connection rather than establishing a new one. In HTTP/1.1, connections are persistent by default; HTTP/1.0 clients opt in by sending a Connection: keep-alive header. Unless either side sends Connection: close, the socket stays open after the response is sent, and the next request to that same host can go straight down the open pipe, bypassing the 3-way TCP handshake and the TLS cryptographic negotiation.
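To see persistence without touching a real target, here is a minimal sketch using only Python's standard library. A throwaway local HTTP/1.1 server counts how many TCP connections it accepts, and the client sends the same number of GETs either over one persistent connection or over a fresh connection per request. The handler class, the `measure_reuse_rate` helper and the request count are illustrative assumptions, not DataFlirt tooling.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 keeps the socket open between requests
    # unless the client sends "Connection: close".
    protocol_version = "HTTP/1.1"
    connections = 0            # incremented once per accepted TCP connection
    lock = threading.Lock()

    def handle(self):
        # handle() runs once per TCP connection and loops over every request
        # that arrives on it, so counting here counts connections, not requests.
        with CountingHandler.lock:
            CountingHandler.connections += 1
        super().handle()

    def do_GET(self):
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))  # lets the client find the end of the body
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the demo quiet
        pass


def measure_reuse_rate(total_requests: int, reuse: bool) -> float:
    """Return R = 1 - new_connections / total_requests against a local test server."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    conn = None
    try:
        for _ in range(total_requests):
            if conn is None or not reuse:
                if conn is not None:
                    conn.close()  # cold path: drop the socket and handshake again
                conn = http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the socket can carry another request
    finally:
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()
    return 1 - CountingHandler.connections / total_requests


if __name__ == "__main__":
    print(measure_reuse_rate(20, reuse=True))   # one connection carries all 20 requests
    print(measure_reuse_rate(20, reuse=False))  # a fresh connection for every request
```

With reuse the server sees a single connection for 20 requests (R = 0.95); without it, every request opens its own connection (R = 0). Plain HTTP is used so the sketch runs offline; over HTTPS each of those extra connections would also pay a TLS handshake.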
02 How it works in practice
In a production scraper, HTTP clients (like Python's requests.Session or Go's http.Client, via its Transport) maintain an internal connection pool. When a worker thread needs to fetch a URL, it checks the pool for an idle connection to that specific host. If one exists, the request goes out over it immediately. If not, the client opens a new connection, pays the handshake latency, and returns the connection to the pool once the response has been read. If the pool is smaller than your concurrency, surplus connections are closed instead of being kept warm, and the reuse rate collapses.
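The eviction effect can be illustrated with a deliberately simplified, deterministic model of one host's pool (a sketch, not any real client's internals). Each "wave", `concurrency` workers check out a connection at once and then return it; the pool keeps at most `pool_maxsize` idle connections and closes the rest. In requests the corresponding knob is `HTTPAdapter`'s `pool_maxsize` (10 by default), and with the default non-blocking pool urllib3 discards connections that do not fit back into a full pool. The function name and wave structure are assumptions for illustration.

```python
def simulate_pool(concurrency: int, pool_maxsize: int, waves: int) -> float:
    """Toy model of one host's connection pool; returns R = 1 - new_connections / total_requests.

    Each wave, `concurrency` workers check out a connection at once, then return it.
    """
    idle = 0                  # warm, already-handshaked connections waiting in the pool
    new_connections = 0
    total_requests = 0
    for _ in range(waves):
        reused = min(idle, concurrency)            # workers that find a warm connection
        idle -= reused
        new_connections += concurrency - reused    # the rest pay a fresh TCP + TLS handshake
        total_requests += concurrency
        # On return, the pool keeps at most pool_maxsize idle connections and closes
        # the surplus, so those workers start cold again next wave.
        idle = min(idle + concurrency, pool_maxsize)
    return 1 - new_connections / total_requests


if __name__ == "__main__":
    print(simulate_pool(concurrency=10, pool_maxsize=10, waves=100))  # pool matches concurrency
    print(simulate_pool(concurrency=10, pool_maxsize=4, waves=100))   # pool too small: thrashing
```

With the pool sized to concurrency, only the first wave is cold (R ≈ 0.99); with a pool of 4 for 10 workers, 6 connections are thrown away and rebuilt every wave (R ≈ 0.40).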
03 The proxy pool complication
Proxies complicate reuse. A TCP connection is bound to a fixed pair of endpoints, so a connection opened through one exit IP cannot carry traffic from another. If you use a rotating proxy that changes your exit IP on every request, connection reuse to the target is impossible by construction: every new exit IP needs its own TCP connection and TLS session with the target. To achieve high reuse rates while using proxies, you must use "sticky sessions" (keeping the same exit IP for a set duration) and ensure your client pools connections per proxy node, not just per target domain.
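A toy single-worker model makes the trade-off concrete. It pools connections by `(exit IP, target)` rather than by target alone, because a warm connection through one exit cannot serve a request that must leave through another. The exit labels and function name are hypothetical; real proxy providers expose rotation and stickiness through their own settings.

```python
def reuse_rate_through_proxies(total_requests: int, requests_per_exit_ip: int,
                               target: str = "api.example.com") -> float:
    """Toy single-worker model; returns R = 1 - new_connections / total_requests.

    The pool is keyed by (exit IP, target): a warm connection through one exit
    cannot serve a request that must leave through another.
    """
    warm = set()             # (exit_ip, target) pairs with an open, reusable connection
    new_connections = 0
    for i in range(total_requests):
        exit_ip = f"exit-{i // requests_per_exit_ip}"   # hypothetical exit-node label
        key = (exit_ip, target)
        if key not in warm:
            new_connections += 1   # new exit IP -> new TCP connection + TLS session with the target
            warm.add(key)
    return 1 - new_connections / total_requests


if __name__ == "__main__":
    print(reuse_rate_through_proxies(1000, requests_per_exit_ip=1))    # rotate every request
    print(reuse_rate_through_proxies(1000, requests_per_exit_ip=100))  # sticky session of 100 requests
```

Per-request rotation gives R = 0; holding each exit IP for 100 requests gives R = 0.99, at the cost of fewer distinct IPs per 1,000 requests.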
04 How DataFlirt handles it
We decouple connection pooling from the individual scraping workers. Our edge routing layer maintains massive, persistent connection pools to target domains across our datacenter and sticky-residential proxy fleets. When a worker requests a page, the edge router multiplexes that request onto an already-warm connection. This allows us to maintain a >92% reuse rate globally, reducing our compute costs and preventing target WAFs from flagging us for SYN-flood behavior.
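05 Did you know: Ephemeral port exhaustion
A low connection reuse rate doesn't just slow you down; it can break your scraper outright. Every new outbound connection consumes an "ephemeral port" on your machine. Linux's default range (32768–60999) provides about 28,000 of them, and because a TCP connection is identified by its source and destination addresses and ports, that budget applies per destination, which bites hardest when every connection goes to the same proxy gateway. When your side closes a connection, its port stays in the TIME_WAIT state for 60 seconds before it can be reused. If your scraper opens 500 new connections per second to one destination without reusing them, you will exhaust all available ports in under a minute, resulting in fatal Cannot assign requested address errors.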
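The "under a minute" figure follows from simple arithmetic. This sketch assumes Linux defaults (28,232 ports in 32768–60999, a fixed 60-second TIME_WAIT), a single destination, and that your side closes every connection; the function names are illustrative.

```python
from typing import Optional

# Linux default net.ipv4.ip_local_port_range is 32768-60999.
EPHEMERAL_PORTS = 60999 - 32768 + 1   # 28,232 ports
TIME_WAIT_SECONDS = 60


def sustainable_new_connections_per_sec(ports: int = EPHEMERAL_PORTS,
                                        time_wait: float = TIME_WAIT_SECONDS) -> float:
    """Highest steady new-connection rate whose TIME_WAIT backlog still fits in the port range."""
    return ports / time_wait


def seconds_until_exhaustion(new_conn_per_sec: float, ports: int = EPHEMERAL_PORTS,
                             time_wait: float = TIME_WAIT_SECONDS) -> Optional[float]:
    """Seconds until every port is held, or None if ports free up fast enough.

    No port leaves TIME_WAIT before `time_wait` seconds, so until then the ports
    in use grow as rate * t; exhaustion happens if that reaches `ports` first.
    """
    t = ports / new_conn_per_sec
    return t if t < time_wait else None


if __name__ == "__main__":
    print(sustainable_new_connections_per_sec())  # roughly 470 new connections/sec
    print(seconds_until_exhaustion(500))          # about 56 seconds
    print(seconds_until_exhaustion(400))          # None: steady state fits
```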
// 03 — the math

How much time are you wasting?

Every new connection incurs 2–3 network round trips (one for the TCP handshake, plus one for a TLS 1.3 handshake or two for TLS 1.2) before the first byte of the HTTP request is even sent. DataFlirt tracks reuse rate per target to optimize worker thread allocation and proxy routing.

Connection Reuse Rate: $R = 1 - \frac{\text{new\_connections}}{\text{total\_requests}}$
Target $R > 0.85$ for datacenter proxies; residential proxies will naturally be lower. Source: standard networking metric.
Handshake Overhead Penalty: $O = \text{new\_connections} \times (\mathrm{RTT}_{\text{tcp}} + \mathrm{RTT}_{\text{tls}})$
Pure latency added to the pipeline before any data is transferred. Source: TCP/IP fundamentals.
DataFlirt Fleet SLO: $R_{\text{fleet}} > 0.92$
Maintained across our edge-routed API extraction pipelines as of v2026.5. Source: internal SLO.
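Both formulas translate directly into code. As a worked example, this sketch plugs in numbers from the connection pool trace in section 04: 500 requests with two new connections (req 1 and req 500), and the cold-start timings of 42ms TCP connect and 85ms TLS handshake used as stand-ins for $\mathrm{RTT}_{\text{tcp}}$ and $\mathrm{RTT}_{\text{tls}}$ (an approximation, since req 500's connect took 45ms). The function names are illustrative.

```python
def reuse_rate(new_connections: int, total_requests: int) -> float:
    """R = 1 - new_connections / total_requests."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 1 - new_connections / total_requests


def handshake_overhead_ms(new_connections: int, rtt_tcp_ms: float, rtt_tls_ms: float) -> float:
    """O = new_connections * (RTT_tcp + RTT_tls), summed over requests (not wall-clock time)."""
    return new_connections * (rtt_tcp_ms + rtt_tls_ms)


if __name__ == "__main__":
    print(reuse_rate(2, 500))                    # R for the 500-request trace
    print(handshake_overhead_ms(2, 42, 85))      # 254 ms of handshakes with pooling
    print(handshake_overhead_ms(500, 42, 85))    # 63500 ms if every request were cold
```

Note that $O$ sums latency across requests; with many workers in parallel, the wall-clock cost is spread across threads, but the CPU and port cost is not.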
// 04 — connection pool trace

A worker thread's first 500 requests.

Trace logs from a DataFlirt HTTP client hitting a JSON API. Notice the latency drop once the connection pool is warm and TLS negotiation is bypassed.

HTTP/1.1 Keep-Alive · connection pool · latency drop
edge.dataflirt.io (captured trace)
// req 1: cold start
tcp.connect: 42ms
tls.handshake: 85ms
http.ttfb: 110ms
total_latency: 237ms
pool.status: connection added

// req 2-499: warm pool
tcp.connect: 0ms (reused)
tls.handshake: 0ms (reused)
http.ttfb: 108ms
total_latency: 108ms

// req 500: server closed connection
event: Connection: close received
pool.status: connection evicted
tcp.connect: 45ms
total_latency: 240ms
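To put a number on the latency drop, this short sketch averages the trace's totals: two cold requests (237ms and 240ms) and 498 warm requests at 108ms. Treating every warm request as exactly 108ms is a simplification of the trace's single summary line.

```python
# Figures taken from the trace above.
COLD_MS = [237, 240]     # req 1 (cold start) and req 500 (reconnect after the server closed)
WARM_MS = 108            # reqs 2-499 on the reused connection
WARM_COUNT = 498


def mean_latency_ms(cold_ms: list, warm_ms: float, warm_count: int) -> float:
    """Mean per-request latency across cold and warm requests."""
    total = sum(cold_ms) + warm_ms * warm_count
    return total / (len(cold_ms) + warm_count)


if __name__ == "__main__":
    pooled = mean_latency_ms(COLD_MS, WARM_MS, WARM_COUNT)
    print(f"mean with pooling: {pooled:.1f} ms")   # mean with pooling: 108.5 ms
    print(f"mean if every request were cold: {COLD_MS[0]} ms")
```

Pooling more than halves the average request latency for this worker, and almost all of the remaining time is server-side time to first byte.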
// 05 — reuse killers

Why connections get dropped.

Ranked by frequency across DataFlirt's infrastructure. Maintaining high reuse rates requires tuning both client-side pool settings and understanding target-side load balancer behaviors.

Sample size: 1.2B requests · Window: 7-day trailing · Updated: 2026-05-19
01 Aggressive proxy rotation (forces new IP): every new exit node requires a new TCP/TLS handshake.
02 Target Keep-Alive timeout (server-side): Nginx defaults to 75s; aggressive WAFs drop it to 5s.
03 Max requests per connection (server-side): target load balancers force a disconnect after N requests.
04 Client pool size too small (client-side): eviction thrashing when concurrency exceeds pool limits.
05 Network partitions / RST (network): silently dropped or reset connections that require client-side retry logic.
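Killers 02 and 03 can be partly avoided on the client side by retiring pooled connections before the server does. Some servers advertise their limits in a response header such as `Keep-Alive: timeout=5, max=100`; many (including Nginx by default) do not, in which case you have to configure the timeout yourself. This is a minimal sketch under those assumptions: `parse_keep_alive`, `PooledConnection`, its `max_requests` cap and the one-second safety margin are all hypothetical names and values.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


def parse_keep_alive(header_value: str) -> Dict[str, int]:
    """Parse a Keep-Alive header value like 'timeout=5, max=1000' into {'timeout': 5, 'max': 1000}."""
    params: Dict[str, int] = {}
    for part in header_value.split(","):
        key, sep, value = part.strip().partition("=")
        if sep and value.strip().isdigit():
            params[key.strip().lower()] = int(value.strip())
    return params


@dataclass
class PooledConnection:
    idle_timeout_s: Optional[float]        # server's idle timeout, if known
    max_requests: Optional[int] = None     # hypothetical client-side cap on requests per connection
    requests_served: int = 0
    last_used: float = field(default_factory=time.monotonic)

    def safe_to_reuse(self, now: float, safety_margin_s: float = 1.0) -> bool:
        """False if the server has probably closed this idle socket, or the request cap is reached."""
        if self.idle_timeout_s is not None and now - self.last_used >= self.idle_timeout_s - safety_margin_s:
            return False
        if self.max_requests is not None and self.requests_served >= self.max_requests:
            return False
        return True
```

A pool would call `safe_to_reuse(time.monotonic())` before handing out an idle connection and close it instead if the check fails, trading one proactive handshake for a failed request on a dead socket.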
// 06 — DataFlirt's pool architecture

Warm connections, routed intelligently across the fleet.

DataFlirt maintains persistent connection pools at the edge, decoupled from the scraping workers. When a worker needs to fetch a page from a target, it requests a lease from the edge pool. If a warm TLS session exists for that target and proxy combination, the request is multiplexed immediately. This architecture pushes our fleet-wide connection reuse rate above 92%, slashing CPU overhead and keeping proxy providers happy by avoiding SYN floods.
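DataFlirt's edge pool implementation is not published, so the following is only a toy sketch of the lease idea described above. Connections are stand-in integer ids rather than sockets, and the `LeasePool` class, its `lease(target, proxy)` context manager and the `(target, proxy)` key are hypothetical names for illustration. A worker leases a warm connection if one is idle for that combination, otherwise a new one is "dialed"; connections are handed back only when the request succeeds.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager
from itertools import count


class LeasePool:
    """Toy thread-safe pool keyed by (target, proxy); connections are opaque integer ids."""

    def __init__(self):
        self._idle = defaultdict(list)     # (target, proxy) -> idle connection ids
        self._lock = threading.Lock()
        self._ids = count(1)
        self.new_connections = 0
        self.total_leases = 0

    @contextmanager
    def lease(self, target: str, proxy: str):
        key = (target, proxy)
        with self._lock:
            self.total_leases += 1
            if self._idle[key]:
                conn = self._idle[key].pop()   # warm: skip TCP + TLS
            else:
                conn = next(self._ids)         # cold: stand-in for dialing and handshaking
                self.new_connections += 1
        ok = False
        try:
            yield conn
            ok = True
        finally:
            if ok:                             # a failed request's connection is discarded, not pooled
                with self._lock:
                    self._idle[key].append(conn)

    def reuse_rate(self) -> float:
        with self._lock:
            return 1 - self.new_connections / self.total_leases


if __name__ == "__main__":
    pool = LeasePool()
    for _ in range(10):
        with pool.lease("api.retail-target.com", "proxy-a") as conn:
            pass  # a real worker would send its request over `conn` here
    print(pool.new_connections, pool.reuse_rate())  # one connection served all 10 leases
```

Because a new connection is only created when every existing one for that key is leased out, N concurrent workers never need more than N connections per target and proxy combination.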

edge-pool.status

Live metrics from a regional connection pool handling a retail catalog crawl.

target.domain api.retail-target.com
active_connections 1,240
idle_connections 85
reuse_rate 0.94 (optimal)
tls_overhead_saved 42.8 hours/day (high)
eviction_rate 12/sec (monitor)
protocol HTTP/2 multiplexed


// 07 — FAQ

Common questions.

About connection pooling, proxy rotation impacts, HTTP/2 multiplexing, and how DataFlirt scales network I/O.

Why does connection reuse matter for scraping?
Throughput and CPU. Every new TLS 1.3 connection costs public-key cryptography plus a TCP round trip and a TLS round trip before any data moves. If you do this for every single request, a large share of your scraper's CPU cycles and time goes to just saying "hello" to the server. Reusing connections lets you spend that time actually downloading data.
How do rotating residential proxies affect reuse rates?
They destroy it. A proxy that rotates per request gives you a new IP address for every request, and a new IP means a new TCP socket and a new TLS handshake. This is why residential scraping is inherently slower and more CPU-intensive than datacenter scraping. You trade network efficiency for IP diversity.
Is it legal or ethical to hold connections open?
Yes, persistent connections are standard HTTP behavior: they are the default in HTTP/1.1 (the Keep-Alive header is the older HTTP/1.0 mechanism) and the basis of HTTP/2. Target servers generally benefit too, because fewer TLS handshakes mean less CPU load on their side, so reusing connections does the target's infrastructure a favor. Just ensure your client respects the server's timeout directives so you don't hold dead sockets.
How does HTTP/2 change connection reuse?
HTTP/2 multiplexes multiple concurrent requests over a single TCP connection. Instead of needing a pool of 50 connections to make 50 simultaneous requests (as in HTTP/1.1), you can send all 50 over a single connection, as long as the server's advertised SETTINGS_MAX_CONCURRENT_STREAMS limit allows it. Reuse rate becomes less about sequential Keep-Alive and more about concurrent stream limits.
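The difference in pool sizing can be sketched as a one-line calculation per protocol. This assumes no HTTP/1.1 pipelining (one in-flight request per connection) and a default stream limit of 100, the minimum value RFC 9113 recommends servers advertise; real servers may advertise more or less. The function name is illustrative.

```python
import math


def connections_needed(concurrent_requests: int, protocol: str, max_concurrent_streams: int = 100) -> int:
    """Minimum connections to one origin needed to keep `concurrent_requests` in flight."""
    if protocol == "http/1.1":
        return concurrent_requests  # one in-flight request per connection (no pipelining)
    if protocol == "h2":
        # Each connection carries up to max_concurrent_streams simultaneous streams.
        return math.ceil(concurrent_requests / max_concurrent_streams)
    raise ValueError(f"unknown protocol: {protocol}")


if __name__ == "__main__":
    print(connections_needed(50, "http/1.1"))  # 50
    print(connections_needed(50, "h2"))        # 1
    print(connections_needed(250, "h2"))       # 3
```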
Why am I getting 'Connection reset by peer' errors?
Usually, the target server or a proxy closed the connection (perhaps due to an idle timeout or a max-requests limit), but your client's connection pool didn't notice and tried to send a new request down the dead socket. You need robust retry logic that catches these socket errors and transparently re-issues idempotent requests (such as GET) on a fresh connection.
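A minimal sketch of that retry pattern in Python, assuming the caller supplies the pooled connection, a factory for fresh connections, and a `send` function; `send_with_fresh_retry` and `STALE_SOCKET_ERRORS` are hypothetical names. The exception classes are real: `http.client.RemoteDisconnected` is raised when the server closes without responding and is itself a `ConnectionResetError` subclass.

```python
import http.client
from typing import Callable, TypeVar

Conn = TypeVar("Conn")
Result = TypeVar("Result")

# Errors that usually mean the pooled socket was already closed by the server or a proxy.
# (RemoteDisconnected subclasses ConnectionResetError; it is listed for readability.)
STALE_SOCKET_ERRORS = (ConnectionResetError, BrokenPipeError, http.client.RemoteDisconnected)


def send_with_fresh_retry(conn: Conn,
                          open_fresh: Callable[[], Conn],
                          send: Callable[[Conn], Result]) -> Result:
    """Try the pooled connection once; on a stale-socket error, retry once on a fresh one.

    Only use this for idempotent requests (GET, HEAD): the first attempt
    may already have reached the server before the socket died.
    """
    try:
        return send(conn)
    except STALE_SOCKET_ERRORS:
        conn.close()               # never hand the dead socket back to the pool
        return send(open_fresh())
```

With `http.client`, `send` could be a function that calls `conn.request("GET", path)` and returns `conn.getresponse().read()`, and `open_fresh` a function that builds a new `HTTPSConnection` to the same host. Retrying only once keeps a genuinely failing target from turning into a retry storm.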
How does DataFlirt optimize reuse on rotating proxies?
We use sticky sessions where possible. Instead of rotating per request, we group requests to the same target through the same proxy node for a 1-to-3 minute window. This allows us to warm up a connection and push hundreds of requests through it before the IP rotates, balancing fingerprint diversity with network efficiency.

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off catalogue dump or a continuous feed across millions of records — we scope, build, and operate the pipeline.

hello@dataflirt.com  ·  Bengaluru  ·  IST  ·  typical reply < 4h