
What is TCP Handshake Time?

TCP handshake time is the latency incurred during the initial three-way SYN/SYN-ACK/ACK exchange required to establish a connection before any HTTP or TLS data can flow. In high-throughput scraping, establishing thousands of new connections per second creates massive overhead. If you don't pool connections or route through edge nodes close to the target, handshake latency will dominate your pipeline, silently halving your effective extraction speed.

Network Latency · Connection Pooling · TCP/IP · Throughput · Infrastructure
// 02 — definitions

The cost of
saying hello.

Before you can negotiate TLS or send an HTTP GET, you have to establish a TCP connection. At scale, this setup cost is brutal.


TL;DR

TCP handshake time is the round-trip latency required to open a socket. For a scraper hitting a target thousands of miles away, a 150ms handshake on every request destroys throughput. Production pipelines mitigate this with connection pooling, HTTP keep-alive, and geographically distributed proxy gateways that terminate connections closer to the origin.

01 Definition & structure
TCP handshake time is the duration of the three-way exchange required to open a Transmission Control Protocol connection. The client sends a SYN (synchronize) packet, the server replies with a SYN-ACK, and the client confirms with an ACK. Only after this process completes can the client begin negotiating TLS or sending HTTP data. Because this requires a full round trip across the network, it is heavily bound by physical distance and routing efficiency.
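The handshake described above can be timed from the client side. The following is a minimal sketch, assuming the standard-library `socket` module is an acceptable measuring tool; the function name `tcp_handshake_ms` is hypothetical. When a hostname is passed instead of an IP address, the measurement also includes DNS resolution.

```python
import socket
import time


def tcp_handshake_ms(host: str, port: int, timeout: float = 5.0) -> float:
    """Return the wall-clock time, in milliseconds, taken to open a TCP connection."""
    start = time.perf_counter()
    # create_connection() returns once connect() succeeds, which happens after
    # the SYN-ACK arrives and the client's ACK has been sent.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0
```

Calling this against a target's IP address and port 443 from the machine that runs the scraper gives a rough per-connection handshake cost for that route.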
02 How it works in practice
In a naive scraping script, every requests.get() or fetch() call opens a brand new TCP socket. If the round trip to the target server takes 100ms, you pay 100ms just to say hello, followed by another 100-200ms for TLS, before the HTTP request is even transmitted. In a pipeline doing 5,000 requests per second, this setup overhead wastes CPU, memory, and network bandwidth, and the churn of short-lived sockets leads to ephemeral port exhaustion and dropped connections.
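The alternative to one socket per request is to keep a connection open and send several requests over it. The sketch below uses the standard-library `http.client` for plain HTTP; the function name `fetch_statuses` is hypothetical, and an HTTPS target would use `http.client.HTTPSConnection` instead.

```python
import http.client


def fetch_statuses(host: str, port: int, paths: list[str]) -> list[int]:
    """Fetch every path over one keep-alive connection and return the status codes."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    statuses = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()  # drain the body so the socket can carry the next request
            statuses.append(response.status)
    finally:
        conn.close()
    return statuses
```

If the server honors keep-alive, only the first request pays the TCP handshake; the rest travel over the already-open socket.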
03 The impact of geography
You cannot optimize away the speed of light. Data traveling through fiber optic cables moves at roughly two-thirds the speed of light in a vacuum. A packet traveling from New York to Mumbai has a theoretical floor of roughly 125ms round-trip over the great-circle path, and real routes, which are longer and add switching delay, typically measure 180ms or more. If your scraper is in the US and your target is in India, your TCP handshake time will be terrible. The only solution is to move the client closer to the server.
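The physical floor can be worked out directly. This is a rough sketch, assuming a straight fiber run and a propagation factor of 0.66; the function name `min_rtt_ms` and the 12,500 km figure used for New York to Mumbai are approximations introduced here for illustration.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_FACTOR = 0.66                # light in fiber travels at roughly two-thirds of c


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a straight fiber run of the given length."""
    one_way_s = distance_km / (FIBER_FACTOR * SPEED_OF_LIGHT_KM_S)
    return 2 * one_way_s * 1000.0
```

With `min_rtt_ms(12_500)` the bound comes out at about 126ms. Measured round trips are higher because cables do not follow great circles and every router adds delay.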
04 How DataFlirt handles it
We eliminate handshake latency for our clients by decoupling the scraper from the target. Our infrastructure maintains persistent, warm connection pools from regional proxy gateways located in the same geographic zones as the target servers. When your pipeline requests data, it connects to our nearest edge node over a fast, multiplexed HTTP/2 connection, and we route the request over an already-established TCP socket to the target.
05 The proxy rotation dilemma
There is an inherent conflict between connection pooling and IP rotation. If you reuse a TCP connection, you are using the same IP address. If you rotate your proxy IP to avoid rate limits, you are forced to tear down the old socket and perform a new TCP handshake. High-performance scraping requires balancing these two needs: pooling connections long enough to be efficient, but rotating them fast enough to avoid anti-bot IP bans.
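One way to reason about this trade-off is to cap the number of requests sent per proxy IP and look at the setup cost amortized over that cap. The sketch below is illustrative only: `RotationPolicy`, `amortized_setup_ms`, and the `max_requests_per_ip` parameter are hypothetical names, and a sensible cap depends entirely on the target's rate limits.

```python
def amortized_setup_ms(setup_ms: float, requests_per_connection: int) -> float:
    """Average setup latency paid per request when each socket serves N requests."""
    if requests_per_connection < 1:
        raise ValueError("requests_per_connection must be at least 1")
    return setup_ms / requests_per_connection


class RotationPolicy:
    """Decide when to drop a pooled socket and reconnect through a new proxy IP."""

    def __init__(self, max_requests_per_ip: int) -> None:
        self.max_requests_per_ip = max_requests_per_ip
        self.sent_on_current_ip = 0

    def should_rotate(self) -> bool:
        """Record one request; return True when the next one needs a fresh connection."""
        self.sent_on_current_ip += 1
        if self.sent_on_current_ip >= self.max_requests_per_ip:
            self.sent_on_current_ip = 0
            return True
        return False
```

Rotating on every request pays the full setup cost each time. Reusing a socket for 20 requests cuts the average setup cost per request to one twentieth, at the price of 20 consecutive requests from the same IP.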
// 03 — the latency math

How much time
are you wasting?

Every new connection requires a full round trip just to establish TCP, followed by more round trips for TLS. DataFlirt's infrastructure minimizes this by maintaining warm connection pools.

Total Setup Latency: $T_{\text{setup}} = RTT_{\text{tcp}} + RTT_{\text{tls}}$
A full TLS 1.2 handshake adds 2 RTTs on top of the TCP handshake. TLS 1.3 cuts this to 1 RTT. (Network Fundamentals)
Connection Pool Efficiency: $E = 1 - \dfrac{\text{new\_conns}}{\text{total\_reqs}}$
E > 0.90 means more than 90% of requests reused an existing TCP socket. (DataFlirt Pipeline SLO)
Theoretical Minimum RTT: $RTT_{\min} = \dfrac{2 \times \text{distance}}{0.66 \times c}$
Fiber optic speed limit. You cannot beat physics. (Physics)
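The first two formulas translate directly into code. This is a small sketch, assuming the TLS term is a whole number of round trips at the same RTT as the TCP handshake; the function names are hypothetical.

```python
def setup_latency_ms(rtt_ms: float, tls_round_trips: int) -> float:
    """T_setup = RTT_tcp + RTT_tls, with RTT_tls = tls_round_trips * RTT."""
    return rtt_ms + tls_round_trips * rtt_ms


def pool_efficiency(new_conns: int, total_reqs: int) -> float:
    """E = 1 - (new_conns / total_reqs)."""
    if total_reqs <= 0:
        raise ValueError("total_reqs must be positive")
    return 1 - new_conns / total_reqs
```

At a 210ms RTT, `setup_latency_ms(210, 1)` gives 420ms for TLS 1.3 and `setup_latency_ms(210, 2)` gives 630ms for TLS 1.2. A pipeline that opened 160 new connections across 10,000 requests has an efficiency of about 0.984.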
// 04 — packet trace

The anatomy of
a slow connection.

A raw packet trace of a scraper in AWS us-east-1 opening a fresh connection to a target server in ap-south-1 (Mumbai). The physical distance forces a 210ms round trip, paid once for TCP and once for TLS, so 423ms elapse before the HTTP request even begins.

tcpdump · us-east-1 to ap-south-1 · no pooling
// TCP 3-way handshake
0.000s Scraper -> Target : SYN // seq 0
0.210s Target -> Scraper : SYN-ACK // 210ms RTT penalty
0.211s Scraper -> Target : ACK // TCP established

// TLS 1.3 Handshake
0.212s Scraper -> Target : ClientHello
0.422s Target -> Scraper : ServerHello, Finished
0.423s Scraper -> Target : Finished // TLS established

// Application Layer
0.424s Scraper -> Target : GET /api/v1/products HTTP/2
0.635s Target -> Scraper : HTTP/2 200

// Result
total_time: 635ms
wasted_on_setup: 423ms (67%)
status: inefficient // connection pooling required
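The result figures can be reproduced from the trace timestamps. The sketch below copies the millisecond offsets from the trace above into a dictionary; the key names and function names are hypothetical.

```python
# Millisecond offsets copied from the trace above.
TRACE_MS = {
    "syn_sent": 0,
    "tcp_established": 211,
    "tls_established": 423,
    "request_sent": 424,
    "response_received": 635,
}


def setup_share(trace: dict[str, int]) -> float:
    """Fraction of total time spent before the connection was ready for HTTP."""
    return trace["tls_established"] / trace["response_received"]


def pooled_request_ms(trace: dict[str, int]) -> int:
    """Time the same request would take on an already-established connection."""
    return trace["response_received"] - trace["request_sent"]
```

Here `setup_share(TRACE_MS)` is about 0.666, and `pooled_request_ms(TRACE_MS)` is 211, which is what the same request would cost over a warm socket.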
// 05 — latency drivers

What makes handshakes
take so long.

The primary factors that inflate TCP handshake times. Physical distance is the dominant driver, but routing complexity and target server load can add severe variance.

AVG HANDSHAKE: 45–120 ms
POOLING GAIN: up to 80%
UPDATED: 2026-05-19
01 Geographic distance (RTT)
physics limit · Fiber optic propagation delay across oceans
02 Proxy routing hops
network path · Each intermediate node adds processing delay
03 Target SYN queue load
server side · High traffic targets delay SYN-ACK responses
04 Packet loss / Retransmits
network health · Dropped SYNs trigger 1-second timeout penalties
05 OS TCP stack tuning
client side · Inefficient ephemeral port allocation
// 06 — our architecture

Terminate at the edge,

pool to the origin.

DataFlirt maintains persistent, warm connection pools from our regional proxy gateways directly to high-value target ASNs. When your scraper requests a page, the TCP and TLS handshakes are already done. You only pay the latency of the HTTP request itself. We handle the keep-alive headers, socket health checks, and graceful reconnects transparently.

tcp-pool.status

Live metrics from a DataFlirt regional gateway managing connections to a major e-commerce target.

target.asn: AS16509 · Amazon
gateway.region: ap-south-1
sockets.active: 4,250 (warm)
sockets.idle: 850
pool.hit_rate: 98.4% (optimal)
avg_setup_saved: 185ms per req
retransmit_rate: 0.02%

// 07 — FAQ

Common
questions.

Common questions about network latency, connection pooling, and optimizing scraper throughput.

Why is my scraper so slow even with high concurrency?
If you are opening a new TCP connection for every request, you are bottlenecked by network round-trip time, not CPU or memory. High concurrency with no connection pooling leads to ephemeral port exhaustion and massive latency overhead. You must reuse connections to scale effectively.
Does HTTP/2 fix TCP handshake time?
HTTP/2 multiplexes multiple requests over a single TCP connection, which drastically reduces the number of handshakes needed. However, if you are rotating IP addresses on every single request to avoid rate limits, you are forcing a new TCP connection anyway, negating the HTTP/2 advantage.
How do proxies affect handshake time?
Proxies add an extra network hop. Your scraper handshakes with the proxy, and the proxy handshakes with the target. If the proxy is geographically far from the target, the long-haul round trip is paid on the second leg as well, which can roughly double your setup latency. This is why DataFlirt routes requests through proxy gateways located in the same region as the target server.
What is TCP Fast Open (TFO)?
TCP Fast Open is a protocol extension that lets a client send data in the initial SYN packet on repeat connections to a server, saving the round trip normally spent waiting for the 3-way handshake to complete. While great in theory, it is rarely supported by anti-bot edge networks (like Cloudflare or Akamai) and is often stripped by intermediate middleboxes.
How does DataFlirt monitor connection health?
We actively probe our idle connection pools. If a target server silently drops a connection (common with aggressive load balancers), we detect the dead socket and replace it before a scraper tries to use it. This prevents the dreaded 'Connection Reset by Peer' errors mid-scrape.
Should I tune my OS TCP settings for scraping?
Yes. If you run your own infrastructure on Linux, tune the kernel's TCP settings. Widen net.ipv4.ip_local_port_range to make more ephemeral ports available, and enable net.ipv4.tcp_tw_reuse so new outbound connections can reuse sockets still in the TIME_WAIT state. Default OS settings are meant for general-purpose workloads such as web browsing, not high-throughput scraping.
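To see why the port range matters, it helps to count the ephemeral ports available and divide by how long each one stays in TIME_WAIT. The sketch below is a back-of-envelope estimate, assuming the Linux TIME_WAIT duration of 60 seconds and no socket reuse; the function names are hypothetical, and the input string follows the format of /proc/sys/net/ipv4/ip_local_port_range.

```python
TIME_WAIT_SECONDS = 60  # Linux holds closed client sockets in TIME_WAIT for 60 s


def ephemeral_port_count(port_range: str) -> int:
    """Count ports in a range formatted like /proc/sys/net/ipv4/ip_local_port_range."""
    low, high = (int(value) for value in port_range.split())
    return high - low + 1


def max_new_connections_per_second(port_range: str) -> float:
    """Rough ceiling on new connections/s to one destination if every port waits out TIME_WAIT."""
    return ephemeral_port_count(port_range) / TIME_WAIT_SECONDS
```

With the common Linux default of `"32768 60999"` there are 28,232 ports, or roughly 470 new connections per second to a single destination address and port. Widening the range to `"1024 65535"` raises that to 64,512 ports, and connection reuse removes most of the pressure altogether.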

hello@dataflirt.com  ·  Bengaluru  ·  IST  ·  typical reply < 4h