
What is an HTTP Request?

An HTTP Request is the fundamental unit of communication in any web scraping pipeline, consisting of a method, a target URL, protocol version, headers, and an optional body. While a standard browser request is noisy and bloated, a scraper's request must be meticulously crafted to balance efficiency with credibility. Send too little, and you trigger anomaly detection; send too much, and you waste bandwidth and compute.

Network Layer · Protocol · Headers · Payload · TCP/IP
// 02 — definitions

The anatomy of a fetch.

Every byte you send to a target server is a signal. Here is how those signals are structured, parsed, and judged by modern infrastructure.


TL;DR

An HTTP request is a text-based or binary-framed message sent from client to server to initiate an action. In scraping, the request is your identity. Anti-bot systems don't just look at what you ask for; they analyze the exact order of your headers, the order and casing of your pseudo-headers, and the TLS context wrapping the request to decide if you are human.

01 Definition & structure
An HTTP Request is a structured message sent by a client to a server. It consists of a request line (Method, URI, Protocol Version), a set of HTTP headers, an empty line indicating the end of the headers, and an optional message body (payload). In the context of web scraping, the request is the primary vector for data extraction, but it is also the primary vector for bot detection.
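The structure described above can be sketched in a few lines of Python. This is a minimal illustration of HTTP/1.1 serialization only, not a production client; the function name `build_http11_request` is hypothetical, and the host and path are borrowed from the wire trace later on this page.

```python
def build_http11_request(method, path, host, headers=None, body=b""):
    """Serialize an HTTP/1.1 request: request line, headers, blank line, body."""
    # Request line: Method, URI, Protocol Version.
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    # Headers are emitted in exactly the order the caller supplies them.
    for name, value in (headers or []):
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # An empty line (CRLF CRLF) marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


raw = build_http11_request(
    "GET",
    "/v1/catalog/products?limit=100",
    "api.target.com",
    headers=[("Accept", "application/json")],
)
print(raw.decode("ascii"))
```

Because headers are passed as a list of pairs rather than a dictionary, the caller controls their order, which matters for the fingerprinting issues discussed below.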
02 The role of headers
Headers dictate content negotiation (Accept-Encoding, Accept-Language) and maintain state (Cookie, Authorization). For scrapers, headers must perfectly align with the advertised User-Agent. If you claim to be Chrome on macOS but send headers in the order typical of a Python script, or request Brotli compression but fail to decode it, the target's WAF will flag the request as anomalous.
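One way to avoid the "request Brotli but fail to decode it" mismatch is to derive the Accept-Encoding value from what the client can actually decode. The sketch below assumes the optional third-party `brotli` package for `br`; the helper names `accept_encoding_header` and `decode_body` are illustrative, not part of any library.

```python
import gzip
import zlib

try:
    import brotli  # optional third-party package; assumed, not required
except ImportError:
    brotli = None


def accept_encoding_header():
    """Advertise only the encodings this process can really decode."""
    encodings = ["gzip", "deflate"]
    if brotli is not None:
        encodings.append("br")
    return ", ".join(encodings)


def decode_body(body, content_encoding):
    """Decode a response body according to its Content-Encoding value."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "identity":
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)  # raw deflate
    if encoding == "br" and brotli is not None:
        return brotli.decompress(body)
    raise ValueError(f"cannot decode Content-Encoding: {content_encoding!r}")


print(accept_encoding_header())
print(decode_body(gzip.compress(b"hello"), "gzip"))
```

If the browser profile you emulate advertises `br`, dropping the token creates its own mismatch with the User-Agent, so installing a Brotli decoder is usually the safer choice.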
03 HTTP/1.1 vs HTTP/2 framing
While HTTP/1.1 requests are plain text, HTTP/2 requests are binary framed and multiplexed. HTTP/2 introduces pseudo-headers (like :method and :path) which must appear before any regular headers. The strictness of HTTP/2 framing makes it harder to spoof manually, but it also gives scrapers a large performance benefit: many requests can run concurrently over a single TCP connection without the application-layer head-of-line blocking of HTTP/1.1. Packet loss can still stall every stream at once, because they all share that one TCP connection.
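To make "binary framed" concrete: every HTTP/2 frame starts with a fixed 9-byte header (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream identifier). The sketch below packs and unpacks that header with the standard library only. The function names are hypothetical, and the payload length of 142 is simply an illustrative value.

```python
import struct

FRAME_TYPE_HEADERS = 0x1  # HEADERS frame
FLAG_END_STREAM = 0x1     # no request body follows
FLAG_END_HEADERS = 0x4    # header block is complete in this frame


def pack_frame_header(length, frame_type, flags, stream_id):
    """Build the 9-byte HTTP/2 frame header."""
    if not 0 <= length < 2 ** 24:
        raise ValueError("length must fit in 24 bits")
    if not 0 <= stream_id < 2 ** 31:
        raise ValueError("stream_id must fit in 31 bits")
    # 3-byte big-endian length, then type, flags, and a 4-byte stream id
    # whose top (reserved) bit stays zero.
    return length.to_bytes(3, "big") + struct.pack(">BBI", frame_type, flags, stream_id)


def unpack_frame_header(data):
    """Parse the first 9 bytes of a frame into (length, type, flags, stream_id)."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes")
    length = int.from_bytes(data[:3], "big")
    frame_type, flags, raw_stream = struct.unpack(">BBI", data[3:9])
    return length, frame_type, flags, raw_stream & 0x7FFFFFFF  # mask reserved bit


header = pack_frame_header(142, FRAME_TYPE_HEADERS, FLAG_END_HEADERS | FLAG_END_STREAM, 1)
print(header.hex())
print(unpack_frame_header(header))
```

The stream identifier is what makes multiplexing possible: frames from different requests interleave on one connection and are reassembled by stream id.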
04 How DataFlirt handles it
We don't rely on standard HTTP clients. Our fleet uses a custom network stack that constructs HTTP/2 frames and TLS handshakes at the socket level. This allows us to perfectly emulate the network signature of specific browser versions, ensuring that our requests pass passive WAF checks before the target application even processes the URI. We dynamically adjust header ordering, pseudo-header casing, and connection reuse patterns based on the target's specific anti-bot vendor.
05 The pseudo-header trap
A common mistake when upgrading scrapers to HTTP/2 is mishandling pseudo-headers. Browsers send pseudo-headers in a very specific order (e.g., Chrome sends :method, :authority, :scheme, :path). Many generic HTTP/2 libraries alphabetize them or use a different default order. Akamai and Cloudflare actively fingerprint this order; getting it wrong results in an instant, silent shadow-ban or a CAPTCHA challenge.
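A minimal sketch of the fix, assuming your HTTP/2 library lets you pass headers as an ordered list of pairs: reorder the pseudo-headers to the Chrome order quoted above and keep regular headers in their original sequence. The helper `order_for_profile` is hypothetical and only covers ordering and lowercase validation.

```python
# Chrome order as described above: :method, :authority, :scheme, :path
CHROME_PSEUDO_ORDER = (":method", ":authority", ":scheme", ":path")


def order_for_profile(headers, pseudo_order=CHROME_PSEUDO_ORDER):
    """Return headers with pseudo-headers first, in the profile's order."""
    pseudo = {}
    regular = []
    for name, value in headers:
        if name != name.lower():
            raise ValueError(f"HTTP/2 header names must be lowercase: {name!r}")
        if name.startswith(":"):
            pseudo[name] = value
        else:
            regular.append((name, value))  # regular headers keep caller order
    missing = [n for n in pseudo_order if n not in pseudo]
    unknown = [n for n in pseudo if n not in pseudo_order]
    if missing or unknown:
        raise ValueError(f"pseudo-header mismatch: missing={missing} unknown={unknown}")
    return [(n, pseudo[n]) for n in pseudo_order] + regular


# Input in the alphabetized order some generic libraries produce.
alphabetized = [
    (":authority", "api.target.com"),
    (":method", "GET"),
    (":path", "/v1/catalog/products?limit=100"),
    (":scheme", "https"),
    ("user-agent", "Mozilla/5.0"),
]
ordered = order_for_profile(alphabetized)
print([name for name, _ in ordered])
```

Whether the reordered list survives to the wire depends on the HTTP/2 library; some re-sort pseudo-headers internally, so it is worth verifying with a packet capture.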
// 03 — request metrics

What does a request cost?

A single HTTP request has overhead at the TCP, TLS, and application layers. DataFlirt models this overhead to optimize connection pooling and egress costs across billions of fetches.

Total Request Latency: $T = T_{dns} + T_{tcp} + T_{tls} + T_{ttfb}$
Connection reuse (Keep-Alive) eliminates the first three terms for subsequent requests. (Network fundamentals)
Egress Overhead: $O = H_{size} + TLS_{record} + TCP_{ack}$
HTTP/2 header compression (HPACK) reduces $H_{size}$ by ~80% on repeated requests. (DataFlirt egress model)
DataFlirt Concurrency Limit: $C = Target_{capacity} / (RPS_{worker} \times Pool_{size})$
Dynamic throttling based on 429 response rates and target health. (Internal scheduler SLO)
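As a worked example of the latency formula, the sketch below compares 100 sequential requests with and without connection reuse. All timings are made-up illustrative values in milliseconds, not DataFlirt measurements, and the function name `total_latency_ms` is hypothetical.

```python
def total_latency_ms(n_requests, t_dns, t_tcp, t_tls, t_ttfb, reuse_connection):
    """Total latency for n sequential requests under the model T = dns + tcp + tls + ttfb."""
    if n_requests <= 0:
        return 0
    setup = t_dns + t_tcp + t_tls
    if reuse_connection:
        # Setup is paid once; every request still pays time-to-first-byte.
        return setup + n_requests * t_ttfb
    # A fresh connection per request pays the full sum every time.
    return n_requests * (setup + t_ttfb)


# Illustrative timings only (ms): DNS 20, TCP 30, TLS 60, TTFB 90.
cold = total_latency_ms(100, 20, 30, 60, 90, reuse_connection=False)
warm = total_latency_ms(100, 20, 30, 60, 90, reuse_connection=True)
print(cold, warm)  # 20000 9110
```

With these assumed numbers, reuse removes a little over half of the total time; the real saving depends entirely on how large the setup terms are relative to TTFB for a given target.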
// 04 — wire trace

A raw HTTP/2 request frame.

What the server actually sees when a DataFlirt worker initiates a fetch. Notice the strict ordering of pseudo-headers and the HPACK compression.

HTTP/2 · HPACK · TLS 1.3
edge.dataflirt.io — live
CAPTURED
// TLS established, ALPN negotiated h2
h2.stream_id: 1
h2.frame_type: HEADERS

// Pseudo-headers (strict order required)
:method: "GET"
:authority: "api.target.com"
:scheme: "https"
:path: "/v1/catalog/products?limit=100"

// Standard headers
user-agent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)..."
accept: "application/json"
accept-encoding: "gzip, deflate, br"
accept-language: "en-US,en;q=0.9"
sec-fetch-mode: "cors"

// Outbound payload
bytes_sent: 142 // compressed via HPACK
status: 200 OK
// 05 — fingerprint leakage

Where requests betray bots.

An HTTP request leaks entropy long before JavaScript executes. These are the network-layer signals anti-bot systems use to flag naive HTTP clients.

Sample size: 8.4M requests
Window: 7d trailing
Updated: 2026-05-19

01 Header ordering (critical): Mismatch with advertised User-Agent
02 Pseudo-header casing (high): HTTP/2 strictness violations
03 Missing sec-fetch headers (high): Fetch metadata absence
04 Accept-Language anomalies (medium): Locale mismatch with IP geo
05 Connection reuse patterns (low): Mechanical timing intervals
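Several of these signals can be checked offline before a request ever leaves your machine. The sketch below is a naive self-audit, not a model of any vendor's detection logic: `EXPECTED_ORDER` is a hypothetical profile rather than a captured browser signature, and the Accept-Language check only compares the region of the first language tag with a caller-supplied country code.

```python
# Hypothetical profile order for illustration; capture a real one per browser.
EXPECTED_ORDER = [
    "user-agent", "accept", "sec-fetch-site", "sec-fetch-mode",
    "sec-fetch-dest", "accept-encoding", "accept-language",
]


def primary_region(accept_language):
    """Region subtag of the first language tag, e.g. 'en-US,en;q=0.9' -> 'US'."""
    first = accept_language.split(",")[0].split(";")[0].strip()
    parts = first.split("-")
    return parts[1].upper() if len(parts) > 1 else None


def audit_headers(headers, expected_order=EXPECTED_ORDER, ip_country=None):
    """Return (severity, message) findings for a list of (name, value) headers."""
    findings = []
    names = [name for name, _ in headers]
    lowered = [name.lower() for name in names]
    rank = {name: i for i, name in enumerate(expected_order)}
    seen = [rank[n] for n in lowered if n in rank]
    if seen != sorted(seen):
        findings.append(("critical", "header order differs from profile"))
    if names != lowered:
        findings.append(("high", "non-lowercase header name (invalid in HTTP/2)"))
    if not any(n.startswith("sec-fetch-") for n in lowered):
        findings.append(("high", "no sec-fetch-* headers present"))
    language = dict(zip(lowered, (value for _, value in headers))).get("accept-language")
    if language is None:
        findings.append(("medium", "accept-language missing"))
    elif ip_country and primary_region(language) not in (None, ip_country.upper()):
        findings.append(("medium", "accept-language region differs from IP geo"))
    return findings


naive = [("Accept", "*/*"), ("User-Agent", "python-requests")]
for severity, message in audit_headers(naive):
    print(severity, message)
```

Connection reuse timing, the fifth signal, is behavioral and cannot be judged from a single header list, so it is left out of this sketch.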
// 06 — request engine

Crafted at the byte level, because standard HTTP clients are too loud.

Standard libraries like Python's requests or Go's net/http announce themselves immediately through default header orders and predictable cipher suites. DataFlirt's request engine bypasses standard libraries, constructing HTTP/2 frames and TLS handshakes directly at the socket level. This allows us to perfectly mimic the network signature of any target browser, ensuring the request looks human before the application layer even parses it.

worker-req-09.trace

Socket-level trace of an outbound request from the DataFlirt fleet.

socket.state        established
tls.alpn            h2
tls.ja3             771,4865-4866-4867...
h2.pseudo_order     m,a,s,p      chrome-match
header.entropy      0.04         human-like
hpack.compression   active       81% ratio
waf.classification  passed

// 07 — FAQ

Common questions.

Common questions about HTTP requests, header spoofing, connection management, and how DataFlirt scales network I/O.

Why do my requests work in Postman but fail in my Python script?
Postman and Python's requests library use different underlying HTTP clients, which means they send different default headers, in a different order, with different TLS fingerprints. Anti-bot systems fingerprint the client based on these network-layer quirks. If the target expects a browser but sees a Python default signature, it drops the request.
Is it legal to spoof headers like User-Agent?
Yes, spoofing a User-Agent is generally legal and is a standard practice across the web (even browsers spoof each other for compatibility). However, bypassing technical access controls (like authenticated APIs) or ignoring ToS can carry legal risk. We spoof network signatures to prevent discriminatory blocking of legitimate public data access, not to bypass auth.
How does DataFlirt handle HTTP/2 multiplexing?
We heavily leverage HTTP/2 multiplexing to reduce TCP/TLS overhead. A single DataFlirt worker can multiplex hundreds of concurrent streams over a single connection to a target. However, we dynamically throttle stream concurrency based on the target's WAF sensitivity, as aggressive multiplexing from a single IP is a strong bot signal.
What are sec-fetch headers and why do they matter?
Fetch Metadata Request Headers (Sec-Fetch-Site, Sec-Fetch-Mode, etc.) are sent by modern browsers to tell the server the context of the request (e.g., is this a top-level navigation or an image load?). Missing or incorrect sec-fetch headers are an immediate red flag to Cloudflare and DataDome that the request did not originate from a real browser rendering engine.
Should I always use Keep-Alive for scraping?
For API scraping and surface web crawling, yes — connection reuse drops latency by 60% and reduces CPU load. However, for highly protected targets, rotating connections (and thus rotating TLS session tickets and JA3/JA4 signatures) is sometimes necessary to prevent the WAF from building a long-term behavioral profile of the session.
How many requests per second can DataFlirt generate?
Our distributed fleet can generate millions of requests per second, but raw throughput is rarely the bottleneck. The binding constraint is always the target's capacity and their anti-bot threshold. We optimize for the highest sustainable yield rate, which often means deliberately slowing down requests to maintain a 99.9% success rate rather than sprinting into a 429 block.
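The idea of slowing down to stay below a 429 threshold can be sketched as an additive-increase, multiplicative-decrease rate controller. This is a generic technique shown under assumed parameters, not DataFlirt's scheduler; the class name `AdaptiveRate` and every default value are hypothetical.

```python
class AdaptiveRate:
    """Additive-increase / multiplicative-decrease request rate controller."""

    def __init__(self, initial_rps=5.0, min_rps=0.5, max_rps=50.0,
                 increase=0.5, backoff=0.5):
        self.rps = initial_rps
        self.min_rps = min_rps
        self.max_rps = max_rps
        self.increase = increase  # added to the rate after each success
        self.backoff = backoff    # rate is multiplied by this after a 429

    def record(self, status_code):
        """Update the target rate from one response status and return it."""
        if status_code == 429:
            self.rps = max(self.min_rps, self.rps * self.backoff)
        elif 200 <= status_code < 400:
            self.rps = min(self.max_rps, self.rps + self.increase)
        # Other statuses leave the rate unchanged in this sketch.
        return self.rps

    @property
    def delay(self):
        """Seconds to wait before the next request at the current rate."""
        return 1.0 / self.rps


rate = AdaptiveRate(initial_rps=10.0)
print(rate.record(429))  # 5.0
print(rate.record(200))  # 5.5
print(rate.delay)
```

A caller would sleep for `rate.delay` between requests and feed every response status back through `record`, so the rate climbs slowly while the target is healthy and halves as soon as it pushes back.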