
What is httpx?

httpx is a fully featured HTTP client for Python 3 that provides both synchronous and asynchronous APIs, with optional HTTP/2 support. For scraping pipelines, it is a common migration path away from the requests library: it adds high-concurrency I/O and multiplexed connections without the callback-style code of older async frameworks. If your fetch layer is bottlenecked by thread context switching, moving to an async client is the natural next step.

Python · Async I/O · HTTP/2 · Network Layer · Concurrency
// 02 — definitions

Beyond synchronous fetching.

Why modern data pipelines abandon thread pools in favour of event loops, and how httpx bridges the gap between legacy code and high-throughput async I/O.


TL;DR

httpx is the modern successor to Python's requests library. It keeps a nearly identical API while adding native async/await support and optional HTTP/2 multiplexing. In scraping, this lets one single-threaded worker hold hundreds of concurrent open connections, which sharply reduces memory overhead and CPU context switching compared to thread-based concurrency.

01 Definition & core capabilities

httpx is a modern HTTP client for Python. It was designed to provide the same developer-friendly API as the ubiquitous requests library, but built from the ground up to support modern web standards. Its two defining features for scraping are async/await support and HTTP/2 support.

While you can use it synchronously (httpx.get()), its real power in data pipelines comes from httpx.AsyncClient(), which allows a single Python process to manage thousands of concurrent network requests without the overhead of OS-level threads.

02 Why async matters for scraping

Web scraping is an I/O-bound task. When you request a webpage, your CPU spends most of its time idle, waiting for the remote server to respond. In a synchronous model (like requests), the calling thread is blocked for that whole wait. To make 100 concurrent requests you need 100 threads, and each one costs memory and CPU context-switching time.

With httpx and asyncio, the event loop fires off a request and immediately moves on to the next task. When the network response arrives, the loop resumes processing that specific request. This allows massive concurrency on a single thread.
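The effect can be illustrated without any network at all. In this sketch, asyncio.sleep stands in for server latency (an assumption; no real requests are made), and 100 simulated fetches overlap on one thread, so the batch takes roughly one latency period rather than 100 of them.

```python
import asyncio
import time


async def fake_fetch(page: int, latency: float = 0.05) -> int:
    # asyncio.sleep is a stand-in for waiting on a remote server.
    await asyncio.sleep(latency)
    return page


async def fetch_all(pages: int) -> tuple:
    started = time.perf_counter()
    # gather schedules every coroutine at once; their waits overlap.
    results = await asyncio.gather(*(fake_fetch(p) for p in range(pages)))
    return list(results), time.perf_counter() - started
```

Run sequentially, 100 fetches at 50 ms each would take about 5 seconds; run through fetch_all they finish in a small fraction of that, because the loop is only ever waiting, never computing.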

03 The HTTP/2 advantage

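HTTP/1.1 needs a separate TCP connection for each concurrent request to the same domain (or suffers from head-of-line blocking if pipelining is used). httpx supports HTTP/2 as an opt-in feature: install the optional extra (pip install httpx[http2]) and create the client with http2=True. HTTP/2 then multiplexes multiple requests over a single TCP/TLS connection.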

For scraping APIs or crawling a single domain heavily, this eliminates the latency of repeated TLS handshakes and drastically reduces the number of ephemeral ports your scraping server needs to keep open. It also improves stealth, as modern browsers negotiate HTTP/2 (or HTTP/3) whenever the server offers it.

04 How DataFlirt handles it

We use async I/O extensively in our fetch layer, but we rarely use vanilla httpx for targets protected by advanced WAFs. Because httpx relies on Python's standard ssl library, its TLS fingerprint is static and easily flagged by Cloudflare or Akamai.

Instead, we wrap our async clients in a custom network layer that patches the TLS context to perfectly match the JA3/JA4 signatures of modern Chrome or Safari. This gives us the high-concurrency performance of the Python event loop combined with the stealth of a real browser.

05 The event loop blocking trap

The most common mistake engineers make when migrating to httpx is mixing async network I/O with synchronous CPU-bound work. If you fetch a page asynchronously, and then pass the HTML to BeautifulSoup (which is synchronous and CPU-heavy) inside the same event loop, the entire loop freezes while parsing.

During that freeze, no other network requests can be sent or received, destroying your concurrency. In production pipelines, network fetching (async) and data extraction (multiprocessing) must be strictly separated.
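One way to keep parsing off the loop is loop.run_in_executor. The sketch below is an assumption-laden stand-in: it counts links with the standard library's html.parser instead of BeautifulSoup, and the function names are hypothetical. The executor is passed in by the caller; for CPU-heavy parsing that would typically be a concurrent.futures.ProcessPoolExecutor, so the work runs in another process.

```python
import asyncio
from concurrent.futures import Executor
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a> tags; a lightweight stand-in for real extraction logic."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.count += 1


def count_links(html: str) -> int:
    # Synchronous, CPU-bound work: must not run directly inside the loop.
    parser = LinkCounter()
    parser.feed(html)
    return parser.count


async def parse_off_loop(html: str, executor: Executor) -> int:
    # The event loop stays free to service sockets while the executor parses.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, count_links, html)
```

The parsing function is defined at module level so that it can be pickled and sent to a worker process.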

// 03 — the concurrency model

How async I/O scales throughput.

Synchronous scrapers block the thread while waiting for network I/O. Async scrapers yield control back to the event loop. The math dictates why async clients like httpx dominate high-volume extraction.

Theoretical async throughput: $R = C_{\max} / T_{\text{latency}}$
Throughput (req/s) scales linearly with max concurrent connections, bounded only by memory and file descriptors. (Little's Law applied to event loops)

HTTP/2 multiplexing efficiency: $E = 1 - T_{\text{h2\_batch}} / \sum T_{\text{h1\_individual}}$
Time saved by sending multiple requests over a single TCP/TLS connection without head-of-line blocking. (Network performance baseline)

DataFlirt worker density: $\text{Workers} = \text{RAM}_{\text{avail}} / (\text{Mem}_{\text{interpreter}} + C \times \text{Mem}_{\text{conn}})$
Async connections cost kilobytes; threads cost megabytes. We pack 5,000+ connections per node. (DataFlirt infrastructure sizing)
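The three formulas are simple enough to evaluate directly. The sketch below plugs in made-up numbers purely to show the arithmetic; none of the input values are measurements from DataFlirt or from httpx.

```python
def throughput(max_connections: int, latency_s: float) -> float:
    # R = C_max / T_latency, in requests per second.
    return max_connections / latency_s


def multiplexing_efficiency(h2_batch_s: float, h1_individual_s: list) -> float:
    # E = 1 - T_h2_batch / sum(T_h1_individual)
    return 1 - h2_batch_s / sum(h1_individual_s)


def worker_density(ram_avail_kb: int, interpreter_kb: int,
                   connections: int, conn_kb: int) -> int:
    # Workers = RAM_avail / (Mem_interpreter + C * Mem_conn), rounded down.
    return ram_avail_kb // (interpreter_kb + connections * conn_kb)


# Hypothetical inputs: 500 connections at 250 ms average latency.
print(throughput(500, 0.25))                         # 2000.0 req/s
# Four 250 ms HTTP/1.1 requests versus one 500 ms multiplexed batch.
print(multiplexing_efficiency(0.5, [0.25] * 4))      # 0.5
# 16,000,000 KB of RAM, 60,000 KB interpreter, 500 connections at 50 KB each.
print(worker_density(16_000_000, 60_000, 500, 50))   # 188
```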
// 04 — event loop trace

Multiplexing 100 requests over a single connection.

A debug trace of an httpx.AsyncClient configured for HTTP/2, fetching a batch of JSON endpoints from a target API. Notice the single TLS handshake followed by interleaved stream IDs.

httpx 0.27.0 · asyncio · HTTP/2 ALPN
// initializing httpx.AsyncClient(http2=True)
event_loop: "asyncio.SelectorEventLoop"
connection_pool: 100 // max keepalive connections

// establishing connection to api.target.com
tcp.handshake: success 12ms
tls.alpn_negotiation: "h2" // HTTP/2 confirmed
tls.cipher: "TLS_AES_128_GCM_SHA256"

// multiplexed request dispatch
h2.stream_id: 1 method: GET /api/v1/items?page=1
h2.stream_id: 3 method: GET /api/v1/items?page=2
h2.stream_id: 5 method: GET /api/v1/items?page=3
h2.stream_id: 7 method: GET /api/v1/items?page=4

// asynchronous responses arriving out of order
recv.stream_id: 5 status: 200 OK bytes: 14,204
recv.stream_id: 1 status: 200 OK bytes: 14,198
recv.stream_id: 7 status: 200 OK bytes: 14,212
recv.stream_id: 3 status: 429 Too Many Requests // rate limit hit

pool.status: 1 connection active, 0 idle, 99 available
// 05 — failure modes

Where async clients leak and break.

httpx is powerful, but async programming introduces specific failure modes that don't exist in synchronous thread pools. These are the most common reasons httpx-based scrapers fail in production.

Pipelines analysed: 140+ async
Primary cause: TLS fingerprinting
Updated: 2026-05-19
01 Default TLS fingerprint block
anti-bot · httpx uses standard OpenSSL; easily flagged by Cloudflare

02 Connection pool exhaustion
resource · failing to close a streamed response (response.aclose()) leaks connections

03 Event loop blocking
cpu · running heavy HTML parsing (BeautifulSoup) in the async loop

04 DNS resolution bottlenecks
network · asyncio default DNS resolver struggles at >1k req/s

05 HTTP/2 strictness errors
protocol · target servers dropping imperfect h2 frames
// 06 — our fetch architecture

Async by default, but heavily patched for stealth.

At DataFlirt, we rely heavily on async I/O for our high-throughput API and surface web pipelines. However, a vanilla httpx.AsyncClient is instantly recognizable to modern WAFs due to its predictable TLS Client Hello and HTTP/2 frame settings. We wrap our async clients in a custom network layer that patches the underlying SSL context to mimic specific browser versions (Chrome, Safari) while retaining the high-concurrency benefits of the Python event loop.

df-fetch-worker-04

Live configuration of a DataFlirt async fetch worker targeting a JSON API.

client.engine: httpx v0.27.0
event_loop: uvloop (optimized)
protocol: HTTP/2 (multiplexed)
tls.spoofing: chrome_124_ja3 (active)
concurrency.limit: 500 req/worker
dns.resolver: aiodns, cached (non-blocking)
memory.overhead: 142 MB (stable)
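A per-worker concurrency limit like the one in this configuration can be enforced with an asyncio.Semaphore. The sketch below is a generic illustration, not DataFlirt's worker code: asyncio.sleep stands in for the network wait, and the function reports the peak number of in-flight tasks so the cap can be checked.

```python
import asyncio


async def run_bounded(n_tasks: int, limit: int) -> int:
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def one_request() -> None:
        nonlocal in_flight, peak
        # At most `limit` coroutines can be inside this block at once.
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the network wait
            in_flight -= 1

    await asyncio.gather(*(one_request() for _ in range(n_tasks)))
    return peak
```

With a real client, the body of one_request would await the HTTP call instead of sleeping; the semaphore keeps the worker from opening more simultaneous requests than the target or the proxy pool can absorb.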


// 07 — FAQ

Common questions.

Common questions about migrating to httpx, handling async concurrency, and bypassing anti-bot systems with Python HTTP clients.

Should I switch from requests to httpx?
Yes, if you need concurrency or HTTP/2. requests is strictly synchronous and HTTP/1.1 only. If you are scraping a few pages a minute, requests is fine. If you need to fetch 10,000 pages quickly, httpx with asyncio will use a fraction of the memory and CPU compared to a ThreadPoolExecutor running requests.
Does httpx bypass Cloudflare or DataDome?
No. Out of the box, httpx uses Python's standard ssl module, which produces a highly predictable TLS fingerprint (JA3/JA4) that identifies the client as a Python script. To get past advanced WAFs, you need a client that can impersonate a browser's TLS handshake (for example curl_cffi), or you must route the traffic through a proxy network that handles TLS spoofing at the edge.
Why is HTTP/2 important for scraping?
Two reasons: performance and stealth. Performance-wise, HTTP/2 allows multiplexing: multiple requests share a single TCP connection simultaneously, which removes HTTP-level head-of-line blocking. Stealth-wise, modern browsers default to HTTP/2. If your scraper connects to an HTTP/2-enabled server using HTTP/1.1, that mismatch is an anomaly that can raise your bot score.
Why does my httpx scraper hang or freeze?
Usually, it's connection pool exhaustion. If you open a streamed response but don't close it, or if an exception occurs before it is closed, the connection stays checked out from the pool. A plain await client.get(...) reads the body and releases the connection for you; for streamed responses, use the context manager (async with client.stream("GET", url) as response:) or explicitly call await response.aclose().
How does httpx compare to aiohttp?
Both are excellent async clients. aiohttp is older, often slightly faster in raw micro-benchmarks, and includes a web server framework. httpx has a friendlier API (almost identical to requests), native synchronous support, and HTTP/2 support, which aiohttp lacks. For modern scraping scripts, httpx is generally the preferred choice for developer ergonomics.
How does DataFlirt scale async fetching?
We deploy containerised async workers using uvloop (a faster drop-in replacement for the standard asyncio event loop) and custom DNS resolvers. A single worker can sustain thousands of concurrent connections. We decouple the fetch layer from the extraction layer: the async loop only handles network I/O and immediately offloads CPU-heavy HTML parsing to separate processes to prevent event loop blocking.