
What is Request Throttling?

Request throttling is the deliberate pacing of outbound HTTP requests from a scraper to avoid triggering target rate limits, IP bans, or anti-bot classifiers. Unlike reactive backoff, which kicks in after a 429 Too Many Requests response, throttling is proactive. It shapes traffic to resemble human browsing patterns or to stay strictly below a known server threshold, so the pipeline stays under detection thresholds and the target infrastructure remains stable.

Traffic Shaping · Concurrency · HTTP 429 · Politeness · Token Bucket
// 02 — definitions

Pacing the pipeline.

How to extract data at scale without looking like a volumetric attack to the target's edge firewall.


TL;DR

Request throttling controls the velocity of a scraping pipeline. By enforcing delays between requests and capping concurrent connections per IP or session, it reduces HTTP 429 errors and soft blocks. Production pipelines use dynamic throttling, adjusting speed in real time based on target latency and response codes.

01 Definition & structure

Request throttling is the mechanism by which a scraping pipeline intentionally limits its own throughput. Instead of firing HTTP requests as fast as the CPU and network allow, the scraper paces them according to a predefined strategy.

This is typically implemented with a token bucket or leaky bucket algorithm. In a token bucket, workers must acquire a "token" before dispatching a request; a leaky bucket instead releases queued requests at a fixed rate. Throttling reduces the risk of IP bans, lowers the likelihood of triggering behavioral anti-bot challenges, and keeps the scraper a polite network citizen.
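
As a rough illustration of the token bucket idea, here is a minimal single-process sketch. The class name, its parameters, and the injectable clock are assumptions made for this example, not part of any particular library or of DataFlirt's implementation.

```python
import time


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)    # start full so an initial burst is allowed
        self.clock = clock               # injectable so the bucket can be tested
        self.updated = clock()

    def _refill(self):
        # Add the tokens earned since the last check, capped at capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed)
        self.updated = now

    def try_acquire(self):
        """Take one token if available; return False without blocking otherwise."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def acquire(self, sleep=time.sleep):
        """Block until a token is available, then take it."""
        while not self.try_acquire():
            # Wait roughly as long as the missing fraction of a token takes to refill.
            sleep((1.0 - self.tokens) / self.rate)
```

A worker would call `acquire()` immediately before dispatching each request, so the long-run rate settles at `rate` while short bursts of up to `capacity` requests remain possible.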

02 Static vs. Dynamic Throttling

Static throttling uses fixed delays (e.g., time.sleep(2)). It is easy to implement but highly inefficient, as it doesn't account for network latency or server capacity. It is also easily detected by WAFs looking for perfectly uniform request intervals.

Dynamic throttling adjusts the request rate in real time based on feedback from the target. If response times increase or 429 status codes are returned, the request rate decreases. When the server is responding quickly, the rate increases again, up to a defined ceiling.
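
One common way to express this feedback loop is additive increase, multiplicative decrease (AIMD). The sketch below is an illustration under assumed values: the class name, the 0.5 req/s step, the halving factor, and the 800 ms latency threshold are all hypothetical choices, not a description of any specific scheduler.

```python
class AdaptiveRate:
    """Additive-increase / multiplicative-decrease controller for a request rate."""

    def __init__(self, start, floor, ceiling, step=0.5, backoff=0.5, slow_ms=800.0):
        self.rate = start        # current target in requests per second
        self.floor = floor       # never drop below this rate
        self.ceiling = ceiling   # never exceed this rate
        self.step = step         # req/s added after each healthy response
        self.backoff = backoff   # factor applied after a sign of stress
        self.slow_ms = slow_ms   # latency above this counts as stress

    def observe(self, status, latency_ms):
        """Update and return the rate from one response's status and latency."""
        stressed = status in (429, 503) or latency_ms > self.slow_ms
        if stressed:
            self.rate = max(self.floor, self.rate * self.backoff)
        else:
            self.rate = min(self.ceiling, self.rate + self.step)
        return self.rate
```

The asymmetry is deliberate: the rate climbs slowly while the target looks healthy and drops sharply at the first sign of stress.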

03 The role of concurrency

Throttling is fundamentally tied to concurrency. A single-threaded scraper naturally throttles itself by waiting for the previous response before sending the next request. However, modern pipelines use asynchronous I/O or distributed worker pools to process thousands of URLs concurrently.

In a distributed setup, local throttling is insufficient. If 100 workers each throttle to 1 request per second, the target still sees 100 requests per second. Global throttling via a centralized state store (like Redis) is required to manage the aggregate velocity.
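
To make the aggregate effect concrete, the sketch below has 100 workers share one counter and shows that only the global limit is granted. It uses a fixed one-second window rather than a token bucket to keep the example short, and `InMemoryStore` is a hypothetical in-process stand-in for a shared store such as Redis (where an atomic increment plus a key expiry would play the same role).

```python
import threading


class InMemoryStore:
    """Stand-in for a shared store such as Redis; only an atomic counter is needed."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def incr(self, key):
        # Atomically increment and return the counter for `key`.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


def allow(store, domain, limit_per_sec, now):
    """Fixed-window check: at most `limit_per_sec` requests per domain per second."""
    window = int(now)  # one counter per whole second
    count = store.incr(f"throttle:{domain}:{window}")
    return count <= limit_per_sec


# 100 workers each try to send one request within the same second.
store = InMemoryStore()
results = []
threads = [
    threading.Thread(target=lambda: results.append(allow(store, "example.com", 15, 1000.2)))
    for _ in range(100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
granted = sum(results)  # number of attempts allowed out of 100
```

Because every worker consults the same counter, the target sees at most the global limit in that window regardless of how many workers are running.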

04 How DataFlirt handles it

We treat pipeline velocity as a dynamic variable, not a static config. Our orchestration layer uses a distributed token bucket algorithm backed by Redis. Every target domain has a dedicated bucket.

Our scheduler continuously monitors the p95 latency and error rates for each target. If we detect signs of server stress, the token refill rate is automatically reduced. We also inject randomized jitter into token consumption so that our traffic does not show the uniform intervals that behavioral classifiers look for.

05 The "Crawl-delay" misconception

Many developers assume that if a site's robots.txt specifies Crawl-delay: 10, they simply need to sleep for 10 seconds between requests in each worker. This misunderstands both the directive and how modern WAFs enforce limits.

Crawl-delay is a non-standard robots.txt directive addressed to a crawler as a whole, so it describes the aggregate pace of your entire bot identity, not of each worker. A WAF, by contrast, typically enforces its own limits per IP address or per session cookie, so traffic routed through a proxy pool is judged at that scope. Throttling must be applied at the correct scope (usually per IP or per session) to stay under volumetric detection.
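
A minimal sketch of scope-aware pacing follows, assuming a simple minimum gap per scope key. The `ScopedDelay` class and the key names (`"ip-a"`, `"ip-b"`) are hypothetical; a key could equally be a session ID or a single shared key for the whole bot identity.

```python
from collections import defaultdict


class ScopedDelay:
    """Enforce a minimum gap between requests that share the same scope key."""

    def __init__(self, min_gap_s):
        self.min_gap_s = min_gap_s
        self.next_ok = defaultdict(float)  # scope key -> earliest allowed send time

    def wait_time(self, scope, now):
        """Return seconds to wait before sending on `scope`, and reserve the slot."""
        start = max(now, self.next_ok[scope])
        self.next_ok[scope] = start + self.min_gap_s
        return start - now


delays = ScopedDelay(min_gap_s=10.0)
first = delays.wait_time("ip-a", now=0.0)   # first request on this IP goes immediately
second = delays.wait_time("ip-a", now=1.0)  # same IP must wait out the rest of the gap
other = delays.wait_time("ip-b", now=1.0)   # a different IP has its own schedule
```

To honor an aggregate limit and a per-IP limit at once, a worker could query one instance keyed globally and another keyed per IP, then wait for the larger of the two values.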

// 03 — traffic shaping

How fast is too fast?

Throttling isn't just about hard limits; it's about queueing theory and probability. DataFlirt's scheduler calculates optimal request rates per target, balancing extraction speed against detection risk.

Effective Request Rate: $R_{\text{eff}} = \dfrac{C}{T_{\text{wait}} + T_{\text{latency}}}$
$C$ is concurrency. High latency naturally throttles synchronous workers. (Source: Standard Queueing Theory)

Token Bucket Capacity: $T_c \leftarrow \min(M,\; T_c + R \cdot \Delta t)$
$M$ is max burst, $R$ is refill rate. Used for distributed rate control. (Source: Network Traffic Shaping)

DataFlirt Backoff Multiplier: $B_n = T_{\text{base}} \cdot 2^{n} + \text{Jitter}$
Exponential backoff with randomized jitter to prevent thundering herds. (Source: DataFlirt Scheduler SLO)

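
The three formulas can be checked with a few lines of arithmetic. The function names and sample numbers below are illustrative assumptions, and the jitter is assumed to be drawn uniformly from a small range, which the formulas themselves do not specify.

```python
import random


def effective_rate(concurrency, t_wait_s, t_latency_s):
    """R_eff = C / (T_wait + T_latency), in requests per second."""
    return concurrency / (t_wait_s + t_latency_s)


def refill(tokens, max_burst, refill_rate, dt_s):
    """T_c <- min(M, T_c + R * dt): tokens after dt_s seconds of refilling."""
    return min(max_burst, tokens + refill_rate * dt_s)


def backoff_delay(t_base_s, attempt, max_jitter_s, rng=random):
    """B_n = T_base * 2**n + jitter, with jitter uniform in [0, max_jitter_s]."""
    return t_base_s * 2 ** attempt + rng.uniform(0.0, max_jitter_s)


print(effective_rate(10, 0.75, 0.25))  # 10 workers, each cycle takes 1 s in total
print(refill(3.0, 15.0, 12.0, 0.5))    # 3 + 12 * 0.5 = 9 tokens
print(refill(10.0, 15.0, 12.0, 1.0))   # would be 22, capped at the max burst of 15
print(backoff_delay(1.0, 3, 0.5))      # between 8.0 and 8.5 seconds
```

Note how the first formula implies that adding workers raises the rate linearly unless the wait time is increased to compensate.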
// 04 — pipeline execution

Hitting the ceiling, backing off gracefully.

A distributed worker pool hits a target's WAF rate limit. The scheduler detects a latency spike followed by a 429 and immediately throttles the global token bucket to recover the session.

Token Bucket · Jitter · HTTP 429
// worker pool initialization
target: "api.target-retail.com/v2/catalog"
throttle.strategy: "dynamic_token_bucket"
throttle.rate: 15.0 req/s

// execution trace
[00:12:01] req_rate: 14.8 req/s p95_latency: 240ms 200 OK
[00:12:05] req_rate: 15.0 req/s p95_latency: 890ms latency spike
[00:12:06] response: 429 Too Many Requests retry-after: 10s

// reactive throttling engaged
event: circuit_breaker_tripped
action: pause_queue(10s)
throttle.rate_adjusted: 5.0 req/s // backoff applied

// recovery
[00:12:16] req_rate: 4.9 req/s p95_latency: 210ms 200 OK
status: pipeline stabilized
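
The pause in the trace above is driven by the server's Retry-After header, which in HTTP carries either a whole number of seconds or an HTTP date. The helper below is a hedged sketch of turning that header into a pause length; the function name and the 10-second fallback are assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None, default=10.0):
    """Convert a Retry-After header value into a non-negative pause in seconds."""
    if value is None:
        return default
    value = value.strip()
    if value.isascii() and value.isdigit():  # delay-seconds form, e.g. "10"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default                       # unparseable: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A scheduler could pause its queue for the returned duration and then resume at a reduced rate, as the trace does when it drops from 15.0 to 5.0 req/s.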
// 05 — bottleneck sources

What forces you to throttle.

The constraints that dictate a pipeline's maximum safe velocity. Ignoring these triggers automated defenses or degrades target performance.

AVG THROTTLE RATE · 2–15 req/s
429 RECOVERY · 99.4% success
UPDATED · 2026-05-19

01 WAF Rate Limits
hard constraint · Cloudflare/Akamai token buckets per IP or session

02 Anti-Bot Heuristics
soft constraint · High velocity makes a session more likely to be scored as a bot

03 Target Server Capacity
infrastructure · Latency spikes indicate backend database stress

04 Proxy Pool Exhaustion
operational · Rotating too fast burns through clean residential IPs

05 robots.txt Crawl-delay
compliance · Explicit publisher requests for traffic pacing

// 06 — our scheduler

Smooth the curve, never spike the edge.

DataFlirt's distributed scheduler uses a global token bucket algorithm across all worker nodes. Instead of static sleep statements, workers request execution tokens from a centralized Redis queue. If a target's latency increases, which is a leading indicator of server stress, the token generation rate automatically drops. This closed-loop feedback lets us maximize throughput while staying below the threshold of abusive traffic patterns.

throttle.config.json

Dynamic throttling parameters for a high-volume catalog pipeline.

strategy: global_token_bucket
base_rate: 12.0 req/s (baseline)
max_burst: 15 req (capped)
latency_monitor: enabled (p95 > 800ms triggers backoff)
jitter_variance: 0.2 (humanized)
429_handling: exponential_backoff (max_retries: 5)


// 07 — FAQ

Common questions.

Common questions about traffic shaping, rate limits, and how DataFlirt manages pipeline velocity at scale.

What is the difference between throttling and rate limiting?
Rate limiting is what the server does to protect itself (e.g., returning a 429 status code). Throttling is what the client (your scraper) does to avoid hitting that rate limit. Throttling is proactive traffic shaping; rate limiting is a reactive server defense.
How do you calculate the right delay between requests?
Avoid a static delay: fixed intervals are easy for anti-bot systems to fingerprint. Instead, use a base rate derived from the target's capacity or robots.txt, and apply randomized jitter (e.g., ±20%). For distributed crawls, use a token bucket algorithm to enforce a global maximum rate across all concurrent workers.
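
As a small illustration of the jitter described in this answer, the sketch below varies each delay uniformly within ±20% of the base interval. The function name and the uniform distribution are assumptions; other distributions may look more natural for a given target.

```python
import random


def jittered_delay(base_rate, variance=0.2, rng=random):
    """Return a delay in seconds around 1/base_rate, varied by +/- `variance`."""
    base_delay = 1.0 / base_rate                        # e.g. 2 req/s -> 0.5 s
    factor = rng.uniform(1.0 - variance, 1.0 + variance)
    return base_delay * factor


# At 2 req/s with 20% variance, every delay falls between 0.4 s and 0.6 s.
sample_delays = [jittered_delay(2.0) for _ in range(5)]
```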
Will request throttling prevent fingerprinting blocks?
No. Throttling addresses volumetric detection, not identity detection. If your TLS fingerprint (JA3/JA4) or browser canvas hash is flagged as a bot, you can be blocked on your very first request, regardless of how slowly you send it. Throttling only protects valid identities from being burned by velocity.
How does DataFlirt handle dynamic or undocumented rate limits?
We use latency as a leading indicator. Before a server throws a 503 or 429, its response times usually degrade. Our scheduler monitors the p95 latency of successful requests. If latency spikes by more than 40% over the baseline, the scheduler automatically throttles the token bucket rate until performance stabilizes.
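
The 40% rule in this answer can be sketched as a comparison of two p95 values. This is an illustrative reading only: the function names are hypothetical, and how the baseline and recent windows are chosen is an assumption left to the caller.

```python
import statistics


def p95(latencies_ms):
    """95th percentile of a list of latencies (needs at least two samples)."""
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(latencies_ms, n=20)[-1]


def latency_spiked(baseline_ms, recent_ms, threshold=0.40):
    """True when the recent p95 exceeds the baseline p95 by more than `threshold`."""
    return p95(recent_ms) > p95(baseline_ms) * (1.0 + threshold)
```

Only latencies of successful requests should be fed in, matching the answer above, since error responses often return quickly and would mask the slowdown.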
Is it legally required to respect the robots.txt Crawl-delay?
In most jurisdictions, robots.txt is a convention, not a law. However, ignoring a Crawl-delay can violate a platform's Terms of Service and is a fast way to get your IP pool banned. Operationally, respecting the delay (or using it as a baseline for your throttling logic) is the sustainable way to run a long-term pipeline.
How does concurrency interact with throttling?
Concurrency multiplies your effective rate. If you have a 1-second sleep between requests, but run 50 concurrent threads, you are sending 50 requests per second. True throttling must be managed globally across the entire worker pool, not just locally within a single thread.