
What is Worker Concurrency?

Worker concurrency is the number of parallel execution threads, processes, or asynchronous tasks actively fetching and parsing data within a single scraping job. It dictates your pipeline's throughput ceiling and its memory footprint. Push it too high, and you trigger target rate limits or exhaust local RAM. Keep it too low, and your data freshness guarantees slip. Tuning concurrency is the primary lever for balancing extraction speed against infrastructure cost and target stability.

Throughput · Async I/O · Rate Limiting · Resource Allocation · Distributed Systems
// 02 — definitions

Parallel execution.

The mechanics of running multiple fetch and parse operations simultaneously, and why more threads do not always equal faster pipelines.


TL;DR

Worker concurrency defines how many simultaneous requests a scraper maintains in flight. While asynchronous I/O allows thousands of concurrent network connections, CPU-bound parsing and target-side rate limits usually constrain effective concurrency to much lower numbers. Production pipelines dynamically scale concurrency based on target response times and proxy pool health.

01 Definition & structure

A worker is a single unit of execution responsible for taking a URL from a queue, fetching the payload, parsing the data, and storing the result. Worker concurrency is the total number of these units operating simultaneously.

Depending on the language and framework, workers are implemented as OS threads, separate processes, or asynchronous coroutines. Async tasks are the most lightweight, allowing a single CPU core to juggle thousands of concurrent network requests while waiting for I/O responses.
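The queue-driven worker described above can be sketched with `asyncio`. This is a minimal illustration, not DataFlirt's implementation: `fake_fetch`, `worker`, and `run_pool` are hypothetical names, and the fetch is simulated with a short sleep so the example runs without network access.

```python
import asyncio


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP call; sleeps to simulate network wait."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def worker(queue: asyncio.Queue, results: list) -> None:
    """One worker: take a URL from the queue, fetch, parse, store, repeat."""
    while True:
        url = await queue.get()
        try:
            payload = await fake_fetch(url)
            results.append((url, len(payload)))  # trivial "parse" and "store"
        finally:
            queue.task_done()


async def run_pool(urls: list, concurrency: int) -> list:
    """Run `concurrency` workers until every queued URL has been processed."""
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results: list = []
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(concurrency)]
    await queue.join()  # returns once task_done() was called for every URL
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Calling `asyncio.run(run_pool(urls, 5))` processes the list with five coroutines in flight; the `concurrency` argument is the number this page is about.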

02 I/O bound vs CPU bound

Scraping is fundamentally an I/O-bound task. A worker spends most of its lifecycle (roughly 95% in a typical HTTP job) waiting for DNS resolution, TLS handshakes, and HTTP responses, which is why high concurrency is possible. The moment the payload arrives, however, the work becomes CPU-bound: parsing JSON, executing XPath, or rendering the DOM. If you run 1,000 concurrent async workers and 200 responses arrive in the same instant, the core running the event loop saturates, and every other coroutine on that loop waits until parsing finishes.
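One common way to keep parsing from stalling the event loop is to hand the CPU-bound step to an executor. The sketch below assumes a simulated `fetch` and a toy `parse_payload`; both names are hypothetical. The executor is passed in as a parameter: a `ThreadPoolExecutor` is enough to keep the loop responsive in a demo, while a `ProcessPoolExecutor` is the usual choice in CPython when parsing has to run on several cores.

```python
import asyncio
import json
from concurrent.futures import Executor, ThreadPoolExecutor


def parse_payload(raw: str) -> int:
    """CPU-bound step: decode the JSON body and count its items."""
    return len(json.loads(raw)["items"])


async def fetch(url: str) -> str:
    """Hypothetical stand-in for network I/O."""
    await asyncio.sleep(0.01)
    return json.dumps({"items": list(range(100))})


async def fetch_and_parse(url: str, executor: Executor) -> int:
    raw = await fetch(url)  # I/O-bound: the coroutine yields while waiting
    loop = asyncio.get_running_loop()
    # CPU-bound: runs in the executor so other fetches keep making progress
    return await loop.run_in_executor(executor, parse_payload, raw)


async def crawl(n: int, executor: Executor) -> list:
    """Fetch and parse n simulated URLs concurrently."""
    return await asyncio.gather(
        *(fetch_and_parse(f"https://example.com/{i}", executor) for i in range(n))
    )
```

The executor's `max_workers` then becomes a second, independent concurrency limit: network concurrency can stay high while parse concurrency is sized to the available cores.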

03 The bottleneck shift

As you scale concurrency, the bottleneck moves through your infrastructure stack. As a rough illustration: at 10 workers, the limit is your code. At 100 workers, the limit is your CPU parsing speed. At 500 workers, the limit is your proxy provider's connection pool. At 1,000 workers, the limit is the target server's rate-limiting firewall. The exact thresholds vary by job, so effective concurrency tuning requires identifying where the current bottleneck sits and scaling just below it.
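Put differently, the usable worker count is the smallest of the per-layer ceilings, minus some headroom. A minimal sketch, assuming you have already measured each ceiling; the layer names, the numbers, and the 90% headroom are illustrative assumptions, not measurements:

```python
def effective_concurrency(limits: dict, headroom_pct: int = 90) -> tuple:
    """Return (bottleneck name, worker count to run).

    The smallest ceiling is the bottleneck; we run at headroom_pct percent
    of it so the pool sits just below the limit rather than on it.
    """
    name = min(limits, key=limits.get)
    return name, limits[name] * headroom_pct // 100


# Hypothetical ceilings, loosely mirroring the illustration in the text
limits = {"cpu_parsing": 100, "proxy_pool": 500, "target_rate_limit": 1000}
bottleneck, workers = effective_concurrency(limits)
```

Raising any ceiling other than the smallest one has no effect on throughput, which is why identifying the bottleneck comes before adding hardware or proxies.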

04 How DataFlirt handles it

We do not use static worker counts. Our extraction clusters use an elastic concurrency controller that monitors three signals: target latency, proxy error rates, and local CPU utilization. If all three are green, the controller spawns more workers. If target latency rises more than 20% above the baseline, the controller stops adding workers. If HTTP 429s or proxy timeouts occur, it aggressively scales down. This lets us extract data as fast as the target allows without burning IPs or crashing targets.
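The decision rules above can be written as a small pure function. This is a sketch of the logic as described, not the production controller: the `Signals` fields, the action names, and the 85% CPU ceiling are assumptions made for illustration. Only the 20% latency threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    latency_ms: float            # current target latency
    baseline_latency_ms: float   # latency observed under light load
    backpressure_errors: int     # HTTP 429s plus proxy timeouts in the window
    cpu_utilization: float       # 0.0 to 1.0


def decide(s: Signals, cpu_ceiling: float = 0.85) -> str:
    """Map the three signals to one of: scale_down, hold, scale_up."""
    if s.backpressure_errors > 0:
        return "scale_down"      # hard signal: back off aggressively
    if s.latency_ms > 1.2 * s.baseline_latency_ms:
        return "hold"            # latency more than 20% over baseline
    if s.cpu_utilization >= cpu_ceiling:
        return "hold"            # local CPU is not green (assumed ceiling)
    return "scale_up"            # all three signals are green
```

Checking errors first means a single 429 overrides otherwise healthy latency and CPU readings.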

05 The headless memory trap

When transitioning from HTTP requests to headless browsers (Playwright/Puppeteer), engineers often forget to adjust concurrency. A machine that easily handles 500 concurrent `httpx` requests will instantly crash with an Out-Of-Memory error if asked to open 500 Chrome tabs. Headless concurrency must be strictly bounded by available RAM, typically capped at 1 tab per 150MB of free memory.
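A simple guard is to size a semaphore from free memory before any tab is opened. The sketch below takes the 150MB-per-tab figure from the text and simulates the tab with a sleep; `max_tabs`, `render`, and `render_all` are hypothetical names, and a real job would open a Playwright or Puppeteer page where the sleep is.

```python
import asyncio

TAB_MB = 150  # per-tab estimate from the text


def max_tabs(free_mb: int, tab_mb: int = TAB_MB) -> int:
    """Tabs that fit in free memory (at least one, so the job can progress)."""
    return max(1, free_mb // tab_mb)


async def render(url: str, sem: asyncio.Semaphore, stats: dict) -> None:
    async with sem:  # blocks while the cap is reached
        stats["open"] += 1
        stats["peak"] = max(stats["peak"], stats["open"])
        await asyncio.sleep(0.01)  # stand-in for opening and rendering a tab
        stats["open"] -= 1


async def render_all(urls: list, free_mb: int) -> int:
    """Render every URL and return the peak number of simultaneously open tabs."""
    sem = asyncio.Semaphore(max_tabs(free_mb))
    stats = {"open": 0, "peak": 0}
    await asyncio.gather(*(render(u, sem, stats) for u in urls))
    return stats["peak"]
```

With 600MB free, the cap is four tabs, so twenty queued URLs are rendered four at a time instead of all at once.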

// 03 — the math

How many workers can you run?

Theoretical concurrency is bounded by local resources, but effective concurrency is bounded by the target and the network. DataFlirt's scheduler calculates these limits dynamically to prevent pipeline stalls.

Little's Law (throughput): $L = \lambda \times W$
Workers ($L$) = target RPS ($\lambda$) × average latency ($W$). (Queueing theory)

Memory bound: $C_{\max} = (\mathrm{RAM}_{\text{total}} - \mathrm{RAM}_{\text{os}}) / \mathrm{RAM}_{\text{worker}}$
Hard limit for headless browsers. Chrome needs ~150MB per tab. (Infrastructure planning)

Target rate limit: $C_{\text{safe}} = \mathrm{RPS}_{\text{allowed}} \times \mathrm{Latency}_{\text{avg}}$
If a target allows 5 req/s and latency is 2s, max safe concurrency is 10. (DataFlirt AIMD scheduler)
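The three limits above reduce to two small functions, since the rate-limit bound is Little's Law applied to the target's allowed rate. A worked sketch follows; the function names and the 2GB reserved for the OS are assumptions for illustration.

```python
def littles_law_workers(rps: float, avg_latency_s: float) -> float:
    """Workers in flight needed to sustain `rps` at the given average latency."""
    return rps * avg_latency_s


def memory_bound(ram_total_mb: int, ram_os_mb: int, ram_worker_mb: int) -> int:
    """Largest whole number of workers that fits in memory left after the OS."""
    return (ram_total_mb - ram_os_mb) // ram_worker_mb


# Target allows 5 req/s at 2s latency: the example from the text
safe = littles_law_workers(5, 2)

# 16GB machine, 2GB assumed reserved for the OS, 150MB per Chrome tab
tabs = memory_bound(16 * 1024, 2 * 1024, 150)

# The usable worker count is the tighter of the two bounds
usable = min(safe, tabs)
```

Here the target's rate limit (10 workers) binds long before memory does (95 tabs), which matches the common case where the external limit is hit first.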
// 04 — worker pool trace

Scaling workers under backpressure.

A live trace of an async worker pool adjusting concurrency dynamically as target latency spikes and 429 Too Many Requests errors appear.

asyncio · dynamic scaling · backpressure
// init worker pool
pool.target: "api.retail-target.com"
concurrency.initial: 10
strategy: "AIMD (Additive Increase, Multiplicative Decrease)"

// ramp up phase
t=05s workers: 10 rps: 12.5 latency: 800ms OK
t=15s workers: 20 rps: 24.0 latency: 830ms OK
t=25s workers: 40 rps: 45.2 latency: 880ms OK

// target degradation detected
t=35s workers: 60 rps: 52.1 latency: 1150ms
t=38s status: HTTP 429 Too Many Requests (x3)

// backoff and stabilize
event: "backpressure_triggered"
concurrency.adjusted: 30 // halved
t=45s workers: 30 rps: 34.5 latency: 870ms STABLE
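The figures in the trace can be sanity-checked against Little's Law: at each sample, throughput should be close to workers divided by latency. The sketch below copies the five samples from the trace; the function names are hypothetical.

```python
# (workers, latency in seconds, observed rps), copied from the trace above
TRACE = [
    (10, 0.800, 12.5),
    (20, 0.830, 24.0),
    (40, 0.880, 45.2),
    (60, 1.150, 52.1),
    (30, 0.870, 34.5),
]


def predicted_rps(workers: int, latency_s: float) -> float:
    """Little's Law rearranged: throughput = workers in flight / latency."""
    return workers / latency_s


def max_deviation(trace: list) -> float:
    """Largest gap between predicted and observed rps across all samples."""
    return max(abs(predicted_rps(w, lat) - rps) for w, lat, rps in trace)


def marginal_gain(trace: list, i: int) -> float:
    """Extra rps gained per extra worker between sample i-1 and sample i."""
    (w0, _, r0), (w1, _, r1) = trace[i - 1], trace[i]
    return (r1 - r0) / (w1 - w0)
```

The marginal gain is the useful signal: going from 20 to 40 workers adds roughly one request per second per worker, while going from 40 to 60 adds about a third of that, because the extra workers mostly inflate latency.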
// 05 — concurrency limits

What breaks when concurrency spikes.

The failure modes that emerge when you push worker counts past the system's natural bottlenecks. Ranked by frequency across DataFlirt's infrastructure.

Pipelines monitored: 850+ active
Avg concurrency: 45 workers
Updated: 2026-05-19
01 Target rate limits (429s)
external limit · Target server actively rejects the request volume

02 Proxy pool exhaustion
network limit · Running out of clean IPs for the target domain

03 Local memory OOM
hardware limit · Headless browser tabs consume available RAM

04 CPU starvation
hardware limit · Parsing JSON/HTML lags behind network I/O

05 Ephemeral port exhaustion
os limit · TCP stack runs out of available local ports
// 06 — DataFlirt's scheduler

Elastic worker pools, scaling to the exact limit of the target.

Static concurrency is a liability. If the target slows down, static workers pile up connections, spike latency, and trigger blocks. DataFlirt uses an elastic concurrency model governed by an additive-increase/multiplicative-decrease (AIMD) algorithm. We push worker counts up until we detect latency degradation or proxy backpressure, then back off immediately. This ensures maximum throughput without crossing the target's defensive thresholds.
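The AIMD rule itself is only a few lines. The class below is a generic sketch of additive increase and multiplicative decrease, not DataFlirt's scheduler; the class name, step size, floor, and ceiling are assumptions, and only the halving factor is taken from the page.

```python
class AIMDController:
    """Additive-increase / multiplicative-decrease concurrency limit."""

    def __init__(self, initial: int = 10, step: int = 10, factor: float = 0.5,
                 floor: int = 1, ceiling: int = 1000) -> None:
        self.value = initial
        self.step = step        # workers added per healthy interval
        self.factor = factor    # multiplier applied on backpressure
        self.floor = floor
        self.ceiling = ceiling

    def on_healthy(self) -> int:
        """No latency degradation or errors: probe upward by a fixed step."""
        self.value = min(self.ceiling, self.value + self.step)
        return self.value

    def on_backpressure(self) -> int:
        """429s, proxy timeouts, or latency degradation: cut multiplicatively."""
        self.value = max(self.floor, int(self.value * self.factor))
        return self.value
```

Growing slowly and shrinking quickly is the point of the asymmetry: an overshoot costs blocked IPs, while an undershoot only costs a few seconds of throughput.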

worker-pool.metrics

Live telemetry from a distributed scraping job on a retail catalog.

job.id: retail-sync-092
workers.active: 128 (optimal)
throughput.rps: 142.5 req/s
latency.p95: 890ms
memory.utilization: 64% (healthy)
proxy.backpressure: 0.02%
scheduler.state: cruising


// 07 — FAQ

Common questions.

About threading, async I/O, headless browser limits, and how DataFlirt manages parallel execution at scale.

Should I use threads, processes, or async I/O for scraping?
Async I/O is the standard for network-bound scraping. A single process can handle thousands of concurrent async connections with minimal memory overhead. Use multiprocessing only when your bottleneck is CPU-bound parsing (like heavy BeautifulSoup or lxml operations). In standard CPython, threads release the GIL while blocked on network I/O, so they do work for fetching, but each thread costs more than a coroutine and the GIL prevents Python-level parsing from running in parallel. Threads work well for both in Go or Java.
How many headless browser tabs can I run concurrently?
Far fewer than HTTP requests. A single Playwright or Puppeteer context consumes 100MB to 200MB of RAM, so a server with 16GB of RAM typically supports 50 to 80 concurrent tabs, leaving headroom for the OS, before risking Out-Of-Memory (OOM) crashes. Always open new browser contexts inside one running browser rather than launching a separate browser process per task; this saves memory.
Does increasing concurrency always increase throughput?
No. Throughput follows Little's Law. If you increase concurrency but the target server slows down (latency increases), your actual Requests Per Second (RPS) will plateau or even drop. Pushing concurrency past the target's capacity just results in timeouts, 429s, and wasted proxy bandwidth.
How does DataFlirt handle 429 Too Many Requests errors?
We treat 429s as a hard backpressure signal. Our scheduler immediately halves the concurrency for that target domain and initiates an exponential backoff. Once the 429s clear, the scheduler slowly probes the limit again using an additive increase strategy to find the new safe ceiling.
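The exponential backoff half of that answer can be sketched as a delay function. The page does not state the base delay, the cap, or whether jitter is used, so the 1-second base, 60-second cap, and "full jitter" randomization below are assumptions chosen for illustration.

```python
import random


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 60.0,
                  rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles with each attempt up to cap_s; the actual delay is
    drawn uniformly below the ceiling so retries from many workers spread out.
    """
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return ceiling * rng()
```

Passing `rng=lambda: 1.0` disables the jitter, which is convenient for inspecting the raw doubling schedule: 1s, 2s, 4s, 8s, and so on up to the cap.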
What is the impact of proxy latency on concurrency?
High proxy latency forces you to run higher concurrency to maintain the same throughput. If your target RPS goal is 10, and proxy latency is 1 second, you need 10 workers. If proxy latency jumps to 5 seconds, you need 50 workers to maintain that same 10 RPS. This increases local memory usage and connection tracking overhead.
How do you prevent ephemeral port exhaustion?
When running thousands of concurrent requests from a single machine, the OS can run out of local TCP ports. Port numbers are 16-bit, so the hard ceiling is 65,535 per source address and destination endpoint, and the default ephemeral range on Linux is smaller still (32768 to 60999). We mitigate this by using connection pooling (keep-alive), tuning how the OS handles sockets in the TCP TIME_WAIT state, and distributing high-concurrency jobs across multiple Kubernetes pods.
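On Linux, the port budget can be read from `/proc/sys/net/ipv4/ip_local_port_range`, which holds the low and high ends of the ephemeral range. A minimal sketch for checking it before sizing a job; the function names are hypothetical, and the file only exists on Linux.

```python
from pathlib import Path
from typing import Optional

PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")  # Linux only


def ephemeral_port_count(range_text: str) -> int:
    """Number of ports in a 'low high' range string, both ends inclusive."""
    low, high = (int(part) for part in range_text.split())
    return high - low + 1


def local_port_budget() -> Optional[int]:
    """Ephemeral ports available on this machine, or None if not on Linux."""
    if not PORT_RANGE_FILE.exists():
        return None
    return ephemeral_port_count(PORT_RANGE_FILE.read_text())
```

With the Linux default of `32768 60999`, the budget is 28,232 ports, well under the 16-bit ceiling. Sockets lingering in TIME_WAIT still hold a port, so the practical number of new connections per second to one destination is lower again, which is why keep-alive pooling helps.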