
What is Event Loop Performance?

Event loop performance dictates how many concurrent network requests a single scraping worker can sustain. In asynchronous environments like Node.js or Python's asyncio, a single thread juggles thousands of open sockets. If that thread is blocked by a CPU-heavy task — like parsing a 50MB JSON payload or executing a complex regex — the loop stalls. When the loop stalls, active connections time out, keep-alives drop, and pipeline throughput collapses.

Asyncio · Node.js · Concurrency · I/O Bound · Throughput
// 02 — definitions

One thread,
thousands of sockets.

How modern asynchronous scrapers juggle massive I/O concurrency without spawning expensive OS threads, and why CPU-bound tasks are the silent killer of pipeline throughput.

TL;DR

Scraping is overwhelmingly I/O bound — workers spend 99% of their time waiting for target servers to respond. Asynchronous event loops solve this by parking idle sockets and moving on to the next task. But because the loop runs on a single thread, any synchronous CPU work blocks the entire process, causing cascading timeouts across all active connections.

01 Definition & structure
The event loop is the execution model used by asynchronous runtimes (like Node.js and Python's asyncio) to handle concurrency. Instead of assigning one OS thread per network request, a single thread runs an infinite loop. It asks the operating system, "Which network sockets have data ready?" It processes that data, schedules the next step, and immediately moves to the next socket. This allows massive throughput for I/O bound tasks, but introduces a critical vulnerability: any CPU-heavy synchronous code will block the thread, stopping the entire loop.
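As a rough illustration of the model described above, the sketch below parks many simulated requests on one thread. `asyncio.sleep` stands in for waiting on a socket, and `fake_fetch` and `run_many` are hypothetical names, not part of any real scraper.

```python
import asyncio
import threading
import time


async def fake_fetch(i: int) -> tuple[int, int]:
    # asyncio.sleep stands in for waiting on a socket: the coroutine is parked
    # and the loop is free to service other tasks in the meantime.
    await asyncio.sleep(0.1)
    return i, threading.get_ident()


async def run_many(n: int = 1000) -> tuple[float, set[int]]:
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_fetch(i) for i in range(n)))
    elapsed = time.perf_counter() - start
    # Collect the distinct thread ids that the coroutines ran on.
    thread_ids = {tid for _, tid in results}
    return elapsed, thread_ids


if __name__ == "__main__":
    elapsed, thread_ids = asyncio.run(run_many())
    print(f"1000 waits finished in {elapsed:.2f}s on {len(thread_ids)} thread(s)")
```

Because every wait overlaps, the total time stays close to a single 0.1-second wait rather than the sum of all of them, and only one thread id is ever observed.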
02 The blocking problem
In a multi-threaded scraper, if one thread gets stuck parsing a 50MB JSON file, the other 99 threads keep fetching pages. In an asynchronous scraper, if the event loop gets stuck parsing that same JSON file, every other request freezes. Timers don't fire, incoming packets aren't read, and TLS handshakes time out. This is known as event loop starvation. The most common culprits in scraping are heavy regex evaluations, synchronous HTML parsing (like standard BeautifulSoup), and large JSON decoding.
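The starvation effect can be reproduced in a few lines. This is a minimal sketch, assuming `time.sleep` is an acceptable stand-in for synchronous CPU work such as a large parse; all function names are illustrative.

```python
import asyncio
import time


async def simulated_fetch() -> float:
    """Return how long a nominal 50 ms network wait actually took."""
    start = time.perf_counter()
    await asyncio.sleep(0.05)
    return time.perf_counter() - start


async def blocking_parse() -> None:
    # time.sleep stands in for synchronous CPU work; it never yields,
    # so nothing else on the loop can run until it returns.
    time.sleep(0.5)


async def cooperative_parse() -> None:
    # Same duration, but awaiting hands control back to the loop.
    await asyncio.sleep(0.5)


async def fetch_latency_alongside(parse) -> float:
    fetch = asyncio.create_task(simulated_fetch())
    await asyncio.sleep(0)  # let the fetch start waiting first
    await parse()
    return await fetch


if __name__ == "__main__":
    print("blocked:", asyncio.run(fetch_latency_alongside(blocking_parse)))
    print("clean:  ", asyncio.run(fetch_latency_alongside(cooperative_parse)))
```

In the blocked run, the 50 ms fetch cannot complete until the half-second stall ends, which is the same mechanism that turns one slow parse into many timeouts.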
03 Measuring event loop lag
CPU usage alone will not tell you whether an event loop is healthy. A blocked loop pins one CPU core at close to 100%, which can look identical to a healthy loop that is busy processing 10,000 requests per second. The correct metric is tick delay (or loop lag). Schedule a timer to run every 10ms. If it actually runs every 11ms, your lag is 1ms (healthy). If it runs every 800ms, your loop is severely blocked, and your pipeline is likely dropping connections.
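One way to take that measurement in asyncio is sketched below. It is not a production monitor: `sample_loop_lag` and `demo` are hypothetical helpers, and the deliberate `time.sleep` stall only simulates CPU-bound work.

```python
import asyncio
import time


async def sample_loop_lag(interval: float = 0.01, samples: int = 50) -> list[float]:
    """Sleep `interval` seconds repeatedly and record how late each wake-up is."""
    lags = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        actual = time.perf_counter() - start
        lags.append(max(0.0, actual - interval))  # lag = actual - expected
    return lags


async def demo() -> float:
    monitor = asyncio.create_task(sample_loop_lag())
    await asyncio.sleep(0.1)  # healthy period
    time.sleep(0.3)           # deliberate stall, standing in for CPU-bound work
    lags = await monitor
    return max(lags)


if __name__ == "__main__":
    print(f"worst lag: {asyncio.run(demo()) * 1000:.0f}ms")
```

The worst sample lands close to the length of the stall, while samples taken during the healthy period stay near zero.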
04 How DataFlirt handles it
We enforce strict isolation between fetching and extraction. Our async fetch workers do nothing but manage sockets, handle proxies, and stream bytes. When a response body is fully buffered, it is immediately dispatched to a separate thread pool or a dedicated extraction microservice. This ensures our fetch workers maintain sub-5ms event loop lag, allowing a single core to sustain thousands of concurrent connections without risking timeout cascades.
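The general shape of that separation can be sketched with `loop.run_in_executor`. This is an illustration of the pattern only, not DataFlirt's implementation: `fetch`, `extract`, and `crawl` are hypothetical, and the fetch is simulated rather than a real HTTP call. A thread pool keeps the example short; under CPython's GIL, pure-Python CPU work usually needs a `ProcessPoolExecutor` (or a separate service) to actually free the loop.

```python
import asyncio
import json
from concurrent.futures import Executor, ThreadPoolExecutor


def extract(body: bytes) -> int:
    """Hypothetical CPU-bound extraction step: count records in a JSON array."""
    return len(json.loads(body))


async def fetch(url: str) -> bytes:
    """Stand-in for a real HTTP fetch; returns a small synthetic JSON body."""
    await asyncio.sleep(0.01)
    return json.dumps([{"url": url, "n": i} for i in range(100)]).encode()


async def fetch_and_extract(url: str, pool: Executor) -> int:
    body = await fetch(url)  # I/O stays on the event loop
    loop = asyncio.get_running_loop()
    # The extraction call runs in the pool; the loop only awaits its result.
    return await loop.run_in_executor(pool, extract, body)


async def crawl(urls: list[str], pool: Executor) -> list[int]:
    return await asyncio.gather(*(fetch_and_extract(u, pool) for u in urls))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        urls = [f"https://example.com/{i}" for i in range(20)]
        print(asyncio.run(crawl(urls, pool)))
```

Passing the executor in as a parameter keeps the fetch code unchanged if the pool type is swapped later.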
05 Did you know?
Python's default asyncio event loop is comparatively slow for high-throughput scraping. Swapping it for uvloop (an event loop written in Cython on top of libuv, the same library Node.js uses) can increase network throughput by 2-4x in I/O-heavy benchmarks without changing any of your fetching logic, though the actual gain depends on the workload. It is one of the easiest performance wins in the Python scraping ecosystem.
// 03 — concurrency math

How much can
one loop handle?

The theoretical limit of an event loop is bounded by memory and OS file descriptors. The practical limit is dictated by event loop lag. DataFlirt monitors tick delay to autoscale workers before timeouts occur.

Event Loop Lag: L = T_actual − T_expected
The delay between a callback's scheduled time and its actual execution. L > 50ms indicates CPU blocking. (Source: Node.js / libuv metrics)
Max Concurrency (Little's Law): C = RPS × Latency
The number of in-flight requests a worker must hold to sustain a target RPS. (Source: queueing theory)
DataFlirt Worker Saturation: S = (Active_Sockets / Max_FDs) + Lag_Penalty
Triggers horizontal scaling when S > 0.85 to prevent socket exhaustion. (Source: DataFlirt auto-scaler)
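To make the arithmetic concrete, the formulas above can be written as small functions. The input numbers are invented for illustration, and because the page does not say how Lag_Penalty is derived, the sketch simply takes it as a caller-supplied value.

```python
def required_concurrency(rps: float, latency_s: float) -> float:
    """Little's Law: in-flight requests C = RPS x Latency."""
    return rps * latency_s


def saturation(active_sockets: int, max_fds: int, lag_penalty: float) -> float:
    """S = (Active_Sockets / Max_FDs) + Lag_Penalty.

    lag_penalty is an assumed input; its derivation is not specified here.
    """
    return active_sockets / max_fds + lag_penalty


if __name__ == "__main__":
    # Sustaining 500 requests/s at 2 s average latency needs 1000 in-flight requests.
    print(required_concurrency(500, 2.0))
    # 8,000 of 10,000 descriptors in use plus a 0.1 penalty crosses the 0.85 threshold.
    print(saturation(8_000, 10_000, 0.1) > 0.85)
```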
// 04 — event loop trace

When a 40MB JSON
blocks the world.

A live trace of a Python asyncio worker experiencing event loop starvation. A synchronous JSON parse blocks the thread, causing 14 concurrent TLS handshakes to time out.

python 3.11 · asyncio · uvloop
edge.dataflirt.io — live
CAPTURED
// worker status: healthy
active_tasks: 1,240 loop_lag: 2.4ms
mem_usage: 412MB

// large payload arrives
recv_bytes: 42,104,832 source: "api.target.com/catalog"
task: "json.loads(payload)" // synchronous call ⚠

// event loop blocked
loop_lag: 840ms // thread is locked parsing JSON
asyncio.warn: Executing <Task pending...> took 0.842 seconds

// cascading failures
timeout: TLS handshake failed (14 tasks)
timeout: ReadTimeout on active sockets (89 tasks)
dropped_connections: 103

// recovery
task: json.loads complete
loop_lag: 3.1ms // loop resumes, but connections are lost
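A warning of the kind shown in the trace can be reproduced with asyncio's debug mode, which logs any callback or task step that runs longer than `loop.slow_callback_duration`. The sketch below assumes the stock asyncio loop (other loop implementations may report slow callbacks differently) and uses `time.sleep` in place of a real parse; `ListHandler`, `stall`, and `run_with_debug` are illustrative names.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects log messages so they can be inspected afterwards."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


async def stall() -> None:
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.1  # warn on any step longer than 100 ms
    time.sleep(0.3)                    # stand-in for a large synchronous parse


def run_with_debug() -> list[str]:
    handler = ListHandler()
    logger = logging.getLogger("asyncio")
    logger.addHandler(handler)
    try:
        asyncio.run(stall(), debug=True)  # debug mode enables slow-step warnings
    finally:
        logger.removeHandler(handler)
    return handler.messages


if __name__ == "__main__":
    for message in run_with_debug():
        print(message)
```

Debug mode adds overhead, so it is usually better suited to staging or targeted investigation than to a production fetch fleet.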
// 05 — starvation vectors

What actually blocks
the event loop.

Scraping is I/O bound until it suddenly isn't. These are the most common CPU-bound operations that accidentally starve asynchronous scraping workers, ranked by frequency in our incident logs.

Incidents analysed: 1,420 timeouts
Runtime: Node.js & Python
Updated: 2026-05-19
01 Synchronous JSON parsing
40% of stalls · json.loads() on payloads > 10MB locks the thread

02 Heavy DOM traversal
25% of stalls · Complex XPath/CSS evaluation on massive HTML

03 Catastrophic Regex
18% of stalls · Backtracking regex on minified JS bundles

04 Cryptographic hashing
10% of stalls · Generating auth tokens (e.g., PBKDF2) synchronously

05 Blocking Disk I/O
7% of stalls · Writing logs or raw HTML to disk without async wrappers
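For the last two vectors in the list, a common mitigation is to push the call onto a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below is illustrative only: the iteration count, file name, and helper names are made up. Threads help with blocking file writes because the thread waits on the OS; for hashing, the benefit depends on the underlying implementation releasing the GIL, which is worth verifying against your own lag measurements.

```python
import asyncio
import hashlib
import os
import tempfile


def derive_token(password: bytes, salt: bytes) -> bytes:
    # Deliberately expensive key derivation; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)


def write_file(path: str, data: bytes) -> int:
    # Ordinary blocking write; returns the number of bytes written.
    with open(path, "wb") as fh:
        return fh.write(data)


async def main() -> tuple[bytes, int]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "page.html")
        # Both calls run on worker threads while the loop stays responsive.
        token, written = await asyncio.gather(
            asyncio.to_thread(derive_token, b"secret", b"salt"),
            asyncio.to_thread(write_file, path, b"<html></html>"),
        )
    return token, written


if __name__ == "__main__":
    token, written = asyncio.run(main())
    print(token.hex()[:16], written)
```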
// 06 — DataFlirt's architecture

Keep the loop clean,
offload the heavy lifting.

At DataFlirt, our fetch workers do exactly one thing: manage asynchronous network sockets. The moment a payload arrives, it is handed off to a separate thread pool or a dedicated extraction service. By strictly isolating I/O from CPU-bound parsing, our event loops maintain sub-5ms tick delays even when processing 10,000 concurrent connections per core. If a JSON payload takes 2 seconds to parse, it happens on a background thread, leaving the main loop free to negotiate TLS handshakes for other tasks.

Worker Node Telemetry

Live metrics from an async fetch worker on a high-throughput pipeline.

worker.id: fetch-async-eu-04
active_sockets: 8,420 (high concurrency)
event_loop_lag: 2.1ms (healthy)
cpu.main_thread: 14% (I/O only)
cpu.worker_pool: 88% (parsing offloaded)
timeout_rate: 0.01% (nominal)

// 07 — FAQ

Common
questions.

About asynchronous scraping, event loop lag, CPU offloading, and how DataFlirt maintains high throughput without dropping connections.

What exactly is event loop lag?
It is the difference between when a callback is scheduled to run and when it actually executes. If a timer is due at 10ms but the event loop is busy parsing a massive string until the 110ms mark, the timer callback fires at 110ms. That 100ms difference is the lag. High lag means the loop is blocked, and network events (like incoming packets) are sitting unprocessed in the OS buffer.
Why not just use multi-threading instead of async?
OS threads are expensive. Each thread reserves its own stack (typically a megabyte or more) and incurs context-switching overhead. Spawning 10,000 threads to handle 10,000 concurrent requests can exhaust memory or overwhelm the scheduler on many machines. An asynchronous event loop can handle those 10,000 requests on a single thread using a fraction of the memory, provided the tasks are strictly I/O bound.
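How do I parse large JSON without blocking the loop?
Avoid standard synchronous parsers (like JSON.parse in Node or json.loads in Python) on the event loop for payloads over a few megabytes. Instead, use a streaming parser (such as ijson in Python), or offload the parsing task to a worker (worker_threads in Node, or asyncio.to_thread() in Python). One caveat for Python: in CPython, json.loads holds the GIL while it parses, so a thread alone may not free the loop; a process pool (loop.run_in_executor with a ProcessPoolExecutor) avoids that. Either way, the goal is to keep CPU-heavy work off the main event loop.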
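One possible shape for the offloading approach is sketched below: small payloads are parsed inline, large ones go to an executor. `parse_json` and `INLINE_LIMIT` are hypothetical, and the 1 MB threshold is an assumption to tune against measured loop lag. The executor is passed in so the caller can choose its type; for CPU-bound parsing under CPython's GIL, a `ProcessPoolExecutor` is generally the more effective choice, at the cost of pickling the payload and the result.

```python
import asyncio
import json
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any

INLINE_LIMIT = 1_000_000  # bytes; hypothetical threshold, tune against measured lag


async def parse_json(payload: bytes, pool: Executor) -> Any:
    """Parse small payloads inline and hand large ones to an executor."""
    if len(payload) <= INLINE_LIMIT:
        return json.loads(payload)  # cheap enough to run on the loop
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, json.loads, payload)


async def demo(pool: Executor) -> tuple[Any, int]:
    small = json.dumps({"a": 1}).encode()
    large = json.dumps(list(range(300_000))).encode()  # a couple of megabytes
    return await parse_json(small, pool), len(await parse_json(large, pool))


if __name__ == "__main__":
    # A thread pool keeps the demo portable; swap in ProcessPoolExecutor as needed.
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(asyncio.run(demo(pool)))
```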
Does Playwright or Puppeteer block the Node.js event loop?
No. The actual browser rendering, JavaScript execution, and DOM parsing happen in a separate Chromium process. The Node.js event loop only handles the WebSocket communication between your script and the browser. However, if you pull a massive amount of data out of the browser and process it synchronously in your Node script, you will block the loop.
How does DataFlirt monitor event loop health?
We track tick delay (loop lag) at 1-second intervals across our entire fetch fleet. If a worker's lag exceeds 50ms for three consecutive intervals, our orchestrator stops routing new URLs to that worker and spins up additional horizontal capacity. This ensures that a sudden influx of heavy payloads doesn't cause cascading timeouts for other clients sharing the infrastructure.
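The "three consecutive intervals above 50ms" rule can be expressed as a small sliding-window check. This is a sketch of the decision logic only, with a hypothetical `LagGate` class; it says nothing about how the real orchestrator is built.

```python
from collections import deque


class LagGate:
    """Tracks recent lag samples and flags a worker for draining."""

    def __init__(self, threshold_ms: float = 50.0, consecutive: int = 3) -> None:
        self.threshold_ms = threshold_ms
        self.recent: deque[float] = deque(maxlen=consecutive)

    def record(self, lag_ms: float) -> bool:
        """Add one sample; return True when every sample in the window exceeds the threshold."""
        self.recent.append(lag_ms)
        return len(self.recent) == self.recent.maxlen and all(
            sample > self.threshold_ms for sample in self.recent
        )


if __name__ == "__main__":
    gate = LagGate()
    for lag in (2.1, 60, 75, 51, 3.0):
        print(lag, gate.record(lag))
```

Requiring consecutive breaches means a single slow tick does not take a worker out of rotation.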
What happens to active requests when the loop blocks?
They rot. If the loop is blocked for 2 seconds, it cannot process incoming TCP packets or send keep-alives. Target servers may assume your client has died and close the connection (Connection Reset by Peer). Even worse, your own internal timeouts may trigger simultaneously when the loop unblocks, causing hundreds of requests to fail at the exact same millisecond.