
What is Multi-Tab Scraping?

Multi-tab scraping is the practice of opening concurrent pages within a single browser instance or context to extract data. While it drastically reduces memory overhead compared to launching separate headless browsers, it forces all tabs to share the same browser process, CPU resources, and session state. For scrapers, it's a high-risk optimization: push the tab count too high, and a single heavy DOM render can crash the entire instance, taking down dozens of in-flight extractions with it.

Playwright · Puppeteer · Concurrency · Memory Management · Browser Context
// 02 — definitions

Concurrency on a budget.

The mechanics of packing multiple extractions into a single browser instance, and why the main thread always wins in the end.


TL;DR

Multi-tab scraping shares a single browser process across multiple pages. It cuts memory usage by up to 80% compared to isolated instances, but introduces shared-state pollution and single-point-of-failure risks. Modern pipelines prefer isolated browser contexts over raw multi-tabbing to balance density with reliability.

01 · Definition & structure
Multi-tab scraping involves launching a single headless browser executable and opening pages repeatedly within one browser context, via context.newPage() in Playwright or browser.newPage() in Puppeteer, to process multiple URLs in parallel. (In Playwright, browser.newPage() creates a fresh context for each page, so those pages do not share state.) Because the base browser process (Chromium/Firefox) is only loaded once, it saves significant RAM compared to launching a new browser for every URL. However, all tabs within the same context share the same cookie jar, local storage, and cache.
02 · How it works in practice
In a typical Node.js script, a queue of URLs is fed to a pool of open tabs. As one tab finishes extracting data, it navigates to the next URL in the queue. Memory is saved, but compute is still shared: every tab competes for the same CPU cores and the same browser process, and tabs that Chromium places in one renderer process also share that renderer's single main thread. If Tab A is rendering a massive React application, Tab B's page loads and script execution can stall behind it, leading to unpredictable timeout errors.
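The queue-and-pool pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not DataFlirt's implementation: `PageLike` and `ContextLike` are hypothetical minimal interfaces that mirror the small subset of Playwright's `Page` and `BrowserContext` used here, and `scrapeWithTabPool`, `ScrapeResult`, and `tabCount` are illustrative names.

```typescript
// Hypothetical minimal shapes; they mirror only the Playwright methods used below.
interface PageLike {
  goto(url: string): Promise<unknown>;
  title(): Promise<string>;
  close(): Promise<void>;
}
interface ContextLike {
  newPage(): Promise<PageLike>;
}

interface ScrapeResult {
  url: string;
  title?: string;
  error?: string;
}

// Opens up to `tabCount` tabs in ONE shared context. Each tab pulls the next
// URL from a shared queue until the queue is empty.
async function scrapeWithTabPool(
  context: ContextLike,
  urls: string[],
  tabCount: number,
): Promise<ScrapeResult[]> {
  if (urls.length === 0) return [];
  const queue = [...urls];
  const results: ScrapeResult[] = [];

  const runTab = async (): Promise<void> => {
    const page = await context.newPage();
    try {
      let url = queue.shift();
      while (url !== undefined) {
        try {
          await page.goto(url);
          results.push({ url, title: await page.title() });
        } catch (err) {
          // A failed navigation is recorded and the tab moves on to the next URL.
          results.push({ url, error: String(err) });
        }
        url = queue.shift();
      }
    } finally {
      await page.close();
    }
  };

  const tabs = Math.max(1, Math.min(tabCount, urls.length));
  await Promise.all(Array.from({ length: tabs }, () => runTab()));
  return results;
}
```

With Playwright installed, `context` would typically come from `await (await chromium.launch()).newContext()`. Note that every tab in this pool shares one cookie jar and one proxy configuration, which is exactly the isolation problem discussed in the next item.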
03 · The memory vs. isolation tradeoff
The primary reason engineers use multi-tab scraping is cost: running 10 browsers requires ~1.5GB of RAM, while running 1 browser with 10 tabs requires ~400MB. The tradeoff is isolation. If you are scraping an e-commerce site and need to rotate proxies per session to avoid bans, standard tabs cannot do it: every tab in a context uses the same proxy configuration and cookie jar, so the target can link the whole batch to one identity and ban it in one sweep.
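The figures above can be checked with a few lines of arithmetic. This is a rough sketch, not a measurement: the 150 MB base cost is the launch figure quoted in the resource math section of this page, while the 25 MB per-tab cost is an assumption inferred so that one browser with ten tabs lands at about 400 MB.

```typescript
const BASE_MB = 150;    // per-browser launch cost quoted on this page
const PER_TAB_MB = 25;  // ASSUMPTION: inferred so that 1 browser + 10 tabs is ~400 MB

// One browser process shared by `tabs` pages.
function sharedBrowserMb(tabs: number): number {
  return BASE_MB + tabs * PER_TAB_MB;
}

// One full browser per page: the base cost is paid every time.
function isolatedBrowsersMb(pages: number): number {
  return pages * (BASE_MB + PER_TAB_MB);
}

// Fraction of memory saved by sharing one browser.
function savingsFraction(pages: number): number {
  return 1 - sharedBrowserMb(pages) / isolatedBrowsersMb(pages);
}

console.log(sharedBrowserMb(10));            // 400
console.log(isolatedBrowsersMb(10));         // 1750
console.log(savingsFraction(10).toFixed(2)); // "0.77"
```

Under these assumptions the ~1.5GB figure corresponds to ten base processes before any page is loaded (10 × 150 MB), and the saving for ten pages comes out just under 80%.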
04 · How DataFlirt handles it
We enforce strict isolation. Instead of raw tabs, our infrastructure provisions Playwright Browser Contexts. Each context acts as an incognito window with its own proxy routing, user-agent, and storage. We limit each context to exactly one active tab. This provides the memory efficiency of a shared browser process with network and state isolation close to that of entirely separate browser instances.
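A one-context-per-task setup along these lines might look like the sketch below. It is an illustration of the idea, not DataFlirt's production code. The `*Like` interfaces are hypothetical minimal shapes mirroring the Playwright calls used (`browser.newContext({ proxy, userAgent })`, `context.newPage()`, `context.close()`), and `Task` and `scrapeIsolated` are made-up names.

```typescript
// Hypothetical minimal shapes mirroring the Playwright methods used below.
interface PageLike {
  goto(url: string): Promise<unknown>;
  content(): Promise<string>;
}
interface ContextOptionsLike {
  proxy?: { server: string };
  userAgent?: string;
}
interface ContextLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}
interface BrowserLike {
  newContext(options?: ContextOptionsLike): Promise<ContextLike>;
}

// Illustrative task description: one URL with its own proxy and user-agent.
interface Task {
  url: string;
  proxyServer: string;
  userAgent: string;
}

// Runs one task in a fresh context with exactly one tab, then discards the
// context so no cookies, storage, or cache survive into the next task.
async function scrapeIsolated(browser: BrowserLike, task: Task): Promise<string> {
  const context = await browser.newContext({
    proxy: { server: task.proxyServer },
    userAgent: task.userAgent,
  });
  try {
    const page = await context.newPage();
    await page.goto(task.url);
    return await page.content();
  } finally {
    await context.close();
  }
}
```

With Playwright installed, `browser` would be the object returned by `chromium.launch()`. The contexts still share one browser process and the same CPU cores, so this isolates state and routing, not compute.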
05 · The target="_blank" trap
A common failure mode in multi-tab scraping occurs when a script clicks a link that opens in a new tab natively. The browser spawns an unmanaged tab that the scraping framework isn't tracking. This rogue tab consumes memory, plays media, and executes tracking scripts indefinitely until the parent browser is forcefully killed. Production scrapers must explicitly intercept and block new window creation.
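One way to keep such tabs from going unmanaged is to listen for them on every page the scraper owns and close them immediately. The sketch below assumes Playwright's `page.on('popup', ...)` event. `PopupLike` and `PageLike` are hypothetical minimal interfaces mirroring it, and `blockPopups` is an illustrative name. It closes popups after they open rather than preventing their creation, so it is a simplification of true interception.

```typescript
// Hypothetical minimal shapes mirroring Playwright's Page "popup" event.
interface PopupLike {
  url(): string;
  close(): Promise<void>;
}
interface PageLike {
  on(event: "popup", listener: (popup: PopupLike) => void): unknown;
}

// Registers a guard on a managed page: any tab it spawns is closed right away.
// Returns the list that collects the URLs of blocked popups.
function blockPopups(page: PageLike, blocked: string[] = []): string[] {
  page.on("popup", (popup) => {
    // The popup may still be loading, so this URL can be "about:blank".
    blocked.push(popup.url());
    // Ignore close errors: the popup may already be gone.
    void popup.close().catch(() => undefined);
  });
  return blocked;
}
```

Calling `blockPopups(page)` once after each `context.newPage()` would be enough for pages the script creates itself. The collected URLs could then be fed back into the scraper's own queue instead of letting the browser manage the new tab.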
// 03 — the resource math

How much memory does a tab save?

A base Chromium instance costs ~150MB just to launch. Each new tab adds only the DOM and JavaScript heap overhead. But CPU is shared, meaning event loop latency scales linearly with tab count.

Instance memory: $M = 150 + N \times \mathrm{DOM}_{\mathrm{size}}$ (MB)
N tabs in one instance. Memory grows linearly, but avoids the 150MB base tax per page. (Source: Chromium process model)
CPU contention: $T_{\mathrm{render}} = \mathrm{base}_{\mathrm{render}} \times (N / \mathrm{cores})$
Tabs contend for the same cores and, within a shared renderer process, the same main thread. Too many tabs means timeout errors. (Source: V8 engine architecture)
DataFlirt context limit: $N_{\max} = \mathrm{RAM}_{\mathrm{avail}} / (\mathrm{DOM}_{\mathrm{avg}} \times 1.5)$
We hard-cap tabs per worker to prevent OOM kills during unexpected DOM spikes. (Source: internal SLO)
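The three formulas above translate directly into small helper functions. The sketch below is only a transcription for experimenting with the numbers. The function names and every input value in the examples are hypothetical, and rounding the context limit down to a whole number is an added assumption.

```typescript
// M = 150 + N × DOM_size, in MB. The 150 MB base is the figure quoted above.
function instanceMemoryMb(tabs: number, domSizeMb: number, baseMb = 150): number {
  return baseMb + tabs * domSizeMb;
}

// T_render = base_render × (N / cores), in the same unit as baseRenderMs.
function renderTimeMs(baseRenderMs: number, tabs: number, cores: number): number {
  return baseRenderMs * (tabs / cores);
}

// N_max = RAM_avail / (DOM_avg × 1.5), rounded down to a whole tab (assumption).
function maxTabs(ramAvailMb: number, domAvgMb: number): number {
  return Math.floor(ramAvailMb / (domAvgMb * 1.5));
}

// Example inputs are made up for illustration.
console.log(instanceMemoryMb(10, 25)); // 400
console.log(renderTimeMs(200, 8, 4));  // 400
console.log(maxTabs(1200, 40));        // 20
```

The 1.5 multiplier acts as headroom: a worker with 1200 MB available and 40 MB average pages is capped at 20 tabs rather than the 30 that would fit with no safety margin.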
// 04 — playwright trace

Pushing the main thread past the breaking point.

A Playwright script attempting to open 20 heavy e-commerce product pages in a single browser context. Watch the memory and CPU contention cascade as the event loop chokes.

Playwright · Node.js · OOM Crash
edge.dataflirt.io — live
CAPTURED
// init
browser: launched Chromium v124
context: created (memory: 142MB)

// loop
tab_01: navigating to /product/1
tab_05: navigating to /product/5
tab_10: navigating to /product/10

// contention
warn: event loop lag > 500ms
tab_03: DOMContentLoaded (memory: 410MB)
tab_15: navigating to /product/15
warn: event loop lag > 2100ms

// crash
error: Page crashed! (Target OOM)
browser: disconnected
pipeline.status: failed (20 records lost)
// 05 — failure modes

Why multi-tab jobs fail.

Ranked by frequency of pipeline crashes when clients attempt to scale concurrency purely through tab creation rather than horizontal scaling.

Sample size: 1.2M jobs
Framework: Playwright/Puppeteer
Updated: 2026-05-19
01 · Out of Memory (OOM) kills
89% of crashes · DOM spikes exceed container RAM limits
02 · Main thread timeouts
76% of crashes · Event loop blocked by heavy JS execution
03 · Shared state pollution
62% of crashes · Cookies/storage bleed between independent tasks
04 · Unhandled popup hijacking
45% of crashes · target="_blank" steals focus and breaks selectors
05 · Cross-tab anti-bot detection
31% of crashes · Service workers flag unnatural concurrent behavior
// 06 — our architecture

Contexts over tabs, isolation over raw density.

At DataFlirt, we rarely use raw multi-tab scraping. Instead, we use isolated Browser Contexts. A context shares the underlying browser executable but maintains completely separate cookie jars, local storage, and cache. If one context's page crashes due to a massive DOM, the other contexts keep running as long as the browser process itself survives. We cap concurrency at 4–8 contexts per core so that CPU contention does not last long enough to trigger a timeout. Density is useless if it destroys reliability.
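A cap like this can be computed once at worker start-up. The sketch below shows one plausible way to combine the 4–8 contexts-per-core range with the memory limit from the resource math section (RAM_avail / (DOM_avg × 1.5)). Taking the minimum of the two caps, the default of 4 per core, and the name `contextCap` are assumptions for illustration, not a documented DataFlirt policy.

```typescript
// Returns how many contexts a worker should run at once.
// `perCore` must stay inside the 4-8 range quoted above; 4 is a conservative default (assumption).
function contextCap(
  cores: number,
  ramAvailMb: number,
  domAvgMb: number,
  perCore = 4,
): number {
  if (perCore < 4 || perCore > 8) {
    throw new RangeError("perCore is expected to be between 4 and 8");
  }
  const cpuCap = cores * perCore;                                // CPU-bound limit
  const memCap = Math.floor(ramAvailMb / (domAvgMb * 1.5));      // memory-bound limit
  // The tighter of the two limits wins; never go below one context.
  return Math.max(1, Math.min(cpuCap, memCap));
}

// Example inputs are hypothetical.
console.log(contextCap(1, 1200, 100));    // 4 (CPU-bound: 1 core × 4)
console.log(contextCap(4, 1200, 100, 8)); // 8 (memory-bound: floor(1200 / 150))
```

In practice the result would feed whatever scheduler hands tasks to contexts, so that a burst of queued URLs cannot push the worker past either limit.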

Browser worker allocation

Live resource allocation for a single DataFlirt extraction worker.

worker.id node-7a-blr
browser.instance Chromium 124
contexts.active 6
tabs.per_context 1
memory.total 1.2 GB
event_loop.lag 12ms
status stable


// 07 — FAQ

Common questions.

About concurrency limits, shared state, Playwright vs Puppeteer, and how DataFlirt scales browser infrastructure without crashing.

Why not just open 100 tabs to scrape faster?
Because JavaScript execution is single-threaded within each renderer process, and every tab in the browser draws on the same CPU cores. If you open 100 tabs, they fight for those cores. Main threads block, network requests time out, and memory balloons until the OS kills the process. Horizontal scaling (more containers) is the safer default compared with vertical scaling (more tabs).
Do tabs share cookies and local storage?
Yes, if they are opened within the same browser context. This is useful if you need to log in once and scrape 50 pages behind the auth wall. It is disastrous if you are trying to scrape 50 independent sessions with different residential proxies, as the target will see the same cookies across all IPs and immediately ban you.
How does DataFlirt handle popups that open new tabs?
We intercept them at the protocol level. When a site tries to open a target="_blank" link, our framework catches the event, prevents the new tab from spawning, and routes the URL to a controlled, isolated context. Unhandled popups steal focus and break DOM selectors; we never let the browser manage them natively.
Is it better to use Playwright or Puppeteer for tab management?
Playwright, in most cases. Puppeteer does offer isolated browser contexts (browser.createBrowserContext() in recent versions), but Playwright was built from the ground up around the BrowserContext API, which provides incognito-like isolation (separate cookies, cache, and proxy settings) while sharing the same underlying browser process.
Can anti-bot systems detect multi-tab scraping?
Yes. Advanced anti-bot scripts (such as those from DataDome or Akamai) can use SharedWorkers or the BroadcastChannel API to communicate across same-origin tabs. If they detect 20 tabs navigating simultaneously with identical hardware concurrency signatures but different proxy IPs, they can flag the session as automated.
How do you scale if you don't use hundreds of tabs?
We scale horizontally. A standard DataFlirt worker node runs a single browser instance with 5 isolated contexts, each running 1 tab. If we need 500 concurrent pages, we spin up 100 worker nodes. Cloud compute is cheap; debugging a 50-tab OOM crash at 3 AM is expensive.
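The sizing in this answer is a single division, sketched below together with one simple way to spread URLs across the resulting workers. The default of 5 contexts per worker is the figure quoted above. The round-robin split and the names `workersNeeded` and `shardUrls` are illustrative assumptions, not a description of DataFlirt's scheduler.

```typescript
// Number of worker nodes needed for a target level of concurrency.
function workersNeeded(concurrentPages: number, contextsPerWorker = 5): number {
  if (concurrentPages < 0 || contextsPerWorker < 1) {
    throw new RangeError("concurrentPages must be >= 0 and contextsPerWorker >= 1");
  }
  return Math.ceil(concurrentPages / contextsPerWorker);
}

// Round-robin split of a URL list into one shard per worker (illustrative choice).
function shardUrls(urls: string[], workers: number): string[][] {
  return Array.from({ length: workers }, (_unused, w) =>
    urls.filter((_url, i) => i % workers === w),
  );
}

console.log(workersNeeded(500)); // 100
console.log(shardUrls(["a", "b", "c", "d", "e"], 2)); // [["a", "c", "e"], ["b", "d"]]
```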