
What is DOM Ready Time?

DOM ready time is the moment, measured in milliseconds from the start of navigation, when a browser (headless or not) finishes parsing an HTML document, has built the Document Object Model, and fires the DOMContentLoaded event. For scraping pipelines, it marks the earliest point at which you can reliably run extraction scripts or interact with the page. Waiting for full network idle wastes compute; extracting before DOM ready risks missing nodes and silent pipeline failures.

Scraping Browsers · Playwright · Performance · Event Loop · Latency
// 02 — definitions

The parsing
threshold.

The critical boundary between raw bytes arriving over the wire and a queryable structure you can actually extract data from.


TL;DR

DOM ready time occurs when the HTML is fully parsed and synchronous and deferred scripts have executed, but before external resources such as images, iframes, and most stylesheets finish loading. In Playwright and Puppeteer, waiting for domcontentloaded instead of networkidle can cut scrape latency by 40–70% on media-heavy targets.

01 Definition & structure
DOM ready time measures the duration from the start of navigation until the browser has finished parsing the HTML, constructed the Document Object Model, and executed any deferred scripts. At that moment, the browser fires the DOMContentLoaded event. It guarantees that the structural skeleton of the page is complete and that all synchronous and deferred <script> tags have executed.
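A minimal sketch of reading DOM ready time from the browser's own Navigation Timing entry, assuming the entry has already been serialized to a dict (for example with Playwright's `page.evaluate` and `toJSON()`, as noted in the comment). The helper name `dom_ready_metrics` and the sample values are hypothetical and illustrative only.

```python
def dom_ready_metrics(nav: dict) -> dict:
    """Milestones in ms from navigation start, read from a serialized
    PerformanceNavigationTiming entry."""
    start = nav.get("startTime", 0.0)  # the navigation entry's startTime is 0
    return {
        # First byte of the HTML response arrived
        "ttfb_ms": nav["responseStart"] - start,
        # DOMContentLoaded dispatched: HTML parsed, sync and deferred scripts run
        "dom_ready_ms": nav["domContentLoadedEventStart"] - start,
        # load dispatched: images, iframes and stylesheets finished too
        # (loadEventStart reads 0 if load has not fired yet)
        "load_ms": nav["loadEventStart"] - start,
    }


# Illustrative values only. With Playwright's Python sync API, a real entry
# can be read after navigation with:
#   nav = page.evaluate(
#       "() => performance.getEntriesByType('navigation')[0].toJSON()")
sample = {"startTime": 0, "responseStart": 142.0,
          "domContentLoadedEventStart": 318.0, "loadEventStart": 1520.0}
print(dom_ready_metrics(sample))
# {'ttfb_ms': 142.0, 'dom_ready_ms': 318.0, 'load_ms': 1520.0}
```

Reading the timestamps from the page itself, rather than timing `page.goto` from Python, keeps the measurement free of driver and IPC overhead.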
02 How it works in practice
When a headless browser fetches a URL, it parses the HTML stream top to bottom as the bytes arrive. If it encounters a synchronous script, parsing stops until the script has been downloaded and executed. Once the parser reaches the end of the document and any deferred scripts have run, the DOM is ready. For a scraper, this is the green light: CSS selectors will now work reliably against the static structure of the page.
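The blocking behaviour can be sketched as a toy cost model. The parser walks the document in order, stalls for every synchronous script, lets async scripts run independently, and runs deferred scripts after parsing but before DOMContentLoaded. The `Token` kinds and millisecond costs are illustrative assumptions, not measurements. Real browsers also prefetch scripts in parallel, which this model ignores.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Token:
    kind: str       # "html", "sync_script", "defer_script" or "async_script"
    cost_ms: float  # parse cost for "html"; fetch + execute cost for scripts


def simulate_dom_ready(ttfb_ms: float, tokens: Iterable[Token]) -> float:
    """Toy model: when DOMContentLoaded fires, in ms from navigation start."""
    clock = ttfb_ms      # parsing cannot start before the first byte arrives
    deferred = 0.0
    for tok in tokens:
        if tok.kind in ("html", "sync_script"):
            clock += tok.cost_ms      # parsing, or parser stalled on a sync script
        elif tok.kind == "defer_script":
            deferred += tok.cost_ms   # queued until parsing finishes
        elif tok.kind == "async_script":
            # Toy assumption: async scripts never delay DOMContentLoaded
            # (in reality their execution can still pause the parser briefly)
            continue
        else:
            raise ValueError(f"unknown token kind: {tok.kind}")
    # Deferred scripts execute, in order, before DOMContentLoaded is dispatched
    return clock + deferred


page = [Token("html", 60), Token("sync_script", 45), Token("html", 71),
        Token("async_script", 900)]
print(simulate_dom_ready(142, page))  # 318.0
```

In this model the 900 ms async script never shows up in DOM ready time, while the 45 ms synchronous script lands directly on the critical path.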
03 The cost of waiting too long
Many scraping tutorials recommend waiting for networkidle (no network connections for at least 500 ms). This is a major performance trap. Modern sites constantly poll analytics, stream video chunks, or refresh ads, so network idle might take 15 seconds or never arrive before the navigation times out. If your target data is in the HTML, extracting at DOM ready substantially reduces your compute bill and proxy session duration.
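A toy model shows why networkidle can stall. It applies the rule Playwright documents: the page counts as idle once no network connections have been active for at least 500 ms. The function below is a hypothetical helper that takes made-up request intervals and returns when that condition would first hold. It returns `None` if background polling keeps the network busy past a chosen horizon.

```python
from typing import Optional, Sequence, Tuple


def network_idle_at(
    requests: Sequence[Tuple[float, float]],  # (start_ms, end_ms) per request
    idle_window_ms: float = 500.0,
    horizon_ms: float = 30_000.0,
) -> Optional[float]:
    """Earliest time (ms) at which no request has been in flight for idle_window_ms."""
    busy_until = 0.0
    for start, end in sorted(requests):
        # A quiet gap of at least idle_window_ms before this request starts
        # means the idle condition was already met
        if start - busy_until >= idle_window_ms:
            break
        busy_until = max(busy_until, end)
    fires_at = busy_until + idle_window_ms
    return fires_at if fires_at <= horizon_ms else None


quiet = [(0, 300), (120, 1200), (150, 900)]
# An analytics beacon every 400 ms never leaves a 500 ms gap
beacons = [(0, 300)] + [(t, t + 50) for t in range(400, 60_000, 400)]
print(network_idle_at(quiet))    # 1700.0
print(network_idle_at(beacons))  # None
```

DOM ready does not depend on any of these late requests, which is why it stays predictable on pages where networkidle does not.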
04 How DataFlirt handles it
We configure 90% of our Playwright and Puppeteer fleet to resolve navigation promises at domcontentloaded. We pair this with aggressive request interception to block third-party scripts that would otherwise delay the parser. For client-rendered data, we transition immediately from DOM ready into targeted waitForSelector or waitForResponse calls, ensuring we only wait for the exact bytes we need.
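A minimal sketch of the "DOM ready, then one targeted wait" pattern with Playwright's Python sync API. It assumes `page` is a Playwright `Page`. The function name `scrape_price` is hypothetical, and the `#price-container` selector and 5000 ms timeout are placeholders borrowed from the lifecycle config further down. This is not DataFlirt's actual worker code.

```python
def scrape_price(page, url: str, selector: str = "#price-container",
                 timeout_ms: int = 5000) -> str:
    """Navigate until DOMContentLoaded, then wait only for the node we need.

    `page` is a Playwright sync-API Page; the selector is a placeholder.
    """
    # Release the navigation promise as soon as the HTML is parsed,
    # instead of waiting for load or networkidle
    page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
    # If the value is rendered client-side, wait for this one element only
    page.wait_for_selector(selector, state="visible", timeout=timeout_ms)
    return page.inner_text(selector)


# Usage (requires `pip install playwright` and `playwright install chromium`):
# from playwright.sync_api import sync_playwright
# with sync_playwright() as p:
#     browser = p.chromium.launch()
#     print(scrape_price(browser.new_page(), "https://target.com/product/123"))
#     browser.close()
```

If the value is already in the server-rendered HTML, `wait_for_selector` returns almost immediately, so the same code path covers both static and client-rendered targets.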
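05 Did you know?
Scripts loaded with the async or defer attributes do not block the HTML parser, but they differ in timing. Deferred scripts (and module scripts, which defer by default) run after parsing finishes but before DOMContentLoaded fires, so their DOM changes are already present at DOM ready. Async scripts run as soon as they have downloaded, which may be after DOMContentLoaded. If your target data is rendered by an async script, or by a fetch that any script starts, extracting at DOM ready can return empty nodes.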
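To see which scripts on a target page sit on the DOM ready critical path, a small audit with Python's stdlib `html.parser` can classify each `<script>` tag by its attributes. This is a simplified sketch. The `ScriptAudit` class and its category names are hypothetical, and the list of JavaScript MIME types is deliberately incomplete.

```python
from html.parser import HTMLParser

# MIME types that mark a classic (executable) script; a small, non-exhaustive subset
JS_TYPES = {"", "text/javascript", "application/javascript",
            "application/ecmascript", "text/ecmascript"}


class ScriptAudit(HTMLParser):
    """Classify <script> tags by how they interact with the HTML parser."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts: list[tuple[str, str]] = []  # (src or "<inline>", category)

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        a = dict(attrs)  # boolean attributes such as `async` map to None
        src = a.get("src")
        kind = (a.get("type") or "").strip().lower()
        if kind == "module":
            # Module scripts defer by default; `async` makes them async
            category = "async" if "async" in a else "deferred"
        elif kind not in JS_TYPES:
            category = "data"      # e.g. JSON-LD: never executed
        elif src and "async" in a:
            category = "async"     # may run before or after DOMContentLoaded
        elif src and "defer" in a:
            category = "deferred"  # runs after parsing, before DOMContentLoaded
        else:
            # Classic scripts without async/defer; inline classic scripts ignore both
            category = "blocking"  # parser halts until fetched and executed
        self.scripts.append((src or "<inline>", category))


audit = ScriptAudit()
audit.feed("""
<script src="/vendor.js"></script>
<script src="/analytics.js" async></script>
<script src="/app.js" defer></script>
<script type="module" src="/widget.js"></script>
<script type="application/ld+json">{"@type": "Product"}</script>
""")
for src, category in audit.scripts:
    print(f"{category:<9} {src}")
```

Only `blocking` entries delay DOM ready directly. Anything your data depends on that lands in `async` is a signal to add a targeted wait instead of extracting at DOMContentLoaded.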
// 03 — latency math

How much time
are you wasting?

The delta between DOM ready and full page load is pure waste if your target data is in the initial HTML. DataFlirt's fleet scheduler uses these metrics to aggressively prune wait states.

Wasted Compute: $W = T_{\mathrm{load}} - T_{\mathrm{dom\_ready}}$
Often 2–5 seconds on ad-heavy media sites. Pure overhead. Source: Browser Performance API.
Effective Scrape Latency: $L = \mathrm{TTFB} + \mathrm{Parse\_Time} + \mathrm{Sync\_JS\_Exec}$
The critical path to DOM ready. Images and async scripts are excluded. Source: DataFlirt extraction model.
DataFlirt Fleet Efficiency: $E = 1 - \dfrac{\mathrm{Idle\_Wait}}{\mathrm{Total\_Session\_Time}}$
Target: E > 0.92 across our headless cluster. Source: internal SLO.
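The three formulas translate directly into code. The helper names below are hypothetical and the sample values are made up for illustration. They are not fleet measurements.

```python
def wasted_compute_ms(t_load_ms: float, t_dom_ready_ms: float) -> float:
    """W = T_load - T_dom_ready: time spent after the data was already parseable."""
    return t_load_ms - t_dom_ready_ms


def effective_latency_ms(ttfb_ms: float, parse_ms: float, sync_js_ms: float) -> float:
    """L = TTFB + Parse_Time + Sync_JS_Exec: the critical path to DOM ready."""
    return ttfb_ms + parse_ms + sync_js_ms


def fleet_efficiency(idle_wait_ms: float, total_session_ms: float) -> float:
    """E = 1 - Idle_Wait / Total_Session_Time."""
    if total_session_ms <= 0:
        raise ValueError("total_session_ms must be positive")
    return 1 - idle_wait_ms / total_session_ms


# Made-up sample session
print(wasted_compute_ms(3200, 318))            # 2882
print(effective_latency_ms(142, 131, 45))      # 318
print(round(fleet_efficiency(30, 412), 3))     # 0.927
```

A session that spends 30 ms of a 412 ms lifetime idling clears the E > 0.92 target. The same session held open until a 3.2 s load event would not.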
// 04 — browser trace

The race to
DOMContentLoaded.

A simplified Playwright trace showing the sequence of events from navigation start to the DOM ready threshold on a modern e-commerce product page.

Playwright · Trace Viewer · Headless Chromium
// navigation start
page.goto: "https://target.com/product/123"
event: commit
ttfb: 142ms

// parsing phase
html.parse: started
script.sync: blocking main thread (45ms)
html.parse: resumed

// threshold reached
event: domcontentloaded
dom_ready_time: 318ms

// extraction execution
page.evaluate: "extract_product_schema()"
data.yield: 1 record

// background noise (ignored)
image.load: pending (1200ms)
event: networkidle // 2.4s later, completely unnecessary
// 05 — blocking factors

What delays
DOM ready.

The primary culprits that push DOM ready time higher, ranked by their impact across DataFlirt's headless browser fleet.

Sample size: 8.4M sessions
Avg DOM ready: 412ms
Updated: 2026-05-19
01 · Synchronous JavaScript in <head>
Main-thread block. Scripts without async or defer halt the HTML parser until they have been downloaded and executed.

02 · Large HTML payload
Network + parse time. Megabytes of inline JSON state or bloated DOM trees take longer to transfer and parse.

03 · Render-blocking CSS
Render-tree block. A pending stylesheet delays execution of the synchronous scripts that follow it, which in turn delays DOM ready.

04 · Slow TTFB
Network latency. The parser can't start until the first bytes arrive.

05 · CPU throttling
Container limits. Under-provisioned scraping workers parse HTML and run scripts more slowly.
// 06 — our stack

Extract at the threshold,

never wait for the dust to settle.

Waiting for a modern web page to fully load is a fool's errand. Ad networks, tracking pixels, and lazy-loaded carousels mean networkidle might never fire, or take 10 seconds if it does. DataFlirt's extraction engine hooks directly into the DOMContentLoaded event. If the data we need is rendered client-side after DOM ready, we don't wait for the whole page — we attach a MutationObserver to the specific parent node. This surgical approach cuts our average headless session time by 68% compared to naive Playwright scripts.
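A minimal sketch of the MutationObserver pattern with Playwright's Python sync API. It assumes the target value appears as text inside a single parent container (`#price-container` is a placeholder) and that `page` has already navigated with `wait_until="domcontentloaded"`. `page.evaluate` waits for the returned Promise, so the call resolves once the observer sees text, or raises when the in-page timer rejects. `WAIT_FOR_TEXT_JS` and `wait_for_container_text` are hypothetical names, not DataFlirt's engine.

```python
# JavaScript run in the page: observe one parent node, not the whole document
WAIT_FOR_TEXT_JS = """
({ selector, timeoutMs }) => new Promise((resolve, reject) => {
  const target = document.querySelector(selector);
  if (!target) return reject(new Error(`no node matches ${selector}`));
  const text = () => target.textContent.trim();
  if (text()) return resolve(text());  // already rendered at DOM ready
  const observer = new MutationObserver(() => {
    if (text()) {
      clearTimeout(timer);
      observer.disconnect();
      resolve(text());
    }
  });
  const timer = setTimeout(() => {
    observer.disconnect();
    reject(new Error(`timed out waiting for text in ${selector}`));
  }, timeoutMs);
  observer.observe(target, { childList: true, subtree: true, characterData: true });
})
"""


def wait_for_container_text(page, selector: str = "#price-container",
                            timeout_ms: int = 5000) -> str:
    """Return the container's text once client-side rendering fills it."""
    # page.evaluate invokes the function expression with `arg` and awaits its Promise
    return page.evaluate(WAIT_FOR_TEXT_JS.strip(),
                         {"selector": selector, "timeoutMs": timeout_ms})
```

Scoping the observer to one subtree keeps the callback cheap on pages that mutate constantly elsewhere, such as ad slots or carousels.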

Playwright lifecycle config

Standard navigation parameters for a DataFlirt headless worker.

waitUntil: domcontentloaded
timeout: 5000ms
abort_resource_types: image, media, font
block_domains: google-analytics.com, doubleclick.net
mutation_target: #price-container
session_duration: 412ms
networkidle_fallback: disabled
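The abort and block lists above can be sketched as a Playwright request-interception handler, assuming Playwright's Python sync API. The constant values are copied from the config. The helper names `should_block` and `install_blocking` are hypothetical.

```python
from urllib.parse import urlsplit

# Values copied from the lifecycle config; helper names are hypothetical
ABORT_RESOURCE_TYPES = {"image", "media", "font"}
BLOCK_DOMAINS = ("google-analytics.com", "doubleclick.net")


def should_block(url: str, resource_type: str) -> bool:
    """Decide whether a request can be aborted without touching extracted data."""
    if resource_type in ABORT_RESOURCE_TYPES:
        return True
    host = (urlsplit(url).hostname or "").lower()
    # Match the domain itself and any subdomain (e.g. stats.g.doubleclick.net)
    return any(host == d or host.endswith("." + d) for d in BLOCK_DOMAINS)


def install_blocking(page) -> None:
    """Attach the filter to a Playwright sync-API Page via page.route."""
    def handle(route):
        req = route.request
        if should_block(req.url, req.resource_type):
            route.abort()
        else:
            route.continue_()
    page.route("**/*", handle)
```

Aborting images, media and fonts mainly saves bandwidth and speeds up `load`. Blocking third-party scripts is the part that can move DOM ready itself.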


// 07 — FAQ

Common
questions.

About browser lifecycles, extraction timing, and how DataFlirt optimises headless compute costs.

What is the difference between DOM ready and page load?
DOM ready (DOMContentLoaded) fires when the initial HTML document has been completely loaded and parsed. Page load (load) fires only after all dependent resources — stylesheets, images, and iframes — have finished loading. For scraping, the data is usually available at DOM ready.
Why does my scraper fail if I extract exactly at DOM ready?
Most likely because the data you want is fetched by an XHR/fetch request that a script starts at or after DOM ready. In Single Page Applications (SPAs), DOM ready often just means the empty container <div id="app"> exists. You need to wait for the specific network response or DOM mutation, not a generic page lifecycle event.
Is DOM ready relevant for plain HTTP scrapers?
No. Plain HTTP clients (like httpx or requests) just download the raw bytes. They don't parse HTML into a DOM, execute JavaScript, or fire lifecycle events. DOM ready is strictly a browser engine concept.
How does DataFlirt handle SPAs where DOM ready is too early?
We still navigate until domcontentloaded to release the navigation promise quickly. Then, we explicitly await the specific XHR response containing the JSON payload, or use page.waitForSelector() for the exact element. We never fall back to waiting for network idle, which is non-deterministic and slow.
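A minimal sketch of awaiting one specific XHR/fetch response with Playwright's Python sync API. It assumes `page` is a Playwright `Page` and that the payload endpoint can be recognized by a predicate you supply. The function name `fetch_spa_payload` and the `/api/product` path in the usage comment are hypothetical.

```python
from typing import Any, Callable


def fetch_spa_payload(page, url: str,
                      is_payload: Callable[[Any], bool],
                      timeout_ms: int = 5000) -> Any:
    """Navigate to DOM ready, then return the JSON body of one specific response."""
    # Register the listener before navigating so an early request is not missed;
    # leaving the `with` block waits for the first response matching is_payload
    with page.expect_response(is_payload, timeout=timeout_ms) as response_info:
        page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
    return response_info.value.json()


# Usage (hypothetical endpoint):
# data = fetch_spa_payload(
#     page, "https://target.com/product/123",
#     lambda r: "/api/product" in r.url and r.ok)
```

Reading the JSON payload directly also skips DOM parsing of the rendered SPA, and the wait ends as soon as that single response arrives, whatever else the page keeps loading.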
Does blocking images speed up DOM ready time?
Not meaningfully. Images load asynchronously and do not block the HTML parser. Blocking them saves bandwidth and speeds up the load event, but it has little or no effect on DOMContentLoaded. Blocking synchronous third-party scripts, however, can substantially improve DOM ready time.
Is it legal to block ads and trackers to speed up scraping?
Yes. You control your client. There is no legal obligation to download or execute third-party tracking scripts, render ads, or load images when accessing public data. Resource blocking is a standard, lawful optimization technique for both human users (via adblockers) and automated pipelines.
$ dataflirt scope --new-project --target=dom-ready-time READY

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off catalogue dump or a continuous feed across millions of records — we scope, build, and operate the pipeline.

hello@dataflirt.com  ·  Bengaluru  ·  IST  ·  typical reply < 4h