
What is XHR Monitoring?

XHR Monitoring is the practice of intercepting and analyzing asynchronous HTTP requests made by a web page's JavaScript after the initial document load. For scraping engineers, it's the difference between parsing a brittle, obfuscated DOM and directly capturing the clean JSON payloads that populate it. By hooking into the browser's network layer, pipelines can bypass rendering overhead entirely and extract structured data straight from the source APIs.

Network Layer · AJAX · JSON Extraction · Playwright · API Interception
// 02 — definitions

Skip the DOM,
read the wire.

Why parse HTML when the data you want is already flowing across the network as perfectly structured JSON?


TL;DR

XHR monitoring intercepts the background API calls a modern single-page application makes to fetch data. Instead of waiting for React or Vue to render a product grid into HTML, a scraper listens for the /api/products response and captures the raw JSON. It's faster, less prone to selector rot, and often yields hidden fields the UI never displays.

01 Definition & structure
XHR Monitoring (often encompassing both XMLHttpRequest and the modern Fetch API) is the technique of listening to the browser's network traffic to capture data payloads directly. Instead of parsing the HTML document, the scraper attaches to the network layer and intercepts the JSON or XML responses that the frontend framework uses to populate the page. It is the most efficient way to extract data from Single Page Applications (SPAs).
02 How it works in practice
Using browser automation tools like Playwright or Puppeteer, you register an event listener on the page's network traffic (e.g., page.on('response')). When the browser navigates to a target, the frontend JavaScript executes and requests data from a backend API. The listener identifies the specific API endpoint, waits for the response to complete, and parses the raw JSON body. The scraper can then immediately exit, bypassing the need to wait for the DOM to render.
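The listener flow described above can be sketched as follows. This is a minimal sketch, assuming Playwright for Node.js; the `/api/products` path fragment and the names `isTargetResponse` and `captureFirstJson` are hypothetical, chosen only for illustration. The matching rule is kept as a pure function so it can be checked without launching a browser.

```javascript
// Hypothetical path fragment identifying the data endpoint (an assumption).
const TARGET_PATH = "/api/products";

// Pure predicate: is this response the JSON payload we are waiting for?
function isTargetResponse(url, status, contentType) {
  return (
    url.includes(TARGET_PATH) &&
    status === 200 &&
    (contentType || "").toLowerCase().includes("application/json")
  );
}

// Wiring sketch: resolves with the parsed body of the first matching response.
// `page` is expected to be a Playwright Page (any object exposing on/off works).
function captureFirstJson(page) {
  return new Promise((resolve, reject) => {
    const onResponse = async (response) => {
      const contentType = response.headers()["content-type"];
      if (!isTargetResponse(response.url(), response.status(), contentType)) return;
      page.off("response", onResponse); // stop listening after the first hit
      try {
        resolve(await response.json()); // waits for the body, then parses it
      } catch (err) {
        reject(err);
      }
    };
    page.on("response", onResponse);
  });
}
```

In use, the listener should be registered before navigation so the response cannot be missed: call `captureFirstJson(page)` first, then `page.goto(...)`, then await the returned promise.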
03 The hidden data advantage
Frontend developers frequently over-fetch data. An API might return a product object with 50 fields (including exact stock counts, wholesale margins, and internal supplier IDs), while the React component renders only four of them (title, price, image, and an "In Stock" badge). By monitoring the XHR response, scrapers gain access to the complete dataset, including the fields the UI never displays.
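To make the over-fetching point concrete, the sketch below compares the keys of an API record with the keys the UI renders. The record, its values, and the helper name `hiddenFields` are made up for illustration; only the field names `stock_exact`, `margin`, and `supplier_id` echo the trace shown later on this page.

```javascript
// Returns the keys present in the API object that the UI never renders.
function hiddenFields(apiObject, renderedKeys) {
  const shown = new Set(renderedKeys);
  return Object.keys(apiObject).filter((key) => !shown.has(key));
}

// Illustrative product record; the values are invented.
const product = {
  title: "Desk lamp",
  price: 39.0,
  image: "lamp.jpg",
  in_stock: true,
  stock_exact: 17,
  margin: 0.31,
  supplier_id: "SUP-0042",
};

const hidden = hiddenFields(product, ["title", "price", "image", "in_stock"]);
console.log(hidden); // [ 'stock_exact', 'margin', 'supplier_id' ]
```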
04 How DataFlirt handles it
We operate a network-first extraction strategy. Every new pipeline is profiled to map its API dependencies. If the target data is available via XHR/Fetch, we configure our Playwright workers to intercept those specific routes. Once the JSON is captured, we aggressively abort all subsequent network requests (images, CSS, fonts) and skip DOM evaluation entirely. This allows us to run headless browsers at near-HTTP-client speeds.
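One way to express the abort step is a small routing rule. This is a sketch under stated assumptions, not DataFlirt's actual worker code: it assumes Playwright's `page.route`, `route.abort()`, and `route.continue()`, and the names `routeDecision`, `installRouting`, and `state.captured` are hypothetical. Dropping static assets even before the JSON arrives is an extra assumption of this sketch.

```javascript
// Resource types that carry no data for a network-first extraction.
const NON_ESSENTIAL = new Set(["image", "stylesheet", "font", "media"]);

// Decide what to do with a request given its Playwright resource type.
// After capture: drop everything. Before capture: drop only static assets.
function routeDecision(resourceType, jsonCaptured) {
  if (jsonCaptured) return "abort";
  return NON_ESSENTIAL.has(resourceType) ? "abort" : "continue";
}

// Wiring sketch for a Playwright Page. `state.captured` is expected to be
// set to true by the response listener once the target payload is parsed.
async function installRouting(page, state) {
  await page.route("**/*", (route) => {
    const type = route.request().resourceType();
    return routeDecision(type, state.captured) === "abort"
      ? route.abort()
      : route.continue();
  });
}
```

Keeping the decision in a pure function makes the policy easy to review and to change per target, for example when a site needs its stylesheets to finish bootstrapping.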
05 Did you know?
You can use XHR monitoring to reverse-engineer the API and eventually drop the browser entirely. By observing the exact headers, cookies, and payload structures the browser sends, you can replicate the request in a lightweight HTTP client (like httpx or aiohttp). This transition from browser-based interception to pure API scraping is the holy grail of pipeline optimization.
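The replay step can be sketched like this. To stay in one language, the sketch uses Node's built-in `fetch` (Node 18+) in place of the Python clients named above. The helper `buildReplay`, the list of dropped headers, and the placeholder token values are assumptions made for illustration; the URL mirrors the one in the trace on this page.

```javascript
// Headers the HTTP client sets itself or that only make sense per connection.
const DROP = new Set(["host", "content-length", "connection", "accept-encoding"]);

// Turn a captured browser request into arguments for fetch().
function buildReplay(captured) {
  const headers = {};
  for (const [name, value] of Object.entries(captured.headers)) {
    const key = name.toLowerCase();
    if (key.startsWith(":") || DROP.has(key)) continue; // skip HTTP/2 pseudo-headers
    headers[key] = value;
  }
  const init = { method: captured.method, headers };
  if (captured.body != null) init.body = captured.body;
  return { url: captured.url, init };
}

// Example input shaped like an observed request; token values are placeholders.
const replay = buildReplay({
  method: "GET",
  url: "https://api.target.com/v2/catalog?page=3",
  headers: {
    ":authority": "api.target.com",
    Authorization: "Bearer <captured-token>",
    "X-CSRF-Token": "<captured-csrf>",
    Cookie: "session=<captured-cookie>",
    "Accept-Encoding": "gzip, br",
  },
  body: null,
});
// fetch(replay.url, replay.init) would then issue the request without a browser.
```

Tokens captured this way usually expire, so a replay client typically still needs a periodic browser session to refresh them.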
// 03 — the efficiency model

Why intercepting
beats rendering.

Extracting from XHR responses eliminates DOM traversal and layout calculation. DataFlirt's telemetry shows a massive reduction in CPU cycles and pipeline latency when shifting from DOM scraping to network interception.

Latency Reduction: $L_{\text{saved}} = T_{\text{render}} + T_{\text{paint}} + T_{\text{dom\_parse}}$
Time saved by aborting the page load once the target JSON is captured. (Source: browser rendering pipeline)
Payload Density: $D = \text{JSON\_bytes} / \text{HTML\_bytes}$
JSON payloads typically have a 5x to 20x higher data-to-markup ratio than rendered HTML. (Source: DataFlirt extraction metrics)
Network-First Hit Rate: $H = \text{XHR\_extracts} / \text{Total\_extracts}$
$H > 0.72$ across DataFlirt's active e-commerce pipelines as of v2026.5. (Source: DataFlirt internal SLO)
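As a worked example of the three formulas, the sketch below plugs in numbers. Every input is hypothetical and chosen only to make the arithmetic visible; none of them are DataFlirt measurements.

```javascript
// L_saved = T_render + T_paint + T_dom_parse (all in milliseconds)
function latencySavedMs(renderMs, paintMs, domParseMs) {
  return renderMs + paintMs + domParseMs;
}

// D = JSON_bytes / HTML_bytes
function payloadDensity(jsonBytes, htmlBytes) {
  return jsonBytes / htmlBytes;
}

// H = XHR_extracts / Total_extracts
function hitRate(xhrExtracts, totalExtracts) {
  return xhrExtracts / totalExtracts;
}

console.log(latencySavedMs(420, 180, 95)); // 695
console.log(payloadDensity(40000, 320000)); // 0.125
console.log(hitRate(19, 25)); // 0.76
```

With these inputs the JSON payload is one eighth the size of the rendered HTML, and 19 of 25 extractions take the network-first path, which would clear the 0.72 threshold.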
// 04 — network trace

Intercepting a
product feed API.

A Playwright script attached to the network layer, capturing a paginated JSON response before the frontend framework even knows it arrived.

Playwright CDP · route.continue() · JSON
edge.dataflirt.io — live
CAPTURED
// attaching network listener via CDP
page.on('response', async response => ...)

// outbound request detected
req.method: "GET"
req.url: "https://api.target.com/v2/catalog?page=3"
req.headers.authorization: "Bearer eyJhbGci..."
req.headers.x-csrf-token: "a8f9d2..."

// inbound response intercepted
res.status: 200 OK
res.content_type: "application/json"
res.size: 42.8 KB

// payload extraction
data.items_count: 24
data.hidden_fields: ["margin", "supplier_id", "stock_exact"] // UI only shows "In Stock"

pipeline.status: EXTRACTED_WITHOUT_DOM
// 05 — interception targets

Where the best
data hides.

The most common and valuable XHR/Fetch endpoints targeted by DataFlirt's network interception layer across modern single-page applications.

PIPELINES: 850+ active
XHR VOLUME: 4.2B req/mo
UPDATED: 2026-05-19
01 Product / Catalog APIs
JSON feeds · Pricing, variants, and exact inventory counts

02 Search / Autocomplete
Algolia / Elasticsearch · Category hierarchies and facet counts

03 User Reviews / Comments
GraphQL / REST · Paginated user-generated content

04 Pricing / Availability
WebSockets / Polling · Real-time stock and dynamic pricing updates

05 Analytics / Telemetry
Tracking endpoints · Often leak internal product IDs or A/B test states
// 06 — DataFlirt's network layer

Listen to the wire,
ignore the pixels.

For modern SPAs, rendering the DOM is a waste of compute. DataFlirt's extraction engine defaults to a network-first approach. We load the page, capture the authentication tokens generated by the frontend, and immediately hook the XHR/Fetch streams via the Chrome DevTools Protocol (CDP). If the target data is in the JSON payload, we abort the DOM rendering entirely. This reduces compute costs by up to 60% and completely immunizes the pipeline against CSS class changes and layout redesigns.
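Hooking the streams over CDP can be sketched as below. This is an illustration under stated assumptions, not DataFlirt's engine: it assumes a Chromium CDP session such as the one Playwright returns from `page.context().newCDPSession(page)`, and it uses the CDP `Network.enable`, `Network.responseReceived`, `Network.loadingFinished`, and `Network.getResponseBody` messages. The names `hookNetwork`, `urlFragment`, and `onJson` are hypothetical.

```javascript
// Attach to a CDP session and hand parsed JSON bodies to `onJson`.
// `client` must expose on(event, handler) and send(method, params).
async function hookNetwork(client, urlFragment, onJson) {
  const pending = new Map(); // requestId -> url of a matching response

  // Headers have arrived: remember the request if it looks like our target.
  client.on("Network.responseReceived", ({ requestId, response }) => {
    if (response.url.includes(urlFragment) && response.mimeType === "application/json") {
      pending.set(requestId, response.url);
    }
  });

  // Body is complete: fetch it from the browser and parse it.
  client.on("Network.loadingFinished", async ({ requestId }) => {
    if (!pending.has(requestId)) return;
    const url = pending.get(requestId);
    pending.delete(requestId);
    const { body, base64Encoded } = await client.send("Network.getResponseBody", { requestId });
    const text = base64Encoded ? Buffer.from(body, "base64").toString("utf8") : body;
    onJson(url, JSON.parse(text));
  });

  await client.send("Network.enable");
}
```

The body is requested only on `Network.loadingFinished`, because at `Network.responseReceived` the headers are known but the payload may still be in flight.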

xhr-interceptor.config

Live telemetry from a network-first extraction worker on a travel aggregator.

worker.id net-ext-042
strategy network-first
target.url api.travel.in/v3/flights
dom.status aborted (unneeded)
payload.type application/json
schema.match 100%
cpu.utilization 12% (vs 85% headed)
blocked.requests 0

// 07 — FAQ

Common
questions.

About network interception, API scraping, legal boundaries, and how DataFlirt scales XHR monitoring.

What is the difference between XHR and Fetch monitoring?
XMLHttpRequest (XHR) is the legacy browser API for asynchronous requests; Fetch is the modern standard. For scraping purposes, the distinction is irrelevant. Both are intercepted at the network layer using the Chrome DevTools Protocol (CDP) in tools like Playwright or Puppeteer. You monitor the network events, not the specific JavaScript API used to trigger them.
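In Playwright, for instance, `request.resourceType()` reports `"xhr"` for XMLHttpRequest calls and `"fetch"` for Fetch calls, so a single filter covers both. The sketch below assumes a Playwright Page; the helper names and the `sink` parameter are hypothetical.

```javascript
// Playwright reports "xhr" for XMLHttpRequest and "fetch" for the Fetch API.
const API_RESOURCE_TYPES = new Set(["xhr", "fetch"]);

function isBackgroundApiCall(resourceType) {
  return API_RESOURCE_TYPES.has(resourceType);
}

// Wiring sketch: record every background call regardless of which API issued it.
function logApiCalls(page, sink = console.log) {
  page.on("request", (request) => {
    if (isBackgroundApiCall(request.resourceType())) {
      sink(`${request.method()} ${request.url()}`);
    }
  });
}
```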
Why not just request the API directly using Python/Requests?
Direct API requests are ideal, but modern APIs are heavily protected. You often need the browser to solve the initial JavaScript challenge (like Cloudflare Turnstile), generate CSRF tokens, or compute dynamic signature headers (like Akamai's sensor data). XHR monitoring lets the real browser do the hard work of authenticating the session, while you simply siphon the data off the wire.
Is intercepting undocumented APIs legal?
The public data doctrine generally applies to the API response just as it does to the HTML. If the API is public, unauthenticated, and serves publicly available data, accessing it is typically lawful in the US and EU. However, bypassing authentication or reverse-engineering private endpoints carries higher risk. Always review the target's Terms of Service and consult counsel.
What if the API payload is encrypted or obfuscated?
Some advanced anti-bot systems encrypt the JSON payload and decrypt it client-side using WebAssembly or obfuscated JS. In these cases, raw XHR monitoring yields ciphertext. We handle this by either capturing the decryption keys from the JS context at runtime, or falling back to DOM extraction once the frontend framework renders the decrypted data.
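A pipeline needs some way to notice that a body is not usable JSON before choosing the fallback path. The sketch below is one simple heuristic, with the hypothetical name `classifyBody`. It will not detect ciphertext that is itself wrapped in a valid JSON envelope, so a real check would also need to validate the expected fields.

```javascript
// Classify a response body: a parsed JSON object/array, or opaque content
// that needs the fallback path (runtime key capture or DOM extraction).
function classifyBody(text) {
  try {
    const parsed = JSON.parse(text);
    if (parsed !== null && typeof parsed === "object") {
      return { kind: "json", data: parsed };
    }
  } catch (err) {
    // Not JSON at all: fall through to the opaque result.
  }
  return { kind: "opaque", data: null };
}
```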
How does DataFlirt handle API rate limits during XHR monitoring?
We rotate IPs and session tokens across our residential pool, matching the exact request headers the legitimate frontend sends. Because we are driving a real browser, the request cadence naturally mimics human interaction, keeping the API request rate well below the aggressive thresholds that trigger 429 Too Many Requests errors.
Does XHR monitoring prevent selector rot?
Yes. Selector rot happens when frontend developers change CSS classes or HTML structure. API contracts (the JSON schema) change far less frequently because breaking an API breaks the entire frontend application. By extracting from the XHR response, your pipeline becomes immune to cosmetic UI updates.
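API contracts change less often than markup, but they do change, so a cheap check on each captured payload helps catch drift early. The sketch below computes the share of records that contain every field a pipeline depends on; the field names in `REQUIRED` and the helper `schemaMatch` are hypothetical.

```javascript
// Fields the pipeline depends on; hypothetical names for illustration.
const REQUIRED = ["id", "title", "price"];

// Fraction of records containing every required field (1 means 100% match).
function schemaMatch(records, required = REQUIRED) {
  if (records.length === 0) return 0;
  const ok = records.filter((r) => required.every((k) => k in r)).length;
  return ok / records.length;
}
```

A value below 1 signals that the endpoint has started omitting or renaming a field, which is the API-side equivalent of a broken selector.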
$ dataflirt scope --new-project --target=xhr-monitoring READY

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off catalogue dump or a continuous feed across millions of records — we scope, build, and operate the pipeline.

hello@dataflirt.com  ·  Bengaluru  ·  IST  ·  typical reply < 4h