
What is Fetch API Interception?

Fetch API interception is the technique of hooking into a browser's native network layer to capture, modify, or block HTTP requests and responses before the page's JavaScript processes them. For scraping engineers, it is the cleanest path to extracting structured JSON from single-page applications, bypassing the need to parse brittle DOM elements or reverse-engineer complex API authentication tokens.

Network Layer · Headless Browsers · JSON Extraction · Playwright · Bandwidth Optimization
// 02 — definitions

Bypass the DOM,
capture the wire.

Why parse HTML when the server is already sending perfectly structured JSON to the client?


TL;DR

Fetch API interception allows a headless browser script to listen to all network traffic generated by a page. Instead of waiting for React or Vue to render a product grid, you intercept the underlying /api/products response, grab the JSON, and abort the render. It reduces pipeline latency, eliminates selector rot, and drastically cuts bandwidth costs.

01 Definition & structure
Fetch API interception is a technique used in headless browser automation to monitor, modify, or block HTTP requests made through the browser's native fetch() or XMLHttpRequest APIs. Instead of letting the browser blindly send and receive data, the automation script registers a handler that intercepts the traffic. This allows scrapers to extract clean JSON payloads directly at the network layer, without parsing the rendered HTML DOM.
02 How it works in practice
In frameworks like Playwright, you use page.route('**/*', handler) to intercept traffic. When the page attempts to load a resource, the handler evaluates the URL. If it's an image or tracking script, the handler aborts the request to save bandwidth. If it's the target API endpoint (e.g., /api/products), the handler allows the request to proceed, captures the JSON response from the server, saves the data to your pipeline, and then immediately closes the browser context before the page wastes CPU cycles rendering the UI.
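The flow above can be sketched with Playwright for Python. This is a minimal illustration, not DataFlirt's production handler: the `/api/products` prefix, the blocked-type set, and the function names are assumptions, and waiting for `networkidle` stands in for closing the context the moment the payload arrives.

```python
from urllib.parse import urlparse

# Resource types that carry no data we want; aborting them saves bandwidth.
BLOCKED_TYPES = {"image", "media", "font"}
# Hypothetical path prefix of the endpoint that carries the payload.
TARGET_PATH_PREFIX = "/api/products"


def decide(resource_type: str, url: str) -> str:
    """Return 'abort', 'capture', or 'continue' for one outgoing request."""
    if resource_type in BLOCKED_TYPES:
        return "abort"
    if urlparse(url).path.startswith(TARGET_PATH_PREFIX):
        return "capture"
    return "continue"


def make_handler(captured: list):
    """Build a Playwright route handler that appends JSON payloads to `captured`."""
    def handle(route):
        request = route.request
        action = decide(request.resource_type, request.url)
        if action == "abort":
            route.abort()
        elif action == "capture":
            response = route.fetch()          # perform the request ourselves
            captured.append(response.json())  # keep the parsed JSON body
            route.fulfill(response=response)  # hand the same response to the page
        else:
            route.continue_()
    return handle


def scrape(url: str) -> list:
    """Load `url` once and return every JSON payload seen on the target path."""
    # Imported lazily so the decision logic above is usable without Playwright.
    from playwright.sync_api import sync_playwright

    captured: list = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", make_handler(captured))
        # Simplification: wait for the network to go quiet, then close.
        page.goto(url, wait_until="networkidle")
        browser.close()
    return captured
```

Keeping `decide` free of Playwright objects makes the routing policy easy to unit-test without launching a browser.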
03 Blocking resources for performance
Beyond data extraction, interception is the primary tool for optimizing headless browser performance. Loading a modern e-commerce page might trigger 150 requests and download 4 MB of data. By intercepting and aborting requests for images, fonts, CSS, and third-party analytics, you can reduce the payload to just the HTML and core JavaScript (often under 300 KB). This drastically reduces proxy bandwidth costs and speeds up page load times.
04 How DataFlirt handles it
We default to network interception for all Single Page Applications. Our extraction engine automatically profiles a target site's network traffic during the scoping phase, identifying the exact GraphQL or REST endpoints that carry the payload. In production, our Playwright workers intercept these specific routes, extract the JSON, validate it against our schema contracts, and terminate the session. We rarely parse the DOM unless the target relies entirely on Server-Side Rendering.
05 The XHR vs Fetch distinction
Historically, web apps used XMLHttpRequest (XHR) for asynchronous data loading. Modern apps use the fetch() API. While they are different JavaScript interfaces, modern automation tools intercept both at the network layer (via the Chrome DevTools Protocol on Chromium-based browsers). You don't need to write separate handlers for XHR and Fetch; once the request is dispatched, the browser's network stack handles both the same way.
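One way to see this in code: Playwright reports each request's `resource_type`, and both `"xhr"` and `"fetch"` can go through a single listener. The sketch below assumes the sync Python API; the function names and the `sink` list are illustrative, and the content-type check is a simple heuristic rather than a complete JSON detector.

```python
# Playwright labels XMLHttpRequest traffic "xhr" and fetch() traffic "fetch".
API_RESOURCE_TYPES = {"xhr", "fetch"}


def is_json_api_response(resource_type: str, content_type: str) -> bool:
    """True when a response came from XHR or fetch() and declares a JSON body."""
    return (
        resource_type in API_RESOURCE_TYPES
        and "application/json" in content_type.lower()
    )


def attach_listener(page, sink: list) -> None:
    """Register one response listener on a Playwright page for both interfaces."""
    def on_response(response):
        content_type = response.headers.get("content-type", "")
        if is_json_api_response(response.request.resource_type, content_type):
            sink.append({"url": response.url, "body": response.json()})

    page.on("response", on_response)
```

A passive `response` listener like this only observes traffic; blocking or modifying requests still needs `page.route`.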
// 03 — the efficiency model

How much faster
is interception?

Intercepting API responses and aborting subsequent renders changes the cost structure of headless scraping. DataFlirt models pipeline efficiency based on bytes saved and CPU cycles bypassed.

Latency reduction: $L_{\text{saved}} = T_{\text{render}} + T_{\text{paint}} + T_{\text{idle}}$
Time saved by capturing the JSON and immediately closing the page context. (Browser performance metrics)
Bandwidth optimization: $B_{\text{saved}} = B_{\text{total}} - (B_{\text{html}} + B_{\text{target\_api}})$
Blocking images, fonts, and third-party scripts via interception saves 70-90% of bandwidth. (DataFlirt infrastructure benchmarks)
DataFlirt extraction yield: $Y = (\text{JSON\_records} / \text{API\_bytes}) \times 1000$
High yield indicates efficient interception; low yield implies downloading unnecessary payload bloat. (Internal SLO)
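The three formulas translate directly into code. The sketch below is a plain transcription; the function names are illustrative, and any inputs you feed it (timings, byte counts) are your own measurements rather than DataFlirt benchmarks.

```python
def latency_saved(t_render: float, t_paint: float, t_idle: float) -> float:
    """L_saved: time not spent rendering, painting, and idling (same unit as inputs)."""
    return t_render + t_paint + t_idle


def bandwidth_saved(b_total: int, b_html: int, b_target_api: int) -> int:
    """B_saved: bytes avoided when only the HTML and the target API are downloaded."""
    return b_total - (b_html + b_target_api)


def extraction_yield(json_records: int, api_bytes: int) -> float:
    """Y: records extracted per 1000 bytes of API payload."""
    return json_records / api_bytes * 1000
```

As a usage example, a page that would have cost 1,000,000 bytes in total, with 100,000 bytes of HTML and a 142,500-byte API payload, gives `bandwidth_saved(1_000_000, 100_000, 142_500) == 757_500`, or about 76% of the original transfer.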
// 04 — playwright route trace

Intercepting a
GraphQL endpoint.

A live trace of a Playwright script intercepting a target's search API, extracting the payload, and aborting image assets to save bandwidth.

Playwright route() · JSON payload · Resource blocking
edge.dataflirt.io — live
CAPTURED
// network interception initialized
page.route: "**/*" attached

// outbound request filtering
req.url: "https://target.com/assets/hero.jpg"
action: abort() // blocked to save bandwidth

// target API identified
req.url: "https://target.com/graphql"
req.method: "POST"
req.postData: "query SearchProducts..."
action: continue()

// response captured
res.status: 200 OK
res.contentType: "application/json"
payload.size: 142.5 KB
records.extracted: 48

// pipeline action
page.close: executed // DOM render bypassed
// 05 — interception targets

Where the data
actually lives.

The most common types of intercepted requests across DataFlirt's headless pipelines. Extracting from these endpoints directly is usually more stable than DOM parsing.

PIPELINES: 410+ headless
INTERCEPT RATE: 82% of jobs
UPDATED: 2026-05-19
01 GraphQL queries
Search & Catalog · Highly structured, predictable schema

02 REST API JSON
Pagination & Details · Standard XHR/Fetch responses

03 Next.js hydration data
_next/data/*.json · Pre-rendered state payloads

04 Typeahead endpoints
Autocomplete · Fast, lightweight entity resolution

05 WebSocket frames
Real-time feeds · Live pricing and inventory updates
// 06 — our architecture

Listen to the wire,
ignore the paint.

DataFlirt's headless fleet relies heavily on network interception. When we load a modern SPA, we don't wait for the DOM to settle. We inject route handlers that listen for the specific API endpoints containing the target data. Once the JSON payload is captured, we immediately abort the page load. This approach cuts compute costs by 60% and insulates the pipeline from frontend UI redesigns. If the CSS classes change, our extractors don't care: the underlying API contract rarely shifts without warning.

route_handler.config

Standard interception ruleset for a high-volume e-commerce pipeline.

target.endpoint     */api/v2/catalog/*
action.on_match     capture_body
block.types         image, media, font (aborted)
block.domains       *google-analytics.com*
header.injection    x-df-trace-id (active)
render.bypass       true (DOM ignored)
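A ruleset like this could be evaluated with glob matching. The sketch below is an assumption about how such a config might map to code, not DataFlirt's engine: the `Ruleset` class and `apply_rules` function are hypothetical, and only the matching rules are modeled (header injection and render bypass are left out).

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from urllib.parse import urlparse


@dataclass(frozen=True)
class Ruleset:
    """Hypothetical in-memory form of the matching rules shown above."""
    target_endpoint: str = "*/api/v2/catalog/*"
    block_types: frozenset = frozenset({"image", "media", "font"})
    block_domains: tuple = ("*google-analytics.com*",)


def apply_rules(rules: Ruleset, url: str, resource_type: str) -> str:
    """Return the action for one request: 'abort', 'capture_body', or 'continue'."""
    host = urlparse(url).hostname or ""
    if resource_type in rules.block_types:
        return "abort"
    if any(fnmatch(host, pattern) for pattern in rules.block_domains):
        return "abort"
    if fnmatch(url, rules.target_endpoint):
        return "capture_body"
    return "continue"
```

Block rules are checked before the capture rule so that a blocked resource type is never fetched, even if its URL happens to sit under the target path.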

// 07 — FAQ

Common
questions.

Common questions about network interception, headless browser performance, and bypassing anti-bot protections.

Why use interception instead of just parsing the HTML?
HTML parsing relies on CSS selectors, which break whenever a site redesigns its frontend. Modern sites are usually Single Page Applications (SPAs) that fetch data as JSON and render it client-side. By intercepting the Fetch API, you grab the raw JSON before it becomes HTML. JSON schemas are much more stable than UI layouts, drastically reducing pipeline maintenance.
Can anti-bot systems detect that I am intercepting requests?
Generally, no. Interception happens at the browser level (via CDP in Puppeteer/Playwright). The server just sees a normal request originating from the browser. However, if you use interception to block essential anti-bot scripts (like DataDome or Cloudflare challenges), the server will notice the missing telemetry and flag your session.
How do you handle encrypted or obfuscated API payloads?
Some high-security targets encrypt their JSON payloads and decrypt them client-side using WebAssembly or obfuscated JS. In these cases, raw network interception yields useless ciphertext. We handle this by letting the page's native JS decrypt the payload, then intercepting the data at the application state level (e.g., hooking into Redux stores or overriding the JSON.parse method).
How does DataFlirt scale interception across millions of pages?
We use a hybrid approach. We run a headless browser once to capture the exact headers, cookies, and tokens the SPA generates for the API request. We then extract those credentials and replay the API requests using lightweight, concurrent HTTP clients (like Go's net/http) instead of spinning up a browser for every page. This gives us the scale of plain HTTP with the token-solving power of a real browser.
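The replay half of this approach can be sketched with Python's standard library standing in for Go's net/http. Everything here is illustrative: `build_replay_request` and `replay` are hypothetical helpers, and `captured_headers` is assumed to be the header dict recorded from the browser's own API request.

```python
import json
import urllib.request

# Headers the HTTP client should compute itself rather than replay verbatim.
# accept-encoding is dropped because urllib does not decompress responses.
SKIP_HEADERS = {"host", "content-length", "connection", "accept-encoding"}


def build_replay_request(url: str, captured_headers: dict, body=None):
    """Build a request that reuses browser-captured headers, cookies, and tokens."""
    headers = {
        name: value
        for name, value in captured_headers.items()
        # HTTP/2 pseudo-headers such as ":authority" cannot be replayed as-is.
        if name.lower() not in SKIP_HEADERS and not name.startswith(":")
    }
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(url, data=data, headers=headers)


def replay(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request without a browser and parse the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Captured tokens usually expire, so a real pipeline would need to fall back to the browser to refresh them when replayed requests start failing.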
Can I modify requests before they are sent to the server?
Yes. Playwright and Puppeteer allow you to mutate outbound requests. You can inject custom headers, modify POST bodies, or strip out tracking cookies. This is particularly useful for manipulating pagination parameters that aren't exposed in the UI, or for bypassing geo-blocks by injecting specific X-Forwarded-For headers on targets that trust a client-supplied value.
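A minimal sketch of request mutation with Playwright for Python follows. The `limit` query parameter, the helper names, and the extra headers are assumptions for illustration; `route.continue_` accepts `url` and `headers` overrides.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def set_query_param(url: str, name: str, value: str) -> str:
    """Return `url` with one query parameter added or overwritten."""
    parts = urlparse(url)
    # Note: dict() keeps only the last value of a repeated parameter.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[name] = value
    return urlunparse(parts._replace(query=urlencode(query)))


def make_mutating_handler(extra_headers: dict, page_size: int):
    """Build a route handler that injects headers and raises the page size."""
    def handle(route):
        request = route.request
        route.continue_(
            url=set_query_param(request.url, "limit", str(page_size)),
            headers={**request.headers, **extra_headers},
        )
    return handle
```

It would be registered on the API route only, for example `page.route("**/api/products*", make_mutating_handler({"x-df-trace-id": "demo"}, 100))`, so that unrelated requests pass through untouched.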
Does interception work for Server-Side Rendered (SSR) pages?
No. If the data is baked into the HTML on the server (like traditional PHP or early Next.js SSR pages), there is no subsequent API request to intercept. For SSR pages, you must either parse the DOM or look for embedded JSON state objects (like __NEXT_DATA__) injected directly into the HTML source.
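For the embedded-state case, the JSON can be pulled out of the HTML source without a browser. The sketch below uses only the standard library and assumes the page carries the usual `<script id="__NEXT_DATA__">` tag; the function and class names are illustrative.

```python
import json
from html.parser import HTMLParser


class _NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html: str):
    """Return the parsed __NEXT_DATA__ object, or None if the tag is absent."""
    parser = _NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    return json.loads(raw) if raw else None
```

The returned object typically nests the page data under `props.pageProps`, but the exact shape depends on the target application.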