
What is a Mock HTTP Response?

A mock HTTP response is a simulated server reply used to test scraping pipelines without hitting the live target. Instead of making an actual network request, the fetch layer hands the call to an interceptor that returns a pre-recorded or synthetically generated payload. This isolates the extraction logic from network volatility, prevents accidental rate-limit triggers during CI/CD runs, and lets engineers deterministically test edge cases such as malformed JSON, 503 errors, or unexpected schema drift.

Testing · CI/CD · Extraction · Network Interception · Deterministic
// 02 — definitions

Test the parser,
not the network.

How simulating target responses prevents brittle tests, saves bandwidth, and keeps your IP out of the crosshairs during development.


TL;DR

Mock HTTP responses decouple the extraction layer from the fetch layer. By feeding known HTML or JSON payloads into your parsers, you can validate schema contracts and error handling in milliseconds, without burning proxy bandwidth or risking a ban from the target site.

01 Definition & structure
A mock HTTP response is a simulated network reply used during the testing phase of a scraping pipeline. Instead of the HTTP client (like requests, httpx, or Playwright) reaching out to the internet, an interceptor catches the request and immediately returns a predefined payload—often called a "fixture." This payload includes the status code, headers, and the raw body (HTML, JSON, or XML) exactly as the real server would have sent it.
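To make that structure concrete, here is a minimal sketch using only the Python standard library. `Fixture` and `FixtureInterceptor` are hypothetical names, not a real library's API: the interceptor maps a method and URL to a stored status, headers and body, and raises on anything unregistered so a test can never silently fall through to the live network.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fixture:
    """A recorded reply: status code, raw body bytes and response headers."""
    status: int
    body: bytes
    headers: dict = field(default_factory=dict)


class FixtureInterceptor:
    """Hypothetical interceptor that answers requests from registered fixtures."""

    def __init__(self):
        self._routes = {}

    def register(self, method, url, fixture):
        self._routes[(method.upper(), url)] = fixture

    def handle(self, method, url):
        key = (method.upper(), url)
        if key not in self._routes:
            # Fail loudly instead of falling through to the live network.
            raise LookupError(f"no fixture registered for {method} {url}")
        return self._routes[key]


interceptor = FixtureInterceptor()
interceptor.register(
    "GET",
    "https://api.target.com/v1/product/42",
    Fixture(200, b'{"id": 42, "stock": 0}', {"Content-Type": "application/json"}),
)
reply = interceptor.handle("GET", "https://api.target.com/v1/product/42")
print(reply.status, reply.headers["Content-Type"])
```

Raising on unknown routes is a deliberate choice: a missing fixture should surface as a test error, not as a quiet live request.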
02 How it works in practice
In a CI/CD environment, running live scraping tests is an anti-pattern. Live targets are slow, they rate-limit you, and they change unpredictably, leading to flaky tests. By mocking the HTTP response, you isolate the extraction logic. You feed a known HTML string into your CSS selectors and assert that the output matches your expected schema. If the test fails, you know your parser is broken, not that the target site happened to be down for maintenance.
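A minimal sketch of that loop with the standard library: a known HTML string stands in for the saved fixture, a small extractor pulls out fields, and the test asserts the output against an expected schema. The markup, class names and `extract_product` are hypothetical, and `html.parser` is swapped in for the CSS-selector libraries a production extractor would more likely use; this toy parser only handles flat elements with a single text node.

```python
from html.parser import HTMLParser

# Known HTML standing in for a saved fixture (markup and class names are hypothetical).
FIXTURE_HTML = """
<div class="product">
  <h1 class="title">Steel Water Bottle</h1>
  <span class="price">1,299.00</span>
</div>
"""


class ProductParser(HTMLParser):
    """Collects the text of elements whose class list contains a wanted field."""

    FIELDS = {"title", "price"}

    def __init__(self):
        super().__init__()
        self._current = None
        self.data = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        matches = self.FIELDS.intersection(classes)
        self._current = matches.pop() if matches else None

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.data[self._current] = data.strip()


def extract_product(html):
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    price = parser.data.get("price")
    return {
        "title": parser.data.get("title"),
        # Drop thousands separators before converting.
        "price": float(price.replace(",", "")) if price else None,
    }


# The schema contract the test asserts against.
EXPECTED_SCHEMA = {"title": str, "price": float}

record = extract_product(FIXTURE_HTML)
assert set(record) == set(EXPECTED_SCHEMA)
assert all(isinstance(record[k], t) for k, t in EXPECTED_SCHEMA.items())
print(record)
```

If this assertion fails, the input bytes were fixed, so the only thing that can have changed is the parser.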
03 Static vs dynamic mocks
Static mocks are hardcoded files saved in your repository (e.g., product_page.html). They are fast and deterministic but prone to going stale. Dynamic mocks use libraries to generate responses on the fly, allowing you to simulate complex behaviors like pagination loops, rotating anti-bot tokens, or random 500 errors to test your retry queues.
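The sketch below illustrates a dynamic mock, assuming a hypothetical paginated catalogue endpoint: `DynamicCatalogMock` generates pages on the fly and injects 500 errors, while `crawl_all` is the retry-and-pagination logic under test. Both names are made up for illustration. A seeded `random.Random` keeps the "random" failures identical across CI runs, and a per-page failure cap guarantees that a correct retry loop always finishes.

```python
import random


class DynamicCatalogMock:
    """Hypothetical dynamic mock: paginated JSON pages with injected 500s.

    A seeded RNG makes the "random" failures identical on every CI run, and
    consecutive failures are capped so a correct retry loop always finishes.
    """

    def __init__(self, total_items=25, page_size=10, error_rate=0.3,
                 max_consecutive_failures=2, seed=7):
        self.items = [{"id": i} for i in range(total_items)]
        self.page_size = page_size
        self.error_rate = error_rate
        self.max_consecutive_failures = max_consecutive_failures
        self.rng = random.Random(seed)
        self._consecutive = 0
        self.errors_injected = 0

    def get(self, page):
        inject = self.rng.random() < self.error_rate
        if inject and self._consecutive < self.max_consecutive_failures:
            self._consecutive += 1
            self.errors_injected += 1
            return 500, None
        self._consecutive = 0
        start = (page - 1) * self.page_size
        chunk = self.items[start:start + self.page_size]
        has_more = start + self.page_size < len(self.items)
        return 200, {"items": chunk, "next_page": page + 1 if has_more else None}


def crawl_all(client, max_retries=5):
    """Follows next_page links, retrying a page when the mock returns 5xx."""
    page, collected = 1, []
    while page is not None:
        for _ in range(max_retries):
            status, body = client.get(page)
            if status == 200:
                break
        else:
            raise RuntimeError(f"page {page} failed {max_retries} times")
        collected.extend(body["items"])
        page = body["next_page"]
    return collected


catalog = DynamicCatalogMock()
items = crawl_all(catalog)
print(len(items), "items,", catalog.errors_injected, "injected 500s")
```

Setting `error_rate=1.0` turns the same mock into a stress test for the retry budget.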
04 How DataFlirt handles it
We treat test fixtures as ephemeral data. Our production pipelines automatically sample a fraction of successful and failed HTTP responses, strip them of PII, and push them to a centralized fixture registry. When our engineers modify an extractor, the CI suite pulls the freshest fixtures from the last 24 hours. Each CI run is deterministic, yet the fixtures it runs against reflect what the target is serving now.
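A rough sketch of the sample, scrub and select steps, assuming responses are plain dictionaries carrying a `recorded_at` timestamp. The function names, field names and the two regexes are illustrative assumptions, not DataFlirt's actual registry code. Real PII scrubbing needs far broader rules than an e-mail and a phone pattern.

```python
import random
import re
from datetime import datetime, timedelta, timezone

# Illustrative patterns only; real PII scrubbing needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s-]{8,}\d")


def scrub_pii(body):
    """Masks e-mail addresses and long phone-like digit runs."""
    return PHONE.sub("<PHONE>", EMAIL.sub("<EMAIL>", body))


def sample_responses(responses, rate, seed=0):
    """Keeps roughly `rate` of the responses; the seed makes it reproducible."""
    rng = random.Random(seed)
    return [r for r in responses if rng.random() < rate]


def freshest_fixtures(registry, now, window=timedelta(hours=24)):
    """Returns fixtures recorded within `window` of `now`, newest first."""
    recent = [f for f in registry if now - f["recorded_at"] <= window]
    return sorted(recent, key=lambda f: f["recorded_at"], reverse=True)


now = datetime(2026, 5, 19, 12, 0, tzinfo=timezone.utc)
registry = [
    {"name": "p1.html", "recorded_at": now - timedelta(hours=2),
     "body": scrub_pii("Seller: ravi@example.com, +91 98450 12345")},
    {"name": "p2.html", "recorded_at": now - timedelta(hours=30), "body": "..."},
]
print([f["name"] for f in freshest_fixtures(registry, now)])
print(registry[0]["body"])
```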
05 The "Stale Mock" anti-pattern
The most dangerous situation in scraper maintenance is a test suite that passes while production burns. This happens when developers write a parser against a mock HTML file from six months ago. The target site redesigns its layout, production extractors start returning nulls, but the CI pipeline still reports 100% success because it is testing against the outdated mock. If you mock, you need a strategy for automated fixture rotation.
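One way to enforce rotation is sketched below. It assumes you keep a manifest recording when each fixture was captured, which is a hypothetical convention that the text does not prescribe. The check fails CI once any fixture exceeds a maximum age, so a six-month-old `product_page.html` forces a re-record instead of producing a green build. The 14-day limit is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone

MAX_FIXTURE_AGE = timedelta(days=14)  # arbitrary example threshold


def find_stale_fixtures(manifest, now, max_age=MAX_FIXTURE_AGE):
    """Returns, sorted, the fixture names recorded more than max_age ago."""
    return sorted(name for name, recorded_at in manifest.items()
                  if now - recorded_at > max_age)


def check_fixture_freshness(manifest, now, max_age=MAX_FIXTURE_AGE):
    """Raises so CI goes red (forcing a re-record) instead of passing silently."""
    stale = find_stale_fixtures(manifest, now, max_age)
    if stale:
        raise AssertionError(f"stale fixtures, re-record before merging: {stale}")


now = datetime(2026, 5, 19, tzinfo=timezone.utc)
# Hypothetical manifest: fixture file name -> when it was recorded.
manifest = {
    "product_page.html": now - timedelta(days=180),
    "product_42_out_of_stock.json": now - timedelta(days=1),
}
print(find_stale_fixtures(manifest, now))
```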
// 03 — testing efficiency

Why mock in
CI/CD pipelines?

Live network tests are flaky and slow. DataFlirt's CI suite relies on mock responses to achieve sub-second test execution while maintaining high coverage across thousands of target schemas.

Test execution time: $T = N_{\text{tests}} \times (t_{\text{parse}} + t_{\text{mock\_io}})$
Eliminates network latency, reducing T by ~98% compared to live tests. Source: standard CI/CD metrics.
Mock coverage ratio: $C = \text{mocked\_schemas} / \text{total\_production\_schemas}$
Target C > 0.95 for stable extraction deployments. Source: DataFlirt QA SLO.
DataFlirt CI reliability: $R = 1 - \text{flaky\_tests} / \text{total\_runs}$
R > 0.999 when network I/O is fully mocked. Source: internal engineering metrics.
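The three formulas can be written directly as code. In this sketch, the 1.2 ms mocked and ~850 ms live I/O times come from the test trace in this entry. The suite size and per-test parse time are made-up inputs. How close a suite gets to the quoted ~98% reduction depends heavily on how large t_parse is compared with the network time it replaces.

```python
def execution_time_ms(n_tests, t_parse_ms, t_io_ms):
    """T = N_tests * (t_parse + t_io), in milliseconds."""
    return n_tests * (t_parse_ms + t_io_ms)


def mock_coverage(mocked_schemas, total_production_schemas):
    """C = mocked_schemas / total_production_schemas."""
    return mocked_schemas / total_production_schemas


def ci_reliability(flaky_tests, total_runs):
    """R = 1 - flaky_tests / total_runs."""
    return 1 - flaky_tests / total_runs


N_TESTS, T_PARSE_MS = 2_000, 5.0  # hypothetical suite size and parse time
mocked = execution_time_ms(N_TESTS, T_PARSE_MS, 1.2)   # mocked I/O
live = execution_time_ms(N_TESTS, T_PARSE_MS, 850.0)   # live network I/O
print(f"mocked {mocked / 1000:.1f} s, live {live / 1000:.1f} s, "
      f"reduction {1 - mocked / live:.1%}")
print(mock_coverage(96, 100) > 0.95, ci_reliability(1, 2_000) > 0.999)
```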
// 04 — the test runner

Intercepting requests
in the test suite.

A trace from a Pytest run where the HTTP client is patched to return a local JSON fixture instead of hitting the live e-commerce API.

pytest · responses · mock
// test_product_extractor.py
mock.register(GET, "https://api.target.com/v1/product/42")
mock.body: file("fixtures/product_42_out_of_stock.json")
mock.status: 200

// execution
request: GET https://api.target.com/v1/product/42
interceptor: MATCH -> returning local fixture
latency: 1.2ms // vs ~850ms live

// extraction layer
parser.status: "OUT_OF_STOCK"
parser.price: null
schema.validate: PASS

// result
test_out_of_stock_handling: PASSED
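The trace above uses `responses`-style pseudocode. The following is a rough standard-library equivalent. It patches `urllib.request.urlopen` with `unittest.mock` so the real fetch and parse code runs unchanged. `fetch_product`, `parse_product` and the fixture's field names are hypothetical. The fixture body is inlined so the sketch runs on its own; a real suite would read it from `fixtures/product_42_out_of_stock.json`.

```python
import json
import urllib.request
from unittest import mock

PRODUCT_URL = "https://api.target.com/v1/product/42"
# Inlined stand-in for fixtures/product_42_out_of_stock.json (hypothetical fields).
FIXTURE = json.dumps({"id": 42, "availability": "out_of_stock", "price": None}).encode()


def fetch_product(url):
    """Fetch layer: the only code that touches the network."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status, resp.read()


def parse_product(raw):
    """Extraction layer: maps the raw payload onto the output schema."""
    data = json.loads(raw)
    in_stock = data.get("availability") == "in_stock"
    return {
        "status": "IN_STOCK" if in_stock else "OUT_OF_STOCK",
        "price": data.get("price") if in_stock else None,
    }


def test_out_of_stock_handling():
    fake_resp = mock.MagicMock(status=200)
    fake_resp.read.return_value = FIXTURE
    fake_resp.__enter__.return_value = fake_resp  # support the with-statement
    with mock.patch("urllib.request.urlopen", return_value=fake_resp) as urlopen:
        status, body = fetch_product(PRODUCT_URL)
    urlopen.assert_called_once_with(PRODUCT_URL, timeout=10)
    assert status == 200
    assert parse_product(body) == {"status": "OUT_OF_STOCK", "price": None}


test_out_of_stock_handling()
print("test_out_of_stock_handling: PASSED")
```

Patching at `urlopen` rather than stubbing `parse_product` keeps both layers under test, which matches the network-boundary approach described in the FAQ below.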
// 05 — what to mock

The edge cases
you must simulate.

Live targets rarely fail when you want them to. Mocking allows you to force the extraction layer into failure modes deterministically. Ranked by frequency of unhandled exceptions in production.

Pipelines tested: 300+ active
Fixture count: 42,000+
Updated: 2026-05-19
01 Malformed JSON / Truncated HTML
94% frequency · Tests parser resilience to bad bytes

02 Missing optional fields
82% frequency · Ensures null handling doesn't crash

03 Unexpected HTTP status codes
71% frequency · Simulates 503s, 429s, and 403s

04 Schema drift / renamed keys
55% frequency · Validates quarantine logic

05 Tarpit / extremely slow responses
38% frequency · Tests timeout and retry queues
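The transport-level failure modes in this list (malformed bodies, unexpected status codes, tarpits) can be forced with a fake transport that replays scripted outcomes, so the fetch-and-parse wrapper is pushed through each branch deterministically. `ScriptedTransport`, `fetch_json` and the outcome labels are hypothetical names for illustration. The tarpit is simulated by raising `TimeoutError` immediately instead of actually sleeping, which keeps the test fast.

```python
import json

RETRYABLE = {429, 503}


class ScriptedTransport:
    """Hypothetical fake transport that replays scripted outcomes in order.

    Each entry is either a (status, body) tuple or an exception to raise.
    """

    def __init__(self, script):
        self.script = list(script)
        self.calls = 0

    def get(self, url, timeout):
        self.calls += 1
        outcome = self.script.pop(0)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome


def fetch_json(transport, url, retries=3, timeout=5.0):
    """Classifies every outcome instead of letting an exception escape."""
    for _ in range(retries):
        try:
            status, body = transport.get(url, timeout=timeout)
        except TimeoutError:
            continue  # tarpit: treat as retryable
        if status in RETRYABLE:
            continue  # 429 / 503: retry
        if status == 403:
            return "blocked", None
        if status != 200:
            return "http_error", status
        try:
            return "ok", json.loads(body)
        except json.JSONDecodeError:
            return "malformed", body[:80]  # keep a snippet for debugging
    return "gave_up", None


URL = "https://api.target.com/v1/product/42"
print(fetch_json(ScriptedTransport([(503, ""), (200, '{"id": 42}')]), URL))
print(fetch_json(ScriptedTransport([(200, '{"id": 42, "pri')]), URL))
```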
// 06 — our test infrastructure

Record in production,
replay in CI.

Hand-crafting mock responses is dangerous because developers build what they expect, not what the target actually serves. DataFlirt uses a shadow-traffic recorder: when a pipeline runs in production, a sample of raw HTTP responses is anonymized and committed directly to our fixture repository. When the extraction logic is updated, it is tested against the exact bytes the target served yesterday, not a sanitized ideal.

Fixture generation job

Automated sync of production responses to the CI test suite.

job.id sync-fixtures-092
source.pipeline ecom-catalog-IN
responses.sampled 500 records
anonymization PII scrubbed
schema.variants 4 detected
fixtures.written 4 new files
ci.status tests passing
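The `schema.variants` and `fixtures.written` lines suggest that the job writes one fixture per distinct response shape. One minimal way to reproduce that step for JSON responses is to fingerprint each sampled body by its sorted key paths and keep the first example of each fingerprint. The fingerprinting scheme and the function names are assumptions for illustration, not the job's actual implementation.

```python
import hashlib
import json


def key_paths(value, prefix=""):
    """Yields dotted key paths; all list items share a single '[]' path."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from key_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for child in value:
            yield from key_paths(child, f"{prefix}[]")
    else:
        yield prefix


def schema_fingerprint(body):
    """Hashes the sorted, de-duplicated key paths of a JSON body."""
    paths = sorted(set(key_paths(json.loads(body))))
    return hashlib.sha256("\n".join(paths).encode()).hexdigest()[:12]


def group_by_schema(bodies):
    """Maps each fingerprint to the first response body that produced it."""
    variants = {}
    for body in bodies:
        variants.setdefault(schema_fingerprint(body), body)
    return variants


samples = [
    '{"id": 1, "price": 10}',
    '{"id": 2, "price": 12}',
    '{"id": 3, "price": null, "stock": 0}',
    '{"sku": 4, "price": {"amount": 9}}',
]
variants = group_by_schema(samples)
print(len(variants), "schema variants detected")
```

Fingerprinting on key paths rather than values means that price changes never create new fixtures, but a renamed or nested key does.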


// 07 — FAQ

Common
questions.

About testing strategies, fixture management, legal considerations, and how DataFlirt scales deterministic testing.

What's the difference between mocking and stubbing in scraping?
Stubbing typically replaces an internal function (like returning a hardcoded dictionary instead of calling the parser). Mocking intercepts the HTTP request at the network boundary and returns a raw byte string or Response object. Mocking is superior for scraping because it tests the entire extraction and validation stack exactly as it runs in production.
How do you keep mock data from getting stale?
Stale mocks are the biggest anti-pattern in scraper testing — your tests pass, but production fails because the site changed. We solve this by automatically recording a percentage of live production traffic and overwriting our test fixtures weekly. If the target's schema drifts, the new fixtures will cause the CI tests to fail, alerting us before the next deployment.
Is it legal to store scraped HTML for testing purposes?
Generally, yes, under fair use or temporary technical reproduction exemptions, provided the data is used internally for functional testing and not republished or resold. However, if the HTML contains PII, you must scrub it before committing it to a repository to comply with GDPR/CCPA data minimization principles.
How does DataFlirt handle mocking for headless browsers?
For Playwright and Puppeteer, we use route interception (page.route()). Instead of letting the browser fetch the live URL, we intercept the request and fulfill it with a local HTML file. This allows us to test DOM-based extraction logic (like XPath or CSS selectors) instantly, without the overhead of actual network navigation.
Should I mock the proxy layer too?
Yes. Your tests should never hit a real proxy provider. Mock the proxy endpoint to return 200 OKs for successful routing, and occasionally return 407 Proxy Authentication Required or 502 Bad Gateway to ensure your retry and proxy-rotation logic handles infrastructure failures gracefully.
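A sketch of that idea using a fake proxy pool. Each hypothetical proxy endpoint replays a scripted list of statuses, and the rotation logic under test moves on to the next proxy whenever it sees 407 or 502. The endpoint names and the `FakeProxy` and `fetch_via_pool` helpers are made up. The point is that no real proxy provider is contacted.

```python
from itertools import cycle

PROXY_FAILURES = {407, 502}  # auth failure and bad gateway trigger rotation


class FakeProxy:
    """Stands in for a proxy endpoint; replays scripted statuses, then 200s."""

    def __init__(self, name, statuses):
        self.name = name
        self.statuses = list(statuses)

    def request(self, url):
        return self.statuses.pop(0) if self.statuses else 200


def fetch_via_pool(proxies, url, max_attempts=5):
    """Rotates through proxies until one returns a non-proxy-failure status."""
    log = []
    pool = cycle(proxies)
    for _ in range(max_attempts):
        proxy = next(pool)
        status = proxy.request(url)
        log.append((proxy.name, status))
        if status not in PROXY_FAILURES:
            return status, log
    raise RuntimeError(f"all proxy attempts failed: {log}")


pool = [FakeProxy("proxy-a", [407]), FakeProxy("proxy-b", [502]), FakeProxy("proxy-c", [])]
status, log = fetch_via_pool(pool, "https://api.target.com/v1/product/42")
print(status, log)
```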
How do you scale mock testing across thousands of targets?
We don't write mocks manually. Our infrastructure automatically generates them. Every target has a defined schema contract. Our CI pipeline dynamically spins up tests that feed both "golden path" fixtures (recorded from production) and synthetically mutated fixtures (missing fields, wrong types) into the parser to guarantee the schema validation layer catches anomalies.
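Here is a compact sketch of the golden-plus-mutant approach. It assumes a JSON golden fixture and a schema contract expressed as a field-to-type mapping. A mutator derives missing-field, wrong-type and renamed-key variants from the golden record. The test asserts that the validator accepts the golden copy and flags every mutant. The contract format, `validate` and `mutations` are hypothetical; teams often delegate the validation step to JSON Schema or pydantic instead.

```python
import copy

CONTRACT = {"id": int, "title": str, "price": float}
GOLDEN = {"id": 42, "title": "Steel Water Bottle", "price": 1299.0}


def validate(record, contract=CONTRACT):
    """Returns a list of contract violations; empty means the record passes."""
    errors = [f"missing:{k}" for k in contract if k not in record]
    errors += [
        f"type:{k}" for k, t in contract.items()
        if k in record and not isinstance(record[k], t)
    ]
    errors += [f"unexpected:{k}" for k in record if k not in contract]
    return errors


def mutations(golden):
    """Yields (label, mutated_copy) pairs derived from a golden record."""
    for key in golden:
        dropped = copy.deepcopy(golden)
        del dropped[key]
        yield f"drop:{key}", dropped

        wrong = copy.deepcopy(golden)
        wrong[key] = str(wrong[key]) if not isinstance(wrong[key], str) else 0
        yield f"retype:{key}", wrong

        renamed = copy.deepcopy(golden)
        renamed[f"{key}_v2"] = renamed.pop(key)  # simulates schema drift
        yield f"rename:{key}", renamed


assert validate(GOLDEN) == []
results = {label: validate(mutant) for label, mutant in mutations(GOLDEN)}
assert all(results.values()), "a mutant slipped past the validator"
print(results["rename:price"])
```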

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off catalogue dump or a continuous feed across millions of records — we scope, build, and operate the pipeline.

hello@dataflirt.com  ·  Bengaluru  ·  IST  ·  typical reply < 4h