
What is Conditional GET (ETag)?

Conditional GET (ETag) is an HTTP mechanism that allows a client to ask the server for a resource only if it has changed since the last fetch. By sending an If-None-Match header with a previously cached entity tag (ETag), scrapers can receive a lightweight 304 Not Modified response instead of downloading the full payload again. For high-frequency polling pipelines, it's the difference between saturating your proxy bandwidth and running a highly efficient, near-real-time synchronization loop.

HTTP Headers · Bandwidth Optimization · Caching · Polling · 304 Not Modified
// 02 — definitions

Fetch only
what changed.

The mechanics of using entity tags to eliminate redundant data transfer and drastically reduce proxy bandwidth costs on high-frequency pipelines.


TL;DR

A Conditional GET uses the If-None-Match or If-Modified-Since headers to validate a cached resource. If the server's current version matches the client's ETag, it returns a 304 Not Modified with an empty body. This cuts egress costs, lowers latency, and reduces the risk of rate-limiting on target servers.

01 Definition & structure
A Conditional GET is an HTTP request that asks the server to return the resource only if it meets certain criteria—typically, if it has been modified since the client last downloaded it. The server provides an ETag (Entity Tag) in its initial response. On subsequent requests, the client sends this tag back in the If-None-Match header. If the server's current tag matches the client's, the server replies with a 304 Not Modified status and an empty body, saving the bandwidth of transferring the full payload.
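The server side of this exchange can be sketched in a few lines. This is a minimal illustration, assuming the ETag is a SHA-1 hex digest of the body (servers are free to generate tags however they like). The names `make_etag` and `respond` and the dict-based headers are hypothetical, not any framework's API, and the sketch ignores tag lists, `*`, and weak tags.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Build a strong ETag; hashing the body is one common (assumed) scheme."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(request_headers: dict, body: bytes) -> tuple:
    """Return (status, headers, payload) for a GET on a single resource."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # Validator matches the current version: headers only, no payload.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body


body = b'{"sku": 8842, "stock": 17}'

# First fetch: no validator yet, so the full payload comes back.
status, headers, payload = respond({}, body)

# Second fetch: the client echoes the tag and the resource is unchanged.
status2, _, payload2 = respond({"If-None-Match": headers["ETag"]}, body)

# Third fetch: the resource changed, so the stale tag no longer matches.
changed = b'{"sku": 8842, "stock": 16}'
status3, _, payload3 = respond({"If-None-Match": headers["ETag"]}, changed)
```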
02 How it works in practice
In a high-frequency scraping pipeline (e.g., polling a stock ticker or inventory API every 5 seconds), re-downloading the same 200 KB JSON file on every poll wastes proxy bandwidth and slows down the extraction loop. With Conditional GETs, the scraper downloads the 200 KB payload only when the data actually changes. On every other poll, the network exchange is just a few hundred bytes of headers, which cuts egress costs and execution time.
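A back-of-the-envelope version of that example, to put numbers on it. The payload size and polling interval come from the paragraph above; the 99% hit rate and the 300 bytes per 304 exchange are assumptions chosen for illustration, as is treating 200 KB as 200,000 bytes.

```python
PAYLOAD_BYTES = 200_000      # 200 KB body, from the example above
POLL_INTERVAL_S = 5          # one poll every 5 seconds
HEADER_BYTES_304 = 300       # assumed size of a headers-only 304 exchange
HIT_RATE = 0.99              # assumed share of polls answered with 304

polls_per_day = 24 * 60 * 60 // POLL_INTERVAL_S

# Without conditional requests, every poll transfers the full payload.
naive_bytes = polls_per_day * PAYLOAD_BYTES

# With conditional requests, only the misses transfer the payload.
hits = round(polls_per_day * HIT_RATE)
misses = polls_per_day - hits
conditional_bytes = misses * PAYLOAD_BYTES + hits * HEADER_BYTES_304

print(f"polls per day:     {polls_per_day}")
print(f"unconditional:     {naive_bytes / 1e9:.3f} GB/day")
print(f"conditional (99%): {conditional_bytes / 1e6:.1f} MB/day")
```

Under these assumptions a single endpoint drops from roughly 3.5 GB per day to under 40 MB per day. The real figure depends entirely on how often the target's data changes.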
03 Weak vs. Strong ETags
ETags come in two variants. A Strong ETag (e.g., "12345") guarantees that the resource is byte-for-byte identical to the cached version. A Weak ETag (e.g., W/"12345") indicates that the resource is semantically equivalent, even if minor byte-level changes occurred (like a dynamically generated timestamp in the footer). Weak ETags are highly useful for scrapers, as they prevent trivial DOM changes from triggering massive, unnecessary downloads.
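The two comparison rules that go with these variants can be sketched as follows. The function names are hypothetical; the logic follows the strong and weak comparison functions defined in RFC 9110, which also specifies that If-None-Match is evaluated with the weak comparison.

```python
def parse_etag(tag: str) -> tuple:
    """Split an entity tag into (is_weak, opaque_value)."""
    tag = tag.strip()
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags must be strong and character-identical."""
    weak_a, value_a = parse_etag(a)
    weak_b, value_b = parse_etag(b)
    return not weak_a and not weak_b and value_a == value_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque values must match; the W/ prefix is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]


# The four cases from the comparison table in RFC 9110.
pairs = [('W/"1"', 'W/"1"'), ('W/"1"', 'W/"2"'), ('W/"1"', '"1"'), ('"1"', '"1"')]
results = [(strong_match(a, b), weak_match(a, b)) for a, b in pairs]
```

A scraper that compares tags itself (for example, when replaying cached records) would normally want `weak_match`, since that is what the server applies to If-None-Match.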
04 How DataFlirt handles it
We handle ETag caching automatically at the edge worker level. When you configure a pipeline for high-frequency polling, our proxy gateways store the ETags and Last-Modified timestamps for every target URL. We inject the If-None-Match headers into your outbound requests transparently. If the target returns a 304, we serve the cached 200 OK response back to your extraction logic instantly. You write standard stateless scraping code; we optimize the network layer to keep your proxy bills flat.
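A rough sketch of that inject-and-replay pattern is below. It is an illustration only, not DataFlirt's gateway code: the `EtagCache` class, the `fetch` callable, and `fake_origin` are all hypothetical stand-ins, and a production store would also need eviction and per-session scoping.

```python
from typing import Callable, Dict, Tuple

# fetch(url, headers) -> (status, response_headers, body); stands in for a real HTTP client.
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


class EtagCache:
    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._store: Dict[str, Tuple[str, bytes]] = {}  # url -> (etag, body)

    def get(self, url: str) -> Tuple[int, bytes, bool]:
        """Return (status, body, from_cache); a 304 upstream is replayed as a 200."""
        headers: Dict[str, str] = {}
        cached = self._store.get(url)
        if cached:
            headers["If-None-Match"] = cached[0]   # inject the stored validator
        status, resp_headers, body = self._fetch(url, headers)
        if status == 304 and cached:
            return 200, cached[1], True            # serve the cached payload
        if status == 200 and "ETag" in resp_headers:
            self._store[url] = (resp_headers["ETag"], body)
        return status, body, False


def fake_origin(url: str, headers: Dict[str, str]):
    """Toy origin with one fixed resource version, used to exercise the cache."""
    etag, body = '"v1"', b'{"price": 10}'
    if headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


cache = EtagCache(fake_origin)
first = cache.get("https://api.example.com/pricing")
second = cache.get("https://api.example.com/pricing")
```

The caller sees a 200 with the full body both times; only the `from_cache` flag reveals that the second response never crossed the wire.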
05 Did you know?
ETags can be weaponized by anti-bot systems for stateless tracking. Because a client voluntarily echoes the ETag back to the server in the If-None-Match header, a server can issue a unique, randomized ETag to every new visitor. Even if the scraper clears its cookies and rotates its IP, sending that unique ETag back instantly de-anonymizes the session, allowing the WAF to link the new IP to the old bot profile.
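To make the mechanism concrete, here is a toy model of a server that treats the ETag as an identifier instead of a content hash. The `TrackingServer` class is hypothetical and the IP addresses are documentation-range placeholders; real anti-bot systems are considerably more involved.

```python
import secrets


class TrackingServer:
    """Toy server that uses the ETag as a visitor ID rather than a content hash."""

    def __init__(self) -> None:
        self.visitors = {}  # etag -> list of client IPs that presented it

    def handle(self, client_ip: str, headers: dict) -> tuple:
        tag = headers.get("If-None-Match")
        if tag in self.visitors:
            # Known tag: same visitor, even if the IP is new.
            self.visitors[tag].append(client_ip)
            return 304, {"ETag": tag}
        # Unknown or missing tag: mint a unique one for this visitor.
        tag = '"' + secrets.token_hex(8) + '"'
        self.visitors[tag] = [client_ip]
        return 200, {"ETag": tag}


server = TrackingServer()
_, first_headers = server.handle("203.0.113.10", {})

# Cookies cleared and IP rotated, but the cached ETag is replayed.
status, _ = server.handle("198.51.100.7", {"If-None-Match": first_headers["ETag"]})
linked_ips = server.visitors[first_headers["ETag"]]
```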
// 03 — the efficiency model

How much bandwidth
does it save?

Conditional GETs shift the bottleneck from network I/O to connection overhead. DataFlirt's scheduler calculates the cache hit rate to optimize polling frequencies without burning proxy data.

$$\text{Bandwidth Saved} = (R_{\text{total}} \times P_{\text{size}}) \times \text{HitRate}$$
Total requests multiplied by payload size, factored by the 304 response rate. (Network optimization baseline)

$$\text{Effective Latency} = T_{\text{ttfb}} + T_{\text{transfer}} \times (1 - \text{HitRate})$$
A 304 response eliminates the transfer time, reducing latency to just the TTFB. (DataFlirt performance model)

$$\text{Polling Efficiency Score} = \frac{\text{Req}_{304}}{\text{Req}_{\text{total}}}$$
Targeting > 0.85 for live inventory feeds to ensure we aren't over-polling. (DataFlirt internal SLO)
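The three metrics above translate directly into code, with the latency term weighted by the miss rate (1 - HitRate). The function names and the sample inputs (1,000 polls, a 142,048-byte payload, 80 ms TTFB, 400 ms transfer time) are assumptions picked for illustration, not measured values.

```python
def bandwidth_saved(total_requests: int, payload_bytes: int, hit_rate: float) -> float:
    """Bytes not transferred because the server answered 304."""
    return total_requests * payload_bytes * hit_rate


def effective_latency(ttfb_ms: float, transfer_ms: float, hit_rate: float) -> float:
    """Average latency: every poll pays TTFB, only misses pay the transfer."""
    return ttfb_ms + transfer_ms * (1 - hit_rate)


def polling_efficiency(req_304: int, req_total: int) -> float:
    """Share of polls answered with 304 Not Modified."""
    return req_304 / req_total if req_total else 0.0


score = polling_efficiency(850, 1000)
saved = bandwidth_saved(1000, 142_048, score)
latency = effective_latency(80.0, 400.0, score)

print(f"efficiency score: {score:.2f}")
print(f"bandwidth saved:  {saved / 1e6:.1f} MB")
print(f"avg latency:      {latency:.0f} ms")
```

Note that the bandwidth figure ignores the header bytes a 304 still costs, so it slightly overstates the saving.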
// 04 — the wire trace

A 304 Not Modified
handshake.

A scraper polling a JSON API for inventory updates. The first request fetches the full payload; the second request validates the cache using the returned ETag.

HTTP/2 · JSON API · 304 Not Modified
// Request 1: Initial Fetch
GET /api/v1/inventory/sku-8842 HTTP/2
Host: api.target.com

// Response 1
HTTP/2 200 OK
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
Content-Length: 142048

// Request 2: Polling 60s later
GET /api/v1/inventory/sku-8842 HTTP/2
Host: api.target.com
If-None-Match: "33a64df551425fcc55e4d42a148795d9f25f89d4"

// Response 2 (headers only, no body)
HTTP/2 304 Not Modified
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"

// Pipeline Result
status: cache_hit
bandwidth_saved: 142 KB
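On the client side, the same two-step exchange might look like the sketch below, using only Python's standard library. The `conditional_get` helper and its `opener` parameter are hypothetical conveniences (the parameter exists so the function can be exercised without a network); note that `urllib` reports a 304 by raising `HTTPError`, so it has to be caught and treated as a cache hit.

```python
import urllib.error
import urllib.request


def conditional_get(url, etag=None, opener=urllib.request.urlopen):
    """Return (status, etag, body); body is None when the server answers 304."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    try:
        with opener(request) as response:
            return response.status, response.headers.get("ETag"), response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # urllib surfaces 304 as an HTTPError; keep the validator we already hold.
            return 304, err.headers.get("ETag") or etag, None
        raise
```

A polling loop would store the returned tag after each 200 and pass it back as `etag` on the next call, re-using its cached body whenever the status is 304.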
// 05 — implementation failures

Why conditional
requests fail.

Common reasons why a target server ignores an If-None-Match header and forces a full 200 OK download anyway, destroying your bandwidth savings.

Pipelines analyzed: 1,200+ polling jobs
Avg 304 rate: 68.4%
Updated: 2026-05-19
01 Load balancer stripping ETags
infrastructure config · CDNs and load balancers often strip ETags unless explicitly configured to pass them

02 Dynamic ad/token injection
payload mutation · Rotating CSRF tokens change the hash even when the underlying data is static

03 Weak ETags rejected
protocol mismatch · Strict clients refuse W/"..." format tags

04 Gzip vs Brotli mismatch
encoding variance · Different compression algorithms yield different ETags

05 Clock skew on Last-Modified
time sync error · If-Modified-Since fails due to server/client drift
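Two of these failure modes (04 and 05) can often be sidestepped on the client. The sketch below, with a hypothetical `build_validators` helper, echoes the server's own Last-Modified string instead of a locally generated timestamp and pins Accept-Encoding so every poll asks for the same representation. Pinning to `gzip` is an assumption; any fixed value the target supports would serve the same purpose.

```python
def build_validators(previous_headers: dict, accept_encoding: str = "gzip") -> dict:
    """Build conditional request headers from the last 200 response's headers."""
    conditional = {
        # Ask for the same encoding on every poll so the ETag stays comparable.
        "Accept-Encoding": accept_encoding,
    }
    etag = previous_headers.get("ETag")
    last_modified = previous_headers.get("Last-Modified")
    if etag:
        conditional["If-None-Match"] = etag
    if last_modified:
        # Echo the server's timestamp verbatim; never substitute the local clock.
        conditional["If-Modified-Since"] = last_modified
    return conditional


headers = build_validators(
    {"ETag": '"33a64df5"', "Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}
)
```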
// 06 — our architecture

Cache at the edge,
deliver only the diffs.

DataFlirt maintains a distributed ETag state store across our proxy gateways. When a polling pipeline requests a URL, the gateway automatically injects the last known ETag. If the target returns a 304, we immediately yield the cached record from our fast-storage layer. This allows us to poll high-value targets at sub-second intervals without burning through residential proxy bandwidth or triggering volumetric rate limits.

polling-worker-04

Live telemetry from a high-frequency pricing synchronization job.

target.endpoint api.retailer.com/pricing
polling_interval 5000ms
last_200_ok 14 mins ago
consecutive_304s 168 requests
bandwidth_saved 24.1 MB
proxy.status residential · healthy
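As a quick sanity check on how these fields relate, the arithmetic below recomputes the 304 count from the interval and the time since the last 200, then derives the average payload size the panel implies. All inputs are copied from the telemetry above; treating 1 MB as 1,000 KB is an assumption.

```python
polling_interval_s = 5000 / 1000     # polling_interval 5000ms
minutes_since_last_200 = 14          # last_200_ok 14 mins ago
consecutive_304s = 168               # consecutive_304s 168 requests
bandwidth_saved_mb = 24.1            # bandwidth_saved 24.1 MB

# Polls that fit into the window since the last full download.
expected_polls = int(minutes_since_last_200 * 60 / polling_interval_s)

# Average payload each 304 avoided, implied by the reported saving.
implied_payload_kb = bandwidth_saved_mb * 1000 / consecutive_304s

print(f"expected 304s:   {expected_polls}")
print(f"implied payload: {implied_payload_kb:.1f} KB per response")
```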


// 07 — FAQ

Common
questions.

About HTTP caching, bandwidth optimization, ETag tracking risks, and how DataFlirt manages high-frequency polling.

What is the difference between an ETag and Last-Modified?
Last-Modified uses a timestamp (validated via If-Modified-Since), which is vulnerable to clock skew and only resolves to the second. An ETag is an opaque string (usually a hash of the content) validated via If-None-Match. ETags are more precise and generally preferred by modern APIs for cache validation.
Can ETags be used to track scrapers?
Yes. This is known as "ETag tracking" or "supercookies." A server can assign a unique ETag to a client on the first request. Even if the client clears cookies, sending that unique ETag back in an If-None-Match header allows the server to re-identify the session. DataFlirt isolates ETag storage per pipeline and session to prevent cross-contamination.
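One way to picture that isolation is a validator store keyed by session as well as URL, so a tag issued to one identity is never replayed by another. The `ScopedEtagStore` class below is a hypothetical sketch of the idea, not DataFlirt's storage layer.

```python
class ScopedEtagStore:
    """Keeps validators keyed by (session_id, url) so identities never share tags."""

    def __init__(self) -> None:
        self._tags = {}

    def remember(self, session_id: str, url: str, etag: str) -> None:
        self._tags[(session_id, url)] = etag

    def headers_for(self, session_id: str, url: str) -> dict:
        """Conditional headers for this session only; empty if it has no tag yet."""
        etag = self._tags.get((session_id, url))
        return {"If-None-Match": etag} if etag else {}

    def forget_session(self, session_id: str) -> None:
        """Drop every tag tied to a session, e.g. when its proxy identity rotates."""
        for key in [k for k in self._tags if k[0] == session_id]:
            del self._tags[key]


store = ScopedEtagStore()
url = "https://api.example.com/pricing"
store.remember("session-a", url, '"tag-for-a"')

same_session = store.headers_for("session-a", url)
other_session = store.headers_for("session-b", url)
store.forget_session("session-a")
after_rotation = store.headers_for("session-a", url)
```

The trade-off is a cold cache after every rotation: the first poll on a fresh identity always pays for a full download.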
Is it legal to poll a server aggressively if I use Conditional GETs?
While a 304 Not Modified saves bandwidth, it still consumes a connection and server CPU to process the request and compute the hash. Aggressive polling can still be considered a Denial of Service or a ToS violation if it degrades target performance. Always respect 429 Too Many Requests and Retry-After headers, regardless of your cache hit rate.
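Honoring Retry-After takes a little care because the header comes in two forms, a number of seconds or an HTTP date. A minimal sketch, assuming a hypothetical `retry_delay_seconds` helper and an arbitrary 30-second fallback when the header is missing or unparseable:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(status, headers, now=None, default=30.0):
    """Seconds to wait before the next poll; 0.0 when the server did not push back."""
    if status != 429:
        return 0.0
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)                       # delta-seconds form
    try:
        retry_at = parsedate_to_datetime(value)   # HTTP-date form
    except (TypeError, ValueError):
        return default                            # unparseable: fall back
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())
```

A polling loop would sleep for the returned delay before its next request, on top of its normal interval.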
How does DataFlirt handle weak ETags?
Weak ETags (prefixed with W/) indicate that the content is semantically equivalent but not byte-for-byte identical (e.g., dynamically generated timestamps in a footer). We support weak ETags in our polling infrastructure, allowing pipelines to ignore trivial DOM changes and only trigger extraction when the core data actually mutates.
Why do I get a 200 OK even when the content looks identical?
Usually because the server is injecting dynamic content into the payload—like a new CSRF token, a unique request ID, or a rotating ad banner. Even a single changed byte alters a strong ETag hash, forcing a 200 OK. In these cases, network-layer caching fails, and you must rely on application-layer deduplication.
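Application-layer deduplication can be as simple as hashing the payload after removing the fields that change on every response. The sketch below assumes a JSON object at the top level, and the names in `VOLATILE_KEYS` are hypothetical; in practice they would be whatever volatile fields the target actually injects.

```python
import hashlib
import json

VOLATILE_KEYS = {"csrf_token", "request_id", "generated_at"}  # assumed field names


def content_fingerprint(payload: bytes) -> str:
    """Hash a JSON object after dropping fields that change on every response."""
    data = json.loads(payload)
    stable = {k: v for k, v in data.items() if k not in VOLATILE_KEYS}
    # Sorted keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


poll_1 = b'{"sku": 8842, "price": 10, "request_id": "a1"}'
poll_2 = b'{"request_id": "b2", "price": 10, "sku": 8842}'
poll_3 = b'{"sku": 8842, "price": 11, "request_id": "c3"}'

unchanged = content_fingerprint(poll_1) == content_fingerprint(poll_2)
changed = content_fingerprint(poll_1) != content_fingerprint(poll_3)
```

This saves downstream processing rather than bandwidth: the full body has already been downloaded by the time the fingerprint is computed.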
Does this work for HTML pages or just APIs?
It works for both, provided the target server is configured to generate and respect ETags. However, JSON APIs are much more likely to support ETags effectively. HTML pages often contain dynamic user-state or tracking scripts that break cache validation, making Conditional GETs less effective for traditional web scraping compared to API polling.