
What is an Access Denied Page?

An access denied page is the explicit block response served by a Web Application Firewall (WAF) or anti-bot system when a request fails a security policy, IP reputation check, or fingerprint validation. For scraping pipelines, it usually manifests as an HTTP 403 Forbidden accompanied by a branded HTML payload — like Cloudflare's Error 1020 or Akamai's Reference Number page — halting extraction before the target application is even reached.

WAF Block · HTTP 403 · Cloudflare 1020 · Akamai · Anti-bot
// 02 — definitions

The wall
before the app.

An access denied page is what you get when your scraper reaches the edge network but is refused entry to the origin server. It's the most unambiguous signal that your pipeline's identity has been burned.


TL;DR

An access denied page is a hard block enforced at the CDN or WAF layer. Unlike a soft block (CAPTCHA) or a silent tarpit, it means the edge has already decided to reject the request, either because a static rule matched or because the request's bot score crossed the blocking threshold. Bypassing it requires rotating the blocked attribute: usually the IP address, TLS fingerprint, or session cookie.

01 Definition & structure
An access denied page is an HTML response served by a Web Application Firewall (WAF) when a request is blocked. Instead of returning the target application's content, the edge network intercepts the request and returns a 403 Forbidden status along with a branded page. These pages typically contain:
  • A specific error code (e.g., 1020, 1015)
  • A unique request identifier (Ray ID, Reference Number)
  • The IP address that was blocked
  • A brief explanation (e.g., "Access Denied by Firewall Rule")
This response is generated entirely at the edge; the origin server never sees the request.
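As a rough illustration of pulling those fields out of a block page, here is a minimal Python sketch. The regular expressions, the `BlockPageInfo` type, and the `parse_block_page` helper are all hypothetical; real templates differ by vendor and change over time, so treat the patterns as a starting point rather than a reliable parser.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockPageInfo:
    error_code: Optional[str]
    request_id: Optional[str]
    client_ip: Optional[str]


# Illustrative patterns only: real block-page templates vary by vendor and version.
ERROR_CODE_RE = re.compile(r"Error(?:\s+code)?[:\s]+(\d{4})", re.IGNORECASE)
REQUEST_ID_RE = re.compile(
    r"(?:Ray ID|Reference(?:\s+Number)?)\s*[#:]?\s*([0-9A-Za-z][0-9A-Za-z.\-]*)",
    re.IGNORECASE,
)
IPV4_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def _first_group(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    match = pattern.search(text)
    return match.group(1) if match else None


def parse_block_page(html: str) -> BlockPageInfo:
    """Extract the error code, request identifier, and blocked IP if present."""
    # Crude tag strip; enough for short block pages, not a general HTML parser.
    text = re.sub(r"<[^>]+>", " ", html)
    ip_match = IPV4_RE.search(text)
    return BlockPageInfo(
        error_code=_first_group(ERROR_CODE_RE, text),
        request_id=_first_group(REQUEST_ID_RE, text),
        client_ip=ip_match.group(0) if ip_match else None,
    )
```

Any field the page does not expose comes back as `None`, so callers can log whatever identifiers are available without failing on an unfamiliar template.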
02 How WAF interception works
When your scraper sends a request, it first hits the target's CDN/WAF. The WAF evaluates the request against a series of rules: IP reputation, geolocation, TLS fingerprint, HTTP header order, and rate limits. If the request triggers a "Block" action in any of these rules, the WAF stops processing, never forwards the request to the origin, and serves the access denied HTML payload back to the client.
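The first-match behaviour described above can be sketched as a simple rule chain. This is a toy model, not any vendor's engine: the `Request` fields, the rule names, and every threshold below are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Request:
    ip_reputation: float       # hypothetical scale: 0.0 (bad) to 1.0 (good)
    country: str               # ISO country code of the client IP
    ja3: str                   # TLS fingerprint hash
    requests_last_minute: int  # recent request count from this IP


@dataclass(frozen=True)
class Rule:
    name: str
    blocks: Callable[[Request], bool]


# Hypothetical rule set; real WAF rules are configured per site.
RULES: List[Rule] = [
    Rule("ip_reputation", lambda r: r.ip_reputation < 0.2),
    Rule("geo_block", lambda r: r.country not in {"GB", "US"}),
    Rule("tls_fingerprint", lambda r: r.ja3 == "known-bot-ja3"),
    Rule("rate_limit", lambda r: r.requests_last_minute > 100),
]


def evaluate(request: Request) -> Optional[str]:
    """Return the first rule that blocks, or None if the request is forwarded."""
    for rule in RULES:
        if rule.blocks(request):
            return rule.name  # block page is served; origin never sees the request
    return None
```

Because evaluation stops at the first blocking rule, the block page only tells you about one failing attribute; fixing it can expose the next rule in the chain.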
03 Common WAF signatures
Different WAF vendors have distinct access denied signatures. Cloudflare uses 10xx error codes (1020 for firewall rules, 1015 for rate limits) and includes a cf-ray response header. Akamai serves a generic "Access Denied" page with a long alphanumeric Reference Number. DataDome returns a 403 with a JSON payload or an HTML page, and typically sets a datadome cookie on the response. Recognizing these signatures is critical for debugging pipeline failures.
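A minimal sketch of classifying a 403 by those signatures follows. The `classify_waf` function is hypothetical, and the checks are heuristics derived only from the signatures listed above (the cf-ray header, the datadome cookie, the "Access Denied" plus Reference Number page); they can misfire on custom block pages.

```python
from typing import Mapping, Optional


def classify_waf(status: int, headers: Mapping[str, str], body: str) -> Optional[str]:
    """Guess the WAF vendor behind a 403 response; None if it is not a 403."""
    if status != 403:
        return None

    # Header names are case-insensitive, so normalise before matching.
    lowered = {name.lower(): value for name, value in headers.items()}
    body_lower = body.lower()

    if "cf-ray" in lowered or lowered.get("server", "").lower() == "cloudflare":
        return "cloudflare"
    if "datadome" in lowered.get("set-cookie", "").lower():
        return "datadome"
    if "access denied" in body_lower and "reference" in body_lower:
        return "akamai"
    return "unknown"
```

Returning `"unknown"` rather than guessing keeps unclassified blocks visible in logs, which is usually where new signatures are first noticed.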
04 How DataFlirt handles it
We treat access denied pages as deterministic feedback. Our edge routers parse the HTML payload of every 403 to extract the WAF provider and error code. If a Cloudflare 1020 is detected, we know the IP or ASN is burned, and we instantly rotate the proxy. If a TLS fingerprint mismatch is detected, we regenerate the network context. This automated recovery ensures that a single blocked request doesn't cascade into a pipeline failure.
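The decision step described here might look something like the sketch below. It is not DataFlirt's router: the `Action` enum and `choose_action` function are hypothetical, and the back-off choice for Cloudflare 1015 is an assumption, since the text only states what happens for a 1020 and for a TLS fingerprint mismatch.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ROTATE_PROXY = "rotate_proxy"
    REGENERATE_CONTEXT = "regenerate_network_context"
    BACK_OFF = "back_off"
    ESCALATE = "escalate"


def choose_action(
    provider: str, error_code: Optional[str], tls_mismatch: bool = False
) -> Action:
    """Map a classified block to a recovery action."""
    if tls_mismatch:
        # Fingerprint problems follow the client, so a new IP alone will not help.
        return Action.REGENERATE_CONTEXT
    if provider == "cloudflare" and error_code == "1020":
        # Firewall rule hit: treat the IP or ASN as burned.
        return Action.ROTATE_PROXY
    if provider == "cloudflare" and error_code == "1015":
        # Assumption: a rate-limit block is handled by slowing down first.
        return Action.BACK_OFF
    # Unknown provider or code: hand off for manual inspection.
    return Action.ESCALATE
```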
05 The silent drop alternative
While an access denied page is an explicit rejection, some advanced anti-bot configurations prefer "silent drops" or tarpitting. Instead of serving a 403, they hold the connection open indefinitely until the scraper times out, or they return a fake 200 OK with poisoned or empty data. Explicit access denied pages are actually preferable for data engineers, as they fail fast and provide clear diagnostic identifiers.
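One way to tell these outcomes apart on the client side is sketched below. The `classify_outcome` function and the idea of an `expected_marker` (a string that genuine pages for the target always contain) are assumptions for illustration; the caller is expected to set its own request timeout and pass `timed_out=True` when it fires.

```python
from typing import Optional


def classify_outcome(
    status: Optional[int], body: str, timed_out: bool, expected_marker: str
) -> str:
    """Label a fetch result as ok, hard_block, tarpit, suspect_poisoned, or other_error."""
    if timed_out:
        # Connection held open until the client gave up.
        return "tarpit"
    if status == 403:
        # Explicit access denied response: fails fast and carries identifiers.
        return "hard_block"
    if status == 200:
        # A 200 without the content we always expect may be empty or poisoned.
        return "ok" if expected_marker in body else "suspect_poisoned"
    return "other_error"
```

The marker check is deliberately simple; in practice a schema or field-count validation on the extracted record is a stronger signal than a substring match.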
// 03 — block metrics

How we measure
edge rejection.

DataFlirt tracks access denied responses at the edge layer to calculate pipeline health. A sudden spike in 403s triggers automatic proxy rotation and fingerprint regeneration.

Pipeline Block Rate = 403_responses / total_requests
A sustained rate > 0.5% indicates a burned proxy pool or stale fingerprint. Source: DataFlirt pipeline SLO
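A minimal sketch of tracking this metric over a sliding window of recent responses is shown below. The `BlockRateMonitor` class and the window size of 1,000 requests are assumptions; only the 0.5% threshold comes from the text above.

```python
from collections import deque


class BlockRateMonitor:
    """Tracks the share of 403s among the most recent responses."""

    def __init__(self, window: int = 1000, threshold: float = 0.005) -> None:
        self._statuses = deque(maxlen=window)  # oldest entries drop off automatically
        self.threshold = threshold

    def record(self, status: int) -> None:
        self._statuses.append(status)

    @property
    def block_rate(self) -> float:
        if not self._statuses:
            return 0.0
        blocked = sum(1 for status in self._statuses if status == 403)
        return blocked / len(self._statuses)

    def is_unhealthy(self) -> bool:
        return self.block_rate > self.threshold
```

Using a bounded window approximates "sustained": a single 403 in a short burst moves the rate a lot, so the window should be large enough for the traffic volume of the pipeline.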
WAF Confidence Score = Σ signal_weights; block when score > threshold
If the sum of negative signals (datacenter IP, bad JA3) exceeds the WAF threshold, a block page is served. Source: standard WAF logic
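As a worked example of the scoring rule, the sketch below sums weights for the signals present on a request and compares the total with a threshold. The signal names, weights, and threshold are invented for illustration; real WAFs do not publish theirs.

```python
from typing import Iterable

# Hypothetical weights; vendors keep the real values private.
SIGNAL_WEIGHTS = {
    "datacenter_ip": 0.5,
    "bad_ja3": 0.4,
    "missing_accept_language": 0.2,
}


def confidence_score(signals: Iterable[str]) -> float:
    """Sum the weights of the negative signals seen on a request."""
    return sum(SIGNAL_WEIGHTS.get(signal, 0.0) for signal in signals)


def should_block(signals: Iterable[str], threshold: float = 0.6) -> bool:
    """Serve a block page when the summed score exceeds the threshold."""
    return confidence_score(signals) > threshold
```

With these example weights, a datacenter IP alone (0.5) passes, while a datacenter IP combined with a bad JA3 (0.9) is blocked, which is why fixing only one attribute is sometimes enough and sometimes not.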
Mean Time to Recovery (MTTR) = T_resume - T_block
Time taken for DataFlirt's auto-healer to rotate identity and resume extraction. Target < 2s. Source: internal telemetry
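To make the metric concrete, here is a small sketch that averages recovery time over several incidents. The `mttr_seconds` function and the `(block_time, resume_time)` tuple layout are assumptions; timestamps are taken to be in seconds.

```python
from statistics import mean
from typing import Sequence, Tuple


def mttr_seconds(incidents: Sequence[Tuple[float, float]]) -> float:
    """Mean of (T_resume - T_block) across incidents, in seconds."""
    if not incidents:
        raise ValueError("at least one incident is required")
    return mean(resume - block for block, resume in incidents)
```

For example, two incidents that took 1.0 s and 0.5 s to recover give an MTTR of 0.75 s, which would sit inside the stated 2 s target.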
// 04 — the edge response

Hitting a Cloudflare
firewall rule.

A raw trace of a Python requests client hitting a target protected by Cloudflare Bot Management. The request is blocked at the edge before reaching the origin.

HTTP 403 · Cloudflare · Error 1020
// outbound request
GET /api/v1/pricing HTTP/2
user-agent: "python-requests/2.31.0" // default UA

// edge interception
status: 403 Forbidden
server: "cloudflare"
cf-ray: "885a1b2c3d4e5f6a-LHR"

// response body (truncated)
body: "<!DOCTYPE html>...<title>Access denied</title>"
error_code: 1020
reason: "Access Denied by Firewall Rule"

// pipeline action
action: discard proxy IP
action: rotate TLS fingerprint & retry
// 05 — trigger conditions

Why the edge
drops the hammer.

Access denied pages are deterministic. They are triggered when a specific request attribute matches a pre-configured WAF rule or falls below a machine-learning confidence threshold.

Pipelines monitored: 300+ active
Avg block rate: < 0.12%
Updated: 2026-05-19
01 Datacenter IP / ASN Block
Static rule · Target explicitly blocks AWS, GCP, or known proxy ASNs.
02 TLS Fingerprint Mismatch
Heuristic · JA3/JA4 signature doesn't match the advertised User-Agent.
03 Geo-Blocking
Static rule · Request originates from a country outside the target's operating region.
04 Missing / Malformed Headers
Static rule · Absence of Accept-Language or incorrect HTTP/2 pseudo-header order.
05 Rate Limit Exceeded
Behavioral · Too many requests from a single IP within a time window.
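Some of these triggers can be caught before a request ever leaves the scraper. The sketch below is a hypothetical preflight check covering only the header-level conditions named above (a default client User-Agent, a missing Accept-Language); it cannot see IP reputation, TLS fingerprints, or HTTP/2 pseudo-header order, which need lower-level tooling.

```python
from typing import List, Mapping

# Prefixes of common HTTP-library default User-Agents (illustrative, not exhaustive).
DEFAULT_UA_PREFIXES = ("python-requests", "curl", "go-http-client")


def preflight_warnings(headers: Mapping[str, str]) -> List[str]:
    """Return header-level issues likely to trip a static WAF rule."""
    lowered = {name.lower(): value for name, value in headers.items()}
    warnings: List[str] = []

    user_agent = lowered.get("user-agent", "")
    if not user_agent:
        warnings.append("missing User-Agent")
    elif user_agent.lower().startswith(DEFAULT_UA_PREFIXES):
        warnings.append("default library User-Agent")

    if "accept-language" not in lowered:
        warnings.append("missing Accept-Language")

    return warnings
```

Running this against the default headers of an HTTP library is a quick way to see why an unmodified client is blocked on the first request.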
// 06 — recovery architecture

Don't just retry,
rotate the entire identity context.

When a DataFlirt pipeline hits an access denied page, a naive retry is useless — the WAF has already flagged the IP or fingerprint. Our edge router intercepts the 403, parses the specific WAF error code (e.g., Cloudflare 1020 vs 1015), and determines the burn radius. If it's an IP ban, we swap the proxy. If it's a fingerprint mismatch, we regenerate the TLS and browser context. The retry happens transparently; the extraction worker only ever sees a 200 OK.

WAF Block Recovery Trace

Real-time resolution of a Cloudflare 1020 block on a retail pipeline.

event: HTTP 403 · Error 1020
waf.provider: Cloudflare Bot Management
burn.analysis: IP reputation degraded
action.ip: rotate to residential_UK
action.tls: regenerate JA4 context
retry.status: HTTP 200 OK
latency.penalty: +850ms
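The retry-with-rotation pattern in this trace can be sketched as a loop that switches to a fresh identity after every 403 instead of resending the same request. This is an illustrative outline, not DataFlirt's router: `Identity`, `fetch_with_rotation`, and the injected `fetch` callable are hypothetical, and the actual HTTP call is left to the caller.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Identity:
    proxy: str        # e.g. a proxy endpoint label
    tls_profile: str  # e.g. the browser whose TLS handshake is imitated


class AllIdentitiesBlocked(RuntimeError):
    """Raised when every attempted identity received a 403."""


def fetch_with_rotation(
    fetch: Callable[[Identity], Tuple[int, str]],
    identities: Iterable[Identity],
    max_attempts: int = 3,
) -> Tuple[int, str, int]:
    """Try identities in order; return (status, body, attempts) on the first non-403."""
    attempts = 0
    for identity in islice(identities, max_attempts):
        attempts += 1
        status, body = fetch(identity)
        if status != 403:
            return status, body, attempts
        # 403: this identity is treated as burned, so move on to the next one.
    raise AllIdentitiesBlocked(f"blocked on all {attempts} attempts")
```

Passing `fetch` in as a parameter keeps the rotation logic independent of the HTTP client, so the same loop works with any transport and can be exercised with a stub.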


// 07 — FAQ

Common
questions.

Common questions about WAF blocks, recovery strategies, and how DataFlirt maintains pipeline uptime when targets deploy aggressive firewall rules.

What is the difference between an access denied page and a CAPTCHA?
An access denied page is a hard block (usually a 403) — the WAF has decided you are a bot and will not let you proceed. A CAPTCHA is a soft block — the WAF is suspicious but offers a challenge to prove you are human. Hard blocks require rotating your network identity; soft blocks can sometimes be solved, though avoiding them entirely is always the better strategy.
Is it legal to bypass an access denied page?
Accessing publicly available data is generally lawful in the US, reinforced by cases like hiQ v. LinkedIn, even if it requires rotating IPs to bypass rate limits. However, bypassing an access denied page to reach authenticated or non-public areas is a different legal matter entirely. DataFlirt only scrapes public data and honors robots.txt directives. Consult your legal counsel for specific use cases.
Why am I getting blocked in production but not locally?
Your local machine uses a residential ISP IP address, which has a high trust score. Your production server runs in a datacenter (AWS, DigitalOcean, Hetzner), and its ASN is likely flagged by default in most WAF configurations. To fix this, you must route your production traffic through a residential or high-quality mobile proxy pool.
How does DataFlirt handle sudden WAF rule changes?
We monitor 403 rates in real-time across all pipelines. If a target deploys a new rule that spikes blocks, our auto-healer pauses the queue, tests new fingerprint and proxy combinations against the target, and resumes extraction once the bypass is validated. This usually happens within minutes, preventing massive data loss.
Should I parse the HTML of an access denied page?
Yes. Never just log "403 Forbidden" and move on. Extracting the Ray ID (Cloudflare), Reference Number (Akamai), or specific error code allows you to diagnose exactly which WAF rule you tripped. DataFlirt's telemetry layer parses these payloads automatically to classify the failure mode.
Does rotating User-Agents fix a 403 Access Denied error?
Rarely. Modern WAFs look at TLS fingerprints (JA3/JA4), HTTP/2 framing, and IP reputation. Changing the User-Agent string without changing the underlying network stack actually increases your bot score, because you are now broadcasting a mismatch (e.g., claiming to be Chrome 124 while your TLS handshake looks like Python).