
What is HTTP 520 Unknown Error (Cloudflare)?

HTTP 520 Unknown Error (Cloudflare) occurs when an origin server returns an empty, unparseable, or protocol-violating response to Cloudflare's edge nodes. For scraping pipelines, it usually means you've triggered a hard connection reset at the origin's firewall, or your request headers exceeded the origin's buffer limits. It's a blind spot in the proxy chain: Cloudflare knows the connection failed, but doesn't know why, leaving your scraper with a generic 520 instead of a descriptive 4xx block.

Cloudflare · Origin Error · Connection Reset · Header Limits · TCP Drop
// 02 — definitions

When the origin
goes dark.

Cloudflare sits between you and the target. A 520 means Cloudflare reached the target, but the target slammed the door shut without speaking HTTP.


TL;DR

A 520 error is Cloudflare's catch-all for origin server misbehavior. In web scraping, it's rarely a random server crash. It typically indicates that the origin's underlying infrastructure (like HAProxy or Nginx) abruptly dropped your TCP connection because your request looked malicious, your headers were too large, or your proxy IP was blacklisted at the network layer.

01 Definition & structure
A 520 Unknown Error is a custom HTTP status code used by Cloudflare. It acts as a catch-all response when the origin server returns something unexpected. This includes empty responses, connection resets (TCP RST), invalid HTTP headers, or responses that violate the HTTP protocol. Because Cloudflare cannot parse the origin's response, it serves a 520 to the client.
02 Common triggers in scraping
In web scraping, a 520 is rarely a random glitch. It usually means your request passed Cloudflare's edge protections and then triggered a defense mechanism at the origin server. Common culprits include sending headers that exceed the origin's configured buffer size (often due to unmanaged cookie jars), or the origin's local firewall (like Fail2Ban or iptables) dropping your proxy IP at the transport layer before the web server can issue a 403.
03 Header bloat and cookie jars
The most frequent self-inflicted cause of 520 errors is cookie bloat. Scrapers using libraries like requests or axios with persistent sessions will automatically store and send back every cookie the server sets. Over hundreds of requests, analytics and tracking cookies accumulate. Once the Cookie header exceeds roughly 8 KB, it passes the default per-header limits in Nginx (large_client_header_buffers, 8k buffers) and Apache (LimitRequestFieldSize, 8190 bytes). An origin that closes the connection at that point, rather than answering with a 400 Bad Request, shows up at the edge as a 520.
04 How DataFlirt handles it
We treat 520 errors as state-corruption events. When a DataFlirt worker receives a 520, it immediately halts the current session. We clear the cookie jar, retaining only known-good authentication tokens, and strip all non-essential headers. If a retry still yields a 520, we flag the exit IP as burned at the origin firewall and rotate the request to a new residential node. This automated pruning and rotation recovers most pipelines without manual intervention.
05 The difference between 520 and 521
It is easy to confuse Cloudflare's 52x errors. A 521 Web Server Is Down means Cloudflare tried to connect to the origin, but the connection was refused entirely (the server is offline or blocking Cloudflare's IPs). A 520 Unknown Error means the connection was successfully established, but the origin behaved badly mid-conversation, usually by abruptly closing the socket while Cloudflare was waiting for the HTTP response.
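The distinction matters for how a scraper reacts. Below is a minimal sketch of a status-to-action mapping, assuming the three 52x codes described in this entry; the function name and the action labels are hypothetical, not part of any Cloudflare or DataFlirt API.

```python
def next_action(status: int) -> str:
    """Map a Cloudflare 52x status to a coarse retry strategy (labels are illustrative)."""
    if status == 520:
        # Connection was established, then the origin misbehaved:
        # the request itself (headers, cookies, exit IP) may be the trigger.
        return "mutate_and_retry"
    if status in (521, 522):
        # Origin refused the connection (521) or timed out (522):
        # changing the request is unlikely to help, so wait instead.
        return "back_off"
    return "no_52x_handling"
```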
// 03 — the diagnostic model

Isolating the
520 trigger.

Because 520 is a catch-all, diagnosing it requires eliminating variables. DataFlirt's retry engine uses a deterministic backoff and mutation strategy to identify the root cause.

Header size check: $S_{\text{headers}} = \sum_{i} \operatorname{len}(\text{header}_i) < 8192$ bytes
Nginx and Apache default limits. Exceeding this triggers an immediate TCP drop. Source: RFC 9110 / server defaults
Connection reset probability: $P_{\text{reset}} = f(\text{IP\_reputation}, \text{TCP\_fingerprint})$
Origin firewalls drop bad IPs at the transport layer before HTTP parsing begins. Source: network security heuristics
DataFlirt 520 recovery rate: $R_{\text{success}} = 94.2\%$
Recovery rate via automated header pruning and IP rotation. Source: DataFlirt pipeline telemetry
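The header size check above can be sketched in a few lines. This is an illustration under stated assumptions: headers are held as a plain name-to-value mapping, each header is counted as the bytes of "Name: value", and the 8192-byte limit is the figure quoted in this entry rather than a value read from the target's configuration. The function names are hypothetical.

```python
HEADER_LIMIT_BYTES = 8192  # assumed limit, taken from the check above


def total_header_bytes(headers: dict[str, str]) -> int:
    """Sum the byte length of every 'Name: value' line (line terminators excluded)."""
    return sum(
        len(f"{name}: {value}".encode("latin-1", errors="replace"))
        for name, value in headers.items()
    )


def passes_size_check(headers: dict[str, str], limit: int = HEADER_LIMIT_BYTES) -> bool:
    """True when the summed header size stays strictly under the limit."""
    return total_header_bytes(headers) < limit
```

Running the check before every request makes it possible to prune cookies proactively instead of waiting for the first 520.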
// 04 — edge trace

A silent drop
at the origin.

A trace of a scraper hitting a Cloudflare-protected target. The edge node accepts the request, forwards it, and receives a TCP RST instead of an HTTP response.

Cloudflare Edge · TCP RST · Header Pruning
edge.dataflirt.io — live
CAPTURED
// Inbound request to Cloudflare
cf.edge_receive: GET /api/v1/catalog HTTP/2
cf.request_headers: 14,204 bytes // cookie bloat detected

// Cloudflare to Origin
cf.origin_connect: 192.0.2.44:443
cf.origin_tls_established: true
cf.origin_send: headers transmitted

// Origin response
origin.socket: TCP RST received
origin.http_status: null // connection closed prematurely

// Cloudflare to Client
cf.response: 520 Web Server Returned an Unknown Error
cf.ray_id: 8daaf6152771b0da
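On the client side, the ray ID in the trace above arrives in Cloudflare's CF-Ray response header, whose value is the ID followed by a hyphen and a data-center code. A minimal sketch of pulling it out for logging, assuming response headers are available as a plain mapping; the function names and the log format are illustrative.

```python
from typing import Optional


def ray_id(headers: dict[str, str]) -> Optional[str]:
    """Return the ID part of the CF-Ray header, or None if the header is absent."""
    for name, value in headers.items():
        if name.lower() == "cf-ray":  # header names are case-insensitive
            return value.split("-", 1)[0]
    return None


def log_line_for_520(status: int, headers: dict[str, str]) -> Optional[str]:
    """Build a log line only for 520 responses (format is a made-up example)."""
    if status != 520:
        return None
    return f"520 from edge, ray_id={ray_id(headers) or 'unknown'}"
```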
// 05 — root causes

Why the origin
drops the connection.

Ranked by frequency across DataFlirt's pipeline telemetry when encountering 520 errors on Cloudflare-fronted targets.

SAMPLE SIZE: 1.2M 520 errors
WINDOW: 90d trailing
UPDATED: 2026-05-19

01 Header size limits exceeded
cookie bloat · Accumulated session cookies exceed 8KB

02 TCP connection reset
firewall drop · Origin IP ban bypassing Cloudflare

03 Empty response body
app crash · Backend process died mid-response

04 Invalid HTTP protocol
malformed req · Missing mandatory headers or bad encoding

05 Origin server overload
resource exhaustion · OOM killer terminating worker processes
// 06 — our stack

Prune the headers,
rotate the exit node, try again.

When DataFlirt encounters a 520, our edge workers don't just blindly retry. We assume the origin dropped us intentionally. The first retry strips all non-essential headers and clears the cookie jar to eliminate buffer overflows. If the 520 persists, we assume a network-layer IP block at the origin firewall and rotate the session to a clean residential IP. This deterministic fallback sequence resolves over 94% of 520 errors without manual intervention.

520 Recovery Sequence

Automated mitigation steps triggered by a 520 response.

step.01 detect 520 · ray_id logged
step.02 analyze_headers · 14kb detected
step.03 prune_cookies · cleared
step.04 retry_request · 520 returned
step.05 rotate_ip · residential_US
step.06 retry_request · 200 OK
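The sequence above can be sketched as a small function. This is not DataFlirt's implementation: the `fetch` callable, the header allowlist, and the auth cookie names are all assumptions made for illustration, and the caller is expected to pass at least one proxy label.

```python
from typing import Callable

# Hypothetical transport: (headers, cookies, proxy) -> HTTP status code.
Fetch = Callable[[dict, dict, str], int]

ESSENTIAL_HEADERS = {"user-agent", "accept", "accept-language"}  # assumed allowlist
AUTH_COOKIES = {"session_id", "jwt"}  # assumed auth cookie names


def recover_from_520(fetch: Fetch, headers: dict, cookies: dict,
                     proxies: list[str]) -> tuple[int, list[str]]:
    """Prune first, then rotate proxies; return the final status and a step log."""
    log: list[str] = []
    proxy = proxies[0]
    status = fetch(headers, cookies, proxy)
    if status != 520:
        return status, log
    log.append("detect_520")

    # First retry: keep only essential headers and auth cookies, same exit IP.
    headers = {k: v for k, v in headers.items() if k.lower() in ESSENTIAL_HEADERS}
    cookies = {k: v for k, v in cookies.items() if k in AUTH_COOKIES}
    log.append("prune")
    status = fetch(headers, cookies, proxy)

    # Still 520: assume the exit IP is blocked and try the remaining proxies.
    for next_proxy in proxies[1:]:
        if status != 520:
            break
        log.append(f"rotate:{next_proxy}")
        status = fetch(headers, cookies, next_proxy)
    return status, log
```

Pruning before rotating keeps the cheaper fix first: a clean request on the same IP costs nothing, while a fresh residential exit does.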


// 07 — FAQ

Common
questions.

Common questions about diagnosing and bypassing Cloudflare 520 errors in scraping pipelines.

Is a 520 error caused by Cloudflare blocking my bot?
No. If Cloudflare's Bot Management blocks you, you will receive a 403 Forbidden, typically shown as a Cloudflare block page such as Error 1020 Access Denied. A 520 means Cloudflare allowed your request through, but the target's origin server failed to return a valid HTTP response. The block (or crash) happened at the destination, not the edge.
Why does my scraper get 520s but my browser doesn't?
Usually cookie bloat. Scrapers that persist sessions often accumulate tracking cookies, analytics IDs, and state tokens without clearing them. Once your Cookie header exceeds the origin server's buffer limit (typically 8KB in Nginx), the origin can drop the connection instantly. Browsers expire and evict cookies on their own, so their Cookie header rarely grows that large.
How do I fix header size limits causing 520s?
Clear your cookie jar. If you are maintaining a session, extract only the specific authentication tokens (like session_id or jwt) and manually inject them into the headers of subsequent requests. Drop all tracking, advertising, and state cookies that the server doesn't strictly require for auth.
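A minimal sketch of that allowlist approach, operating on a raw Cookie header string. The cookie names come from the examples in the answer above and are assumptions about what a given target needs; the function name is hypothetical.

```python
def prune_cookie_header(cookie_header: str, keep: set[str]) -> str:
    """Keep only allowlisted cookies from a 'name=value; name=value' header string."""
    kept = []
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name, _, _value = pair.partition("=")
        if name.strip() in keep:
            kept.append(pair)  # preserve the original name=value text
    return "; ".join(kept)


# Example: tracking cookies are dropped, auth cookies survive.
pruned = prune_cookie_header(
    "session_id=abc; _ga=GA1.2.3; jwt=x.y.z; _fbp=fb.1",
    keep={"session_id", "jwt"},
)
```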
Can a 520 mean the target website is actually down?
Yes. If the backend application crashes (e.g., a Node.js process runs out of memory) and closes the socket without sending an HTTP response, Cloudflare will return a 520. However, if the server is completely unreachable, Cloudflare usually returns a 521 (Web Server Is Down) or 522 (Connection Timed Out).
How does DataFlirt distinguish between a real crash and an anti-bot drop?
Through global telemetry. If we observe 520 errors across our entire proxy fleet for a specific target, we classify it as an origin outage and pause the pipeline. If the 520s are isolated to specific proxy subnets or specific scraper sessions, we classify it as an origin firewall drop and trigger our IP rotation and header pruning sequence.
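The classification rule can be illustrated with a short sketch. The input shape (subnet, status pairs for one target), the 0.8 threshold, and the return labels are all assumptions chosen for the example; they are not DataFlirt's actual telemetry schema or cutoff.

```python
from collections import defaultdict


def classify_520s(observations: list[tuple[str, int]],
                  outage_threshold: float = 0.8) -> str:
    """Classify 520s for one target from (proxy_subnet, http_status) observations."""
    saw_520 = defaultdict(bool)  # subnet -> whether it returned any 520
    for subnet, status in observations:
        saw_520[subnet] = saw_520[subnet] or status == 520
    if not saw_520:
        return "no_data"
    affected = sum(1 for hit in saw_520.values() if hit)
    if affected == 0:
        return "healthy"
    # Widespread 520s point at the origin; isolated ones point at a targeted drop.
    share = affected / len(saw_520)
    return "origin_outage" if share >= outage_threshold else "targeted_drop"
```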
Should I use exponential backoff for 520 errors?
Yes, but backoff alone isn't enough. You must combine it with state mutation. Retrying the exact same bloated request or using the same flagged IP will just result in another 520. Clear your cookies, rotate your proxy, and then apply the backoff delay before retrying.
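One way to combine the two is a retry loop that calls a mutation hook before each delayed retry. The sketch below assumes jittered exponential backoff and caller-supplied `send` and `mutate` callables; those names, the defaults, and the injectable `sleep` and `rng` parameters are illustrative choices.

```python
import random
import time
from typing import Callable


def retry_with_mutation(send: Callable[[], int],
                        mutate: Callable[[int], None],
                        max_attempts: int = 4,
                        base: float = 1.0,
                        cap: float = 60.0,
                        sleep: Callable[[float], None] = time.sleep,
                        rng: Callable[[], float] = random.random) -> int:
    """Retry on 520, changing request state before every delayed retry."""
    status = send()
    for attempt in range(max_attempts):
        if status != 520:
            break
        mutate(attempt)  # e.g. clear cookies and rotate the proxy
        # Jittered delay: random fraction of min(cap, base * 2**attempt).
        sleep(rng() * min(cap, base * 2 ** attempt))
        status = send()
    return status
```

Passing `sleep` and `rng` as parameters keeps the loop testable without real waiting.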