What is Publicly Available Data Doctrine?

The Publicly Available Data Doctrine is the legal consensus that accessing and extracting information published on the open web without authentication does not constitute unauthorized access under anti-hacking laws. It forms the foundational legal bedrock for the commercial web scraping industry, establishing that if a server willingly serves a page to an unauthenticated GET request, parsing that response is not a federal crime.

Legal · CFAA · hiQ v. LinkedIn · Compliance · Open Web
// 02 — definitions

The legal
baseline.

Why reading what a server voluntarily broadcasts to the public isn't hacking, and where the boundaries of that protection actually lie.

TL;DR

The doctrine asserts that public data — information not protected by a login, password, or paywall — cannot be subject to "unauthorized access" claims under statutes like the US Computer Fraud and Abuse Act (CFAA). While it protects against hacking charges, it does not grant blanket immunity from copyright infringement, breach of contract, or data privacy violations.

01 Definition & structure
The Publicly Available Data Doctrine is a legal principle establishing that accessing data on the open internet — data that requires no password, no account, and no special authorization to view — cannot be prosecuted as "unauthorized access" under anti-hacking laws like the CFAA. If a web server is configured to respond to an anonymous HTTP GET request with a 200 OK and a payload of HTML, parsing that HTML is legally equivalent to reading a billboard in a public square.
02 The hiQ v. LinkedIn precedent
The doctrine was largely solidified by hiQ Labs v. LinkedIn. LinkedIn invoked the CFAA to stop hiQ from scraping public user profiles. The Ninth Circuit Court of Appeals ruled in 2019, and again in 2022 after the Supreme Court sent the case back for reconsideration in light of Van Buren v. United States, that the CFAA was designed to punish computer hacking (breaking into protected systems), not to police access to data that its owner chose to make available to the general public.
03 The authentication boundary
The absolute limit of this doctrine is the authentication wall. The moment a scraper uses a username, password, or session token to access data that is not visible to an anonymous visitor, the doctrine no longer applies. At that point, access is governed by the site's Terms of Service and the Authorized Access Doctrine. Crossing this line without permission is where civil disputes turn into federal hacking allegations.
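As a rough illustration of where that boundary sits at the HTTP level, the sketch below classifies the response to an anonymous GET. It is a minimal sketch under stated assumptions, not DataFlirt's implementation: the `Access` enum, the `classify_access` function, and the `LOGIN_HINTS` heuristics are hypothetical names, and treating every 403 as an authentication signal is a simplification (a 403 can also be an anti-bot block).

```python
from enum import Enum
from typing import Optional


class Access(Enum):
    PUBLIC = "public"
    AUTH_BOUNDARY = "auth_boundary"
    OTHER = "other"


REDIRECT_CODES = {301, 302, 303, 307, 308}
# Hypothetical heuristics for spotting a redirect to a login page.
LOGIN_HINTS = ("login", "signin", "sign-in")


def classify_access(status: int, location: Optional[str] = None) -> Access:
    """Classify the response to an anonymous GET relative to the auth boundary."""
    # 401/403: the server refuses to serve this resource to an anonymous client.
    if status in (401, 403):
        return Access.AUTH_BOUNDARY
    # A redirect counts as a boundary only if it points at a login page.
    if status in REDIRECT_CODES and location:
        if any(hint in location.lower() for hint in LOGIN_HINTS):
            return Access.AUTH_BOUNDARY
        return Access.OTHER
    # Any 2xx served without credentials is treated as publicly available.
    if 200 <= status < 300:
        return Access.PUBLIC
    return Access.OTHER
```

Under these assumptions, `classify_access(200)` yields `Access.PUBLIC`, while `classify_access(302, "/login")` yields `Access.AUTH_BOUNDARY`.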
04 How DataFlirt handles it
We build a hard architectural wall between surface web and deep web pipelines. Our surface web crawlers are built without the ability to hold session state or submit login forms. If a target site moves a previously public catalog behind a login wall, our pipeline receives a 401/403 (or a redirect to the login page) and fails closed. We never attempt to bypass access controls, so every byte of data we deliver under a surface web contract falls within the publicly available data doctrine.
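The fail-closed behaviour can be sketched in a few lines. This is an illustrative sketch only: `anonymous_headers`, `fail_closed`, `AuthBoundaryReached`, and the `CREDENTIAL_HEADERS` set are hypothetical names chosen for the example, not part of any DataFlirt API.

```python
from typing import Dict

# Headers that could carry credentials or session state (assumed list).
CREDENTIAL_HEADERS = {"authorization", "cookie", "proxy-authorization"}


class AuthBoundaryReached(Exception):
    """Raised when a target stops serving a page to anonymous requests."""


def anonymous_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the headers with anything credential-like removed."""
    return {k: v for k, v in headers.items() if k.lower() not in CREDENTIAL_HEADERS}


def fail_closed(url: str, status: int) -> None:
    """Halt the crawl, rather than retry or escalate, when access is refused."""
    if status in (401, 403):
        raise AuthBoundaryReached(f"{url} returned {status}; halting crawl")
```

The design point is that the crawler never has a code path that adds credentials: outgoing headers are filtered, and a refusal raises instead of triggering a fallback.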
05 The ToS vs. CFAA distinction
A common misconception is that a website's Terms of Service (ToS) can override this doctrine. They cannot. A ToS forbidding scraping is a contract: if you violate it, you might be sued for breach of contract, which is a civil matter. The CFAA is an anti-hacking statute that carries federal criminal penalties. Courts have repeatedly ruled that simply violating a website's ToS does not transform public access into a criminal hacking offense.
// 03 — the compliance model

How we quantify
legal exposure.

DataFlirt evaluates target risk before a single request is sent. The doctrine protects access, but operational risk includes ToS enforcement, copyright claims, and privacy regulations.

Access Authorization = Auth = Public_GET / Session_Tokens
If Auth > 0, CFAA risk is near zero. The data is public. (Legal consensus post-hiQ)
ToS Enforcement Risk = Rtos = Clickwrap_Presence × Anti_Bot_Aggression
Browsewrap ToS are notoriously difficult to enforce against scrapers. (DataFlirt compliance matrix)
DataFlirt Compliance Score = C = 1 − (PII_Density + Copyright_Risk)
Pipelines with C < 0.8 require manual legal review before deployment. (Internal SLO)
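The two scoring formulas and the review threshold translate directly into code. The sketch below assumes every input is normalised to the range 0 to 1, which the page does not state; the function names are illustrative.

```python
REVIEW_THRESHOLD = 0.8  # pipelines scoring below this need manual legal review


def tos_enforcement_risk(clickwrap_presence: float, anti_bot_aggression: float) -> float:
    """Rtos = Clickwrap_Presence x Anti_Bot_Aggression (inputs assumed in [0, 1])."""
    return clickwrap_presence * anti_bot_aggression


def compliance_score(pii_density: float, copyright_risk: float) -> float:
    """C = 1 - (PII_Density + Copyright_Risk) (inputs assumed in [0, 1])."""
    return 1.0 - (pii_density + copyright_risk)


def needs_legal_review(score: float) -> bool:
    """True when the compliance score falls below the review threshold."""
    return score < REVIEW_THRESHOLD
```

For example, a target with a PII density of 0.2 and a copyright risk of 0.1 scores roughly 0.7 and would be routed to manual review, while a browsewrap-only site (clickwrap presence of 0) has a ToS enforcement risk of 0 regardless of its anti-bot posture.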
// 04 — compliance audit trace

Pre-flight check
for a new target.

Before onboarding a new pipeline, our compliance engine verifies the target's public accessibility and maps the exact authentication boundary.

CFAA check · Auth boundary · robots.txt
edge.dataflirt.io — live
CAPTURED
// target analysis: public real estate directory
request: GET https://target.com/listings/
response: 200 OK
auth_required: false

// boundary testing
request: GET https://target.com/agent-contact-details/
response: 302 Redirect -> /login
auth_boundary: detected at /agent-contact-details/

// legal doctrine evaluation
cfaa_exposure: minimal // publicly available data doctrine applies
pii_detected: true // agent names and phones
gdpr_risk: high // requires legitimate interest assessment

// pipeline status
status: APPROVED_WITH_RESTRICTIONS
action: exclude /agent-contact-details/ from crawl scope
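The decision logic in the trace above could be sketched as follows. This is a hypothetical reconstruction from the trace, not the actual compliance engine: `Probe`, `requires_auth`, and `preflight` are invented names, and the `APPROVED` and `REJECTED` statuses are assumptions (only `APPROVED_WITH_RESTRICTIONS` appears in the trace).

```python
from dataclasses import dataclass
from typing import List, Tuple

REDIRECT_CODES = (301, 302, 303, 307, 308)


@dataclass
class Probe:
    path: str
    status: int
    location: str = ""  # redirect target, if any


def requires_auth(probe: Probe) -> bool:
    """A path sits behind the auth boundary if access is refused or sent to /login."""
    if probe.status in (401, 403):
        return True
    return probe.status in REDIRECT_CODES and "/login" in probe.location


def preflight(probes: List[Probe], pii_detected: bool) -> Tuple[str, List[str]]:
    """Return (pipeline status, paths to exclude from the crawl scope)."""
    excluded = [p.path for p in probes if requires_auth(p)]
    public = [p.path for p in probes if p.status == 200]
    if not public:
        return "REJECTED", excluded  # hypothetical status: nothing is public
    if excluded or pii_detected:
        return "APPROVED_WITH_RESTRICTIONS", excluded
    return "APPROVED", excluded  # hypothetical status: no restrictions needed
```

Feeding it the two probes from the trace (`/listings/` returning 200 and `/agent-contact-details/` redirecting to `/login`) with PII detected reproduces the trace's outcome: approved with restrictions, with the contact-details path excluded.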
// 05 — risk vectors

Where public data
still carries risk.

Even when the CFAA doesn't apply, scraping public data isn't universally risk-free. These are the primary legal and operational friction points for surface web pipelines.

PIPELINES AUDITED: 850+ active
CFAA INCIDENTS: 0
UPDATED: 2026-05-19
01 Copyright infringement
content rights · Scraping creative works, articles, or proprietary images
02 GDPR / PII extraction
privacy laws · Public personal data is still protected personal data
03 Breach of Contract
ToS violation · Bypassing explicit clickwrap agreements
04 Database Rights (EU)
sui generis · Extracting a substantial part of a protected database
05 Trespass to Chattels
server load · Causing actual material harm to target infrastructure
// 06 — our compliance stack

Public means public,
but we never cross the authentication line.

DataFlirt's infrastructure is hardcoded to respect the authentication boundary. Our surface web crawlers are physically isolated from session-handling capabilities. If a target moves previously public data behind a login wall, our pipeline fails closed. We do not attempt to bypass access controls, create fake accounts, or hijack sessions. This strict architectural separation ensures our clients' datasets are insulated from CFAA liability and firmly protected by the publicly available data doctrine.

compliance.policy.json

Runtime constraints for a surface web pipeline.

cfaa.safe_harbor true
auth.bypass_attempts 0
session.injection disabled
pii.extraction quarantined
robots_txt.respect true
rate_limit.ceiling 0.5 req/s
legal.status cleared
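One way such constraints might be checked before a pipeline starts is sketched below. The keys and values mirror the listing above, but the `policy_violations` function, the in-memory `POLICY` dict, and the choice of which keys to enforce are assumptions made for illustration.

```python
from typing import Any, Dict, List

# The constraints listed above, expressed as a Python dict (illustrative).
POLICY: Dict[str, Any] = {
    "cfaa.safe_harbor": True,
    "auth.bypass_attempts": 0,
    "session.injection": "disabled",
    "pii.extraction": "quarantined",
    "robots_txt.respect": True,
    "rate_limit.ceiling": 0.5,  # requests per second
    "legal.status": "cleared",
}


def policy_violations(policy: Dict[str, Any], max_rate: float = 0.5) -> List[str]:
    """Return a human-readable list of violated constraints (empty if none)."""
    problems: List[str] = []
    if policy.get("auth.bypass_attempts", 0) != 0:
        problems.append("auth.bypass_attempts must be 0")
    if policy.get("session.injection") != "disabled":
        problems.append("session.injection must be disabled")
    if not policy.get("robots_txt.respect", False):
        problems.append("robots_txt.respect must be true")
    # A missing ceiling is treated as unlimited, which violates the cap.
    if policy.get("rate_limit.ceiling", float("inf")) > max_rate:
        problems.append(f"rate_limit.ceiling must not exceed {max_rate} req/s")
    return problems
```

A pipeline would refuse to start whenever `policy_violations(...)` returns a non-empty list, which keeps the check consistent with the fail-closed stance described above.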

// 07 — FAQ

Common
questions.

Common questions about the legality of scraping, the CFAA, and how DataFlirt navigates the boundaries of public data.

Does this doctrine mean scraping is always 100% legal?
No. The doctrine specifically addresses "unauthorized access" under anti-hacking laws like the CFAA. It means you aren't a hacker for downloading a public webpage. However, you can still face civil liability for copyright infringement, breach of contract (if you agreed to a ToS), or regulatory fines if you scrape and store personal data (GDPR/CCPA).
What happens if a site's Terms of Service forbid scraping?
This falls under contract law, not the CFAA. For a ToS to be enforceable, the user typically must explicitly agree to it (a "clickwrap" agreement). "Browsewrap" agreements — where the ToS is just a link in the footer — are historically much harder to enforce against scrapers who never created an account.
How did hiQ v. LinkedIn change the landscape?
The Ninth Circuit ruled that LinkedIn could not use the CFAA to stop hiQ from scraping public user profiles. The court reasoned that data accessible to the general public without authentication falls outside the CFAA's "without authorization" provision, cementing the publicly available data doctrine as the industry's working standard.
How does DataFlirt ensure we don't cross into protected data?
We enforce strict pipeline scoping. Our surface web crawlers are deployed without authentication capabilities. If a target introduces a login wall, the crawler receives a 302 or 401 and immediately halts. We never attempt to bypass access controls, ensuring your data provenance remains legally sound.
Does the doctrine apply to personal data (PII)?
The CFAA doesn't distinguish between data types, but privacy laws do. Just because an email address is public doesn't mean you have the right to scrape and store it under GDPR or CCPA. Public PII is still PII. DataFlirt quarantines detected PII unless explicit legal basis (like legitimate interest) is established.
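A quarantine step of this kind might look like the sketch below, which splits a scraped record into a deliverable part and a held-back part. The regular expressions are deliberately simple heuristics for emails and phone numbers, and `quarantine_pii` is a hypothetical helper; a production detector would need to cover far more PII categories.

```python
import re
from typing import Dict, Tuple

# Simple heuristic patterns (assumptions, not an exhaustive PII detector).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def quarantine_pii(record: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a record into (clean fields, quarantined fields that look like PII)."""
    clean: Dict[str, str] = {}
    quarantined: Dict[str, str] = {}
    for field, value in record.items():
        if EMAIL_RE.search(value) or PHONE_RE.search(value):
            quarantined[field] = value
        else:
            clean[field] = value
    return clean, quarantined
```

Only the clean part would be delivered by default; the quarantined part stays held back until a legal basis has been documented.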
What if a site uses IP blocking or CAPTCHAs to stop us?
Courts have generally held that bypassing IP blocks or CAPTCHAs to access public data does not suddenly turn the activity into a CFAA violation, because the underlying data remains public. However, it escalates the operational arms race. We prefer to manage request rates and fingerprint quality to avoid triggering these blocks in the first place.
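Managing request rates can be as simple as enforcing a minimum interval between requests. The sketch below is one possible approach, assuming a single-threaded crawler; the `Throttle` class is hypothetical, and the 0.5 requests per second in the usage note is borrowed from the rate ceiling quoted in the runtime constraints earlier on this page.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between requests (single-threaded sketch)."""

    def __init__(
        self,
        max_rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_rate <= 0:
            raise ValueError("max_rate must be positive")
        self.min_interval = 1.0 / max_rate  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        # Record the moment the request is actually released.
        self._last = now + delay
        return delay
```

Calling `Throttle(0.5).wait()` before each request spaces requests at least two seconds apart. The clock and sleep functions are injectable so the pacing logic can be tested without real delays.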
hello@dataflirt.com  ·  Bengaluru  ·  IST  ·  typical reply < 4h