

NFL historical data,
at warehouse scale.

We extract player statistics, game logs, play-by-play sequences, draft history, and advanced metrics from Pro-Football-Reference. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.


Player profiles: 28,491 total
Game logs: 18,942 per season
Play-by-play events: 42,109 per week
Active pipelines: 14
Uptime: 99.98%

Data Dictionary

Every field we extract from pro-football-reference.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Player Profiles objects from pro-football-reference.com. All fields typed and schema-versioned.

Fields: player_id, name, position, height, weight, dob, college, draft_pick, active_status, career_av

{
  "player_id": "MahoPa00",
  "name": "Patrick Mahomes",
  "position": "QB",
  "height": "6-2",
  "weight": 225,
  "college": "Texas Tech",
  "career_av": 112,
  "active_status": true
}


Complete list of extractable fields for Game Logs objects from pro-football-reference.com. All fields typed and schema-versioned.

Fields: game_id, player_id, date, team, opponent, result, passing_yds, rushing_yds, receiving_yds, touchdowns

{
  "game_id": "202402110kan",
  "player_id": "MahoPa00",
  "date": "2024-02-11",
  "team": "KAN",
  "opponent": "SFO",
  "result": "W 25-22",
  "passing_yds": 333,
  "touchdowns": 2
}


Complete list of extractable fields for Play-by-Play objects from pro-football-reference.com. All fields typed and schema-versioned.

Fields: play_id, game_id, quarter, time_remaining, down, distance, field_position, play_type, description, epa

{
  "play_id": "202402110kan_142",
  "game_id": "202402110kan",
  "quarter": 4,
  "time_remaining": "00:03",
  "down": 1,
  "distance": "Goal",
  "play_type": "Pass",
  "epa": 3.42
}


Complete list of extractable fields for Team Stats objects from pro-football-reference.com. All fields typed and schema-versioned.

Fields: team_id, season, wins, losses, ties, points_for, points_against, srs, osrs, dsrs

{
  "team_id": "KAN",
  "season": 2023,
  "wins": 11,
  "losses": 6,
  "ties": 0,
  "points_for": 371,
  "points_against": 294,
  "srs": 4.8
}


Complete list of extractable fields for Draft History objects from pro-football-reference.com. All fields typed and schema-versioned.

Fields: draft_year, round, pick, player_id, team_id, position, college, av, games_played, pass_yds

{
  "draft_year": 2017,
  "round": 1,
  "pick": 10,
  "player_id": "MahoPa00",
  "team_id": "KAN",
  "position": "QB",
  "college": "Texas Tech",
  "games_played": 96
}


Capabilities

Structured NFL data without the copy-paste

Pro-Football-Reference contains the definitive history of the NFL, but querying it programmatically requires handling strict rate limits, hidden DOM nodes, and complex multi-header tables. We manage the extraction layer.

Full Player Statistics

Extract passing, rushing, receiving, and defensive metrics for both the regular season and the playoffs, normalised across eras.

Play-by-Play Parsing

Convert raw text logs into structured event sequences. Includes EPA, win probability added, and down-and-distance context.

Advanced Metrics

Capture Approximate Value (AV), adjusted net yards per attempt (ANY/A), true completion percentage, and defensive pressure rates.

Draft & Combine Records

Historical draft classes mapped to combine measurements (40-yard dash, vertical, broad jump) and career outcomes.

Coaching & Front Office

Extract coaching tree records, coordinator histories, and executive tenures.

Injury Reports & Snap Counts

Weekly injury designations and positional snap percentage breakdowns per game.

Rate Limit Management

Sports Reference enforces strict 20-request-per-minute limits. We distribute load across residential IPs to maintain throughput.

Complex Table Normalisation

Resolve multi-tier headers, hidden columns, and dynamically injected JavaScript tables into flat, typed records.

Historical Backfilling

Run one-off backfills for decades of NFL history, followed by delta updates every Tuesday morning.

// engagement pipeline

From URL list to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide seasons, teams, or specific statistic tables required. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy crawlers, proxy rotation, request pacing, and table normalisation logic for Pro-Football-Reference.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and data type enforcement before full launch.

Delivery

Ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our pipeline handles Sports Reference constraints

Pro-Football-Reference employs aggressive rate limiting and complex DOM structures. Here is how we maintain reliable extraction.

// fingerprinting

Identity rotation

TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (running)

// observability

Pipeline health

Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%

Rate limiting

Distributed request pacing

Sports Reference bans IPs exceeding 20 requests per minute. We route traffic through rotating residential proxies and pace concurrency to avoid detection while maintaining overall pipeline throughput.
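A minimal sketch of the pacing half of this, assuming a single client that must stay under 20 requests in any rolling 60-second window. `SlidingWindowLimiter` and its parameters are hypothetical names, not part of Scrapy or any other library; the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls to acquire() in any rolling `window` seconds."""

    def __init__(self, max_requests: int = 20, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def acquire(self) -> float:
        """Block until a request may be sent, record it, and return its timestamp."""
        while True:
            now = self.clock()
            # Forget requests that have left the rolling window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return now
            # Wait until the oldest request in the window expires, then re-check.
            self.sleep(self.window - (now - self._sent[0]))
```

Call `acquire()` immediately before each request. With the defaults, the first 20 calls in a burst return at once and the 21st sleeps until the oldest recorded request is 60 seconds old.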

Table structure

Multi-header normalisation

Pro-Football-Reference uses complex, multi-tiered HTML tables. We flatten these structures, resolve merged cells, and enforce strict type casting to ensure clean columnar output.
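As an illustration of the flattening step, the sketch below assumes the two header rows have already been read into a list of `(label, colspan)` tuples for the top row and a list of leaf labels for the bottom row; `flatten_headers` and `slug` are hypothetical helpers. Group and leaf labels are joined into snake_case names, and any name that repeats gets a numeric suffix.

```python
import re
from collections import Counter


def slug(text: str) -> str:
    """Lower-case a header label and collapse non-alphanumerics to underscores."""
    return re.sub(r"[^0-9a-z]+", "_", text.lower()).strip("_")


def flatten_headers(over_header, header):
    """Combine a two-tier table header into one flat column name per cell.

    over_header: list of (label, colspan) tuples for the top row ("" means no group).
    header: list of leaf labels, one per column.
    """
    groups = []
    for label, colspan in over_header:
        groups.extend([label] * colspan)  # repeat each group label across its span
    if len(groups) != len(header):
        raise ValueError(f"over-header spans {len(groups)} columns, header has {len(header)}")

    columns, seen = [], Counter()
    for group, leaf in zip(groups, header):
        name = slug(f"{group} {leaf}") if group else slug(leaf)
        seen[name] += 1
        # Keep names unique, e.g. a second "yds" becomes "yds_2".
        columns.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return columns
```

For example, `flatten_headers([("", 3), ("Passing", 2), ("Rushing", 2)], ["Year", "Age", "Tm", "Yds", "TD", "Yds", "TD"])` yields `year, age, tm, passing_yds, passing_td, rushing_yds, rushing_td`, so the two "Yds" columns no longer collide.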

Hidden data

Parsing commented DOM nodes

Many advanced metrics and snap count tables are commented out in the HTML and injected via client-side JavaScript. We parse the raw DOM comments directly to extract the hidden nodes without heavy browser overhead.
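A standard-library sketch of the comment-parsing approach, assuming each hidden table sits inside an HTML comment and that its cells carry `data-stat` attributes naming the column; both are assumptions about the markup that should be checked against live pages. `commented_tables` and the table id in the usage note are illustrative, and BeautifulSoup's `Comment` type offers an equivalent route.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every HTML comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class TableRows(HTMLParser):
    """Turns <tbody> rows into dicts keyed by each cell's data-stat attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_body = False
        self._row = None
        self._stat = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tbody":
            self._in_body = True
        elif tag == "tr" and self._in_body:
            # Repeated header rows inside <tbody> are assumed to carry class="thead".
            is_header = "thead" in (attrs.get("class") or "").split()
            self._row = None if is_header else {}
        elif tag in ("td", "th") and self._row is not None:
            self._stat = attrs.get("data-stat")
            if self._stat:
                self._row[self._stat] = ""

    def handle_endtag(self, tag):
        if tag == "tbody":
            self._in_body = False
        elif tag in ("td", "th"):
            self._stat = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append({k: v.strip() for k, v in self._row.items()})
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags such as <a> still belongs to the current cell.
        if self._row is not None and self._stat:
            self._row[self._stat] += data


def commented_tables(html, table_id):
    """Return the rows of the table with id=table_id found inside an HTML comment."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    for comment in collector.comments:
        if f'id="{table_id}"' in comment:
            parser = TableRows()
            parser.feed(comment)  # the comment body is ordinary table markup
            parser.close()
            return parser.rows
    return []
```

Calling `commented_tables(page_html, "snap_counts")` would return one dict per body row, such as `{"player": "...", "offense": "..."}`, without starting a browser.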

Schema drift

Handling historical missing fields

Statistics tracked in 1985 differ from those tracked in 2023. Our parsers tolerate missing fields, emit explicit nulls, and normalise schema drift across decades of NFL history.
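One way to make drift explicit is to project every scraped row onto a fixed, versioned schema: missing or blank cells become nulls, present ones are cast to their declared type, and columns outside the schema are dropped. The sketch assumes raw cell text keyed by field name; `GAME_LOG_SCHEMA`, `to_int` and `normalise` are hypothetical.

```python
from typing import Any, Callable, Optional

Caster = Callable[[str], Any]


def to_int(text: str) -> int:
    """Parse integers that may contain thousands separators, e.g. "4,183"."""
    return int(text.replace(",", ""))


# Hypothetical target schema for game-log rows.
GAME_LOG_SCHEMA: dict[str, Caster] = {
    "game_id": str,
    "player_id": str,
    "passing_yds": to_int,
    "rushing_yds": to_int,
    "receiving_yds": to_int,
    "touchdowns": to_int,
}


def normalise(raw: dict[str, str], schema: dict[str, Caster]) -> dict[str, Optional[Any]]:
    """Project a raw row onto the schema: cast present values, null missing ones."""
    record: dict[str, Optional[Any]] = {}
    for field, cast in schema.items():
        value = raw.get(field)
        if value is None or not value.strip():
            record[field] = None  # column not tracked in this era, or an empty cell
        else:
            record[field] = cast(value.strip())
    # Fields outside the schema are dropped rather than silently widening it.
    return record
```

Because every output row has the same keys, a 1985 row and a 2023 row load into the same warehouse table; the nulls record what the older era did not track.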

Change detection

Efficient weekly deltas

We re-fetch only active players and recent games; historical data stays cached. Deltas are pushed to your warehouse weekly, after Monday Night Football.
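A sketch of hash-based change detection, assuming records are keyed by `(game_id, player_id)` and that the previous run's hashes are available (here as a plain dict; a real pipeline might keep them in Redis or PostgreSQL). Only new or changed rows are emitted. `record_hash` and `compute_delta` are hypothetical names.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Stable SHA-256 of a record; keys are sorted so field order never matters."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compute_delta(records, seen, key_fields=("game_id", "player_id")):
    """Return records that are new or changed since the last run.

    `seen` maps each record's key tuple to the hash stored on the previous run;
    it is updated in place so it can be persisted for the next run.
    """
    changed = []
    for record in records:
        key = tuple(record[f] for f in key_fields)
        digest = record_hash(record)
        if seen.get(key) != digest:
            changed.append(record)
            seen[key] = digest
    return changed
```

Hashing whole records also catches stat corrections applied to already-published games, which a "new rows only" rule would miss.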

Applications

Who uses NFL data — and how

Teams across industries use pro-football-reference.com data to build competitive products and smarter operations.

Fantasy Sports Modeling

Data scientists build predictive models for daily fantasy sports (DFS) platforms using historical snap counts, target shares, and red-zone usage.

Sports Betting Analytics

Quantitative syndicates feed play-by-play data and EPA metrics into algorithms to identify inefficient betting lines.

Academic Research

Economists and statisticians analyse draft outcomes, coaching decisions, and player longevity trends.

Sports Media & Journalism

Publishers automate historical comparisons and generate data-driven narratives for weekly NFL coverage.

Machine Learning Training

ML teams use decades of play-by-play sequences to train outcome prediction models and fourth-down decision engines.

App Development

Developers populate independent sports applications with historical player statistics and team records.

Technical Spec

Pro-Football-Reference scraper — technical capabilities

Everything supported by our pro-football-reference.com scraper, from commented-DOM parsing and multi-header table flattening to rate-limit management and historical backfills.

Commented DOM parsing (Supported): extracts tables hidden in HTML comments without requiring JavaScript execution.

Multi-header table flattening (Supported): resolves merged cells and nested headers into flat dictionary structures.

Residential proxy rotation (Supported): spreads request load to avoid Sports Reference's IP bans for exceeding 20 requests per minute.

Play-by-play standardisation (Supported): parses raw text descriptions into structured event types and yardage.

Weekly delta updates (Supported): incremental post-game fetching of active player stats.

Historical backfills (Supported): full catalogue extraction dating back to 1920.

Stathead proprietary queries (Partial): custom query generation requiring a paid Stathead subscription.

User account saved searches (Partial): extraction of personal saved queries from authenticated accounts.

Infrastructure

Infrastructure powering the NFL data pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus, BeautifulSoup4

Scrapy + DOM Parsing

Scrapy handles orchestration and request pacing. Custom middleware parses HTML comments to extract data without the overhead of headless browsers.
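For illustration, these are the kind of standard Scrapy settings that cap a single crawler at roughly 20 requests per minute to one domain. The setting names are real Scrapy settings; the values are assumptions derived from the 20-per-minute figure, not the team's actual configuration.

```python
# settings.py (sketch): pace one Scrapy crawler to about 20 requests/minute per domain.

# 60 s / 20 requests = 3 s between consecutive requests to the same domain.
DOWNLOAD_DELAY = 3.0

# By default Scrapy waits a random 0.5x to 1.5x of DOWNLOAD_DELAY; turning that
# off keeps every gap at 3 s or more instead of sometimes dropping to 1.5 s.
RANDOMIZE_DOWNLOAD_DELAY = False

# One request in flight per domain, so parallelism cannot multiply the rate.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# AutoThrottle raises the delay when responses slow down; it never sets the
# delay below DOWNLOAD_DELAY.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 3.0
AUTOTHROTTLE_MAX_DELAY = 60.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Retry 429s and transient server errors a few times; retries go back through
# the scheduler and downloader, so they are paced like any other request.
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]
RETRY_TIMES = 3

ROBOTSTXT_OBEY = True
```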

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies to distribute request load and strictly adhere to Sports Reference rate limits without triggering blocks.

Cloud-Native Orchestration

Pipelines run on AWS ECS. Airflow handles scheduling, ensuring weekly deltas run reliably after Monday Night Football concludes.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested — schema versioned per run

CSV

Flat file with typed columns — Excel/Sheets compatible

XLS

Excel format for business analysts

Parquet

Columnar format for BigQuery, Snowflake, Athena

AWS S3

Direct bucket delivery — compatible with any data lake

Webhook

HTTP POST per record for real-time downstream processing

API

REST endpoint for on-demand data retrieval

PostgreSQL

Direct insertion into your relational database
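For the newline-delimited JSON option listed above, a minimal sketch of per-run schema versioning, assuming each record is a flat dict. The `_schema_version` and `_run_id` envelope fields and the `write_ndjson` helper are hypothetical names for illustration.

```python
import io
import json
from datetime import datetime, timezone
from typing import Iterable, Optional, TextIO

SCHEMA_VERSION = "1.0.0"  # hypothetical schema version string


def write_ndjson(records: Iterable[dict], stream: TextIO,
                 schema_version: str = SCHEMA_VERSION,
                 run_id: Optional[str] = None) -> int:
    """Write one JSON object per line, each tagged with the run's schema version."""
    run_id = run_id or datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    count = 0
    for record in records:
        line = {"_schema_version": schema_version, "_run_id": run_id, **record}
        stream.write(json.dumps(line, ensure_ascii=False, separators=(",", ":")) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    write_ndjson([{"player_id": "MahoPa00", "position": "QB"}], buf, run_id="example-run")
    print(buf.getvalue(), end="")
```

Tagging every line, rather than only a file header, keeps the version attached to each record after files are split or concatenated downstream.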


// faq

Common questions.

About pro-football-reference.com scraping, legality, and pipeline operations.


How do you handle Sports Reference rate limits?

Pro-Football-Reference restricts traffic to 20 requests per minute per IP. We distribute extraction across a large pool of US-based residential proxies and enforce strict concurrency limits in Scrapy to extract data reliably without triggering defensive blocks.

Can you extract data hidden behind Stathead paywalls?

No. We only extract publicly available data from Pro-Football-Reference. We do not bypass authentication walls or extract proprietary data requiring a paid Stathead subscription.

How do you handle the hidden tables in the HTML?

Pro-Football-Reference optimises page load by commenting out secondary tables (like snap counts and advanced metrics) and injecting them via JavaScript. We parse the raw HTML comments directly to extract the table nodes, which is faster and more reliable than rendering the page with Playwright.

When is the data updated each week?

For active season pipelines, we run delta updates on Tuesday mornings (UTC) after Monday Night Football concludes, ensuring all statistics and game logs for the week are finalised.
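To make the cadence concrete, the sketch below computes the next Tuesday run time in UTC. The 10:00 UTC slot is an assumption standing in for "Tuesday morning"; in Airflow the same schedule could be expressed as the cron string `0 10 * * 2`. `next_delta_run` is a hypothetical helper.

```python
from datetime import datetime, time, timedelta, timezone

TUESDAY = 1               # datetime.weekday(): Monday == 0, Tuesday == 1
RUN_AT_UTC = time(10, 0)  # hypothetical "Tuesday morning" slot


def next_delta_run(now: datetime) -> datetime:
    """Return the next Tuesday RUN_AT_UTC strictly after `now` (a timezone-aware datetime)."""
    now = now.astimezone(timezone.utc)
    days_ahead = (TUESDAY - now.weekday()) % 7
    run = datetime.combine(now.date() + timedelta(days=days_ahead), RUN_AT_UTC,
                           tzinfo=timezone.utc)
    if run <= now:
        run += timedelta(days=7)  # this week's slot has passed; use next Tuesday
    return run
```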

Can you standardise team names across historical eras?

Yes. Our parsers map historical franchise names (e.g., Houston Oilers) to their current franchise identifiers (Tennessee Titans), or preserve the historical names, depending on your schema requirements.
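Abbreviations alone are ambiguous across eras (for example, "HOU" meant the Oilers through 1996 and the Texans from 2002), so the mapping needs the season as well as the code. The table below is a small illustrative excerpt, and its abbreviation codes are assumptions rather than Pro-Football-Reference's exact identifiers; `current_franchise` is a hypothetical helper.

```python
from typing import Optional

# Illustrative excerpt: (abbreviation, first season, last season or None, franchise id).
FRANCHISE_ERAS: list[tuple[str, int, Optional[int], str]] = [
    ("HOU", 1960, 1996, "TEN"),  # Houston Oilers -> Tennessee Titans
    ("HOU", 2002, None, "HOU"),  # Houston Texans
    ("STL", 1960, 1987, "ARI"),  # St. Louis Cardinals -> Arizona Cardinals
    ("STL", 1995, 2015, "LAR"),  # St. Louis Rams -> Los Angeles Rams
    ("OAK", 1960, 1981, "LVR"),  # Oakland Raiders -> Las Vegas Raiders
    ("OAK", 1995, 2019, "LVR"),
]


def current_franchise(abbr: str, season: int) -> str:
    """Map a team abbreviation, as used in a given season, to its current franchise id."""
    for code, first, last, franchise in FRANCHISE_ERAS:
        if code == abbr and first <= season and (last is None or season <= last):
            return franchise
    return abbr  # no historical alias known: assume the code is already current
```

Storing the original abbreviation next to the mapped id keeps both options open: queries can group by franchise or report the name used at the time.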

Do you provide play-by-play data parsing?

Yes. We extract the raw play description text and parse it into structured fields including down, distance, play type, yardage gained, and involved players.
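A deliberately small, rule-based sketch of this parsing, assuming descriptions phrased like "Patrick Mahomes pass complete short right to Mecole Hardman for 3 yards, touchdown". The regular expressions and the `parse_play` name are hypothetical and cover only passes, sacks, runs, field goals and punts; real descriptions also include penalties, laterals and fumbles that need further rules.

```python
import re
from typing import Optional

# Hypothetical patterns for descriptions phrased like Pro-Football-Reference's.
YARDS_RE = re.compile(r"\bfor (?:(?P<yds>-?\d+) yards?|(?P<none>no gain))")
TARGET_RE = re.compile(r"\b(?:to|intended for) (?P<name>[^,(]+?)(?= for |,| \(|$)")
SACKER_RE = re.compile(r"\bsacked by (?P<name>[^,(]+?)(?= for |,| \(|$)")
RUN_RE = re.compile(r"^(?P<name>.+?) (?:left|right|up the middle)\b")


def parse_play(description: str) -> dict:
    """Classify one play description and pull out yardage and named players."""
    desc = description.strip()
    players: list[str] = []

    if " pass " in desc:
        play_type = "pass"
        passer, rest = desc.split(" pass ", 1)
        players.append(passer)
        target = TARGET_RE.search(rest)
        if target:
            players.append(target.group("name"))
    elif " sacked" in desc:
        play_type = "sack"
        players.append(desc.split(" sacked", 1)[0])
        sacker = SACKER_RE.search(desc)
        if sacker:
            players.append(sacker.group("name"))
    elif "field goal" in desc:
        play_type = "field_goal"
    elif " punts " in desc:
        play_type = "punt"
    else:
        run = RUN_RE.match(desc)
        play_type = "run" if run else "other"
        if run:
            players.append(run.group("name"))

    # None when the text states no yardage (e.g. an incomplete pass).
    yards: Optional[int] = None
    match = YARDS_RE.search(desc)
    if match:
        yards = 0 if match.group("none") else int(match.group("yds"))

    return {
        "play_type": play_type,
        "yards": yards,
        "touchdown": "touchdown" in desc.lower(),
        "players": players,
    }
```

Down, distance and field position are not in the description text itself; they would come from the neighbouring columns of the play-by-play table and be joined onto this output.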

Can I request a sample dataset?

Yes. We provide a sample run of up to 50 player profiles or 10 game logs to validate schema fit and data quality before commencing the full extraction.


Tell us what
to extract.
We do the rest.

Data Extraction for Every Industry
