
Sabermetric data,
at warehouse scale.

We extract player leaderboards, advanced metrics (fWAR, wRC+), ZiPS/Steamer projections, and minor league stats from FanGraphs. Data is delivered as clean JSON, CSV, or Parquet to S3 or BigQuery on a daily schedule you set.

Player profiles: 84,219 total
Daily stat updates: 2.1M / 24h
Projection rows: 412K / run
Active pipelines: 17
Uptime: 99.94%

Data Dictionary

Every field we extract from fangraphs.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Batting Leaderboards objects from fangraphs.com. All fields typed and schema-versioned.

Fields: player_id, name, team, games, plate_appearances, home_runs, stolen_bases, woba, wrc_plus, fwar, babip, iso, strikeout_pct, walk_pct

Sample batting_leaderboards record (excerpt, 200 OK):
{
  "player_id": "10155",
  "name": "Mike Trout",
  "team": "LAA",
  "woba": 0.395,
  "wrc_plus": 155,
  "fwar": 6.4,
  "strikeout_pct": 0.231
}
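The sample record above can be loaded into a typed structure before it reaches a warehouse table. This is a minimal sketch that assumes the field names from the batting_leaderboards list; `BattingRow`, `_opt`, and `parse_batting_row` are hypothetical names chosen for illustration and are not part of the delivered schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional


@dataclass(frozen=True)
class BattingRow:
    """Hypothetical typed view of one batting_leaderboards record."""
    player_id: str
    name: str
    team: str
    woba: Optional[float]
    wrc_plus: Optional[int]
    fwar: Optional[float]
    strikeout_pct: Optional[float]


def _opt(value: Any, cast: Callable[[Any], Any]) -> Any:
    # Keep missing values as None instead of guessing a default.
    return None if value is None else cast(value)


def parse_batting_row(raw: Mapping[str, Any]) -> BattingRow:
    """Coerce a raw JSON object into typed fields; IDs stay strings."""
    return BattingRow(
        player_id=str(raw["player_id"]),
        name=str(raw["name"]),
        team=str(raw["team"]),
        woba=_opt(raw.get("woba"), float),
        wrc_plus=_opt(raw.get("wrc_plus"), int),
        fwar=_opt(raw.get("fwar"), float),
        strikeout_pct=_opt(raw.get("strikeout_pct"), float),
    )


# The values below are copied from the sample record on this page.
row = parse_batting_row({
    "player_id": "10155", "name": "Mike Trout", "team": "LAA",
    "woba": 0.395, "wrc_plus": 155, "fwar": 6.4, "strikeout_pct": 0.231,
})
```

Keeping `player_id` as a string avoids losing leading zeros or mixing types when the ID is later used as a join key.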

Complete list of extractable fields for Pitching Stats objects from fangraphs.com. All fields typed and schema-versioned.

Fields: player_id, name, team, innings_pitched, era, fip, xfip, k_per_9, bb_per_9, hr_per_9, babip, lob_pct, ground_ball_pct, fwar

Sample pitching_stats record (excerpt, 200 OK):
{
  "player_id": "19361",
  "name": "Corbin Burnes",
  "era": 2.89,
  "fip": 3.12,
  "xfip": 3.25,
  "k_per_9": 10.4,
  "fwar": 5.1
}

Complete list of extractable fields for Projections objects from fangraphs.com. All fields typed and schema-versioned.

Fields: player_id, name, projection_system, year, projected_pa, projected_hr, projected_rbi, projected_woba, projected_wrc_plus, projected_zips_war, updated_at

Sample projections record (excerpt, 200 OK):
{
  "player_id": "15640",
  "name": "Aaron Judge",
  "projection_system": "ZiPS",
  "projected_hr": 42,
  "projected_wrc_plus": 162,
  "projected_zips_war": 7.1,
  "updated_at": "2024-03-15T08:00:00Z"
}

Complete list of extractable fields for RosterResource objects from fangraphs.com. All fields typed and schema-versioned.

Fields: team_id, team_name, position, player_id, player_name, roster_status, minor_league_option, rule_5_eligible, estimated_service_time, salary_estimate

Sample rosterresource record (excerpt, 200 OK):
{
  "team_id": "NYY",
  "position": "CF",
  "player_name": "Aaron Judge",
  "roster_status": "Active 26-Man",
  "minor_league_option": false,
  "salary_estimate": 40000000
}

Complete list of extractable fields for Prospect Board objects from fangraphs.com. All fields typed and schema-versioned.

Fields: prospect_id, name, organization, current_level, future_value, scouting_hit, scouting_game_power, scouting_raw_power, scouting_speed, scouting_field, risk, eta

Sample prospect_board record (excerpt, 200 OK):
{
  "name": "Jackson Holliday",
  "organization": "BAL",
  "future_value": 70,
  "scouting_hit": 60,
  "scouting_game_power": 55,
  "risk": "Low",
  "eta": "2024"
}

Capabilities

Everything you need from FanGraphs — nothing you don't

Our FanGraphs scraper handles every layer of the platform: advanced leaderboards, minor league splits, RosterResource depth charts, and daily projection updates.

Advanced Sabermetrics Extraction

Extract wRC+, fWAR, xFIP, SIERA, and hundreds of other advanced metrics across standard, advanced, and batted ball dashboards.

Daily Leaderboard Sweeps

Automated daily extraction of batting, pitching, and fielding leaderboards, run as soon as FanGraphs' overnight statistical processing has finished.

Projection System Scraping

Capture ZiPS, Steamer, ATC, and THE BAT projections for upcoming seasons and rest-of-season forecasts.

RosterResource Tracking

Monitor depth charts, payroll estimates, minor league options, and Rule 5 eligibility across all 30 MLB organisations.

Minor League & College Stats

Extract performance data from Rookie complex leagues up to Triple-A, including translated metrics.

Pitch Tracking & Plate Discipline

Capture O-Swing%, Z-Contact%, pitch velocity, and pitch value metrics from the Pitch Info datasets.

Historical Data Mining

Extract complete season-by-season player data back to 1871 for long-term sabermetric research.

Split Data Aggregation

Pull platoon splits, home/away performance, and high-leverage situation data for granular analysis.

Prospect Leaderboards

Extract scouting grades (hit, power, speed) and Future Value (FV) scores directly from THE BOARD.

Player ID Mapping

We map FanGraphs fg_id to MLBAM, Retrosheet, and Baseball-Reference IDs for immediate database joins.

// engagement pipeline

From leaderboard to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide specific leaderboards, player IDs, or projection systems. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure API interceptors and Playwright crawlers to navigate FanGraphs' complex data grids.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and cross-reference ID mapping verification before full launch.
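A null-rate check of the kind mentioned above can be sketched in a few lines. This is an illustrative version only: the helper names `null_rates` and `failing_fields`, the 5% threshold, and the sample rows are all assumptions, not the production QA suite.

```python
from typing import Any, Dict, Iterable, List, Mapping


def null_rates(rows: Iterable[Mapping[str, Any]], fields: List[str]) -> Dict[str, float]:
    """Return, per field, the share of rows where it is missing or None."""
    materialised = list(rows)
    if not materialised:
        return {f: 0.0 for f in fields}
    total = len(materialised)
    return {
        f: sum(1 for r in materialised if r.get(f) is None) / total
        for f in fields
    }


def failing_fields(rates: Mapping[str, float], max_rate: float) -> List[str]:
    """List fields whose null rate exceeds the allowed threshold."""
    return sorted(f for f, rate in rates.items() if rate > max_rate)


# Illustrative rows: two of the four have no usable fwar value.
sample = [
    {"player_id": "10155", "fwar": 6.4},
    {"player_id": "15640", "fwar": None},
    {"player_id": "19361"},
    {"player_id": "example-4", "fwar": 1.0},
]
rates = null_rates(sample, ["player_id", "fwar"])
blocked = failing_fields(rates, max_rate=0.05)  # hypothetical 5% threshold
```

A run whose `blocked` list is non-empty would be held back for inspection rather than pushed to the warehouse.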

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket or BigQuery dataset on your daily schedule.

Under the hood

How our FanGraphs pipeline handles the hard parts

FanGraphs relies on complex JavaScript data tables and heavy client-side rendering. We extract the raw JSON payloads to ensure complete data capture.

pipeline-monitor · fangraphs.com · live (active)

// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination
Page coverage: 48,291 pages queued (running)

// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

DataGrid Extraction
Intercepting backend XHR requests

FanGraphs uses complex React data grids. We intercept the backend API calls rather than scraping the DOM where possible, ensuring high-fidelity data extraction without missing columns.
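Once a grid's backend response has been captured, turning it into rows is a small parsing step. The sketch below covers only that step and makes two loud assumptions: the `/api/` path fragment used to recognise data calls and the `{"data": [...]}` payload shape are hypothetical, since the real endpoint layout is not described here.

```python
import json
from typing import Any, Dict, List


def is_grid_response(url: str, content_type: str) -> bool:
    """Decide whether a captured response looks like a grid data call.

    The "/api/" path fragment is an assumption for illustration only.
    """
    return "/api/" in url and "application/json" in content_type.lower()


def rows_from_payload(body: str, wanted: List[str]) -> List[Dict[str, Any]]:
    """Parse a JSON body and keep only the wanted columns of each row.

    Assumes a hypothetical payload shape of {"data": [{...}, ...]}.
    Columns absent from a row come back as None rather than being dropped.
    """
    payload = json.loads(body)
    return [{k: row.get(k) for k in wanted} for row in payload.get("data", [])]


# A made-up captured body using field names from the data dictionary.
captured = '{"data": [{"name": "Mike Trout", "wrc_plus": 155, "extra": 1}]}'
rows = rows_from_payload(captured, ["name", "wrc_plus", "fwar"])
```

With Playwright's Python API, a handler registered through `page.on("response", handler)` could pass `response.url`, the `content-type` entry of `response.headers`, and `response.text()` into these two functions.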

Pagination
Handling infinite scroll and large datasets

Leaderboards load via dynamic XHR. We orchestrate pagination parameters to extract full historical datasets spanning tens of thousands of rows in seconds.
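The pagination loop described above can be sketched as a generator that keeps requesting pages until one comes back short. The `fetch_page` callable, its `(page, page_size)` parameters, and the fake data source are assumptions for illustration; the actual request parameters depend on the endpoint being paged.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]
FetchPage = Callable[[int, int], List[Row]]  # (page_number, page_size) -> rows


def iter_all_rows(
    fetch_page: FetchPage, page_size: int = 100, max_pages: int = 10_000
) -> Iterator[Row]:
    """Yield rows page by page until a short or empty page signals the end."""
    for page in range(1, max_pages + 1):
        rows = fetch_page(page, page_size)
        yield from rows
        if len(rows) < page_size:
            return  # last page reached


# Stand-in for a real HTTP call: serves 250 fake rows in fixed-size pages.
_FAKE_ROWS: List[Row] = [{"row": i} for i in range(250)]


def fake_fetch(page: int, page_size: int) -> List[Row]:
    start = (page - 1) * page_size
    return _FAKE_ROWS[start:start + page_size]


all_rows = list(iter_all_rows(fake_fetch, page_size=100))
```

The `max_pages` cap is a safety net so that a misbehaving endpoint that never returns a short page cannot keep the loop running indefinitely.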

Rate Limiting
Evasion via residential proxies

High request volumes to FanGraphs endpoints can trigger IP bans. We distribute requests across US-based residential proxies to maintain high throughput without interruptions.
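Distributing requests across a proxy pool can be as simple as round-robin selection. The sketch below assumes a static list of proxy URLs; the `proxy_cycle` helper and the `.example` endpoints are placeholders, and a real pool would come from a proxy provider.

```python
import itertools
from typing import Dict, Iterator, List


def proxy_cycle(proxies: List[str]) -> Iterator[Dict[str, str]]:
    """Yield proxy mappings in round-robin order, one per outgoing request."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    for proxy in itertools.cycle(proxies):
        yield {"http": proxy, "https": proxy}


# Placeholder endpoints for illustration only.
pool = proxy_cycle([
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
])
first, second, third = next(pool), next(pool), next(pool)
```

Each yielded mapping has the shape that the `requests` library accepts for its `proxies=` argument, so it could be passed straight into a request call.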

ID Mapping
Cross-referencing player IDs

We map FanGraphs-specific fg_id identifiers to standard MLBAM and Retrosheet IDs, so you can join the extracted sabermetrics with your internal databases.
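The ID mapping amounts to a keyed lookup against a crosswalk table. A minimal sketch follows, assuming the crosswalk is available as a dictionary keyed by fg_id; the `attach_ids` helper, the `mlbam_id` and `retrosheet_id` column names, and the placeholder ID values are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple

Row = Dict[str, Any]


def attach_ids(
    rows: Iterable[Mapping[str, Any]],
    crosswalk: Mapping[str, Mapping[str, str]],
) -> Tuple[List[Row], List[str]]:
    """Add external IDs to each row; collect fg_ids that have no match."""
    matched: List[Row] = []
    unmatched: List[str] = []
    for row in rows:
        fg_id = str(row["player_id"])
        ids = crosswalk.get(fg_id)
        if ids is None:
            unmatched.append(fg_id)
            continue
        matched.append({**row, **ids})
    return matched, unmatched


# Placeholder crosswalk values; these are not real MLBAM or Retrosheet IDs.
crosswalk = {
    "10155": {"mlbam_id": "MLBAM-PLACEHOLDER-1", "retrosheet_id": "RETRO-PLACEHOLDER-1"},
}
stats = [
    {"player_id": "10155", "fwar": 6.4},
    {"player_id": "19361", "fwar": 5.1},
]
joined, missing = attach_ids(stats, crosswalk)
```

Returning the unmatched fg_ids separately makes gaps in the crosswalk visible instead of silently dropping players from the joined output.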

Daily Deltas
Only update what changes

Stats change daily during the season. We diff each run against the previous one and update only the players whose stats have changed, which keeps warehouse compute costs down.
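One common way to detect changed players is to fingerprint each row and compare against the previous run. This is a sketch of that general technique, not a description of the production diffing logic; `row_fingerprint`, `build_index`, `changed_rows`, and the sample stat values are assumptions for illustration.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List, Mapping


def row_fingerprint(row: Mapping[str, Any]) -> str:
    """Hash a row's canonical JSON form so key order does not matter."""
    canonical = json.dumps(dict(row), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_index(rows: Iterable[Mapping[str, Any]], key: str = "player_id") -> Dict[str, str]:
    """Map each player's key to the fingerprint of its row."""
    return {str(r[key]): row_fingerprint(r) for r in rows}


def changed_rows(
    today: Iterable[Mapping[str, Any]],
    previous_index: Mapping[str, str],
    key: str = "player_id",
) -> List[Mapping[str, Any]]:
    """Return rows that are new or whose content differs from the last run."""
    return [r for r in today if previous_index.get(str(r[key])) != row_fingerprint(r)]


# Made-up stat lines: one unchanged player, one updated, one new.
yesterday = [
    {"player_id": "10155", "games": 100, "home_runs": 30},
    {"player_id": "15640", "games": 101, "home_runs": 40},
]
today = [
    {"home_runs": 30, "games": 100, "player_id": "10155"},
    {"player_id": "15640", "games": 102, "home_runs": 41},
    {"player_id": "19361", "games": 1, "home_runs": 0},
]
delta = changed_rows(today, build_index(yesterday))
```

Only the rows in `delta` would need to be written downstream; the unchanged player is skipped even though its keys arrive in a different order.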

Applications

Who uses FanGraphs data — and how

Teams across industries use fangraphs.com data to build competitive products and smarter operations.

01 · MLB Front Office Analysis

Teams use extracted projections and minor league data to evaluate trade targets and free agents.

02 · Fantasy Baseball Projections

High-stakes fantasy players aggregate ZiPS and Steamer projections for draft preparation and in-season management.

03 · Sports Betting Models

Syndicates feed daily split data and pitcher xFIP into predictive models for MLB moneylines and prop bets.

04 · Agency Player Valuation

Sports agencies use fWAR and wRC+ comparables to negotiate arbitration and free agent contracts.

05 · DFS Lineup Optimization

Daily fantasy platforms use split data and RosterResource lineups to generate optimal player projections.

06 · Academic Research

Researchers analyse historical pitch values and plate discipline metrics for sabermetric publications.

Why DataFlirt

"FanGraphs holds the definitive public record of advanced baseball statistics, but extracting millions of daily data points from complex JavaScript grids requires dedicated infrastructure."

Most analysts waste hours manually exporting CSVs from FanGraphs leaderboards. DataFlirt automates this process entirely. We navigate the complex data grids, intercept the backend API calls, and deliver clean, structured sabermetrics directly to your data warehouse. You get the data daily, without the manual toil.

Technical Spec

FanGraphs scraper — technical capabilities

Everything supported by our fangraphs.com scraper: rendered SPA elements, rate-limit handling, and more. Authenticated FanGraphs+ content and raw pitch trajectories are only partially supported.

JavaScript data grid extraction (Supported): Direct API interception of React grid payloads
Daily leaderboard updates (Supported): Automated overnight runs following MLB game conclusions
ZiPS/Steamer tracking (Supported): Extraction of all major projection systems hosted on-site
RosterResource payroll data (Supported): Capture of depth charts, payrolls, and option years
Minor league player stats (Supported): Full coverage of MiLB levels including complex leagues
Cross-reference ID mapping (Supported): Mapping fg_id to MLBAM and Retrosheet standard IDs
Historical season data (Supported): Extraction of full seasons dating back to 1871
Pitch trajectory data (Partial): Raw pitch coordinates require MLB Savant; FanGraphs hosts aggregated pitch values only
FanGraphs+ exclusive content (Partial): Gated editorial content and premium tools require authenticated subscription access

Infrastructure

Infrastructure powering the FanGraphs pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus

API Interception

We bypass fragile DOM scraping by intercepting FanGraphs' backend XHR requests, extracting clean JSON payloads directly from their data grids.

Proxy Rotation

We distribute requests across residential ISP proxies to bypass rate limits during large historical data backfills.

Cloud-Native Delivery

Pipelines are orchestrated with Apache Airflow and executed on AWS Lambda, so daily updates are delivered reliably by 3 AM ET.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: Newline-delimited or nested, schema versioned per run
CSV: Flat file with typed columns, ready for Excel/Sheets
XLS: Standard spreadsheet format for direct analyst use
Parquet: Columnar format for BigQuery, Snowflake, Athena
AWS S3: Direct bucket delivery, compatible with any data lake
Webhook: HTTP POST per record for real-time downstream processing
API: Queryable REST endpoints for on-demand data access
BigQuery: Streamed directly into your dataset with schema auto-detect
PostgreSQL: Upsert into your existing schema with conflict resolution
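The PostgreSQL option in the list above relies on upserts with conflict resolution. The sketch below shows the idea with Python's built-in `sqlite3` module as a stand-in so that it runs without a database server; the table layout and the stat values are illustrative assumptions. SQLite and PostgreSQL both accept the `INSERT ... ON CONFLICT ... DO UPDATE` form, although a PostgreSQL driver would use its own placeholder style instead of `?`.

```python
import sqlite3

# Upsert keyed on player_id: insert new players, overwrite existing ones.
UPSERT_SQL = """
INSERT INTO batting_leaderboards (player_id, name, fwar)
VALUES (?, ?, ?)
ON CONFLICT (player_id) DO UPDATE SET
    name = excluded.name,
    fwar = excluded.fwar
"""

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real warehouse
conn.execute(
    "CREATE TABLE batting_leaderboards ("
    "player_id TEXT PRIMARY KEY, name TEXT, fwar REAL)"
)

# Two deliveries for the same player: the second replaces the first.
conn.execute(UPSERT_SQL, ("10155", "Mike Trout", 6.3))  # illustrative earlier value
conn.execute(UPSERT_SQL, ("10155", "Mike Trout", 6.4))
conn.commit()

rows = conn.execute(
    "SELECT player_id, name, fwar FROM batting_leaderboards"
).fetchall()
```

Because the conflict target is the primary key, re-delivering the same day's data is idempotent: the table ends up with one row per player regardless of how many times a run is replayed.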
// faq

Common questions.

About fangraphs.com scraping, legality, and pipeline operations.

Is scraping FanGraphs legal?

Scraping publicly available statistical data is generally permissible under applicable law, as raw factual data (like baseball statistics) is not copyrightable. DataFlirt extracts only public, non-authenticated leaderboards and projections. We do not circumvent authentication walls for FanGraphs+ content.

How often do you update the data?

Pipelines typically run daily, scheduled overnight after all MLB games conclude and FanGraphs updates its backend databases. We can also configure custom schedules for projection updates.

Can you map FanGraphs IDs to MLBAM IDs?

Yes. We maintain a cross-reference matrix mapping the FanGraphs fg_id to standard MLBAM, Retrosheet, and Baseball-Reference IDs for seamless integration with your existing datasets.

Do you extract minor league data?

Yes, we extract data across all minor league levels, from Triple-A down to rookie complex leagues, as well as translated minor league statistics.

How do you handle FanGraphs' custom data grids?

Instead of scraping the complex React grid DOM, we intercept the raw XHR/API responses that feed the grids. This preserves full data fidelity and prevents missing columns.

Can I get historical projections?

Yes, we can extract archived ZiPS and Steamer projections from past seasons, provided they are still accessible via the public leaderboards.

$ dataflirt scope --new-project --source=fangraphs.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Stop manually downloading CSVs. Get automated, daily sabermetrics delivered directly to your warehouse.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h