
Football data,
at warehouse scale.

We extract player statistics, match logs, expected goals (xG), and advanced scouting reports from Fbref. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Players extracted
184K /run
Match logs
2.1M /total
Advanced metrics
48M /month
Active pipelines
31
Uptime
99.98%
Data Dictionary

Every field we extract from fbref.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Player Stats objects from fbref.com. All fields typed and schema-versioned.

Fields: player_id, name, nationality, position, age, matches_played, starts, minutes, goals, assists, xg, xag, yellow_cards, red_cards

Sample player_stats record:
{
  "player_id": "a1b2c3d4",
  "name": "Lionel Messi",
  "nationality": "ar ARG",
  "position": "FW",
  "age": 36,
  "goals": 20,
  "assists": 10,
  "xg": 18.5
}
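As an illustration of what "typed" can mean for a Player Stats record, here is a minimal Python sketch. The field names come from the list above; the choice of types, the `PlayerStats` class, and the `parse_player_stats` helper are assumptions made for this example, not a published DataFlirt schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Field groupings are an assumption for illustration; the page only lists names.
STR_FIELDS = ("player_id", "name", "nationality", "position")
INT_FIELDS = ("age", "matches_played", "starts", "minutes",
              "goals", "assists", "yellow_cards", "red_cards")
FLOAT_FIELDS = ("xg", "xag")


@dataclass(frozen=True)
class PlayerStats:
    player_id: str
    name: str
    nationality: str
    position: str
    age: Optional[int] = None
    matches_played: Optional[int] = None
    starts: Optional[int] = None
    minutes: Optional[int] = None
    goals: Optional[int] = None
    assists: Optional[int] = None
    xg: Optional[float] = None
    xag: Optional[float] = None
    yellow_cards: Optional[int] = None
    red_cards: Optional[int] = None


def _to_int(value: Any) -> Optional[int]:
    """Convert scraped text such as '2,450' to int; empty cells become None."""
    if value in (None, ""):
        return None
    return int(str(value).replace(",", ""))


def _to_float(value: Any) -> Optional[float]:
    """Convert scraped text such as '18.5' to float; empty cells become None."""
    if value in (None, ""):
        return None
    return float(str(value).replace(",", ""))


def parse_player_stats(raw: Dict[str, Any]) -> PlayerStats:
    """Build a typed record from a raw dict of scraped cell values."""
    values: Dict[str, Any] = {f: str(raw[f]) for f in STR_FIELDS}
    values.update({f: _to_int(raw.get(f)) for f in INT_FIELDS})
    values.update({f: _to_float(raw.get(f)) for f in FLOAT_FIELDS})
    return PlayerStats(**values)
```

Missing numeric cells are kept as `None` rather than `0`, so a query can tell "not recorded" apart from "zero".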

Complete list of extractable fields for Match Logs objects from fbref.com. All fields typed and schema-versioned.

Fields: match_id, date, competition, home_team, away_team, result, possession, shots, shots_on_target, fouls, corners, xg_home, xg_away

Sample match_logs record:
{
  "match_id": "e5f6g7h8",
  "date": "2023-10-28",
  "competition": "La Liga",
  "home_team": "Barcelona",
  "away_team": "Real Madrid",
  "result": "1-2",
  "possession": 53,
  "xg_home": 1.2
}

Complete list of extractable fields for Scouting Reports objects from fbref.com. All fields typed and schema-versioned.

Fields: player_id, template, minutes_played, goals_percentile, xg_percentile, shot_creating_actions, passes_completed, progressive_passes, tackles, interceptions, blocks, clearances

Sample scouting_reports record:
{
  "player_id": "i9j0k1l2",
  "template": "Midfielders",
  "minutes_played": 2450,
  "goals_percentile": 85,
  "xg_percentile": 82,
  "progressive_passes": 95,
  "tackles": 40,
  "interceptions": 60
}

Complete list of extractable fields for Team Stats objects from fbref.com. All fields typed and schema-versioned.

Fields: team_id, season, competition, rank, matches_played, wins, draws, losses, goals_for, goals_against, goal_difference, points, xg_for, xg_against

Sample team_stats record:
{
  "team_id": "m3n4o5p6",
  "season": "2023-2024",
  "competition": "Premier League",
  "rank": 1,
  "matches_played": 38,
  "points": 91,
  "goals_for": 96,
  "xg_for": 88.5
}

Complete list of extractable fields for Goalkeeping objects from fbref.com. All fields typed and schema-versioned.

Fields: player_id, matches_played, shots_on_target_against, saves, save_percentage, clean_sheets, penalty_kicks_attempted, penalty_kicks_allowed, penalty_kicks_saved, psxg, psxg_net

Sample goalkeeping record:
{
  "player_id": "q7r8s9t0",
  "matches_played": 38,
  "shots_on_target_against": 120,
  "saves": 90,
  "save_percentage": 75.0,
  "clean_sheets": 15,
  "psxg": 35.2,
  "psxg_net": 5.2
}

Capabilities

Deep football analytics — parsed and structured

Our Fbref scraper handles complex multi-level tables, strict rate limits, and deep historical pagination to deliver clean, queryable football data.

Player Standard Stats

Extract core metrics including goals, assists, playing time, and card accumulations across all domestic and international competitions.

Advanced xG & Shot Data

Capture expected goals (xG), expected assisted goals (xAG), shot-creating actions, and detailed shooting efficiency metrics.

Passing & Possession

Extract pass completion rates, progressive passes, key passes, and possession statistics parsed from complex nested tables.

Defensive Actions

Track tackles, interceptions, blocks, clearances, and aerial duels won for comprehensive defensive profiling.

Advanced Goalkeeping

Extract post-shot expected goals (PSxG), save percentages, cross stopping, and sweeping actions for goalkeeper analysis.

Match Logs & Summaries

Scrape detailed match-by-match logs for players and teams, including event timelines and formation data.

Historical Seasons

Extract data from past seasons across major leagues, maintaining consistent schemas despite historical formatting variations.

Multi-Competition Support

Data extraction spanning top European leagues, international tournaments, and lower divisions available on the platform.

Scheduled Updates

Configure pipelines to run post-matchweek to capture updated statistics and standings automatically.

// engagement pipeline

From target leagues to warehouse tables

Brief in. Clean data out.

Define Scope
Day 0

Specify leagues, seasons, and specific data tables (e.g., standard stats, passing, scouting reports) required.

Pipeline Build
Days 2–4

We configure Scrapy crawlers, table parsing logic, and rate-limit management systems for Sports Reference infrastructure.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and cross-referencing totals to ensure accurate table extraction.

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
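The Validation & QA step above mentions null-rate checks and cross-referencing totals. A minimal sketch of what such checks could look like follows. The function names and the tolerance are hypothetical, and the identity save_percentage = 100 × saves / shots_on_target_against is an assumption; it does hold for the sample goalkeeping record (90 / 120 = 75.0).

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def null_rate(rows: List[Row], field: str) -> float:
    """Share of rows where `field` is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


def check_goalkeeping(row: Row, tolerance: float = 0.05) -> List[str]:
    """Recompute save_percentage from its inputs and report any mismatch."""
    problems: List[str] = []
    faced = row["shots_on_target_against"]
    if faced:  # skip the check when no shots were faced
        expected = 100.0 * row["saves"] / faced
        if abs(expected - row["save_percentage"]) > tolerance:
            problems.append(
                f"save_percentage {row['save_percentage']} != {expected:.1f}"
            )
    return problems


def check_season_total(match_rows: List[Row], season_row: Row, field: str) -> bool:
    """True when per-match values add up to the season total for `field`."""
    return sum(row.get(field) or 0 for row in match_rows) == season_row[field]
```

A run would typically fail or raise an alert when `null_rate` exceeds an agreed threshold or when either check reports a mismatch.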

Under the hood

How our Fbref pipeline handles the hard parts

Sports Reference sites present unique structural and infrastructural challenges. Here is how we build resilient extraction.

pipeline-monitor · fbref.com · live (active)
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued (running)
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
Table parsing
Multi-level header normalisation

Fbref uses complex HTML tables with multiple header rows (e.g., categorising 'Passes' into 'Total', 'Short', 'Medium', 'Long'). Our parsers flatten these hierarchies into clean, single-level column names suitable for relational databases.
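As a rough sketch of the flattening idea, assume each column arrives as a tuple of header texts from the top row down, for example ("Total", "Cmp"). The helper names and the snake_case naming convention below are illustrative choices, not the production parser.

```python
import re
from typing import Dict, List, Sequence


def flatten_header(levels: Sequence[str]) -> str:
    """Join one column's header levels into a single snake_case name."""
    parts: List[str] = []
    for text in levels:
        text = text.strip()
        # Drop blank group cells and placeholder labels such as "Unnamed: 0_level_0".
        if not text or text.lower().startswith("unnamed"):
            continue
        # Drop a level that only repeats the one above it.
        if parts and parts[-1].lower() == text.lower():
            continue
        parts.append(text)
    name = "_".join(parts).lower().replace("%", "_pct")
    return re.sub(r"[^a-z0-9]+", "_", name).strip("_")


def flatten_columns(header_rows: Sequence[Sequence[str]]) -> List[str]:
    """Flatten every column and suffix duplicates so names stay unique."""
    seen: Dict[str, int] = {}
    names: List[str] = []
    for levels in header_rows:
        name = flatten_header(levels) or "column"
        count = seen.get(name, 0)
        seen[name] = count + 1
        names.append(name if count == 0 else f"{name}_{count + 1}")
    return names
```

With the 'Passes' example above, ("Total", "Cmp") and ("Short", "Cmp") become `total_cmp` and `short_cmp`, so the two completion columns no longer collide in a relational table.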

Rate limiting
Strict 429 management

Sports Reference implements aggressive rate limiting. We utilise distributed proxy pools and precise request throttling to maintain extraction volume without triggering IP bans or HTTP 429 responses.
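The throttling side of this can be sketched in a few lines of Python. Everything here is a simplified assumption: the `ThrottledFetcher` class is hypothetical, `fetch` stands in for whatever HTTP client is used and returns (status, headers, body), and `min_interval` must be chosen by the caller because this page does not state the actual request budget.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


class ThrottledFetcher:
    """Space requests out and back off when the server answers 429."""

    def __init__(self, fetch: Callable[[str], Response], min_interval: float,
                 max_retries: int = 3,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.min_interval = min_interval
        self.max_retries = max_retries
        self.sleep = sleep
        self.clock = clock
        self._last_request: Optional[float] = None

    def _wait_for_slot(self) -> None:
        """Sleep until at least min_interval has passed since the last request."""
        if self._last_request is not None:
            remaining = self.min_interval - (self.clock() - self._last_request)
            if remaining > 0:
                self.sleep(remaining)
        self._last_request = self.clock()

    def get(self, url: str) -> Tuple[int, str]:
        for attempt in range(self.max_retries + 1):
            self._wait_for_slot()
            status, headers, body = self.fetch(url)
            if status != 429:
                return status, body
            # Honour a numeric Retry-After header; otherwise back off exponentially.
            retry_after = headers.get("Retry-After", "")
            if retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = self.min_interval * 2 ** (attempt + 1)
            self.sleep(delay)
        raise RuntimeError(f"still rate limited after {self.max_retries} retries: {url}")
```

Injecting `sleep` and `clock` keeps the class testable without real waiting; in a distributed setup the last-request timestamp would live in shared storage rather than on the instance.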

Data linking
Consistent ID extraction

We extract and preserve Fbref unique identifiers for players, teams, and matches, allowing you to build relational models and map entities across different datasets.
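One plausible way to capture these identifiers is to read them from link paths. The sketch below assumes entity links of the form /en/players/<id>/..., /en/squads/<id>/..., and /en/matches/<id>/... with an 8-character hexadecimal id; both the URL shape and the `extract_entity` helper are assumptions for illustration and should be checked against real pages.

```python
import re
from typing import Optional, Tuple

# Assumed link shape: /en/<section>/<8 hex characters>/<slug>
ENTITY_URL = re.compile(r"/en/(players|squads|matches)/([0-9a-f]{8})(?:/|$)")

# Map URL sections to the entity names used in the data dictionary.
KINDS = {"players": "player", "squads": "team", "matches": "match"}


def extract_entity(href: str) -> Optional[Tuple[str, str]]:
    """Return (entity kind, id) for a recognised link, else None."""
    match = ENTITY_URL.search(href)
    if match is None:
        return None
    return KINDS[match.group(1)], match.group(2)
```

For example, `extract_entity("/en/players/a1b2c3d4/Lionel-Messi")` yields `("player", "a1b2c3d4")`, which can then serve as a join key between player_stats and scouting_reports rows.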

Historical variance
Schema versioning for older data

Historical seasons often lack advanced metrics like xG. Our pipelines handle missing columns gracefully, ensuring historical data fits into modern schemas without breaking downstream processes.
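A minimal sketch of the idea, assuming the target schema is a fixed column list and that absent metrics should become nulls rather than zeros. `TARGET_COLUMNS` and `align_to_schema` are hypothetical names, and the column subset is taken from the Team Stats field list for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Illustrative subset of the Team Stats columns listed in the data dictionary.
TARGET_COLUMNS = ("team_id", "season", "competition", "rank",
                  "matches_played", "points", "goals_for", "xg_for", "xg_against")


def align_to_schema(row: Dict[str, Any],
                    columns: Sequence[str] = TARGET_COLUMNS
                    ) -> Tuple[Dict[str, Any], List[str]]:
    """Return the row with every target column present, plus the columns it lacked."""
    aligned = {column: row.get(column) for column in columns}
    missing = [column for column in columns if column not in row]
    return aligned, missing
```

Returning the list of missing columns alongside the aligned row lets a pipeline log which seasons predate a metric while still writing a row with the same shape as current data.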

Change detection
Efficient updates

For active seasons, we identify updated match logs and recalculate season totals, pushing only the necessary updates to your warehouse.
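One common way to decide which records need pushing is to compare a content hash per record against the hash stored from the previous run. The sketch below shows that technique under stated assumptions: the `fingerprint` and `changed_records` helpers are hypothetical, and the previous-run state is a plain dict here, where a real pipeline would persist it.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def fingerprint(record: Record) -> str:
    """Stable SHA-256 of a record, independent of key order."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_records(records: Iterable[Record], previous: Dict[str, str],
                    key: str = "match_id") -> Tuple[List[Record], Dict[str, str]]:
    """Return records that are new or modified, plus the updated hash state."""
    state = dict(previous)  # copy so the caller's state is not mutated
    to_push: List[Record] = []
    for record in records:
        digest = fingerprint(record)
        if state.get(record[key]) != digest:
            to_push.append(record)
            state[record[key]] = digest
    return to_push, state
```

On the first run everything is pushed; on later runs only rows whose content changed, such as a corrected xG value, are sent to the warehouse.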

Applications

Who uses Fbref data — and how

Teams across industries use fbref.com data to build competitive products and smarter operations.

01
Pro Recruitment & Scouting

Professional clubs use Fbref scouting reports and percentile rankings to identify undervalued talent across global leagues.

02
Fantasy Football Models

Data scientists build predictive models for FPL and other fantasy games using underlying xG and xAG metrics rather than raw outputs.

03
Betting & Odds Calculation

Syndicates ingest historical match logs and team performance data to train predictive models and find an edge in betting markets.

04
Media & Broadcasting

Sports journalists and broadcasters use advanced metrics to enrich match commentary and analytical articles.

05
Academic Research

Researchers analyse long-term trends in tactical evolution, player longevity, and league competitiveness using historical datasets.

06
Performance Analysis

Coaching staff evaluate team performance against expected metrics to identify tactical inefficiencies and areas for improvement.

Why DataFlirt

"Fbref provides the most comprehensive publicly available football dataset, but extracting it from multi-level HTML tables requires specialised parsing architecture."

Parsing Sports Reference tables is notoriously difficult due to complex headers, embedded JavaScript variables, and strict rate limits. DataFlirt handles the extraction complexity, delivering flattened, typed, and warehouse-ready data so your analysts can focus on building models rather than writing parsing scripts.

Technical Spec

Fbref scraper — technical capabilities

Everything supported by our fbref.com scraper, from multi-level table parsing to rate-limit handling, along with its known limits.

Rate limit circumvention
Distributed proxies and precise request delays to avoid 429 errors
Supported
Multi-level table parsing
Flattens complex HTML table headers into standard database columns
Supported
Historical data extraction
Paginates through past seasons and handles missing metric columns gracefully
Supported
xG metrics capture
Extracts expected goals and related advanced metrics provided by Opta
Supported
Match log pagination
Iterates through all matches for a given player or team
Supported
Player ID extraction
Captures unique Fbref entity IDs for relational mapping
Supported
Stathead custom queries
Data behind the Stathead subscription paywall requires authenticated access
Not supported
Private user saved searches
Cannot extract user-specific saved queries or custom dashboards
Not supported
Infrastructure

Infrastructure powering the Fbref pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Table parsing engine

Custom Python modules designed specifically to parse Sports Reference DOM structures, handling multi-row headers and dynamic column generation.

Rate limit management

Intelligent request scheduling via Redis and Airflow to respect target site limits while maintaining extraction throughput across distributed proxy pools.

Cloud-native orchestration

Pipelines run on AWS Lambda and Kubernetes. Airflow handles scheduling, dependency management, and SLA alerting. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested — schema versioned per run
CSV
Flat file with typed columns — Excel/Sheets compatible
XLS
Excel format for direct analyst consumption
Parquet
Columnar format for BigQuery, Snowflake, Athena
AWS S3
Direct bucket delivery — compatible with any data lake
Webhook
HTTP POST per record for real-time downstream processing
API
REST endpoint to query extracted datasets
PostgreSQL
Upsert into your existing schema with conflict resolution
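For the PostgreSQL delivery option above, an upsert with conflict resolution usually means INSERT ... ON CONFLICT ... DO UPDATE. The sketch below shows that statement for a hypothetical team_stats table keyed on (team_id, season). It runs against SQLite from the Python standard library, which accepts the same clause, purely so the example is self-contained; with a PostgreSQL driver the placeholders would typically be %s instead of ?.

```python
import sqlite3
from typing import Any, Dict, Iterable

# Hypothetical table layout; the real destination schema is the client's own.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS team_stats (
    team_id   TEXT NOT NULL,
    season    TEXT NOT NULL,
    points    INTEGER,
    goals_for INTEGER,
    PRIMARY KEY (team_id, season)
)
"""

# On a key collision, overwrite the stored values with the incoming ones.
UPSERT_SQL = """
INSERT INTO team_stats (team_id, season, points, goals_for)
VALUES (?, ?, ?, ?)
ON CONFLICT (team_id, season) DO UPDATE SET
    points = excluded.points,
    goals_for = excluded.goals_for
"""


def upsert_team_stats(conn: sqlite3.Connection, rows: Iterable[Dict[str, Any]]) -> None:
    """Insert new rows and update existing ones in a single batch."""
    params = [(r["team_id"], r["season"], r["points"], r["goals_for"]) for r in rows]
    conn.executemany(UPSERT_SQL, params)
    conn.commit()
```

Running the same batch twice leaves one row per (team_id, season), which is what makes repeated post-matchweek deliveries safe to replay.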
// faq

Common questions.

About fbref.com scraping, legality, and pipeline operations.

Is scraping Fbref legal?

Scraping publicly available factual data (such as sports statistics) is generally permissible. DataFlirt targets only public, non-authenticated statistical data. We do not extract personal data or circumvent authentication walls like Stathead. Clients should review Terms of Service and consult legal counsel for specific use cases.

How do you handle Sports Reference rate limits?

Sports Reference sites enforce strict request limits. We manage this through distributed residential proxies, precise request delays, and concurrency controls to ensure reliable data extraction without triggering blocks.

How frequently can the data be updated?

Pipelines are typically scheduled weekly or daily following matchdays to capture updated statistics. Real-time extraction during matches is not supported as Fbref updates data post-match.

Do you extract expected goals (xG) data?

Yes. We extract all advanced metrics provided on the platform, including xG, xAG, PSxG, and shot-creating actions, preserving the granularity of the original tables.

How deep does the historical data go?

We can extract data as far back as Fbref provides it. Note that advanced metrics like xG are only available for recent seasons; our schema handles these historical variations gracefully.

Can you extract data from Stathead?

No. Stathead requires a paid subscription and authenticated access. We only extract publicly available data from the main Fbref domain.

$ dataflirt scope --new-project --source=fbref.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need historical season data or continuous updates for predictive modelling — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h