
RUN · 31 active pipelines · fbref.com live

Football data,
at warehouse scale.

We extract player statistics, match logs, expected goals (xG), and advanced scouting reports from Fbref. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.


Players extracted: 184K /run

Match logs: 2.1M /total

Advanced metrics: 48M /month

Active pipelines: 31

Uptime: 99.98%

◆ Player Statistics ◆ Match Logs ◆ Team Performance ◆ Expected Goals (xG) ◆ Scouting Reports ◆ Historical Tables ◆ Transfer Data ◆ Advanced Goalkeeping ◆ Passing & Progression ◆ Defensive Actions ◆ Managed Pipeline ◆ S3 / BigQuery Delivery ◆ Bengaluru HQ ◆ Enterprise SLA

Data Dictionary

Every field we extract from fbref.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Player Stats objects from fbref.com. All fields typed and schema-versioned.

player_id, name, nationality, position, age, matches_played, starts, minutes, goals, assists, xg, xag, yellow_cards, red_cards

{
  "player_id": "a1b2c3d4",
  "name": "Lionel Messi",
  "nationality": "ar ARG",
  "position": "FW",
  "age": 36,
  "goals": 20,
  "assists": 10,
  "xg": 18.5
}


Complete list of extractable fields for Match Logs objects from fbref.com. All fields typed and schema-versioned.

match_id, date, competition, home_team, away_team, result, possession, shots, shots_on_target, fouls, corners, xg_home, xg_away

{
  "match_id": "e5f6g7h8",
  "date": "2023-10-28",
  "competition": "La Liga",
  "home_team": "Barcelona",
  "away_team": "Real Madrid",
  "result": "1-2",
  "possession": 53,
  "xg_home": 1.2
}


Complete list of extractable fields for Scouting Reports objects from fbref.com. All fields typed and schema-versioned.

player_id, template, minutes_played, goals_percentile, xg_percentile, shot_creating_actions, passes_completed, progressive_passes, tackles, interceptions, blocks, clearances

{
  "player_id": "i9j0k1l2",
  "template": "Midfielders",
  "minutes_played": 2450,
  "goals_percentile": 85,
  "xg_percentile": 82,
  "progressive_passes": 95,
  "tackles": 40,
  "interceptions": 60
}


Complete list of extractable fields for Team Stats objects from fbref.com. All fields typed and schema-versioned.

team_id, season, competition, rank, matches_played, wins, draws, losses, goals_for, goals_against, goal_difference, points, xg_for, xg_against

{
  "team_id": "m3n4o5p6",
  "season": "2023-2024",
  "competition": "Premier League",
  "rank": 1,
  "matches_played": 38,
  "points": 91,
  "goals_for": 96,
  "xg_for": 88.5
}


Complete list of extractable fields for Goalkeeping objects from fbref.com. All fields typed and schema-versioned.

player_id, matches_played, shots_on_target_against, saves, save_percentage, clean_sheets, penalty_kicks_attempted, penalty_kicks_allowed, penalty_kicks_saved, psxg, psxg_net

{
  "player_id": "q7r8s9t0",
  "matches_played": 38,
  "shots_on_target_against": 120,
  "saves": 90,
  "save_percentage": 75.0,
  "clean_sheets": 15,
  "psxg": 35.2,
  "psxg_net": 5.2
}


Capabilities

Deep football analytics — parsed and structured

Our Fbref scraper handles complex multi-level tables, strict rate limits, and deep historical pagination to deliver clean, queryable football data.

Player Standard Stats

Extract core metrics including goals, assists, playing time, and card accumulations across all domestic and international competitions.

Advanced xG & Shot Data

Capture expected goals (xG), expected assisted goals (xAG), shot-creating actions, and detailed shooting efficiency metrics.

Passing & Possession

Extract pass completion rates, progressive passes, key passes, and possession statistics parsed from complex nested tables.

Defensive Actions

Track tackles, interceptions, blocks, clearances, and aerial duels won for comprehensive defensive profiling.

Advanced Goalkeeping

Extract post-shot expected goals (PSxG), save percentages, cross stopping, and sweeping actions for goalkeeper analysis.

Match Logs & Summaries

Scrape detailed match-by-match logs for players and teams, including event timelines and formation data.

Historical Seasons

Extract data from past seasons across major leagues, maintaining consistent schemas despite historical formatting variations.

Multi-Competition Support

Data extraction spanning top European leagues, international tournaments, and lower divisions available on the platform.

Scheduled Updates

Configure pipelines to run post-matchweek to capture updated statistics and standings automatically.

// engagement pipeline

From target leagues to warehouse tables

Brief in. Clean data out.

Define Scope

Day 0

Specify leagues, seasons, and specific data tables (e.g., standard stats, passing, scouting reports) required.

Pipeline Build

Days 2–4

We configure Scrapy crawlers, table parsing logic, and rate-limit management systems for Sports Reference infrastructure.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and cross-referencing totals to ensure accurate table extraction.
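As a rough illustration of the three checks named above, here is a minimal Python sketch. The schema, the function names, and the idea of comparing summed match logs against a season row are assumptions made for the example, not a description of the production QA suite.

```python
from typing import Any

# Hypothetical expected types for a few Player Stats fields.
PLAYER_SCHEMA = {"player_id": str, "name": str, "goals": int, "xg": float}


def schema_errors(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return one message per missing field or wrongly typed non-null value."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif record[field] is not None and not isinstance(record[field], expected):
            errors.append(f"bad type for {field}: {type(record[field]).__name__}")
    return errors


def null_rate(records: list[dict[str, Any]], field: str) -> float:
    """Share of records in which the field is absent or None."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) is None) / len(records)


def totals_match(match_logs: list[dict[str, Any]], season_row: dict[str, Any], field: str) -> bool:
    """Check that per-match values add up to the season total for one field."""
    return sum(log.get(field) or 0 for log in match_logs) == season_row.get(field)
```

A run could be rejected when `schema_errors` returns anything, when `null_rate` for a required field exceeds an agreed threshold, or when `totals_match` fails for a count such as goals.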

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our Fbref pipeline handles the hard parts

Sports Reference sites present unique structural and infrastructural challenges. Here is how we build resilient extraction.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (running)

// observability

Pipeline health

Uptime: 99.9%

p99 latency: 142ms

Null rate: 0.3%

Table parsing

Multi-level header normalisation

Fbref uses complex HTML tables with multiple header rows (e.g., categorising 'Passes' into 'Total', 'Short', 'Medium', 'Long'). Our parsers flatten these hierarchies into clean, single-level column names suitable for relational databases.
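The flattening step described above can be sketched in a few lines of Python. This assumes the header rows have already been read into lists with group labels repeated across the columns they span; `flatten_headers` and its naming rules are illustrative, not the parser used in production.

```python
import re


def flatten_headers(header_rows: list[list[str]]) -> list[str]:
    """Join each column's header cells top-down into one snake_case name.

    header_rows holds one list per header row, already expanded so that a
    group label such as 'Total' is repeated over every column it spans.
    """
    names: list[str] = []
    seen: dict[str, int] = {}
    for cells in zip(*header_rows):
        parts = [c.strip() for c in cells if c and c.strip()]
        joined = "_".join(parts).lower().replace("%", "_pct")
        name = re.sub(r"[^0-9a-z]+", "_", joined).strip("_")
        # Suffix repeated names so every output column stays unique.
        count = seen.get(name, 0)
        seen[name] = count + 1
        names.append(name if count == 0 else f"{name}_{count + 1}")
    return names
```

For a passing table whose first row reads `Total, Total, Short, Short` and whose second row reads `Cmp, Att, Cmp, Att`, this yields `total_cmp`, `total_att`, `short_cmp`, `short_att`.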

Rate limiting

Strict 429 management

Sports Reference implements aggressive rate limiting. We utilise distributed proxy pools and precise request throttling to maintain extraction volume without triggering IP bans or HTTP 429 responses.
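One common way to react when a 429 does slip through is to back off and retry. The sketch below is a generic pattern under stated assumptions: `fetch` is a hypothetical callable returning `(status, headers, body)`, the delay values are placeholders, and only an integer `Retry-After` header is honoured.

```python
import time
from typing import Callable, Mapping


def fetch_with_backoff(
    fetch: Callable[[str], tuple[int, Mapping[str, str], str]],
    url: str,
    max_attempts: int = 5,
    base_delay: float = 5.0,
    max_delay: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying on HTTP 429 with growing delays."""
    for attempt in range(max_attempts):
        status, headers, body = fetch(url)
        if status == 429:
            retry_after = headers.get("Retry-After", "")
            if retry_after.isdigit():
                # The server said how long to wait, in seconds.
                delay = float(retry_after)
            else:
                # Otherwise double the wait on every attempt.
                delay = base_delay * 2 ** attempt
            sleep(min(delay, max_delay))
            continue
        if status != 200:
            raise RuntimeError(f"unexpected status {status} for {url}")
        return body
    raise RuntimeError(f"still rate limited after {max_attempts} attempts: {url}")
```

Passing `sleep` as a parameter keeps the function testable without real waiting.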

Data linking

Consistent ID extraction

We extract and preserve Fbref unique identifiers for players, teams, and matches, allowing you to build relational models and map entities across different datasets.
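A minimal sketch of ID extraction from page links follows. It assumes entity links take the form `/en/<kind>/<8-character hex id>/<slug>`; that URL shape, the regular expression, and the `EntityRef` type are assumptions for illustration and should be checked against live pages.

```python
import re
from typing import NamedTuple, Optional

# Assumed link shape: /en/<kind>/<8-char hex id>/<slug>
_ENTITY_RE = re.compile(r"/en/(players|squads|matches)/([0-9a-f]{8})(?:/|$)")


class EntityRef(NamedTuple):
    kind: str
    entity_id: str


def parse_entity(href: str) -> Optional[EntityRef]:
    """Return the entity kind and ID found in a link, or None if absent."""
    match = _ENTITY_RE.search(href)
    return EntityRef(match.group(1), match.group(2)) if match else None
```

Using the sample ID from the data dictionary, `parse_entity("/en/players/a1b2c3d4/Lionel-Messi")` gives `EntityRef(kind="players", entity_id="a1b2c3d4")`, which can then serve as a join key across tables.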

Historical variance

Schema versioning for older data

Historical seasons often lack advanced metrics like xG. Our pipelines handle missing columns gracefully, ensuring historical data fits into modern schemas without breaking downstream processes.
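One simple way to make older rows fit a current schema is to project every record onto a fixed column list and fill absent columns with nulls. The column list and `conform` helper below are hypothetical and only illustrate the idea.

```python
from typing import Any, Optional

# Hypothetical target columns for a match log table.
MATCH_LOG_COLUMNS = [
    "match_id", "date", "home_team", "away_team", "result", "xg_home", "xg_away",
]


def conform(record: dict[str, Any], columns: Optional[list[str]] = None) -> dict[str, Any]:
    """Return the record with exactly the target columns, in order.

    Columns the source page did not provide become None; extra source
    columns are dropped.
    """
    columns = MATCH_LOG_COLUMNS if columns is None else columns
    return {col: record.get(col) for col in columns}
```

Filling with `None` rather than `0` keeps a missing xG value distinguishable from a measured xG of zero once the data reaches the warehouse.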

Change detection

Efficient updates

For active seasons, we identify updated match logs and recalculate season totals, pushing only the necessary updates to your warehouse.
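Change detection of this kind is often done by hashing each record and comparing against the hash stored from the previous run. The sketch below assumes that approach; the function names and the in-memory `known` mapping are illustrative stand-ins, not the pipeline's actual mechanism.

```python
import hashlib
import json
from typing import Any


def fingerprint(record: dict[str, Any]) -> str:
    """Stable content hash: key order and whitespace do not affect it."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_records(
    records: list[dict[str, Any]],
    known: dict[str, str],
    key: str = "match_id",
) -> list[dict[str, Any]]:
    """Return records that are new or differ from the last run.

    `known` maps each key to the hash seen previously and is updated in place.
    """
    changed = []
    for record in records:
        digest = fingerprint(record)
        if known.get(record[key]) != digest:
            known[record[key]] = digest
            changed.append(record)
    return changed
```

Only the rows returned by `changed_records` would need to be pushed downstream, and season totals recalculated for the players or teams they touch.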

Applications

Who uses Fbref data — and how

Teams across industries use fbref.com data to build competitive products and smarter operations.

Pro Recruitment & Scouting

Professional clubs use Fbref scouting reports and percentile rankings to identify undervalued talent across global leagues.

Fantasy Football Models

Data scientists build predictive models for FPL and other fantasy games using underlying xG and xAG metrics rather than raw outputs.

Betting & Odds Calculation

Syndicates ingest historical match logs and team performance data to train predictive models and find edge in betting markets.

Media & Broadcasting

Sports journalists and broadcasters use advanced metrics to enrich match commentary and analytical articles.

Academic Research

Researchers analyse long-term trends in tactical evolution, player longevity, and league competitiveness using historical datasets.

Performance Analysis

Coaching staff evaluate team performance against expected metrics to identify tactical inefficiencies and areas for improvement.

Why DataFlirt

"Fbref provides the most comprehensive publicly available football dataset, but extracting it from multi-level HTML tables requires specialised parsing architecture."

Parsing Sports Reference tables is notoriously difficult due to complex headers, embedded JavaScript variables, and strict rate limits. DataFlirt handles the extraction complexity, delivering flattened, typed, and warehouse-ready data so your analysts can focus on building models rather than writing parsing scripts.

Technical Spec

Fbref scraper — technical capabilities

Everything our fbref.com scraper supports, from multi-level table parsing to rate-limit management, and where coverage stops.

Rate limit circumvention

Distributed proxies and precise request delays to avoid 429 errors

Supported

Multi-level table parsing

Flattens complex HTML table headers into standard database columns

Supported

Historical data extraction

Paginates through past seasons and handles missing metric columns gracefully

Supported

xG metrics capture

Extracts expected goals and related advanced metrics provided by Opta

Supported

Match log pagination

Iterates through all matches for a given player or team

Supported

Player ID extraction

Captures unique Fbref entity IDs for relational mapping

Supported

Stathead custom queries

Data behind the Stathead subscription paywall requires authenticated access

Not supported

Private user saved searches

Cannot extract user-specific saved queries or custom dashboards

Not supported

Infrastructure

Infrastructure powering the Fbref pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Table parsing engine

Custom Python modules designed specifically to parse Sports Reference DOM structures, handling multi-row headers and dynamic column generation.

Rate limit management

Intelligent request scheduling via Redis and Airflow to respect target site limits while maintaining extraction throughput across distributed proxy pools.
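To make the scheduling idea concrete, here is a small in-memory sketch that spaces requests per proxy. The `ProxyPacer` class is hypothetical; in a distributed setup the `next_free` timestamps would live in shared storage such as Redis rather than a Python dict.

```python
class ProxyPacer:
    """Hand out the proxy that frees up soonest, keeping a minimum gap per proxy."""

    def __init__(self, proxies: list[str], min_interval: float) -> None:
        self.min_interval = min_interval
        # Earliest time (in seconds) at which each proxy may send again.
        self.next_free = {proxy: 0.0 for proxy in proxies}

    def reserve(self, now: float) -> tuple[str, float]:
        """Return (proxy, seconds_to_wait) and book that proxy's next slot."""
        proxy = min(self.next_free, key=self.next_free.get)
        start = max(now, self.next_free[proxy])
        self.next_free[proxy] = start + self.min_interval
        return proxy, start - now
```

A worker would call `reserve(time.monotonic())`, sleep for the returned wait, and then send its request through the returned proxy.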

Cloud-native orchestration

Pipelines run on AWS Lambda and Kubernetes. Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested — schema versioned per run

CSV

Flat file with typed columns — Excel/Sheets compatible

XLS

Excel format for direct analyst consumption

Parquet

Columnar format for BigQuery, Snowflake, Athena

AWS S3

Direct bucket delivery — compatible with any data lake

Webhook

HTTP POST per record for real-time downstream processing

API

REST endpoint to query extracted datasets

PostgreSQL

Upsert into your existing schema with conflict resolution
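The upsert pattern can be illustrated with an `INSERT ... ON CONFLICT ... DO UPDATE` statement. The sketch below uses Python's built-in `sqlite3` as a stand-in so it runs anywhere; the table, columns, and helper name are assumptions, and a PostgreSQL driver would use its own parameter placeholder style.

```python
import sqlite3

# Hypothetical target table keyed on the Fbref player ID.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS player_stats (
    player_id TEXT PRIMARY KEY,
    name TEXT,
    goals INTEGER,
    xg REAL
)
"""

UPSERT_SQL = """
INSERT INTO player_stats (player_id, name, goals, xg)
VALUES (:player_id, :name, :goals, :xg)
ON CONFLICT (player_id) DO UPDATE SET
    name = excluded.name,
    goals = excluded.goals,
    xg = excluded.xg
"""


def upsert_players(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert new players and overwrite existing ones in one transaction."""
    with conn:
        conn.execute(CREATE_SQL)
        conn.executemany(UPSERT_SQL, rows)
```

Re-running the same delivery is then idempotent: a player already present is updated in place instead of producing a duplicate row.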


// faq

Common questions.

About fbref.com scraping, legality, and pipeline operations.


Is scraping Fbref legal?

Scraping publicly available factual data (such as sports statistics) is generally permissible. DataFlirt targets only public, non-authenticated statistical data. We do not extract personal data or circumvent authentication walls like Stathead. Clients should review Terms of Service and consult legal counsel for specific use cases.

How do you handle Sports Reference rate limits?

Sports Reference sites enforce strict request limits. We manage this through distributed residential proxies, precise request delays, and concurrency controls to ensure reliable data extraction without triggering blocks.

How frequently can the data be updated?

Pipelines are typically scheduled weekly or daily following matchdays to capture updated statistics. Real-time extraction during matches is not supported as Fbref updates data post-match.

Do you extract expected goals (xG) data?

Yes. We extract all advanced metrics provided on the platform, including xG, xAG, PSxG, and shot-creating actions, preserving the granularity of the original tables.

How deep does the historical data go?

We can extract data as far back as Fbref provides it. Note that advanced metrics like xG are only available for recent seasons; our schema handles these historical variations gracefully.

Can you extract data from Stathead?

No. Stathead requires a paid subscription and authenticated access. We only extract publicly available data from the main Fbref domain.

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need historical season data or continuous updates for predictive modelling — we scope, build, and operate the pipeline. Tell us what you need.

Start a fbref.com pipeline → View pricing

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h

