System status: all green · source: understat.com · queue: 12,409 matches · p99 latency: 98 ms · dataflirt.com · scraper/understat-com
Run: 42 active pipelines · understat.com live

Football analytics,
at warehouse scale.

We extract xG models, shot coordinates, PPDA, and player season totals from Understat. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Matches extracted: 14.2K per season
Shot coordinates: 1.8M per run
Player profiles: 8.4K per update
Active pipelines: 42
Uptime: 99.98%

Data Dictionary

Every field we extract from understat.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Match Stats objects from understat.com. All fields typed and schema-versioned.

Fields: match_id, date, league, home_team, away_team, home_goals, away_goals, home_xg, away_xg, home_ppda, away_ppda, home_deep, away_deep

Sample match_stats record (abridged):
{
  "match_id": "18202",
  "date": "2023-08-11 19:00:00",
  "home_team": "Burnley",
  "away_team": "Manchester City",
  "home_goals": 0,
  "away_goals": 3,
  "home_xg": 0.31,
  "away_xg": 2.15,
  "home_ppda": 18.2,
  "away_ppda": 8.5
}

Complete list of extractable fields for Shot Maps objects from understat.com. All fields typed and schema-versioned.

Fields: shot_id, match_id, player_id, player, minute, result, xg, x_coord, y_coord, situation, shot_type, assist_player_id

Sample shot_maps record (abridged):
{
  "shot_id": "512349",
  "match_id": "18202",
  "player_id": "2371",
  "player": "Erling Haaland",
  "minute": 4,
  "result": "Goal",
  "xg": 0.42,
  "x_coord": 0.88,
  "y_coord": 0.52,
  "situation": "OpenPlay",
  "shot_type": "LeftFoot"
}

Complete list of extractable fields for Player Season objects from understat.com. All fields typed and schema-versioned.

Fields: player_id, player_name, team, games, time, goals, xg, assists, xa, shots, key_passes, npg, npxg, xg_chain, xg_buildup

Sample player_season record (abridged):
{
  "player_id": "2371",
  "player_name": "Erling Haaland",
  "team": "Manchester City",
  "games": 35,
  "time": 2779,
  "goals": 36,
  "xg": 28.66,
  "assists": 8,
  "xa": 3.14,
  "shots": 123,
  "xg_chain": 32.51
}

Complete list of extractable fields for Team Season objects from understat.com. All fields typed and schema-versioned.

Fields: team_id, team_name, matches, wins, draws, loses, scored, missed, pts, xg, xga, xpts, ppda, ppda_allowed

Sample team_season record (abridged):
{
  "team_id": "88",
  "team_name": "Manchester City",
  "matches": 38,
  "wins": 28,
  "draws": 5,
  "loses": 5,
  "scored": 94,
  "missed": 33,
  "pts": 89,
  "xg": 78.55,
  "xga": 32.12,
  "xpts": 82.4
}

Complete list of extractable fields for League Standings objects from understat.com. All fields typed and schema-versioned.

Fields: league, season, position, team, matches, pts, xpts, xg, xga, xg_diff, xpts_diff

Sample league_standings record:
{
  "league": "EPL",
  "season": "2022",
  "position": 1,
  "team": "Manchester City",
  "matches": 38,
  "pts": 89,
  "xpts": 82.4,
  "xg": 78.55,
  "xga": 32.12,
  "xg_diff": 15.45,
  "xpts_diff": -6.6
}
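
In the sample records, the two difference fields can be reproduced from the team_season totals. `xg_diff` matches goals scored minus xG (94 - 78.55 = 15.45), and `xpts_diff` matches xPTS minus points (82.4 - 89 = -6.6). The minimal sketch below encodes that reading. Both sign conventions are inferred from these two samples rather than from a published schema, so treat them as assumptions and confirm them against live output.

```python
def standings_diffs(team_season: dict) -> dict:
    # Sign conventions inferred from the sample records (assumption, not a published schema):
    #   xg_diff   = goals scored - xG   (94 - 78.55 = 15.45)
    #   xpts_diff = xPTS - points       (82.4 - 89 = -6.6)
    # Rounded to 2 decimal places to match the precision of the samples.
    return {
        "xg_diff": round(team_season["scored"] - team_season["xg"], 2),
        "xpts_diff": round(team_season["xpts"] - team_season["pts"], 2),
    }


city_2022 = {"scored": 94, "xg": 78.55, "pts": 89, "xpts": 82.4}
print(standings_diffs(city_2022))  # {'xg_diff': 15.45, 'xpts_diff': -6.6}
```

The rounding step matters. Without it, floating-point subtraction leaves values such as 15.450000000000003 in the output.
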

Capabilities

Extract the raw metrics behind the beautiful game

Understat provides the cleanest public xG models in football, but the data is locked in hex-encoded payloads inside `<script>` tags. We decode, normalise, and deliver it directly to your warehouse.

Shot Map Coordinates

Extract X/Y coordinates, xG value, shot type, situation, and assist origin for every shot recorded in the six European leagues Understat covers.

Player Aggregates

Capture season-level xG, xA, key passes, xGChain, and xGBuildup to isolate player contribution beyond standard goals and assists.
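
xGChain and xGBuildup are usually defined as follows. xGChain is the total xG of every possession a player is involved in. xGBuildup is the same total, but a player's key passes and shots do not count as involvement. The sketch below applies those definitions to a hypothetical possession structure: a list of `(player, action)` pairs plus the xG of the shot that ends the possession. Understat does not publish possession sequences, so this illustrates the metric rather than how Understat computes it.

```python
from collections import defaultdict

# Actions that count towards xGChain but not towards xGBuildup.
EXCLUDED_FROM_BUILDUP = frozenset({"shot", "key_pass"})


def chain_and_buildup(
    possessions: list[tuple[float, list[tuple[str, str]]]],
) -> tuple[dict[str, float], dict[str, float]]:
    """Each possession is (xg_of_final_shot, [(player, action), ...])."""
    chain: dict[str, float] = defaultdict(float)
    buildup: dict[str, float] = defaultdict(float)
    for xg, actions in possessions:
        # A player is credited once per possession, however many actions they made.
        involved = {player for player, _ in actions}
        in_buildup = {player for player, kind in actions if kind not in EXCLUDED_FROM_BUILDUP}
        for player in involved:
            chain[player] += xg
        for player in in_buildup:
            buildup[player] += xg
    return dict(chain), dict(buildup)


possessions = [
    (0.25, [("A", "pass"), ("B", "key_pass"), ("C", "shot")]),
    (0.125, [("B", "pass"), ("C", "shot")]),
]
chain, buildup = chain_and_buildup(possessions)
# chain:   A 0.25, B 0.375, C 0.375
# buildup: A 0.25, B 0.125 (C only ever shot, so earns no buildup credit)
```
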

Match Level Metrics

Retrieve PPDA (Passes Allowed Per Defensive Action) and Deep Completions to quantify pressing intensity and final-third dominance.
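
PPDA is a ratio. The numerator is the number of passes the opponent completes in a defined pressing zone. The denominator is the number of defensive actions (tackles, interceptions, challenges, fouls) the pressing team makes in that zone. Lower values mean more intense pressing. The minimal sketch below assumes the two raw counts are available per match; the `PressingCounts` structure is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PressingCounts:
    """Raw counts for one team over a match or season (hypothetical structure)."""
    passes_allowed: int     # opponent passes completed in the pressing zone
    defensive_actions: int  # tackles, interceptions, challenges, fouls in that zone


def ppda(counts: PressingCounts) -> float:
    """Passes allowed per defensive action; lower means more aggressive pressing."""
    if counts.defensive_actions == 0:
        return float("inf")  # no defensive actions in the zone: ratio is unbounded
    return counts.passes_allowed / counts.defensive_actions


def season_ppda(matches: list[PressingCounts]) -> float:
    """Aggregate by summing raw counts first, not by averaging per-match ratios."""
    total = PressingCounts(
        passes_allowed=sum(m.passes_allowed for m in matches),
        defensive_actions=sum(m.defensive_actions for m in matches),
    )
    return ppda(total)


print(ppda(PressingCounts(170, 20)))                                  # 8.5
print(season_ppda([PressingCounts(170, 20), PressingCounts(90, 5)]))  # 10.4
```

Summing counts before dividing weights each match by its volume of play. Averaging the two per-match ratios above would give 13.25 instead of 10.4.
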

Team Season Tracking

Track xPTS (Expected Points) and xG difference over the course of a season to identify overperforming or underperforming squads.
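
One common way to derive xPTS from shot-level xG is to treat each shot as an independent Bernoulli trial with probability equal to its xG. You then compute each side's goal distribution and sum 3 × P(win) + P(draw). The sketch below does this exactly with a small dynamic program, with no Monte Carlo sampling. It illustrates the technique and is not necessarily Understat's own procedure. The independence assumption is a simplification: it ignores game state and rebounds.

```python
def goal_distribution(shot_xgs: list[float]) -> list[float]:
    """P(goals == k) for k = 0..len(shot_xgs), treating each shot as an independent Bernoulli(xG)."""
    dist = [1.0]  # before any shots: 0 goals with certainty
    for p in shot_xgs:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1 - p)  # this shot misses
            nxt[k + 1] += prob * p    # this shot scores
        dist = nxt
    return dist


def expected_points(home_xgs: list[float], away_xgs: list[float]) -> tuple[float, float]:
    """Expected league points (3 for a win, 1 for a draw) for the home and away sides."""
    home, away = goal_distribution(home_xgs), goal_distribution(away_xgs)
    p_home = p_draw = p_away = 0.0
    for h, ph in enumerate(home):
        for a, pa in enumerate(away):
            if h > a:
                p_home += ph * pa
            elif h == a:
                p_draw += ph * pa
            else:
                p_away += ph * pa
    return 3 * p_home + p_draw, 3 * p_away + p_draw


print(expected_points([0.5], []))  # (2.0, 0.5)
```

A season's xPTS is then the sum of per-match values. Comparing that sum with actual points shows which squads are over- or underperforming.
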

Historical Data Backfill

Access complete historical data dating back to the 2014/2015 season across the Premier League, La Liga, Bundesliga, Serie A, Ligue 1, and RFPL.

Hex-Encoded JSON Parsing

Understat embeds raw data in script tags. Our pipeline natively decodes this payload into structured, queryable rows.

Multi-League Support

Normalise team names and player IDs across all 6 supported leagues into a unified relational schema.

Automated Weekly Updates

Configure pipelines to run automatically after matchdays conclude, ensuring your models always have the latest data.

Delta Sync

Only fetch new matches and updated player totals to minimise compute costs and downstream processing load.
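
A delta sync needs two checks. The first finds completed matches that are not yet stored. The second finds aggregate rows whose contents changed since the last run. The sketch below hashes a canonical JSON form of each row for the second check. The fixture fields `id` and `isResult` are assumed names for illustration; verify them against the live payload.

```python
import hashlib
import json


def row_fingerprint(row: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so field order cannot change the hash.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_rows(rows: list[dict], stored: dict[str, str], key: str = "player_id") -> list[dict]:
    """Rows that are new, or whose contents differ from the fingerprint stored last sync."""
    return [row for row in rows if stored.get(row[key]) != row_fingerprint(row)]


def new_completed_matches(fixtures: list[dict], stored_ids: set[str]) -> list[str]:
    """IDs of finished matches not yet in the warehouse.
    Assumes fixtures carry "id" and "isResult" fields (hypothetical field names)."""
    return [f["id"] for f in fixtures if f.get("isResult") and f["id"] not in stored_ids]
```

After loading a batch, store the new fingerprints keyed by `player_id`. The next run then re-sends only the rows that changed.
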

// engagement pipeline

From Understat payload to warehouse record

Brief in. Clean data out.

Define Scope (day 0)

Specify the leagues, seasons, and granularity (match, player, or shot level) required for your analysis.

Pipeline Build (days 2–4)

We configure crawlers to extract and decode the embedded JSON payloads from Understat's frontend.

Validation & QA (days 4–6)

Schema validation, null-rate checks, and coordinate normalisation to ensure data integrity before launch.

Delivery (ongoing)

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on an agreed cadence.
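
The QA step can be sketched as a per-batch report with two parts: null rates for required fields, and range checks on values expected to lie in [0, 1]. The range check assumes coordinates are normalised to [0, 1], as the sample shot (0.88, 0.52) suggests; xG is a probability, so it is bounded there too. The 1% null-rate threshold is illustrative only.

```python
from dataclasses import dataclass

REQUIRED_FIELDS = ("shot_id", "match_id", "player_id", "minute", "result", "xg", "x_coord", "y_coord")
UNIT_INTERVAL_FIELDS = ("xg", "x_coord", "y_coord")  # assumed to lie in [0, 1]


@dataclass
class QAReport:
    rows: int
    null_rates: dict[str, float]
    out_of_range: list[str]  # shot_ids with a unit-interval field outside [0, 1]

    def passes(self, max_null_rate: float = 0.01) -> bool:
        # Illustrative threshold; real limits would be agreed per field.
        return not self.out_of_range and all(r <= max_null_rate for r in self.null_rates.values())


def validate_shots(shots: list[dict]) -> QAReport:
    n = len(shots)
    null_rates = {
        f: (sum(1 for s in shots if s.get(f) is None) / n if n else 0.0)
        for f in REQUIRED_FIELDS
    }
    out_of_range = [
        s.get("shot_id")
        for s in shots
        if any(s.get(f) is not None and not 0.0 <= s[f] <= 1.0 for f in UNIT_INTERVAL_FIELDS)
    ]
    return QAReport(rows=n, null_rates=null_rates, out_of_range=out_of_range)
```

A batch that fails `passes()` would be held back from delivery rather than loaded with bad rows.
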

Under the hood

How our Understat pipeline handles the hard parts

Extracting data from Understat requires specific parsing techniques. Here is how we build resilient pipelines for football analytics.

Pipeline monitor: understat.com · live · active
Identity rotation: TLS fingerprint randomised · user-agent rotated · residential IP pool · challenges blocked: 0
Page coverage: 48,291 pages queued · running
Pipeline health: 99.9% uptime · 142 ms p99 latency · 0.3% null rate · 2 alerts

Payload extraction
Decoding embedded script tags

Understat does not serve data via a public API or standard HTML tables. The raw data is hex-encoded and embedded within `<script>` tags on the page. Our pipeline uses targeted regex and JavaScript execution to isolate, decode, and parse these payloads directly into JSON.
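
The decode step can be sketched in a few lines. The sketch assumes each payload is assigned as `var <name> = JSON.parse('<escaped JSON>')`, with characters written as `\xNN` escapes. Each escape is mapped to the character U+00NN, the same as a JavaScript string literal does, and the result is parsed as JSON. The variable-name pattern is an assumption about the page layout. This version handles only `\xNN` escapes, not other JavaScript escape sequences.

```python
import json
import re

# Assumed payload shape: var <name> = JSON.parse('<hex-escaped JSON>')
PAYLOAD_RE = re.compile(r"var\s+(\w+)\s*=\s*JSON\.parse\('([^']*)'\)")
HEX_ESCAPE_RE = re.compile(r"\\x([0-9a-fA-F]{2})")


def unescape_hex(raw: str) -> str:
    # Map each \xNN escape to the character U+00NN, as a JavaScript string literal would.
    # Other JS escapes (such as \\ or \') are not handled in this sketch.
    return HEX_ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), raw)


def extract_payloads(html: str) -> dict[str, object]:
    """Map each JavaScript variable name in the page to its decoded JSON value."""
    return {name: json.loads(unescape_hex(raw)) for name, raw in PAYLOAD_RE.findall(html)}


page = r"<script>var datesData = JSON.parse('\x5B\x7B\x22id\x22\x3A\x2218202\x22\x7D\x5D');</script>"
print(extract_payloads(page))  # {'datesData': [{'id': '18202'}]}
```

Single quotes inside the payload are themselves escaped (`\x27`), which is why the simple `[^']*` capture can find the end of the string.
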

Scheduling
Matchday sync timing

Football data is only valuable when it is up to date. We schedule pipeline runs to trigger immediately after matchdays conclude across different European time zones, ensuring your database is updated before the next round of fixtures.
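
Handling several time zones is easiest if every kickoff is converted to UTC before comparing. The minimal sketch below triggers the sync a fixed buffer after the last kickoff of the matchday. The 2h30m buffer is an assumption covering play, stoppage time, and a publishing margin; it is not a figure Understat publishes.

```python
from datetime import datetime, timedelta, timezone

# Assumed buffer: full match plus stoppage time and a margin for the source to publish.
POST_MATCH_BUFFER = timedelta(hours=2, minutes=30)


def matchday_sync_time(kickoffs: list[datetime]) -> datetime:
    """UTC time to trigger a sync: the last kickoff of the matchday plus a buffer."""
    if not kickoffs:
        raise ValueError("matchday has no fixtures")
    if any(k.tzinfo is None for k in kickoffs):
        raise ValueError("kickoff times must be timezone-aware")
    last = max(k.astimezone(timezone.utc) for k in kickoffs)
    return last + POST_MATCH_BUFFER


bst = timezone(timedelta(hours=1))   # UK summer time
cest = timezone(timedelta(hours=2))  # Central European summer time
kickoffs = [datetime(2023, 8, 11, 20, 0, tzinfo=bst), datetime(2023, 8, 12, 15, 0, tzinfo=cest)]
print(matchday_sync_time(kickoffs).isoformat())  # 2023-08-12T15:30:00+00:00
```

An orchestrator such as Airflow could schedule the extraction task at this computed time instead of polling on a fixed interval.
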

Data normalisation
Entity resolution across leagues

Understat uses internal IDs for players and teams. We maintain mapping tables to standardise team names (e.g., 'Man Utd' vs 'Manchester United') so the data can be joined cleanly with other sources like Opta or Wyscout.

Historical context
Decade-long backfills

Training predictive models requires deep historical data. Our pipeline can paginate through and extract every match, shot, and player aggregate dating back to the 2014/2015 season in a single bulk run.
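
A backfill starts by enumerating every league-season page. The sample standings record labels the 2022/23 season as `"season": "2022"`, so seasons are keyed by their starting year. The URL layout and league slugs below are assumptions based on Understat's public league pages. Check both before running a bulk job, because coverage for individual leagues may vary by season.

```python
# Assumed URL layout and league slugs; verify against the live site before a bulk run.
BASE_URL = "https://understat.com/league"
LEAGUE_SLUGS = ["EPL", "La_liga", "Bundesliga", "Serie_A", "Ligue_1", "RFPL"]
FIRST_SEASON = 2014  # seasons keyed by starting year: 2014 = 2014/2015


def backfill_urls(last_season: int) -> list[str]:
    """Every league-season page from 2014/2015 up to and including last_season."""
    if last_season < FIRST_SEASON:
        raise ValueError("no data before the 2014/2015 season")
    return [
        f"{BASE_URL}/{slug}/{season}"
        for season in range(FIRST_SEASON, last_season + 1)
        for slug in LEAGUE_SLUGS
    ]


urls = backfill_urls(2023)
print(len(urls), urls[0])  # 60 https://understat.com/league/EPL/2014
```
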

Infrastructure
Rate limiting & IP rotation

While Understat is a smaller domain, aggressive scraping still leads to rate limits. We use intelligent request pacing and residential proxy rotation to extract entire seasons of data without triggering blocks or degrading site performance.
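
Request pacing can be as simple as enforcing a minimum gap between requests, plus random jitter so the traffic has no fixed rhythm. The sketch below covers pacing only; proxy rotation is out of scope. The 2-second interval and 1-second jitter are illustrative values. The clock and sleep functions are injectable so the pacer can be tested without real waiting.

```python
from __future__ import annotations

import random
import time
from typing import Callable


class RequestPacer:
    """Enforce a minimum gap between requests plus random jitter (illustrative defaults)."""

    def __init__(self, min_interval: float = 2.0, jitter: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: random.Random | None = None) -> None:
        self.min_interval = min_interval
        self.jitter = jitter
        self._clock = clock
        self._sleep = sleep
        self._rng = rng or random.Random()
        self._last: float | None = None

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            target_gap = self.min_interval + self._rng.uniform(0, self.jitter)
            delay = max(0.0, target_gap - (now - self._last))
        if delay:
            self._sleep(delay)
        self._last = self._clock()
        return delay
```

In a crawler loop, call `pacer.wait()` before each request. Time already spent parsing the previous response counts towards the gap, so a slow parser costs no extra delay.
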

Applications

Who uses Understat data — and how

Teams across industries use understat.com data to build competitive products and smarter operations.

01
Betting & Syndicate Models

Quantitative syndicates ingest xG and xPTS data to build predictive models, identify value in Asian Handicap markets, and calculate true match probabilities.

02
Fantasy Football Projections

FPL content creators and predictive algorithms use player xG and xA metrics to forecast future point returns and identify underpriced assets.

03
Club Scouting & Recruitment

Professional clubs analyse xGChain and xGBuildup to identify undervalued players who contribute heavily to possession sequences but lack traditional goals/assists.

04
Sports Media & Journalism

Broadcasters and journalists use shot maps and PPDA metrics to create data-driven narratives and visualisations for post-match analysis.

05
Predictive Analytics

Data scientists train machine learning models on historical shot coordinates and situation data to simulate match outcomes.

06
Tactical Analysis

Coaches evaluate pressing intensity with PPDA and final-third penetration with Deep Completions to quantify tactical shifts over a season.

Why DataFlirt

"Understat provides the cleanest public xG and shot map data available, but accessing the raw JSON requires decoding embedded script tags at scale."

Most analysts waste hours copying tables or writing fragile Python scripts that break when DOM structures change. We decode Understat's embedded payload data natively, normalising shot coordinates and xG metrics into queryable tables. DataFlirt manages the extraction so your quants can focus on predictive modelling rather than parsing hex strings.

Technical Spec

Understat scraper — technical capabilities

Everything our understat.com scraper supports, from embedded-payload decoding to post-match scheduling, plus the limits of what Understat itself publishes.

Embedded JSON decoding (Supported): Extracts and parses hex-encoded data payloads directly from page source
Shot X/Y coordinate extraction (Supported): Captures raw pitch coordinates for every recorded shot
Player xGChain/xGBuildup (Supported): Extracts advanced possession-involvement metrics
Historical seasons, 2014 to present (Supported): Full backfill capability across all 6 covered leagues
Post-match automatic sync (Supported): Scheduled pipelines that run after the final whistle
Cross-league player mapping (Supported): Standardises the profiles of players who transfer between covered leagues
Live in-play xG updates (Not supported): Understat only publishes data post-match, so real-time updates are not available
Tracking/Event data, e.g. Opta F24 (Partial): Understat provides shot-level summaries, not full passing/event-stream data

Infrastructure

Infrastructure powering the Understat pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus

Payload Decoding

Custom Python middleware using regex and JavaScript execution to isolate and parse the hex-encoded JSON payloads embedded in Understat's frontend.

Orchestration

Apache Airflow manages the scheduling logic, triggering extraction tasks only after matchdays conclude to ensure data completeness while minimising unnecessary requests.

Data Normalisation

Post-processing scripts in pandas/Polars clean up team names, handle missing values, and cast numeric fields (like xG) to proper float types before warehouse delivery.
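
The type-casting step can be sketched in plain Python to avoid dependencies; the same schema could drive pandas or Polars casts. It assumes raw payloads may carry numeric values as strings, which is worth checking against live data. The cast map follows the shot_maps sample: IDs stay strings and measures become numbers. Empty values become `None`, which loads as SQL NULL.

```python
from typing import Any, Callable

# Hypothetical target schema for shot_maps: IDs stay strings, measures become numbers.
SHOT_SCHEMA: dict[str, Callable[[Any], Any]] = {
    "shot_id": str, "match_id": str, "player_id": str, "player": str,
    "minute": int, "result": str, "xg": float, "x_coord": float, "y_coord": float,
    "situation": str, "shot_type": str,
}


def cast_row(raw: dict, schema: dict[str, Callable[[Any], Any]] = SHOT_SCHEMA) -> dict:
    """Cast each schema field; missing, None or empty-string values become None."""
    out = {}
    for name, cast in schema.items():
        value = raw.get(name)
        out[name] = None if value is None or value == "" else cast(value)
    return out


raw = {"shot_id": 512349, "match_id": "18202", "minute": "4", "xg": "0.42", "x_coord": ""}
row = cast_row(raw)
print(row["shot_id"], row["minute"], row["xg"], row["x_coord"])  # 512349 4 0.42 None
```

A value that cannot be cast, such as `"n/a"` for `xg`, raises `ValueError`. This is deliberate: the row fails loudly instead of being silently coerced.
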

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: Newline-delimited or nested arrays of match data
CSV: Flat file with typed columns, suited to pandas or R
Parquet: Columnar format for BigQuery, Snowflake, Athena
S3: Direct bucket delivery, compatible with any data lake
Webhook: HTTP POST per match for downstream processing
BigQuery: Streamed directly into your dataset with schema auto-detect
Postgres: Upsert into your existing schema with conflict resolution
Snowflake: Stage + COPY INTO workflow with incremental updates
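
"Upsert with conflict resolution" means inserting new rows and updating existing ones in a single statement. The sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres, because both accept the `INSERT ... ON CONFLICT ... DO UPDATE` form with the `excluded` pseudo-table. Placeholder syntax differs between drivers. The `(player_id, season)` key and the reduced column set are hypothetical choices for illustration.

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS player_season (
    player_id TEXT NOT NULL,
    season    TEXT NOT NULL,
    goals     INTEGER,
    xg        REAL,
    PRIMARY KEY (player_id, season)
)
"""

# On a key collision, overwrite the measures with the incoming ("excluded") values.
UPSERT = """
INSERT INTO player_season (player_id, season, goals, xg)
VALUES (:player_id, :season, :goals, :xg)
ON CONFLICT (player_id, season) DO UPDATE SET
    goals = excluded.goals,
    xg    = excluded.xg
"""


def upsert_player_seasons(conn: sqlite3.Connection, rows: list[dict]) -> None:
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(UPSERT, rows)


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
upsert_player_seasons(conn, [{"player_id": "2371", "season": "2022", "goals": 35, "xg": 28.0}])
upsert_player_seasons(conn, [{"player_id": "2371", "season": "2022", "goals": 36, "xg": 28.66}])
print(conn.execute("SELECT * FROM player_season").fetchall())  # [('2371', '2022', 36, 28.66)]
```

Re-running a sync therefore updates rows in place instead of duplicating them, which suits the delta-sync pattern.
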
// faq

Common questions.

About understat.com scraping, legality, and pipeline operations.

Is scraping Understat legal?

Scraping publicly available statistical data is generally permissible. DataFlirt extracts only public match, player, and shot statistics. We do not bypass authentication or extract personal data. Clients should review Understat's terms of service and consult legal counsel for their specific commercial use cases.

How do you extract the data if it is not in the HTML tables?

Understat renders its tables with JavaScript from JSON payloads embedded directly in `<script>` tags on the page. Our pipeline skips the rendered tables entirely: it locates those script tags, decodes the escaped text, and parses the raw JSON.

How soon after a match is data available?

Understat typically updates its database shortly after matches conclude. Our pipelines are scheduled to run daily or post-matchday to capture these updates as soon as they are published.

Can you map Understat IDs to FPL or Opta IDs?

We provide the raw Understat player and team IDs. While we normalise team names to standard conventions, mapping to proprietary third-party IDs (like Opta or Fantasy Premier League) requires a separate entity resolution table on the client side.

Do you provide full event data (passes, tackles, etc.)?

No. Understat only provides data on shots (coordinates, xG, situation) and aggregated match/player metrics. It does not provide full event-stream data (like Opta F24 feeds).

What leagues are covered?

We support extraction for the English Premier League (EPL), La Liga, Bundesliga, Serie A, Ligue 1, and the Russian Premier League (RFPL), dating back to the 2014/2015 season.

$ dataflirt scope --new-project --source=understat.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a historical backfill of shot maps or a continuous feed of xG metrics for the current season — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h
Services

Data Extraction for Every Industry
