We extract xG values, shot coordinates, PPDA, and player season totals from Understat, delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Match Stats objects from understat.com. All fields typed and schema-versioned.
"match_id": "18202", "date": "2023-08-11 19:00:00", "home_team": "Burnley", "away_team": "Manchester City", "home_goals": 0, "away_goals": 3, "home_xg": 0.31, "away_xg": 2.15, "home_ppda": 18.2, "away_ppda": 8.5
Preview table columns: `match_id`, `date`, `league`, `home_team`, `away_team`, `home_goals`.
Complete list of extractable fields for Shot Maps objects from understat.com. All fields typed and schema-versioned.
"shot_id": "512349", "match_id": "18202", "player_id": "2371", "player": "Erling Haaland", "minute": 4, "result": "Goal", "xg": 0.42, "x_coord": 0.88, "y_coord": 0.52, "situation": "OpenPlay", "shot_type": "LeftFoot"
Preview table columns: `shot_id`, `match_id`, `player_id`, `player`, `minute`, `result`.
Complete list of extractable fields for Player Season objects from understat.com. All fields typed and schema-versioned.
"player_id": "2371", "player_name": "Erling Haaland", "team": "Manchester City", "games": 35, "time": 2779, "goals": 36, "xg": 28.66, "assists": 8, "xa": 3.14, "shots": 123, "xg_chain": 32.51
Preview table columns: `player_id`, `player_name`, `team`, `games`, `time`, `goals`.
Complete list of extractable fields for Team Season objects from understat.com. All fields typed and schema-versioned.
"team_id": "88", "team_name": "Manchester City", "matches": 38, "wins": 28, "draws": 5, "loses": 5, "scored": 94, "missed": 33, "pts": 89, "xg": 78.55, "xga": 32.12, "xpts": 82.4
Preview table columns: `team_id`, `team_name`, `matches`, `wins`, `draws`, `loses`.
Complete list of extractable fields for League Standings objects from understat.com. All fields typed and schema-versioned.
"league": "EPL", "season": "2022", "position": 1, "team": "Manchester City", "matches": 38, "pts": 89, "xpts": 82.4, "xg": 78.55, "xga": 32.12, "xg_diff": 15.45, "xpts_diff": -6.6
Preview table columns: `league`, `season`, `position`, `team`, `matches`, `pts`.
Understat provides the cleanest public xG models in football, but the data is locked in hex-encoded script tags. We decode, normalise, and deliver it directly to your warehouse.
Extract X/Y coordinates, xG value, shot type, situation, and assist origin for every shot taken in the top 6 European leagues.
Capture season-level xG, xA, key passes, xGChain, and xGBuildup to isolate player contribution beyond standard goals and assists.
Retrieve PPDA (Passes Per Defensive Action, i.e. the opponent passes a team allows per defensive action) and Deep Completions to quantify pressing intensity and final-third dominance.
Track xPTS (Expected Points) and xG difference over the course of a season to identify overperforming or underperforming squads.
Access complete historical data dating back to the 2014/2015 season across the Premier League, La Liga, Bundesliga, Serie A, Ligue 1, and RFPL.
Understat embeds raw data in script tags. Our pipeline natively decodes this payload into structured, queryable rows.
Normalise team names and player IDs across all 6 supported leagues into a unified relational schema.
Configure pipelines to run automatically after matchdays conclude, ensuring your models always have the latest data.
Only fetch new matches and updated player totals to minimise compute costs and downstream processing load.
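As a rough illustration of incremental extraction, the sketch below fingerprints each row and emits only rows that are new or whose content changed since the last run. The helper names (`row_fingerprint`, `select_changed`) and the in-memory `known` dictionary are hypothetical; a real pipeline would presumably persist the fingerprints between runs.

```python
import hashlib
import json


def row_fingerprint(row: dict) -> str:
    """Hash a row's content in a key-order-independent way."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def select_changed(rows: list, key_field: str, known: dict) -> list:
    """Return rows that are new or modified, updating `known` in place.

    `known` maps a row key (e.g. a match_id or player_id) to the
    fingerprint seen on the previous run. It is an assumption of this
    sketch, not a documented part of any pipeline.
    """
    changed = []
    for row in rows:
        key = str(row[key_field])
        fingerprint = row_fingerprint(row)
        if known.get(key) != fingerprint:
            changed.append(row)
            known[key] = fingerprint
    return changed
```

Called once per run with `key_field="match_id"` or `key_field="player_id"`, this passes through new matches and updated player totals while unchanged rows are skipped.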
Brief in. Clean data out.
Specify the leagues, seasons, and granularity (match, player, or shot level) required for your analysis.
We configure crawlers to extract and decode the embedded JSON payloads from Understat's frontend.
Schema validation, null-rate checks, and coordinate normalisation to ensure data integrity before launch.
JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
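The validation step in the process above can be sketched in a few lines. This is a minimal example, assuming shot rows shaped like the Shot Maps sample (`xg`, `x_coord`, `y_coord`) and coordinates normalised to the range 0 to 1; the 1% null-rate threshold and the function names are illustrative assumptions, not agreed delivery criteria.

```python
def null_rate(rows: list, field: str) -> float:
    """Fraction of rows where `field` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


def validate_shots(rows: list, max_null_rate: float = 0.01) -> list:
    """Return a list of human-readable problems; empty means the batch passed."""
    problems = []
    # Null-rate check on the fields downstream models depend on.
    for field in ("shot_id", "match_id", "xg", "x_coord", "y_coord"):
        rate = null_rate(rows, field)
        if rate > max_null_rate:
            problems.append(
                f"{field}: null rate {rate:.1%} exceeds {max_null_rate:.1%}"
            )
    # Coordinate check: assumes pitch coordinates normalised to [0, 1].
    for row in rows:
        for axis in ("x_coord", "y_coord"):
            value = row.get(axis)
            if value is not None and not 0.0 <= value <= 1.0:
                problems.append(
                    f"shot {row.get('shot_id')}: {axis}={value} outside [0, 1]"
                )
    return problems
```

Returning a list of problems rather than raising on the first failure lets a run report every issue in a batch at once.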
Extracting data from Understat requires specific parsing techniques. Here is how we build resilient pipelines for football analytics.
Understat does not serve data via a public API or standard HTML tables. The raw data is hex-encoded and embedded within `<script>` tags on the page. Our pipeline uses targeted regex and JavaScript execution to isolate, decode, and parse these payloads directly into JSON.
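To make the decoding step concrete, here is a minimal sketch using regex only (no JavaScript execution). It assumes the page assigns each dataset as `var <name> = JSON.parse('...')` with the JSON text written as `\xNN` hex escapes; the variable name `shotsData` used in the usage note and the function name `extract_payload` are assumptions for illustration.

```python
import json
import re

# Matches a single \xNN escape as it appears literally in the page source.
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def extract_payload(html: str, var_name: str):
    """Locate `var <var_name> = JSON.parse('...')` and return the parsed JSON."""
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*JSON\.parse\('(.*?)'\)",
        re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        raise ValueError(f"no embedded payload found for variable {var_name!r}")
    # Turn each \xNN escape back into the character it encodes.
    decoded = _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), match.group(1))
    return json.loads(decoded)
```

With a fetched page in `html`, `extract_payload(html, "shotsData")` would return the decoded structure as plain Python lists and dictionaries, ready to be flattened into rows.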
Football data is only valuable when it is up to date. We schedule pipeline runs to trigger immediately after matchdays conclude across different European time zones, ensuring your database is updated before the next round of fixtures.
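One simple way to reason about post-matchday timing across time zones is to compare kickoffs as timezone-aware datetimes and schedule the run after the latest one. The sketch below assumes a two-hour match window and a one-hour publishing buffer; both values and the function name `next_run_after` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_run_after(
    kickoffs: list,
    match_length: timedelta = timedelta(hours=2),
    publish_buffer: timedelta = timedelta(hours=1),
) -> datetime:
    """Return the UTC time at which to run, after the latest kickoff.

    `kickoffs` must be timezone-aware datetimes; they may carry
    different UTC offsets because aware datetimes compare by instant.
    """
    if not kickoffs:
        raise ValueError("at least one kickoff is required")
    latest = max(kickoffs)
    return (latest + match_length + publish_buffer).astimezone(timezone.utc)
```

For example, a 19:00 kickoff at UTC+1 and a 19:00 kickoff at UTC+3 on the same day resolve to a single run time of 21:00 UTC, because the UTC+1 match starts later in absolute terms.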
Understat uses internal IDs for players and teams. We maintain mapping tables to standardise team names (e.g., 'Man Utd' vs 'Manchester United') so the data can be joined cleanly with other sources like Opta or Wyscout.
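A mapping table of this kind can be as simple as a dictionary keyed on a normalised spelling. The aliases below are illustrative entries only, and `canonical_team` is a hypothetical helper, not the production mapping.

```python
# Illustrative aliases only; a real table would cover every club and source.
TEAM_ALIASES = {
    "man utd": "Manchester United",
    "manchester utd": "Manchester United",
    "manchester united": "Manchester United",
    "man city": "Manchester City",
    "manchester city": "Manchester City",
}


def canonical_team(name: str) -> str:
    """Map a source-specific team name to one canonical spelling."""
    # Lower-case and collapse whitespace so trivial variants share a key.
    key = " ".join(name.lower().split())
    try:
        return TEAM_ALIASES[key]
    except KeyError:
        raise KeyError(f"no canonical mapping for team name {name!r}") from None
```

Failing loudly on an unmapped name is a deliberate choice in this sketch: a silent pass-through would let an unmatched club quietly drop out of joins against other sources.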
Training predictive models requires deep historical data. Our pipeline can paginate through and extract every match, shot, and player aggregate dating back to the 2014/2015 season in a single bulk run.
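A bulk backfill is essentially a walk over every league and season pair. The sketch below enumerates those pairs; the league slugs and the `https://understat.com/league/<league>/<season>` URL pattern are assumptions of this example, with seasons labelled by their starting year as in the League Standings sample (`"season": "2022"`).

```python
# Assumed league slugs; verify against the live site before relying on them.
LEAGUES = ("EPL", "La_liga", "Bundesliga", "Serie_A", "Ligue_1", "RFPL")


def backfill_tasks(first_season: int, last_season: int) -> list:
    """List (league, season, url) tuples for every league and season in range."""
    if first_season > last_season:
        raise ValueError("first_season must not be after last_season")
    return [
        (league, season, f"https://understat.com/league/{league}/{season}")
        for league in LEAGUES
        for season in range(first_season, last_season + 1)
    ]
```

Calling `backfill_tasks(2014, 2022)` yields 54 tasks (six leagues times nine seasons), which can then be fed to a paced fetcher one at a time.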
While Understat is a smaller domain, aggressive scraping still leads to rate limits. We use intelligent request pacing and residential proxy rotation to extract entire seasons of data without triggering blocks or degrading site performance.
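Request pacing on its own can be sketched as a small helper that enforces a minimum gap, plus random jitter, between consecutive requests. This example covers pacing only, not proxy rotation; the class name `RequestPacer` and the default intervals are hypothetical, and the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import random
import time


class RequestPacer:
    """Enforce a minimum, slightly randomised gap between requests."""

    def __init__(self, min_interval=2.0, jitter=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._jitter = jitter
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            target_gap = self._min_interval + random.uniform(0, self._jitter)
            elapsed = self._clock() - self._last
            delay = max(0.0, target_gap - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

Calling `pacer.wait()` immediately before each HTTP request keeps the request rate bounded regardless of how quickly responses are processed.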
Quantitative syndicates ingest xG and xPTS data to build predictive models, identify value in Asian Handicap markets, and calculate true match probabilities.
FPL content creators and predictive algorithms use player xG and xA metrics to forecast future point returns and identify underpriced assets.
Professional clubs analyse xGChain and xGBuildup to identify undervalued players who contribute heavily to possession sequences but lack traditional goals/assists.
Broadcasters and journalists use shot maps and PPDA metrics to create data-driven narratives and visualisations for post-match analysis.
Data scientists train machine learning models on historical shot coordinates and situation data to simulate match outcomes.
Coaches evaluate team pressing intensity using PPDA and Deep Completions to quantify tactical shifts over a season.
"Understat provides the cleanest public xG and shot map data available, but accessing the raw JSON requires decoding embedded script tags at scale."
Most analysts waste hours copying tables or writing fragile Python scripts that break when DOM structures change. We decode Understat's embedded payload data natively, normalising shot coordinates and xG metrics into queryable tables. DataFlirt manages the extraction so your quants can focus on predictive modelling rather than parsing hex strings.
Everything supported by our understat.com scraper: JavaScript-rendered tables, embedded script payloads, rate-limit handling, and scheduled incremental runs.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Custom Python middleware using regex and JavaScript execution to isolate and parse the hex-encoded JSON payloads embedded in Understat's frontend.
Apache Airflow manages the scheduling logic, triggering extraction tasks only after matchdays conclude to ensure data completeness while minimising unnecessary requests.
Post-processing scripts in pandas/Polars clean up team names, handle missing values, and cast numeric fields (like xG) to proper float types before warehouse delivery.
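The type-casting part of post-processing can be illustrated without pandas or Polars. The sketch below uses plain Python instead, and assumes the decoded payload carries numeric values as strings; the field names follow the Shot Maps sample, while `cast_shot` and the field lists are hypothetical.

```python
INT_FIELDS = ("minute",)
FLOAT_FIELDS = ("xg", "x_coord", "y_coord")


def _to_number(value, kind):
    """Convert a string to int or float; treat None and '' as missing."""
    if value is None or value == "":
        return None
    return kind(float(value))


def cast_shot(raw: dict) -> dict:
    """Return a copy of a shot row with numeric fields cast to proper types."""
    row = dict(raw)
    for field in INT_FIELDS:
        row[field] = _to_number(raw.get(field), int)
    for field in FLOAT_FIELDS:
        row[field] = _to_number(raw.get(field), float)
    return row
```

Mapping empty strings to `None` keeps missing values distinguishable from zero, which matters for fields such as `xg` where 0.0 is a legitimate value.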
Data delivered to where your team already works — no new tooling required.
About understat.com scraping, legality, and pipeline operations.
Scraping publicly available statistical data is generally permissible. DataFlirt extracts only public match, player, and shot statistics. We do not bypass authentication or extract personal data. Clients should review Understat's terms of service and consult legal counsel for their specific commercial use cases.
Understat renders its tables using JavaScript, pulling from JSON payloads embedded directly in `<script>` tags on the page. Our pipeline skips the rendered tables entirely: it locates the script tags, decodes the hex-encoded text, and parses the raw JSON directly.
Understat typically updates its database shortly after matches conclude. Our pipelines are scheduled to run daily or post-matchday to capture these updates as soon as they are published.
We provide the raw Understat player and team IDs. While we normalise team names to standard conventions, mapping to proprietary third-party IDs (like Opta or Fantasy Premier League) requires a separate entity resolution table on the client side.
No. Understat only provides data on shots (coordinates, xG, situation) and aggregated match/player metrics. It does not provide full event-stream data (like Opta F24 feeds).
We support extraction for the English Premier League (EPL), La Liga, Bundesliga, Serie A, Ligue 1, and the Russian Premier League (RFPL), dating back to the 2014/2015 season.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a historical backfill of shot maps or a continuous feed of xG metrics for the current season — we scope, build, and operate the pipeline. Tell us what you need.