We extract match commentary, proprietary player ratings, passing matrices, and Opta event data from WhoScored. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Match Summaries objects from whoscored.com. All fields typed and schema-versioned.
"match_id": "1734928", "date": "2023-10-08T15:30:00Z", "home_team": "Arsenal", "away_team": "Manchester City", "home_score": 1, "away_score": 0, "competition": "Premier League", "referee": "Michael Oliver"
| # | match_id | date | home_team | away_team | home_score | away_score |
|---|---|---|---|---|---|---|
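As a rough illustration of what "typed" can mean for a Match Summaries record, the sketch below loads the sample record above into a Python dataclass. The class name `MatchSummary` and the `from_record` helper are hypothetical names chosen for this example, not part of any delivered schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MatchSummary:
    """Hypothetical typed view of one Match Summaries record."""
    match_id: str
    date: datetime
    home_team: str
    away_team: str
    home_score: int
    away_score: int
    competition: str
    referee: str

    @classmethod
    def from_record(cls, record: dict) -> "MatchSummary":
        # The sample timestamp ends in "Z"; swap it for an explicit UTC offset
        # so datetime.fromisoformat also accepts it on Python versions before 3.11.
        kickoff = datetime.fromisoformat(record["date"].replace("Z", "+00:00"))
        return cls(
            match_id=str(record["match_id"]),
            date=kickoff,
            home_team=record["home_team"],
            away_team=record["away_team"],
            home_score=int(record["home_score"]),
            away_score=int(record["away_score"]),
            competition=record["competition"],
            referee=record["referee"],
        )


# Sample record copied from the field listing above.
sample = {
    "match_id": "1734928",
    "date": "2023-10-08T15:30:00Z",
    "home_team": "Arsenal",
    "away_team": "Manchester City",
    "home_score": 1,
    "away_score": 0,
    "competition": "Premier League",
    "referee": "Michael Oliver",
}

summary = MatchSummary.from_record(sample)
```

Parsing into a frozen dataclass means a missing key or a non-numeric score fails at load time rather than later in a query.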
Complete list of extractable fields for Player Ratings objects from whoscored.com. All fields typed and schema-versioned.
"player_id": "12345", "player_name": "Bukayo Saka", "team": "Arsenal", "position": "AMR", "minutes_played": 90, "whoscored_rating": 8.14, "man_of_the_match": true, "goals": 1
| # | player_id | player_name | team | position | minutes_played | goals |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Team Statistics objects from whoscored.com. All fields typed and schema-versioned.
"team_id": "13", "team_name": "Arsenal", "possession_pct": 52.4, "pass_success_pct": 84.1, "aerials_won": 14, "shots_total": 12, "shots_on_target": 4, "corners": 6
| # | team_id | team_name | possession_pct | pass_success_pct | aerials_won | shots_total |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Live Events objects from whoscored.com. All fields typed and schema-versioned.
"event_id": "9847123", "match_id": "1734928", "minute": 45, "second": 12, "event_type": "Pass", "x_coordinate": 45.2, "y_coordinate": 68.9, "outcome": "Successful"
| # | event_id | match_id | minute | second | team_id | player_id |
|---|---|---|---|---|---|---|
Complete list of extractable fields for League Tables objects from whoscored.com. All fields typed and schema-versioned.
"competition_id": "252", "season": "2023/2024", "rank": 1, "team": "Arsenal", "played": 38, "won": 28, "drawn": 5, "lost": 5, "points": 89
| # | competition_id | season | rank | team | played | won |
|---|---|---|---|---|---|---|
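League Tables rows lend themselves to a simple arithmetic sanity check. The sketch below assumes the common three-points-for-a-win rule and no points deductions; the function name `table_row_issues` is hypothetical. Applied to the sample row above, 28 wins and 5 draws give 3 × 28 + 5 = 89 points over 28 + 5 + 5 = 38 matches, which matches the record.

```python
def table_row_issues(row: dict, points_per_win: int = 3) -> list[str]:
    """Return human-readable consistency problems for one league table row.

    Assumes one point per draw and no deductions, which will not hold for
    every competition or season.
    """
    issues = []
    if row["played"] != row["won"] + row["drawn"] + row["lost"]:
        issues.append("played != won + drawn + lost")
    expected_points = points_per_win * row["won"] + row["drawn"]
    if row["points"] != expected_points:
        issues.append(f"points {row['points']} != expected {expected_points}")
    return issues


# Sample row copied from the field listing above.
sample_row = {
    "competition_id": "252", "season": "2023/2024", "rank": 1,
    "team": "Arsenal", "played": 38, "won": 28, "drawn": 5, "lost": 5,
    "points": 89,
}

# A deliberately inconsistent copy, to show what a failure looks like.
broken_row = {**sample_row, "points": 90}
```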
Our WhoScored scraper parses complex JavaScript payloads to extract pitch-level Opta data, proprietary ratings, and historical archives without triggering Cloudflare blocks.
Extract raw Opta event data including x/y pitch coordinates for passes, shots, tackles, and interceptions.
Capture proprietary match ratings, Man of the Match awards, and algorithmic performance scores per player.
Poll live match centres to extract minute-by-minute commentary and event updates with minimal latency.
Scrape league tables, fixture results, and aggregate player statistics dating back over a decade.
Track individual player statistics across multiple competitions, including positional data and characteristic strengths.
Extract starting XIs, tactical formations, substitutions, and average player positions during matches.
Capture pre-match and historical betting odds aggregated from major bookmakers displayed on the site.
Monitor player availability, expected return dates, and disciplinary records across major leagues.
Extract data from the Premier League, La Liga, Serie A, Bundesliga, Champions League, and international tournaments.
Brief in. Clean data out.
Provide league URLs, team lists, or match IDs. We design the extraction schema together.
We configure Playwright crawlers, XHR interception, and proxy rotation for whoscored.com.
Schema validation, null-rate checks, and coordinate accuracy verification before full launch.
JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
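The validation step above (null-rate checks and coordinate verification) could look roughly like the following. This is a minimal sketch, not our production checks: the function names are hypothetical, and the 0 to 100 coordinate range is an assumption about how pitch coordinates are scaled.

```python
def null_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) is None) / total
        for field in fields
    }


def out_of_range_events(events: list[dict], lo: float = 0.0, hi: float = 100.0) -> list[str]:
    """IDs of events whose x/y coordinates fall outside the assumed pitch range."""
    bad = []
    for event in events:
        x, y = event.get("x_coordinate"), event.get("y_coordinate")
        if x is None or y is None:
            continue  # missing values are counted by null_rates, not here
        if not (lo <= x <= hi and lo <= y <= hi):
            bad.append(event["event_id"])
    return bad


# Illustrative events; only the first is taken from the sample record above.
events = [
    {"event_id": "9847123", "x_coordinate": 45.2, "y_coordinate": 68.9, "outcome": "Successful"},
    {"event_id": "9847124", "x_coordinate": 104.0, "y_coordinate": 50.0, "outcome": "Successful"},
    {"event_id": "9847125", "x_coordinate": 12.5, "y_coordinate": 33.0, "outcome": None},
    {"event_id": "9847126", "x_coordinate": None, "y_coordinate": None, "outcome": "Unsuccessful"},
]

rates = null_rates(events, ["x_coordinate", "outcome"])
suspect = out_of_range_events(events)
```

A pilot run would typically compare these rates against thresholds agreed during scoping before the full crawl starts.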
Football statistics sites heavily obfuscate their data feeds. Here is how we bypass rendering blocks and extract clean JSON.
WhoScored renders match chalkboards via complex JavaScript. Instead of parsing the DOM, we use Playwright to intercept the raw JSON XHR responses containing the structured Opta event data. This captures fields that never reach the rendered page and is faster than scraping the rendered HTML.
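In outline, response interception with Playwright's Python sync API can be sketched as below. The filter `looks_like_event_payload` and the wrapper `capture_json_responses` are hypothetical names, and the filter rule (any JSON response from a whoscored.com host) is an assumption for illustration; the actual endpoints and payload shapes are not shown here, and the sketch does not address access controls.

```python
from urllib.parse import urlparse


def looks_like_event_payload(url: str, content_type: str) -> bool:
    """Heuristic filter: keep JSON responses served from the target host."""
    host = urlparse(url).hostname or ""
    is_target = host == "whoscored.com" or host.endswith(".whoscored.com")
    return is_target and "application/json" in content_type.lower()


def capture_json_responses(page_url: str) -> list[dict]:
    """Load a page and collect the JSON bodies of matching network responses."""
    # Imported lazily so the filter above is usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    captured = []

    def on_response(response):
        content_type = response.headers.get("content-type", "")
        if looks_like_event_payload(response.url, content_type):
            try:
                captured.append({"url": response.url, "body": response.json()})
            except Exception:
                pass  # body was not valid JSON despite the header; skip it

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("response", on_response)  # fires for every network response
        page.goto(page_url, wait_until="networkidle")
        browser.close()
    return captured
```

Keeping the filter as a separate pure function makes it easy to unit-test and to tighten once the relevant endpoints are known.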
WhoScored employs strict Cloudflare protection. Our infrastructure uses residential proxies with realistic TLS fingerprints and automated CAPTCHA solving to maintain continuous access without triggering rate limits.
We normalise WhoScored internal IDs to standard formats, allowing you to easily join this dataset with other sports data providers or your internal databases.
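A minimal sketch of ID normalisation, assuming a lookup table from site-internal team IDs to whatever canonical identifiers your other datasets use. The mapping contents, the `canonical_team_id` field name, and the `normalise_team_ids` function are all hypothetical.

```python
def normalise_team_ids(records: list[dict], id_map: dict[str, str]) -> tuple[list[dict], list[str]]:
    """Attach a canonical ID to each record; report source IDs with no mapping."""
    resolved, unmapped = [], []
    for record in records:
        canonical = id_map.get(record["team_id"])
        if canonical is None:
            unmapped.append(record["team_id"])
            continue
        # Keep the original ID alongside the canonical one for traceability.
        resolved.append({**record, "canonical_team_id": canonical})
    return resolved, unmapped


# Hypothetical mapping; "13" is the team_id from the Team Statistics sample.
id_map = {"13": "club:arsenal"}

records = [
    {"team_id": "13", "team_name": "Arsenal", "possession_pct": 52.4},
    {"team_id": "9999", "team_name": "Unknown FC", "possession_pct": 47.6},
]

resolved, unmapped = normalise_team_ids(records, id_map)
```

Returning the unmapped IDs separately, rather than silently dropping them, lets gaps in the mapping table surface during QA.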
For live matches, we deploy high-frequency polling workers that diff the event feed every few seconds, pushing only new events via webhook to your downstream applications.
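The diffing idea can be sketched as follows, assuming each event carries a stable `event_id` as in the Live Events sample. The class name `EventFeedDiffer` is hypothetical, and this simplified version ignores corrections to events that were already emitted.

```python
class EventFeedDiffer:
    """Tracks which event IDs have been seen and returns only unseen events."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def new_events(self, snapshot: list[dict]) -> list[dict]:
        fresh = []
        for event in snapshot:
            event_id = event["event_id"]
            if event_id not in self._seen:
                self._seen.add(event_id)
                fresh.append(event)
        return fresh


differ = EventFeedDiffer()

# Two consecutive polls of a hypothetical feed; the second adds one event.
poll_1 = [
    {"event_id": "9847123", "minute": 45, "event_type": "Pass"},
    {"event_id": "9847124", "minute": 45, "event_type": "Tackle"},
]
poll_2 = poll_1 + [{"event_id": "9847125", "minute": 46, "event_type": "Shot"}]

first_batch = differ.new_events(poll_1)
second_batch = differ.new_events(poll_2)
third_batch = differ.new_events(poll_2)  # nothing changed since the last poll
```

Only the batches returned by `new_events` would be forwarded to a webhook, so an unchanged feed produces no downstream traffic.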
Heatmaps and shot maps are often rendered directly to HTML5 canvas elements. We execute custom JavaScript within the browser context to extract the underlying data arrays before they are drawn to the screen.
Quants feed historical match statistics, xG data, and player ratings into predictive models to identify mispriced odds.
Platform operators use underlying performance metrics to project player points and optimise draft strategies.
Professional clubs analyse granular pass completion matrices and defensive actions to identify undervalued talent.
Publishers automate pre-match previews and post-match analysis articles using structured statistical feeds.
Analysts use pitch coordinate data to map team formations, pressing intensity, and spatial control metrics.
In-play betting syndicates consume low-latency live event feeds to execute automated trades on betting exchanges.
"WhoScored aggregates the deepest Opta event data available publicly, but extracting pitch-level coordinates requires executing complex JavaScript."
Most teams fail at scraping football statistics because modern sports portals render data via encrypted JSON payloads and canvas elements. DataFlirt executes full browser sessions to intercept the XHR responses, extracting raw event data before it hits the DOM.
Everything supported by our whoscored.com scraper: rendered SPA elements, Cloudflare challenges, rate-limit evasion and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Playwright intercepts network requests, allowing us to bypass DOM parsing and extract the raw, structured JSON payloads that power WhoScored's dynamic visualisations.
We route requests through residential proxies with TLS fingerprint spoofing to bypass Cloudflare, ensuring uninterrupted access to match data.
For live matches, our Redis-backed worker queues poll endpoints at high frequency, computing diffs in memory to emit low-latency event webhooks.
Data delivered to where your team already works — no new tooling required.
About whoscored.com scraping, legality, and pipeline operations.
Scraping publicly available statistical data is generally permissible. DataFlirt extracts only public, non-authenticated match and player statistics. We do not bypass authentication walls. Clients should review WhoScored's terms of service and consult legal counsel for specific commercial use cases.
Our live pipelines can poll active match centres every few seconds. Once an event is registered on the WhoScored frontend, we extract and deliver it via webhook within 500-800 milliseconds.
Yes. WhoScored uses Opta data to render chalkboards. We intercept the network requests that populate these visualisations to extract the exact x and y coordinates for passes, shots, and other events.
Yes. We can configure backfill pipelines to extract match statistics, player ratings, and league tables from previous seasons available on the platform.
We use high-quality residential ISP proxies combined with Playwright to simulate genuine browser fingerprints. This prevents our crawlers from triggering automated security challenges.
Our minimum engagement typically starts with a defined set of leagues (e.g., top 5 European leagues) for a full season, including historical backfill and live weekly updates.
Yes. Heatmaps are often drawn on HTML5 canvas elements. We execute JavaScript within the browser context to extract the raw intensity values before the image is rendered.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need historical match archives or low-latency live event streams, we build and operate the infrastructure. Tell us your requirements.