
Football statistics, parsed at pitch level.

We extract match commentary, proprietary player ratings, passing matrices, and Opta event data from WhoScored and deliver it as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Matches extracted: 4.2K per week
Player profiles: 184K per run
Event points: 14.7M per 24h
Active pipelines: 84
Uptime: 99.94%
Data Dictionary

Every field we extract from whoscored.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Match Summaries objects from whoscored.com. All fields typed and schema-versioned.

Fields: match_id, date, home_team, away_team, home_score, away_score, competition, referee, stadium, attendance

Sample match_summaries record (abridged):
{
  "match_id": "1734928",
  "date": "2023-10-08T15:30:00Z",
  "home_team": "Arsenal",
  "away_team": "Manchester City",
  "home_score": 1,
  "away_score": 0,
  "competition": "Premier League",
  "referee": "Michael Oliver"
}

Complete list of extractable fields for Player Ratings objects from whoscored.com. All fields typed and schema-versioned.

Fields: player_id, player_name, team, position, minutes_played, goals, assists, yellow_cards, red_cards, whoscored_rating, man_of_the_match

Sample player_ratings record (abridged):
{
  "player_id": "12345",
  "player_name": "Bukayo Saka",
  "team": "Arsenal",
  "position": "AMR",
  "minutes_played": 90,
  "whoscored_rating": 8.14,
  "man_of_the_match": true,
  "goals": 1
}

Complete list of extractable fields for Team Statistics objects from whoscored.com. All fields typed and schema-versioned.

Fields: team_id, team_name, possession_pct, pass_success_pct, aerials_won, shots_total, shots_on_target, tackles, corners, fouls

Sample team_statistics record (abridged):
{
  "team_id": "13",
  "team_name": "Arsenal",
  "possession_pct": 52.4,
  "pass_success_pct": 84.1,
  "aerials_won": 14,
  "shots_total": 12,
  "shots_on_target": 4,
  "corners": 6
}

Complete list of extractable fields for Live Events objects from whoscored.com. All fields typed and schema-versioned.

Fields: event_id, match_id, minute, second, team_id, player_id, event_type, x_coordinate, y_coordinate, outcome

Sample live_events record (abridged):
{
  "event_id": "9847123",
  "match_id": "1734928",
  "minute": 45,
  "second": 12,
  "event_type": "Pass",
  "x_coordinate": 45.2,
  "y_coordinate": 68.9,
  "outcome": "Successful"
}

Complete list of extractable fields for League Tables objects from whoscored.com. All fields typed and schema-versioned.

Fields: competition_id, season, rank, team, played, won, drawn, lost, goals_for, goals_against, goal_difference, points

Sample league_tables record (abridged):
{
  "competition_id": "252",
  "season": "2023/2024",
  "rank": 2,
  "team": "Arsenal",
  "played": 38,
  "won": 28,
  "drawn": 5,
  "lost": 5,
  "points": 89
}

Capabilities

Deep football data extraction

Our WhoScored scraper parses complex JavaScript payloads to extract pitch-level Opta data, proprietary ratings, and historical archives without triggering Cloudflare blocks.

Match Chalkboards

Extract raw Opta event data including x/y pitch coordinates for passes, shots, tackles, and interceptions.
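Opta-style feeds commonly express x and y as percentages (0 to 100) of pitch length and width rather than metres, which is consistent with the x_coordinate and y_coordinate values in the live_events sample. A minimal sketch of converting them to metres, assuming a 105 m by 68 m pitch (real dimensions vary by ground) and that x increases toward the attacking team's target goal; the function and constant names are illustrative, not part of any WhoScored or Opta API:

```python
from dataclasses import dataclass

# Assumed pitch size in metres; real pitches vary by ground.
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0


@dataclass(frozen=True)
class PitchPoint:
    x_m: float  # metres along the pitch length (assumed: from the attacking team's own goal line)
    y_m: float  # metres across the pitch width


def opta_to_metres(
    x_pct: float,
    y_pct: float,
    length: float = PITCH_LENGTH_M,
    width: float = PITCH_WIDTH_M,
) -> PitchPoint:
    """Scale 0-100 percentage coordinates to metres, rejecting out-of-range values."""
    for name, value in (("x", x_pct), ("y", y_pct)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} coordinate {value} is outside 0-100")
    return PitchPoint(x_m=x_pct / 100.0 * length, y_m=y_pct / 100.0 * width)
```

For the sample pass at (45.2, 68.9), this gives roughly 47.5 m along the pitch and 46.9 m across it. Passing the actual stadium dimensions as `length` and `width` keeps distance-based metrics honest.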

WhoScored Ratings

Capture proprietary match ratings, Man of the Match awards, and algorithmic performance scores per player.

Real-Time Event Streaming

Poll live match centres to extract minute-by-minute commentary and event updates with minimal latency.

Historical Archives

Scrape league tables, fixture results, and aggregate player statistics dating back over a decade.

Player Profiles

Track individual player statistics across multiple competitions, including positional data and characteristic strengths.

Team Formations

Extract starting XIs, tactical formations, substitutions, and average player positions during matches.
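WhoScored does not document how its average-position graphic is computed (for example, whether it counts only touches or only time on the pitch). A minimal sketch under a simpler assumption, taking a naive mean of every located event per player_id using the live_events field names; `average_positions` is a hypothetical helper:

```python
from collections import defaultdict
from statistics import fmean
from typing import Iterable, Mapping


def average_positions(events: Iterable[Mapping]) -> dict[str, tuple[float, float]]:
    """Mean (x, y) per player over events that carry a player and coordinates."""
    xs: dict[str, list[float]] = defaultdict(list)
    ys: dict[str, list[float]] = defaultdict(list)
    for ev in events:
        pid = ev.get("player_id")
        x, y = ev.get("x_coordinate"), ev.get("y_coordinate")
        if pid is None or x is None or y is None:
            continue  # skip team-level or unlocated events
        xs[pid].append(float(x))
        ys[pid].append(float(y))
    return {pid: (fmean(xs[pid]), fmean(ys[pid])) for pid in xs}
```

Filtering `events` by team_id and by a minute range before calling it gives per-team, per-phase shapes, such as average positions before and after a substitution.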

Betting Odds

Capture pre-match and historical betting odds from major bookmakers, as aggregated and displayed on the site.

Injury & Suspension Logs

Monitor player availability, expected return dates, and disciplinary records across major leagues.

Multi-League Support

Extract data from the Premier League, La Liga, Serie A, Bundesliga, Champions League, and international tournaments.

// engagement pipeline

From fixture list to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide league URLs, team lists, or match IDs. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Playwright crawlers, XHR interception, and proxy rotation for whoscored.com.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and coordinate accuracy verification before full launch.

Delivery
ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on an agreed cadence.
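The null-rate checks in the Validation & QA stage can be illustrated with a small pure function. This is a minimal sketch, not the pipeline's actual QA code; the helper names and any alert threshold are hypothetical:

```python
from typing import Mapping, Sequence


def null_rates(records: Sequence[Mapping], fields: Sequence[str]) -> dict[str, float]:
    """Fraction of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(1 for r in records if r.get(f) is None) / total for f in fields}


def fields_over_threshold(rates: Mapping[str, float], threshold: float) -> list[str]:
    """Fields whose null rate exceeds the threshold, sorted for stable alerting."""
    return sorted(f for f, rate in rates.items() if rate > threshold)
```

For example, `fields_over_threshold(null_rates(batch, ["referee", "attendance"]), 0.3)` lists the fields that should block a delivery. A missing key and an explicit `null` are treated the same, since both leave a gap downstream.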

Under the hood

How our WhoScored pipeline handles the hard parts

Football statistics sites heavily obfuscate their data feeds. Here is how we bypass rendering blocks and extract clean JSON.

XHR interception
Bypassing the DOM entirely

WhoScored renders match chalkboards via complex JavaScript. Instead of parsing the DOM, we use Playwright to intercept the raw JSON XHR responses that carry the structured Opta event data. This avoids losing fields that never reach the rendered markup and is faster than parsing the DOM.
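A minimal sketch of XHR interception with Playwright's Python sync API. The page does not say which request carries the event data, so `url_hint` is a hypothetical parameter you would fill in after inspecting the browser's network panel. The filter keeps only XHR/fetch responses served with a JSON content type, which is also an assumption about how the endpoint responds:

```python
from typing import Any


def looks_like_json_xhr(resource_type: str, content_type: str, url: str, url_hint: str) -> bool:
    """Keep XHR/fetch responses with a JSON content type whose URL contains url_hint."""
    return (
        resource_type in ("xhr", "fetch")
        and "application/json" in content_type.lower()
        and url_hint in url
    )


def capture_json_responses(page_url: str, url_hint: str, timeout_ms: float = 30_000) -> list[Any]:
    """Load page_url in headless Chromium and return the parsed JSON of matching responses."""
    # Imported lazily so the pure filter above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    matched = []  # Response objects; bodies are read after navigation, not in the handler
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()

            def on_response(response) -> None:
                if looks_like_json_xhr(
                    response.request.resource_type,
                    response.headers.get("content-type", ""),  # header names are lower-cased
                    response.url,
                    url_hint,
                ):
                    matched.append(response)

            page.on("response", on_response)
            page.goto(page_url, wait_until="networkidle", timeout=timeout_ms)
            return [r.json() for r in matched if r.ok]
        finally:
            browser.close()
```

The handler only records matching responses and reads their bodies after navigation settles, which keeps it cheap. If the data loads after the network goes idle, `page.expect_response` with the same predicate is an alternative way to wait for a specific request.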

Anti-bot layer
Cloudflare bypass with residential IPs

WhoScored employs strict Cloudflare protection. Our infrastructure uses residential proxies with realistic TLS fingerprints and automated CAPTCHA solving to maintain continuous access without triggering rate limits.

Data normalisation
Standardising player and team IDs

We normalise WhoScored internal IDs to standard formats, allowing you to easily join this dataset with other sports data providers or your internal databases.
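The page does not name the target ID standard. A common approach, assumed here, is a maintained crosswalk table that maps each WhoScored ID to a canonical ID while keeping the source ID for lineage. The `canonical_team_id` key and the crosswalk contents are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Iterable, Mapping


@dataclass
class NormalisationResult:
    records: list[dict] = field(default_factory=list)
    unmapped: set[str] = field(default_factory=set)  # source IDs with no crosswalk entry


def normalise_ids(
    records: Iterable[Mapping],
    crosswalk: Mapping[str, str],
    source_key: str = "team_id",
    target_key: str = "canonical_team_id",
) -> NormalisationResult:
    """Attach a canonical ID from a crosswalk, keeping the original source ID."""
    result = NormalisationResult()
    for rec in records:
        out = dict(rec)
        source_id = str(rec[source_key])  # IDs may arrive as int or str
        canonical = crosswalk.get(source_id)
        if canonical is None:
            result.unmapped.add(source_id)
        out[target_key] = canonical  # None when unmapped, so gaps stay visible downstream
        result.records.append(out)
    return result
```

Surfacing `unmapped` as a QA signal, rather than silently dropping rows, makes it obvious when a newly promoted team or transferred player needs a crosswalk entry.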

High-frequency polling
Low-latency live match updates

For live matches, we deploy high-frequency polling workers that diff the event feed every few seconds, pushing only new events via webhook to your downstream applications.
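A minimal sketch of the diff-and-push step, assuming each event carries a unique event_id (as in the live_events sample). State lives in an in-memory set; in production the Redis-backed queues mentioned on this page would presumably hold it, which is not shown. Posting the raw event as the JSON body, one request per event, is an assumed payload shape:

```python
import json
import urllib.request
from typing import Iterable, Mapping


class EventDiffer:
    """Remembers event_ids already emitted for one match and returns only unseen events."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def new_events(self, snapshot: Iterable[Mapping]) -> list[Mapping]:
        fresh = []
        for ev in snapshot:
            eid = str(ev["event_id"])
            if eid not in self._seen:
                self._seen.add(eid)
                fresh.append(ev)
        return fresh


def build_webhook_request(url: str, event: Mapping) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP POST carrying one event as JSON."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )
```

Each poll calls `differ.new_events(feed)` and sends every returned event with `urllib.request.urlopen(build_webhook_request(hook_url, ev), timeout=5)`. Because state is keyed on event_id alone, this catches new events but not later amendments to existing ones.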

Canvas extraction
Parsing graphical data

Heatmaps and shot maps are often rendered directly to HTML5 canvas elements. We execute custom JavaScript within the browser context to extract the underlying data arrays before they are drawn to the screen.

Applications

Who uses WhoScored data

Teams across industries use whoscored.com data to build competitive products and smarter operations.

01
Sports Betting Models

Quants feed historical match statistics, xG data, and player ratings into predictive models to identify mispriced odds.

02
Fantasy Football Analytics

Platform operators use underlying performance metrics to project player points and optimise draft strategies.

03
Scouting & Recruitment

Professional clubs analyse granular pass completion matrices and defensive actions to identify undervalued talent.

04
Media & Broadcasting

Publishers automate pre-match previews and post-match analysis articles using structured statistical feeds.

05
Tactical Analysis

Analysts use pitch coordinate data to map team formations, pressing intensity, and spatial control metrics.

06
Algorithmic Trading

In-play betting syndicates consume low-latency live event feeds to execute automated trades on betting exchanges.

Why DataFlirt

"WhoScored aggregates the deepest Opta event data available publicly, but extracting pitch-level coordinates requires executing complex JavaScript."

Scraping football statistics is hard because modern sports portals render data through obfuscated JSON payloads and canvas elements. DataFlirt runs full browser sessions to intercept XHR requests, extracting raw event data before it reaches the DOM.

Technical Spec

WhoScored scraper - technical capabilities

Everything supported by our whoscored.com scraper: rendered SPA elements, rate-limit evasion, and more.

XHR interception: direct capture of backend JSON responses for match events (Supported)
Live match polling: sub-minute refresh rates for active fixtures (Supported)
Historical season archives: full data extraction for past seasons dating back to 2009 (Supported)
Residential proxy rotation: ISP-grade residential IPs to bypass Cloudflare protection (Supported)
Change detection (diffs): hash-based diffing for live event streams (Supported)
Webhook delivery: HTTP POST per event for real-time applications (Supported)
Canvas heatmap extraction: JavaScript evaluation to pull data behind visual heatmaps (Supported)
Premium Opta raw feeds: commercial API feeds not displayed on the public frontend (Partial)
User account preferences: custom user dashboards and saved team preferences (Partial)
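The page does not specify which hash is used or what is hashed for change detection. One plausible reading, sketched here under that assumption, fingerprints each event's canonical JSON with SHA-256. This catches events amended after first publication (for example, a changed outcome) as well as new and withdrawn ones; `diff_snapshot` is a hypothetical helper:

```python
import hashlib
import json
from typing import Iterable, Mapping


def event_fingerprint(event: Mapping) -> str:
    """Stable SHA-256 of an event's content; sorted keys make dict order irrelevant."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshot(
    previous: Mapping[str, str], snapshot: Iterable[Mapping]
) -> tuple[dict[str, str], list[Mapping], list[Mapping], list[str]]:
    """Compare a full feed snapshot with stored fingerprints keyed by event_id.

    Returns (new fingerprints, added events, changed events, removed event_ids).
    """
    current: dict[str, str] = {}
    added, changed = [], []
    for ev in snapshot:
        eid = str(ev["event_id"])
        fp = event_fingerprint(ev)
        current[eid] = fp
        if eid not in previous:
            added.append(ev)
        elif previous[eid] != fp:
            changed.append(ev)
    removed = sorted(set(previous) - set(current))
    return current, added, changed, removed
```

Storing only the fingerprint map between polls keeps memory small. The returned map replaces the previous one after each poll.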
Infrastructure

Infrastructure powering the WhoScored pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus, Kafka
XHR Interception Stack

Playwright intercepts network requests, allowing us to bypass DOM parsing and extract the raw, structured JSON payloads that power WhoScored's dynamic visualisations.

Proxy Routing & Anti-Bot

We route requests through residential proxies with TLS fingerprint spoofing to bypass Cloudflare, ensuring uninterrupted access to match data.

High-Frequency Polling

For live matches, our Redis-backed worker queues poll endpoints at high frequency, computing diffs in memory to emit low-latency event webhooks.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: newline-delimited or nested arrays
CSV: flat files for tabular statistics
XLS: Excel-compatible exports for analysts
Parquet: columnar format for data warehouses
AWS S3: direct bucket delivery, compatible with any data lake
Webhook: HTTP POST for live match events
API: REST endpoints to query extracted data
BigQuery: direct streaming inserts
PostgreSQL: direct database upserts
Snowflake: stage and copy workflows
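Newline-delimited JSON puts one complete JSON object on each line, so files can be streamed, split, and appended without re-parsing a whole array. A minimal sketch of serialising records this way; `to_ndjson` is an illustrative helper, not part of any delivery API:

```python
import json
from typing import Iterable, Mapping


def to_ndjson(records: Iterable[Mapping]) -> str:
    """Serialise records as newline-delimited JSON: one compact object per line."""
    lines = (json.dumps(r, ensure_ascii=False, separators=(",", ":")) for r in records)
    return "".join(line + "\n" for line in lines)
```

Keeping `ensure_ascii=False` preserves player names such as accented surnames as readable UTF-8. Write the result with `encoding="utf-8"` when saving it to a file.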
// faq

Common questions.

About whoscored.com scraping, legality, and pipeline operations.

Is scraping WhoScored legal?

Scraping publicly available statistical data is generally permissible. DataFlirt extracts only public, non-authenticated match and player statistics. We do not bypass authentication walls. Clients should review WhoScored's terms of service and consult legal counsel for specific commercial use cases.

How fast is the live match data?

Our live pipelines can poll active match centres every few seconds. Once an event is registered on the WhoScored frontend, we extract and deliver it via webhook within 500-800 milliseconds.

Can you extract x/y pitch coordinates?

Yes. WhoScored uses Opta data to render chalkboards. We intercept the network requests that populate these visualisations to extract the exact x and y coordinates for passes, shots, and other events.

Do you provide historical data?

Yes. We can configure backfill pipelines to extract match statistics, player ratings, and league tables from previous seasons available on the platform.

How do you handle Cloudflare blocks?

We use high-quality residential ISP proxies combined with Playwright to simulate genuine browser fingerprints. This prevents our crawlers from triggering automated security challenges.

What is the minimum engagement?

Our minimum engagement typically starts with a defined set of leagues (e.g., top 5 European leagues) for a full season, including historical backfill and live weekly updates.

Can you parse player heatmaps?

Yes. Heatmaps are often drawn on HTML5 canvas elements. We execute JavaScript within the browser context to extract the raw intensity values before the image is rendered.

$ dataflirt scope --new-project --source=whoscored.com ready

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two weeks. Whether you need historical match archives or low-latency live event streams, we build and operate the infrastructure. Tell us your requirements.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h