
Sports statistics, at warehouse scale.

We extract player profiles, game logs, team standings, and natural language query responses from Statmuse. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Queries resolved: 415K/day
Game logs: 1.2M/24h
Player profiles: 84K/run
Active pipelines: 42
Uptime: 99.94%
Data Dictionary

Every field we extract from statmuse.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Player Profiles objects from statmuse.com. All fields typed and schema-versioned.

player_id, full_name, sport_category, current_team, position, height_inches, weight_lbs, birth_date, draft_info, career_summary, avatar_image_url
player_profiles
● 200 OK
{
  "player_id": "lebron-james-123",
  "full_name": "LeBron James",
  "sport_category": "NBA",
  "current_team": "Los Angeles Lakers",
  "position": "SF",
  "height_inches": 81,
  "weight_lbs": 250,
  "career_summary": "27.1 PPG, 7.5 RPG, 7.4 APG"
}

Complete list of extractable fields for Game Logs objects from statmuse.com. All fields typed and schema-versioned.

game_id, date, player_id, opponent, minutes_played, points, rebounds, assists, steals, blocks, turnovers, field_goals_made, field_goals_attempted, plus_minus, outcome
game_logs
● 200 OK
{
  "game_id": "nba-2023-11-15-lal-sac",
  "date": "2023-11-15",
  "points": 28,
  "rebounds": 10,
  "assists": 11,
  "steals": 4,
  "turnovers": 3,
  "outcome": "L 110-125"
}

Complete list of extractable fields for Team Standings objects from statmuse.com. All fields typed and schema-versioned.

team_id, season, sport, conference, division, wins, losses, win_pct, games_behind, points_per_game, points_allowed_per_game, home_record, away_record, streak
team_standings
● 200 OK
{
  "team_id": "bos-celtics",
  "season": "2023-24",
  "sport": "NBA",
  "wins": 64,
  "losses": 18,
  "win_pct": 0.78,
  "streak": "W5",
  "points_per_game": 120.6
}

Complete list of extractable fields for Query Results objects from statmuse.com. All fields typed and schema-versioned.

query_hash, input_text, sport, generated_answer, primary_stat_value, secondary_stats_array, related_searches, illustration_url, timestamp, source_url
query_results
● 200 OK
{
  "query_hash": "q8f7d6s",
  "input_text": "Who has the most passing yards in a single NFL season?",
  "sport": "NFL",
  "generated_answer": "Peyton Manning has the most passing yards in a single season, with 5,477 yards in 2013.",
  "primary_stat_value": "5,477",
  "timestamp": "2024-05-12T10:15:00Z"
}

Complete list of extractable fields for Betting & Fantasy objects from statmuse.com. All fields typed and schema-versioned.

player_id, game_date, sport, projected_fantasy_points, over_under_line, prop_bet_target, actual_result, variance, injury_status, salary_cap_hit
betting_fantasy
● 200 OK
{
  "player_id": "patrick-mahomes-456",
  "game_date": "2024-01-28",
  "sport": "NFL",
  "projected_fantasy_points": 22.5,
  "over_under_line": 285.5,
  "injury_status": "Active",
  "actual_result": 24.1
}

Capabilities

Everything you need from Statmuse, nothing you don't

Our Statmuse scraper navigates dynamic layouts, multi-sport schemas, and strict rate limits to deliver structured sports data directly to your warehouse.

Multi-Sport Extraction

Parse NBA, NFL, NHL, MLB, PGA, and Premier League data into a unified schema.

Natural Language Query Scraping

Submit thousands of text queries and extract the generated answers, primary stats, and illustrations.

Historical Game Logs

Extract full box scores and player performance metrics across decades of historical matchups.

Player Profile Syncing

Track career averages, physical attributes, draft history, and active roster status.

Dynamic DOM Parsing

Handle Statmuse's varying page layouts that shift based on whether the query targets a player, team, or historical event.

Rate Limit Evasion

Distribute query volume across residential proxy pools to stay within strict search rate limits.

Shot Chart & Visual Data

Extract metadata and image URLs for player shot charts and AI-generated avatars.

Team Standings & Records

Monitor division rankings, win/loss records, and streak data on a daily cadence.

Scheduled Pipeline Delivery

Run daily or hourly extractions to keep fantasy models and betting algorithms updated.

// engagement pipeline

From search queries to warehouse records

Brief in. Clean data out.

Define Scope
Day 0

Provide query lists, player IDs, or team endpoints. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy crawlers, proxy rotation, session management, and DOM routing for statmuse.com.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and sample query responses before full launch.

Delivery
ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
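The null-rate check named in the Validation & QA step can be sketched roughly as follows. This is a minimal illustration, not DataFlirt's production code: the function names, the sample rows, and the 0.3 threshold are all assumptions made for the example.

```python
from typing import Any


def null_rates(records: list[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Return, per field, the fraction of records where it is missing or None."""
    if not records:
        return {field: 0.0 for field in fields}
    total = len(records)
    return {
        field: sum(1 for record in records if record.get(field) is None) / total
        for field in fields
    }


def fields_over_threshold(rates: dict[str, float], threshold: float) -> list[str]:
    """List the fields whose null rate exceeds the (hypothetical) QA threshold."""
    return sorted(field for field, rate in rates.items() if rate > threshold)


# Illustrative sample only; field names follow the Game Logs dictionary above.
sample = [
    {"game_id": "a", "points": 28, "blocks": None},
    {"game_id": "b", "points": 12, "blocks": 1},
    {"game_id": "c", "points": None},
    {"game_id": "d", "points": 7, "blocks": 0},
]
rates = null_rates(sample, ["game_id", "points", "blocks"])
flagged = fields_over_threshold(rates, threshold=0.3)
```

A check like this would typically run on the sample batch before launch, with any flagged field reviewed by hand before the full pipeline is switched on.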

Under the hood

How our Statmuse pipeline handles the hard parts

Statmuse relies on dynamic rendering and strict rate limits to protect its database. Here is how we ensure reliable data delivery.

pipeline-monitor · statmuse.com · live · active
// fingerprinting
Identity rotation: TLS fingerprint randomised · User-agent rotated · IP pool residential · Challenges blocked: 0
// pagination
Page coverage: 48,291 pages queued, running
// observability
Pipeline health: 99.9% uptime · 142 ms p99 latency · 0.3% null rate · 2 alerts
Variable Layouts
Query-dependent DOM structures

Statmuse returns entirely different HTML layouts depending on the search intent. A query for 'LeBron James stats' produces a different page structure from 'Lakers 2001 roster'. Our parsers use intent classification to route each page to the correct extraction schema.
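One way to picture intent-based routing is a small rule table that maps a query to an extractor. The sketch below is an assumption-heavy illustration: the intent names, the regex rules, and the stub extractors are hypothetical, and a production classifier would more likely inspect the returned HTML as well as the query text.

```python
import re
from typing import Callable

# Hypothetical intent rules, checked in order; first match wins.
INTENT_RULES: list[tuple[str, re.Pattern[str]]] = [
    ("team_roster", re.compile(r"\broster\b", re.IGNORECASE)),
    ("team_standings", re.compile(r"\b(standings|record)\b", re.IGNORECASE)),
    ("player_stats", re.compile(r"\b(stats|ppg|averages?)\b", re.IGNORECASE)),
]


def classify_intent(query: str) -> str:
    """Return the first matching intent, or a generic fallback."""
    for intent, pattern in INTENT_RULES:
        if pattern.search(query):
            return intent
    return "generic_answer"


def make_stub_extractor(schema: str) -> Callable[[str], dict]:
    """Stand-in for a real parser: records which schema would handle the page."""
    def extract(html: str) -> dict:
        return {"schema": schema, "html_bytes": len(html.encode("utf-8"))}
    return extract


EXTRACTORS: dict[str, Callable[[str], dict]] = {
    "team_roster": make_stub_extractor("team_roster"),
    "team_standings": make_stub_extractor("team_standings"),
    "player_stats": make_stub_extractor("player_profiles"),
    "generic_answer": make_stub_extractor("query_results"),
}


def route(query: str, html: str) -> dict:
    """Pick the extractor for this query's intent and run it on the page."""
    return EXTRACTORS[classify_intent(query)](html)
```

With these rules, `classify_intent("LeBron James stats")` resolves to `player_stats` and `classify_intent("Lakers 2001 roster")` to `team_roster`, so the two pages reach different parsers.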

Rate Limiting
Strict query frequency caps

Statmuse aggressively throttles high-frequency searches. We distribute requests across a rotating pool of residential IPs, injecting randomised delays to mimic human research patterns and prevent IP bans.

Data Completeness
Handling nulls in historical data

Older sports records often lack specific metrics like blocks or turnovers. Our schema validates data types and assigns explicit nulls rather than breaking the pipeline on missing historical fields.
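A minimal sketch of that behaviour, assuming a simple field-to-type mapping: missing fields become explicit nulls, while values that are present but malformed still raise. The schema subset, the function name, and the sample row are hypothetical.

```python
from typing import Any

# Assumed subset of the Game Logs schema: field name -> expected type.
GAME_LOG_SCHEMA: dict[str, type] = {
    "game_id": str,
    "date": str,
    "points": int,
    "rebounds": int,
    "blocks": int,
    "turnovers": int,
}


def normalise_row(raw: dict[str, Any], schema: dict[str, type]) -> dict[str, Any]:
    """Coerce present values to the schema type; emit None for absent ones."""
    row: dict[str, Any] = {}
    for field, expected_type in schema.items():
        value = raw.get(field)
        if value is None or value == "":
            row[field] = None  # explicit null instead of a missing key
            continue
        try:
            row[field] = expected_type(value)
        except (TypeError, ValueError) as exc:
            # A present-but-malformed value is a real error, not a null.
            raise ValueError(
                f"{field}: cannot coerce {value!r} to {expected_type.__name__}"
            ) from exc
    return row


# Illustrative older record with no blocks or turnovers available.
old_row = normalise_row(
    {"game_id": "example-game-1", "date": "1965-01-01", "points": "31", "rebounds": 12},
    GAME_LOG_SCHEMA,
)
```

Keeping every schema key in the output, even when its value is null, means downstream tables see a stable set of columns across eras.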

JavaScript Rendering
Client-side hydration

Many interactive elements and tables on Statmuse require JavaScript execution. We use Playwright to render the full page state before extraction, so we capture data that plain HTTP requests miss.

Change Detection
Delta exports for active seasons

Instead of re-scraping entire player histories daily, we hash the latest game logs and only export new rows. This reduces pipeline compute and your ingestion costs.
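Hash-based delta export can be sketched in a few lines. This assumes rows are JSON-serialisable dictionaries and that previously seen hashes are kept in a set; in a real pipeline that set would live in a persistent store. The function names are hypothetical.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the digest."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def delta(rows: list[dict], seen_hashes: set[str]) -> list[dict]:
    """Return only rows not seen before, recording their hashes as a side effect."""
    new_rows = []
    for row in rows:
        digest = row_hash(row)
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            new_rows.append(row)
    return new_rows


# Illustrative two-run scenario: the second run adds one new game.
seen: set[str] = set()
first_run = delta(
    [{"game_id": "g1", "points": 28}, {"game_id": "g2", "points": 19}], seen
)
second_run = delta(
    [
        {"points": 28, "game_id": "g1"},  # same row, different key order
        {"game_id": "g2", "points": 19},
        {"game_id": "g3", "points": 33},
    ],
    seen,
)
```

Because the hash covers the whole row, a corrected stat line also produces a new digest and is exported again, which is usually the desired behaviour for late box-score revisions.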

Applications

Who uses Statmuse data

Teams across industries use statmuse.com data to build competitive products and smarter operations.

01
Fantasy Sports Projections

Feed historical logs into ML models to predict player performance and optimise daily fantasy lineups.

02
Sports Betting Algorithms

Correlate player trends, team streaks, and historical matchups to identify mispriced prop bets and lines.

03
Sports Media & Journalism

Automate statistical research for articles, broadcast graphics, and social media content generation.

04
Fan Engagement Apps

Integrate verified sports trivia, historical records, and player stats into interactive mobile applications.

05
Predictive Analytics

Analyse decades of draft data and career trajectories to build scouting and player development models.

06
AI Model Training

Use the natural language query and response pairs to fine-tune sports-specific large language models.

Why DataFlirt

"Statmuse aggregates decades of sports history into natural language answers, but extracting that data systematically requires navigating dynamic layouts and strict rate limits."

Most teams underestimate the investment involved: reliable Statmuse scraping takes residential proxies, full JavaScript rendering, CAPTCHA handling, daily selector maintenance, and anomaly monitoring. DataFlirt absorbs that complexity so your engineers can focus on the analysis, not the infrastructure.

Technical Spec

Statmuse scraper technical capabilities

Everything our statmuse.com scraper supports: rendered SPA elements, CAPTCHA handling, rate-limit evasion, and more.

JavaScript rendering · Playwright sessions for dynamic tables and client-side rendering · Supported
CAPTCHA bypass · Automated 2Captcha and CapSolver integration · Supported
Multi-sport support · NBA, NFL, NHL, MLB, PGA, and Premier League data · Supported
Natural language query submission · Automated search bar interactions and response parsing · Supported
Historical game logs · Full pagination across historical seasons and playoffs · Supported
Avatar image extraction · URL capture for AI-generated player illustrations · Supported
Proxy rotation · Residential IPs for rate limit evasion · Supported
Webhook delivery · HTTP POST for real-time query results · Supported
Statmuse+ Premium Data · Gated advanced metrics and unlimited historical lookbacks require paid authentication · Not supported
Live in-game tick data · Sub-second real-time play-by-play updates are not provided by Statmuse · Partial
Infrastructure

Infrastructure powering the Statmuse pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles JavaScript rendering, cookie sessions, and interaction flows. The two are combined through the scrapy-playwright download handler.
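For readers unfamiliar with the scrapy-playwright integration, a Scrapy settings fragment along these lines is what enables it. The handler paths and reactor setting follow the scrapy-playwright documentation; the browser choice, launch options, and retry count are assumed values for illustration, not DataFlirt's actual configuration.

```python
# settings.py (sketch): route HTTP(S) downloads through Playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Assumed values for illustration.
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}
RETRY_TIMES = 3  # Scrapy's built-in retry middleware setting
```

Individual requests then opt in to browser rendering by setting `meta={"playwright": True}`; requests without that flag still go through Scrapy's regular downloader, which keeps static pages cheap.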

Residential Proxy Infrastructure

We maintain pools of residential proxies. Rotation happens per request, with sticky sessions where required. IP reputation monitoring keeps blacklisted addresses out of the pool.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested structures
CSV
Flat file with typed columns
XLS
Excel compatible format for analyst teams
Parquet
Columnar format for BigQuery and Snowflake
Webhook
HTTP POST per record for real-time processing
API
REST endpoint for on-demand queries
BigQuery
Streamed directly into your dataset
S3
Direct bucket delivery — compatible with any data lake
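To make the difference between the newline-delimited JSON and flat CSV outputs concrete, here is a small standard-library sketch. The helper names are hypothetical, and the sample record reuses the Team Standings example from the data dictionary.

```python
import csv
import io
import json


def to_ndjson(records: list[dict]) -> str:
    """Serialise one JSON object per line (newline-delimited JSON)."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def to_csv(records: list[dict], columns: list[str]) -> str:
    """Serialise records as a flat CSV with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return buffer.getvalue()


standings = [{"team_id": "bos-celtics", "wins": 64, "losses": 18}]
ndjson_text = to_ndjson(standings)
csv_text = to_csv(standings, ["team_id", "wins", "losses"])
```

Newline-delimited JSON keeps nested fields such as secondary_stats_array intact, while the CSV form suits flat objects like standings rows.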
// faq

Common questions.

About statmuse.com scraping, legality, and pipeline operations.

Is scraping Statmuse legal?

Scraping publicly available sports statistics is generally permissible under applicable law. DataFlirt targets only public, non-authenticated game logs, player profiles, and query results. We do not circumvent paywalls for Statmuse+ premium data.

How do you handle search rate limits?

We use residential ISP proxies, full Playwright browser sessions with realistic fingerprints, and request timing modelled on human behaviour to avoid rate limit triggers and IP bans.

Which sports are supported?

We support NBA, NFL, NHL, MLB, PGA, and Premier League data, mapping the unique metrics of each sport into a normalised schema.

Can you extract the AI-generated illustrations?

Yes, we capture the CDN URLs for the custom player avatars and shot charts generated by Statmuse queries.

How fast can I get query results?

Batch processing for large query sets typically completes within 2 to 4 hours. Historical game log pipelines can run daily or hourly depending on your requirements.

Do you support Statmuse+ features?

No, we only extract publicly available data. We do not extract data gated behind the Statmuse+ subscription wall.

What is the minimum engagement?

The minimum engagement is a defined query list or player set. Contact us with your specific use case for a scoped quote.

$ dataflirt scope --new-project --source=statmuse.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two weeks. Whether you need historical game logs or continuous query extraction for fantasy modeling, we scope, build, and operate the pipeline.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h