
Global indicator data,
at warehouse scale.

We extract time-series datasets, country profiles, Grapher chart data, and source metadata from Our World in Data. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Data points extracted: 45.2M /run
Charts parsed: 8,419 /run
Entities tracked: 243
Active pipelines: 14
Uptime: 99.98%

Data Dictionary

Every field we extract from ourworldindata.org

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Chart Data (Grapher) objects from ourworldindata.org. All fields typed and schema-versioned.

chart_id, title, subtitle, indicator_name, entities, years, values, source_description, note, chart_type

chart_data (grapher)
{
  "chart_id": "co2-emissions-by-region",
  "title": "Annual CO2 emissions",
  "indicator_name": "Annual CO2 emissions (zero filled)",
  "entities": "['World', 'Asia', 'Europe']",
  "years": "[2019, 2020, 2021]",
  "values": "[37000000000, 35000000000, 37120000000]",
  "chart_type": "StackedArea"
}

Complete list of extractable fields for Country Profiles objects from ourworldindata.org. All fields typed and schema-versioned.

country_code, country_name, region, income_group, population, gdp_per_capita, life_expectancy, co2_per_capita, latest_year, profile_url

country_profiles
{
  "country_code": "IND",
  "country_name": "India",
  "region": "Asia",
  "income_group": "Lower middle income",
  "population": 1428627663,
  "gdp_per_capita": 7112.0,
  "life_expectancy": 67.2,
  "latest_year": 2023
}

Complete list of extractable fields for Dataset Metadata objects from ourworldindata.org. All fields typed and schema-versioned.

dataset_id, name, description, primary_source, collection_method, update_frequency, license, citation, publication_date, version

dataset_metadata
{
  "dataset_id": "global-energy-substitution",
  "name": "Global Primary Energy Consumption",
  "description": "Primary energy consumption by source, measured in terawatt-hours.",
  "primary_source": "Energy Institute Statistical Review of World Energy",
  "license": "CC BY 4.0",
  "publication_date": "2023-06-26",
  "version": "v1.2"
}

Complete list of extractable fields for Time-Series Indicators objects from ourworldindata.org. All fields typed and schema-versioned.

indicator_id, indicator_name, entity_name, entity_code, year, value, unit, data_quality_flags, source_link

time-series_indicators
{
  "indicator_id": "child_mortality_rate",
  "indicator_name": "Child mortality rate",
  "entity_name": "Brazil",
  "entity_code": "BRA",
  "year": 2021,
  "value": 1.44,
  "unit": "%",
  "source_link": "https://ourworldindata.org/child-mortality"
}

Complete list of extractable fields for Topic Pages objects from ourworldindata.org. All fields typed and schema-versioned.

topic_slug, title, author, publish_date, related_charts, key_insights, citation_count, pdf_url, last_updated

topic_pages
{
  "topic_slug": "poverty",
  "title": "Poverty",
  "author": "['Max Roser', 'Joe Hasell']",
  "publish_date": "2023-11-04",
  "related_charts": 42,
  "citation_count": 156,
  "last_updated": "2024-01-12T08:30:00Z"
}

Capabilities

Extract global datasets with precision

Our World in Data relies heavily on interactive Grapher components. Our pipeline extracts the underlying time-series data, normalises entity codes, and preserves critical source citations without manual CSV downloads.

Grapher State Extraction

Bypass the interactive UI. We parse the embedded Grapher JSON state to extract raw time-series arrays, entity mappings, and axis configurations.

Entity Normalisation

Map proprietary entity names to standard ISO 3166-1 alpha-3 country codes for immediate joins in your data warehouse.

Metadata & Citations

Extract primary sources, methodology notes, and academic citations linked to every indicator to maintain data provenance.

Time-Series Alignment

Transform nested year-value arrays into flat, queryable columnar formats ideal for SQL analysis.

Change Detection

Monitor dataset update timestamps and only sync modified indicators, saving compute and storage costs.

Topic Page Scraping

Aggregate all charts, insights, and datasets associated with high-level topics like Climate Change or Economic Growth.

Cross-Dataset Mapping

Link indicators across different datasets using unified entity codes and temporal dimensions.

Bulk Export Automation

Automate the extraction of thousands of CSV endpoints systematically, rather than clicking through the interface.

Schema Versioning

Track changes in indicator definitions or unit measurements over time with strict schema validation.
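
As a rough illustration of what strict schema validation can look like, the sketch below checks one time-series record against an expected set of field types. The field names come from the time-series object listed above; the type choices and the `validate_record` helper are assumptions made for this example, not a description of the production validator.

```python
# Field names follow the time-series object above; the types are assumptions.
EXPECTED_TYPES = {
    "indicator_id": str,
    "entity_code": str,
    "year": int,
    "value": (int, float),
    "unit": str,
}


def validate_record(record: dict, schema: dict = EXPECTED_TYPES) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
            continue
        actual = record[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(actual, bool) or not isinstance(actual, expected):
            errors.append(f"{field}: unexpected type {type(actual).__name__}")
    return errors


good = {
    "indicator_id": "child_mortality_rate",
    "entity_code": "BRA",
    "year": 2021,
    "value": 1.44,
    "unit": "%",
}
bad = {**good, "year": "2021"}  # year arrives as a string

print(validate_record(good))
print(validate_record(bad))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a batch during QA.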

// engagement pipeline

From topic URL to warehouse record

Brief in. Clean data out.

Define Scope (day 0)

Provide target topics, specific chart URLs, or indicator names. We design the extraction schema together.

Pipeline Build (days 2–4)

We configure Scrapy parsers to handle Grapher JSON state extraction and time-series flattening.

Validation & QA (days 4–6)

Schema validation, unit consistency checks, and entity mapping verification before full launch.

Delivery (ongoing)

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our pipeline handles interactive data

Extracting data from scientific publications requires structural parsing rather than simple HTML scraping. Here is how we build reliable pipelines for Our World in Data.

pipeline-monitor · ourworldindata.org · live ● active

// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination
Page coverage: 48,291 pages queued (running)

// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

State extraction
Parsing embedded Grapher JSON

The charts on Our World in Data are powered by a custom visualization tool called Grapher. We do not attempt to scrape the SVG elements. Instead, we extract and parse the complete JSON state object embedded in the page source, retrieving the exact underlying values.
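
A minimal sketch of the idea, assuming the chart state is assigned to a JavaScript variable inside a script tag. The marker `window.hypotheticalGrapherState` and the sample HTML are invented for illustration; the real page source may embed its state under a different name or structure.

```python
import json

# Hypothetical marker: the variable a page might assign its chart state to.
STATE_MARKER = "window.hypotheticalGrapherState ="


def extract_embedded_state(html: str, marker: str = STATE_MARKER) -> dict:
    """Return the first JSON value that follows `marker` in the page source."""
    start = html.find(marker)
    if start == -1:
        raise ValueError("state marker not found in page source")
    # Skip past the marker, then any whitespace before the JSON begins.
    pos = start + len(marker)
    while pos < len(html) and html[pos].isspace():
        pos += 1
    # raw_decode parses exactly one JSON value and ignores what trails it
    # (here, the closing ';' and the </script> tag).
    state, _end = json.JSONDecoder().raw_decode(html, pos)
    return state


SAMPLE_HTML = """
<html><body>
<script>
window.hypotheticalGrapherState = {"slug": "co2-emissions-by-region",
  "title": "Annual CO2 emissions", "type": "StackedArea"};
</script>
</body></html>
"""

state = extract_embedded_state(SAMPLE_HTML)
print(state["slug"], state["type"])
```

Using `raw_decode` avoids writing a regular expression that has to guess where the JSON object ends, which tends to break on nested braces.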

Data modelling
Flattening nested time-series

Grapher stores data in highly nested arrays optimized for frontend rendering. Our pipeline flattens these structures into standard long-format tables (Entity, Year, Value, Indicator), ready for SQL aggregation.
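
The flattening step can be sketched as follows. The nested input shape (one `years` and one `values` array per entity) is an assumption chosen for illustration, and the numbers are placeholders rather than real data.

```python
from typing import Iterator


def flatten_series(indicator: str, nested: dict) -> Iterator[dict]:
    """Yield one long-format row per (entity, year) pair."""
    for entity, series in nested.items():
        years, values = series["years"], series["values"]
        if len(years) != len(values):
            raise ValueError(
                f"{entity}: {len(years)} years but {len(values)} values"
            )
        for year, value in zip(years, values):
            if value is None:  # gap in the series: emit no row
                continue
            yield {
                "entity": entity,
                "year": year,
                "value": value,
                "indicator": indicator,
            }


# Illustrative input only; the numbers are placeholders, not real data.
nested = {
    "World": {"years": [2019, 2020, 2021], "values": [10.0, 9.5, 10.2]},
    "Asia": {"years": [2020, 2021], "values": [None, 5.1]},
}

rows = list(flatten_series("example_indicator", nested))
for row in rows:
    print(row)
```

Each row carries its own entity, year, and indicator, so the output can be loaded into a single table and aggregated with plain `GROUP BY` queries.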

Entity resolution
Standardising geographical dimensions

We map internal entity IDs and historical country names to standard ISO codes. This ensures the extracted data can be joined immediately with your internal geographical datasets.
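
A simplified sketch of name-to-code resolution, assuming a lookup table keyed on a normalised form of the entity name. The table here is a tiny illustrative subset and the helper names are hypothetical; a real mapping would cover every tracked entity and its historical aliases.

```python
import unicodedata
from typing import Optional

# Illustrative subset only. Keys are normalised names (see normalise_name).
NAME_TO_ISO3 = {
    "india": "IND",
    "brazil": "BRA",
    "cote d'ivoire": "CIV",
    "myanmar": "MMR",
    "burma": "MMR",  # historical name resolved to the current code
}


def normalise_name(name: str) -> str:
    """Strip accents, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def to_iso3(name: str) -> Optional[str]:
    """Return the ISO 3166-1 alpha-3 code, or None for unmapped entities."""
    return NAME_TO_ISO3.get(normalise_name(name))


for raw in ["  Côte d'Ivoire ", "BRAZIL", "Burma", "World"]:
    print(repr(raw), "->", to_iso3(raw))
```

Aggregates such as "World" have no ISO 3166-1 code, so the sketch returns `None` for them instead of guessing; how to represent such entities downstream is a design decision left open here.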

Provenance tracking
Preserving source metadata

Scientific data requires strict provenance. We extract and link the specific primary source description, methodology notes, and academic citations for every indicator extracted.
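
One way to keep provenance attached to every row is to join each data point to its source record by indicator ID. The sketch below assumes that approach; the `SourceInfo` fields mirror the dataset metadata fields listed earlier, while the indicator ID, the value, and the citation string are hypothetical placeholders.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SourceInfo:
    primary_source: str
    license: str
    citation: str


def attach_provenance(rows: list, sources: dict) -> list:
    """Return copies of rows with source fields merged in.

    Raises KeyError when a row references an indicator with no known source,
    so that no data point is delivered without provenance.
    """
    enriched = []
    for row in rows:
        info = sources.get(row["indicator_id"])
        if info is None:
            raise KeyError(f"no source metadata for {row['indicator_id']!r}")
        enriched.append({**row, **asdict(info)})
    return enriched


# Hypothetical indicator ID, value, and citation text, for illustration only.
sources = {
    "primary_energy_twh": SourceInfo(
        primary_source="Energy Institute Statistical Review of World Energy",
        license="CC BY 4.0",
        citation="(placeholder citation string)",
    )
}
rows = [{"indicator_id": "primary_energy_twh", "entity_code": "IND",
         "year": 2021, "value": 1.0}]

enriched = attach_provenance(rows, sources)
print(enriched[0]["primary_source"], "|", enriched[0]["license"])
```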

Update monitoring
Efficient change detection

We monitor the GitHub repository and internal API endpoints for dataset updates. Pipelines only process and deliver data when the underlying source material has been modified.
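
A common way to decide whether a dataset has actually changed is to compare a hash of its content against the hash recorded on the previous run. The sketch below assumes that approach with an in-memory dictionary standing in for persistent state; the function names and payload shapes are illustrative.

```python
import hashlib
import json


def content_hash(payload: dict) -> str:
    """Hash a canonical JSON form so that key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_datasets(current: dict, seen_hashes: dict) -> list:
    """Return IDs whose content differs from the last recorded hash.

    `seen_hashes` is updated in place, so a second call with the same
    input reports nothing.
    """
    changed = []
    for dataset_id, payload in current.items():
        digest = content_hash(payload)
        if seen_hashes.get(dataset_id) != digest:
            changed.append(dataset_id)
            seen_hashes[dataset_id] = digest
    return changed


seen = {}
snapshot = {
    "dataset-a": {"version": "v1.2", "rows": 10},
    "dataset-b": {"version": "v3.0", "rows": 25},
}
first_run = changed_datasets(snapshot, seen)   # everything is new
second_run = changed_datasets(snapshot, seen)  # nothing changed

snapshot["dataset-b"] = {"rows": 26, "version": "v3.0"}
third_run = changed_datasets(snapshot, seen)   # only dataset-b differs

print(first_run, second_run, third_run)
```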

Applications

Who uses global indicator data

Teams across industries use ourworldindata.org data to build competitive products and smarter operations.

01
ESG Reporting & Compliance

Corporate sustainability teams integrate global carbon emissions and energy substitution data into their ESG models.

02
Academic Research

Universities automate the collection of demographic and health indicators for large-scale epidemiological studies.

03
Policy Analysis

Think tanks track economic growth and poverty metrics across regions to evaluate the impact of international policy interventions.

04
Macroeconomic Forecasting

Financial analysts ingest historical population and GDP data to train long-term macroeconomic prediction models.

05
Climate Tech Modeling

Startups use historical temperature anomalies and renewable energy adoption rates to validate climate risk models.

06
Global Health Tracking

Healthcare organisations monitor disease prevalence and vaccination rates to allocate resources effectively.

Why DataFlirt

"Our World in Data aggregates the most critical metrics of human progress, but transforming their interactive charts into queryable warehouse tables requires precise pipeline engineering."

Extracting data from OWID requires parsing complex Grapher state objects, normalising uneven time-series arrays, and preserving hierarchical source citations. DataFlirt handles the extraction and schema normalisation so your analysts can query the data immediately without manual wrangling.

Technical Spec

Our World in Data scraper - technical capabilities

Everything supported by our ourworldindata.org scraper, from Grapher JSON extraction to webhook delivery, with the support status of each capability.

Grapher JSON extraction (Supported): Direct parsing of embedded state objects for accurate values
ISO country code normalisation (Supported): Automatic mapping of entities to standard ISO 3166-1 alpha-3
Historical time-series (Supported): Full historical depth extraction for all available indicators
Source citation mapping (Supported): Extraction of methodology notes and primary source URLs
Change detection (diffs) (Supported): Hash-based diffs to only emit updated datasets
Webhook delivery (Supported): HTTP POST upon dataset update detection
Chart image generation (Supported): Capture of high-resolution SVG or PNG chart exports
Proprietary third-party microdata (Partial): Raw UN or World Bank microdata not exposed in Grapher
Unpublished draft charts (Partial): Access to internal draft visualisations requiring author credentials

Infrastructure

Infrastructure powering the extraction pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus, dbt, Snowflake

Scrapy Extraction Stack

Scrapy handles high-throughput extraction of static HTML and embedded JSON state objects, bypassing the need for heavy browser rendering where possible.

Data Transformation Layer

Python 3.12 workers process nested JSON arrays, applying entity normalisation and flattening time-series data into strict columnar schemas before delivery.

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: Nested structures preserving metadata and citations
CSV: Flat long-format tables ideal for immediate analysis
XLS: Excel compatible format for business analysts
Parquet: Columnar format for BigQuery, Snowflake, Athena
AWS S3: Direct bucket delivery compatible with any data lake
Webhook: HTTP POST payload triggered on dataset updates
API: REST endpoint to query extracted indicators on demand
BigQuery: Streamed directly into your dataset with schema auto-detect
Snowflake: Stage and COPY INTO workflow for enterprise warehouses
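
To make the webhook option in the list above concrete, here is a minimal sketch of building the HTTP POST with the Python standard library. The payload fields (`event`, `dataset_id`, `changed_at`) and the receiver URL are assumptions for illustration; the actual payload shape would be agreed during scoping.

```python
import json
import urllib.request


def build_webhook_request(url: str, dataset_id: str,
                          changed_at: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST announcing a dataset update."""
    body = json.dumps({
        "event": "dataset.updated",   # hypothetical event name
        "dataset_id": dataset_id,
        "changed_at": changed_at,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_webhook_request(
    "https://example.com/hooks/owid",  # placeholder receiver URL
    "global-energy-substitution",
    "2024-01-12T08:30:00Z",
)
# urllib.request.urlopen(req) would send it; omitted here so the sketch
# has no network side effects.
print(req.get_method(), req.full_url)
```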
// faq

Common questions.

About ourworldindata.org scraping, legality, and pipeline operations.

Is scraping Our World in Data legal?

Yes. Our World in Data publishes its content under a Creative Commons BY license. All data and visualisations are open access. DataFlirt automates the extraction of this public data while adhering to standard rate limits to respect their infrastructure.

How do you extract data from the interactive charts?

We do not scrape the visual SVG elements. Instead, we parse the underlying Grapher configuration and data arrays embedded directly within the page source, so the extracted numbers are the same values the chart itself is drawn from rather than estimates read off the rendered graphic.

Can you map the data to standard country codes?

Yes. We maintain a mapping dictionary that translates OWID entity names to standard ISO 3166-1 alpha-3 codes, allowing you to join the extracted data directly with your existing geographical tables.

How often is the data updated?

We can configure pipelines to run daily, weekly, or monthly. We monitor dataset metadata for update timestamps and only process diffs to ensure your warehouse always reflects the latest available statistics.

Do you extract historical data or just the latest year?

We extract the complete time-series available for every indicator. If a dataset spans from 1800 to 2023, every available data point is captured and structured in the final output.

What is the minimum viable engagement?

Our minimum engagement typically covers the extraction of a specific thematic dataset cluster (e.g., all Climate Change or Energy indicators) delivered on a recurring schedule. Contact us for a scoped quote based on volume.

$ dataflirt scope --new-project --source=ourworldindata.org ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off extraction of demographic metrics or a continuous feed of climate data updates, we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h