
RUN · 14 active pipelines · ourworldindata.org live

Global indicator data, at warehouse scale.

We extract time-series datasets, country profiles, Grapher chart data, and source metadata from Our World in Data. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Get data from ourworldindata.org → See how it works

Data points extracted: 45.2M /run
Charts parsed: 8,419 /run
Entities tracked: 243
Active pipelines: 14
Uptime: 99.98%

◆ Time-Series Data ◆ Country Profiles ◆ Grapher Chart Extraction ◆ Global Indicators ◆ Climate Change Datasets ◆ Population Demographics ◆ Health & Disease Metrics ◆ Energy Consumption Data ◆ Economic Growth Series ◆ Source & Citation Metadata ◆ Managed Pipeline ◆ S3 / BigQuery Delivery ◆ Bengaluru HQ ◆ Enterprise SLA

Data Dictionary

Every field we extract from ourworldindata.org

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Chart Data (Grapher) objects from ourworldindata.org. All fields typed and schema-versioned.

chart_id, title, subtitle, indicator_name, entities, years, values, source_description, note, chart_type

{
  "chart_id": "co2-emissions-by-region",
  "title": "Annual CO2 emissions",
  "indicator_name": "Annual CO2 emissions (zero filled)",
  "entities": ["World", "Asia", "Europe"],
  "years": [2019, 2020, 2021],
  "values": [37000000000, 35000000000, 37120000000],
  "chart_type": "StackedArea"
}


Complete list of extractable fields for Country Profiles objects from ourworldindata.org. All fields typed and schema-versioned.

country_code, country_name, region, income_group, population, gdp_per_capita, life_expectancy, co2_per_capita, latest_year, profile_url

{
  "country_code": "IND",
  "country_name": "India",
  "region": "Asia",
  "income_group": "Lower middle income",
  "population": 1428627663,
  "gdp_per_capita": 7112.0,
  "life_expectancy": 67.2,
  "latest_year": 2023
}


Complete list of extractable fields for Dataset Metadata objects from ourworldindata.org. All fields typed and schema-versioned.

dataset_id, name, description, primary_source, collection_method, update_frequency, license, citation, publication_date, version

{
  "dataset_id": "global-energy-substitution",
  "name": "Global Primary Energy Consumption",
  "description": "Primary energy consumption by source, measured in terawatt-hours.",
  "primary_source": "Energy Institute Statistical Review of World Energy",
  "license": "CC BY 4.0",
  "publication_date": "2023-06-26",
  "version": "v1.2"
}


Complete list of extractable fields for Time-Series Indicators objects from ourworldindata.org. All fields typed and schema-versioned.

indicator_id, indicator_name, entity_name, entity_code, year, value, unit, data_quality_flags, source_link

{
  "indicator_id": "child_mortality_rate",
  "indicator_name": "Child mortality rate",
  "entity_name": "Brazil",
  "entity_code": "BRA",
  "year": 2021,
  "value": 1.44,
  "unit": "%",
  "source_link": "https://ourworldindata.org/child-mortality"
}


Complete list of extractable fields for Topic Pages objects from ourworldindata.org. All fields typed and schema-versioned.

topic_slug, title, author, publish_date, related_charts, key_insights, citation_count, pdf_url, last_updated

{
  "topic_slug": "poverty",
  "title": "Poverty",
  "author": ["Max Roser", "Joe Hasell"],
  "publish_date": "2023-11-04",
  "related_charts": 42,
  "citation_count": 156,
  "last_updated": "2024-01-12T08:30:00Z"
}


Capabilities

Extract global datasets with precision

Our World in Data relies heavily on interactive Grapher components. Our pipeline extracts the underlying time-series data, normalises entity codes, and preserves critical source citations without manual CSV downloads.

Grapher State Extraction

Bypass the interactive UI. We parse the embedded Grapher JSON state to extract raw time-series arrays, entity mappings, and axis configurations.

Entity Normalisation

Map OWID-specific entity names to standard ISO 3166-1 alpha-3 country codes for immediate joins in your data warehouse.

Metadata & Citations

Extract primary sources, methodology notes, and academic citations linked to every indicator to maintain data provenance.

Time-Series Alignment

Transform nested year-value arrays into flat, queryable columnar formats ideal for SQL analysis.

Change Detection

Monitor dataset update timestamps and only sync modified indicators, saving compute and storage costs.

Topic Page Scraping

Aggregate all charts, insights, and datasets associated with high-level topics like Climate Change or Economic Growth.

Cross-Dataset Mapping

Link indicators across different datasets using unified entity codes and temporal dimensions.

Bulk Export Automation

Automate the extraction of thousands of CSV endpoints systematically, rather than clicking through the interface.

Schema Versioning

Track changes in indicator definitions or unit measurements over time with strict schema validation.
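To illustrate the cross-dataset mapping described above, here is a minimal sketch of joining two indicators on entity code and year. The row shape (`entity_code`, `year`, `value`) and the `join_indicators` helper are assumptions made for illustration, not the production schema. The India figures reuse the sample country profile; the Brazil row is a placeholder with no real value.

```python
def join_indicators(left, right, left_name, right_name):
    """Inner-join two long-format row lists on (entity_code, year)."""
    # Index the right-hand indicator by its join key for O(1) lookups.
    right_index = {(r["entity_code"], r["year"]): r["value"] for r in right}
    joined = []
    for row in left:
        key = (row["entity_code"], row["year"])
        if key in right_index:
            joined.append({
                "entity_code": row["entity_code"],
                "year": row["year"],
                left_name: row["value"],
                right_name: right_index[key],
            })
    return joined


# Hypothetical long-format rows for two indicators.
population = [
    {"entity_code": "IND", "year": 2023, "value": 1428627663},
    {"entity_code": "BRA", "year": 2021, "value": None},  # placeholder row, no real figure
]
gdp_per_capita = [
    {"entity_code": "IND", "year": 2023, "value": 7112.0},
]

joined = join_indicators(population, gdp_per_capita, "population", "gdp_per_capita")
print(joined)
```

Only the India row survives the join, because the Brazil row has no matching entity code and year on the right-hand side. In a warehouse the same join would normally be written in SQL over the delivered tables.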

// engagement pipeline

From topic URL to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide target topics, specific chart URLs, or indicator names. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy parsers to handle Grapher JSON state extraction and time-series flattening.

Validation & QA

Days 4–6

Schema validation, unit consistency checks, and entity mapping verification before full launch.

Delivery

ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
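The Validation & QA step above can be sketched roughly as follows. This is an illustrative check under stated assumptions, not the actual QA suite: the `SCHEMA` mapping, the `validate_rows` helper, and the rule that an indicator must keep a single unit are all hypothetical, and the field names are borrowed from the Time-Series Indicators sample record.

```python
import re

# Assumed field types, taken from the Time-Series Indicators sample record.
SCHEMA = {
    "indicator_id": str,
    "entity_code": str,
    "year": int,
    "value": (int, float),
    "unit": str,
}
ISO3_SHAPE = re.compile(r"[A-Z]{3}")  # shape check only, not a membership check


def validate_rows(rows):
    """Return a list of human-readable problems; an empty list means the rows pass."""
    errors = []
    unit_by_indicator = {}
    for i, row in enumerate(rows):
        # 1. Schema validation: every field present and of the expected type.
        for field, expected in SCHEMA.items():
            value = row.get(field)
            if field not in row:
                errors.append(f"row {i}: missing {field}")
            elif isinstance(value, bool) or not isinstance(value, expected):
                errors.append(f"row {i}: {field} is {type(value).__name__}")
        # 2. Entity mapping verification: code must look like ISO alpha-3.
        code = row.get("entity_code")
        if isinstance(code, str) and not ISO3_SHAPE.fullmatch(code):
            errors.append(f"row {i}: entity_code {code!r} is not alpha-3 shaped")
        # 3. Unit consistency: one unit per indicator across all rows.
        indicator, unit = row.get("indicator_id"), row.get("unit")
        if isinstance(indicator, str) and isinstance(unit, str):
            first = unit_by_indicator.setdefault(indicator, unit)
            if unit != first:
                errors.append(f"row {i}: unit {unit!r} differs from {first!r}")
    return errors


good = {"indicator_id": "child_mortality_rate", "entity_code": "BRA",
        "year": 2021, "value": 1.44, "unit": "%"}
bad = {"indicator_id": "child_mortality_rate", "entity_code": "br",
       "year": "2021", "value": 1.44, "unit": "per 1,000"}

for problem in validate_rows([good, bad]):
    print(problem)
```

The second row trips all three checks: a string year, a lower-case two-letter code, and a unit that disagrees with the first row. Aggregate entities such as "World" have no ISO code and would need a separate allowance in a real check.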

Under the hood

How our pipeline handles interactive data

Extracting data from scientific publications requires structural parsing rather than simple HTML scraping. Here is how we build reliable pipelines for Our World in Data.

// fingerprinting

Identity rotation

TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (running)

// observability

Pipeline health

Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%

State extraction

Parsing embedded Grapher JSON

The charts on Our World in Data are powered by a custom visualization tool called Grapher. We do not attempt to scrape the SVG elements. Instead, we extract and parse the complete JSON state object embedded in the page source, retrieving the exact underlying values.
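As a rough illustration of the idea, the sketch below pulls a JSON object out of an inline script using only the standard library. The variable name `window._OWID_GRAPHER_CONFIG`, the keys in the sample, and the `extract_embedded_json` helper are assumptions for this example; the real page markup may differ and can change over time.

```python
import json
import re


def extract_embedded_json(html: str, variable: str) -> dict:
    """Return the JSON object assigned to `variable` inside an inline script."""
    # Locate "variable =" and hand everything after it to the JSON decoder.
    match = re.search(re.escape(variable) + r"\s*=\s*", html)
    if match is None:
        raise ValueError(f"{variable!r} not found in page source")
    # raw_decode parses exactly one JSON value starting at the given index and
    # ignores what follows (the trailing ";" and the rest of the script).
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


# Hypothetical page fragment; the keys are illustrative, not a documented schema.
SAMPLE_HTML = """
<html><body><script>
window._OWID_GRAPHER_CONFIG = {"slug": "co2-emissions-by-region", "type": "StackedArea"};
</script></body></html>
"""

config = extract_embedded_json(SAMPLE_HTML, "window._OWID_GRAPHER_CONFIG")
print(config["slug"], config["type"])
```

Using `raw_decode` avoids a fragile regular expression for matching balanced braces. It only works when the embedded value is strict JSON; a JavaScript object literal with unquoted keys would raise `json.JSONDecodeError`.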

Data modelling

Flattening nested time-series

Grapher stores data in highly nested arrays optimized for frontend rendering. Our pipeline flattens these structures into standard long-format tables (Entity, Year, Value, Indicator), ready for SQL aggregation.
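A minimal sketch of that flattening step is shown below. The nested input shape (`series` with parallel `years` and `values` arrays) is a hypothetical stand-in for the Grapher structure, and `flatten_series` is an illustrative helper. The World values reuse the numbers from the sample chart record purely for illustration; the "Example" entity is a placeholder.

```python
from typing import Iterator


def flatten_series(chart: dict) -> Iterator[dict]:
    """Yield one long-format row per (entity, year) pair."""
    indicator = chart["indicator_name"]
    for series in chart["series"]:
        years, values = series["years"], series["values"]
        if len(years) != len(values):
            raise ValueError(
                f"{series['entity']}: {len(years)} years but {len(values)} values"
            )
        for year, value in zip(years, values):
            if value is None:
                continue  # skip gaps rather than emit null rows
            yield {
                "entity": series["entity"],
                "year": year,
                "value": value,
                "indicator": indicator,
            }


# Hypothetical nested input.
nested = {
    "indicator_name": "Annual CO2 emissions (zero filled)",
    "series": [
        {"entity": "World", "years": [2019, 2020, 2021],
         "values": [37000000000, 35000000000, 37120000000]},
        {"entity": "Example", "years": [2020, 2021], "values": [None, 1.0]},
    ],
}

rows = list(flatten_series(nested))
for row in rows:
    print(row)
```

Each output row carries the four columns named above (Entity, Year, Value, Indicator), so the result can be loaded into a warehouse table without further reshaping.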

Entity resolution

Standardising geographical dimensions

We map internal entity IDs and historical country names to standard ISO codes. This ensures the extracted data can be joined immediately with your internal geographical datasets.
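One simple way to picture this mapping is a lookup table with an alias layer, as in the sketch below. The tables here are a tiny hand-picked subset, and the `resolve_iso3` helper and its normalisation rules are assumptions for illustration rather than the production dictionary.

```python
import unicodedata
from typing import Optional

# Tiny illustrative subset of a name -> ISO 3166-1 alpha-3 table.
ISO3_BY_NAME = {
    "india": "IND",
    "brazil": "BRA",
    "czechia": "CZE",
    "cote d'ivoire": "CIV",
    "turkiye": "TUR",
}
# Alternative or historical spellings mapped onto the canonical key.
ALIASES = {
    "czech republic": "czechia",
    "ivory coast": "cote d'ivoire",
    "turkey": "turkiye",
}


def normalise(name: str) -> str:
    """Lower-case, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def resolve_iso3(name: str) -> Optional[str]:
    """Return the alpha-3 code for an entity name, or None if it has no match."""
    key = normalise(name)
    key = ALIASES.get(key, key)
    return ISO3_BY_NAME.get(key)


for entity in ["Côte d'Ivoire", "Ivory Coast", "Czech Republic", "World"]:
    print(entity, "->", resolve_iso3(entity))
```

Aggregates such as "World" return `None` here because they have no ISO country code; a real pipeline would need an explicit policy for them, for example a separate code namespace.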

Provenance tracking

Preserving source metadata

Scientific data requires strict provenance. We extract and link the specific primary source description, methodology notes, and academic citations for every indicator extracted.

Update monitoring

Efficient change detection

We monitor the GitHub repository and internal API endpoints for dataset updates. Pipelines only process and deliver data when the underlying source material has been modified.
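The technical spec on this page describes change detection as hash-based diffs. A minimal sketch of that technique, assuming each dataset is available as a JSON-serialisable record and that previously seen hashes are kept in a simple mapping (both assumptions; `fingerprint` and `changed_ids` are illustrative names), could look like this:

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Hash a record in a canonical form so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_ids(datasets: dict, seen: dict) -> list:
    """Return ids whose content hash differs from the stored one, updating `seen`."""
    changed = []
    for dataset_id, record in datasets.items():
        digest = fingerprint(record)
        if seen.get(dataset_id) != digest:
            changed.append(dataset_id)
            seen[dataset_id] = digest
    return changed


# Hypothetical state store; in practice this would live in a database.
seen_hashes = {}
snapshot = {"global-energy-substitution": {"version": "v1.2", "publication_date": "2023-06-26"}}

first_run = changed_ids(snapshot, seen_hashes)   # everything is new
second_run = changed_ids(snapshot, seen_hashes)  # nothing changed

snapshot["global-energy-substitution"]["version"] = "v1.3"  # simulated upstream update
third_run = changed_ids(snapshot, seen_hashes)

print(first_run, second_run, third_run)
```

Serialising with `sort_keys=True` makes the hash depend on content rather than on key order, which avoids false positives when the upstream JSON is re-serialised without any real change.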

Applications

Who uses global indicator data

Teams across industries use ourworldindata.org data to build competitive products and smarter operations.

ESG Reporting & Compliance

Corporate sustainability teams integrate global carbon emissions and energy substitution data into their ESG models.

Academic Research

Universities automate the collection of demographic and health indicators for large-scale epidemiological studies.

Policy Analysis

Think tanks track economic growth and poverty metrics across regions to evaluate the impact of international policy interventions.

Macroeconomic Forecasting

Financial analysts ingest historical population and GDP data to train long-term macroeconomic prediction models.

Climate Tech Modeling

Startups use historical temperature anomalies and renewable energy adoption rates to validate climate risk models.

Global Health Tracking

Healthcare organisations monitor disease prevalence and vaccination rates to allocate resources effectively.

Why DataFlirt

"Our World in Data aggregates the most critical metrics of human progress, but transforming their interactive charts into queryable warehouse tables requires precise pipeline engineering."

Extracting data from OWID requires parsing complex Grapher state objects, normalising uneven time-series arrays, and preserving hierarchical source citations. DataFlirt handles the extraction and schema normalisation so your analysts can query the data immediately without manual wrangling.

Technical Spec

Our World in Data scraper - technical capabilities

Everything supported by our ourworldindata.org scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

Capability	Description	Status
Grapher JSON extraction	Direct parsing of embedded state objects for accurate values	Supported
ISO country code normalisation	Automatic mapping of entities to standard ISO 3166-1 alpha-3	Supported
Historical time-series	Full historical depth extraction for all available indicators	Supported
Source citation mapping	Extraction of methodology notes and primary source URLs	Supported
Change detection (diffs)	Hash-based diffs to only emit updated datasets	Supported
Webhook delivery	HTTP POST upon dataset update detection	Supported
Chart image generation	Capture of high-resolution SVG or PNG chart exports	Supported
Proprietary third-party microdata	Raw UN or World Bank microdata not exposed in Grapher	Partial
Unpublished draft charts	Access to internal draft visualisations requiring author credentials	Partial

Infrastructure

Infrastructure powering the extraction pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus, dbt, Snowflake

Scrapy Extraction Stack

Scrapy handles high-throughput extraction of static HTML and embedded JSON state objects, bypassing the need for heavy browser rendering where possible.

Data Transformation Layer

Python 3.12 workers process nested JSON arrays, applying entity normalisation and flattening time-series data into strict columnar schemas before delivery.

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Nested structures preserving metadata and citations

CSV

Flat long-format tables ideal for immediate analysis

XLS

Excel-compatible format for business analysts

Parquet

Columnar format for BigQuery, Snowflake, Athena

AWS S3

Direct bucket delivery compatible with any data lake

Webhook

HTTP POST payload triggered on dataset updates

API

REST endpoint to query extracted indicators on demand

BigQuery

Streamed directly into your dataset with schema auto-detect

Snowflake

Stage and COPY INTO workflow for enterprise warehouses
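For the webhook option listed above, a receiving team may want to know roughly what an update notification looks like. The sketch below builds (but does not send) such a request with the standard library. The payload fields, the `X-Signature-SHA256` header, the HMAC signing step, and the `build_webhook_request` helper are all assumptions for illustration; the actual payload contract would be agreed during scoping.

```python
import hashlib
import hmac
import json
import urllib.request


def build_webhook_request(url: str, secret: bytes, dataset_id: str, version: str):
    """Build a signed HTTP POST announcing that a dataset has been updated."""
    body = json.dumps(
        {"event": "dataset.updated", "dataset_id": dataset_id, "version": version},
        sort_keys=True,
    ).encode("utf-8")
    # Sign the exact bytes being sent so the receiver can verify their origin.
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Signature-SHA256": signature},
    )


# Placeholder endpoint and secret; nothing is sent over the network here.
request = build_webhook_request(
    "https://example.com/hooks/owid",
    b"shared-secret",
    "global-energy-substitution",
    "v1.2",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would deliver it. The receiver can recompute the HMAC over the raw body with the shared secret and compare it with the header using `hmac.compare_digest`.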


// faq

Common questions.

About ourworldindata.org scraping, legality, and pipeline operations.

Ask us directly →

Is scraping Our World in Data legal?

Yes. Our World in Data publishes its content under a Creative Commons BY license. All data and visualisations are open access. DataFlirt automates the extraction of this public data while adhering to standard rate limits to respect their infrastructure.

How do you extract data from the interactive charts?

We do not scrape the visual SVG elements. Instead, we parse the underlying Grapher configuration and data arrays embedded directly within the page source, so the extracted values match the published numbers exactly rather than being estimated from rendered shapes.

Can you map the data to standard country codes?

Yes. We maintain a mapping dictionary that translates OWID entity names to standard ISO 3166-1 alpha-3 codes, allowing you to join the extracted data directly with your existing geographical tables.

How often is the data updated?

We can configure pipelines to run daily, weekly, or monthly. We monitor dataset metadata for update timestamps and only process diffs to ensure your warehouse always reflects the latest available statistics.

Do you extract historical data or just the latest year?

We extract the complete time-series available for every indicator. If a dataset spans from 1800 to 2023, every available data point is captured and structured in the final output.

What is the minimum viable engagement?

Our minimum engagement typically covers the extraction of a specific thematic dataset cluster (e.g., all Climate Change or Energy indicators) delivered on a recurring schedule. Contact us for a scoped quote based on volume.

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off extraction of demographic metrics or a continuous feed of climate data updates, we scope, build, and operate the pipeline. Tell us what you need.

Start an ourworldindata.org pipeline → View pricing

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h

