We extract time-series datasets, country profiles, Grapher chart data, and source metadata from Our World in Data. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Chart Data (Grapher) objects from ourworldindata.org. All fields typed and schema-versioned.
"chart_id": "co2-emissions-by-region", "title": "Annual CO2 emissions", "indicator_name": "Annual CO2 emissions (zero filled)", "entities": "['World', 'Asia', 'Europe']", "years": "[2019, 2020, 2021]", "values": "[37000000000, 35000000000, 37120000000]", "chart_type": "StackedArea"
| # | chart_id | title | subtitle | indicator_name | entities | years |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Country Profiles objects from ourworldindata.org. All fields typed and schema-versioned.
"country_code": "IND", "country_name": "India", "region": "Asia", "income_group": "Lower middle income", "population": 1428627663, "gdp_per_capita": 7112.0, "life_expectancy": 67.2, "latest_year": 2023
| # | country_code | country_name | region | income_group | population | gdp_per_capita |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Dataset Metadata objects from ourworldindata.org. All fields typed and schema-versioned.
"dataset_id": "global-energy-substitution", "name": "Global Primary Energy Consumption", "description": "Primary energy consumption by source, measured in terawatt-hours.", "primary_source": "Energy Institute Statistical Review of World Energy", "license": "CC BY 4.0", "publication_date": "2023-06-26", "version": "v1.2"
| # | dataset_id | name | description | primary_source | collection_method | update_frequency |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Time-Series Indicators objects from ourworldindata.org. All fields typed and schema-versioned.
"indicator_id": "child_mortality_rate", "indicator_name": "Child mortality rate", "entity_name": "Brazil", "entity_code": "BRA", "year": 2021, "value": 1.44, "unit": "%", "source_link": "https://ourworldindata.org/child-mortality"
| # | indicator_id | indicator_name | entity_name | entity_code | year | value |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Topic Pages objects from ourworldindata.org. All fields typed and schema-versioned.
"topic_slug": "poverty", "title": "Poverty", "author": "['Max Roser', 'Joe Hasell']", "publish_date": "2023-11-04", "related_charts": 42, "citation_count": 156, "last_updated": "2024-01-12T08:30:00Z"
| # | topic_slug | title | author | publish_date | related_charts | key_insights |
|---|---|---|---|---|---|---|
Our World in Data relies heavily on interactive Grapher components. Our pipeline extracts the underlying time-series data, normalises entity codes, and preserves critical source citations without manual CSV downloads.
Bypass the interactive UI. We parse the embedded Grapher JSON state to extract raw time-series arrays, entity mappings, and axis configurations.
Map proprietary entity names to standard ISO 3166-1 alpha-3 country codes for immediate joins in your data warehouse.
Extract primary sources, methodology notes, and academic citations linked to every indicator to maintain data provenance.
Transform nested year-value arrays into flat, queryable columnar formats ideal for SQL analysis.
Monitor dataset update timestamps and only sync modified indicators, saving compute and storage costs.
Aggregate all charts, insights, and datasets associated with high-level topics like Climate Change or Economic Growth.
Link indicators across different datasets using unified entity codes and temporal dimensions.
Automate the extraction of thousands of CSV endpoints systematically, rather than clicking through the interface.
Track changes in indicator definitions or unit measurements over time with strict schema validation.
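The schema validation and unit tracking described in the last item could look roughly like the sketch below. The field list mirrors the Time-Series Indicators sample record on this page; the `EXPECTED_TYPES` table and both function names are illustrative assumptions, not our production schema.

```python
from __future__ import annotations

# Assumed schema, modelled on the Time-Series Indicators sample record.
EXPECTED_TYPES = {
    "indicator_id": str,
    "entity_code": str,
    "year": int,
    "value": (int, float),
    "unit": str,
}


def validate_row(row: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the row passes."""
    errors = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in row:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(row[field], bool) or not isinstance(row[field], expected):
            errors.append(f"bad type for {field}: {type(row[field]).__name__}")
    return errors


def unit_drift(rows: list[dict]) -> list[str]:
    """Return indicator_ids that appear with more than one unit across rows."""
    units_seen: dict[str, set] = {}
    for row in rows:
        units_seen.setdefault(row["indicator_id"], set()).add(row["unit"])
    return sorted(ind for ind, units in units_seen.items() if len(units) > 1)
```

Rows that fail `validate_row` would be held back for review rather than delivered, and any indicator flagged by `unit_drift` signals a definition change worth investigating before the next sync.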
Brief in. Clean data out.
Provide target topics, specific chart URLs, or indicator names. We design the extraction schema together.
We configure Scrapy parsers to handle Grapher JSON state extraction and time-series flattening.
Schema validation, unit consistency checks, and entity mapping verification before full launch.
JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
Extracting data from scientific publications requires structural parsing rather than simple HTML scraping. Here is how we build reliable pipelines for Our World in Data.
The charts on Our World in Data are powered by a custom visualization tool called Grapher. We do not attempt to scrape the SVG elements. Instead, we extract and parse the complete JSON state object embedded in the page source, retrieving the exact underlying values.
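As a rough illustration of that approach, the sketch below pulls a JSON object out of raw page source using only the standard library. The marker `window._OWID_GRAPHER_CONFIG` is an assumption made for the example; the actual variable name and embedding format can change and should be confirmed against the live page.

```python
import json

# Assumed marker for illustration only; verify against the current page source.
STATE_MARKER = "window._OWID_GRAPHER_CONFIG"


def extract_embedded_state(html: str, marker: str = STATE_MARKER) -> dict:
    """Return the first JSON object that follows `marker` in the page source."""
    start = html.find(marker)
    if start == -1:
        raise ValueError(f"marker {marker!r} not found in page source")
    brace = html.find("{", start)
    if brace == -1:
        raise ValueError("no JSON object found after marker")
    # raw_decode parses exactly one JSON value starting at `brace` and
    # ignores whatever trails it (semicolon, closing script tag, ...).
    state, _end = json.JSONDecoder().raw_decode(html, brace)
    return state
```

Using `raw_decode` avoids a brittle regular expression for matching balanced braces: the JSON parser itself decides where the object ends.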
Grapher stores data in highly nested arrays optimized for frontend rendering. Our pipeline flattens these structures into standard long-format tables (Entity, Year, Value, Indicator), ready for SQL aggregation.
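A minimal sketch of the flattening step follows. It assumes a simplified nested shape, one entry per entity holding parallel `years` and `values` arrays; the real Grapher structures differ in detail, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Any


def flatten_series(indicator: str, series: dict[str, dict[str, list]]) -> list[dict[str, Any]]:
    """Turn {entity: {"years": [...], "values": [...]}} into long-format rows."""
    rows = []
    for entity, data in series.items():
        years, values = data["years"], data["values"]
        if len(years) != len(values):
            raise ValueError(f"{entity}: {len(years)} years but {len(values)} values")
        for year, value in zip(years, values):
            if value is None:
                # Design choice in this sketch: gaps are dropped, not zero-filled.
                continue
            rows.append(
                {"entity": entity, "year": year, "value": value, "indicator": indicator}
            )
    return rows
```

Each output row carries the four columns named above (Entity, Year, Value, Indicator), so the result loads directly into a warehouse table or a Parquet file.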
We map internal entity IDs and historical country names to standard ISO codes. This ensures the extracted data can be joined immediately with your internal geographical datasets.
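The mapping step can be pictured as a lookup table keyed on normalised names. The dictionary below is a tiny hand-written sample for illustration, not the full mapping we maintain; the function names are likewise hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Illustrative sample only: current and historical names map to the same code.
ISO3_BY_NAME = {
    "india": "IND",
    "brazil": "BRA",
    "czechia": "CZE",
    "czech republic": "CZE",
    "eswatini": "SWZ",
    "swaziland": "SWZ",
}


def to_iso3(entity_name: str) -> Optional[str]:
    """Look up an ISO 3166-1 alpha-3 code, ignoring case and extra whitespace."""
    key = " ".join(entity_name.split()).casefold()
    return ISO3_BY_NAME.get(key)


def attach_codes(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split rows into (mapped, unmapped), adding an `entity_code` field to each."""
    mapped, unmapped = [], []
    for row in rows:
        code = to_iso3(row["entity"])
        target = mapped if code is not None else unmapped
        target.append({**row, "entity_code": code})
    return mapped, unmapped
```

Aggregates such as "World" or "Asia" have no ISO country code, so in this sketch they land in the unmapped list for separate handling instead of being silently dropped.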
Scientific data requires strict provenance. We extract and link the specific primary source description, methodology notes, and academic citations for every indicator extracted.
We monitor the GitHub repository and internal API endpoints for dataset updates. Pipelines only process and deliver data when the underlying source material has been modified.
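In its simplest form, the change-detection logic compares each dataset's latest update timestamp with the one recorded at the previous sync. The sketch below assumes both are available as ISO 8601 strings in UTC keyed by dataset ID; how those timestamps are collected is outside its scope, and the function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-01-12T08:30:00Z'."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def select_modified(catalog: dict[str, str], last_synced: dict[str, str]) -> list[str]:
    """Return dataset IDs that are new or updated since the previous sync.

    Both arguments map dataset_id -> ISO timestamp; all timestamps are
    assumed to carry a timezone so they can be compared safely.
    """
    changed = []
    for dataset_id, updated_at in catalog.items():
        previous = last_synced.get(dataset_id)
        if previous is None or parse_ts(updated_at) > parse_ts(previous):
            changed.append(dataset_id)
    return sorted(changed)
```

Only the IDs returned by `select_modified` would be re-extracted and delivered; everything else is skipped for that run.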
Corporate sustainability teams integrate global carbon emissions and energy substitution data into their ESG models.
Universities automate the collection of demographic and health indicators for large-scale epidemiological studies.
Think tanks track economic growth and poverty metrics across regions to evaluate the impact of international policy interventions.
Financial analysts ingest historical population and GDP data to train long-term macroeconomic prediction models.
Startups use historical temperature anomalies and renewable energy adoption rates to validate climate risk models.
Healthcare organisations monitor disease prevalence and vaccination rates to allocate resources effectively.
"Our World in Data aggregates the most critical metrics of human progress, but transforming their interactive charts into queryable warehouse tables requires precise pipeline engineering."
Extracting data from OWID requires parsing complex Grapher state objects, normalising uneven time-series arrays, and preserving hierarchical source citations. DataFlirt handles the extraction and schema normalisation so your analysts can query the data immediately without manual wrangling.
Everything supported by our ourworldindata.org scraper: embedded Grapher state parsing, bulk CSV endpoint extraction, rate-limit-aware crawling, and more.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Scrapy handles high-throughput extraction of static HTML and embedded JSON state objects, bypassing the need for heavy browser rendering where possible.
Python 3.12 workers process nested JSON arrays, applying entity normalisation and flattening time-series data into strict columnar schemas before delivery.
Pipelines run on AWS Lambda for burst workloads and ECS for sustained ones. Airflow handles scheduling, dependency management, and SLA alerting. All pipeline state is stored in managed Postgres.
Data delivered to where your team already works — no new tooling required.
About ourworldindata.org scraping, legality, and pipeline operations.
Yes. Our World in Data publishes its content under a Creative Commons BY license. All data and visualisations are open access. DataFlirt automates the extraction of this public data while adhering to standard rate limits to respect their infrastructure.
We do not scrape the visual SVG elements. Instead, we parse the underlying Grapher configuration and data arrays embedded directly within the page source, so the extracted numbers are the published values themselves rather than estimates read off a rendered chart.
Yes. We maintain a mapping dictionary that translates OWID entity names to standard ISO 3166-1 alpha-3 codes, allowing you to join the extracted data directly with your existing geographical tables.
We can configure pipelines to run daily, weekly, or monthly. We monitor dataset metadata for update timestamps and only process diffs to ensure your warehouse always reflects the latest available statistics.
We extract the complete time-series available for every indicator. If a dataset spans from 1800 to 2023, every available data point is captured and structured in the final output.
Our minimum engagement typically covers the extraction of a specific thematic dataset cluster (e.g., all Climate Change or Energy indicators) delivered on a recurring schedule. Contact us for a scoped quote based on volume.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off extraction of demographic metrics or a continuous feed of climate data updates, we scope, build, and operate the pipeline. Tell us what you need.