SYSTEM: all green · source: oecd.org · queue: 12,941 datasets · p99 latency: 312ms · dataflirt.com · scraper/oecd-org
RUN · 38 active pipelines · oecd.org live

Global economic data,
at warehouse scale.

We extract economic indicators, policy trackers, statistical databases, and publications from the OECD Data Explorer. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Indicators tracked: 8,492
Data points parsed: 4.7M /day
Publications extracted: 112K /run
Active pipelines: 38
Uptime: 99.98%
Data Dictionary

Every field we extract from oecd.org

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Economic Indicators objects from oecd.org. All fields typed and schema-versioned.

indicator_id, indicator_name, subject, measure, frequency, country_code, time_period, value, unit, flags, source_database
economic_indicators
● 200 OK
{
  "indicator_id": "DP_LIVE",
  "indicator_name": "Gross domestic product (GDP)",
  "subject": "TOT",
  "measure": "MLN_USD",
  "frequency": "A",
  "country_code": "GBR",
  "time_period": "2023",
  "value": 3421000.5
}

Complete list of extractable fields for Publications & Reports objects from oecd.org. All fields typed and schema-versioned.

publication_id, title, authors, publication_date, isbn, doi, abstract, topics, language, pdf_url, page_count
publications_reports
● 200 OK
{
  "publication_id": "9789264111111-en",
  "title": "OECD Economic Outlook",
  "authors": ["OECD"],
  "publication_date": "2023-11-29",
  "isbn": "9789264111111",
  "doi": "10.1787/12345678-en",
  "abstract": "Analysis of major economic trends.",
  "language": "English"
}

Complete list of extractable fields for PISA Education Data objects from oecd.org. All fields typed and schema-versioned.

cycle_year, country, region, math_score, reading_score, science_score, equity_index, student_count, school_count, male_score, female_score
pisa_education_data
● 200 OK
{
  "cycle_year": "2022",
  "country": "Japan",
  "math_score": 536,
  "reading_score": 516,
  "science_score": 547,
  "equity_index": 0.85,
  "student_count": 6500,
  "male_score": 540
}

Complete list of extractable fields for Tax Policy Data objects from oecd.org. All fields typed and schema-versioned.

tax_domain, country, year, tax_type, revenue_value, currency, percentage_gdp, percentage_total_tax, statutory_rate, administration_level, dataset_url
tax_policy_data
● 200 OK
{
  "tax_domain": "Corporate Tax",
  "country": "FRA",
  "year": "2022",
  "tax_type": "Income and Profits",
  "revenue_value": 85000.0,
  "currency": "EUR",
  "percentage_gdp": 2.8,
  "statutory_rate": 25.8
}

Complete list of extractable fields for Environmental Indicators objects from oecd.org. All fields typed and schema-versioned.

indicator_type, pollutant, country, year, emission_volume, unit, sector, trend_percentage, target_value, protocol_status, data_source
environmental_indicators
● 200 OK
{
  "indicator_type": "Air and Climate",
  "pollutant": "CO2",
  "country": "DEU",
  "year": "2022",
  "emission_volume": 650.5,
  "unit": "Million tonnes",
  "sector": "Energy",
  "trend_percentage": -2.4
}

Capabilities

Everything you need from OECD.org — nothing you don't

Our OECD scraper handles every layer of the platform: statistical databases, policy trackers, and publication metadata. We bypass complex frontend rendering to extract raw SDMX and JSON arrays.

OECD Data Explorer Extraction

Parse the modern OECD Data Explorer interface, extracting multi-dimensional datasets across all available filters and time periods.

Time-Series Normalisation

Convert complex SDMX structures and pivot tables into flat, queryable time-series records suitable for immediate warehouse ingestion.

Publication Metadata Scraping

Extract titles, abstracts, DOIs, ISBNs, and author lists from the OECD iLibrary, including direct links to open-access PDF assets.

Country Profile Aggregation

Compile unified datasets per member and non-member country, tracking GDP, inflation, and unemployment metrics in a single schema.

Tax Database Parsing

Extract historical statutory tax rates, revenue statistics, and corporate tax policy data across all 38 member states.

PISA Dataset Mining

Extract granular education metrics, gender breakdowns, and regional performance scores from the Programme for International Student Assessment.

Environmental Indicator Tracking

Monitor greenhouse gas emissions, renewable energy adoption, and policy stringency indices updated quarterly.

SDMX & JSON API Navigation

Bypass frontend rendering entirely where possible, hitting underlying SDMX endpoints for higher throughput and schema stability.

Scheduled + Streaming Modes

Run one-off bulk exports or configure continuous pipelines at monthly or quarterly cadences aligned with OECD release schedules.
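
As a rough illustration of the country profile aggregation described above, the sketch below pivots flat indicator rows into one record per country and period. The `metric` field, the metric names, and the values are placeholders chosen for the example, not the production schema.

```python
from collections import defaultdict


def build_country_profiles(records):
    """Pivot flat indicator rows into one dict per (country, period)."""
    profiles = defaultdict(dict)
    for rec in records:
        key = (rec["country_code"], rec["time_period"])
        # Each metric becomes a column on that country's profile row.
        profiles[key][rec["metric"]] = rec["value"]
    return [
        {"country_code": country, "time_period": period, **metrics}
        for (country, period), metrics in sorted(profiles.items())
    ]


# Placeholder rows: metric names and values are made up for illustration.
rows = [
    {"country_code": "GBR", "time_period": "2023", "metric": "gdp", "value": 1.0},
    {"country_code": "GBR", "time_period": "2023", "metric": "inflation", "value": 2.0},
    {"country_code": "FRA", "time_period": "2023", "metric": "unemployment", "value": 3.0},
]
profiles = build_country_profiles(rows)
```

Metrics missing for a country are simply absent from its row here; a real pipeline would probably emit explicit nulls so every row shares the same columns.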

// engagement pipeline

From indicator list to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide dataset URLs, indicator codes, or subject areas. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure SDMX parsers, API pagination logic, and rate-limit handling for oecd.org.

Validation & QA
Days 4–6

Schema validation, unit normalisation checks, and missing data detection before full launch.

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
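
The Validation & QA step can be pictured with a small check like the one below. It is a minimal sketch that assumes a subset of the economic_indicators fields and guesses at their types; `EXPECTED_TYPES` and `validate_record` are illustrative names, not part of any published tooling.

```python
# Assumed field types for a subset of the economic_indicators schema.
EXPECTED_TYPES = {
    "indicator_id": str,
    "country_code": str,
    "time_period": str,
    "value": float,
    "unit": str,
}


def validate_record(record, expected=EXPECTED_TYPES):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field, expected_type in expected.items():
        if record.get(field) is None:
            # Covers both an absent key and an explicit null.
            problems.append(f"missing: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type: {field} (expected {expected_type.__name__})"
            )
    return problems
```

Integer values are flagged as the wrong type for `value` in this sketch, on the assumption that numeric coercion happens in an earlier step.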

Under the hood

How our OECD pipeline handles the hard parts

OECD data structures are notoriously complex. Here is how we operationalise them.

pipeline-monitor · oecd.org · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued · running
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
Complex data models
SDMX and multi-dimensional cubes

OECD datasets use deep multi-dimensional structures. We flatten these SDMX cubes into relational formats, unrolling dimensions like country, time, and measure into standard database columns.
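
To make the flattening concrete, here is a minimal sketch that unrolls a payload shaped like SDMX-JSON's flat observation layout, where each observation key is a colon-separated list of indexes into the dimension value lists. The sample payload is heavily simplified and its values are placeholders; real responses carry more nesting, and the exact layout depends on the SDMX-JSON version.

```python
def flatten_observations(message):
    """Turn index-keyed observations into one flat dict per observation."""
    dims = message["structure"]["dimensions"]["observation"]
    rows = []
    for key, obs in message["dataSets"][0]["observations"].items():
        # "0:0:1" means value 0 of dimension 0, value 0 of dimension 1, and so on.
        indexes = [int(i) for i in key.split(":")]
        row = {dim["id"]: dim["values"][idx]["id"] for dim, idx in zip(dims, indexes)}
        row["value"] = obs[0]  # first array element holds the observed value
        rows.append(row)
    return rows


# Simplified, hypothetical payload for illustration only.
sample = {
    "structure": {"dimensions": {"observation": [
        {"id": "REF_AREA", "values": [{"id": "GBR"}, {"id": "FRA"}]},
        {"id": "MEASURE", "values": [{"id": "MLN_USD"}]},
        {"id": "TIME_PERIOD", "values": [{"id": "2022"}, {"id": "2023"}]},
    ]}},
    "dataSets": [{"observations": {"0:0:1": [3421000.5], "1:0:0": [1.0]}}],
}
flat_rows = flatten_observations(sample)
```

Each resulting row carries one column per dimension plus `value`, which is the shape a warehouse table expects.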

Dynamic frontend rendering
React-based Data Explorer navigation

The new OECD Data Explorer relies heavily on client-side state. We map the underlying API calls and session tokens to extract raw JSON rather than scraping the DOM.

Pagination limits
Bypassing 1M row export caps

Many OECD interfaces cap CSV exports at 1 million rows. We paginate programmatically through the backend APIs, extracting full historical datasets without truncation.
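
One way to stay under a row cap is to split a long history into smaller time windows and request each separately. The sketch below assumes the backend accepts the standard SDMX REST `startPeriod` and `endPeriod` query parameters; `fetch_window` is a hypothetical callable standing in for the actual HTTP request.

```python
def period_windows(start_year, end_year, span):
    """Yield inclusive (start, end) year windows covering the full range."""
    if span < 1:
        raise ValueError("span must be at least 1 year")
    year = start_year
    while year <= end_year:
        window_end = min(year + span - 1, end_year)
        yield year, window_end
        year = window_end + 1


def fetch_all(fetch_window, start_year, end_year, span=10):
    """Collect rows window by window; fetch_window performs the real request."""
    rows = []
    for first, last in period_windows(start_year, end_year, span):
        params = {"startPeriod": str(first), "endPeriod": str(last)}
        rows.extend(fetch_window(params))
    return rows
```

The window size would be tuned per dataset so that no single response approaches the cap.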

Schema volatility
Handling indicator deprecation

OECD frequently updates indicator codes and measurement units. Our pipeline detects schema drift, mapping legacy codes to current identifiers and alerting on unit changes.
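
A simplified sketch of that drift check follows. The legacy code mapping and the unit snapshot are hypothetical inputs; in practice both would be maintained from earlier runs.

```python
# Hypothetical mapping from a retired indicator code to its replacement.
LEGACY_CODE_MAP = {"OLD_GDP": "GDP_V2"}


def detect_drift(records, known_units, legacy_map=LEGACY_CODE_MAP):
    """Remap legacy indicator codes and report unit changes since the last run."""
    remapped, alerts = [], []
    for rec in records:
        code = legacy_map.get(rec["indicator_id"], rec["indicator_id"])
        previous_unit = known_units.get(code)
        if previous_unit is not None and previous_unit != rec["unit"]:
            alerts.append(
                f"{code}: unit changed from {previous_unit} to {rec['unit']}"
            )
        # Copy the record so the caller's input is left untouched.
        remapped.append({**rec, "indicator_id": code})
    return remapped, alerts
```

Indicators seen for the first time produce no alert here, since there is no earlier unit to compare against.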

Rate limiting
Polite but parallel extraction

While public, OECD infrastructure enforces strict rate limits. We distribute requests across our proxy pool and implement exponential backoff to ensure reliable, continuous extraction.
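
Exponential backoff can be sketched as follows. This is a generic retry loop with jitter rather than the production implementation; `TransientError` is a hypothetical exception standing in for a rate-limit response, and `request` is any callable that performs the fetch.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as HTTP 429."""


def backoff_delays(retries, base=1.0, cap=60.0):
    """Upper bounds for each wait: base * 2^attempt, capped."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_backoff(request, retries=5, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call request(), waiting a jittered, growing delay after each failure."""
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return request()
        except TransientError:
            if attempt == retries:
                raise  # retries exhausted, surface the error
            # Jitter: wait a random fraction of the current bound.
            sleep(delays[attempt] * rng())
```

Passing `sleep` and `rng` as parameters keeps the function testable without real waiting.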

Applications

Who uses OECD data — and how

Teams across industries use oecd.org data to build competitive products and smarter operations.

01
Macroeconomic Forecasting

Quant funds and economists ingest historical GDP, CPI, and employment data to train macro models.

02
Policy Research

Think tanks and academic researchers track tax policy shifts, environmental regulations, and healthcare spending across member states.

03
ESG Scoring

ESG analysts integrate OECD environmental indicators and social metrics into proprietary corporate scoring frameworks.

04
Sovereign Debt Analysis

Fixed income teams monitor fiscal balances, debt-to-GDP ratios, and structural deficit metrics for sovereign bond pricing.

05
Education Sector Strategy

EdTech companies and policymakers analyse PISA scores to identify regional performance gaps and curriculum efficacy.

06
Supply Chain Risk

Procurement teams track trade balances, foreign direct investment, and production indices to assess geopolitical risk.

Why DataFlirt

"The OECD publishes the definitive datasets for global macroeconomics and policy, but their multi-dimensional cubes require heavy engineering to flatten and operationalise."

Extracting data from the OECD Data Explorer involves handling complex SDMX formats, undocumented API pagination, and strict export limits. DataFlirt handles the extraction, normalisation, and delivery, so your quantitative analysts receive clean, flat time-series records ready for immediate modelling.

Technical Spec

OECD scraper — technical capabilities

Everything supported by our oecd.org scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

SDMX API extraction · Direct parsing of OECD's SDMX-JSON and SDMX-ML endpoints · Supported
Data Explorer scraping · Extraction from the modern React-based frontend interface · Supported
Time-series flattening · Conversion of multi-dimensional cubes to flat relational rows · Supported
Historical data extraction · Full extraction of time-series data dating back to the 1960s where available · Supported
Publication metadata · Title, author, DOI, and abstract extraction from OECD iLibrary · Supported
PDF document scraping · Direct extraction of text or tables embedded inside OECD PDF reports · Partial
Change detection (diffs) · Hash-based diff: only emit records with changed values since last run · Supported
Premium iLibrary content · Extraction of gated publications requiring institutional subscription · Partial
Format conversion · Delivery in JSON, CSV, Parquet, or direct database insert · Supported
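
The hash-based change detection listed in the spec above could look roughly like this. It is a sketch under assumptions: the choice of SHA-256, the key fields, and the in-memory hash store are illustrative, and a real pipeline would persist hashes between runs.

```python
import hashlib
import json


def record_hash(record):
    """Stable digest of a record: keys sorted so field order does not matter."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_records(records, previous_hashes,
                    key_fields=("indicator_id", "country_code", "time_period")):
    """Return (records that are new or changed, hashes to store for next run)."""
    changed, current_hashes = [], {}
    for rec in records:
        key = "|".join(str(rec[field]) for field in key_fields)
        digest = record_hash(rec)
        current_hashes[key] = digest
        if previous_hashes.get(key) != digest:
            changed.append(rec)
    return changed, current_hashes
```

On the first run `previous_hashes` is empty, so every record is emitted; later runs emit only rows whose content differs.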
Infrastructure

Infrastructure powering the OECD pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus
API & SDMX Parsing Engine

Custom parsers designed specifically for multi-dimensional statistical data. We bypass frontend rendering to query OECD's backend APIs directly for maximum throughput.

Distributed Request Architecture

Pipelines run on Kubernetes clusters with intelligent rate-limit handling. We respect institutional infrastructure while maintaining strict delivery SLAs.

Cloud-Native Orchestration

Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested — schema versioned per run
CSV
Flat file with typed columns — Excel/Sheets compatible
XLS
Standard Excel format for smaller datasets and manual analysis
Parquet
Columnar format for BigQuery, Snowflake, Athena
AWS S3
Direct bucket delivery — compatible with any data lake
Webhook
HTTP POST per record for real-time downstream processing
API
Query extracted data via our managed REST endpoints
Snowflake
Stage + COPY INTO workflow — incremental or full-replace
// faq

Common questions.

About oecd.org scraping, legality, and pipeline operations.

Ask us directly →
Is it legal to scrape OECD data?

Yes. OECD data is generally public domain or available under open licences intended for public use and dissemination. We strictly extract publicly accessible statistical data and publication metadata, adhering to their open data guidelines.

How do you handle the new OECD Data Explorer?

The Data Explorer uses complex client-side rendering. Rather than scraping the DOM, our pipeline reverse-engineers the underlying API requests, extracting the raw SDMX-JSON responses for higher accuracy and stability.

Can you extract data across all member countries simultaneously?

Yes. We configure pipelines to iterate through all standard country codes (both OECD members and tracked non-members), compiling unified time-series datasets.

How do you deal with the 1 million row export limit?

We bypass frontend export limitations entirely by paginating through the backend APIs programmatically, allowing us to extract multi-million row datasets without truncation.

Do you support custom frequency extraction (monthly, quarterly, annual)?

Yes. We extract all available frequencies for a given indicator, normalising the time-period formatting into standard ISO timestamps.
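
As an illustration of that normalisation, the sketch below maps common SDMX-style period strings (annual `2023`, quarterly `2023-Q3`, monthly `2023-11` or `2023-M11`) to the ISO date on which the period starts. Using the period start as the timestamp is an assumption made for this example.

```python
import re
from datetime import date


def period_start(time_period):
    """Return the first day of an annual, quarterly, or monthly period."""
    if m := re.fullmatch(r"(\d{4})", time_period):
        return date(int(m[1]), 1, 1)
    if m := re.fullmatch(r"(\d{4})-Q([1-4])", time_period):
        # Quarter 1 starts in month 1, quarter 2 in month 4, and so on.
        return date(int(m[1]), 3 * (int(m[2]) - 1) + 1, 1)
    if m := re.fullmatch(r"(\d{4})-M?(\d{2})", time_period):
        return date(int(m[1]), int(m[2]), 1)
    raise ValueError(f"unrecognised time period: {time_period!r}")


iso_value = period_start("2023-Q3").isoformat()  # "2023-07-01"
```

Unrecognised formats raise `ValueError` rather than being guessed at, so they surface during QA.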

Can you extract the full text of OECD publications?

We extract comprehensive metadata (abstracts, authors, DOIs). We do not extract gated full-text PDFs that require institutional iLibrary subscriptions.

How often is the data refreshed?

We align our extraction cadences with OECD's publication schedule. Pipelines can run daily, weekly, or monthly depending on the specific indicator's update frequency.

$ dataflirt scope --new-project --source=oecd.org ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off historical export or continuous macro indicator feeds — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h