
Global city data, structured for analysis.

We extract cost of living indices, rent prices, crime statistics, and quality of life metrics across thousands of cities on Numbeo. Delivered as clean JSON, CSV, or Parquet to your warehouse.

Cities tracked: 11,482
Price items: 3.2M per run
Index updates: 84K per week
Active pipelines: 31
Uptime: 99.98%

Data Dictionary

Every field we extract from numbeo.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Cost of Living objects from numbeo.com. All fields typed and schema-versioned.

Fields: city, country, currency, meal_inexpensive, meal_midrange, milk_1l, bread_500g, eggs_12, chicken_1kg, apples_1kg, local_cheese_1kg, water_1_5l, data_contributors, last_update_date
Sample record · Cost of Living (selected fields)
{
  "city": "London",
  "country": "United Kingdom",
  "currency": "GBP",
  "meal_inexpensive": 20.0,
  "milk_1l": 1.25,
  "eggs_12": 3.4,
  "data_contributors": 1428,
  "last_update_date": "2026-05-10"
}
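A minimal sketch of how the Cost of Living fields above could be held as a typed record in Python 3.12. The `CostOfLivingRecord` class, the `SCHEMA_VERSION` tag and the `from_json` helper are hypothetical illustrations, not DataFlirt's production schema. Fields that Numbeo leaves blank for a city simply stay `None`.

```python
import json
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional

SCHEMA_VERSION = "1.0"  # hypothetical schema version tag


@dataclass(frozen=True)
class CostOfLivingRecord:
    # Identity fields are always present.
    city: str
    country: str
    currency: str
    # Prices in local currency; None means Numbeo had no data for the item.
    meal_inexpensive: Optional[float] = None
    meal_midrange: Optional[float] = None
    milk_1l: Optional[float] = None
    bread_500g: Optional[float] = None
    eggs_12: Optional[float] = None
    chicken_1kg: Optional[float] = None
    apples_1kg: Optional[float] = None
    local_cheese_1kg: Optional[float] = None
    water_1_5l: Optional[float] = None
    # Data-validity metadata.
    data_contributors: Optional[int] = None
    last_update_date: Optional[date] = None

    @classmethod
    def from_json(cls, raw: str) -> "CostOfLivingRecord":
        """Build a record from one JSON object, rejecting unknown keys."""
        payload = json.loads(raw)
        known = {f.name for f in fields(cls)}
        unknown = set(payload) - known
        if unknown:
            # An unexpected key usually means the schema has drifted.
            raise ValueError(f"unexpected fields: {sorted(unknown)}")
        if payload.get("last_update_date") is not None:
            payload["last_update_date"] = date.fromisoformat(payload["last_update_date"])
        return cls(**payload)
```

Type hints are not enforced at runtime here. Strict type checking would need an explicit validation step.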

Complete list of extractable fields for Rent & Utilities objects from numbeo.com. All fields typed and schema-versioned.

Fields: city, country, rent_1bed_centre, rent_1bed_outside, rent_3bed_centre, rent_3bed_outside, basic_utilities_85m2, mobile_plan, internet_60mbps, currency, price_range_min, price_range_max
Sample record · Rent & Utilities (selected fields)
{
  "city": "Berlin",
  "country": "Germany",
  "rent_1bed_centre": 1250.0,
  "rent_1bed_outside": 900.0,
  "basic_utilities_85m2": 285.4,
  "internet_60mbps": 42.5,
  "currency": "EUR",
  "price_range_min": 1000.0
}

Complete list of extractable fields for Property Prices objects from numbeo.com. All fields typed and schema-versioned.

Fields: city, country, price_sqm_centre, price_sqm_outside, average_monthly_salary, mortgage_interest_rate, price_to_income_ratio, gross_rental_yield_centre, gross_rental_yield_outside, affordability_index, currency
Sample record · Property Prices (selected fields)
{
  "city": "Singapore",
  "country": "Singapore",
  "price_sqm_centre": 28500.0,
  "price_sqm_outside": 14200.0,
  "average_monthly_salary": 6100.0,
  "mortgage_interest_rate": 4.2,
  "price_to_income_ratio": 18.5,
  "currency": "SGD"
}

Complete list of extractable fields for Crime & Safety objects from numbeo.com. All fields typed and schema-versioned.

Fields: city, country, crime_index, safety_index, level_of_crime, crime_increasing, safe_walking_day, safe_walking_night, worried_mugged, worried_car_stolen, violent_crime_worry, corruption_bribery
Sample record · Crime & Safety (selected fields)
{
  "city": "Tokyo",
  "country": "Japan",
  "crime_index": 24.3,
  "safety_index": 75.7,
  "level_of_crime": "Low",
  "safe_walking_night": "Very High",
  "worried_mugged": "Very Low",
  "corruption_bribery": "Low"
}
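In the Tokyo sample above, `crime_index` and `safety_index` add up to exactly 100. Assume that complementary relationship holds across Numbeo's crime pages. This is an observation from the sample, not a documented guarantee. Under that assumption, a QA step could flag records where the two values drift apart, which would usually point to a mis-parsed row. The function name and tolerance below are hypothetical.

```python
from typing import Optional


def crime_safety_consistent(record: dict, tolerance: float = 0.15) -> Optional[bool]:
    """Return True if crime_index + safety_index is close to 100.

    Returns None when either index is missing, so null handling stays a
    separate check. The tolerance allows for each index being rounded to
    one decimal place (ASSUMED relationship, see lead-in).
    """
    crime = record.get("crime_index")
    safety = record.get("safety_index")
    if crime is None or safety is None:
        return None
    return abs((crime + safety) - 100.0) <= tolerance
```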

Complete list of extractable fields for Quality of Life objects from numbeo.com. All fields typed and schema-versioned.

Fields: city, country, qol_index, purchasing_power_index, healthcare_index, climate_index, cost_of_living_index, property_price_to_income_ratio, traffic_commute_time_index, pollution_index, green_and_parks_quality
Sample record · Quality of Life (selected fields)
{
  "city": "Zurich",
  "country": "Switzerland",
  "qol_index": 198.4,
  "purchasing_power_index": 118.2,
  "healthcare_index": 74.3,
  "climate_index": 81.2,
  "cost_of_living_index": 128.5,
  "pollution_index": 18.9
}

Capabilities

Extract every metric from the world's largest cost of living database

Our Numbeo scraper parses complex HTML tables, normalises crowd-sourced data, handles currency conversions, and tracks historical index shifts without triggering rate limits.

Global City Coverage

Extract data across 11,000+ cities globally. We maintain a master index of valid city URLs to ensure comprehensive coverage without missing secondary municipalities.

Currency Normalisation

Numbeo defaults to local currencies. We capture the base local currency and can apply real-time exchange rates to normalise datasets into USD, EUR, or GBP.
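A minimal sketch of the normalisation step. It assumes exchange rates for the run are supplied as units of currency per 1 USD. The values in `PLACEHOLDER_RATES` are illustrative placeholders, not market data, and `to_target_currency` is a hypothetical helper. The local value and currency code are kept alongside the converted figure, so the conversion can be re-run later with different rates.

```python
from decimal import Decimal, ROUND_HALF_UP

# Placeholder rates (units of currency per 1 USD). NOT real market rates;
# a production run would load a dated snapshot from an FX provider.
PLACEHOLDER_RATES = {
    "USD": Decimal("1"),
    "EUR": Decimal("0.9"),
    "GBP": Decimal("0.8"),
}


def to_target_currency(amount: float, local_ccy: str, target_ccy: str,
                       rates: dict = PLACEHOLDER_RATES) -> dict:
    """Convert a local-currency price, keeping the raw value for audit."""
    if local_ccy not in rates or target_ccy not in rates:
        raise KeyError(f"no rate for {local_ccy} or {target_ccy}")
    local = Decimal(str(amount))
    # Pivot through USD: local -> USD -> target.
    converted = local / rates[local_ccy] * rates[target_ccy]
    return {
        "local_value": float(local),
        "local_currency": local_ccy,
        "value": float(converted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)),
        "currency": target_ccy,
    }
```

`Decimal` is used so that rounding to two decimal places is explicit and reproducible between runs.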

Historical Data Tracking

Extract historical index data from archive pages to build time-series models for inflation, rent increases, and purchasing power degradation.
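Once yearly index values have been pulled from the archive pages, building a time series is mostly arithmetic. This sketch computes the percentage change against the previous archived year. The function name and the sample series in the usage note are hypothetical.

```python
def change_vs_previous_year(series: dict[int, float]) -> dict[int, float]:
    """Percentage change of an index versus the previous archived year.

    Years missing from the archive are skipped, so each change is measured
    against the nearest earlier year that has data.
    """
    years = sorted(series)
    changes: dict[int, float] = {}
    for prev, curr in zip(years, years[1:]):
        if series[prev] == 0:
            continue  # avoid division by zero on a bad baseline
        changes[curr] = round((series[curr] - series[prev]) / series[prev] * 100, 2)
    return changes
```

For example, `change_vs_previous_year({2023: 100.0, 2024: 105.0, 2025: 110.25})` returns `{2024: 5.0, 2025: 5.0}`.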

Itemised Price Extraction

Capture the exact price ranges (min, max, mean) for 50+ individual items per city, from a litre of milk to monthly fitness club fees.

Healthcare & Pollution Metrics

Scrape qualitative indices for healthcare system satisfaction, air quality, water pollution, and green space accessibility.

Traffic & Commute Data

Extract commute time indices, CO2 emission estimates, and traffic inefficiency scores for urban mobility analysis.

Data Validity Indicators

Capture the number of contributors and the last update timestamp for every city metric to filter out low-confidence, stale data points.
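A sketch of that filter, assuming records shaped like the Cost of Living sample above. The thresholds (`min_contributors=20`, `max_age_days=365`) are hypothetical defaults to tune per project. They are not Numbeo or DataFlirt recommendations.

```python
from datetime import date, timedelta
from typing import Optional


def is_confident(record: dict, min_contributors: int = 20,
                 max_age_days: int = 365, today: Optional[date] = None) -> bool:
    """Keep a record only if it has enough contributors and is recent enough."""
    today = today or date.today()
    contributors = record.get("data_contributors")
    if contributors is None or contributors < min_contributors:
        return False
    updated = record.get("last_update_date")
    if updated is None:
        return False
    if isinstance(updated, str):
        updated = date.fromisoformat(updated)  # ISO dates as in the sample records
    return today - updated <= timedelta(days=max_age_days)
```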

Index Calculation Parameters

Extract the baseline reference indices (e.g., New York = 100) and the underlying formulas used to generate the aggregate scores.
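These indices are expressed relative to a New York baseline of 100, so comparing two other cities only needs a ratio: the index of city A relative to city B is 100 × I_A / I_B. Below is a minimal sketch with hypothetical function names. The rescaling only makes sense for indices defined as ratios to the New York baseline. It should not be assumed to hold for composite scores such as the quality of life index.

```python
def rebase(index_value: float, reference_value: float) -> float:
    """Re-express a New York-based index relative to another reference city."""
    if reference_value <= 0:
        raise ValueError("reference index must be positive")
    return index_value / reference_value * 100


def rebase_all(indices: dict[str, float], reference_city: str) -> dict[str, float]:
    """Rebase every city's index so that reference_city becomes 100."""
    reference_value = indices[reference_city]
    return {city: round(rebase(value, reference_value), 1)
            for city, value in indices.items()}
```

Using the Zurich sample above, `rebase_all({"New York": 100.0, "Zurich": 128.5}, "Zurich")` gives `{"New York": 77.8, "Zurich": 100.0}`.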

Scheduled Updates

Run pipelines monthly or quarterly to capture fresh crowd-sourced submissions and track macroeconomic shifts over time.

// engagement pipeline

From city list to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide a list of target cities, countries, or regions. We configure the extraction schema for the specific indices required.

Pipeline Build
Days 2–4

We deploy Scrapy crawlers with proxy rotation to navigate Numbeo's geographic hierarchies and parse unstructured HTML tables.

Validation & QA
Days 4–6

We run schema validation, check for null rates in low-contribution cities, and verify currency normalisation accuracy.

Delivery
ongoing

Clean JSON, CSV, or Parquet files pushed to your S3 bucket, BigQuery dataset, or API webhook on your defined schedule.

Under the hood

Handling Numbeo's extraction challenges

Extracting data from Numbeo requires parsing heavily nested tables, managing crowd-sourced data inconsistencies, and respecting rate limits. Here is how we build resilience.

pipeline-monitor · numbeo.com · live · active
Identity rotation (fingerprinting): TLS fingerprint randomised · user-agent rotated · residential IP pool · challenges blocked: 0
Page coverage (pagination): 48,291 pages queued · running
Pipeline health (observability): 99.9% uptime · 142 ms p99 latency · 0.3% null rate · 2 alerts
Table parsing
Resilient HTML table extraction

Numbeo presents data in complex, variable-length HTML tables. Our parsers map table rows to structured schema fields dynamically, ensuring that missing items in a specific city do not misalign the entire dataset.
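A minimal sketch of label-based row mapping, using only Python's standard-library `html.parser`. Rows are matched on their label text rather than their position, so a missing item leaves its field as `None` instead of shifting every later value. The label strings in `LABEL_TO_FIELD` are assumed examples of Numbeo row labels and would need checking against the live pages. The sketch also assumes rows are closed with `</tr>`. Values are kept as raw strings for a later cleaning step.

```python
from html.parser import HTMLParser
from typing import Optional

# ASSUMED Numbeo row labels mapped to schema fields (verify against live pages).
LABEL_TO_FIELD = {
    "Meal, Inexpensive Restaurant": "meal_inexpensive",
    "Milk (regular), (1 liter)": "milk_1l",
    "Eggs (regular) (12)": "eggs_12",
}


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: Optional[list[str]] = None
        self._cell: Optional[list[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse internal whitespace so labels compare reliably.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html: str) -> dict[str, Optional[str]]:
    """Map table rows to schema fields by label; unmatched fields stay None."""
    collector = RowCollector()
    collector.feed(html)
    collector.close()
    record: dict[str, Optional[str]] = {field: None for field in LABEL_TO_FIELD.values()}
    for row in collector.rows:
        if len(row) < 2:
            continue  # header or malformed row
        field = LABEL_TO_FIELD.get(row[0])
        if field is not None:
            record[field] = row[1]  # raw text; numeric casting happens downstream
    return record
```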

Data filtering
Filtering low-confidence data

Because Numbeo is crowd-sourced, smaller cities often have too few submissions to be statistically reliable. We capture the 'contributors' count so you can filter out records below your chosen confidence threshold.

Currency management
Handling local currency display

Prices are displayed in local currency by default. We extract the raw local value, the currency code, and apply consistent FX conversion logic to provide unified pricing across global datasets.

Rate limiting
Distributed crawl execution

Numbeo rate-limits aggressive request patterns. We distribute requests across rotating proxy pools and add randomised delays to keep extraction continuous without triggering IP bans.
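A sketch of the scheduling idea, assuming a pre-built list of proxy URLs. The `*.example` addresses in the usage are placeholders. Requests are assigned proxies round-robin, and each one gets a base delay plus random jitter. The `plan_requests` helper is hypothetical and only illustrates the logic.

```python
import itertools
import random
from typing import Optional


def plan_requests(urls: list[str], proxies: list[str], base_delay: float = 2.0,
                  jitter: float = 1.5,
                  seed: Optional[int] = None) -> list[tuple[str, str, float]]:
    """Pair each URL with a proxy (round-robin) and a randomised delay in seconds."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    rng = random.Random(seed)  # seedable so runs can be reproduced in tests
    proxy_cycle = itertools.cycle(proxies)
    plan = []
    for url in urls:
        delay = base_delay + rng.uniform(0, jitter)
        plan.append((url, next(proxy_cycle), delay))
    return plan
```

In a Scrapy project, the delay part is usually handled by the built-in `DOWNLOAD_DELAY` and `RANDOMIZE_DOWNLOAD_DELAY` settings. The proxy is set per request through `request.meta["proxy"]`.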

Schema drift
Monitoring metric additions

Numbeo occasionally adds new items to its cost of living basket. Our pipeline detects unexpected table rows and alerts our engineers, who map the new variables into the schema.
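A minimal sketch of drift detection as a set comparison between the row labels seen on a page and the labels the schema already maps. `KNOWN_LABELS`, the logger name and `detect_drift` are hypothetical. A real pipeline would route the warning to its alerting system rather than a plain logger.

```python
import logging
from collections.abc import Iterable

logger = logging.getLogger("numbeo.schema_drift")

# Hypothetical subset of row labels already mapped to schema fields.
KNOWN_LABELS = frozenset({
    "Meal, Inexpensive Restaurant",
    "Milk (regular), (1 liter)",
    "Eggs (regular) (12)",
})


def detect_drift(observed_labels: Iterable[str],
                 known: frozenset = KNOWN_LABELS) -> dict[str, list[str]]:
    """Compare row labels seen on a page with the labels the schema knows."""
    observed = set(observed_labels)
    drift = {
        # Rows the schema does not map yet: candidates for new fields.
        "new": sorted(observed - known),
        # Mapped rows absent from this page: may just mean no data for the city.
        "missing": sorted(known - observed),
    }
    if drift["new"]:
        logger.warning("unmapped Numbeo rows: %s", drift["new"])
    return drift
```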

Applications

Who uses Numbeo data — and how

Teams across industries use numbeo.com data to build competitive products and smarter operations.

01
Remote Work Compensation

HR platforms and remote-first companies use cost of living indices to calculate geographic pay bands and localise salaries.

02
Relocation & Mobility Services

Global mobility firms build cost-comparison calculators for expats moving between major financial centres.

03
Macroeconomic Research

Economists and hedge funds track real-time crowd-sourced inflation indicators ahead of official government CPI releases.

04
Real Estate Investment

Property funds analyse price-to-income ratios and gross rental yields across secondary cities to identify undervalued markets.

05
Travel & Tourism Planning

Travel aggregators display local restaurant and transport costs to help users budget for international trips.

06
Supply Chain Logistics

Logistics companies evaluate traffic inefficiency and infrastructure quality indices when planning regional distribution hubs.

Why DataFlirt

"Numbeo holds the most granular, hyper-local cost of living data available globally, but extracting it consistently requires handling thousands of unstructured HTML tables and crowd-sourced anomalies."

Building a reliable pipeline for Numbeo means normalising fragmented crowd-sourced data, standardising local currencies to base rates, and handling constant DOM shifts. DataFlirt manages the extraction layer so your data science team can focus on econometric modelling rather than writing table parsers.

Technical Spec

Numbeo scraper — technical capabilities

Everything our numbeo.com scraper supports, from city-level tables and historical archives to rate-limit-aware distributed crawling.

City-level extraction · Extract data for specific cities using exact URL paths · Supported
Currency normalisation · Capture local currency and standardise to USD/EUR · Supported
Historical data parsing · Extract previous-year indices from archive tables · Supported
Itemised price extraction · Capture min, max, and mean for all 50+ individual basket items · Supported
Contributor tracking · Extract the number of user submissions per metric for confidence scoring · Supported
Index calculation parameters · Extract baseline reference points used for aggregate scores · Supported
Premium API endpoint data · Direct access to Numbeo's paid enterprise API fields · Partial
Raw user submission logs · Individual, unaggregated user data entries and timestamps · Partial

Infrastructure

Infrastructure powering the Numbeo pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus
Distributed Crawling

Scrapy orchestrates high-throughput extraction across Numbeo's geographic directory structure, handling retries and proxy rotation automatically.

Data Normalisation Layer

Custom Python middleware cleans crowd-sourced text inputs, strips currency symbols, and casts price ranges into typed numeric fields.
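A sketch of the cleaning step. It assumes prices are displayed in English-locale format (comma thousands separator, dot decimal point) and that missing values appear as a non-numeric placeholder such as "?". Both are assumptions to verify against the pages being scraped. `parse_price` and `parse_range` are hypothetical helper names.

```python
import re
from typing import Optional

# ASSUMES English-locale display: comma thousands separator, dot decimal point.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: str) -> Optional[float]:
    """Strip currency symbols and separators: '1,250.00 €' -> 1250.0."""
    match = _NUMBER.search(text)
    if match is None:
        return None  # e.g. a '?' placeholder for missing data
    return float(match.group().replace(",", ""))


def parse_range(text: str) -> tuple[Optional[float], Optional[float]]:
    """Split a 'min-max' range: '1,000.00-1,500.00' -> (1000.0, 1500.0)."""
    numbers = [float(n.replace(",", "")) for n in _NUMBER.findall(text)]
    if len(numbers) != 2:
        return (None, None)
    low, high = sorted(numbers)
    return (low, high)
```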

Warehouse Delivery

Airflow schedules monthly extraction runs, validates data completeness against historical baselines, and pushes Parquet files directly to S3 or BigQuery.
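One simple way to validate completeness against historical baselines is to compare the current run's record count with the median of recent runs. The 5% tolerance and the function name are hypothetical. A fuller check would likely also compare per-field null rates.

```python
import statistics


def passes_completeness(current_count: int, previous_counts: list[int],
                        max_drop: float = 0.05) -> bool:
    """Fail a run whose record count falls too far below recent runs."""
    if not previous_counts:
        return True  # first run: nothing to compare against
    baseline = statistics.median(previous_counts)
    return current_count >= baseline * (1 - max_drop)
```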

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON · Nested arrays for complex city metrics
CSV · Flat tabular data for immediate analyst use
XLS · Excel format for business teams
Parquet · Columnar format optimised for BigQuery and Athena
AWS S3 · Direct delivery to your cloud storage buckets, compatible with any data lake
Webhook · HTTP POST delivery upon pipeline completion
API · Queryable REST endpoints for fetched data
PostgreSQL · Direct database insertion with upsert logic
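PostgreSQL upsert logic typically relies on `INSERT ... ON CONFLICT ... DO UPDATE`, available since PostgreSQL 9.5. To keep the sketch self-contained, it runs against Python's built-in `sqlite3`, which accepts the same clause in SQLite 3.24 and later. With a PostgreSQL driver such as psycopg, the `:name` placeholders would become `%(name)s`. The table layout and conflict key are hypothetical.

```python
import sqlite3

# Hypothetical table keyed on city, country and Numbeo's last update date.
DDL = """
CREATE TABLE IF NOT EXISTS cost_of_living (
    city TEXT NOT NULL,
    country TEXT NOT NULL,
    last_update_date TEXT NOT NULL,
    currency TEXT NOT NULL,
    meal_inexpensive REAL,
    data_contributors INTEGER,
    PRIMARY KEY (city, country, last_update_date)
)
"""

# Insert new rows; overwrite the value columns when the key already exists.
UPSERT = """
INSERT INTO cost_of_living
    (city, country, last_update_date, currency, meal_inexpensive, data_contributors)
VALUES
    (:city, :country, :last_update_date, :currency, :meal_inexpensive, :data_contributors)
ON CONFLICT (city, country, last_update_date) DO UPDATE SET
    currency = excluded.currency,
    meal_inexpensive = excluded.meal_inexpensive,
    data_contributors = excluded.data_contributors
"""


def upsert_records(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Write a batch in one transaction (committed on success, rolled back on error)."""
    with conn:
        conn.executemany(UPSERT, records)
```

Re-delivering the same city snapshot updates the existing row instead of creating a duplicate.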
// faq

Common questions.

About numbeo.com scraping, legality, and pipeline operations.

Ask us directly →
Is scraping Numbeo legal?

Scraping publicly available, factual data (like prices and indices) is generally permissible. DataFlirt extracts only public aggregated statistics. We do not bypass authentication or extract personal data. Clients should review Numbeo's Terms of Service regarding commercial use of their aggregated data.

How do you handle missing data for smaller cities?

Numbeo relies on user contributions, so smaller cities often lack complete data. Our pipeline emits null values for missing fields rather than breaking, and we extract the 'contributors' count so you can filter out statistically insignificant records.

Can you convert all prices to a single currency?

Yes. While Numbeo displays prices in local currency, we extract the local value and its currency code. We can apply standard exchange rates during the pipeline run to normalise all outputs to USD, EUR, or any other target currency.

How often should I scrape Numbeo?

Because the data is crowd-sourced and aggregated over time, daily scraping yields minimal changes. We recommend monthly or quarterly pipeline runs to capture meaningful shifts in cost of living indices and property prices.

Do you extract historical data?

Yes. We can target Numbeo's historical archive pages to extract past indices, allowing you to build time-series models comparing current costs to previous years.

Can I request a sample of the data?

Yes. We provide a sample extraction of up to 50 cities during the scoping phase so you can validate the schema, null rates, and currency formatting before committing to a full pipeline.

$ dataflirt scope --new-project --source=numbeo.com ready

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two weeks. Whether you need a one-off export of global property yields or a quarterly feed of cost of living indices across 10,000 cities, we build and operate the pipeline.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h