
UK property data,
at warehouse scale.

We extract sales and lettings listings, floorplans, agent directories, and 'Only With Us' early properties from OnTheMarket. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Properties extracted: 842K/day
Price updates: 114K/24h
Agent branches: 18.3K/run
Active pipelines: 82
Uptime: 99.98%
Data Dictionary

Every field we extract from onthemarket.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Property Listings objects from onthemarket.com. All fields typed and schema-versioned.

property_id, title, price, price_qualifier, property_type, bedrooms, bathrooms, description, agent_name, agent_id, epc_rating, tenure, listed_date, only_with_us, url
property_listings
● 200 OK
"property_id": "13482910",
"title": "3 bedroom semi-detached house for sale",
"price": 425000,
"price_qualifier": "Offers in excess of",
"bedrooms": 3,
"epc_rating": "C",
"tenure": "Freehold",
"only_with_us": true
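On the receiving side, a consumer might type the sample record above along these lines. This is only a sketch: `PropertyListing` and `parse_listing` are hypothetical names, not part of any delivered package, the types are inferred from the sample values, and fields missing from a payload are assumed to arrive as absent keys.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class PropertyListing:
    """Hypothetical consumer-side model; field names mirror the sample record."""
    property_id: str
    title: Optional[str] = None
    price: Optional[int] = None  # asking price, assumed to be whole pounds
    price_qualifier: Optional[str] = None
    bedrooms: Optional[int] = None
    epc_rating: Optional[str] = None
    tenure: Optional[str] = None
    only_with_us: Optional[bool] = None


def parse_listing(raw: dict) -> PropertyListing:
    """Keep only known fields so extra keys in the feed do not raise."""
    known = {f.name for f in fields(PropertyListing)}
    return PropertyListing(**{k: v for k, v in raw.items() if k in known})


sample = {
    "property_id": "13482910",
    "title": "3 bedroom semi-detached house for sale",
    "price": 425000,
    "price_qualifier": "Offers in excess of",
    "bedrooms": 3,
    "epc_rating": "C",
    "tenure": "Freehold",
    "only_with_us": True,
}
listing = parse_listing(sample)
```

Dropping unknown keys rather than rejecting them is a design choice for this sketch; a stricter consumer could raise on unexpected fields to catch schema changes early.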

Complete list of extractable fields for Agent Directory objects from onthemarket.com. All fields typed and schema-versioned.

agent_id, branch_name, company_name, address, postcode, phone_number, website_url, properties_for_sale, properties_to_rent, branch_logo_url
agent_directory
● 200 OK
"agent_id": "AG-7482",
"branch_name": "Dexters London Bridge",
"company_name": "Dexters",
"postcode": "SE1 9SG",
"phone_number": "020 7483 9281",
"properties_for_sale": 142,
"properties_to_rent": 89

Complete list of extractable fields for Pricing & History objects from onthemarket.com. All fields typed and schema-versioned.

property_id, current_price, original_price, price_reduced_date, price_reduction_pct, historical_sold_prices, last_sale_date, last_sale_price, valuation_estimate
pricing_& history
● 200 OK
"property_id": "13482910",
"current_price": 425000,
"original_price": 450000,
"price_reduced_date": "2023-11-14",
"price_reduction_pct": 5.5,
"last_sale_date": "2018-06-22",
"last_sale_price": 385000
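The relationship between `original_price`, `current_price`, and `price_reduction_pct` in the sample above can be reproduced with a short calculation. The feed's rounding convention is not stated; the sample value is consistent with truncating to one decimal place, so this sketch assumes truncation. The function name is hypothetical.

```python
def price_reduction_pct(original_price: int, current_price: int) -> float:
    """Percentage drop from original to current price, truncated to 1 decimal.

    Truncation (not rounding) is an assumption inferred from the sample record.
    """
    if original_price <= 0:
        raise ValueError("original_price must be positive")
    # Work in integer tenths of a percent to avoid floating-point surprises.
    tenths = (original_price - current_price) * 1000 // original_price
    return tenths / 10


pct = price_reduction_pct(450000, 425000)
```

With the sample prices, the drop is 25,000 on 450,000, or about 5.56%, which truncates to one decimal place as the value shown in the record.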

Complete list of extractable fields for Features & Media objects from onthemarket.com. All fields typed and schema-versioned.

property_id, floorplan_url, virtual_tour_url, epc_certificate_url, garden, parking, broadband_speed_mbps, council_tax_band, nearest_station_1, nearest_station_distance
features_& media
● 200 OK
"property_id": "13482910",
"floorplan_url": "https://media.onthemarket.com/floorplans/13482910.pdf",
"garden": true,
"parking": "Off-street",
"broadband_speed_mbps": 1000,
"council_tax_band": "D",
"nearest_station_1": "London Bridge",
"nearest_station_distance": "0.4 miles"

Complete list of extractable fields for New Developments objects from onthemarket.com. All fields typed and schema-versioned.

development_id, developer_name, site_name, location, units_available, completion_date, starting_price, max_price, brochure_url, show_home_status
new_developments
● 200 OK
"development_id": "DEV-9921",
"developer_name": "Barratt Homes",
"site_name": "Riverside Quarter",
"units_available": 14,
"starting_price": 350000,
"max_price": 850000,
"show_home_status": "Open daily"

Capabilities

Extract the complete UK property matrix

Our OnTheMarket scraper captures every layer of the portal: residential sales, lettings, agent directories, and exclusive early listings — bypassing bot protection and pagination limits automatically.

Sales & Lettings Extraction

Extract full property metadata including price, bedrooms, bathrooms, tenure, description, and agent details across all UK regions.

'Only With Us' Tracking

Identify and track properties listed exclusively on OnTheMarket 24 hours before they syndicate to Rightmove or Zoopla.

Agent & Branch Intelligence

Scrape the full agent directory to monitor market share, branch locations, and total stock volume per agency.

Floorplan & Media Scraping

Capture URLs for high-resolution images, PDF floorplans, EPC certificates, and virtual tour links.

Location & Transit Data

Extract precise coordinates, nearest railway stations, distance metrics, and local broadband speed estimates.

Price Reduction Monitoring

Track original listing prices against current prices, capturing reduction dates and percentage drops.

New Build Developments

Monitor new residential sites, tracking developer names, phase completions, and unit pricing bands.

Historical Sold Prices

Extract Land Registry sold price history associated with specific postcodes and property records.

Change Detection

Run continuous pipelines that output only new listings, removed listings, or price adjustments to minimise processing overhead.

// engagement pipeline

From postcode list to data warehouse

Brief in. Clean data out.

Define Scope
Day 0

Provide target regions, postcodes, or agent IDs. We configure the extraction schema and frequency.

Pipeline Build
Days 2–4

We deploy Scrapy crawlers with UK residential proxies and automated CAPTCHA solvers to bypass portal defences.

Validation & QA
Days 4–6

Schema validation, coordinate normalisation, and null-rate checks run before production deployment.

Delivery
ongoing

Clean JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on schedule.
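
As an illustration of the null-rate checks mentioned in the Validation & QA step, the sketch below computes the share of records missing each required field and flags any field above a threshold. The function names and the 5% threshold are assumptions made for the example, not documented pipeline settings.

```python
from typing import Iterable


def null_rates(records: list[dict], required: Iterable[str]) -> dict[str, float]:
    """Fraction of records where each required field is missing, None, or empty."""
    required = list(required)
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in required}
    return {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in required
    }


def failing_fields(rates: dict[str, float], threshold: float = 0.05) -> list[str]:
    """Fields whose null rate exceeds the (hypothetical) threshold."""
    return sorted(field for field, rate in rates.items() if rate > threshold)


batch = [
    {"property_id": "1", "price": 425000, "epc_rating": "C"},
    {"property_id": "2", "price": 310000, "epc_rating": None},
    {"property_id": "3", "price": 295000, "epc_rating": "D"},
    {"property_id": "4", "price": 510000, "epc_rating": "B"},
]
rates = null_rates(batch, ["property_id", "price", "epc_rating"])
blocked = failing_fields(rates)
```

A gate like this would typically run before delivery, so a batch with an unusually high null rate is held back rather than loaded into the warehouse.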

Under the hood

Overcoming property portal scraping constraints

UK property portals employ aggressive bot mitigation and pagination limits. Here is how our infrastructure maintains continuous extraction.

pipeline-monitor · onthemarket.com · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued, running
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
Bot mitigation
Cloudflare and Turnstile bypass

OnTheMarket uses advanced TLS fingerprinting and challenge pages. We utilise UK-based residential proxies and Playwright sessions with spoofed hardware concurrency and canvas fingerprints to maintain high success rates.

Pagination limits
Bypassing the 42-page cap

Search results are capped at 42 pages (approximately 1,000 results). To extract entire regions, our pipeline dynamically splits large search areas into smaller geographic polygons until each one returns fewer results than the cap, so no listing is cut off by pagination.
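
The area-splitting idea can be sketched as a recursive quadrant split. This is a minimal illustration rather than the production implementation: it divides rectangular bounding boxes instead of arbitrary polygons, `count_results` is a hypothetical callable standing in for whatever reports how many listings a search area returns, and the demo runs on synthetic points in a unit square.

```python
from typing import Callable, NamedTuple

RESULT_CAP = 1000  # approximate cap quoted above


class BBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float


def split_until_under_cap(
    box: BBox,
    count_results: Callable[[BBox], int],
    cap: int = RESULT_CAP,
    max_depth: int = 12,
) -> list[BBox]:
    """Return tiles covering `box`, each with at most `cap` results.

    Recursion stops at `max_depth` so a dense single point cannot loop forever.
    """
    if count_results(box) <= cap or max_depth == 0:
        return [box]
    mid_lon = (box.west + box.east) / 2
    mid_lat = (box.south + box.north) / 2
    quadrants = [
        BBox(box.west, box.south, mid_lon, mid_lat),
        BBox(mid_lon, box.south, box.east, mid_lat),
        BBox(box.west, mid_lat, mid_lon, box.north),
        BBox(mid_lon, mid_lat, box.east, box.north),
    ]
    tiles: list[BBox] = []
    for quadrant in quadrants:
        tiles.extend(split_until_under_cap(quadrant, count_results, cap, max_depth - 1))
    return tiles


# Synthetic demo data: a 50 x 50 grid of points, 2,500 "listings" in total.
points = [(i / 50, j / 50) for i in range(50) for j in range(50)]


def count_in(box: BBox) -> int:
    # Half-open intervals so a point on a shared edge is counted only once.
    return sum(
        1 for lon, lat in points
        if box.west <= lon < box.east and box.south <= lat < box.north
    )


tiles = split_until_under_cap(BBox(0.0, 0.0, 1.0, 1.0), count_in)
```

Because the tiles partition the original area, summing the per-tile counts gives a quick check that nothing was dropped or double-counted.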

Dynamic rendering
Extracting map-gated coordinates

Precise latitude and longitude coordinates are often obfuscated or require map interaction. We execute the required JavaScript payloads to extract exact location data for spatial analysis.

Schema stability
Resilient DOM selectors

Property detail pages frequently undergo A/B testing. We employ fallback selector chains targeting embedded JSON-LD and Next.js state objects rather than relying solely on brittle CSS classes.
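
A fallback chain of this kind might look like the sketch below, which tries JSON-LD first, then the Next.js state object, and only then a CSS-class pattern. The key paths (`offers.price`, `props.pageProps.property.price`) and the class-name pattern are hypothetical and chosen only to make the example concrete; real pages would need their own paths, and a production parser would use an HTML parser rather than regular expressions.

```python
import json
import re
from typing import Optional

JSON_LD_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)
NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)
CSS_PRICE_RE = re.compile(r'class="[^"]*price[^"]*"[^>]*>\s*£([\d,]+)')


def _first_json_object(pattern: re.Pattern, html: str) -> Optional[dict]:
    """Return the first script body matching `pattern` that parses to a dict."""
    for match in pattern.finditer(html):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            return data
    return None


def _dig(data, *keys):
    """Walk nested dicts, returning None as soon as a level is missing."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def extract_price(html: str) -> Optional[int]:
    candidates = (
        _dig(_first_json_object(JSON_LD_RE, html), "offers", "price"),
        _dig(_first_json_object(NEXT_DATA_RE, html),
             "props", "pageProps", "property", "price"),
    )
    for value in candidates:
        try:
            return int(float(value))
        except (TypeError, ValueError):
            continue  # missing or non-numeric, so try the next source
    match = CSS_PRICE_RE.search(html)  # last resort: brittle class-based match
    return int(match.group(1).replace(",", "")) if match else None


html_ld = '<script type="application/ld+json">{"offers": {"price": "425000"}}</script>'
html_next = ('<script id="__NEXT_DATA__" type="application/json">'
             '{"props": {"pageProps": {"property": {"price": 425000}}}}</script>')
html_css = '<span class="listing-price">£425,000</span>'
```

Ordering the sources from most to least structured means a front-end class rename only matters when both embedded data sources are also absent.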

Incremental updates
Efficient diff tracking

For daily market monitoring, we maintain state across runs. The pipeline outputs only new instructions (newly listed properties), price changes, and properties marked as sold or let, which reduces downstream data ingestion costs.
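
A minimal sketch of diffing two runs is shown below, assuming each run is held as a dictionary keyed by `property_id`. The function name and output shape are illustrative only, and "removed" here simply means the listing no longer appears in the current snapshot; mapping that to a sold or let status would need the portal's own status field.

```python
def diff_snapshots(previous: dict[str, dict], current: dict[str, dict]) -> dict[str, list]:
    """Compare two runs keyed by property_id and report only what changed."""
    new_ids = sorted(current.keys() - previous.keys())
    removed_ids = sorted(previous.keys() - current.keys())
    price_changes = [
        {
            "property_id": pid,
            "old_price": previous[pid].get("price"),
            "new_price": current[pid].get("price"),
        }
        for pid in sorted(current.keys() & previous.keys())
        if previous[pid].get("price") != current[pid].get("price")
    ]
    return {"new": new_ids, "removed": removed_ids, "price_changes": price_changes}


yesterday = {
    "13482910": {"price": 450000},
    "13482911": {"price": 300000},
    "13482912": {"price": 275000},
}
today = {
    "13482910": {"price": 425000},  # price reduced
    "13482911": {"price": 300000},  # unchanged, so not emitted
    "13482913": {"price": 615000},  # new listing
}
changes = diff_snapshots(yesterday, today)
```

Unchanged records never reach the output, which is where the saving in downstream ingestion comes from.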

Applications

Who uses OnTheMarket data — and how

Teams across industries use onthemarket.com data to build competitive products and smarter operations.

01
PropTech Valuations

AVM (Automated Valuation Model) providers ingest pricing, floor area, and feature data to train property valuation algorithms.

02
Investment Analysis

Institutional landlords track asking rents against capital values to calculate gross yields across different UK postcodes.

03
Agent Competitor Tracking

Estate agencies monitor local competitors to calculate market share, instruction velocity, and price reduction frequencies.

04
Energy Efficiency Analysis

Green energy firms extract EPC ratings and property types to target households requiring boiler upgrades or insulation.

05
Lead Generation

Property sourcers identify slow-moving stock with multiple price reductions to target motivated sellers.

06
Urban Planning & Research

Consultancies analyse housing density, new development pipelines, and transit proximity for infrastructure planning.

Why DataFlirt

"OnTheMarket represents a critical segment of the UK property ecosystem, carrying exclusive listings 24 hours before they hit Rightmove or Zoopla."

Extracting property data at scale requires bypassing sophisticated bot protection, managing complex map-based pagination, and maintaining selectors across frequent front-end updates. DataFlirt absorbs that complexity so your engineering team can focus on building valuation models and market analysis — not maintaining scrapers.

Technical Spec

OnTheMarket scraper — technical capabilities

Everything supported by our onthemarket.com scraper: rendered SPA elements, CAPTCHA challenges, rate-limit handling, and more.

Capability | Details | Status
JavaScript rendering | Full Playwright sessions to hydrate Next.js state and map components | Supported
CAPTCHA bypass | Automated solver integration for Cloudflare Turnstile challenges | Supported
UK Residential IPs | Geolocated residential proxy pools to prevent regional blocking | Supported
Polygon search extraction | Dynamic grid splitting to bypass 1,000-result pagination limits | Supported
'Only With Us' tracking | Flagging exclusive early-access properties | Supported
Historical price tracking | Extraction of listing price reduction history | Supported
Change detection (diffs) | Only emit records with changed fields since the previous run | Supported
Agent lead submission | Automated submission of viewing requests or contact forms | Partial
Saved properties data | Accessing user-specific saved lists or search alerts | Partial
Webhook delivery | HTTP POST per new listing for real-time alerting systems | Supported
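
To make the webhook delivery capability concrete, the sketch below builds an HTTP POST for one new listing using only the Python standard library. The payload shape (`event`, `data`), the event name, and the endpoint URL are assumptions for illustration; the actual webhook contract would be agreed during scoping.

```python
import json
import urllib.request


def build_new_listing_request(endpoint: str, listing: dict) -> urllib.request.Request:
    """Wrap one listing in a JSON envelope and prepare a POST request."""
    body = json.dumps({"event": "listing.new", "data": listing}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_new_listing_request(
    "https://example.com/hooks/new-listing",  # placeholder receiver URL
    {"property_id": "13482910", "price": 425000, "only_with_us": True},
)
# urllib.request.urlopen(request, timeout=10) would send it; the call is
# left out so the sketch runs without network access.
```

A receiver would normally acknowledge with a 2xx status and deduplicate on `property_id`, since webhook senders commonly retry on failure.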
Infrastructure

Infrastructure powering the property pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Stack

Scrapy manages request queues and deduplication. Playwright handles JavaScript execution for map rendering and Next.js hydration.

UK Proxy Infrastructure

Dedicated pools of UK residential ISP proxies ensure requests appear as legitimate local traffic, preventing geo-blocks and rate limits.

Cloud-Native Orchestration

Pipelines execute on AWS ECS with Airflow handling scheduling, retry logic, and delivery to downstream data warehouses.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited JSON for nested property attributes
CSV
Flat tabular files for quick analyst consumption
XLS
Excel format for business teams and manual review
Parquet
Columnar format optimised for analytical query engines
AWS S3
Direct delivery to your cloud storage buckets
Webhook
Real-time HTTP POST for immediate new listing alerts
API
On-demand REST endpoints to query scraped state
BigQuery
Direct streaming into Google Cloud data warehouses
Snowflake
Automated staging and ingestion for Snowflake instances
Postgres
Direct database upserts with primary key conflict resolution
S3
Direct bucket delivery — compatible with any data lake
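
The "upserts with primary key conflict resolution" behaviour listed for Postgres can be illustrated with an `INSERT ... ON CONFLICT ... DO UPDATE` statement. To keep the sketch runnable without a database server it uses Python's built-in `sqlite3` module as a stand-in, since SQLite (3.24 and later) accepts the same clause; the table and column names are hypothetical, and a Postgres driver would use its own placeholder style.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO property_listings (property_id, price, tenure)
VALUES (:property_id, :price, :tenure)
ON CONFLICT (property_id) DO UPDATE SET
    price = excluded.price,
    tenure = excluded.tenure
"""


def upsert_listings(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert new listings and overwrite existing ones with the same key."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT_SQL, rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE property_listings ("
    "property_id TEXT PRIMARY KEY, price INTEGER, tenure TEXT)"
)
# First run inserts the listing; the second run updates its price in place.
upsert_listings(conn, [{"property_id": "13482910", "price": 450000, "tenure": "Freehold"}])
upsert_listings(conn, [{"property_id": "13482910", "price": 425000, "tenure": "Freehold"}])
rows = conn.execute("SELECT property_id, price FROM property_listings").fetchall()
```

Keying on `property_id` means repeated deliveries are idempotent: re-running the same batch leaves the table unchanged.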
// faq

Common questions.

About onthemarket.com scraping, legality, and pipeline operations.

Is scraping OnTheMarket legal?

Scraping publicly accessible property data is generally permissible for analytical purposes. DataFlirt extracts only public listings and agent directory information. We do not bypass authentication walls or extract personal user data. Clients should review portal terms of service and consult legal counsel regarding their specific data usage.

How do you handle Cloudflare and bot detection?

We utilise UK residential proxies, full Playwright browser execution, and automated solvers to navigate challenge pages. Our request headers, TLS fingerprints, and concurrency rates are configured to mimic legitimate user behaviour.

Can you extract 'Only With Us' exclusive listings?

Yes. We specifically capture the 'Only With Us' flag, allowing clients to track properties listed on OnTheMarket before they are syndicated to other major portals.

How do you bypass the 42-page search limit?

OnTheMarket limits search pagination. To extract entire cities or regions, our system dynamically generates small geographic polygons, ensuring result counts remain below the pagination threshold and capturing 100% of available stock.

How fresh is the data?

Pipeline frequency is configurable. We support daily full-market sweeps, or high-frequency intra-day checks on specific postcodes for real-time new instruction alerting.

Can you extract floorplans and EPC certificates?

Yes. We extract the direct URLs for high-resolution images, PDF floorplans, and EPC documents, which can be downloaded or stored in your data lake.

What is the minimum viable engagement?

Our minimum engagement typically starts with daily extraction of a defined set of UK regions or postcodes. Contact our technical team to scope your specific geographic requirements and delivery cadence.

$ dataflirt scope --new-project --source=onthemarket.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a daily feed of new London instructions or a historical price dataset for the entire UK — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h