
Aggregator pricing,
at warehouse scale.

We extract premium quotes, provider coverage tiers, excess limits, and policy features from Confused.com. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Quotes extracted: 1.2M /day
Policy updates: 4.8M /24h
Form permutations: 85K /run
Active pipelines: 31
Uptime: 99.94%
Data Dictionary

Every field we extract from confused.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Car Insurance objects from confused.com. All fields typed and schema-versioned.

provider_name, premium_annual, premium_monthly, compulsory_excess, voluntary_excess, total_excess, defaqto_rating, courtesy_car, personal_accident_cover, windscreen_cover, legal_cover
car_insurance · 200 OK
{
  "provider_name": "Admiral",
  "premium_annual": 482.5,
  "compulsory_excess": 150,
  "voluntary_excess": 200,
  "defaqto_rating": 5,
  "courtesy_car": true
}
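A minimal sketch of what "typed and schema-versioned" could look like for the Car Insurance object. The class name, the version label, and the rule that total excess equals compulsory plus voluntary excess are assumptions made for illustration; only the field names and sample values come from the example above.

```python
from dataclasses import dataclass
from typing import Optional

SCHEMA_VERSION = "1.0.0"  # hypothetical version label, not taken from the source


@dataclass(frozen=True)
class CarInsuranceQuote:
    """Typed subset of the Car Insurance fields listed above."""
    provider_name: str
    premium_annual: float
    compulsory_excess: int
    voluntary_excess: int
    defaqto_rating: Optional[int] = None
    courtesy_car: Optional[bool] = None

    @property
    def total_excess(self) -> int:
        # Assumption: total excess = compulsory excess + voluntary excess.
        return self.compulsory_excess + self.voluntary_excess


def parse_car_quote(raw: dict) -> CarInsuranceQuote:
    """Coerce a raw payload into typed fields; a missing required key raises KeyError."""
    rating = raw.get("defaqto_rating")
    return CarInsuranceQuote(
        provider_name=str(raw["provider_name"]),
        premium_annual=float(raw["premium_annual"]),
        compulsory_excess=int(raw["compulsory_excess"]),
        voluntary_excess=int(raw["voluntary_excess"]),
        defaqto_rating=int(rating) if rating is not None else None,
        courtesy_car=raw.get("courtesy_car"),
    )


# Sample values copied from the car_insurance example above.
sample = {
    "provider_name": "Admiral",
    "premium_annual": 482.5,
    "compulsory_excess": 150,
    "voluntary_excess": 200,
    "defaqto_rating": 5,
    "courtesy_car": True,
}
quote = parse_car_quote(sample)
print(SCHEMA_VERSION, quote.provider_name, quote.total_excess)
```

Coercing at the boundary means a provider returning "150" as a string still lands in the warehouse as an integer column.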

Complete list of extractable fields for Home Insurance objects from confused.com. All fields typed and schema-versioned.

provider_name, premium_annual, buildings_cover_limit, contents_cover_limit, accidental_damage, home_emergency, trace_and_access, total_excess, defaqto_rating, alternative_accommodation
home_insurance · 200 OK
{
  "provider_name": "Churchill",
  "premium_annual": 185.0,
  "buildings_cover_limit": 1000000,
  "contents_cover_limit": 50000,
  "accidental_damage": false,
  "total_excess": 250
}

Complete list of extractable fields for Policy Add-ons objects from confused.com. All fields typed and schema-versioned.

provider_name, policy_type, breakdown_cover_price, legal_protection_price, key_cover_price, protected_ncb_price, courtesy_car_upgrade, personal_injury_cover
policy_add-ons · 200 OK
{
  "provider_name": "Hastings Direct",
  "policy_type": "Car",
  "breakdown_cover_price": 29.99,
  "legal_protection_price": 19.99,
  "key_cover_price": 15.0,
  "protected_ncb_price": 35.0
}

Complete list of extractable fields for Travel Insurance objects from confused.com. All fields typed and schema-versioned.

provider_name, premium_total, cover_type, destination_zone, medical_cover_limit, cancellation_cover, baggage_cover, excess_amount, covid_cover, defaqto_rating
travel_insurance · 200 OK
{
  "provider_name": "Post Office",
  "premium_total": 45.2,
  "cover_type": "Annual Multi-trip",
  "destination_zone": "Europe",
  "medical_cover_limit": 5000000,
  "excess_amount": 50
}

Complete list of extractable fields for Pet Insurance objects from confused.com. All fields typed and schema-versioned.

provider_name, premium_annual, premium_monthly, cover_type, vet_fee_limit, excess_amount, co_payment_pct, death_from_illness, third_party_liability, dental_cover
pet_insurance · 200 OK
{
  "provider_name": "Petplan",
  "premium_annual": 345.6,
  "cover_type": "Lifetime",
  "vet_fee_limit": 4000,
  "excess_amount": 95,
  "co_payment_pct": 10
}

Capabilities

Everything you need from Confused.com

Our pipeline handles every layer of the aggregator platform: multi-step form submissions, dynamic pricing interpolation, excess permutations, and bot circumvention.

Motor Premium Extraction

Extract comprehensive, third-party fire and theft, and third-party only quotes across all providers.

Home Cover Analytics

Capture buildings and contents insurance pricing with granular limits and excess permutations.

Defaqto Rating Tracking

Monitor policy quality scores alongside pricing to map value propositions across the market.

Add-on Price Mapping

Isolate the cost of legal cover, breakdown assistance, and key cover to analyse cross-sell margins.

Multi-step Form Automation

Handle complex quote generation flows with dynamic vehicle registration and postcode inputs.

Excess Permutation Matrix

Iterate through voluntary excess dropdowns to map the exact premium curve for every provider.

Travel Policy Data

Extract single-trip and annual multi-trip premiums mapped against destination zones and medical limits.

Pet Cover Tiers

Capture lifetime, maximum benefit, and time-limited policy pricing for specific breeds and ages.

Scheduled Market Sweeps

Run daily or weekly price comparison sweeps across predefined risk profiles to track market inflation.
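To make the Excess Permutation Matrix capability concrete, here is a small sketch of how quotes collected at several voluntary excess levels could be grouped into one premium curve per provider. The function name is hypothetical, and every premium value except Admiral's 482.5 at a 200 excess is invented purely for illustration.

```python
from collections import defaultdict


def build_premium_curves(quotes):
    """Group (voluntary_excess, premium_annual) points by provider, sorted by excess."""
    curves = defaultdict(list)
    for q in quotes:
        curves[q["provider_name"]].append((q["voluntary_excess"], q["premium_annual"]))
    return {provider: sorted(points) for provider, points in curves.items()}


# Illustrative observations from one risk profile quoted at three excess levels.
observations = [
    {"provider_name": "Admiral", "voluntary_excess": 500, "premium_annual": 455.0},
    {"provider_name": "Admiral", "voluntary_excess": 0, "premium_annual": 510.0},
    {"provider_name": "Admiral", "voluntary_excess": 200, "premium_annual": 482.5},
    {"provider_name": "Churchill", "voluntary_excess": 0, "premium_annual": 530.0},
    {"provider_name": "Churchill", "voluntary_excess": 500, "premium_annual": 490.0},
]
curves = build_premium_curves(observations)
for provider, points in curves.items():
    print(provider, points)
```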

// engagement pipeline

From risk profile to warehouse record

Brief in. Clean data out.

Define Scope · day 0

Provide risk profiles, vehicle sets, or postcodes. We design the extraction schema and input permutations together.

Pipeline Build · days 2–4

We configure Playwright crawlers, handle multi-step form state, manage sessions, and bypass aggregator bot detection.

Validation & QA · days 4–6

Schema validation, null-rate checks, premium outlier detection, and sample quote verification before full launch.

Delivery · ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
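The Validation & QA step above mentions null-rate checks and premium outlier detection. One way such checks could be sketched is shown below; the modified z-score based on the median absolute deviation and the 3.5 threshold are assumed choices for illustration, not a description of the production rules, and the sample premiums are made up.

```python
from statistics import median


def null_rate(records, field):
    """Fraction of records where the field is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def premium_outliers(records, field="premium_annual", threshold=3.5):
    """Flag records whose modified z-score (based on the MAD) exceeds the threshold."""
    values = [r[field] for r in records if r.get(field) is not None]
    if len(values) < 3:
        return []  # too few points to judge
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, nothing can be flagged this way
    return [
        r for r in records
        if r.get(field) is not None and 0.6745 * abs(r[field] - med) / mad > threshold
    ]


# Illustrative batch: one premium is ten times too large, one is missing.
batch = [
    {"provider_name": "A", "premium_annual": 480.0},
    {"provider_name": "B", "premium_annual": 495.0},
    {"provider_name": "C", "premium_annual": 510.0},
    {"provider_name": "D", "premium_annual": 502.0},
    {"provider_name": "E", "premium_annual": 4820.0},
    {"provider_name": "F", "premium_annual": None},
]
outliers = premium_outliers(batch)
print(round(null_rate(batch, "premium_annual"), 3), [r["provider_name"] for r in outliers])
```

A median-based score is used here because a single mis-parsed premium would distort a mean and standard deviation enough to hide itself.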

Under the hood

How our aggregator pipeline handles the hard parts

Confused.com invests heavily in rate limiting and bot detection. Here is how we stay resilient.

pipeline-monitor · confused.com · live · active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued (running)
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
Form state management
Navigating 5-stage quote journeys

Confused.com requires complex sequential inputs. We maintain persistent Playwright sessions to inject postcodes, vehicle data, and driver histories without triggering validation errors.

Anti-bot layer
Residential proxy rotation + fingerprint spoofing

Aggregators block repetitive quote requests. Our crawlers use UK residential ISP proxies with realistic browser fingerprints and randomised typing delays.

Dynamic rendering
Handling asynchronous quote loading

Provider prices load asynchronously via WebSockets and XHR. We monitor network traffic directly and capture each quote as soon as its response arrives, without waiting for the UI to render it.
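As a rough sketch of the XHR side of this idea, the parser below pulls quote objects out of a captured JSON response body, and `attach_capture` shows how it might be wired to a Playwright page through its `response` event. The `/quotes` path fragment and the `results` envelope key are hypothetical placeholders, since the real endpoint shape is not documented here; WebSocket frames would need a similar hook and are not shown.

```python
import json

QUOTE_URL_HINT = "/quotes"  # hypothetical path fragment, not a documented endpoint
ENVELOPE_KEY = "results"    # hypothetical key holding the list of quotes


def extract_quotes(url, content_type, body):
    """Return quote dicts found in one captured response body, or [] if irrelevant."""
    if QUOTE_URL_HINT not in url or "json" not in content_type.lower():
        return []
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    results = payload.get(ENVELOPE_KEY, []) if isinstance(payload, dict) else []
    return [r for r in results if isinstance(r, dict) and "premium_annual" in r]


def attach_capture(page, sink):
    """Append parsed quotes to `sink` for every matching response seen by `page`.

    `page` is expected to behave like a Playwright sync-API Page object.
    """
    def on_response(response):
        content_type = response.headers.get("content-type", "")
        if QUOTE_URL_HINT in response.url and "json" in content_type.lower():
            sink.extend(extract_quotes(response.url, content_type, response.text()))

    page.on("response", on_response)


# Parsing a captured body without a browser:
body = json.dumps({"results": [
    {"provider_name": "Admiral", "premium_annual": 482.5},
    {"provider_name": "Unpriced"},  # dropped: no premium yet
]})
captured = extract_quotes("https://example.test/api/quotes?id=1", "application/json", body)
print(captured)
```

Keeping the parser separate from the browser hook means it can be tested against recorded payloads without launching a browser.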

Permutation scaling
Matrix execution for risk profiles

To map a pricing curve, we iterate over thousands of risk profiles. Our Kubernetes cluster distributes these permutations across parallel workers to complete market sweeps in hours.
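A minimal sketch of the permutation step, assuming the risk matrix arrives as a mapping from variable name to candidate values. The variable names, sample values, and round-robin sharding are illustrative assumptions; how batches are actually scheduled onto workers is not described here.

```python
from itertools import product


def expand_matrix(matrix):
    """Yield one risk profile dict per combination of the matrix values."""
    keys = list(matrix)
    for combo in product(*(matrix[k] for k in keys)):
        yield dict(zip(keys, combo))


def shard(profiles, worker_count):
    """Distribute profiles round-robin into worker_count batches."""
    batches = [[] for _ in range(worker_count)]
    for i, profile in enumerate(profiles):
        batches[i % worker_count].append(profile)
    return batches


# Illustrative matrix: 2 postcodes x 3 ages x 3 excess levels = 18 profiles.
matrix = {
    "postcode": ["CF10 1AA", "M1 1AE"],
    "driver_age": [25, 40, 60],
    "voluntary_excess": [0, 250, 500],
}
profiles = list(expand_matrix(matrix))
batches = shard(profiles, worker_count=4)
print(len(profiles), [len(b) for b in batches])
```

The profile count is the product of the list lengths, so adding one more variable with ten values multiplies the sweep size by ten.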

Schema stability
Resilient selectors for provider cards

UI layouts change frequently. Our selector strategy uses multiple fallback chains per field so a CSS update does not break your data pipeline overnight.
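The fallback-chain idea can be sketched as an ordered list of selectors per field, tried until one returns text. All selectors below are hypothetical, and `query` stands in for whatever lookup the crawler uses (here a plain dict plays the role of the page).

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical selector chains, ordered from most to least preferred.
SELECTOR_CHAINS: Dict[str, Sequence[str]] = {
    "provider_name": ("[data-testid='provider-name']", ".provider-card h3"),
    "premium_annual": ("[data-testid='annual-price']", ".price--annual", ".quote-card .price"),
    "defaqto_rating": ("[data-testid='defaqto-stars']",),
}


def first_match(query: Callable[[str], Optional[str]],
                selectors: Sequence[str]) -> Tuple[Optional[str], Optional[str]]:
    """Try selectors in order; return (selector, text) for the first non-empty hit."""
    for selector in selectors:
        text = query(selector)
        if text and text.strip():
            return selector, text.strip()
    return None, None


def extract_fields(query, chains=SELECTOR_CHAINS):
    """Return (record, used) where `used` records which selector matched each field."""
    record, used = {}, {}
    for field, selectors in chains.items():
        used[field], record[field] = first_match(query, selectors)
    return record, used


# Simulated page after a layout change: the primary premium selector no longer matches.
fake_dom = {".provider-card h3": "Admiral", ".price--annual": " £482.50 "}
record, used = extract_fields(fake_dom.get)
print(record, used)
```

Logging which selector matched makes it possible to notice a layout change while the fallbacks are still returning data.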

Applications

Who uses Confused.com data

Teams across industries use confused.com data to build competitive products and smarter operations.

01
Competitor Price Tracking

Underwriting teams monitor rival premiums across specific postcodes and vehicle groups to optimise their own pricing models.

02
Market Inflation Indices

Analysts track average premium fluctuations over time to publish consumer price indices for motor and home insurance.

03
Add-on Margin Analysis

Product teams analyse how competitors price breakdown cover and legal protection to structure profitable cross-sell journeys.

04
Risk Profile Mapping

Actuaries correlate premium changes against specific risk variables like age, occupation, and claims history across the aggregator market.

05
Defaqto Value Positioning

Marketing teams track how their policy features and Defaqto ratings compare to cheaper, lower-tier alternatives on the results page.

06
New Entrant Monitoring

Strategy teams detect when new MGAs or challenger brands appear on the aggregator panel and track their initial pricing strategies.

Why DataFlirt

"Confused.com holds the definitive pulse on UK insurance pricing, but mapping that data requires navigating complex form states and aggressive rate limits."

Most teams underestimate the investment involved: reliable aggregator scraping needs UK residential proxies, multi-step form automation, asynchronous network interception, and daily selector maintenance. DataFlirt absorbs that complexity so your actuaries can focus on the analysis, not the infrastructure.

Technical Spec

Confused.com scraper technical capabilities

Everything supported by our confused.com scraper: rendered SPA elements, multi-step quote forms, rate-limit evasion, and beyond.

Capability | Description | Status
Multi-step form automation | Injects dynamic data into sequential quote forms | Supported
UK residential proxy rotation | ISP-grade IPs from UK pools to bypass rate limits | Supported
Asynchronous quote capture | Intercepts XHR/WebSocket traffic for immediate price extraction | Supported
Excess permutation iteration | Loops through voluntary excess dropdowns to map price curves | Supported
Defaqto rating extraction | Captures star ratings and policy feature inclusions | Supported
Vehicle registration lookup | Automates VRN input and confirms vehicle details | Supported
Address lookup automation | Handles postcode search and address selection dropdowns | Supported
Change detection (diffs) | Hash-based diff: only emit records with changed premiums | Supported
Purchasing a policy | Executing transactions or entering payment details | Partial
Accessing saved user quotes | Logging into consumer accounts to retrieve historical quotes | Partial
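The hash-based change detection listed above could work roughly as follows: hash only the premium fields of each record and emit a record when its hash differs from the one stored on the previous run. The identity key (`provider_name` plus a `profile_id`) and the set of tracked fields are assumptions for this sketch, as is the second-run premium.

```python
import hashlib
import json

KEY_FIELDS = ("provider_name", "profile_id")            # hypothetical record identity
TRACKED_FIELDS = ("premium_annual", "premium_monthly")  # fields whose change triggers an emit


def record_key(record):
    return "|".join(str(record.get(f)) for f in KEY_FIELDS)


def premium_hash(record):
    """Stable SHA-256 over the tracked fields only."""
    tracked = {f: record.get(f) for f in TRACKED_FIELDS}
    blob = json.dumps(tracked, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def changed_records(records, previous_hashes):
    """Return (records to emit, updated hash map) without mutating the input map."""
    current = dict(previous_hashes)
    emitted = []
    for record in records:
        key, digest = record_key(record), premium_hash(record)
        if current.get(key) != digest:
            emitted.append(record)
            current[key] = digest
    return emitted, current


run_1 = [
    {"provider_name": "Admiral", "profile_id": "p1", "premium_annual": 482.5},
    {"provider_name": "Churchill", "profile_id": "p1", "premium_annual": 501.0},
]
run_2 = [
    {"provider_name": "Admiral", "profile_id": "p1", "premium_annual": 489.0},  # changed
    {"provider_name": "Churchill", "profile_id": "p1", "premium_annual": 501.0},
]
first, state = changed_records(run_1, {})
second, state = changed_records(run_2, state)
print(len(first), [r["provider_name"] for r in second])
```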
Infrastructure

Infrastructure powering the aggregator pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration and retry logic. Playwright handles multi-step form injection, cookie sessions, and interaction flows.

UK Proxy Infrastructure

We maintain pools of UK residential ISP proxies. Rotation happens between quote journeys rather than mid-journey, because each journey needs a sticky session for its full duration.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling, dependency management, and SLA alerting. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: newline-delimited or nested, schema-versioned per run
CSV: flat file with typed columns, Excel/Sheets compatible
XLS: direct Excel exports for actuarial and analyst teams
Parquet: columnar format for BigQuery, Snowflake, Athena
AWS S3: direct bucket delivery, compatible with any data lake
Webhook: HTTP POST per record for real-time downstream processing
API: REST endpoints to query historical premium datasets
Postgres: upsert into your existing schema with conflict resolution
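For the two text formats in the list above, a sketch using only the Python standard library shows how the same records could be serialised as newline-delimited JSON and as a flat CSV with a fixed column order. The function names are hypothetical, and Parquet, S3 upload, and Postgres upserts are left out because they need third-party libraries.

```python
import csv
import io
import json


def to_ndjson(records):
    """One JSON object per line, keys sorted for stable diffs."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


def to_csv(records, columns):
    """Flat CSV with a fixed column order; unknown keys are ignored, missing ones left blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Values taken from the home_insurance sample earlier on this page.
records = [{
    "provider_name": "Churchill",
    "premium_annual": 185.0,
    "buildings_cover_limit": 1000000,
    "contents_cover_limit": 50000,
    "accidental_damage": False,
    "total_excess": 250,
}]
ndjson_text = to_ndjson(records)
csv_text = to_csv(records, ["provider_name", "premium_annual", "total_excess"])
print(ndjson_text, csv_text, sep="")
```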
// faq

Common questions.

About confused.com scraping, legality, and pipeline operations.

Is scraping Confused.com legal?

Scraping publicly available pricing from aggregators is generally permissible under UK law for market research purposes. DataFlirt targets only public quote data generated via synthetic risk profiles. We do not extract personal data or circumvent authentication walls. Clients should review terms of service and consult legal counsel.

How do you handle the multi-step quote forms?

We use Playwright to automate the entire journey. We inject postcodes, vehicle registrations, and driver details synthetically, maintaining session state across all 5 stages of the form without triggering validation blocks.

Can you run quotes for thousands of different risk profiles?

Yes. You provide a matrix of risk variables, and our Kubernetes cluster distributes these permutations across parallel workers to generate comprehensive market pricing curves.

How do you bypass aggregator bot detection?

We use UK residential ISP proxies, full Playwright browser sessions with realistic fingerprints, and request timing modelled on human typing speeds. We monitor for CAPTCHA spikes and trigger solver queues automatically.

Do you capture add-on pricing and Defaqto ratings?

Yes. Alongside the base premium, we extract voluntary and compulsory excess, breakdown cover costs, legal protection fees, and the provider's Defaqto star rating.

How fresh is the data?

We execute market sweeps on your specified cadence. Daily and weekly runs are standard for competitor tracking. Execution time depends on the size of your risk profile matrix.

What is the minimum viable engagement?

Our smallest packages start at a defined matrix of 5,000 risk profiles with weekly delivery. For larger matrices or custom schema requirements, we price based on compute volume and delivery frequency.

$ dataflirt scope --new-project --source=confused.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off premium matrix dump or a continuous price-monitoring feed across millions of risk profiles, we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h