
Keller Williams data, at warehouse scale.

We extract active residential listings, commercial properties, agent directories, pricing histories, and neighbourhood analytics from kw.com. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Listings extracted: 412,891 /day
Agent profiles: 184,293 /run
Price updates: 89,102 /24h
Active pipelines: 47
Uptime: 99.94%
Data Dictionary

Every field we extract from kw.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Property Listings objects from kw.com. All fields typed and schema-versioned.

listing_id, mls_number, property_type, status, price, beds, baths, square_feet, lot_size, year_built, address, city, state, zip_code, description, agent_id, office_name, image_urls, virtual_tour_url
property_listings · 200 OK
{
  "listing_id": "KW-738291",
  "mls_number": "TX-882910",
  "price": 450000,
  "beds": 4,
  "baths": 3.5,
  "square_feet": 2850,
  "status": "Active",
  "city": "Austin"
}

Complete list of extractable fields for Agent Profiles objects from kw.com. All fields typed and schema-versioned.

agent_id, full_name, license_number, office_id, office_name, phone_number, email, website_url, languages_spoken, specialties, designations, bio, active_listings_count, total_sales_volume, social_links, profile_image_url
agent_profiles · 200 OK
{
  "agent_id": "A-59281",
  "full_name": "Sarah Jenkins",
  "office_name": "KW Austin Southwest",
  "phone_number": "+1-512-555-0198",
  "active_listings_count": 14,
  "specialties": ["Luxury", "Relocation"],
  "languages_spoken": ["English", "Spanish"]
}

Complete list of extractable fields for Pricing & History objects from kw.com. All fields typed and schema-versioned.

listing_id, current_price, original_price, price_per_sqft, days_on_market, price_history, tax_history, tax_year, tax_amount, hoa_fee, hoa_frequency, estimated_mortgage, status_history
pricing_and_history · 200 OK
{
  "listing_id": "KW-738291",
  "current_price": 450000,
  "original_price": 475000,
  "price_per_sqft": 157.89,
  "days_on_market": 42,
  "tax_amount": 6200,
  "hoa_fee": 150
}
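
The derived values in the sample above can be reproduced from the raw ones. The sketch below assumes that price_per_sqft is current_price divided by the square_feet value of the matching property_listings record (2850 for KW-738291); the helper names are hypothetical and only illustrate the arithmetic.

```python
def price_per_sqft(current_price: float, square_feet: float) -> float:
    """Current price divided by living area, rounded to cents."""
    return round(current_price / square_feet, 2)


def reduction_pct(original_price: float, current_price: float) -> float:
    """Percentage drop from the original list price (negative if the price rose)."""
    return round((original_price - current_price) / original_price * 100, 2)


print(price_per_sqft(450000, 2850))   # 157.89, matching the sample record
print(reduction_pct(475000, 450000))  # 5.26
```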

Complete list of extractable fields for Open Houses objects from kw.com. All fields typed and schema-versioned.

listing_id, open_house_id, start_time, end_time, date, event_type, agent_id, agent_name, rsvp_required, virtual_event, zoom_link, remarks
open_houses · 200 OK
{
  "listing_id": "KW-738291",
  "open_house_id": "OH-9921",
  "date": "2026-06-14",
  "start_time": "13:00:00",
  "end_time": "16:00:00",
  "virtual_event": false,
  "agent_name": "Sarah Jenkins"
}

Complete list of extractable fields for Office & Brokerage objects from kw.com. All fields typed and schema-versioned.

office_id, office_name, broker_name, address, city, state, zip_code, phone_number, agent_count, active_listings, total_sales, website_url, operating_principal
office_and_brokerage · 200 OK
{
  "office_id": "O-9182",
  "office_name": "KW Austin Southwest",
  "city": "Austin",
  "state": "TX",
  "agent_count": 342,
  "active_listings": 1205,
  "phone_number": "+1-512-555-0000"
}

Capabilities

Comprehensive Keller Williams data extraction

Our KW scraper handles every layer of the platform: property listings, dynamic pricing, map-based search results, and agent directories. Built with JavaScript rendering, session management, and anti-bot circumvention.

Full Listing Extraction

Title, beds, baths, square footage, description, images, virtual tours, and every metadata field Keller Williams surfaces.

Agent Directory Scraping

Extract KW associates, bios, contact details, active listings, and specialisations across all market centres.

Real-Time Status Tracking

Monitor active, pending, sold, and off-market status changes on a daily or hourly basis.

Price Reductions & History

Track list price changes, original price, and days on market to identify pricing trends.

HOA & Tax Data

Capture local property taxes, HOA fees, and assessment histories attached to residential listings.

Open House Schedules

Extract dates, times, and agent details for upcoming open houses across target zip codes.

Commercial Real Estate

Extract KW Commercial listings, zoning information, cap rates, and lease terms.

Office Intelligence

Map KW franchises, operating principals, and roster sizes across different states and regions.

Multi-Region Coverage

Extract data across US, Canada, and KW Worldwide regions using a unified extraction schema.

Geospatial Normalisation

Standardise addresses, zip codes, and coordinate data for immediate integration into mapping tools.
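
As a rough illustration of the geospatial normalisation described above, the sketch below standardises US ZIP codes and coordinates using only the standard library. The function names and the six-decimal rounding are assumptions made for the example; Canadian postal codes and full address parsing are out of scope here.

```python
import re
from typing import Optional, Tuple

# One to five digits, optionally followed by a ZIP+4 suffix.
ZIP_PATTERN = re.compile(r"^\s*(\d{1,5})(?:-\d{4})?\s*$")


def normalise_zip(raw: object) -> Optional[str]:
    """Return a 5-digit US ZIP code, or None if the input is not ZIP-like."""
    match = ZIP_PATTERN.match(str(raw))
    if match is None:
        return None
    # Restore leading zeros that spreadsheets and integer columns often drop.
    return match.group(1).zfill(5)


def normalise_coords(lat: float, lon: float) -> Optional[Tuple[float, float]]:
    """Reject out-of-range coordinates and round the rest to 6 decimals."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return round(lat, 6), round(lon, 6)
```

For example, `normalise_zip(2139)` yields `"02139"` and `normalise_zip("78704-1234")` yields `"78704"`.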

// engagement pipeline

From target zip codes to warehouse records

Brief in. Clean data out.

Define Scope (day 0)

Provide target zip codes, states, or agent criteria. We design the extraction schema together.

Pipeline Build (days 2–4)

We configure Scrapy / Playwright crawlers, proxy rotation, and session management for kw.com.

Validation & QA (days 4–6)

Schema validation, null-rate checks, and geospatial anomaly detection before full launch.

Delivery (ongoing)

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
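
To make the Validation & QA step concrete, here is a minimal sketch of a per-record type check. The field-to-type mapping is an assumption inferred from the property_listings sample earlier on this page, and `validate_record` is a hypothetical helper, not a description of the production validator.

```python
from typing import Dict, List

# Assumed types, inferred from the sample property_listings record.
EXPECTED_TYPES: Dict[str, type] = {
    "listing_id": str,
    "price": int,
    "beds": int,
    "baths": float,
    "square_feet": int,
    "status": str,
}


def validate_record(record: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, expected in EXPECTED_TYPES.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
            continue
        # Whole numbers are acceptable where a float is expected (e.g. 3 baths).
        accepted = (int, float) if expected is float else (expected,)
        # bool is a subclass of int in Python, so reject it explicitly
        # (none of the fields above is boolean).
        if isinstance(value, bool) or not isinstance(value, accepted):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors
```

A record that passes returns an empty list, so a run can be gated on the share of records with no errors.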

Under the hood

How our kw.com pipeline handles the hard parts

Real estate platforms invest heavily in scraping detection. Here is how we stay resilient, and why teams choose managed infrastructure over DIY.

pipeline-monitor · kw.com · live · active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued · running
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
Anti-bot layer
Residential proxy rotation and fingerprint spoofing

Keller Williams uses anti-scraping firewalls to block data centre IPs. Our crawlers use residential ISP proxies with realistic browser fingerprints, randomised request timing, and full cookie session management.

Map-based search rendering
Full Playwright execution for spatial clusters

KW's map search is heavily JavaScript-rendered and relies on dynamic API calls based on bounding boxes. We run full Playwright browser sessions to trigger map movements and capture the underlying JSON payloads.
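
Because the map search is driven by bounding boxes, one common way to cover a large market is to split it into smaller viewports and request each in turn. The sketch below shows only that tiling step. It assumes the map endpoint limits how many results it returns per viewport, which this page does not state; `BBox` and `tile_bbox` are hypothetical names.

```python
from typing import Iterator, NamedTuple


class BBox(NamedTuple):
    """A latitude/longitude rectangle in decimal degrees."""
    south: float
    west: float
    north: float
    east: float


def tile_bbox(box: BBox, rows: int, cols: int) -> Iterator[BBox]:
    """Split a bounding box into rows x cols equally sized tiles."""
    lat_step = (box.north - box.south) / rows
    lon_step = (box.east - box.west) / cols
    for r in range(rows):
        for c in range(cols):
            yield BBox(
                south=box.south + r * lat_step,
                west=box.west + c * lon_step,
                north=box.south + (r + 1) * lat_step,
                east=box.west + (c + 1) * lon_step,
            )


# A 1 x 1 degree box, roughly around Austin, split into four tiles.
for tile in tile_bbox(BBox(30.0, -98.0, 31.0, -97.0), rows=2, cols=2):
    print(tile)
```

Tiles that still return a capped result set can be split again with the same function.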

Schema stability
Resilient selectors with fallback chains

Real estate DOM structures change frequently based on property type. Our selector strategy uses multiple fallback chains per field so a layout change does not break your data pipeline overnight.
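
A fallback chain can be sketched as an ordered list of extractors that are tried until one returns a value. The example below uses regular expressions from the standard library to stay self-contained; a Scrapy crawler would more likely use CSS or XPath selectors. All patterns and attribute names are hypothetical and are not taken from kw.com markup.

```python
import re
from typing import Callable, Iterable, Optional

Extractor = Callable[[str], Optional[str]]


def regex_extractor(pattern: str) -> Extractor:
    """Build an extractor that returns the first capture group, if any."""
    compiled = re.compile(pattern, re.S)

    def extract(html: str) -> Optional[str]:
        match = compiled.search(html)
        return match.group(1).strip() if match else None

    return extract


def first_match(html: str, chain: Iterable[Extractor]) -> Optional[str]:
    """Try each extractor in order and return the first non-empty value."""
    for extract in chain:
        value = extract(html)
        if value:
            return value
    return None


# Hypothetical chain for the price field, most specific pattern first.
PRICE_CHAIN = [
    regex_extractor(r'data-testid="price"[^>]*>\s*\$?([\d,]+)'),
    regex_extractor(r'class="listing-price"[^>]*>\s*\$?([\d,]+)'),
    regex_extractor(r'"price"\s*:\s*(\d+)'),  # embedded JSON as a last resort
]
```

If the first layout disappears after a redesign, the later entries still return a value, and logging which entry matched gives an early signal that the page has changed.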

Change detection
Only emit what has changed

For large MLS markets, we maintain a hash index of last-seen values per listing. Subsequent runs only push diffs, reducing compute cost and downstream processing load for status and price updates.
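
The hash-index idea can be sketched in a few lines. This example assumes that only status and price are tracked and uses an in-memory dict where a real pipeline would use a persistent store; `fingerprint` and `diff_run` are hypothetical names.

```python
import hashlib
import json
from typing import Dict, List

TRACKED_FIELDS = ("status", "price")  # assumed set of change-relevant fields


def fingerprint(record: dict) -> str:
    """Stable SHA-256 digest of the tracked fields of one listing."""
    tracked = {field: record.get(field) for field in TRACKED_FIELDS}
    payload = json.dumps(tracked, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_run(records: List[dict], index: Dict[str, str]) -> List[dict]:
    """Return records that are new or changed, updating the index in place."""
    changed = []
    for record in records:
        digest = fingerprint(record)
        if index.get(record["listing_id"]) != digest:
            changed.append(record)
            index[record["listing_id"]] = digest
    return changed


index: Dict[str, str] = {}
first = [{"listing_id": "KW-738291", "status": "Active", "price": 475000}]
print(len(diff_run(first, index)))  # 1: the listing is new
print(len(diff_run(first, index)))  # 0: nothing changed since the last run
```

Fields outside `TRACKED_FIELDS`, such as a reworded description, do not trigger a diff in this sketch; widening the tuple trades more downstream updates for finer change tracking.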

Monitoring & alerting
24/7 pipeline health with anomaly detection

Every run emits structured logs to our observability stack. We alert on null-rate spikes, price outliers, schema drift, and coverage drops, responding before you notice.
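
Two of the checks named above, null-rate spikes and coverage drops, reduce to simple ratios. The sketch below illustrates them; the required fields and both thresholds are hypothetical defaults chosen for the example, not the values used in production.

```python
from typing import List

REQUIRED_FIELDS = ("listing_id", "price", "status")  # assumed for illustration


def null_rate(records: List[dict], field: str) -> float:
    """Share of records where the field is missing, None, or empty."""
    if not records:
        return 1.0
    missing = sum(1 for r in records if r.get(field) in (None, ""))
    return missing / len(records)


def check_run(
    records: List[dict],
    expected_count: int,
    max_null_rate: float = 0.02,
    min_coverage: float = 0.90,
) -> List[str]:
    """Return alert messages for one run; an empty list means healthy."""
    alerts = []
    coverage = len(records) / expected_count if expected_count else 0.0
    if coverage < min_coverage:
        alerts.append(f"coverage drop: {coverage:.0%} of expected records")
    for field in REQUIRED_FIELDS:
        rate = null_rate(records, field)
        if rate > max_null_rate:
            alerts.append(f"null-rate spike: {field} at {rate:.1%}")
    return alerts
```

In practice `expected_count` would come from the previous run or a rolling average, so a sudden drop in returned listings is flagged even when every individual record looks valid.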

Applications

Who uses Keller Williams data, and how

Teams across industries use kw.com data to build competitive products and smarter operations.

01 · Investment & Yield Analysis

Identify underpriced properties, calculate price per square foot, and assess rental yield potential across specific neighbourhoods.

02 · Agent Recruiting & Retention

Brokerages track top-performing KW agents, sales volumes, and active listings for targeted recruiting campaigns.

03 · Market Trend Forecasting

Analyse days on market, price reductions, and inventory levels across specific zip codes to predict market shifts.

04 · PropTech Platform Enrichment

Enrich internal real estate portals with active listings, open house dates, and agent contact information.

05 · Mortgage & Lending Intelligence

Identify new listings rapidly to target buyers with pre-approval offers and mortgage products.

06 · Appraisal & Valuation Models

Feed automated valuation models (AVMs) with historical pricing, tax assessments, and comparable property data.

Why DataFlirt

"Keller Williams holds one of the most comprehensive agent and property datasets in North America, but mapping it requires infrastructure built for dynamic map-based interfaces."

Extracting real estate data at scale requires bypassing sophisticated anti-bot firewalls, rendering complex map clusters, and normalising inconsistent MLS feeds. DataFlirt handles the proxy rotation, JavaScript execution, and schema maintenance so your data science teams can focus on valuation models and market analysis.

Technical Spec

Keller Williams scraper: technical capabilities

What our kw.com scraper supports, from rendered SPA elements to proxy rotation and change detection, and where coverage is only partial.

JavaScript rendering · Full Playwright sessions required for map clusters and dynamic API calls · Supported
Residential proxy rotation · ISP-grade residential IPs from US/CA pools rotated per request · Supported
MLS ID extraction · Capture underlying MLS identifiers for cross-referencing · Supported
Agent contact details · Public office numbers, emails, and social links from agent directories · Supported
Change detection (diffs) · Hash-based diff to emit only records with changed status or price · Supported
High-res image extraction · Capture full-resolution property photos and floor plans · Supported
Webhook delivery · HTTP POST per record for real-time new listing alerts · Supported
Historical sold data · Properties sold more than 3 years ago are typically not surfaced publicly · Partial
Client portal saved searches · Requires authenticated KW consumer account credentials · Partial
Private agent remarks · Confidential MLS notes gated for licensed agents only · Partial
Infrastructure

Infrastructure powering the Keller Williams pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles JavaScript rendering, map API interception, and interaction flows.
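
Map API interception with Playwright can be sketched as waiting for a matching network response while the page loads. In the example below, `CLUSTER_PATH_HINT` is a hypothetical placeholder, since the real endpoint path has to be identified by inspecting the site's network traffic. Playwright is imported lazily so the URL filter can be used and tested without a browser installed.

```python
from typing import Any
from urllib.parse import urlparse

# Hypothetical path fragment; not taken from kw.com.
CLUSTER_PATH_HINT = "/search/map"


def is_cluster_url(url: str) -> bool:
    """True if the URL path looks like the (assumed) map cluster endpoint."""
    return CLUSTER_PATH_HINT in urlparse(url).path


def capture_cluster_payload(search_url: str) -> Any:
    """Load a public search page and return the first matching JSON payload."""
    from playwright.sync_api import sync_playwright  # requires `pip install playwright`

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # expect_response blocks until a response satisfies the predicate
        # (or the default timeout elapses and an error is raised).
        with page.expect_response(
            lambda response: is_cluster_url(response.url) and response.ok
        ) as response_info:
            page.goto(search_url)
        payload = response_info.value.json()
        browser.close()
        return payload
```

Capturing the JSON behind the map avoids parsing rendered markers out of the DOM, which tends to be more fragile.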

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies across North America. Rotation happens per-request with sticky sessions where required to bypass WAF rules.
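
Per-request rotation with sticky sessions can be illustrated with a small round-robin selector. This is a sketch under simple assumptions: the proxy URLs are placeholders, the class name is hypothetical, and real pools would also handle health checks and bans.

```python
import itertools
from typing import Dict, List, Optional


class ProxyRotator:
    """Round-robin proxy selection with optional sticky sessions."""

    def __init__(self, proxies: List[str]) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._sticky: Dict[str, str] = {}

    def proxy_for(self, session_id: Optional[str] = None) -> str:
        """Return the next proxy, or the pinned one for a known session."""
        if session_id is None:
            return next(self._cycle)
        if session_id not in self._sticky:
            self._sticky[session_id] = next(self._cycle)
        return self._sticky[session_id]

    def release(self, session_id: str) -> None:
        """Forget a session so its next request gets a fresh proxy."""
        self._sticky.pop(session_id, None)


rotator = ProxyRotator([
    "http://proxy-a.example.invalid:8000",  # placeholder addresses
    "http://proxy-b.example.invalid:8000",
])
print(rotator.proxy_for())            # rotates on every call
print(rotator.proxy_for("search-1"))  # stays fixed for this session
```

Sticky sessions matter when cookies issued to one IP must be replayed from the same IP, for example across the pages of a single paginated search.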

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON · Newline-delimited or nested arrays, versioned per run
CSV · Flat file with typed columns for direct analysis
Parquet · Columnar format for BigQuery, Snowflake, Athena
AWS S3 · Direct bucket delivery compatible with any data lake
Webhook · HTTP POST per record for real-time processing
API · Queryable endpoints for on-demand data retrieval
BigQuery · Streamed directly into your dataset with schema auto-detect
Snowflake · Stage and COPY INTO workflow for incremental updates
XLS · Standard spreadsheet format for non-technical teams
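
For reference, the two simplest formats above can be produced with the Python standard library alone. The sketch below serialises records as newline-delimited JSON and as CSV; the function names are hypothetical, and Parquet is left out because it needs a third-party library.

```python
import csv
import io
import json
from typing import Iterable, List


def to_ndjson(records: Iterable[dict]) -> str:
    """One JSON object per line, with keys sorted for stable diffs."""
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)


def to_csv(records: Iterable[dict], columns: List[str]) -> str:
    """Flat CSV with a fixed column order; extra keys are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rows = [{"listing_id": "KW-738291", "price": 450000, "city": "Austin"}]
print(to_ndjson(rows), end="")
print(to_csv(rows, ["listing_id", "price"]), end="")
```

Fixing the column list up front keeps the CSV header stable between runs even when individual records carry extra or missing keys.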
// faq

Common questions.

About kw.com scraping, legality, and pipeline operations.

Ask us directly →
Is scraping kw.com legal?

Scraping publicly available information from kw.com is generally permissible under applicable law. DataFlirt targets only public, non-authenticated property listings and public agent profiles. We do not extract private consumer data, circumvent authentication walls, or scrape confidential MLS remarks.

How do you handle map-based search results?

We use Playwright to execute the JavaScript necessary to load the map interfaces, intercepting the underlying API calls that return the JSON payloads for property clusters within specific bounding boxes.

Can you extract agent contact information?

Yes. We extract public office phone numbers, public email addresses, social media links, and website URLs listed on the public KW agent directory.

How fresh is the listing data?

Depending on your requirements, pipelines can be configured for daily full-market refreshes or sub-daily streaming for specific target zip codes to capture new listings and price changes rapidly.

Do you capture price reductions?

Yes. Every pipeline run produces timestamped snapshots, and we maintain a time-series record per listing. Each reduction therefore appears as a dated change, and we can track original price and current price and calculate days on market.
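
As an illustration of how reductions fall out of snapshots, the sketch below compares consecutive snapshots of one listing. The snapshot shape (`captured_on`, `price`) is a hypothetical example, and days on market is approximated from the first snapshot, which may be later than the true list date.

```python
from datetime import date
from typing import List


def price_reductions(snapshots: List[dict]) -> List[dict]:
    """Return one event per drop in price between consecutive snapshots."""
    # ISO 8601 date strings sort chronologically as plain strings.
    ordered = sorted(snapshots, key=lambda s: s["captured_on"])
    events = []
    for previous, current in zip(ordered, ordered[1:]):
        if current["price"] < previous["price"]:
            events.append({
                "captured_on": current["captured_on"],
                "from": previous["price"],
                "to": current["price"],
            })
    return events


def days_on_market(snapshots: List[dict], as_of: date) -> int:
    """Days between the first snapshot and the given date."""
    first_seen = min(date.fromisoformat(s["captured_on"]) for s in snapshots)
    return (as_of - first_seen).days


history = [
    {"captured_on": "2026-05-03", "price": 475000},
    {"captured_on": "2026-05-10", "price": 475000},
    {"captured_on": "2026-05-20", "price": 450000},
]
print(price_reductions(history))                  # one reduction, on 2026-05-20
print(days_on_market(history, date(2026, 6, 14)))  # 42
```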

What is the minimum viable engagement?

Our smallest packages start at a defined city or state level with weekly delivery. For national coverage or custom schema requirements, we price based on volume and delivery frequency.

Can you scrape KW Commercial properties?

Yes. We support extraction of KW Commercial listings, including specific commercial fields like zoning, cap rates, building class, and lease terms.

$ dataflirt scope --new-project --source=kw.com ready

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off agent directory export or continuous market monitoring across multiple states, we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h