
Local service data, at warehouse scale.

We extract professional profiles, service areas, Top Pro status, pricing estimates, and review text from Thumbtack. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Profiles extracted: 412K/day
Price estimates: 1.8M/week
Review records: 3.2M/run
Active pipelines: 84
Uptime: 99.94%

Data Dictionary

Every field we extract from thumbtack.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Professional Profiles objects from thumbtack.com. All fields typed and schema-versioned.

Fields: pro_id, business_name, category, top_pro_badge, rating, review_count, hires_on_thumbtack, background_checked, employees, years_in_business, response_time, description, profile_url
Sample record (professional_profiles):
{
  "pro_id": "pro_98237492837",
  "business_name": "Apex Plumbing Services",
  "category": "Plumbing",
  "top_pro_badge": true,
  "rating": 4.9,
  "hires_on_thumbtack": 142,
  "background_checked": true
}

Complete list of extractable fields for Pricing & Services objects from thumbtack.com. All fields typed and schema-versioned.

Fields: pro_id, service_category, base_price, hourly_rate, minimum_charge, travel_fee, payment_methods, free_estimate, discount_offered, service_description
Sample record (Pricing & Services):
{
  "pro_id": "pro_98237492837",
  "service_category": "Water Heater Repair",
  "base_price": 150.0,
  "hourly_rate": 85.0,
  "free_estimate": true,
  "payment_methods": ["Credit Card", "Zelle", "Cash"]
}
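Fields such as payment_methods hold several values, which flat formats like CSV cannot represent natively. The following is a minimal sketch, using only the Python standard library, of how such a record could be written to newline-delimited JSON and to CSV. The function names are hypothetical, and JSON-encoding list values inside CSV cells is an illustrative choice, not a documented DataFlirt convention. Parquet would need a third-party library such as pyarrow and is not shown.

```python
import csv
import io
import json


def to_ndjson(records):
    """Serialise records as newline-delimited JSON: one object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def to_csv(records, fields):
    """Flatten records to CSV in a fixed column order.

    Missing fields become empty cells; fields not listed are dropped.
    List and dict values are JSON-encoded so they survive the flat format.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for r in records:
        writer.writerow({k: json.dumps(v) if isinstance(v, (list, dict)) else v for k, v in r.items()})
    return buffer.getvalue()


record = {
    "pro_id": "pro_98237492837",
    "service_category": "Water Heater Repair",
    "base_price": 150.0,
    "payment_methods": ["Credit Card", "Zelle", "Cash"],
}
print(to_ndjson([record]), end="")
print(to_csv([record], ["pro_id", "base_price", "payment_methods"]), end="")
```

Reading the CSV back and calling json.loads on the payment_methods cell recovers the original list, which a Python-style string such as "['Credit Card', ...]" would not reliably allow.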

Complete list of extractable fields for Reviews objects from thumbtack.com. All fields typed and schema-versioned.

Fields: review_id, pro_id, author_name, rating, date_posted, review_text, service_provided, verified_hire, pro_response, response_date
Sample record (reviews):
{
  "review_id": "rev_73628472",
  "pro_id": "pro_98237492837",
  "rating": 5,
  "date_posted": "2026-03-14",
  "review_text": "Fixed our leak in under an hour. Highly recommended.",
  "verified_hire": true,
  "service_provided": "Pipe Repair"
}

Complete list of extractable fields for Service Areas objects from thumbtack.com. All fields typed and schema-versioned.

Fields: pro_id, primary_location, street_address, city, state, zip_code, travel_radius_miles, remote_services, coordinates_lat, coordinates_lng
Sample record (service_areas):
{
  "pro_id": "pro_98237492837",
  "primary_location": "Austin, TX",
  "city": "Austin",
  "state": "TX",
  "zip_code": "78701",
  "travel_radius_miles": 30,
  "remote_services": false
}

Complete list of extractable fields for Search Results objects from thumbtack.com. All fields typed and schema-versioned.

Fields: search_keyword, zip_code, rank_position, pro_id, business_name, sponsored_placement, rating, review_count, starting_price, scraped_at
Sample record (search_results):
{
  "search_keyword": "plumber",
  "zip_code": "78701",
  "rank_position": 3,
  "pro_id": "pro_98237492837",
  "business_name": "Apex Plumbing Services",
  "sponsored_placement": false,
  "starting_price": 150.0
}
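The rank_position and sponsored_placement fields support organic-versus-sponsored tracking. A minimal sketch, assuming rank_position counts sponsored and organic slots together on the results page (the sample record does not say which): a listing's organic rank is then its position among non-sponsored results for the same keyword and zip code. The function name and the pro_a/pro_b IDs are hypothetical.

```python
from collections import defaultdict


def organic_ranks(rows):
    """Return {(search_keyword, zip_code, pro_id): organic_rank}.

    Organic rank is the 1-based position among non-sponsored results,
    ordered by the page-level rank_position.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["search_keyword"], row["zip_code"])].append(row)

    ranks = {}
    for (keyword, zip_code), results in groups.items():
        organic = sorted((r for r in results if not r["sponsored_placement"]),
                         key=lambda r: r["rank_position"])
        for position, r in enumerate(organic, start=1):
            ranks[(keyword, zip_code, r["pro_id"])] = position
    return ranks


# Illustrative rows: one sponsored slot followed by two organic listings.
rows = [
    {"search_keyword": "plumber", "zip_code": "78701", "rank_position": 1,
     "pro_id": "pro_a", "sponsored_placement": True},
    {"search_keyword": "plumber", "zip_code": "78701", "rank_position": 2,
     "pro_id": "pro_b", "sponsored_placement": False},
    {"search_keyword": "plumber", "zip_code": "78701", "rank_position": 3,
     "pro_id": "pro_98237492837", "sponsored_placement": False},
]
print(organic_ranks(rows)[("plumber", "78701", "pro_98237492837")])  # 2
```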

Capabilities

Every local service data point - nothing you don't need

Our Thumbtack scraper handles location spoofing, dynamic category pagination, and heavily nested JSON state extraction - with session management and anti-bot circumvention built in.

Pro Profile Extraction

Business name, description, employee count, and years in business captured directly from the professional profile.

Top Pro & Credential Tracking

Capture Top Pro badges, background check status, and verified licenses to evaluate trust metrics.

Pricing & Estimate Data

Extract base prices, hourly rates, and fixed fees for specific service categories.

Review Corpus Mining

Full review text, star ratings, verified hire tags, and pro responses, paginated across each pro's full review history.

Geo-Targeted Search

Execute searches across specific US zip codes to map local market density and category saturation.

Performance Metrics

Track response times, total hires on Thumbtack, and recent booking velocity for individual professionals.

Service Area Mapping

Extract travel radii and specific cities served by each professional to build accurate coverage maps.

SERP Position Tracking

Monitor organic vs sponsored rank for specific service keywords by zip code.

Media & Portfolio Links

Extract image URLs from pro galleries and completed project showcases.
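Service-area mapping reduces to a distance test per pro. A minimal sketch, assuming coordinates_lat and coordinates_lng mark the centre of a pro's service area and that travel_radius_miles is a straight-line (great-circle) distance from that point; the helper names and example coordinates are illustrative, and remote_services is not taken into account.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def covers(area, lat, lng):
    """True if (lat, lng) lies within the pro's travel radius."""
    distance = haversine_miles(area["coordinates_lat"], area["coordinates_lng"], lat, lng)
    return distance <= area["travel_radius_miles"]


# Illustrative service area centred on downtown Austin, TX.
area = {"pro_id": "pro_98237492837", "coordinates_lat": 30.2672,
        "coordinates_lng": -97.7431, "travel_radius_miles": 30}
print(covers(area, 30.5083, -97.6789))  # Round Rock, about 17 miles away: True
print(covers(area, 29.4241, -98.4936))  # San Antonio, about 74 miles away: False
```

Running covers() over a grid of points or zip-code centroids for every pro yields the raw input for a coverage map.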

// engagement pipeline

From zip code list to warehouse record

Brief in. Clean data out.

Define Scope (day 0)

Provide zip codes, service categories, or specific pro URLs. We design the extraction schema together.

Pipeline Build (days 2–4)

We configure Scrapy / Playwright crawlers, proxy rotation, and CAPTCHA handling for thumbtack.com.

Validation & QA (days 4–6)

Schema validation, null-rate checks, and location-accuracy verification before full launch.

Delivery (ongoing)

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
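The schema-validation and null-rate checks in the Validation & QA step can be sketched as follows. This is a minimal illustration, not DataFlirt's QA code: the schema covers the review fields shown in the sample record plus pro_response, the field-to-type mapping is an assumption, and the second record is made up to trigger a type error.

```python
# Hypothetical typed schema for review records; field names come from the data dictionary.
REVIEW_SCHEMA = {
    "review_id": str,
    "pro_id": str,
    "rating": int,
    "date_posted": str,
    "review_text": str,
    "verified_hire": bool,
    "service_provided": str,
    "pro_response": str,
}


def validate(records, schema):
    """Return (type_errors, null_rates) for a batch of records.

    A missing key or None counts towards the field's null rate, not as a type error.
    """
    type_errors = []
    null_counts = dict.fromkeys(schema, 0)
    for i, record in enumerate(records):
        for field, expected in schema.items():
            value = record.get(field)
            if value is None:
                null_counts[field] += 1
            # bool is a subclass of int in Python, so reject it explicitly for int fields.
            elif not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
                type_errors.append((i, field, type(value).__name__))
    total = len(records) or 1
    null_rates = {field: count / total for field, count in null_counts.items()}
    return type_errors, null_rates


records = [
    {"review_id": "rev_73628472", "pro_id": "pro_98237492837", "rating": 5,
     "date_posted": "2026-03-14",
     "review_text": "Fixed our leak in under an hour. Highly recommended.",
     "verified_hire": True, "service_provided": "Pipe Repair"},
    # Illustrative bad record: rating arrives as a string.
    {"review_id": "rev_00000001", "pro_id": "pro_98237492837", "rating": "5",
     "date_posted": "2026-03-15", "review_text": "Great work.",
     "verified_hire": True, "service_provided": "Pipe Repair", "pro_response": "Thank you!"},
]
errors, rates = validate(records, REVIEW_SCHEMA)
print(errors)                 # [(1, 'rating', 'str')]
print(rates["pro_response"])  # 0.5
```

Null-rate thresholds are best set per field: a high null rate is normal for pro_response (many reviews get no reply) but would signal breakage for rating.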

Under the hood

How our Thumbtack pipeline handles the hard parts

Thumbtack relies on location data and strict bot mitigation. Here is how we maintain extraction reliability.

Anti-bot layer
Residential proxy rotation + fingerprint spoofing

Thumbtack uses aggressive bot protection. We use US residential ISP proxies with realistic browser fingerprints and full cookie session management to mimic actual user behaviour.

Location spoofing
Geo-coordinate injection

Thumbtack search requires precise location data. We inject latitude and longitude coordinates and mock Geolocation APIs at the browser level to bypass regional blocks.

GraphQL interception
Direct API state extraction

Instead of parsing brittle DOM elements, we intercept Thumbtack's internal GraphQL responses inside the browser session and extract clean, nested JSON directly from the network traffic.

Change detection
Only re-scrape what has changed

We maintain a hash index of last-seen values per field. Subsequent runs only push diffs, reducing compute cost and downstream processing load.
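A minimal sketch of per-field, hash-based change detection. It assumes an in-memory dict stands in for the persistent hash index (in practice a store such as Redis or PostgreSQL, both in the listed stack, could hold it). The function names are hypothetical, and fields that disappear from a record are not reported by this version.

```python
import hashlib
import json


def field_hash(value):
    """Stable SHA-256 digest of a field value (JSON-encoded with sorted keys)."""
    encoded = json.dumps(value, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def diff_record(index, record, key="pro_id"):
    """Return only the fields that changed since the last run, plus the record key.

    `index` maps record key -> {field: hash} and is updated in place.
    Returns an empty dict when nothing changed.
    """
    seen = index.setdefault(record[key], {})
    changes = {}
    for field, value in record.items():
        digest = field_hash(value)
        if seen.get(field) != digest:
            changes[field] = value
            seen[field] = digest
    if changes:
        changes[key] = record[key]  # always identify which record the diff belongs to
    return changes


index = {}
run_1 = {"pro_id": "pro_98237492837", "base_price": 150.0, "hourly_rate": 85.0}
run_2 = {"pro_id": "pro_98237492837", "base_price": 165.0, "hourly_rate": 85.0}
print(diff_record(index, run_1))  # first sighting: every field is new
print(diff_record(index, run_2))  # {'base_price': 165.0, 'pro_id': 'pro_98237492837'}
print(diff_record(index, run_2))  # {}
```

Storing digests rather than raw values keeps the index small and uniform regardless of field size, such as long review text.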

Monitoring & alerting
24/7 pipeline health

Every run emits structured logs to our observability stack. We alert on null-rate spikes and coverage drops so that problems are caught before they threaten the uptime SLA.
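The null-rate and coverage alerts can be sketched as a comparison between consecutive runs. The RunStats type, the thresholds, the run figures and the log-line fields are illustrative assumptions, not the pipeline's real configuration. The result is printed as one JSON line to mimic a structured log entry.

```python
import json
from dataclasses import dataclass


@dataclass
class RunStats:
    records: int      # records extracted in the run
    null_rate: float  # fraction of null values across required fields


def check_run(current, previous, max_null_increase=0.02, max_coverage_drop=0.10):
    """Return alert messages for a null-rate spike or a coverage drop versus the previous run."""
    alerts = []
    if current.null_rate - previous.null_rate > max_null_increase:
        alerts.append(f"null-rate spike: {previous.null_rate:.1%} -> {current.null_rate:.1%}")
    if previous.records and (previous.records - current.records) / previous.records > max_coverage_drop:
        alerts.append(f"coverage drop: {previous.records} -> {current.records} records")
    return alerts


previous = RunStats(records=10_000, null_rate=0.003)
current = RunStats(records=8_500, null_rate=0.031)
alerts = check_run(current, previous)
# One structured log line per run; a log shipper could forward it to the monitoring stack.
print(json.dumps({"event": "run_complete", "records": current.records, "alerts": alerts}))
```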

Applications

Who uses Thumbtack data - and how

Teams across industries use thumbtack.com data to build competitive products and smarter operations.

01
Local Market Intelligence

Marketplaces analyze supply density by zip code to identify underserved service categories and expansion opportunities.

02
Competitor Price Tracking

Service franchises monitor local pricing estimates to optimise their own hourly rates and fixed fees.

03
Lead Generation & B2B Sales

SaaS companies selling to SMBs extract newly listed, highly rated pros for targeted outreach campaigns.

04
Review & Sentiment Analysis

Reputation management platforms aggregate verified reviews to track local business sentiment over time.

05
Labor Market Research

Economists track hourly rates across different geographies to measure local inflation and wage growth.

06
Investment Due Diligence

PE firms evaluate the growth of local service platforms by tracking active pro counts and booking velocity.
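For the local market intelligence use case, supply density can be approximated from search_results rows by counting distinct pros per zip code and keyword. A minimal sketch: the "underserved" threshold and the pro_a/pro_b/pro_c IDs are illustrative, and the counts are bounded by how many result pages were crawled for each search.

```python
from collections import defaultdict


def supply_density(search_rows):
    """Count distinct pros per (zip_code, search_keyword); repeat appearances count once."""
    pros = defaultdict(set)
    for row in search_rows:
        pros[(row["zip_code"], row["search_keyword"])].add(row["pro_id"])
    return {key: len(ids) for key, ids in pros.items()}


def underserved(density, min_pros=5):
    """Return (zip_code, keyword) pairs with fewer than `min_pros` distinct pros, sorted."""
    return sorted(key for key, count in density.items() if count < min_pros)


rows = [
    {"zip_code": "78701", "search_keyword": "plumber", "pro_id": "pro_a"},
    {"zip_code": "78701", "search_keyword": "plumber", "pro_id": "pro_b"},
    {"zip_code": "78701", "search_keyword": "plumber", "pro_id": "pro_a"},  # same pro, sponsored slot
    {"zip_code": "78702", "search_keyword": "plumber", "pro_id": "pro_c"},
]
density = supply_density(rows)
print(underserved(density, min_pros=2))  # [('78702', 'plumber')]
```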

Why DataFlirt

"Thumbtack holds the most accurate hyper-local pricing and availability data for US service professionals - but extracting it requires mimicking thousands of local users."

Most teams fail at local directory scraping because they use datacenter IPs and ignore browser geolocation APIs. Thumbtack blocks these requests instantly. DataFlirt manages the residential proxy networks, coordinate spoofing, and GraphQL interception required to extract local data at scale. You get clean records; we handle the infrastructure.

Technical Spec

Thumbtack scraper - technical capabilities

What our thumbtack.com scraper supports, from rendered SPA elements to rate-limit evasion.

JavaScript rendering (Supported): full Playwright sessions required for search loading and profile hydration
Geolocation spoofing (Supported): browser-level latitude and longitude injection for accurate local search
GraphQL interception (Supported): direct extraction of structured data from internal network requests
Residential proxy rotation (Supported): US-specific ISP pools rotated per request to bypass rate limits
Zip-code targeting (Supported): search iteration across all 41,000+ US zip codes
Review pagination (Supported): extraction of historical reviews beyond the initial page load
Change detection (Supported): hash-based diffing for pricing and profile updates
Customer messaging (Partial): automated sending of messages to pros via the platform
Private booking details (Not supported): customer names and exact booking addresses

Infrastructure

Infrastructure powering the Thumbtack pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration and deduplication. Playwright handles JavaScript execution and geolocation spoofing.

US Residential Proxies

Geo-targeted ISP proxies bypass bot protection and serve accurate local results for specific zip codes.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow manages scheduling, dependencies, and SLA alerting.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: newline-delimited or nested arrays
CSV: flat file with typed columns
XLS: Excel format for business operations
Parquet: columnar format for data warehouses
AWS S3: direct bucket delivery, compatible with any data lake
Webhook: HTTP POST per record for real-time processing
API: REST endpoint for on-demand querying
PostgreSQL: direct database insertion with upsert logic

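Upsert logic for database delivery can be sketched with an INSERT ... ON CONFLICT ... DO UPDATE statement keyed on pro_id. To keep the example self-contained it runs against Python's built-in sqlite3 (SQLite 3.24 and later accept the same upsert clause). With a PostgreSQL driver the statement stays the same but the placeholder style differs, for example %(pro_id)s in psycopg. The table layout is a hypothetical subset of the profile fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE professional_profiles (
           pro_id TEXT PRIMARY KEY,
           business_name TEXT,
           rating REAL,
           hires_on_thumbtack INTEGER
       )"""
)

# Insert new pros; on a repeated pro_id, overwrite the mutable columns instead of failing.
UPSERT = """
INSERT INTO professional_profiles (pro_id, business_name, rating, hires_on_thumbtack)
VALUES (:pro_id, :business_name, :rating, :hires_on_thumbtack)
ON CONFLICT (pro_id) DO UPDATE SET
    business_name = excluded.business_name,
    rating = excluded.rating,
    hires_on_thumbtack = excluded.hires_on_thumbtack
"""


def upsert(conn, records):
    """Write a batch of records in one transaction (committed on success, rolled back on error)."""
    with conn:
        conn.executemany(UPSERT, records)


upsert(conn, [{"pro_id": "pro_98237492837", "business_name": "Apex Plumbing Services",
               "rating": 4.9, "hires_on_thumbtack": 142}])
upsert(conn, [{"pro_id": "pro_98237492837", "business_name": "Apex Plumbing Services",
               "rating": 4.9, "hires_on_thumbtack": 150}])
print(conn.execute("SELECT COUNT(*), MAX(hires_on_thumbtack) FROM professional_profiles").fetchone())
# (1, 150): the second run updated the existing row rather than adding a duplicate
```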
// faq

Common questions.

About thumbtack.com scraping, legality, and pipeline operations.

Is scraping Thumbtack legal?

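Scraping publicly available information is generally permissible: in hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping publicly accessible data likely does not violate the US Computer Fraud and Abuse Act, although claims based on a site's terms of service can still apply. DataFlirt extracts only public, non-authenticated professional profiles and pricing data. We do not extract personal consumer data or bypass authentication walls.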

How do you handle Thumbtack bot protection?

We use US residential ISP proxies, full Playwright browser sessions with realistic fingerprints, and request timing modelled on human behaviour. This bypasses automated security layers effectively.

How do you scrape by specific locations?

We inject precise latitude and longitude coordinates directly into the browser's Geolocation API during the Playwright session, ensuring Thumbtack returns accurate hyper-local results for any given zip code.

How fresh is the pricing data?

Pipelines can be configured for daily, weekly, or monthly cadences. For targeted zip codes, daily refreshes capture price changes and new pro listings within a 12-hour window.

What is the minimum viable engagement?

Our minimum engagement typically starts at 10,000 professional profiles or 500 zip codes with weekly delivery. Contact us with your specific volume requirements for a scoped quote.

Can I request a sample dataset?

Yes. We provide a sample run of up to 500 profiles or 10 zip codes during the pre-engagement phase so you can validate schema fit and data quality.

$ dataflirt scope --new-project --source=thumbtack.com ready

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a full export of US plumbers or a daily price-tracking feed across 50 cities - we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h