
Foursquare POI data,
at warehouse scale.

We extract venue details, user tips, tastes, ratings, and location metadata from Foursquare City Guide. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Venues extracted
1.2M /day
Tips processed
3.4M /24h
Tastes mapped
840K /run
Active pipelines
84
Uptime
99.98%
Data Dictionary

Every field we extract from foursquare.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Venue Data objects from foursquare.com. All fields typed and schema-versioned.

venue_id, name, primary_category, sub_categories, latitude, longitude, address, city, state, country, postal_code, rating, price_tier, phone, website, verified_status
venue_data
● 200 OK
"venue_id": "4b4606f2f964a520c11426e3",
"name": "Blue Bottle Coffee",
"primary_category": "Coffee Shop",
"latitude": 37.7763,
"longitude": -122.4233,
"rating": 9.1,
"price_tier": 2,
"city": "San Francisco"

Complete list of extractable fields for Tips and Reviews objects from foursquare.com. All fields typed and schema-versioned.

tip_id, venue_id, user_id, user_name, text, created_at, upvotes, downvotes, photo_url, language
tips_and_reviews
● 200 OK
"tip_id": "5a2b3c4d5e6f7a8b9c0d1e2f",
"venue_id": "4b4606f2f964a520c11426e3",
"user_name": "Jane Doe",
"text": "The New Orleans Iced Coffee is incredible.",
"created_at": "2026-03-12T14:22:00Z",
"upvotes": 42,
"language": "en"

Complete list of extractable fields for Attributes and Tastes objects from foursquare.com. All fields typed and schema-versioned.

venue_id, tastes, outdoor_seating, credit_cards, parking, wifi, reservations, wheelchair_accessible, restroom, smoking
attributes_and_tastes
● 200 OK
"venue_id": "4b4606f2f964a520c11426e3",
"tastes": ["cold brew", "pastries", "avocado toast"],
"outdoor_seating": true,
"credit_cards": true,
"wifi": "Free",
"wheelchair_accessible": true

Complete list of extractable fields for Footfall Signals objects from foursquare.com. All fields typed and schema-versioned.

venue_id, total_checkins, total_users, total_tips, total_visits, popular_hours, related_venues, trending_status
footfall_signals
● 200 OK
"venue_id": "4b4606f2f964a520c11426e3",
"total_checkins": 125430,
"total_users": 45120,
"total_tips": 842,
"total_visits": 310500,
"trending_status": false,
"related_venues": ["4c5d6e7f8g9h0i1j"]

Complete list of extractable fields for Photos objects from foursquare.com. All fields typed and schema-versioned.

photo_id, venue_id, user_id, url, width, height, created_at, visibility, source
photos
● 200 OK
"photo_id": "6b7c8d9e0f1a2b3c4d5e6f7a",
"venue_id": "4b4606f2f964a520c11426e3",
"url": "https://fastly.4sqi.net/img/general/original/123456.jpg",
"width": 1920,
"height": 1080,
"created_at": "2026-01-15T09:30:00Z"

Capabilities

Everything you need from Foursquare, nothing you don't

Our Foursquare scraper handles the complexity of spatial grid pagination, category hierarchies, and rate limits, delivering clean POI data ready for spatial analysis.

Global POI Extraction

Extract comprehensive venue data including coordinates, precise addresses, and verified status across any city or geographic bounding box.

Category Hierarchies

Map venues to Foursquare's detailed category taxonomy, capturing primary and secondary classifications for accurate filtering.

Ratings and Reviews

Capture the proprietary Foursquare rating out of 10, alongside user tips, upvotes, and historical sentiment indicators.

Tastes and Attributes

Extract nuanced venue features like wifi availability, parking situations, outdoor seating, and user-generated taste tags.

Footfall Indicators

Track historical check-in counts, total unique users, and visit metrics to estimate location popularity and foot traffic.

Operating Hours

Parse complex operating hours, including split shifts, holiday exceptions, and popular times data.

Related Venues

Map spatial relationships by extracting 'People also liked' and related venue graphs for competitive analysis.

Photo Metadata

Extract high-resolution image URLs, dimensions, and upload timestamps for visual verification of POIs.

Change Detection

Run continuous pipelines to detect new venue openings, permanent closures, and rating fluctuations over time.

// engagement pipeline

From geographic bounding box to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide target cities, coordinate bounding boxes, or category lists. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure spatial grid crawlers, proxy rotation, and session management for foursquare.com.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and coordinate accuracy verification before full launch.
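
As a rough illustration of the pre-launch checks described above, the sketch below validates a single venue record. The function name `validate_venue`, the choice of required fields, and the exact rules are assumptions made for this example, not a description of our production QA suite. The 0 to 10 rating range follows the "rating out of 10" described elsewhere on this page.

```python
from typing import Any

# Hypothetical rule set, chosen from the Venue Data field list for illustration.
STRING_FIELDS = ("venue_id", "name")
COORD_LIMITS = {"latitude": 90.0, "longitude": 180.0}


def is_number(value: Any) -> bool:
    """True for int or float values, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_venue(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field in STRING_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field}: missing or not a string")
    for field, limit in COORD_LIMITS.items():
        value = record.get(field)
        if not is_number(value):
            problems.append(f"{field}: missing or not numeric")
        elif abs(value) > limit:
            problems.append(f"{field}: out of range")
    rating = record.get("rating")
    # Rating is optional here, but must be a number from 0 to 10 when present.
    if rating is not None and (not is_number(rating) or not 0 <= rating <= 10):
        problems.append("rating: expected a number between 0 and 10")
    return problems
```

Returning a list of problems rather than raising on the first failure makes it easy to aggregate failure reasons across a whole sample run.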

Delivery
ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our Foursquare pipeline handles the hard parts

Extracting global POI data requires complex spatial pagination and rate limit management. Here is how we maintain reliable pipelines.

pipeline-monitor · foursquare.com · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued, running
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
Spatial pagination
Dynamic coordinate bounding grids

Foursquare limits search results per geographic area. We use H3 spatial indexing to generate dynamic bounding boxes, ensuring comprehensive extraction without missing dense urban areas or wasting requests on empty regions.
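
The idea of adapting cell size to venue density can be sketched without any dependencies. Note that the example below swaps in a plain quadtree split of a bounding box instead of H3 hexagons, purely to stay self-contained. The `count_results` callback stands in for a real search request, and `result_cap` and `max_depth` are hypothetical values, not Foursquare's actual limits.

```python
from typing import Callable, Iterator

BBox = tuple[float, float, float, float]  # (south, west, north, east)


def split_bbox(bbox: BBox) -> list[BBox]:
    """Split a bounding box into four equal quadrants."""
    south, west, north, east = bbox
    mid_lat = (south + north) / 2
    mid_lon = (west + east) / 2
    return [
        (south, west, mid_lat, mid_lon),
        (south, mid_lon, mid_lat, east),
        (mid_lat, west, north, mid_lon),
        (mid_lat, mid_lon, north, east),
    ]


def plan_cells(
    bbox: BBox,
    count_results: Callable[[BBox], int],  # hypothetical: results a search of this box returns
    result_cap: int = 50,                  # hypothetical per-area result limit
    max_depth: int = 8,
) -> Iterator[BBox]:
    """Yield boxes small enough that a search of each stays under the cap."""
    count = count_results(bbox)
    if count == 0:
        return  # empty region: spend no further requests here
    if count < result_cap or max_depth == 0:
        yield bbox
        return
    # The box hit the cap, so results may be truncated: subdivide and retry.
    for quadrant in split_bbox(bbox):
        yield from plan_cells(quadrant, count_results, result_cap, max_depth - 1)
```

Dense areas end up covered by many small boxes while empty regions are dropped after a single probe, which is the trade-off the paragraph above describes.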

Rate limit management
Distributed residential proxies

Foursquare aggressively rate-limits high-volume requests. We distribute requests across a global pool of residential ISP proxies, maintaining strict concurrency limits and randomised delays to avoid IP bans.
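
A minimal sketch of the concurrency limit and randomised delay, using only the standard library. The `fetch` coroutine is a hypothetical placeholder for a proxied HTTP request, and the default concurrency and delay range are illustrative values rather than our production settings.

```python
import asyncio
import random
from typing import Awaitable, Callable, Iterable


async def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],  # hypothetical: performs one proxied request
    max_concurrency: int = 4,
    delay_range: tuple[float, float] = (0.5, 2.0),
) -> list[str]:
    """Fetch every URL with a cap on in-flight requests and a jittered delay."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def throttled(url: str) -> str:
        async with semaphore:
            # Random pause so requests do not arrive at a fixed rhythm.
            await asyncio.sleep(random.uniform(*delay_range))
            return await fetch(url)

    # gather preserves input order in its results.
    return await asyncio.gather(*(throttled(url) for url in urls))
```

The semaphore bounds how many requests are in flight at once, and the delay is taken inside the semaphore so that waiting tasks do not all fire the moment a slot frees up.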

Data normalisation
Cleaning unstructured attributes

Venue attributes and hours are often unstructured. Our pipeline parses string representations of operating hours into ISO 8601 time formats and maps raw attribute tags into boolean fields.
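
To make the normalisation step concrete, here is a small sketch. The input format ("9:00 AM - 5:30 PM"), the label sets, and the function names are all assumptions for illustration. Real venue pages use more varied formats, and `%p` parsing assumes an English locale.

```python
from datetime import datetime
from typing import Optional

# Hypothetical label sets for mapping raw attribute text to booleans.
TRUTHY = {"yes", "free", "paid", "available"}
FALSY = {"no", "none", "not available"}


def to_iso_time(text: str) -> str:
    """Convert a 12-hour clock string such as '5:30 PM' to 'HH:MM'."""
    parsed = datetime.strptime(text.strip(), "%I:%M %p")
    return parsed.time().isoformat(timespec="minutes")


def parse_hours(raw: str) -> tuple[str, str]:
    """Split an 'open - close' range and normalise both ends."""
    opens, closes = raw.split("-", 1)
    return to_iso_time(opens), to_iso_time(closes)


def attribute_to_bool(raw: str) -> Optional[bool]:
    """Map a raw attribute label to True or False, or None if unrecognised."""
    label = raw.strip().lower()
    if label in TRUTHY:
        return True
    if label in FALSY:
        return False
    return None  # leave unknown labels for manual review
```

For example, `parse_hours("9:00 AM - 5:30 PM")` returns `("09:00", "17:30")`. Returning `None` for unknown labels keeps unexpected values visible instead of silently coercing them to `False`.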

Change detection
Tracking venue lifecycle events

For POI monitoring, we maintain a hash index of last-seen values. Subsequent runs only push diffs, highlighting new openings, closures, and significant rating changes without full re-dumps.
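
Hash-based diffing of this kind can be sketched in a few lines. The in-memory dictionary standing in for the hash index, the SHA-256 choice, and the function names are assumptions for this example. A production index would live in a persistent store.

```python
import hashlib
import json
from typing import Any


def record_hash(record: dict[str, Any]) -> str:
    """Hash a record in a key-order-independent way."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_run(last_seen: dict[str, str], current: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Compare a run against the hash index and update the index in place."""
    changes: dict[str, list[str]] = {"new": [], "changed": [], "missing": []}
    seen_ids = set()
    for record in current:
        venue_id = record["venue_id"]
        seen_ids.add(venue_id)
        digest = record_hash(record)
        previous = last_seen.get(venue_id)
        if previous is None:
            changes["new"].append(venue_id)
        elif previous != digest:
            changes["changed"].append(venue_id)
        last_seen[venue_id] = digest
    # Venues in the index but absent from this run are closure candidates.
    changes["missing"] = sorted(set(last_seen) - seen_ids)
    return changes
```

Only the IDs under `new`, `changed`, and `missing` need to be pushed downstream. A venue that is missing from one run is a candidate closure, not a confirmed one, so a real pipeline would likely confirm it before reporting.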

Monitoring
24/7 pipeline health checks

Every run emits structured logs. We alert on null-rate spikes in critical fields like coordinates, category drift, and coverage drops, responding before data quality degrades.
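
A null-rate check of the sort mentioned above might look like the following sketch. The threshold values and function names are hypothetical, and a real alerting setup would route these messages to a monitoring system instead of returning strings.

```python
from typing import Any


def null_rates(records: list[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Fraction of records where each field is missing or None."""
    total = len(records)
    if total == 0:
        # An empty run has no null rate; coverage drops are a separate check.
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for record in records if record.get(field) is None) / total
        for field in fields
    }


def null_rate_alerts(records: list[dict[str, Any]], thresholds: dict[str, float]) -> list[str]:
    """Return one message per field whose null rate exceeds its threshold."""
    rates = null_rates(records, list(thresholds))
    return [
        f"{field}: null rate {rate:.1%} exceeds {thresholds[field]:.1%}"
        for field, rate in rates.items()
        if rate > thresholds[field]
    ]
```

Per-field thresholds let critical fields such as coordinates use a much tighter limit than optional ones such as `phone` or `website`.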

Applications

Who uses Foursquare data and how

Teams across industries use foursquare.com data to build competitive products and smarter operations.

01
Urban Planning and GIS

City planners and GIS analysts use POI density, category distribution, and footfall signals to model urban development.

02
Retail Site Selection

Retailers analyse competitor locations, complementary businesses, and popular hours to identify optimal locations for new stores.

03
Travel and Hospitality

Aggregators enrich their own listings with Foursquare ratings, user tips, and taste attributes to improve recommendations.

04
Alternative Data for Finance

Quantitative funds track check-in trends and store closures across retail chains to predict quarterly performance.

05
Local SEO Monitoring

Marketing agencies track venue visibility, rating changes, and review velocity for client locations across different cities.

06
Delivery Logistics

Last-mile delivery platforms verify exact coordinates, operating hours, and entrance details for restaurants and retailers.

Why DataFlirt

"Foursquare maintains one of the most accurate independent POI databases globally, but extracting global venue data requires navigating complex geo-spatial pagination."

Most teams fail at location scraping because they rely on naive grid searches. We use spatial indexing, residential proxies, and dynamic coordinate bounding to extract Foursquare venue data without missing edge cases or triggering rate limits. DataFlirt handles the infrastructure so you can focus on spatial analysis.

Technical Spec

Foursquare scraper technical capabilities

Everything supported by our foursquare.com scraper, from rendered SPA elements to rate-limit handling, for public, non-authenticated data.

Spatial grid pagination
Automated H3 hexagonal grid generation for comprehensive area coverage
Supported
Tips extraction
Full pagination of user tips, including text, upvotes, and timestamps
Supported
Tastes and Features
Extraction of venue attributes, amenities, and user-generated taste tags
Supported
Category mapping
Extraction of primary and secondary categories matching Foursquare taxonomy
Supported
Historical footfall
Total check-ins, user counts, and visit metrics per venue
Supported
Change detection
Hash-based diffing to identify new venues, closures, and rating updates
Supported
Private user check-in history
Individual user location history and private check-ins
Not supported
Swarm private messages
Direct messages and private social interactions between users
Not supported
Infrastructure

Infrastructure powering the Foursquare pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration and spatial grid iteration. Playwright handles JavaScript rendering for complex venue pages and interactive maps.

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies across global regions. Rotation happens per-request to bypass rate limits and geographic restrictions.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles spatial job chunking and dependency management. State is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested array formats
CSV
Flat file with typed columns for GIS tools
XLS
Excel compatible format for smaller regional datasets
Parquet
Columnar format for BigQuery, Snowflake, and Athena
AWS S3
Direct bucket delivery compatible with any data lake
Webhook
HTTP POST per record for real-time downstream processing
API
RESTful endpoints to query extracted POI data
BigQuery
Streamed directly into your dataset with schema auto-detect
// faq

Common questions.

About foursquare.com scraping, legality, and pipeline operations.

Is scraping Foursquare legal?

Scraping publicly available information from Foursquare is generally permissible. DataFlirt targets only public, non-authenticated venue data, ratings, and tips. We do not extract private user check-ins or violate GDPR. Clients should review Foursquare ToS and consult legal counsel for specific use cases.

How do you ensure comprehensive geographic coverage?

We use H3 spatial indexing to generate overlapping coordinate bounding boxes for your target regions. This ensures we capture all venues without hitting pagination limits in dense urban areas.

Can you extract data for specific categories only?

Yes. We can filter extraction by Foursquare primary or secondary categories, such as restaurants, retail stores, or parks, reducing pipeline execution time and storage costs.

How fresh is the POI data?

We configure pipeline cadence based on your requirements. Most clients opt for weekly or monthly refreshes to track new openings, closures, and rating changes across large geographic areas.

Do you extract user photos?

We extract the metadata and URLs for public photos uploaded to venues. We do not download or store the actual image files, but provide the direct links for your systems to process.

How do you handle rate limits?

We use residential ISP proxies, strict concurrency controls, and randomised request timing. Our spatial crawlers distribute requests geographically to avoid triggering local rate limit thresholds.

What is the minimum viable engagement?

Our minimum engagements typically start with a defined set of cities or specific category verticals across a country. Contact us with your target regions for a scoped quote.

Can I request a sample dataset?

Yes. We provide a sample run for a specific neighbourhood or small city bounding box during the scoping process, allowing you to validate coordinates, schema fit, and data quality.

$ dataflirt scope --new-project --source=foursquare.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off POI export for a specific city or a continuous monitoring feed across global categories, we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h