Status: all systems green · Source: pagesjaunes.fr · Queue: 12,841 queries · p99 latency: 894 ms · 41 active pipelines (live)

PagesJaunes data,
at warehouse scale.

We extract business profiles, contact details, operating hours, SIRET numbers, and customer reviews from PagesJaunes. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Listings extracted: 1.2M/day
Phone numbers: 850K/day
Reviews scraped: 340K/run
Active pipelines: 41
Uptime: 99.98%
Data Dictionary

Every field we extract from pagesjaunes.fr

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Business Profiles objects from pagesjaunes.fr. All fields typed and schema-versioned.

Fields: business_id, name, category, address, city, postal_code, phone_number, website_url, siret, latitude, longitude

Sample record (business_profiles):
{
  "business_id": "pj_54829103",
  "name": "Plomberie Dupont",
  "category": "Plombier",
  "address": "14 Rue de la Paix",
  "city": "Paris",
  "postal_code": "75002",
  "phone_number": "01 42 68 55 99"
}
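The business_profiles object is described as typed and schema-versioned. The following minimal Python sketch shows what a type check on it could look like. It assumes a field-to-type mapping inferred from the sample record. BUSINESS_PROFILE_SCHEMA and validate_record are hypothetical names, not a published DataFlirt schema.

```python
from typing import Any

# Assumed field types for business_profiles (inferred from the sample record;
# hypothetical, not an official schema).
BUSINESS_PROFILE_SCHEMA: dict[str, type] = {
    "business_id": str,
    "name": str,
    "category": str,
    "address": str,
    "city": str,
    "postal_code": str,  # string, so leading zeros survive
    "phone_number": str,
    "website_url": str,
    "siret": str,
    "latitude": float,
    "longitude": float,
}


def validate_record(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of type errors; an empty list means the record conforms."""
    errors: list[str] = []
    for field_name, expected in schema.items():
        value = record.get(field_name)
        if value is None:
            continue  # missing or null fields are treated as nullable in this sketch
        if not isinstance(value, expected):
            errors.append(
                f"{field_name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # Flag fields the schema does not know about (possible schema drift).
    for extra in sorted(set(record) - set(schema)):
        errors.append(f"{extra}: not in schema")
    return errors


sample = {
    "business_id": "pj_54829103",
    "name": "Plomberie Dupont",
    "category": "Plombier",
    "address": "14 Rue de la Paix",
    "city": "Paris",
    "postal_code": "75002",
    "phone_number": "01 42 68 55 99",
}
print(validate_record(sample, BUSINESS_PROFILE_SCHEMA))  # []
```

Keeping postal_code as a string is deliberate. French postal codes such as 01000 start with a zero that an integer column would drop.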

Complete list of extractable fields for Operating Hours objects from pagesjaunes.fr. All fields typed and schema-versioned.

Fields: business_id, name, monday_hours, tuesday_hours, wednesday_hours, thursday_hours, friday_hours, saturday_hours, sunday_hours, services_offered, payment_methods

Sample record (operating_hours):
{
  "business_id": "pj_54829103",
  "monday_hours": "08:00-18:00",
  "tuesday_hours": "08:00-18:00",
  "wednesday_hours": "08:00-18:00",
  "thursday_hours": "08:00-18:00",
  "services_offered": ["Dépannage", "Installation"],
  "payment_methods": ["Carte Bleue", "Espèces"]
}

Complete list of extractable fields for Reviews & Ratings objects from pagesjaunes.fr. All fields typed and schema-versioned.

Fields: review_id, business_id, reviewer_name, rating, review_text, review_date, response_text, response_date, source

Sample record (reviews_and_ratings):
{
  "review_id": "rev_9928174",
  "business_id": "pj_54829103",
  "reviewer_name": "Jean Michel",
  "rating": 4.5,
  "review_text": "Intervention rapide et efficace.",
  "review_date": "2025-09-14"
}
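Review records become useful for monitoring once they are aggregated per business. This is a hedged sketch of one such downstream summary. It assumes review dicts with the business_id, rating and response_text fields listed above. summarise_reviews and the metric names are illustrative, not part of the delivered schema.

```python
from collections import defaultdict
from statistics import mean
from typing import Any


def summarise_reviews(reviews: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Group reviews by business_id and compute simple per-business metrics."""
    by_business: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for review in reviews:
        by_business[review["business_id"]].append(review)

    summary: dict[str, dict[str, Any]] = {}
    for business_id, items in by_business.items():
        ratings = [r["rating"] for r in items if r.get("rating") is not None]
        # An owner response counts if response_text is present and non-empty.
        answered = sum(1 for r in items if r.get("response_text"))
        summary[business_id] = {
            "review_count": len(items),
            "avg_rating": round(mean(ratings), 2) if ratings else None,
            "response_rate": round(answered / len(items), 2),
        }
    return summary


# Illustrative input; the second review is hypothetical.
reviews = [
    {"business_id": "pj_54829103", "rating": 4.5, "response_text": "Merci !"},
    {"business_id": "pj_54829103", "rating": 3.5, "response_text": None},
]
print(summarise_reviews(reviews))
# {'pj_54829103': {'review_count': 2, 'avg_rating': 4.0, 'response_rate': 0.5}}
```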

Complete list of extractable fields for Search Results objects from pagesjaunes.fr. All fields typed and schema-versioned.

Fields: keyword, location, position, business_id, name, is_sponsored, rating, review_count, scraped_at

Sample record (search_results):
{
  "keyword": "plombier",
  "location": "Paris",
  "position": 3,
  "business_id": "pj_54829103",
  "name": "Plomberie Dupont",
  "is_sponsored": false,
  "rating": 4.5
}
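Organic versus sponsored visibility can be derived directly from search_results rows. The sketch below assumes that position counts every slot on the results page, sponsored or not. visibility_report is a hypothetical helper.

```python
from typing import Any


def visibility_report(
    rows: list[dict[str, Any]], business_id: str
) -> dict[tuple[str, str], dict[str, Any]]:
    """Best organic position and sponsored appearances per (keyword, location)."""
    report: dict[tuple[str, str], dict[str, Any]] = {}
    for row in rows:
        if row["business_id"] != business_id:
            continue  # only track the business of interest
        key = (row["keyword"], row["location"])
        entry = report.setdefault(key, {"best_organic": None, "sponsored": 0})
        if row["is_sponsored"]:
            entry["sponsored"] += 1
        elif entry["best_organic"] is None or row["position"] < entry["best_organic"]:
            # Lower position number means higher on the page.
            entry["best_organic"] = row["position"]
    return report
```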

Complete list of extractable fields for Accreditations objects from pagesjaunes.fr. All fields typed and schema-versioned.

Fields: business_id, certification_name, certification_year, issuing_body, eco_label, handicap_access, parking_available, languages_spoken

Sample record (accreditations):
{
  "business_id": "pj_54829103",
  "certification_name": "Qualibat RGE",
  "issuing_body": "Qualibat",
  "eco_label": true,
  "handicap_access": false,
  "languages_spoken": ["Français", "Anglais"]
}

Capabilities

Extract French business data with precision

Our PagesJaunes scraper navigates location-based queries, pagination limits, and Datadome protection to deliver structured B2B records. We handle proxy rotation and dynamic rendering automatically.

Full Profile Extraction

Extract name, address, SIRET, website, and metadata across all business categories.

Contact Number Resolution

Interact with obfuscated phone number elements to reveal and extract the full contact digits.

Review & Rating Mining

Capture star ratings, review text, publication dates, and owner responses across multiple pages.

SERP Position Tracking

Track organic versus sponsored visibility per keyword and city location.

Geolocation & Maps Data

Extract exact latitude and longitude coordinates embedded within the map interface.

Operating Hours Parsing

Transform unstructured opening hours into normalised JSON schedules for every day of the week.

Datadome Circumvention

Bypass French anti-bot systems using residential IP rotation and browser fingerprinting.

Category Mapping

Navigate the complex PagesJaunes taxonomy to extract businesses by niche subcategories.

B2B Lead Enrichment

Cross-reference SIRET and SIREN data with official registries for complete company profiles.

// engagement pipeline

From search query to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide keywords, categories, or French departments. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure crawlers, proxy rotation, session management, and Datadome bypass logic.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and sample data review before full launch.

Delivery
Ongoing

JSON, CSV, or Parquet pushed to your S3 bucket or data warehouse on an agreed cadence.
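The null-rate checks in the QA step can be expressed as a per-field ratio compared against a threshold. In this sketch the threshold value and the function names are assumptions. Empty strings are counted as nulls, alongside missing keys and None.

```python
from typing import Any


def null_rates(records: list[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Share of records where each field is missing, None, or an empty string."""
    if not records:
        # An empty batch is treated as a QA failure rather than a 0% null rate.
        raise ValueError("no records to check")
    total = len(records)
    return {
        name: sum(1 for r in records if r.get(name) in (None, "")) / total
        for name in fields
    }


def failing_fields(
    records: list[dict[str, Any]], fields: list[str], max_null_rate: float
) -> list[str]:
    """Fields whose null rate exceeds the (hypothetical) allowed maximum."""
    rates = null_rates(records, fields)
    return sorted(name for name, rate in rates.items() if rate > max_null_rate)
```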

Under the hood

Bypassing French directory protections

PagesJaunes uses aggressive anti-bot measures to protect its proprietary business graph. Here is how we maintain pipeline stability.

pipeline-monitor · pagesjaunes.fr · live (active)
// fingerprinting
Identity rotation: TLS fingerprint randomised · user agent rotated · IP pool residential · challenges blocked: 0
// pagination
Page coverage: 48,291 pages queued, running
// observability
Pipeline health: 99.9% uptime · 142 ms p99 latency · 0.3% null rate · 2 alerts
Datadome evasion
Handling Datadome CAPTCHA and fingerprinting

PagesJaunes relies heavily on Datadome. We use residential proxies paired with Playwright to generate realistic TLS and browser fingerprints, mimicking human navigation to avoid blocks.

Phone number de-obfuscation
Click-to-reveal interactions via Playwright

Contact numbers are hidden behind JavaScript event listeners. Our scrapers execute the necessary clicks within a headless browser to reveal and capture the full phone number.

Pagination handling
Bypassing 50-page limits via precise query segmentation

Search results on PagesJaunes are capped at 50 pages per query. We segment queries by micro-region and postal code so that each sub-query returns fewer results than the cap, which keeps coverage complete without hitting pagination walls.
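Query segmentation is essentially a recursive split. If a location's estimated result count exceeds what 50 pages can show, query its smaller units instead. The sketch below assumes an estimated count is known for each unit, for example from a first-page result total if one is available. It also uses a hypothetical page size of 20 results. GeoUnit and plan_queries are illustrative names.

```python
from dataclasses import dataclass, field

MAX_PAGES = 50          # the pagination cap described above
RESULTS_PER_PAGE = 20   # hypothetical page size; verify before relying on it
MAX_RESULTS = MAX_PAGES * RESULTS_PER_PAGE


@dataclass
class GeoUnit:
    """A geographic unit with an estimated result count (hypothetical model)."""
    name: str
    estimated_results: int
    children: list["GeoUnit"] = field(default_factory=list)


def plan_queries(
    keyword: str, unit: GeoUnit, max_results: int = MAX_RESULTS
) -> list[tuple[str, str, bool]]:
    """Split a query into sub-queries that each fit under the cap.

    Returns (keyword, location, fits_under_cap) tuples. A leaf that still
    exceeds the cap is returned with fits_under_cap=False so it can be
    flagged rather than silently truncated.
    """
    if unit.estimated_results <= max_results or not unit.children:
        return [(keyword, unit.name, unit.estimated_results <= max_results)]
    plan: list[tuple[str, str, bool]] = []
    for child in unit.children:
        plan.extend(plan_queries(keyword, child, max_results))
    return plan
```

Oversized leaves are returned rather than dropped, so any remaining coverage gap is visible instead of silent.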

French residential IPs
Using local FR proxies to avoid geographic blocking

Requests originating outside France are heavily scrutinised. We route all traffic through French ISP residential proxies to maintain high trust scores.

Schema normalisation
Standardising inconsistent address formats

Address formats vary wildly across different French regions. We parse and normalise street names, postal codes, and city names into structured fields.

Applications

Who uses PagesJaunes data

Teams across industries use pagesjaunes.fr data to build competitive products and smarter operations.

01
B2B Lead Generation

Sales teams build targeted outreach lists with direct contact details and SIRET numbers for French businesses.

02
Local SEO Monitoring

Agencies track organic ranking and review sentiment across multiple physical locations for their clients.

03
Market Mapping

Analysts measure business density and competitor presence within specific French departments and cities.

04
Data Enrichment

Operations teams append accurate phone numbers and official registry identifiers to existing CRM records.

05
Franchise Auditing

Brands monitor compliance, operating hours, and customer review scores across hundreds of franchisee locations.

06
Investment Research

Firms track category growth, new business registrations, and closure rates by region to inform investments.

Why DataFlirt

"PagesJaunes holds the definitive graph of French local business, but accessing it at scale requires bypassing strict anti-scraping perimeters."

Extracting complete directories requires more than simple HTTP requests. You need French residential IPs, headless browsers to reveal contact details, and precise logic to segment queries past pagination limits. DataFlirt manages this infrastructure so you receive clean, queryable records.

Technical Spec

PagesJaunes scraper technical capabilities

Everything supported by our pagesjaunes.fr scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

Datadome bypass · Playwright stealth mode combined with residential IPs to avoid blocks · Supported
French ISP proxies · Localised IP addresses required for regional queries and trust scoring · Supported
Click-to-reveal phone numbers · Automated interaction to extract full contact digits · Supported
SIRET / SIREN extraction · Capture official registry numbers where listed on the profile · Supported
Review pagination · Extract historical reviews beyond the initial load · Supported
Coordinate extraction · Extract latitude and longitude from embedded map elements · Supported
Change detection · Only emit records with updated hours, reviews, or contact details · Supported
Private user account data · Extraction of saved lists and personal search history · Partial
Direct messaging via platform · Automated outreach to business owners through the portal · Partial
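Change detection can be implemented by hashing only the fields that matter and comparing the result against the previous hash for each business_id. In this sketch the watched-field list is a hypothetical choice. seen is an in-memory stand-in for whatever persistent store the pipeline uses.

```python
import hashlib
import json
from typing import Any

# Hypothetical choice of fields whose changes should trigger a new record.
WATCHED_FIELDS = (
    "monday_hours", "tuesday_hours", "wednesday_hours", "thursday_hours",
    "friday_hours", "saturday_hours", "sunday_hours",
    "phone_number", "website_url", "rating", "review_count",
)


def fingerprint(record: dict[str, Any], fields: tuple[str, ...] = WATCHED_FIELDS) -> str:
    """Stable hash of the watched fields; key order does not affect the result."""
    subset = {name: record.get(name) for name in fields}
    payload = json.dumps(subset, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def emit_changed(
    records: list[dict[str, Any]],
    seen: dict[str, str],
    fields: tuple[str, ...] = WATCHED_FIELDS,
) -> list[dict[str, Any]]:
    """Return only records whose fingerprint differs from the one stored in `seen`.

    `seen` maps business_id -> fingerprint and is updated in place.
    """
    changed = []
    for record in records:
        digest = fingerprint(record, fields)
        if seen.get(record["business_id"]) != digest:
            changed.append(record)
            seen[record["business_id"]] = digest
    return changed
```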
Infrastructure

Infrastructure powering the PagesJaunes pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration and retry logic. Playwright handles JavaScript rendering and interaction flows to reveal hidden data.

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies specifically for the FR region to ensure high success rates against Datadome.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling and dependency management. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: newline-delimited or nested structures
CSV: flat file with typed columns
Parquet: columnar format for BigQuery and Snowflake
AWS S3: direct bucket delivery, compatible with any data lake
Webhook: HTTP POST per record for real-time processing
API: REST endpoints to query your extracted data
XLS: Excel-compatible files for non-technical teams
PostgreSQL: upsert directly into your database schema
BigQuery: streamed directly into your dataset
Snowflake: stage and COPY INTO workflow
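Newline-delimited JSON means one JSON object per line, which lets warehouses and stream processors load records incrementally. Here is a minimal writer using only the standard library. write_ndjson is an illustrative name.

```python
import io
import json
from collections.abc import Iterable
from typing import Any, TextIO


def write_ndjson(records: Iterable[dict[str, Any]], out: TextIO) -> int:
    """Write one compact JSON object per line; return the number of records written."""
    count = 0
    for record in records:
        # ensure_ascii=False keeps accented French text (é, è, ç) readable.
        out.write(json.dumps(record, ensure_ascii=False, separators=(",", ":")))
        out.write("\n")
        count += 1
    return count


buffer = io.StringIO()
write_ndjson([{"business_id": "pj_54829103", "services_offered": ["Dépannage", "Installation"]}], buffer)
print(buffer.getvalue(), end="")
# {"business_id":"pj_54829103","services_offered":["Dépannage","Installation"]}
```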
// faq

Common questions.

About pagesjaunes.fr scraping, legality, and pipeline operations.

Ask us directly →
Is scraping PagesJaunes legal?

Scraping publicly available business information is generally permissible under applicable law. DataFlirt extracts only public directory data. We do not extract personal data or circumvent authentication walls. Clients must ensure their subsequent use of the data complies with GDPR and local regulations.

How do you handle Datadome protection?

We use French residential ISP proxies and full Playwright browser sessions with realistic fingerprints. This setup mimics genuine user navigation, preventing the anti-bot systems from flagging our requests.

Can you bypass the 50-page search limit?

Yes. We segment broad queries by smaller geographical units, such as postal codes or specific neighbourhoods, ensuring the result set for each sub-query falls under the pagination limit.

Do you extract hidden phone numbers?

Yes. Our Playwright integration clicks the necessary elements on the page to trigger the network request that reveals the complete phone number.

Can you target specific French departments or cities?

Yes. Pipelines can be configured to target specific geographic parameters, ranging from entire regions down to individual postal codes.

How fresh is the business data?

Pipelines can be configured to run daily, weekly, or monthly. The data is as fresh as the moment the crawl executes.

Do you provide SIRET and SIREN numbers?

Yes. Where PagesJaunes lists the official company registry numbers, our scrapers extract and normalise them into dedicated fields.

$ dataflirt scope --new-project --source=pagesjaunes.fr ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a full extraction of a specific department or continuous monitoring of competitor reviews across France, tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h