We extract product listings, dynamic pricing, JD Plus rates, seller intelligence, and multi-media reviews from JD.com. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Product Listings objects from jd.com. All fields typed and schema-versioned.
{"sku_id": "100012043978", "title": "Apple iPhone 14 Pro Max 256GB", "brand": "Apple", "price": 8999.0, "jd_plus_price": 8899.0, "self_operated": true, "jd_delivery": true, "rating": 98.5, "review_count": 2000000, "in_stock": true}
| # | sku_id | title | brand | category | sub_category | price |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Pricing & Offers objects from jd.com. All fields typed and schema-versioned.
{"sku_id": "100012043978", "current_price": 8999.0, "original_price": 9899.0, "discount_pct": 9, "flash_sale": false, "jd_plus_price": 8899.0, "coupon_details": "Minus 200 over 4000", "price_timestamp": "2026-05-12T09:14:00Z"}
| # | sku_id | current_price | original_price | discount_pct | flash_sale | flash_sale_end |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Reviews & Ratings objects from jd.com. All fields typed and schema-versioned.
{"review_id": "1847294719", "sku_id": "100012043978", "star_rating": 5, "plus_member": true, "content": "Excellent battery life and camera.", "helpful_votes": 42, "creation_time": "2026-04-18 14:22:10", "user_level": "Diamond"}
| # | review_id | sku_id | user_name | user_level | plus_member | star_rating |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Seller Data objects from jd.com. All fields typed and schema-versioned.
{"shop_id": "1000000127", "shop_name": "Apple JD Self-operated Store", "self_operated": true, "rating_product": 9.9, "rating_service": 9.9, "rating_logistics": 9.9, "follower_count": 45000000, "company_name": "JD.com"}
| # | shop_id | shop_name | shop_url | self_operated | rating_product | rating_service |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Search Results objects from jd.com. All fields typed and schema-versioned.
{"keyword": "smartphone", "position": 1, "sku_id": "100012043978", "ad_flag": false, "self_operated": true, "price": 8999.0, "comment_count": "200W+", "scraped_at": "2026-05-12T09:14:33Z"}
| # | keyword | position | sku_id | title | price | comment_count |
|---|---|---|---|---|---|---|
Our JD scraper navigates the Chinese e-commerce ecosystem: handling heavy JavaScript, slide CAPTCHAs, dynamic pricing widgets, and geo-fenced content to extract structured electronics data.
Title, specifications, dimensions, weight, images, variations, and metadata fields - scraped at SKU level with parent-child variant mapping.
Capture current price, original price, flash sale windows, coupon details, and bulk discount tiers - timestamped per crawl.
Extract member-exclusive pricing and discounts, providing a complete view of the pricing hierarchy.
Full review text, star ratings, helpful vote counts, JD Plus member attribution, and media URLs - paginated across all review pages.
Shop name, self-operated flags, follower counts, and tripartite ratings (product, service, logistics) for every listing.
Identify JD Delivery eligibility, warehouse origin, and expected delivery windows across geographical zones.
Track organic versus sponsored position for any keyword, with self-operated and JD Logistics badge capture.
Support for Joybuy and JD Worldwide listings, tracking import taxes and international shipping metadata.
Run one-off bulk exports or configure continuous pipelines at hourly, daily, or real-time cadences with change-detection diffing.
Brief in. Clean data out.
Provide SKU lists, category URLs, keyword sets, or shop IDs. We design the extraction schema together.
We configure Scrapy / Playwright crawlers, proxy rotation, session management, and CAPTCHA handling for jd.com.
Schema validation, null-rate checks, price-outlier detection, and sample reviews before full launch.
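The three checks named above can be sketched in a few lines of Python. This is a minimal illustration, not our production QA suite: the `REQUIRED_FIELDS` subset, the function names, and the 3.5 threshold are assumptions chosen for the example, and the outlier test uses a median-based modified z-score as one reasonable choice among several.

```python
from statistics import median

# Hypothetical subset of the Pricing & Offers schema, used only for illustration.
REQUIRED_FIELDS = {"sku_id": str, "current_price": float, "flash_sale": bool}


def schema_errors(record):
    """Return a list of missing-field and wrong-type errors for one record."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing:{name}")
        elif record[name] is not None and not isinstance(record[name], expected):
            errors.append(f"type:{name}")
    return errors


def null_rate(records, field):
    """Share of records where the field is absent or null."""
    if not records:
        return 0.0
    nulls = sum(1 for r in records if r.get(field) is None)
    return nulls / len(records)


def price_outliers(prices, threshold=3.5):
    """Flag prices whose modified z-score (median/MAD based) exceeds the threshold."""
    if not prices:
        return []
    med = median(prices)
    mad = median(abs(p - med) for p in prices)
    if mad == 0:
        return []  # all prices (nearly) identical; nothing to flag
    return [p for p in prices if 0.6745 * abs(p - med) / mad > threshold]
```

A median-based score is used here because a single mis-parsed price (for example, a dropped decimal point) would distort a mean-based check on a small sample.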
JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
JD.com deploys aggressive anti-scraping measures, including complex slide CAPTCHAs and behavioural tracking. Here is how we maintain extraction stability.
JD.com bot detection operates on TLS fingerprints, browser headers, mouse-movement heuristics, and IP reputation. Our crawlers use residential ISP proxies from mainland China and Hong Kong with realistic browser fingerprints.
JD product prices and stock levels are rendered client-side via asynchronous API calls. We run full Playwright browser sessions with JavaScript execution to capture data that plain HTTP clients miss entirely.
JD frequently interrupts sessions with complex slide CAPTCHAs. We integrate CapSolver and 2Captcha to process these challenges automatically, maintaining high throughput without manual intervention.
JD changes its DOM structure frequently. Our selector strategy uses multiple fallback chains per field - CSS selectors, XPath, and text-pattern matching - so a layout change does not break your data pipeline.
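A fallback chain of this kind can be sketched as an ordered list of extractors tried until one returns a value. In the sketch below, plain regular expressions stand in for CSS and XPath selectors so the example runs with the standard library alone; the markup patterns, `PRICE_CHAIN`, and the helper names are hypothetical and do not reflect JD's actual page structure.

```python
import re
from typing import Callable, List, Optional, Tuple

Extractor = Callable[[str], Optional[str]]


def regex_extractor(pattern: str) -> Extractor:
    """Build an extractor that returns the first capture group, or None."""
    compiled = re.compile(pattern, re.DOTALL)

    def extract(html: str) -> Optional[str]:
        match = compiled.search(html)
        return match.group(1).strip() if match else None

    return extract


# Hypothetical chain for a price field: most specific pattern first,
# loosest text-pattern match last.
PRICE_CHAIN: List[Extractor] = [
    regex_extractor(r'<span class="price">\s*([\d.]+)\s*</span>'),
    regex_extractor(r'data-price="([\d.]+)"'),
    regex_extractor(r"¥\s*([\d.]+)"),
]


def extract_first(html: str, chain: List[Extractor]) -> Tuple[Optional[str], Optional[int]]:
    """Return (value, index of the extractor that matched), or (None, None)."""
    for index, extractor in enumerate(chain):
        value = extractor(html)
        if value:
            return value, index
    return None, None
```

Returning the index of the extractor that matched makes it possible to log how often a field falls through to a later fallback, which is an early signal that the primary selector needs attention.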
For large SKU catalogues, we maintain a hash index of last-seen values per field. Subsequent runs only push diffs - reducing compute cost, storage bloat, and downstream processing load.
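The idea of a per-field hash index can be illustrated with a short sketch. It assumes an in-memory dict keyed by SKU ID as the index and SHA-256 over a canonical JSON encoding of each value; the function names and storage choice are assumptions for the example, and a real deployment would persist the index in a database.

```python
import hashlib
import json


def field_hashes(record):
    """Hash each field value separately using a canonical JSON encoding."""
    return {
        name: hashlib.sha256(
            json.dumps(value, sort_keys=True, ensure_ascii=False).encode("utf-8")
        ).hexdigest()
        for name, value in record.items()
    }


def diff_record(index, key, record):
    """Return only the fields whose hash changed since the last run, then update the index."""
    new_hashes = field_hashes(record)
    old_hashes = index.get(key, {})
    changed = {
        name: record[name]
        for name, digest in new_hashes.items()
        if old_hashes.get(name) != digest
    }
    index[key] = new_hashes
    return changed


# Usage: the first run pushes every field; later runs push only what changed.
index = {}
first = diff_record(index, "100012043978", {"current_price": 8999.0, "flash_sale": False})
second = diff_record(index, "100012043978", {"current_price": 8999.0, "flash_sale": False})
third = diff_record(index, "100012043978", {"current_price": 8899.0, "flash_sale": False})
```

Here `first` contains both fields, `second` is empty, and `third` contains only `current_price`.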
Electronics brands and third-party sellers monitor pricing, flash sale windows, and coupon stacking to reprice and protect margin.
Brands audit third-party sellers for MAP violations, counterfeit listings, and unauthorised resellers - protecting brand equity at scale.
Analysts track review velocity, new entrant launches, and category saturation trends to identify whitespace and investment opportunities.
ML teams use JD datasets to train recommendation engines, NLP classifiers, and sentiment models on Chinese language text.
Supply chain teams correlate review velocity and stock depth indicators with sales velocity to improve procurement models.
PE firms and analysts track category leaders, seller growth curves, and review-to-rating ratios to evaluate marketplace companies.
"JD.com holds the definitive pricing baseline for electronics in Asia - but extracting it requires bypassing some of the most aggressive anti-bot systems deployed today."
Most teams underestimate the investment required: reliable JD.com scraping requires mainland China residential proxies, full JavaScript rendering, slide CAPTCHA handling, and daily selector maintenance. DataFlirt absorbs that complexity so your engineers can focus on the analysis - not the infrastructure.
Everything supported by our jd.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles JavaScript rendering, cookie sessions, and interaction flows. Combined via scrapy-playwright middleware.
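For readers unfamiliar with scrapy-playwright, the wiring looks roughly like the sketch below. The setting names (`DOWNLOAD_HANDLERS`, `TWISTED_REACTOR`, `PLAYWRIGHT_BROWSER_TYPE`, `RETRY_TIMES`) and the `playwright` request-meta key come from the Scrapy and scrapy-playwright documentation; the specific values and the `build_request_meta` helper are illustrative assumptions rather than our production configuration.

```python
# Settings a Scrapy project would place in settings.py (shown here as a dict
# so the fragment runs on its own without Scrapy installed).
SCRAPY_SETTINGS = {
    # Route HTTP(S) downloads through the Playwright download handler.
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    # scrapy-playwright requires the asyncio-based Twisted reactor.
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    "PLAYWRIGHT_BROWSER_TYPE": "chromium",
    # Scrapy's built-in retry middleware; the value 3 is an assumed example.
    "RETRY_TIMES": 3,
}


def build_request_meta(render_js: bool) -> dict:
    """Hypothetical helper: opt a single request into browser rendering.

    Requests without the "playwright" meta key go through Scrapy's normal
    downloader, so only pages that need JavaScript pay the browser cost.
    """
    return {"playwright": True} if render_js else {}
```

In a spider, the returned dict would be passed as `meta=` on a `scrapy.Request`, so rendering can be enabled per page type rather than for the whole crawl.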
We maintain pools of residential ISP proxies across CN/HK regions. Rotation happens per-request with sticky sessions where required. IP score monitoring removes blacklisted addresses before they contaminate the pool.
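The combination of per-request rotation, sticky sessions, and score-based exclusion can be sketched as a small pool class. Everything here is an assumption for illustration: the `ProxyPool` name, the 0-to-1 score scale, the 0.5 cut-off, and the example proxy URLs. How scores are actually computed is outside the scope of the sketch.

```python
class ProxyPool:
    """Round-robin proxy pool with sticky sessions and score-based exclusion."""

    def __init__(self, proxies, min_score=0.5):
        self._scores = {proxy: 1.0 for proxy in proxies}  # start fully trusted
        self._min_score = min_score
        self._cursor = 0
        self._sticky = {}  # session_id -> pinned proxy

    def _healthy(self):
        return [p for p, s in self._scores.items() if s >= self._min_score]

    def report(self, proxy, score):
        """Record the latest health score for a proxy (assumed scale: 0.0 to 1.0)."""
        self._scores[proxy] = score

    def next_proxy(self, session_id=None):
        """Rotate per request; reuse the pinned proxy while it stays healthy."""
        healthy = self._healthy()
        if not healthy:
            raise RuntimeError("no healthy proxies left in pool")
        if session_id is not None:
            pinned = self._sticky.get(session_id)
            if pinned in healthy:
                return pinned
        proxy = healthy[self._cursor % len(healthy)]
        self._cursor += 1
        if session_id is not None:
            self._sticky[session_id] = proxy  # pin (or re-pin) the session
        return proxy


# Usage with placeholder proxy addresses.
pool = ProxyPool([
    "http://proxy-a.example:8000",
    "http://proxy-b.example:8000",
    "http://proxy-c.example:8000",
])
stateless = pool.next_proxy()          # rotates on every call
pinned = pool.next_proxy("session-1")  # stays fixed for this session
```

If a pinned proxy's score later drops below the cut-off, the session is re-pinned to the next healthy address instead of failing.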
Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.
Data delivered to where your team already works — no new tooling required.
About jd.com scraping, legality, and pipeline operations.
Scraping publicly available information from JD.com is generally permissible under applicable law. We focus on public, non-authenticated product, pricing, and review data. We do not extract personal data or circumvent SMS authentication walls. Clients should review JD's ToS and consult legal counsel for specific use cases.
We use CapSolver and 2Captcha integrations trained specifically on JD's slide and puzzle mechanics. When a challenge is presented, the Playwright session pauses, the solver calculates the trajectory, and executes the slide with human-like mouse movements.
Yes. We configure specific crawler sessions to capture the JD Plus tier pricing alongside the standard retail price, allowing you to map the full discount structure.
Real-time streaming pipelines achieve sub-60-minute latency for price and availability signals on a defined SKU set. Full catalogue refreshes at daily cadence complete within a 6-12 hour window depending on size.
Yes. We capture the source URLs for all user-uploaded images and videos attached to reviews, which is critical for product quality monitoring and sentiment analysis.
Our smallest packages start at a defined SKU list (typically 1,000-50,000 SKUs) with weekly delivery. For larger catalogues or custom schema requirements, we price based on volume and delivery frequency.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off electronics catalogue dump or a continuous price-monitoring feed across 1M SKUs - we scope, build, and operate the pipeline. Tell us what you need.