
Taobao data,
at warehouse scale.

We extract product listings, pricing signals, monthly sales volumes, shop profiles, review corpus, and keyword rankings from Taobao. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Products extracted: 1.8M/day
Price updates: 8.7M/24h
Review records: 650K/run
Active pipelines: 129
Uptime: 99.93%

Data Dictionary

Every field we extract from taobao.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Product Listings objects from taobao.com. All fields typed and schema-versioned.

item_id, title, category, sub_category, shop_id, shop_name, seller_location, price, original_price, currency, discount_pct, monthly_sales, total_sales, wangwang_id, has_variations, variation_options, rating, review_count, dsr_score, free_shipping, freight_type, image_urls, item_url, scraped_at
Sample record (product_listings, 200 OK):
"item_id": "741293847561",
"title": "无线蓝牙耳机 降噪 运动跑步 TWS入耳式",
"shop_name": "数码先锋旗舰店",
"price": 89.00,
"currency": "CNY",
"monthly_sales": 14823,
"rating": 4.8,
"review_count": 38471,
"free_shipping": true,
"discount_pct": 31
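As a rough illustration of what "typed" can mean on the consuming side, the sketch below loads one product_listings record into a Python dataclass. It covers only the fields shown in the sample record; the class name `ProductListing` and the helper `parse_listing` are hypothetical names chosen for this example, not part of any DataFlirt SDK.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductListing:
    """Typed view of a subset of the product_listings fields."""
    item_id: str
    title: str
    shop_name: str
    price: float
    currency: str
    monthly_sales: int
    rating: float
    review_count: int
    free_shipping: bool


def parse_listing(line: str) -> ProductListing:
    """Parse one JSON record and coerce each field to its declared type."""
    raw = json.loads(line)
    return ProductListing(
        item_id=str(raw["item_id"]),  # IDs stay strings: identifiers, not numbers
        title=raw["title"],
        shop_name=raw["shop_name"],
        price=float(raw["price"]),
        currency=raw["currency"],
        monthly_sales=int(raw["monthly_sales"]),
        rating=float(raw["rating"]),
        review_count=int(raw["review_count"]),
        free_shipping=bool(raw["free_shipping"]),
    )


sample = (
    '{"item_id": "741293847561", "title": "无线蓝牙耳机 降噪 运动跑步 TWS入耳式", '
    '"shop_name": "数码先锋旗舰店", "price": 89.00, "currency": "CNY", '
    '"monthly_sales": 14823, "rating": 4.8, "review_count": 38471, '
    '"free_shipping": true}'
)
listing = parse_listing(sample)
print(listing.item_id, listing.price, listing.monthly_sales)
```

A missing key raises `KeyError` here, which is usually what you want for required fields; optional fields would need `raw.get(...)` and an `Optional` type instead.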

Complete list of extractable fields for Pricing & Promotions objects from taobao.com. All fields typed and schema-versioned.

item_id, price, original_price, discount_pct, coupon_available, coupon_value, coupon_min_spend, activity_price, activity_name, activity_end_time, bulk_price_tiers, vip_price, price_timestamp, currency
Sample record (Pricing & Promotions, 200 OK):
"item_id": "741293847561",
"price": 89.00,
"original_price": 129.00,
"discount_pct": 31,
"coupon_available": true,
"coupon_value": 10,
"activity_name": "双十一活动",
"activity_end_time": "2026-11-11T23:59:00+08:00",
"price_timestamp": "2026-05-12T09:00:00Z"
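For readers who want to sanity-check `discount_pct` against `price` and `original_price`, here is a minimal sketch of the arithmetic. It assumes the percentage is the reduction from the original price rounded half-up to a whole number; that rounding convention is an assumption for this example, and the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def discount_pct(price: str, original_price: str) -> int:
    """Percentage reduction from original_price to price, as a whole number."""
    current = Decimal(price)
    original = Decimal(original_price)
    if original <= 0:
        raise ValueError("original_price must be positive")
    pct = (original - current) / original * 100
    # Assumed convention: round half-up to the nearest integer.
    return int(pct.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Values taken from the sample pricing record: 89.00 against 129.00.
print(discount_pct("89.00", "129.00"))
```

Prices are passed as strings so that `Decimal` sees the exact decimal value rather than a binary float approximation.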

Complete list of extractable fields for Shop Profiles objects from taobao.com. All fields typed and schema-versioned.

shop_id, shop_name, seller_type, location, dsr_description, dsr_shipping, dsr_service, followers_count, total_items, total_sales, shop_level, years_active, response_rate, return_rate, brand_authorised, shop_url
Sample record (shop_profiles, 200 OK):
"shop_id": "shumaxianfeng",
"shop_name": "数码先锋旗舰店",
"seller_type": "flagship_store",
"dsr_description": 4.82,
"dsr_shipping": 4.79,
"dsr_service": 4.84,
"followers_count": 284710,
"years_active": 9,
"brand_authorised": true

Complete list of extractable fields for Search Results objects from taobao.com. All fields typed and schema-versioned.

keyword, position, item_id, title, shop_name, price, monthly_sales, rating, review_count, is_tmall, is_sponsored, free_shipping, thumbnail_url, scraped_at
Sample record (search_results, 200 OK):
"keyword": "无线蓝牙耳机",
"position": 1,
"item_id": "741293847561",
"monthly_sales": 14823,
"is_tmall": false,
"is_sponsored": false,
"free_shipping": true,
"scraped_at": "2026-05-12T09:14:33Z"
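One common use of the `position` and `is_sponsored` fields is to derive an organic rank that ignores paid placements. The sketch below shows one way to do that over a list of search_results records; the function name and the three-row input are made up for illustration.

```python
def organic_positions(results: list[dict]) -> dict[str, int]:
    """Map item_id to its rank among non-sponsored results for one keyword."""
    ranks: dict[str, int] = {}
    rank = 0
    for row in sorted(results, key=lambda r: r["position"]):
        if row["is_sponsored"]:
            continue  # paid placements do not consume an organic slot
        rank += 1
        ranks[row["item_id"]] = rank
    return ranks


# Hypothetical rows for a single keyword; only the fields used here are shown.
rows = [
    {"position": 1, "item_id": "A", "is_sponsored": True},
    {"position": 2, "item_id": "B", "is_sponsored": False},
    {"position": 3, "item_id": "C", "is_sponsored": False},
]
print(organic_positions(rows))
```

Rows should be grouped by `keyword` (and by crawl, using `scraped_at`) before calling this, since the ranking only makes sense within a single results page sequence.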

Capabilities

Everything you need from Taobao — nothing you don't

Our Taobao scraper covers every layer of China's largest C2C marketplace: product listings, monthly sales volume, shop DSR scores, pricing and promotion data, review corpus, and search rankings.

Full Product Data Extraction

Title, description, specifications, images, variants, and every metadata field Taobao surfaces — scraped at item-ID level with full Chinese-language content preserved.

Monthly Sales Volume

Capture monthly and cumulative sales figures per listing — one of the most direct product demand signals available on any major marketplace.

Shop DSR Intelligence

Detailed Seller Ratings across description accuracy, shipping speed, and service quality — the three scores that drive Taobao search rank and buyer trust.

Price & Promotion Tracking

Capture price, original price, coupon availability, activity pricing, bulk tiers, and VIP pricing — timestamped per crawl.

Review & Rating Mining

Full Chinese-language review text, star ratings, review images, variant purchased, and buyer location — paginated across all review pages.

Keyword & Category Rank Tracking

Monitor organic vs sponsored position for any keyword on Taobao — with Tmall vs Taobao store differentiation and free shipping capture.

Tmall vs Taobao Distinction

Flag Tmall flagship store listings vs standard Taobao listings — a critical quality signal for sourcing research, brand intelligence, and competitive analysis.

Coupon & Campaign Monitoring

Track Double 11, 618, and platform-wide campaign pricing windows, coupon stacks, and activity-level discounts for competitor pricing intelligence.

Scheduled + Streaming Modes

One-off bulk exports or continuous pipelines at daily or real-time cadences with change-detection diffing.
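Change-detection diffing can be sketched as hashing each record and comparing against the hash stored from the previous run. The example below is an illustration under assumptions, not a description of DataFlirt's implementation: it treats `scraped_at` and `price_timestamp` as volatile fields to exclude from the hash, and keys records by `item_id`.

```python
import hashlib
import json

# Assumption: timestamps change on every crawl and should not count as a change.
VOLATILE_FIELDS = {"scraped_at", "price_timestamp"}


def record_hash(record: dict) -> str:
    """Stable SHA-256 over the record's non-volatile fields."""
    stable = {k: v for k, v in record.items() if k not in VOLATILE_FIELDS}
    payload = json.dumps(stable, sort_keys=True, ensure_ascii=False,
                         separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_run(previous: dict[str, str], records: list[dict],
             key: str = "item_id") -> tuple[list[dict], dict[str, str]]:
    """Return (records that are new or changed, hashes to store for next run)."""
    changed: list[dict] = []
    current: dict[str, str] = {}
    for rec in records:
        digest = record_hash(rec)
        current[rec[key]] = digest
        if previous.get(rec[key]) != digest:
            changed.append(rec)
    return changed, current


run1 = [{"item_id": "741293847561", "price": 89.0,
         "scraped_at": "2026-05-12T09:14:33Z"}]
run2 = [{"item_id": "741293847561", "price": 89.0,
         "scraped_at": "2026-05-13T09:14:33Z"}]
run3 = [{"item_id": "741293847561", "price": 79.0,
         "scraped_at": "2026-05-14T09:14:33Z"}]

changed1, state = diff_run({}, run1)      # first run: everything is new
changed2, state = diff_run(state, run2)   # only the timestamp moved
changed3, state = diff_run(state, run3)   # price changed
print(len(changed1), len(changed2), len(changed3))
```

Sorting keys before hashing matters: without `sort_keys=True`, two records with identical content but different key order would produce different digests.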

// engagement pipeline

From item ID to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide item IDs, category URLs, keyword sets, or shop IDs. Together with you, we design the extraction schema and the Chinese-language field mapping.

Pipeline Build
Days 2–4

We configure Scrapy / Playwright crawlers with Chinese residential proxies, session management, and CAPTCHA handling for taobao.com.

Validation & QA
Days 4–6

Schema validation, null-rate checks, sales-volume outlier detection, and sample records before full launch.
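The null-rate and outlier checks mentioned above might look something like the following sketch. It assumes records arrive as dictionaries and uses a median-based modified z-score for `monthly_sales`; the 3.5 threshold and both function names are choices made for this example rather than documented pipeline settings.

```python
from statistics import median


def null_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(1 for r in records if r.get(f) is None) / total
            for f in fields}


def sales_outliers(records: list[dict], threshold: float = 3.5) -> list[dict]:
    """Flag records whose monthly_sales is far from the median (modified z-score)."""
    values = [r["monthly_sales"] for r in records
              if r.get("monthly_sales") is not None]
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # all values (nearly) identical: nothing to flag
    return [r for r in records
            if r.get("monthly_sales") is not None
            and 0.6745 * abs(r["monthly_sales"] - med) / mad > threshold]


# Hypothetical batch: one suspicious sales figure and one missing price.
batch = [
    {"item_id": "1", "price": 10.0, "monthly_sales": 100},
    {"item_id": "2", "price": 12.0, "monthly_sales": 120},
    {"item_id": "3", "price": None, "monthly_sales": 110},
    {"item_id": "4", "price": 11.0, "monthly_sales": 130},
    {"item_id": "5", "price": 9.0, "monthly_sales": 90},
    {"item_id": "6", "price": 10.5, "monthly_sales": 50000},
]
rates = null_rates(batch, ["price", "monthly_sales"])
flagged = sales_outliers(batch)
print(rates, [r["item_id"] for r in flagged])
```

A median-based score is used instead of mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself.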

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our Taobao pipeline handles the hard parts

Taobao's aggressive bot detection, login requirements, and Chinese network environment require specialised infrastructure that goes well beyond standard scraping setups.

pipeline-monitor · taobao.com · live (active)

Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

Page coverage
48,291 pages queued (running)

Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

Chinese residential proxies
Mainland CN IP pool for geo-authenticated access

Taobao actively filters non-Chinese IP ranges and serves degraded or blocked responses to foreign traffic. Our pipeline uses mainland Chinese residential ISP proxies — mandatory for full product data, monthly sales figures, and DSR scores.

Login wall handling
Session management for Taobao's login gates

Taobao increasingly gates listing detail pages behind Alipay-linked login. Our pipeline manages authenticated sessions with session rotation and cookie refresh to maintain continuous access without triggering account-level risk flags.

JavaScript rendering
Full Playwright execution for Taobao's SPA

Taobao's product pages, review panels, and promotional widgets are fully JavaScript-rendered. We run Playwright sessions with scroll-triggering, lazy-load resolution, and dynamic price widget hydration, capturing data that plain HTTP clients without a JavaScript engine cannot reach.

Schema stability
Resilient selectors across frequent DOM changes

Taobao deploys A/B page variants and updates its DOM structure frequently. Our selector strategy uses multiple fallback chains per field — CSS selectors, XPath, and structured data — maintained by our team in near-real-time.
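The fallback-chain idea can be illustrated with a small sketch: try extractors for a field in priority order and record which one succeeded. To stay dependency-free, the example uses regular expressions as stand-ins for the CSS and XPath selectors a real crawler would use, and the HTML snippets and attribute names are invented for illustration; none of this reflects Taobao's actual markup.

```python
import json
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]


def from_json_ld(html: str) -> Optional[str]:
    """Structured data: read offers.price from an embedded JSON-LD block."""
    m = re.search(r'<script type="application/ld\+json">(.*?)</script>', html, re.S)
    if not m:
        return None
    try:
        data = json.loads(m.group(1))
    except json.JSONDecodeError:
        return None
    offers = data.get("offers") if isinstance(data, dict) else None
    price = offers.get("price") if isinstance(offers, dict) else None
    return str(price) if price is not None else None


def from_price_attr(html: str) -> Optional[str]:
    """Stand-in for a CSS selector on a (hypothetical) data-price attribute."""
    m = re.search(r'data-price="([\d.]+)"', html)
    return m.group(1) if m else None


def from_visible_text(html: str) -> Optional[str]:
    """Last resort: a yuan sign followed by a number in the visible text."""
    m = re.search(r"¥\s*([\d.]+)", html)
    return m.group(1) if m else None


def first_match(html: str, chain: list[Extractor]) -> tuple[Optional[str], Optional[str]]:
    """Return (value, extractor name) from the first extractor that succeeds."""
    for extractor in chain:
        value = extractor(html)
        if value:
            return value, extractor.__name__
    return None, None


PRICE_CHAIN = [from_json_ld, from_price_attr, from_visible_text]

variant_a = '<div class="price" data-price="89.00">¥89.00</div>'
variant_b = '<span>到手价 ¥ 89.00</span>'
print(first_match(variant_a, PRICE_CHAIN))
print(first_match(variant_b, PRICE_CHAIN))
```

Returning the extractor name alongside the value makes it possible to monitor how often each fallback fires, which is a useful early signal that a primary selector has broken.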

Monitoring & alerting
24/7 pipeline health with anomaly detection

Every run emits structured logs to our observability stack. We alert on null-rate spikes, sales-volume outliers, schema drift, and coverage drops — and respond before you notice. SLA uptime is contractual, not aspirational.

Applications

Who uses Taobao data — and how

Teams across industries use taobao.com data to build competitive products and smarter operations.

01
China Market Entry Research

Brands entering the Chinese market use Taobao data to map category demand, price positioning, top-selling shop profiles, and consumer review sentiment before committing to investment.

02
Sourcing & Supplier Discovery

Global sourcing teams use Taobao monthly sales and DSR data to identify high-volume manufacturers and distributors operating on the platform before moving to direct factory negotiation.

03
Trend Forecasting

Fashion, beauty, and consumer electronics teams use Taobao sales velocity and keyword rank data as a leading indicator of trends that will reach Western markets 6–12 months later.

04
AI Training Data

ML teams use Taobao datasets — Chinese-language product titles, descriptions, category hierarchies, and review corpora — to train Chinese NLP models and cross-lingual classifiers.

05
Brand Protection

Brand protection teams monitor Taobao for counterfeit listings, unauthorised resellers, and parallel import channels at scale — often the first signal of a counterfeit supply chain.

06
Competitive Pricing Intelligence

Importers and retailers use Taobao factory-gate pricing data to benchmark costs, monitor price erosion, and model gross margin on categories sourced from China.

Why DataFlirt

"Taobao is the world's largest C2C marketplace and the most accurate real-time signal for Chinese consumer demand — but its data is locked behind Chinese IPs, login walls, and one of the most sophisticated bot-detection stacks in e-commerce."

Reliable Taobao scraping requires mainland Chinese residential proxies, authenticated Alipay-linked session management, full JavaScript rendering, and daily selector maintenance against frequent DOM updates. DataFlirt absorbs that complexity so your team can focus on China market intelligence — not the infrastructure.

Technical Spec

Taobao scraper — technical capabilities

Everything supported by our taobao.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

JavaScript rendering (Supported): Full Playwright sessions, mandatory for Taobao's SPA product pages and pricing widgets
CAPTCHA bypass (Supported): Automated 2Captcha + CapSolver integration with fallback to manual queue
Chinese residential proxies (Supported): Mainland CN ISP residential IPs, mandatory for geo-authenticated full data access
Session / login management (Supported): Authenticated session rotation to handle Taobao's Alipay-linked login gates
Monthly sales capture (Supported): Monthly and cumulative sales volume per listing, a direct demand-proxy signal
DSR score extraction (Supported): Three-dimensional Detailed Seller Ratings: description, shipping, and service scores
Review pagination (Supported): Full Chinese-language review corpus with images, variant purchased, and buyer location
Tmall vs Taobao distinction (Supported): Store type flag per listing: flagship, authorised dealer, or standard Taobao seller
Promotion & coupon capture (Supported): Campaign pricing, coupon values, and Double 11 / 618 activity data per listing
Change detection (diffs) (Supported): Hash-based diff that emits only records with changed fields since the last run
Webhook delivery (Supported): HTTP POST per record or batch for real-time downstream processing
Private transaction history (Partial): Buyer order history and private seller financials require seller account credentials

Infrastructure

Infrastructure powering the Taobao pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, CN Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles Taobao's JavaScript-heavy SPA, scroll interactions, lazy-load triggering, and authenticated session management.

Chinese Residential Proxy Infrastructure

We maintain dedicated pools of mainland Chinese ISP residential proxies — the only proxy type that reliably bypasses Taobao's geo-authentication layer. Rotation happens per-session with IP score monitoring.

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: Newline-delimited or nested, schema versioned per run
CSV: Flat file with typed columns, Excel/Sheets compatible
Parquet: Columnar format for BigQuery, Snowflake, Athena
S3: Direct bucket delivery, compatible with any data lake
BigQuery: Streamed directly into your dataset with schema auto-detect
Webhook: HTTP POST per record for real-time downstream processing
Postgres: Upsert into your existing schema with conflict resolution
Snowflake: Stage + COPY INTO workflow, incremental or full-replace

// faq

Common questions.

About taobao.com scraping, legality, and pipeline operations.

Is scraping Taobao legal?

Scraping publicly available information from Taobao is a complex legal area that varies by jurisdiction. In India and the US, scraping public data is generally treated as permissible under applicable law. In the US, the Ninth Circuit held in hiQ Labs v. LinkedIn that scraping publicly accessible data is unlikely to violate the Computer Fraud and Abuse Act, although breach-of-contract and other claims can still apply. DataFlirt targets only public product, pricing, and review data, not personal data or authenticated seller financials. We recommend clients review Taobao's ToS independently and consult legal counsel for their specific use case and jurisdiction.

Why do you need Chinese residential proxies specifically?

Taobao actively geo-authenticates requests and returns degraded responses — or blocks entirely — for non-Chinese IP ranges. Mainland Chinese ISP residential proxies are the only reliable way to access full product data, monthly sales figures, and DSR scores. VPNs and datacenter proxies are insufficient for sustained production scraping.

How do you handle Taobao's login wall?

Taobao increasingly gates listing detail pages and review content behind Alipay-linked login. Our pipeline manages authenticated sessions with rotation and cookie refresh logic to maintain continuous access. We discuss session provisioning requirements during the scoping engagement.

Can you capture monthly sales volume data?

Yes. Monthly sales count is scraped directly from product listing pages — one of Taobao's most distinctive and valuable data fields. Cumulative total sales are also captured where surfaced. These figures are key demand-proxy signals for market research and sourcing intelligence.

Do you deliver data in Chinese or transliterated?

We deliver raw Chinese-language content (UTF-8 encoded) as scraped — preserving original field values for titles, descriptions, reviews, and shop names. Translation or transliteration can be applied as a post-processing step on request.
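When re-serialising delivered records in Python, note that `json.dumps` escapes non-ASCII characters by default; passing `ensure_ascii=False` keeps the Chinese text readable in the output file. The short sketch below round-trips one record as a newline-delimited JSON line; the record is a trimmed copy of the sample shown earlier on this page.

```python
import json

record = {"item_id": "741293847561", "shop_name": "数码先锋旗舰店"}

# Default behaviour: Chinese characters become \uXXXX escapes.
escaped = json.dumps(record)

# ensure_ascii=False keeps the original characters; encode explicitly as UTF-8.
line = json.dumps(record, ensure_ascii=False)
encoded = (line + "\n").encode("utf-8")

# Reading it back yields the same record.
decoded = json.loads(encoded.decode("utf-8"))
print(line)
```

Both forms decode to the same data, so this is purely about keeping files human-readable and smaller on disk.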

What's the minimum viable engagement?

Our smallest packages start at a defined item or category set (typically 1,000–20,000 items) with weekly delivery. Given the infrastructure complexity of Taobao, setup timelines are slightly longer than for Western marketplaces. Contact us with your use case for a scoped quote.

Can you distinguish Tmall flagship stores from standard Taobao sellers?

Yes. Each listing is flagged with store type: Tmall flagship, Tmall authorised dealer, or standard Taobao seller. This distinction is critical for sourcing research, brand intelligence, and competitive analysis.

Can I request a sample dataset before committing?

Absolutely. We provide a sample run of up to 300 items or 30 search result pages as part of the pre-engagement scoping process — so you can validate schema fit, Chinese-language field completeness, and data quality before signing any contract.

$ dataflirt scope --new-project --source=taobao.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off China market trends dataset or a continuous price and sales monitoring feed across 500K items — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h