Taobao Scraper — Product, Pricing & Shop Data Extraction

Data Dictionary

Every field we extract from taobao.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Product Listings objects from taobao.com. All fields typed and schema-versioned.

item_id, title, category, sub_category, shop_id, shop_name, seller_location, price, original_price, currency, discount_pct, monthly_sales, total_sales, wangwang_id, has_variations, variation_options, rating, review_count, dsr_score, free_shipping, freight_type, image_urls, item_url, scraped_at

{
  "item_id": "741293847561",
  "title": "无线蓝牙耳机 降噪 运动跑步 TWS入耳式",
  "shop_name": "数码先锋旗舰店",
  "price": 89.00,
  "currency": "CNY",
  "monthly_sales": 14823,
  "rating": 4.8,
  "review_count": 38471,
  "free_shipping": true,
  "discount_pct": 31
}

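As a rough illustration of what "typed" can mean on the consuming side, the sketch below coerces a raw record into a frozen dataclass. It covers only a subset of the fields in the sample record; the `ProductListing` class and `parse_listing` helper are hypothetical names for this example, not part of the delivered schema.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ProductListing:
    """Hypothetical typed view over a subset of the Product Listings fields."""
    item_id: str
    title: str
    shop_name: str
    price: float
    currency: str
    monthly_sales: int
    rating: float
    review_count: int
    free_shipping: bool


def parse_listing(raw: dict[str, Any]) -> ProductListing:
    """Coerce a raw record into typed fields; raises KeyError if a field is missing."""
    return ProductListing(
        item_id=str(raw["item_id"]),
        title=str(raw["title"]),
        shop_name=str(raw["shop_name"]),
        price=float(raw["price"]),
        currency=str(raw["currency"]),
        monthly_sales=int(raw["monthly_sales"]),
        rating=float(raw["rating"]),
        review_count=int(raw["review_count"]),
        free_shipping=bool(raw["free_shipping"]),
    )


sample = {
    "item_id": "741293847561",
    "title": "无线蓝牙耳机 降噪 运动跑步 TWS入耳式",
    "shop_name": "数码先锋旗舰店",
    "price": 89.00,
    "currency": "CNY",
    "monthly_sales": 14823,
    "rating": 4.8,
    "review_count": 38471,
    "free_shipping": True,
}
listing = parse_listing(sample)
```

Keeping `item_id` as a string avoids losing leading zeros or overflowing integer columns in downstream tools.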

Complete list of extractable fields for Pricing & Promotions objects from taobao.com. All fields typed and schema-versioned.

item_id, price, original_price, discount_pct, coupon_available, coupon_value, coupon_min_spend, activity_price, activity_name, activity_end_time, bulk_price_tiers, vip_price, price_timestamp, currency

{
  "item_id": "741293847561",
  "price": 89.00,
  "original_price": 129.00,
  "discount_pct": 31,
  "coupon_available": true,
  "coupon_value": 10,
  "activity_name": "双十一活动",
  "activity_end_time": "2026-11-11T23:59:00+08:00",
  "price_timestamp": "2026-05-12T09:00:00Z"
}

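The sample values are consistent with `discount_pct` being the rounded percentage difference between `original_price` and `price`. The helper below is a minimal sketch under that assumption; it ignores coupons and activity pricing, and the function name is illustrative only.

```python
def discount_pct(price: float, original_price: float) -> int:
    """Percentage discount of price relative to original_price, rounded to an integer."""
    if original_price <= 0:
        raise ValueError("original_price must be positive")
    return round((1 - price / original_price) * 100)


# Sample record: price 89.00 against original_price 129.00
print(discount_pct(89.00, 129.00))  # 31
```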

Complete list of extractable fields for Shop Profiles objects from taobao.com. All fields typed and schema-versioned.

shop_id, shop_name, seller_type, location, dsr_description, dsr_shipping, dsr_service, followers_count, total_items, total_sales, shop_level, years_active, response_rate, return_rate, brand_authorised, shop_url

{
  "shop_id": "shumaxianfeng",
  "shop_name": "数码先锋旗舰店",
  "seller_type": "flagship_store",
  "dsr_description": 4.82,
  "dsr_shipping": 4.79,
  "dsr_service": 4.84,
  "followers_count": 284710,
  "years_active": 9,
  "brand_authorised": true
}


Complete list of extractable fields for Search Results objects from taobao.com. All fields typed and schema-versioned.

keyword, position, item_id, title, shop_name, price, monthly_sales, rating, review_count, is_tmall, is_sponsored, free_shipping, thumbnail_url, scraped_at

{
  "keyword": "无线蓝牙耳机",
  "position": 1,
  "item_id": "741293847561",
  "monthly_sales": 14823,
  "is_tmall": false,
  "is_sponsored": false,
  "free_shipping": true,
  "scraped_at": "2026-05-12T09:14:33Z"
}

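One way to use the `position` and `is_sponsored` fields together is to derive an organic rank that skips sponsored slots. The sketch below assumes `position` counts every result on the page, sponsored or not; the function name and the placeholder item IDs are hypothetical.

```python
from typing import Iterable


def organic_positions(results: Iterable[dict]) -> dict[str, int]:
    """Map item_id to its 1-based rank among non-sponsored results for one keyword."""
    ordered = sorted(results, key=lambda r: r["position"])
    ranks: dict[str, int] = {}
    rank = 0
    for row in ordered:
        if row["is_sponsored"]:
            continue  # sponsored slots do not count towards organic rank
        rank += 1
        ranks.setdefault(row["item_id"], rank)  # keep the best rank if an item repeats
    return ranks


rows = [
    {"position": 1, "item_id": "A", "is_sponsored": True},
    {"position": 2, "item_id": "B", "is_sponsored": False},
    {"position": 3, "item_id": "C", "is_sponsored": False},
]
print(organic_positions(rows))  # {'B': 1, 'C': 2}
```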

Capabilities

Everything you need from Taobao — nothing you don't

Our Taobao scraper covers every layer of China's largest C2C marketplace: product listings, monthly sales volume, shop DSR scores, pricing and promotion data, review corpus, and search rankings.

Full Product Data Extraction

Title, description, specifications, images, variants, and every metadata field Taobao surfaces — scraped at item-ID level with full Chinese-language content preserved.

Monthly Sales Volume

Capture monthly and cumulative sales figures per listing — one of the most direct product demand signals available on any major marketplace.

Shop DSR Intelligence

Detailed Seller Ratings across description accuracy, shipping speed, and service quality — the three scores that drive Taobao search rank and buyer trust.

Price & Promotion Tracking

Capture price, original price, coupon availability, activity pricing, bulk tiers, and VIP pricing — timestamped per crawl.

Review & Rating Mining

Full Chinese-language review text, star ratings, review images, variant purchased, and buyer location — paginated across all review pages.

Keyword & Category Rank Tracking

Monitor organic vs sponsored position for any keyword on Taobao — with Tmall vs Taobao store differentiation and free shipping capture.

Tmall vs Taobao Distinction

Flag Tmall flagship store listings vs standard Taobao listings — a critical quality signal for sourcing research, brand intelligence, and competitive analysis.

Coupon & Campaign Monitoring

Track Double 11, 618, and platform-wide campaign pricing windows, coupon stacks, and activity-level discounts for competitor pricing intelligence.

Scheduled + Streaming Modes

One-off bulk exports or continuous pipelines at daily or real-time cadences with change-detection diffing.
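Change-detection diffing can be pictured as hashing each record and emitting only those whose hash differs from the previous run. The following is a minimal sketch, not the production implementation: the choice of SHA-256, the `VOLATILE_FIELDS` set, and both function names are assumptions made for illustration.

```python
import hashlib
import json

# Fields that change on every crawl and would otherwise mark every record as changed (assumed set).
VOLATILE_FIELDS = {"scraped_at", "price_timestamp"}


def record_hash(record: dict) -> str:
    """Stable SHA-256 over the record's non-volatile fields."""
    stable = {k: v for k, v in record.items() if k not in VOLATILE_FIELDS}
    payload = json.dumps(stable, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_records(records: list[dict], previous_hashes: dict[str, str], key: str = "item_id"):
    """Return (records that are new or changed, hashes to store for the next run)."""
    changed = []
    new_hashes = {}
    for rec in records:
        digest = record_hash(rec)
        new_hashes[rec[key]] = digest
        if previous_hashes.get(rec[key]) != digest:
            changed.append(rec)
    return changed, new_hashes
```

On the first run `previous_hashes` is empty, so every record is emitted; later runs emit only records whose stable fields differ.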

// engagement pipeline

From item ID to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide item IDs, category URLs, keyword sets, or shop IDs. We design the extraction schema and handle Chinese-language field mapping together.

Pipeline Build

Days 2–4

We configure Scrapy / Playwright crawlers with Chinese residential proxies, session management, and CAPTCHA handling for taobao.com.

Validation & QA

Days 4–6

Schema validation, null-rate checks, sales-volume outlier detection, and sample records before full launch.

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
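Two of the Validation & QA checks named above, null-rate checks and sales-volume outlier detection, can be sketched in a few lines. This is an illustration under stated assumptions: the modified z-score (based on the median absolute deviation) and the 3.5 threshold are common choices picked for the example, not a description of the checks actually run.

```python
from statistics import median


def null_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(1 for r in records if r.get(f) is None) / total for f in fields}


def sales_outliers(records: list[dict], field: str = "monthly_sales", threshold: float = 3.5) -> list[dict]:
    """Flag records far from the median, using the MAD-based modified z-score."""
    values = [r[field] for r in records if r.get(field) is not None]
    if len(values) < 3:
        return []  # too few points to say anything useful
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # all values (nearly) identical; nothing to flag
    return [
        r for r in records
        if r.get(field) is not None and 0.6745 * abs(r[field] - med) / mad > threshold
    ]
```

A median-based score is used here because a single inflated sales figure would distort a mean and standard deviation computed over the same batch.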

Under the hood

How our Taobao pipeline handles the hard parts

Taobao's aggressive bot detection, login requirements, and Chinese network environment require specialised infrastructure that goes well beyond standard scraping setups.

Identity rotation: TLS fingerprint randomised, user-agent rotated, IP pool residential, challenges blocked: 0.

Page coverage: 48,291 pages queued (running).

Pipeline health: 99.9% uptime, 142 ms p99 latency, 0.3% null rate.

Chinese residential proxies

Mainland CN IP pool for geo-authenticated access

Taobao actively filters non-Chinese IP ranges and serves degraded or blocked responses to foreign traffic. Our pipeline uses mainland Chinese residential ISP proxies — mandatory for full product data, monthly sales figures, and DSR scores.

Session management for Taobao's login gates

Taobao increasingly gates listing detail pages behind Alipay-linked login. Our pipeline manages authenticated sessions with session rotation and cookie refresh to maintain continuous access without triggering account-level risk flags.

JavaScript rendering

Full Playwright execution for Taobao's SPA

Taobao's product pages, review panels, and promotional widgets are fully JavaScript-rendered. We run Playwright sessions with scroll-triggering, lazy-load resolution, and dynamic price widget hydration, capturing data that plain HTTP clients cannot reach.

Schema stability

Resilient selectors across frequent DOM changes

Taobao deploys A/B page variants and updates its DOM structure frequently. Our selector strategy uses multiple fallback chains per field — CSS selectors, XPath, and structured data — maintained by our team in near-real-time.
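The fallback-chain idea can be sketched as an ordered list of extractors that is tried until one returns a value. To stay dependency-free, the example uses regular expressions as stand-ins for real CSS and XPath selectors, and the markup it matches is hypothetical, not Taobao's actual DOM.

```python
import json
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]


def from_json_ld(html: str) -> Optional[str]:
    """Structured data: read offers.price from an embedded JSON-LD block, if present."""
    m = re.search(r'<script type="application/ld\+json">(.*?)</script>', html, re.S)
    if not m:
        return None
    try:
        data = json.loads(m.group(1))
    except json.JSONDecodeError:
        return None
    offers = data.get("offers") if isinstance(data, dict) else None
    price = offers.get("price") if isinstance(offers, dict) else None
    return str(price) if price is not None else None


def from_price_class(html: str) -> Optional[str]:
    """Stand-in for a CSS selector such as `.price` (hypothetical markup)."""
    m = re.search(r'class="price"[^>]*>\s*([\d.]+)', html)
    return m.group(1) if m else None


def from_data_attr(html: str) -> Optional[str]:
    """Stand-in for an XPath such as //*[@data-price] (hypothetical markup)."""
    m = re.search(r'data-price="([\d.]+)"', html)
    return m.group(1) if m else None


def first_match(html: str, chain: list[Extractor]) -> Optional[str]:
    """Try each extractor in order and return the first non-empty result."""
    for extractor in chain:
        value = extractor(html)
        if value:
            return value
    return None


PRICE_CHAIN: list[Extractor] = [from_json_ld, from_price_class, from_data_attr]
```

When a page variant drops one of the patterns, the chain falls through to the next extractor instead of returning a null for the field.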

Monitoring & alerting

24/7 pipeline health with anomaly detection

Every run emits structured logs to our observability stack. We alert on null-rate spikes, sales-volume outliers, schema drift, and coverage drops — and respond before you notice. SLA uptime is contractual, not aspirational.
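Schema drift and coverage drops are both simple to express as batch-level checks. The sketch below is illustrative only: the function names and the 20% default threshold are assumptions, not the alerting rules in use.

```python
def schema_drift(records: list[dict], expected_fields: set[str]) -> dict[str, list[str]]:
    """Compare the fields seen in a batch against the expected schema."""
    seen: set[str] = set()
    for rec in records:
        seen.update(rec.keys())
    return {
        "missing": sorted(expected_fields - seen),      # expected but absent from every record
        "unexpected": sorted(seen - expected_fields),   # present but not in the schema
    }


def coverage_dropped(current_count: int, baseline_count: int, max_drop: float = 0.2) -> bool:
    """True when the record count fell by more than max_drop relative to the baseline."""
    if baseline_count <= 0:
        return False
    return (baseline_count - current_count) / baseline_count > max_drop
```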

Applications

Who uses Taobao data — and how

Teams across industries use taobao.com data to build competitive products and smarter operations.

China Market Entry Research

Brands entering the Chinese market use Taobao data to map category demand, price positioning, top-selling shop profiles, and consumer review sentiment before committing to investment.

Sourcing & Supplier Discovery

Global sourcing teams use Taobao monthly sales and DSR data to identify high-volume manufacturers and distributors operating on the platform before moving to direct factory negotiation.

Trend Forecasting

Fashion, beauty, and consumer electronics teams use Taobao sales velocity and keyword rank data as a leading indicator of trends that may reach Western markets 6–12 months later.

AI Training Data

ML teams use Taobao datasets — Chinese-language product titles, descriptions, category hierarchies, and review corpora — to train Chinese NLP models and cross-lingual classifiers.

Brand Protection

Brand protection teams monitor Taobao for counterfeit listings, unauthorised resellers, and parallel import channels at scale — often the first signal of a counterfeit supply chain.

Competitive Pricing Intelligence

Importers and retailers use Taobao factory-gate pricing data to benchmark costs, monitor price erosion, and model gross margin on categories sourced from China.

Technical Spec

Taobao scraper — technical capabilities

Everything supported by our taobao.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

JavaScript rendering

Full Playwright sessions — mandatory for Taobao's SPA product pages and pricing widgets

Supported

CAPTCHA bypass

Automated 2Captcha + CapSolver integration with fallback to manual queue

Supported

Chinese residential proxies

Mainland CN ISP residential IPs — mandatory for geo-authenticated full data access

Supported

Session / login management

Authenticated session rotation to handle Taobao's Alipay-linked login gates

Supported

Monthly sales capture

Monthly and cumulative sales volume per listing — a direct demand-proxy signal

Supported

DSR score extraction

Three-dimensional Detailed Seller Ratings: description, shipping, and service scores

Supported

Review pagination

Full Chinese-language review corpus with images, variant purchased, and buyer location

Supported

Tmall vs Taobao distinction

Store type flag per listing — flagship, authorised dealer, or standard Taobao seller

Supported

Promotion & coupon capture

Campaign pricing, coupon values, and Double 11 / 618 activity data per listing

Supported

Change detection (diffs)

Hash-based diff: only emit records with changed fields since last run

Supported

Webhook delivery

HTTP POST per record or batch for real-time downstream processing

Supported

Private transaction history

Buyer order history and private seller financials require seller account credentials

Partial

Infrastructure

Infrastructure powering the Taobao pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, CN Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles Taobao's JavaScript-heavy SPA, scroll interactions, lazy-load triggering, and authenticated session management.

Chinese Residential Proxy Infrastructure

We maintain dedicated pools of mainland Chinese ISP residential proxies — the only proxy type that reliably bypasses Taobao's geo-authentication layer. Rotation happens per-session with IP score monitoring.
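Per-session rotation with score monitoring could look roughly like the pool below. Everything here is hypothetical: the `Proxy` and `ProxyPool` classes, the smoothed success-rate score, the 0.3 cut-off, and the placeholder proxy URLs are assumptions chosen to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Proxy:
    url: str
    successes: int = 0
    failures: int = 0

    @property
    def score(self) -> float:
        """Smoothed success rate; a proxy with no history starts at 0.5."""
        return (self.successes + 1) / (self.successes + self.failures + 2)


class ProxyPool:
    def __init__(self, urls: list[str], min_score: float = 0.3) -> None:
        self._proxies = [Proxy(u) for u in urls]
        self._min_score = min_score
        self._cursor = 0

    def next_for_session(self) -> Proxy:
        """Round-robin over proxies whose score is still at or above the cut-off."""
        healthy = [p for p in self._proxies if p.score >= self._min_score]
        if not healthy:
            raise RuntimeError("no healthy proxies left")
        proxy = healthy[self._cursor % len(healthy)]
        self._cursor += 1
        return proxy

    def report(self, proxy: Proxy, ok: bool) -> None:
        """Record the outcome of a request made through the proxy."""
        if ok:
            proxy.successes += 1
        else:
            proxy.failures += 1


pool = ProxyPool(["http://proxy-a.example:8000", "http://proxy-b.example:8000"])
session_proxy = pool.next_for_session()  # one proxy is picked per session, not per request
```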

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state is stored in managed Postgres.

// faq

Common questions.

About taobao.com scraping, legality, and pipeline operations.

Is scraping Taobao legal?

Scraping publicly available information from Taobao is a complex legal area that varies by jurisdiction. In India and the US, scraping public data is generally permissible under applicable law; in the US, this position is supported by precedents such as hiQ v. LinkedIn. DataFlirt targets only public product, pricing, and review data, not personal data or authenticated seller financials. We recommend clients review Taobao's ToS independently and consult legal counsel for their specific use case and jurisdiction.

Why do you need Chinese residential proxies specifically?

Taobao actively geo-authenticates requests and returns degraded responses — or blocks entirely — for non-Chinese IP ranges. Mainland Chinese ISP residential proxies are the only reliable way to access full product data, monthly sales figures, and DSR scores. VPNs and datacenter proxies are insufficient for sustained production scraping.

How do you handle Taobao's login wall?

Taobao increasingly gates listing detail pages and review content behind Alipay-linked login. Our pipeline manages authenticated sessions with rotation and cookie refresh logic to maintain continuous access. We discuss session provisioning requirements during the scoping engagement.

Can you capture monthly sales volume data?

Yes. Monthly sales count is scraped directly from product listing pages — one of Taobao's most distinctive and valuable data fields. Cumulative total sales are also captured where surfaced. These figures are key demand-proxy signals for market research and sourcing intelligence.

Do you deliver data in Chinese or transliterated?

We deliver raw Chinese-language content (UTF-8 encoded) as scraped — preserving original field values for titles, descriptions, reviews, and shop names. Translation or transliteration can be applied as a post-processing step on request.
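If you write or re-serialise these records yourself, Python's `json` module needs `ensure_ascii=False` to keep the Chinese characters readable rather than escaped. A small sketch using two fields from the sample records:

```python
import json

record = {"item_id": "741293847561", "shop_name": "数码先锋旗舰店"}

# ensure_ascii=False writes the characters as-is instead of \uXXXX escape sequences
line = json.dumps(record, ensure_ascii=False)
encoded = line.encode("utf-8")

# Decoding the bytes as UTF-8 and parsing gives back the original record
restored = json.loads(encoded.decode("utf-8"))
```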

What's the minimum viable engagement?

Our smallest packages start at a defined item or category set (typically 1,000–20,000 items) with weekly delivery. Given the infrastructure complexity of Taobao, setup timelines are slightly longer than for Western marketplaces. Contact us with your use case for a scoped quote.

Can you distinguish Tmall flagship stores from standard Taobao sellers?

Yes. Each listing is flagged with store type: Tmall flagship, Tmall authorised dealer, or standard Taobao seller. This distinction is critical for sourcing research, brand intelligence, and competitive analysis.

Can I request a sample dataset before committing?

Absolutely. We provide a sample run of up to 300 items or 30 search result pages as part of the pre-engagement scoping process — so you can validate schema fit, Chinese-language field completeness, and data quality before signing any contract.

Taobao data, at warehouse scale.

Tell us what to extract. We do the rest.
