Shein data,
at catalogue scale.

We extract product listings, pricing signals, discount depths, trend rankings, seller data, reviews, and category intelligence from Shein. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Products extracted: 3.1M / day
Price updates: 8.4M / 24h
Review records: 640K / run
Active pipelines: 174
Uptime: 99.95%

Data Dictionary

Every field we extract from shein.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Product Listings objects from shein.com. All fields typed and schema-versioned.

goods_id, title, brand, category, sub_category, price, original_price, currency, discount_pct, in_stock, stock_level, color_count, size_options, rating, review_count, wish_count, description, material, care_instructions, image_urls, video_url, variation_count, date_added, is_new_arrival, is_trending, page_url
product_listings
● 200 OK
"goods_id": "sg-11203571",
"title": "SHEIN EZwear Floral Print Wrap Midi Dress",
"category": "Women Dresses",
"price": 12.99,
"original_price": 22.99,
"currency": "USD",
"discount_pct": 43,
"rating": 4.3,
"review_count": 3847,
"is_new_arrival": true,
"in_stock": true
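A consumer can sanity-check records like the sample above before loading them. The sketch below assumes that discount_pct is the whole-number percentage difference between original_price and price; that rounding rule and the helper name expected_discount_pct are illustrative assumptions, not a documented part of the schema.

```python
from decimal import Decimal, ROUND_HALF_UP


def expected_discount_pct(price: float, original_price: float) -> int:
    """Percentage off original_price, rounded to the nearest whole percent."""
    # Go through str() so 12.99 becomes Decimal("12.99"), not a binary float tail.
    p, o = Decimal(str(price)), Decimal(str(original_price))
    if o <= 0:
        raise ValueError("original_price must be positive")
    pct = (o - p) / o * 100
    return int(pct.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Values copied from the product_listings sample record.
record = {
    "goods_id": "sg-11203571",
    "price": 12.99,
    "original_price": 22.99,
    "discount_pct": 43,
}

derived = expected_discount_pct(record["price"], record["original_price"])
is_consistent = derived == record["discount_pct"]
```

A mismatch between the derived and delivered value is a cheap signal that a price field was parsed from the wrong page element.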

Complete list of extractable fields for Pricing & Promotions objects from shein.com. All fields typed and schema-versioned.

goods_id, price, original_price, discount_pct, discount_abs, flash_sale_price, flash_sale_ends_at, coupon_eligible, app_exclusive_price, new_user_price, bulk_discount_tiers, loyalty_price, price_timestamp, currency, market
pricing_promotions
● 200 OK
"goods_id": "sg-11203571",
"price": 12.99,
"original_price": 22.99,
"discount_pct": 43,
"flash_sale_price": 9.99,
"flash_sale_ends_at": "2026-05-13T23:59:00Z",
"app_exclusive_price": 11.49,
"coupon_eligible": true,
"price_timestamp": "2026-05-12T08:22:00Z"
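As a rough illustration of how the timestamped pricing fields combine, the sketch below computes the time left in a flash sale window at the moment of observation and the lowest price present on the record. The helper names are hypothetical, and treating the minimum of the three price fields as "the lowest price" is an assumption: which price a shopper actually pays depends on eligibility.

```python
from datetime import datetime, timedelta


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def flash_sale_remaining(record: dict) -> timedelta:
    """Time left in the flash sale window as of the price observation."""
    return parse_utc(record["flash_sale_ends_at"]) - parse_utc(record["price_timestamp"])


def lowest_price(record: dict) -> float:
    """Smallest non-null value among the price fields present on the record."""
    keys = ("price", "flash_sale_price", "app_exclusive_price")
    return min(record[k] for k in keys if record.get(k) is not None)


# Values copied from the pricing sample record.
sample = {
    "goods_id": "sg-11203571",
    "price": 12.99,
    "flash_sale_price": 9.99,
    "flash_sale_ends_at": "2026-05-13T23:59:00Z",
    "app_exclusive_price": 11.49,
    "price_timestamp": "2026-05-12T08:22:00Z",
}

remaining = flash_sale_remaining(sample)  # 1 day, 15 hours, 37 minutes
floor_price = lowest_price(sample)
```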

Complete list of extractable fields for Reviews & Ratings objects from shein.com. All fields typed and schema-versioned.

review_id, goods_id, reviewer_name, verified_purchase, star_rating, review_title, review_body, review_date, helpful_votes, size_purchased, color_purchased, fit_feedback, image_urls, country, height_cm, weight_kg
reviews_ratings
● 200 OK
"review_id": "rv_sh_4928710",
"goods_id": "sg-11203571",
"star_rating": 5,
"verified_purchase": true,
"review_title": "Perfect summer dress, runs true to size",
"helpful_votes": 84,
"fit_feedback": "true_to_size",
"review_date": "2026-04-29"

Complete list of extractable fields for Category & Trends objects from shein.com. All fields typed and schema-versioned.

category_id, category_name, parent_category, trending_rank, new_arrivals_count, total_products, avg_price, avg_discount_pct, avg_rating, top_goods_ids, scraped_at
category_trends
● 200 OK
"category_id": "cat_dresses_midi",
"category_name": "Midi Dresses",
"trending_rank": 3,
"new_arrivals_count": 1482,
"avg_price": 14.20,
"avg_discount_pct": 38,
"avg_rating": 4.2,
"scraped_at": "2026-05-12T08:30:00Z"

Capabilities

Everything you need from Shein — nothing you don't

Our Shein scraper handles every layer of the platform: product catalogues, dynamic pricing, flash sale windows, trend rankings, and the review corpus — with JavaScript rendering, session management, and anti-bot circumvention built in.

Full Product Data Extraction

Title, description, material, care instructions, images, size options, and every metadata field Shein surfaces — scraped at SKU level with full variant mapping.

Flash Sale & Discount Tracking

Capture price, original price, flash sale windows, app-exclusive rates, new-user discounts, and coupon eligibility — timestamped per crawl.

Trend & Ranking Intelligence

Extract trending rank, new arrival flags, bestseller positions, and wish-count signals across categories — track what's rising in real time.

Review & Fit Feedback Mining

Full review text, star ratings, fit feedback, size purchased, helpful votes, and reviewer body metrics — paginated across all review pages.

Category Catalogue Mapping

Complete category tree with product counts, average pricing, average discount depth, and top-ranked items per sub-category.

Search & Keyword Rank Scraping

Track organic position and sponsored placement for any keyword — with new-arrival, trending, and curated-collection badge capture.

Multi-Market Support

shein.com, shein.co.uk, shein.de, shein.com.au, shein.com.mx and 20+ regional storefronts — all from a unified schema with localised pricing.

Flash Sale & Limited-Time Offer Monitoring

Monitor flash sale eligibility windows, countdown timers, stock depletion rates, and coupon stacking — useful for competitive pricing and trend alerting.

Scheduled + Streaming Modes

Run one-off bulk exports or configure continuous pipelines at hourly, daily, or real-time cadences with change-detection diffing.

// engagement pipeline

From category URL to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide category URLs, keyword sets, or goods ID lists. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, session management, and CAPTCHA handling for shein.com.

Validation & QA
Days 4–6

Schema validation, null-rate checks, price-outlier detection, and sample reviews before full launch.

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
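The null-rate and price-outlier checks named in the Validation & QA step can be sketched in a few lines. This is a minimal illustration, not our production QA suite: the function names, the 3.5 cut-off, and the use of a median-absolute-deviation score are all assumptions chosen for the example.

```python
from statistics import median


def null_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(1 for r in records if r.get(f) is None) / total for f in fields}


def price_outliers(records: list[dict], threshold: float = 3.5) -> list[dict]:
    """Records whose price has a modified z-score (MAD-based) above threshold."""
    priced = [r for r in records if r.get("price") is not None]
    if not priced:
        return []
    med = median(r["price"] for r in priced)
    mad = median(abs(r["price"] - med) for r in priced)
    if mad == 0:
        return []  # all prices (nearly) identical; no robust scale to judge by
    return [r for r in priced if 0.6745 * abs(r["price"] - med) / mad > threshold]


# Illustrative batch: one price looks like a missing decimal point.
batch = [
    {"goods_id": "a", "price": 12.99, "rating": 4.3},
    {"goods_id": "b", "price": 14.20, "rating": 4.2},
    {"goods_id": "c", "price": 13.50, "rating": None},
    {"goods_id": "d", "price": 11.49, "rating": 4.0},
    {"goods_id": "e", "price": 1299.0, "rating": 4.5},
]

rates = null_rates(batch, ["price", "rating"])
suspects = [r["goods_id"] for r in price_outliers(batch)]
```

A median-based score is used here because a single extreme price barely moves the median, whereas it would inflate a mean and standard deviation enough to hide itself.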

Under the hood

How our Shein pipeline handles the hard parts

Shein's platform is heavily JavaScript-rendered with aggressive bot detection. Here's how we stay resilient — and why teams choose managed infrastructure over DIY.

pipeline-monitor · shein.com · live · active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued · running
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

Anti-bot layer
Residential proxy rotation + fingerprint spoofing

Shein's bot detection operates on TLS fingerprints, browser headers, and IP reputation scoring. Our crawlers use residential ISP proxies with realistic browser fingerprints, full cookie session management, and randomised request timing modelled on real user behaviour patterns.

JavaScript rendering
Full Playwright execution for SPA content

Shein product pages, category feeds, and flash sale pages are JavaScript-rendered single-page applications. We run full Playwright browser sessions with lazy-load triggering, scroll simulation, and dynamic price widget hydration, capturing data that plain HTTP clients without a JavaScript engine miss entirely.
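Scroll simulation for lazy-loaded content can be sketched as a loop that keeps scrolling until the page height stops growing. The code below is an assumption-laden illustration rather than our crawler: scroll_until_stable and fetch_rendered_html are hypothetical names, and the round limit and pause are arbitrary. It uses only Playwright's sync API calls page.evaluate, page.mouse.wheel, page.wait_for_timeout, page.goto, and page.content.

```python
def scroll_until_stable(page, max_rounds: int = 20, pause_ms: int = 500) -> int:
    """Scroll down until document height stops growing; return rounds used."""
    last_height = page.evaluate("document.body.scrollHeight")
    for round_no in range(1, max_rounds + 1):
        page.mouse.wheel(0, last_height)   # scroll by one full page height
        page.wait_for_timeout(pause_ms)    # give lazy-loaded items time to render
        height = page.evaluate("document.body.scrollHeight")
        if height == last_height:          # nothing new was appended
            return round_no
        last_height = height
    return max_rounds


def fetch_rendered_html(url: str) -> str:
    """Render a page in headless Chromium and return the hydrated HTML."""
    # Imported lazily so the scrolling helper can be tested without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded")
            scroll_until_stable(page)
            return page.content()
        finally:
            browser.close()
```

Running fetch_rendered_html requires Playwright and a browser binary (`pip install playwright`, then `playwright install chromium`). The height check is a simple stopping rule; pages with infinite feeds rely on max_rounds to terminate.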

Schema stability
Resilient selectors with fallback chains

Shein iterates its frontend rapidly. Our selector strategy uses multiple fallback chains per field — CSS selectors, XPath, text-pattern matching, and structured data extraction — so a layout change doesn't break your data pipeline overnight.
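A fallback chain can be pictured as an ordered list of extractors tried until one returns a value. The sketch below uses only the standard library, so regex-based extractors stand in for the CSS and XPath layers; the markup patterns, the data-price attribute, and all function names are hypothetical and are not taken from Shein's actual pages.

```python
import json
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[float]]


def price_from_json_ld(html: str) -> Optional[float]:
    """Structured data layer: read offers.price from a JSON-LD script tag."""
    m = re.search(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>', html, re.S
    )
    if not m:
        return None
    try:
        return float(json.loads(m.group(1))["offers"]["price"])
    except (ValueError, KeyError, TypeError):
        return None


def price_from_data_attr(html: str) -> Optional[float]:
    """Attribute layer: a (hypothetical) data-price attribute."""
    m = re.search(r'data-price="([0-9]+(?:\.[0-9]+)?)"', html)
    return float(m.group(1)) if m else None


def price_from_text(html: str) -> Optional[float]:
    """Last resort: first dollar amount found anywhere in the markup."""
    m = re.search(r"\$\s*([0-9]+(?:\.[0-9]{2})?)", html)
    return float(m.group(1)) if m else None


PRICE_CHAIN: list[tuple[str, Extractor]] = [
    ("json_ld", price_from_json_ld),
    ("data_attr", price_from_data_attr),
    ("text", price_from_text),
]


def extract_with_fallback(html: str, chain: list[tuple[str, Extractor]]):
    """Return (layer_name, value) from the first extractor that succeeds."""
    for name, extractor in chain:
        value = extractor(html)
        if value is not None:
            return name, value
    return None, None
```

Returning the layer name alongside the value is a deliberate choice in this sketch: a sudden shift from the first layer to the last is an early warning that the primary selector has broken, even while data keeps flowing.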

Change detection
Only re-scrape what's changed

For large SKU catalogues, we maintain a hash index of last-seen values per field. Subsequent runs only push diffs — reducing compute cost, storage bloat, and downstream processing load. You get a clean changelog rather than full re-dumps.
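The hash-index idea can be sketched as follows: hash each field value, compare against the hashes stored from the previous run, and emit only the fields that differ. The in-memory dict standing in for the index, the changelog shape, and the function names are assumptions made for this example.

```python
import hashlib
import json


def field_hashes(record: dict, key_field: str = "goods_id") -> dict[str, str]:
    """SHA-256 of each field's JSON encoding, excluding the record key."""
    return {
        field: hashlib.sha256(
            json.dumps(value, sort_keys=True).encode("utf-8")
        ).hexdigest()
        for field, value in record.items()
        if field != key_field
    }


def diff_run(index: dict, records: list[dict], key_field: str = "goods_id") -> list[dict]:
    """Return changelog entries for changed fields and update index in place."""
    changelog = []
    for record in records:
        key = record[key_field]
        new_hashes = field_hashes(record, key_field)
        old_hashes = index.get(key, {})
        changed = sorted(f for f, h in new_hashes.items() if old_hashes.get(f) != h)
        if changed:
            changelog.append(
                {key_field: key, "changed": {f: record[f] for f in changed}}
            )
        index[key] = new_hashes
    return changelog


index: dict = {}
run1 = diff_run(index, [{"goods_id": "sg-11203571", "price": 12.99, "in_stock": True}])
run2 = diff_run(index, [{"goods_id": "sg-11203571", "price": 12.99, "in_stock": True}])
run3 = diff_run(index, [{"goods_id": "sg-11203571", "price": 11.49, "in_stock": True}])
```

In this sketch the first run reports every field as changed, the identical second run reports nothing, and the third reports only the price. Fields that disappear from a record are not flagged; a fuller version would compare key sets as well.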

Monitoring & alerting
24/7 pipeline health with anomaly detection

Every run emits structured logs to our observability stack. We alert on null-rate spikes, price outliers, schema drift, and coverage drops — and respond before you notice. SLA uptime is contractual, not aspirational.

Applications

Who uses Shein data — and how

Teams across industries use shein.com data to build competitive products and smarter operations.

01
Competitive Pricing & Discount Intelligence

Fashion retailers and D2C brands monitor Shein's aggressive discount cadence, flash sale windows, and price floors to benchmark their own positioning.

02
Trend Forecasting & Assortment Planning

Buyers and merchandisers track new arrival velocity, rising category ranks, and wish-count growth to identify emerging micro-trends weeks before mass adoption.

03
Market Research & Category Analysis

Analysts map category saturation, average price points, and discount depth across thousands of sub-categories to identify whitespace and investment opportunities.

04
AI Training Data

ML teams use Shein datasets to train fashion recommendation engines, visual similarity models, and NLP classifiers on apparel descriptions.

05
Supply Chain & Sourcing Intelligence

Sourcing teams correlate material descriptions, pricing, and review velocity to benchmark supplier costs and identify fast-moving product attributes.

06
Investor & Analyst Due Diligence

PE firms and analysts track category leaders, new arrival frequency, and review growth curves to evaluate fast-fashion platform dynamics.

Why DataFlirt

"Shein lists millions of new SKUs every week — making it the fastest-moving fashion dataset on earth. But none of that trend signal is usable unless you build the pipeline."

Most teams underestimate the complexity: reliable Shein scraping requires residential proxies, full JavaScript rendering, dynamic session handling, and daily selector maintenance. DataFlirt absorbs that infrastructure complexity so your analysts can focus on the fashion intelligence — not the plumbing.

Technical Spec

Shein scraper — technical capabilities

Everything supported by our shein.com scraper, from rendered SPA elements and CAPTCHA handling to proxy rotation and change detection.

JavaScript rendering · Supported · Full Playwright sessions, required for product pages, flash sales, and dynamic pricing widgets
CAPTCHA bypass · Supported · Automated 2Captcha + CapSolver integration with fallback to manual queue
Residential proxy rotation · Supported · ISP-grade residential IPs from US / UK / AU / DE pools, rotated per request
Multi-market support · Supported · shein.com, .co.uk, .de, .com.au, .com.mx and 20+ regional storefronts
Variant/size mapping · Supported · All colour and size combinations per goods ID with stock level per variant
Flash sale tracking · Supported · Sale price, countdown timer, and stock-depletion rate captured per run
Review pagination · Supported · Full review corpus including all star-filter pages, fit feedback, and reviewer body metrics
New arrivals feed · Supported · Daily new-arrival ingestion per category, timestamped for trend velocity analysis
Trend & rank tracking · Supported · Trending rank, bestseller position, and wish-count captured per crawl with time-series history
Change detection (diffs) · Supported · Hash-based diff: only emit records with changed fields since last run
Webhook delivery · Supported · HTTP POST per record or batch, useful for real-time pricing and trend alerting workflows
Authenticated user data · Partial · Order history, personal wishlists, and account-gated data require user credentials

Infrastructure

Infrastructure powering the Shein pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles JavaScript rendering, cookie sessions, and scroll interactions. The two are combined via the scrapy-playwright download handler.
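For readers unfamiliar with the combination, a settings fragment along these lines wires Playwright into a Scrapy project. The handler path, TWISTED_REACTOR, and PLAYWRIGHT_BROWSER_TYPE are scrapy-playwright's documented settings, and the retry and delay keys are standard Scrapy settings; the specific values and the playwright_meta helper are illustrative assumptions, not our production configuration.

```python
# settings.py fragment for a Scrapy project using scrapy-playwright.

# Route HTTP(S) downloads through the Playwright-backed download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

PLAYWRIGHT_BROWSER_TYPE = "chromium"

# Built-in Scrapy behaviour: retries and randomised spacing between requests.
RETRY_TIMES = 3                  # illustrative value
DOWNLOAD_DELAY = 1.0             # seconds; illustrative value
RANDOMIZE_DOWNLOAD_DELAY = True  # jitter the delay between 0.5x and 1.5x


def playwright_meta(render: bool) -> dict:
    """Request.meta for a request; only flagged requests are browser-rendered."""
    return {"playwright": True} if render else {}
```

In a spider, a request opts in with `scrapy.Request(url, meta=playwright_meta(render=True))`; requests without the flag go through Scrapy's normal downloader, which keeps browser sessions for the pages that need them.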

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies across US/UK/AU/DE regions. Rotation happens per-request with sticky sessions where required. IP score monitoring prevents blacklisted pool contamination.

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested — schema versioned per run
CSV
Flat file with typed columns — Excel/Sheets compatible
Parquet
Columnar format for BigQuery, Snowflake, Athena
S3
Direct bucket delivery — compatible with any data lake
BigQuery
Streamed directly into your dataset with schema auto-detect
Webhook
HTTP POST per record for real-time downstream processing
Postgres
Upsert into your existing schema with conflict resolution
Snowflake
Stage + COPY INTO workflow — incremental or full-replace
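To make "upsert with conflict resolution" concrete, here is a small sketch that keeps one row per goods_id and lets the newest price_timestamp win. It runs against an in-memory SQLite database purely so the example is self-contained; the ON CONFLICT ... DO UPDATE syntax matches PostgreSQL's, but the table name, the columns, the newest-wins rule, and the parameter style (which differs between drivers) are assumptions for illustration.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO product_prices (goods_id, price, price_timestamp)
VALUES (:goods_id, :price, :price_timestamp)
ON CONFLICT (goods_id) DO UPDATE SET
    price = excluded.price,
    price_timestamp = excluded.price_timestamp
WHERE excluded.price_timestamp > product_prices.price_timestamp
"""


def upsert_prices(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert new goods_ids; update existing ones only if the row is newer."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT_SQL, rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_prices ("
    "goods_id TEXT PRIMARY KEY, price REAL, price_timestamp TEXT)"
)

upsert_prices(conn, [{"goods_id": "sg-11203571", "price": 12.99,
                      "price_timestamp": "2026-05-12T08:22:00Z"}])
# A newer observation replaces the stored price.
upsert_prices(conn, [{"goods_id": "sg-11203571", "price": 11.49,
                      "price_timestamp": "2026-05-13T08:22:00Z"}])
# A late-arriving older observation is ignored.
upsert_prices(conn, [{"goods_id": "sg-11203571", "price": 15.00,
                      "price_timestamp": "2026-05-11T00:00:00Z"}])

row_count = conn.execute("SELECT COUNT(*) FROM product_prices").fetchone()[0]
final_price = conn.execute("SELECT price FROM product_prices").fetchone()[0]
```

Comparing timestamps as text works here only because every value uses the same UTC ISO 8601 format; a real warehouse table would more likely use a native timestamp column.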
// faq

Common questions.

About shein.com scraping, legality, and pipeline operations.

Is scraping Shein legal?

Scraping publicly available information from Shein is generally permissible under applicable law in India, the US, and the UK. In the US, this is consistent with the Ninth Circuit's hiQ v. LinkedIn ruling, which concerned access to publicly available data under the Computer Fraud and Abuse Act, and with similar precedents. DataFlirt targets only public, non-authenticated product, pricing, and review data. We do not extract personal data or circumvent authentication walls. We recommend clients review Shein's ToS independently and consult legal counsel for specific use cases.

How do you handle Shein's anti-bot systems?

We use residential ISP proxies that appear as real consumer traffic, full Playwright browser sessions with realistic fingerprints, and request timing modelled on human behaviour. Our selectors have multi-layer fallback chains so DOM changes don't break the pipeline. We monitor for block-rate spikes in real time and trigger pool rotation or solver queues automatically.

Which Shein markets do you support?

We support shein.com, shein.co.uk, shein.de, shein.com.au, shein.com.mx, shein.com.br, shein.fr, shein.it, shein.es, shein.com.sg, and 15+ additional regional storefronts — all from a unified schema with market-normalised pricing.

How fresh is the data — what latency can I expect?

Latency depends on your agreed cadence. Streaming pipelines achieve sub-60-minute latency for price and flash-sale signals on a defined SKU set. Full catalogue refreshes at daily cadence complete within a 6–12 hour window. New-arrival feeds can be ingested within hours of Shein publishing them.

Can you track trend rankings and new arrivals over time?

Yes. Every pipeline run produces timestamped snapshots. We maintain a time-series table per goods ID for price, trending rank, review count, and wish count. New arrival timestamps allow you to calculate trend velocity from day of listing.
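As one way to read "trend velocity" from those snapshots, the sketch below computes the average per-day change in a metric between the earliest and latest snapshot for a goods ID. The snapshot shape, the function name, and the wish-count figures are made up for illustration; real time-series rows may carry more fields.

```python
from datetime import date


def daily_velocity(snapshots: list[dict], metric: str) -> float:
    """Average per-day change in metric between first and last snapshot."""
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots")
    ordered = sorted(snapshots, key=lambda s: s["date"])
    first, last = ordered[0], ordered[-1]
    days = (last["date"] - first["date"]).days
    if days <= 0:
        raise ValueError("snapshots must span at least one day")
    return (last[metric] - first[metric]) / days


# Illustrative snapshots for one goods ID, starting on its listing date.
snapshots = [
    {"date": date(2026, 5, 1), "wish_count": 120},
    {"date": date(2026, 5, 13), "wish_count": 900},
    {"date": date(2026, 5, 7), "wish_count": 480},
]

velocity = daily_velocity(snapshots, "wish_count")  # wishes gained per day
```

With these numbers the item gains 780 wishes over 12 days, so the velocity is 65 per day. Ranking items by this figure, rather than by raw wish count, surfaces newer listings that are rising quickly.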

What's the minimum viable engagement?

Our smallest packages start at a defined SKU list or category set (typically 5,000–100,000 items) with weekly delivery. For larger catalogues, ongoing trend monitoring, or custom schema requirements, we price based on volume and delivery frequency. Contact us with your use case for a scoped quote.

Do you capture review images and fit feedback?

Yes — including reviewer-submitted images, size and colour purchased, fit feedback labels (true to size, runs small, runs large), and self-reported height and weight where provided. This makes Shein review data particularly valuable for fashion-fit modelling.

Can I request a sample dataset before committing?

Absolutely. We provide a sample run of up to 1,000 SKUs or 20 category pages as part of the pre-engagement scoping process — so you can validate schema fit, field completeness, and data quality before signing any contract.

$ dataflirt scope --new-project --source=shein.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off trend catalogue snapshot or a continuous flash-sale monitoring feed across 3M SKUs — we scope, build, and operate the pipeline. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h