Shein Scraper — Product, Pricing & Trend Data Extraction

Data Dictionary

Every field we extract from shein.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Product Listings objects from shein.com. All fields typed and schema-versioned.

goods_id, title, brand, category, sub_category, price, original_price, currency, discount_pct, in_stock, stock_level, color_count, size_options, rating, review_count, wish_count, description, material, care_instructions, image_urls, video_url, variation_count, date_added, is_new_arrival, is_trending, page_url

{
  "goods_id": "sg-11203571",
  "title": "SHEIN EZwear Floral Print Wrap Midi Dress",
  "category": "Women Dresses",
  "price": 12.99,
  "original_price": 22.99,
  "currency": "USD",
  "discount_pct": 43,
  "rating": 4.3,
  "review_count": 3847,
  "is_new_arrival": true,
  "in_stock": true
}
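Every field is typed, so a consumer can reject malformed rows at load time. The sketch below is a minimal version of that check. It assumes records arrive as parsed JSON dicts. `ProductListing` and `parse_listing` are hypothetical names, and the class covers only the fields in the sample record above, not the full dictionary.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class ProductListing:
    # Hypothetical typed view of the sample record; extend with other dictionary fields as needed.
    goods_id: str
    title: str
    category: str
    price: float
    original_price: float
    currency: str
    discount_pct: int
    rating: float
    review_count: int
    is_new_arrival: bool
    in_stock: bool


def parse_listing(raw: dict[str, Any]) -> ProductListing:
    """Build a ProductListing, raising on missing fields or wrong JSON types."""
    values: dict[str, Any] = {}
    for f in fields(ProductListing):
        if f.name not in raw:
            raise ValueError(f"missing field: {f.name}")
        value = raw[f.name]
        # bool is a subclass of int, so a stray true/false must not pass as a number
        if f.type in (int, float) and isinstance(value, bool):
            raise TypeError(f"{f.name}: expected {f.type.__name__}, got bool")
        # JSON has a single number type: accept 4 where 4.0 is expected
        if f.type is float and isinstance(value, int):
            value = float(value)
        if not isinstance(value, f.type):
            raise TypeError(f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}")
        values[f.name] = value
    return ProductListing(**values)
```

With this in place, `parse_listing` accepts the sample record but raises `TypeError` if `price` arrives as the string `"12.99"`.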


Complete list of extractable fields for Pricing & Promotions objects from shein.com. All fields typed and schema-versioned.

goods_id, price, original_price, discount_pct, discount_abs, flash_sale_price, flash_sale_ends_at, coupon_eligible, app_exclusive_price, new_user_price, bulk_discount_tiers, loyalty_price, price_timestamp, currency, market

{
  "goods_id": "sg-11203571",
  "price": 12.99,
  "original_price": 22.99,
  "discount_pct": 43,
  "flash_sale_price": 9.99,
  "flash_sale_ends_at": "2026-05-13T23:59:00Z",
  "app_exclusive_price": 11.49,
  "coupon_eligible": true,
  "price_timestamp": "2026-05-12T08:22:00Z"
}


Complete list of extractable fields for Reviews & Ratings objects from shein.com. All fields typed and schema-versioned.

review_id, goods_id, reviewer_name, verified_purchase, star_rating, review_title, review_body, review_date, helpful_votes, size_purchased, color_purchased, fit_feedback, image_urls, country, height_cm, weight_kg

{
  "review_id": "rv_sh_4928710",
  "goods_id": "sg-11203571",
  "star_rating": 5,
  "verified_purchase": true,
  "review_title": "Perfect summer dress, runs true to size",
  "helpful_votes": 84,
  "fit_feedback": "true_to_size",
  "review_date": "2026-04-29"
}


Complete list of extractable fields for Category & Trends objects from shein.com. All fields typed and schema-versioned.

category_id, category_name, parent_category, trending_rank, new_arrivals_count, total_products, avg_price, avg_discount_pct, avg_rating, top_goods_ids, scraped_at

{
  "category_id": "cat_dresses_midi",
  "category_name": "Midi Dresses",
  "trending_rank": 3,
  "new_arrivals_count": 1482,
  "avg_price": 14.20,
  "avg_discount_pct": 38,
  "avg_rating": 4.2,
  "scraped_at": "2026-05-12T08:30:00Z"
}
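The category-level aggregates can be recomputed from product-level records, which gives a quick way to audit them. This minimal sketch assumes each input dict carries the Product Listings fields it reads. Ranking `top_goods_ids` by `review_count` is also an assumption, because the page does not say how top items are chosen. `category_summary` is a hypothetical name.

```python
from statistics import mean


def category_summary(products: list[dict], top_n: int = 3) -> dict:
    """Aggregate Product Listings records for one category into category-level fields."""
    if not products:
        raise ValueError("empty category")
    # Assumption: "top" means most-reviewed
    ranked = sorted(products, key=lambda p: p["review_count"], reverse=True)
    return {
        "total_products": len(products),
        "new_arrivals_count": sum(1 for p in products if p["is_new_arrival"]),
        "avg_price": round(mean(p["price"] for p in products), 2),
        "avg_discount_pct": round(mean(p["discount_pct"] for p in products)),
        "avg_rating": round(mean(p["rating"] for p in products), 1),
        "top_goods_ids": [p["goods_id"] for p in ranked[:top_n]],
    }
```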


Capabilities

Everything you need from Shein — nothing you don't

Our Shein scraper handles every layer of the platform: product catalogues, dynamic pricing, flash sale windows, trend rankings, and the review corpus — with JavaScript rendering, session management, and anti-bot circumvention built in.

Full Product Data Extraction

Title, description, material, care instructions, images, size options, and every metadata field Shein surfaces — scraped at SKU level with full variant mapping.

Flash Sale & Discount Tracking

Capture price, original price, flash sale windows, app-exclusive rates, new-user discounts, and coupon eligibility — timestamped per crawl.

Trend & Ranking Intelligence

Extract trending rank, new arrival flags, bestseller positions, and wish-count signals across categories — track what's rising in real time.

Review & Fit Feedback Mining

Full review text, star ratings, fit feedback, size purchased, helpful votes, and self-reported reviewer height and weight, paginated across all review pages.

Category Catalogue Mapping

Complete category tree with product counts, average pricing, average discount depth, and top-ranked items per sub-category.

Search & Keyword Rank Scraping

Track organic position and sponsored placement for any keyword — with new-arrival, trending, and curated-collection badge capture.

Multi-Market Support

shein.com, shein.co.uk, shein.de, shein.com.au, shein.com.mx and 20+ regional storefronts — all from a unified schema with localised pricing.

Flash Sale & Limited-Time Offer Monitoring

Monitor flash sale eligibility windows, countdown timers, stock depletion rates, and coupon stacking — useful for competitive pricing and trend alerting.

Scheduled + Streaming Modes

Run one-off bulk exports or configure continuous pipelines at hourly, daily, or real-time cadences with change-detection diffing.
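Review mining "paginated across all review pages" and full-catalogue crawls both come down to walking a paginated listing until it ends. This minimal sketch assumes a hypothetical `fetch_page(page)` callable that returns one page of records plus a has-next flag. It stands in for whatever request-and-parse step a real crawler uses and is not a Shein API. The `max_pages` cap guards against a listing that never reports its last page.

```python
from typing import Callable, Iterator

# Hypothetical fetcher: 1-based page number -> (records on that page, does another page exist?)
PageFetcher = Callable[[int], tuple[list[dict], bool]]


def iter_all_pages(fetch_page: PageFetcher, max_pages: int = 500) -> Iterator[dict]:
    """Yield every record across pages until the last page, an empty page, or max_pages."""
    for page in range(1, max_pages + 1):
        records, has_next = fetch_page(page)
        yield from records
        if not has_next or not records:
            return
```

Because it is a generator, records can be streamed to storage as each page arrives instead of being held in memory for the whole corpus.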

// engagement pipeline

From category URL to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide category URLs, keyword sets, or goods ID lists. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, session management, and CAPTCHA handling for shein.com.

Validation & QA

Days 4–6

Schema validation, null-rate checks, price-outlier detection, and manual review of sample output before full launch.

Delivery

Ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
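Two of the QA checks named in the validation step can be sketched with the standard library alone. Here, null rate means the share of records in which a field is absent or `None`. The page does not specify an outlier test, so for price outliers the sketch swaps in Tukey's interquartile-range fences as one common rule. Thresholds and function names are hypothetical.

```python
from statistics import quantiles


def null_rates(records: list[dict], field_names: list[str]) -> dict[str, float]:
    """Share of records in which each field is missing or None."""
    if not records:
        raise ValueError("no records to check")
    n = len(records)
    return {name: sum(1 for r in records if r.get(name) is None) / n for name in field_names}


def price_outliers(records: list[dict], k: float = 1.5) -> list[str]:
    """goods_ids whose price lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    priced = [r for r in records if r.get("price") is not None]
    if len(priced) < 2:
        return []  # quantiles() needs at least two data points
    q1, _, q3 = quantiles([r["price"] for r in priced], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r["goods_id"] for r in priced if not low <= r["price"] <= high]
```

IQR fences use quartiles, not the mean and standard deviation, so a single mis-parsed price (for example, cents read as dollars) cannot drag the threshold toward itself.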

Under the hood

How our Shein pipeline handles the hard parts

Shein's platform is heavily JavaScript-rendered with aggressive bot detection. Here's how we stay resilient — and why teams choose managed infrastructure over DIY.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (status: running)

// observability

Pipeline health

Uptime: 99.9%

p99 latency: 142 ms

Null rate: 0.3%

Anti-bot layer

Residential proxy rotation + fingerprint spoofing

Shein's bot detection operates on TLS fingerprints, browser headers, and IP reputation scoring. Our crawlers use residential ISP proxies with realistic browser fingerprints, full cookie session management, and randomised request timing modelled on real user behaviour.

JavaScript rendering

Full Playwright execution for SPA content

Shein product pages, category feeds, and flash sale pages are fully JavaScript-rendered single-page applications. We run full Playwright browser sessions with lazy-load triggering, scroll simulation, and dynamic price widget hydration. This captures data that plain HTTP clients, which do not execute JavaScript, miss entirely.

Schema stability

Resilient selectors with fallback chains

Shein iterates its frontend rapidly. Our selector strategy uses multiple fallback chains per field — CSS selectors, XPath, text-pattern matching, and structured data extraction — so a layout change doesn't break your data pipeline overnight.

Change detection

Only re-scrape what's changed

For large SKU catalogues, we maintain a hash index of last-seen values per field. Subsequent runs only push diffs — reducing compute cost, storage bloat, and downstream processing load. You get a clean changelog rather than full re-dumps.
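The hash index can be illustrated with a minimal sketch. It assumes records are JSON-serialisable dicts keyed by `goods_id` and that the index fits in memory; a production index would more likely live in Redis or Postgres. Hashing canonical JSON with SHA-256 is an illustrative choice. The sketch does not report fields that disappear between runs.

```python
import hashlib
import json
from typing import Any


def field_hash(value: Any) -> str:
    # Canonical JSON (sorted keys) so equal values always produce the same hash
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode("utf-8")).hexdigest()


def diff_record(record: dict, index: dict[str, dict[str, str]], key: str = "goods_id") -> dict:
    """Return only the fields that changed since the last run (plus the key), updating the index.

    Returns an empty dict when nothing changed, so unchanged records emit nothing downstream.
    """
    rid = record[key]
    seen = index.setdefault(rid, {})
    changed: dict[str, Any] = {}
    for name, value in record.items():
        h = field_hash(value)
        if seen.get(name) != h:
            changed[name] = value
            seen[name] = h
    if not changed:
        return {}
    changed[key] = rid
    return changed
```

On the first run every field counts as changed. Later runs emit only deltas such as `{"price": 11.99, "goods_id": "sg-11203571"}`.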

Monitoring & alerting

24/7 pipeline health with anomaly detection

Every run emits structured logs to our observability stack. We alert on null-rate spikes, price outliers, schema drift, and coverage drops — and respond before you notice. SLA uptime is contractual, not aspirational.
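Schema drift, one of the alert conditions above, can be detected by comparing each run's records against an expected field-to-type map. This is a minimal sketch: `EXPECTED_SCHEMA` is a hypothetical four-field excerpt, and the step that feeds the resulting report into an alerting system is omitted.

```python
# Hypothetical excerpt of the expected schema: field name -> Python type after JSON parsing
EXPECTED_SCHEMA: dict[str, type] = {"goods_id": str, "price": float, "currency": str, "in_stock": bool}


def schema_drift(records: list[dict], expected: dict[str, type]) -> dict[str, list[str]]:
    """Report fields that went missing, appeared unexpectedly, or changed type in a run."""
    missing: set[str] = set()
    unexpected: set[str] = set()
    wrong_type: set[str] = set()
    for record in records:
        missing |= expected.keys() - record.keys()
        unexpected |= record.keys() - expected.keys()
        for name, typ in expected.items():
            value = record.get(name)
            # None is treated as a null value, not a type change
            if value is not None and not isinstance(value, typ):
                wrong_type.add(name)
    return {"missing": sorted(missing), "unexpected": sorted(unexpected), "wrong_type": sorted(wrong_type)}
```

An alert rule can then fire whenever any of the three lists is non-empty.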

Applications

Who uses Shein data — and how

Teams across industries use shein.com data to build competitive products and smarter operations.

Competitive Pricing & Discount Intelligence

Fashion retailers and D2C brands monitor Shein's aggressive discount cadence, flash sale windows, and price floors to benchmark their own positioning.

Trend Forecasting & Assortment Planning

Buyers and merchandisers track new arrival velocity, rising category ranks, and wish-count growth to identify emerging micro-trends weeks before mass adoption.

Market Research & Category Analysis

Analysts map category saturation, average price points, and discount depth across thousands of sub-categories to identify whitespace and investment opportunities.

AI Training Data

ML teams use Shein datasets to train fashion recommendation engines, visual similarity models, and NLP classifiers on apparel descriptions.

Supply Chain & Sourcing Intelligence

Sourcing teams correlate material descriptions, pricing, and review velocity to benchmark supplier costs and identify fast-moving product attributes.

Investor & Analyst Due Diligence

PE firms and analysts track category leaders, new arrival frequency, and review growth curves to evaluate fast-fashion platform dynamics.

Technical Spec

Shein scraper — technical capabilities

Everything supported by our shein.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

JavaScript rendering

Full Playwright sessions — required for product pages, flash sales, and dynamic pricing widgets

Supported

CAPTCHA bypass

Automated 2Captcha + CapSolver integration with fallback to manual queue

Supported

Residential proxy rotation

ISP-grade residential IPs from US / UK / AU / DE pools — rotated per request

Supported

Multi-market support

shein.com, .co.uk, .de, .com.au, .com.mx and 20+ regional storefronts

Supported

Variant/size mapping

All colour and size combinations per goods ID with stock level per variant

Supported

Flash sale tracking

Sale price, countdown timer, and stock-depletion rate captured per run

Supported

Review pagination

Full review corpus including all star-filter pages, fit feedback, and self-reported reviewer height and weight

Supported

New arrivals feed

Daily new-arrival ingestion per category — timestamped for trend velocity analysis

Supported

Trend & rank tracking

Trending rank, bestseller position, and wish-count captured per crawl with time-series history

Supported

Change detection (diffs)

Hash-based diff: only emit records with changed fields since last run

Supported

Webhook delivery

HTTP POST per record or batch — useful for real-time pricing and trend alerting workflows

Supported

Authenticated user data

Order history, personal wishlists, and account-gated data require user credentials

Partial
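Batch webhook delivery, one of the rows above, can be sketched with `urllib.request` from the standard library. The endpoint URL, the `{"records": [...]}` payload shape, and the batch size are all assumptions. The function only builds the POST requests; each one would then be sent with `urllib.request.urlopen(req)`.

```python
import json
import urllib.request
from typing import Iterator


def batches(records: list[dict], size: int) -> Iterator[list[dict]]:
    """Split records into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be positive")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def build_webhook_requests(records: list[dict], url: str, batch_size: int = 100) -> list[urllib.request.Request]:
    """One JSON POST request per batch (hypothetical payload shape: {"records": [...]})."""
    reqs = []
    for chunk in batches(records, batch_size):
        body = json.dumps({"records": chunk}).encode("utf-8")
        reqs.append(urllib.request.Request(
            url, data=body, method="POST", headers={"Content-Type": "application/json"}))
    return reqs
```

Setting `batch_size=1` gives per-record delivery. Larger batches trade a little latency for far fewer HTTP round trips.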

Infrastructure

Infrastructure powering the Shein pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration, deduplication, and retry logic. Playwright handles JavaScript rendering, cookie sessions, and scroll interactions. The two are combined via the scrapy-playwright download handler.

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies across US/UK/AU/DE regions. Rotation happens per-request with sticky sessions where required. IP score monitoring prevents blacklisted pool contamination.

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.
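For reference, scrapy-playwright is enabled through Scrapy project settings. Below is a minimal `settings.py` sketch. The download-handler path and the asyncio reactor are what scrapy-playwright's documentation requires. The browser type, launch options, and retry count are illustrative values, not DataFlirt's actual configuration.

```python
# settings.py: minimal scrapy-playwright setup (illustrative values)

# Route http/https downloads through scrapy-playwright's handler
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires Twisted's asyncio-based reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Browser used for rendered requests
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}

# Retries stay in Scrapy's hands
RETRY_ENABLED = True
RETRY_TIMES = 3
```

A spider then opts a request into browser rendering with `scrapy.Request(url, meta={"playwright": True})`. Requests without that key go through Scrapy's regular downloader, so cheap static fetches do not pay the cost of a browser session.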

// faq

Common questions.

About shein.com scraping, legality, and pipeline operations.

Ask us directly →

Is scraping Shein legal?

Scraping publicly available information from Shein is generally permissible under applicable law in India, the US, and the UK; in the US, this position is consistent with the hiQ v. LinkedIn ruling and similar precedents. DataFlirt targets only public, non-authenticated product, pricing, and review data. We do not extract personal data or circumvent authentication walls. We recommend clients review Shein's ToS independently and consult legal counsel for specific use cases.

How do you handle Shein's anti-bot systems?

We use residential ISP proxies that appear as real consumer traffic, full Playwright browser sessions with realistic fingerprints, and request timing modelled on human behaviour. Our selectors have multi-layer fallback chains so DOM changes don't break the pipeline. We monitor for block-rate spikes in real time and trigger pool rotation or solver queues automatically.

Which Shein markets do you support?

We support shein.com, shein.co.uk, shein.de, shein.com.au, shein.com.mx, shein.com.br, shein.fr, shein.it, shein.es, shein.com.sg, and 15+ additional regional storefronts — all from a unified schema with market-normalised pricing.

How fresh is the data — what latency can I expect?

Latency depends on your agreed cadence. Real-time streaming pipelines achieve sub-60-minute latency for price and flash-sale signals on a defined SKU set. Full catalogue refreshes at daily cadence complete within a 6–12 hour window. New-arrival feeds can be ingested within hours of Shein publishing them.

Can you track trend rankings and new arrivals over time?

Yes. Every pipeline run produces timestamped snapshots. We maintain a time-series table per goods ID for price, trending rank, review count, and wish count. New arrival timestamps allow you to calculate trend velocity from day of listing.
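Trend velocity from day of listing can be sketched as average wish-count gain per day. This minimal version assumes `wish_count` is zero when the item is listed (`date_added`) and that the snapshots for one goods ID arrive as (date, wish_count) pairs. The function name and this definition of velocity are illustrative.

```python
from datetime import date


def wish_velocity(snapshots: list[tuple[date, int]], date_added: date) -> float:
    """Average wish-count gain per day since listing, from (snapshot_date, wish_count) pairs."""
    if not snapshots:
        raise ValueError("no snapshots for this goods_id")
    latest_date, latest_count = max(snapshots)  # tuples compare by date first
    # Treat a same-day snapshot as one day so the result stays finite
    days_listed = max((latest_date - date_added).days, 1)
    return latest_count / days_listed
```

Ranking items within a category by this value surfaces fast risers. Raw wish counts favour older listings.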

What's the minimum viable engagement?

Our smallest packages cover a defined SKU list or category set (typically 5,000–100,000 items) with weekly delivery. For larger catalogues, ongoing trend monitoring, or custom schema requirements, we price based on volume and delivery frequency. Contact us with your use case for a scoped quote.

Do you capture review images and fit feedback?

Yes — including reviewer-submitted images, size and colour purchased, fit feedback labels (true to size, runs small, runs large), and self-reported height and weight where provided. This makes Shein review data particularly valuable for fashion-fit modelling.
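For fit modelling, the simplest aggregate is the share of each fit label per product. This sketch assumes the snake_case labels `runs_small` and `runs_large`, by analogy with the sample's `true_to_size`. Reviews without fit feedback are skipped. `fit_distribution` is a hypothetical name.

```python
from collections import Counter


def fit_distribution(reviews: list[dict]) -> dict[str, float]:
    """Share of reviews per fit label, ignoring reviews with no fit feedback."""
    labels = [r["fit_feedback"] for r in reviews if r.get("fit_feedback")]
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in sorted(counts)}
```

Grouping reviews by `size_purchased` before calling it shows which sizes drive a "runs small" signal.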

Can I request a sample dataset before committing?

Absolutely. We provide a sample run of up to 1,000 SKUs or 20 category pages as part of the pre-engagement scoping process — so you can validate schema fit, field completeness, and data quality before signing any contract.

Shein data, at catalogue scale.


Tell us what to extract. We do the rest.

Data Extraction for Every Industry
