
ThredUp inventory,
normalised at scale.

We extract single-SKU listings, condition grades, pricing deltas, and brand metrics from ThredUp and deliver them as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Get data from thredup.com → See how it works

Listings extracted: 3.8M/day

Inventory updates: 840K/24h

Brand records: 42K/run

Active pipelines: 42

Uptime: 99.98%

◆ ThredUp Inventory Data ◆ Single-SKU Tracking ◆ Condition Grading ◆ Estimated Retail Pricing ◆ Brand Resale Metrics ◆ Category Velocity ◆ Flaw Descriptions ◆ Material Composition ◆ Exact Measurements ◆ Time-in-Inventory ◆ Clearance Status ◆ Managed Pipeline ◆ S3 / BigQuery Delivery ◆ Bengaluru HQ ◆ Enterprise SLA

Data Dictionary

Every field we extract from thredup.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Inventory Listings objects from thredup.com. All fields typed and schema-versioned.

item_id · brand · category · sub_category · size · condition_grade · price · estimated_retail_price · discount_pct · materials · measurements · colour · pattern · image_urls · time_in_inventory · url

```json
{
  "item_id": "148920193",
  "brand": "Madewell",
  "category": "Dresses",
  "condition_grade": "Excellent",
  "price": 34.99,
  "estimated_retail_price": 118.0,
  "discount_pct": 70,
  "colour": "Navy Blue"
}
```


Complete list of extractable fields for Brand Directory objects from thredup.com. All fields typed and schema-versioned.

brand_name · brand_slug · designer_flag · premium_status · active_listings_count · average_resale_price · average_discount_pct · top_categories · brand_description · url

```json
{
  "brand_name": "Reformation",
  "brand_slug": "reformation",
  "designer_flag": false,
  "premium_status": true,
  "active_listings_count": 4821,
  "average_resale_price": 85.5,
  "average_discount_pct": 62,
  "top_categories": ["Dresses", "Tops"]
}
```


Complete list of extractable fields for Pricing & Conditions objects from thredup.com. All fields typed and schema-versioned.

item_id · current_price · original_thredup_price · estimated_retail · condition_grade · flaw_description · clearance_status · final_sale_flag · days_on_site · price_drop_history · currency

```json
{
  "item_id": "148920193",
  "current_price": 34.99,
  "original_thredup_price": 42.99,
  "condition_grade": "Very Good",
  "flaw_description": "Minor pilling on fabric.",
  "clearance_status": true,
  "final_sale_flag": true,
  "days_on_site": 45
}
```


Capabilities

Extract the secondary market — structurally

ThredUp's storefront relies on infinite scrolling, and its inventory is single-SKU with rapid turnover. Our scraper handles dynamic filtering, image extraction, and out-of-stock detection without missing a listing.

Single-SKU Extraction

Capture unique item IDs, measurements, fabric composition, and precise condition grades for one-of-a-kind inventory.

Retail vs Resale Pricing

Extract ThredUp's listed price alongside the estimated retail value and calculate exact discount percentages across categories.

Condition & Flaw Mapping

Parse structured condition grades (New with Tags, Excellent, Very Good, Good) alongside specific flaw descriptions.

Brand Catalogue Velocity

Track total active listings, average price points, and category distribution for over 40,000 brands on the platform.
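Brand-level figures can be rolled up from listing records. A sketch assuming each listing carries the `brand`, `category`, `price` and `discount_pct` fields from the data dictionary; here `average_discount_pct` is a simple mean of per-item discounts, which is one possible definition rather than a confirmed one:

```python
from collections import Counter, defaultdict
from statistics import mean


def brand_metrics(listings: list[dict], top_n: int = 2) -> dict[str, dict]:
    """Aggregate active listings into per-brand directory metrics."""
    by_brand: dict[str, list[dict]] = defaultdict(list)
    for item in listings:
        by_brand[item["brand"]].append(item)

    out: dict[str, dict] = {}
    for brand, items in by_brand.items():
        categories = Counter(i["category"] for i in items)
        out[brand] = {
            "active_listings_count": len(items),
            "average_resale_price": round(mean(i["price"] for i in items), 2),
            "average_discount_pct": round(mean(i["discount_pct"] for i in items)),
            # Most frequent categories first
            "top_categories": [c for c, _ in categories.most_common(top_n)],
        }
    return out
```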

Time-in-Inventory Tracking

Monitor how long specific items sit on the platform before selling, providing real sell-through velocity metrics.
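Sell-through velocity falls out of two dates per item. A sketch assuming each record stores `first_seen` and `sold_on` (hypothetical field names; `sold_on` is the first crawl on which the item was found gone, so its precision is bounded by crawl cadence). The sample dates are illustrative:

```python
from collections import defaultdict
from datetime import date
from statistics import median


def sell_through(records: list[dict]) -> dict[str, dict]:
    """Per-category sell-through rate and median days on site for sold items.

    Each record is assumed to look like
    {"category": str, "first_seen": date, "sold_on": date | None},
    with sold_on set to None while the item is still listed.
    """
    by_category: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        by_category[rec["category"]].append(rec)

    out: dict[str, dict] = {}
    for category, recs in by_category.items():
        sold = [r for r in recs if r["sold_on"] is not None]
        out[category] = {
            "sell_through_rate": len(sold) / len(recs),
            "median_days_to_sell": (
                median((r["sold_on"] - r["first_seen"]).days for r in sold) if sold else None
            ),
        }
    return out


records = [
    {"category": "Dresses", "first_seen": date(2024, 1, 1), "sold_on": date(2024, 1, 11)},
    {"category": "Dresses", "first_seen": date(2024, 1, 1), "sold_on": date(2024, 1, 31)},
    {"category": "Dresses", "first_seen": date(2024, 1, 5), "sold_on": date(2024, 1, 25)},
    {"category": "Dresses", "first_seen": date(2024, 1, 5), "sold_on": None},
    {"category": "Tops", "first_seen": date(2024, 1, 1), "sold_on": None},
]
metrics = sell_through(records)
# Dresses: 3 of 4 sold, median 20 days; Tops: nothing sold yet
```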

High-Resolution Image Pipelines

Extract front, back, and detail image URLs for computer vision training or catalogue matching.

Clearance & Markdown Detection

Identify items pushed to clearance, tracking price drops and final-sale flags over time.
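Markdown detection can be sketched as a comparison of two crawl snapshots keyed by `item_id`, using the `current_price`, `clearance_status` and `final_sale_flag` fields from the Pricing & Conditions dictionary. The event names are illustrative, not a published schema:

```python
def markdown_events(previous: dict[str, dict], current: dict[str, dict]) -> list[dict]:
    """Emit price-drop, clearance and final-sale transitions between two snapshots."""
    events: list[dict] = []
    for item_id, now in current.items():
        before = previous.get(item_id)
        if before is None:
            continue  # newly listed item: nothing to compare against yet
        if now["current_price"] < before["current_price"]:
            events.append({
                "item_id": item_id,
                "event": "price_drop",
                "from": before["current_price"],
                "to": now["current_price"],
            })
        if now["clearance_status"] and not before["clearance_status"]:
            events.append({"item_id": item_id, "event": "entered_clearance"})
        if now["final_sale_flag"] and not before["final_sale_flag"]:
            events.append({"item_id": item_id, "event": "became_final_sale"})
    return events
```

Appending each `price_drop` event to the item's stored history is one way to build the `price_drop_history` field over time.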

// engagement pipeline

From brand list to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide target brands, categories, or specific filter parameters. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy / Playwright crawlers, handle infinite scroll pagination, and bypass bot protections.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and single-SKU deduplication before full launch.

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
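For the Validation & QA step, a minimal sketch of the three checks it names: deduplication on `item_id`, type checks on required fields, and a null-rate threshold. The required-field set and the 5% threshold are assumptions for illustration:

```python
# Required fields and accepted types: an illustrative subset, not the full schema.
REQUIRED_FIELDS = {
    "item_id": (str,),
    "brand": (str,),
    "condition_grade": (str,),
    "price": (int, float),
}


def validate(records: list[dict], required=REQUIRED_FIELDS, max_null_rate: float = 0.05):
    """Deduplicate on item_id, then report null rates and type errors per field."""
    errors: list[str] = []

    # Single-SKU items must appear once per run; a later duplicate replaces an earlier one.
    unique: dict[str, dict] = {}
    for rec in records:
        if rec.get("item_id") is None:
            errors.append("record without item_id dropped")
            continue
        unique[rec["item_id"]] = rec
    rows = list(unique.values())

    null_rates: dict[str, float] = {}
    for field, types in required.items():
        nulls = sum(1 for r in rows if r.get(field) is None)
        null_rates[field] = nulls / len(rows) if rows else 0.0
        if null_rates[field] > max_null_rate:
            errors.append(f"{field}: null rate {null_rates[field]:.1%} exceeds {max_null_rate:.1%}")
        wrong = sum(1 for r in rows if r.get(field) is not None and not isinstance(r[field], types))
        if wrong:
            errors.append(f"{field}: {wrong} value(s) of unexpected type")
    return rows, null_rates, errors
```

A run would be held back from delivery while `errors` is non-empty.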

Under the hood

How our ThredUp pipeline handles the hard parts

Scraping a single-SKU marketplace requires handling massive inventory churn and dynamic frontend rendering. Here is how we optimise the pipeline.

// fingerprinting

Identity rotation

TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination

Page coverage: 48,291 pages queued (running)

// observability

Pipeline health

Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%

Infinite scroll handling

API interception over DOM parsing

ThredUp relies heavily on infinite scrolling for category pages. Instead of scraping the rendered DOM after scripted scrolling, we intercept the underlying GraphQL/REST API payloads and extract structured JSON directly, which makes ingestion faster and more reliable.
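A sketch of response interception with Playwright's Python async API (`page.on("response", ...)`). The browser only triggers the infinite-scroll requests; records are read from the JSON responses rather than the DOM. The payload shape `{"data": {"items": [...]}}`, the field names inside it, and the scroll timings are hypothetical and would need to be confirmed against real network traffic:

```python
def extract_items(payload: dict) -> list[dict]:
    """Pull listing records out of an intercepted JSON payload (hypothetical shape)."""
    items = payload.get("data", {}).get("items", [])
    return [
        {"item_id": str(i["id"]), "brand": i.get("brand"), "price": i.get("price")}
        for i in items
        if "id" in i
    ]


async def capture_listing_payloads(url: str, scrolls: int = 5) -> list[dict]:
    """Load a category page and collect items from JSON responses while scrolling."""
    # Imported lazily so extract_items stays usable without Playwright installed.
    from playwright.async_api import async_playwright

    collected: list[dict] = []

    async def on_response(response):
        # Only JSON responses are parsed; HTML, images and scripts are skipped.
        if "application/json" in response.headers.get("content-type", ""):
            try:
                collected.extend(extract_items(await response.json()))
            except Exception:
                pass  # body unavailable or not a listing payload

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        page.on("response", on_response)
        await page.goto(url)
        for _ in range(scrolls):
            await page.mouse.wheel(0, 5000)    # scroll to trigger the next page fetch
            await page.wait_for_timeout(1500)  # allow the API call to complete
        await browser.close()
    return collected


# Usage (requires `pip install playwright` and `playwright install chromium`):
#   asyncio.run(capture_listing_payloads(CATEGORY_URL))  # CATEGORY_URL is a placeholder
```

Keeping the parsing in a plain function means it can be unit-tested against saved payloads without launching a browser.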

Single-SKU churn

High-frequency delta updates

Unlike traditional retail, every ThredUp item is unique. When an item sells, it disappears. We use hash-based state tracking to emit 'sold' or 'removed' events, ensuring your database accurately reflects live inventory without full catalogue re-crawls.
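A sketch of hash-based state tracking: each record is hashed over a canonical JSON serialisation and compared with the hash stored from the previous run, so only changes are emitted downstream. The event names are illustrative, and a plain dict stands in for the persistent state store. One caveat: an item should only be reported as sold or removed if the current crawl fully covered the scope (for example, the category) in which it was previously seen.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Stable content hash: sorted keys mean field order never changes the digest."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(previous: dict[str, str], current_records: list[dict]):
    """Compare stored item_id -> hash state with the latest crawl of the same scope.

    Returns (events, new_state); new_state should be persisted for the next run.
    """
    state: dict[str, str] = {}
    events: list[dict] = []
    for rec in current_records:
        item_id = rec["item_id"]
        digest = record_hash(rec)
        state[item_id] = digest
        if item_id not in previous:
            events.append({"item_id": item_id, "event": "listed"})
        elif previous[item_id] != digest:
            events.append({"item_id": item_id, "event": "updated"})
    # Seen last run but absent now: the unique item has sold or been withdrawn.
    for item_id in sorted(previous.keys() - state.keys()):
        events.append({"item_id": item_id, "event": "sold_or_removed"})
    return events, state
```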

Bot mitigation

Residential proxies + TLS fingerprinting

ThredUp uses commercial bot protection to block datacenter IPs. We route requests through US-based residential proxy pools, rotating TLS fingerprints and HTTP/2 headers to match legitimate consumer traffic patterns.

Dynamic filtering

URL state parameter parsing

Category pages use complex URL parameters for size, condition, and brand filtering. We programmatically generate filter permutations that keep each query under the 10,000-item pagination limit, which lets us reach deep sub-category inventory.
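The splitting can be sketched as recursive bisection on a price filter. `count_results` is a hypothetical callback that runs one filtered query and returns its reported total; the half-open price interval and ThredUp's actual filter parameter names are assumptions:

```python
from collections.abc import Callable

PAGINATION_CAP = 10_000  # per-query item limit described above


def partition_price_range(
    count_results: Callable[[float, float], int],
    low: float,
    high: float,
    min_width: float = 1.0,
) -> list[tuple[float, float]]:
    """Split [low, high) into price brackets that each return fewer items than the cap."""
    total = count_results(low, high)
    if total == 0:
        return []
    if total < PAGINATION_CAP or high - low <= min_width:
        # Small enough to paginate fully, or too narrow to split by price alone;
        # such brackets would be split further on another facet (size, brand, colour).
        return [(low, high)]
    mid = (low + high) / 2
    return (partition_price_range(count_results, low, mid, min_width)
            + partition_price_range(count_results, mid, high, min_width))
```

Bisection keeps the number of count queries roughly logarithmic in the price range, instead of probing every possible bracket.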

Data normalisation

Standardised measurement and condition schemas

ThredUp's measurements and flaw descriptions can be unstructured. Our pipeline normalises these text fields (e.g., converting 'Length: 34 in' into a structured JSON object) and standardises condition grades for immediate database ingestion.

Applications

Who uses ThredUp data — and how

Teams across industries use thredup.com data to build competitive products and smarter operations.

Secondary Market Pricing

Retailers and competing resale platforms ingest estimated retail vs resale price deltas to optimise their own pricing algorithms.

Brand Valuation & Resale Metrics

Fashion brands monitor their secondary market volume, average resale value, and condition degradation to inform primary market strategy.

Computer Vision Training

ML teams extract millions of garment images paired with structured category, brand, and condition labels to train visual recognition models.

Arbitrage & Sourcing

Professional resellers track specific designer brands for heavily discounted, high-condition items to source inventory for boutique resale.

Sustainability Reporting

Analysts track the volume of secondhand items circulated per brand to calculate lifecycle extension and environmental impact metrics.

Trend & Velocity Forecasting

Hedge funds and retail analysts measure sell-through rates and time-in-inventory across categories to predict macro fashion trends.

Why DataFlirt

"ThredUp is the largest real-time index of clothing depreciation and secondary market velocity — a critical dataset for modern retail intelligence."

Tracking single-SKU inventory at this scale is structurally different from standard eCommerce scraping. You must account for rapid item deletion, complex condition mapping, and infinite-scroll API pagination. DataFlirt manages this stateful extraction so you receive clean, diff-based updates rather than messy, incomplete HTML dumps.

Technical Spec

ThredUp scraper — technical capabilities

Everything supported by our thredup.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

API interception

Direct extraction from frontend API endpoints for speed and stability

Supported

Single-SKU lifecycle

Track items from listed to sold/removed with stateful diffing

Supported

Residential proxy rotation

US-based ISP proxies to bypass geographic and bot restrictions

Supported

Condition normalisation

Structured mapping of textual flaw descriptions and grades

Supported

Image extraction

Capture high-resolution CDN URLs for front, back, and tag images

Supported

Brand directory scraping

Extract aggregate metrics across all 40,000+ listed brands

Supported

Clearance tracking

Identify price drops and final-sale status changes

Supported

Clean Out bag tracking

Processing status and payout estimates for individual seller kits

Partial

User purchase history

Historical orders and saved favourites tied to specific user accounts

Partial

Infrastructure

Infrastructure powering the ThredUp pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration and deduplication. Playwright intercepts API calls and handles complex JavaScript rendering. The two are combined via the scrapy-playwright download handler.
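For reference, enabling scrapy-playwright takes a few project settings. The setting names below come from the scrapy-playwright documentation; the browser type and launch options are illustrative values, not DataFlirt's production configuration:

```python
# settings.py (sketch): route selected requests through Playwright via scrapy-playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Illustrative browser choices
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}

# Only requests that opt in are rendered in the browser, e.g.
#   yield scrapy.Request(url, meta={"playwright": True})
# All other requests use Scrapy's regular HTTP downloader.
```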

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies. Rotation happens per request, with sticky sessions where required. IP score monitoring keeps blacklisted addresses out of the pool.

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested — schema versioned per run

CSV

Flat file with typed columns — Excel/Sheets compatible

Parquet

Columnar format for BigQuery, Snowflake, Athena

S3

Direct bucket delivery, compatible with any data lake

Webhook

HTTP POST per record for real-time downstream processing

Snowflake

Stage + COPY INTO workflow — incremental or full-replace
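For the newline-delimited JSON option, a minimal writer sketch. Tagging each record with a `schema_version` field is one way to version the schema per run (an assumption; a separate manifest file would also work), and the version string is hypothetical:

```python
import io
import json

SCHEMA_VERSION = "1.0.0"  # hypothetical version string for this run


def write_ndjson(records, stream, schema_version: str = SCHEMA_VERSION) -> int:
    """Write one JSON object per line, tagging each with the run's schema version."""
    count = 0
    for rec in records:
        line = json.dumps({**rec, "schema_version": schema_version}, ensure_ascii=False)
        stream.write(line + "\n")
        count += 1
    return count


buf = io.StringIO()
n = write_ndjson([{"item_id": "148920193", "price": 34.99}], buf)
print(buf.getvalue(), end="")
# {"item_id": "148920193", "price": 34.99, "schema_version": "1.0.0"}
```

Because each line is a complete JSON document, files can be appended to, split, or streamed without rewriting a surrounding array.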

// faq

Common questions.

About thredup.com scraping, legality, and pipeline operations.

Ask us directly →

Is scraping ThredUp legal?

Scraping publicly available inventory and pricing data is generally permissible. DataFlirt targets only public listings and brand directories. We do not extract personal user data, bypass authentication walls to access private Clean Out bags, or violate GDPR/CCPA. Clients should consult legal counsel for specific use cases.

How do you handle items selling out during a crawl?

ThredUp inventory is highly volatile. We use stateful diffing backed by Redis. If an item ID present in the previous run returns a 404 or an 'unavailable' state, we flag it as sold/removed rather than throwing an error, giving you accurate sell-through metrics.

Can you extract high-resolution images?

Yes. We extract the direct CDN URLs for all available images per listing, including front, back, and detail shots. We can deliver the URLs in the data payload or download the images directly to your S3 bucket.

How do you bypass ThredUp's pagination limits?

ThredUp limits deep pagination on large categories. We programmatically generate granular URL filter permutations (combining size, brand, colour, and price brackets) to break large categories into chunks of fewer than 10,000 items, ensuring 100% catalogue coverage.

What is the delivery frequency?

Frequency depends on your needs. For broad category monitoring, daily or weekly runs are standard. For arbitrage or specific high-value designer tracking, we can configure sub-hourly streaming pipelines.

What is the minimum viable engagement?

Our smallest packages cover specific brand sets or categories with weekly delivery. For full-site extraction (millions of SKUs), we price based on compute volume and delivery frequency. Contact us for a scoped quote.

Tell us what
to extract.
We do the rest.

Data Extraction for Every Industry
