
ThredUp inventory,
normalised at scale.

We extract single-SKU listings, condition grades, pricing deltas, and brand metrics from ThredUp. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Listings extracted: 3.8M /day
Inventory updates: 840K /24h
Brand records: 42K /run
Active pipelines: 42
Uptime: 99.98%
Data Dictionary

Every field we extract from thredup.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Inventory Listings objects from thredup.com. All fields typed and schema-versioned.

Fields: item_id, brand, category, sub_category, size, condition_grade, price, estimated_retail_price, discount_pct, materials, measurements, colour, pattern, image_urls, time_in_inventory, url
Sample record (inventory_listings, partial):
{
  "item_id": "148920193",
  "brand": "Madewell",
  "category": "Dresses",
  "condition_grade": "Excellent",
  "price": 34.99,
  "estimated_retail_price": 118.0,
  "discount_pct": 70,
  "colour": "Navy Blue"
}

Complete list of extractable fields for Brand Directory objects from thredup.com. All fields typed and schema-versioned.

Fields: brand_name, brand_slug, designer_flag, premium_status, active_listings_count, average_resale_price, average_discount_pct, top_categories, brand_description, url
Sample record (brand_directory, partial):
{
  "brand_name": "Reformation",
  "brand_slug": "reformation",
  "designer_flag": false,
  "premium_status": true,
  "active_listings_count": 4821,
  "average_resale_price": 85.5,
  "average_discount_pct": 62,
  "top_categories": ["Dresses", "Tops"]
}

Complete list of extractable fields for Pricing & Conditions objects from thredup.com. All fields typed and schema-versioned.

Fields: item_id, current_price, original_thredup_price, estimated_retail, condition_grade, flaw_description, clearance_status, final_sale_flag, days_on_site, price_drop_history, currency
Sample record (Pricing & Conditions, partial):
{
  "item_id": "148920193",
  "current_price": 34.99,
  "original_thredup_price": 42.99,
  "condition_grade": "Very Good",
  "flaw_description": "Minor pilling on fabric.",
  "clearance_status": true,
  "final_sale_flag": true,
  "days_on_site": 45
}

Capabilities

Extract the secondary market — structurally

ThredUp's architecture relies on infinite scrolling, single-SKU inventory, and rapid turnover. Our scraper handles dynamic filtering, image extraction, and out-of-stock detection without missing a listing.

Single-SKU Extraction

Capture unique item IDs, measurements, fabric composition, and precise condition grades for one-of-a-kind inventory.

Retail vs Resale Pricing

Extract ThredUp's listed price against the estimated retail value, calculating exact discount percentages across categories.
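As a rough illustration of the discount calculation, here is a minimal sketch. It assumes the discount is the saving relative to estimated retail, rounded to a whole percent; the rounding rule and the function name `discount_pct` are assumptions for illustration, not a description of the production pipeline.

```python
def discount_pct(price: float, estimated_retail_price: float) -> int:
    """Saving versus estimated retail, as a whole-number percentage."""
    if estimated_retail_price <= 0:
        raise ValueError("estimated_retail_price must be positive")
    return round((1 - price / estimated_retail_price) * 100)


# Values taken from the sample inventory_listings record.
print(discount_pct(34.99, 118.0))
```

With the sample record's price of 34.99 against an estimated retail price of 118.0, this reproduces the record's discount_pct of 70.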

Condition & Flaw Mapping

Parse structured condition grades (New with Tags, Excellent, Very Good, Good) alongside specific flaw descriptions.

Brand Catalogue Velocity

Track total active listings, average price points, and category distribution for over 40,000 brands on the platform.

Time-in-Inventory Tracking

Monitor how long specific items sit on the platform before selling, providing real sell-through velocity metrics.

High-Resolution Image Pipelines

Extract front, back, and detail image URLs for computer vision training or catalogue matching.

Clearance & Markdown Detection

Identify items pushed to clearance, tracking price drops and final-sale flags over time.

// engagement pipeline

From brand list to warehouse record

Brief in. Clean data out.

Define Scope (day 0)

Provide target brands, categories, or specific filter parameters. We design the extraction schema together.

Pipeline Build (days 2–4)

We configure Scrapy / Playwright crawlers, handle infinite scroll pagination, and bypass bot protections.

Validation & QA (days 4–6)

Schema validation, null-rate checks, and single-SKU deduplication before full launch.

Delivery (ongoing)

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
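The null-rate check and single-SKU deduplication from the Validation & QA step could look roughly like the sketch below. The choice of required fields and the "keep the latest record per item_id" rule are assumptions made for illustration.

```python
# Illustrative subset of fields to check; the real list is agreed during scoping.
REQUIRED_FIELDS = ("item_id", "brand", "price", "condition_grade")


def null_rates(records, fields=REQUIRED_FIELDS):
    """Return, per field, the share of records where it is missing or None."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) is None) / total
        for field in fields
    }


def dedupe_by_item_id(records):
    """Keep one record per item_id; a later record replaces an earlier one."""
    latest = {}
    for record in records:
        latest[record["item_id"]] = record
    return list(latest.values())
```

A run could be held back from delivery when any field's null rate exceeds an agreed threshold.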

Under the hood

How our ThredUp pipeline handles the hard parts

Scraping a single-SKU marketplace requires handling massive inventory churn and dynamic frontend rendering. Here is how we optimise the pipeline.

Pipeline monitor (thredup.com, live, active)

Identity rotation: TLS fingerprint randomised; user-agent rotated; IP pool residential; challenges blocked: 0
Page coverage: 48,291 pages queued, running
Pipeline health: 99.9% uptime; 142ms p99 latency; 0.3% null rate; 2 alerts

Infinite scroll handling
API interception over DOM parsing

ThredUp relies heavily on infinite scrolling for category pages. Rather than depending on fragile scroll-and-parse DOM automation, we intercept the underlying GraphQL/REST API payloads and extract structured JSON directly, which makes ingestion faster and more reliable.
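A minimal sketch of response interception with Playwright's sync API follows. The endpoint hint `/api/` and the `items` payload key are placeholders, not ThredUp's actual API; the real paths and payload shape would have to be read from the browser's network tab.

```python
# Placeholders: the real endpoint path and payload key are assumptions here.
API_PATH_HINT = "/api/"
ITEMS_KEY = "items"


def items_from_payload(payload) -> list[dict]:
    """Pull listing dicts out of one intercepted JSON payload."""
    if not isinstance(payload, dict):
        return []
    items = payload.get(ITEMS_KEY, [])
    return [i for i in items if isinstance(i, dict) and "item_id" in i]


def capture_listings(category_url: str, max_scrolls: int = 5) -> list[dict]:
    """Scroll a category page and collect listings from matching API responses."""
    # Imported lazily so the parsing helper above works without Playwright.
    from playwright.sync_api import sync_playwright

    listings: list[dict] = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(category_url)
        for _ in range(max_scrolls):
            # Each scroll should trigger one more page of results from the API;
            # expect_response raises a timeout error if none arrives.
            with page.expect_response(lambda r: API_PATH_HINT in r.url) as info:
                page.mouse.wheel(0, 4000)
            listings.extend(items_from_payload(info.value.json()))
        browser.close()
    return listings
```

The browser is only used to trigger the requests; all parsing happens on the JSON payloads, so markup changes on the page do not affect extraction.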

Single-SKU churn
High-frequency delta updates

Unlike traditional retail, every ThredUp item is unique. When an item sells, it disappears. We use hash-based state tracking to emit 'sold' or 'removed' events, ensuring your database accurately reflects live inventory without full catalogue re-crawls.
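Hash-based state tracking can be sketched as follows, assuming the previous run's state is a mapping from item_id to a fingerprint. The tracked fields and event names are illustrative. Telling 'sold' apart from 'removed' needs a signal from the page or API, so this sketch only emits 'removed'.

```python
import hashlib
import json

# Assumption: only changes to these fields count as an update.
TRACKED_FIELDS = ("price", "condition_grade")


def fingerprint(item: dict) -> str:
    """Stable hash of the tracked fields of one listing."""
    payload = json.dumps({k: item.get(k) for k in TRACKED_FIELDS}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_runs(previous: dict, current_items: list):
    """Compare the last run's {item_id: fingerprint} state with this run's items.

    Returns (events, new_state), where events is a list of (event, item_id).
    """
    new_state = {item["item_id"]: fingerprint(item) for item in current_items}
    events = []
    for item_id, fp in new_state.items():
        if item_id not in previous:
            events.append(("listed", item_id))
        elif previous[item_id] != fp:
            events.append(("updated", item_id))
    for item_id in previous:
        if item_id not in new_state:
            events.append(("removed", item_id))
    return events, new_state
```

Only the fingerprints need to be kept between runs, so the stored state stays small even for millions of items.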

Bot mitigation
Residential proxies + TLS fingerprinting

ThredUp uses commercial bot protection to block datacenter IPs. We route requests through US-based residential proxy pools, rotating TLS fingerprints and HTTP/2 headers to match legitimate consumer traffic patterns.

Dynamic filtering
URL state parameter parsing

Category pages use complex URL parameters for sizing, condition, and brand filtering. We programmatically generate these filter permutations to bypass 10,000-item pagination limits and extract deep sub-category inventory.
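Generating filter permutations can be sketched with `itertools.product`. The base URL and parameter names below are hypothetical placeholders; only the 10,000-item cap comes from the text above.

```python
from itertools import product
from urllib.parse import urlencode

PAGINATION_CAP = 10_000  # pagination limit mentioned above


def filter_urls(base_url: str, filters: dict):
    """Yield one URL per combination of filter values."""
    keys = sorted(filters)
    for combo in product(*(filters[k] for k in keys)):
        yield f"{base_url}?{urlencode(dict(zip(keys, combo)))}"


def needs_split(result_count: int) -> bool:
    """True when a filtered slice is still too large and needs another filter."""
    return result_count >= PAGINATION_CAP


# Hypothetical parameter names and values, for illustration only.
for url in filter_urls(
    "https://example.com/women/dresses",
    {"size": ["S", "M"], "condition": ["excellent"]},
):
    print(url)
```

In practice a crawler would start with coarse filters and add another dimension (for example price brackets) only to slices where `needs_split` returns true, which keeps the number of requests down.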

Data normalisation
Standardised measurement and condition schemas

ThredUp's measurements and flaw descriptions can be unstructured. Our pipeline normalises text fields (e.g., 'Length: 34 in' to structured JSON objects) and standardises condition grades for immediate database ingestion.
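A minimal normalisation sketch, assuming measurements appear as 'Name: number unit' with units of in or cm. The output shape and the 'nwt' alias are assumptions for illustration; the four grade names are those listed under Condition & Flaw Mapping.

```python
import re

# Matches fragments such as "Length: 34 in" or "Chest: 18.5 cm".
MEASUREMENT_RE = re.compile(
    r"(?P<name>[A-Za-z ]+?):\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>in|cm)\b",
    re.IGNORECASE,
)

GRADE_ALIASES = {
    "new with tags": "New with Tags",
    "nwt": "New with Tags",  # assumed abbreviation
    "excellent": "Excellent",
    "very good": "Very Good",
    "good": "Good",
}


def parse_measurements(text: str) -> dict:
    """Turn free-text measurements into {name: {"value": float, "unit": str}}."""
    return {
        m["name"].strip().lower().replace(" ", "_"): {
            "value": float(m["value"]),
            "unit": m["unit"].lower(),
        }
        for m in MEASUREMENT_RE.finditer(text)
    }


def normalise_grade(raw: str):
    """Map a raw grade string to a canonical grade, or None if unrecognised."""
    return GRADE_ALIASES.get(raw.strip().lower())
```

Unrecognised grades return None rather than a guess, so they surface in null-rate checks instead of silently polluting the column.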

Applications

Who uses ThredUp data — and how

Teams across industries use thredup.com data to build competitive products and smarter operations.

01
Secondary Market Pricing

Retailers and competing resale platforms ingest estimated retail vs resale price deltas to optimise their own pricing algorithms.

02
Brand Valuation & Resale Metrics

Fashion brands monitor their secondary market volume, average resale value, and condition degradation to inform primary market strategy.

03
Computer Vision Training

ML teams extract millions of garment images paired with structured category, brand, and condition labels to train visual recognition models.

04
Arbitrage & Sourcing

Professional resellers track specific designer brands for heavily discounted, high-condition items to source inventory for boutique resale.

05
Sustainability Reporting

Analysts track the volume of secondhand items circulated per brand to calculate lifecycle extension and environmental impact metrics.

06
Trend & Velocity Forecasting

Hedge funds and retail analysts measure sell-through rates and time-in-inventory across categories to predict macro fashion trends.

Why DataFlirt

"ThredUp is the largest real-time index of clothing depreciation and secondary market velocity — a critical dataset for modern retail intelligence."

Tracking single-SKU inventory at this scale is structurally different from standard eCommerce scraping. You must account for rapid item deletion, complex condition mapping, and infinite API pagination. DataFlirt manages this stateful extraction so you receive clean, diff-based updates rather than messy, incomplete HTML dumps.

Technical Spec

ThredUp scraper — technical capabilities

Everything supported by our thredup.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

Capability | Description | Status
API interception | Direct extraction from frontend API endpoints for speed and stability | Supported
Single-SKU lifecycle | Track items from listed to sold/removed with stateful diffing | Supported
Residential proxy rotation | US-based ISP proxies to bypass geographic and bot restrictions | Supported
Condition normalisation | Structured mapping of textual flaw descriptions and grades | Supported
Image extraction | Capture high-resolution CDN URLs for front, back, and tags | Supported
Brand directory scraping | Extract aggregate metrics across all 40,000+ listed brands | Supported
Clearance tracking | Identify price drops and final-sale status changes | Supported
Clean Out bag tracking | Processing status and payout estimates for individual seller kits | Partial
User purchase history | Historical orders and saved favorites tied to specific user accounts | Partial

Infrastructure

Infrastructure powering the ThredUp pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration and deduplication. Playwright intercepts API calls and handles complex JavaScript rendering. Combined via scrapy-playwright middleware.
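For reference, wiring Scrapy to Playwright through scrapy-playwright is mostly a settings change. The fragment below shows the standard settings from that package; the browser type and launch options are example values, not the configuration used in production.

```python
# settings.py fragment for a Scrapy project using scrapy-playwright

# Route HTTP and HTTPS downloads through the Playwright download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Example values; adjust per deployment.
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}
```

Individual requests then opt in to browser rendering by setting `meta={"playwright": True}`; requests without that flag still go through Scrapy's regular downloader.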

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies. Rotation happens per-request with sticky sessions where required. IP score monitoring prevents blacklisted pool contamination.
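Per-request rotation with optional sticky sessions and score-based exclusion might be structured like this. The `ProxyPool` class, the score scale, and the 0.5 threshold are hypothetical choices for illustration.

```python
import itertools


class ProxyPool:
    """Round-robin proxy rotation with sticky sessions and health scores."""

    def __init__(self, proxies, min_score: float = 0.5):
        self._scores = {proxy: 1.0 for proxy in proxies}
        self._min_score = min_score
        self._sticky = {}  # session_id -> pinned proxy
        self._cycle = itertools.cycle(list(proxies))

    def report(self, proxy: str, score: float) -> None:
        """Record a new health score for a proxy (lower means less trusted)."""
        self._scores[proxy] = score

    def _healthy(self, proxy: str) -> bool:
        return self._scores[proxy] >= self._min_score

    def get(self, session_id=None) -> str:
        """Return the pinned proxy for a session, else the next healthy one."""
        if session_id is not None:
            pinned = self._sticky.get(session_id)
            if pinned is not None and self._healthy(pinned):
                return pinned
        # Try each proxy at most once per call.
        for _ in range(len(self._scores)):
            proxy = next(self._cycle)
            if self._healthy(proxy):
                if session_id is not None:
                    self._sticky[session_id] = proxy
                return proxy
        raise RuntimeError("no healthy proxies left in the pool")
```

Calling `get()` without a session rotates on every request, while `get("some-session")` keeps returning the same proxy until its score drops below the threshold.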

Cloud-Native Orchestration

Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: Newline-delimited or nested, schema versioned per run
CSV: Flat file with typed columns, Excel/Sheets compatible
Parquet: Columnar format for BigQuery, Snowflake, Athena
S3: Direct bucket delivery, compatible with any data lake
Webhook: HTTP POST per record for real-time downstream processing
Snowflake: Stage + COPY INTO workflow, incremental or full-replace
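
For the newline-delimited JSON option, a writer can be as small as the sketch below. Stamping each row with a `schema_version` field is an assumption about how per-run versioning might be carried, not a documented part of the delivery format.

```python
import io
import json


def write_ndjson(records, stream, schema_version: str) -> int:
    """Write one JSON object per line and return the number of rows written."""
    count = 0
    for record in records:
        # Assumed convention: every row carries the run's schema version.
        row = {"schema_version": schema_version, **record}
        stream.write(json.dumps(row, ensure_ascii=False) + "\n")
        count += 1
    return count


buffer = io.StringIO()
write_ndjson([{"item_id": "148920193", "price": 34.99}], buffer, "1.0")
print(buffer.getvalue(), end="")
```

One object per line lets downstream loaders read the file in a streaming fashion without parsing the whole document first.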
// faq

Common questions.

About thredup.com scraping, legality, and pipeline operations.

Is scraping ThredUp legal?

Scraping publicly available inventory and pricing data is generally permissible. DataFlirt targets only public listings and brand directories. We do not extract personal user data, bypass authentication walls to access private Clean Out bags, or violate GDPR/CCPA. Clients should consult legal counsel for specific use cases.

How do you handle items selling out during a crawl?

ThredUp inventory is highly volatile. We use stateful diffing backed by Redis. If an item ID present in the previous run returns a 404 or an 'unavailable' state, we flag it as sold/removed rather than throwing an error, giving you accurate sell-through metrics.

Can you extract high-resolution images?

Yes. We extract the direct CDN URLs for all available images per listing, including front, back, and detail shots. We can deliver the URLs in the data payload or download the images directly to your S3 bucket.

How do you bypass ThredUp's pagination limits?

ThredUp limits deep pagination on large categories. We programmatically generate granular URL filter permutations (combining size, brand, colour, and price brackets) to break large categories into sub-10,000 item chunks, ensuring 100% catalogue coverage.

What is the delivery frequency?

Frequency depends on your needs. For broad category monitoring, daily or weekly runs are standard. For arbitrage or specific high-value designer tracking, we can configure sub-hourly streaming pipelines.

What is the minimum viable engagement?

Our smallest packages track specific brand sets or categories with weekly delivery. For full-site extraction (millions of SKUs), pricing depends on compute volume and delivery frequency. Contact us for a scoped quote.

$ dataflirt scope --new-project --source=thredup.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a daily snapshot of specific designer brands or a continuous feed of the entire secondary market catalogue — we build and operate the infrastructure. Tell us what you need.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h