We extract single-SKU listings, condition grades, pricing deltas, and brand metrics from ThredUp. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Inventory Listings objects from thredup.com. All fields typed and schema-versioned.
{"item_id": "148920193", "brand": "Madewell", "category": "Dresses", "condition_grade": "Excellent", "price": 34.99, "estimated_retail_price": 118.0, "discount_pct": 70, "colour": "Navy Blue"}
| # | item_id | brand | category | sub_category | size | condition_grade |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Brand Directory objects from thredup.com. All fields typed and schema-versioned.
{"brand_name": "Reformation", "brand_slug": "reformation", "designer_flag": false, "premium_status": true, "active_listings_count": 4821, "average_resale_price": 85.5, "average_discount_pct": 62, "top_categories": ["Dresses", "Tops"]}
| # | brand_name | brand_slug | designer_flag | premium_status | active_listings_count | average_resale_price |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Pricing & Conditions objects from thredup.com. All fields typed and schema-versioned.
{"item_id": "148920193", "current_price": 34.99, "original_thredup_price": 42.99, "condition_grade": "Very Good", "flaw_description": "Minor pilling on fabric.", "clearance_status": true, "final_sale_flag": true, "days_on_site": 45}
| # | item_id | current_price | original_thredup_price | estimated_retail | condition_grade | flaw_description |
|---|---|---|---|---|---|---|
ThredUp's storefront is built around infinite scrolling, single-SKU inventory, and rapid turnover. Our scraper handles dynamic filtering, image extraction, and out-of-stock detection without missing a listing.
Capture unique item IDs, measurements, fabric composition, and precise condition grades for one-of-a-kind inventory.
Extract ThredUp's listed price against the estimated retail value, calculating exact discount percentages across categories.
Parse structured condition grades (New with Tags, Excellent, Very Good, Good) alongside specific flaw descriptions.
Track total active listings, average price points, and category distribution for over 40,000 brands on the platform.
Monitor how long specific items sit on the platform before selling, providing real sell-through velocity metrics.
Extract front, back, and detail image URLs for computer vision training or catalogue matching.
Identify items pushed to clearance, tracking price drops and final-sale flags over time.
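As a rough illustration of the discount calculation described above, here is a minimal sketch. The function name and the half-up rounding to a whole percent are assumptions made for this example; the field names follow the sample record.

```python
from decimal import Decimal, ROUND_HALF_UP


def discount_pct(price: float, estimated_retail_price: float) -> int:
    """Whole-number percentage discount of the listed price against retail."""
    if estimated_retail_price <= 0:
        raise ValueError("estimated_retail_price must be positive")
    retail = Decimal(str(estimated_retail_price))
    ratio = (retail - Decimal(str(price))) / retail
    # Rounding mode is an assumption; the source does not state one.
    return int((ratio * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(discount_pct(34.99, 118.0))  # 70
```

Using `Decimal` avoids binary floating-point surprises when a ratio lands exactly on a rounding boundary.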
Brief in. Clean data out.
Provide target brands, categories, or specific filter parameters. We design the extraction schema together.
We configure Scrapy / Playwright crawlers, handle infinite scroll pagination, and bypass bot protections.
Schema validation, null-rate checks, and single-SKU deduplication before full launch.
JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
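The validation step above (null-rate checks and single-SKU deduplication) could look something like the following sketch. The required-field list and function names are illustrative assumptions, not the production schema.

```python
from typing import Iterable

# Assumed subset of required fields, taken from the sample listing record.
REQUIRED_FIELDS = ("item_id", "brand", "price", "condition_grade")


def null_rates(rows: list[dict], fields: Iterable[str] = REQUIRED_FIELDS) -> dict[str, float]:
    """Fraction of rows in which each field is missing or None."""
    fields = tuple(fields)
    total = len(rows)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) is None) / total
        for field in fields
    }


def dedupe_by_item_id(rows: list[dict]) -> list[dict]:
    """Keep the last record seen per item_id, in first-seen order."""
    latest: dict[str, dict] = {}
    for row in rows:
        latest[row["item_id"]] = row
    return list(latest.values())
```

A run can then be gated on a threshold, for example rejecting a batch when any required field has a null rate above an agreed limit.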
Scraping a single-SKU marketplace requires handling massive inventory churn and dynamic frontend rendering. Here is how we optimise the pipeline.
ThredUp relies heavily on infinite scrolling for category pages. Rather than driving a browser to scroll, which is slow and brittle, we intercept the underlying GraphQL/REST API responses and ingest the structured JSON directly for faster, more reliable extraction.
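To make the idea concrete, here is a minimal sketch of walking paginated JSON payloads instead of scrolling a page. The payload shape (`items`, `next_cursor`) and the injected `fetch_page` callable are hypothetical; the real API fields are not documented here.

```python
from typing import Callable, Iterator, Optional


def iter_listings(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield listing records page by page until no cursor is returned.

    Assumed payload shape: {"items": [...], "next_cursor": str | None}.
    """
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard stop guards against cursor loops
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:
            return
```

Injecting `fetch_page` keeps the pagination logic independent of the HTTP client, so it can be exercised against recorded payloads.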
Unlike traditional retail, every ThredUp item is unique. When an item sells, it disappears. We use hash-based state tracking to emit 'sold' or 'removed' events, ensuring your database accurately reflects live inventory without full catalogue re-crawls.
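A simplified sketch of hash-based state tracking follows. It assumes the previous run's state is a mapping of `item_id` to a content hash, and it emits a generic `removed` event; telling `sold` apart from `removed` would need an extra signal that this example does not model. All names are illustrative.

```python
import hashlib
import json


def fingerprint(item: dict) -> str:
    """Stable SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(item, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_snapshots(
    previous: dict[str, str], current_items: list[dict]
) -> tuple[list[dict], dict[str, str]]:
    """Compare the last run's hashes with the current crawl.

    Returns (events, new_state); new_state becomes `previous` next run.
    """
    current = {item["item_id"]: fingerprint(item) for item in current_items}
    events = []
    for item_id, digest in current.items():
        if item_id not in previous:
            events.append({"item_id": item_id, "event": "added"})
        elif previous[item_id] != digest:
            events.append({"item_id": item_id, "event": "changed"})
    for item_id in previous:
        if item_id not in current:
            events.append({"item_id": item_id, "event": "removed"})
    return events, current
```

Only the hashes need to persist between runs, which keeps the state store small even for large catalogues.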
ThredUp uses commercial bot protection to block datacenter IPs. We route requests through US-based residential proxy pools, rotating TLS fingerprints and HTTP/2 headers to match legitimate consumer traffic patterns.
Category pages use complex URL parameters for sizing, condition, and brand filtering. We programmatically generate these filter permutations to bypass 10,000-item pagination limits and extract deep sub-category inventory.
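One way to picture the filter-permutation approach is a recursive split: keep adding filter dimensions until each slice falls under the pagination cap. In this sketch, `count_items` is a hypothetical callable that returns the result count for a filter set, and the dimension names are placeholders.

```python
from typing import Callable

PAGINATION_CAP = 10_000  # limit quoted in the text above


def partition_filters(
    base: dict[str, str],
    dimensions: list[tuple[str, list[str]]],
    count_items: Callable[[dict[str, str]], int],
    cap: int = PAGINATION_CAP,
) -> list[dict[str, str]]:
    """Split a filter set until every slice holds at most `cap` items.

    If dimensions run out while a slice is still over the cap, the slice
    is returned as-is and the caller should flag it for manual handling.
    """
    if count_items(base) <= cap or not dimensions:
        return [base]
    (name, values), *rest = dimensions
    slices: list[dict[str, str]] = []
    for value in values:
        slices.extend(partition_filters({**base, name: value}, rest, count_items, cap))
    return slices
```

Splitting only where a slice exceeds the cap keeps the number of generated URLs far below the full cross-product of filters.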
ThredUp's measurements and flaw descriptions can be unstructured. Our pipeline normalises text fields (e.g., 'Length: 34 in' to structured JSON objects) and standardises condition grades for immediate database ingestion.
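As an illustration of the measurement normalisation mentioned above, a small regex-based sketch is shown here. The output keys (`name`, `value`, `unit`) and the supported units are assumptions for the example; real listings will need more patterns.

```python
import re

# Matches fragments such as "Length: 34 in" or "Chest: 18.5 cm".
MEASUREMENT_RE = re.compile(
    r"(?P<name>[A-Za-z ]+?)\s*:\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>in|cm)\b",
    re.IGNORECASE,
)


def parse_measurements(text: str) -> list[dict]:
    """Extract every 'Name: number unit' fragment as a structured record."""
    return [
        {
            "name": m.group("name").strip().lower().replace(" ", "_"),
            "value": float(m.group("value")),
            "unit": m.group("unit").lower(),
        }
        for m in MEASUREMENT_RE.finditer(text)
    ]


print(parse_measurements("Length: 34 in"))
# [{'name': 'length', 'value': 34.0, 'unit': 'in'}]
```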
Retailers and competing resale platforms ingest estimated retail vs resale price deltas to optimise their own pricing algorithms.
Fashion brands monitor their secondary market volume, average resale value, and condition degradation to inform primary market strategy.
ML teams extract millions of garment images paired with structured category, brand, and condition labels to train visual recognition models.
Professional resellers track specific designer brands for heavily discounted, high-condition items to source inventory for boutique resale.
Analysts track the volume of secondhand items circulated per brand to calculate lifecycle extension and environmental impact metrics.
Hedge funds and retail analysts measure sell-through rates and time-in-inventory across categories to predict macro fashion trends.
"ThredUp is the largest real-time index of clothing depreciation and secondary market velocity — a critical dataset for modern retail intelligence."
Tracking single-SKU inventory at this scale is structurally different from standard eCommerce scraping. You must account for rapid item deletion, complex condition mapping, and deep pagination behind infinite-scroll APIs. DataFlirt manages this stateful extraction so you receive clean, diff-based updates rather than messy, incomplete HTML dumps.
Everything supported by our thredup.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Scrapy handles crawl orchestration and deduplication. Playwright intercepts API calls and handles complex JavaScript rendering. The two are combined via the scrapy-playwright download handler.
We maintain pools of residential ISP proxies. Rotation happens per request, with sticky sessions where required. IP reputation monitoring keeps blacklisted addresses out of the pool.
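For orientation, a typical scrapy-playwright setup is enabled through a few entries in the Scrapy project's `settings.py`. The fragment below is a generic sketch of that configuration, not our production settings; the browser choice and launch options are assumptions.

```python
# settings.py (fragment)

# Route HTTP(S) downloads through Playwright's download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Assumed choices for this example.
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}
```

Individual requests then opt in to browser rendering by setting `meta={"playwright": True}`; requests without that flag go through Scrapy's regular downloader.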
Pipelines run on AWS Lambda (burst) and ECS (sustained). Airflow handles scheduling, dependency management, and SLA alerting. All state stored in managed Postgres.
Data delivered to where your team already works — no new tooling required.
About thredup.com scraping, legality, and pipeline operations.
Scraping publicly available inventory and pricing data is generally permissible. DataFlirt targets only public listings and brand directories. We do not extract personal user data, bypass authentication walls to access private Clean Out bags, or violate GDPR/CCPA. Clients should consult legal counsel for specific use cases.
ThredUp inventory is highly volatile. We use stateful diffing backed by Redis. If an item ID present in the previous run returns a 404 or an 'unavailable' state, we flag it as sold/removed rather than throwing an error, giving you accurate sell-through metrics.
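The re-check rule described above can be sketched as a small classifier. The `availability` field and the event labels are hypothetical; the point is that a 404 or an unavailable state is treated as a signal, while other failures are surfaced for retry rather than being counted as sales.

```python
from typing import Optional


def classify_recheck(status_code: int, payload: Optional[dict]) -> str:
    """Map a re-check of a previously seen item to an inventory state."""
    if status_code == 404:
        return "sold_or_removed"
    if status_code == 200 and payload is not None:
        # "availability" is an assumed field name for this example.
        state = payload.get("availability", "available")
        return "sold_or_removed" if state == "unavailable" else "active"
    # Timeouts, 5xx, or blocks must not be mistaken for a sale.
    raise RuntimeError(f"unexpected status {status_code}; retry before marking sold")
```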
Yes. We extract the direct CDN URLs for all available images per listing, including front, back, and detail shots. We can deliver the URLs in the data payload or download the images directly to your S3 bucket.
ThredUp limits deep pagination on large categories. We programmatically generate granular URL filter permutations (combining size, brand, colour, and price brackets) to break large categories into sub-10,000 item chunks, ensuring 100% catalogue coverage.
Frequency depends on your needs. For broad category monitoring, daily or weekly runs are standard. For arbitrage or specific high-value designer tracking, we can configure sub-hourly streaming pipelines.
Our smallest packages track specific brand sets or categories with weekly delivery. For full-site extraction (millions of SKUs), we price based on compute volume and delivery frequency. Contact us for a scoped quote.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a daily snapshot of specific designer brands or a continuous feed of the entire secondary market catalogue — we build and operate the infrastructure. Tell us what you need.