We extract apparel catalogues, size availability, pricing changes, and sustainability metadata from H&M. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Product Information objects from hm.com. All fields typed and schema-versioned.
{"article_code": "1023456001", "name": "Oversized Cotton T-shirt", "department": "Men", "category": "T-shirts", "fit": "Oversized", "composition": "Cotton 100%", "conscious_choice": true, "sustainability_label": "Recycled cotton 20%"}
| # | article_code | name | department | category | fit | composition |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Pricing & Inventory objects from hm.com. All fields typed and schema-versioned.
{"article_code": "1023456001", "colour_name": "Washed Black", "size_label": "L", "price": 1299.0, "original_price": 1499.0, "currency": "INR", "availability_status": "IN_STOCK", "low_stock_warning": false}
| # | article_code | colour_name | size_label | size_code | price | original_price |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Reviews & Fit Data objects from hm.com. All fields typed and schema-versioned.
{"review_id": "REV-982341", "article_code": "1023456001", "rating": 4.5, "review_title": "Great fit, slightly long", "fit_feedback": "True to size", "length_feedback": "Slightly long", "quality_feedback": "Excellent", "review_date": "2023-11-14"}
| # | review_id | article_code | rating | review_title | review_body | fit_feedback |
|---|---|---|---|---|---|---|
Our H&M scraper handles dynamic sizing grids, region-specific pricing, and nested article codes. We bypass API rate limits and geo-blocks to deliver clean apparel data.
Extract every colour and size variant linked to a parent product. We map the full SKU matrix so you see exact availability.
Track stock status at the individual size level (e.g., 'M - Out of Stock', 'L - Few Left') across specific regional storefronts.
Capture base price, promotional discounts, and H&M Member exclusive prices across different country domains (hm.com/en_in, hm.com/en_us).
Extract 'Conscious Choice' tags, material composition percentages, and recycling data directly from product descriptions.
Aggregate customer feedback on fit, length, and quality — crucial metrics for apparel returns analysis and competitor benchmarking.
Fast fashion inventory moves quickly. We run high-frequency diffs to catch out-of-stock events and price drops within hours.
Brief in. Clean data out.
Provide target categories, regions, or specific article codes. We design the extraction schema together.
We configure Scrapy / Playwright crawlers, proxy rotation, session management, and CAPTCHA handling for hm.com.
Schema validation, null-rate checks, price-outlier detection, and variant mapping verification before full launch.
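As a rough illustration of the null-rate and price-outlier checks named above, the sketch below uses only the Python standard library. The function names, the 3.5 cut-off, and the choice of a median-absolute-deviation score are assumptions made for this example; it is not a description of the production QA suite.

```python
from statistics import median


def null_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for rec in records if rec.get(field) is None) / total
        for field in fields
    }


def price_outliers(records: list[dict], field: str = "price",
                   threshold: float = 3.5) -> list[dict]:
    """Flag records whose price is far from the median.

    Uses the modified z-score (0.6745 * |x - median| / MAD), which is less
    sensitive to a few extreme values than a mean/standard-deviation test.
    """
    prices = [rec[field] for rec in records if rec.get(field) is not None]
    if not prices:
        return []
    med = median(prices)
    mad = median(abs(p - med) for p in prices)
    if mad == 0:
        # All prices (or most of them) are identical: nothing to score against.
        return []
    return [
        rec for rec in records
        if rec.get(field) is not None
        and 0.6745 * abs(rec[field] - med) / mad > threshold
    ]
```

A price captured in the wrong unit (for example 129900.0 among values near 1299.0) would be flagged by `price_outliers`, while an ordinary markdown would not.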
JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
Apparel scraping requires handling complex product matrices and dynamic APIs. Here's how we ensure reliable delivery.
H&M relies heavily on GraphQL endpoints for price and stock hydration. We distribute requests across large residential IP pools, managing request headers and query structures to avoid rate limits.
Stock isn't a single boolean — it varies by colour and size. Our scrapers iterate through the full article code matrix, capturing availability for every specific SKU combination.
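One way to picture the colour-by-size matrix is as a nested product record flattened into one row per SKU. In the sketch below the field names follow the Pricing & Inventory sample, but the nested input shape (`colours` containing `sizes`) and the `OUT_OF_STOCK` value are assumptions for illustration, not the actual hm.com payload.

```python
def flatten_variants(product: dict) -> list[dict]:
    """Expand a nested product into one row per colour/size combination."""
    rows = []
    for colour in product["colours"]:
        for size in colour["sizes"]:
            rows.append({
                "article_code": colour["article_code"],
                "colour_name": colour["colour_name"],
                "size_label": size["size_label"],
                "availability_status": size["availability_status"],
            })
    return rows


# Hypothetical input: one colour of one product, with two sizes.
example = {
    "colours": [
        {
            "article_code": "1023456001",
            "colour_name": "Washed Black",
            "sizes": [
                {"size_label": "M", "availability_status": "OUT_OF_STOCK"},
                {"size_label": "L", "availability_status": "IN_STOCK"},
            ],
        }
    ]
}

rows = flatten_variants(example)
```

Each resulting row can be queried on its own, so "which sizes of this colour are unavailable" becomes a simple filter rather than a walk through nested JSON.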
H&M redirects or blocks requests that do not match the target region's IP. We use country-specific residential proxies to ensure we see the correct local pricing, currency, and stock levels.
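Choosing a proxy country can start from the storefront URL itself. The helper below is a minimal sketch that reads the country from a locale segment such as `en_in`; the assumption that the locale is always the first path segment is inferred from the URLs quoted on this page, and the function name is hypothetical. Mapping the returned code to an actual proxy pool is left out.

```python
import re
from urllib.parse import urlsplit

# Locale segments look like "en_in" or "en_us": language, underscore, country.
_LOCALE = re.compile(r"^[a-z]{2}_([a-z]{2})$")


def proxy_country_for(url: str) -> str:
    """Return the upper-case country code implied by a storefront URL."""
    if "://" not in url:
        # urlsplit only recognises the host when a scheme is present.
        url = "https://" + url
    segments = [seg for seg in urlsplit(url).path.split("/") if seg]
    match = _LOCALE.match(segments[0].lower()) if segments else None
    if match is None:
        raise ValueError(f"no locale segment found in {url!r}")
    return match.group(1).upper()
```

For instance, `proxy_country_for("hm.com/en_in")` yields `"IN"`, which a scheduler could use to pick an exit node in the matching country.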
Products are added and removed daily. We maintain a hash index of the catalogue, running continuous diffs to emit only new arrivals, price changes, and stock-outs — saving compute and storage.
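The hash-index idea can be sketched in a few lines: fingerprint each record, compare against the fingerprints from the previous run, and emit only the differences. The function names and the use of SHA-256 over sorted-key JSON are assumptions for this example rather than a description of the production pipeline.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Stable fingerprint: sorted keys make the hash independent of field order."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_catalogue(previous: dict[str, str], current: list[dict],
                   key: str = "article_code"):
    """Compare the current crawl with the previous hash index.

    Returns (new, changed, removed, next_index), where the first three are
    lists of keys and next_index is the hash index to store for the next run.
    """
    next_index = {rec[key]: record_hash(rec) for rec in current}
    new = [k for k in next_index if k not in previous]
    changed = [k for k in next_index
               if k in previous and previous[k] != next_index[k]]
    removed = [k for k in previous if k not in next_index]
    return new, changed, removed, next_index
```

Only the keys in `new`, `changed`, and `removed` need to be written downstream; unchanged records cost one hash comparison and nothing more.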
Apparel metadata is often unstructured text. We parse composition strings (e.g., 'Cotton 80%, Polyester 20%') and fit descriptors into clean, queryable JSON fields.
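Parsing a composition string of the kind quoted above might look like the following. This is a minimal sketch that assumes the simple "material percentage" pattern with comma separators; real product text can be messier (garment parts, ranges, missing percentages), and the output field names `material` and `percent` are chosen for the example.

```python
import re

# One "<material> <number>%" pair, e.g. "Cotton 80%" or "Recycled cotton 20%".
_PART = re.compile(r"\s*([^\d,%]+?)\s+(\d+(?:\.\d+)?)\s*%\s*")


def parse_composition(text: str) -> list[dict]:
    """Split a composition string into material/percentage records."""
    parts = []
    for chunk in text.split(","):
        match = _PART.fullmatch(chunk)
        if match is None:
            # Fail loudly so unparsed strings surface in QA instead of
            # silently producing partial data.
            raise ValueError(f"unrecognised composition fragment: {chunk!r}")
        parts.append({
            "material": match.group(1).strip(),
            "percent": float(match.group(2)),
        })
    return parts
```

A useful follow-up check is that the percentages for a single garment part sum to 100, which catches truncated or mis-split strings.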
Apparel retailers track H&M's pricing tiers, promotional cadences, and markdown strategies to optimise their own pricing models.
Merchandising teams monitor new arrivals, category depth, and colour trends to understand fast fashion assortment strategies.
Analysts track size-level stock-outs to identify supply chain bottlenecks or high-demand product categories.
ESG researchers and brands extract 'Conscious Choice' data and material compositions to track industry shifts toward sustainable materials.
Machine learning teams use product imagery, fit descriptions, and category metadata to train fashion recommendation engines.
Consultancies track review volume and sentiment across regions to gauge brand performance and product quality perception.
"H&M's catalogue moves at the speed of fast fashion — tracking size-level stock and regional pricing requires high-frequency, distributed extraction."
Apparel scraping fails when pipelines cannot handle complex article-code matrices or dynamic stock APIs. DataFlirt manages the residential proxies, JavaScript rendering, and schema normalisation so your team receives clean, structured retail data without the maintenance overhead.
Everything supported by our hm.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Scrapy handles crawl orchestration and deduplication. Playwright executes JavaScript to hydrate dynamic pricing and stock APIs.
We maintain localised residential proxy pools to ensure accurate regional pricing and prevent geo-blocking or rate limiting.
Pipelines run on AWS Lambda and ECS, orchestrated by Airflow. This allows us to scale up for high-frequency inventory diffs.
Data delivered to where your team already works — no new tooling required.
About hm.com scraping, legality, and pipeline operations.
Scraping publicly available product, pricing, and review data is generally permissible. DataFlirt does not bypass authentication walls, scrape personal user data, or violate GDPR. Clients should consult legal counsel regarding their specific use of the extracted data.
H&M uses geo-IP detection to route users to local storefronts. We use country-specific residential proxies (e.g., UK IPs for hm.com/en_gb) to ensure we capture the correct local currency, pricing, and stock availability.
Yes. Our scrapers map the full article code matrix, capturing the exact stock status (in stock, low stock, out of stock) for every specific size and colour combination.
We can configure daily full-catalogue refreshes, or higher-frequency intra-day runs on specific high-velocity categories to catch stock-outs as they happen.
Yes. We parse the product description and metadata to extract material percentages (e.g., 'Cotton 100%') and capture any 'Conscious Choice' or recycling labels.
Engagements typically start with a defined set of categories or a specific regional storefront. We price based on data volume, extraction frequency, and schema complexity.
Absolutely. We provide a sample run of up to 500 products as part of the scoping process, allowing you to validate the schema, variant mapping, and data quality before committing.
20-minute scoping call. Pilot dataset within the week. Production within two weeks. Whether you need a one-off catalogue dump or continuous size-level stock monitoring, we scope, build, and operate the pipeline. Tell us what you need.