
H&M data,
at warehouse scale.

We extract apparel catalogues, size availability, pricing changes, and sustainability metadata from H&M. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Products extracted
185K /day
Price updates
420K /24h
Inventory checks
1.2M /run
Active pipelines
64
Uptime
99.94%
Data Dictionary

Every field we extract from hm.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Product Information objects from hm.com. All fields typed and schema-versioned.

article_code, name, department, category, fit, composition, materials, sustainability_label, conscious_choice, care_instructions, description, image_urls
product_information
● 200 OK
{
  "article_code": "1023456001",
  "name": "Oversized Cotton T-shirt",
  "department": "Men",
  "category": "T-shirts",
  "fit": "Oversized",
  "composition": "Cotton 100%",
  "conscious_choice": true,
  "sustainability_label": "Recycled cotton 20%"
}

Complete list of extractable fields for Pricing & Inventory objects from hm.com. All fields typed and schema-versioned.

article_code, colour_name, size_label, size_code, price, original_price, currency, availability_status, low_stock_warning, member_price, region, scraped_at
pricing_and_inventory
● 200 OK
{
  "article_code": "1023456001",
  "colour_name": "Washed Black",
  "size_label": "L",
  "price": 1299.0,
  "original_price": 1499.0,
  "currency": "INR",
  "availability_status": "IN_STOCK",
  "low_stock_warning": false
}

Complete list of extractable fields for Reviews & Fit Data objects from hm.com. All fields typed and schema-versioned.

review_id, article_code, rating, review_title, review_body, fit_feedback, length_feedback, quality_feedback, reviewer_nickname, review_date, country
reviews_and_fit_data
● 200 OK
{
  "review_id": "REV-982341",
  "article_code": "1023456001",
  "rating": 4.5,
  "review_title": "Great fit, slightly long",
  "fit_feedback": "True to size",
  "length_feedback": "Slightly long",
  "quality_feedback": "Excellent",
  "review_date": "2023-11-14"
}

Capabilities

Extract the complete H&M catalogue — down to the SKU

Our H&M scraper handles dynamic sizing grids, region-specific pricing, and nested article codes. We bypass API rate limits and geo-blocks to deliver clean apparel data.

Article Code Mapping

Extract every colour and size variant linked to a parent product. We map the full SKU matrix so you see exact availability.

Size-Level Inventory

Track stock status at the individual size level (e.g., 'M - Out of Stock', 'L - Few Left') across specific regional storefronts.

Multi-Region Pricing

Capture base price, promotional discounts, and H&M Member exclusive prices across different country domains (hm.com/en_in, hm.com/en_us).

Sustainability Metadata

Extract 'Conscious Choice' tags, material composition percentages, and recycling data directly from product descriptions.

Fit & Review Mining

Aggregate customer feedback on fit, length, and quality — crucial metrics for apparel returns analysis and competitor benchmarking.

High-Frequency Diffs

Fast fashion inventory moves quickly. We run high-frequency diffs to catch out-of-stock events and price drops within hours.

// engagement pipeline

From category URL to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide target categories, regions, or specific article codes. We design the extraction schema together.

Pipeline Build
Days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, session management, and CAPTCHA handling for hm.com.

Validation & QA
Days 4–6

Schema validation, null-rate checks, price-outlier detection, and variant mapping verification before full launch.
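
As a rough illustration of the null-rate and price-outlier checks named above, consider the sketch below. It is not DataFlirt's production QA code: the function names, the median-based outlier rule, the `max_ratio` threshold, and the sample article codes are all assumptions made for the example. Only the field names come from the Pricing & Inventory dictionary.

```python
from statistics import median

def null_rate(records, field):
    """Fraction of records where `field` is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)

def price_outliers(records, max_ratio=5.0):
    """Records priced more than max_ratio times above or below the median.

    The median rule and the default ratio are assumptions for this sketch.
    """
    prices = [r["price"] for r in records if r.get("price") is not None]
    if not prices:
        return []
    mid = median(prices)
    return [
        r for r in records
        if r.get("price") is not None
        and (r["price"] > mid * max_ratio or r["price"] < mid / max_ratio)
    ]

# Hypothetical sample: one likely currency-scale error and one missing price.
sample = [
    {"article_code": "1023456001", "price": 1299.0, "currency": "INR"},
    {"article_code": "1023456002", "price": 1499.0, "currency": "INR"},
    {"article_code": "1023456003", "price": 12.99, "currency": "INR"},
    {"article_code": "1023456004", "price": None, "currency": "INR"},
]

price_null_rate = null_rate(sample, "price")
flagged = [r["article_code"] for r in price_outliers(sample)]
```

A run could be held back from launch if `price_null_rate` exceeds an agreed threshold or if `flagged` is non-empty and unreviewed.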

Delivery
ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our H&M pipeline handles the hard parts

Apparel scraping requires handling complex product matrices and dynamic APIs. Here's how we ensure reliable delivery.

pipeline-monitor · hm.com · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued (running)
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2

API rate limiting
Distributed GraphQL querying

H&M relies heavily on GraphQL endpoints for price and stock hydration. We distribute requests across large residential IP pools, managing request headers and query structures to avoid rate limits.

Dynamic inventory
Size-level stock resolution

Stock isn't a single boolean — it varies by colour and size. Our scrapers iterate through the full article code matrix, capturing availability for every specific SKU combination.
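
To make the "full article code matrix" idea concrete, here is a minimal sketch that flattens a nested product (colour, then size) into one row per SKU. The nested input shape, the function name, and the `OUT_OF_STOCK` and `LOW_STOCK` status values are assumptions for illustration; only `IN_STOCK` and the field names appear in the data dictionary above.

```python
def flatten_sku_matrix(product):
    """Return one row per colour/size combination of a parent product."""
    rows = []
    for colour in product["colours"]:
        for size in colour["sizes"]:
            rows.append({
                "article_code": colour["article_code"],
                "colour_name": colour["colour_name"],
                "size_label": size["size_label"],
                "availability_status": size["availability_status"],
            })
    return rows

# Hypothetical nested input; real payloads will differ.
product = {
    "name": "Oversized Cotton T-shirt",
    "colours": [
        {
            "article_code": "1023456001",
            "colour_name": "Washed Black",
            "sizes": [
                {"size_label": "M", "availability_status": "OUT_OF_STOCK"},
                {"size_label": "L", "availability_status": "IN_STOCK"},
            ],
        },
        {
            "article_code": "1023456002",
            "colour_name": "White",
            "sizes": [
                {"size_label": "M", "availability_status": "LOW_STOCK"},
            ],
        },
    ],
}

rows = flatten_sku_matrix(product)
out_of_stock = [
    (r["colour_name"], r["size_label"])
    for r in rows
    if r["availability_status"] == "OUT_OF_STOCK"
]
```

Keeping one row per SKU means a stock-out in a single size shows up as its own record instead of being hidden behind a product-level flag.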

Geo-blocking
Localised residential proxies

H&M redirects or blocks requests that do not match the target region's IP. We use country-specific residential proxies to ensure we see the correct local pricing, currency, and stock levels.

Fast fashion churn
High-frequency change detection

Products are added and removed daily. We maintain a hash index of the catalogue, running continuous diffs to emit only new arrivals, price changes, and stock-outs — saving compute and storage.
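
The hash-index approach can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the actual pipeline: the choice of SHA-256, the tracked fields, the (article_code, size_label) key, and the function names are all hypothetical.

```python
import hashlib
import json

# Fields whose changes should trigger an emit (an assumed subset of the schema).
TRACKED_FIELDS = ("price", "original_price", "availability_status")

def record_key(record):
    """Identify a SKU by article code plus size label."""
    return (record["article_code"], record["size_label"])

def record_hash(record):
    """Stable digest of the tracked fields only."""
    payload = {f: record.get(f) for f in TRACKED_FIELDS}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def diff_snapshot(hash_index, records):
    """Return (new, changed, removed_keys) and update hash_index in place."""
    new, changed, seen = [], [], set()
    for record in records:
        key = record_key(record)
        digest = record_hash(record)
        seen.add(key)
        if key not in hash_index:
            new.append(record)
        elif hash_index[key] != digest:
            changed.append(record)
        hash_index[key] = digest
    removed = [k for k in hash_index if k not in seen]
    for k in removed:
        del hash_index[k]
    return new, changed, removed

# Two consecutive runs with illustrative values.
index = {}
run_1 = [
    {"article_code": "1023456001", "size_label": "L", "price": 1299.0,
     "original_price": 1499.0, "availability_status": "IN_STOCK"},
    {"article_code": "1023456001", "size_label": "M", "price": 1299.0,
     "original_price": 1499.0, "availability_status": "IN_STOCK"},
]
run_2 = [
    {"article_code": "1023456001", "size_label": "L", "price": 999.0,
     "original_price": 1499.0, "availability_status": "IN_STOCK"},
]
new_1, changed_1, removed_1 = diff_snapshot(index, run_1)
new_2, changed_2, removed_2 = diff_snapshot(index, run_2)
```

Hashing only the tracked fields means cosmetic changes, such as a new `scraped_at` timestamp, do not produce spurious diffs.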

Data normalisation
Structured fit and material data

Apparel metadata is often unstructured text. We parse composition strings (e.g., 'Cotton 80%, Polyester 20%') and fit descriptors into clean, queryable JSON fields.
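
A composition string such as 'Cotton 80%, Polyester 20%' could be parsed along these lines. This is a minimal sketch that assumes the material name always precedes its percentage, as in the examples on this page; the function name and output keys are hypothetical, and real product text will need more cases.

```python
import re

# A material name followed by a percentage, e.g. "Cotton 80%".
_COMPONENT = re.compile(r"([A-Za-z][A-Za-z \-]*?)\s*(\d+(?:\.\d+)?)\s*%")

def parse_composition(text):
    """Split a composition string into material/percent pairs."""
    return [
        {"material": name.strip(), "percent": float(pct)}
        for name, pct in _COMPONENT.findall(text)
    ]

def total_percent(components):
    """Sum of parsed percentages, useful as a sanity check."""
    return sum(c["percent"] for c in components)

blend = parse_composition("Cotton 80%, Polyester 20%")
label = parse_composition("Recycled cotton 20%")
```

A total well below 100 for a full composition string is a hint that part of the text did not match the pattern and needs review.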

Applications

Who uses H&M data — and how

Teams across industries use hm.com data to build competitive products and smarter operations.

01
Competitor Price Intelligence

Apparel retailers track H&M's pricing tiers, promotional cadences, and markdown strategies to optimise their own pricing models.

02
Trend & Assortment Analysis

Merchandising teams monitor new arrivals, category depth, and colour trends to understand fast fashion assortment strategies.

03
Inventory Gap Analysis

Analysts track size-level stock-outs to identify supply chain bottlenecks or high-demand product categories.

04
Sustainability Benchmarking

ESG researchers and brands extract 'Conscious Choice' data and material compositions to track industry shifts toward sustainable materials.

05
AI Styling & Recommendation

Machine learning teams use product imagery, fit descriptions, and category metadata to train fashion recommendation engines.

06
Market Research

Consultancies track review volume and sentiment across regions to gauge brand performance and product quality perception.

Why DataFlirt

"H&M's catalogue moves at the speed of fast fashion — tracking size-level stock and regional pricing requires high-frequency, distributed extraction."

Apparel scraping fails when pipelines cannot handle complex article-code matrices or dynamic stock APIs. DataFlirt manages the residential proxies, JavaScript rendering, and schema normalisation so your team receives clean, structured retail data without the maintenance overhead.

Technical Spec

H&M scraper — technical capabilities

Everything supported by our hm.com scraper, from rendered SPA elements to rate-limit evasion, with login-gated data marked Partial.

JavaScript rendering · Full Playwright sessions, required for dynamic stock and price hydration · Supported
Proxy rotation · Country-specific residential IPs to bypass geo-redirects · Supported
Article-code variation mapping · Links parent products to all colour and size SKUs · Supported
Multi-region support · Target specific domains (hm.com/en_gb, hm.com/en_in, etc.) · Supported
Fit & size feedback · Extracts aggregated fit metrics (true to size, length) from reviews · Supported
Conscious Choice metadata · Parses material composition percentages and sustainability tags · Supported
Store stock availability · Checks local physical store inventory via postal code · Supported
H&M Member points balance · Requires authenticated user session · Partial
User purchase history · Protected personal data behind login wall · Partial

Infrastructure

Infrastructure powering the H&M pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Stack

Scrapy handles crawl orchestration and deduplication. Playwright executes JavaScript to hydrate dynamic pricing and stock APIs.

Residential Proxy Infrastructure

We maintain localised residential proxy pools to ensure accurate regional pricing and prevent geo-blocking or rate limiting.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS, orchestrated by Airflow. This allows us to scale up for high-frequency inventory diffs.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Newline-delimited or nested — ideal for complex SKU matrices
CSV
Flat file with typed columns — ready for Excel or BI tools
Parquet
Columnar format for BigQuery, Snowflake, Athena
S3
Direct bucket delivery — compatible with any data lake
Webhook
HTTP POST per record for real-time stock alerts
Snowflake
Stage + COPY INTO workflow — incremental or full-replace
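
To show the difference between the newline-delimited JSON and flat CSV outputs, here is a small sketch using only the Python standard library. The helper names and the chosen column order are assumptions for illustration; the sample record reuses values from the Pricing & Inventory example above.

```python
import csv
import io
import json

def to_ndjson(records):
    """Serialise records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)

def to_csv(records, columns):
    """Serialise records as a flat CSV with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",   # drop fields not listed in `columns`
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

records = [
    {"article_code": "1023456001", "size_label": "L",
     "price": 1299.0, "currency": "INR"},
]

ndjson_text = to_ndjson(records)
csv_text = to_csv(records, ["article_code", "size_label", "price", "currency"])
```

Newline-delimited JSON keeps nested structures intact per record, while the CSV form forces an explicit column list, which is why it suits spreadsheets and BI tools.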
// faq

Common questions.

About hm.com scraping, legality, and pipeline operations.

Is scraping H&M legal?

Scraping publicly available product, pricing, and review data is generally permissible. DataFlirt does not bypass authentication walls, scrape personal user data, or violate GDPR. Clients should consult legal counsel regarding their specific use of the extracted data.

How do you handle H&M's region-specific pricing?

H&M uses geo-IP detection to route users to local storefronts. We use country-specific residential proxies (e.g., UK IPs for hm.com/en_gb) to ensure we capture the correct local currency, pricing, and stock availability.

Can you track inventory at the size level?

Yes. Our scrapers map the full article code matrix, capturing the exact stock status (in stock, low stock, out of stock) for every specific size and colour combination.

How frequently can you refresh the catalogue?

We can configure pipelines for daily full-catalogue refreshes, or for high-frequency intra-day runs on specific high-velocity categories to catch stock-outs within hours.

Do you extract material composition and sustainability data?

Yes. We parse the product description and metadata to extract material percentages (e.g., 'Cotton 100%') and capture any 'Conscious Choice' or recycling labels.

What is the minimum viable engagement?

Engagements typically start with a defined set of categories or a specific regional storefront. We price based on data volume, extraction frequency, and schema complexity.

Can I request a sample dataset?

Absolutely. We provide a sample run of up to 500 products as part of the scoping process, allowing you to validate the schema, variant mapping, and data quality before committing.

$ dataflirt scope --new-project --source=hm.com ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off catalogue dump or continuous size-level stock monitoring, we scope, build, and operate the pipeline.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h