
RUN · 64 active pipelines · hm.com live

H&M data,
at warehouse scale.

We extract apparel catalogues, size availability, pricing changes, and sustainability metadata from H&M. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Get data from hm.com → See how it works

Products extracted

185K /day

Price updates

420K /24h

Inventory checks

1.2M /run

Active pipelines

64

Uptime

99.94%

◆ H&M Product Data ◆ Size-Level Inventory ◆ Price History Tracking ◆ Sustainability Materials ◆ Conscious Choice Tags ◆ Review & Fit Mining ◆ Category Mapping ◆ Multi-Region Pricing ◆ Stock Availability ◆ Article Code Variants ◆ Managed Pipeline ◆ S3 / BigQuery Delivery ◆ Bengaluru HQ ◆ Enterprise SLA

Data Dictionary

Every field we extract from hm.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Product Information objects from hm.com. All fields typed and schema-versioned.

article_code, name, department, category, fit, composition, materials, sustainability_label, conscious_choice, care_instructions, description, image_urls

"article_code": "1023456001",
"name": "Oversized Cotton T-shirt",
"department": "Men",
"category": "T-shirts",
"fit": "Oversized",
"composition": "Cotton 100%",
"conscious_choice": true,
"sustainability_label": "Recycled cotton 20%"


Complete list of extractable fields for Pricing & Inventory objects from hm.com. All fields typed and schema-versioned.

article_code, colour_name, size_label, size_code, price, original_price, currency, availability_status, low_stock_warning, member_price, region, scraped_at

"article_code": "1023456001",
"colour_name": "Washed Black",
"size_label": "L",
"price": 1299.0,
"original_price": 1499.0,
"currency": "INR",
"availability_status": "IN_STOCK",
"low_stock_warning": false


Complete list of extractable fields for Reviews & Fit Data objects from hm.com. All fields typed and schema-versioned.

review_id, article_code, rating, review_title, review_body, fit_feedback, length_feedback, quality_feedback, reviewer_nickname, review_date, country

"review_id": "REV-982341",
"article_code": "1023456001",
"rating": 4.5,
"review_title": "Great fit, slightly long",
"fit_feedback": "True to size",
"length_feedback": "Slightly long",
"quality_feedback": "Excellent",
"review_date": "2023-11-14"


Capabilities

Extract the complete H&M catalogue — down to the SKU

Our H&M scraper handles dynamic sizing grids, region-specific pricing, and nested article codes. We bypass API rate limits and geo-blocks to deliver clean apparel data.

Article Code Mapping

Extract every colour and size variant linked to a parent product. We map the full SKU matrix so you see exact availability.

Size-Level Inventory

Track stock status at the individual size level (e.g., 'M - Out of Stock', 'L - Few Left') across specific regional storefronts.

Multi-Region Pricing

Capture base price, promotional discounts, and H&M Member exclusive prices across different country domains (hm.com/en_in, hm.com/en_us).

Sustainability Metadata

Extract 'Conscious Choice' tags, material composition percentages, and recycling data directly from product descriptions.

Fit & Review Mining

Aggregate customer feedback on fit, length, and quality — crucial metrics for apparel returns analysis and competitor benchmarking.

High-Frequency Diffs

Fast fashion inventory moves quickly. We run high-frequency diffs to catch out-of-stock events and price drops within hours.

// engagement pipeline

From category URL to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide target categories, regions, or specific article codes. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, session management, and CAPTCHA handling for hm.com.

Validation & QA

Days 4–6

Schema validation, null-rate checks, price-outlier detection, and variant mapping verification before full launch.

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
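
The null-rate check and price-outlier detection named in the Validation & QA step could look roughly like the sketch below. It is only an illustration, not DataFlirt's actual QA code: the function names, the median-ratio rule, the `max_ratio` threshold, and every article code other than the one in the sample record are assumptions.

```python
from statistics import median


def null_rate(records, field):
    """Fraction of records where `field` is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def price_outliers(records, max_ratio=5.0):
    """Return records whose price is far from the batch median.

    A record is flagged when its price is more than `max_ratio` times
    the median, or less than the median divided by `max_ratio`.
    The rule and the default threshold are illustrative assumptions.
    """
    prices = [r["price"] for r in records if r.get("price") is not None]
    if not prices:
        return []
    mid = median(prices)
    return [
        r for r in records
        if r.get("price") is not None
        and (r["price"] > mid * max_ratio or r["price"] < mid / max_ratio)
    ]


# Hypothetical batch: the third price looks like a misplaced decimal point.
batch = [
    {"article_code": "1023456001", "price": 1299.0, "currency": "INR"},
    {"article_code": "1023456002", "price": 1499.0, "currency": "INR"},
    {"article_code": "1023456003", "price": 12.99, "currency": "INR"},
    {"article_code": "1023456004", "price": None, "currency": "INR"},
]

price_null_rate = null_rate(batch, "price")
flagged = [r["article_code"] for r in price_outliers(batch)]
print(price_null_rate, flagged)
```

A batch that exceeds an agreed null-rate ceiling, or that contains flagged prices, would be held back for review rather than delivered.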

Under the hood

How our H&M pipeline handles the hard parts

Apparel scraping requires handling complex product matrices and dynamic APIs. Here's how we ensure reliable delivery.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued running

// observability

Pipeline health

99.9% uptime

142ms p99 latency

0.3% null rate

alerts

API rate limiting

Distributed GraphQL querying

H&M relies heavily on GraphQL endpoints for price and stock hydration. We distribute requests across large residential IP pools, managing request headers and query structures to avoid rate limits.

Dynamic inventory

Size-level stock resolution

Stock isn't a single boolean — it varies by colour and size. Our scrapers iterate through the full article code matrix, capturing availability for every specific SKU combination.
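
As a rough sketch of what iterating the full matrix means, the snippet below expands a set of article codes and size labels into one row per SKU combination. The function name, the shape of `stock_lookup`, and the `OUT_OF_STOCK` and `UNKNOWN` status values are assumptions for illustration; only `IN_STOCK` and the field names come from the sample record.

```python
from itertools import product


def expand_sku_matrix(article_codes, sizes, stock_lookup):
    """Yield one row per (article_code, size_label) combination.

    `stock_lookup` maps (article_code, size_label) to a status string.
    Combinations missing from it are reported as "UNKNOWN" so that gaps
    in coverage stay visible instead of being silently dropped.
    """
    for code, size in product(article_codes, sizes):
        yield {
            "article_code": code,
            "size_label": size,
            "availability_status": stock_lookup.get((code, size), "UNKNOWN"),
        }


# Hypothetical stock observations for two colour variants of one product.
stock = {
    ("1023456001", "M"): "OUT_OF_STOCK",
    ("1023456001", "L"): "IN_STOCK",
    ("1023456002", "M"): "IN_STOCK",
}

rows = list(expand_sku_matrix(["1023456001", "1023456002"], ["M", "L"], stock))
for row in rows:
    print(row)
```

Emitting a row for every combination, including the ones with no observation, is a deliberate choice here: a missing row and an out-of-stock row mean different things to an analyst.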

Geo-blocking

Localised residential proxies

H&M redirects or blocks requests that do not match the target region's IP. We use country-specific residential proxies to ensure we see the correct local pricing, currency, and stock levels.

Fast fashion churn

High-frequency change detection

Products are added and removed daily. We maintain a hash index of the catalogue, running continuous diffs to emit only new arrivals, price changes, and stock-outs — saving compute and storage.
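
One way to picture the hash index is the minimal sketch below: each record is hashed in a stable way, and two snapshots are compared by key. The helper names and the choice of SHA-256 over sorted-key JSON are assumptions, not a description of the production pipeline, and the short article codes are placeholders.

```python
import hashlib
import json


def record_hash(record):
    """Stable content hash: keys are sorted so field order does not matter."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_catalogue(previous_index, current_records, key="article_code"):
    """Compare the current crawl against the previous hash index.

    `previous_index` maps key -> hash from the last run.
    Returns (new, changed, removed, new_index).
    """
    new_index = {r[key]: record_hash(r) for r in current_records}
    new = [k for k in new_index if k not in previous_index]
    changed = [
        k for k in new_index
        if k in previous_index and previous_index[k] != new_index[k]
    ]
    removed = [k for k in previous_index if k not in new_index]
    return new, changed, removed, new_index


# Hypothetical snapshots: A drops in price, B disappears, C is a new arrival.
yesterday = [
    {"article_code": "A", "price": 1499.0},
    {"article_code": "B", "price": 999.0},
]
today = [
    {"article_code": "A", "price": 1299.0},
    {"article_code": "C", "price": 799.0},
]

_, _, _, index = diff_catalogue({}, yesterday)
new, changed, removed, index = diff_catalogue(index, today)
print(new, changed, removed)
```

Volatile fields such as `scraped_at` would need to be excluded before hashing, otherwise every record would register as changed on every run.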

Data normalisation

Structured fit and material data

Apparel metadata is often unstructured text. We parse composition strings (e.g., 'Cotton 80%, Polyester 20%') and fit descriptors into clean, queryable JSON fields.
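
A minimal sketch of that parsing step, assuming composition strings always follow the comma-separated "Material NN%" pattern shown above. Real product pages may include garment-part prefixes or other variations that this regular expression does not handle; the function name and output keys are illustrative.

```python
import re

# One part looks like "Cotton 80%" or "Recycled cotton 20%".
_PART = re.compile(r"\s*([A-Za-z][A-Za-z \-]*?)\s+(\d+(?:\.\d+)?)\s*%\s*")


def parse_composition(text):
    """Split a composition string into material/percent records.

    Raises ValueError on any part that does not match the expected
    pattern, so malformed strings surface in QA instead of being
    silently skipped.
    """
    materials = []
    for part in text.split(","):
        match = _PART.fullmatch(part)
        if match is None:
            raise ValueError(f"Unrecognised composition part: {part!r}")
        materials.append(
            {"material": match.group(1).strip(), "percent": float(match.group(2))}
        )
    return materials


parsed = parse_composition("Cotton 80%, Polyester 20%")
print(parsed)
```

A natural follow-up check is that the percentages of one garment part sum to 100, which catches truncated strings.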

Applications

Who uses H&M data — and how

Teams across industries use hm.com data to build competitive products and smarter operations.

Competitor Price Intelligence

Apparel retailers track H&M's pricing tiers, promotional cadences, and markdown strategies to optimise their own pricing models.

Trend & Assortment Analysis

Merchandising teams monitor new arrivals, category depth, and colour trends to understand fast fashion assortment strategies.

Inventory Gap Analysis

Analysts track size-level stock-outs to identify supply chain bottlenecks or high-demand product categories.

Sustainability Benchmarking

ESG researchers and brands extract 'Conscious Choice' data and material compositions to track industry shifts toward sustainable materials.

AI Styling & Recommendation

Machine learning teams use product imagery, fit descriptions, and category metadata to train fashion recommendation engines.

Market Research

Consultancies track review volume and sentiment across regions to gauge brand performance and product quality perception.

Why DataFlirt

"H&M's catalogue moves at the speed of fast fashion — tracking size-level stock and regional pricing requires high-frequency, distributed extraction."

Apparel scraping fails when pipelines cannot handle complex article-code matrices or dynamic stock APIs. DataFlirt manages the residential proxies, JavaScript rendering, and schema normalisation so your team receives clean, structured retail data without the maintenance overhead.

Technical Spec

H&M scraper — technical capabilities

Everything supported by our hm.com scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

JavaScript rendering

Full Playwright sessions — required for dynamic stock and price hydration

Supported

Proxy rotation

Country-specific residential IPs to bypass geo-redirects

Supported

Article-code variation mapping

Links parent products to all colour and size SKUs

Supported

Multi-region support

Target specific domains (hm.com/en_gb, hm.com/en_in, etc.)

Supported

Fit & size feedback

Extracts aggregated fit metrics (true to size, length) from reviews

Supported

Conscious Choice metadata

Parses material composition percentages and sustainability tags

Supported

Store stock availability

Checks local physical store inventory via postal code

Supported

H&M Member points balance

Requires authenticated user session

Partial

User purchase history

Protected personal data behind login wall

Partial

Infrastructure

Infrastructure powering the H&M pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration and deduplication. Playwright executes JavaScript to hydrate dynamic pricing and stock APIs.

Residential Proxy Infrastructure

We maintain localised residential proxy pools to ensure accurate regional pricing and prevent geo-blocking or rate limiting.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS, orchestrated by Airflow. This allows us to scale up for high-frequency inventory diffs.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested — ideal for complex SKU matrices

CSV

Flat file with typed columns — ready for Excel or BI tools

Parquet

Columnar format for BigQuery, Snowflake, Athena

S3

Direct bucket delivery, compatible with any data lake

Webhook

HTTP POST per record for real-time stock alerts

Snowflake

Stage + COPY INTO workflow — incremental or full-replace
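
For the newline-delimited JSON option listed above, the sketch below shows what the format looks like in practice: one compact JSON object per line, which most warehouses can load line by line. The helper names are hypothetical, and the second record is made up for the example.

```python
import io
import json


def write_ndjson(records, stream):
    """Write one JSON object per line and return the number written."""
    count = 0
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


def read_ndjson(stream):
    """Read records back, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


# Sample rows using field names from the Pricing & Inventory dictionary.
records = [
    {"article_code": "1023456001", "size_label": "L", "price": 1299.0, "currency": "INR"},
    {"article_code": "1023456001", "size_label": "M", "price": 1299.0, "currency": "INR"},
]

buffer = io.StringIO()
written = write_ndjson(records, buffer)
buffer.seek(0)
round_trip = read_ndjson(buffer)
print(written, round_trip == records)
```

In a real delivery the stream would be a file or object-store upload rather than an in-memory buffer.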

// faq

Common questions.

About hm.com scraping, legality, and pipeline operations.

Ask us directly →

Is scraping H&M legal?

Scraping publicly available product, pricing, and review data is generally permissible. DataFlirt does not bypass authentication walls, scrape personal user data, or violate GDPR. Clients should consult legal counsel regarding their specific use of the extracted data.

How do you handle H&M's region-specific pricing?

H&M uses geo-IP detection to route users to local storefronts. We use country-specific residential proxies (e.g., UK IPs for hm.com/en_gb) to ensure we capture the correct local currency, pricing, and stock availability.

Can you track inventory at the size level?

Yes. Our scrapers map the full article code matrix, capturing the exact stock status (in stock, low stock, out of stock) for every specific size and colour combination.

How frequently can you refresh the catalogue?

We can configure pipelines for daily full-catalogue refreshes, or for high-frequency runs on specific high-velocity categories to catch intra-day stock-outs.

Do you extract material composition and sustainability data?

Yes. We parse the product description and metadata to extract material percentages (e.g., 'Cotton 100%') and capture any 'Conscious Choice' or recycling labels.

What is the minimum viable engagement?

Engagements typically start with a defined set of categories or a specific regional storefront. We price based on data volume, extraction frequency, and schema complexity.

Can I request a sample dataset?

Absolutely. We provide a sample run of up to 500 products as part of the scoping process, allowing you to validate the schema, variant mapping, and data quality before committing.


Tell us what
to extract.
We do the rest.

Data Extraction for Every Industry
