SYSTEM: all green · source: lego.com · queue: 12,408 pages · p99 latency: 184ms · dataflirt.com · scraper/lego-com

RUN · 31 active pipelines · lego.com live

Lego catalogue data,
at warehouse scale.

We extract set metadata, dynamic inventory states, regional pricing, and the complete Pick a Brick catalogue from Lego.com. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Get data from lego.com → See how it works

Sets extracted: 14.2K /run

Inventory checks: 85.4K /24h

Pick a Brick elements: 42.1K /run

Active pipelines: 31

Uptime: 99.98%

Data Dictionary

Every field we extract from lego.com

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Set Metadata objects from lego.com. All fields typed and schema-versioned.

item_number, title, theme, sub_theme, piece_count, minifigure_count, age_range, dimensions, release_date, description, image_urls, instruction_url, page_url

{
  "item_number": "75313",
  "title": "AT-AT™",
  "theme": "Star Wars™",
  "sub_theme": "Ultimate Collector Series",
  "piece_count": 6785,
  "minifigure_count": 9,
  "age_range": "18+"
}


Complete list of extractable fields for Inventory & Pricing objects from lego.com. All fields typed and schema-versioned.

item_number, price, list_price, currency, discount_pct, stock_status, backorder_date, retiring_soon, hard_to_find, limit_per_customer, region, scraped_at

{
  "item_number": "75313",
  "price": 849.99,
  "currency": "USD",
  "stock_status": "BACKORDER",
  "backorder_date": "2026-11-15",
  "retiring_soon": true,
  "hard_to_find": true,
  "limit_per_customer": 2
}

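As a rough illustration of what "typed" can mean for an Inventory & Pricing record, the sketch below parses the sample above into a Python dataclass. The class names, the `parse_inventory` helper, and the `OUT_OF_STOCK` spelling are assumptions made for this example; they are not the delivered schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class StockStatus(Enum):
    IN_STOCK = "IN_STOCK"
    BACKORDER = "BACKORDER"
    OUT_OF_STOCK = "OUT_OF_STOCK"  # assumed spelling, not shown in the samples


@dataclass(frozen=True)
class InventoryRecord:
    item_number: str
    price: float
    currency: str
    stock_status: StockStatus
    backorder_date: Optional[date]
    retiring_soon: bool
    hard_to_find: bool
    limit_per_customer: Optional[int]


def parse_inventory(raw: dict) -> InventoryRecord:
    """Convert one raw JSON object into a typed record, failing loudly on bad values."""
    backorder = raw.get("backorder_date")
    return InventoryRecord(
        item_number=str(raw["item_number"]),
        price=float(raw["price"]),
        currency=raw["currency"],
        stock_status=StockStatus(raw["stock_status"]),  # raises ValueError on unknown states
        backorder_date=date.fromisoformat(backorder) if backorder else None,
        retiring_soon=bool(raw.get("retiring_soon", False)),
        hard_to_find=bool(raw.get("hard_to_find", False)),
        limit_per_customer=raw.get("limit_per_customer"),
    )


# Values copied from the sample record above.
sample = {
    "item_number": "75313",
    "price": 849.99,
    "currency": "USD",
    "stock_status": "BACKORDER",
    "backorder_date": "2026-11-15",
    "retiring_soon": True,
    "hard_to_find": True,
    "limit_per_customer": 2,
}
record = parse_inventory(sample)
print(record.stock_status.name, record.backorder_date)
```

Parsing `stock_status` through an enum means an unexpected state fails at load time instead of flowing silently into a warehouse table.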

Complete list of extractable fields for Pick a Brick objects from lego.com. All fields typed and schema-versioned.

element_id, design_id, name, colour, category, price, currency, stock_status, weight_g, dimensions, image_url

{
  "element_id": "6335146",
  "design_id": "3001",
  "name": "Brick 2x4",
  "colour": "Bright Red",
  "category": "Bricks",
  "price": 0.24,
  "stock_status": "IN_STOCK",
  "weight_g": 2.32
}


Complete list of extractable fields for Reviews & Ratings objects from lego.com. All fields typed and schema-versioned.

review_id, item_number, reviewer_nickname, overall_rating, build_experience, playability, value_for_money, review_title, review_text, recommended, date_posted

{
  "review_id": "REV-98241",
  "item_number": "75313",
  "overall_rating": 4.8,
  "build_experience": 5.0,
  "playability": 4.0,
  "value_for_money": 4.5,
  "recommended": true,
  "date_posted": "2026-02-14"
}


Capabilities

Every brick, set, and stock state — structured

Our Lego scraper handles the entire digital catalogue: set specifications, dynamic inventory states, regional pricing disparities, and the granular Pick a Brick database — bypassing rate limits and SPA rendering issues.

Complete Set Metadata

Extract item numbers, piece counts, minifigure counts, age ranges, dimensions, and high-resolution image URLs across all themes.

Dynamic Inventory Tracking

Monitor exact stock states: In Stock, Backorder (with estimated ship dates), Out of Stock, and Retiring Soon flags.

Regional Pricing Intelligence

Capture pricing, currency, and availability disparities across US, UK, EU, and APAC regional storefronts.

Pick a Brick Extraction

Scrape individual element IDs, design IDs, exact colour taxonomies, and per-piece pricing for the entire loose parts catalogue.

Review & Rating Mining

Extract granular review metrics including build experience, playability, and value for money ratings alongside full text.

Scheduled + Streaming Modes

Run one-off bulk exports or configure continuous pipelines at hourly or daily cadences with change-detection diffing.

// engagement pipeline

From set numbers to warehouse records

Brief in. Clean data out.

Define Scope

Day 0

Provide theme URLs, regional requirements, or specific data targets like Pick a Brick. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy / Playwright crawlers, proxy rotation, session management, and CAPTCHA handling for lego.com.

Validation & QA

Days 4–6

Schema validation, null-rate checks, price-outlier detection, and sample outputs before full launch.
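To make the null-rate and price-outlier checks concrete, here is a minimal sketch using only the standard library. The function names, the threshold of 5 median absolute deviations, and the sample rows are assumptions for illustration, not the checks our QA stage necessarily runs.

```python
from statistics import median


def null_rates(records, fields):
    """Fraction of records where each field is missing or None."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(1 for r in records if r.get(f) is None) / total for f in fields}


def price_outliers(records, threshold=5.0):
    """Flag records whose price sits far from the median, measured in MADs."""
    priced = [r for r in records if r.get("price") is not None]
    if len(priced) < 3:
        return []  # too few points to say anything useful
    prices = [r["price"] for r in priced]
    med = median(prices)
    mad = median([abs(p - med) for p in prices])
    if mad == 0:
        # All typical prices are identical; anything different is suspicious.
        return [r for r in priced if r["price"] != med]
    return [r for r in priced if abs(r["price"] - med) / mad > threshold]


# Illustrative rows only; the identifiers and prices are made up.
rows = [
    {"item_number": "A", "price": 9.99},
    {"item_number": "B", "price": 12.99},
    {"item_number": "C", "price": 11.49},
    {"item_number": "D", "price": 10.99},
    {"item_number": "E", "price": 1099.0},  # likely a parsing error
    {"item_number": "F", "price": None},
]
rates = null_rates(rows, ["item_number", "price"])
suspects = price_outliers(rows)
print(rates, [r["item_number"] for r in suspects])
```

The median-based measure is used here because a single bad price barely moves the median, whereas it would distort a mean and standard deviation.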

Delivery

ongoing

JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
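A sketch of how one batch of records could be serialised for two of these formats with the standard library. Newline-delimited JSON is an assumption about the JSON layout, and the helper names are hypothetical; Parquet is left out because it needs a third-party library such as pyarrow.

```python
import csv
import io
import json


def to_jsonl(records):
    """One JSON object per line, keys sorted so diffs between runs stay stable."""
    return "".join(json.dumps(r, sort_keys=True, ensure_ascii=False) + "\n" for r in records)


def to_csv(records, fieldnames):
    """CSV with a fixed column order; keys outside fieldnames are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Values copied from the Pick a Brick sample record.
elements = [
    {
        "element_id": "6335146",
        "design_id": "3001",
        "name": "Brick 2x4",
        "colour": "Bright Red",
        "category": "Bricks",
        "price": 0.24,
    }
]
jsonl_text = to_jsonl(elements)
csv_text = to_csv(elements, ["element_id", "design_id", "name", "colour", "price"])
print(csv_text)
```

The resulting strings can then be uploaded to a bucket or loaded into a warehouse stage by whatever client the destination requires.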

Under the hood

How our Lego pipeline handles the hard parts

Lego.com relies heavily on GraphQL and client-side rendering. Here is how we maintain resilient extraction pipelines.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (running)

// observability

Pipeline health

Uptime: 99.9%

p99 latency: 142ms

Null rate: 0.3%

Anti-bot layer

Residential proxy rotation + fingerprint spoofing

Retail sites employ aggressive rate limiting to prevent automated stock checking. Our crawlers use residential ISP proxies with realistic browser fingerprints and randomised request timing to bypass these restrictions.
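The "randomised request timing" part can be pictured as drawing each pause from a range rather than sleeping a fixed interval. The sketch below shows only that idea; the function name, the 2-second base, and the 50% jitter are illustrative assumptions, not our production values.

```python
import random


def jittered_delays(n, base_seconds=2.0, jitter=0.5, rng=None):
    """Return n pauses drawn uniformly from base*(1-jitter) to base*(1+jitter)."""
    if not 0 <= jitter < 1:
        raise ValueError("jitter must be in [0, 1)")
    rng = rng or random.Random()
    low = base_seconds * (1 - jitter)
    high = base_seconds * (1 + jitter)
    return [rng.uniform(low, high) for _ in range(n)]


# A seeded generator makes the schedule reproducible for testing.
delays = jittered_delays(5, rng=random.Random(42))
print([round(d, 2) for d in delays])
```

A crawler would call `time.sleep(d)` between requests for each `d` in the schedule.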

JavaScript rendering

Full Playwright execution for SPA content

Lego.com is a React-based single-page application. We run full Playwright browser sessions with JavaScript execution to ensure dynamic inventory states and pricing widgets hydrate correctly before extraction.

GraphQL interception

Direct API extraction

Where possible, our pipeline intercepts Lego's backend GraphQL requests, extracting structured JSON directly from the API layer rather than parsing the DOM. This ensures higher reliability and schema stability.
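A minimal sketch of response interception with Playwright's synchronous Python API. It assumes that the relevant endpoints contain "graphql" in their URL and return a JSON body with a top-level `data` key; both are assumptions about the site, and the helper names are hypothetical.

```python
def make_graphql_collector(url_marker="graphql"):
    """Build a response callback plus the list it appends captured payloads to."""
    captured = []

    def on_response(response):
        if url_marker not in response.url:
            return  # not an API call we care about
        try:
            body = response.json()
        except Exception:
            return  # body was not JSON or could not be read
        if isinstance(body, dict) and body.get("data"):
            captured.append(body["data"])

    return on_response, captured


def capture_graphql(page_url):
    """Load a page in headless Chromium and return the GraphQL payloads it fetched."""
    # Imported lazily so the collector above can be tested without Playwright installed.
    from playwright.sync_api import sync_playwright

    on_response, captured = make_graphql_collector()
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.on("response", on_response)
        page.goto(page_url, wait_until="networkidle")
        browser.close()
    return captured
```

Keeping the callback separate from the browser session means the filtering logic can be exercised with stub response objects before any page is loaded.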

Change detection

Only re-scrape what has changed

For inventory tracking, we maintain a hash index of last-seen stock states. Subsequent runs only push diffs — reducing compute cost and ensuring you only process actual inventory events.
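The hash index can be sketched in a few lines. Here an in-memory dict stands in for the persistent store, and the choice of tracked fields and the function names are assumptions for illustration.

```python
import hashlib
import json

# Fields whose change counts as an inventory event (an assumed selection).
TRACKED_FIELDS = ("stock_status", "backorder_date", "price", "retiring_soon")


def state_hash(record):
    """Stable SHA-256 digest over the tracked fields only."""
    tracked = {f: record.get(f) for f in TRACKED_FIELDS}
    blob = json.dumps(tracked, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def diff_run(records, index):
    """Return records whose state changed since the last run; update index in place."""
    changed = []
    for record in records:
        key = record["item_number"]
        digest = state_hash(record)
        if index.get(key) != digest:
            changed.append(record)
            index[key] = digest
    return changed


index = {}
# "99999" is a made-up item number used only to give the run a second record.
run_1 = [
    {"item_number": "75313", "stock_status": "IN_STOCK", "price": 849.99},
    {"item_number": "99999", "stock_status": "IN_STOCK", "price": 19.99},
]
run_2 = [
    {"item_number": "75313", "stock_status": "BACKORDER", "price": 849.99,
     "backorder_date": "2026-11-15"},
    {"item_number": "99999", "stock_status": "IN_STOCK", "price": 19.99},
]
first = diff_run(run_1, index)   # everything is new on the first run
second = diff_run(run_2, index)  # only the record that moved to backorder
print(len(first), [r["item_number"] for r in second])
```

Hashing only the tracked fields keeps noise such as a refreshed `scraped_at` timestamp from registering as a change.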

Monitoring & alerting

24/7 pipeline health with anomaly detection

Every run emits structured logs to our observability stack. We alert on null-rate spikes, missing item numbers, and coverage drops — responding before you notice data gaps.
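The three alert conditions named above can be expressed as a small post-run check. This is a sketch under assumed thresholds (1% null rate, 5% coverage drop) with a hypothetical `run_alerts` helper; it does not describe our observability stack.

```python
def run_alerts(records, expected_count, watched_field="price",
               max_null_rate=0.01, max_coverage_drop=0.05):
    """Return human-readable alert strings for one pipeline run."""
    alerts = []

    missing_ids = sum(1 for r in records if not r.get("item_number"))
    if missing_ids:
        alerts.append(f"missing item_number on {missing_ids} record(s)")

    if records:
        nulls = sum(1 for r in records if r.get(watched_field) is None)
        rate = nulls / len(records)
        if rate > max_null_rate:
            alerts.append(f"null rate for {watched_field} is {rate:.1%}")

    if expected_count > 0:
        coverage = len(records) / expected_count
        if coverage < 1 - max_coverage_drop:
            alerts.append(f"coverage is {coverage:.1%} of expected")

    return alerts


# Two records where four were expected, one of them incomplete.
run = [
    {"item_number": "75313", "price": 849.99},
    {"item_number": None, "price": None},
]
alerts = run_alerts(run, expected_count=4)
for line in alerts:
    print(line)
```

In practice `expected_count` would come from the previous successful run, so a sudden drop in scraped pages shows up as a coverage alert.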

Applications

Who uses Lego data — and how

Teams across industries use lego.com data to build competitive products and smarter operations.

Alternative Investment Tracking

Investors track 'Retiring Soon' flags and backorder velocity to predict secondary market price appreciation for highly sought-after sets.

Retail Arbitrage

Resellers monitor regional pricing disparities and stock availability to identify cross-border arbitrage opportunities.

Competitor Price Monitoring

Toy retailers and department stores track Lego's direct-to-consumer pricing and discount strategies to optimise their own margins.

Supply Chain Analysis

Analysts track backorder dates and out-of-stock durations across themes to model manufacturing constraints and demand curves.

AFOL Community Platforms

Adult Fans of Lego (AFOL) database maintainers synchronise their platforms with official set metadata, piece counts, and instruction links.

Market Research

Industry analysts evaluate price-per-piece metrics, theme longevity, and licensed IP performance based on catalogue composition.
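As an example of the price-per-piece metric, the figures from the sample records for item 75313 (USD 849.99, 6,785 pieces) can be joined and divided. The helper name is hypothetical, and the join on `item_number` is an assumption about how an analyst would combine the two object types.

```python
def price_per_piece(price, piece_count):
    """Price divided by piece count, in the record's currency."""
    if piece_count <= 0:
        raise ValueError("piece_count must be positive")
    return price / piece_count


# Values copied from the Set Metadata and Inventory & Pricing samples.
metadata = {"item_number": "75313", "piece_count": 6785}
pricing = {"item_number": "75313", "price": 849.99, "currency": "USD"}

assert metadata["item_number"] == pricing["item_number"]
ppp = price_per_piece(pricing["price"], metadata["piece_count"])
print(f"{pricing['currency']} {ppp:.4f} per piece")
```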

Technical Spec

Lego scraper — technical capabilities

What our lego.com scraper supports, from rendered SPA elements to rate-limit handling, with account-gated data marked as partial.

JavaScript rendering

Full Playwright sessions — required for dynamic stock states and pricing

Supported

CAPTCHA bypass

Automated 2Captcha + CapSolver integration

Supported

Residential proxy rotation

ISP-grade residential IPs rotated per request to bypass rate limits

Supported

Multi-region support

Extract data across US, UK, EU, and APAC regional storefronts

Supported

GraphQL query extraction

Direct interception of backend API responses for stable schema mapping

Supported

Pick a Brick element mapping

Full extraction of design IDs, element IDs, and exact colour names

Supported

Change detection (diffs)

Hash-based diff: only emit records with changed inventory states

Supported

Webhook delivery

HTTP POST per record for real-time stock alerts

Supported

Lego Insiders point balances

Gated data tied to individual authenticated user accounts

Partial

User order history

Private transactional data behind authentication walls

Partial
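For the webhook delivery entry in the list above (HTTP POST per record), the request for a single stock event might be built as follows. The endpoint URL and the `build_webhook_request` helper are hypothetical, and the payload shape simply reuses fields from the Inventory & Pricing sample.

```python
import json
import urllib.request


def build_webhook_request(url, record):
    """Prepare a JSON POST for one record; nothing is sent until urlopen is called."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


event = {
    "item_number": "75313",
    "stock_status": "BACKORDER",
    "backorder_date": "2026-11-15",
}
# example.com is a placeholder for the receiving endpoint.
request = build_webhook_request("https://example.com/hooks/stock", event)
print(request.get_method(), request.full_url)
```

Sending it is then a matter of `urllib.request.urlopen(request, timeout=10)`, wrapped in whatever retry policy the receiving side expects.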

Infrastructure

Infrastructure powering the Lego pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy + Playwright Stack

Scrapy handles crawl orchestration and retry logic. Playwright executes JavaScript and intercepts GraphQL responses for reliable data capture.

Residential Proxy Infrastructure

We maintain pools of residential ISP proxies to bypass aggressive retail rate-limiting, ensuring continuous inventory monitoring.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling for high-frequency stock checks. All state stored in managed Postgres.

// faq

Common questions.

About lego.com scraping, legality, and pipeline operations.

Ask us directly →

Is scraping Lego.com legal?

Scraping publicly available catalogue and pricing information is generally permissible under applicable law. DataFlirt targets only public, non-authenticated set metadata, inventory states, and reviews. We do not extract personal data or circumvent authentication walls. Clients should review Lego's ToS and consult legal counsel.

How do you handle rate limits on inventory checks?

We use residential ISP proxies and precise request timing modelled on human behaviour. By intercepting GraphQL queries rather than brute-forcing HTML loads, we minimise the footprint of our extraction while maintaining high-frequency stock monitoring.

Can you track the 'Retiring Soon' status?

Yes. We monitor and extract lifecycle flags including 'Retiring Soon', 'Hard to Find', and 'New', alongside exact backorder fulfilment dates.

Do you extract the Pick a Brick catalogue?

Yes. We scrape the entire Pick a Brick database, including design IDs, element IDs, exact colour taxonomies, weight, and per-piece pricing.

Can I track pricing across different countries?

Yes. We can configure pipelines to extract data from specific regional storefronts, identified by locale codes in the URL (e.g., en-gb, en-us, de-de), capturing local currency pricing and regional stock availability.
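A small sketch of fanning one catalogue path out across regional storefronts. It assumes the locale code is the first path segment under www.lego.com; the `storefront_url` helper and the `themes` path are hypothetical.

```python
import re

LOCALES = ("en-us", "en-gb", "de-de")  # the locale codes named above


def storefront_url(locale, path=""):
    """Build a regional URL, assuming the locale is the first path segment."""
    if not re.fullmatch(r"[a-z]{2}-[a-z]{2}", locale):
        raise ValueError(f"unexpected locale format: {locale!r}")
    return f"https://www.lego.com/{locale}/{path.lstrip('/')}"


targets = [storefront_url(locale, "/themes") for locale in LOCALES]
for url in targets:
    print(url)
```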

What is the minimum viable engagement?

Our packages start at defined theme lists or the complete active set catalogue with daily delivery. High-frequency stock monitoring (hourly or minute-level) is priced based on compute and proxy volume.

Tell us what
to extract.
We do the rest.

Data Extraction for Every Industry
