
Architecture data,
structured for analysis.

We extract architectural projects, design reviews, designer profiles, and material specifications from Domus and deliver them as clean JSON, CSV, or Parquet to your data warehouse.


Projects extracted: 42.1K
Designer profiles: 18.4K
Articles parsed: 112K
Active pipelines: 14
Uptime: 99.98%

◆ Architectural Projects ◆ Designer Profiles ◆ Material Specifications ◆ Exhibition Archives ◆ Floor Plan Extraction ◆ Studio Directories ◆ Product Design Reviews ◆ Location Mapping ◆ Urban Planning Data ◆ Managed Pipeline ◆ S3 / BigQuery Delivery ◆ Bengaluru HQ ◆ Enterprise SLA

Data Dictionary

Every field we extract from domus.it

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Architectural Projects objects from domus.it. All fields typed and schema-versioned.

Fields: project_id, title, architect_name, studio_name, location_city, location_country, completion_year, client_name, structural_engineer, primary_materials, area_sqm, description, image_urls, floor_plan_urls

{
  "project_id": "PRJ-84921",
  "title": "Bosco Verticale",
  "architect_name": "Stefano Boeri",
  "location_city": "Milan",
  "completion_year": 2014,
  "area_sqm": 40000,
  "primary_materials": ["Concrete", "Glass", "Vegetation"]
}

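To make "typed and schema-versioned" concrete, here is a minimal sketch of a Python dataclass for the project record above. The class name `ArchitecturalProject`, the `SCHEMA_VERSION` tag, the chosen subset of fields, and which fields are required are all assumptions for illustration, not DataFlirt's published schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

SCHEMA_VERSION = "1.0"  # hypothetical schema version tag


@dataclass
class ArchitecturalProject:
    """Typed subset of the Architectural Projects fields (illustrative only)."""

    project_id: str
    title: str
    architect_name: str
    location_city: Optional[str] = None
    completion_year: Optional[int] = None
    area_sqm: Optional[float] = None
    primary_materials: list[str] = field(default_factory=list)
    schema_version: str = SCHEMA_VERSION

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "ArchitecturalProject":
        """Validate and coerce one scraped record; raise instead of emitting bad rows."""
        # Required identifiers (an assumed rule): refuse records without them.
        for key in ("project_id", "title", "architect_name"):
            if not raw.get(key):
                raise ValueError(f"missing required field: {key}")
        materials = raw.get("primary_materials") or []
        if not isinstance(materials, list):
            raise TypeError("primary_materials must be a list of strings")
        year = raw.get("completion_year")
        area = raw.get("area_sqm")
        return cls(
            project_id=str(raw["project_id"]),
            title=str(raw["title"]),
            architect_name=str(raw["architect_name"]),
            location_city=raw.get("location_city"),
            completion_year=int(year) if year is not None else None,
            area_sqm=float(area) if area is not None else None,
            primary_materials=[str(m) for m in materials],
        )


record = ArchitecturalProject.from_dict(
    {
        "project_id": "PRJ-84921",
        "title": "Bosco Verticale",
        "architect_name": "Stefano Boeri",
        "location_city": "Milan",
        "completion_year": 2014,
        "area_sqm": 40000,
        "primary_materials": ["Concrete", "Glass", "Vegetation"],
    }
)
```

Rejecting a record with a missing `project_id` at parse time, rather than delivering it with nulls, keeps downstream joins on IDs safe.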

Complete list of extractable fields for Designer Profiles objects from domus.it. All fields typed and schema-versioned.

Fields: designer_id, full_name, studio_affiliation, nationality, birth_year, notable_projects, awards_won, website_url, biography_text, associated_firms, discipline_tags

{
  "designer_id": "DSG-1042",
  "full_name": "Zaha Hadid",
  "nationality": "British-Iraqi",
  "birth_year": 1950,
  "awards_won": ["Pritzker Architecture Prize", "Stirling Prize"],
  "discipline_tags": ["Architecture", "Product Design"]
}


Complete list of extractable fields for Design Articles objects from domus.it. All fields typed and schema-versioned.

Fields: article_id, headline, author_name, publication_date, category, tags, abstract, body_text, image_credits, related_project_ids, language_code

{
  "article_id": "ART-59211",
  "headline": "The Evolution of Brutalist Architecture in London",
  "author_name": "Elena Sommariva",
  "publication_date": "2023-11-14",
  "category": "Architecture",
  "language_code": "en",
  "tags": ["Brutalism", "London", "Urban Planning"]
}


Complete list of extractable fields for Exhibitions & Events objects from domus.it. All fields typed and schema-versioned.

Fields: event_id, event_name, venue_name, city, country, start_date, end_date, curators, theme, featured_artists, ticket_url

{
  "event_id": "EVT-3301",
  "event_name": "Venice Architecture Biennale",
  "venue_name": "Giardini della Biennale",
  "city": "Venice",
  "start_date": "2023-05-20",
  "end_date": "2023-11-26",
  "theme": "The Laboratory of the Future"
}


Complete list of extractable fields for Product Design objects from domus.it. All fields typed and schema-versioned.

Fields: product_id, product_name, designer_name, manufacturer, launch_year, material_composition, dimensions, category, awards, review_score, purchase_url

{
  "product_id": "PRD-7721",
  "product_name": "Arco Lamp",
  "designer_name": "Achille Castiglioni",
  "manufacturer": "Flos",
  "launch_year": 1962,
  "material_composition": ["Carrara Marble", "Stainless Steel", "Aluminum"],
  "category": "Lighting"
}


Capabilities

Extract architectural intelligence with precision

Our Domus scraper handles editorial layouts, bilingual content toggles, high-resolution media galleries, and nested project metadata — converting unstructured articles into queryable datasets.

Project Metadata Parsing

Extract architect names, completion dates, square footage, materials, and structural engineering credits from editorial project features.

High-Resolution Image Scraping

Capture direct URLs to architectural photography, floor plans, and conceptual sketches embedded within article galleries.

Bilingual Content Alignment

Domus publishes in Italian and English. We map IT/EN content pairs to ensure consistent data delivery regardless of the source language.

Studio & Architect Mapping

Normalise architect names and studio affiliations across decades of articles to build comprehensive designer directories.
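One common way to normalise names across decades of bylines is to compare them through a key that ignores case, accents, punctuation, and spacing, then resolve known variants through an alias table. The sketch below assumes that approach; the `ALIASES` table and the `name_key` / `canonical_name` helpers are hypothetical.

```python
import re
import unicodedata

# Hypothetical alias table: comparison key -> canonical display name.
ALIASES = {
    "oma": "OMA (Office for Metropolitan Architecture)",
    "office for metropolitan architecture": "OMA (Office for Metropolitan Architecture)",
}


def name_key(name: str) -> str:
    """Comparison key: strip accents, casefold, replace punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    return " ".join(cleaned.split())


def canonical_name(name: str) -> str:
    """Alias-table name if the key is known, else the original with tidied spacing."""
    return ALIASES.get(name_key(name), " ".join(name.split()))
```

Keying on the normalised form rather than the raw string means "Álvaro Siza" and "ALVARO SIZA" land in the same directory entry without rewriting how either is displayed.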

Material & Spec Extraction

Identify and classify building materials, furniture brands, and lighting fixtures mentioned in product design reviews.
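A keyword taxonomy is a simple first pass for classifying materials mentioned in review copy, before any model-based step. A minimal sketch, assuming a hand-built `MATERIAL_TAXONOMY` whose categories and keywords are hypothetical:

```python
import re

# Hypothetical taxonomy: category -> keywords matched as whole words.
MATERIAL_TAXONOMY = {
    "stone": ["marble", "travertine", "granite", "limestone"],
    "metal": ["steel", "aluminium", "aluminum", "brass", "copper"],
    "timber": ["oak", "walnut", "timber", "plywood", "clt"],
    "mineral": ["concrete", "brick", "terrazzo"],
    "glass": ["glass", "glazing"],
}


def classify_materials(text: str) -> dict[str, list[str]]:
    """Return {category: [matched keywords]} for keywords found in the text."""
    lowered = text.lower()
    found: dict[str, list[str]] = {}
    for category, keywords in MATERIAL_TAXONOMY.items():
        hits = [k for k in keywords if re.search(rf"\b{re.escape(k)}\b", lowered)]
        if hits:
            found[category] = hits
    return found
```

Whole-word matching avoids false hits such as "oak" inside "cloak", at the cost of missing plurals unless they are listed.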

Exhibition Tracking

Monitor upcoming design weeks, biennales, and gallery exhibitions with dates, venues, and curator information.

Archive Digitisation

Scrape historical articles and retrospective features to build longitudinal datasets of design trends.

Geospatial Data Extraction

Parse location data from project profiles to map architectural developments by city, region, or country.

Scheduled Updates

Run daily or weekly pipelines to capture newly published articles, project profiles, and event announcements.
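Scheduled runs usually need only what was published since the last successful run. A minimal sketch of that incremental filter, assuming each listing entry carries an ISO `publication_date` and an `article_id` as in the Design Articles sample, and that the previous watermark and already-delivered IDs are persisted between runs; `new_since` is a hypothetical helper:

```python
from datetime import date


def new_since(
    entries: list[dict], watermark: date, seen_ids: set[str]
) -> tuple[list[dict], date]:
    """Keep entries dated on or after the watermark that were not already delivered.

    Using >= (not >) re-checks the watermark day itself, so articles published
    later on the same day as the previous run are not skipped; seen_ids stops
    the ones already captured from being delivered twice.
    """
    fresh = [
        e
        for e in entries
        if date.fromisoformat(e["publication_date"]) >= watermark
        and e["article_id"] not in seen_ids
    ]
    new_watermark = max(
        (date.fromisoformat(e["publication_date"]) for e in fresh),
        default=watermark,
    )
    return fresh, new_watermark
```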

// engagement pipeline

From editorial content to structured database

Brief in. Clean data out.

Define Scope

Day 0

Specify target categories, date ranges, or specific architectural disciplines. We design the extraction schema.

Pipeline Build

Days 2–4

We configure Scrapy crawlers to handle Domus's editorial DOM structures, pagination, and media galleries.

Validation & QA

Days 4–6

Schema validation, bilingual alignment checks, and null-rate monitoring before full pipeline execution.
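Null-rate monitoring can be as simple as counting empty values per field across a batch and flagging fields above a threshold. A minimal sketch, assuming records arrive as dicts and that missing keys, `None`, empty strings, and empty lists all count as null; the 5% default threshold and the helper names are assumptions, not DataFlirt's actual QA rules:

```python
from collections import Counter


def null_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of records where each field is missing, None, '' or []."""
    if not records:
        return {f: 0.0 for f in fields}
    empty: Counter[str] = Counter()
    for rec in records:
        for f in fields:
            if rec.get(f) in (None, "", []):
                empty[f] += 1
    return {f: empty[f] / len(records) for f in fields}


def failing_fields(rates: dict[str, float], threshold: float = 0.05) -> list[str]:
    """Fields whose null rate exceeds the threshold, in input order."""
    return [f for f, r in rates.items() if r > threshold]
```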

Delivery

Ongoing

JSON / CSV / Parquet pushed to your S3 bucket or data warehouse on your defined schedule.
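List-valued fields such as `primary_materials` need a flattening rule before they fit a CSV row, while JSON keeps them as arrays. A minimal standard-library sketch of both outputs; joining list values with `|` is an assumed convention, not a documented DataFlirt format:

```python
import csv
import io
import json


def to_jsonl(records: list[dict]) -> str:
    """One JSON object per line; list fields stay as JSON arrays."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def to_csv(records: list[dict], fields: list[str]) -> str:
    """Flat CSV with the given columns; list values joined with '|' (assumed)."""
    buf = io.StringIO()
    # extrasaction="ignore" drops keys not in `fields`; missing keys become "".
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        row = {
            k: "|".join(map(str, v)) if isinstance(v, list) else v
            for k, v in rec.items()
        }
        writer.writerow(row)
    return buf.getvalue()
```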

Under the hood

Overcoming editorial scraping challenges

Editorial platforms like Domus present unique extraction hurdles. Here is how we normalise unstructured design content.

// fingerprinting

Identity rotation

TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued · running

// observability

Pipeline health

Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%

Unstructured text

NLP-assisted metadata extraction

Editorial articles often bury project specifications within narrative paragraphs. We use custom parsing logic to extract structured entities — like area, materials, and completion year — from unstructured body text.
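As a rough illustration of rule-based extraction from narrative text, the sketch below pulls an area in square metres and a completion year out of prose with regular expressions. The patterns and the `extract_specs` helper are assumptions; real editorial copy would need many more variants (Italian phrasing, ranges, square feet, and so on).

```python
import re

# Assumed pattern: a number (optionally with , or . thousands separators)
# followed by a square-metre unit.
AREA_RE = re.compile(
    r"(\d{1,3}(?:[.,]\d{3})+|\d+)\s*(?:square\s+met(?:re|er)s|sq\.?\s*m|m²|m2)\b",
    re.IGNORECASE,
)
# Assumed pattern: a completion keyword followed, before the next full stop,
# by a 19xx or 20xx year.
YEAR_RE = re.compile(
    r"\b(?:completed|completion|finished)\b[^.]*?\b((?:19|20)\d{2})\b",
    re.IGNORECASE,
)


def extract_specs(text: str) -> dict[str, int]:
    """Return area_sqm and completion_year found in narrative copy, if any."""
    specs: dict[str, int] = {}
    if m := AREA_RE.search(text):
        specs["area_sqm"] = int(re.sub(r"[.,]", "", m.group(1)))
    if m := YEAR_RE.search(text):
        specs["completion_year"] = int(m.group(1))
    return specs
```

Treating both `,` and `.` as thousands separators covers English ("12,500") and Italian ("12.500") styles, but it also means decimal areas such as "1.5 m²" would be misread.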

Media galleries

Deep linking for high-res assets

Domus uses lazy-loaded image carousels. Our Playwright integration triggers gallery interactions to expose and capture the underlying high-resolution image URLs and floor plan PDFs.
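Once a gallery has rendered, the highest-resolution variant is often listed in the image's `srcset` attribute rather than its `src`. A minimal sketch of picking the candidate with the largest width descriptor, assuming the galleries expose standard `srcset` markup (an assumption about Domus's HTML, not verified here); the example URLs are placeholders:

```python
from typing import Optional


def largest_srcset_candidate(srcset: str) -> Optional[str]:
    """Return the URL with the largest width ('480w'-style) descriptor.

    Candidates without a width descriptor (or with density descriptors such
    as '2x') count as width 0, so they win only if nothing else is listed.
    """
    best_url: Optional[str] = None
    best_width = -1
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue
        url, width = parts[0], 0
        if len(parts) > 1 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            width = int(parts[1][:-1])
        if width > best_width:
            best_url, best_width = url, width
    return best_url
```

Splitting on commas is a simplification: the HTML standard allows commas inside candidate URLs, so a production parser would follow the standard's `srcset` parsing rules.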

Bilingual architecture

Language toggle resolution

Articles frequently exist in both Italian and English under different URL structures. We map these variants to prevent duplicate records and ensure consistent language delivery.
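One standard way to pair language variants is to read the `<link rel="alternate">` tags with `hreflang` attributes that multilingual sites commonly publish, and treat the set of alternate URLs as the key for one article. The sketch below assumes Domus pages expose such tags, which is not confirmed here; `HreflangParser` and `group_key` are hypothetical names, and the URLs are placeholders.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collect {language: url} from <link rel="alternate" hreflang=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.alternates: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via handle_startendtag.
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if "alternate" in rels and a.get("hreflang") and a.get("href"):
            self.alternates[a["hreflang"].lower()] = a["href"]


def group_key(page_html: str) -> tuple[str, ...]:
    """Sorted alternate URLs: the IT and EN pages of one article share this key."""
    parser = HreflangParser()
    parser.feed(page_html)
    return tuple(sorted(parser.alternates.values()))
```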

DOM variability

Resilient editorial selectors

Magazine layouts change frequently for special features. We use multiple fallback selectors to ensure data continuity even when Domus publishes custom-designed editorial pieces.
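The fallback idea can be expressed as an ordered list of extractors, tried until one returns a value. A minimal standard-library sketch; the three regex-based extractors (Open Graph title, first `<h1>`, `<title>`) are illustrative assumptions, and a production crawler would more likely use Scrapy's CSS or XPath selectors for each step:

```python
import html
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]


def _regex(pattern: str) -> Extractor:
    """Build an extractor returning group 1 with tags stripped and entities decoded."""
    compiled = re.compile(pattern, re.IGNORECASE | re.DOTALL)

    def extract(page: str) -> Optional[str]:
        m = compiled.search(page)
        if not m:
            return None
        text = re.sub(r"<[^>]+>", "", m.group(1))  # drop nested inline tags
        text = " ".join(html.unescape(text).split())
        return text or None

    return extract


# Hypothetical chain, ordered from most to least specific.
HEADLINE_EXTRACTORS: list[Extractor] = [
    _regex(r'<meta[^>]+property="og:title"[^>]+content="([^"]*)"'),
    _regex(r"<h1[^>]*>(.*?)</h1>"),
    _regex(r"<title[^>]*>(.*?)</title>"),
]


def first_match(page: str, extractors: list[Extractor]) -> Optional[str]:
    """Return the first non-empty result so layout changes degrade gracefully."""
    for extract in extractors:
        value = extract(page)
        if value:
            return value
    return None
```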

Pagination

Infinite scroll handling

Category pages use JavaScript-driven infinite scroll. We execute headless browser sessions to simulate user scrolling, ensuring complete historical archive capture without missing records.
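Whatever drives the scrolling, the stopping rule is the same: keep loading until a load adds no unseen items. A minimal sketch of that loop, assuming a hypothetical `fetch_page(n)` callable that returns the article IDs visible after the n-th scroll (in a headless browser this would be one scroll-and-wait cycle):

```python
from typing import Callable, Iterable


def crawl_feed(
    fetch_page: Callable[[int], Iterable[str]], max_pages: int = 10_000
) -> list[str]:
    """Collect unique item IDs in first-seen order until a load adds nothing new."""
    seen: set[str] = set()
    ordered: list[str] = []
    for page in range(1, max_pages + 1):
        added = 0
        for item_id in fetch_page(page):
            if item_id not in seen:
                seen.add(item_id)
                ordered.append(item_id)
                added += 1
        if added == 0:
            break  # feed exhausted (or stuck): stop instead of scrolling forever
    return ordered
```

Because an infinite-scroll DOM keeps every previously loaded card, each load returns a cumulative list; the `seen` set is what turns it into per-load deltas, and `max_pages` caps runaway feeds.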

Applications

Who uses Domus data

Teams in design, procurement, academia, and machine learning use domus.it data for trend forecasting, sourcing research, longitudinal studies, and model training.

Trend Analysis & Forecasting

Design agencies analyse material usage and stylistic keywords across thousands of projects to forecast upcoming architectural trends.

Material Sourcing Research

Procurement teams identify frequently specified materials and manufacturers in high-end commercial and residential projects.

Academic & Urban Research

Universities build longitudinal datasets of urban development and architectural evolution across specific cities or decades.

Competitive Intelligence

Architecture studios track publication frequency, client types, and project scales of competing firms.

AI Training Data

Machine learning teams use paired datasets of architectural imagery and descriptive text to train generative design models.

Event & Exhibition Tracking

Industry professionals aggregate global design events, curatorial themes, and participating artists for market research.

Why DataFlirt

"Domus contains a century of architectural history and design evolution, but extracting structured metadata from editorial layouts requires precision."

Editorial platforms like Domus present unique scraping challenges: unstructured text, embedded media galleries, and bilingual content layers. DataFlirt normalises this editorial sprawl into strictly typed schemas so your research teams can focus on spatial analysis rather than DOM parsing.

Technical Spec

Domus scraper — technical capabilities

Everything our domus.it scraper supports, from JavaScript-rendered feeds and media galleries to bilingual mapping and archive access.

Bilingual scraping (IT/EN)

Automatic detection and mapping of dual-language article variants

Supported

High-res image extraction

Capture of gallery source URLs bypassing thumbnail compression

Supported

Infinite scroll handling

Playwright execution to load all historical articles in category feeds

Supported

Floor plan PDF capture

Extraction of technical drawings linked within project articles

Supported

Historical archive access

Parsing of digitised articles from the Domus historical catalogue

Supported

Geolocation mapping

Normalisation of project locations into queryable city/country fields

Supported

Webhook delivery

HTTP POST delivery upon pipeline completion

Supported

Premium subscriber articles

Extraction of articles gated behind the Domus paywall

Partial

Domus+ Digital Magazine PDFs

Direct download of full digital magazine issues (requires active subscription)

Partial
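For the webhook delivery listed above, a completion notification is typically a small JSON body sent with HTTP POST. A minimal standard-library sketch that builds (but does not send) such a request; the payload fields, the `build_completion_webhook` helper, and the endpoint URL are hypothetical, not a documented DataFlirt contract:

```python
import json
import urllib.request


def build_completion_webhook(
    endpoint: str, run_id: str, record_count: int, output_uri: str
) -> urllib.request.Request:
    """Build a POST request announcing a finished pipeline run (hypothetical payload)."""
    payload = {
        "event": "pipeline.completed",
        "run_id": run_id,
        "record_count": record_count,
        "output_uri": output_uri,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_webhook(
    "https://hooks.example.com/domus",
    run_id="run-001",
    record_count=1200,
    output_uri="s3://example-bucket/domus/run-001.parquet",
)
# urllib.request.urlopen(req, timeout=10) would actually send it.
```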

Infrastructure

Infrastructure powering the Domus pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus

Scrapy + Playwright Integration

We combine Scrapy's high-concurrency crawling with Playwright's JavaScript rendering to handle Domus's infinite scroll feeds and dynamic media galleries.

Media Asset Pipelines

Dedicated infrastructure for extracting, validating, and optionally downloading high-resolution architectural photography and technical floor plans.

Cloud-Native Delivery

Pipelines run on Kubernetes clusters with Airflow orchestration, ensuring reliable delivery to your S3 buckets or PostgreSQL databases on schedule.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Nested structures ideal for complex editorial metadata

CSV

Flat files for immediate spreadsheet analysis

XLS

Excel format with basic typing applied

Parquet

Columnar storage for efficient warehouse querying

AWS S3

Direct upload to your cloud storage buckets

Webhook

Automated HTTP POST triggers upon run completion

API

Queryable REST endpoints for pipeline results

PostgreSQL

Direct upsert into your relational database schema

Direct bucket delivery — compatible with any data lake
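Upsert delivery means a re-run updates existing rows instead of duplicating them. PostgreSQL expresses this with `INSERT ... ON CONFLICT (...) DO UPDATE`; the sketch below uses Python's built-in `sqlite3`, which accepts the same clause (SQLite 3.24 and later), so it runs without a database server. The `articles` table layout is an assumption based on the Design Articles fields.

```python
import sqlite3

# Same ON CONFLICT ... DO UPDATE form PostgreSQL uses; `excluded` is the incoming row.
UPSERT_SQL = """
INSERT INTO articles (article_id, headline, publication_date)
VALUES (:article_id, :headline, :publication_date)
ON CONFLICT (article_id) DO UPDATE SET
    headline = excluded.headline,
    publication_date = excluded.publication_date
"""


def upsert_articles(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert new articles and overwrite changed fields of existing ones."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT_SQL, rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (article_id TEXT PRIMARY KEY, headline TEXT, publication_date TEXT)"
)
upsert_articles(conn, [{"article_id": "ART-59211", "headline": "Draft title", "publication_date": "2023-11-14"}])
upsert_articles(conn, [{
    "article_id": "ART-59211",
    "headline": "The Evolution of Brutalist Architecture in London",
    "publication_date": "2023-11-14",
}])
```

With psycopg against PostgreSQL, the named placeholders would be written as `%(article_id)s` rather than `:article_id`; the SQL clause itself stays the same.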

// faq

Common questions.

About domus.it scraping, legality, and pipeline operations.


Can you extract data from both Italian and English versions of Domus?

Yes. We can configure the pipeline to target a specific language preference or extract both, mapping equivalent articles to prevent duplication in your dataset.

How do you handle unstructured project details?

While Domus presents data editorially, our parsers use pattern matching and custom selectors to extract specific entities like architect names, completion years, and materials into structured fields.

Do you scrape the actual images or just the URLs?

Standard delivery includes direct URLs to the highest-resolution images available on the page. If required, we can configure a media pipeline to download and transfer the actual image files to your S3 bucket.

Can you access the Domus+ premium archive?

We only extract publicly available editorial content. Articles and digital magazine PDFs gated behind the Domus+ subscription wall are not supported to comply with access restrictions.

How far back does the historical extraction go?

We can extract any article or project currently indexed and publicly accessible on the domus.it website, which includes extensive digitised historical archives.

What is the typical delivery cadence?

For editorial monitoring, weekly or monthly cadences are standard. We also perform one-off historical bulk extractions covering specific decades or architectural categories.

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two weeks. Whether you need a complete historical archive of design projects or a weekly feed of new exhibitions, we build and manage the pipeline. Contact our engineering team to define your schema.


hello@dataflirt.com · Bengaluru · IST · typical reply < 4h

Services

Data Extraction for Every Industry


🛍️ eCommerce → 🔍 Search Engine → ⚽ Sports Data → 📱 App Store → 🍕 Food Delivery → 📉 Betting Odds → ✈️ Aviation & Flight → 🛒 Grocery → 🎓 E-Learning → 💹 Stock Market → 🏠 Real Estate → 🤖 AI Training Data → 🧠 LLM Data → 📰 News → ⭐ Reviews → 💼 Job Board → 🏥 Healthcare → 💊 Pharma → 🏢 Company Data → 🤝 B2B Marketplace → 🚗 Automotive → 🌍 Travel → 🏨 Hospitality → 🪙 Cryptocurrency → 💡 IP & Patents → 📈 SEO Data → ⚖️ Legal → 🛡️ Insurance → 📲 Mobile App → 📸 Influencer → 🏛️ Government → 🚚 Transportation → 🎟️ Events → 📂 Directory → ⚡ Dynamic Websites → 📄 PDF Extraction → ✍️ Blog Content → ☁️ Weather → 🖥️ Cloud Scraping → 👨‍💻 Managed Service →
