
Architecture data, structured for analysis.

We extract architectural projects, design reviews, designer profiles, and material specifications from Domus. Delivered as clean JSON, CSV, or Parquet to your data warehouse.

Projects extracted: 42.1K total
Designer profiles: 18.4K total
Articles parsed: 112K total
Active pipelines: 14
Uptime: 99.98%
Data Dictionary

Every field we extract from domus.it

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Architectural Projects objects from domus.it. All fields typed and schema-versioned.

project_id, title, architect_name, studio_name, location_city, location_country, completion_year, client_name, structural_engineer, primary_materials, area_sqm, description, image_urls, floor_plan_urls
architectural_projects
● 200 OK
{
  "project_id": "PRJ-84921",
  "title": "Bosco Verticale",
  "architect_name": "Stefano Boeri",
  "location_city": "Milan",
  "completion_year": 2014,
  "area_sqm": 40000,
  "primary_materials": ["Concrete", "Glass", "Vegetation"]
}
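As an illustration of what "typed" can mean on the consuming side, the sketch below loads a record like the sample above into a Python dataclass. It is an assumption about how a client might model the data, not a published schema: the `ArchitecturalProject` class, the chosen field types, and the `parse_list` helper are all hypothetical. `parse_list` accepts either a real JSON array or a list serialised as a string.

```python
import ast
from dataclasses import dataclass, field


def parse_list(value):
    """Return a list of strings from a JSON array or a stringified list."""
    if not value:
        return []
    if isinstance(value, list):
        return [str(v) for v in value]
    try:
        # Hypothetical fallback for lists serialised as "['A', 'B']".
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return [str(value)]
    if isinstance(parsed, (list, tuple)):
        return [str(v) for v in parsed]
    return [str(parsed)]


@dataclass
class ArchitecturalProject:
    # Field types are assumptions inferred from the sample record.
    project_id: str
    title: str
    architect_name: str
    location_city: str
    completion_year: int
    area_sqm: float
    primary_materials: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict) -> "ArchitecturalProject":
        return cls(
            project_id=raw["project_id"],
            title=raw["title"],
            architect_name=raw["architect_name"],
            location_city=raw["location_city"],
            completion_year=int(raw["completion_year"]),
            area_sqm=float(raw["area_sqm"]),
            primary_materials=parse_list(raw.get("primary_materials")),
        )
```

Calling `ArchitecturalProject.from_dict(record)` on the sample record yields an object whose `primary_materials` is a plain list of three strings.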

Complete list of extractable fields for Designer Profiles objects from domus.it. All fields typed and schema-versioned.

designer_id, full_name, studio_affiliation, nationality, birth_year, notable_projects, awards_won, website_url, biography_text, associated_firms, discipline_tags
designer_profiles
● 200 OK
{
  "designer_id": "DSG-1042",
  "full_name": "Zaha Hadid",
  "nationality": "British-Iraqi",
  "birth_year": 1950,
  "awards_won": ["Pritzker Architecture Prize", "Stirling Prize"],
  "discipline_tags": ["Architecture", "Product Design"]
}

Complete list of extractable fields for Design Articles objects from domus.it. All fields typed and schema-versioned.

article_id, headline, author_name, publication_date, category, tags, abstract, body_text, image_credits, related_project_ids, language_code
design_articles
● 200 OK
{
  "article_id": "ART-59211",
  "headline": "The Evolution of Brutalist Architecture in London",
  "author_name": "Elena Sommariva",
  "publication_date": "2023-11-14",
  "category": "Architecture",
  "language_code": "en",
  "tags": ["Brutalism", "London", "Urban Planning"]
}

Complete list of extractable fields for Exhibitions & Events objects from domus.it. All fields typed and schema-versioned.

event_id, event_name, venue_name, city, country, start_date, end_date, curators, theme, featured_artists, ticket_url
exhibitions_events
● 200 OK
{
  "event_id": "EVT-3301",
  "event_name": "Venice Architecture Biennale",
  "venue_name": "Giardini della Biennale",
  "city": "Venice",
  "start_date": "2023-05-20",
  "end_date": "2023-11-26",
  "theme": "The Laboratory of the Future"
}

Complete list of extractable fields for Product Design objects from domus.it. All fields typed and schema-versioned.

product_id, product_name, designer_name, manufacturer, launch_year, material_composition, dimensions, category, awards, review_score, purchase_url
product_design
● 200 OK
{
  "product_id": "PRD-7721",
  "product_name": "Arco Lamp",
  "designer_name": "Achille Castiglioni",
  "manufacturer": "Flos",
  "launch_year": 1962,
  "material_composition": ["Carrara Marble", "Stainless Steel", "Aluminum"],
  "category": "Lighting"
}

Capabilities

Extract architectural intelligence with precision

Our Domus scraper handles editorial layouts, bilingual content toggles, high-resolution media galleries, and nested project metadata — converting unstructured articles into queryable datasets.

Project Metadata Parsing

Extract architect names, completion dates, floor area, materials, and structural engineering credits from editorial project features.

High-Resolution Image Scraping

Capture direct URLs to architectural photography, floor plans, and conceptual sketches embedded within article galleries.

Bilingual Content Alignment

Domus publishes in Italian and English. We map IT/EN content pairs to ensure consistent data delivery regardless of the source language.

Studio & Architect Mapping

Normalise architect names and studio affiliations across decades of articles to build comprehensive designer directories.
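One simple building block for this kind of normalisation is a comparison key that ignores case, accents, punctuation, and spacing. The sketch below is illustrative only; `name_key` and `group_variants` are hypothetical helpers, and real name matching would also need to handle abbreviations, initials, and studio renames, which this does not attempt.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Build a comparison key: no accents, no punctuation, lower case."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Replace punctuation with spaces, then collapse runs of whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", without_accents.casefold())
    return " ".join(cleaned.split())


def group_variants(names):
    """Group spellings that share the same comparison key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)
```

For example, `group_variants(["Gio Ponti", "GIO  PONTI", "Gio Ponti."])` places all three spellings under the single key `gio ponti`.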

Material & Spec Extraction

Identify and classify building materials, furniture brands, and lighting fixtures mentioned in product design reviews.

Exhibition Tracking

Monitor upcoming design weeks, biennales, and gallery exhibitions with dates, venues, and curator information.

Archive Digitisation

Scrape historical articles and retrospective features to build longitudinal datasets of design trends.

Geospatial Data Extraction

Parse location data from project profiles to map architectural developments by city, region, or country.

Scheduled Updates

Run daily or weekly pipelines to capture newly published articles, project profiles, and event announcements.

// engagement pipeline

From editorial content to structured database

Brief in. Clean data out.

Define Scope (day 0)

Specify target categories, date ranges, or specific architectural disciplines. We design the extraction schema.

Pipeline Build (days 2–4)

We configure Scrapy crawlers to handle Domus's editorial DOM structures, pagination, and media galleries.

Validation & QA (days 4–6)

Schema validation, bilingual alignment checks, and null-rate monitoring before full pipeline execution.
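Null-rate monitoring can be as simple as counting, per field, how many records are missing a value. The following is a minimal sketch under stated assumptions: the function names and the 5% default threshold are hypothetical, and records are assumed to be plain dictionaries.

```python
def null_rates(records, fields):
    """Fraction of records where each field is missing, None, or empty."""
    total = len(records)
    rates = {}
    for name in fields:
        missing = sum(1 for rec in records if rec.get(name) in (None, "", []))
        rates[name] = missing / total if total else 0.0
    return rates


def failing_fields(rates, threshold=0.05):
    """Names of fields whose null rate exceeds the (assumed) threshold."""
    return sorted(name for name, rate in rates.items() if rate > threshold)
```

A run could be held back from delivery whenever `failing_fields` returns a non-empty list for the fields a client has marked as required.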

Delivery (ongoing)

JSON / CSV / Parquet pushed to your S3 bucket or data warehouse on your defined schedule.

Under the hood

Overcoming editorial scraping challenges

Editorial platforms like Domus present unique extraction hurdles. Here is how we normalise unstructured design content.

pipeline-monitor · domus.it · live · active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued · running
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
Unstructured text
NLP-assisted metadata extraction

Editorial articles often bury project specifications within narrative paragraphs. We use custom parsing logic to extract structured entities — like area, materials, and completion year — from unstructured body text.
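To make the idea concrete, here is a small pattern-matching sketch that pulls area, completion year, and materials out of a sentence. It is not the production parser: the regular expressions, the `MATERIALS` vocabulary, and the `extract_specs` function are hypothetical, and the number pattern assumes commas are thousands separators.

```python
import re

# Area such as "40,000 sqm", "850 m2", or "1200.5 square metres".
AREA_RE = re.compile(
    r"((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?)\s*"
    r"(?:sqm|m2|m²|square met(?:re|er)s)(?!\w)",
    re.IGNORECASE,
)
# A four-digit year shortly after a completion keyword.
YEAR_RE = re.compile(
    r"(?:completed|completion|built|finished)\D{0,20}((?:19|20)\d{2})",
    re.IGNORECASE,
)
# Hypothetical vocabulary; a real list would be much longer.
MATERIALS = ("concrete", "glass", "steel", "timber", "brick", "marble")


def extract_specs(text: str) -> dict:
    """Extract a few structured fields from narrative body text."""
    area = AREA_RE.search(text)
    year = YEAR_RE.search(text)
    lowered = text.lower()
    materials = [m for m in MATERIALS if re.search(rf"\b{m}\b", lowered)]
    return {
        "area_sqm": float(area.group(1).replace(",", "")) if area else None,
        "completion_year": int(year.group(1)) if year else None,
        "primary_materials": materials,
    }
```

Fields that cannot be found come back as `None` or an empty list, so they show up in null-rate checks instead of being silently guessed.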

Media galleries
Deep linking for high-res assets

Domus uses lazy-loaded image carousels. Our Playwright integration triggers gallery interactions to expose and capture the underlying high-resolution image URLs and floor plan PDFs.
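Once a gallery has rendered, choosing the high-resolution asset is often a matter of reading the image's `srcset`. The sketch below assumes the gallery images expose a `srcset` attribute with width descriptors, which is an assumption about the page markup; `best_from_srcset` is a hypothetical helper. With Playwright, the attribute value could be read with `locator.get_attribute("srcset")` before being passed in.

```python
def best_from_srcset(srcset: str):
    """Return the URL with the largest width descriptor, or None if empty.

    Assumes candidate URLs contain no commas, so splitting on "," is safe.
    """
    best_url, best_width = None, -1
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue
        url = parts[0]
        width = 0  # candidates without a width descriptor rank lowest
        if len(parts) > 1 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            width = int(parts[1][:-1])
        if width > best_width:
            best_url, best_width = url, width
    return best_url
```

The placeholder URLs used to try this out (for instance on `example.com`) stand in for real gallery sources.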

Bilingual architecture
Language toggle resolution

Articles frequently exist in both Italian and English under different URL structures. We map these variants to prevent duplicate records and ensure consistent language delivery.
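The mapping step can be sketched as grouping records by a shared key and then picking one variant per group. This is a simplified illustration: the `pair_key` field is hypothetical (it might, for example, be derived from a page's alternate-language link), and `pair_variants` and `pick_language` are not part of any real API.

```python
from collections import defaultdict


def pair_variants(records):
    """Group records by pair_key, indexed by language_code within a group."""
    groups = defaultdict(dict)
    for rec in records:
        groups[rec["pair_key"]][rec["language_code"]] = rec
    return dict(groups)


def pick_language(groups, preferred="en", fallback="it"):
    """Return one record per group, preferring one language over the other."""
    chosen = []
    for key in sorted(groups):
        variants = groups[key]
        rec = variants.get(preferred) or variants.get(fallback)
        if rec is not None:
            chosen.append(rec)
    return chosen
```

An article that exists only in Italian still appears once in the output, while an article available in both languages is delivered only in the preferred one.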

DOM variability
Resilient editorial selectors

Magazine layouts change frequently for special features. We use multiple fallback selectors to ensure data continuity even when Domus publishes custom-designed editorial pieces.
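The fallback idea amounts to trying an ordered list of selectors and keeping the first one that yields text. The sketch below uses the standard library's `xml.etree.ElementTree` on well-formed markup purely for illustration; a Scrapy-based crawler would more likely use its own CSS or XPath selectors, and the paths in `TITLE_PATHS` are hypothetical rather than taken from Domus pages.

```python
import xml.etree.ElementTree as ET

# Ordered from most specific to most generic; all paths are assumptions.
TITLE_PATHS = [
    ".//h1[@class='project-title']",  # hypothetical standard layout
    ".//header/h1",                   # hypothetical special-feature layout
    ".//title",                       # last resort: the page title
]


def first_text(root, paths):
    """Return (text, path) for the first path that yields non-empty text."""
    for path in paths:
        node = root.find(path)
        if node is not None and node.text and node.text.strip():
            return node.text.strip(), path
    return None, None
```

Returning the path that matched makes it possible to log how often each fallback fires, which is a useful early signal that a layout has changed.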

Pagination
Infinite scroll handling

Category pages use JavaScript-driven infinite scroll. We execute headless browser sessions to simulate user scrolling, ensuring complete historical archive capture without missing records.
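The scrolling logic itself is independent of the browser: keep scrolling until the number of loaded items stops growing. The sketch below shows that loop with two injected callables so it can run without a browser; `scroll_until_stable` and `FakeFeed` are hypothetical names. In a Playwright session the callables might wrap `page.mouse.wheel(0, 2000)` and `page.locator(selector).count()`, with the selector depending on the actual page markup.

```python
def scroll_until_stable(count_items, scroll_once, max_rounds=50, patience=2):
    """Scroll until the item count stops growing for `patience` rounds."""
    last = count_items()
    idle = 0
    for _ in range(max_rounds):
        if idle >= patience:
            break
        scroll_once()
        current = count_items()
        if current > last:
            last, idle = current, 0
        else:
            idle += 1
    return last


class FakeFeed:
    """Stand-in for a category page that loads `page_size` items per scroll."""

    def __init__(self, total, page_size):
        self.total = total
        self.page_size = page_size
        self.loaded = min(page_size, total)

    def count(self):
        return self.loaded

    def scroll(self):
        self.loaded = min(self.loaded + self.page_size, self.total)
```

The `patience` parameter guards against stopping too early when a single scroll happens to load nothing, and `max_rounds` bounds the session if a feed never settles.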

Applications

Who uses Domus data

Teams across industries use domus.it data to build competitive products and smarter operations.

01
Trend Analysis & Forecasting

Design agencies analyse material usage and stylistic keywords across thousands of projects to forecast upcoming architectural trends.

02
Material Sourcing Research

Procurement teams identify frequently specified materials and manufacturers in high-end commercial and residential projects.

03
Academic & Urban Research

Universities build longitudinal datasets of urban development and architectural evolution across specific cities or decades.

04
Competitive Intelligence

Architecture studios track publication frequency, client types, and project scales of competing firms.

05
AI Training Data

Machine learning teams use paired datasets of architectural imagery and descriptive text to train generative design models.

06
Event & Exhibition Tracking

Industry professionals aggregate global design events, curatorial themes, and participating artists for market research.

Why DataFlirt

"Domus contains a century of architectural history and design evolution, but extracting structured metadata from editorial layouts requires precision."

Editorial platforms like Domus present unique scraping challenges: unstructured text, embedded media galleries, and bilingual content layers. DataFlirt normalises this editorial sprawl into strictly typed schemas so your research teams can focus on spatial analysis rather than DOM parsing.

Technical Spec

Domus scraper — technical capabilities

Everything supported by our domus.it scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.

Bilingual scraping (IT/EN): Automatic detection and mapping of dual-language article variants · Supported
High-res image extraction: Capture of gallery source URLs bypassing thumbnail compression · Supported
Infinite scroll handling: Playwright execution to load all historical articles in category feeds · Supported
Floor plan PDF capture: Extraction of technical drawings linked within project articles · Supported
Historical archive access: Parsing of digitised articles from the Domus historical catalogue · Supported
Geolocation mapping: Normalisation of project locations into queryable city/country fields · Supported
Webhook delivery: HTTP POST delivery upon pipeline completion · Supported
Premium subscriber articles: Extraction of articles gated behind the Domus paywall · Partial
Domus+ Digital Magazine PDFs: Direct download of full digital magazine issues (requires active subscription) · Partial
Infrastructure

Infrastructure powering the Domus pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Scrapy + Playwright Integration

We combine Scrapy's high-concurrency crawling with Playwright's JavaScript rendering to handle Domus's infinite scroll feeds and dynamic media galleries.

Media Asset Pipelines

Dedicated infrastructure for extracting, validating, and optionally downloading high-resolution architectural photography and technical floor plans.

Cloud-Native Delivery

Pipelines run on Kubernetes clusters with Airflow orchestration, ensuring reliable delivery to your S3 buckets or PostgreSQL databases on schedule.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON: Nested structures ideal for complex editorial metadata
CSV: Flat files for immediate spreadsheet analysis
XLS: Excel format with basic typing applied
Parquet: Columnar storage for efficient warehouse querying
AWS S3: Direct upload to your cloud storage buckets, compatible with any data lake
Webhook: Automated HTTP POST triggers upon run completion
API: Queryable REST endpoints for pipeline results
PostgreSQL: Direct upsert into your relational database schema
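The difference between the nested JSON and flat CSV outputs can be illustrated with a small flattening routine. This is a sketch under assumptions, not the delivery code: `flatten` and `to_csv` are hypothetical helpers, lists are joined with "; ", and nested objects become dotted column names.

```python
import csv
import io


def flatten(record, sep="; "):
    """Flatten nested dicts to dotted keys and join lists into one string."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, list):
            flat[key] = sep.join(str(v) for v in value)
        elif isinstance(value, dict):
            for sub_key, sub_value in flatten(value, sep).items():
                flat[f"{key}.{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def to_csv(records):
    """Render records as CSV text with a sorted, shared header row."""
    rows = [flatten(rec) for rec in records]
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Records that lack a column are written with an empty cell, so every row keeps the same shape for spreadsheet use.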
// faq

Common questions.

About domus.it scraping, legality, and pipeline operations.

Can you extract data from both Italian and English versions of Domus?

Yes. We can configure the pipeline to target a specific language preference or extract both, mapping equivalent articles to prevent duplication in your dataset.

How do you handle unstructured project details?

Domus presents project details in narrative form. Our parsers use pattern matching and custom selectors to pull specific entities, such as architect names, completion years, and materials, into structured fields.

Do you scrape the actual images or just the URLs?

Standard delivery includes direct URLs to the highest-resolution images available on the page. If required, we can configure a media pipeline to download and transfer the actual image files to your S3 bucket.

Can you access the Domus+ premium archive?

We only extract publicly available editorial content. Articles and digital magazine PDFs gated behind the Domus+ subscription wall are not supported, in keeping with the site's access restrictions.

How far back does the historical extraction go?

We can extract any article or project currently indexed and publicly accessible on the domus.it website, which includes extensive digitised historical archives.

What is the typical delivery cadence?

For editorial monitoring, weekly or monthly cadences are standard. We also perform one-off historical bulk extractions covering specific decades or architectural categories.

$ dataflirt scope --new-project --source=domus.it ready

Tell us what to extract. We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a complete historical archive of design projects or a weekly feed of new exhibitions — we build and manage the pipeline. Contact our engineering team to define your schema.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h