We extract chartered practice directories, Stirling Prize case studies, CPD course listings, and architect profiles from RIBA. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Chartered Practices objects from riba.org. All fields typed and schema-versioned.
"practice_id": "PR-84921", "name": "Foster + Partners", "city": "London", "postcode": "SW11 4AN", "staff_count": "100+", "specialisms": ["Commercial", "Masterplanning", "Transport"], "region": "London"
| # | practice_id | name | url | address | city | postcode |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Building Case Studies objects from riba.org. All fields typed and schema-versioned.
"project_id": "CS-9921", "title": "Elizabeth Line", "architect": "Grimshaw", "completion_date": "2022-05-24", "contract_value": 18900000000, "gross_internal_area": 45000, "awards_won": ["RIBA Stirling Prize 2024"]
| # | project_id | title | architect | client | location | completion_date |
|---|---|---|---|---|---|---|
Complete list of extractable fields for RIBA Awards objects from riba.org. All fields typed and schema-versioned.
"award_year": 2024, "award_name": "RIBA National Award", "project_name": "Chowdhury Walk", "architect_name": "Al-Jawad Pike", "region": "London", "building_type": "Residential", "winner_status": true
| # | award_year | award_name | project_name | architect_name | region | building_type |
|---|---|---|---|---|---|---|
Complete list of extractable fields for CPD Providers objects from riba.org. All fields typed and schema-versioned.
"provider_id": "CPD-442", "provider_name": "Kingspan Insulation", "course_title": "Fire Performance of Insulated Panel Systems", "format": "Webinar", "duration": "60 mins", "core_curriculum_topic": "Health, safety and wellbeing", "knowledge_level": "General Awareness"
| # | provider_id | provider_name | course_title | format | duration | core_curriculum_topic |
|---|---|---|---|---|---|---|
Complete list of extractable fields for RIBA Jobs objects from riba.org. All fields typed and schema-versioned.
"job_id": "JB-8831", "job_title": "Part 2 Architectural Assistant", "practice_name": "Haworth Tompkins", "location": "London", "salary_band": "£32,000 - £36,000", "contract_type": "Permanent", "remote_policy": "Hybrid"
| # | job_id | job_title | practice_name | location | salary_band | contract_type |
|---|---|---|---|---|---|---|
Our RIBA scraper captures structured data across the entire institute portal: practice directories, award histories, technical case studies, and recruitment data — parsed, cleaned, and normalised.
Extract full contact details, staff counts, specialisms, and regional affiliations for over 4,000 RIBA chartered practices.
Map Stirling Prize, Royal Gold Medal, and Regional Award winners to specific practices and project case studies.
Capture gross internal area (GIA), contract values, sustainability credentials, and client metadata from published projects.
Scrape the RIBA CPD provider network for course topics, delivery formats, and core curriculum alignment.
Monitor architectural hiring trends, salary bands, and remote working policies across the UK sector.
Map practices and projects to RIBA regional chapters and standard UK postcode districts.
Isolate practices by specific sectors: conservation, passivhaus, masterplanning, or commercial fit-out.
Extract text and metadata from public RIBA Plan of Work PDFs and technical guidance documents.
Run monthly diffs to identify newly chartered practices, address changes, or revoked memberships.
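As a rough illustration of the postcode-district mapping mentioned above, the sketch below splits a UK postcode into its area and district. The function name `split_postcode` and the simplified pattern are assumptions made for this example, not a description of the production pipeline, and special cases such as GIR 0AA are not handled.

```python
import re

# Simplified UK postcode pattern: area letters, district, then the inward code.
# Assumption: special cases (e.g. GIR 0AA, overseas territories) are out of scope.
POSTCODE_RE = re.compile(r"^([A-Z]{1,2})(\d[A-Z\d]?)\s*(\d[A-Z]{2})$")


def split_postcode(raw: str):
    """Return (area, district, normalised_postcode), or None if the value does not parse."""
    match = POSTCODE_RE.match(raw.strip().upper())
    if match is None:
        return None
    area, district_suffix, inward = match.groups()
    district = area + district_suffix
    return area, district, f"{district} {inward}"
```

For the sample record above, `split_postcode("SW11 4AN")` yields the area `SW` and the district `SW11`, which can then be joined to a regional lookup table.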
Brief in. Clean data out.
Select target datasets: practice directories, awards, case studies, or jobs. We design the extraction schema.
We configure Scrapy crawlers, handle pagination, and set up document parsing for case study metadata.
Schema validation, null-rate checks on contract values, and geographic standardisation before launch.
JSON / CSV / Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.
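The validation step in the process above (schema validation plus null-rate checks on contract values) could look roughly like the following. This is a minimal sketch: the schema dictionary, the function names, and any acceptance threshold are hypothetical, with field names borrowed from the sample case study record.

```python
# Hypothetical schema: field name -> expected Python type.
# Field names follow the sample Building Case Studies record; the real schema may differ.
CASE_STUDY_SCHEMA = {
    "project_id": str,
    "title": str,
    "completion_date": str,
    "contract_value": int,
    "gross_internal_area": int,
}


def validate_record(record, schema):
    """Return a list of type problems; an empty list means the record passes."""
    problems = []
    for field, expected in schema.items():
        value = record.get(field)
        if value is None:
            continue  # missing values are measured separately by null_rate()
        if not isinstance(value, expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


def null_rate(records, field):
    """Fraction of records in which the field is missing or null."""
    if not records:
        return 0.0
    missing = sum(1 for record in records if record.get(field) is None)
    return missing / len(records)
```

A run might be held back from delivery if, say, `null_rate(batch, "contract_value")` exceeds an agreed threshold; the threshold itself would be set per dataset.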
Extracting data from professional institutes requires handling legacy DOM structures, inconsistent user-submitted data, and nested document metadata.
We utilise UK-based residential proxies to distribute requests across the directory, preventing IP bans and rate-limiting from standard WAF configurations.
The RIBA website mixes legacy directory structures with modern React-based components. Our Playwright instances execute JavaScript to render dynamic search results and hydration states reliably.
Practice profiles are often filled inconsistently by members. We apply post-extraction regex and NLP to standardise addresses, phone formats, and staff count brackets into queryable fields.
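To make the regex side of that normalisation concrete, here is a small sketch that maps free-text staff counts to brackets and rewrites UK phone numbers in a single international format. The bracket boundaries and both function names are assumptions chosen for illustration; only the `100+` label comes from the sample record.

```python
import re

# Hypothetical bracket boundaries; the agreed schema would define the real ones.
BRACKETS = [(1, 10, "1-10"), (11, 50, "11-50"), (51, 99, "51-99")]
TOP_BRACKET = "100+"


def staff_bracket(raw: str):
    """Map free text such as 'approx. 25 staff' to a bracket label, or None."""
    numbers = [int(n) for n in re.findall(r"\d+", raw.replace(",", ""))]
    if not numbers:
        return None
    headcount = max(numbers)  # for a range like '20 to 30', use the upper figure
    for low, high, label in BRACKETS:
        if low <= headcount <= high:
            return label
    return TOP_BRACKET if headcount >= 100 else None


def normalise_uk_phone(raw: str):
    """Return the number as +44 followed by the national digits, or None if implausible."""
    digits = re.sub(r"\D", "", raw)
    if digits.startswith("0044"):
        digits = digits[4:]
    elif digits.startswith("44"):
        digits = digits[2:]
    digits = digits.lstrip("0")  # drop the trunk prefix, including a bracketed (0)
    if len(digits) not in (9, 10):
        return None
    return "+44" + digits
```

Values that fail to parse come back as `None` rather than a guess, so they surface in null-rate checks instead of silently polluting the field.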
Many technical case studies and CPD materials are hosted as PDFs. We route these through a dedicated document parsing microservice to extract text blocks, tables, and metadata alongside the web scraping run.
We maintain a hash index of all chartered practices. Monthly runs emit only diffs, so you can readily identify newly opened practices and existing practices whose status has changed.
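One way such a hash index and diff could work is sketched below. It assumes each record carries a stable `practice_id` key, as in the sample record; the function names and the choice of SHA-256 over canonical JSON are illustrative rather than a description of the actual implementation.

```python
import hashlib
import json


def record_hash(record):
    """Hash a record's canonical JSON form so that key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_index(records, key="practice_id"):
    """Map each record's identifier to the hash of its full contents."""
    return {record[key]: record_hash(record) for record in records}


def diff_indexes(previous, current):
    """Compare two indexes and list added, removed, and changed identifiers."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        key for key in current.keys() & previous.keys()
        if current[key] != previous[key]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

Storing only the identifier and hash keeps last month's index small, while the full records for `added` and `changed` identifiers are taken from the current run.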
Building material manufacturers use the practice directory to target architects based on their specific sector specialisms and regional location.
Industry analysts aggregate contract values and gross internal areas from case studies to track construction market health.
Agencies monitor RIBA Jobs to track hiring volume, salary band fluctuations, and demand for specific software skills like Revit or ArchiCAD.
Architectural practices track peer award wins, completed project metrics, and stated staff counts to benchmark their own market position.
Universities extract sustainability ratings and material choices from award-winning case studies to analyse trends in sustainable design.
Training providers analyse the CPD directory to identify gaps in the core curriculum and price their own courses competitively.
"The RIBA directory is the definitive graph of British architectural practice, but extracting structured project data requires traversing thousands of nested case studies."
Most teams underestimate the investment required. Reliable RIBA scraping means handling paginated directories, extracting nested PDF metadata, and standardising inconsistent practice formats. DataFlirt absorbs that complexity so your engineers can focus on the analysis, not the infrastructure.
Everything supported by our riba.org scraper: rendered SPA elements, rate-limit evasion, and more, limited to public, non-authenticated pages.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Scrapy handles the broad directory traversal and deduplication, while Playwright executes JavaScript on modern React components to ensure complete data capture.
Custom Python microservices parse RIBA Plan of Work PDFs and technical guidance documents, extracting structured text arrays alongside the primary HTML scrape.
Pipelines run on AWS Lambda and ECS. Airflow handles scheduling for monthly directory syncs, dependency management, and SLA alerting.
Data delivered to where your team already works — no new tooling required.
About riba.org scraping, legality, and pipeline operations.
Scraping publicly available information from the RIBA directory is generally permissible. DataFlirt targets only public, non-authenticated practice data, case studies, and jobs. We do not extract personal data of individual non-practising members or circumvent authentication walls.
Our crawlers traverse the entire directory tree using a combination of A-Z index scraping and regional filters, ensuring no chartered practice is missed during the extraction run.
Yes. Where published in the case study metadata or text body, we extract contract values, gross internal area (GIA), and completion dates, normalising them into standard numeric formats.
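As an illustration of that numeric normalisation, the sketch below converts published strings such as "£18.9bn" or "45,000 m²" into plain integers. The accepted suffixes, unit spellings, and function names are assumptions for this example; real case study text is likely to need more patterns.

```python
import re
from decimal import Decimal

# Assumed magnitude suffixes; extend as new spellings appear in the source text.
MULTIPLIERS = {"k": 10**3, "m": 10**6, "million": 10**6, "bn": 10**9, "billion": 10**9}

MONEY_RE = re.compile(
    r"£\s*(\d[\d,]*(?:\.\d+)?)\s*(billion|million|bn|m|k)?\b", re.IGNORECASE
)
AREA_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(?:m²|m2|sq\.?\s?m)", re.IGNORECASE)


def parse_contract_value(raw: str):
    """Return the first sterling amount in the text as whole pounds, or None."""
    match = MONEY_RE.search(raw)
    if match is None:
        return None
    number, suffix = match.groups()
    multiplier = MULTIPLIERS[suffix.lower()] if suffix else 1
    # Decimal avoids float rounding on values such as 18.9 * 10**9.
    return int(Decimal(number.replace(",", "")) * multiplier)


def parse_area_m2(raw: str):
    """Return the first area in square metres as an integer (truncated), or None."""
    match = AREA_RE.search(raw)
    if match is None:
        return None
    return int(Decimal(match.group(1).replace(",", "")))
```

With these rules, "£18.9bn" becomes `18900000000` and "45,000 m²" becomes `45000`, matching the numeric types shown in the sample case study record.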
Yes. We operate a secondary document parsing pipeline that can extract text blocks and tables from publicly linked PDFs on the RIBA domain.
For RIBA Jobs, we can configure daily or sub-daily pipelines to ensure you capture new postings immediately and track closing dates accurately.
Our smallest packages start with a one-off extraction of the chartered practice directory. For continuous monitoring of jobs or case studies, we price based on delivery frequency and schema complexity.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a one-off practice directory export or continuous monitoring of RIBA Jobs and new case studies — we scope, build, and operate the pipeline. Tell us what you need.