We extract Level 1 entity records and Level 2 ownership hierarchies from the Global Legal Entity Identifier Foundation. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Postgres on your cadence.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Level 1 Entity Data objects from gleif.org. All fields typed and schema-versioned.
"lei_code": "5493006MHB84DD0ZWV18", "legal_name": "DataFlirt Technologies Ltd", "legal_jurisdiction": "GB", "entity_status": "ACTIVE", "legal_form_code": "8FTB", "managing_lou": "EVK05KS7XY1DEII3R011"
| # | lei_code | legal_name | legal_jurisdiction | entity_status | legal_form_code | registration_date |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Level 2 Ownership Data objects from gleif.org. All fields typed and schema-versioned.
"child_lei": "5493006MHB84DD0ZWV18", "parent_lei": "549300O897ZC5R7BMG32", "relationship_type": "ULTIMATE_ACCOUNTING_CONSOLIDATING_PARENT", "relationship_status": "ACTIVE", "accounting_standard": "IFRS", "validation_sources": "FULLY_CORROBORATED"
| # | child_lei | parent_lei | relationship_type | relationship_status | start_date | end_date |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Registration Details objects from gleif.org. All fields typed and schema-versioned.
"lei_code": "5493006MHB84DD0ZWV18", "initial_registration_date": "2018-05-14T09:00:00Z", "next_renewal_date": "2025-05-14T09:00:00Z", "registration_status": "ISSUED", "managing_lou": "EVK05KS7XY1DEII3R011", "corroboration_level": "FULLY_CORROBORATED"
| # | lei_code | initial_registration_date | next_renewal_date | registration_status | managing_lou | validation_authority_id |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Address Information objects from gleif.org. All fields typed and schema-versioned.
"lei_code": "5493006MHB84DD0ZWV18", "legal_address_line1": "123 Tech Park", "legal_address_city": "London", "legal_address_country": "GB", "legal_address_postal_code": "EC1A 1BB", "hq_address_country": "GB"
| # | lei_code | legal_address_line1 | legal_address_city | legal_address_country | legal_address_postal_code | hq_address_line1 |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Event History objects from gleif.org. All fields typed and schema-versioned.
"lei_code": "5493006MHB84DD0ZWV18", "event_type": "ENTITY_NAME_CHANGE", "event_date": "2023-11-04T14:30:00Z", "event_status": "COMPLETED", "previous_name": "DataFlirt Inc", "new_name": "DataFlirt Technologies Ltd"
| # | lei_code | event_type | event_date | event_status | previous_name | new_name |
|---|---|---|---|---|---|---|
Our GLEIF scraper processes complex XML schemas and daily delta files, converting millions of nested LEI records and ownership hierarchies into flat, queryable tables.
Extract core entity data, legal forms, jurisdiction codes, and registration statuses for over 2.5 million global entities.
Resolve ultimate and direct parent relationships. We join parent and child LEIs to reconstruct corporate family trees.
Parse daily published delta files for incremental updates, ensuring your database reflects the latest corporate actions.
Convert deeply nested GLEIF Common Data File (CDF) XML records into flat relational tables or JSON objects.
Maintain changelogs for entity status changes, registration renewals, and corporate name updates over time.
Track managing Local Operating Units (LOUs) and validation authorities responsible for corroborating each LEI.
Split and normalise legal and headquarters address fields across different international formats and character sets.
Capture corporate actions, mergers, acquisitions, and name changes published in the GLEIF event logs.
Push updates daily to sync with GLEIF publication cycles, ensuring zero drift between your system and the global index.
Brief in. Clean data out.
Specify whether you need a full database sync or targeted extraction based on jurisdiction, legal form, or LOU.
We configure parsers for GLEIF XML schemas, set up daily delta processing, and implement relationship mapping logic.
Schema validation, null-rate checks, and relationship integrity tests ensure parent-child mappings are accurate.
JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Postgres database on a daily cadence.
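The null-rate check named in the QA step can be pictured with a minimal sketch. It is illustrative only, not production code: the function names and the per-column thresholds are assumptions made for this example, and the rows reuse the sample values shown on this page.

```python
def null_rates(rows, columns):
    """Fraction of rows where each column is missing, None, or an empty string."""
    total = len(rows)
    if total == 0:
        return {column: 0.0 for column in columns}
    return {
        column: sum(1 for row in rows if row.get(column) in (None, "")) / total
        for column in columns
    }


def failing_columns(rows, thresholds):
    """Return the columns whose null rate exceeds its (hypothetical) threshold."""
    rates = null_rates(rows, thresholds)
    return {column: rate for column, rate in rates.items() if rate > thresholds[column]}


rows = [
    {"lei_code": "5493006MHB84DD0ZWV18", "legal_form_code": "8FTB"},
    {"lei_code": "549300O897ZC5R7BMG32", "legal_form_code": None},
    {"lei_code": "5493006MHB84DD0ZWV18", "legal_form_code": ""},
    {"lei_code": "549300O897ZC5R7BMG32", "legal_form_code": "8FTB"},
]

# Assumed thresholds: lei_code may never be null, legal_form_code may be null in 25% of rows.
failures = failing_columns(rows, {"lei_code": 0.0, "legal_form_code": 0.25})
```

Here `failures` flags only `legal_form_code`, because half of its values are empty. A batch with any failing column would be held back rather than delivered.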
Processing GLEIF data requires parsing massive XML files and reconciling daily deltas. Here is how we maintain pipeline integrity.
GLEIF uses deeply nested XML formats (CDF) for entity and relationship data. We flatten these hierarchical structures into relational tables, handling varying schema versions and optional fields automatically.
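As a rough illustration of the flattening idea, the sketch below turns a nested record into one flat row keyed by dotted paths. The sample XML is a simplified, namespace-free stand-in written for this example (real CDF files are namespaced and far larger), and `flatten` is a hypothetical helper, not a GLEIF or DataFlirt API.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free sample record; an assumption made for illustration.
SAMPLE = """
<LEIRecord>
  <LEI>5493006MHB84DD0ZWV18</LEI>
  <Entity>
    <LegalName>DataFlirt Technologies Ltd</LegalName>
    <LegalJurisdiction>GB</LegalJurisdiction>
    <EntityStatus>ACTIVE</EntityStatus>
  </Entity>
</LEIRecord>
"""


def flatten(element, prefix=""):
    """Return {dotted.path: text} for every leaf element under `element`."""
    row = {}
    for child in element:
        key = f"{prefix}{child.tag}"
        if len(child):
            # The child has nested elements: recurse with an extended path.
            row.update(flatten(child, prefix=f"{key}."))
        else:
            # Leaf element: store its text, mapping empty text to None.
            row[key] = (child.text or "").strip() or None
    return row


record = flatten(ET.fromstring(SAMPLE.strip()))
```

Optional fields that are absent from the XML simply produce no key, so a downstream step can fill them with nulls against the target schema. Repeated sibling tags would overwrite each other in this sketch and need list handling in practice.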
Applying daily deltas requires strict state management to prevent data corruption. We process additions, modifications, and deletions in sequence, ensuring your local copy mirrors the authoritative index.
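One way to picture sequential delta application is the sketch below. It assumes each change carries an `op` of `ADD`, `MODIFY`, or `DELETE`; those operation names and the `apply_delta` helper are hypothetical and chosen for this example, not taken from the GLEIF delta format.

```python
def apply_delta(state, delta):
    """Apply one delta batch, in order, to a copy of `state` (a dict keyed by LEI).

    The input is never mutated, so a batch that fails part-way leaves the
    previous state intact.
    """
    new_state = dict(state)
    for change in delta:
        lei, op = change["lei_code"], change["op"]
        if op == "ADD":
            if lei in new_state:
                raise ValueError(f"ADD for an LEI that already exists: {lei}")
            new_state[lei] = change["record"]
        elif op == "MODIFY":
            if lei not in new_state:
                raise ValueError(f"MODIFY for an unknown LEI: {lei}")
            # Merge into a new dict so the original record object is untouched.
            new_state[lei] = {**new_state[lei], **change["record"]}
        elif op == "DELETE":
            if lei not in new_state:
                raise ValueError(f"DELETE for an unknown LEI: {lei}")
            del new_state[lei]
        else:
            raise ValueError(f"Unknown operation: {op}")
    return new_state


state = {"5493006MHB84DD0ZWV18": {"legal_name": "DataFlirt Inc", "entity_status": "ACTIVE"}}
delta = [
    {"op": "ADD", "lei_code": "549300O897ZC5R7BMG32", "record": {"entity_status": "ACTIVE"}},
    {"op": "MODIFY", "lei_code": "5493006MHB84DD0ZWV18",
     "record": {"legal_name": "DataFlirt Technologies Ltd"}},
    {"op": "DELETE", "lei_code": "549300O897ZC5R7BMG32"},
]
updated = apply_delta(state, delta)
```

Raising on an out-of-order change, instead of silently skipping it, is a deliberate choice here: a missed or duplicated delta file surfaces as an error rather than as quiet drift.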
Level 2 data splits parents and children across different files and records. We join these foreign keys in transit, validating relationship statuses and accounting standards before delivery.
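The join described above can be sketched as follows, using the field names from the sample records on this page. The parent's legal name, the all-zero LEI, and the `join_relationships` helper are hypothetical, introduced only for illustration.

```python
entities = {
    "5493006MHB84DD0ZWV18": {"legal_name": "DataFlirt Technologies Ltd"},
    "549300O897ZC5R7BMG32": {"legal_name": "Example Parent Holdings"},  # hypothetical name
}

relationships = [
    {"child_lei": "5493006MHB84DD0ZWV18", "parent_lei": "549300O897ZC5R7BMG32",
     "relationship_type": "ULTIMATE_ACCOUNTING_CONSOLIDATING_PARENT",
     "relationship_status": "ACTIVE"},
    # Hypothetical dangling reference: this parent is absent from the entity table.
    {"child_lei": "5493006MHB84DD0ZWV18", "parent_lei": "00000000000000000000",
     "relationship_type": "ULTIMATE_ACCOUNTING_CONSOLIDATING_PARENT",
     "relationship_status": "ACTIVE"},
]


def join_relationships(relationships, entities):
    """Attach entity names to active relationships; collect unresolved ones."""
    joined, orphans = [], []
    for rel in relationships:
        if rel["relationship_status"] != "ACTIVE":
            continue  # inactive relationships are left out of the current tree
        child = entities.get(rel["child_lei"])
        parent = entities.get(rel["parent_lei"])
        if child is None or parent is None:
            orphans.append(rel)  # foreign key does not resolve to a known entity
            continue
        joined.append({**rel,
                       "child_name": child["legal_name"],
                       "parent_name": parent["legal_name"]})
    return joined, orphans


joined, orphans = join_relationships(relationships, entities)
```

Keeping unresolved relationships in a separate `orphans` list, rather than dropping them, gives an integrity test something concrete to count.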
While GLEIF provides bulk files, API endpoints for real-time lookups throttle heavy requests. We manage concurrency, implement backoff strategies, and use proxy rotation for high-volume API queries.
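A backoff strategy of this kind might look like the sketch below, which covers exponential backoff with jitter only (not concurrency or proxy rotation). The `Throttled` exception, the `fetch_with_backoff` helper, and the delay values are assumptions for the example; the caller supplies the function that performs the request.

```python
import random
import time


class Throttled(Exception):
    """Raised by the caller's fetch function when the API signals throttling."""


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call `fetch()` and retry on Throttled with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


# Usage with a stand-in fetch function that is throttled twice before succeeding.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise Throttled()
    return "ok"


waits = []
result = fetch_with_backoff(flaky_fetch, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting, and the random jitter spreads retries out so that parallel workers do not all retry at the same moment.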
Address formats vary globally across millions of entities. We apply standardisation rules during extraction, separating street, city, region, and postal codes into consistent typed columns.
Automate counterparty identification and verification using authoritative LEI data to meet global regulatory standards.
Cleanse, deduplicate, and append LEI codes to internal vendor databases to maintain accurate corporate records.
Map ultimate parent entities and subsidiary hierarchies to calculate aggregate exposure across complex corporate groups.
Fulfil MiFID II, EMIR, and Dodd-Frank reporting requirements with verified, up-to-date entity identifiers.
Link disparate market data feeds, credit ratings, and financial statements using the LEI as the primary key.
Trace corporate ownership across global supplier networks to identify concentration risks and geopolitical exposure.
"GLEIF provides the most authoritative corporate identity graph in the world, but navigating the nested XML and relationship mapping requires heavy engineering."
Most teams underestimate the complexity of parsing Level 2 ownership data. Resolving direct and ultimate parent relationships across millions of entities requires strict state management and daily delta reconciliation. DataFlirt handles the extraction and mapping so your compliance systems receive flat, queryable records.
Everything supported by our gleif.org scraper — rendered SPA elements, auth walls, rate-limit evasion and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Scrapy handles API orchestration, file downloading, and retry logic. Playwright handles any JavaScript-rendered search interfaces or portal interactions required for supplementary data.
We maintain pools of residential ISP proxies to handle high-volume API requests, bypassing rate limits and IP blocks during intensive historical data backfills.
Pipelines run on AWS Lambda and ECS. Airflow handles the complex dependency graphs required for processing daily deltas and reconciling Level 2 relationship mappings.
Data delivered to where your team already works — no new tooling required.
About gleif.org scraping, legality, and pipeline operations.
Yes. GLEIF data is public domain and intended for global open access. The foundation operates under an open data policy to promote transparency in global financial markets. We extract this public data in compliance with their terms of use.
We monitor GLEIF publication schedules and process the daily delta files every 24 hours. Our pipeline applies additions, modifications, and deletions sequentially to keep your database synchronised with the global index.
Yes. We extract Level 2 Relationship Record (RR) files and resolve the foreign keys against Level 1 entity data. This provides a complete corporate hierarchy, identifying both direct and ultimate accounting consolidating parents.
No. We handle the complex parsing of GLEIF Common Data File (CDF) XML files. We flatten the nested elements and deliver clean, typed formats like JSON, CSV, or Parquet directly to your warehouse.
All public fields are included: LEI code, legal name, jurisdiction, entity status, legal form, registration dates, managing LOU, legal addresses, headquarters addresses, and Level 2 relationship types.
A full sync of all 2.5 million+ LEI records and their associated relationship mappings typically completes within 12 hours. Subsequent daily delta updates are processed and delivered within minutes of publication.
20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a full historical sync of global LEI records or a continuous daily delta feed for compliance monitoring — we scope, build, and operate the pipeline.