What is Data Enrichment API?

A Data Enrichment API is an endpoint that accepts a partial record—like a company name, an email address, or a bare domain—and returns a fully populated profile by joining it against a master dataset. For data pipelines, it's the bridge between raw scraped identifiers and actionable business intelligence. The difference between a basic lookup and a production-grade enrichment service lies in match rate, data freshness, and the latency of the underlying entity resolution engine.

Data Business · Entity Resolution · Identity Graph · API · B2B Data
// 02 — definitions

Partial in, profile out.

How incomplete scraped records are transformed into high-value datasets through real-time identity resolution and data appending.

TL;DR

A Data Enrichment API takes a sparse input (like a domain or LinkedIn URL) and appends firmographic, technographic, or contact data. It relies on massive, pre-computed identity graphs. High-quality enrichment isn't just about database size; it's about the probabilistic matching logic that correctly links "Acme Corp" to "Acme Inc" without returning false positives.

01 Definition & structure
A Data Enrichment API is a programmatic interface that accepts incomplete data points—such as an email address, a company name, or an IP address—and returns a comprehensive profile. Behind the API sits an identity graph: a massive, pre-joined database of entities and their attributes. When a query is received, the engine normalises the input, calculates similarity scores against known entities, and returns the appended data if the confidence threshold is met.
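The normalise, score, and threshold steps described above can be sketched roughly as follows. This is a minimal illustration, not DataFlirt's engine: the suffix list, the function names, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example. The 0.85 default mirrors the cutoff quoted elsewhere on this page.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical legal-suffix list; a real engine would use a far larger one.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "pvt",
                  "private", "limited", "co"}


def normalise(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def best_match(query: str, known: dict[str, str],
               threshold: float = 0.85) -> Optional[tuple[str, float]]:
    """Return (entity_id, score) for the closest known name, or None.

    `known` maps entity IDs to names. Returning None below the threshold
    is the "not found rather than false positive" behaviour.
    """
    q = normalise(query)
    scored = [(eid, SequenceMatcher(None, q, normalise(n)).ratio())
              for eid, n in known.items()]
    if not scored:
        return None
    eid, score = max(scored, key=lambda pair: pair[1])
    return (eid, score) if score >= threshold else None
```

With this sketch, "Acme Corp" and "Acme Inc" both normalise to "acme" and therefore match each other exactly, while an unrelated name falls below the threshold and resolves to nothing.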
02 How it works in practice
In a typical B2B pipeline, a scraper might extract a list of domains from a conference sponsor page. The pipeline sends these bare domains to the enrichment API. The API looks up each domain, resolves it to a corporate entity, and returns structured JSON containing the company's legal name, employee count, estimated revenue, industry tags, and headquarters location. This transforms a simple list of URLs into a targetable lead list.
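As a rough sketch of the pipeline step above, the snippet below turns a scraped list of domains into one POST request per unique domain. The host and path are taken from the trace on this page; the `https` scheme, the bearer-token header, and the domain-only payload shape are assumptions, so check the actual API reference before relying on them. Nothing is sent over the network here.

```python
import json
import urllib.request

# Assumption: host and path come from the trace on this page; scheme,
# auth header, and payload shape are hypothetical.
ENRICH_URL = "https://edge.dataflirt.io/v1/enrich/company"


def build_enrich_request(domain: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single bare domain."""
    payload = {"domain": domain.strip().lower()}
    return urllib.request.Request(
        ENRICH_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
        },
    )


def build_batch(domains: list[str], api_key: str) -> list[urllib.request.Request]:
    """Deduplicate scraped domains (order-preserving) and build one request each."""
    unique = dict.fromkeys(d.strip().lower() for d in domains if d.strip())
    return [build_enrich_request(d, api_key) for d in unique]
```

Each request could then be passed to `urllib.request.urlopen` and the JSON body decoded. Deduplicating before sending matters because scraped sponsor lists often repeat the same domain with different casing or whitespace.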
03 The match rate vs. precision tradeoff
The hardest problem in enrichment is entity resolution. If you query "Apple", do you mean Apple Inc. (tech), Apple Corps (music), or a local orchard? A naive API will return the largest company matching the string, boosting its match rate but destroying precision. Production-grade APIs require secondary signals (like a domain or location) and will intentionally return a "not found" rather than a false positive. High precision is always more valuable than a high match rate.
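The "refuse rather than guess" rule can be illustrated with a small resolver. This is a sketch under stated assumptions: the `Candidate` type, the substring name filter, and the `.example` domains are invented for illustration and do not describe any real identity graph.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    entity_id: str
    name: str
    domain: str
    country: str  # ISO 3166-1 alpha-2 code


def resolve(name: str, candidates: Sequence[Candidate],
            domain: Optional[str] = None,
            country: Optional[str] = None) -> Optional[Candidate]:
    """Return one candidate only when a secondary signal isolates it."""
    if domain is None and country is None:
        return None  # no secondary signal: refuse rather than guess
    pool = [c for c in candidates if name.lower() in c.name.lower()]
    if domain is not None:
        pool = [c for c in pool if c.domain == domain.lower()]
    if country is not None:
        pool = [c for c in pool if c.country == country.upper()]
    # Zero or several survivors both count as "not found".
    return pool[0] if len(pool) == 1 else None
```

A name-only query returns `None` even when candidates exist, which lowers the match rate but keeps false positives out of the result set. A naive resolver that returned the largest candidate would do the opposite.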
04 How DataFlirt handles it
We operate a hybrid enrichment engine. Our primary layer is a 140M+ record identity graph built from our continuous web crawls. When a query hits our API, we check the graph first. If the entity is missing or its data is older than 30 days, and the query includes a valid domain, we dispatch a headless worker to scrape the target site, extract the firmographics, update the graph, and return the fresh data, typically in about 800 milliseconds.
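The cache-or-scrape decision reads roughly like the sketch below. It assumes, purely for illustration, that the graph is a dictionary keyed by domain, that each record carries a `fetched_at` timestamp, and that `scrape` is a caller-supplied function standing in for the headless worker. None of these names come from DataFlirt's implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

MAX_AGE = timedelta(days=30)  # staleness window quoted in the text


def enrich(domain: str, graph: dict[str, dict],
           scrape: Callable[[str], dict],
           now: Optional[datetime] = None) -> dict:
    """Serve a fresh cached record, otherwise scrape, upsert, and return."""
    now = now or datetime.now(timezone.utc)
    record = graph.get(domain)
    if record is not None and now - record["fetched_at"] <= MAX_AGE:
        return record  # graph hit with fresh data
    # Graph miss or stale data: fall back to a live scrape of the domain.
    fresh = {**scrape(domain), "fetched_at": now}
    graph[domain] = fresh  # upsert so the next query is a cache hit
    return fresh
```

Passing `scrape` and `now` in as parameters keeps the decision logic testable without a network or a real clock.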
05 The cost of bad enrichment
False positives in enrichment compound downstream. If an API incorrectly resolves a small local business to a Fortune 500 enterprise, your automated pipeline might route that record to an enterprise sales team, trigger the wrong marketing sequence, or skew your market analytics. The operational cost of cleaning up hallucinated enrichment data usually dwarfs the API usage fees.
// 03 — enrichment metrics

Measuring enrichment quality.

Enrichment APIs are evaluated on match rate, precision, and latency. DataFlirt tracks these metrics continuously across our identity graph to ensure downstream pipelines aren't polluted with hallucinated or mismatched records.

Match Rate: M = successful_resolutions / total_queries
A 40–60% match rate is typical for B2B. Anything over 80% usually indicates loose matching logic and high false positives. [Industry Standard]

Precision (Accuracy): P = true_positives / (true_positives + false_positives)
The percentage of returned records that actually belong to the queried entity. Crucial for automated pipelines. [Information Retrieval]

Effective Cost per Record: C_eff = API_cost / (queries × M × P)
You pay for queries, but you only extract value from accurate matches. Low precision drastically inflates effective cost. [DataFlirt Pipeline Economics]
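A short worked example shows how the three formulas combine. The volumes and the 500-unit spend below are invented figures for illustration only, and the function names are not part of any API.

```python
def match_rate(successful_resolutions: int, total_queries: int) -> float:
    """M = successful_resolutions / total_queries"""
    return successful_resolutions / total_queries


def precision(true_positives: int, false_positives: int) -> float:
    """P = true_positives / (true_positives + false_positives)"""
    return true_positives / (true_positives + false_positives)


def effective_cost(api_cost: float, queries: int, m: float, p: float) -> float:
    """C_eff = API_cost / (queries * M * P)"""
    return api_cost / (queries * m * p)


# Hypothetical month: 10,000 queries costing 500 in total,
# 6,000 resolved, of which 5,400 were correct and 600 were wrong.
m = match_rate(6_000, 10_000)            # 0.6
p = precision(5_400, 600)                # 0.9
c_eff = effective_cost(500.0, 10_000, m, p)
nominal = 500.0 / 10_000                 # 0.05 per query

print(f"M={m:.2f} P={p:.2f} nominal={nominal:.3f} effective={c_eff:.3f}")
```

In this made-up case the nominal price is 0.05 per query, but each accurate record effectively costs about 0.093, since only 5,400 of the 10,000 paid queries produced a usable result.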
// 04 — api trace

Resolving a sparse company record.

A live trace of a B2B pipeline sending a raw scraped company name and domain to the DataFlirt enrichment endpoint. The engine normalises the input, traverses the identity graph, and appends firmographics.

REST API · JSON · Entity Resolution
edge.dataflirt.io — live
CAPTURED
// POST /v1/enrich/company
request.payload: {
"name": "DataFlirt Tech Pvt Ltd",
"domain": "dataflirt.com"
}

// engine execution
step.normalize: "dataflirt tech" // stripped legal entity
step.resolve: match_found "df_org_8f92a1"
step.confidence: 0.98 // exact domain match

// response payload
response.status: 200 OK
data.id: "df_org_8f92a1"
data.legal_name: "DataFlirt Technologies Private Limited"
data.headquarters: "Bengaluru, Karnataka, IN"
data.employee_count: 42
data.technographics: ["Astro", "ClickHouse", "Playwright"]
meta.latency_ms: 48
// 05 — failure modes

Why enrichment queries fail.

Ranked by share of failed or rejected matches across DataFlirt's enrichment endpoints. Poor input normalisation and ambiguous entity names are the leading causes of missed matches.

Queries analysed: 1.2B (trailing 30 days)
Average match rate: 64.2%
Updated: 2026-05-19
01 Input ambiguity: 38% of misses · Common names ('Acme') without domain or location context
02 Entity not in graph: 27% of misses · Newly registered businesses or long-tail local shops
03 Domain mismatch: 18% of misses · Query uses product domain, graph uses corporate domain
04 Stale graph data: 11% of misses · Company rebranded or merged, breaking historical links
05 Strict confidence threshold: 6% of misses · Match found but scored below the 0.85 confidence cutoff
// 06 — our architecture

Cache when possible, scrape when necessary.

Traditional enrichment APIs fail when a company is too new or too niche to exist in their static database. DataFlirt's enrichment engine uses a hybrid approach: if an entity isn't in our 140M+ record identity graph and the query includes a valid domain, the API triggers a real-time micro-scrape of the target domain, extracts the firmographics on the fly, and returns the enriched record in roughly 800ms. A live business with a reachable domain does not come back as a stale 'not found'.

Hybrid Enrichment Lifecycle

Trace of an enrichment request triggering a real-time fallback scrape.

query.domain new-stealth-startup.ai
graph.lookup miss
fallback.trigger real-time micro-scrape
scrape.status 200 OK · 412ms
extract.firmographics success · 3 fields
graph.upsert entity created
api.response enriched record · 815ms

// 07 — FAQ

Common questions.

About match rates, entity resolution, real-time fallbacks, and how DataFlirt maintains high-precision enrichment at scale.

What is the difference between web scraping and data enrichment?
Web scraping is the mechanism of extracting raw data from websites. Data enrichment is the process of taking a piece of data (often obtained via scraping) and querying a structured database or API to append missing attributes. Scraping builds the graph; enrichment queries the graph.
What is considered a 'good' match rate?
For B2B company enrichment with domain names, expect 60–75%. For queries using only company names without location data, expect 30–40%. If a vendor promises a 95% match rate on name-only queries, they are likely using loose fuzzy matching, which guarantees a high rate of false positives. Precision is more important than raw match rate.
How do you handle entity resolution for common company names?
We require a secondary signal. If you query "Apex Solutions", the identity graph will return multiple candidates. To resolve to a single entity, the API requires either a domain name, a geographic location (city/country), or a LinkedIn URL. Without a secondary signal, we return a 404 rather than guessing and polluting your dataset.
How fresh is the data in the enrichment graph?
DataFlirt's core identity graph is updated continuously via our fleet of background crawlers. High-signal entities (public companies, major tech firms) are refreshed weekly. Long-tail entities are refreshed every 30–90 days. However, our real-time fallback mechanism ensures that if a domain is provided and the cached data is stale, we fetch live data on the fly.
Can I enrich records in bulk?
Yes. While the REST API is designed for real-time transactional enrichment (e.g., enriching a lead as it enters your CRM), we support bulk enrichment via CSV/Parquet uploads to S3. Bulk jobs bypass the HTTP overhead and process at roughly 10,000 records per second, delivering the enriched dataset back to your bucket.
How does DataFlirt price enrichment?
We charge per successful match, not per query. If you send us a sparse record and we cannot confidently resolve it to an entity in our graph (or via real-time fallback), you are not billed for that API call. This aligns our incentives with your data quality requirements.