SYSTEM operational · source mca.gov.in · queue 12,943 CINs · p99 latency 892ms · dataflirt.com · scraper/mca-gov
RUN: 41 active pipelines · mca.gov.in live

Indian corporate data,
without V3 portal downtime.

We extract Company Master Data, DIN networks, index of charges, and signatory details from the Ministry of Corporate Affairs (MCA) portal. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Postgres.

Companies tracked
2.1M
Director records
3.8M
Daily lookups
84,291 /24h
CAPTCHAs solved
112K /day
Uptime
99.91%
Data Dictionary

Every field we extract from mca.gov.in

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Company Master Data objects from mca.gov.in. All fields typed and schema-versioned.

cin, company_name, roc_code, registration_number, company_category, company_subcategory, class_of_company, authorised_capital, paid_up_capital, date_of_incorporation, registered_address, email_id, listing_status, active_compliance
company_master data
● 200 OK
{
  "cin": "L17110MH1973PLC019786",
  "company_name": "RELIANCE INDUSTRIES LIMITED",
  "roc_code": "RoC-Mumbai",
  "authorised_capital": 150000000000.0,
  "paid_up_capital": 67659900000.0,
  "date_of_incorporation": "1973-05-08",
  "listing_status": "Listed",
  "active_compliance": "ACTIVE"
}
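
The CIN in the sample above already encodes several facts about the company. A minimal sketch of splitting it into parts, assuming the standard 21-character CIN layout (listing flag, 5-digit industry code, 2-letter state code, 4-digit year, 3-letter company type, 6-digit registration number). `parse_cin` and `CinParts` are hypothetical names for illustration, not part of any DataFlirt API.

```python
import re
from typing import NamedTuple

# Assumed layout: L/U + industry code + state + year + company type + number.
CIN_PATTERN = re.compile(
    r"^(?P<listing>[LU])(?P<industry>\d{5})(?P<state>[A-Z]{2})"
    r"(?P<year>\d{4})(?P<ctype>[A-Z]{3})(?P<regno>\d{6})$"
)


class CinParts(NamedTuple):
    listed: bool
    industry_code: str
    state: str
    year: int
    company_type: str
    registration_number: str


def parse_cin(cin: str) -> CinParts:
    """Split a CIN into its components, or raise ValueError if malformed."""
    m = CIN_PATTERN.match(cin.strip().upper())
    if m is None:
        raise ValueError(f"not a valid CIN: {cin!r}")
    return CinParts(
        listed=m["listing"] == "L",
        industry_code=m["industry"],
        state=m["state"],
        year=int(m["year"]),
        company_type=m["ctype"],
        registration_number=m["regno"],
    )


parts = parse_cin("L17110MH1973PLC019786")
print(parts.state, parts.year, parts.listed)
```

A check like this is a cheap way to reject malformed identifiers in a CIN list before any lookup is attempted.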

Complete list of extractable fields for Director & Signatory objects from mca.gov.in. All fields typed and schema-versioned.

din, pan_status, full_name, designation, date_of_appointment, dsc_registered, din_status, surrender_status, nationality, current_companies
director_& signatory
● 200 OK
{
  "din": "00001695",
  "full_name": "MUKESH DHIRUBHAI AMBANI",
  "designation": "Managing Director",
  "date_of_appointment": "1977-04-01",
  "dsc_registered": true,
  "din_status": "Approved",
  "nationality": "India"
}

Complete list of extractable fields for Index of Charges objects from mca.gov.in. All fields typed and schema-versioned.

cin, charge_id, charge_holder_name, creation_date, modification_date, closure_date, amount, address, asset_type
index_of charges
● 200 OK
{
  "cin": "L17110MH1973PLC019786",
  "charge_id": "10002345",
  "charge_holder_name": "STATE BANK OF INDIA",
  "creation_date": "2015-06-12",
  "amount": 5000000000.0,
  "closure_date": null,
  "asset_type": "Movable property"
}

Complete list of extractable fields for LLP Master Data objects from mca.gov.in. All fields typed and schema-versioned.

llpin, llp_name, number_of_partners, number_of_designated_partners, roc_code, main_division_of_business, description_of_main_division, obligation_of_contribution, registered_address, email_id
llp_master data
● 200 OK
{
  "llpin": "AAA-1234",
  "llp_name": "TECH VENTURES LLP",
  "number_of_partners": 4,
  "number_of_designated_partners": 2,
  "roc_code": "RoC-Delhi",
  "obligation_of_contribution": 1000000.0,
  "registered_address": "Connaught Place, New Delhi"
}

Complete list of extractable fields for Filing History objects from mca.gov.in. All fields typed and schema-versioned.

cin, document_name, form_type, date_of_filing, srn, status, financial_year, receipt_number, filing_fee
filing_history
● 200 OK
{
  "cin": "L17110MH1973PLC019786",
  "document_name": "AOC-4",
  "form_type": "Financial Statement",
  "date_of_filing": "2023-10-15",
  "srn": "T12345678",
  "status": "Approved",
  "financial_year": "2022-2023"
}

Capabilities

Extract corporate truth directly from the source

Our MCA scraper navigates the V3 portal architecture, handles aggressive rate limits, and solves CAPTCHAs programmatically to deliver structured company intelligence.

Company Master Data Extraction

Retrieve CIN, RoC code, authorised capital, paid-up capital, and incorporation dates for any registered entity in India.

Director Network Mapping

Extract DIN details, appointment dates, and cross-directorships to map corporate ownership structures.
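
Cross-directorships fall out of a simple grouping once signatory records are available as (DIN, CIN) pairs. The sketch below is one possible way to do it, assuming records shaped like the director sample with a `cin` attached. `shared_directors` is a hypothetical helper and the identifiers are placeholders, not real MCA records.

```python
from collections import defaultdict
from itertools import combinations


def shared_directors(records):
    """Map each pair of companies to the set of DINs they have in common."""
    by_din = defaultdict(set)
    for rec in records:
        by_din[rec["din"]].add(rec["cin"])

    links = defaultdict(set)
    for din, cins in by_din.items():
        # Every pair of companies served by the same director is linked.
        for a, b in combinations(sorted(cins), 2):
            links[(a, b)].add(din)
    return dict(links)


# Placeholder identifiers for illustration only.
records = [
    {"din": "DIN-A", "cin": "CIN-1"},
    {"din": "DIN-A", "cin": "CIN-2"},
    {"din": "DIN-A", "cin": "CIN-3"},
    {"din": "DIN-B", "cin": "CIN-2"},
    {"din": "DIN-B", "cin": "CIN-3"},
]
links = shared_directors(records)
for pair, dins in sorted(links.items()):
    print(pair, sorted(dins))
```

The resulting pairs can be loaded as edges into whatever graph tooling a due-diligence team already uses.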

Charge & Debt Intelligence

Pull the complete index of charges including charge holder names, creation dates, modification dates, and debt amounts.

Strike Off & Compliance Tracking

Monitor active compliance status and track companies moved to strike-off or defunct status by the RoC.

LLP Data Harvesting

Extract LLPINs, partner counts, contribution obligations, and registered addresses for Limited Liability Partnerships.

V3 Portal Navigation

Programmatic handling of the MCA V3 portal's Angular client-side state, with session tokens and routing logic managed automatically.

Automated CAPTCHA Solving

High-throughput resolution of MCA image CAPTCHAs using integrated solving infrastructure.

IP Rotation & Session Management

Residential proxy pools prevent IP bans and manage session timeouts during large batch lookups.

Scheduled Bulk Lookups

Submit lists of thousands of CINs for batch processing on a daily, weekly, or monthly cadence.

Change Detection

Track changes in paid-up capital, director appointments, or registered addresses over time with automated diffing.
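
As a rough illustration of the diffing step, the sketch below compares two snapshots of the same company and reports only the watched fields that changed. The watched field names come from the Company Master Data dictionary; `diff_record` and the sample values are assumptions, not the production change-detection system.

```python
def diff_record(previous: dict, current: dict, watched: tuple[str, ...]) -> dict:
    """Return {field: (old, new)} for watched fields whose value changed."""
    return {
        field: (previous.get(field), current.get(field))
        for field in watched
        if previous.get(field) != current.get(field)
    }


WATCHED = ("paid_up_capital", "registered_address", "active_compliance")

# Placeholder snapshots from two hypothetical runs.
old = {"cin": "CIN-1", "paid_up_capital": 1000000.0, "active_compliance": "ACTIVE"}
new = {"cin": "CIN-1", "paid_up_capital": 2500000.0, "active_compliance": "ACTIVE"}

changes = diff_record(old, new, WATCHED)
print(changes)
```

An empty result means nothing changed between runs, so no diff record needs to be emitted for that CIN.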

// engagement pipeline

From CIN list to warehouse record

Brief in. Clean data out.

Define Scope
Day 0

Provide lists of CINs, DINs, or LLPINs. We design the extraction schema based on your data requirements.

Pipeline Build
Days 2–4

We configure Playwright crawlers to navigate the MCA V3 portal, manage sessions, and handle CAPTCHAs.

Validation & QA
Days 4–6

Schema validation, null-rate checks, and data normalisation rules are applied to clean inconsistent government records.
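
A null-rate check can be as small as counting empty values per field across a batch. The following is a minimal sketch under assumed inputs. `null_rates` and the 20% threshold are hypothetical, chosen only to make the example concrete.

```python
def null_rates(rows: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of rows in which each field is missing, None, or empty."""
    total = len(rows)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) in (None, "")) / total
        for field in fields
    }


MAX_NULL_RATE = 0.2  # hypothetical QA threshold

# Placeholder batch: one of four rows has no email address.
rows = [
    {"cin": "CIN-1", "email_id": "a@example.com"},
    {"cin": "CIN-2", "email_id": ""},
    {"cin": "CIN-3", "email_id": "c@example.com"},
    {"cin": "CIN-4", "email_id": "d@example.com"},
]
rates = null_rates(rows, ["cin", "email_id"])
flagged = [field for field, rate in rates.items() if rate > MAX_NULL_RATE]
print(rates, flagged)
```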

Delivery
ongoing

Clean JSON, CSV, or Parquet files pushed to your AWS S3 bucket or Postgres database on schedule.

Under the hood

Navigating the MCA V3 portal infrastructure

The Ministry of Corporate Affairs portal is notoriously unstable, heavily cached, and protected by aggressive rate limits. We handle the session state so you get clean data.

pipeline-monitor · mca.gov.in · live ● active
// fingerprinting
Identity rotation
TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0
// pagination
Page coverage
48,291 pages queued · running
// observability
Pipeline health
Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%
Alerts: 2
State management
Handling V3 Angular routing

The MCA V3 portal relies on complex client-side routing and temporary session tokens. Our Playwright infrastructure maintains browser state, intercepts XHR requests, and handles token expiration gracefully.
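
Handling token expiry usually comes down to refreshing a little before the deadline instead of waiting for a failed request. The sketch below shows that pattern in plain Python with an injectable clock. `TokenSession`, the 60-second lifetime, and the 5-second margin are assumptions for illustration, and no real portal endpoint or Playwright call is involved.

```python
import time
from typing import Callable, Optional


class TokenSession:
    """Caches a session token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], tuple[str, float]],  # returns (token, lifetime_s)
        clock: Callable[[], float] = time.monotonic,
        margin: float = 5.0,
    ) -> None:
        self._fetch = fetch_token
        self._clock = clock
        self._margin = margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def token(self) -> str:
        # Refresh when no token is cached or the safety margin is reached.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token, lifetime = self._fetch()
            self._expires_at = self._clock() + lifetime
        return self._token


# Simulated clock and token issuer, so the example runs without waiting.
now = [0.0]
issued: list[str] = []


def fake_fetch() -> tuple[str, float]:
    issued.append(f"tok-{len(issued) + 1}")
    return issued[-1], 60.0


session = TokenSession(fake_fetch, clock=lambda: now[0])
first = session.token()   # fetched at t=0, valid until t=60
now[0] = 30.0
second = session.token()  # still inside the safe window, reused
now[0] = 56.0
third = session.token()   # within 5s of expiry, refreshed
print(first, second, third)
```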

CAPTCHA resolution
Automated solving at scale

Public searches on the MCA portal require solving image CAPTCHAs for every query. We route these challenges through automated solving APIs with fallback queues, maintaining high throughput without manual intervention.
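
The fallback behaviour described here can be pictured as an ordered chain of providers. This is a generic sketch, assuming each provider is wrapped as a callable that takes image bytes and returns text. The two stub solvers are stand-ins and do not call any real solving API.

```python
from typing import Callable, Optional, Sequence

Solver = Callable[[bytes], Optional[str]]


def solve_with_fallback(image: bytes, solvers: Sequence[Solver]) -> Optional[str]:
    """Try each solver in order and return the first non-empty answer."""
    for solver in solvers:
        try:
            answer = solver(image)
        except Exception:
            continue  # a failing provider should not stop the chain
        if answer:
            return answer
    return None  # caller can requeue the lookup for a later attempt


# Stub providers for illustration only.
def primary(image: bytes) -> Optional[str]:
    raise TimeoutError("primary provider timed out")


def secondary(image: bytes) -> Optional[str]:
    return "X7K2P"


result = solve_with_fallback(b"placeholder-image", [primary, secondary])
unsolved = solve_with_fallback(b"placeholder-image", [primary])
print(result, unsolved)
```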

Rate limit evasion
Residential IP rotation

The portal aggressively blocks IPs that make concurrent requests. We distribute lookups across Indian residential proxy networks, pacing requests to mimic normal user behaviour and avoid blacklisting.
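
Pacing can be sketched as a schedule that pairs each lookup with a proxy and a randomised gap. The base delay, jitter, and proxy labels below are hypothetical values, not the settings used in production.

```python
import random
from itertools import cycle
from typing import Iterable, Iterator, Optional


def schedule(
    cins: Iterable[str],
    proxies: list[str],
    base_delay: float = 2.0,
    jitter: float = 1.0,
    seed: Optional[int] = None,
) -> Iterator[tuple[str, str, float]]:
    """Yield (cin, proxy, delay_seconds) with round-robin proxies and jitter."""
    rng = random.Random(seed)
    pool = cycle(proxies)
    for cin in cins:
        # Each gap is the base delay plus a random extra of up to `jitter`.
        yield cin, next(pool), base_delay + rng.uniform(0, jitter)


plan = list(schedule(["CIN-1", "CIN-2", "CIN-3"], ["proxy-a", "proxy-b"], seed=0))
for cin, proxy, delay in plan:
    print(cin, proxy, round(delay, 2))
```

A worker would sleep for `delay_seconds` before issuing each lookup through the paired proxy.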

Downtime resilience
Queuing during maintenance

Government portals experience frequent unscheduled downtime. Our Airflow orchestration layer detects 503 errors, pauses the pipeline, and queues pending CINs for processing once the portal returns online.
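
The pause-and-requeue behaviour can be illustrated without Airflow. The sketch below is a plain-Python stand-in, assuming a `fetch` callable that returns a status code and body. On a 503 it stops and leaves the failing CIN at the head of the queue for the next attempt. `drain` and `fake_fetch` are hypothetical names.

```python
from collections import deque


def drain(queue: deque, fetch):
    """Process queued CINs until the queue is empty or the portal returns 503."""
    done = {}
    while queue:
        cin = queue[0]
        status, body = fetch(cin)
        if status == 503:
            return done, True  # leave cin queued; caller pauses, then retries
        queue.popleft()
        done[cin] = body
    return done, False


# Simulated portal: CIN-2 hits a single 503 before succeeding.
outage = {"CIN-2"}


def fake_fetch(cin: str):
    if cin in outage:
        outage.discard(cin)
        return 503, None
    return 200, {"cin": cin}


pending = deque(["CIN-1", "CIN-2", "CIN-3"])
first, paused = drain(pending, fake_fetch)         # stops at CIN-2
second, paused_again = drain(pending, fake_fetch)  # resumes where it stopped
print(list(first), paused, list(second), paused_again)
```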

Data normalisation
Cleaning inconsistent records

MCA data often contains formatting errors, missing fields, and inconsistent date strings. We apply strict normalisation rules before delivery, ensuring your database ingests clean, typed data.
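
Date cleaning is a typical normalisation rule. Here is a minimal sketch that coerces a few assumed input variants to ISO 8601 and maps placeholder strings to a real null. The accepted formats and null tokens are illustrative guesses, not a catalogue of what the portal actually emits.

```python
from datetime import datetime
from typing import Optional

# Assumed input variants, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y", "%d %b %Y")
NULL_TOKENS = {"", "none", "null", "na", "n/a", "-"}


def normalise_date(raw) -> Optional[str]:
    """Return an ISO date string, None for empty values, or raise ValueError."""
    if raw is None or str(raw).strip().lower() in NULL_TOKENS:
        return None
    text = str(raw).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")


print(normalise_date("12/06/2015"))
print(normalise_date("None"))
```

Raising on unknown formats, instead of passing the string through, keeps malformed values from reaching a typed date column.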

Applications

Who uses MCA data

Teams across fintech, lending, private equity, sales, and procurement use mca.gov.in data for entity verification, risk assessment, and market intelligence.

01
B2B Onboarding & KYC

Fintechs and B2B platforms verify business entities, check active compliance status, and validate registered addresses during merchant onboarding.

02
Credit Risk & Lending

Banks and NBFCs query the index of charges to assess existing debt obligations before approving corporate loans.

03
Private Equity Due Diligence

Investment firms track director networks via DIN searches to map cross-directorships and identify potential conflicts of interest.

04
Lead Generation

Sales teams target newly incorporated companies or entities that have recently increased their paid-up capital.

05
Competitor Intelligence

Market analysts track capital changes, new director appointments, and statutory filings across competitor portfolios.

06
Supply Chain Verification

Enterprise procurement teams audit vendor compliance and strike-off status to mitigate supply chain risk.

Why DataFlirt

"The MCA database is the absolute ground truth for Indian corporate identity, but extracting that data requires fighting one of the most hostile government portals in existence."

Relying on manual MCA searches or unstable third-party APIs breaks internal workflows. DataFlirt builds dedicated extraction pipelines that navigate the V3 portal's Angular state, solve CAPTCHAs programmatically, and deliver structured JSON directly to your data warehouse.

Technical Spec

MCA scraper technical capabilities

Everything our mca.gov.in scraper supports, from rendered SPA elements to rate-limit handling, along with the items that are only partially available because the portal gates or masks them.

Feature | Details | Status
Company Master Data | Full extraction of CIN, capital details, and registration status | Supported
DIN & Signatory Details | Director names, appointment dates, and current directorships | Supported
Index of Charges | Complete debt registration history for any CIN | Supported
LLP Master Data | Partner details and contribution obligations for LLPs | Supported
V3 Portal CAPTCHA bypass | Automated resolution of search portal image challenges | Supported
Change detection | Compare historical runs to isolate capital or director changes | Supported
Daily incorporation feeds | Monitor specific RoCs for new company registrations | Supported
Strike-off status monitoring | Alerts when a tracked CIN moves to defunct status | Supported
Historical document downloads | MoA, AoA, and Form PDFs require authenticated paid accounts and OTPs | Partial
Director PAN & Aadhaar | Personally identifiable information is masked by the government portal | Partial
Infrastructure

Infrastructure built for government portals

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus
Headless Browser Automation

Playwright handles the complex JavaScript execution and session token management required by the MCA V3 portal architecture.

Proxy & Solver Pipeline

Integrated CapSolver infrastructure resolves image challenges while Indian residential proxies prevent IP-based rate limiting.

Cloud-Native Queueing

Redis and Apache Airflow manage request queues, automatically pausing and retrying batches when the MCA portal experiences downtime.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON
Nested schema containing company, director, and charge arrays
CSV
Flat files separated by data type for easy spreadsheet import
XLS
Excel workbooks with multiple tabs for compliance teams
Parquet
Columnar storage optimised for Athena and BigQuery ingestion
AWS S3
Direct delivery to your cloud storage buckets on schedule
Webhook
HTTP POST delivery for real-time KYC verification flows
API
Query our cached database for instant CIN lookups
PostgreSQL
Direct database upserts with primary key conflict resolution
S3
Direct bucket delivery — compatible with any data lake
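
The PostgreSQL entry above mentions upserts with primary key conflict resolution. A minimal sketch of that pattern follows, using Python's built-in `sqlite3` as a stand-in so it runs anywhere. The `ON CONFLICT ... DO UPDATE` clause has the same shape in PostgreSQL. The `company_master` table and its columns are assumptions based on the field list, not the actual delivery schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company_master ("
    "cin TEXT PRIMARY KEY, company_name TEXT, paid_up_capital REAL)"
)

UPSERT = """
INSERT INTO company_master (cin, company_name, paid_up_capital)
VALUES (:cin, :company_name, :paid_up_capital)
ON CONFLICT (cin) DO UPDATE SET
    company_name = excluded.company_name,
    paid_up_capital = excluded.paid_up_capital
"""


def upsert(rows: list[dict]) -> None:
    """Insert new CINs and overwrite existing ones in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)


# Two runs for the same placeholder CIN; the second overwrites the first.
upsert([{"cin": "CIN-1", "company_name": "EXAMPLE LTD", "paid_up_capital": 1000000.0}])
upsert([{"cin": "CIN-1", "company_name": "EXAMPLE LTD", "paid_up_capital": 2500000.0}])

stored = conn.execute("SELECT cin, paid_up_capital FROM company_master").fetchall()
print(stored)
```

Keying on `cin` means repeated deliveries update rows in place, so scheduled runs never create duplicates.
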
// faq

Common questions.

About mca.gov.in scraping, legality, and pipeline operations.

Is it legal to scrape mca.gov.in?

Yes. We extract strictly public, non-authenticated data available on the MCA master data search portal. We do not bypass authentication walls or access private filings. Clients should consult their legal counsel regarding specific data storage and usage policies.

How do you handle MCA V3 downtime?

The MCA portal frequently experiences maintenance and timeouts. Our Airflow orchestration detects these outages, pauses the active pipeline, and queues pending requests in Redis. Processing resumes automatically once the portal stabilises.

Can you download MoA and AoA documents?

No. Downloading actual filing PDFs requires a registered MCA user account, payment of government fees, and OTP verification. We only extract the structured metadata available on the public search pages.

Do you provide a daily feed of new incorporations?

Yes. We can configure pipelines to monitor specific Registrars of Companies (RoCs) and deliver daily lists of newly incorporated CINs, including their initial authorised capital and registered addresses.

How fast can you process 100,000 CINs?

Because of the MCA portal's strict rate limits and CAPTCHA requirements, we process large batches with controlled concurrency. A batch of 100,000 CINs typically completes within 48 to 72 hours; this pacing keeps success rates high and avoids IP bans.
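
To put the 48 to 72 hour window in perspective, the arithmetic below converts it into an aggregate request rate. It is a back-of-the-envelope calculation from the figures quoted in this answer, not a measured benchmark, and `required_rate` is a hypothetical helper.

```python
def required_rate(total: int, hours: float) -> tuple[float, float]:
    """Return (lookups per hour, seconds between lookups) across all workers."""
    per_hour = total / hours
    return per_hour, 3600 / per_hour


fast = required_rate(100_000, 48)
slow = required_rate(100_000, 72)
print(f"48h: {fast[0]:.0f} lookups/hour, one every {fast[1]:.2f}s")
print(f"72h: {slow[0]:.0f} lookups/hour, one every {slow[1]:.2f}s")
```

These intervals are totals across the whole pipeline. With several concurrent sessions, each individual session leaves a proportionally longer gap between its own requests.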

Can you track changes in a company's paid-up capital?

Yes. By scheduling recurring lookups for a specific list of CINs, our change-detection system compares the current run against historical data and emits a diff record if the paid-up capital or active status has changed.

Do you extract Director Identification Numbers (DIN)?

Yes. We extract public signatory details, which include the DIN, full name, designation, and appointment date for all directors associated with a given CIN.

$ dataflirt scope --new-project --source=mca.gov.in ready

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Stop fighting the V3 portal. Give us your CIN lists, and we will deliver structured company intelligence directly to your database.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h