
Indian corporate data,
without V3 portal downtime.

We extract Company Master Data, DIN networks, index of charges, and signatory details from the Ministry of Corporate Affairs. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Postgres.


Companies tracked: 2.1M
Director records: 3.8M
Daily lookups: 84,291 (last 24h)
CAPTCHAs solved: 112K per day
Uptime: 99.91%

◆ Company Master Data ◆ DIN Search ◆ Signatory Details ◆ Index of Charges ◆ LLP Master Data ◆ Strike Off Status ◆ Authorised Capital ◆ Paid Up Capital ◆ Date of Incorporation ◆ Registered Address ◆ Director Networks ◆ V3 Portal Automation ◆ JSON / Parquet Delivery ◆ Managed Pipeline

Data Dictionary

Every field we extract from mca.gov.in

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Company Master Data objects from mca.gov.in. All fields typed and schema-versioned.

cin · company_name · roc_code · registration_number · company_category · company_subcategory · class_of_company · authorised_capital · paid_up_capital · date_of_incorporation · registered_address · email_id · listing_status · active_compliance

{
  "cin": "L17110MH1973PLC019786",
  "company_name": "RELIANCE INDUSTRIES LIMITED",
  "roc_code": "RoC-Mumbai",
  "authorised_capital": 150000000000.0,
  "paid_up_capital": 67659900000.0,
  "date_of_incorporation": "1973-05-08",
  "listing_status": "Listed",
  "active_compliance": "ACTIVE"
}

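The `cin` in the Company Master Data sample above also encodes several of the other fields. A minimal sketch, assuming the commonly documented 21-character CIN layout (listing flag `L`/`U`, five-digit NIC industry code, two-letter state code, four-digit year of incorporation, three-letter ownership code, six-digit registration number), shows how a delivery check could cross-validate `listing_status`, `registration_number`, and the incorporation year. `parse_cin` and `CinParts` are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple

# Assumed 21-character layout: L/U + NIC code + state + year + ownership + reg. no.
CIN_PATTERN = re.compile(
    r"^(?P<listing>[LU])"
    r"(?P<nic_code>\d{5})"
    r"(?P<state>[A-Z]{2})"
    r"(?P<year>\d{4})"
    r"(?P<ownership>[A-Z]{3})"
    r"(?P<registration_number>\d{6})$"
)


class CinParts(NamedTuple):
    listing_status: str          # "Listed" for L, "Unlisted" for U
    nic_code: str                # industry classification code
    state_code: str              # e.g. "MH" for Maharashtra
    year_of_incorporation: int
    ownership: str               # e.g. "PLC", "PTC"
    registration_number: str


def parse_cin(cin: str) -> CinParts:
    """Split a CIN into its components; raise ValueError if it is malformed."""
    match = CIN_PATTERN.match(cin.strip().upper())
    if match is None:
        raise ValueError(f"malformed CIN: {cin!r}")
    return CinParts(
        listing_status="Listed" if match["listing"] == "L" else "Unlisted",
        nic_code=match["nic_code"],
        state_code=match["state"],
        year_of_incorporation=int(match["year"]),
        ownership=match["ownership"],
        registration_number=match["registration_number"],
    )


if __name__ == "__main__":
    parts = parse_cin("L17110MH1973PLC019786")
    print(parts.listing_status, parts.state_code, parts.year_of_incorporation)
    # prints: Listed MH 1973
```

A record whose `date_of_incorporation` year disagrees with the year decoded from its CIN would be a natural candidate for rejection during QA.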

Complete list of extractable fields for Director & Signatory objects from mca.gov.in. All fields typed and schema-versioned.

din · pan_status · full_name · designation · date_of_appointment · dsc_registered · din_status · surrender_status · nationality · current_companies

{
  "din": "00001695",
  "full_name": "MUKESH DHIRUBHAI AMBANI",
  "designation": "Managing Director",
  "date_of_appointment": "1977-04-01",
  "dsc_registered": true,
  "din_status": "Approved",
  "nationality": "India"
}


Complete list of extractable fields for Index of Charges objects from mca.gov.in. All fields typed and schema-versioned.

cin · charge_id · charge_holder_name · creation_date · modification_date · closure_date · amount · address · asset_type

{
  "cin": "L17110MH1973PLC019786",
  "charge_id": "10002345",
  "charge_holder_name": "STATE BANK OF INDIA",
  "creation_date": "2015-06-12",
  "amount": 5000000000.0,
  "closure_date": null,
  "asset_type": "Movable property"
}


Complete list of extractable fields for LLP Master Data objects from mca.gov.in. All fields typed and schema-versioned.

llpin · llp_name · number_of_partners · number_of_designated_partners · roc_code · main_division_of_business · description_of_main_division · obligation_of_contribution · registered_address · email_id

{
  "llpin": "AAA-1234",
  "llp_name": "TECH VENTURES LLP",
  "number_of_partners": 4,
  "number_of_designated_partners": 2,
  "roc_code": "RoC-Delhi",
  "obligation_of_contribution": 1000000.0,
  "registered_address": "Connaught Place, New Delhi"
}


Complete list of extractable fields for Filing History objects from mca.gov.in. All fields typed and schema-versioned.

cin · document_name · form_type · date_of_filing · srn · status · financial_year · receipt_number · filing_fee

{
  "cin": "L17110MH1973PLC019786",
  "document_name": "AOC-4",
  "form_type": "Financial Statement",
  "date_of_filing": "2023-10-15",
  "srn": "T12345678",
  "status": "Approved",
  "financial_year": "2022-2023"
}


Capabilities

Extract corporate truth directly from the source

Our MCA scraper navigates the V3 portal architecture, handles aggressive rate limits, and solves CAPTCHAs programmatically to deliver structured company intelligence.

Company Master Data Extraction

Retrieve CIN, RoC code, authorised capital, paid-up capital, and incorporation dates for any registered entity in India.

Director Network Mapping

Extract DIN details, appointment dates, and cross-directorships to map corporate ownership structures.

Charge & Debt Intelligence

Pull the complete index of charges including charge holder names, creation dates, modification dates, and debt amounts.

Strike Off & Compliance Tracking

Monitor active compliance status and track companies moved to strike-off or defunct status by the RoC.

LLP Data Harvesting

Extract LLPINs, partner counts, contribution obligations, and registered addresses for Limited Liability Partnerships.

V3 Portal Navigation

Programmatic navigation of the MCA V3 portal's Angular application state, with session tokens and routing logic managed automatically.

Automated CAPTCHA Solving

High-throughput resolution of MCA image CAPTCHAs using integrated solving infrastructure.

IP Rotation & Session Management

Residential proxy pools reduce the risk of IP bans, while session handling recovers from timeouts during large batch lookups.

Scheduled Bulk Lookups

Submit lists of thousands of CINs for batch processing on a daily, weekly, or monthly cadence.

Change Detection

Track changes in paid-up capital, director appointments, or registered addresses over time with automated diffing.
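The Director Network Mapping capability above reduces to a graph problem: two companies are linked when the same DIN sits on both boards. A minimal sketch, assuming each Director & Signatory record carries a `din` and a `current_companies` list of CINs (the data dictionary names that field but not its type, so the list-of-CINs shape is an assumption). `cross_directorships` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import combinations


def cross_directorships(directors: list[dict]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of CINs to the set of DINs that sit on both boards.

    Assumes every record has `din` and `current_companies` (a list of CINs).
    """
    links: dict[tuple[str, str], set[str]] = defaultdict(set)
    for record in directors:
        # Sort so each company pair is always keyed in the same order.
        companies = sorted(set(record.get("current_companies", [])))
        # Every pair of boards this director sits on forms one edge.
        for pair in combinations(companies, 2):
            links[pair].add(record["din"])
    return dict(links)
```

The resulting edge map can be loaded into any graph library to find clusters of companies that share directors.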

// engagement pipeline

From CIN list to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide lists of CINs, DINs, or LLPINs. We design the extraction schema based on your data requirements.

Pipeline Build

Days 2–4

We configure Playwright crawlers to navigate the MCA V3 portal, manage sessions, and handle CAPTCHAs.

Validation & QA

Days 4–6

Schema validation, null-rate checks, and data normalisation rules are applied to clean inconsistent government records.

Delivery

Ongoing

Clean JSON, CSV, or Parquet files are pushed to your AWS S3 bucket, or records are loaded into your Postgres database, on schedule.
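The null-rate checks named in the Validation & QA step can be sketched as a per-field gate: compute the share of records in which each field is missing, then flag any field above a threshold. The placeholder strings treated as missing, the 5% threshold, and the function names are assumptions, not DataFlirt's documented rules.

```python
from typing import Any

MISSING_STRINGS = {"", "None", "NA", "N/A", "-"}  # assumed placeholder spellings


def is_missing(value: Any) -> bool:
    """Treat None and placeholder strings as missing; other values count as present."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip() in MISSING_STRINGS


def null_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of records in which each field is absent or missing."""
    if not records:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_missing(r.get(field)) for r in records) / len(records)
        for field in fields
    }


def fields_over_threshold(rates: dict[str, float], threshold: float = 0.05) -> list[str]:
    """Fields whose null rate exceeds the threshold (5% is an assumed default)."""
    return sorted(field for field, rate in rates.items() if rate > threshold)
```

A batch that returns a non-empty list from `fields_over_threshold` would be held back for review rather than delivered.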

Under the hood

Navigating the MCA V3 portal infrastructure

The Ministry of Corporate Affairs portal is notoriously unstable, heavily cached, and protected by aggressive rate limits. We handle the session state so you get clean data.

// fingerprinting

Identity rotation

TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (running)

// observability

Pipeline health

Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%

State management

Handling V3 Angular routing

The MCA V3 portal relies on complex client-side routing and temporary session tokens. Our Playwright infrastructure maintains browser state, intercepts XHR requests, and handles token expiration gracefully.
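Handling token expiration gracefully usually comes down to two rules: refresh a token shortly before it expires, and if a request is still rejected, refresh once and retry. The sketch below is framework-agnostic and rests on assumptions: `fetch_token` stands in for whatever Playwright-driven step obtains a token, `send` stands in for the intercepted XHR call, an HTTP 401 is taken to mean the token has expired, and the 30-second margin is arbitrary. `SessionTokenManager` is a hypothetical name.

```python
import time
from typing import Callable, Optional


class SessionTokenManager:
    """Caches a session token and refreshes it before, or right after, expiry."""

    def __init__(
        self,
        fetch_token: Callable[[], tuple[str, float]],  # returns (token, ttl_seconds)
        clock: Callable[[], float] = time.monotonic,
        refresh_margin: float = 30.0,  # assumed safety margin in seconds
    ) -> None:
        self._fetch_token = fetch_token
        self._clock = clock
        self._margin = refresh_margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def token(self) -> str:
        # Refresh proactively if there is no token or it is about to expire.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            token, ttl = self._fetch_token()
            self._token, self._expires_at = token, self._clock() + ttl
        return self._token

    def call(self, send: Callable[[str], int]) -> int:
        """Run `send(token)`; on an assumed 401, force one refresh and retry."""
        status = send(self.token())
        if status == 401:
            self._token = None  # invalidate so token() fetches a new one
            status = send(self.token())
        return status
```

Injecting `clock` keeps the expiry logic testable without real waiting.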

CAPTCHA resolution

Automated solving at scale

Public searches on the MCA portal require solving image CAPTCHAs for every query. We route these challenges through automated solving APIs with fallback queues, maintaining high throughput without manual intervention.
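The fallback-queue pattern can be sketched as a router that tries each solver in priority order and, if none returns an answer, schedules the query for another attempt instead of stalling the batch. The solver callables are hypothetical stand-ins: the Infrastructure list names 2Captcha and CapSolver, but their client APIs are not reproduced here. `fetch_image` assumes a fresh challenge is requested on retry, and `max_attempts` is an arbitrary default.

```python
from collections import deque
from typing import Callable, Optional

Solver = Callable[[bytes], Optional[str]]  # CAPTCHA image bytes -> answer or None


class CaptchaRouter:
    """Tries solvers in priority order; unsolved queries go to a retry queue."""

    def __init__(self, solvers: list[Solver], max_attempts: int = 3) -> None:
        self.solvers = solvers
        self.max_attempts = max_attempts
        self.retry_queue: deque[tuple[str, int]] = deque()  # (query_id, next attempt)
        self.failed: list[str] = []

    def solve(self, query_id: str, image: bytes, attempt: int = 1) -> Optional[str]:
        for solver in self.solvers:
            try:
                answer = solver(image)
            except Exception:
                answer = None  # a solver error is treated like a miss: fall through
            if answer:
                return answer
        # Nothing solved it: queue another attempt, or give up after max_attempts.
        if attempt < self.max_attempts:
            self.retry_queue.append((query_id, attempt + 1))
        else:
            self.failed.append(query_id)
        return None

    def drain(self, fetch_image: Callable[[str], bytes]) -> dict[str, str]:
        """Retry each queued query once with a freshly fetched challenge image."""
        solved: dict[str, str] = {}
        for _ in range(len(self.retry_queue)):
            query_id, attempt = self.retry_queue.popleft()
            answer = self.solve(query_id, fetch_image(query_id), attempt)
            if answer:
                solved[query_id] = answer
        return solved
```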

Rate limit evasion

Residential IP rotation

The portal aggressively blocks IPs that make concurrent requests. We distribute lookups across Indian residential proxy networks, pacing requests to mimic normal user behaviour and avoid blacklisting.
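Pacing across a residential pool can be sketched as round-robin rotation with a per-IP minimum interval plus random jitter, so that no single address sends requests faster than a set rate. The 8-second interval and 2-second jitter are placeholder assumptions, and the hypothetical `ProxyPacer` only decides which proxy to use and how long to wait; the actual HTTP or Playwright request is left out.

```python
import itertools
import random
import time
from typing import Callable, Optional


class ProxyPacer:
    """Round-robin proxy rotation with a per-proxy minimum interval plus jitter."""

    def __init__(
        self,
        proxies: list[str],
        min_interval: float = 8.0,   # assumed seconds between hits on one IP
        jitter: float = 2.0,         # assumed maximum extra random delay
        clock: Callable[[], float] = time.monotonic,
        rng: Optional[random.Random] = None,
    ) -> None:
        self._cycle = itertools.cycle(proxies)
        self._last_used = {p: float("-inf") for p in proxies}
        self._min_interval = min_interval
        self._jitter = jitter
        self._clock = clock
        self._rng = rng or random.Random()

    def next_slot(self) -> tuple[str, float]:
        """Return (proxy, seconds to wait before using it) and reserve that slot."""
        proxy = next(self._cycle)
        now = self._clock()
        gap = self._min_interval + self._rng.uniform(0, self._jitter)
        wait = max(0.0, self._last_used[proxy] + gap - now)
        self._last_used[proxy] = now + wait  # the moment this proxy will fire
        return proxy, wait
```

A worker would call `proxy, wait = pacer.next_slot()`, sleep for `wait` seconds, then issue the lookup through `proxy`.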

Downtime resilience

Queuing during maintenance

Government portals experience frequent unscheduled downtime. Our Airflow orchestration layer detects 503 errors, pauses the pipeline, and queues pending CINs for processing once the portal returns online.
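Stripped of Airflow specifics, the pause-and-queue behaviour looks roughly like the loop below. A 503 sends the CIN back to the head of the pending queue and pauses dispatch for a cooldown. This is plain Python, not Airflow's API. `lookup` is a hypothetical callable that returns an HTTP status. The 300-second cooldown and the `max_pauses` cap are assumptions; a production version would likely use exponential backoff instead.

```python
from collections import deque
from typing import Callable


def run_batch(
    pending: deque[str],
    lookup: Callable[[str], int],       # hypothetical: CIN -> HTTP status code
    sleep: Callable[[float], None],
    cooldown: float = 300.0,            # assumed pause after an outage is detected
    max_pauses: int = 10,               # assumed cap before handing back to the scheduler
) -> dict[str, int]:
    """Process CINs in order; on a 503, requeue the CIN, pause, then resume."""
    results: dict[str, int] = {}
    pauses = 0
    while pending:
        cin = pending.popleft()
        status = lookup(cin)
        if status == 503:
            pending.appendleft(cin)     # keep its place at the head of the queue
            if pauses >= max_pauses:
                break                   # leave the remainder queued for the next run
            pauses += 1
            sleep(cooldown)
            continue
        results[cin] = status
    return results
```

Passing `sleep` in as a parameter lets the same loop be exercised in tests without real delays.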

Data normalisation

Cleaning inconsistent records

MCA data often contains formatting errors, missing fields, and inconsistent date strings. We apply strict normalisation rules before delivery, ensuring your database ingests clean, typed data.
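As an illustration of such rules, the sketch below coerces several date spellings to ISO 8601 and strips rupee symbols and Indian-style digit grouping from capital amounts. The accepted formats and placeholder strings are assumptions about raw values, not a catalogue of what the portal actually returns. Parsing is day-first, following Indian convention, so `05-08-1973` is read as 5 August.

```python
from datetime import datetime
from typing import Optional

# Assumed raw date spellings, tried in order; day-first before ISO.
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d-%b-%Y", "%d %B %Y", "%Y-%m-%d")
MISSING = {"", "-", "NA", "N/A", "None", "null"}  # assumed placeholder values


def normalise_date(raw: Optional[str]) -> Optional[str]:
    """Return an ISO 8601 date string, or None if the value is missing."""
    if raw is None or raw.strip() in MISSING:
        return None
    value = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")


def normalise_amount(raw: Optional[str]) -> Optional[float]:
    """Parse strings like '₹ 1,50,000.00' (Indian digit grouping) into floats."""
    if raw is None or raw.strip() in MISSING:
        return None
    cleaned = raw.replace("₹", "").replace("Rs.", "").replace(",", "").strip()
    return float(cleaned)
```

Raising on an unrecognised date, rather than passing it through, keeps malformed values out of typed columns.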

Applications

Who uses MCA data

Teams across industries use mca.gov.in data to build competitive products and smarter operations.

B2B Onboarding & KYC

Fintechs and B2B platforms verify business entities, check active compliance status, and validate registered addresses during merchant onboarding.

Credit Risk & Lending

Banks and NBFCs query the index of charges to assess existing debt obligations before approving corporate loans.

Private Equity Due Diligence

Investment firms track director networks via DIN searches to map cross-directorships and identify potential conflicts of interest.

Lead Generation

Sales teams target newly incorporated companies or entities that have recently increased their paid-up capital.

Competitor Intelligence

Market analysts track capital changes, new director appointments, and statutory filings across competitor portfolios.

Supply Chain Verification

Enterprise procurement teams audit vendor compliance and strike-off status to mitigate supply chain risk.

Why DataFlirt

"The MCA database is the absolute ground truth for Indian corporate identity, but extracting that data requires fighting one of the most hostile government portals in existence."

Relying on manual MCA searches or unstable third-party APIs breaks internal workflows. DataFlirt builds dedicated extraction pipelines that navigate the V3 portal's Angular state, solve CAPTCHAs programmatically, and deliver structured JSON directly to your data warehouse.

Technical Spec

MCA scraper technical capabilities

Everything our mca.gov.in scraper supports: rendered SPA elements, session handling, rate-limit evasion, and beyond.

Company Master Data

Full extraction of CIN, capital details, and registration status

Supported

DIN & Signatory Details

Director names, appointment dates, and current directorships

Supported

Index of Charges

Complete debt registration history for any CIN

Supported

LLP Master Data

Partner details and contribution obligations for LLPs

Supported

V3 Portal CAPTCHA bypass

Automated resolution of search portal image challenges

Supported

Change detection

Compare historical runs to isolate capital or director changes

Supported

Daily incorporation feeds

Monitor specific RoCs for new company registrations

Supported

Strike-off status monitoring

Alerts when a tracked CIN moves to defunct status

Supported

Historical document downloads

MoA, AoA, and form PDFs require authenticated paid accounts and OTPs; only the structured filing metadata shown on public pages is extracted

Partial

Director PAN & Aadhaar

Personally identifiable information is masked by the government portal

Partial

Infrastructure

Infrastructure built for government portals

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus

Headless Browser Automation

Playwright handles the complex JavaScript execution and session token management required by the MCA V3 portal architecture.

Proxy & Solver Pipeline

Integrated CapSolver infrastructure resolves image challenges while Indian residential proxies prevent IP-based rate limiting.

Cloud-Native Queueing

Redis and Apache Airflow manage request queues, automatically pausing and retrying batches when the MCA portal experiences downtime.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Nested schema containing company, director, and charge arrays

CSV

Flat files separated by data type for easy spreadsheet import

XLS

Excel workbooks with multiple tabs for compliance teams

Parquet

Columnar storage optimised for Athena and BigQuery ingestion

AWS S3

Direct delivery to your cloud storage buckets on schedule

Webhook

HTTP POST delivery for real-time KYC verification flows

API

Query our cached database for instant CIN lookups

PostgreSQL

Direct database upserts with primary key conflict resolution

Direct bucket delivery — compatible with any data lake
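The PostgreSQL option relies on `INSERT ... ON CONFLICT ... DO UPDATE`. To keep the example self-contained, it runs against SQLite from the Python standard library, which accepts the same upsert clause (SQLite 3.24 and later). The `companies` table is a hypothetical subset of the Company Master Data schema. Against PostgreSQL the SQL is essentially the same, using `%s` placeholders under psycopg instead of `?`.

```python
import sqlite3

# Hypothetical subset of the Company Master Data schema, keyed on CIN.
SCHEMA = """
CREATE TABLE IF NOT EXISTS companies (
    cin               TEXT PRIMARY KEY,
    company_name      TEXT NOT NULL,
    paid_up_capital   REAL,
    active_compliance TEXT
)
"""

# On a primary-key conflict, overwrite the stored row with the incoming values.
UPSERT = """
INSERT INTO companies (cin, company_name, paid_up_capital, active_compliance)
VALUES (?, ?, ?, ?)
ON CONFLICT (cin) DO UPDATE SET
    company_name      = excluded.company_name,
    paid_up_capital   = excluded.paid_up_capital,
    active_compliance = excluded.active_compliance
"""


def upsert_companies(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Insert new CINs and update existing ones in a single transaction."""
    rows = [
        (r["cin"], r["company_name"], r.get("paid_up_capital"), r.get("active_compliance"))
        for r in records
    ]
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(UPSERT, rows)
```

Re-running a delivery with the same CINs therefore updates rows in place instead of failing on duplicate keys.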

// faq

Common questions.

About mca.gov.in scraping, legality, and pipeline operations.


Is it legal to scrape mca.gov.in?

Yes. We extract strictly public, non-authenticated data available on the MCA master data search portal. We do not bypass authentication walls or access private filings. Clients should consult their legal counsel regarding specific data storage and usage policies.

How do you handle MCA V3 downtime?

The MCA portal frequently experiences maintenance and timeouts. Our Airflow orchestration detects these outages, pauses the active pipeline, and queues pending requests in Redis. Processing resumes automatically once the portal stabilises.

Can you download MoA and AoA documents?

No. Downloading actual filing PDFs requires a registered MCA user account, payment of government fees, and OTP verification. We only extract the structured metadata available on the public search pages.

Do you provide a daily feed of new incorporations?

Yes. We can configure pipelines to monitor specific Registrars of Companies (RoCs) and deliver daily lists of newly incorporated CINs, including their initial authorised capital and registered addresses.

How fast can you process 100,000 CINs?

Due to the strict rate limits and CAPTCHA requirements of the MCA portal, we process large batches using controlled concurrency. A batch of 100,000 CINs typically completes within 48 to 72 hours to ensure high success rates and avoid IP bans.
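The 48 to 72 hour window implies a sustained rate that is easy to check. The snippet derives that rate and, using Little's law, the average number of lookups that must be in flight. The figure of 20 seconds of wall-clock time per lookup (page load, CAPTCHA round-trip, and pacing delay) is an illustrative assumption, not a measured value.

```python
def required_rate(batch_size: int, hours: float) -> float:
    """Lookups per hour needed to finish `batch_size` CINs within `hours`."""
    return batch_size / hours


def concurrent_sessions(rate_per_hour: float, seconds_per_lookup: float) -> float:
    """Little's law: average lookups in flight = arrival rate x time per lookup."""
    return rate_per_hour / 3600 * seconds_per_lookup


for hours in (48, 72):
    rate = required_rate(100_000, hours)
    sessions = concurrent_sessions(rate, seconds_per_lookup=20.0)  # assumed 20 s
    print(f"{hours} h: {rate:,.0f} CINs/hour, {rate / 60:.1f}/min, ~{sessions:.1f} sessions")
# 48 h: 2,083 CINs/hour, 34.7/min, ~11.6 sessions
# 72 h: 1,389 CINs/hour, 23.1/min, ~7.7 sessions
```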

Can you track changes in a company's paid-up capital?

Yes. By scheduling recurring lookups for a specific list of CINs, our change-detection system compares the current run against historical data and emits a diff record if the paid-up capital or active status has changed.
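Here is a minimal sketch of that comparison. It assumes each run is stored as a mapping from CIN to its record and that only a configurable set of fields is tracked. The diff record shape (`cin`, `field`, `old`, `new`) and the default tracked fields are illustrative assumptions, not a documented output format.

```python
from typing import Any, Iterable

TRACKED_FIELDS = ("paid_up_capital", "active_compliance")  # assumed defaults


def diff_runs(
    previous: dict[str, dict[str, Any]],
    current: dict[str, dict[str, Any]],
    fields: Iterable[str] = TRACKED_FIELDS,
) -> list[dict[str, Any]]:
    """Emit one diff record per tracked field whose value changed between runs."""
    fields = tuple(fields)
    diffs = []
    # Only CINs present in both runs are compared.
    for cin in sorted(current.keys() & previous.keys()):
        before, after = previous[cin], current[cin]
        for field in fields:
            if before.get(field) != after.get(field):
                diffs.append(
                    {"cin": cin, "field": field, "old": before.get(field), "new": after.get(field)}
                )
    return diffs
```

CINs present in only one of the two runs are skipped here; a fuller version would report them as additions or removals.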

Do you extract Director Identification Numbers (DIN)?

Yes. We extract public signatory details, which include the DIN, full name, designation, and appointment date for all directors associated with a given CIN.

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two weeks. Stop fighting the V3 portal: give us your CIN lists, and we will deliver structured company intelligence directly to your database.


hello@dataflirt.com · Bengaluru · IST · typical reply < 4h

