We extract Company Master Data, DIN networks, index of charges, and signatory details from the Ministry of Corporate Affairs. Delivered as clean JSON, CSV, or Parquet to S3, BigQuery, or Postgres.
Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.
Complete list of extractable fields for Company Master Data objects from mca.gov.in. All fields typed and schema-versioned.
{"cin": "L17110MH1973PLC019786", "company_name": "RELIANCE INDUSTRIES LIMITED", "roc_code": "RoC-Mumbai", "authorised_capital": 150000000000.0, "paid_up_capital": 67659900000.0, "date_of_incorporation": "1973-05-08", "listing_status": "Listed", "active_compliance": "ACTIVE"}
| # | cin | company_name | roc_code | registration_number | company_category | company_subcategory |
|---|---|---|---|---|---|---|
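The CIN in the sample record is not an opaque ID. It commonly follows a 21-character layout (listing flag, five-digit industry code, two-letter state code, incorporation year, three-letter ownership type, six-digit registration number), so fields such as listing_status, date_of_incorporation and registration_number can be cross-checked against it. Below is a minimal Python validation sketch that assumes that layout. `parse_cin` and `CinParts` are illustrative names, not part of any MCA or DataFlirt API. The sketch checks format only and cannot confirm that a company exists.

```python
import re
from dataclasses import dataclass

# Assumed 21-character CIN layout:
# listing flag | 5-digit industry code | 2-letter state | 4-digit year
# | 3-letter ownership type | 6-digit registration number
CIN_PATTERN = re.compile(
    r"(?P<listing>[LU])"
    r"(?P<industry>\d{5})"
    r"(?P<state>[A-Z]{2})"
    r"(?P<year>\d{4})"
    r"(?P<ownership>[A-Z]{3})"
    r"(?P<number>\d{6})"
)


@dataclass(frozen=True)
class CinParts:
    listed: bool
    industry_code: str
    state_code: str
    incorporation_year: int
    ownership: str
    registration_number: str


def parse_cin(cin: str) -> CinParts:
    """Split a CIN into its parts; raise ValueError if it is malformed."""
    match = CIN_PATTERN.fullmatch(cin.strip().upper())
    if match is None:
        raise ValueError(f"not a well-formed CIN: {cin!r}")
    return CinParts(
        listed=match["listing"] == "L",
        industry_code=match["industry"],
        state_code=match["state"],
        incorporation_year=int(match["year"]),
        ownership=match["ownership"],
        registration_number=match["number"],
    )


parts = parse_cin("L17110MH1973PLC019786")
print(parts.listed, parts.state_code, parts.incorporation_year, parts.registration_number)
# True MH 1973 019786
```

A mismatch between the decoded year or listing flag and the scraped fields is a cheap signal that a record needs review.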
Complete list of extractable fields for Director & Signatory objects from mca.gov.in. All fields typed and schema-versioned.
{"din": "00001695", "full_name": "MUKESH DHIRUBHAI AMBANI", "designation": "Managing Director", "date_of_appointment": "1977-04-01", "dsc_registered": true, "din_status": "Approved", "nationality": "India"}
| # | din | pan_status | full_name | designation | date_of_appointment | dsc_registered |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Index of Charges objects from mca.gov.in. All fields typed and schema-versioned.
{"cin": "L17110MH1973PLC019786", "charge_id": "10002345", "charge_holder_name": "STATE BANK OF INDIA", "creation_date": "2015-06-12", "amount": 5000000000.0, "closure_date": null, "asset_type": "Movable property"}
| # | cin | charge_id | charge_holder_name | creation_date | modification_date | closure_date |
|---|---|---|---|---|---|---|
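The `closure_date` field separates satisfied charges from open ones, and that is the distinction a lender cares about when reviewing existing obligations. The Python sketch below groups open charges by holder. The `Charge` dataclass and the second (satisfied) record are hypothetical. The sketch treats a missing closure date as open. The amount on the index of charges is the amount secured when the charge was registered or modified, which is not necessarily the outstanding balance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Charge:
    cin: str
    charge_id: str
    charge_holder_name: str
    creation_date: str
    amount: float
    closure_date: Optional[str] = None


def is_open(charge: Charge) -> bool:
    # A missing closure date (or the literal string "None") means the charge is still open.
    return charge.closure_date in (None, "", "None")


def open_amount_by_holder(charges: list[Charge]) -> dict[str, float]:
    """Sum the registered amounts of open charges, grouped by charge holder."""
    totals: dict[str, float] = {}
    for charge in charges:
        if is_open(charge):
            holder = charge.charge_holder_name
            totals[holder] = totals.get(holder, 0.0) + charge.amount
    return totals


charges = [
    Charge("L17110MH1973PLC019786", "10002345", "STATE BANK OF INDIA",
           "2015-06-12", 5000000000.0),
    # Hypothetical satisfied charge, for illustration only.
    Charge("L17110MH1973PLC019786", "10009999", "EXAMPLE BANK LIMITED",
           "2012-01-20", 1000000000.0, closure_date="2018-03-31"),
]
print(open_amount_by_holder(charges))
# {'STATE BANK OF INDIA': 5000000000.0}
```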
Complete list of extractable fields for LLP Master Data objects from mca.gov.in. All fields typed and schema-versioned.
{"llpin": "AAA-1234", "llp_name": "TECH VENTURES LLP", "number_of_partners": 4, "number_of_designated_partners": 2, "roc_code": "RoC-Delhi", "obligation_of_contribution": 1000000.0, "registered_address": "Connaught Place, New Delhi"}
| # | llpin | llp_name | number_of_partners | number_of_designated_partners | roc_code | main_division_of_business |
|---|---|---|---|---|---|---|
Complete list of extractable fields for Filing History objects from mca.gov.in. All fields typed and schema-versioned.
{"cin": "L17110MH1973PLC019786", "document_name": "AOC-4", "form_type": "Financial Statement", "date_of_filing": "2023-10-15", "srn": "T12345678", "status": "Approved", "financial_year": "2022-2023"}
| # | cin | document_name | form_type | date_of_filing | srn | status |
|---|---|---|---|---|---|---|
Our MCA scraper navigates the V3 portal architecture, handles aggressive rate limits, and solves CAPTCHAs programmatically to deliver structured company intelligence.
Retrieve CIN, RoC code, authorised capital, paid-up capital, and incorporation dates for any registered entity in India.
Extract DIN details, appointment dates, and cross-directorships to map corporate ownership structures.
Pull the complete index of charges, including charge holder names, creation dates, modification dates, and secured amounts.
Monitor active compliance status and track companies moved to strike-off or defunct status by the RoC.
Extract LLPINs, partner counts, contribution obligations, and registered addresses for Limited Liability Partnerships.
Programmatic navigation of the MCA V3 portal's Angular front end, with session tokens and routing logic managed automatically.
High-throughput resolution of MCA image CAPTCHAs using integrated solving infrastructure.
Residential proxy pools prevent IP bans, and session timeouts are managed automatically during large batch lookups.
Submit lists of thousands of CINs for batch processing on a daily, weekly, or monthly cadence.
Track changes in paid-up capital, director appointments, or registered addresses over time with automated diffing.
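Change detection of this kind comes down to comparing two snapshots of the same CIN field by field. Below is a minimal Python sketch that assumes each run produces one flat dict per CIN. `WATCHED_FIELDS` and `diff_records` are hypothetical names, and the follow-up capital figure is invented for the example.

```python
from typing import Any

# Hypothetical set of fields to watch between scheduled runs.
WATCHED_FIELDS: tuple[str, ...] = ("paid_up_capital", "active_compliance", "registered_address")


def diff_records(
    previous: dict[str, Any],
    current: dict[str, Any],
    fields: tuple[str, ...] = WATCHED_FIELDS,
) -> list[dict[str, Any]]:
    """Emit one change record for each watched field whose value differs."""
    changes = []
    for field in fields:
        old, new = previous.get(field), current.get(field)
        if old != new:
            changes.append({"cin": current.get("cin"), "field": field, "old": old, "new": new})
    return changes


previous = {"cin": "L17110MH1973PLC019786", "paid_up_capital": 67659900000.0, "active_compliance": "ACTIVE"}
# Invented follow-up snapshot: only paid-up capital has changed.
current = {"cin": "L17110MH1973PLC019786", "paid_up_capital": 70000000000.0, "active_compliance": "ACTIVE"}
print(diff_records(previous, current))
```

Fields missing from both snapshots compare equal (`None == None`), so they produce no spurious diffs.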
Brief in. Clean data out.
Provide lists of CINs, DINs, or LLPINs. We design the extraction schema based on your data requirements.
We configure Playwright crawlers to navigate the MCA V3 portal, manage sessions, and handle CAPTCHAs.
Schema validation, null-rate checks, and data normalisation rules are applied to clean inconsistent government records.
Clean JSON, CSV, or Parquet files pushed to your AWS S3 bucket or Postgres database on schedule.
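The null-rate check in the validation step can be as simple as counting blank values per field across a batch and flagging any field over a threshold. The Python sketch below illustrates this. The 5% threshold, the field names and the sample batch are assumptions for illustration, not production settings.

```python
from typing import Any, Iterable

# Assumed threshold: flag a field that is blank in more than 5% of a batch.
MAX_NULL_RATE = 0.05


def is_blank(value: Any) -> bool:
    return value is None or (isinstance(value, str) and value.strip() == "")


def null_rates(records: list[dict[str, Any]], fields: Iterable[str]) -> dict[str, float]:
    """Fraction of records in which each field is missing, None, or an empty string."""
    total = len(records)
    return {
        field: (sum(is_blank(r.get(field)) for r in records) / total if total else 0.0)
        for field in fields
    }


def failing_fields(records, fields, threshold: float = MAX_NULL_RATE) -> list[str]:
    """Fields whose null rate exceeds the threshold."""
    return [f for f, rate in null_rates(records, fields).items() if rate > threshold]


# Invented batch: one record has a blank company name.
batch = [
    {"company_name": "RELIANCE INDUSTRIES LIMITED", "roc_code": "RoC-Mumbai"},
    {"company_name": "", "roc_code": "RoC-Delhi"},
    {"company_name": "EXAMPLE LIMITED", "roc_code": "RoC-Delhi"},
    {"company_name": "SAMPLE PRIVATE LIMITED", "roc_code": "RoC-Mumbai"},
]
print(null_rates(batch, ["company_name", "roc_code"]))
# {'company_name': 0.25, 'roc_code': 0.0}
print(failing_fields(batch, ["company_name", "roc_code"]))
# ['company_name']
```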
The Ministry of Corporate Affairs portal is notoriously unstable, heavily cached, and protected by aggressive rate limits. We handle the session state so you get clean data.
The MCA V3 portal relies on complex client-side routing and temporary session tokens. Our Playwright infrastructure maintains browser state, intercepts XHR requests, and handles token expiration gracefully.
Public searches on the MCA portal require solving image CAPTCHAs for every query. We route these challenges through automated solving APIs with fallback queues, maintaining high throughput without manual intervention.
The portal aggressively blocks IPs that make concurrent requests. We distribute lookups across Indian residential proxy networks, pacing requests to mimic normal user behaviour and avoid blacklisting.
Government portals experience frequent unscheduled downtime. Our Airflow orchestration layer detects 503 errors, pauses the pipeline, and queues pending CINs for processing once the portal is back online.
MCA data often contains formatting errors, missing fields, and inconsistent date strings. We apply strict normalisation rules before delivery, ensuring your database ingests clean, typed data.
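Date normalisation is a concrete case of the cleaning described above. The approach is to try a fixed list of formats and emit ISO 8601, or nothing if none match. This Python sketch assumes day-first input, which is the usual Indian convention, and uses a hypothetical list of formats. The formats that actually appear in MCA records may differ.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats (day-first); the real mix in MCA records may differ.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y", "%d-%b-%Y", "%d %b %Y")


def normalise_date(raw: Optional[str]) -> Optional[str]:
    """Return an ISO 8601 date string, or None for blank or unparseable input."""
    if raw is None:
        return None
    text = raw.strip()
    if not text or text.lower() in {"none", "null", "-", "na"}:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


print(normalise_date("08/05/1973"))   # 1973-05-08
print(normalise_date("08-May-1973"))  # 1973-05-08
print(normalise_date("None"))         # None
```

Returning `None` keeps the example short. A production pipeline would more plausibly route unparseable values to a review queue than drop them silently.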
Fintechs and B2B platforms verify business entities, check active compliance status, and validate registered addresses during merchant onboarding.
Banks and NBFCs query the index of charges to assess existing debt obligations before approving corporate loans.
Investment firms track director networks via DIN searches to map cross-directorships and identify potential conflicts of interest.
Sales teams target newly incorporated companies or entities that have recently increased their paid-up capital.
Market analysts track capital changes, new director appointments, and statutory filings across competitor portfolios.
Enterprise procurement teams audit vendor compliance and strike-off status to mitigate supply chain risk.
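Mapping cross-directorships, as in the investment use case above, is essentially a grouping problem over (CIN, DIN) pairs: a director who sits on more than one board links those companies. Below is a minimal Python sketch using placeholder identifiers. It assumes director records have already been flattened into (CIN, DIN) pairs, which the sample director record does not show directly.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable


def cross_directorships(pairs: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each DIN to the CINs it sits on, keeping only directors on 2+ boards."""
    boards: defaultdict[str, set[str]] = defaultdict(set)
    for cin, din in pairs:
        boards[din].add(cin)
    return {din: cins for din, cins in boards.items() if len(cins) > 1}


def interlocked_companies(pairs: Iterable[tuple[str, str]]) -> dict[tuple[str, str], set[str]]:
    """Return each pair of companies that share directors, with the shared DINs."""
    links: defaultdict[tuple[str, str], set[str]] = defaultdict(set)
    for din, cins in cross_directorships(pairs).items():
        for a, b in combinations(sorted(cins), 2):
            links[(a, b)].add(din)
    return dict(links)


# Placeholder identifiers, not real CINs or DINs.
pairs = [
    ("CIN-A", "DIN-1"), ("CIN-B", "DIN-1"),
    ("CIN-B", "DIN-2"), ("CIN-C", "DIN-2"),
    ("CIN-C", "DIN-3"),
]
print(cross_directorships(pairs))
print(interlocked_companies(pairs))
```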
"The MCA database is the absolute ground truth for Indian corporate identity, but extracting that data requires fighting one of the most hostile government portals in existence."
Relying on manual MCA searches or unstable third-party APIs breaks internal workflows. DataFlirt builds dedicated extraction pipelines that navigate the V3 portal's Angular state, solve CAPTCHAs programmatically, and deliver structured JSON directly to your data warehouse.
Everything supported by our mca.gov.in scraper: rendered SPA elements, session tokens, rate-limit evasion, and beyond.
Open-source tooling on proven cloud infra — no vendor lock-in, full observability.
Playwright handles the complex JavaScript execution and session token management required by the MCA V3 portal architecture.
Integrated CapSolver infrastructure resolves image challenges while Indian residential proxies prevent IP-based rate limiting.
Redis and Apache Airflow manage request queues, automatically pausing and retrying batches when the MCA portal experiences downtime.
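The pause-and-retry behaviour can be sketched as exponential backoff on HTTP 503, with a fallback queue for CINs that still fail. This is a plain-Python illustration, not Airflow's API; Airflow tasks have their own `retries`, `retry_delay` and `retry_exponential_backoff` settings. In the sketch, an in-memory `deque` stands in for the Redis queue. The fetcher signature and the 30-second base delay are assumptions.

```python
import time
from collections import deque
from typing import Callable, Optional

# Hypothetical fetcher: takes a CIN, returns (HTTP status code, response body).
Fetcher = Callable[[str], tuple[int, str]]


def fetch_with_backoff(
    cin: str,
    fetch: Fetcher,
    deferred: deque,
    max_attempts: int = 4,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Retry on HTTP 503 with exponential backoff; defer the CIN if the portal stays down."""
    for attempt in range(max_attempts):
        status, body = fetch(cin)
        if status == 200:
            return body
        if status != 503:
            raise RuntimeError(f"unexpected HTTP {status} for {cin}")
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 30 s, 60 s, 120 s, ...
    deferred.append(cin)  # park the CIN for a later run instead of failing the batch
    return None


# Simulated portal: two outages, then a successful response.
responses = iter([(503, ""), (503, ""), (200, '{"cin": "L17110MH1973PLC019786"}')])
waits: list[float] = []
pending: deque = deque()
body = fetch_with_backoff(
    "L17110MH1973PLC019786", lambda cin: next(responses), pending, sleep=waits.append
)
print(body, waits, list(pending))
# {"cin": "L17110MH1973PLC019786"} [30.0, 60.0] []
```

Injecting `sleep` keeps the logic testable without real waits.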
Data delivered to where your team already works — no new tooling required.
About mca.gov.in scraping, legality, and pipeline operations.
Yes. We extract strictly public, non-authenticated data available on the MCA master data search portal. We do not bypass authentication walls or access private filings. Clients should consult their legal counsel regarding specific data storage and usage policies.
The MCA portal frequently experiences maintenance and timeouts. Our Airflow orchestration detects these outages, pauses the active pipeline, and queues pending requests in Redis. Processing resumes automatically once the portal stabilises.
No. Downloading actual filing PDFs requires a registered MCA user account, payment of government fees, and OTP verification. We only extract the structured metadata available on the public search pages.
Yes. We can configure pipelines to monitor specific Registrars of Companies (RoCs) and deliver daily lists of newly incorporated CINs, including their initial authorised capital and registered addresses.
Due to the strict rate limits and CAPTCHA requirements of the MCA portal, we process large batches using controlled concurrency. A batch of 100,000 CINs typically completes within 48 to 72 hours to ensure high success rates and avoid IP bans.
Yes. By scheduling recurring lookups for a specific list of CINs, our change-detection system compares the current run against historical data and emits a diff record if the paid-up capital or active status has changed.
Yes. We extract public signatory details, which include the DIN, full name, designation, and appointment date for all directors associated with a given CIN.
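The figure of 100,000 CINs in 48 to 72 hours works out to a fairly modest sustained rate. The arithmetic is shown in Python below. The ten concurrent sessions in the last example are a hypothetical figure. The calculation ignores retries and portal downtime, both of which would raise the rate actually needed.

```python
def lookups_per_minute(batch_size: int, hours: float) -> float:
    """Sustained rate needed to finish a batch within the given wall-clock time."""
    return batch_size / (hours * 60)


def seconds_between_lookups(batch_size: int, hours: float, sessions: int) -> float:
    """Gap each concurrent session can leave between lookups and still finish on time."""
    return hours * 3600 * sessions / batch_size


for hours in (48, 72):
    print(hours, round(lookups_per_minute(100_000, hours), 1))
# 48 34.7
# 72 23.1

# Hypothetical: 10 concurrent sessions sharing a 72-hour window.
print(round(seconds_between_lookups(100_000, 72, 10), 1))
# 25.9
```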
20-minute scoping call. Pilot dataset within the week. Production within two weeks. Stop fighting the V3 portal. Give us your CIN lists, and we will deliver structured company intelligence directly to your database.