What is NTLM Authentication?

NTLM Authentication is a legacy Microsoft challenge-response protocol still heavily used in enterprise B2B portals, government databases, and older IIS-hosted web applications. For scraping pipelines, it presents a distinct hurdle because it requires connection-level statefulness rather than simple cookie or header injection. If your scraper drops the TCP connection mid-handshake, or if your proxy pool rotates IPs too aggressively, the handshake never completes: the server keeps answering 401, and unless your pipeline treats that as a hard error, records are dropped silently.

Auth Scraping · Legacy Systems · IIS · Challenge-Response · Stateful Connection
// 02 — definitions

The three-step
handshake.

Why standard HTTP clients fail against corporate portals, and how the NTLM protocol forces your scraper to maintain strict TCP state.

TL;DR

NTLM is a proprietary Microsoft authentication protocol that uses a three-message exchange (Negotiate, Challenge, Authenticate) over a single persistent TCP connection. Unlike Basic or Bearer auth, you cannot simply attach a token to a stateless GET request. Scraping NTLM-protected targets requires specialized HTTP client adapters, strict connection pooling, and proxies that support session pinning.

01 Definition & structure

NTLM Authentication is a challenge-response protocol developed by Microsoft. Over HTTP, it authenticates the underlying connection rather than each individual request. When a client requests a protected resource, the server responds with a 401 and a WWW-Authenticate: NTLM header. The client then initiates a three-message handshake: Negotiate (Type 1), Challenge (Type 2), and Authenticate (Type 3).

Because the server associates the authentication state with the underlying TCP connection, the client must use HTTP Keep-Alive. If the connection drops, the authentication state is lost.

02 The connection persistence problem

Most scraping scripts are built around stateless requests. You fire a GET, receive the HTML, and close the socket. NTLM violently rejects this pattern. If you send the Type 1 message on Socket A, and the Type 3 message on Socket B, the server will reject the authentication because Socket B has no associated challenge nonce.

This means your HTTP client must be explicitly configured to reuse connections, and any intermediate proxies must not sever or multiplex the TCP stream during the handshake.
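The difference between a reused socket and a fresh socket per request can be observed locally. The following is a minimal sketch using only the Python standard library: a throwaway local server (not an NTLM server) echoes the client's source port, so identical ports mean the same TCP connection was reused. The names PortEchoHandler, start_server, and client_ports are made up for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PortEchoHandler(BaseHTTPRequestHandler):
    """Replies with the client's source port so socket reuse is visible."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps the connection open by default

    def do_GET(self):
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the echo server on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def client_ports(port, reuse, n=3):
    """Send n GETs and return the source port the server saw for each one."""
    seen = []
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    for _ in range(n):
        if not reuse:
            # Stateless pattern: throw the socket away before every request.
            conn.close()
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        seen.append(int(conn.getresponse().read()))
    conn.close()
    return seen
```

With reuse=True every request reports the same source port, which is the property an NTLM handshake depends on. With reuse=False the ports differ, which is the situation where a Type 3 message would land on a socket that never saw the challenge.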

03 Tooling and library support

Out of the box, Python's requests, Node's axios, and Go's net/http do not handle NTLM, so you need a wrapper library. In Python, requests-ntlm hooks into the authentication dispatch process and performs the handshake automatically when a 401 is encountered. In Node, libraries such as httpntlm fill the same role. Headless browsers handle NTLM inside their own network stack once you supply credentials.
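As a rough illustration of the Python route, the sketch below attaches requests-ntlm's HttpNtlmAuth to a requests.Session so that the handshake and later requests share a pooled connection. It assumes the third-party packages requests and requests-ntlm are installed; the helper names ntlm_username and build_ntlm_session, and any domain, account, or password passed to them, are hypothetical.

```python
def ntlm_username(domain: str, user: str) -> str:
    """Return the DOMAIN\\user form that NTLM servers typically expect."""
    return f"{domain}\\{user}"


def build_ntlm_session(domain: str, user: str, password: str):
    """Create a requests.Session that answers NTLM challenges automatically.

    The imports are local so that this module still loads when the
    third-party packages (requests, requests-ntlm) are not installed.
    """
    import requests
    from requests_ntlm import HttpNtlmAuth

    # A Session keeps connections in a pool, so the 401 challenge and the
    # follow-up Type 3 request can travel over the same socket.
    session = requests.Session()
    session.auth = HttpNtlmAuth(ntlm_username(domain, user), password)
    return session
```

A caller would then issue session.get(url, timeout=10) against the protected page and reuse the same session for every subsequent page, rather than creating a new session per request.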

04 How DataFlirt handles it

We maintain dedicated worker pools for legacy B2B targets. When a pipeline is flagged as requiring NTLM, we bypass our standard stateless proxy rotation. Instead, we allocate a pinned residential or datacenter IP, establish a persistent TCP connection, complete the NTLM handshake, and then stream the extraction jobs through that authenticated socket until the server forces a teardown. This guarantees zero dropped nonces and maximizes throughput.

05 NTLM vs Kerberos (SPNEGO)

NTLM is technically deprecated by Microsoft in favor of Kerberos. In modern IIS setups, you will often see WWW-Authenticate: Negotiate instead of just NTLM. This uses SPNEGO (Simple and Protected GSSAPI Negotiation Mechanism) to try Kerberos first, and fall back to NTLM if the client isn't part of the Active Directory domain. For external scrapers, Kerberos is usually impossible to negotiate, so the fallback to NTLM is what actually executes.
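A scraper can inspect the WWW-Authenticate headers on the first 401 to see which of these schemes the server offers. The sketch below is one possible selection rule, not a description of any particular client's behavior: pick_auth_scheme is a hypothetical helper, and it assumes each header value carries a single challenge (IIS typically sends Negotiate and NTLM as separate header lines).

```python
from typing import Iterable, Optional


def pick_auth_scheme(www_authenticate_values: Iterable[str],
                     kerberos_available: bool = False) -> Optional[str]:
    """Choose between 'negotiate' and 'ntlm' from a 401's challenge headers."""
    offered = set()
    for value in www_authenticate_values:
        parts = value.split(None, 1)
        if parts:
            # The scheme name is the first token; compare case-insensitively.
            offered.add(parts[0].rstrip(",").lower())

    if "negotiate" in offered and kerberos_available:
        return "negotiate"  # domain-joined client: let SPNEGO try Kerberos
    if "ntlm" in offered:
        return "ntlm"       # external scraper: go straight to NTLM
    if "negotiate" in offered:
        # Only Negotiate is offered: an SPNEGO-capable client is needed,
        # and it will usually end up carrying NTLM tokens inside Negotiate.
        return "negotiate"
    return None             # neither scheme offered (e.g. Basic only)
```

For an external scraper with no Kerberos ticket, a server that advertises both Negotiate and NTLM resolves to "ntlm" under this rule.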

// 03 — the protocol

How the NTLM
exchange works.

The protocol requires three distinct HTTP messages over the exact same TCP socket. If the socket closes at any point, the server discards the nonce and the sequence resets.

Type 1 (Negotiate) = Client → Server: NTLMSSP_NEGOTIATE
Client advertises its capabilities and supported NTLM versions. (Source: RFC-draft: The NTLM Authentication Protocol)
Type 2 (Challenge) = Server → Client: NTLMSSP_CHALLENGE + Nonce
Server responds with a 401 and a 64-bit random challenge (nonce). (Source: IIS Server behavior)
Type 3 (Authenticate) = Client → Server: NTLMSSP_AUTH + Hash(Nonce, Password)
Client computes a response from the nonce and a hash of the user's password (HMAC-MD5 in NTLMv2) and sends it back; the password itself never crosses the wire. (Source: NTLMv2 specification)
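To make the 64-bit nonce concrete, the sketch below pulls the 8-byte server challenge out of a Type 2 header. It relies on the published NTLM message layout (8-byte signature, 4-byte message type, 8 bytes of target-name fields, 4 bytes of flags, then the challenge at offset 24). The function names are hypothetical, and build_sample_challenge produces a synthetic message for testing, not a real capture.

```python
import base64
import struct

NTLM_SIGNATURE = b"NTLMSSP\x00"


def extract_server_challenge(www_authenticate: str) -> bytes:
    """Return the 8-byte nonce from a 'WWW-Authenticate: NTLM <token>' value."""
    scheme, _, token = www_authenticate.strip().partition(" ")
    if scheme.upper() != "NTLM" or not token:
        raise ValueError("not an NTLM challenge header")
    raw = base64.b64decode(token)
    if len(raw) < 32 or raw[:8] != NTLM_SIGNATURE:
        raise ValueError("malformed NTLM message")
    (message_type,) = struct.unpack_from("<I", raw, 8)  # little-endian uint32
    if message_type != 2:
        raise ValueError(f"expected a Type 2 message, got Type {message_type}")
    return raw[24:32]  # ServerChallenge field


def build_sample_challenge(nonce: bytes) -> str:
    """Build a minimal synthetic Type 2 header value (illustration only)."""
    if len(nonce) != 8:
        raise ValueError("the server challenge must be exactly 8 bytes")
    raw = NTLM_SIGNATURE + struct.pack("<I", 2)
    raw += struct.pack("<HHI", 0, 0, 0)  # empty target-name length/offset fields
    raw += struct.pack("<I", 0)          # negotiate flags left at zero
    raw += nonce
    return "NTLM " + base64.b64encode(raw).decode("ascii")
```

Because the nonce is bound to the socket it was issued on, logging it alongside a connection identifier is one way to confirm that the Type 3 message went back over the same connection.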
// 04 — wire trace

A successful NTLM
authentication flow.

A raw HTTP trace showing the initial 401 Unauthorized challenge, the Type 1 negotiation, and the final Type 3 authentication message required to establish the session.

HTTP/1.1 · Keep-Alive · requests-ntlm
// 1. Initial unauthenticated request
GET /b2b/inventory/catalog.aspx HTTP/1.1
Host: legacy-portal.corp.local

// Server demands NTLM
HTTP/1.1 401 Unauthorized
WWW-Authenticate: NTLM

// 2. Client sends Type 1 (Negotiate)
GET /b2b/inventory/catalog.aspx HTTP/1.1
Authorization: NTLM TlRMTVNTUAABAAAAB4IIAAAAAAAAAAAAAAAAAAAAAAA=

// Server sends Type 2 (Challenge) with Nonce
HTTP/1.1 401 Unauthorized
WWW-Authenticate: NTLM TlRMTVNTUAACAAAA...

// 3. Client sends Type 3 (Authenticate)
GET /b2b/inventory/catalog.aspx HTTP/1.1
Authorization: NTLM TlRMTVNTUAADAAAAGAAYAHIAAAAYABgAigAAABQAF...

// Server accepts and returns data
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
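
When reading a trace like the one above, the message type can be recovered from the token itself: every NTLM message starts with the signature NTLMSSP\0 followed by a little-endian 32-bit type. The sketch below decodes only the first 16 base64 characters (12 bytes), so it also works on tokens that a trace has truncated. The function name ntlm_message_type is hypothetical.

```python
import base64
import struct
from typing import Optional

MESSAGE_NAMES = {1: "Negotiate", 2: "Challenge", 3: "Authenticate"}


def ntlm_message_type(header_value: str) -> Optional[str]:
    """Classify the value of an Authorization or WWW-Authenticate NTLM header.

    Returns None for a bare 'NTLM' offer that carries no token.
    """
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.upper() != "NTLM":
        raise ValueError("not an NTLM header")
    token = token.strip()
    if not token:
        return None  # the server is only advertising the scheme
    if len(token) < 16:
        raise ValueError("token too short to contain an NTLM header")
    # 16 base64 characters decode to exactly 12 bytes: signature + type.
    head = base64.b64decode(token[:16])
    if head[:8] != b"NTLMSSP\x00":
        raise ValueError("missing NTLMSSP signature")
    (message_type,) = struct.unpack("<I", head[8:12])
    return MESSAGE_NAMES.get(message_type, f"Unknown({message_type})")
```

Applied to the trace, the bare WWW-Authenticate: NTLM header classifies as an offer with no token, and the three tokens classify as Negotiate, Challenge, and Authenticate in that order.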
// 05 — failure modes

Why NTLM scrapers
break in production.

NTLM is brittle by modern web standards. These are the most common reasons an NTLM-authenticated pipeline drops records, ranked by frequency across DataFlirt's B2B extraction jobs.

B2B pipelines: 140+ active · Auth failures: measured per 10k requests · Updated: 2026-05-19
01 Connection drops
TCP socket closes mid-handshake, resetting the nonce

02 Proxy incompatibility
Rotating proxies breaking Keep-Alive or stripping headers

03 Load balancer interference
LBs routing the Type 3 message to a different backend node

04 NTLMv1 vs NTLMv2 mismatch
Server enforces v2, client library defaults to v1

05 Domain misconfiguration
Missing or incorrect DOMAIN\ prefix in credentials
// 06 — infrastructure

Stateful connections,

in a stateless scraping architecture.

Modern scraping fleets are designed to be stateless, rotating IPs and tearing down connections aggressively to avoid detection. NTLM breaks this paradigm. To scrape NTLM targets reliably, DataFlirt routes these jobs to specialized worker pools that pin the proxy session and enforce strict TCP keep-alive until the extraction payload is fully delivered. We treat the entire three-step handshake and subsequent data fetch as a single atomic operation.

ntlm-worker-config.yaml

Configuration for a dedicated NTLM extraction worker hitting a legacy B2B portal.

auth.protocol: NTLMv2 (enforced)
tcp.keep_alive: true (mandatory)
proxy.rotation: per_session (pinned)
pool.max_size: 10 connections
domain_prefix: CORP_GLOBAL
handshake.timeout: 5000ms
pipeline.status: active
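
One way to picture the pool settings above is a small fixed-size pool of persistent connections that are checked out one job at a time. This is a sketch under stated assumptions, not DataFlirt's actual worker code: PinnedConnectionPool is a hypothetical class built on the standard library's http.client, it mirrors pool.max_size (10) and handshake.timeout (5 seconds), and it leaves the NTLM handshake itself to whatever auth layer sits on top.

```python
import http.client
import queue
from contextlib import contextmanager


class PinnedConnectionPool:
    """Fixed-size pool of persistent HTTP/1.1 connections to a single host."""

    def __init__(self, host, port=80, max_size=10, timeout=5.0):
        self._idle = queue.LifoQueue(maxsize=max_size)
        for _ in range(max_size):
            # HTTPConnection opens its socket lazily on the first request.
            self._idle.put(http.client.HTTPConnection(host, port, timeout=timeout))

    @contextmanager
    def connection(self, wait=None):
        """Check out one connection; only one job may use it at a time.

        Raises queue.Empty if no connection frees up within `wait` seconds.
        """
        conn = self._idle.get(timeout=wait)
        try:
            yield conn
        except Exception:
            # The socket state is unknown after an error, so drop it. The next
            # request on this object reconnects and must authenticate again.
            conn.close()
            raise
        finally:
            self._idle.put(conn)
```

The LIFO queue hands back the most recently used connection first, which favors sockets that are already authenticated and still warm over ones that may have been idle long enough for the server to close them.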

// 07 — FAQ

Common
questions.

Common questions about scraping legacy Microsoft portals, handling connection state, and scaling NTLM pipelines.

Can I use standard Python requests to scrape an NTLM target?
No. The standard requests library does not support NTLM natively. You will receive a persistent 401 Unauthorized. You must use an adapter like requests-ntlm or httpx-ntlm, which intercepts the 401 and automatically handles the Type 1 and Type 3 message generation.
Why does my proxy pool break NTLM authentication?
NTLM requires the entire three-step handshake to occur over a single, persistent TCP connection. A rotating proxy that assigns a new exit IP per request also opens a new upstream connection to the target, so the Type 3 authentication message arrives on a socket that never received the Type 2 challenge, and the server rejects it. You must use sticky sessions (session pinning) on your proxy so the upstream connection stays open for the whole handshake.
Is NTLM still relevant for modern data extraction?
Yes. While modern consumer web apps use OAuth or JWTs, thousands of enterprise B2B portals, healthcare inventory systems, and government procurement databases still run on legacy IIS servers configured for NTLM. If you extract supply chain or B2B pricing data, you will encounter it.
How does DataFlirt scale NTLM pipelines?
We use connection pooling heavily. Once an NTLM handshake is completed on a specific TCP socket, that socket is authenticated for its lifespan. We keep the socket open in a pool and send subsequent GET requests over it one at a time, avoiding the overhead of re-authenticating for every single page of a catalog.
What is the difference between NTLM and Basic Auth?
Basic Auth is stateless; it simply base64-encodes your username and password and sends it with every request. NTLM is a stateful challenge-response protocol. The server sends a random nonce, and the client hashes the password with that nonce. The actual password is never sent over the wire, but it requires connection persistence.
Can I scrape NTLM targets with headless browsers like Playwright?
Yes. Both Playwright and Puppeteer support HTTP credentials: Playwright takes the username and password in the httpCredentials context option, and Puppeteer takes them through page.authenticate(). With Chromium, the browser's network stack then handles the NTLM handshake automatically. However, spinning up a full browser just to handle an NTLM handshake is massive overkill if the target data is in the raw HTML.