What is API Response Pagination?

API response pagination is the mechanism servers use to divide large datasets into manageable, sequential chunks across multiple HTTP requests. For scraping pipelines, it dictates how you traverse an endpoint to extract a complete dataset without triggering rate limits or memory exhaustion. Handling pagination incorrectly leads to duplicate records, infinite loops, or silent data loss when the underlying dataset shifts during the crawl.

Network Layer · Cursor · Offset · Data Completeness · API Scraping

// 02 — definitions

Traversing the data sequence.

The structural patterns APIs use to serve millions of records without crashing, and how scrapers must adapt to capture every row.

TL;DR

Pagination splits massive API payloads into discrete pages using offsets, cursors, or page tokens. It's a fundamental constraint in data extraction. The difference between a fragile script and a production pipeline is how it handles cursor expiration, rate limits, and dataset mutations while traversing thousands of pages.

01 Definition & structure

API response pagination is a design pattern used to restrict the size of data returned in a single HTTP response. Instead of dumping a million records at once—which would crash the server and timeout the client—the API returns a subset (a "page") along with metadata on how to fetch the next subset. For data pipelines, mastering pagination means writing logic that can reliably follow these breadcrumbs until the dataset is completely exhausted.
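The "follow the breadcrumbs until exhausted" loop can be sketched in a few lines. This is a minimal illustration, not a production client: the response shape (`records`, `next_cursor`) and the in-memory `fake_fetch` function are assumptions standing in for a real HTTP call.

```python
from typing import Callable, Iterator, Optional

# Hypothetical response shape: {"records": [...], "next_cursor": str | None}
Page = dict


def iterate_records(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every record by following next_cursor until the API stops returning one."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["records"]
        cursor = page.get("next_cursor")
        if cursor is None:
            break


# In-memory stand-in for a real HTTP call, for demonstration only.
DATA = [{"id": i} for i in range(1, 8)]


def fake_fetch(cursor: Optional[str], limit: int = 3) -> Page:
    start = int(cursor) if cursor else 0
    chunk = DATA[start:start + limit]
    end = start + len(chunk)
    return {"records": chunk, "next_cursor": str(end) if end < len(DATA) else None}


all_ids = [r["id"] for r in iterate_records(fake_fetch)]
```

Writing the traversal as a generator keeps memory flat: only one page is held at a time, regardless of how many pages the endpoint serves.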

02 Offset vs. Cursor vs. Page Token

There are three dominant paradigms:

Offset: Uses ?limit=100&offset=200. Easy to parallelize, but vulnerable to data shifting (if a record is deleted, everything shifts up, causing you to skip a row).
Cursor: Uses a unique identifier ?after_id=994. Highly stable and performant, but strictly sequential.
Page Token: Uses an opaque string ?token=xyz123. Often stateful on the server side, meaning it can expire if you wait too long between requests.

03 The data mutation problem

The biggest silent failure in API scraping is dataset mutation during an offset-based crawl. If you are scraping an active e-commerce catalog and a product is added to page 1 while you are fetching page 5, all subsequent records shift down. When you request page 6, you will ingest a duplicate of the last item from page 5. Conversely, if an item is deleted, you will silently skip a record. This is why cursor-based pagination is heavily preferred for data integrity.
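The shift is easy to reproduce with a toy simulation. The sketch below models the dataset as a Python list and applies one mutation mid-crawl. The helper names (`offset_page`, `crawl_with_mutation`) are illustrative only, and the example uses three-row pages rather than the catalog scenario above.

```python
def offset_page(rows, offset, limit):
    """Return one page the way a SQL LIMIT/OFFSET query would."""
    return rows[offset:offset + limit]


def crawl_with_mutation(rows, limit, mutate_after_page, mutate):
    """Crawl by offset, applying mutate(rows) once after the given page number."""
    seen, page = [], 1
    while True:
        chunk = offset_page(rows, (page - 1) * limit, limit)
        if not chunk:
            return seen
        seen.extend(chunk)
        if page == mutate_after_page:
            mutate(rows)
        page += 1


# A record is inserted at the head after page 2 is fetched:
# the last item of page 2 (id 6) shows up again on page 3.
catalog = list(range(1, 10))  # ids 1..9
with_insert = crawl_with_mutation(catalog, 3, 2, lambda r: r.insert(0, 0))

# A record is deleted from the head instead: id 7 slides onto page 2
# after that page was already fetched, so it is never seen.
catalog = list(range(1, 10))
with_delete = crawl_with_mutation(catalog, 3, 2, lambda r: r.pop(0))
```

Neither run raises an error or returns an unusual status, which is what makes the failure silent.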

04 How DataFlirt handles it

We treat pagination as a critical state machine. Our workers persist their exact position to a Redis cluster after every successful page extraction. If a proxy gets banned or the target API goes down for maintenance, the job suspends and resumes later from the exact cursor or offset. We also deploy automated loop detection to catch APIs that improperly return the final page infinitely, ensuring pipelines terminate cleanly.
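A checkpoint-and-resume loop of this kind might look like the sketch below. It is not DataFlirt's implementation: a plain `dict` stands in for the Redis cluster, and `run_job`, `flaky_fetch` and the state layout are hypothetical names chosen for the example.

```python
import json
from typing import Callable, MutableMapping, Optional


def run_job(job_id: str,
            fetch_page: Callable[[Optional[str]], dict],
            store: MutableMapping[str, str],
            sink: list) -> None:
    """Traverse a cursor API, checkpointing after each page so a restart resumes in place."""
    state = json.loads(store.get(job_id, '{"cursor": null, "done": false}'))
    while not state["done"]:
        page = fetch_page(state["cursor"])   # may raise; the checkpoint stays untouched
        sink.extend(page["records"])
        state = {"cursor": page["next_cursor"], "done": page["next_cursor"] is None}
        store[job_id] = json.dumps(state)    # persist only after the page is handled


# Demonstration: a fetcher that fails once in the middle of the crawl.
DATA = list(range(10))
failures = {"left": 1}


def flaky_fetch(cursor):
    start = int(cursor) if cursor else 0
    if start == 4 and failures["left"]:
        failures["left"] -= 1
        raise ConnectionError("simulated 502")
    end = min(start + 2, len(DATA))
    return {"records": DATA[start:end], "next_cursor": str(end) if end < len(DATA) else None}


store, sink = {}, []
try:
    run_job("job-1", flaky_fetch, store, sink)
except ConnectionError:
    pass                                     # the worker "dies" mid-crawl
run_job("job-1", flaky_fetch, store, sink)   # a fresh run resumes from the checkpoint
```

In this sketch the write to `sink` and the checkpoint are two separate steps, so a crash between them would replay one page on resume. A real pipeline would need idempotent writes or a transactional store to close that gap.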

05 Did you know?

Many modern APIs implement a hard limit on offset depth (commonly 10,000 records) to protect their database from expensive deep-paging queries. A request past that limit, such as ?offset=10001, is typically rejected, often with a 400 error. To extract datasets larger than this limit, scrapers must dynamically slice the queries using filters (like date ranges or price brackets) so that no single query matches more than 10,000 records.
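One way to implement the slicing is recursive bisection over a numeric filter such as a day index or a price in cents. The sketch below assumes the API exposes a cheap way to count matches for a range. `count` and `fake_count` are hypothetical stand-ins for that call.

```python
from typing import Callable, List, Tuple


def slice_range(lo: int, hi: int,
                count: Callable[[int, int], int],
                cap: int = 10_000) -> List[Tuple[int, int]]:
    """Split the half-open range [lo, hi) until every slice matches at most cap records."""
    # A slice of width 1 cannot be split further; it is returned as-is even if it
    # still exceeds the cap, and would need a second filter dimension.
    if count(lo, hi) <= cap or hi - lo <= 1:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return slice_range(lo, mid, count, cap) + slice_range(mid, hi, count, cap)


# Stand-in for a "total hits" query: 500 records per day over 100 days.
def fake_count(lo: int, hi: int) -> int:
    return 500 * (hi - lo)


slices = slice_range(0, 100, fake_count)
```

Each resulting slice can then be paginated in full with ordinary offsets, since none of them reaches the cap.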

// 03 — the math

Calculating the traversal depth.

Understanding the bounds of a paginated endpoint is critical for capacity planning. DataFlirt uses these models to allocate worker concurrency and estimate pipeline completion times.

Offset calculation: Offset = (Page − 1) × Limit

The standard SQL-backed pagination formula. Vulnerable to data shifts. (Standard REST convention)

Total requests required: Reqs = ⌈ Total_Records / Page_Size ⌉

Determines the minimum number of HTTP calls to exhaust the endpoint. (Pipeline capacity model)

DataFlirt parallelization factor: Workers = min(Max_Concurrency, Total_Records / Chunk_Size)

How we slice offset-based endpoints to reduce a 10-hour crawl to 15 minutes. (Internal scheduler logic)
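As a worked example, the three formulas translate directly into code. The function names are illustrative. Rounding the worker count up to a whole number is an assumption added here, since the formula as written can yield a fraction.

```python
import math


def offset_for(page: int, limit: int) -> int:
    """Offset = (Page - 1) x Limit, with pages numbered from 1."""
    return (page - 1) * limit


def requests_required(total_records: int, page_size: int) -> int:
    """Reqs = ceil(Total_Records / Page_Size)."""
    return math.ceil(total_records / page_size)


def worker_count(max_concurrency: int, total_records: int, chunk_size: int) -> int:
    """Workers = min(Max_Concurrency, Total_Records / Chunk_Size).

    Rounded up so a partial final chunk still gets a worker (an assumption).
    """
    return min(max_concurrency, math.ceil(total_records / chunk_size))


page_3_offset = offset_for(3, 100)            # third page of 100 rows
calls = requests_required(50_000, 1_000)      # 50,000 records in pages of 1,000
workers = worker_count(16, 50_000, 10_000)    # 10,000-record chunks, cap of 16
```

With these inputs, page 3 at a limit of 100 starts at offset 200, 50,000 records at 1,000 per page take 50 requests, and 10,000-record chunks call for 5 workers.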

// 04 — pipeline execution

Following the cursor through 50,000 records.

A live trace of a DataFlirt worker traversing a cursor-based API. Notice the state persistence and the handling of a mid-crawl rate limit.

Cursor-based · Stateful resume · Rate limit backoff

// init traversal
target.endpoint: "https://api.target.com/v3/catalog"
pagination.type: "cursor"

// page 1
GET "?limit=1000"
status: 200 OK records: 1000
next_cursor: "eyJpZCI6MTAwMH0="
state.checkpoint: saved to redis

// page 43
GET "?limit=1000&cursor=eyJpZCI6NDIwMDB9"
status: 429 Too Many Requests
retry_after: 15s
worker.action: sleeping 15000ms

// page 43 (retry)
GET "?limit=1000&cursor=eyJpZCI6NDIwMDB9"
status: 200 OK records: 1000

// page 50 (final)
next_cursor: null
pipeline.status: exhausted · 50,000 records extracted
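The 429 handling in the trace can be approximated as follows. This is a sketch under stated assumptions: `send` is a hypothetical callable returning `(status, headers, body)`, the `Retry-After` header is assumed to carry delta-seconds (the HTTP-date form is not handled), and the exponential fallback is an addition not shown in the trace.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, dict, dict]  # (status, headers, body): a hypothetical shape


def get_with_backoff(send: Callable[[], Response],
                     max_retries: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call send(); on 429, wait for Retry-After seconds and repeat the same request."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status == 200:
            return body
        if status != 429 or attempt == max_retries:
            raise RuntimeError(f"request failed with status {status}")
        # Fall back to exponential backoff when the header is absent.
        delay = float(headers.get("Retry-After", 2 ** attempt))
        sleep(delay)
    raise RuntimeError("unreachable")


# Replay of the trace: one 429 with a 15-second hint, then success.
responses = iter([
    (429, {"Retry-After": "15"}, {}),
    (200, {}, {"records": 1000}),
])
slept = []
body = get_with_backoff(lambda: next(responses), sleep=slept.append)
```

Injecting `sleep` keeps the function testable without real waiting. The retry repeats the same cursor, so no page is skipped or fetched out of order.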

// 05 — failure modes

Where pagination breaks down.

Ranked by frequency across DataFlirt's API extraction pipelines. Pagination seems simple until you hit scale, at which point state management and edge cases dominate.

Pipelines monitored: 412 active
Pagination type: API endpoints
Updated: 2026-05-19

Dataset mutation during crawl

duplicates/misses · Records shift across page boundaries while scraping

Cursor expiration / timeout

state loss · Taking too long between requests invalidates the token

Infinite pagination loops

logic error · API returns the same cursor or last page repeatedly

Hard offset limits

max 10,000 · Elasticsearch/Solr rejecting deep offset queries

Inconsistent page sizes

validation fail · API returns fewer records than limit despite more existing
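The last failure mode is worth a concrete illustration. A common shortcut is to stop as soon as a page comes back shorter than the requested limit, which truncates the crawl if the server silently caps page size. The sketch below assumes that cause (a server-side cap of 3 rows) and uses hypothetical helper names. It does not cover APIs that filter rows after paging.

```python
DATA = list(range(8))


def capped_fetch(offset, limit, server_cap=3):
    """Stand-in API that never returns more than server_cap rows per call."""
    return DATA[offset:offset + min(limit, server_cap)]


def naive_crawl(fetch_page, limit):
    """Stops on the first short page: wrong when the server caps page size."""
    rows, offset = [], 0
    while True:
        chunk = fetch_page(offset, limit)
        rows.extend(chunk)
        if len(chunk) < limit:
            return rows
        offset += limit


def crawl_until_empty(fetch_page, limit):
    """Stops only on an empty page and advances by the rows actually received."""
    rows, offset = [], 0
    while True:
        chunk = fetch_page(offset, limit)
        if not chunk:
            return rows
        rows.extend(chunk)
        offset += len(chunk)


truncated = naive_crawl(capped_fetch, limit=5)
complete = crawl_until_empty(capped_fetch, limit=5)
```

The second loop costs one extra request (the empty page that ends it) in exchange for not depending on the page size the server chooses to honor.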

// 06 — our architecture

Traverse deeply, without losing your place.

DataFlirt's extraction engine treats pagination as a stateful, resumable operation. We don't just follow next links blindly in memory. We persist cursor state to Redis after every successful batch. If a worker dies, a proxy rotates, or an API throws a 502 on page 4,000, the pipeline resumes exactly where it left off. For offset-based APIs that support it, we partition the total record space using date or category filters and extract chunks concurrently, bypassing hard offset limits and drastically reducing crawl time.

pagination.state

Live state of a resumable pagination job in the DataFlirt scheduler.

job.id api-traverse-882

strategy cursor-based

records.yielded 42,000

current_cursor eyJpZCI6NDIwMDB9

checkpoint.age 1.2s ago

rate_limit.hits 3 (handled)

loop_detection active (clean)

// 07 — FAQ

Common questions.

About pagination strategies, handling hard limits, parallelization, and how DataFlirt ensures data completeness across massive API endpoints.

What is the difference between offset and cursor pagination?

Offset pagination uses limit and offset (e.g., skip 100, take 50). It's easy to implement but suffers from performance degradation on deep pages and data shifts (if a record is added, everything shifts, causing duplicates or missed rows). Cursor pagination uses a unique pointer (e.g., after_id=994). It is stable against data shifts and highly performant, but harder to jump to a specific page.

How do you bypass the 10,000 record offset limit?

Many APIs backed by Elasticsearch or Solr hard-cap offsets at 10,000 to prevent memory exhaustion. To extract 500,000 records, you cannot just paginate deeply. We use filter slicing: we dynamically inject filters (like narrow date ranges, price brackets, or alphabetical prefixes) to ensure no single query matches more than 10,000 records, paginating fully within each slice.

Can you parallelize cursor-based pagination?

Strictly speaking, no. Cursor pagination is inherently sequential — you need the response of page N to get the cursor for page N+1. However, DataFlirt parallelizes the overall job by splitting the initial query into orthogonal segments (e.g., one worker per category or date range), allowing multiple sequential cursor chains to run concurrently.
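The segment-level parallelism described here can be sketched with a thread pool. Each segment runs its own sequential cursor chain and the chains run side by side. The segment names, `fake_fetch` and the response shape are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_segment(segment, fetch_page):
    """Run one strictly sequential cursor chain for a single segment."""
    records, cursor = [], None
    while True:
        page = fetch_page(segment, cursor)
        records.extend(page["records"])
        cursor = page["next_cursor"]
        if cursor is None:
            return records


def crawl_all(segments, fetch_page, max_workers=4):
    """Run the chains concurrently; each chain stays sequential internally."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda s: crawl_segment(s, fetch_page), segments)
        return dict(zip(segments, results))


# In-memory stand-in for a catalog that can be filtered by category.
CATALOG = {"books": list(range(5)), "toys": list(range(100, 103)), "games": []}


def fake_fetch(segment, cursor, limit=2):
    rows = CATALOG[segment]
    start = int(cursor) if cursor else 0
    end = min(start + limit, len(rows))
    return {"records": rows[start:end], "next_cursor": str(end) if end < len(rows) else None}


by_segment = crawl_all(list(CATALOG), fake_fetch)
```

This only yields a complete, duplicate-free dataset if the segments are disjoint and together cover the whole dataset, which is what "orthogonal" requires.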

How does DataFlirt prevent infinite pagination loops?

Poorly implemented APIs sometimes return the final page repeatedly instead of an empty array or null cursor. We maintain a rolling hash of the last three response payloads and track cursor values. If the payload hash matches exactly, or the cursor fails to advance, our loop detection circuit breaks the traversal and marks the endpoint as exhausted.
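A loop detector along these lines could be sketched as follows. The class name, the SHA-256 choice and the JSON serialization are assumptions, not DataFlirt's actual code. The window of three hashes follows the description above.

```python
import hashlib
import json
from collections import deque


class LoopDetector:
    """Flags a traversal as looping when a payload repeats or the cursor stops advancing."""

    def __init__(self, window: int = 3):
        self.recent_hashes = deque(maxlen=window)  # rolling window of payload digests
        self.seen_cursors = set()

    def is_loop(self, records, next_cursor) -> bool:
        payload = json.dumps(records, sort_keys=True).encode()
        digest = hashlib.sha256(payload).hexdigest()
        repeated_payload = digest in self.recent_hashes
        repeated_cursor = next_cursor is not None and next_cursor in self.seen_cursors
        self.recent_hashes.append(digest)
        if next_cursor is not None:
            self.seen_cursors.add(next_cursor)
        return repeated_payload or repeated_cursor


detector = LoopDetector()
pages = [
    ([{"id": 1}], "a"),
    ([{"id": 2}], "b"),
    ([{"id": 2}], "b"),  # the API serves the final page again
]
flags = [detector.is_loop(records, cursor) for records, cursor in pages]
```

One caveat of this sketch: two consecutive empty pages hash identically, so an API that legitimately returns empty pages with an advancing cursor would be flagged as a loop.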

What happens if a cursor expires mid-crawl?

Some APIs issue time-sensitive cursors (some AWS services and certain GraphQL endpoints, for example) that expire if not used within a short window, such as 5 minutes. If a rate limit or network error delays the worker and the cursor dies, DataFlirt's state manager falls back to the last known stable anchor, re-fetches the previous page to generate a fresh cursor, and resumes without duplicating data.
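The fallback can be illustrated with a toy API. Everything below is hypothetical: the API is assumed to accept either an expiring opaque token or a stable `after_id` anchor, and `CursorExpired`, `make_fake_api` and `crawl` are names invented for the sketch.

```python
import itertools


class CursorExpired(Exception):
    pass


def make_fake_api(ids, limit=2):
    """Toy API: pages by expiring token, or by a stable after_id anchor."""
    valid, counter = {}, itertools.count()

    def fetch(token=None, after_id=None):
        if token is not None:
            if token not in valid:
                raise CursorExpired(token)
            start = valid[token]
        else:
            start = 0 if after_id is None else ids.index(after_id) + 1
        end = min(start + limit, len(ids))
        next_token = None
        if end < len(ids):
            next_token = f"t{next(counter)}"
            valid[next_token] = end
        return {"records": ids[start:end], "next_token": next_token}

    return fetch, valid


def crawl(fetch, on_page=lambda records: None):
    stored, token = [], None
    anchor = None  # id of the record just before the most recently stored page
    while True:
        try:
            page = fetch(token=token)
        except CursorExpired:
            # Replay the previous page from the anchor only to mint a fresh token;
            # its records are already stored, so they are dropped here.
            token = fetch(after_id=anchor)["next_token"]
            if token is None:
                return stored
            continue
        anchor = stored[-1] if stored else None
        stored.extend(page["records"])
        on_page(page["records"])
        token = page["next_token"]
        if token is None:
            return stored


fetch, valid = make_fake_api([1, 2, 3, 4, 5, 6])
pages_seen = []


def expire_after_second_page(records):
    pages_seen.append(records)
    if len(pages_seen) == 2:
        valid.clear()  # simulate every outstanding token timing out


result = crawl(fetch, expire_after_second_page)
```

The replayed page is fetched twice but stored once, which is how the resume avoids duplicates. This only works if the API offers some stable way to re-enter the sequence.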

Is it legal to scrape paginated public APIs?

Yes, accessing publicly available data via an exposed API is generally lawful, provided you do not bypass authentication or breach specific terms of service. However, aggressive pagination can trigger denial-of-service protections. We strictly adhere to rate limits and concurrency caps to ensure our extraction remains non-disruptive to the target's infrastructure.

Related glossary terms

Cursor-Based Pagination

Offset Pagination

Page Token Pagination

Infinite Pagination Loop