BLS.gov Scraper — Economic, Employment & Price Index Extraction

Data Dictionary

Every field we extract from bls.gov

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Consumer Price Index objects from bls.gov. All fields typed and schema-versioned.

series_id, area_code, item_code, year, period, value, footnotes, release_date, seasonally_adjusted

"series_id": "CUUR0000SA0",
"year": 2024,
"period": "M01",
"value": 309.685,
"area_code": "0000",
"item_code": "SA0",
"seasonally_adjusted": false

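As a rough illustration of how the area_code, item_code, and seasonally_adjusted fields relate to series_id, here is a minimal sketch that splits a CPI series ID by character position. The positions are an assumption inferred from the sample record above, and the function and class names are hypothetical; the layout should be checked against the BLS series ID documentation before relying on it.

```python
from typing import NamedTuple


class CpiSeries(NamedTuple):
    survey: str                 # "CU" in the sample record
    seasonally_adjusted: bool   # third character: "S" adjusted, "U" unadjusted
    periodicity: str            # fourth character, "R" in the sample record
    area_code: str
    item_code: str


def parse_cpi_series_id(series_id: str) -> CpiSeries:
    """Split a CPI series ID by fixed character positions (assumed layout)."""
    if len(series_id) < 9 or not series_id.startswith(("CU", "CW")):
        raise ValueError(f"Not a recognised CPI series ID: {series_id!r}")
    return CpiSeries(
        survey=series_id[:2],
        seasonally_adjusted=series_id[2] == "S",
        periodicity=series_id[3],
        area_code=series_id[4:8],
        item_code=series_id[8:],
    )


parsed = parse_cpi_series_id("CUUR0000SA0")
print(parsed.area_code, parsed.item_code, parsed.seasonally_adjusted)
```

For the sample series this yields area code 0000, item code SA0, and an unadjusted flag, matching the record shown above.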

Complete list of extractable fields for Producer Price Index objects from bls.gov. All fields typed and schema-versioned.

series_id, industry_code, product_code, year, period, value, footnotes, base_date, release_date

"series_id": "PCU311---311---",
"industry_code": "311",
"product_code": "311---",
"year": 2024,
"period": "M01",
"value": 254.3,
"base_date": "198412"

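The industry_code and product_code in the sample record above appear to be embedded in the series ID itself. The sketch below splits a PPI industry series ID on that assumption (a three-character prefix, a six-character dash-padded industry code, then the product code). The function name is hypothetical, and treating the stripped industry code as a NAICS prefix is an assumption to verify per dataset.

```python
def parse_ppi_series_id(series_id: str) -> dict[str, str]:
    """Split a PPI industry series ID by assumed fixed positions."""
    if len(series_id) < 10 or not series_id.startswith("PC"):
        raise ValueError(f"Not a recognised PPI industry series ID: {series_id!r}")
    padded_industry = series_id[3:9]
    return {
        "prefix": series_id[:3],
        # Dash padding is stripped, matching "industry_code": "311" in the sample.
        "industry_code": padded_industry.rstrip("-"),
        # The product code is kept verbatim, matching "product_code": "311---".
        "product_code": series_id[9:],
    }


ppi = parse_ppi_series_id("PCU311---311---")
print(ppi)
```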

Complete list of extractable fields for Employment Stats objects from bls.gov. All fields typed and schema-versioned.

series_id, state_code, area_code, year, period, labor_force, employment, unemployment, unemployment_rate, preliminary

"series_id": "LASST060000000000003",
"year": 2024,
"period": "M01",
"labor_force": 19345000,
"unemployment": 1005940,
"unemployment_rate": 5.2,
"preliminary": true


Complete list of extractable fields for JOLTS Data objects from bls.gov. All fields typed and schema-versioned.

series_id, industry_code, region_code, data_element_code, year, period, rate, level, footnotes

"series_id": "JTU00000000JOL",
"industry_code": "000000",
"data_element_code": "JO",
"year": 2024,
"period": "M01",
"rate": 5.3,
"level": 8756


Complete list of extractable fields for Occupational Wages objects from bls.gov. All fields typed and schema-versioned.

area_type, area_code, occ_code, occ_title, total_employment, hourly_mean_wage, annual_mean_wage, hourly_median_wage, annual_median_wage, release_year

"occ_code": "15-1252",
"occ_title": "Software Developers",
"total_employment": 1476900,
"hourly_mean_wage": 63.92,
"annual_mean_wage": 132930,
"release_year": 2023


Capabilities

Extracting clean time series from government infrastructure

Our bls.gov scraper handles legacy HTML structures, FTP flat files, and undocumented schema changes to deliver continuous, analysis-ready economic data.

Time Series Extraction

Extract continuous historical data across CPI, PPI, and employment metrics without manual file joining.

Legacy HTML Parsing

Navigate archaic government DOM structures and complex nested tables to extract precise data points.

Revision Tracking

Capture preliminary data and automatically update records when the BLS publishes revised figures.

Region Normalisation

Map complex BLS geographic codes to standard state, MSA, and county identifiers for easy joining.

Industry Code Mapping

Translate BLS-specific industry classifications into standard NAICS codes across all datasets.

Flat File Ingestion

Parse massive text and CSV files from the BLS FTP servers into structured warehouse tables.

Automated Schema Updates

Detect and adapt to unannounced changes in BLS reporting formats and table structures.

Cross-Index Joining

Combine wage data with local CPI figures to calculate real wage growth automatically.

Scheduled Updates

Run pipelines immediately following BLS release schedules to ensure minimal data latency.
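The real wage calculation mentioned under Cross-Index Joining comes down to deflating nominal wage growth by the change in the price index. A minimal sketch, with a hypothetical function name and purely illustrative input numbers (not BLS figures):

```python
def real_wage_growth(
    wage_start: float, wage_end: float, cpi_start: float, cpi_end: float
) -> float:
    """Return real wage growth as a fraction, e.g. 0.02 for 2%."""
    if min(wage_start, cpi_start) <= 0:
        raise ValueError("Starting wage and starting CPI must be positive")
    nominal_ratio = wage_end / wage_start   # 1 + nominal wage growth
    inflation_ratio = cpi_end / cpi_start   # 1 + inflation over the same period
    return nominal_ratio / inflation_ratio - 1.0


# Illustrative inputs: a 5% nominal raise while the index rises 3%.
growth = real_wage_growth(50_000, 52_500, 300.0, 309.0)
print(f"{growth:.2%}")
```

Dividing the ratios, rather than subtracting inflation from nominal growth, keeps the result exact when either rate is large.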

// engagement pipeline

From government tables to warehouse tables

Brief in. Clean data out.

Define Scope

Day 0

Provide the specific series IDs, indexes, or regional data you need. We design the extraction schema.

Pipeline Build

Days 2–4

We configure Scrapy crawlers and flat-file parsers to handle bls.gov rate limits and structures.

Validation & QA

Days 4–6

Schema validation, continuity checks, and historical data reconciliation before full launch.

Delivery

ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage.
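One of the continuity checks from the Validation & QA step can be sketched as a gap finder for monthly series. This is an illustrative version, not the production check: it assumes periods are coded M01 to M12, that M13 denotes an annual average to be ignored, and the function name is hypothetical.

```python
def missing_months(periods: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Return (year, period) pairs missing between the first and last month seen."""
    present = {
        (year, int(period[1:]))
        for year, period in periods
        if period.startswith("M") and period != "M13"  # skip annual averages
    }
    if not present:
        return []
    year, month = min(present)
    last = max(present)
    gaps = []
    while (year, month) <= last:
        if (year, month) not in present:
            gaps.append((year, f"M{month:02d}"))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


observed = [(2023, "M11"), (2023, "M12"), (2023, "M13"), (2024, "M02")]
gaps = missing_months(observed)
print(gaps)
```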

Under the hood

How our pipeline handles public data challenges

Government websites present unique scraping hurdles. Here is how we build resilient pipelines for bls.gov data.

// fingerprinting

Identity rotation

TLS fingerprint: randomised
User-agent: rotated
IP pool: residential
Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (running)

// observability

Pipeline health

Uptime: 99.9%
p99 latency: 142ms
Null rate: 0.3%

Archaic DOM structures

Resilient parsing for legacy HTML

Many BLS pages rely on nested tables and outdated markup. We build specific XPath and CSS fallback chains to extract data reliably despite structural inconsistencies.

Rate limiting

Strict compliance with government limits

The BLS enforces strict IP-based rate limiting. We manage request concurrency and rotate datacenter proxies to maintain high throughput without triggering blocks.
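Managing request concurrency can be as simple as enforcing a minimum gap between requests from a given worker. The class below is a minimal sketch of that idea, not the production middleware: the name and the one-second interval are assumptions, and the clock and sleep functions are injectable so the behaviour can be tested without waiting.

```python
import time
from typing import Callable


class MinIntervalLimiter:
    """Block callers so that consecutive requests are at least min_interval apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Sleep if needed, then reserve the next slot. Returns the delay applied."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


limiter = MinIntervalLimiter(min_interval=1.0)  # illustrative interval
first_delay = limiter.wait()  # the first call never sleeps
print(first_delay)
```

Calling limiter.wait() before each request keeps a single worker under the chosen rate; a shared store such as Redis would be needed to coordinate the same limit across workers.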

Data revisions

Automated historical corrections

Economic data is frequently revised months after initial publication. Our pipelines track preliminary flags and automatically update historical warehouse records when revisions occur.
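The revision handling described above amounts to an upsert keyed on series and period. Here is a minimal sketch using an in-memory SQLite database as a stand-in for the warehouse. The table name, column set, and the revised value are illustrative assumptions, not the production schema or real BLS figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE observations (
        series_id   TEXT    NOT NULL,
        year        INTEGER NOT NULL,
        period      TEXT    NOT NULL,
        value       REAL    NOT NULL,
        preliminary INTEGER NOT NULL,
        PRIMARY KEY (series_id, year, period)
    )
    """
)

# Insert new periods; overwrite existing ones only when something changed.
UPSERT = """
INSERT INTO observations (series_id, year, period, value, preliminary)
VALUES (:series_id, :year, :period, :value, :preliminary)
ON CONFLICT (series_id, year, period) DO UPDATE SET
    value = excluded.value,
    preliminary = excluded.preliminary
WHERE excluded.value != observations.value
   OR excluded.preliminary != observations.preliminary
"""


def upsert(row: dict) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, row)


key = {"series_id": "LASST060000000000003", "year": 2024, "period": "M01"}
upsert({**key, "value": 5.2, "preliminary": 1})  # first, preliminary release
upsert({**key, "value": 5.1, "preliminary": 0})  # later revision (made-up value)

stored = conn.execute("SELECT value, preliminary FROM observations").fetchall()
print(stored)
```

If an audit trail matters, the same key can write to an append-only history table before the overwrite, so earlier vintages remain queryable.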

File parsing

Handling massive flat files

For bulk historical data, the BLS provides massive, poorly formatted text files. We parse, type-cast, and normalise these files into clean Parquet columns.
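As an illustration of the parse and type-cast step, the sketch below reads a tab-delimited extract with whitespace-padded columns into typed records. The sample text, the column names, and the treatment of "-" as a missing value are assumptions about the file layout; the function name is hypothetical.

```python
import csv
import io

# Hypothetical two-line extract: a padded header row and one padded data row.
SAMPLE = (
    "series_id        \tyear\tperiod\t       value\tfootnote_codes\n"
    "CUUR0000SA0      \t2024\tM01\t     309.685\t\n"
)


def parse_flat_file(text: str) -> list[dict]:
    """Strip padding and cast each row of a tab-delimited file to typed fields."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = [name.strip() for name in next(reader)]
    records = []
    for raw in reader:
        if not raw:
            continue  # skip blank lines
        row = dict(zip(header, (cell.strip() for cell in raw)))
        records.append(
            {
                "series_id": row["series_id"],
                "year": int(row["year"]),
                "period": row["period"],
                # Assumed missing-value markers: empty string or a lone dash.
                "value": float(row["value"]) if row["value"] not in ("", "-") else None,
                "footnotes": row.get("footnote_codes") or None,
            }
        )
    return records


records = parse_flat_file(SAMPLE)
print(records)
```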

Monitoring

Detecting unannounced changes

Government sites often change formats without notice. We monitor null rates and schema drift, alerting our engineers to adapt parsers before downstream pipelines fail.
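Both signals are cheap to compute per batch. A minimal sketch, assuming records arrive as dictionaries and that the expected field set is maintained by hand; the names and the field list are illustrative.

```python
EXPECTED_FIELDS = {"series_id", "year", "period", "value"}  # illustrative schema


def null_rate(records: list[dict], field: str) -> float:
    """Share of records where the field is absent or None."""
    if not records:
        return 0.0
    return sum(1 for record in records if record.get(field) is None) / len(records)


def schema_drift(records: list[dict], expected: set[str]) -> dict[str, list[str]]:
    """Compare the fields actually seen in a batch with the expected set."""
    seen: set[str] = set()
    for record in records:
        seen.update(record)
    return {
        "missing": sorted(expected - seen),
        "unexpected": sorted(seen - expected),
    }


batch = [
    {"series_id": "CUUR0000SA0", "year": 2024, "period": "M01", "value": 309.685},
    {"series_id": "CUUR0000SA0", "year": 2024, "period": "M02", "value": None, "note": "x"},
]
value_null_rate = null_rate(batch, "value")
drift = schema_drift(batch, EXPECTED_FIELDS)
print(value_null_rate, drift)
```

An alert would then fire when the null rate crosses a chosen threshold or when either drift list is non-empty.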

Applications

Who uses BLS data and how

Teams across industries use bls.gov data to build competitive products and smarter operations.

Macroeconomic Forecasting

Quantitative funds feed CPI, PPI, and employment time series into models to predict interest rate movements.

Wage Benchmarking

HR platforms use OEWS data to set competitive compensation bands across different MSAs and occupations.

Inflation Adjustment

Supply chain teams use granular PPI data to negotiate material contracts with automatic inflation escalators.

Real Estate Analysis

Commercial real estate firms track local employment growth and wage trends to identify emerging markets.

Academic Research

Universities require clean, continuous historical datasets for long-term economic and sociological studies.

Retail Site Selection

Retailers combine local unemployment rates and wage data to determine optimal locations for new stores.

Technical Spec

BLS scraper technical capabilities

Everything supported by our bls.gov scraper: legacy HTML parsing, flat-file ingestion, revision tracking, and more.

Legacy HTML parsing

Custom parsers for outdated government table structures

Supported

Text file ingestion

Automated downloading and parsing of bulk FTP text files

Supported

Historical revisions

Tracking and updating preliminary data points upon revision

Supported

Seasonal adjustments

Extraction of both adjusted and unadjusted data series

Supported

NAICS code mapping

Standardisation of BLS industry codes to standard NAICS

Supported

Sub-state regional data

Extraction of MSA, county, and local area statistics

Supported

Webhook delivery

HTTP POST upon completion of scheduled data runs

Supported

Embargoed pre-release data

Data restricted prior to official publication times

Partial

Confidential microdata

Raw survey responses requiring specific government clearance

Partial

Infrastructure

Infrastructure powering the BLS pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy, Playwright, Python 3.12, Redis, PostgreSQL, Apache Airflow, AWS Lambda, S3, CloudWatch, 2Captcha, CapSolver, Residential Proxies, Docker, Kubernetes, Grafana, Prometheus

Scrapy Stack

Scrapy handles crawl orchestration, rate limiting, and HTML parsing. Custom middlewares manage retry logic and connection pooling for government servers.
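For orientation, a conservative Scrapy settings fragment for a crawl like this might look as follows. The setting names are standard Scrapy settings, but every value is an illustrative assumption rather than the configuration actually deployed.

```python
# settings.py (illustrative values only)

# Keep per-domain concurrency low and space out requests.
CONCURRENT_REQUESTS_PER_DOMAIN = 2
DOWNLOAD_DELAY = 1.0

# Let AutoThrottle adjust the delay based on observed server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Retry transient failures and throttling responses a limited number of times.
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]

# Honour robots.txt directives.
ROBOTSTXT_OBEY = True
```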

Proxy Infrastructure

We maintain pools of datacenter proxies to distribute requests evenly, preventing IP bans while respecting BLS concurrency limits.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles scheduling to align with BLS data release calendars. All state is stored in managed Postgres.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested array structures

CSV

Flat files for easy import into statistical software

XLS

Excel format for manual analyst review

Parquet

Columnar format for fast querying in BigQuery or Snowflake

AWS S3

Direct bucket delivery for data lake integration

Webhook

HTTP POST notifications upon run completion

API

REST endpoint to query your extracted data

BigQuery

Streamed directly into your dataset

Snowflake

Stage and COPY INTO workflow

PostgreSQL

Direct upsert into your relational schema


// faq

Common questions.

About bls.gov scraping, legality, and pipeline operations.


Is scraping bls.gov legal?

Yes. BLS data is public domain information provided by the US government. Scraping is permissible provided it adheres to their terms of service regarding rate limits and server load. DataFlirt manages request concurrency to ensure full compliance.

Why scrape when the BLS has an API?

The BLS Public Data API has strict daily query limits, limits on years per request, and often lacks the most granular regional or cross-tabulated data found in their flat files and HTML tables. Our pipelines bypass these API limitations by parsing the raw data sources.

How do you handle data revisions?

Economic indicators are frequently revised. We track the 'preliminary' flags on data points. Subsequent pipeline runs check for updates to historical periods and overwrite the warehouse records with the finalised figures.

Can you extract historical data?

Yes. We can extract time series data going back decades, depending on the specific index. We normalise historical file formats to match current schemas, providing a continuous dataset.

How quickly is data available after a BLS release?

We schedule pipelines to run immediately following the official BLS release times (typically 8:30 AM Eastern Time). Data is usually delivered to your warehouse within minutes of publication.
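Because release times follow the US Eastern clock, the equivalent UTC trigger shifts by an hour when daylight saving time starts or ends. A minimal sketch of that conversion using the standard library; the function name and the two dates are illustrative, and a scheduler would take actual dates from the published release calendar.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def release_time_utc(release_day: date, local_time: time = time(8, 30)) -> datetime:
    """Convert a local Eastern release time on a given day to UTC."""
    local = datetime.combine(release_day, local_time, tzinfo=EASTERN)
    return local.astimezone(timezone.utc)


winter_run = release_time_utc(date(2024, 1, 11))  # standard time in effect
summer_run = release_time_utc(date(2024, 7, 11))  # daylight time in effect
print(winter_run.isoformat(), summer_run.isoformat())
```

Scheduling in the Eastern time zone directly, where the scheduler supports it, avoids hard-coding either UTC offset.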

Do you support NAICS mapping?

Yes. Where the BLS uses legacy or proprietary industry codes, we can map these to standard North American Industry Classification System (NAICS) codes for easier joining with your internal data.

Economic indicators,
at warehouse scale.


Tell us what
to extract.
We do the rest.

Data Extraction for Every Industry
