
Treasury data,
at warehouse scale.

We extract yield curve rates, OFAC SDN lists, foreign exchange rates, and public debt figures from treasury.gov and deliver them as clean JSON, CSV, or Parquet to S3, BigQuery, or Snowflake on your cadence.

Get data from treasury.gov → See how it works

Yield rates extracted: 4,192 per day

Sanction entities: 38,419 per run

Historical records: 2.8M total

Active pipelines: 34

Uptime: 99.99%

◆ Daily Yield Curve Rates ◆ OFAC SDN Sanctions List ◆ Public Debt to the Penny ◆ Foreign Exchange Rates ◆ Treasury Bulletin Data ◆ Interest Rate Statistics ◆ Monthly Treasury Statement ◆ Legacy HTML Table Parsing ◆ PDF Document Extraction ◆ Managed Data Pipeline ◆ S3 / BigQuery Delivery ◆ Bengaluru HQ ◆ Enterprise SLA

Data Dictionary

Every field we extract from treasury.gov

Structured, schema-consistent data across all major object types — delivered clean, typed, and ready to query.

Complete list of extractable fields for Yield Curve Rates objects from treasury.gov. All fields typed and schema-versioned.

Fields: date, 1_mo, 2_mo, 3_mo, 6_mo, 1_yr, 2_yr, 3_yr, 5_yr, 7_yr, 10_yr, 20_yr, 30_yr, scraped_at

{
  "date": "2023-10-24",
  "1_mo": 5.54,
  "3_mo": 5.59,
  "1_yr": 5.42,
  "10_yr": 4.84,
  "30_yr": 4.96,
  "scraped_at": "2023-10-24T18:00:00Z"
}

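A minimal sketch of how a downloaded yield curve CSV could be mapped onto the yield curve fields above. It assumes the Treasury CSV uses headers such as "1 Mo" and "10 Yr" with MM/DD/YYYY dates; verify that against the actual download before relying on it. COLUMN_MAP and parse_yield_csv are hypothetical names, not DataFlirt's code.

```python
import csv
import io
from datetime import datetime, timezone

# Assumed mapping from Treasury CSV headers to the data-dictionary field names.
# The header spellings ("1 Mo", "10 Yr", ...) are an assumption about the download format.
COLUMN_MAP = {
    "1 Mo": "1_mo", "2 Mo": "2_mo", "3 Mo": "3_mo", "6 Mo": "6_mo",
    "1 Yr": "1_yr", "2 Yr": "2_yr", "3 Yr": "3_yr", "5 Yr": "5_yr",
    "7 Yr": "7_yr", "10 Yr": "10_yr", "20 Yr": "20_yr", "30 Yr": "30_yr",
}


def parse_yield_csv(text: str, scraped_at: datetime) -> list[dict]:
    """Turn a Treasury-style yield curve CSV into data-dictionary records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        # Assumed MM/DD/YYYY input dates, normalised to ISO 8601 (YYYY-MM-DD).
        record = {"date": datetime.strptime(row["Date"], "%m/%d/%Y").date().isoformat()}
        for column, field in COLUMN_MAP.items():
            cell = (row.get(column) or "").strip()
            # Blank or absent cells become None rather than a misleading 0.0.
            record[field] = float(cell) if cell else None
        # Convert to UTC before formatting with a literal "Z" suffix.
        record["scraped_at"] = scraped_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        records.append(record)
    return records


if __name__ == "__main__":
    sample = "Date,1 Mo,3 Mo,10 Yr,30 Yr\n10/24/2023,5.54,5.59,4.84,4.96\n"
    print(parse_yield_csv(sample, datetime(2023, 10, 24, 18, tzinfo=timezone.utc)))
```

Any CSV column not listed in the mapping is simply ignored, and a maturity missing on a given date comes through as null instead of breaking the float conversion.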

Complete list of extractable fields for OFAC SDN Sanctions objects from treasury.gov. All fields typed and schema-versioned.

Fields: entity_id, name, type, programs, nationalities, citizenships, dates_of_birth, places_of_birth, addresses, remarks, aliases, scraped_at

{
  "entity_id": "2674",
  "name": "KIM, Jong Un",
  "type": "individual",
  "programs": "DPRK2",
  "nationalities": "Democratic People's Republic of Korea",
  "remarks": "(Linked To: WORKERS' PARTY OF KOREA)",
  "scraped_at": "2023-10-24T18:05:00Z"
}


Complete list of extractable fields for Public Debt objects from treasury.gov. All fields typed and schema-versioned.

Fields: record_date, debt_held_by_public, intragovernmental_holdings, total_public_debt_outstanding, source_system, reporting_frequency, fiscal_year, fiscal_quarter, scraped_at

{
  "record_date": "2023-10-23",
  "debt_held_by_public": 26543128491.42,
  "intragovernmental_holdings": 7084918231.11,
  "total_public_debt_outstanding": 33628046722.53,
  "reporting_frequency": "Daily",
  "scraped_at": "2023-10-24T18:10:00Z"
}


Complete list of extractable fields for Exchange Rates objects from treasury.gov. All fields typed and schema-versioned.

Fields: record_date, country, currency, exchange_rate, effective_date, source_document, quarter, year, scraped_at

{
  "record_date": "2023-09-30",
  "country": "India",
  "currency": "Rupee",
  "exchange_rate": 83.12,
  "effective_date": "2023-09-30",
  "quarter": 3,
  "year": 2023
}


Complete list of extractable fields for Treasury Bulletin objects from treasury.gov. All fields typed and schema-versioned.

Fields: publication_date, title, document_url, document_type, file_size_bytes, authoring_bureau, summary_text, parsed_tables_count, scraped_at

{
  "publication_date": "2023-09-01",
  "title": "Treasury Bulletin September 2023",
  "document_url": "https://fiscal.treasury.gov/files/bulletin/b2023-3.pdf",
  "document_type": "PDF",
  "file_size_bytes": 4194304,
  "parsed_tables_count": 42
}


Capabilities

Everything you need from treasury.gov, nothing you don't

Our treasury.gov scraper handles every layer of the platform: legacy HTML tables, complex PDF extraction, inconsistent XML feeds, and daily financial updates.

Yield Curve Extraction

Daily interest rates and historical backfills for 1-month to 30-year maturity periods, parsed directly from Treasury data feeds.

OFAC Sanctions Monitoring

Extract and normalise the Specially Designated Nationals (SDN) list, capturing aliases, addresses, and linked entities.

Debt to the Penny

Daily public debt figures, separating debt held by the public from intragovernmental holdings, timestamped per release.

Exchange Rate Tracking

Quarterly foreign currency exchange rates (the Treasury Reporting Rates of Exchange) published by the Treasury for government reporting purposes.

Legacy HTML Parsing

Navigate 1990s table structures and nested DOM elements that lack modern CSS classes or IDs.

PDF Data Extraction

OCR and table parsing for official Treasury Bulletins and Monthly Treasury Statements.

XML / CSV Normalisation

Standardising disparate Treasury formats into a unified schema for downstream ingestion.

Scheduled Updates

Run pipelines immediately after daily market close or official Treasury publication times.

Historical Backfills

Extract yield curve and debt data as far back as the Treasury publishes it (to 1990 for daily yield curves) for comprehensive macroeconomic modelling.

Schema Validation

Ensure financial figures match expected types, preventing strings from breaking your numerical models.
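As a rough illustration of the Schema Validation card, this sketch checks a Public Debt record's types and one accounting identity: total public debt outstanding should equal debt held by the public plus intragovernmental holdings. It assumes monetary fields have already been parsed into Python Decimal values so that cents compare exactly; DEBT_SCHEMA and validate_debt_record are hypothetical names.

```python
from decimal import Decimal

# Hypothetical expected types for a Public Debt record from the data dictionary.
DEBT_SCHEMA = {
    "record_date": str,
    "debt_held_by_public": Decimal,
    "intragovernmental_holdings": Decimal,
    "total_public_debt_outstanding": Decimal,
}


def validate_debt_record(record: dict) -> list[str]:
    """Return every validation error for one record; an empty list means it passed."""
    errors = []
    for field, expected in DEBT_SCHEMA.items():
        value = record.get(field)
        if not isinstance(value, expected):
            errors.append(f"{field}: expected {expected.__name__}, got {type(value).__name__}")
    if not errors:
        # Total public debt outstanding is the sum of debt held by the public
        # and intragovernmental holdings, so the components must add up exactly.
        components = record["debt_held_by_public"] + record["intragovernmental_holdings"]
        if components != record["total_public_debt_outstanding"]:
            errors.append("total_public_debt_outstanding != sum of components")
    return errors
```

The sample Public Debt record in the data dictionary passes this check. Returning a list instead of raising on the first problem lets a pipeline quarantine a bad record and report everything wrong with it at once.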

// engagement pipeline

From Treasury endpoint to warehouse record

Brief in. Clean data out.

Define Scope

Day 0

Provide specific datasets, historical ranges, or document types. We design the extraction schema together.

Pipeline Build

Days 2–4

We configure Scrapy crawlers, PDF parsing modules, and normalisation logic for treasury.gov.

Validation & QA

Days 4–6

Schema validation, null-rate checks, format normalisation, and sample exports before full launch.

Delivery

ongoing

JSON, CSV, or Parquet pushed to your S3 bucket, BigQuery dataset, or Snowflake stage on agreed cadence.

Under the hood

How our Treasury pipeline handles the hard parts

Government websites present unique challenges. Here is how we stay resilient and why teams choose managed infrastructure over DIY.

// fingerprinting

Identity rotation

TLS fingerprint: randomised

User-agent: rotated

IP pool: residential

Challenges blocked: 0

// pagination

Page coverage

48,291 pages queued (status: running)

// observability

Pipeline health

Uptime: 99.9%

p99 latency: 142ms

Null rate: 0.3%

Legacy HTML structures

Dealing with nested tables without CSS classes

Many treasury.gov pages rely on table-based layouts built decades ago. Our selector strategy uses structural mapping and text-pattern matching to extract data reliably, even when standard CSS selectors fail.
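One way to locate a table by its labels rather than by CSS selectors, sketched with Python's standard-library html.parser: collect every table (nested ones included) as rows of cell text, then pick the first table whose header row contains the labels you expect. This is a simplified stand-in for the structural mapping described above, not DataFlirt's selector engine; TableCollector and find_table are hypothetical names.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table>, including nested ones, as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, in the order they close (innermost first)
        self._stack = []   # open tables: {"rows": [[cell, ...], ...], "cell": [text parts] or None}

    def _flush_cell(self):
        current = self._stack[-1]
        if current["cell"] is not None:
            if not current["rows"]:
                current["rows"].append([])  # tolerate cells without an explicit <tr>
            # Collapse whitespace (including non-breaking spaces) into single spaces.
            current["rows"][-1].append(" ".join("".join(current["cell"]).split()))
            current["cell"] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._stack.append({"rows": [], "cell": None})
        elif self._stack and tag == "tr":
            self._flush_cell()  # legacy markup often omits </td> and </tr>
            self._stack[-1]["rows"].append([])
        elif self._stack and tag in ("td", "th"):
            self._flush_cell()
            self._stack[-1]["cell"] = []

    def handle_endtag(self, tag):
        if not self._stack:
            return
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "table":
            self._flush_cell()
            self.tables.append(self._stack.pop()["rows"])

    def handle_data(self, data):
        if self._stack and self._stack[-1]["cell"] is not None:
            self._stack[-1]["cell"].append(data)


def find_table(html: str, required_headers: set[str]) -> list[dict]:
    """Return the rows of the first table whose header row contains all required labels."""
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    for rows in collector.tables:
        if rows and required_headers <= set(rows[0]):
            header = rows[0]
            return [dict(zip(header, row)) for row in rows[1:] if row]
    return []
```

Flushing the open cell whenever a new cell or row begins is what lets the collector cope with older markup that leaves out closing tags.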

PDF Table Extraction

Parsing complex financial tables from Treasury PDFs

Treasury Bulletins are often published solely as PDFs. We use advanced OCR and spatial table parsing to convert these documents into structured, queryable data.
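OCR itself is out of scope here, but the spatial step that follows it can be sketched in a few lines of Python. The input is assumed to be word boxes with text, x0 and top keys (the shape pdfplumber's extract_words() returns for born-digital PDFs): words are grouped into lines by vertical position, then bucketed into columns by known left edges. The function names and the column_starts values are hypothetical.

```python
from bisect import bisect_right


def words_to_rows(words: list[dict], y_tolerance: float = 3.0) -> list[list[dict]]:
    """Group word boxes into lines: a word joins the current line if its 'top'
    is within y_tolerance of that line's first word."""
    rows: list[list[dict]] = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if rows and abs(word["top"] - rows[-1][0]["top"]) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return rows


def rows_to_table(rows: list[list[dict]], column_starts: list[float]) -> list[list[str]]:
    """Place each word in the column whose left edge (column_starts, sorted ascending)
    is the nearest one at or left of the word's x0; words left of every edge go to column 0."""
    table = []
    for row in rows:
        cells: list[list[str]] = [[] for _ in column_starts]
        for word in sorted(row, key=lambda w: w["x0"]):
            index = max(bisect_right(column_starts, word["x0"]) - 1, 0)
            cells[index].append(word["text"])
        table.append([" ".join(cell) for cell in cells])
    return table
```

In practice column_starts could be derived from the x positions of a table's header words, and the fixed y_tolerance is a simplification that may need tuning per document.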

Inconsistent XML schemas

Handling undocumented schema changes

Government data feeds frequently change their XML structures without warning. Our pipelines include schema drift detection and fallback logic to maintain data continuity.
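A minimal sketch of schema drift detection using Python's xml.etree.ElementTree: gather the child element names of every record, strip XML namespaces, and diff them against the fields the pipeline expects. The element names and the "entry" record tag are hypothetical, not the actual schema of any Treasury feed, and the fallback handling mentioned above is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical expected element names for one record in an XML feed.
EXPECTED_FIELDS = {"record_date", "rate_1_mo", "rate_10_yr"}


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to qualified tag names."""
    return tag.rsplit("}", 1)[-1]


def detect_drift(xml_text: str, record_tag: str = "entry") -> dict[str, set[str]]:
    """Union the child element names of every record, then diff against the expected set."""
    root = ET.fromstring(xml_text)
    seen: set[str] = set()
    for element in root.iter():
        if local_name(element.tag) == record_tag:
            seen.update(local_name(child.tag) for child in element)
    return {
        "missing": EXPECTED_FIELDS - seen,
        "unexpected": seen - EXPECTED_FIELDS,
    }
```

A non-empty missing or unexpected set would be the trigger for an alert. Because names are unioned across records, a field dropped from only some records would need a per-record check instead.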

Rate limiting

Managing requests on public endpoints

We implement intelligent request pacing and concurrency limits to respect treasury.gov infrastructure while ensuring timely data extraction.
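In a Scrapy project, request pacing of this kind is usually expressed through built-in settings. The fragment below uses real Scrapy setting names, but the values are illustrative assumptions, not the configuration this pipeline actually runs with.

```python
# settings.py: a hedged sketch of polite crawling settings for a Scrapy project.
# Setting names are standard Scrapy settings; the values are illustrative assumptions.

ROBOTSTXT_OBEY = True                  # honour the site's robots.txt rules

CONCURRENT_REQUESTS_PER_DOMAIN = 2     # cap parallel requests to one domain
DOWNLOAD_DELAY = 1.0                   # base delay in seconds between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True        # wait between 0.5x and 1.5x DOWNLOAD_DELAY

# AutoThrottle adapts the delay to observed server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0  # average parallel requests per remote site

# Retry throttling responses and transient server errors.
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]
```

AutoThrottle lengthens the delay when responses slow down, so a struggling server automatically receives fewer requests per second.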

Data normalisation

Converting strings to exact numeric values

Financial figures are often presented with commas, currency symbols, or text descriptors like 'Trillion'. Our normalisation layer cleans these inputs into strict numerical types.
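A minimal Python sketch of that cleaning step, assuming figures arrive as strings such as '$33,628,046,722.53', '(1,250.00)' or '$33.6 Trillion'. It returns decimal.Decimal rather than float so cents survive exactly. The multiplier table and the reading of parentheses as negatives (an accounting convention) are assumptions, and to_decimal is a hypothetical name.

```python
import re
from decimal import Decimal, InvalidOperation

# Word multipliers that may appear in narrative figures (assumed list; extend as needed).
MULTIPLIERS = {
    "thousand": Decimal("1e3"),
    "million": Decimal("1e6"),
    "billion": Decimal("1e9"),
    "trillion": Decimal("1e12"),
}


def to_decimal(raw: str) -> Decimal:
    """Parse strings like '$33,628,046,722.53', '(1,250.00)' or '$33.6 Trillion'."""
    text = raw.strip().lower()
    negative = text.startswith("(") and text.endswith(")")  # accounting-style negatives
    text = text.strip("()")
    multiplier = Decimal(1)
    for word, factor in MULTIPLIERS.items():
        if text.endswith(word):
            multiplier = factor
            text = text[: -len(word)]
            break
    # Drop currency symbols, thousands separators and whitespace.
    text = re.sub(r"[$,\s]", "", text)
    try:
        value = Decimal(text) * multiplier
    except InvalidOperation as exc:
        raise ValueError(f"not a financial figure: {raw!r}") from exc
    if not value.is_finite():  # Decimal accepts 'NaN' and 'Infinity'; reject them here
        raise ValueError(f"not a financial figure: {raw!r}")
    return -value if negative else value
```

Raising ValueError on anything unparseable means a malformed cell fails loudly in the pipeline instead of slipping into a numerical model as a string.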

Applications

Who uses Treasury data and how

Teams across industries use treasury.gov data to build competitive products and smarter operations.

Macroeconomic Modelling

Quantitative analysts use daily yield curves to build interest rate models and forecast economic cycles.

KYC & AML Compliance

Financial institutions automate checks against the OFAC SDN list to ensure regulatory compliance.

Currency Risk Management

Corporate treasuries track official exchange rates for accounting and risk mitigation purposes.

Fiscal Policy Analysis

Think tanks and researchers monitor public debt trends and intragovernmental holdings.

Fixed Income Pricing

Traders benchmark corporate bonds and mortgage rates against Treasury yield curves.

Historical Financial Research

Academia analyses 30-year interest rate cycles using our complete historical backfills.
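For the fixed income use case, benchmarking usually means reading the Treasury yield at a bond's maturity off the curve and quoting the difference in basis points. This Python sketch uses plain linear interpolation between published maturities, a simplification chosen for clarity and not necessarily how Treasury or any particular desk interpolates. The tenor mapping follows the data dictionary's field names; the function names are hypothetical.

```python
from bisect import bisect_left

# Maturity in years for each yield field in the data dictionary.
TENORS = {
    "1_mo": 1 / 12, "2_mo": 2 / 12, "3_mo": 0.25, "6_mo": 0.5,
    "1_yr": 1.0, "2_yr": 2.0, "3_yr": 3.0, "5_yr": 5.0,
    "7_yr": 7.0, "10_yr": 10.0, "20_yr": 20.0, "30_yr": 30.0,
}


def interpolate_yield(curve: dict, years: float) -> float:
    """Linearly interpolate a yield (in percent) at `years` from one day's curve record."""
    points = sorted((TENORS[k], v) for k, v in curve.items() if k in TENORS and v is not None)
    xs = [x for x, _ in points]
    if not xs or not xs[0] <= years <= xs[-1]:
        raise ValueError("maturity outside the published curve")
    i = bisect_left(xs, years)
    if xs[i] == years:
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (years - x0) / (x1 - x0)


def spread_bps(bond_yield: float, curve: dict, years: float) -> float:
    """Spread of a bond's yield over the interpolated Treasury yield, in basis points."""
    return (bond_yield - interpolate_yield(curve, years)) * 100
```

With the sample yield curve record from the data dictionary, a 20-year maturity interpolates to 4.90%, halfway between the 10-year 4.84% and the 30-year 4.96%.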

Why DataFlirt

"The US Treasury holds the foundational metrics for global finance, but accessing decades of yield curves and debt figures requires navigating a maze of legacy HTML and PDFs."

Most quantitative teams waste weeks building parsers for treasury.gov. The site mixes modern APIs with 1990s table layouts, inconsistent XML feeds, and complex PDF bulletins. DataFlirt abstracts this chaos. We handle the extraction, normalisation, and schema validation, delivering clean financial time-series directly to your data warehouse.

Technical Spec

Treasury scraper: technical capabilities

Everything our treasury.gov scraper supports, from daily yield curves and PDF bulletins to change detection and webhook delivery.

Daily Yield Curve Rates

Extraction of all maturity periods, updated daily

Supported

OFAC SDN List Parsing

Full entity resolution including aliases and addresses

Supported

Public Debt to the Penny

Daily extraction of total public debt outstanding

Supported

Legacy HTML Table Extraction

Parsing deeply nested table structures

Supported

Treasury Bulletin PDF Parsing

Spatial extraction of tables within official PDFs

Supported

Historical Data Backfills

Extraction of datasets dating back to 1990

Supported

Change Detection (Diffs)

Only emit records with changed fields since last run

Supported

Webhook Delivery

HTTP POST per record for real-time processing

Supported

Non-Public IRS Tax Records

Individual or corporate tax filings requiring authentication

Not supported

Classified Financial Intelligence

Internal Treasury communications and classified reports

Not supported
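The Change Detection capability can be approximated with content hashing: fingerprint each record while ignoring volatile fields such as scraped_at, key it by a natural identifier, and emit only records whose fingerprint differs from the previous run. This Python sketch is an assumption about how such diffing could work, not DataFlirt's implementation, and it does not report deleted records.

```python
import hashlib
import json


def fingerprint(record: dict, ignore: frozenset = frozenset({"scraped_at"})) -> str:
    """Hash a record's content, skipping volatile fields such as scraped_at."""
    payload = {k: v for k, v in record.items() if k not in ignore}
    encoded = json.dumps(payload, sort_keys=True, default=str).encode()
    return hashlib.sha256(encoded).hexdigest()


def diff_records(previous: dict[tuple, str], current: list[dict], key_fields: tuple[str, ...]):
    """Return (new or changed records, fingerprint state to store for the next run)."""
    changed, state = [], {}
    for record in current:
        key = tuple(record[f] for f in key_fields)  # natural key, e.g. ("date",) for yields
        digest = fingerprint(record)
        state[key] = digest
        if previous.get(key) != digest:
            changed.append(record)
    return changed, state
```

Sorting keys before hashing makes the fingerprint independent of field order, so only a genuine value change marks a record as changed.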

Infrastructure

Infrastructure powering the Treasury pipeline

Open-source tooling on proven cloud infra — no vendor lock-in, full observability.

Scrapy · Playwright · Python 3.12 · Redis · PostgreSQL · Apache Airflow · AWS Lambda · S3 · CloudWatch · 2Captcha · CapSolver · Residential Proxies · Docker · Kubernetes · Grafana · Prometheus

Legacy DOM Parsing

Custom Scrapy pipelines designed specifically to navigate and parse 1990s HTML table structures that lack modern identifiers.

PDF Extraction Pipeline

Advanced OCR and spatial parsing modules to extract structured tables from official Treasury reports and bulletins.

Cloud-Native Orchestration

Pipelines run on AWS Lambda and ECS. Airflow handles daily scheduling immediately following official Treasury publication times.

Output & Delivery

Your data, your destination

Data delivered to where your team already works — no new tooling required.

JSON

Newline-delimited or nested

CSV

Flat file with typed columns

XLS

Excel-compatible format for analysts

Parquet

Columnar format for BigQuery, Snowflake, Athena

AWS S3

Direct bucket delivery

Webhook

HTTP POST per record for real-time workflows

API

Queryable REST endpoints for extracted data

PostgreSQL

Upsert into your existing schema

BigQuery

Streamed directly into your dataset

Direct bucket delivery — compatible with any data lake

// faq

Common questions.

About treasury.gov scraping, legality, and pipeline operations.

Ask us directly →

Is scraping treasury.gov legal?

Yes. Treasury.gov publishes public government data intended for transparency and public use. DataFlirt extracts only publicly available information and adheres to reasonable request limits to avoid disrupting government infrastructure.

How quickly are daily yield curves updated?

Our pipelines are scheduled to run immediately after the Treasury publishes daily updates, typically delivering data to your warehouse within minutes of official publication.

Can you parse the OFAC SDN list?

Yes. We extract and normalise the entire Specially Designated Nationals list, resolving complex nested data such as multiple aliases, addresses, and linked entities into a clean relational schema.

Do you extract data from Treasury PDFs?

Yes. We utilise OCR and spatial table parsing to extract structured numerical data from Treasury Bulletins and Monthly Treasury Statements published in PDF format.

How far back does historical data go?

We can backfill yield curve and public debt data as far back as the Treasury publishes it on its domain, which for many datasets extends to 1990.

How do you handle changes to the website structure?

We monitor pipelines continuously. If a legacy table structure or XML schema changes, our alerting system flags the anomaly, and our engineers update the selectors to restore data flow.

Can I get this data via API?

Yes. While we specialise in pushing data to your warehouse via S3, BigQuery, or Snowflake, we can also expose the extracted datasets via a managed REST API.

Tell us what
to extract.
We do the rest.

20-minute scoping call. Pilot dataset within the week. Production within two. Whether you need a daily feed of yield curves or a historical backfill of public debt figures, we build and operate the pipeline. Tell us what you need.

Start a treasury.gov pipeline → View pricing

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h

