What is CSS-Based Text Hiding?

CSS-based text hiding is an obfuscation technique where the text rendered on a user's screen differs from the raw text in the HTML document. By using pseudo-elements, negative text indents, custom web fonts, or interleaved hidden spans, targets ensure that naive HTTP scrapers extract garbage data. To a human, the price reads "$49.99"; to a DOM parser, it reads "$94.x9x9", with the "x"s hidden and the digits visually reordered by CSS.

Obfuscation · DOM Parsing · Headless · Pseudo-elements · Data Quality

// 02 — definitions

What you see
isn't what you parse.

How targets use the browser's rendering engine against you, turning CSS from a styling language into a data obfuscation layer.

TL;DR

CSS-based text hiding breaks standard HTML parsers by decoupling the DOM structure from the visual output. It relies on techniques like display: none on decoy spans, CSS pseudo-elements that inject content, or custom fonts (sometimes using ligatures) that map standard characters to different glyphs. Bypassing it requires either a full headless browser to evaluate the render tree or a custom CSS parser.

01 Definition & structure

CSS-based text hiding is a defensive technique that exploits the gap between the Document Object Model (DOM) and what the browser actually paints. Standard scrapers parse the DOM; humans see the rendered page, which the browser builds by applying the CSS Object Model (CSSOM) to the DOM. By injecting garbage data into the DOM and using CSS to hide or reorder it, targets ensure that automated tools extract poisoned data.

This technique is most commonly used to protect high-value, low-volume data points like pricing, email addresses, phone numbers, and flight availability.

02 Common hiding techniques

Targets use several CSS properties to manipulate the visual output:

display: none or visibility: hidden applied to spans containing decoy characters.
position: absolute; left: -9999px to move decoy text off the visible screen.
font-size: 0 applied to a parent container, with specific child spans reset to a visible size.
CSS Flexbox or Grid order properties to visually scramble the sequence of characters that appear sequentially in the HTML.
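
The techniques above can be approximated without a browser when the styles are inline. The following is a minimal sketch, assuming every character group sits in its own top-level span with an inline style attribute; real targets usually put the rules in class-based stylesheets, so `parse_style`, `is_hidden`, and `visible_text` are illustrative names rather than a complete CSS evaluator.

```python
from html.parser import HTMLParser


def parse_style(style: str) -> dict:
    """Turn an inline style string into a {property: value} dict."""
    rules = {}
    for decl in style.split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            rules[prop.strip().lower()] = value.strip().lower()
    return rules


def is_hidden(rules: dict) -> bool:
    """Heuristic check for the decoy-hiding properties listed above."""
    if rules.get("display") == "none" or rules.get("visibility") == "hidden":
        return True
    if rules.get("font-size") in ("0", "0px"):
        return True
    # Off-screen decoys: absolutely positioned with a negative left offset.
    if rules.get("position") == "absolute" and rules.get("left", "").startswith("-"):
        return True
    return False


class VisibleSpans(HTMLParser):
    """Collects (order, dom_index, text) for every visible <span>."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # (order, hidden) for the span being read

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            rules = parse_style(dict(attrs).get("style") or "")
            self._current = (int(rules.get("order", "0")), is_hidden(rules))

    def handle_data(self, data):
        if self._current and not self._current[1]:
            self.items.append((self._current[0], len(self.items), data))

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def visible_text(html: str) -> str:
    parser = VisibleSpans()
    parser.feed(html)
    # Flex `order` decides visual position; ties keep DOM order.
    return "".join(text for _, _, text in sorted(parser.items))


html = (
    '<div style="display: flex">'
    '<span style="order: 2">9</span>'
    '<span style="display: none">x</span>'
    '<span style="order: 1">4</span>'
    '<span style="position: absolute; left: -9999px">7</span>'
    '<span style="order: 3">.99</span>'
    "</div>"
)
print(visible_text(html))  # 49.99
```

The sketch treats font-size: 0 on a span as hidden and does not model a zero-size parent with children reset to a visible size; that case needs inherited-style tracking.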

03 Pseudo-elements and Font Obfuscation

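More advanced implementations remove the data from the HTML entirely. Using ::before and ::after pseudo-elements, the target injects the actual data via the CSS content property. Because this generated content lives in the stylesheet rather than the DOM, tools like BeautifulSoup cannot see it. In a browser, textContent and innerText omit it as well; it has to be read from the computed style of the pseudo-element.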
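
When the injected value is a plain string literal, it can often be recovered from the stylesheet text itself. The following is a rough sketch, assuming simple rules with a quoted `content` value; `PSEUDO_RULE` and `pseudo_content` are hypothetical names, and the regex does not handle attr(), counters, CSS escapes, or nested at-rules.

```python
import re

# Matches "<selector>::before { ... content: '<literal>' ..." (or ::after).
PSEUDO_RULE = re.compile(
    r"(?P<selector>[^{}]+?)::?(?P<pseudo>before|after)\s*"
    r"\{[^}]*?content\s*:\s*(?P<quote>['\"])(?P<content>.*?)(?P=quote)",
    re.DOTALL,
)


def pseudo_content(css: str) -> dict:
    """Map (selector, pseudo) -> literal content string found in the CSS."""
    found = {}
    for m in PSEUDO_RULE.finditer(css):
        key = (m.group("selector").strip(), m.group("pseudo"))
        found[key] = m.group("content")
    return found


css = """
.price::before { content: '$'; }
.price .cents::after { color: red; content: "99"; }
"""
print(pseudo_content(css))
```

The result still has to be joined to the matching elements in the HTML, which requires selector matching that this sketch leaves out.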

Alternatively, targets use custom web fonts whose Unicode-to-glyph mappings are deliberately scrambled. The DOM contains the string "ABC", but the custom font file instructs the browser to draw the shapes for "789".
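
Once the font has been inspected, undoing the scramble is a character-for-character translation. The sketch below assumes you have already extracted two lookups from the font file: which glyph each DOM character points to, and which real character each glyph depicts. Both dictionaries here are hypothetical sample data; in practice glyph names may themselves be meaningless, in which case the second lookup has to come from comparing glyph outlines against a reference font.

```python
def build_decoder(glyph_for_char: dict, meaning_of_glyph: dict) -> dict:
    """Compose DOM char -> glyph -> depicted char into a str.translate table."""
    return {
        ord(dom_char): meaning_of_glyph[glyph]
        for dom_char, glyph in glyph_for_char.items()
        if glyph in meaning_of_glyph
    }


# Hypothetical: what the scrambled font's character map might contain.
glyph_for_char = {"A": "seven", "B": "eight", "C": "nine"}
# Hypothetical: the character each glyph visually represents.
meaning_of_glyph = {"seven": "7", "eight": "8", "nine": "9"}

decoder = build_decoder(glyph_for_char, meaning_of_glyph)
print("ABC".translate(decoder))  # 789
```

Characters with no entry in the decoder pass through unchanged, so ordinary punctuation rendered by a fallback font is left alone.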

04 How DataFlirt handles it

We avoid the performance penalty of running headless browsers on every request. Instead, our pipeline architecture uses a hybrid model. We run a headless Playwright instance as a "canary" to evaluate the CSSOM and extract the true visual text. We then compare this to the raw DOM to generate a deterministic mapping ruleset.

This ruleset is deployed to our stateless extraction workers, allowing them to parse the HTML, apply the equivalent of the CSS rules directly to the parsed nodes, and extract the clean data at 10,000 pages per second without rendering a single pixel.
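
To make the canary idea concrete, here is one way such a ruleset could be derived. This is an illustrative sketch, not DataFlirt's actual implementation: it assumes decoys are distinguished purely by CSS class, that the canary supplies the true visible text, and that the number of classes is small enough for a brute-force search. `infer_hidden_classes` and `apply_ruleset` are hypothetical names.

```python
from itertools import combinations


def infer_hidden_classes(spans, visible_text):
    """Find the smallest set of classes whose removal reproduces visible_text.

    spans: list of (class_name, text) pairs in DOM order.
    Returns a frozenset of class names, or None if no subset explains the render.
    """
    classes = sorted({cls for cls, _ in spans})
    for size in range(len(classes) + 1):
        for hidden in combinations(classes, size):
            kept = "".join(text for cls, text in spans if cls not in hidden)
            if kept == visible_text:
                return frozenset(hidden)
    return None


def apply_ruleset(spans, hidden_classes):
    """Stateless worker step: drop spans whose class the ruleset marks hidden."""
    return "".join(text for cls, text in spans if cls not in hidden_classes)


# Canary page: the headless browser reported "49.99" as the visible text.
canary_spans = [("a", "4"), ("q", "x"), ("a", "9"), ("q", "7"), ("b", ".99")]
ruleset = infer_hidden_classes(canary_spans, "49.99")

# A different page from the same target, parsed without a browser.
other_spans = [("q", "3"), ("a", "1"), ("a", "2"), ("q", "z"), ("b", ".50")]
print(sorted(ruleset), apply_ruleset(other_spans, ruleset))  # ['q'] 12.50
```

Returning None when no subset matches gives the pipeline a clear signal that the obfuscation is not class-based and needs a different strategy.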

05 The performance cost

The naive solution to CSS hiding is "just use Playwright and call innerText." While accurate, this is operationally disastrous at scale. A stateless HTTP request and HTML parse takes ~50ms and consumes negligible RAM. Booting a headless browser, downloading the CSS, building the render tree, and evaluating the layout takes ~1500ms and hundreds of megabytes of RAM. Relying purely on browsers to bypass CSS hiding will destroy your unit economics.
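
As a rough worked example using the approximate figures above, the average per-page latency depends on what fraction of pages go through the browser. The `headless_fraction` parameter is an assumption introduced for illustration, and the model ignores concurrency and memory.

```python
STATELESS_MS = 50    # approximate stateless fetch + parse time quoted above
HEADLESS_MS = 1500   # approximate headless render time quoted above


def blended_latency_ms(headless_fraction: float) -> float:
    """Average per-page latency when only a fraction of pages use a browser."""
    return headless_fraction * HEADLESS_MS + (1 - headless_fraction) * STATELESS_MS


print(HEADLESS_MS / STATELESS_MS)            # 30.0, i.e. a 30x penalty
print(round(blended_latency_ms(1.0), 1))     # every page rendered
print(round(blended_latency_ms(0.01), 1))    # 1% of pages rendered as canaries
```

Rendering 1% of pages raises the average from 50 ms to roughly 64.5 ms, whereas rendering everything costs the full 1500 ms per page.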

// 03 — the extraction cost

The math of
visual extraction.

Extracting CSS-obfuscated text forces a shift from fast, stateless HTML parsing to slow, stateful browser rendering. DataFlirt models this cost when deciding whether to reverse-engineer the CSS or just spin up Playwright.

Render penalty: T_render = T_html + T_cssom + T_layout

Headless extraction takes 10–50x longer than raw DOM parsing. (Browser architecture model)

Obfuscation entropy: H = log₂(decoy_chars / real_chars)

A higher ratio means a higher likelihood of regex or heuristic failures. (DataFlirt extraction heuristics)

Visual confidence score: C = 1 − (hidden_nodes_extracted / total_nodes)

If C drops, a CSS rule has changed and the pipeline halts to prevent data poisoning. (DataFlirt validation layer)
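
The three quantities above translate directly into code. This is a minimal sketch with made-up sample inputs; the function names are illustrative, and the guard against non-positive counts is an added assumption because the logarithm is undefined there.

```python
import math


def render_penalty(t_html: float, t_cssom: float, t_layout: float) -> float:
    """T_render = T_html + T_cssom + T_layout (all in the same time unit)."""
    return t_html + t_cssom + t_layout


def obfuscation_entropy(decoy_chars: int, real_chars: int) -> float:
    """H = log2(decoy_chars / real_chars)."""
    if decoy_chars <= 0 or real_chars <= 0:
        raise ValueError("both counts must be positive")
    return math.log2(decoy_chars / real_chars)


def visual_confidence(hidden_nodes_extracted: int, total_nodes: int) -> float:
    """C = 1 - hidden_nodes_extracted / total_nodes."""
    return 1 - hidden_nodes_extracted / total_nodes


print(obfuscation_entropy(8, 2))   # 2.0: four decoys per real character
print(visual_confidence(0, 100))   # 1.0: no hidden nodes leaked into the output
```

Note that H is zero when decoys and real characters are equal in number and negative when decoys are the minority.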

// 04 — the DOM vs the screen

Extracting a price
through the noise.

A trace of a naive scraper hitting a CSS-obfuscated price tag, followed by a render-aware extraction using the CSS Object Model.

BeautifulSoup · Playwright · CSSOM

edge.dataflirt.io — live

CAPTURED

// Naive HTML extraction (Stateless)
dom.raw: "94.99" // One span per character inside .p-val
css.rules: ".p-val span:nth-child(odd) { display: none; }"
extracted.text: "94.99" // Decoy characters included
validation: FAIL (price exceeds historical bounds)

// Render-aware extraction (Stateful)
engine: "Blink / CSSOM evaluation"
node.textContent: "94.99"
node.innerText: "49" // Respects display: none; excludes pseudo-element content
node.pseudo_before: "content: '$';"
extracted.visual: "$49" // ::before content + innerText
validation: PASS (matches visual render)
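
The rule in this trace can be replayed by hand. The sketch below assumes each character of 94.99 sits in its own span, that those spans are the only children of .p-val, and that the currency symbol comes from the ::before rule shown in node.pseudo_before; `visible_after_odd_hidden` is a hypothetical helper.

```python
def visible_after_odd_hidden(chars: str) -> str:
    """Apply `span:nth-child(odd) { display: none; }` to one-character spans.

    :nth-child is 1-based, so the 1st, 3rd, 5th... spans are hidden.
    """
    return "".join(c for i, c in enumerate(chars, start=1) if i % 2 == 0)


raw_digits = "94.99"
visible = visible_after_odd_hidden(raw_digits)
price = "$" + visible  # '$' supplied by the ::before pseudo-element

print(visible)  # 49
print(price)    # $49
```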

// 05 — obfuscation methods

How the text
gets scrambled.

The most common CSS techniques used to decouple the DOM from the visual render, ranked by frequency across our monitored e-commerce and directory targets.

TARGETS MONITORED · 1,200+ domains

DOMAINS USING CSS HIDING · ~14%

UPDATED · 2026-05-19

Decoy spans

display: none / opacity: 0 · Injects garbage characters into the DOM

Pseudo-element injection

::before / ::after · Content lives in CSS, not HTML

Absolute positioning

left: -9999px · Moves decoy text off the visible screen

Flex/Grid reordering

order: -1 · Visual sequence differs from DOM sequence

Custom font mapping

Ligatures / swapped glyphs · The letter 'A' renders as '7'

// 06 — our approach

Render the tree,

or reverse the ruleset.

When a target uses CSS-based text hiding, you have two choices: run a headless browser to evaluate the CSS Object Model (CSSOM) and read the computed innerText, or reverse-engineer the CSS rules and apply them to your raw HTML parser. Browsers are accurate but expensive; reverse-engineering is fast but brittle. DataFlirt uses a hybrid approach: we run a headless canary to detect the visual text, generate a mapping ruleset, and push that ruleset to our high-throughput stateless extractors. If the canary detects a mismatch, the ruleset is regenerated automatically.

Extraction Strategy: Hybrid

Live trace of a hybrid extraction worker handling an obfuscated phone number.

target.field contact_phone

obfuscation.type flex_reorder + decoy_spans

canary.status active · ruleset matched

extraction.mode stateless_html

ruleset.version v14.2

latency.per_record 12ms · fast

data.confidence 0.99

// 07 — FAQ

Common
questions.

About DOM parsing, CSSOM evaluation, font obfuscation, and how DataFlirt extracts accurate data without paying the headless browser tax on every request.

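What is the difference between textContent and innerText?

In browser automation (like Playwright or Puppeteer), textContent returns the raw text of a node and all its descendants exactly as it appears in the DOM, including text in hidden elements and in script and style tags. innerText is render-aware: it returns approximately the text as rendered, skipping content hidden with display: none or visibility: hidden and applying text-transform. For text hidden this way, use innerText. It does not cover every technique, though: pseudo-element content, off-screen positioning, and flex or grid reordering are not reflected in innerText and need computed styles or layout information.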
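
A minimal Playwright sketch of reading all three values for one node follows. The URL and selector are whatever your target uses; `read_node` and `strip_css_quotes` are hypothetical helper names, and Playwright is imported inside the function so the quote-stripping helper can be used without it installed.

```python
def strip_css_quotes(value: str) -> str:
    """Computed `content` comes back as a quoted string, or 'none'/'normal'."""
    if value in ("none", "normal"):
        return ""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value


def read_node(url: str, selector: str) -> dict:
    """Return raw text, rendered text, and ::before content for one element."""
    from playwright.sync_api import sync_playwright  # lazy import

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        node = page.locator(selector).first
        result = {
            "text_content": node.text_content(),  # raw DOM text
            "inner_text": node.inner_text(),      # render-aware text
            "before": strip_css_quotes(
                node.evaluate("el => getComputedStyle(el, '::before').content")
            ),
        }
        browser.close()
        return result
```

Comparing `text_content` with `inner_text` for the same node is a quick way to confirm that a target is hiding decoy characters.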

Can I just use regex to clean up the decoy characters?

Rarely. Targets that use CSS hiding typically randomize the decoy characters and their positions on every page load. A regex that strips 'x' today will fail tomorrow when the decoy becomes 'q'. You have to evaluate the CSS rules to know which characters are actually visible.

How does custom web font obfuscation work?

The target serves a custom .woff file where the glyph mappings are scrambled. The HTML might contain the string "XYZ", but the custom font maps 'X' to the visual shape of '1', 'Y' to '2', and 'Z' to '3'. The user sees "123", but any scraper reads "XYZ". Bypassing this requires downloading the font file, parsing its TTF/WOFF tables (the cmap table maps character codes to glyphs), and building a reverse-mapping dictionary from each DOM character to the character its glyph actually depicts.

Does DataFlirt use headless browsers for all CSS-hidden text?

No. Running a headless browser for every request is too slow and expensive for high-volume pipelines. We use headless browsers as "canaries" to periodically evaluate the CSS rules and generate mapping logic. That logic is then applied to our fast, stateless HTML parsers. We only fall back to 100% headless extraction if the CSS rules are highly dynamic and tied to JavaScript execution.

How do you detect when the CSS rules change?

Through continuous validation. Our extraction schemas define expected data types and historical bounds (e.g., a price should be numeric and between $10 and $500). If a CSS rule changes and we start extracting decoy characters, the validation layer catches the anomaly, quarantines the record, and triggers the headless canary to regenerate the CSS ruleset.
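
A validation gate of this kind can be sketched in a few lines. The bounds below reuse the $10 to $500 example from the answer; the function names, the price format, and the "trigger canary" flag are assumptions for illustration, not DataFlirt's schema.

```python
import re

PRICE = re.compile(r"\$(\d+(?:\.\d{1,2})?)")


def is_valid_price(text: str, low: float = 10.0, high: float = 500.0) -> bool:
    """Price must be a well-formed dollar amount within historical bounds."""
    match = PRICE.fullmatch(text.strip())
    return bool(match) and low <= float(match.group(1)) <= high


def route(records):
    """Split records into accepted and quarantined; flag a canary re-run."""
    accepted, quarantined = [], []
    for record in records:
        (accepted if is_valid_price(record) else quarantined).append(record)
    trigger_canary = bool(quarantined)
    return accepted, quarantined, trigger_canary


print(route(["$49.99", "$94.x9x9", "$4949.99"]))
# (['$49.99'], ['$94.x9x9', '$4949.99'], True)
```

A decoy-polluted value fails either the format check or the bounds check, so it never reaches the delivered dataset.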

Is CSS obfuscation considered a legal barrier to scraping?

Generally, no. CSS obfuscation is a technical countermeasure, not a legal one. It does not constitute an access control or authentication gate (like a password). Bypassing it to read publicly available data is typically viewed the same as standard HTML parsing, though you should always consult counsel regarding the specific Terms of Service of your target.

hello@dataflirt.com · Bengaluru · IST · typical reply < 4h

Related glossary terms

Headless Browser

Dynamic Class Name Obfuscation

JavaScript Rendering

DOM Change Monitoring