What is Encoding Mismatch?
Encoding mismatch occurs when a scraper interprets a byte stream using a different character set than the one the server used to encode it. This typically happens when HTTP headers declare one encoding, the HTML meta tag declares another, and the actual bytes are a third. For data pipelines, it results in silent data corruption - turning valid text into mojibake that breaks downstream analytics and NLP models.