What is HTML Entity Decoding?
HTML entity decoding is the process of converting safe, escaped HTML representations of characters (like &amp; or &#39;) back into their literal string values (like & or ') during data extraction. It is a mandatory normalisation step in any scraping pipeline. Without it, downstream systems receive text polluted with leftover entity sequences, which degrades search indexes, entity resolution, and NLP models.
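As a minimal sketch of the decoding step, assuming a Python-based pipeline, the standard library's html.unescape can be applied to each extracted text field. The helper name decode_entities and the sample string are hypothetical, chosen only for illustration.

```python
import html


def decode_entities(raw: str) -> str:
    """Return `raw` with HTML character references replaced by literal characters.

    Hypothetical helper: a thin wrapper around the standard library's
    html.unescape, which handles named (&amp;), decimal (&#39;) and
    hexadecimal (&#x27;) references.
    """
    return html.unescape(raw)


if __name__ == "__main__":
    # Illustrative input, not taken from any real page.
    scraped = "Tom &amp; Jerry&#39;s &quot;Greatest&quot; Hits"
    print(decode_entities(scraped))  # Tom & Jerry's "Greatest" Hits
```

This sketch decodes exactly once. Whether to decode repeatedly for double-escaped input (such as &amp;amp;) is a separate design decision, since a second pass can also corrupt text that legitimately contains an escaped entity.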