Index Bloat

Index bloat describes a situation where a site has far more URLs indexed by Google than it has pieces of genuinely useful content. The excess is typically caused by faceted navigation creating potentially millions of filtered URL combinations, session IDs or tracking parameters generating duplicate URLs, auto-generated thin pages, printer-friendly or PDF versions of existing pages, and paginated archives that stretch indefinitely. Bloat wastes crawl budget, can dilute link equity across too many URLs, and may signal low quality to Google’s systems. The remedies include canonicalisation, consolidating parameter variants into a single URL, applying noindex to low-value page types, and blocking unnecessary URL patterns in robots.txt. The last two should not be combined on the same URL: a page blocked in robots.txt is not crawled, so a noindex directive on it is never seen, and the URL can remain indexed if other pages link to it. Content pruning addresses the content dimension of the same problem.
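
As a rough illustration of consolidating parameter variants, the sketch below groups URLs that differ only by tracking or session parameters under one normalised form, which could then serve as the canonical target. The parameter names in IGNORED_PARAMS and the example.com URLs are assumptions for illustration only; a real site would need its own list, and facet parameters that genuinely change the content should not be stripped this way.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters assumed not to change page content.
# Adjust for the site being audited.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "sid", "sessionid"}


def canonical_form(url: str) -> str:
    """Return the URL with ignored parameters removed and the rest sorted."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # Lower-case scheme and host, default an empty path to "/", drop the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )


def group_duplicates(urls):
    """Map each normalised URL to the list of crawled variants that collapse into it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_form(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    crawled = [
        "https://example.com/shoes?color=red&utm_source=newsletter",
        "https://example.com/shoes?sid=abc123&color=red",
        "https://example.com/shoes?color=red",
    ]
    for canonical, variants in group_duplicates(crawled).items():
        print(canonical, len(variants))
```

Groups with many variants are candidates for a rel="canonical" tag pointing at the normalised URL. The ratio of crawled URLs to distinct groups gives a crude sense of how much of the URL inventory is duplication.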