GPTBot

GPTBot is OpenAI’s web crawler, announced in August 2023, that collects publicly available web content that may be used to train future large language models. Its role is training-data collection rather than live search. OpenAI uses separate agents for those tasks: OAI-SearchBot indexes sites so they can be surfaced in ChatGPT’s search features, and ChatGPT-User fetches pages on demand when a user’s request calls for it. OpenAI documents GPTBot publicly so that webmasters can identify it and block it selectively via robots.txt, where it is addressed by the user-agent token GPTBot (the full user-agent header contains GPTBot/1.x).

Blocking GPTBot stops OpenAI’s training crawler from collecting a site’s pages. It does not necessarily stop the site from being cited in ChatGPT answers, which can draw on those other agents, the Bing index, or other retrieval methods. Some publishers have blocked GPTBot over concerns about the unlicensed use of their content for commercial model training, while others have allowed it in the hope of greater visibility in ChatGPT responses.
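The effect of a robots.txt rule aimed at GPTBot can be checked with Python’s standard-library urllib.robotparser. The sketch below assumes a hypothetical robots.txt that disallows GPTBot site-wide and allows every other agent. The is_allowed helper and the example.com URLs are illustrative only, and a real crawler may apply its own parsing rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's training crawler site-wide,
# while leaving every other user agent unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    user_agent should be a product token such as "GPTBot", not a full
    browser-style User-Agent header.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded,
    # so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-story"
    for agent in ("GPTBot", "OAI-SearchBot", "Googlebot"):
        print(agent, is_allowed(agent, url))
```

Run as a script, it reports GPTBot as blocked and the other two agents as allowed. In CPython’s implementation, the agent name is matched only on the text before its first slash, so the product token GPTBot should be passed rather than a full user-agent header beginning with "Mozilla/5.0".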