A recent analysis has revealed that 79% of top news websites in the UK and US are proactively blocking AI training bots from accessing their content. These sites include prominent publishers such as the BBC, The New York Times, and The Wall Street Journal, each of which blocks every AI crawler examined in the analysis.
The trend stems from increasing unease over how AI models use published content. Many organizations are particularly concerned about retrieval-augmented generation, a method that requires frequent content scraping to generate accurate, real-time chatbot responses. The analysis highlights that controls such as Google's 'Google-Extended' robots.txt token let publishers exclude their content from training for the Gemini chatbot and Vertex AI. However, blocking Googlebot itself, rather than just Google-Extended, risks harming search visibility, putting publishers in a difficult position.
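As a minimal sketch of how this opt-out works, the robots.txt snippet below blocks Google's documented Google-Extended token (which governs use of content for Gemini and Vertex AI training) while leaving ordinary search crawling untouched; the exact rules any publisher uses will vary:

```
# Block Google's AI-training crawler token (Gemini / Vertex AI training)
User-agent: Google-Extended
Disallow: /

# Keep ordinary search crawling intact
User-agent: Googlebot
Allow: /
```

Because Google-Extended is evaluated separately from Googlebot, a configuration like this opts content out of AI training without affecting search indexing.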
Despite the growing resistance, some publishers, including Fox News and The Independent, still allow full access to all 11 AI bots reviewed in the study. This divergence shows that the media landscape remains divided over how AI systems should be allowed to access and use published content.
These measures underline the tension between protecting intellectual property and meeting the growing demand for real-time, AI-driven information. As AI systems continue to expand their capabilities, concerns about compensation, content misuse, and revenue sharing have added urgency to these decisions.
For AI and technology enthusiasts, as well as SEO and content specialists, this shift signals the need to closely monitor emerging tools that balance content protection against visibility. For example, web developers managing WordPress clients may need to adjust site configurations, such as robots.txt rules, to align with evolving AI bot policies; a simple monitoring sketch follows.
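As one way such monitoring could look, the Python sketch below uses the standard library's urllib.robotparser to check which AI crawler user agents a site's robots.txt currently permits. The bot tokens listed are real, publicly documented crawler names, but the target domain and the particular subset checked are assumptions for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical target site; swap in the domain you manage.
SITE = "https://example.com"

# Well-known AI crawler user-agent tokens (an illustrative subset).
AI_BOTS = ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai", "PerplexityBot"]

def check_ai_bot_access(site: str, bots: list[str]) -> dict[str, bool]:
    """Return whether each bot may fetch the site's homepage per robots.txt."""
    parser = RobotFileParser()
    parser.set_url(f"{site}/robots.txt")
    parser.read()  # fetches and parses the live robots.txt
    return {bot: parser.can_fetch(bot, f"{site}/") for bot in bots}

if __name__ == "__main__":
    for bot, allowed in check_ai_bot_access(SITE, AI_BOTS).items():
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

Run periodically against client sites, a check like this makes it easy to confirm that robots.txt changes actually took effect as bot policies evolve.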
Tools such as the Content Freshness Engine could also help publishers keep content optimized while retaining control over which AI crawlers can access it.
Source: Press Gazette