Cloud services provider Cloudflare has launched a new tool that detects unwanted AI bots and stops them from scraping data from the websites it hosts, whether to train AI models without permission or for other malicious purposes.
Cloudflare has analyzed AI bot and crawler activity and, using machine learning (ML) and global signals from its network (which sees over 57 million requests per second), can now detect AI bots that disguise themselves by mimicking the behavior of real website users and that ignore robots.txt rules, which traditionally tell bots which pages they can and cannot access.
“When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we can fingerprint. Based on these signals, our models [are] able to appropriately flag traffic from evasive AI bots as bots.”
Website owners can ask bots to stay away by editing their robots.txt file to disallow crawling of their content pages, but this rarely works against bad actors: many AI scrapers simply ignore the rules in robots.txt and continue scraping data without authorization.
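To illustrate why robots.txt is only advisory, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules, the example.com URL, the "SomeBrowser" user agent, and the `is_allowed` helper are all made up for illustration. GPTBot is the user agent OpenAI publishes for its crawler. This is not Cloudflare's detection method, only a demonstration of how a well-behaved crawler would consult the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow one named AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # illustrative URL
    for agent in ("GPTBot", "SomeBrowser"):
        print(agent, "allowed:", is_allowed(agent, url))
```

The check runs entirely on the crawler's side and depends on the user agent the crawler chooses to declare. A scraper that never performs the check, or that announces itself under a browser-like name, is not stopped by the file at all, which is the gap that network-level bot detection is meant to close.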
Built on finely tuned bot detection models that use advanced ML techniques, Cloudflare’s new tool, which is free, should be able to detect evasive bots and prevent them from taking data without permission.