Reddit is suing companies SerApi, OxyLabs, AWMProxy and Perplexity for allegedly scraping its data from search results and using it without a license, The New York Times reports. The new lawsuit follows legal action against AI startup Anthropic, which allegedly used Reddit content to train its Claude chatbot.
Since 2023, Reddit has charged companies seeking access to posts and other content in the hopes of making money on data that could be used for AI training. The company has also signed licensing deals with companies like Google and OpenAI, and even built an AI answer machine of its own to leverage the information in users’ posts. Scraping search results for Reddit content avoids those payments, which is why the company is seeking financial damages and a permanent injunction that stops companies from selling previously scraped Reddit material.
Some of the companies Reddit is focused on, like SerApi, OxyLabs and AWMProxy, are not exactly household names, but they’ve all made collecting data from search results and selling it a key part of their business. Perplexity’s inclusion in the lawsuit might be more obvious. The AI company needs data to train its models, and has already been caught seemingly copying and regurgitating material it hasn’t paid to license. That also includes reportedly ignoring the robots.txt protocol, a way for websites to communicate that they don't want their material scraped.
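For readers unfamiliar with robots.txt, the short Python sketch below shows roughly how a well-behaved crawler might consult those rules before fetching a page. It relies on the standard library's `urllib.robotparser`. The rules, the `ExampleBot` name, the `example.com` URLs and the `is_allowed` helper are all hypothetical and are not taken from Reddit's actual robots.txt file or from the lawsuit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only (not Reddit's real file).
EXAMPLE_RULES = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse() expects an iterable of lines
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleBot is blocked from the whole site; other bots only from /private/.
    print(is_allowed(EXAMPLE_RULES, "ExampleBot", "https://example.com/r/test/"))
    print(is_allowed(EXAMPLE_RULES, "OtherBot", "https://example.com/r/test/"))
    print(is_allowed(EXAMPLE_RULES, "OtherBot", "https://example.com/private/page"))
```

The check happens entirely on the crawler's side. Nothing in the protocol enforces it, so a scraper that skips this step can still fetch the page.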
Per a copy of the lawsuit provided to Engadget, Reddit had already sent a cease-and-desist to Perplexity asking it to stop scraping posts without a license. The company claimed it didn't use Reddit data, but it also continued to cite the platform in answers from its chatbot. Reddit says it was able to prove Perplexity was using scraped Reddit content by creating a “test post” that “could only be crawled by Google’s search engine and was not otherwise accessible anywhere on the internet.” Within a few hours, queries made to Perplexity’s answer engine were able to reproduce the content of the post.
“The only way that Perplexity could have obtained that Reddit content and then used it in its ‘answer engine’ is if it and/or its co-defendants scraped Google [search results] for that Reddit content and Perplexity then quickly incorporated that data into its answer engine,” the lawsuit claims.
When asked to comment, Perplexity provided the following statement:
Perplexity has not yet received the lawsuit, but we will always fight vigorously for users’ rights to freely and fairly access public information. Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.
This new lawsuit fits with the aggressive stance Reddit has taken towards protecting its data, including rate-limiting unknown bots and web crawlers in 2024, and even limiting what access the Internet Archive’s Wayback Machine has to its site in August 2025. The company has also sought to define new terms around how websites are crawled by adopting the Really Simple Licensing standard, which adds licensing terms to robots.txt.
