October 5, 2025

Reddit Blocks the Wayback Machine from Archiving Posts



Reddit is blocking the Internet Archive's Wayback Machine from indexing most of its site, after discovering that AI companies were scraping its data from the digital time capsule.

This decision comes as Reddit tightens its grip on user data. The company does not mind AI companies training their models on Reddit posts, but they must pay first. Reddit previously said it would not restrict “good faith actors” such as the Internet Archive, but it now believes some of them are helping AI companies dodge licensing fees. Reddit’s sudden change of position underscores how data licensing has become a major source of revenue in the AI era.

The Internet Archive is a nonprofit organization dedicated to building a large digital library of websites and other online content. So far, it has archived billions of web pages, as well as millions of books, videos, and software programs. Its signature tool, the Wayback Machine, lets users save snapshots of web pages and revisit them later to see exactly what they looked like on a specific date.

Reddit says it has evidence that certain AI companies are exploiting the Wayback Machine to bypass its policies and scrape user content without authorization.

“The Internet Archive provides a service to the open web, but we have been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine,” a Reddit spokesperson told Gizmodo in an emailed statement. “Until they are able to defend their site and comply with platform policies (for example, respecting user privacy, re: deleting removed content), we are limiting some of their access to Reddit data to protect redditors.”

Reddit told The Verge that the Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles. Instead, it will only be allowed to index the Reddit homepage. The restrictions start “ramping up” today, and Reddit says it gave the Internet Archive advance notice.

The Internet Archive did not immediately respond to a request for comment from Gizmodo.

Reddit has tightened access to its data in recent years. Although the company is open to licensing its data, it has cracked down on companies that have not paid. The company has already struck multimillion-dollar deals with Google and OpenAI. Under the Google deal, Reddit partnered with Google on search indexing and AI training data, then began blocking other search engines from surfacing recent Reddit posts in their search results.

In June, Reddit sued the AI startup Anthropic, accusing it of unauthorized scraping.

