AI Chatbots Trained on Scraped News Reporting and Copyrighted Content, News Media Alliance Says

News Media Alliance accuses AI firms of unlawfully copying news content to train chatbots, raising concerns over copyright infringement and the unauthorized use of news reporting

Editor's Note: This article was originally featured in the "Reliable Sources" newsletter.

A prominent news industry trade group is rebuking A.I. companies for using news content to train their chatbots.

The News Media Alliance, which represents almost 2,000 outlets in the United States, published a study on Tuesday finding that companies that develop generative artificial intelligence systems, such as OpenAI and Google, have been using news, magazine, and digital media content to train their bots. The research is significant because it indicates that A.I. companies have trained their bots to give more weight to content from reputable publishers than to content from elsewhere on the internet. Danielle Coffey, the CEO of the News Media Alliance, said the unauthorized copying of members' content is not only happening but is pervasive, and more extensive than the copying of content from other sources.

"This demonstrates that they recognize our unique value. However, most of these developers do not obtain proper permissions through licensing agreements or compensate publishers for using our content," Coffey said. "This diminishment of meticulously crafted content not only harms publishers but also undermines the sustainability of AI models themselves and the availability of trustworthy, reliable information."

In its recently published white paper, the trade group also rejected the argument that A.I. bots simply "learn" facts from the data they analyze, much as humans do. That comparison is inaccurate, the group argued, because the models retain the expression of facts found in the copied training material (which copyright protects) without absorbing any underlying concepts.

Publishers at odds with A.I. companies have recently begun taking defensive measures to safeguard their content. A Reliable Sources review in August found that twelve prominent media companies had added code to their websites to shield their content from A.I. bots that scrape information from the internet, and numerous other companies have since followed suit.

These measures, however, protect news organizations only against future scraping. They do nothing about content that has already been scraped, which the News Media Alliance and others say has been used to train A.I. bots.

To address the problem, the News Media Alliance has issued recommendations intended to keep news publishers from becoming obsolete as the technology rapidly evolves. Chief among them, policymakers should recognize that the unauthorized use of copyrighted material to train artificial intelligence bots is infringement. The group also recommends that publishers be able to license their content efficiently and on fair terms.

A solution, the News Media Alliance said, is crucial to preserving culture, the economy, and democracy. The news and media industry, the group argued, must be able to thrive and to share in the profits and development of the generative A.I. (GAI) revolution, which is being built on its hard work.