Major technology companies are using about 200,000 books to train artificial intelligence systems, and the authors of those books were never told.
The collection, known as Books3, was identified in an investigation by The Atlantic. It is a compilation of pirated e-books spanning a wide range of genres, including erotic fiction and prose poetry. Books are fed into generative AI systems to help them learn to convey information effectively.
High-quality AI requires high-quality text to learn language from, so while some training text can be sourced from internet articles, books play a crucial role, according to The Atlantic. The use of Books3 has led to numerous lawsuits against Meta and other companies. Authors can now check whether their books are being used to train these AI systems through a database recently published by The Atlantic. The revelation has angered many of them.
Mary H. K. Choi expressed deep disappointment and frustration after discovering that her work had been used without permission. In a social media post, she described feeling both outraged and completely powerless. In an email, Choi elaborated, explaining that her debut novel "Emergency Contact" was a profoundly personal story that was initially criticized as too niche. The book nonetheless became a New York Times bestseller and found a global audience.
"A book represents a plethora of choices, countless variations, and even the author's limitations at the moment. The idea of throwing all the depth and complexity of life into a vast, automated process that churns out generic content diminishes its value," she said. "Not only does it harm authors financially, but it also deprives booksellers, librarians, and readers of so many meaningful connections."
Min Jin Lee, the acclaimed author of "Pachinko" and "Free Food for Millionaires," echoed these sentiments on social media, calling the use of her books "a form of theft."
"I dedicated thirty years of my life to crafting my literary works," she wrote. "The large language models of Artificial Intelligence did not acquire or gather data by legitimate means. AI corporations illegitimately appropriated my efforts, time, and creative essence. They shamelessly robbed me of my narratives, pilfering a fragment of my very being."
According to The Atlantic, Nora Roberts, the prolific romance writer, has 206 books in the Books3 database, more than any other living author and second only to William Shakespeare. She condemned both the database and its exploitation by technology companies, calling it "a profoundly unethical situation."
"We, as human beings and writers, are being exploited by individuals who want to utilize our work without permission or compensation to produce books, scripts, and essays, simply because it is inexpensive and effortless," Roberts said in a message to CNN.
The author Nik Sharma, whose cookbook "Season" was discovered in the database, was not surprised by this exploitation of writers.
"I am appalled yet unsurprised by the fact that I have been exploited," he stated in a social media post. "Clearly, no permission was sought or any compensation provided for the utilization of my work in AI training."
In an email, Sharma added that the spread of AI was inevitable, which is why he was not surprised. Still, he found it particularly frustrating that no one had reached out about usage or payment. Education is not free in the US, he noted: teachers are paid and textbooks are bought.
Sharma said the current state of AI is akin to the Wild West, with government policy still in its early stages and tech companies taking advantage of the gap. He said he was relieved that only one of his cookbooks was affected. The Atlantic reported that Meta, which reportedly used the Books3 database, declined to comment when asked.
A Bloomberg spokesperson said the company used various data sources, including Books3, to train its initial BloombergGPT model, which is designed for the financial sector. The spokesperson added that Bloomberg will not include the Books3 dataset in future commercial versions of BloombergGPT.
Not all authors are disturbed that their work is being used to train AI. James Chappel, author of an academic book on the modern Catholic church, said on social media that he is not concerned.
"I want my book to [be] read!" he wrote. "I want it to educate!"
Chappel did not respond to requests for further comment.
AI has become a major worry for many writers as large corporations adopt it. The Writers Guild of America went on strike this summer, with demands that included restrictions on the use of AI in the creation of films and TV shows. ChatGPT, meanwhile, has been used for tasks ranging from writing assignments to legal briefs.
However, writers are not the only ones expressing their concerns. Last year, visual artists found themselves in a similar predicament as the popularity of text-to-image AI systems soared. They discovered that their work was being used to train AI without their consent. These instances collectively emphasize the growing apprehension surrounding AI's expanding influence across all artistic mediums, where the creative process can often be deeply personal and intimate.
The debate sparked by Books3 comes as US President Joe Biden has announced plans to issue an executive order on AI this autumn, emphasizing the country's commitment to pioneering "the path towards responsible AI advancement." For authors, though, navigating the ongoing conflicts over AI and its impact on their craft can be disheartening. Choi, for instance, found it bewildering to learn that her book had been used at the height of the WGA strike, a period marked by heated debate over AI.
"I felt devastated," she said in an email. "It seemed like any progress or momentum I had in one aspect could easily be negated in another."
Nevertheless, Choi acknowledged that among so many books, her own carries little weight, however much it matters to her.
She said the most discouraging part is that, in her bleakest moments, it all feels completely unavoidable. Many share Choi's sentiment. Roberts emphasized the need for solidarity between writers and audiences to address these problems.
"We, as storytellers, must unite to combat this exploitation of our skills and dedication," she declared. "We should rally in support of our own work as well as that of our fellow creators. I sincerely hope that readers and viewers will join us in addressing this critical matter."