AI Fanfiction Outperforms Human Imitations in Reader Preference Study

Recent research indicates that readers can prefer AI-generated imitations of famous authors over imitations written by skilled human writers, especially once the AI models are fine-tuned to replicate those authors' styles. The researchers argue that the finding should prompt a reassessment of ongoing legal debates over training AI on copyrighted material.

The study, titled “Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers,” was conducted by Tuhin Chakrabarty, assistant professor of computer science at Stony Brook University, along with Jane C. Ginsburg, a professor of law at Columbia University, and Paramveer Dhillon, an associate professor at the University of Michigan's School of Information. The paper discusses the implications of their findings in light of numerous lawsuits alleging that AI developers have unlawfully used authors' works for training purposes.

Among these legal challenges is Bartz v. Anthropic, which is expected to end in a $1.5 billion settlement over claims that Anthropic used copyrighted materials without authorization. In another notable case, Kadrey v. Meta, Meta prevailed on technical grounds, even though the court acknowledged that using copyright-protected works to train generative AI models without permission is likely illegal.

In the U.S. alone, copyright holders have initiated over 50 lawsuits against AI companies, with claims spanning various media, including video and audio. Legal experts have posited that while training AI on copyrighted texts might qualify as fair use, liability could arise if the AI generates content that closely resembles the original works.

The authors of the study set out to examine whether AI could produce high-quality literary text that emulates specific authors' styles. “Previous studies indicated that AI struggles to generate sophisticated literary fiction or creative nonfiction compared to seasoned writers,” they stated. To investigate this, they enlisted 28 writers from prestigious Master of Fine Arts (MFA) programs to craft 450-word passages in the styles of 50 acclaimed authors.

They compared 150 excerpts written by the human writers, imitating literary figures such as Alice Munro and Cormac McCarthy, against 150 AI-generated texts. Initially, both the expert writers and 131 lay readers favored the human-written passages. However, this preference shifted dramatically after the AI models were fine-tuned to improve their stylistic accuracy. The researchers noted that this result runs counter to earlier findings suggesting that AI could not produce what is traditionally considered great literature.

“In blind pairwise evaluations involving 159 representative expert writers and lay readers, the AI-generated texts were initially less favored by experts in terms of stylistic fidelity,” the authors remarked. “However, once ChatGPT was fine-tuned using the complete works of individual authors, the experts began to prefer the AI-generated texts for their stylistic accuracy and quality, with lay readers showing similar trends.” This fine-tuning process seems to eliminate certain AI stylistic flaws that human readers typically criticize.

Although Dhillon could not comment in detail because of publication restrictions, he indicated that the preference for AI-generated writing, combined with its far lower production cost, could lead AI works to compete directly with human-created content. In the researchers' view, this means courts weighing fair use claims will need to consider the impact of AI output on the market for human-authored works.

Defendants accused of copyright infringement in the U.S. can raise a fair use defense, which courts evaluate using four factors: the purpose and character of the use, the nature of the copyrighted work, the amount of the work copied, and the effect of the use on the potential market for or value of the original work.

The authors estimate the average cost of fine-tuning an AI model and generating a 100,000-word novel at $81, a reduction of 99.7 percent compared with the estimated $25,000 cost of hiring a professional writer for the same task. They conclude that fine-tuning large language models on the collected works of individual authors should not be considered fair use if the resulting outputs emulate those authors' works.
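The 99.7 percent figure follows directly from the two cost estimates reported above. As a quick check, the sketch below uses only those two numbers ($81 and $25,000); the variable names are illustrative and not taken from the paper.

```python
# Cost estimates as reported in the study (the authors' estimates, not measurements).
ai_cost_usd = 81          # fine-tune a model and generate a 100,000-word novel
human_cost_usd = 25_000   # hire a professional writer for the same task

# Fractional reduction: 1 - (81 / 25,000) = 1 - 0.00324 = 0.99676
reduction = 1 - ai_cost_usd / human_cost_usd
print(f"Cost reduction: {reduction:.1%}")  # prints "Cost reduction: 99.7%"
```

The exact value is about 99.68 percent, which rounds to the 99.7 percent cited by the authors.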

Anticipating objections from legal scholars that the AI does not reproduce published texts verbatim, the authors argue that the Copyright Office's broad interpretation of “potential market for or value of the copied work” suggests fair use might not apply even when no direct copying is evident in the final product.

The findings could shape ongoing debates over copyright law and AI training as the industry evolves.