A study has found that the AI models behind widely used platforms like ChatGPT produce more original research ideas than human experts.
In the Stanford University study, titled “Can LLMs Generate Novel Research Ideas?”, researchers Chenglei Si, Diyi Yang, and Tatsunori Hashimoto investigated whether large language models (LLMs) can autonomously generate novel research ideas that are on par with those created by expert human researchers.
LLMs are a type of generative AI foundation model, most widely known as the technology behind OpenAI’s ChatGPT.
The research found that LLM-generated ideas were ranked higher for novelty, excitement, and effectiveness, while human experts developed more feasible ideas; overall, reviewers scored the AI-generated ideas higher.
Study co-author Chenglei Si told Newsweek that the data means “LLMs could take on a bigger role in these challenging and creative tasks than many people thought.”
He said that while “we don’t have any concrete results” showing the “feasibility and effectiveness of fully end-to-end autonomous research agents,” we are “moving towards that future and it will push for some major shifts in the way that scientific discovery is done when that day comes.”
The study compared three conditions: in the first, 49 human experts wrote ideas; in the second, an AI agent generated ideas; and in the third, an AI agent generated ideas that a human expert then re-ranked.
A further 79 human experts were then recruited to blindly review and rate the ideas from both the human experts and the LLMs, to determine which were the best in each category.
The ideas were generated in relation to seven different topics: bias, coding, safety, multilingual, factuality, math, and uncertainty.
According to the report, LLMs can “provide insights that could inform future methods for improving idea generation systems,” produce “far more” ideas than “any human could,” and filter those ideas to “extract the best ones from the large pool.”
However, the AI agents have limitations: as the LLMs generated more ideas, more of them were duplicates, showing a lack of diversity in idea generation.
The study also found that LLMs cannot yet reliably evaluate ideas, which, the researchers wrote, raises “concerns about trusting conclusions that are based primarily on LLM evaluators.”
The researchers also reflected on the impact the findings could have on human experts and determined that introducing AI into research idea generation may result in “unforeseen consequences.”
They warned that “overreliance on AI could lead to a decline in original human thought” and “might reduce opportunities for human collaboration, which is essential for refining and expanding ideas.”
The study took a year to produce and was posted on X by Chenglei Si, with the comment, “We obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers.”