GPT-5 is speeding up scientific research, but still can't be trusted to work alone, OpenAI warns
Published: Nov. 20, 2025



Key takeaways

  • GPT-5 supports researchers across disciplines, a study found. 
  • The model doesn't rival human researchers, however. 
  • The findings don't indicate AGI is coming soon. 

OpenAI's recently released model, GPT-5, is showing promise in advancing scientific discovery. While user reactions to the new model in ChatGPT were less than stellar, it appears to be making more headway as a research assistant. 

In a new paper published Thursday, OpenAI detailed the ways GPT-5 "accelerated" research across a variety of case studies -- albeit with some limitations. 

"Across these early studies, GPT-5 appears able to shorten parts of the research workflow when used by experts," the paper said. "It does not run projects or solve scientific problems autonomously, but it can expand the surface area of exploration and help researchers move faster toward correct results." 

Also: OpenAI tested GPT-5, Claude, and Gemini on real-world tasks - the results were surprising

CEO Sam Altman and Chief Scientist Jakub Pachocki reiterated the company's science-forward goals during a livestream last month, in which they also discussed ambitious timelines for developing artificial general intelligence (AGI), which would theoretically be comparable to human ability. 

(Disclosure: Ziff Davis, this publication's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

It's the first report from OpenAI for Science, a team of internal researchers and recently hired external academics that the company announced in September. The paper was also supported by researchers from several labs and universities, including Vanderbilt, UC Berkeley, Columbia, Cambridge, Oxford, and The Jackson Laboratory. According to a blog accompanying the paper, OpenAI for Science aims to help researchers save time by using frontier models to test hypotheses and reveal insights from vast datasets.

The results are early, but frontier models are evolving rapidly -- for now, researchers appear optimistic that AI will help us unlock novel, if incremental, discoveries. 

The findings 

The paper highlighted several case studies in which GPT-5 helped with or advanced scientific endeavors in biology, math, and algorithmic decision-making. The model's contributions ranged from creating smaller-scale efficiencies -- like improving a proof for a mathematical theorem -- to larger breakthroughs.

Also: AI models know when they're being tested - and change their behavior, research shows

In one example of the latter, Jackson Laboratory scientists had spent months of reading and experimentation in an immunology trial before they could explain a change in immune cells. They gave GPT-5 unpublished data from the trial -- ensuring the model hadn't already been trained on it -- to see if it could reach a similar conclusion. 

"GPT-5 identified the likely cause within minutes from an unpublished chart and suggested an experiment that proved it," OpenAI wrote. The implication is that medical researchers can involve frontier models earlier on in their experiments to improve treatments and understand diseases in minutes, not months. 

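To make that workflow concrete, here is a minimal sketch of the hand-off pattern the case study describes: passing unpublished measurements to a model and asking for a causal hypothesis plus a confirming experiment. The file name, the prompts, and the "gpt-5" model identifier are illustrative assumptions, not details from the paper.

```python
# Sketch of handing unpublished data to a model for hypothesis generation.
# File name, prompts, and model name are assumptions for illustration.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
data = Path("unpublished_immune_counts.csv").read_text()  # hypothetical dataset

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are assisting an immunology team."},
        {"role": "user", "content": (
            "These measurements are unpublished, so they cannot be in your "
            "training data. What is the most likely cause of the shift in "
            "cell populations, and what experiment would confirm it?\n\n" + data
        )},
    ],
)
print(response.choices[0].message.content)
```
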
In another case study, GPT-5 helped a separate research team conduct a deep literature search that revealed connections between the team's newly proven geometry theorem and other areas of math. GPT-5 efficiently flagged other areas where the team could apply its findings and surfaced reference material the team hadn't encountered, including some in other languages. The model saved the researchers the task of manually reviewing the literature for connections and broadened their knowledge base in the process. 

Also: Google's Antigravity puts coding productivity before AI hype - and the result is astonishing

"These collaborations help us understand where the models are useful, where they fail, and how to integrate them into the scientific process -- from literature review and proof generation to modeling, simulation, and experimental design," the company wrote. 

New discoveries 

Many of the paper's examples demonstrated that GPT-5 can rapidly reach existing scientific conclusions -- what OpenAI referred to in one case study as "independent rediscovery of known results." However, the paper also mentioned "four new results in mathematics (carefully verified by the human authors), underscoring that GPT-5 can solve problems that people have not yet solved." 

In one example, Columbia researcher Mehtaab Sawhney and OpenAI researcher Mark Sellke explored an existing number-theory problem from Hungarian mathematician Paul Erdős, known as #848. It's marked "open," or unresolved, on a public site where users can contribute solutions -- not because no headway has been made, but because the proposed partial solutions are scattered across notes and textbooks rather than centralized or formally agreed upon.

GPT-5 didn't produce a complete answer to #848 from scratch -- which genuinely would have rivaled human ability -- but it did identify the missing step in the final proof. 

"Human comments on the site had already outlined much of the structure; GPT-5 proposed a key density estimate, and Sawhney and Sellke corrected and tightened it into a complete proof that closed the problem," OpenAI wrote.

In another study, GPT-5 came up with two proofs -- one previously proven, one new -- for a graph theory problem, "relying on a different and more elegant argument than the original human proof." As with other examples, the researchers were able to verify and adopt GPT-5's suggestion. 

Given how quickly frontier models have evolved in the last three years, the researchers believe "these contributions are modest in scope but profound in implication."

AI and the future of science

Despite these strides, GPT-5 wasn't foolproof. OpenAI recommended that the model be used only with continued oversight from experts.

"GPT-5 can sometimes hallucinate citations, mechanisms, or proofs that appear plausible; it can be sensitive to scaffolding and warm-up problems; it sometimes misses domain-specific subtleties; and it can follow unproductive lines of reasoning if not corrected," OpenAI noted. 

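That kind of oversight can be partly mechanical. As one hedged illustration, a researcher could check whether model-suggested DOIs actually resolve against Crossref's public API before trusting a citation; the DOI list below stands in for hypothetical model output.

```python
# Sketch: verify that model-suggested DOIs resolve to real records via Crossref.
import requests

def citation_exists(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return resp.status_code == 200

# Hypothetical model output; a real pipeline would parse DOIs from the model's text.
suggested = ["10.1038/s41586-020-2649-2", "10.0000/not-a-real-doi"]
for doi in suggested:
    status = "verified" if citation_exists(doi) else "NOT FOUND -- possible hallucination"
    print(doi, status)
```
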
For reasons like these, the paper doesn't suggest AI tools replace current scientific research methods just yet. Advocating for a partnered approach, OpenAI said that while the core tools of science, including simulators and computer algebra systems, are crucial to maintaining precision and efficiency, the reasoning abilities advanced models provide are a valuable step forward.

"Where specialized tools exist, we want to use them; where general reasoning is required, we build models designed to handle it," the company wrote. "Both paths reinforce each other." 

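One way to read that principle: let the model propose, and let an exact tool verify. The snippet below is a toy sketch of that division of labor, using the SymPy computer algebra system to check a closed form (assumed here to be model-proposed) rather than taking it on trust.

```python
# Sketch: pair general reasoning with a specialized exact tool.
# Suppose a model proposed n*(n+1)*(2*n+1)/6 for the sum of the first n squares;
# a computer algebra system can verify the claim symbolically.
import sympy as sp

n, k = sp.symbols("n k", positive=True, integer=True)
proposed = n * (n + 1) * (2 * n + 1) / 6   # model-proposed closed form (assumed)
exact = sp.summation(k**2, (k, 1, n))      # what the CAS computes directly
print(sp.simplify(proposed - exact) == 0)  # True: the claim checks out
```
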
The paper emphasized that scientists should remain in charge by defining questions, critiquing concepts, and checking results -- GPT-5, in this case, provides the speed and reach to scale that expertise. As with basic prompt engineering, OpenAI noted that scientists must learn how to communicate with GPT-5 to get the best results, and that ultimately, "productive work often looks like dialogue" between humans and the model. That framing is a common theme across AI tools pitched as copilots or drafting companions, though those are often built for simpler consumer tasks.

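In code, that dialogue pattern is just a conversation history that grows with each expert correction. The sketch below assumes the openai Python package, an API key in the environment, and a "gpt-5" model name; the prompts are hypothetical, and in real use the critique on each round would come from the scientist, not a canned string.

```python
# Sketch of the critique-and-revise dialogue loop between expert and model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
messages = [{"role": "user", "content":
             "Sketch a proof that every tree on n vertices has n - 1 edges."}]

for _ in range(3):  # a few rounds of dialogue
    reply = client.chat.completions.create(model="gpt-5", messages=messages)
    draft = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": draft})
    # In practice the expert reads the draft and writes the critique by hand.
    messages.append({"role": "user", "content":
                     "Check the inductive step; state the base case explicitly."})
```
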
Also: 10 ChatGPT prompt tricks I use - to get the best results, faster

The paper suggested that GPT-5 is, at most, approaching the level of a research partner, with some limitations. In another use case, combinatorialist Tim Gowers gave the model several tough questions he was working on and asked it for feedback, critique, and counterexamples. GPT-5 found flaws and offered simpler arguments in some instances, but made no progress in others.

"Gowers' overall conclusion was that the model is already useful as a very fast, very knowledgeable critic that can stress-test ideas and save time, even though it does not yet meet his bar for full co-authorship," OpenAI concluded. 

AGI isn't here - yet 

Ultimately, the OpenAI for Science paper exemplifies GPT-5's strengths in refining and assisting -- filling in gaps rather than going toe-to-toe with human minds. While OpenAI acknowledged that models have surpassed just summarizing existing information, that doesn't mean the company is prepared to say GPT-5 is an indicator of AGI.

"We don't view these results as signs that we are close to AGI or a fully capable 'research intern,'" the company told in a statement, referring to Altman's comment in last month's live stream that OpenAI will release a model with intern-equivalent research capabilities by September 2026. "Benchmarks across the field are saturating, so we are putting more of an emphasis on testing a model's capabilities, including how the models work in scientific workflows. That gives us a clearer picture of actual capability and limitations."
