OpenAI’s ‘deep research’ tool: is it useful for scientists?
The model produces cited, pages-long reports that might be helpful for generating literature reviews
Tech giant OpenAI has unveiled a pay-for-access tool called ‘deep research’, which synthesizes information from dozens or hundreds of websites into a cited report several pages long. The tool follows a similar one from Google released in December and acts as a personal assistant, doing the equivalent of hours of work in tens of minutes.
Many scientists who have tried it are impressed with its ability to write literature reviews or full review papers and even identify gaps in knowledge. Others are less enthusiastic. “If a human did this I would be like: this needs a lot of work,” says Kyle Kabasares, a data scientist at the Bay Area Environmental Research Institute in Moffett Field, California, in an online video review.
The firms are presenting the tools as a step towards AI ‘agents’ that can handle complex tasks. Observers say that OpenAI’s deep research tool, released on 2 February, is notable because it combines the improved reasoning skills of the o3 large language model (LLM) with the ability to search the Internet. Google says its Deep Research tool is, for now, based on Gemini 1.5 Pro, rather than on its leading reasoning model 2.0 Flash Thinking.
Review writing
Many users are impressed with both tools. Andrew White, a chemist and AI expert at FutureHouse, a start-up in San Francisco, California, says that Google’s product is “really leveraging Google’s advantages in search and compute” to get users up to speed on a topic quickly, while o3’s reasoning skills add sophistication to OpenAI’s reports.
Derya Unutmaz, an immunologist at the Jackson Laboratory in Farmington, Connecticut, who has free access to ChatGPT Pro granted by OpenAI for medical research, says the OpenAI deep research reports are “extremely impressive”, “trustworthy” and as good or better than published review papers. “I think writing reviews is becoming obsolete.”
White anticipates that AI systems like these could be used to update human-authored reviews. “Authoritative reviews cannot feasibly be updated [by humans] every 6 months.”
But many caution that all LLM-based tools are sometimes inaccurate or misleading. OpenAI’s website admits that its tool “is still early and has limitations”: it can get citations wrong, hallucinate facts, fail to distinguish authoritative information from rumors and fail to convey its uncertainty accurately. The company expects the issues to improve with more usage and time. Google’s Deep Research has a disclaimer that simply reads “Gemini can make mistakes, so double-check it”.
doi: https://doi.org/10.1038/d41586-025-00377-9
This story originally appeared on Nature. Author: Nicola Jones