OpenAI has another ace up its sleeve: the new “Deep Research” agent is here, and its score on “Humanity’s Last Exam” is roughly double the previous best.

Half an hour ago, OpenAI held a special event in Tokyo to officially release its latest work, Deep Research, which uses multi-step internet research to transform knowledge work and is presented as a crucial step towards artificial general intelligence (AGI).

According to Sam Altman,

This is the next agent launched by OpenAI. Deep Research is like a superpower; an expert on demand! It can use the internet to perform complex research and reasoning and provide you with a report. It’s really great and can complete tasks that would otherwise take hours or days and cost hundreds of dollars.

At the launch event, OpenAI Research Director Mark Chen appeared with team members Isa and Josh, along with Neel from the product team, to introduce the new technology to a global audience. Mark Chen opened by emphasizing how much importance OpenAI attaches to Deep Research: the company believes it will profoundly transform knowledge work, help optimize business processes, improve employee efficiency, and ultimately benefit consumers.

Limitations of OpenAI’s o-series models

Looking back, OpenAI launched its “o-series” reasoning models last year, starting with o1. The biggest difference between these models and traditional models is that they think for a long time before giving an answer, and the longer they think, the higher the quality of the answer tends to be. However, these models have a significant limitation: they cannot access tools, and in particular they cannot browse the internet. This cuts them off from much of the information people rely on every day, which greatly limits their scope of application.

To compensate for this shortcoming, OpenAI has launched Deep Research. As the name suggests, Deep Research is a model that can conduct multi-step internet research. It can autonomously find content, integrate it, reason over it, and dynamically adjust its research plan as new information emerges.

Breaking the latency limit: pursuing deeper thinking and more autonomous task execution

One important feature of Deep Research is that it removes the latency limit. Unlike traditional models that strive for fast response, the Deep Research model may take 5 minutes or even 30 minutes to return a response. OpenAI believes that this is not a disadvantage, but rather a sign of the model’s maturity. They emphasize:

It is crucial for the model to autonomously perform longer tasks without supervision, which is a core step on the roadmap to AGI.

OpenAI’s ultimate goal is to create models that can independently discover and create new knowledge. Deep Research is a solid step towards this goal, as it can synthesize and understand information from the web and generate comprehensive, expert-level research reports.

Wide range of applications: empowering knowledge work and daily life

Deep Research has a wide range of applications and is not limited to knowledge work. Mark Chen pointed out that many tasks that require extensive web browsing can be completed with the help of Deep Research. For example, users can use it to search for specific products and filter them based on personal preferences. Mark also personally uses Deep Research to create PowerPoint presentations more efficiently.

Deep Research will be available to Pro users later today, and will gradually be rolled out to the Plus, Team, Education, and Enterprise plans.

Live demo: the power of Deep Research

To demonstrate the power of Deep Research, Neel, a Product Manager at OpenAI, gave a live demo. Using the example of “Should a new language translation app be developed?”, he sent a complex market research request to Deep Research, asking the model to analyze the adoption rates of iOS and Android, the willingness to learn foreign languages, and changes in mobile penetration rates, and finally to generate a formatted report with tables and clear recommendations.

Neel pointed out that such a complex query could take hours to complete manually, whereas handing it to Deep Research takes only a moment. During the demonstration, Deep Research first asked clarifying questions, such as which specific indicators of mobile penetration to use and how to gauge users’ interest in foreign language learning, much as a rigorous professional analyst would.

Deep Research then entered an autonomous research process, and the sidebar displayed the model’s reasoning in real time, including steps such as identifying the target countries, collecting information, and performing searches. The demonstration showed how Deep Research mimics the human research process: performing searches, opening web pages, analyzing content, and using the acquired information to guide the next search.
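The loop shown in the demo (search, open a page, analyze it, let the findings guide the next search) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OpenAI’s implementation: the in-memory FAKE_WEB corpus, its placeholder texts, and the names search, research, and ResearchState are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the web (an assumption for illustration only):
# query -> list of (url, placeholder text, follow-up query or None).
FAKE_WEB = {
    "translation app market": [
        ("https://example.com/overview",
         "Placeholder text about the market.",
         "mobile penetration by country"),
    ],
    "mobile penetration by country": [
        ("https://example.com/penetration",
         "Placeholder text about mobile penetration.",
         None),
    ],
}


@dataclass
class ResearchState:
    notes: list = field(default_factory=list)   # (url, finding) pairs
    visited: set = field(default_factory=set)   # URLs already opened


def search(query):
    """Toy search tool: look the query up in the in-memory corpus."""
    return FAKE_WEB.get(query, [])


def research(initial_query, max_steps=5):
    """Run searches until no follow-up queries remain or the budget is spent."""
    state = ResearchState()
    queue = [initial_query]
    steps = 0
    while queue and steps < max_steps:
        query = queue.pop(0)
        steps += 1
        for url, text, follow_up in search(query):
            if url in state.visited:
                continue                      # do not reopen a page
            state.visited.add(url)
            state.notes.append((url, text))   # "analyze" the page
            if follow_up:
                queue.append(follow_up)       # findings guide the next search
    return state
```

Calling research("translation app market") visits both placeholder pages in order; a real agent would replace the lookup table with live search and let a model decide on the follow-up query.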

Another presenter, Josh, demonstrated Deep Research’s use in shopping decisions. He simulated the scenario of buying skis in Tokyo, asking Deep Research to recommend skis for an advanced skier who prefers powder snow, needs long skis, and likes colorful designs, and to generate a report. Deep Research carried out the research as requested and produced a recommendation report with a detailed comparison table.

Deep Research technical analysis: deep reasoning driven by reinforcement learning

OpenAI researcher Isa gave an in-depth account of the technical details of Deep Research. She revealed that Deep Research is driven by a fine-tuned version of the o3 model (a more powerful reasoning model that has not yet been released) and was trained through end-to-end reinforcement learning on complex browsing and reasoning tasks.

Through this training, the model learned to plan and execute multi-step trajectories, respond to information in real time, and backtrack when necessary. The final model can not only browse the web, but also process user-uploaded files, use Python tools to perform calculations and generate charts, and embed charts and web images in the final report. More importantly, Deep Research cites its sources precisely, down to specific sentences and passages, which makes the report easier to verify.
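One way to picture sentence-level citation is as a report in which every claim carries a pointer to the exact passage that supports it. The sketch below is a hypothetical data structure for illustration only; the Citation and Claim classes and the render function are assumptions, not the format Deep Research actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    url: str       # page the evidence came from
    passage: str   # exact sentence or passage being cited


@dataclass(frozen=True)
class Claim:
    sentence: str        # one sentence of the report
    citation: Citation   # the passage that backs it up


def render(claims):
    """Render claims with numbered markers and a deduplicated source list."""
    sources = []
    marked = []
    for claim in claims:
        if claim.citation not in sources:
            sources.append(claim.citation)
        number = sources.index(claim.citation) + 1
        marked.append(f"{claim.sentence} [{number}]")
    body = " ".join(marked)
    refs = "\n".join(
        f'[{i}] {c.url}: "{c.passage}"' for i, c in enumerate(sources, 1)
    )
    return body + "\n\n" + refs
```

Because each marker resolves to a quoted passage rather than a whole page, a reader can check an individual sentence without rereading the entire source.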

Isa particularly emphasized Deep Research’s results on multiple benchmarks, including new high scores on “Humanity’s Last Exam” and the GAIA benchmark. Internal expert evaluations also show that Deep Research can complete complex tasks that would take experts hours, and that performance improves the longer the model is allowed to think.

It is worth mentioning that Deep Research also performed well in hallucination evaluations, where it is the best-performing model OpenAI has released. However, Isa reminded users that they still need to check the sources cited in a report to confirm that the information is accurate.

Future prospects: connecting enterprise data and moving towards AGI

Mark Chen concluded by emphasizing once again that the release of Deep Research is only the first step for OpenAI in the field of deep research.

In the future, they will explore connecting Deep Research to custom contexts and enterprise data stores so that it can serve enterprise applications more deeply.
