Deep Research: More than just a tool, the beginning of a new era (5+ practical examples)

OpenAI is gearing up for new releases again. What will be updated next? Stay tuned.


The o3-based agent system behind Deep Research doesn't just summarize: it thinks, discovers connections, and draws conclusions on its own. It genuinely works and creates real value.

Just look at the GPQA benchmark, a set of "Google-proof" questions designed so they cannot be answered simply by searching the web. PhD experts score 81% in their own field, but only about 34% outside it. And o3? It scores close to 90% overall, and the curve is still rising steeply.

On benchmarks like this, the models now outperform even human domain experts.

In this article, Professor Ethan Mollick of the Wharton School offers his insights into Deep Research.

Deep Research marks the end of search and the beginning of research: the two parallel tracks of agents and reasoners finally meet here.

OpenAI's Deep Research is an AI system that conducts research with the depth and nuance of a human expert, but at machine speed. It is a glimpse of the future.

Reasoners: the revolution in AI thinking

In the past, chatbots were simple: they took an input and produced an output, responding word by word (or, more precisely, token by token). To improve their reasoning, researchers hit on the idea of prompting the model to lay out its steps before the answer. This method, known as chain-of-thought prompting, significantly improved performance.
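
To make this concrete, here is a minimal Python sketch of the difference between a direct prompt and a chain-of-thought prompt. The `ask_llm` stub is a hypothetical placeholder for whatever chat-completion call your provider exposes; only the prompt text matters here.

```python
# Illustrative sketch only: `ask_llm` is a hypothetical placeholder for a
# chat-completion call to whichever LLM provider you use.
def ask_llm(prompt: str) -> str:
    """Send a prompt to a language model and return its text reply (stub)."""
    raise NotImplementedError("wire this up to your LLM provider")

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than "
    "the ball. How much does the ball cost?"
)

# Direct prompting: the model must answer immediately, token by token.
direct_prompt = f"{question}\nAnswer with just the number."

# Chain-of-thought prompting: ask for the reasoning steps before the answer.
cot_prompt = (
    f"{question}\n"
    "Think through the problem step by step, showing your reasoning, "
    "then give the final answer on its own line."
)

# answer = ask_llm(cot_prompt)  # the only change is the prompt text, yet it
#                               # reliably improves accuracy on math/logic tasks
```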

Now, reasoning models are different. They generate “thinking tokens” before giving an answer. This breakthrough is important in at least two ways:

First, AI companies can now train models to reason by learning from examples of good problem-solving, so the models think more effectively.

These trained reasoning chains work better than human-written prompts, especially on hard problems such as mathematics and logic, precisely the areas where older chatbots fell short.

Second, the longer the reasoning model thinks, the better the answer (although the rate of improvement slows as the thinking time increases).

Until now, the main way to improve AI performance was to train ever larger models, which is expensive and requires huge amounts of data. That is no longer the only lever: improvement can also come from giving the model more time to think.

Reasoning models show that performance can be significantly improved simply by having the model generate more thinking tokens when answering (so-called "inference-time compute"), without relying on ever larger training runs.
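
One toy way to picture "more thinking helps, but with diminishing returns" is a saturating curve. The ceiling and scale below are made-up illustrative parameters, not measurements of any real model.

```python
import math

def toy_accuracy(thinking_tokens: int, ceiling: float = 0.95, scale: float = 2000.0) -> float:
    """Illustrative saturating curve: accuracy rises with the number of
    thinking tokens but flattens out. All parameters are invented for
    illustration, not measured from any real model."""
    return ceiling * (1.0 - math.exp(-thinking_tokens / scale))

for tokens in (100, 500, 2000, 8000, 32000):
    print(f"{tokens:>6} thinking tokens -> ~{toy_accuracy(tokens):.0%} accuracy (toy curve)")
```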

GPQA (the Graduate-Level Google-Proof Q&A benchmark) is a set of multiple-choice questions designed to resist answering by web search. Even with internet access, PhDs get only about 34% correct outside their specialty, but 81% within it. Results on this benchmark show how reasoning models are accelerating the improvement of AI capabilities.

AI agent: AI that acts autonomously

Put simply, give an agent a goal and it will pursue it on its own. AI labs are now racing to build general-purpose agents: systems that can handle any task.

Take OpenAI's Operator experiment as an example: imagine asking the agent to read a report and generate a chart from the statistics in it.

At first, everything goes well: it locates the report, parses the data, and logs in to the charting platform. Then it hits a snag: the platform restricts the format and amount of data it will accept, and the task stalls. The agent tries converting the format, splitting the data, and finding an alternative interface, but to no avail.

This process not only demonstrates the potential of general-purpose agents, but also exposes the limitations of existing technology.
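
To make the Operator story concrete, here is a minimal sketch of the perceive-plan-act loop a general-purpose agent runs. The planner and tool dispatcher are hypothetical stubs; a real system replaces them with an LLM-based planner, sandboxed tools, memory, and human checkpoints.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    observations: list = field(default_factory=list)
    done: bool = False

def plan_next_step(state: AgentState) -> str:
    """Hypothetical planner: in a real agent this is an LLM call that looks at
    the goal plus past observations and chooses the next action."""
    return "search_for_report" if not state.observations else "finish"

def execute(action: str, state: AgentState) -> str:
    """Hypothetical tool dispatcher (browser, file system, charting API, ...)."""
    if action == "finish":
        state.done = True
        return "goal reported as complete"
    return f"result of {action}"

def run_agent(goal: str, max_steps: int = 10) -> AgentState:
    state = AgentState(goal=goal)
    for _ in range(max_steps):  # cap the loop so failures (like Operator's) terminate
        action = plan_next_step(state)
        state.observations.append(execute(action, state))
        if state.done:
            break
    return state

print(run_agent("Read the report and chart its statistics").observations)
```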

Deep Research: agent + reasoning model

But don't write off AI agents just yet. Narrow-domain agents that focus on specific, economically valuable tasks are already bearing fruit. Built on existing large language models, these specialized systems can achieve outstanding results in their own fields. A clear example is OpenAI's new Deep Research, which vividly demonstrates what a focused AI agent can do.
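
As a rough sketch of what "agent + reasoning model" means in practice: the reasoning model plans sub-questions and writes up the findings, while the agent layer goes out and gathers sources. Every function below is a hypothetical stub; OpenAI has not published Deep Research's internals.

```python
# Hypothetical outline of a Deep-Research-style pipeline; none of these
# functions correspond to a published OpenAI API.
def plan_subquestions(topic: str) -> list[str]:
    """Reasoning step: break the research topic into sub-questions (stubbed)."""
    return [f"What is known about {topic}?", f"What are the open debates on {topic}?"]

def search_sources(question: str) -> list[dict]:
    """Agent step: query the web or paper databases for candidate sources (stubbed)."""
    return [{"title": f"Source addressing: {question}", "url": "https://example.org"}]

def synthesize(topic: str, findings: list[dict]) -> str:
    """Reasoning step: draft a cited report from the gathered sources (stubbed)."""
    citations = "\n".join(f"- {f['title']} ({f['url']})" for f in findings)
    return f"Report on {topic}\n\nSources:\n{citations}"

def deep_research(topic: str) -> str:
    findings = []
    for question in plan_subquestions(topic):      # the reasoner plans
        findings.extend(search_sources(question))  # the agent acts
    return synthesize(topic, findings)             # the reasoner writes it up

print(deep_research("microplastics and human health"))
```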

Professor Ethan Mollick says he used Deep Research to produce a report on deep research, and the result surprised him: 13 pages and nearly 4,000 words of professional analysis.

The quality of the citations is impressive: not random internet articles but high-quality academic papers, with key claims traceable directly to their sources. Even though it is still blocked by paywalls, it already shows research skills comparable to those of a human scholar.

Of course, if Deep Research could access those paywalled, high-quality materials, it would likely produce even better results.

Compared with Google's product of the same name, the difference is obvious. Google's version cites more sources, but their quality is mixed; it reads like data summarization rather than in-depth research. Built on the older Gemini 1.5, its output is more like an excellent undergraduate's assignment.

But don't lose sight of the main point: both systems completed, in a few minutes, work that would normally take hours. OpenAI says Deep Research can handle 15% of high-value research projects and 9% of top-level projects, and Mollick's tests suggest these figures are not an exaggeration.

The CEO of llama_index, the well-known LLM framework, has noted that agent-driven report generation will become a core enterprise demand. OpenAI's Deep Research already bears this out.

But to truly take root in enterprises, three key capabilities are still needed:

  1. A professional template system: support for different scenarios such as questionnaires and financial reports, direct output in formats such as PDF and PPT, and adaptability to different business needs.
  2. Offline data processing: building a complete knowledge-base index, enabling an “infinite context window”, and supporting techniques such as RAG (see the sketch after this list).
  3. A human-machine collaboration mechanism: domain-specific editing and verification, customization for scenarios such as legal and engineering work, and deep integration of multiple tools.
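
To illustrate the second point, here is a minimal sketch of what a knowledge-base index plus RAG looks like: embed internal documents offline, then retrieve the closest chunks at question time and hand them to the model as context. The character-frequency embedding is a stand-in so the example runs anywhere; real systems use a learned embedding model and a vector database.

```python
import math

def embed(text: str) -> list[float]:
    """Stand-in embedding based on letter frequencies, so the example runs
    without any dependencies; real systems use a learned embedding model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Offline step: index the enterprise knowledge base once.
documents = [
    "Q3 financial report: revenue grew 12% year over year.",
    "Legal review checklist for supplier contracts.",
    "Engineering postmortem for the March service outage.",
]
index = [(doc, embed(doc)) for doc in documents]

# Online step: retrieve the documents scored most similar to the question
# and hand them to the model as context (the LLM call itself is omitted).
question = "How did revenue change last quarter?"
q_vec = embed(question)
top = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:2]
context = "\n".join(doc for doc, _ in top)
print("Context handed to the model:\n" + context)
```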

The AI jigsaw puzzle is coming together.

Reasoning models provide the thinking; agent systems take the action. Narrow-domain agents like Deep Research can already do the work of some high-level expert teams.

But experts are not going to disappear; their role is changing from doing the work directly to commanding and verifying AI systems.

The major labs are betting that better models will break through the general-purpose-agent bottleneck, letting AI browse the web autonomously, process all kinds of information, and act in the real world.

Operator shows that we have not yet reached that stage, but Deep Research suggests that we may be on the right path.

This is not the end, but a new beginning. AI is moving from a tool to a partner.

Below are practical examples of Deep Research. Each example ends with a shared link that you can open directly to view Deep Research's output.

Practical examples of Deep Research

A research report on the evolution of TTRPGs (Tabletop Role-Playing Games).

Report length: 30 pages, 10,600 words.

Click here

Research consensus on how microplastics affect the human body

Click here

Experts on any topic, 24/7 assistants

Expert business and technical analysis of DeepSeek’s entire research and development history

Click here

Research report on “The evolution of dramatic storytelling since 2010”

Click here

Investigate the molecular basis of lung cancer, risk factors and emerging treatments such as immunotherapy and gene therapy.

Analyze renewable energy storage: battery technology, alternatives, challenges and future solutions.

Click here

Deep Research + o1-pro solves graph problems that R1, Claude 3.5, and o1-pro cannot solve individually

An OpenAI employee: “The last few weeks using Deep Research have been my personal AGI moment. It now takes 10 minutes to generate accurate and comprehensive competitive and market research (including sources), whereas before I needed at least 3 hours.”
