OpenAI’s deep research can complete 26% of ‘Humanity’s Last Exam’: What is it and what does it mean?


Artificial intelligence critics argue that the technology may soon outsmart humans, which could lead to a ‘Terminator’-style situation for humanity. For some, AI is already on its way to turning this future into a reality.

OpenAI’s deep research may soon become more intelligent than humans. (Reuters)

The deep research AI model, launched by ChatGPT-maker OpenAI earlier this month, has shown a nearly two-fold jump in performance over the next-best AI model on one of the world’s toughest exams for large language models (LLMs) – Humanity’s Last Exam.

What is the exam about?

Humanity’s Last Exam is a recently released benchmark for AI models, also called large language models, such as ChatGPT, Grok-2 and deep research. It is used to judge how well a model performs on a fixed set of challenging questions.

According to the people behind the exam, it was created because AI models were already scoring around 90% accuracy on existing benchmarks, which meant those tests were no longer hard enough to tell the best models apart. Humanity’s Last Exam was designed as a tougher yardstick.

The exam consists of 2,700 challenging questions, most of them publicly released, spanning more than a hundred subjects.

Did OpenAI prove its dominance?

The Sam Altman-led company’s AI models performed with varying accuracy on the exam. Its weakest performer was GPT-4o, which managed 3.1% accuracy with a calibration error of 92.3%.

OpenAI’s o1 model scored 8.8% accuracy with a 92.8% calibration error, while o3-mini (medium) and o3-mini (high) scored 11.1% and 14% accuracy with calibration errors of 91.5% and 92.8% respectively.

OpenAI’s newest model, deep research, scored a staggering 26.6% accuracy on Humanity’s Last Exam, nearly double the accuracy of the next-best performer, OpenAI’s own o3-mini (high) model.

How did other models fare?

According to the exam’s website, which was last updated on February 11, xAI’s Grok-2, Elon Musk’s ambitious AI model, scored a meagre 3.9% accuracy with a 90.8% calibration error. Another competitor, Anthropic’s Claude 3.5 Sonnet, scored 4.8% accuracy with an 88.5% calibration error.

Google’s Gemini Thinking scored 7.2% accuracy with a 90.6% calibration error. Chinese firm DeepSeek’s R1 model, which triggered a global sell-off in technology stocks after its launch last month, scored higher than every other non-OpenAI competitor, but could not outperform even OpenAI’s o3-mini (medium) model. It scored 8.6% accuracy with an 81.4% calibration error.
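
For readers who want to sanity-check the gap, the figures quoted in this article can be lined up in a few lines of Python. The numbers below are simply those reported above (deep research’s calibration error was not quoted here), not live leaderboard values, which are updated over time.

```python
# Humanity's Last Exam results as quoted in this article (accuracy %, calibration error %).
# Illustrative only: the official leaderboard changes as new models are added.
scores = {
    "OpenAI deep research":        (26.6, None),  # calibration error not quoted above
    "OpenAI o3-mini (high)":       (14.0, 92.8),
    "OpenAI o3-mini (medium)":     (11.1, 91.5),
    "OpenAI o1":                   (8.8, 92.8),
    "DeepSeek R1":                 (8.6, 81.4),
    "Google Gemini Thinking":      (7.2, 90.6),
    "Anthropic Claude 3.5 Sonnet": (4.8, 88.5),
    "xAI Grok-2":                  (3.9, 90.8),
    "OpenAI GPT-4o":               (3.1, 92.3),
}

# Print a simple leaderboard sorted by accuracy.
ranked = sorted(scores.items(), key=lambda kv: kv[1][0], reverse=True)
for name, (acc, calib) in ranked:
    calib_str = f"{calib:.1f}%" if calib is not None else "n/a"
    print(f"{name:<30} accuracy {acc:>5.1f}%   calibration error {calib_str}")

# Gap between the top model and the runner-up: 26.6 / 14.0 is roughly 1.9x, i.e. close to double.
top, runner_up = ranked[0], ranked[1]
print(f"{top[0]} scores {top[1][0] / runner_up[1][0]:.2f}x the runner-up's accuracy")
```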

What does deep research’s score mean?

Deep research’s performance shows that the model can answer a wide range of analytical, subjective and objective questions more accurately than any of its competitors. It also suggests the model is better placed than other AI models to deliver well-rounded answers.

This is likely because the model was built primarily to help people research any topic of their choice without the usual legwork. According to its creators, deep research can carry out multi-step research on the internet for complex tasks in tens of minutes, work that would otherwise take a human many hours.



