Level 40 Human Benchmark

News

OpenAI’s GPT-4 exhibits “human-level performance” on professional benchmarks

a large multimodal model that can accept text and image inputs while returning text output that "exhibits human-level performance on various professional and academic benchmarks," according to OpenAI.

Gizmodo · 5 months ago

OpenAI Claims Its New Model Reached Human Level on a Test for ‘General Intelligence.’ What Does That Mean?

model has just achieved human-level results on a test designed to measure “general intelligence”. On December 20, OpenAI’s o3 system scored 85% on the ARC-AGI benchmark, well above the ...

Nature · 4 months ago

How should we test AI for human-level intelligence? OpenAI’s o3 electrifies quest

How close is AI to human-level intelligence ... Understanding and Reasoning Benchmark for Expert AGI (MMMU), which asks chatbots to do university-level, visual-based tasks such as interpreting ...

Geeky Gadgets · 6 months ago

New MIT Research Proves AGI Was Achieved

allowing models to surpass human-level reasoning on the ARC benchmark. The ability to adapt on the fly is a crucial component of general intelligence, bringing AI closer to human-like cognitive ...

Observer · 8 months ago

Why OpenAI’s ‘Strawberry’ Reasoning Model Is a Big Deal

“We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark,” said OpenAI in a recent blog post. GPQA (Graduate-Level Google-Proof Q&A ...

The Verge · 1 year ago

Waymo has 7.1 million driverless miles — how does its driving compare to humans?

The Google spinoff’s robotaxis led to a reduction in injury-related and police-reported crashes when compared to human benchmarks ... not to report certain low-level crashes, like minor fender ...
