OpenAI Introduces GDPval Benchmark to Compare AI with Human Professionals


OpenAI has launched a new benchmark, GDPval, designed to measure how well AI models perform against human experts across major industries. The test is part of OpenAI’s long-term effort to track progress toward artificial general intelligence (AGI), defined as AI capable of handling economically valuable work at or above human level.

According to the company, its GPT-5 model and Anthropic’s Claude Opus 4.1 are already producing work close to the quality of seasoned professionals. However, OpenAI stresses this doesn’t mean AI is ready to replace people in their jobs. GDPval currently evaluates only a narrow set of tasks, primarily written reports, rather than the full scope of professional responsibilities.

How GDPval Works

The benchmark focuses on nine industries that make up a large share of the U.S. economy — including healthcare, finance, manufacturing, and government. It tests AI performance in 44 job roles, from software engineers to nurses to journalists.

In the first version, GDPval-v0, professionals were asked to review reports generated by both AI and humans, then choose which was better. For example, investment bankers compared AI-generated competitor analyses to those written by other bankers. AI models were then scored on their “win rate” against human work.

  • GPT-5-high (a stronger version of GPT-5) matched or beat industry experts in 40.6% of cases.
  • Claude Opus 4.1 achieved 49%, though OpenAI noted its strong visual presentation may have boosted results more than actual substance.
  • By contrast, GPT-4o, released about 15 months earlier, scored only 13.7%, showing rapid progress.
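The "win rate" scoring described above can be sketched in a few lines. OpenAI hasn't published the exact formula, so this is a minimal illustration assuming that ties count toward a model's "matched or beat" total (the function name and judgment labels are hypothetical):

```python
from collections import Counter

def win_rate(judgments):
    """Fraction of blind pairwise comparisons in which the model's
    deliverable was rated as good as or better than the expert's.

    Each judgment is "model", "expert", or "tie"; ties are assumed
    to count toward the model's "matched or beat" score.
    """
    counts = Counter(judgments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments to score")
    return (counts["model"] + counts["tie"]) / total

# Example: 3 model wins, 1 tie, 6 expert wins out of 10 comparisons
print(win_rate(["model"] * 3 + ["tie"] + ["expert"] * 6))  # → 0.4
```

Under this reading, GPT-5-high's 40.6% means graders preferred (or rated as equal) its report in roughly four out of ten blind comparisons against an expert's work.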

What It Means

OpenAI’s chief economist Dr. Aaron Chatterji said the results suggest professionals can use AI to offload routine work and focus on higher-value tasks. Tejal Patwardhan, who leads OpenAI’s evaluation team, added that the speed of improvement indicates further gains are likely.

Still, GDPval has limits. Since it mainly tests report-writing, it doesn’t capture the broader skills required in most jobs. OpenAI says it plans to expand future versions to cover more industries and interactive workflows.

Why It Matters

Traditional AI benchmarks like AIME 2025 (math problem-solving) and GPQA Diamond (PhD-level science questions) are nearing saturation, meaning models are already close to maxing them out. GDPval offers a new way to evaluate whether AI is becoming truly useful in real-world economic contexts.

For now, OpenAI sees GDPval as an early but meaningful step in proving AI’s value across industries — though much more comprehensive testing will be needed before declaring that AI consistently outperforms humans.
