OpenAI GPT-5.2 Achieves Human Expert Performance

OpenAI has launched GPT-5.2, less than a month after the release of its predecessor, GPT-5.1. This rapid update follows an internal "Code Red" alert issued by CEO Sam Altman, which reportedly focused development efforts on enhancing ChatGPT.

Performance Benchmarks and Expert-Level Capabilities

GPT-5.2 improves on its predecessors across several evaluation metrics. In OpenAI's GDPval test, which assesses AI performance on real-world professional tasks spanning 44 occupations, GPT-5.2 Thinking achieved a 70.9% win-or-tie rate, while GPT-5.2 Pro reached 74.1%. In other words, the model's output was judged at least as good as a human expert's in more than 70% of these knowledge-work comparisons, which include tasks like creating presentations, spreadsheets, and reports. For comparison, GPT-5 Thinking scored 38.8%, Google's Gemini 3 Pro achieved 53.3%, and Anthropic's Claude Opus 4.5 scored 59.6%. OpenAI stated that GPT-5.2 is its "first model to reach human expert level."
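To make the "win or draw" metric concrete, here is a minimal sketch of how such a rate could be computed from pairwise judgments. The label names and the sample data are hypothetical and chosen only for illustration; this is not OpenAI's grading pipeline.

```python
from collections import Counter


def win_or_tie_rate(judgments):
    """Return the share of comparisons in which the model won or tied.

    `judgments` is a list of labels, one per model-vs-expert comparison.
    The labels "win", "tie", and "loss" are assumptions for this sketch.
    """
    counts = Counter(judgments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments supplied")
    return (counts["win"] + counts["tie"]) / total


# Hypothetical sample: 5 wins, 2 ties, 3 losses out of 10 comparisons.
sample = ["win"] * 5 + ["tie"] * 2 + ["loss"] * 3
rate = win_or_tie_rate(sample)
print(f"win-or-tie rate: {rate:.1%}")
```

Under this definition, a reported 70.9% means the model's deliverable was preferred or judged equal in roughly seven of every ten comparisons, with the expert's version preferred in the rest.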

Abstract Reasoning and Intelligence Gains

The ARC-AGI-2 test, which evaluates abstract reasoning and is sometimes described as an "AI Turing Test," showed substantial improvement. GPT-5.2 Thinking scored 52.9%, and GPT-5.2 Pro reached 54.2%. GPT-5.2 Thinking's result is roughly three times GPT-5.1 Thinking's score of 17.6%. Gemini 3 Pro scored 31.1% when it was released three weeks earlier.

Multimodal and Technical Proficiency

GPT-5.2 also exhibits enhanced capabilities in programming, mathematics, and multimodality. GPT-5.2 Thinking achieved 55.6% accuracy on SWE-Bench Pro, an advanced programming benchmark covering four languages, and 80% on SWE-bench Verified. In mathematics, GPT-5.2 Thinking scored a perfect 100% on the AIME 2025 (American Invitational Mathematics Examination) without using external tools, making it the first AI model to do so.

For multimodal tasks, OpenAI reported that error rates were roughly halved. Accuracy on CharXiv Reasoning (scientific chart reasoning) reached 88.7%, and accuracy on ScreenSpot-Pro (software interface understanding) reached 86.3%. The model also reduced hallucinations by 30% compared to its predecessor. OpenAI acknowledged that "Like all models, GPT-5.2 is not perfect. For anything important, please double-check its answers."

Version Availability and Pricing

GPT-5.2 is available in three versions: Instant, Thinking, and Pro. The Instant version is designed for daily tasks like Q&A, writing, and translation, maintaining the conversational style of GPT-5.1 with improved clarity. The Thinking version handles complex tasks such as programming, document analysis, mathematical reasoning, and planning. The Pro version, described as the most intelligent but slowest, is intended for scenarios where answer quality is prioritized over speed.

ChatGPT paid users (Plus, Pro, Business, Enterprise) began receiving access today, with free and ChatGPT Go users gaining access tomorrow. GPT-5.1 will remain available as a Legacy Model for three months before retirement. The GPT-5.2 API is also available, priced at $1.75 per million input tokens and $14 per million output tokens. While this represents a 40% increase over GPT-5.1, OpenAI suggests that improved token efficiency may lead to lower overall task costs.
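The trade-off between higher per-token prices and better token efficiency can be worked through with a short calculation. The sketch below uses only the GPT-5.2 prices and the 40% figure quoted above; the GPT-5.1 prices are derived from those two numbers rather than taken from a price list, and the token counts are hypothetical.

```python
# GPT-5.2 API prices in USD per million tokens, as quoted in the article.
NEW_INPUT_PER_M = 1.75
NEW_OUTPUT_PER_M = 14.00
PRICE_INCREASE = 0.40  # the stated 40% increase over GPT-5.1


def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Cost in USD of one request at the given per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Implied GPT-5.1 prices, derived from the 40% figure (an inference, not a quote).
old_input_per_m = NEW_INPUT_PER_M / (1 + PRICE_INCREASE)
old_output_per_m = NEW_OUTPUT_PER_M / (1 + PRICE_INCREASE)

# Hypothetical task: 10,000 input tokens and 2,000 output tokens.
old_cost = request_cost(10_000, 2_000, old_input_per_m, old_output_per_m)
new_cost = request_cost(10_000, 2_000, NEW_INPUT_PER_M, NEW_OUTPUT_PER_M)

# Fraction of the old token usage at which the new model costs the same,
# assuming input and output tokens shrink by the same proportion.
break_even_fraction = 1 / (1 + PRICE_INCREASE)

print(f"implied GPT-5.1 prices: ${old_input_per_m:.2f} in, ${old_output_per_m:.2f} out")
print(f"same token counts: ${old_cost:.4f} before, ${new_cost:.4f} now")
print(f"break-even token usage: {break_even_fraction:.1%} of the old usage")
```

Under these assumptions, a task only becomes cheaper on GPT-5.2 if it consumes less than about 71% of the tokens it needed before, that is, if token usage falls by more than roughly 29%.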

Internal Codename and Development Context

The internal codename for GPT-5.2 was "Garlic," a detail that OpenAI's official ChatGPT account hinted at by posting images of Sam Altman cooking. Fidji Simo, OpenAI's CEO of Applications, clarified that GPT-5.2 had been in development for several months, predating the recent "Code Red" alert. However, she noted that the Code Red initiative did help focus resources on ChatGPT. Sam Altman is expected to lift the Code Red alert in January of next year.