GPT-5.2 Released: Enhanced AI Performance & Higher Cost

OpenAI has released GPT-5.2 in three versions: GPT-5.2 Instant, designed for everyday conversations; GPT-5.2 Thinking, optimized for in-depth tasks such as coding, long documents, mathematics, and planning; and GPT-5.2 Pro, the most powerful variant, aimed at complex problems. The release follows an internal "Code Red" memo from OpenAI CEO Sam Altman that emphasized a commitment to ChatGPT.

The new models are rolling out to paying ChatGPT users, and the API is already live. The standard GPT-5.2 model is priced 40% higher than GPT-5.1.

Core Performance Metrics

GPT-5.2 demonstrated significant performance improvements across various benchmarks. The Pro version achieved a perfect score on AIME 2025 without tools and became the first model to surpass 90% on ARC-AGI-1, reaching 90.5%. On the more challenging ARC-AGI-2, GPT-5.2 Thinking scored 52.9%, a substantial increase from GPT-5.1's 17.6%.

A new benchmark from OpenAI, GDPval, assesses real-world work tasks across 44 professions. GPT-5.2 Thinking either outperformed or matched human experts in 70.9% of these tasks, while GPT-5.2 Pro achieved 74.1%. The models reportedly complete these tasks 11 times faster than human experts at less than 1% of the cost. For investment banking spreadsheet modeling, the average score improved from 59.1% to 68.4%.

Technical Capabilities

In code writing, GPT-5.2 Thinking scored 55.6% on SWE-Bench Pro, a new benchmark that tests four programming languages, compared to GPT-5.1's 50.8%. Front-end capabilities, particularly in 3D and complex UI generation from single prompts, also showed improvement.

Visual understanding capabilities were enhanced, with error rates approximately halved. On CharXiv Reasoning, a scientific paper chart Q&A benchmark, GPT-5.2 scored 88.7%, up from GPT-5.1's 80.3%. ScreenSpot-Pro, for GUI screenshot understanding, saw an increase from 64.2% to 86.3%. The new model also exhibits stronger spatial understanding, accurately labeling components in low-quality motherboard images, a task where GPT-5.1 struggled with accuracy and completeness.
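
As a rough check on the "approximately halved" claim, the two scores quoted above can be converted to error rates. This is a minimal sketch that assumes the error rate is simply 100 minus the reported score; the article does not say which benchmarks the claim refers to, and the function names are illustrative only.

```python
def error_rate(score_pct: float) -> float:
    """Error rate in percent, assuming error = 100 - score (an assumption)."""
    return 100.0 - score_pct


def error_ratio(old_score_pct: float, new_score_pct: float) -> float:
    """New error rate divided by old error rate; 0.5 would mean exactly halved."""
    return error_rate(new_score_pct) / error_rate(old_score_pct)


# Scores quoted in the article (GPT-5.1 -> GPT-5.2).
benchmarks = {
    "CharXiv Reasoning": (80.3, 88.7),
    "ScreenSpot-Pro": (64.2, 86.3),
}

for name, (old, new) in benchmarks.items():
    print(f"{name}: error ratio {error_ratio(old, new):.2f}")
```

Under that assumption, the remaining error on CharXiv Reasoning is a little over half of its previous level, and on ScreenSpot-Pro a little under 40%, so "approximately halved" is a fair summary of the two taken together.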

For long documents, GPT-5.2 Thinking achieved nearly 100% accuracy at 256k token length on the 4-needle variant of OpenAI's MRCRv2 benchmark, which measures the ability to integrate information from multiple sources within a document. GPT-5.1 achieved only about 30% at the same length. The API also introduces a new /compact endpoint to extend the effective context window for tasks with numerous tools and long runtimes.

Tool calling capabilities, measured by Tau2-bench in multi-turn customer service scenarios, also saw gains. In the Telecom domain, GPT-5.2 Thinking scored 98.7% (up from 95.6%), and in Retail, 82.0% (up from 77.9%).

Mathematics, Science, and Safety

In mathematics and science, GPT-5.2 scored 100% on AIME 2025, 99.4% on HMMT February 2025 (100% for the Pro version), and 92.4% on GPQA Diamond (93.2% for Pro).

Hallucination rates, tested on real user queries in ChatGPT, fell by about 30% in relative terms: the share of incorrect responses dropped from 8.8% to 6.2%. Safety evaluations indicate improved performance in sensitive conversations involving suicide, self-harm, mental health, and emotional dependence. Age prediction models are being deployed to restrict sensitive content for users under 18, and work continues on reducing over-rejection.
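
The 30% figure is a relative reduction, not a drop in percentage points. A short sketch of the arithmetic, using only the two rates quoted above (the function names are illustrative):

```python
def absolute_drop(old_pct: float, new_pct: float) -> float:
    """Difference in percentage points between two rates."""
    return old_pct - new_pct


def relative_drop(old_pct: float, new_pct: float) -> float:
    """Reduction as a percentage of the old rate."""
    return (old_pct - new_pct) / old_pct * 100.0


old_rate, new_rate = 8.8, 6.2  # percent of responses with errors, per the article

print(f"absolute: {absolute_drop(old_rate, new_rate):.1f} percentage points")
print(f"relative: {relative_drop(old_rate, new_rate):.1f}%")
```

The absolute change is 2.6 percentage points; the relative change is about 29.5%, which rounds to the 30% cited.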

Availability and Pricing

GPT-5.2 is rolling out to paying ChatGPT users (Plus, Pro, Go, Business, Enterprise) starting today. GPT-5.1 will remain available for three months before it is retired. In the API, the models are live as gpt-5.2 (Thinking), gpt-5.2-chat-latest (Instant), and gpt-5.2-pro (Pro). Although the unit price is higher, OpenAI says improved token efficiency may lower the total cost of reaching comparable results. ChatGPT subscription prices remain unchanged.
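
The token-efficiency argument can be made concrete with a break-even calculation. The sketch below uses the 40% price increase reported in this article and assumes, for simplicity, that cost scales linearly with token count and that the same increase applies to all tokens; neither assumption is confirmed by the source, and the function names are illustrative.

```python
def breakeven_token_reduction(price_increase: float) -> float:
    """Fraction by which token usage must fall for total cost to stay flat.

    price_increase is a fraction, e.g. 0.40 for a 40% higher unit price.
    """
    return 1.0 - 1.0 / (1.0 + price_increase)


def relative_total_cost(price_increase: float, token_ratio: float) -> float:
    """New total cost divided by old total cost.

    token_ratio is new token usage divided by old token usage.
    """
    return (1.0 + price_increase) * token_ratio


price_increase = 0.40  # 40% higher unit price, as reported in the article

needed = breakeven_token_reduction(price_increase)
print(f"break-even token reduction: {needed * 100:.1f}%")

# Hypothetical scenario: the new model needs 30% fewer tokens for the same task.
print(f"relative cost at 30% fewer tokens: {relative_total_cost(price_increase, 0.70):.2f}")
```

Under these assumptions, GPT-5.2 would need to use roughly 28.6% fewer tokens than GPT-5.1 on a given task before the total bill comes out lower; any smaller saving leaves the task more expensive than before.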