OpenAI Releases GPT-5.2, Targeting Enterprise Productivity and Accuracy
OpenAI has launched its GPT-5.2 model, making it available to paid ChatGPT users and to developers via the API. The release follows a "Code Red" alert that OpenAI CEO Sam Altman issued last week, reportedly in response to Google's Gemini 3. The alert prompted the company to redirect resources to core ChatGPT development.
The GPT-5.2 model comes in three versions:
Instant: Optimized for speed, designed for routine tasks such as information retrieval, writing, and translation.
Thinking: Engineered for complex structured tasks, including programming, long document analysis, mathematics, and planning.
Pro: A premium version focused on high accuracy and reliability for challenging applications.
Workplace Integration and Performance Metrics
With GPT-5.2, OpenAI is focusing on practical workplace applications. Fidji Simo, OpenAI's CEO of Applications, said the model was designed to generate "economic value" for users, and emphasized its capabilities in tasks such as spreadsheet creation, presentation generation, coding, image interpretation, and complex project management.
Data from ChatGPT enterprise users indicates that AI tools save them between 40 and 60 minutes daily, with heavy users saving over 10 hours per week.
The GPT-5.2 Thinking version performed strongly on GDPval, an evaluation of professional knowledge-work tasks spanning 44 occupations. It matched or exceeded human experts overall, winning or tying on 70.9% of tasks as judged by expert graders. The tasks are drawn from the nine industries that contribute most to U.S. GDP and cover areas such as sales presentations, accounting, emergency scheduling, manufacturing drawings, and video production.
Significant improvements were also noted in programming. In the SWE-Bench Pro test, which evaluates real-world software engineering capabilities across four programming languages, GPT-5.2 Thinking scored 55.6%. It reached 80% in SWE-bench Verified, indicating enhanced reliability for debugging, implementing functional requirements, refactoring codebases, and performing end-to-end repairs. Early testers also reported better performance in complex front-end UI tasks, including those involving 3D elements. OpenAI showcased examples of single-prompt generation for a wave simulator, a holiday greeting card generator, and a typing rain game, each producing a complete single-page application.
Accuracy and Long-Text Processing
GPT-5.2 Thinking exhibited a reduced "hallucination rate" compared to GPT-5.1 Thinking, with a 30% decrease in incorrect answers in anonymized ChatGPT queries. OpenAI advises that manual verification remains necessary for critical tasks.
The model also set a new high for long-context reasoning on OpenAI's MRCRv2 benchmark. It integrated information across long documents more accurately, particularly in deep analysis of documents spanning hundreds of thousands of tokens. On the 4-needle MRCR variant, GPT-5.2 approached 100% accuracy with contexts of up to 256k tokens, which supports processing of extensive documents such as reports, contracts, and academic papers.
In visual understanding, GPT-5.2 Thinking, described as OpenAI's strongest visual model to date, halved the error rate in chart reasoning and software interface comprehension. This allows for more accurate interpretation of data dashboards, product screenshots, technical drawings, and visual reports.
Spatial understanding and tool-use capabilities also improved, with GPT-5.2 Thinking achieving a 98.7% score in the Tau2-bench Telecom test. This indicates enhanced ability to execute end-to-end workflows and handle multi-turn tasks.
Mathematical and scientific capabilities were also upgraded. GPT-5.2 showed improved performance on graduate-level science Q&A tests such as GPQA Diamond and on expert-level mathematical benchmarks such as FrontierMath. GPT-5.2 Pro achieved over 90% accuracy on the ARC-AGI-1 test at a cost roughly 390 times lower than that of a previous model. On the ARC-AGI-2 test, GPT-5.2 Thinking scored 52.9% and GPT-5.2 Pro reached 54.2%, setting new highs for "chain-of-thought models."
OpenAI also highlighted a case in which GPT-5.2 Pro produced a feasible proof for an open problem in statistical learning theory, originally posed at the 2019 Conference on Learning Theory (COLT). The model generated a complete proof without prior algorithmic design or intermediate prompts, and human experts subsequently verified the proof.
API Pricing and Future Developments
The GPT-5.2 API is now available, with pricing slightly higher than GPT-5.1. OpenAI states that increased token efficiency may result in lower overall costs. The underlying knowledge bases for all three GPT-5.2 models have been updated. GPT-5.2 is rolling out to ChatGPT, prioritizing paid users, while GPT-5.1 will be retired in three months.
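The claim that a higher per-token price can still lower the overall bill comes down to simple arithmetic: the cost of a request is tokens multiplied by price, so a model that needs fewer output tokens for the same task can come out cheaper. The sketch below illustrates this with made-up figures. The prices, token counts, and the `request_cost` helper are all hypothetical and are not OpenAI's published rates.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the dollar cost of one request, with prices quoted per million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical "older" model: cheaper per token, but uses more output tokens.
old_cost = request_cost(
    input_tokens=10_000, output_tokens=4_000,
    input_price_per_m=1.00, output_price_per_m=8.00,
)

# Hypothetical "newer" model: 25% pricier per token, but finishes the same
# task with fewer output tokens.
new_cost = request_cost(
    input_tokens=10_000, output_tokens=2_500,
    input_price_per_m=1.25, output_price_per_m=10.00,
)

print(f"older model: ${old_cost:.4f} per request")
print(f"newer model: ${new_cost:.4f} per request")
print("newer model is cheaper" if new_cost < old_cost else "newer model is not cheaper")
```

Whether the saving materializes in practice depends on the workload: if the newer model does not reduce token usage for a given task, the higher per-token price simply raises the cost.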
OpenAI also announced a three-year licensing agreement with Disney, allowing users to generate social videos featuring over 200 Disney, Marvel, Pixar, and Star Wars characters. Disney has invested $1 billion in OpenAI as part of this agreement.
Additionally, OpenAI plans to introduce an "adult mode" for ChatGPT, with a projected launch in the first quarter of 2026. The company is developing age identification features to keep such content away from minors, and its age prediction models are currently in early testing.
