OpenAI GPT-5.2: Enhanced Professional Tasks & Programming

OpenAI has released its GPT-5.2 model series, comprising three versions: Instant, Thinking, and Pro. The company calls the series its most capable set of models for professional knowledge work to date, outperforming industry professionals on well-specified knowledge work tasks spanning 44 occupations.

GPT-5.2 Instant is designed for daily tasks, offering optimizations in information retrieval, operational guidance, technical document writing, and translation. GPT-5.2 Thinking targets deep work scenarios, including programming, long document summarization, file Q&A, and mathematical logical deduction. GPT-5.2 Pro is positioned as OpenAI's most intelligent and reliable option for complex problems, though it entails longer processing times.

Overall, GPT-5.2 features upgrades in general intelligence, long-context understanding, agent tool calling, and visual capabilities. OpenAI noted that the model performs better in end-to-end execution of complex real-world tasks, with enhancements in spreadsheet creation, presentation building, code writing, image recognition, and multi-step project handling.

OpenAI CEO Sam Altman stated on social media that the model represents significant progress over GPT-5.1. Microsoft CEO Satya Nadella confirmed GPT-5.2's integration into Copilot, Microsoft Foundry, and Copilot Studio. The Instant, Thinking, and Pro versions of GPT-5.2 are rolling out in ChatGPT, starting with paid subscribers. All three versions are available to all developers through the API. GPT-5.1 will remain available to paid users for three months before it is retired.

Enhanced Professional Application

OpenAI emphasized that GPT-5.2's development aimed to unlock greater economic value. GPT-5.2 Thinking, in particular, is highlighted as suitable for practical professional applications, with performance reaching or exceeding human expert levels in certain areas.

In the GDPval evaluation, which assesses specific knowledge work tasks across 44 professions, GPT-5.2 Thinking achieved a new high score. Human expert reviews indicated that GPT-5.2 Thinking performed at or above the level of top industry professionals in 70.9% of GDPval knowledge-based work tasks, including creating presentations and spreadsheets. OpenAI reported that GPT-5.2 Thinking's output speed for GDPval-related tasks is over 11 times faster than professionals, at less than 1% of the cost.

One GDPval evaluator described the output as comparable to work produced by a professional company, with strong layout and suggestions, despite minor errors in one deliverable. On OpenAI's internal benchmark of spreadsheet modeling tasks typical of a junior investment banking analyst, GPT-5.2 Thinking's average score rose 9.3 percentage points over GPT-5.1, from 59.1% to 68.4%. The model's spreadsheets and slides also showed greater sophistication and better adherence to formatting requirements. However, generating complex content with GPT-5.2 Thinking may take several minutes. Access to the new spreadsheet and presentation generation features in ChatGPT requires a Plus, Pro, Business, or Enterprise subscription with GPT-5.2 Thinking or Pro selected.
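The two scores quoted above for the spreadsheet modeling benchmark, 59.1% and 68.4%, differ by 9.3 percentage points, which is a larger change when expressed relative to the starting score. The short sketch below only reworks those two published figures; the function name is illustrative and does not come from OpenAI.

```python
def point_and_relative_gain(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    points = new_pct - old_pct          # absolute difference between two percentages
    relative = points / old_pct * 100   # the same difference relative to the old score
    return points, relative


# Scores reported in the article for GPT-5.1 and GPT-5.2 Thinking.
points, relative = point_and_relative_gain(59.1, 68.4)
print(f"{points:.1f} percentage points, {relative:.1f}% relative improvement")
```

Keeping the two measures separate avoids reading a 9.3-point gain as a 9.3% improvement.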

GPT-5.2 Thinking is described as OpenAI's strongest vision model to date, cutting error rates roughly in half on chart reasoning and software interface understanding tasks. It has a stronger grasp of how elements are positioned within an image, which helps when interpreting data dashboards, product screenshots, technical schematics, and visual reports in finance, operations, engineering, design, and customer support.

The model achieved nearly 100% accuracy on the 4-needle MRCR variant (at context lengths up to 256,000 tokens), and its accuracy in deep document analysis, which requires retrieving information spread across hundreds of thousands of tokens, surpassed that of GPT-5.1 Thinking. This capability allows for efficient processing of long documents such as reports, contracts, and research papers while maintaining coherence across extensive content. On the Tau2-bench Telecom benchmark, GPT-5.2 Thinking scored 98.7%, demonstrating its ability to call tools reliably in long multi-turn tasks. For latency-sensitive applications, GPT-5.2 Thinking also improved over GPT-5.1 and GPT-4.1 when run with reasoning effort set to none.

Programming Capabilities and Speed

On the SWE-Bench Pro benchmark, GPT-5.2 Thinking scored 55.6%. This test evaluates real-world software engineering capabilities across four programming languages and is designed to be more contamination-resistant, more challenging, more diverse, and more industrially relevant than SWE-bench Verified. On SWE-bench Verified itself, GPT-5.2 Thinking achieved 80%. These results suggest the model can debug production code, implement feature requests, refactor large codebases, and carry bug fixes through to deployment with less manual intervention.

GPT-5.2 Thinking also outperformed GPT-5.1 Thinking in front-end software engineering, particularly in complex or unconventional interface development and scenarios involving 3D elements. Jeff Wang, CEO of Windsurf, stated that GPT-5.2 represents a significant leap in agentic coding for GPT models and is the most advanced coding model at its price point. He added that GPT-5.2 is becoming the default model in Windsurf and for several other core workloads. Cognition, Warp, Charlie Labs, JetBrains, and Augment Code reported that GPT-5.2 achieved industry-leading agentic coding performance, with improvements in interactive programming, code review, and bug finding.

Matt Shumer, CEO of HyperWriteAI, who has used GPT-5.2 since November 25, noted substantial progress in instruction following and problem-solving. He observed significant improvements in code generation: the model is more powerful, more autonomous, more logically rigorous, and able to write larger volumes of code. Vision and long-context processing also improved, particularly in locating elements within images and in handling large codebases. However, Shumer identified speed as the main drawback, noting that the thinking mode is very slow for most problems. He described GPT-5.2 Pro's deep reasoning performance as astonishing but slow, with the model occasionally getting stuck without producing results. In the Codex command-line interface, GPT-5.2's programming performance was close to professional levels, but runs with the extra-high reasoning setting enabled took a long time.

Research and Reduced Hallucinations

OpenAI positioned GPT-5.2 Pro and GPT-5.2 Thinking as its best models for scientific research. On GPQA Diamond, a graduate-level "Google-proof" Q&A benchmark, GPT-5.2 Pro scored 93.2% and GPT-5.2 Thinking scored 92.4%. On FrontierMath (tiers 1–3), an expert-level mathematics benchmark, GPT-5.2 Thinking solved 40.3% of problems.

A research project using GPT-5.2 Pro explored an open problem in statistical learning theory; the model proposed a proof that was subsequently verified by the researchers and by external experts. On the ARC-AGI-1 (verified) benchmark, which measures general reasoning ability, GPT-5.2 Pro surpassed the 90% threshold, up from the 87% scored by last year's o3-preview, while reducing the cost of achieving that performance roughly 390-fold. On the more challenging ARC-AGI-2 (verified), which assesses fluid reasoning, GPT-5.2 Thinking scored 52.9%, a new high among chain-of-thought models, and GPT-5.2 Pro scored 54.2%.

AJ Orbach, CEO of Triple Whale, described GPT-5.2 as a complete architectural revolution, simplifying multi-agent systems into a super-agent with over 20 tools. He noted reduced latency, more powerful tool calling, and the ability to execute cleanly with simple one-line prompts.

GPT-5.2 Thinking hallucinates less than GPT-5.1 Thinking, with 30% fewer erroneous responses on a set of anonymized ChatGPT queries. For professionals using the model for research, writing, analysis, and decision support, this means a lower chance of errors. OpenAI nevertheless advises users to verify answers on critical matters.

ChatGPT's subscription pricing remains unchanged after GPT-5.2 integration. On the API side, GPT-5.2's per-token pricing is higher than GPT-5.1, at $1.75 per million input tokens and $14 per million output tokens, with a 90% discount for cached input content. OpenAI expects to release another GPT-5.2 version optimized for Codex in the coming weeks. GPT-5.2 is currently available on the Codex platform.
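As a rough illustration of how the quoted API prices combine, the sketch below estimates the cost of a single request from its token counts. It assumes the 90% discount applies to whichever portion of the input tokens is served from cache; the function and parameter names are hypothetical and are not part of any OpenAI SDK.

```python
from decimal import Decimal

# Prices quoted in the article, in US dollars per one million tokens.
INPUT_PRICE_PER_M = Decimal("1.75")
OUTPUT_PRICE_PER_M = Decimal("14")
CACHED_INPUT_DISCOUNT = Decimal("0.90")  # 90% off cached input tokens

MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> Decimal:
    """Estimate the dollar cost of one request.

    `cached_input_tokens` is the part of `input_tokens` served from cache
    (hypothetical parameter name, used here for illustration only).
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_input_tokens
    cached_price_per_m = INPUT_PRICE_PER_M * (1 - CACHED_INPUT_DISCOUNT)
    total = (
        uncached_tokens * INPUT_PRICE_PER_M
        + cached_input_tokens * cached_price_per_m
        + output_tokens * OUTPUT_PRICE_PER_M
    ) / MILLION
    return total


# Example: 20,000 input tokens (15,000 of them cached) and 2,000 output tokens.
print(estimate_cost(20_000, 2_000, cached_input_tokens=15_000))
```

Under these assumptions the example request costs about $0.039, most of which comes from the output tokens, since they are priced at eight times the uncached input rate.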