OpenAI Releases GPT-5.2 Models, Citing Expert-Level Performance
OpenAI has introduced GPT-5.2, a new generation of models designed to handle complex knowledge-based tasks, according to an announcement made on the company's tenth anniversary. The release includes three models: GPT-5.2 Instant, GPT-5.2 Thinking, and GPT-5.2 Pro.
The company stated that GPT-5.2 models demonstrate capabilities comparable to those of human experts in various professional domains. OpenAI's own benchmark results indicate that GPT-5.2 surpasses Gemini 3 Pro on several metrics.
Performance Benchmarks and Capabilities
GPT-5.2 shows gains in general intelligence, long-context understanding, agentic tool calling, and visual processing. Key performance indicators include:
SWE-Bench Pro: Achieved a score of 55.6%.
LMArena Code Arena: Ranked second globally, behind Claude Opus 4.5.
ARC-AGI-2: GPT-5.2 Pro led with a score of 54.2%, ahead of GPT-5.2 Thinking at 52.9%.
GDPval: Across knowledge-work tasks spanning 44 occupations, GPT-5.2 Thinking matched or outperformed human industry experts in 70.9% of cases.
The models feature a 400,000-token context window, a maximum output length of 128,000 tokens, and a knowledge cutoff of August 31, 2025. They also use reasoning tokens to handle complex logic and multi-step reasoning.
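To show how the two limits interact, the sketch below checks whether a request fits. It assumes the 400,000-token window is shared by input and output, with reasoning tokens counted as output; that assumption and the helper names (`fits_in_context`, `max_input_tokens`) are ours, not OpenAI's.

```python
CONTEXT_WINDOW = 400_000  # total tokens per request, from the announcement
MAX_OUTPUT = 128_000      # maximum output tokens (reasoning tokens assumed to count here)


def fits_in_context(input_tokens: int, max_output_tokens: int) -> bool:
    """Return True if the prompt plus the reserved output budget fits the window.

    Assumes input and output share a single 400k-token window.
    """
    if not 0 < max_output_tokens <= MAX_OUTPUT:
        raise ValueError(f"max_output_tokens must be in 1..{MAX_OUTPUT}")
    return input_tokens + max_output_tokens <= CONTEXT_WINDOW


def max_input_tokens(max_output_tokens: int = MAX_OUTPUT) -> int:
    """Largest prompt that still leaves room for the requested output budget."""
    return CONTEXT_WINDOW - max_output_tokens


print(max_input_tokens())                 # 272000
print(fits_in_context(300_000, 64_000))   # True
print(fits_in_context(300_000, 128_000))  # False
```

Under this assumption, reserving the full 128,000-token output budget leaves 272,000 tokens for the prompt.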
While performance has increased, GPT-5.2's input and output prices are both 40% higher than those of GPT-5 and GPT-5.1. OpenAI executives indicated that the release is not a direct response to competitor models, but rather the result of increased resources dedicated to developing ChatGPT.
Professional Applications and Efficiency
OpenAI highlighted GPT-5.2's focus on professional knowledge work. Yu Bai, a Chinese researcher at OpenAI, described the iteration as a "huge leap in capability." In human evaluations, GPT-5.2 matched or beat human experts in 70.9% of tasks that typically take those experts 4-8 hours to complete.
GPT-5.2 is designed to assist with tasks such as creating spreadsheets, building presentations, writing code, interpreting images, and carrying out complex multi-step projects. Previous reports from OpenAI indicated that ChatGPT saves enterprise users an average of 40-60 minutes daily, with heavy users reporting more than 10 hours saved weekly.
On GDPval, GPT-5.2 Thinking set a new state of the art (SOTA) and is the first model to perform above human expert level. On GDPval's knowledge-work tasks, it matched or outperformed top industry professionals in 70.9% of cases, completing the tasks 11 times faster and at less than 1% of the cost.
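Because the 70.9% figure counts both cases where the model outperformed professionals and cases where it matched them, it is a win-or-tie rate rather than a pure win rate. A minimal sketch of that arithmetic follows; the "win"/"tie"/"loss" labels and the ten sample judgments are hypothetical, not GDPval data.

```python
from collections import Counter


def win_or_tie_rate(judgments: list[str]) -> float:
    """Share of comparisons where the model's output was judged better than
    ("win") or as good as ("tie") the expert's deliverable."""
    counts = Counter(judgments)
    unknown = set(counts) - {"win", "tie", "loss"}
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    total = sum(counts.values())
    return (counts["win"] + counts["tie"]) / total if total else 0.0


# Hypothetical judgments for ten tasks (not real GDPval results).
sample = ["win"] * 5 + ["tie"] * 2 + ["loss"] * 3
print(f"{win_or_tie_rate(sample):.1%}")       # 70.0%
print(Counter(sample)["win"] / len(sample))   # 0.5 (wins only)
```

In this toy sample, a 70% win-or-tie rate corresponds to only a 50% outright win rate, which is why the two measures should not be used interchangeably.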
On spreadsheet-modeling tasks of the kind assigned to junior investment banking analysts, GPT-5.2 Thinking's average score per task was 9.3 percentage points higher than GPT-5.1's, rising from 59.1% to 68.4%. The spreadsheets and presentations generated by GPT-5.2 Thinking were also more sophisticated and better formatted.
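The rise from 59.1% to 68.4% can be read two ways: as an absolute change in percentage points or as a relative improvement over the baseline. A quick check of both, using only the two scores reported above (the `change` helper is illustrative):

```python
def change(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent)."""
    points = new_pct - old_pct
    relative = points / old_pct * 100
    return points, relative


points, relative = change(59.1, 68.4)
print(f"{points:.1f} percentage points, {relative:.1f}% relative")
# 9.3 percentage points, 15.7% relative
```

The absolute gain is 9.3 points, while the relative improvement over GPT-5.1's score is roughly 15.7%.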
Advanced Programming and Reduced Hallucinations
In programming, GPT-5.2 Thinking set a new record of 55.6% on the SWE-Bench Pro benchmark. This benchmark tests four programming languages and is designed to be more challenging and industrially relevant than SWE-bench Verified. On SWE-bench Verified, GPT-5.2 Thinking scored 80%, indicating improved reliability in debugging, implementing features, and refactoring code. Early testers noted its strength in front-end development and complex UI work.
On a set of de-identified ChatGPT queries, GPT-5.2 Thinking also gave 30% fewer incorrect answers than GPT-5.1 Thinking, which means fewer errors when professionals use it for research, writing, analysis, and decision support.
Long-Context Reasoning and Visual Capabilities
GPT-5.2 Thinking set a new state of the art in long-context reasoning. On OpenAI MRCRv2, it achieved leading performance and was the first OpenAI model to reach nearly 100% accuracy across all four MRCR variants (up to 256k tokens). This allows professionals to work with long documents such as reports, contracts, and research papers while maintaining coherence and accuracy.
The model is compatible with the new "/compact" endpoint in OpenAI's Responses API, which extends its effective context window for tool-intensive, long-running workflows.
GPT-5.2 Thinking is described as OpenAI's strongest vision model to date, roughly halving the error rate on chart reasoning and software interface understanding. It also has a stronger grasp of where elements sit within an image, which helps in tasks where relative layout is critical.
Workflow Integration and Scientific Research
GPT-5.2 Thinking set a record of 98.7% on Tau2-bench Telecom, which measures how reliably a model uses tools across long multi-turn tasks. For latency-sensitive applications, it also performs better than its predecessors when run with reasoning.effort='none'. This enables more reliable end-to-end workflows, such as resolving customer support cases or pulling data from multiple systems to produce an output.
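For developers, the effort level is set per request. The sketch below targets the Responses API in the official `openai` Python SDK; the model ID `gpt-5.2` and the helper name `build_low_latency_request` are our assumptions, so check OpenAI's model list before relying on them. The request is built as a plain dictionary so it can be inspected offline, and the API call runs only when `OPENAI_API_KEY` is set.

```python
import os


def build_low_latency_request(prompt: str, model: str = "gpt-5.2") -> dict:
    """Build Responses API arguments that skip extended reasoning.

    'gpt-5.2' is an assumed model ID; reasoning.effort='none' is the
    latency-oriented setting mentioned in the announcement.
    """
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": "none"},
    }


if __name__ == "__main__":
    request = build_low_latency_request("Summarize this support ticket in two sentences: ...")
    print(request["reasoning"])  # {'effort': 'none'}
    if os.environ.get("OPENAI_API_KEY"):
        from openai import OpenAI  # requires `pip install openai`

        client = OpenAI()
        response = client.responses.create(**request)
        print(response.output_text)
```

Keeping request construction separate from the network call makes it easy to switch effort levels per task without touching the client code.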
OpenAI is also exploring AI's role in accelerating scientific research. GPT-5.2 Pro and GPT-5.2 Thinking are presented as models for assisting scientists. On the GPQA Diamond benchmark, GPT-5.2 Pro scored 93.2%, with GPT-5.2 Thinking at 92.4%. On the FrontierMath (Tier 1–3) assessment, GPT-5.2 Thinking set a new record, solving 40.3% of problems.
In a recent collaboration, researchers used GPT-5.2 Pro to explore an open problem in statistical learning theory. The result was a new paper in which the model produced the proof, while the human researchers verified it and wrote up the findings.
Fluid Intelligence and Model Variants
On ARC-AGI-1 (Verified), a benchmark for general reasoning, GPT-5.2 Pro was the first model to exceed 90%. On the more challenging ARC-AGI-2 (Verified), GPT-5.2 Thinking scored 52.9%, and GPT-5.2 Pro reached 54.2%, demonstrating improved multi-step reasoning and quantitative accuracy.
The GPT-5.2 family includes three models tailored for different uses:
GPT-5.2 Instant: Designed for daily office and learning tasks, offering speed and practicality with clearer explanations, improved operational guidance, and stronger technical writing.
GPT-5.2 Thinking: Intended for deeper work, specializing in programming, long document summarization, and complex mathematical or logical problems. It offers industry-leading long-context reasoning.
GPT-5.2 Pro: Positioned as the most capable model for complex, high-difficulty problems, particularly in fields like programming and scientific research. Although its per-token cost is higher, its greater token efficiency can make it more cost-effective overall.
Paid ChatGPT users on Plus, Pro, Go, Business, or Enterprise plans will have priority access to GPT-5.2 models. OpenAI plans a gradual rollout, with GPT-5.1 remaining available to paid users for three months before deprecation. On the API platform, the new GPT-5.2 series models are available in both the Responses API and the Chat Completions API. Developers can set reasoning parameters in GPT-5.2 Pro, and both GPT-5.2 Pro and GPT-5.2 Thinking support a new fifth reasoning effort level, 'xhigh'.
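The two APIs express reasoning effort differently. The Responses API uses a nested `reasoning` object, while Chat Completions uses a flat `reasoning_effort` argument. A minimal sketch follows. The model IDs (`gpt-5.2`, `gpt-5.2-pro`), the helper name `reasoning_kwargs`, and the exact list of effort values are our assumptions, inferred from the announcement's mention of five levels, and should be checked against OpenAI's API reference.

```python
# Assumed effort levels; the announcement only states that 'xhigh' is the new fifth one.
EFFORT_LEVELS = ("none", "low", "medium", "high", "xhigh")


def reasoning_kwargs(api: str, model: str, effort: str) -> dict:
    """Return keyword arguments for a request at the given reasoning effort.

    api: "responses" for client.responses.create,
         "chat" for client.chat.completions.create.
    """
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort {effort!r}; expected one of {EFFORT_LEVELS}")
    if api == "responses":
        return {"model": model, "reasoning": {"effort": effort}}
    if api == "chat":
        return {"model": model, "reasoning_effort": effort}
    raise ValueError(f"unknown api {api!r}")


print(reasoning_kwargs("responses", "gpt-5.2-pro", "xhigh"))
# {'model': 'gpt-5.2-pro', 'reasoning': {'effort': 'xhigh'}}
print(reasoning_kwargs("chat", "gpt-5.2", "xhigh"))
# {'model': 'gpt-5.2', 'reasoning_effort': 'xhigh'}
```

Before the dictionary is passed to the SDK, it would be merged with `input=` (Responses) or `messages=` (Chat Completions).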
GPT-5.2 is priced at $1.75 per million input tokens and $14 per million output tokens, with a 90% discount for cached input.
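These list prices turn into a simple per-request estimate. The sketch below applies the 90% cached-input discount and also shows that the stated 40% increase implies earlier prices of $1.25 and $10 per million tokens. The helper name and the sample token counts are hypothetical, and treating reasoning tokens as billed output is an assumption.

```python
INPUT_PER_M = 1.75                       # USD per million uncached input tokens
CACHED_INPUT_PER_M = INPUT_PER_M * 0.10  # 90% discount on cached input
OUTPUT_PER_M = 14.00                     # USD per million output tokens (reasoning assumed billed here)


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimated USD cost of one request; cached_tokens is the cached share of input_tokens."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT_PER_M
            + cached_tokens * CACHED_INPUT_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000


# Hypothetical request: 100k input tokens (80k of them cached), 10k output tokens.
print(f"${request_cost(100_000, 10_000, cached_tokens=80_000):.4f}")  # $0.1890

# Undo the 40% increase to recover the implied previous prices.
print(f"{INPUT_PER_M / 1.4:.2f}, {OUTPUT_PER_M / 1.4:.2f}")  # 1.25, 10.00
```

In this example, output tokens account for most of the cost ($0.14 of $0.189), so reducing output length matters more than caching for output-heavy workloads.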
OpenAI's Decade of Development
OpenAI marked its tenth anniversary, reflecting on its journey since its founding on December 11, 2015. Key milestones include:
2016: Open-sourcing the reinforcement learning platform OpenAI Gym.
2017: Publishing research on the Transformer concept.
2018: Launch of the pre-trained language model GPT.
2019: Release of the 1.5B parameter GPT-2.
2020: Introduction of the 175B parameter GPT-3.
2021: Release of Codex and DALL·E.
2022: Launch of ChatGPT (GPT-3.5).
OpenAI CEO Sam Altman called the past decade "amazing" and hinted at a "little gift" coming next week.
