GPT-5.2 Release Sparks Debate Over Performance, Chinese Researchers Among Contributors

Victor Zhang

OpenAI's GPT-5.2 has drawn mixed reactions since its release: some users reported a decline in performance shortly after launch, while others, including early testers, praised its capabilities. The model, which OpenAI says outperformed Gemini 3 Pro in benchmark tests, is designed to excel at economically valuable tasks such as spreadsheet creation, code review, and document analysis. On the GDPval benchmark, OpenAI reports that GPT-5.2 matches or surpasses the work of human professionals 70.9% of the time.

Initial User Experience and Performance Concerns

Following its release, a post on X highlighted an apparent failure of GPT-5.2 on a basic test: asked to count the number of "R"s in "garlic," the model answered "0" (the correct answer is 1). The failure reflects a well-known limitation of large language models (LLMs): input text is split into tokens rather than individual characters, so the model never directly "sees" the letters it is asked to count. Switching to GPT-5.2's "Thinking" version reportedly produced the correct answer.
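To make the tokenization point concrete, here is a minimal sketch using the open-source tiktoken library. The encoding name is illustrative (GPT-5.2's actual tokenizer is not public): the point is that the model operates on integer token IDs for word-sized chunks, not on individual letters.

```python
import tiktoken

# "o200k_base" is a public encoding used by recent OpenAI models;
# GPT-5.2's actual tokenizer is not public, so this is illustrative.
enc = tiktoken.get_encoding("o200k_base")

tokens = enc.encode("garlic")
print(tokens)                             # integer token IDs, not letters
print([enc.decode([t]) for t in tokens])  # the text chunks the model "sees"

# Counting characters is trivial in ordinary code, but the model never
# receives raw characters, which is why letter-counting prompts can fail.
print("garlic".count("r"))                # -> 1
```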

On Reddit, several users reported that GPT-5.2 performed well at launch but deteriorated within hours. Some described using a strong model in the morning whose capabilities had noticeably diminished by later the same day, a pattern that fueled speculation about post-release adjustments by OpenAI.

Expert Endorsements and Advanced Capabilities

Despite some user complaints, experts and early testers have largely praised GPT-5.2's performance. Ethan Mollick, a professor at the University of Pennsylvania's Wharton School, shared his positive experience with an advanced version of GPT-5.2. He cited an example in which the model generated code for a visually complex shader depicting an "infinite neo-Gothic tower city" partially submerged in an ocean, noting that GPT-5.2 not only followed instructions but also demonstrated a strong grasp of aesthetic and structural elements in the code. He also highlighted the model's ability to build a chart of human exam scores over time, a task requiring extensive information retrieval and cross-referencing.

The CEO of Magicpathai, who has been testing GPT-5.2 for some time, described it as a "major leap in complex reasoning, math, programming, and simulation." As an example, he said GPT-5.2 built a complete 3D graphics engine in a single file, with interactive controls and 4K-resolution rendering, and clarified that the model wrote all of the code and graphics from scratch, a significant advance in its coding capabilities. He further characterized GPT-5.2 as OpenAI's best agent model, able to run numerous tools continuously without issues and faster than previous iterations. In his comparisons with earlier versions, the new model called tools directly and maintained coherence even during extended sessions.
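For readers unfamiliar with the terms, "agent model" and "calling tools directly" refer to the model emitting structured function calls that client code executes and feeds back. Below is a minimal sketch of such a loop using the OpenAI Python SDK's standard Chat Completions tools interface; the model name, the run_sql tool, and the user query are hypothetical placeholders, not details from the tests described above.

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical tool definition; the name and schema are placeholders.
tools = [{
    "type": "function",
    "function": {
        "name": "run_sql",
        "description": "Run a read-only SQL query and return rows as JSON.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def run_sql(query: str) -> str:
    # Placeholder: a real implementation would query a database.
    return json.dumps({"rows": []})

messages = [{"role": "user", "content": "How many orders shipped last week?"}]

# Agent loop: keep letting the model call tools until it answers in text.
while True:
    resp = client.chat.completions.create(
        model="gpt-5.2",  # placeholder model name
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    if not msg.tool_calls:
        print(msg.content)
        break
    messages.append(msg)  # keep the assistant's tool-call turn in context
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_sql(**args),
        })
```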

ARC Prize reported that GPT-5.2 Pro (X-High) achieved a new state-of-the-art (SOTA) score of 90.5% on its ARC-AGI benchmark, which it said suggests a roughly 390-fold increase in AI efficiency over the past year.

Chinese Contributors Behind GPT-5.2

Several Chinese researchers contributed to the development of GPT-5.2. Yu Bai, an OpenAI researcher and Peking University alumnus, was among the first to preview the model; he holds an undergraduate degree in mathematics from Peking University and a Ph.D. in statistics from Stanford. Yun Dai, responsible for post-training, is a Tsinghua University alumna with a Master's in Computer Science from the University of California, Irvine. Zuxin Liu, another OpenAI researcher focusing on post-training for reasoning models, graduated from Beihang University and completed his Master's and Ph.D. at Carnegie Mellon University. Aston Zhang, an OpenAI researcher who earned his Ph.D. at the University of Illinois Urbana-Champaign, specifically credited the GPT-5.2 Thinking team, highlighting the model's ability to handle multi-step tasks.