Alibaba Cloud Design Leverages AI for Enhanced Experience Measurement System

Alibaba Cloud Design has detailed its use of artificial intelligence to refine experience measurement, addressing traditional challenges such as manual effort, subjectivity, and inefficiency. The company demonstrated advancements ranging from intelligent metric iteration and process automation to data insights and AI virtual character research. This initiative aims to upgrade Alibaba Cloud's Experience Measurement (UES) system, creating a more scientific and efficient framework.
The company's exploration over five years has evolved from establishing a classic UES measurement system to leveraging AI for self-evolution and investigating AI virtual characters as alternatives to human testers.
Evolution of Alibaba Cloud's Experience Measurement
Experience measurement is defined as a systematic process that converts subjective user feelings into measurable data to inform product optimization and business decisions.
The Alibaba Cloud Design Department, which supports hundreds of cloud products, previously encountered three primary issues in experience management: a lack of quantitative benchmarks for product experience, difficulty identifying improvement areas, and inconsistent management mechanisms across product lines.
To address these, the Alibaba Cloud Experience Measurement System (UES) was launched in 2019. This system integrates measurement models, tools, and management mechanisms. UES measures user experience across three dimensions: subjective attitude (usability, consistency, satisfaction), objective behavior (task efficiency), and system performance (product performance), all derived from user surveys, expert reviews, and data tracking.
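As a rough illustration of how such a three-dimension scorecard might be represented, the Python sketch below models the UES metrics as weighted fields; the field names, 0-100 scale, and weight values are assumptions made for the example and are not drawn from Alibaba Cloud's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class UESScorecard:
    """Hypothetical container for one product's UES scores (0-100 scale assumed)."""
    usability: float        # subjective attitude: surveys / expert review
    consistency: float      # subjective attitude: expert review
    satisfaction: float     # subjective attitude: user surveys
    task_efficiency: float  # objective behavior: data tracking
    performance: float      # system performance: data tracking

    def composite(self, weights: dict) -> float:
        """Weighted overall score; weights are assumed to sum to 1."""
        return sum(getattr(self, name) * w for name, w in weights.items())

# Illustrative numbers only, not real Alibaba Cloud data.
scores = UESScorecard(usability=82, consistency=75, satisfaction=88,
                      task_efficiency=70, performance=90)
weights = {"usability": 0.3, "consistency": 0.15, "satisfaction": 0.15,
           "task_efficiency": 0.25, "performance": 0.15}
print(round(scores.composite(weights), 1))
```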
Over five years, the UES system has been applied to 113 core cloud products, involving over 600 internal staff and contributing to the resolution of more than 2,000 experience issues.
AI Integration and System Self-Evolution
Alibaba Cloud recognized that a static measurement system might not remain effective given evolving products and market dynamics. The company initiated a second evolution of UES, integrating AI to ensure the system remains adaptable, accurately reflects user feedback, and correlates with business metrics.
For one growing tool-type cloud product, initial metric candidates such as usability and task efficiency were identified. AI-assisted analysis was then used to process 2,000 pieces of user feedback on the product's console experience. The AI model was trained on metric concepts and examples, then classified the feedback and analyzed it statistically, identifying usability, task-efficiency, consistency, and performance issues as the most prevalent.
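The classification pipeline itself is not described in the source, but a minimal sketch of the idea looks like the following, with keyword matching standing in for the actual LLM call and invented feedback strings used as input:

```python
from collections import Counter

# Metric categories assumed from the UES model; the real taxonomy may differ.
CATEGORIES = ["usability", "task_efficiency", "consistency", "performance", "other"]

def classify_feedback(feedback: str) -> str:
    """Stand-in for the LLM classification call: a real pipeline would send the
    metric definitions, examples, and feedback text to a model and parse its answer."""
    keywords = {
        "usability": ["confusing", "hard to find", "unclear"],
        "task_efficiency": ["too many steps", "tedious"],
        "consistency": ["inconsistent", "different from"],
        "performance": ["lag", "timeout", "slow to load"],
    }
    text = feedback.lower()
    for category, words in keywords.items():
        if any(w in text for w in words):
            return category
    return "other"

feedback_items = [
    "The console is confusing and settings are hard to find.",
    "Creating an instance takes too many steps.",
    "Pages lag and sometimes timeout under load.",
]
tally = Counter(classify_feedback(f) for f in feedback_items)
print(tally.most_common())
```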
To determine the weight of each metric, the correlation between historical UES scores and core operational metrics, such as order renewal and annual retention rates, was analyzed. Metrics with stronger correlations were assigned higher theoretical weights. This produced a new UES system for the product that prioritizes usability, followed by task efficiency, consistency, and performance, with satisfaction retained as a reference metric.
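A simple way to reproduce this correlation-to-weight logic is sketched below; the quarterly figures are invented for illustration, and the use of Pearson correlation with absolute-value normalization is an assumption rather than the documented method.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative quarterly history (not real data): metric scores vs. renewal rate.
renewal_rate = [0.71, 0.74, 0.78, 0.80, 0.83]
metric_history = {
    "usability":       [62, 66, 71, 74, 79],
    "task_efficiency": [58, 60, 65, 67, 70],
    "consistency":     [70, 71, 70, 73, 74],
    "performance":     [80, 79, 82, 81, 83],
}

# Stronger absolute correlation -> higher theoretical weight (then normalized).
corrs = {m: abs(pearson(v, renewal_rate)) for m, v in metric_history.items()}
total = sum(corrs.values())
weights = {m: round(c / total, 2) for m, c in corrs.items()}
print(weights)
```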
Alibaba Cloud is working towards automating this process in the AI era by building a retrieval-augmented generation (RAG) knowledge base of metric literature grounded in product information, applying automated natural language processing (NLP) clustering to user feedback, and continuously calculating correlations between measurement metrics and business metrics. The aim is to enable each product's experience measurement system to evolve autonomously.
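As an example of what automated feedback clustering could look like, the sketch below groups a handful of invented comments using TF-IDF features and k-means from scikit-learn; the real pipeline, feature representation, and cluster count are not specified in the source.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Small illustrative feedback set; a production pipeline would likely use
# semantic embeddings and a far larger corpus.
feedback = [
    "The billing page is confusing to navigate",
    "Console navigation is hard to understand",
    "Pages load slowly during peak hours",
    "The dashboard times out when loading charts",
    "Too many steps to create an instance",
    "Instance creation workflow is tedious",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(feedback)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

for cluster_id in sorted(set(labels)):
    items = [f for f, label in zip(feedback, labels) if label == cluster_id]
    print(f"cluster {cluster_id}: {items}")
```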
Efficiency Improvements with AI Agents and Virtual Characters
Alibaba Cloud has also deployed AI for efficiency gains in measurement execution and data acquisition.
An AI Experience Measurement Agent has been developed to streamline traditional, time-consuming measurement processes. This agent executes measurement steps via natural language commands and automates product calls at specific stages, reducing manual effort.
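The agent's internals are not described, but the general pattern of mapping natural language commands to measurement steps can be sketched as follows, with naive phrase matching standing in for LLM-based intent parsing and hypothetical tool functions in place of real product calls.

```python
# Hypothetical measurement steps; the real agent's toolset is not public.
def run_expert_review(product: str) -> str:
    return f"Queued expert usability review for {product}"

def pull_tracking_data(product: str) -> str:
    return f"Fetched task-efficiency tracking data for {product}"

TOOLS = {
    "expert review": run_expert_review,
    "tracking data": pull_tracking_data,
}

def handle_command(command: str, product: str) -> str:
    """Naive phrase matching as a stand-in for LLM-based command parsing."""
    for phrase, tool in TOOLS.items():
        if phrase in command.lower():
            return tool(product)
    return "No matching measurement step found"

print(handle_command("Start an expert review of the console", product="ECS console"))
```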
Furthermore, the company is exploring AI virtual characters to replace human testers for subjective metrics like usability and consistency. Initial evaluations of large language models (Gemini, ChatGPT, and Doubao) were conducted by providing them with product page screenshots and usability rules. Gemini demonstrated strong performance in theoretical understanding and evaluation stability.
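A hedged sketch of such an evaluation setup is shown below: it only assembles a rule-based prompt and a base64-encoded screenshot payload, since the actual multimodal API calls are vendor-specific; the rules listed are invented examples, not Alibaba Cloud's rule set.

```python
import base64
from pathlib import Path

# Invented usability rules for illustration only.
USABILITY_RULES = [
    "Primary actions should be visible without scrolling",
    "Error messages must state how to recover",
    "Labels should match terminology used elsewhere in the console",
]

def build_evaluation_prompt(rules: list) -> str:
    """Assemble the text portion of a screenshot-evaluation request."""
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(rules))
    return (
        "You are a usability reviewer. Score the attached console screenshot "
        "against each rule (pass/fail) and explain each failure.\n"
        f"Rules:\n{numbered}"
    )

def encode_screenshot(path: str) -> str:
    """Base64-encode a local screenshot for a multimodal API payload."""
    return base64.b64encode(Path(path).read_bytes()).decode()

print(build_evaluation_prompt(USABILITY_RULES))
# A real run would attach encode_screenshot("console.png") to the
# vendor-specific multimodal request (Gemini, ChatGPT, or Doubao).
```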
To enhance AI's understanding of experience, the selected model was fine-tuned using three methods: RAG training with classic experience measurement theories, training with a library of common internal and external experience problems, and training with data from real-person comparison experiments. In these experiments, 20 experts tested pages concurrently with AI. Results showed near-identical performance between AI and humans for rule-based scoring, a 79.2% overlap in identifying high-frequency problems, and a 76.8% effectiveness rate for AI-identified issues.
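The exact definitions behind the reported percentages are not given; the sketch below shows one plausible way to compute an overlap rate and an effectiveness rate from issue sets, using made-up issue identifiers and an assumed meaning of "effectiveness" (the share of AI-identified issues later confirmed by experts).

```python
# Made-up issue IDs; the figures produced are illustrative, not the reported results.
ai_issues = {"btn-contrast", "nav-depth", "error-copy", "form-reset", "icon-meaning"}
expert_issues = {"btn-contrast", "nav-depth", "error-copy", "table-density"}

# Overlap in high-frequency problems: AI hits among expert-found issues.
high_freq_overlap = len(ai_issues & expert_issues) / len(expert_issues)

# Effectiveness (assumed definition): AI-found issues that experts confirmed.
confirmed_by_experts = {"btn-contrast", "nav-depth", "error-copy", "form-reset"}
effectiveness = len(ai_issues & confirmed_by_experts) / len(ai_issues)

print(f"overlap: {high_freq_overlap:.1%}, effectiveness: {effectiveness:.1%}")
```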
This research led to the creation of "Janus," a virtual developer character designed with a specific skill background and personality traits, such as impatience and a preference for keyboard shortcuts. When presented with pages and problems, Janus provided insightful feedback that integrated its assigned personality.
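The character specification is not public, but a persona like Janus could plausibly be expressed as a system prompt generated from a trait definition, as in the hypothetical sketch below.

```python
# Hypothetical persona definition for a "Janus"-style virtual tester; the
# actual character specification used by Alibaba Cloud is not described.
JANUS_PERSONA = {
    "role": "senior backend developer",
    "traits": ["impatient", "prefers keyboard shortcuts", "skims documentation"],
    "goal": "complete console tasks with as few clicks as possible",
}

def persona_system_prompt(persona: dict) -> str:
    """Turn a persona definition into a system prompt for a virtual tester."""
    traits = ", ".join(persona["traits"])
    return (
        f"You are a {persona['role']} acting as a product tester. "
        f"Personality traits: {traits}. Goal: {persona['goal']}. "
        "Review the page you are shown and report friction points in character."
    )

print(persona_system_prompt(JANUS_PERSONA))
```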
Alibaba Cloud is developing automated AI virtual character training methods to create different characters tailored to specific measurement scenarios. This research is ongoing.
According to information reviewed by toolmesh.ai, Alibaba Cloud believes AI is driving a paradigm shift in experience measurement, leading to a more scientific, efficient, and in-depth approach.