AI Researcher Project Introduces Autonomous AI for Experimentation and Report Generation
The AI Researcher project has unveiled an autonomous AI system designed to conduct research, run experiments, and produce comprehensive reports. Users provide a research objective; the system breaks it down into multiple experiments, then deploys several sub-agents, each with its own GPU resources, to carry out training, inference, and evaluation in parallel. Finally, it consolidates the findings into a paper-like report.
Operational Workflow
The AI Researcher first decomposes a broad research goal into several executable sub-experiments, each assigned to a dedicated researcher agent. Every agent can request a GPU sandbox on Modal, generate its own code, run experiments, and collect evidence. Once all experiments have completed, an orchestrator agent synthesizes the results and generates a complete report with charts, tables, and analysis. The entire process runs without human intervention.
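In outline, the workflow can be pictured as the following Python sketch. All function names and the stubbed logic are illustrative placeholders, not the project's actual code; they only mirror the decompose / run-in-parallel / synthesize structure described above.

    from concurrent.futures import ThreadPoolExecutor

    def decompose_goal(goal: str, num_agents: int) -> list[str]:
        # In the real system an LLM plans the sub-experiments; stubbed here.
        return [f"{goal} -- sub-experiment {i + 1}" for i in range(num_agents)]

    def run_researcher_agent(sub_experiment: str) -> dict:
        # Placeholder for an agent that would request a Modal GPU sandbox,
        # generate code, run it, and return the collected evidence.
        return {"experiment": sub_experiment, "evidence": "..."}

    def write_report(goal: str, results: list[dict]) -> str:
        # Placeholder for the synthesis step that produces the final report.
        return f"Report for: {goal}\n" + "\n".join(str(r) for r in results)

    def run_research(goal: str, num_agents: int = 4, max_parallel: int = 2) -> str:
        sub_experiments = decompose_goal(goal, num_agents)
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            results = list(pool.map(run_researcher_agent, sub_experiments))
        return write_report(goal, results)

    print(run_research("Does label smoothing help ViT-Base on CIFAR-10?", num_agents=2))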
Setup and Configuration
To start the AI Researcher, users can execute python run_app.py. The command automatically installs any missing dependencies, starts the backend API and the Notebook frontend, and opens the page in a browser. If no API keys are configured yet, the interface prompts for them and stores them locally.
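As a rough illustration of what such a launcher does, a minimal sketch might look like the following; the package names, module path, and port are assumptions for illustration, not the repository's actual run_app.py.

    import subprocess, sys, webbrowser

    def ensure(package: str) -> None:
        # Install the package with pip only if it cannot be imported.
        try:
            __import__(package)
        except ImportError:
            subprocess.check_call([sys.executable, "-m", "pip", "install", package])

    for pkg in ("fastapi", "uvicorn"):            # assumed dependencies
        ensure(pkg)

    # Start the backend API (assumed module path) and open the frontend in a browser.
    backend = subprocess.Popen([sys.executable, "-m", "uvicorn", "backend:app", "--port", "8000"])
    webbrowser.open("http://localhost:8000")
    backend.wait()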
The system requires at least one large language model API key: GOOGLE_API_KEY for Google AI Studio (Gemini 3 Pro) or ANTHROPIC_API_KEY for Anthropic (Claude Opus 4.5). GPU workloads additionally require a Modal.com account, which provides MODAL_TOKEN_ID and MODAL_TOKEN_SECRET. These keys can be placed in a .env file in the project's root directory or entered in the Web UI on first launch.
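A .env in the project root would simply contain KEY=value lines for the variables above. A minimal sketch of loading and checking them could look like this, assuming python-dotenv (the project may load keys differently):

    import os
    from dotenv import load_dotenv   # assumes the python-dotenv package

    load_dotenv()  # read .env from the project root, if present

    # At least one LLM key is required; Modal credentials are needed for GPU runs.
    if not (os.getenv("GOOGLE_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
        raise SystemExit("Set GOOGLE_API_KEY or ANTHROPIC_API_KEY in .env or via the Web UI")

    if not (os.getenv("MODAL_TOKEN_ID") and os.getenv("MODAL_TOKEN_SECRET")):
        print("Modal credentials missing: GPU experiments will be unavailable")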
Model selection is available through a dropdown menu in the Web interface, with Gemini 3 Pro as the default and Claude Opus 4.5 as an alternative.
Command Line Interface Usage
For quick, single-agent experiments, the command line interface (CLI) can be used:

    python main.py "Does label smoothing help ViT-Base on CIFAR-10?" --mode single --gpu any --model gemini-3-pro-preview
For comprehensive research involving multiple agents, the recommended command is:

    python main.py "Systematically characterize the scaling laws of sparse attention Transformers" --mode orchestrator --num-agents 4 --max-rounds 3 --max-parallel 2 --gpu any

Users can specify GPU types such as a100, t4, or h100.
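Under the hood, these GPU types correspond to Modal's GPU-backed functions. The following is a minimal, self-contained sketch using Modal's public Python API (run with modal run); it illustrates how a single experiment could be placed on a GPU and is not the project's own wrapper code:

    import modal

    app = modal.App("ai-researcher-sketch")
    image = modal.Image.debian_slim().pip_install("torch")

    # The gpu argument accepts strings such as "T4", "A100", or "H100".
    @app.function(gpu="A100", image=image, timeout=3600)
    def run_experiment(label_smoothing: float) -> dict:
        import torch
        # A real agent would generate and execute training code here; this only
        # confirms that a CUDA device is visible inside the sandbox.
        return {"cuda": torch.cuda.is_available(), "label_smoothing": label_smoothing}

    @app.local_entrypoint()
    def main():
        print(run_experiment.remote(0.1))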
A dry run test, which incurs no cost, can be performed using:

    python main.py "First run a complete pipeline test" --mode orchestrator --test-mode
The project's code repository is located at https://github.com/mattshumer/ai-researcher.
