AI Researcher Project Introduces Autonomous AI for Experimentation and Report Generation
The AI Researcher project has unveiled an autonomous AI system designed to conduct research, run experiments, and produce comprehensive reports. Users provide a research objective; the system breaks it down into multiple experiments, then deploys several sub-agents, each with its own GPU resources, to carry out training, inference, and evaluation in parallel. Finally, it consolidates the findings into a paper-like report.
Operational Workflow
The AI Researcher first decomposes a broad research goal into several executable sub-experiments, each assigned to a dedicated researcher agent. Every agent can request a GPU sandbox on Modal, generate its own code, run experiments, and collect evidence. Once all experiments have completed, an orchestrator agent synthesizes the results and generates a complete report with charts, tables, and analysis. The entire process runs without human intervention.
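In outline, the workflow can be pictured as the following Python sketch. All function names and the stubbed logic are illustrative placeholders, not the project's actual code; they only mirror the decompose / run-in-parallel / synthesize structure described above.

    from concurrent.futures import ThreadPoolExecutor

    def decompose_goal(goal: str, num_agents: int) -> list[str]:
        # In the real system an LLM plans the sub-experiments; stubbed here.
        return [f"{goal} -- sub-experiment {i + 1}" for i in range(num_agents)]

    def run_researcher_agent(sub_experiment: str) -> dict:
        # Placeholder for an agent that would request a Modal GPU sandbox,
        # generate code, run it, and return the collected evidence.
        return {"experiment": sub_experiment, "evidence": "..."}

    def write_report(goal: str, results: list[dict]) -> str:
        # Placeholder for the synthesis step that produces the final report.
        return f"Report for: {goal}\n" + "\n".join(str(r) for r in results)

    def run_research(goal: str, num_agents: int = 4, max_parallel: int = 2) -> str:
        sub_experiments = decompose_goal(goal, num_agents)
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            results = list(pool.map(run_researcher_agent, sub_experiments))
        return write_report(goal, results)

    print(run_research("Does label smoothing help ViT-Base on CIFAR-10?", num_agents=2))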
Setup and Configuration
To start the AI Researcher, users can execute python run_app.py. The command automatically installs any missing dependencies, starts the backend API and the Notebook frontend, and opens the page in a browser. If no API keys are configured yet, the interface prompts for them and stores them locally.
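As a rough illustration of what such a launcher does, a minimal sketch might look like the following; the package names, module path, and port are assumptions for illustration, not the repository's actual run_app.py.

    import subprocess, sys, webbrowser

    def ensure(package: str) -> None:
        # Install the package with pip only if it cannot be imported.
        try:
            __import__(package)
        except ImportError:
            subprocess.check_call([sys.executable, "-m", "pip", "install", package])

    for pkg in ("fastapi", "uvicorn"):            # assumed dependencies
        ensure(pkg)

    # Start the backend API (assumed module path) and open the frontend in a browser.
    backend = subprocess.Popen([sys.executable, "-m", "uvicorn", "backend:app", "--port", "8000"])
    webbrowser.open("http://localhost:8000")
    backend.wait()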
The system requires at least one large language model API key: GOOGLE_API_KEY for Google AI Studio (Gemini 3 Pro) or ANTHROPIC_API_KEY for Anthropic (Claude Opus 4.5). GPU workloads additionally require a Modal.com account, which provides MODAL_TOKEN_ID and MODAL_TOKEN_SECRET. These keys can be placed in a .env file in the project's root directory or entered in the Web UI on first launch.
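A .env in the project root would simply contain KEY=value lines for the variables above. A minimal sketch of loading and checking them could look like this, assuming python-dotenv (the project may load keys differently):

    import os
    from dotenv import load_dotenv   # assumes the python-dotenv package

    load_dotenv()  # read .env from the project root, if present

    # At least one LLM key is required; Modal credentials are needed for GPU runs.
    if not (os.getenv("GOOGLE_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
        raise SystemExit("Set GOOGLE_API_KEY or ANTHROPIC_API_KEY in .env or via the Web UI")

    if not (os.getenv("MODAL_TOKEN_ID") and os.getenv("MODAL_TOKEN_SECRET")):
        print("Modal credentials missing: GPU experiments will be unavailable")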
Model selection is available through a dropdown menu in the Web interface, with Gemini 3 Pro as the default and Claude Opus 4.5 as an alternative.
Command Line Interface Usage
For quick, single-agent experiments, the command line interface (CLI) can be used:

    python main.py "Does label smoothing help ViT-Base on CIFAR-10?" --mode single --gpu any --model gemini-3-pro-preview
For comprehensive research involving multiple agents, the recommended command is:

    python main.py "Systematically characterize the scaling laws of sparse attention Transformers" --mode orchestrator --num-agents 4 --max-rounds 3 --max-parallel 2 --gpu any

Users can specify GPU types such as a100, t4, or h100.
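Under the hood, these GPU types correspond to Modal's GPU-backed functions. The following is a minimal, self-contained sketch using Modal's public Python API (run with modal run); it illustrates how a single experiment could be placed on a GPU and is not the project's own wrapper code:

    import modal

    app = modal.App("ai-researcher-sketch")
    image = modal.Image.debian_slim().pip_install("torch")

    # The gpu argument accepts strings such as "T4", "A100", or "H100".
    @app.function(gpu="A100", image=image, timeout=3600)
    def run_experiment(label_smoothing: float) -> dict:
        import torch
        # A real agent would generate and execute training code here; this only
        # confirms that a CUDA device is visible inside the sandbox.
        return {"cuda": torch.cuda.is_available(), "label_smoothing": label_smoothing}

    @app.local_entrypoint()
    def main():
        print(run_experiment.remote(0.1))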
A dry run test, which incurs no cost, can be performed using:

    python main.py "First run a complete pipeline test" --mode orchestrator --test-mode
The project's code repository is located at https://github.com/mattshumer/ai-researcher.
