Artificial General Intelligence Evaluation System

Developed by Beijing Institute for General Artificial Intelligence

The system aims to build a world-leading AGI evaluation standard and to guide the evolution and application of agent capabilities. It uses the stages of human child development as a benchmark for evaluating the fundamental abilities an agent needs to enter human society.

Multidimensional Index System

The evaluation dimensions are designed to be scientific, rigorous, and comprehensive, and are intended to serve as a benchmark for the AGI era.

General Testing

This assessment defines six core dimensions (vision, language, cognition, motion, learning, and value), grounded in the developmental psychology of human children, to quantify an agent's level of mental development.

Main ranking: 2 general evaluations

General Testing Ranking

Top 5 model performance data based on Basic Family Comprehensive Tasks

Scores measure how often each model completes daily composite tasks in a simulated home environment. The ability view breaks results down by task type; the dimension view groups the same results into object understanding, spatial intelligence, and social activity.

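| Rank | Model | Avg | Counting Objects | Preparing Baggage | Building Blocks | Jigsaw Puzzle | Understanding Buttons | Setting Tables | Tidying Up Rooms | Selecting Gifts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Google Gemini 2.5 Pro | 24.53 | 48.0 | 12.4 | 10.0 | 5.0 | 3.3 | 26.7 | 22.8 | 68.1 |
| 2 | Google Gemini 2.5 Flash | 23.05 | 42.0 | 11.1 | 5.5 | 5.3 | 3.3 | 25.8 | 23.2 | 68.2 |
| 3 | OpenAI o3 | 22.88 | 54.0 | 10.3 | 10.0 | 6.4 | 3.3 | 14.3 | 18.8 | 65.9 |
| 4 | OpenAI GPT-5 | 21.54 | 36.0 | 9.5 | 3.8 | 6.0 | 3.3 | 28.7 | 16.0 | 69.1 |
| 5 | Anthropic Claude Sonnet 3.7 | 20.52 | 46.0 | 3.4 | 8.9 | 6.3 | 0.0 | 23.8 | 16.1 | 59.7 |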
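
The page does not state how the Avg column is derived from the per-task scores. As a rough sanity check, the sketch below assumes Avg is an unweighted mean over the eight task columns and compares that with the published value for each model. The names `SCORES`, `unweighted_mean`, and `compare` are illustrative and not part of any TongTest tooling; the figures are copied from the ranking above.

```python
# Assumption: Avg is an unweighted mean of the eight per-task scores.
# The source page does not confirm this; treat the result as a consistency check only.

# Column order: Counting Objects, Preparing Baggage, Building Blocks, Jigsaw Puzzle,
# Understanding Buttons, Setting Tables, Tidying Up Rooms, Selecting Gifts.
# Each entry maps a model name to (published Avg, per-task scores).
SCORES = {
    "Google Gemini 2.5 Pro": (24.53, [48.0, 12.4, 10.0, 5.0, 3.3, 26.7, 22.8, 68.1]),
    "Google Gemini 2.5 Flash": (23.05, [42.0, 11.1, 5.5, 5.3, 3.3, 25.8, 23.2, 68.2]),
    "OpenAI o3": (22.88, [54.0, 10.3, 10.0, 6.4, 3.3, 14.3, 18.8, 65.9]),
    "OpenAI GPT-5": (21.54, [36.0, 9.5, 3.8, 6.0, 3.3, 28.7, 16.0, 69.1]),
    "Anthropic Claude Sonnet 3.7": (20.52, [46.0, 3.4, 8.9, 6.3, 0.0, 23.8, 16.1, 59.7]),
}


def unweighted_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def compare(scores):
    """Return (model, published, recomputed, absolute difference) for each model."""
    rows = []
    for model, (published, per_task) in scores.items():
        recomputed = unweighted_mean(per_task)
        rows.append((model, published, recomputed, abs(recomputed - published)))
    return rows


if __name__ == "__main__":
    for model, published, recomputed, diff in compare(SCORES):
        print(f"{model:30s} published={published:.2f} "
              f"recomputed={recomputed:.3f} diff={diff:.3f}")
```

Under this assumption the recomputed means agree with the published Avg values to within roughly 0.01, which is what one would expect if the displayed per-task scores are themselves rounded.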

Specialized Testing

This assessment provides in-depth evaluation of advanced intelligence domains, including abstract reasoning, geometric proof, theory of mind, and intuitive physics.

Updates

Follow key TongTest releases, research progress, and standards development.

Featured | Publishing | Mar 28, 2026

Chinese Edition of AGI Standards, Rating, Testing, and Architecture Published

The Chinese edition of AGI Standards, Rating, Testing, and Architecture has been published and received the 2025 Impactful New Book Award from the Async Community. The book systematically presents methods for AGI standards, rating, testing, and architecture, providing theoretical and methodological support for TongTest.

Chinese edition has been published
Received the Async Community 2025 Impactful New Book Award
Covers AGI standards, rating, testing, and architecture
[Image: cover of the book 《通用人工智能标准、评级、测试与架构》 (AGI Standards, Rating, Testing, and Architecture)]

Joint Evaluation and Cooperative Institutions

Peking University
Tsinghua University
Zhejiang University
Shanghai Jiao Tong University
USTC
Beihang University
Wuhan University
HUST
Shandong University
ShanghaiTech University
UESTC
Xidian University
BUPT
BIT