Mirror of https://github.com/open-compass/opencompass.git (synced 2025-05-30 16:03:24 +08:00)
File |
---|
aime2024_0shot_nocot_gen_2b9dc2.py |
aime2024_0shot_nocot_genericllmeval_academic_gen.py |
aime2024_0shot_nocot_genericllmeval_gen_2b9dc2.py |
aime2024_cascade_eval_gen_5e9f4f.py |
aime2024_gen_6e39a4.py |
aime2024_gen_17d799.py |
aime2024_gen.py |
aime2024_llmjudge_gen_5e9f4f.py |
aime2024_llmjudge_gen.py |
aime2024_llmverify_repeat8_gen_e8fcee.py |
aime2024_llmverify_repeat16_gen_bf7475.py |
README.md |
Description
A math dataset composed of problems from AIME2024 (the American Invitational Mathematics Examination, 2024).
Performance
Qwen2.5-Math-72B-Instruct | Qwen2.5-Math-7B-Instruct | Qwen2-Math-7B-Instruct | Qwen2-Math-1.5B-Instruct | internlm2-math-7b |
---|---|---|---|---|
20.00 | 16.67 | 16.67 | 13.33 | 3.33 |
Qwen2.5-72B-Instruct | Qwen2.5-7B-Instruct | internlm2_5-7b-chat |
---|---|---|
31.25 | 26.44 | 9.13 |
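
To help read the scores above, here is a minimal sketch that converts an accuracy percentage into a number of solved problems. It assumes the dataset contains all 30 problems of AIME 2024 (AIME I and AIME II, 15 each); that count and the helper names `solved_count` and `is_whole_problem_score` are illustrative assumptions, not part of OpenCompass.

```python
# Assumption: the dataset holds 30 problems (AIME 2024 I and II, 15 each).
NUM_PROBLEMS = 30


def solved_count(accuracy_percent: float, num_problems: int = NUM_PROBLEMS) -> float:
    """Convert an accuracy percentage into an (unrounded) count of solved problems."""
    return accuracy_percent / 100 * num_problems


def is_whole_problem_score(accuracy_percent: float,
                           num_problems: int = NUM_PROBLEMS,
                           tol: float = 0.01) -> bool:
    """Return True if the score matches a whole number of solved problems.

    The tolerance absorbs the rounding of reported scores to two decimals.
    """
    count = solved_count(accuracy_percent, num_problems)
    return abs(count - round(count)) <= tol


if __name__ == "__main__":
    # Scores copied from the two tables above.
    for score in (20.00, 16.67, 13.33, 3.33, 31.25, 26.44, 9.13):
        print(score, round(solved_count(score), 3), is_whole_problem_score(score))
```

Under this assumption, the scores in the first table map to whole problem counts (for example, 20.00 corresponds to 6 of 30), while the scores in the second table do not. The README does not state why; one possibility is that those figures come from a different evaluation setup, such as averaging over repeated runs.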