Mirror of https://github.com/open-compass/opencompass.git, synced 2025-05-30 16:03:24 +08:00
aime2024_0shot_nocot_gen_2b9dc2.py
aime2024_0shot_nocot_genericllmeval_academic_gen.py
aime2024_0shot_nocot_genericllmeval_gen_2b9dc2.py
aime2024_0shot_nocot_genericllmeval_xml_gen_2b9dc2.py
aime2024_0shot_nocot_llmjudge_gen_2b9dc2.py
aime2024_gen_6e39a4.py
aime2024_gen_17d799.py
aime2024_gen.py
aime2024_llm_judge_gen.py
aime2024_llmjudge_gen_5e9f4f.py
aime2024_llmverify_repeat8_gen_e8fcee.py
aime2024_llmverify_repeat16_gen_bf7475.py
README.md
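Several of the config files listed above end in a six-character hexadecimal suffix (for example `_2b9dc2`), while others such as `aime2024_gen.py` have none. The listing itself does not explain the suffix; the sketch below assumes it is a version hash and simply separates it from the rest of the name. The helper `split_config_name` is hypothetical and is not part of OpenCompass.

```python
import re

# Assumption: a trailing "_<6 hex chars>" right before ".py" is a version hash.
_HASHED = re.compile(r"^(?P<base>.+)_(?P<hash>[0-9a-f]{6})\.py$")


def split_config_name(filename):
    """Return (base_name, hash_or_None) for a config filename."""
    match = _HASHED.match(filename)
    if match:
        return match.group("base"), match.group("hash")
    # No hash suffix: drop only the ".py" extension, if present.
    base = filename[:-3] if filename.endswith(".py") else filename
    return base, None


if __name__ == "__main__":
    for name in ["aime2024_gen_6e39a4.py", "aime2024_gen.py"]:
        print(name, "->", split_config_name(name))
```

Grouping files by the returned base name is one way to see which configs are variants of the same setup.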
Description
A math dataset composed of problems from AIME 2024 (the 2024 American Invitational Mathematics Examination).
Performance
Qwen2.5-Math-72B-Instruct | Qwen2.5-Math-7B-Instruct | Qwen2-Math-7B-Instruct | Qwen2-Math-1.5B-Instruct | internlm2-math-7b |
---|---|---|---|---|
20.00 | 16.67 | 16.67 | 13.33 | 3.33 |
Qwen2.5-72B-Instruct | Qwen2.5-7B-Instruct | internlm2_5-7b-chat |
---|---|---|
31.25 | 26.44 | 9.13 |
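The README does not name the metric behind these scores. As a rough sanity check, the sketch below assumes they are accuracy percentages over 30 problems (AIME I and AIME II have 15 problems each) and asks which scores correspond to a whole number of solved problems. The function `implied_correct` and the constant `NUM_PROBLEMS` are illustrative names, not OpenCompass API.

```python
# Assumption: 30 problems in total, scores reported as accuracy in percent.
NUM_PROBLEMS = 30


def implied_correct(score_percent, num_problems=NUM_PROBLEMS, tol=0.01):
    """Return the whole number of solved problems matching the score, or None."""
    count = round(score_percent * num_problems / 100)
    # Recompute the percentage for that count and compare at two decimals.
    if abs(round(100 * count / num_problems, 2) - score_percent) <= tol:
        return count
    return None


if __name__ == "__main__":
    for score in [20.00, 16.67, 13.33, 3.33, 31.25, 26.44, 9.13]:
        print(score, "->", implied_correct(score))
```

Under this assumption the first table's scores map to whole counts (for example, 20.00 corresponds to 6 of 30), whereas the second table's scores do not, so those were presumably computed in a different way that the README does not describe.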