Mirror of https://github.com/open-compass/opencompass.git (synced 2025-05-30 16:03:24 +08:00)
aime2024_0shot_nocot_gen_2b9dc2.py
aime2024_0shot_nocot_genericllmeval_academic_gen.py
aime2024_0shot_nocot_genericllmeval_gen_2b9dc2.py
aime2024_0shot_nocot_genericllmeval_xml_gen_2b9dc2.py
aime2024_0shot_nocot_llmjudge_gen_2b9dc2.py
aime2024_gen_6e39a4.py
aime2024_gen.py
README.md
Description
A math dataset composed of problems from AIME 2024 (the 2024 American Invitational Mathematics Examination).
Performance
Qwen2.5-Math-72B-Instruct | Qwen2.5-Math-7B-Instruct | Qwen2-Math-7B-Instruct | Qwen2-Math-1.5B-Instruct | internlm2-math-7b |
---|---|---|---|---|
20.00 | 16.67 | 16.67 | 13.33 | 3.33 |
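
The scores in the table above can be read as whole numbers of solved problems. The following is a minimal sketch, not part of the evaluation code, and it rests on two assumptions this README does not state: that the dataset holds 30 problems (AIME I and AIME II 2024, 15 each) and that each score is a percent accuracy. The helper names `solved_count` and `accuracy_percent` are hypothetical.

```python
# Assumption: 30 problems (AIME I + AIME II, 15 each); scores are percent accuracy.
NUM_PROBLEMS = 30

# Scores copied from the table above.
scores = {
    "Qwen2.5-Math-72B-Instruct": 20.00,
    "Qwen2.5-Math-7B-Instruct": 16.67,
    "Qwen2-Math-7B-Instruct": 16.67,
    "Qwen2-Math-1.5B-Instruct": 13.33,
    "internlm2-math-7b": 3.33,
}


def solved_count(score_percent, num_problems=NUM_PROBLEMS):
    """Return the nearest whole number of solved problems for a percent score."""
    return round(score_percent / 100 * num_problems)


def accuracy_percent(num_solved, num_problems=NUM_PROBLEMS):
    """Return percent accuracy, rounded to two decimals as in the table."""
    return round(100 * num_solved / num_problems, 2)


solved = {model: solved_count(score) for model, score in scores.items()}

for model, count in solved.items():
    print(f"{model}: {count}/{NUM_PROBLEMS} = {accuracy_percent(count):.2f}")
```

Under these assumptions every score in the table round-trips exactly, so a gap of 3.33 points corresponds to a single problem.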
Qwen2.5-72B-Instruct | Qwen2.5-7B-Instruct | internlm2_5-7b-chat |
---|---|---|
31.25 | 26.44 | 9.13 |