Mirror of https://github.com/open-compass/opencompass.git (synced 2025-05-30)
# LiveMathBench

## v202412

### Details of Datasets
| dataset | language | #single-choice | #multiple-choice | #fill-in-the-blank | #problem-solving |
|---|---|---|---|---|---|
| AMC | cn | 0 | 0 | 0 | 46 |
| AMC | en | 0 | 0 | 0 | 46 |
| CCEE | cn | 0 | 0 | 13 | 31 |
| CCEE | en | 0 | 0 | 13 | 31 |
| CNMO | cn | 0 | 0 | 0 | 18 |
| CNMO | en | 0 | 0 | 0 | 18 |
| WLPMC | cn | 0 | 0 | 0 | 11 |
| WLPMC | en | 0 | 0 | 0 | 11 |
## How to use

### G-Pass@k
```python
from mmengine.config import read_base

with read_base():
    from opencompass.datasets.livemathbench_gen import livemathbench_datasets

# set the judge model and the URL(s) of the evaluation service
livemathbench_datasets[0]['eval_cfg']['evaluator'].update(
    {
        'model_name': 'Qwen/Qwen2.5-72B-Instruct',
        'url': [
            'http://0.0.0.0:23333/v1',
            '...'
        ]  # set urls of evaluation models
    }
)

# for o1-like models you need to increase max_out_len
livemathbench_datasets[0]['infer_cfg']['inferencer'].update(dict(
    max_out_len=32768
))
```
### Greedy
```python
from mmengine.config import read_base

with read_base():
    from opencompass.datasets.livemathbench_greedy_gen import livemathbench_datasets

# set the judge model and the URL(s) of the evaluation service
livemathbench_datasets[0]['eval_cfg']['evaluator'].update(
    {
        'model_name': 'Qwen/Qwen2.5-72B-Instruct',
        'url': [
            'http://0.0.0.0:23333/v1',
            '...'
        ]  # set urls of evaluation models
    }
)

# for o1-like models you need to increase max_out_len
livemathbench_datasets[0]['infer_cfg']['inferencer'].update(dict(
    max_out_len=32768
))
```
## Output Samples

| dataset | version | metric | mode | Qwen2.5-72B-Instruct |
|---|---|---|---|---|
| LiveMathBench | 9befbf | G-Pass@16_0.0 | gen | xx.xx |
| LiveMathBench | caed8f | G-Pass@16_0.25 | gen | xx.xx |
| LiveMathBench | caed8f | G-Pass@16_0.5 | gen | xx.xx |
| LiveMathBench | caed8f | G-Pass@16_0.75 | gen | xx.xx |
| LiveMathBench | caed8f | G-Pass@16_1.0 | gen | xx.xx |
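For intuition, a `G-Pass@16_0.5` entry reports the chance that at least half of 16 sampled generations solve a problem. The sketch below is our reading of the G-Pass@k_τ definition as a hypergeometric estimate (given `n` generations of which `c` are correct, the probability that at least `⌈τ·k⌉` of `k` drawn samples are correct); the exact implementation in OpenCompass's `GPassKEvaluator`, in particular the τ→0 edge case, may differ.

```python
from math import comb, ceil


def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """Estimate G-Pass@k_tau for a single problem.

    n: total generations sampled for the problem
    c: how many of those generations are correct
    k: subsample size (e.g. 16 for G-Pass@16)
    tau: required fraction of correct samples among the k drawn
    """
    # threshold on correct samples; for tau -> 0 we assume at least
    # one correct sample is required, which reduces to plain pass@k
    m = max(1, ceil(tau * k))
    # hypergeometric tail: P(at least m of k draws are correct);
    # math.comb returns 0 when the top argument is smaller, so
    # infeasible terms vanish automatically
    return sum(
        comb(c, j) * comb(n - c, k - j) for j in range(m, k + 1)
    ) / comb(n, k)


# all 16 generations correct -> metric is 1.0 at any threshold
print(g_pass_at_k(16, 16, 16, 1.0))   # 1.0
# 8 of 16 correct: passes tau=0.5 but not tau=0.75 when k = n = 16
print(g_pass_at_k(16, 8, 16, 0.5))    # 1.0
print(g_pass_at_k(16, 8, 16, 0.75))   # 0.0
```

Averaging this quantity over all problems gives the per-dataset numbers reported above; higher τ demands more stable reasoning, which is why the columns tighten from `G-Pass@16_0.0` down to `G-Pass@16_1.0`.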