`OpenCompass/opencompass/configs/datasets/livemathbench`
| File | Last change | Date |
| ---- | ----------- | ---- |
| livemathbench_gen_6eb711.py | [Feature] Update o1 evaluation with JudgeLLM (#1795) | 2024-12-30 17:31:00 +08:00 |
| livemathbench_gen_9befbf.py | [Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator (#1886) | 2025-02-26 19:43:12 +08:00 |
| livemathbench_gen_caed8f.py | [Feature] Support LiveMathBench (#1727) | 2024-11-30 00:07:19 +08:00 |
| livemathbench_gen.py | [Feature] Support G-Pass@k and LiveMathBench (#1772) | 2024-12-30 16:59:39 +08:00 |
| livemathbench_greedy_gen_9befbf.py | [Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator (#1886) | 2025-02-26 19:43:12 +08:00 |
| livemathbench_greedy_gen.py | [Update] Update LiveMathBench Hard Configs (#1826) | 2025-02-25 17:24:36 +08:00 |
| livemathbench_hard_custom_llmverify_gen_85d0ef.py | [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899) | 2025-03-03 18:56:11 +08:00 |
| livemathbench_hard_gen_353ae7.py | [Update] Fix Hard Configs With General GPassK (#1906) | 2025-03-03 18:17:15 +08:00 |
| livemathbench_hard_greedy_gen_353ae7.py | [Update] Fix Hard Configs With General GPassK (#1906) | 2025-03-03 18:17:15 +08:00 |
| livemathbench_hard_llmjudge_gen_71eaf5.py | [Dataset] Add SmolInstruct, Update Chembench (#2025) | 2025-04-18 17:21:29 +08:00 |
| README.md | [Update] Update Greedy Config & README of LiveMathBench (#1862) | 2025-02-20 19:47:04 +08:00 |

# LiveMathBench

## v202412

### Details of Datasets

| dataset | language | #single-choice | #multiple-choice | #fill-in-the-blank | #problem-solving |
| ------- | -------- | -------------- | ---------------- | ------------------ | ---------------- |
| AMC | cn | 0 | 0 | 0 | 46 |
| AMC | en | 0 | 0 | 0 | 46 |
| CCEE | cn | 0 | 0 | 13 | 31 |
| CCEE | en | 0 | 0 | 13 | 31 |
| CNMO | cn | 0 | 0 | 0 | 18 |
| CNMO | en | 0 | 0 | 0 | 18 |
| WLPMC | cn | 0 | 0 | 0 | 11 |
| WLPMC | en | 0 | 0 | 0 | 11 |

### How to use

#### G-Pass@k

```python
from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.livemathbench.livemathbench_gen import livemathbench_datasets

livemathbench_datasets[0]['eval_cfg']['evaluator'].update(
    {
        'model_name': 'Qwen/Qwen2.5-72B-Instruct',
        'url': [
            'http://0.0.0.0:23333/v1',
            '...'
        ]  # set the URL(s) of the judge (evaluation) model service(s)
    }
)
livemathbench_datasets[0]['infer_cfg']['inferencer'].update(dict(
    max_out_len=32768  # for o1-like (long chain-of-thought) models, increase max_out_len
))
```
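
The judge URLs are expected to point at an OpenAI-compatible inference service (for example one started with LMDeploy or vLLM). The snippet below is an optional sanity check, not part of the LiveMathBench configs; it only assumes the `openai` Python client and the example URL shown above.

```python
# Optional sanity check: confirm the judge endpoint answers before launching the run.
# Assumes an OpenAI-compatible server (e.g. LMDeploy or vLLM) serving the judge model
# at the URL configured above; 'dummy' stands in for an API key the local server ignores.
from openai import OpenAI

client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='dummy')
print(client.models.list())  # should include Qwen/Qwen2.5-72B-Instruct if the judge is up
```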

#### Greedy

```python
from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.livemathbench.livemathbench_greedy_gen import livemathbench_datasets

livemathbench_datasets[0]['eval_cfg']['evaluator'].update(
    {
        'model_name': 'Qwen/Qwen2.5-72B-Instruct',
        'url': [
            'http://0.0.0.0:23333/v1',
            '...'
        ]  # set the URL(s) of the judge (evaluation) model service(s)
    }
)
livemathbench_datasets[0]['infer_cfg']['inferencer'].update(dict(
    max_out_len=32768  # for o1-like (long chain-of-thought) models, increase max_out_len
))
```
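
To launch an evaluation, the dataset list is combined with a model list in a top-level config and handed to the usual OpenCompass entry point. The sketch below is illustrative only: the file name `eval_livemathbench.py` and the model import path are placeholders, not something defined in this directory.

```python
# eval_livemathbench.py -- minimal run-config sketch.
# The model import path below is a placeholder; substitute the model config you evaluate.
from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.livemathbench.livemathbench_greedy_gen import \
        livemathbench_datasets
    from opencompass.configs.models.qwen2_5.lmdeploy_qwen2_5_7b_instruct import \
        models as qwen2_5_7b  # placeholder model config

datasets = [*livemathbench_datasets]
models = [*qwen2_5_7b]
```

A config like this can then be run from the repository root with, e.g., `python run.py eval_livemathbench.py`.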

### Output Samples

| dataset | version | metric | mode | Qwen2.5-72B-Instruct |
| ------- | ------- | ------ | ---- | -------------------- |
| LiveMathBench | 9befbf | G-Pass@16_0.0 | gen | xx.xx |
| LiveMathBench | caed8f | G-Pass@16_0.25 | gen | xx.xx |
| LiveMathBench | caed8f | G-Pass@16_0.5 | gen | xx.xx |
| LiveMathBench | caed8f | G-Pass@16_0.75 | gen | xx.xx |
| LiveMathBench | caed8f | G-Pass@16_1.0 | gen | xx.xx |
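
The metric names encode the G-Pass@k setting used by the evaluator: n samples are generated per question, c of them are judged correct, and G-Pass@k_τ estimates the probability that at least ⌈τ·k⌉ of k randomly drawn samples are correct (here k = 16 and τ ∈ {0.0, 0.25, 0.5, 0.75, 1.0}). The formula below is a sketch of that definition; the evaluator implementation in OpenCompass is authoritative, and the τ = 0.0 row is usually read as the ordinary Pass@k requirement of at least one correct sample.

```math
\text{G-Pass@}k_{\tau} \;=\; \mathbb{E}_{\text{questions}}\left[\; \sum_{j=\lceil \tau \cdot k \rceil}^{c} \frac{\binom{c}{j}\binom{n-c}{k-j}}{\binom{n}{k}} \;\right]
```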