# LiveMathBench

## v202412

### Details of Datasets
| dataset | language | #single-choice | #multiple-choice | #fill-in-the-blank | #problem-solving |
| ------- | -------- | -------------- | ---------------- | ------------------ | ---------------- |
| AMC | cn | 0 | 0 | 0 | 46 |
| AMC | en | 0 | 0 | 0 | 46 |
| CCEE | cn | 0 | 0 | 13 | 31 |
| CCEE | en | 0 | 0 | 13 | 31 |
| CNMO | cn | 0 | 0 | 0 | 18 |
| CNMO | en | 0 | 0 | 0 | 18 |
| WLPMC | cn | 0 | 0 | 0 | 11 |
| WLPMC | en | 0 | 0 | 0 | 11 |
## How to use

### G-Pass@k
```python
from mmengine.config import read_base

with read_base():
    from opencompass.datasets.livemathbench_gen import livemathbench_datasets

livemathbench_datasets[0]['eval_cfg']['evaluator'].update(
    {
        'model_name': 'Qwen/Qwen2.5-72B-Instruct',
        'url': [
            'http://0.0.0.0:23333/v1',
            '...'
        ]  # set the URLs of the evaluation models
    }
)
livemathbench_datasets[0]['infer_cfg']['inferencer'].update(dict(
    max_out_len=32768  # for o1-like models, increase max_out_len
))
```
### Greedy
```python
from mmengine.config import read_base

with read_base():
    from opencompass.datasets.livemathbench_greedy_gen import livemathbench_datasets

livemathbench_datasets[0]['eval_cfg']['evaluator'].update(
    {
        'model_name': 'Qwen/Qwen2.5-72B-Instruct',
        'url': [
            'http://0.0.0.0:23333/v1',
            '...'
        ]  # set the URLs of the evaluation models
    }
)
livemathbench_datasets[0]['infer_cfg']['inferencer'].update(dict(
    max_out_len=32768  # for o1-like models, increase max_out_len
))
```
## Output Samples
| dataset | version | metric | mode | Qwen2.5-72B-Instruct |
| ------- | ------- | ------ | ---- | -------------------- |
| LiveMathBench | 9befbf | G-Pass@16_0.0 | gen | xx.xx |
| LiveMathBench | caed8f | G-Pass@16_0.25 | gen | xx.xx |
| LiveMathBench | caed8f | G-Pass@16_0.5 | gen | xx.xx |
| LiveMathBench | caed8f | G-Pass@16_0.75 | gen | xx.xx |
| LiveMathBench | caed8f | G-Pass@16_1.0 | gen | xx.xx |
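In the metric names above, `G-Pass@16_0.5` reads as k = 16 samples with threshold τ = 0.5: the estimated probability that at least ⌈τ·k⌉ of k drawn generations are correct. As a rough illustration (not the evaluator's actual implementation), the per-problem estimator can be sketched with a hypergeometric sum over `n` total generations of which `c` were judged correct; the function name and the τ = 0 convention (falling back to "at least one correct", i.e. Pass@k) are assumptions here:

```python
from math import comb, ceil


def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """Hypergeometric estimate of G-Pass@k_tau for a single problem.

    n: total generations sampled for the problem
    c: generations judged correct
    k: subsample size (e.g. 16)
    tau: fraction of the k subsampled generations that must be correct
    """
    # Require at least ceil(tau * k) correct draws; for tau = 0 we assume
    # the Pass@k convention of "at least one correct".
    m = max(ceil(tau * k), 1)
    if c < m:
        return 0.0
    # Probability that j of the k draws are correct, summed over j >= m.
    return sum(
        comb(c, j) * comb(n - c, k - j) for j in range(m, min(c, k) + 1)
    ) / comb(n, k)
```

For example, with n = 4 generations of which c = 2 are correct, `g_pass_at_k(4, 2, 2, 1.0)` is 1/6 (both drawn samples must be correct), while relaxing to τ = 0.5 gives 5/6. The benchmark-level score averages this estimate over all problems.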