# LiveMathBench
## Details of Datasets
| dataset | language | #single-choice | #multiple-choice | #fill-in-the-blank | #problem-solving |
|---|---|---|---|---|---|
| AIMC | cn | 0 | 0 | 0 | 46 |
| AIMC | en | 0 | 0 | 0 | 46 |
| CEE | cn | 0 | 0 | 13 | 40 |
| CEE | en | 0 | 0 | 13 | 40 |
| CMO | cn | 0 | 0 | 0 | 18 |
| CMO | en | 0 | 0 | 0 | 18 |
| MATH500 | en | 0 | 0 | 0 | 500 |
## How to use
Import the dataset definitions in your config and adjust the entry as needed:

```python
from mmengine.config import read_base

with read_base():
    from opencompass.datasets.livemathbench import livemathbench_datasets

livemathbench_datasets[0].update(
    {
        'abbr': 'livemathbench_${k}x${n}',
        'path': '/path/to/data/dir',
        'k': 'k@pass',  # the maximum value of k in k@pass
        'n': 'number of runs',  # number of sampled responses per problem
    }
)
```
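For illustration, a run that samples 8 responses per problem and reports k@pass up to k = 4 might look like the following sketch; the concrete values, and the assumption that `k` and `n` accept plain integers, are hypothetical:

```python
livemathbench_datasets[0].update(
    {
        'abbr': 'livemathbench_4x8',  # ${k}x${n} resolved to 4x8 (hypothetical)
        'path': '/path/to/data/dir',
        'k': 4,  # report k@pass for k = 1..4
        'n': 8,  # sample 8 responses per problem (n >= k)
    }
)
```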
Then point the evaluator at the judge model used for answer checking:

```python
livemathbench_datasets[0]['eval_cfg']['evaluator'].update(
    {
        'model_name': 'Qwen/Qwen2.5-72B-Instruct',
        'url': [
            'http://0.0.0.0:23333/v1',
            '...'
        ]  # URLs of the judge (evaluation) model endpoints
    }
)
```
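Each `url` entry is expected to be an OpenAI-compatible endpoint serving `model_name`. A quick connectivity check, assuming the `openai` Python client is installed and the local server accepts a dummy key:

```python
from openai import OpenAI

# Endpoint taken from the config above; adjust host/port to your deployment.
client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='dummy')
print([m.id for m in client.models.list().data])  # should include the judge model
```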
❗️ At present, `extract_from_boxed` is used to extract answers from model responses. You can also use an LLM for extraction via the following parameters, but this part of the code has not been tested:
```python
livemathbench_datasets[0]['eval_cfg']['evaluator'].update(
    {
        'model_name': 'Qwen/Qwen2.5-72B-Instruct',
        'url': [
            'http://0.0.0.0:23333/v1',
            '...'
        ],  # URLs of the judge (evaluation) model endpoints
        # for LLM-based extraction
        'use_extract_model': True,
        'post_model_name': 'oc-extractor',
        'post_url': [
            'http://0.0.0.0:21006/v1',
            '...'
        ]
    }
)
```
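Putting it together, a complete evaluation config might look like the sketch below. The model entry follows OpenCompass's usual `HuggingFacewithChatTemplate` pattern; the model path, batch size, and GPU count are placeholders, not recommendations:

```python
from mmengine.config import read_base
from opencompass.models import HuggingFacewithChatTemplate

with read_base():
    from opencompass.datasets.livemathbench import livemathbench_datasets

datasets = livemathbench_datasets  # customized as shown above

models = [
    dict(
        type=HuggingFacewithChatTemplate,
        abbr='qwen2.5-7b-instruct-hf',  # placeholder model under test
        path='Qwen/Qwen2.5-7B-Instruct',
        max_out_len=2048,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
```

Run it with the standard CLI, e.g. `opencompass path/to/this_config.py`.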
## Output Samples

| dataset | version | metric | mode | Qwen2.5-72B-Instruct |
|---|---|---|---|---|
| LiveMathBench | caed8f | 1@pass | gen | 26.07 |
| LiveMathBench | caed8f | 1@pass/std | gen | xx.xx |
| LiveMathBench | caed8f | 2@pass | gen | xx.xx |
| LiveMathBench | caed8f | 2@pass/std | gen | xx.xx |
| LiveMathBench | caed8f | pass-rate | gen | xx.xx |
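For reference, if k@pass follows the standard unbiased pass@k estimator over the n sampled responses (an assumption based on the usual definition, not verified against this evaluator's code), it can be computed as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k responses,
    drawn without replacement from n samples of which c are correct,
    is itself correct (Chen et al., 2021)."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples, so a hit is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=8, c=3, k=2))  # ≈ 0.643
```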