2024-11-30 00:07:19 +08:00
# LiveMathBench
## Details of Datasets
| Dataset | Language | #Single-choice | #Multiple-choice | #Fill-in-the-blank | #Problem-solving |
| -- | -- | -- | -- | -- | -- |
| AIMC | cn | 0 | 0 | 0 | 46 |
| AIMC | en | 0 | 0 | 0 | 46 |
| CEE | cn | 0 | 0 | 13 | 40 |
| CEE | en | 0 | 0 | 13 | 40 |
| CMO | cn | 0 | 0 | 0 | 18 |
| CMO | en | 0 | 0 | 0 | 18 |
| MATH500 | en | 0 | 0 | 0 | 500 |
| AIME2024 | en | 0 | 0 | 0 | 44 |
## How to use
```python
from mmengine.config import read_base

with read_base():
    from opencompass.datasets.livemathbench import livemathbench_datasets

livemathbench_datasets[0].update(
    {
        'abbr': 'livemathbench_${k}x${n}',  # replace ${k} and ${n} with the values below
        'path': '/path/to/data/dir',
        'k': 'k@pass',  # the max value of k in k@pass
        'n': 'number of runs',  # number of runs
    }
)
livemathbench_datasets[0]['eval_cfg']['evaluator'].update(
    {
        'model_name': 'Qwen/Qwen2.5-72B-Instruct',
        'url': [
            'http://0.0.0.0:23333/v1',
            '...'
        ]  # set url of evaluation models
    }
)
```
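Note that `'livemathbench_${k}x${n}'` is a literal Python string, not an interpolated one, so `${k}` and `${n}` have to be substituted by hand. As a hedged sketch, assuming `k` and `n` are positive integers and that `abbr` is meant to encode them, the overrides could be built with a small helper. `make_livemathbench_overrides` is hypothetical and not part of OpenCompass.

```python
def make_livemathbench_overrides(path: str, k: int, n: int) -> dict:
    """Hypothetical helper: build the dataset overrides from the config above.

    Assumes k (max k in k@pass) and n (number of runs) are positive integers.
    """
    if k < 1 or n < 1:
        raise ValueError('k and n must be positive integers')
    return {
        'abbr': f'livemathbench_{k}x{n}',  # f-string fills in k and n
        'path': path,
        'k': k,
        'n': n,
    }


# Illustrative values only; apply with livemathbench_datasets[0].update(overrides)
overrides = make_livemathbench_overrides('/path/to/data/dir', k=4, n=3)
print(overrides['abbr'])  # livemathbench_4x3
```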
> ❗️ Answers are currently extracted from model responses with `extract_from_boxed`. Alternatively, an LLM can perform the extraction via the parameters below, but this code path has not been tested yet.
```python
# assumes livemathbench_datasets was imported as in the previous snippet
livemathbench_datasets[0]['eval_cfg']['evaluator'].update(
    {
        'model_name': 'Qwen/Qwen2.5-72B-Instruct',
        'url': [
            'http://0.0.0.0:23333/v1',
            '...'
        ],  # set url of evaluation models
        # for LLM-based extraction
        'use_extract_model': True,
        'post_model_name': 'oc-extractor',
        'post_url': [
            'http://0.0.0.0:21006/v1',
            '...'
        ]  # set url of extraction models
    }
)
```
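To illustrate what boxed-answer extraction involves, here is a minimal sketch that returns the contents of the last `\boxed{...}` in a response, matching nested braces so answers like `\frac{1}{2}` survive intact. It is a hypothetical stand-in, not OpenCompass's `extract_from_boxed`; choosing the *last* box and returning `None` on unbalanced braces are assumptions.

```python
from typing import Optional


def extract_last_boxed(response: str) -> Optional[str]:
    """Hypothetical sketch: content of the last \\boxed{...}, or None.

    Tracks brace depth so nested braces such as \\boxed{\\frac{1}{2}} work.
    """
    marker = '\\boxed{'
    start = response.rfind(marker)
    if start == -1:
        return None  # no boxed answer at all
    i = start + len(marker)  # first character inside the braces
    depth = 1
    for j in range(i, len(response)):
        ch = response[j]
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                return response[i:j].strip()
    return None  # unbalanced braces: no complete answer


print(extract_last_boxed(r'So the answer is $\boxed{\frac{1}{2}}$.'))  # \frac{1}{2}
```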
## Output Samples
| dataset | version | metric | mode | Qwen2.5-72B-Instruct |
|----- | ----- | ----- | ----- | -----|
| LiveMathBench | caed8f | 1@pass | gen | 26.07 |
| LiveMathBench | caed8f | 1@pass/std | gen | xx.xx |
| LiveMathBench | caed8f | 2@pass | gen | xx.xx |
| LiveMathBench | caed8f | 2@pass/std | gen | xx.xx |
| LiveMathBench | caed8f | pass-rate | gen | xx.xx |
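The metric names above are not defined in this README. One plausible reading, which is an assumption and not OpenCompass's evaluator: each problem is answered k × n times, grouped into n runs of k samples; `k@pass` is the share of problems solved by at least one of the first k samples in a run, averaged over runs; `k@pass/std` is its standard deviation across runs; and `pass-rate` is the share of all individual samples that are correct. The sketch below implements that reading, using the population standard deviation (also an assumption).

```python
from statistics import pstdev
from typing import Dict, List


def summarize(correct: List[List[List[bool]]], k: int) -> Dict[str, float]:
    """Hypothetical metric sketch (not OpenCompass's evaluator).

    correct[p][r][s] is True if sample s of run r for problem p is correct.
    Assumes every problem has the same number of runs n and >= k samples per run.
    All scores are percentages.
    """
    n = len(correct[0])
    results: Dict[str, float] = {}
    for kk in range(1, k + 1):
        # Per-run score: share of problems solved by any of the first kk samples.
        per_run = [
            sum(any(problem[r][:kk]) for problem in correct) / len(correct) * 100
            for r in range(n)
        ]
        results[f'{kk}@pass'] = sum(per_run) / n
        # Population standard deviation across runs (an assumption).
        results[f'{kk}@pass/std'] = pstdev(per_run)
    # Share of all individual samples that are correct.
    flat = [s for problem in correct for run in problem for s in run]
    results['pass-rate'] = sum(flat) / len(flat) * 100
    return results


# Toy data: 2 problems, n = 2 runs, k = 2 samples per run.
correct = [
    [[True, False], [False, False]],  # problem 1
    [[False, True], [False, True]],   # problem 2
]
print(summarize(correct, k=2))
# {'1@pass': 25.0, '1@pass/std': 25.0, '2@pass': 75.0, '2@pass/std': 25.0, 'pass-rate': 37.5}
```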