# LiveMathBench

## Details of Datasets

| dataset | language | #single-choice | #multiple-choice | #fill-in-the-blank | #problem-solving |
| -- | -- | -- | -- | -- | -- |
| AIMC | cn | 46 | 0 | 0 | 0 |
| AIMC | en | 46 | 0 | 0 | 0 |
| CEE | cn | 28 | 9 | 13 | 3 |
| CEE | en | 28 | 9 | 13 | 3 |
| CMO | cn | 0 | 0 | 0 | 18 |
| CMO | en | 0 | 0 | 0 | 18 |

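
As a quick sanity check on the table above, the per-language and overall question counts can be tallied with a few lines of Python. This is only a sketch: the numbers are copied by hand from the table, and the variable names (`counts`, `per_language`, `total`) are illustrative rather than part of OpenCompass.

```python
# Counts copied from the dataset table; each tuple is
# (single-choice, multiple-choice, fill-in-the-blank, problem-solving).
counts = {
    ('AIMC', 'cn'): (46, 0, 0, 0),
    ('AIMC', 'en'): (46, 0, 0, 0),
    ('CEE', 'cn'): (28, 9, 13, 3),
    ('CEE', 'en'): (28, 9, 13, 3),
    ('CMO', 'cn'): (0, 0, 0, 18),
    ('CMO', 'en'): (0, 0, 0, 18),
}

# Sum all question types per language.
per_language = {}
for (_, language), row in counts.items():
    per_language[language] = per_language.get(language, 0) + sum(row)

total = sum(per_language.values())
print(per_language, total)
```

Each language split contains the same number of questions because the `cn` and `en` rows of every subset have identical counts.
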
## How to use

```python
from mmengine.config import read_base

with read_base():
    from opencompass.datasets.livemathbench import livemathbench_datasets

livemathbench_datasets[0].update(
    {
        'path': '/path/to/data/dir',
        'k': 'k@pass',  # placeholder: the max value of k in k@pass
        'n': 'number of runs',  # placeholder: the number of runs
    }
)
livemathbench_datasets[0]['eval_cfg']['evaluator'].update(
    {
        'model_name': 'Qwen/Qwen2.5-72B-Instruct',
        'url': [
            'http://0.0.0.0:23333/v1',
            '...'
        ],  # URLs of the evaluation model endpoints
    }
)
```

> ❗️ Answers are currently extracted from model responses with `extract_from_boxed`. An LLM can be used for extraction instead by setting the following parameters, but this code path has not been tested.

```python
livemathbench_datasets[0]['eval_cfg']['evaluator'].update(
    {
        'model_name': 'Qwen/Qwen2.5-72B-Instruct',
        'url': [
            'http://0.0.0.0:23333/v1',
            '...'
        ],  # URLs of the evaluation model endpoints

        # for LLM-based extraction
        'use_extract_model': True,
        'post_model_name': 'oc-extractor',
        'post_url': [
            'http://0.0.0.0:21006/v1',
            '...'
        ],
    }
)
```

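
To make the default, rule-based extraction more concrete, here is a minimal sketch of how an answer might be pulled out of a `\boxed{...}` expression. It is an illustration only, not the actual `extract_from_boxed` implementation in OpenCompass: the function name `extract_boxed_answer` is hypothetical, and the choice to use the last `\boxed{` in the response is an assumption.

```python
from typing import Optional


def extract_boxed_answer(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in `response`, or None.

    Hypothetical helper for illustration; braces are matched by depth so
    that nested LaTeX such as \\frac{1}{2} is kept intact.
    """
    marker = r'\boxed{'
    start = response.rfind(marker)
    if start == -1:
        return None  # no boxed answer in the response

    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in response[start + len(marker):]:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                return ''.join(chars)  # matching close brace found
        chars.append(ch)
    return None  # braces never balanced, treat as no answer


print(extract_boxed_answer(r'So the answer is \boxed{\frac{1}{2}}.'))
```

Responses that never close the brace, or that contain no `\boxed{` at all, yield `None` in this sketch; such cases are where an LLM-based extractor could plausibly help.
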
## Output Samples

| dataset | version | metric | mode | Qwen2.5-72B-Instruct |
| ----- | ----- | ----- | ----- | ----- |
| LiveMathBench | caed8f | 1@pass | gen | 26.07 |
| LiveMathBench | caed8f | 1@pass/std | gen | xx.xx |
| LiveMathBench | caed8f | 2@pass | gen | xx.xx |
| LiveMathBench | caed8f | 2@pass/std | gen | xx.xx |
| LiveMathBench | caed8f | pass-rate | gen | xx.xx |

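
This README does not state how the `k@pass` metrics are computed. One common definition for metrics of this kind is the unbiased pass@k estimator, which gives the probability that at least one of `k` samples drawn from `n` runs is correct when `c` of those runs are correct. The sketch below implements that estimator as an assumption, not as the confirmed LiveMathBench formula; the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: total number of runs for one problem
    c: number of correct runs among them
    k: number of samples considered (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError('c must satisfy 0 <= c <= n')
    if not 1 <= k <= n:
        raise ValueError('k must satisfy 1 <= k <= n')
    # comb(n - c, k) is 0 when fewer than k runs are incorrect,
    # so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 runs, 3 correct.
print(pass_at_k(10, 3, 1), pass_at_k(10, 3, 2))
```

Under this definition, the value for `k = 1` equals the fraction of correct runs, and the estimate can only stay the same or grow as `k` increases.
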