mirror of
https://github.com/open-compass/opencompass.git
synced 2025-05-30 16:03:24 +08:00
# Omni-Math

[Omni-Math](https://huggingface.co/datasets/KbsdJames/Omni-MATH) contains 4428 competition-level math problems. The problems are categorized into 33 (and potentially more) sub-domains and span 10 distinct difficulty levels, which allows model performance to be analyzed per mathematical discipline and per level of complexity.

* Project Page: https://omni-math.github.io/
* GitHub Repo: https://github.com/KbsdJames/Omni-MATH
* Omni-Judge (open-source evaluator for this dataset): https://huggingface.co/KbsdJames/Omni-Judge
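
Because every problem carries a sub-domain and a difficulty label, judged results can be broken down along either axis. The snippet below is a minimal sketch of such a breakdown. The record layout (`domain`, `difficulty`, `correct`) and the sample values are assumptions made for illustration; they are not the dataset's actual schema or real results.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Return {group: accuracy in percent} for records grouped by `key`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec[key]] += 1
        hits[rec[key]] += int(rec["correct"])
    return {group: 100.0 * hits[group] / totals[group] for group in totals}


# Hypothetical judged results; field names and values are illustrative only.
records = [
    {"domain": "Algebra", "difficulty": 3, "correct": True},
    {"domain": "Algebra", "difficulty": 7, "correct": False},
    {"domain": "Geometry", "difficulty": 3, "correct": True},
    {"domain": "Geometry", "difficulty": 7, "correct": True},
]

by_difficulty = accuracy_by(records, "difficulty")
by_domain = accuracy_by(records, "domain")
print(by_difficulty)
print(by_domain)
```

The same helper works for any label attached to a record, so a finer grouping only requires adding another field.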

## Omni-Judge

> Omni-Judge is an open-source mathematical evaluation model designed to assess whether a solution generated by a model is correct given a problem and a standard answer.

Deploy the Omni-Judge server, for example with LMDeploy:

```bash
set -x

lmdeploy serve api_server KbsdJames/Omni-Judge --server-port 8000 \
    --tp 1 \
    --cache-max-entry-count 0.9 \
    --log-level INFO
```
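
Before pointing OpenCompass at the server, it can help to confirm that it is reachable. The sketch below assumes the server exposes the OpenAI-compatible `/v1/models` route and that it is queried from the same machine on port 8000. The helper names `models_endpoint` and `list_served_models` are hypothetical and not part of LMDeploy or OpenCompass.

```python
import json
import urllib.request


def models_endpoint(base_url):
    """Build the model-listing URL of an OpenAI-compatible server."""
    return base_url.rstrip("/") + "/v1/models"


def list_served_models(base_url, timeout=5.0):
    """Return the model IDs reported by the server at `base_url`.

    Raises OSError (including urllib.error.URLError) if the server
    cannot be reached within `timeout` seconds.
    """
    with urllib.request.urlopen(models_endpoint(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload.get("data", [])]
```

Calling `list_served_models("http://localhost:8000")` once the server has finished loading should return a list that includes the judge model; a connection error usually means the server is still starting or the port differs.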

Then set the server URL(s) in the OpenCompass config file:

```python
from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.omni_math.omni_math_gen import omni_math_datasets

omni_math_dataset = omni_math_datasets[0]
# Point the evaluator at the deployed Omni-Judge server(s).
# Replace the example addresses below with your own.
omni_math_dataset['eval_cfg']['evaluator'].update(
    url=['http://172.30.8.45:8000',
         'http://172.30.16.113:8000'],
)
```
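
When several judge servers are deployed, the `url` list can be generated from the host list instead of being written out by hand. The following is a minimal sketch of that idea. The helper `judge_urls` is hypothetical, and the small dictionary stands in for the real `omni_math_datasets` so that the snippet runs on its own.

```python
def judge_urls(hosts, port=8000):
    """Turn host names or IPs into base URLs for the evaluator's `url` list."""
    return [f"http://{host}:{port}" for host in hosts]


# Stand-in for `omni_math_datasets`; in a real config the list comes from
# the OpenCompass import inside `read_base()`.
omni_math_datasets = [{"eval_cfg": {"evaluator": {}}}]

urls = judge_urls(["172.30.8.45", "172.30.16.113"])
for dataset in omni_math_datasets:
    dataset["eval_cfg"]["evaluator"].update(url=urls)

print(omni_math_datasets[0]["eval_cfg"]["evaluator"]["url"])
```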

## Performance

| llama-3_1-8b-instruct | qwen-2_5-7b-instruct | InternLM3-8b-Instruct |
| -- | -- | -- |
| 15.18 | 29.97 | 32.75 |