OpenCompass/opencompass/configs/datasets/omni_math
Songyang Zhang c84bc18ac1
[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899)
* [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify

* Update

* Update

* Update DeepSeek-R1 example

* Update DeepSeek-R1 example

* Update DeepSeek-R1 example
2025-03-03 18:56:11 +08:00
..
omni_math_gen_18cc08.py [Feature] Support Omni-Math (#1837) 2025-01-23 18:36:54 +08:00
omni_math_gen.py [Feature] Support Omni-Math (#1837) 2025-01-23 18:36:54 +08:00
omni_math_llmverify_gen_ccf9c0.py [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899) 2025-03-03 18:56:11 +08:00
README.md [Feature] Support Omni-Math (#1837) 2025-01-23 18:36:54 +08:00

Omni-Math

Omni-Math contains 4428 competition-level problems. These problems are meticulously categorized into 33 (and potentially more) sub-domains and span across 10 distinct difficulty levels, enabling a nuanced analysis of model performance across various mathematical disciplines and levels of complexity.

Omni-Judge

Omni-Judge is an open-source mathematical evaluation model designed to assess whether a solution generated by a model is correct given a problem and a standard answer.

You should deploy the omni-judge server like:

set -x

lmdeploy serve api_server KbsdJames/Omni-Judge --server-port 8000 \
    --tp 1 \
    --cache-max-entry-count 0.9 \
    --log-level INFO

and set the server url in opencompass config file:

from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.omni_math.omni_math_gen import omni_math_datasets


omni_math_dataset = omni_math_datasets[0]
omni_math_dataset['eval_cfg']['evaluator'].update(
    url=['http://172.30.8.45:8000',
         'http://172.30.16.113:8000'],
)

Performance

llama-3_1-8b-instruct qwen-2_5-7b-instruct InternLM3-8b-Instruct
15.18 29.97 32.75