Jun
d4a69ba65f
add 2
2025-05-27 03:26:40 +00:00
Songyang Zhang
c84bc18ac1
[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard ( #1899 )
* [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify
* Update
* Update
* Update DeepSeek-R1 example
* Update DeepSeek-R1 example
* Update DeepSeek-R1 example
2025-03-03 18:56:11 +08:00
Junnan Liu
73c80953c6
[Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator ( #1886 )
* support dataset repeat and g-pass compute for each evaluator
* fix pre-commit errors
* delete print
* delete gpassk_evaluator and fix potential errors
* change `repeat` to `n`
* fix `repeat` to `n` in openicl_eval
* update doc for multi-run and g-pass
* update latex equation in doc
* update eng doc for multi-run and g-pass
* update datasets.md
* update datasets.md
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation in zh_cn user_guides
* modify pre-commit-zh-cn
* recover pre-commit and edit math expr in doc
* del [TIP]
* del cite tag in doc
* del extract_model param in livemathbench config
2025-02-26 19:43:12 +08:00
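PR #1886 repeats each dataset sample `n` times so that any evaluator can report G-Pass-style metrics. The Python below sketches only the general idea. It assumes that predictions come back in the same order as the repeated samples. `repeat_dataset` and `count_correct` are hypothetical helpers, not OpenCompass functions, and the actual implementation may differ.

```python
from collections import defaultdict
from typing import Any, Callable


def repeat_dataset(examples: list[Any], n: int) -> list[tuple[int, Any]]:
    """Hypothetical helper: expand each example into n copies tagged with its source index."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [(idx, ex) for idx, ex in enumerate(examples) for _ in range(n)]


def count_correct(
    repeated: list[tuple[int, Any]],
    predictions: list[Any],
    is_correct: Callable[[Any, Any], bool],
) -> dict[int, tuple[int, int]]:
    """Hypothetical helper: group per-copy predictions by source index into {idx: (n, c)}."""
    if len(repeated) != len(predictions):
        raise ValueError("expected one prediction per repeated example")
    totals: dict[int, list[int]] = defaultdict(lambda: [0, 0])
    for (idx, example), pred in zip(repeated, predictions):
        totals[idx][0] += 1  # generations seen for this question
        totals[idx][1] += int(is_correct(example, pred))  # correct generations
    return {idx: (n, c) for idx, (n, c) in totals.items()}


# Usage: two questions, three generations each.
examples = [{"answer": "2"}, {"answer": "4"}]
repeated = repeat_dataset(examples, n=3)
preds = ["2", "2", "3", "4", "5", "4"]
counts = count_correct(repeated, preds, lambda ex, p: ex["answer"] == p)
```

The resulting per-question `(n, c)` pairs are exactly the inputs that G-Pass@k-style metrics consume. Keeping the source index on every copy means regrouping does not depend on how the repeated samples were shuffled or batched.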
Junnan Liu
22a33d8759
[Update] Update LiveMathBench Hard Configs ( #1826 )
* support G-Pass@k and livemathbench
* fix bugs
* fix comments of GPassKEvaluator
* update saved details of GPassKEvaluator
* update saved details of GPassKEvaluator
* fix eval api configs & update openai_api for ease of debugging
* update huggingface path
* fix method name of G-Pass@k
* fix default value of eval_model_name
* refactor G-Pass@k evaluator
* log generation params for each backend
* fix evaluation resume
* add NotImplementedError
* update livemathbench-hard configs
* remove max_out_len from livemathbench_hard_greedy_gen_9befbf.py
* remove max_out_len from livemathbench_hard_gen_9befbf.py
* rename livemathbench_hard_gen_9befbf.py to livemathbench_hard_gen_353ae7.py
* rename livemathbench_hard_greedy_gen_9befbf.py to livemathbench_hard_greedy_gen_353ae7.py
* update livemathbench_gen_9befbf.py
* remove whitespace
* upload livemathbench hard configs
2025-02-25 17:24:36 +08:00
Songyang Zhang
8fdb72f567
[Update] Update o1 eval prompt ( #1806 )
* Update XML prediction post-process
* Update LiveMathBench
* Update LiveMathBench
* Update New O1 Evaluation
2025-01-07 00:14:32 +08:00
Junnan Liu
8e8d4f1c64
[Feature] Support G-Pass@k and LiveMathBench ( #1772 )
* support G-Pass@k and livemathbench
* fix bugs
* fix comments of GPassKEvaluator
* update saved details of GPassKEvaluator
* update saved details of GPassKEvaluator
* fix eval api configs & update openai_api for ease of debugging
* update huggingface path
* fix method name of G-Pass@k
* fix default value of eval_model_name
* refactor G-Pass@k evaluator
* log generation params for each backend
* fix evaluation resume
* add NotImplementedError
2024-12-30 16:59:39 +08:00
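PR #1772 introduces the G-Pass@k metric. In the definition commonly used with LiveMathBench, each question is sampled n times and c of those generations are correct. G-Pass@k_τ is the probability that at least ⌈τ·k⌉ of k generations, drawn without replacement, are correct. This is a hypergeometric tail, and it is averaged over questions. The Python below is a minimal sketch under that assumption. `g_pass_at_k` and `mean_g_pass_at_k` are hypothetical names and are not the `GPassKEvaluator` code.

```python
from math import ceil, comb


def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """Hypothetical helper: P(at least ceil(tau * k) of k draws are correct),
    drawing k of n generations without replacement when c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    threshold = ceil(tau * k)  # minimum number of correct draws required
    # Sum hypergeometric terms C(c, j) * C(n - c, k - j) for j = threshold .. min(c, k).
    hits = sum(comb(c, j) * comb(n - c, k - j) for j in range(threshold, min(c, k) + 1))
    return hits / comb(n, k)


def mean_g_pass_at_k(samples: list[tuple[int, int]], k: int, tau: float) -> float:
    """Hypothetical helper: average G-Pass@k_tau over questions given as (n, c) pairs."""
    if not samples:
        raise ValueError("need at least one question")
    return sum(g_pass_at_k(n, c, k, tau) for n, c in samples) / len(samples)
```

As a sanity check, when ⌈τ·k⌉ = 1 the sum reduces to the familiar pass@k estimator 1 − C(n−c, k)/C(n, k). For example, `g_pass_at_k(10, 3, 4, 0.5)` counts draws with at least 2 correct answers and gives 70/210. Because `tau` is a float, values whose product with `k` is not exactly representable may round in `ceil`. This is a limitation of the sketch, not of the metric.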
Songyang Zhang
0d8df541bc
[Update] Update O1-style Benchmark and Prompts ( #1742 )
* Update JudgerBench
* Support O1-style Prompts
* Update Code
* Update OpenAI
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update
* Update
* Update
* Update
2024-12-09 13:48:56 +08:00
Junnan Liu
f333be177c
[Update] Add MATH500 & AIME2024 to LiveMathBench ( #1741 )
* upload dataset definitions & configs
* add single dataset split specific metrics
* add k-pass@threshold & MATH500
* update std computation & k-pass computation
* add AIME2024
* update README
2024-12-06 14:36:49 +08:00
Songyang Zhang
fb43dd1906
[Update] Update Skywork/Qwen-QwQ ( #1728 )
* Update JudgerBench
* Support O1-style Prompts
* Update Code
* Update OpenAI
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update
2024-12-05 19:30:43 +08:00
Junnan Liu
6181ac1122
[Update] Update LiveMathBench Evaluation to Support Single Dataset Split Metric Computation ( #1730 )
* upload dataset definitions & configs
* add single dataset split specific metrics
* add k-pass@threshold & MATH500
2024-12-05 16:54:16 +08:00
Junnan Liu
fe6d76fb13
[Feature] Support LiveMathBench ( #1727 )
2024-11-30 00:07:19 +08:00