jnanliu
|
b0330ef1c6
|
change repeat to n
|
2025-02-24 08:11:27 +00:00 |
|
jnanliu
|
2349fcff2c
|
delete gpassk_evaluator and fix potential errors
|
2025-02-24 06:25:17 +00:00 |
|
jnanliu
|
8def69369a
|
support dataset repeat and g-pass compute for each evaluator
|
2025-02-23 03:05:42 +00:00 |
|
Junnan Liu
|
046b6f75c6
|
[Update] Update Greedy Config & README of LiveMathBench (#1862)
* support omni-math
* update config
* upload README
* Delete opencompass/configs/datasets/omni_math/__init__.py
* update greedy config & README of LiveMathBench
* update intro for max_out_len
* rename livemathbench greedy config
* delete greedy config
---------
Co-authored-by: liushz <qq1791167085@163.com>
|
2025-02-20 19:47:04 +08:00 |
|
Songyang Zhang
|
f1e50d4bf0
|
[Update] Update LiveMathBench (#1809)
* Update LiveMathBench
* Update New O1 Evaluation
* Update O1 evaluation
|
2025-01-07 19:16:12 +08:00 |
|
Songyang Zhang
|
98435dd98e
|
[Feature] Update o1 evaluation with JudgeLLM (#1795)
* Update Generic LLM Evaluator
* Update o1 style evaluator
|
2024-12-30 17:31:00 +08:00 |
|
Junnan Liu
|
8e8d4f1c64
|
[Feature] Support G-Pass@k and LiveMathBench (#1772)
* support G-Pass@k and livemathbench
* fix bugs
* fix comments of GPassKEvaluator
* update saved details of GPassKEvaluator
* update saved details of GPassKEvaluator
* fix eval api configs & update openai_api for ease of debugging
* update huggingface path
* fix method name of G-Pass@k
* fix default value of eval_model_name
* refactor G-Pass@k evaluator
* log generation params for each backend
* fix evaluation resume
* add NotImplementedError
|
2024-12-30 16:59:39 +08:00 |
|
Songyang Zhang
|
0d8df541bc
|
[Update] Update O1-style Benchmark and Prompts (#1742)
* Update JudgerBench
* Support O1-style Prompts
* Update Code
* Update OpenAI
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update
* Update
* Update
* Update
|
2024-12-09 13:48:56 +08:00 |
|
Junnan Liu
|
f333be177c
|
[Update] Add MATH500 & AIME2024 to LiveMathBench (#1741)
* upload dataset definitions & configs
* add single dataset split specific metrics
* add k-pass@threshold & MATH500
* update std computation & k-pass computation
* add AIME2024
* update README
|
2024-12-06 14:36:49 +08:00 |
|
Junnan Liu
|
6181ac1122
|
[Update] Update LiveMathBench Evaluation to Support Single Dataset Split Metric Computation (#1730)
* upload dataset definitions & configs
* add single dataset split specific metrics
* add k-pass@threshold & MATH500
|
2024-12-05 16:54:16 +08:00 |
|
Junnan Liu
|
fe6d76fb13
|
[Feature] Support LiveMathBench (#1727)
|
2024-11-30 00:07:19 +08:00 |
|