Junnan Liu
|
f333be177c
|
[Update] Add MATH500 & AIME2024 to LiveMathBench (#1741)
* upload dataset definitions & configs
* add single dataset split specific metrics
* add k-pass@threshold & MATH500
* update std computation & k-pass computation
* add AIME2024
* update README
|
2024-12-06 14:36:49 +08:00 |
|
Songyang Zhang
|
fb43dd1906
|
[Update] Update Skywork/Qwen-QwQ (#1728)
* Update JudgerBench
* Support O1-style Prompts
* Update Code
* Update OpenAI
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update BigCodeBench
* Update
|
2024-12-05 19:30:43 +08:00 |
|
Junnan Liu
|
6181ac1122
|
[Update] Update LiveMathBench Evaluation to Support Single Dataset Split Metric Computation (#1730)
* upload dataset definitions & configs
* add single dataset split specific metrics
* add k-pass@threshold & MATH500
|
2024-12-05 16:54:16 +08:00 |
|
Junnan Liu
|
fe6d76fb13
|
[Feature] Support LiveMathBench (#1727)
|
2024-11-30 00:07:19 +08:00 |
|