Commit Graph

8 Commits

Author SHA1 Message Date
Songyang Zhang
c84bc18ac1
[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899)
* [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify

* Update

* Update

* Update DeepSeek-R1 example

* Update DeepSeek-R1 example

* Update DeepSeek-R1 example
2025-03-03 18:56:11 +08:00
Linchen Xiao
bdb2d46f59
[Feature] Add general math, llm judge evaluator (#1892)
* update_doc

* update llm_judge

* update README

* update md file name
2025-02-26 15:08:50 +08:00
Songyang Zhang
fd6fbf01a2
[Update] Support AIME-24 Evaluation for DeepSeek-R1 series (#1888)
* Update

* Update

* Update

* Update
2025-02-25 20:34:41 +08:00
Dongsheng Zhu
465e93e10e
[Update] Academic bench llm judge update (#1876)
* BigCodeBench update

* update LCBench

* update LCBench 2

* update code

* academicBench update

* academic bench ifeval&math update

* generic_llmjudge_aime_academic_postprocess delete

* aime delete

* postprocessors update

* ifeval delete

* update work_dir

* linting

* linting double-quote-string-fixer

* r1-distill out_len update

* fix lint

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-02-24 15:45:24 +08:00
Linchen Xiao
27c916661d
[Feature] Math Verify with model post_processor (#1881)
* update

* [Feature] Update model post_processor

* update

* update

* update
2025-02-20 19:32:12 +08:00
Linchen Xiao
862bf78464
[Demo] Internlm3 math500 thinking demo (#1846)
* [Demo] Add demo for Internlm3 math500 thinking

* [Demo] Add demo for Internlm3 math500 thinking

* update max_out_len

* update start instruction
2025-01-24 14:56:41 +08:00
Shudong Liu
412199f802
[Feature] Support OlympiadBench Benchmark (#1841)
* Support OlympiadBench Benchmark

* Support OlympiadBench Benchmark

* Support OlympiadBench Benchmark

* update dataset path

* Update olmpiadBench

* Update olmpiadBench

* Update olmpiadBench

---------

Co-authored-by: liushz <qq1791167085@163.com>
2025-01-24 10:00:01 +08:00
Linchen Xiao
a6193b4c02
[Refactor] Code refactoarization (#1831)
* Update

* fix lint

* update

* fix lint
2025-01-20 19:17:38 +08:00