Commit Graph

968 Commits

Author SHA1 Message Date
Mo Li
7a44a80bb9
Merge 03f16c8a83 into d572761cef 2025-05-29 14:22:59 +08:00
Yu Sun
d572761cef
[Dataset] Add Smolinstruct configs (#2127)
* 0-shot Smolinstruct

Add 0-shot evaluation and postprocess functions for Smolinstruct

* fix acc postprocessor

* update 0-shot acc postprocessor

* rename 0-shot
2025-05-29 14:09:08 +08:00
Linchen Xiao
408f5caff4
[Dataset] Add SuperGPQA subfield configs (#2124)
* update

* fix lint

* fix lint

* update precommit

* update precommit

* fix lint
2025-05-28 14:12:58 +08:00
Myhs_phz
6f3c670b99
add qwen3 lmdeply (#2126) 2025-05-27 19:41:13 +08:00
zhulinJulia24
c3779ebfc1
[ci] update dlc setting (#2112) 2025-05-22 16:47:57 +08:00
Songyang Zhang
aa2b89b6f8
[Update] Add CascadeEvaluator with Data Replica (#2022)
* Update CascadeEvaluator

* Update CascadeEvaluator

* Update CascadeEvaluator

* Update Config

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update
2025-05-20 16:46:55 +08:00
Dongsheng Zhu
7a7a4517ab
[Update] History code bench pass@k update (#2102)
* bigcodebench

* humaneval

* humanevalx

* humanevalx

* livecodebench

* mbpp

* humaneval_plus

* fix bug

* template

* max_out fix

* template update
2025-05-19 17:03:33 +08:00
kkscilife
8c0ccf9a6b
[CI] Fix Lint error (#2103) 2025-05-16 15:36:45 +08:00
kkscilife
6f3b6a5d12
[CI] Add gitleaks check (#2101) 2025-05-16 14:34:57 +08:00
tcheng
3d1760aba2
[Dataset] Add Scieval (#2089)
* style: pass all formatting hooks (yapf & quote fixer)

* revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader)

* revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

* Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

* all categories of SciEval (datasets + configs + loader+dataset-index.yml)

* revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

* revise :SciEval 5shot

---------

Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>
2025-05-14 10:25:03 +08:00
Wei Li
b84518c656
[Dataset] Support MedMCQA and MedBullets benchmark (#2054)
* support medmcqa and medbullets benchmark

* Add Medbullets data folder for benchmark support

* revise gen name

* revise config file & remove csv file & add dataset info to dataset-index.yml

* remove csv file

* remove print in medbullets.py

* revise class name

* update_oss_info

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-05-13 17:10:50 +08:00
Mor-Li
03f16c8a83 [Fix] Fix precommit 2025-05-13 14:59:32 +08:00
Mor-Li
a0c3a24aa1 [Docs] Update Default Settings for NeedleBench and ATC Configs 2025-05-13 14:56:46 +08:00
Mor-Li
98a6f6119b [Docs] update NeedleBenchV2 Docs 2025-05-13 14:32:26 +08:00
Mor-Li
f7242fdea8 Merge branch 'update_needlebench_docs' into needlebench_v2_pr 2025-05-13 14:19:48 +08:00
Mor-Li
35518f612f [Docs] Update NeedleBench Docs 2025-05-13 14:17:11 +08:00
zhulinJulia24
d60f59dcab
[CI] update baseline and fix lmdeploy version (#2098)
* update

* update

* update

* update

* update

* update
2025-05-13 14:01:47 +08:00
bittersweet1999
9eaa1f6fec
Update icl_judge_evaluator.py (#2095) 2025-05-13 10:44:24 +08:00
Linchen Xiao
d590f557bb
[Update] OpenaiSDK handle empty content (#2096) 2025-05-12 19:38:30 +08:00
yuehua-s
c492e49e79
[Update] Add o4 in OpenaiSDK (#2083)
* feature:1.add o4-mini;2.o3 or o4-mini only support temperature==1

* feature:change 4o-mini to 4o

---------

Co-authored-by: yuehuazhang <yuehuazhang@tencent.com>
2025-05-12 18:39:44 +08:00
Dongsheng Zhu
2c79dc5227
[Dataset] Add human_eval/mbpp pro (#2092)
* add bench

* update

* bug fix

* time update

* add index

* fix repeat bug
2025-05-12 18:38:13 +08:00
huihui1999
345674f700
[Dataset] Add SciknowEval Dataset (#2070)
* first

* first

* first

* first

* SciKnowEval

* fix hash

* fix dataset-index & use official llm_judge_postprocess

* fix dataset-index.yml

* use official llmjudge_postprocess

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

* merge with main

---------

Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2025-05-12 17:23:44 +08:00
Kun Yuan
8aa18df368
[Dataset] HLE Biomedical version support (#2080)
* HLE Biomedical version support

* set up default category value for hle
2025-05-12 10:14:11 +08:00
Mor-Li
d75494841d remove choice version 2025-05-09 20:21:24 +08:00
Mor-Li
40c6c68162 [Fix] Fix pre-commit 2025-05-09 20:19:04 +08:00
Mor-Li
d1da4a577c Add NeedleBench_V2 2025-05-09 19:37:39 +08:00
huihui1999
44a7024ed5
[Dataset] MedCalc_Bench (#2072)
* MedCalc_Bench

* MedCal_Bench

* add hash

* fix hash

* fix comments &dataset-index yml

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

---------

Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2025-05-09 16:58:55 +08:00
Linchen Xiao
508e2b0cb2
[Update] Set load_from_cache_file to False (#2085) 2025-05-09 15:21:47 +08:00
Kun Yuan
7bdd3c1904
[Dataset] MMLU_Pro Biomedical Version Support (#2081) 2025-05-09 15:07:26 +08:00
Jin Ye
6097186a95
[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 (#2064)
* Add Datasets: MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1

* Fix bugs for MedQA. Add info in dataset-index

* Add version code for MedQA and ProteinLMBench

* Add version code for MedQA and ProteinLMBench
2025-05-09 14:47:44 +08:00
Linchen Xiao
d72df59363
[Revert] Add Lifescience Sub-set Support for SciEval (#2059) (#2087)
This reverts commit c5048bfec7.
2025-05-09 14:46:27 +08:00
tcheng
c5048bfec7
[Dataset] Add Lifescience Sub-set Support for SciEval (#2059)
* style: pass all formatting hooks (yapf & quote fixer)

* revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader)

* revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

* Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

---------

Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>
2025-05-09 14:31:12 +08:00
huihui1999
a7f3ac20b2
[Dataset] Add CARDBiomedBench (#2071)
* CARDBiomedBench

* fix hash

* fix dataset-index

* use official llmjudge postprocess

* use official llmjudge_postprocess

* fix lint

* fix init
2025-05-08 19:44:46 +08:00
Mo Li
ff3275edf0
[Update] Add Long-Context configs for Gemma, OREAL, and Qwen2.5 models (#2048)
* [Update] Update Gemma, Oreal, Qwen Config

* fix lint
2025-05-08 19:06:56 +08:00
Wei Li
a685ed7daf
[Dataset] Add nejm ai benchmark (#2063)
* support nejm ai benchmark

* add dataset files

* revise gen name

* revise gen name

* revise class name & remove csv file & add dataset-index.yml info

* update

* update

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-05-08 16:44:05 +08:00
Jiahao Xu
9ec23c145b
[Datasets] Add ClinicBench, PubMedQA and ScienceQA (#2061)
* Add ClinicBench

* Add PubMedQA & ScienceQA & ClinicBench

* Add PubMedQA & ScienceQA & ClinicBench

* Update datasets_info & hf_path

* Update hf_path
2025-05-08 16:25:43 +08:00
Dongsheng Zhu
ba0e32292c
[Feature] Support InternSandbox (#2049)
* internsandbox init

* internsandbox

* dataset_index

* dataset_index_add
2025-05-07 16:42:09 +08:00
谢昕辰
43b2c4ed76
[Fix] Update lawbench data path (#2037) 2025-05-07 16:18:43 +08:00
Dongsheng Zhu
d62b69aaef
[Fix] Fix InternVL model config (#2068)
* intervl-8b&38b

* intervl adjustment

* internvl fix
2025-05-07 15:51:18 +08:00
Linchen Xiao
af8432e1d6
[Update] OpenAI SDK model reasoning content (#2078)
* update

* update

* update
2025-05-07 14:06:40 +08:00
bittersweet1999
ddc9cc0afb
[Add] add a config to Judge dataset all (#2077)
* fix pip version

* fix pip version

* add judgedatasetall

* add judgedatasetall

* add judgedatasetall
2025-05-07 10:57:23 +08:00
bittersweet1999
37cbaf8d92
[Add] Add Judgerbenchv2 (#2067)
* fix pip version

* fix pip version

* add judgerbenchv2

* Update __init__.py
2025-04-30 17:12:34 +08:00
Taolin Zhang
b6148aa198
add Judgebench (#2066)
* add rewardbench

* add rewardbench

* add rmb datasets

* add rmb datasets

* add judgebench

* add judgebench
2025-04-30 15:01:10 +08:00
bittersweet1999
527a80947b
[Add] Add writingbench (#2028)
* fix pip version

* fix pip version

* add writingbench

* add writingbench

* add writingbench

* add writingbench
2025-04-29 16:29:32 +08:00
Mor-Li
f8e41dfeb4 [Docs] fix needlebench examples 2025-04-27 16:36:59 +08:00
Taolin Zhang
8c74e6a39e
add RMB Bench (#2056)
* add rewardbench

* add rewardbench

* add rmb datasets

* add rmb datasets
2025-04-27 16:26:01 +08:00
Mor-Li
890f051609 update docs typo 2025-04-26 13:38:32 +08:00
Mor-Li
831713ba5d update docs 2025-04-26 13:35:45 +08:00
Mor-Li
ca1865cdac update docs typo 2025-04-26 13:34:12 +08:00
Mor-Li
7297a00181 update bilingual needlebench docs 2025-04-26 13:24:56 +08:00