Yu Sun
d572761cef
[Dataset] Add Smolinstruct configs ( #2127 )
* 0-shot Smolinstruct
Add 0-shot evaluation and postprocess functions for Smolinstruct
* fix acc postprocessor
* update 0-shot acc postprocessor
* rename 0-shot
2025-05-29 14:09:08 +08:00
Linchen Xiao
408f5caff4
[Dataset] Add SuperGPQA subfield configs ( #2124 )
* update
* fix lint
* fix lint
* update precommit
* update precommit
* fix lint
2025-05-28 14:12:58 +08:00
Myhs_phz
6f3c670b99
add qwen3 lmdeploy ( #2126 )
2025-05-27 19:41:13 +08:00
Songyang Zhang
aa2b89b6f8
[Update] Add CascadeEvaluator with Data Replica ( #2022 )
* Update CascadeEvaluator
* Update CascadeEvaluator
* Update CascadeEvaluator
* Update Config
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
* Update
2025-05-20 16:46:55 +08:00
Dongsheng Zhu
7a7a4517ab
[Update] History code bench pass@k update ( #2102 )
* bigcodebench
* humaneval
* humanevalx
* humanevalx
* livecodebench
* mbpp
* humaneval_plus
* fix bug
* template
* max_out fix
* template update
2025-05-19 17:03:33 +08:00
tcheng
3d1760aba2
[Dataset] Add Scieval ( #2089 )
* style: pass all formatting hooks (yapf & quote fixer)
* revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader)
* revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)
* Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)
* all categories of SciEval (datasets + configs + loader+dataset-index.yml)
* revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)
* revise :SciEval 5shot
---------
Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>
2025-05-14 10:25:03 +08:00
Wei Li
b84518c656
[Dataset] Support MedMCQA and MedBullets benchmark ( #2054 )
* support medmcqa and medbullets benchmark
* Add Medbullets data folder for benchmark support
* revise gen name
* revise config file & remove csv file & add dataset info to dataset-index.yml
* remove csv file
* remove print in medbullets.py
* revise class name
* update_oss_info
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-05-13 17:10:50 +08:00
Dongsheng Zhu
2c79dc5227
[Dataset] Add human_eval/mbpp pro ( #2092 )
* add bench
* update
* bug fix
* time update
* add index
* fix repeat bug
2025-05-12 18:38:13 +08:00
huihui1999
345674f700
[Dataset] Add SciknowEval Dataset ( #2070 )
* first
* first
* first
* first
* SciKnowEval
* fix hash
* fix dataset-index & use official llm_judge_postprocess
* fix dataset-index.yml
* use official llmjudge_postprocess
* fix lint
* fix lint
* fix lint
* fix lint
* fix lint
* merge with main
---------
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2025-05-12 17:23:44 +08:00
Kun Yuan
8aa18df368
[Dataset] HLE Biomedical version support ( #2080 )
* HLE Biomedical version support
* set up default category value for hle
2025-05-12 10:14:11 +08:00
huihui1999
44a7024ed5
[Dataset] MedCalc_Bench ( #2072 )
* MedCalc_Bench
* MedCal_Bench
* add hash
* fix hash
* fix comments &dataset-index yml
* fix lint
* fix lint
* fix lint
* fix lint
* fix lint
---------
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2025-05-09 16:58:55 +08:00
Kun Yuan
7bdd3c1904
[Dataset] MMLU_Pro Biomedical Version Support ( #2081 )
2025-05-09 15:07:26 +08:00
Jin Ye
6097186a95
[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 ( #2064 )
* Add Datasets: MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1
* Fix bugs for MedQA. Add info in dataset-index
* Add version code for MedQA and ProteinLMBench
* Add version code for MedQA and ProteinLMBench
2025-05-09 14:47:44 +08:00
Linchen Xiao
d72df59363
[Revert] Add Lifescience Sub-set Support for SciEval ( #2059 ) ( #2087 )
This reverts commit c5048bfec7.
2025-05-09 14:46:27 +08:00
tcheng
c5048bfec7
[Dataset] Add Lifescience Sub-set Support for SciEval ( #2059 )
* style: pass all formatting hooks (yapf & quote fixer)
* revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader)
* revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)
* Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)
---------
Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>
2025-05-09 14:31:12 +08:00
huihui1999
a7f3ac20b2
[Dataset] Add CARDBiomedBench ( #2071 )
* CARDBiomedBench
* fix hash
* fix dataset-index
* use official llmjudge postprocess
* use official llmjudge_postprocess
* fix lint
* fix init
2025-05-08 19:44:46 +08:00
Mo Li
ff3275edf0
[Update] Add Long-Context configs for Gemma, OREAL, and Qwen2.5 models ( #2048 )
* [Update] Update Gemma, Oreal, Qwen Config
* fix lint
2025-05-08 19:06:56 +08:00
Wei Li
a685ed7daf
[Dataset] Add nejm ai benchmark ( #2063 )
* support nejm ai benchmark
* add dataset files
* revise gen name
* revise gen name
* revise class name & remove csv file & add dataset-index.yml info
* update
* update
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-05-08 16:44:05 +08:00
Jiahao Xu
9ec23c145b
[Datasets] Add ClinicBench, PubMedQA and ScienceQA ( #2061 )
* Add ClinicBench
* Add PubMedQA & ScienceQA & ClinicBench
* Add PubMedQA & ScienceQA & ClinicBench
* Update datasets_info & hf_path
* Update hf_path
2025-05-08 16:25:43 +08:00
Dongsheng Zhu
ba0e32292c
[Feature] Support InternSandbox ( #2049 )
* internsandbox init
* internsandbox
* dataset_index
* dataset_index_add
2025-05-07 16:42:09 +08:00
Dongsheng Zhu
d62b69aaef
[Fix] Fix InternVL model config ( #2068 )
* internvl-8b&38b
* internvl adjustment
* internvl fix
2025-05-07 15:51:18 +08:00
bittersweet1999
ddc9cc0afb
[Add] add a config to Judge dataset all ( #2077 )
* fix pip version
* fix pip version
* add judgedatasetall
* add judgedatasetall
* add judgedatasetall
2025-05-07 10:57:23 +08:00
bittersweet1999
37cbaf8d92
[Add] Add Judgerbenchv2 ( #2067 )
* fix pip version
* fix pip version
* add judgerbenchv2
* Update __init__.py
2025-04-30 17:12:34 +08:00
Taolin Zhang
b6148aa198
add Judgebench ( #2066 )
* add rewardbench
* add rewardbench
* add rmb datasets
* add rmb datasets
* add judgebench
* add judgebench
2025-04-30 15:01:10 +08:00
bittersweet1999
527a80947b
[Add] Add writingbench ( #2028 )
* fix pip version
* fix pip version
* add writingbench
* add writingbench
* add writingbench
* add writingbench
2025-04-29 16:29:32 +08:00
Taolin Zhang
8c74e6a39e
add RMB Bench ( #2056 )
* add rewardbench
* add rewardbench
* add rmb datasets
* add rmb datasets
2025-04-27 16:26:01 +08:00
Linchen Xiao
455bb05d1b
[Update] Update dataset configs ( #2030 )
* [Update] Update dataset configs
* Fix lint
2025-04-21 18:55:06 +08:00
Taolin Zhang
c69110361b
[Add] add rewardbench ( #2029 )
* add rewardbench
* add rewardbench
2025-04-21 17:18:51 +08:00
JuchengHu
a2093a81ef
[Dataset] Matbench ( #2021 )
* add support for matbench
* fix dataset path
* fix data load
* fix
* fix lint
---------
Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk>
Co-authored-by: Myhs-phz <demarcia2014@126.com>
2025-04-21 15:50:47 +08:00
Linchen Xiao
b2da1c08a8
[Dataset] Add SmolInstruct, Update Chembench ( #2025 )
* [Dataset] Add SmolInstruct, Update Chembench
* Add dataset metadata
* update
* update
* update
2025-04-18 17:21:29 +08:00
Myhs_phz
75e7834b59
[Feature] Add Datasets: ClimateQA,Physics ( #2017 )
* feat ClimateQA
* feat PHYSICS
* fix
* fix
* fix
* fix
2025-04-14 20:18:47 +08:00
Myhs_phz
fd82bea747
[Fix] OpenICL Math Evaluator Config ( #2007 )
* fix
* fix recommended
* fix
* fix
* fix
* fix
2025-04-08 14:38:35 +08:00
Jin Ye
b564e608b1
[Dataset] Add MedXpertQA ( #2002 )
* Add MedXpertQA
* Add MedXpertQA
* Add MedXpertQA
* Fix lint
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-04-08 10:44:48 +08:00
zhulinJulia24
f982d6278e
[CI] fix baseline score ( #2000 )
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
2025-04-03 19:32:36 +08:00
Myhs_phz
9b489e9ea0
[Update] Revert math500 dataset configs ( #1998 )
2025-04-03 15:11:02 +08:00
liushz
32d6859679
[Feature] Add olymmath dataset ( #1982 )
* Add olymmath dataset
* Add olymmath dataset
* Add olymmath dataset
* Update olymmath dataset
2025-04-02 17:34:07 +08:00
Dongsheng Zhu
330a6e5ca7
[Update] Add InternVL-8b&38b model configs ( #1978 )
2025-04-01 11:51:37 +08:00
Linchen Xiao
0f46c35211
[Bug] Aime2024 config fix ( #1974 )
* [Bug] Aime2024 config fix
* fix
2025-03-25 17:57:11 +08:00
Myhs_phz
6118596362
[Feature] Add recommendation configs for datasets ( #1937 )
* feat datasetrefine drop
* fix datasets in fullbench_int3
* fix
* fix
* back
* fix
* fix and doc
* feat
* fix hook
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* doc
* fix
* fix
* Update dataset-index.yml
2025-03-25 14:54:13 +08:00
Linchen Xiao
07930b854a
[Update] Add Korbench config with no max_out_len ( #1968 )
* Add Korbench no max_out_len
* Add Korbench no max_out_len
2025-03-24 18:38:06 +08:00
Myhs_phz
37307fa996
[Update] Add QWQ32b model config ( #1959 )
* feat qwq-32b
* fix
* feat phi_4
---------
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2025-03-24 14:51:39 +08:00
Linchen Xiao
db96161a4e
[Update] Add SuperGPQA subset metrics ( #1966 )
2025-03-24 14:25:12 +08:00
Linchen Xiao
aa05993922
[Update] Add dataset configurations of no max_out_len ( #1967 )
* [Update] Add dataset configurations of no max_out_len
* update test torch version
* update test torch version
* update test torch version
* update test torch version
2025-03-24 14:24:12 +08:00
Dongsheng Zhu
8a5029b121
[Feature] Add MultiPL-E & Code Evaluator ( #1963 )
* multiple_code develop
* multiple_code update
* comments update
* index update
2025-03-21 20:09:25 +08:00
Songyang Zhang
c98599271b
[Update] Update OlympiadBench and Update LLM Judge ( #1954 )
2025-03-18 20:15:20 +08:00
Linchen Xiao
0b7f76e193
[Bug] Fix Summarizer logic ( #1953 )
2025-03-17 18:25:08 +08:00
Yufeng Zhao
15c825a51a
[Update] Bbeh harmony summarizer added ( #1951 )
* bbeh
* bbeh
* fix_smallbugs_bbeh
* removeprint
* harmonic
* update_summarizer
* harmonic-tested
* harmonic-tested
* clean
* clean
* cleaned_rebased
---------
Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
2025-03-17 17:19:56 +08:00
Linchen Xiao
1c60e3a0f6
[Update] Add configurations for llmjudge dataset ( #1940 )
* Add configurations for llmjudge dataset
* update
2025-03-13 17:30:04 +08:00
Yufeng Zhao
bc2969dba8
[Feature] Add support for BBEH dataset ( #1925 )
* bbeh
* bbeh
* fix_smallbugs_bbeh
* removeprint
* results
---------
Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
2025-03-12 10:53:31 +08:00
Kangreen
59e49aedf1
[Feature] Support SuperGPQA ( #1924 )
* support supergpqa
* remove unnecessary code
* remove unnecessary code
* Add Readme
* Add Readme
* fix lint
* fix lint
* update
* update
---------
Co-authored-by: mkj3085003 <mkj3085003@gmail.com>
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-03-11 19:32:08 +08:00