303 Commits

Author SHA1 Message Date
Jun
d4a69ba65f add 2 2025-05-27 03:26:40 +00:00
Jun
e59f0b59cb srbench_add 2025-05-20 03:23:27 +00:00
Jun
6156974794 srbench_add 2025-05-20 03:18:59 +00:00
Jun
19e7fec7fb srbench 2025-05-20 02:59:00 +00:00
Jun
36d8b19399 srbench 2025-05-20 02:57:38 +00:00
tcheng
3d1760aba2
[Dataset] Add Scieval (#2089)
* style: pass all formatting hooks (yapf & quote fixer)

* revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader)

* revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

* Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

* all categories of SciEval (datasets + configs + loader+dataset-index.yml)

* revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

* revise :SciEval 5shot

---------

Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>
2025-05-14 10:25:03 +08:00
Wei Li
b84518c656
[Dataset] Support MedMCQA and MedBullets benchmark (#2054)
* support medmcqa and medbullets benchmark

* Add Medbullets data folder for benchmark support

* revise gen name

* revise config file & remove csv file & add dataset info to dataset-index.yml

* remove csv file

* remove print in medbullets.py

* revise class name

* update_oss_info

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-05-13 17:10:50 +08:00
Dongsheng Zhu
2c79dc5227
[Dataset] Add human_eval/mbpp pro (#2092)
* add bench

* update

* bug fix

* time update

* add index

* fix repeat bug
2025-05-12 18:38:13 +08:00
huihui1999
345674f700
[Dataset] Add SciknowEval Dataset (#2070)
* first

* first

* first

* first

* SciKnowEval

* fix hash

* fix dataset-index & use official llm_judge_postprocess

* fix dataset-index.yml

* use official llmjudge_postprocess

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

* merge with main

---------

Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2025-05-12 17:23:44 +08:00
Kun Yuan
8aa18df368
[Dataset] HLE Biomedical version support (#2080)
* HLE Biomedical version support

* set up default category value for hle
2025-05-12 10:14:11 +08:00
huihui1999
44a7024ed5
[Dataset] MedCalc_Bench (#2072)
* MedCalc_Bench

* MedCalc_Bench

* add hash

* fix hash

* fix comments &dataset-index yml

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

---------

Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2025-05-09 16:58:55 +08:00
Linchen Xiao
508e2b0cb2
[Update] Set load_from_cache_file to False (#2085) 2025-05-09 15:21:47 +08:00
Jin Ye
6097186a95
[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 (#2064)
* Add Datasets: MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1

* Fix bugs for MedQA. Add info in dataset-index

* Add version code for MedQA and ProteinLMBench

* Add version code for MedQA and ProteinLMBench
2025-05-09 14:47:44 +08:00
Linchen Xiao
d72df59363
[Revert] Add Lifescience Sub-set Support for SciEval (#2059) (#2087)
This reverts commit c5048bfec7.
2025-05-09 14:46:27 +08:00
tcheng
c5048bfec7
[Dataset] Add Lifescience Sub-set Support for SciEval (#2059)
* style: pass all formatting hooks (yapf & quote fixer)

* revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader)

* revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

* Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml)

---------

Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>
2025-05-09 14:31:12 +08:00
huihui1999
a7f3ac20b2
[Dataset] Add CARDBiomedBench (#2071)
* CARDBiomedBench

* fix hash

* fix dataset-index

* use official llmjudge postprocess

* use official llmjudge_postprocess

* fix lint

* fix init
2025-05-08 19:44:46 +08:00
Wei Li
a685ed7daf
[Dataset] Add nejm ai benchmark (#2063)
* support nejm ai benchmark

* add dataset files

* revise gen name

* revise gen name

* revise class name & remove csv file & add dataset-index.yml info

* update

* update

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-05-08 16:44:05 +08:00
Jiahao Xu
9ec23c145b
[Datasets] Add ClinicBench, PubMedQA and ScienceQA (#2061)
* Add ClinicBench

* Add PubMedQA & ScienceQA & ClinicBench

* Add PubMedQA & ScienceQA & ClinicBench

* Update datasets_info & hf_path

* Update hf_path
2025-05-08 16:25:43 +08:00
Dongsheng Zhu
ba0e32292c
[Feature] Support InternSandbox (#2049)
* internsandbox init

* internsandbox

* dataset_index

* dataset_index_add
2025-05-07 16:42:09 +08:00
谢昕辰
43b2c4ed76
[Fix] Update lawbench data path (#2037) 2025-05-07 16:18:43 +08:00
bittersweet1999
37cbaf8d92
[Add] Add Judgerbenchv2 (#2067)
* fix pip version

* fix pip version

* add judgerbenchv2

* Update __init__.py
2025-04-30 17:12:34 +08:00
Taolin Zhang
b6148aa198
add Judgebench (#2066)
* add rewardbench

* add rewardbench

* add rmb datasets

* add rmb datasets

* add judgebench

* add judgebench
2025-04-30 15:01:10 +08:00
bittersweet1999
527a80947b
[Add] Add writingbench (#2028)
* fix pip version

* fix pip version

* add writingbench

* add writingbench

* add writingbench

* add writingbench
2025-04-29 16:29:32 +08:00
Taolin Zhang
8c74e6a39e
add RMB Bench (#2056)
* add rewardbench

* add rewardbench

* add rmb datasets

* add rmb datasets
2025-04-27 16:26:01 +08:00
Junnan Liu
97010dc4ce
[Update] Update dataset repeat concatenation (#2039) 2025-04-23 16:16:28 +08:00
Linchen Xiao
dcbf899369
[Bug] Fix SmolInstruct logger import (#2036) 2025-04-23 11:10:30 +08:00
Linchen Xiao
bf74f26603
[Update] Safe SmolInstruct meteor calculation (#2033) 2025-04-22 18:27:48 +08:00
Linchen Xiao
455bb05d1b
[Update] Update dataset configs (#2030)
* [Update] Update dataset configs

* Fix lint
2025-04-21 18:55:06 +08:00
Taolin Zhang
c69110361b
[Add] add rewardbench (#2029)
* add rewardbench

* add rewardbench
2025-04-21 17:18:51 +08:00
JuchengHu
a2093a81ef
[Dataset] Matbench (#2021)
* add support for matbench

* fix dataset path

* fix data load

* fix

* fix lint

---------

Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk>
Co-authored-by: Myhs-phz <demarcia2014@126.com>
2025-04-21 15:50:47 +08:00
Linchen Xiao
b2da1c08a8
[Dataset] Add SmolInstruct, Update Chembench (#2025)
* [Dataset] Add SmolInstruct, Update Chembench

* Add dataset metadata

* update

* update

* update
2025-04-18 17:21:29 +08:00
Linchen Xiao
65ff602cf5
[Update] Fix LLM Judge metrics calculation & Add reasoning content concat to OpenAI SDK 2025-04-15 11:33:16 +08:00
Myhs_phz
75e7834b59
[Feature] Add Datasets: ClimateQA,Physics (#2017)
* feat ClimateQA

* feat PHYSICS

* fix

* fix

* fix

* fix
2025-04-14 20:18:47 +08:00
Linchen Xiao
6a6a1a5c0b
[Feature] LLM Judge sanity check (#2012)
* update

* update
2025-04-11 19:01:39 +08:00
bittersweet1999
3f50b1dc49
[Fix] fix order bug Update arena_hard.py (#2015) 2025-04-11 16:59:40 +08:00
zhulinJulia24
6ac9b06bc2
[ci] update baseline for kernel change of vllm and lmdeploy (#2011)
* update

* update

* update

* update

* update

* update

* update
2025-04-09 14:09:35 +08:00
Jin Ye
b564e608b1
[Dataset] Add MedXpertQA (#2002)
* Add MedXpertQA

* Add MedXpertQA

* Add MedXpertQA

* Fix lint

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-04-08 10:44:48 +08:00
shijinpjlab
828fb745c9
[Dataset] Update dingo 1.5.0 (#2008)
Co-authored-by: shiin <shijin@pjlab.org.cn>
2025-04-07 17:21:15 +08:00
liushz
32d6859679
[Feature] Add olymmath dataset (#1982)
* Add olymmath dataset

* Add olymmath dataset

* Add olymmath dataset

* Update olymmath dataset
2025-04-02 17:34:07 +08:00
Linchen Xiao
f66b0b347a
[Update] Requirements update (#1993) 2025-04-02 12:03:45 +08:00
Linchen Xiao
db96161a4e
[Update] Add SuperGPQA subset metrics (#1966) 2025-03-24 14:25:12 +08:00
Dongsheng Zhu
8a5029b121
[Feature] Add MultiPL-E & Code Evaluator (#1963)
* multiple_code develop

* multiple_code update

* comments update

* index update
2025-03-21 20:09:25 +08:00
Linchen Xiao
1c60e3a0f6
[Update] Add configurations for llmjudge dataset (#1940)
* Add configurations for llmjudge dataset

* update
2025-03-13 17:30:04 +08:00
Yufeng Zhao
bc2969dba8
[Feature] Add support for BBEH dataset (#1925)
* bbeh

* bbeh

* fix_smallbugs_bbeh

* removeprint

* results

---------

Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
2025-03-12 10:53:31 +08:00
Kangreen
59e49aedf1
[Feature] Support SuperGPQA (#1924)
* support supergpqa

* remove unnecessary code

* remove unnecessary code

* Add Readme

* Add Readme

* fix lint

* fix lint

* update

* update

---------

Co-authored-by: mkj3085003 <mkj3085003@gmail.com>
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-03-11 19:32:08 +08:00
Dongsheng Zhu
fff2d51440
[Update] Code evaluation alignment (#1909)
* code alignment

* update oss md5

* bigcodebench update

* lint

* lint_

* lint yapf
2025-03-04 18:49:38 +08:00
liushz
198c08632e
[Feature] Add HLE (Humanity's Last Exam) dataset (#1902)
* Support OlympiadBench Benchmark

* Support OlympiadBench Benchmark

* Support OlympiadBench Benchmark

* update dataset path

* Update OlympiadBench

* Update OlympiadBench

* Update OlympiadBench

* Add HLE dataset

* Add HLE dataset

* Add HLE dataset

---------

Co-authored-by: sudanl <sudanl@foxmail.com>
2025-03-04 16:42:37 +08:00
Songyang Zhang
c84bc18ac1
[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899)
* [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify

* Update

* Update

* Update DeepSeek-R1 example

* Update DeepSeek-R1 example

* Update DeepSeek-R1 example
2025-03-03 18:56:11 +08:00
Linchen Xiao
6a573f671b
[Fix] Fix compatible issue 2025-03-03 15:35:57 +08:00
Junnan Liu
73c80953c6
[Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator (#1886)
* support dataset repeat and g-pass compute for each evaluator

* fix pre-commit errors

* delete print

* delete gpassk_evaluator and fix potential errors

* change `repeat` to `n`

* fix `repeat` to `n` in openicl_eval

* update doc for multi-run and g-pass

* update latex equation in doc

* update eng doc for multi-run and g-pass

* update datasets.md

* update datasets.md

* fix multi-line equation

* fix multi-line equation

* fix multi-line equation

* fix multi-line equation

* fix multi-line equation

* fix multi-line equation

* fix multi-line equation in zh_cn user_guides

* modify pre-commit-zh-cn

* recover pre-commit and edit math expr in doc

* del [TIP]

* del cite tag in doc

* del extract_model param in livemathbench config
2025-02-26 19:43:12 +08:00