Commit Graph

943 Commits

Author SHA1 Message Date
huihui
efae720249 fix lint 2025-05-09 04:46:47 +00:00
Linchen Xiao
70192c284b
Merge branch 'main' into SciKnowEval 2025-05-09 12:19:18 +08:00
huihui1999
a7f3ac20b2
[Dataset] Add CARDBiomedBench (#2071)
* CARDBiomedBench

* fix hash

* fix dataset-index

* use official llmjudge postprocess

* use official llmjudge_postprocess

* fix lint

* fix init
2025-05-08 19:44:46 +08:00
Mo Li
ff3275edf0
[Update] Add Long-Context configs for Gemma, OREAL, and Qwen2.5 models (#2048)
* [Update] Update Gemma, Oreal, Qwen Config

* fix lint
2025-05-08 19:06:56 +08:00
huihui
1b05a473d2 fix lint 2025-05-08 10:55:51 +00:00
huihui
862cf61f64 fix lint 2025-05-08 10:53:26 +00:00
Wei Li
a685ed7daf
[Dataset] Add nejm ai benchmark (#2063)
* support nejm ai benchmark

* add dataset files

* revise gen name

* revise gen name

* revise class name & remove csv file & add dataset-index.yml info

* update

* update

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-05-08 16:44:05 +08:00
Jiahao Xu
9ec23c145b
[Datasets] Add ClinicBench, PubMedQA and ScienceQA (#2061)
* Add ClinicBench

* Add PubMedQA & ScienceQA & ClinicBench

* Add PubMedQA & ScienceQA & ClinicBench

* Update datasets_info & hf_path

* Update hf_path
2025-05-08 16:25:43 +08:00
huihui
26adccc20c use official llmjudge_postprocess 2025-05-08 06:07:08 +00:00
huihui
021c0d896a fix dataset-index.yml 2025-05-08 05:02:36 +00:00
huihui
5e8bfee3f4 fix dataset-index & use official llm_judge_postprocess 2025-05-08 04:31:11 +00:00
Dongsheng Zhu
ba0e32292c
[Feature] Support InternSandbox (#2049)
* internsandbox init

* internsandbox

* dataset_index

* dataset_index_add
2025-05-07 16:42:09 +08:00
谢昕辰
43b2c4ed76
[Fix] Update lawbench data path (#2037) 2025-05-07 16:18:43 +08:00
Dongsheng Zhu
d62b69aaef
[Fix] Fix InternVL model config (#2068)
* intervl-8b&38b

* intervl adjustment

* internvl fix
2025-05-07 15:51:18 +08:00
Linchen Xiao
af8432e1d6
[Update] OpenAI SDK model reasoning content (#2078)
* update

* update

* update
2025-05-07 14:06:40 +08:00
huihui
bc9ba0126f fix hash 2025-05-07 05:27:37 +00:00
bittersweet1999
ddc9cc0afb
[Add] add a config to Judge dataset all (#2077)
* fix pip version

* fix pip version

* add judgedatasetall

* add judgedatasetall

* add judgedatasetall
2025-05-07 10:57:23 +08:00
huihui
272efd7d25 SciKnowEval 2025-05-02 12:08:58 +00:00
huihui
de6e4909bd first 2025-05-02 11:57:24 +00:00
bittersweet1999
37cbaf8d92
[Add] Add Judgerbenchv2 (#2067)
* fix pip version

* fix pip version

* add judgerbenchv2

* Update __init__.py
2025-04-30 17:12:34 +08:00
Taolin Zhang
b6148aa198
add Judgebench (#2066)
* add rewardbench

* add rewardbench

* add rmb datasets

* add rmb datasets

* add judgebench

* add judgebench
2025-04-30 15:01:10 +08:00
huihui
f931d2ca94 first 2025-04-30 05:29:40 +00:00
huihui
44aadf627b first 2025-04-30 05:29:04 +00:00
huihui
dfa26b24bd first 2025-04-30 05:20:38 +00:00
bittersweet1999
527a80947b
[Add] Add writingbench (#2028)
* fix pip version

* fix pip version

* add writingbench

* add writingbench

* add writingbench

* add writingbench
2025-04-29 16:29:32 +08:00
Taolin Zhang
8c74e6a39e
add RMB Bench (#2056)
* add rewardbench

* add rewardbench

* add rmb datasets

* add rmb datasets
2025-04-27 16:26:01 +08:00
Linchen Xiao
e8bc8c1e8c
[Bug] Concat OpenaiSDK reasoning content (#2041)
* [Bug] Concat OpenaiSDK reasoning content

* [Bug] Concat OpenaiSDK reasoning content

* update

* update
2025-04-25 14:10:33 +08:00
Junnan Liu
97010dc4ce
[Update] Update dataset repeat concatenation (#2039) 2025-04-23 16:16:28 +08:00
Linchen Xiao
dcbf899369
[Bug] Fix SmolInsturct logger import (#2036) 2025-04-23 11:10:30 +08:00
Linchen Xiao
bf74f26603
[Update] Safe SmolInstruct meteor calculation (#2033) 2025-04-22 18:27:48 +08:00
Linchen Xiao
455bb05d1b
[Update] Update dataset configs (#2030)
* [Update] Update dataset configs

* Fix lint
2025-04-21 18:55:06 +08:00
Taolin Zhang
c69110361b
[Add] add rewardbench (#2029)
* add rewardbench

* add rewardbench
2025-04-21 17:18:51 +08:00
JuchengHu
a2093a81ef
[Dataset] Matbench (#2021)
* add support for matbench

* fix dataset path

* fix data load

* fix

* fix lint

---------

Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk>
Co-authored-by: Myhs-phz <demarcia2014@126.com>
2025-04-21 15:50:47 +08:00
Linchen Xiao
b2da1c08a8
[Dataset] Add SmolInstruct, Update Chembench (#2025)
* [Dataset] Add SmolInstruct, Update Chembench

* Add dataset metadata

* update

* update

* update
2025-04-18 17:21:29 +08:00
Linchen Xiao
65ff602cf5
[Update] Fix LLM Judge metrics cacluation & Add reasoning content concat to OpenAI SDK 2025-04-15 11:33:16 +08:00
Myhs_phz
75e7834b59
[Feature] Add Datasets: ClimateQA,Physics (#2017)
* feat ClimateQA

* feat PHYSICS

* fix

* fix

* fix

* fix
2025-04-14 20:18:47 +08:00
Linchen Xiao
6a6a1a5c0b
[Feature] LLM Judge sanity check (#2012)
* update

* update
2025-04-11 19:01:39 +08:00
bittersweet1999
3f50b1dc49
[Fix] fix order bug Update arena_hard.py (#2015) 2025-04-11 16:59:40 +08:00
Junnan Liu
20660ab507
[Fix] Fix compare error when k is list in base_evaluator (#2010)
* fix gpass compare error of list k

* fix compare error in 177
2025-04-10 19:47:21 +08:00
Linchen Xiao
12213207b6
[Refactor] Refactorize openicl eval task (#1990)
* [Refactor] Refactorize openicl eval task

* update
2025-04-09 15:52:23 +08:00
zhulinJulia24
6ac9b06bc2
[ci] update baseline for kernal change of vllm and lmdeploy (#2011)
* update

* update

* update

* update

* update

* update

* update
2025-04-09 14:09:35 +08:00
Linchen Xiao
a05f9da134
[Feature] Make dump-eval-details default behavior (#1999)
* Update

* update

* update
2025-04-08 14:42:26 +08:00
Myhs_phz
fd82bea747
[Fix] OpenICL Math Evaluator Config (#2007)
* fix

* fix recommended

* fix

* fix

* fix

* fix
2025-04-08 14:38:35 +08:00
Linchen Xiao
bb58cfc85d
[Feature] Add CascadeEvaluator (#1992)
* [Feature] Add CascadeEvaluator

* update

* updat
2025-04-08 11:58:14 +08:00
Jin Ye
b564e608b1
[Dataset] Add MedXpertQA (#2002)
* Add MedXpertQA

* Add MedXpertQA

* Add MedXpertQA

* Fix lint

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-04-08 10:44:48 +08:00
shijinpjlab
828fb745c9
[Dataset] Update dingo 1.5.0 (#2008)
Co-authored-by: shiin <shijin@pjlab.org.cn>
2025-04-07 17:21:15 +08:00
zhulinJulia24
f982d6278e
[CI] fix baseline score (#2000)
* update

* update

* update

* update

* update

* update

* update

* updaste

* update

* update

* updaste

* updaste

* update

* update

* update

* update

* update

* update

* update

* update
2025-04-03 19:32:36 +08:00
Myhs_phz
3a9a384173
[Doc] Fix links between zh & en (#2001)
* test

* test

* test
2025-04-03 17:37:53 +08:00
Myhs_phz
9b489e9ea0
[Update] Revert math500 dataset configs (#1998) 2025-04-03 15:11:02 +08:00
Linchen Xiao
dc8deb6af0
[BUMP] Bump version to 0.4.2 (#1997) 2025-04-02 17:47:15 +08:00