MaiziXiao
|
1acb3c30c0
|
update
|
2025-05-08 07:26:18 +00:00 |
|
marcry
|
23fb3c7fa9
|
resolve dataset-index conflicts
|
2025-05-08 04:54:39 +00:00 |
|
marcry
|
adc33cd4f8
|
revise class name & remove csv file & add dataset-index.yml info
|
2025-05-07 14:35:48 +00:00 |
|
Dongsheng Zhu
|
ba0e32292c
|
[Feature] Support InternSandbox (#2049)
* internsandbox init
* internsandbox
* dataset_index
* dataset_index_add
|
2025-05-07 16:42:09 +08:00 |
|
谢昕辰
|
43b2c4ed76
|
[Fix] Update lawbench data path (#2037)
|
2025-05-07 16:18:43 +08:00 |
|
Dongsheng Zhu
|
d62b69aaef
|
[Fix] Fix InternVL model config (#2068)
* intervl-8b&38b
* intervl adjustment
* internvl fix
|
2025-05-07 15:51:18 +08:00 |
|
Linchen Xiao
|
af8432e1d6
|
[Update] OpenAI SDK model reasoning content (#2078)
* update
* update
* update
|
2025-05-07 14:06:40 +08:00 |
|
bittersweet1999
|
ddc9cc0afb
|
[Add] add a config to Judge dataset all (#2077)
* fix pip version
* fix pip version
* add judgedatasetall
* add judgedatasetall
* add judgedatasetall
|
2025-05-07 10:57:23 +08:00 |
|
marcry
|
e7b04afa3c
|
revise gen name
|
2025-05-06 13:11:44 +00:00 |
|
marcry
|
5ee365593e
|
revise gen name
|
2025-05-06 13:09:31 +00:00 |
|
bittersweet1999
|
37cbaf8d92
|
[Add] Add Judgerbenchv2 (#2067)
* fix pip version
* fix pip version
* add judgerbenchv2
* Update __init__.py
|
2025-04-30 17:12:34 +08:00 |
|
Taolin Zhang
|
b6148aa198
|
add Judgebench (#2066)
* add rewardbench
* add rewardbench
* add rmb datasets
* add rmb datasets
* add judgebench
* add judgebench
|
2025-04-30 15:01:10 +08:00 |
|
marcry
|
f953ad3178
|
add dataset files
|
2025-04-29 09:23:42 +00:00 |
|
marcry
|
48ac21f371
|
support nejm ai benchmark
|
2025-04-29 09:16:25 +00:00 |
|
bittersweet1999
|
527a80947b
|
[Add] Add writingbench (#2028)
* fix pip version
* fix pip version
* add writingbench
* add writingbench
* add writingbench
* add writingbench
|
2025-04-29 16:29:32 +08:00 |
|
Taolin Zhang
|
8c74e6a39e
|
add RMB Bench (#2056)
* add rewardbench
* add rewardbench
* add rmb datasets
* add rmb datasets
|
2025-04-27 16:26:01 +08:00 |
|
Linchen Xiao
|
e8bc8c1e8c
|
[Bug] Concat OpenaiSDK reasoning content (#2041)
* [Bug] Concat OpenaiSDK reasoning content
* [Bug] Concat OpenaiSDK reasoning content
* update
* update
|
2025-04-25 14:10:33 +08:00 |
|
Junnan Liu
|
97010dc4ce
|
[Update] Update dataset repeat concatenation (#2039)
|
2025-04-23 16:16:28 +08:00 |
|
Linchen Xiao
|
dcbf899369
|
[Bug] Fix SmolInstruct logger import (#2036)
|
2025-04-23 11:10:30 +08:00 |
|
Linchen Xiao
|
bf74f26603
|
[Update] Safe SmolInstruct meteor calculation (#2033)
|
2025-04-22 18:27:48 +08:00 |
|
Linchen Xiao
|
455bb05d1b
|
[Update] Update dataset configs (#2030)
* [Update] Update dataset configs
* Fix lint
|
2025-04-21 18:55:06 +08:00 |
|
Taolin Zhang
|
c69110361b
|
[Add] add rewardbench (#2029)
* add rewardbench
* add rewardbench
|
2025-04-21 17:18:51 +08:00 |
|
JuchengHu
|
a2093a81ef
|
[Dataset] Matbench (#2021)
* add support for matbench
* fix dataset path
* fix data load
* fix
* fix lint
---------
Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk>
Co-authored-by: Myhs-phz <demarcia2014@126.com>
|
2025-04-21 15:50:47 +08:00 |
|
Linchen Xiao
|
b2da1c08a8
|
[Dataset] Add SmolInstruct, Update Chembench (#2025)
* [Dataset] Add SmolInstruct, Update Chembench
* Add dataset metadata
* update
* update
* update
|
2025-04-18 17:21:29 +08:00 |
|
Linchen Xiao
|
65ff602cf5
|
[Update] Fix LLM Judge metrics calculation & Add reasoning content concat to OpenAI SDK
|
2025-04-15 11:33:16 +08:00 |
|
Myhs_phz
|
75e7834b59
|
[Feature] Add Datasets: ClimateQA,Physics (#2017)
* feat ClimateQA
* feat PHYSICS
* fix
* fix
* fix
* fix
|
2025-04-14 20:18:47 +08:00 |
|
Linchen Xiao
|
6a6a1a5c0b
|
[Feature] LLM Judge sanity check (#2012)
* update
* update
|
2025-04-11 19:01:39 +08:00 |
|
bittersweet1999
|
3f50b1dc49
|
[Fix] fix order bug Update arena_hard.py (#2015)
|
2025-04-11 16:59:40 +08:00 |
|
Junnan Liu
|
20660ab507
|
[Fix] Fix compare error when k is list in base_evaluator (#2010)
* fix gpass compare error of list k
* fix compare error in 177
|
2025-04-10 19:47:21 +08:00 |
|
Linchen Xiao
|
12213207b6
|
[Refactor] Refactorize openicl eval task (#1990)
* [Refactor] Refactorize openicl eval task
* update
|
2025-04-09 15:52:23 +08:00 |
|
zhulinJulia24
|
6ac9b06bc2
|
[ci] update baseline for kernel change of vllm and lmdeploy (#2011)
* update
* update
* update
* update
* update
* update
* update
|
2025-04-09 14:09:35 +08:00 |
|
Linchen Xiao
|
a05f9da134
|
[Feature] Make dump-eval-details default behavior (#1999)
* Update
* update
* update
|
2025-04-08 14:42:26 +08:00 |
|
Myhs_phz
|
fd82bea747
|
[Fix] OpenICL Math Evaluator Config (#2007)
* fix
* fix recommended
* fix
* fix
* fix
* fix
|
2025-04-08 14:38:35 +08:00 |
|
Linchen Xiao
|
bb58cfc85d
|
[Feature] Add CascadeEvaluator (#1992)
* [Feature] Add CascadeEvaluator
* update
* update
|
2025-04-08 11:58:14 +08:00 |
|
Jin Ye
|
b564e608b1
|
[Dataset] Add MedXpertQA (#2002)
* Add MedXpertQA
* Add MedXpertQA
* Add MedXpertQA
* Fix lint
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
|
2025-04-08 10:44:48 +08:00 |
|
shijinpjlab
|
828fb745c9
|
[Dataset] Update dingo 1.5.0 (#2008)
Co-authored-by: shiin <shijin@pjlab.org.cn>
|
2025-04-07 17:21:15 +08:00 |
|
zhulinJulia24
|
f982d6278e
|
[CI] fix baseline score (#2000)
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
|
2025-04-03 19:32:36 +08:00 |
|
Myhs_phz
|
3a9a384173
|
[Doc] Fix links between zh & en (#2001)
* test
* test
* test
|
2025-04-03 17:37:53 +08:00 |
|
Myhs_phz
|
9b489e9ea0
|
[Update] Revert math500 dataset configs (#1998)
|
2025-04-03 15:11:02 +08:00 |
|
Linchen Xiao
|
dc8deb6af0
|
[BUMP] Bump version to 0.4.2 (#1997)
|
2025-04-02 17:47:15 +08:00 |
|
liushz
|
32d6859679
|
[Feature] Add olymmath dataset (#1982)
* Add olymmath dataset
* Add olymmath dataset
* Add olymmath dataset
* Update olymmath dataset
|
2025-04-02 17:34:07 +08:00 |
|
zhulinJulia24
|
97236c8e97
|
[CI] Fix baseline score (#1996)
* update
* update
* update
* update
|
2025-04-02 14:25:16 +08:00 |
|
Linchen Xiao
|
f66b0b347a
|
[Update] Requirements update (#1993)
|
2025-04-02 12:03:45 +08:00 |
|
Dongsheng Zhu
|
330a6e5ca7
|
[Update] Add Intervl-8b&38b model configs (#1978)
|
2025-04-01 11:51:37 +08:00 |
|
Myhs_phz
|
f71eb78c72
|
[Doc] Add TBD Token in Datasets Statistics (#1986)
* feat
* doc
* doc
* doc
* doc
|
2025-03-31 19:08:55 +08:00 |
|
Linchen Xiao
|
0f46c35211
|
[Bug] Aime2024 config fix (#1974)
* [Bug] Aime2024 config fix
* fix
|
2025-03-25 17:57:11 +08:00 |
|
Myhs_phz
|
6118596362
|
[Feature] Add recommendation configs for datasets (#1937)
* feat datasetrefine drop
* fix datasets in fullbench_int3
* fix
* fix
* back
* fix
* fix and doc
* feat
* fix hook
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* doc
* fix
* fix
* Update dataset-index.yml
|
2025-03-25 14:54:13 +08:00 |
|
Linchen Xiao
|
07930b854a
|
[Update] Add Korbench config with no max_out_len (#1968)
* Add Korbench no max_out_len
* Add Korbench no max_out_len
|
2025-03-24 18:38:06 +08:00 |
|
Myhs_phz
|
37307fa996
|
[Update] Add QWQ32b model config (#1959)
* feat qwq-32b
* fix
* feat phi_4
---------
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
|
2025-03-24 14:51:39 +08:00 |
|
Linchen Xiao
|
db96161a4e
|
[Update] Add SuperGPQA subset metrics (#1966)
|
2025-03-24 14:25:12 +08:00 |
|