leoyizhang
d679be0cf6
format rbench.py by isort
2025-05-28 14:59:49 +08:00
leoyizhang
01af69a685
fixed lint
2025-05-14 18:33:07 +08:00
leoyizhang
5d8c96b001
[Dataset] Add R-Bench (ICML 2025)
2025-05-11 13:26:25 +08:00
huihui1999
44a7024ed5
[Dataset] MedCalc_Bench ( #2072 )
...
* MedCalc_Bench
* MedCal_Bench
* add hash
* fix hash
* fix comments & dataset-index yml
* fix lint
* fix lint
* fix lint
* fix lint
* fix lint
---------
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2025-05-09 16:58:55 +08:00
Linchen Xiao
508e2b0cb2
[Update] Set load_from_cache_file to False ( #2085 )
2025-05-09 15:21:47 +08:00
Jin Ye
6097186a95
[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 ( #2064 )
...
* Add Datasets: MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1
* Fix bugs for MedQA. Add info in dataset-index
* Add version code for MedQA and ProteinLMBench
* Add version code for MedQA and ProteinLMBench
2025-05-09 14:47:44 +08:00
Linchen Xiao
d72df59363
[Revert] Add Lifescience Sub-set Support for SciEval ( #2059 ) ( #2087 )
...
This reverts commit c5048bfec7.
2025-05-09 14:46:27 +08:00
tcheng
c5048bfec7
[Dataset] Add Lifescience Sub-set Support for SciEval ( #2059 )
...
* style: pass all formatting hooks (yapf & quote fixer)
* revise name: Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader)
* revise name: Add Lifescience SciEval (datasets + configs + loader + dataset-index.yml)
* Add Lifescience SciEval (datasets + configs + loader + dataset-index.yml)
---------
Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>
2025-05-09 14:31:12 +08:00
huihui1999
a7f3ac20b2
[Dataset] Add CARDBiomedBench ( #2071 )
...
* CARDBiomedBench
* fix hash
* fix dataset-index
* use official llmjudge postprocess
* use official llmjudge_postprocess
* fix lint
* fix init
2025-05-08 19:44:46 +08:00
Wei Li
a685ed7daf
[Dataset] Add nejm ai benchmark ( #2063 )
...
* support nejm ai benchmark
* add dataset files
* revise gen name
* revise gen name
* revise class name & remove csv file & add dataset-index.yml info
* update
* update
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-05-08 16:44:05 +08:00
Jiahao Xu
9ec23c145b
[Datasets] Add ClinicBench, PubMedQA and ScienceQA ( #2061 )
...
* Add ClinicBench
* Add PubMedQA & ScienceQA & ClinicBench
* Add PubMedQA & ScienceQA & ClinicBench
* Update datasets_info & hf_path
* Update hf_path
2025-05-08 16:25:43 +08:00
Dongsheng Zhu
ba0e32292c
[Feature] Support InternSandbox ( #2049 )
...
* internsandbox init
* internsandbox
* dataset_index
* dataset_index_add
2025-05-07 16:42:09 +08:00
谢昕辰
43b2c4ed76
[Fix] Update lawbench data path ( #2037 )
2025-05-07 16:18:43 +08:00
bittersweet1999
37cbaf8d92
[Add] Add Judgerbenchv2 ( #2067 )
...
* fix pip version
* fix pip version
* add judgerbenchv2
* Update __init__.py
2025-04-30 17:12:34 +08:00
Taolin Zhang
b6148aa198
add Judgebench ( #2066 )
...
* add rewardbench
* add rewardbench
* add rmb datasets
* add rmb datasets
* add judgebench
* add judgebench
2025-04-30 15:01:10 +08:00
bittersweet1999
527a80947b
[Add] Add writingbench ( #2028 )
...
* fix pip version
* fix pip version
* add writingbench
* add writingbench
* add writingbench
* add writingbench
2025-04-29 16:29:32 +08:00
Taolin Zhang
8c74e6a39e
add RMB Bench ( #2056 )
...
* add rewardbench
* add rewardbench
* add rmb datasets
* add rmb datasets
2025-04-27 16:26:01 +08:00
Junnan Liu
97010dc4ce
[Update] Update dataset repeat concatenation ( #2039 )
2025-04-23 16:16:28 +08:00
Linchen Xiao
dcbf899369
[Bug] Fix SmolInstruct logger import ( #2036 )
2025-04-23 11:10:30 +08:00
Linchen Xiao
bf74f26603
[Update] Safe SmolInstruct meteor calculation ( #2033 )
2025-04-22 18:27:48 +08:00
Linchen Xiao
455bb05d1b
[Update] Update dataset configs ( #2030 )
...
* [Update] Update dataset configs
* Fix lint
2025-04-21 18:55:06 +08:00
Taolin Zhang
c69110361b
[Add] add rewardbench ( #2029 )
...
* add rewardbench
* add rewardbench
2025-04-21 17:18:51 +08:00
JuchengHu
a2093a81ef
[Dataset] Matbench ( #2021 )
...
* add support for matbench
* fix dataset path
* fix data load
* fix
* fix lint
---------
Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk>
Co-authored-by: Myhs-phz <demarcia2014@126.com>
2025-04-21 15:50:47 +08:00
Linchen Xiao
b2da1c08a8
[Dataset] Add SmolInstruct, Update Chembench ( #2025 )
...
* [Dataset] Add SmolInstruct, Update Chembench
* Add dataset metadata
* update
* update
* update
2025-04-18 17:21:29 +08:00
Linchen Xiao
65ff602cf5
[Update] Fix LLM Judge metrics calculation & Add reasoning content concat to OpenAI SDK
2025-04-15 11:33:16 +08:00
Myhs_phz
75e7834b59
[Feature] Add Datasets: ClimateQA, Physics ( #2017 )
...
* feat ClimateQA
* feat PHYSICS
* fix
* fix
* fix
* fix
2025-04-14 20:18:47 +08:00
Linchen Xiao
6a6a1a5c0b
[Feature] LLM Judge sanity check ( #2012 )
...
* update
* update
2025-04-11 19:01:39 +08:00
bittersweet1999
3f50b1dc49
[Fix] fix order bug Update arena_hard.py ( #2015 )
2025-04-11 16:59:40 +08:00
zhulinJulia24
6ac9b06bc2
[ci] update baseline for kernel change of vllm and lmdeploy ( #2011 )
...
* update
* update
* update
* update
* update
* update
* update
2025-04-09 14:09:35 +08:00
Jin Ye
b564e608b1
[Dataset] Add MedXpertQA ( #2002 )
...
* Add MedXpertQA
* Add MedXpertQA
* Add MedXpertQA
* Fix lint
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-04-08 10:44:48 +08:00
shijinpjlab
828fb745c9
[Dataset] Update dingo 1.5.0 ( #2008 )
...
Co-authored-by: shiin <shijin@pjlab.org.cn>
2025-04-07 17:21:15 +08:00
liushz
32d6859679
[Feature] Add olymmath dataset ( #1982 )
...
* Add olymmath dataset
* Add olymmath dataset
* Add olymmath dataset
* Update olymmath dataset
2025-04-02 17:34:07 +08:00
Linchen Xiao
f66b0b347a
[Update] Requirements update ( #1993 )
2025-04-02 12:03:45 +08:00
Linchen Xiao
db96161a4e
[Update] Add SuperGPQA subset metrics ( #1966 )
2025-03-24 14:25:12 +08:00
Dongsheng Zhu
8a5029b121
[Feature] Add MultiPL-E & Code Evaluator ( #1963 )
...
* multiple_code develop
* multiple_code update
* comments update
* index update
2025-03-21 20:09:25 +08:00
Linchen Xiao
1c60e3a0f6
[Update] Add configurations for llmjudge dataset ( #1940 )
...
* Add configurations for llmjudge dataset
* update
2025-03-13 17:30:04 +08:00
Yufeng Zhao
bc2969dba8
[Feature] Add support for BBEH dataset ( #1925 )
...
* bbeh
* bbeh
* fix_smallbugs_bbeh
* removeprint
* results
---------
Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
2025-03-12 10:53:31 +08:00
Kangreen
59e49aedf1
[Feature] Support SuperGPQA ( #1924 )
...
* support supergpqa
* remove unnecessary code
* remove unnecessary code
* Add Readme
* Add Readme
* fix lint
* fix lint
* update
* update
---------
Co-authored-by: mkj3085003 <mkj3085003@gmail.com>
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-03-11 19:32:08 +08:00
Dongsheng Zhu
fff2d51440
[Update] Code evaluation alignment ( #1909 )
...
* code alignment
* update oss md5
* bigcodebench update
* lint
* lint_
* lint yapf
2025-03-04 18:49:38 +08:00
liushz
198c08632e
[Feature] Add HLE (Humanity's Last Exam) dataset ( #1902 )
...
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* update dataset path
* Update OlympiadBench
* Update OlympiadBench
* Update OlympiadBench
* Add HLE dataset
* Add HLE dataset
* Add HLE dataset
---------
Co-authored-by: sudanl <sudanl@foxmail.com>
2025-03-04 16:42:37 +08:00
Songyang Zhang
c84bc18ac1
[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard ( #1899 )
...
* [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify
* Update
* Update
* Update DeepSeek-R1 example
* Update DeepSeek-R1 example
* Update DeepSeek-R1 example
2025-03-03 18:56:11 +08:00
Linchen Xiao
6a573f671b
[Fix] Fix compatible issue
2025-03-03 15:35:57 +08:00
Junnan Liu
73c80953c6
[Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator ( #1886 )
...
* support dataset repeat and g-pass compute for each evaluator
* fix pre-commit errors
* delete print
* delete gpassk_evaluator and fix potential errors
* change `repeat` to `n`
* fix `repeat` to `n` in openicl_eval
* update doc for multi-run and g-pass
* update latex equation in doc
* update eng doc for multi-run and g-pass
* update datasets.md
* update datasets.md
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation
* fix multi-line equation in zh_cn user_guides
* modify pre-commit-zh-cn
* recover pre-commit and edit math expr in doc
* del [TIP]
* del cite tag in doc
* del extract_model param in livemathbench config
2025-02-26 19:43:12 +08:00
Songyang Zhang
fd6fbf01a2
[Update] Support AIME-24 Evaluation for DeepSeek-R1 series ( #1888 )
...
* Update
* Update
* Update
* Update
2025-02-25 20:34:41 +08:00
Junnan Liu
22a33d8759
[Update] Update LiveMathBench Hard Configs ( #1826 )
...
* support G-Pass@k and livemathbench
* fix bugs
* fix comments of GPassKEvaluator
* update saved details of GPassKEvaluator
* update saved details of GPassKEvaluator
* fix eval api configs & update openai_api for ease of debugging
* update huggingface path
* fix method name of G-Pass@k
* fix default value of eval_model_name
* refactor G-Pass@k evaluator
* log generation params for each backend
* fix evaluation resume
* add notimplementerror
* update livemathbench-hard configs
* remove max_out_len from livemathbench_hard_greedy_gen_9befbf.py
* remove max_out_len from livemathbench_hard_gen_9befbf.py
* rename livemathbench_hard_gen_9befbf.py to livemathbench_hard_gen_353ae7.py
* rename livemathbench_hard_greedy_gen_9befbf.py to livemathbench_hard_greedy_gen_353ae7.py
* update livemathbench_gen_9befbf.py
* remove whitespace
* upload livemathbench hard configs
2025-02-25 17:24:36 +08:00
Dongsheng Zhu
465e93e10e
[Update] Academic bench llm judge update ( #1876 )
...
* BigCodeBench update
* update LCBench
* update LCBench 2
* update code
* academicBench update
* academic bench ifeval&math update
* generic_llmjudge_aime_academic_postprocess delete
* aime delete
* postprocessors update
* ifeval delete
* update work_dir
* linting
* linting double-quote-string-fixer
* r1-distill out_len update
* fix lint
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2025-02-24 15:45:24 +08:00
Linchen Xiao
d7daee6e25
[Update] OpenAI model update, bigcodebench update ( #1879 )
...
* [Update] Openai model update, bigcodebench update
* update
2025-02-20 19:33:25 +08:00
Linchen Xiao
27c916661d
[Feature] Math Verify with model post_processor ( #1881 )
...
* update
* [Feature] Update model post_processor
* update
* update
* update
2025-02-20 19:32:12 +08:00
Dongsheng Zhu
3fd8b4e0cd
[Update] Update BigCodeBench & LCBench load path ( #1857 )
...
* BigCodeBench update
* update LCBench
* update LCBench 2
* update code
2025-02-08 15:15:47 +08:00
Shudong Liu
412199f802
[Feature] Support OlympiadBench Benchmark ( #1841 )
...
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* Support OlympiadBench Benchmark
* update dataset path
* Update OlympiadBench
* Update OlympiadBench
* Update OlympiadBench
---------
Co-authored-by: liushz <qq1791167085@163.com>
2025-01-24 10:00:01 +08:00