OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
leoyizhang	d679be0cf6	format rbench.py by isort	2025-05-28 14:59:49 +08:00
leoyizhang	01af69a685	fixed lint	2025-05-14 18:33:07 +08:00
leoyizhang	5d8c96b001	[Dataset] Add R-Bench (ICML 2025)	2025-05-11 13:26:25 +08:00
huihui1999	44a7024ed5	[Dataset] MedCalc_Bench (#2072 ) * MedCalc_Bench * MedCal_Bench * add hash * fix hash * fix comments &dataset-index yml * fix lint * fix lint * fix lint * fix lint * fix lint --------- Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2025-05-09 16:58:55 +08:00
Linchen Xiao	508e2b0cb2	[Update] Set load_from_cache_file to False (#2085 )	2025-05-09 15:21:47 +08:00
Jin Ye	6097186a95	[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 (#2064 ) * Add Datasets: MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 * Fix bugs for MedQA. Add info in dataset-index * Add version code for MedQA and ProteinLMBench * Add version code for MedQA and ProteinLMBench	2025-05-09 14:47:44 +08:00
Linchen Xiao	d72df59363	[Revert] Add Lifescience Sub-set Support for SciEval (#2059 ) (#2087 ) This reverts commit `c5048bfec7`.	2025-05-09 14:46:27 +08:00
tcheng	c5048bfec7	[Dataset] Add Lifescience Sub-set Support for SciEval (#2059 ) * style: pass all formatting hooks (yapf & quote fixer) * revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader) * revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) * Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) --------- Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>	2025-05-09 14:31:12 +08:00
huihui1999	a7f3ac20b2	[Dataset] Add CARDBiomedBench (#2071 ) * CARDBiomedBench * fix hash * fix dataset-index * use official llmjudge postprocess * use official llmjudge_postprocess * fix lint * fix init	2025-05-08 19:44:46 +08:00
Wei Li	a685ed7daf	[Dataset] Add nejm ai benchmark (#2063 ) * support nejm ai benchmark * add dataset files * revise gen name * revise gen name * revise class name & remove csv file & add dataset-index.yml info * update * update --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-05-08 16:44:05 +08:00
Jiahao Xu	9ec23c145b	[Datasets] Add ClinicBench, PubMedQA and ScienceQA (#2061 ) * Add ClinicBench * Add PubMedQA & ScienceQA & ClinicBench * Add PubMedQA & ScienceQA & ClinicBench * Update datasets_info & hf_path * Update hf_path	2025-05-08 16:25:43 +08:00
Dongsheng Zhu	ba0e32292c	[Feature] Support InternSandbox (#2049 ) * internsandbox init * internsandbox * dataset_index * dataset_index_add	2025-05-07 16:42:09 +08:00
谢昕辰	43b2c4ed76	[Fix] Update lawbench data path (#2037 )	2025-05-07 16:18:43 +08:00
bittersweet1999	37cbaf8d92	[Add] Add Judgerbenchv2 (#2067 ) * fix pip version * fix pip version * add judgerbenchv2 * Update __init__.py	2025-04-30 17:12:34 +08:00
Taolin Zhang	b6148aa198	add Judgebench (#2066 ) * add rewardbench * add rewardbench * add rmb datasets * add rmb datasets * add judgebench * add judgebench	2025-04-30 15:01:10 +08:00
bittersweet1999	527a80947b	[Add] Add writingbench (#2028 ) * fix pip version * fix pip version * add writingbench * add writingbench * add writingbench * add writingbench	2025-04-29 16:29:32 +08:00
Taolin Zhang	8c74e6a39e	add RMB Bench (#2056 ) * add rewardbench * add rewardbench * add rmb datasets * add rmb datasets	2025-04-27 16:26:01 +08:00
Junnan Liu	97010dc4ce	[Update] Update dataset repeat concatenation (#2039 )	2025-04-23 16:16:28 +08:00
Linchen Xiao	dcbf899369	[Bug] Fix SmolInsturct logger import (#2036 )	2025-04-23 11:10:30 +08:00
Linchen Xiao	bf74f26603	[Update] Safe SmolInstruct meteor calculation (#2033 )	2025-04-22 18:27:48 +08:00
Linchen Xiao	455bb05d1b	[Update] Update dataset configs (#2030 ) * [Update] Update dataset configs * Fix lint	2025-04-21 18:55:06 +08:00
Taolin Zhang	c69110361b	[Add] add rewardbench (#2029 ) * add rewardbench * add rewardbench	2025-04-21 17:18:51 +08:00
JuchengHu	a2093a81ef	[Dataset] Matbench (#2021 ) * add support for matbench * fix dataset path * fix data load * fix * fix lint --------- Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk> Co-authored-by: Myhs-phz <demarcia2014@126.com>	2025-04-21 15:50:47 +08:00
Linchen Xiao	b2da1c08a8	[Dataset] Add SmolInstruct, Update Chembench (#2025 ) * [Dataset] Add SmolInstruct, Update Chembench * Add dataset metadata * update * update * update	2025-04-18 17:21:29 +08:00
Linchen Xiao	65ff602cf5	[Update] Fix LLM Judge metrics cacluation & Add reasoning content concat to OpenAI SDK	2025-04-15 11:33:16 +08:00
Myhs_phz	75e7834b59	[Feature] Add Datasets: ClimateQA,Physics (#2017 ) * feat ClimateQA * feat PHYSICS * fix * fix * fix * fix	2025-04-14 20:18:47 +08:00
Linchen Xiao	6a6a1a5c0b	[Feature] LLM Judge sanity check (#2012 ) * update * update	2025-04-11 19:01:39 +08:00
bittersweet1999	3f50b1dc49	[Fix] fix order bug Update arena_hard.py (#2015 )	2025-04-11 16:59:40 +08:00
zhulinJulia24	6ac9b06bc2	[ci] update baseline for kernal change of vllm and lmdeploy (#2011 ) * update * update * update * update * update * update * update	2025-04-09 14:09:35 +08:00
Jin Ye	b564e608b1	[Dataset] Add MedXpertQA (#2002 ) * Add MedXpertQA * Add MedXpertQA * Add MedXpertQA * Fix lint --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-04-08 10:44:48 +08:00
shijinpjlab	828fb745c9	[Dataset] Update dingo 1.5.0 (#2008 ) Co-authored-by: shiin <shijin@pjlab.org.cn>	2025-04-07 17:21:15 +08:00
liushz	32d6859679	[Feature] Add olymmath dataset (#1982 ) * Add olymmath dataset * Add olymmath dataset * Add olymmath dataset * Update olymmath dataset	2025-04-02 17:34:07 +08:00
Linchen Xiao	f66b0b347a	[Update] Requirements update (#1993 )	2025-04-02 12:03:45 +08:00
Linchen Xiao	db96161a4e	[Update] Add SuperGPQA subset metrics (#1966 )	2025-03-24 14:25:12 +08:00
Dongsheng Zhu	8a5029b121	[Feature] Add MultiPL-E & Code Evaluator (#1963 ) * multiple_code develop * multiple_code update * comments upadate * index upadate	2025-03-21 20:09:25 +08:00
Linchen Xiao	1c60e3a0f6	[Update] Add configurations for llmjudge dataset (#1940 ) * Add configurations for llmjudge dataset * update	2025-03-13 17:30:04 +08:00
Yufeng Zhao	bc2969dba8	[Feature] Add support for BBEH dataset (#1925 ) * bbeh * bbeh * fix_smallbugs_bbeh * removeprint * results --------- Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>	2025-03-12 10:53:31 +08:00
Kangreen	59e49aedf1	[Feature] Support SuperGPQA (#1924 ) * support supergpqa * remove unnecessary code * remove unnecessary code * Add Readme * Add Readme * fix lint * fix lint * update * update --------- Co-authored-by: mkj3085003 <mkj3085003@gmail.com> Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-03-11 19:32:08 +08:00
Dongsheng Zhu	fff2d51440	[Update] Code evaluation alignment (#1909 ) * code alignment * update oss md5 * bigcodebench update * lint * lint_ * lint yapf	2025-03-04 18:49:38 +08:00
liushz	198c08632e	[Feature] Add HLE (Humanity's Last Exam) dataset (#1902 ) * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * update dataset path * Update olmpiadBench * Update olmpiadBench * Update olmpiadBench * Add HLE dataset * Add HLE dataset * Add HLE dataset --------- Co-authored-by: sudanl <sudanl@foxmail.com>	2025-03-04 16:42:37 +08:00
Songyang Zhang	c84bc18ac1	[Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard (#1899 ) * [Update] Support OlympiadBench-Math/OmniMath/LiveMathBench-Hard with LLM Verify * Update * Update * Update DeepSeek-R1 example * Update DeepSeek-R1 example * Update DeepSeek-R1 example	2025-03-03 18:56:11 +08:00
Linchen Xiao	6a573f671b	[Fix] Fix compatible issue	2025-03-03 15:35:57 +08:00
Junnan Liu	73c80953c6	[Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator (#1886 ) * support dataset repeat and g-pass compute for each evaluator * fix pre-commit errors * delete print * delete gpassk_evaluator and fix potential errors * change `repeat` to `n` * fix `repeat` to `n` in openicl_eval * update doc for multi-run and g-pass * update latex equation in doc * update eng doc for multi-run and g-pass * update datasets.md * update datasets.md * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation * fix multi-line equation in zh_cn user_guides * mmodify pre-commit-zh-cn * recover pre-commit and edit math expr in doc * del [TIP] * del cite tag in doc * del extract_model param in livemathbench config	2025-02-26 19:43:12 +08:00
Songyang Zhang	fd6fbf01a2	[Update] Support AIME-24 Evaluation for DeepSeek-R1 series (#1888 ) * Update * Update * Update * Update	2025-02-25 20:34:41 +08:00
Junnan Liu	22a33d8759	[Update] Update LiveMathBench Hard Configs (#1826 ) * support G-Pass@k and livemathbench * fix bugs * fix comments of GPassKEvaluator * update saved details of GPassKEvaluator * update saved details of GPassKEvaluator * fix eval api configs & update openai_api for ease of debugging * update huggingface path * fix method name of G-Pass@k * fix default value of eval_model_name * refactor G-Pass@k evaluator * log generation params for each backend * fix evaluation resume * add notimplementerror * update livemathbench-hard configs * remove max_out_len from livemathbench_hard_greedy_gen_9befbf.py * remove max_out_len from livemathbench_hard_gen_9befbf.py * rename livemathbench_hard_gen_9befbf.py to livemathbench_hard_gen_353ae7.py * rename livemathbench_hard_greedy_gen_9befbf.py to livemathbench_hard_greedy_gen_353ae7.py * update livemathbench_gen_9befbf.py * remove whitespace * upload livemathbench hard configs	2025-02-25 17:24:36 +08:00
Dongsheng Zhu	465e93e10e	[Update] Academic bench llm judge update (#1876 ) * BigCodeBench update * update LCBench * update LCBench 2 * update code * academicBench update * academic bench ifeval&math update * generic_llmjudge_aime_academic_postprocess delete * aime delete * postprocessors update * ifeval delete * update work_dir * linting * linting double-quote-string-fixer * r1-distill out_len update * fix lint --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-02-24 15:45:24 +08:00
Linchen Xiao	d7daee6e25	[Update] OpenAI model update, bigcodebench update (#1879 ) * [Update] Openai model update, bigcodebench update * update	2025-02-20 19:33:25 +08:00
Linchen Xiao	27c916661d	[Feature] Math Verify with model post_processor (#1881 ) * update * [Feature] Update model post_processor * update * update * update	2025-02-20 19:32:12 +08:00
Dongsheng Zhu	3fd8b4e0cd	[Update] Update BigCodeBench & LCBench load path (#1857 ) * BigCodeBench update * update LCBench * update LCBench 2 * update code	2025-02-08 15:15:47 +08:00
Shudong Liu	412199f802	[Feature] Support OlympiadBench Benchmark (#1841 ) * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * update dataset path * Update olmpiadBench * Update olmpiadBench * Update olmpiadBench --------- Co-authored-by: liushz <qq1791167085@163.com>	2025-01-24 10:00:01 +08:00

296 Commits