OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Mo Li	7a44a80bb9	Merge `03f16c8a83` into `d572761cef`	2025-05-29 14:22:59 +08:00
Yu Sun	d572761cef	[Dataset] Add Smolinstruct configs (#2127) * 0-shot Smolinstruct Add 0-shot evaluation and postprocess functions for Smolinstruct * fix acc postprocessor * update 0-shot acc postprocessor * rename 0-shot	2025-05-29 14:09:08 +08:00
Linchen Xiao	408f5caff4	[Dataset] Add SuperGPQA subfield configs (#2124) * update * fix lint * fix lint * update precommit * update precommit * fix lint	2025-05-28 14:12:58 +08:00
Myhs_phz	6f3c670b99	add qwen3 lmdeply (#2126)	2025-05-27 19:41:13 +08:00
zhulinJulia24	c3779ebfc1	[ci] update dlc setting (#2112)	2025-05-22 16:47:57 +08:00
Songyang Zhang	aa2b89b6f8	[Update] Add CascadeEvaluator with Data Replica (#2022) * Update CascadeEvaluator * Update CascadeEvaluator * Update CascadeEvaluator * Update Config * Update * Update * Update * Update * Update * Update * Update * Update * Update * Update * Update * Update * Update * Update * Update	2025-05-20 16:46:55 +08:00
Dongsheng Zhu	7a7a4517ab	[Update] History code bench pass@k update (#2102) * bigcodebench * humaneval * humanevalx * humanevalx * livecodebench * mbpp * humaneval_plus * fix bug * template * max_out fix * template update	2025-05-19 17:03:33 +08:00
kkscilife	8c0ccf9a6b	[CI] Fix Lint error (#2103)	2025-05-16 15:36:45 +08:00
kkscilife	6f3b6a5d12	[CI] Add gitleaks check (#2101)	2025-05-16 14:34:57 +08:00
tcheng	3d1760aba2	[Dataset] Add Scieval (#2089) * style: pass all formatting hooks (yapf & quote fixer) * revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader) * revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) * Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) * all categories of SciEval (datasets + configs + loader+dataset-index.yml) * revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) * revise :SciEval 5shot --------- Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>	2025-05-14 10:25:03 +08:00
Wei Li	b84518c656	[Dataset] Support MedMCQA and MedBullets benchmark (#2054) * support medmcqa and medbullets benchmark * Add Medbullets data folder for benchmark support * revise gen name * revise config file & remove csv file & add dataset info to dataset-index.yml * remove csv file * remove print in medbullets.py * revise class name * update_oss_info --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-05-13 17:10:50 +08:00
Mor-Li	03f16c8a83	[Fix] Fix precommit	2025-05-13 14:59:32 +08:00
Mor-Li	a0c3a24aa1	[Docs] Update Default Settings for NeedleBench and ATC Configs	2025-05-13 14:56:46 +08:00
Mor-Li	98a6f6119b	[Docs] update NeedleBenchV2 Docs	2025-05-13 14:32:26 +08:00
Mor-Li	f7242fdea8	Merge branch 'update_needlebench_docs' into needlebench_v2_pr	2025-05-13 14:19:48 +08:00
Mor-Li	35518f612f	[Docs] Update NeedleBench Docs	2025-05-13 14:17:11 +08:00
zhulinJulia24	d60f59dcab	[CI] update baseline and fix lmdeploy version (#2098) * update * update * update * update * update * update	2025-05-13 14:01:47 +08:00
bittersweet1999	9eaa1f6fec	Update icl_judge_evaluator.py (#2095)	2025-05-13 10:44:24 +08:00
Linchen Xiao	d590f557bb	[Update] OpenaiSDK handle empty content (#2096)	2025-05-12 19:38:30 +08:00
yuehua-s	c492e49e79	[Update] Add o4 in OpenaiSDK (#2083) * feature:1.add o4-mini;2.o3 or o4-mini only support temperature==1 * feature:change 4o-mini to 4o --------- Co-authored-by: yuehuazhang <yuehuazhang@tencent.com>	2025-05-12 18:39:44 +08:00
Dongsheng Zhu	2c79dc5227	[Dataset] Add human_eval/mbpp pro (#2092) * add bench * update * bug fix * time update * add index * fix repeat bug	2025-05-12 18:38:13 +08:00
huihui1999	345674f700	[Dataset] Add SciknowEval Dataset (#2070) * first * first * first * first * SciKnowEval * fix hash * fix dataset-index & use official llm_judge_postprocess * fix dataset-index.yml * use official llmjudge_postprocess * fix lint * fix lint * fix lint * fix lint * fix lint * merge with main --------- Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2025-05-12 17:23:44 +08:00
Kun Yuan	8aa18df368	[Dataset] HLE Biomedical version support (#2080) * HLE Biomedical version support * set up default category value for hle	2025-05-12 10:14:11 +08:00
Mor-Li	d75494841d	remove choice version	2025-05-09 20:21:24 +08:00
Mor-Li	40c6c68162	[Fix] Fix pre-commit	2025-05-09 20:19:04 +08:00
Mor-Li	d1da4a577c	Add NeedleBench_V2	2025-05-09 19:37:39 +08:00
huihui1999	44a7024ed5	[Dataset] MedCalc_Bench (#2072) * MedCalc_Bench * MedCal_Bench * add hash * fix hash * fix comments &dataset-index yml * fix lint * fix lint * fix lint * fix lint * fix lint --------- Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2025-05-09 16:58:55 +08:00
Linchen Xiao	508e2b0cb2	[Update] Set load_from_cache_file to False (#2085)	2025-05-09 15:21:47 +08:00
Kun Yuan	7bdd3c1904	[Dataset] MMLU_Pro Biomedical Version Support (#2081)	2025-05-09 15:07:26 +08:00
Jin Ye	6097186a95	[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 (#2064) * Add Datasets: MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 * Fix bugs for MedQA. Add info in dataset-index * Add version code for MedQA and ProteinLMBench * Add version code for MedQA and ProteinLMBench	2025-05-09 14:47:44 +08:00
Linchen Xiao	d72df59363	[Revert] Add Lifescience Sub-set Support for SciEval (#2059) (#2087) This reverts commit `c5048bfec7`.	2025-05-09 14:46:27 +08:00
tcheng	c5048bfec7	[Dataset] Add Lifescience Sub-set Support for SciEval (#2059) * style: pass all formatting hooks (yapf & quote fixer) * revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader) * revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) * Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) --------- Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>	2025-05-09 14:31:12 +08:00
huihui1999	a7f3ac20b2	[Dataset] Add CARDBiomedBench (#2071) * CARDBiomedBench * fix hash * fix dataset-index * use official llmjudge postprocess * use official llmjudge_postprocess * fix lint * fix init	2025-05-08 19:44:46 +08:00
Mo Li	ff3275edf0	[Update] Add Long-Context configs for Gemma, OREAL, and Qwen2.5 models (#2048) * [Update] Update Gemma, Oreal, Qwen Config * fix lint	2025-05-08 19:06:56 +08:00
Wei Li	a685ed7daf	[Dataset] Add nejm ai benchmark (#2063) * support nejm ai benchmark * add dataset files * revise gen name * revise gen name * revise class name & remove csv file & add dataset-index.yml info * update * update --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-05-08 16:44:05 +08:00
Jiahao Xu	9ec23c145b	[Datasets] Add ClinicBench, PubMedQA and ScienceQA (#2061) * Add ClinicBench * Add PubMedQA & ScienceQA & ClinicBench * Add PubMedQA & ScienceQA & ClinicBench * Update datasets_info & hf_path * Update hf_path	2025-05-08 16:25:43 +08:00
Dongsheng Zhu	ba0e32292c	[Feature] Support InternSandbox (#2049) * internsandbox init * internsandbox * dataset_index * dataset_index_add	2025-05-07 16:42:09 +08:00
谢昕辰	43b2c4ed76	[Fix] Update lawbench data path (#2037)	2025-05-07 16:18:43 +08:00
Dongsheng Zhu	d62b69aaef	[Fix] Fix InternVL model config (#2068) * intervl-8b&38b * intervl adjustment * internvl fix	2025-05-07 15:51:18 +08:00
Linchen Xiao	af8432e1d6	[Update] OpenAI SDK model reasoning content (#2078) * update * update * update	2025-05-07 14:06:40 +08:00
bittersweet1999	ddc9cc0afb	[Add] add a config to Judge dataset all (#2077) * fix pip version * fix pip version * add judgedatasetall * add judgedatasetall * add judgedatasetall	2025-05-07 10:57:23 +08:00
bittersweet1999	37cbaf8d92	[Add] Add Judgerbenchv2 (#2067) * fix pip version * fix pip version * add judgerbenchv2 * Update __init__.py	2025-04-30 17:12:34 +08:00
Taolin Zhang	b6148aa198	add Judgebench (#2066) * add rewardbench * add rewardbench * add rmb datasets * add rmb datasets * add judgebench * add judgebench	2025-04-30 15:01:10 +08:00
bittersweet1999	527a80947b	[Add] Add writingbench (#2028) * fix pip version * fix pip version * add writingbench * add writingbench * add writingbench * add writingbench	2025-04-29 16:29:32 +08:00
Mor-Li	f8e41dfeb4	[Docs] fix needlebench examples	2025-04-27 16:36:59 +08:00
Taolin Zhang	8c74e6a39e	add RMB Bench (#2056) * add rewardbench * add rewardbench * add rmb datasets * add rmb datasets	2025-04-27 16:26:01 +08:00
Mor-Li	890f051609	update docs typo	2025-04-26 13:38:32 +08:00
Mor-Li	831713ba5d	update docs	2025-04-26 13:35:45 +08:00
Mor-Li	ca1865cdac	update docs typo	2025-04-26 13:34:12 +08:00
Mor-Li	7297a00181	update bilingual needlebench docs	2025-04-26 13:24:56 +08:00

968 Commits