OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
tcheng	c5048bfec7	[Dataset] Add Lifescience Sub-set Support for SciEval (#2059 ) * style: pass all formatting hooks (yapf & quote fixer) * revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader) * revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) * Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) --------- Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>	2025-05-09 14:31:12 +08:00
huihui1999	a7f3ac20b2	[Dataset] Add CARDBiomedBench (#2071 ) * CARDBiomedBench * fix hash * fix dataset-index * use official llmjudge postprocess * use official llmjudge_postprocess * fix lint * fix init	2025-05-08 19:44:46 +08:00
Mo Li	ff3275edf0	[Update] Add Long-Context configs for Gemma, OREAL, and Qwen2.5 models (#2048 ) * [Update] Update Gemma, Oreal, Qwen Config * fix lint	2025-05-08 19:06:56 +08:00
Wei Li	a685ed7daf	[Dataset] Add nejm ai benchmark (#2063 ) * support nejm ai benchmark * add dataset files * revise gen name * revise gen name * revise class name & remove csv file & add dataset-index.yml info * update * update --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-05-08 16:44:05 +08:00
Jiahao Xu	9ec23c145b	[Datasets] Add ClinicBench, PubMedQA and ScienceQA (#2061 ) * Add ClinicBench * Add PubMedQA & ScienceQA & ClinicBench * Add PubMedQA & ScienceQA & ClinicBench * Update datasets_info & hf_path * Update hf_path	2025-05-08 16:25:43 +08:00
Dongsheng Zhu	ba0e32292c	[Feature] Support InternSandbox (#2049 ) * internsandbox init * internsandbox * dataset_index * dataset_index_add	2025-05-07 16:42:09 +08:00
谢昕辰	43b2c4ed76	[Fix] Update lawbench data path (#2037 )	2025-05-07 16:18:43 +08:00
Dongsheng Zhu	d62b69aaef	[Fix] Fix InternVL model config (#2068 ) * intervl-8b&38b * intervl adjustment * internvl fix	2025-05-07 15:51:18 +08:00
Linchen Xiao	af8432e1d6	[Update] OpenAI SDK model reasoning content (#2078 ) * update * update * update	2025-05-07 14:06:40 +08:00
bittersweet1999	ddc9cc0afb	[Add] add a config to Judge dataset all (#2077 ) * fix pip version * fix pip version * add judgedatasetall * add judgedatasetall * add judgedatasetall	2025-05-07 10:57:23 +08:00
bittersweet1999	37cbaf8d92	[Add] Add Judgerbenchv2 (#2067 ) * fix pip version * fix pip version * add judgerbenchv2 * Update __init__.py	2025-04-30 17:12:34 +08:00
Taolin Zhang	b6148aa198	add Judgebench (#2066 ) * add rewardbench * add rewardbench * add rmb datasets * add rmb datasets * add judgebench * add judgebench	2025-04-30 15:01:10 +08:00
bittersweet1999	527a80947b	[Add] Add writingbench (#2028 ) * fix pip version * fix pip version * add writingbench * add writingbench * add writingbench * add writingbench	2025-04-29 16:29:32 +08:00
Taolin Zhang	8c74e6a39e	add RMB Bench (#2056 ) * add rewardbench * add rewardbench * add rmb datasets * add rmb datasets	2025-04-27 16:26:01 +08:00
Linchen Xiao	e8bc8c1e8c	[Bug] Concat OpenaiSDK reasoning content (#2041 ) * [Bug] Concat OpenaiSDK reasoning content * [Bug] Concat OpenaiSDK reasoning content * update * update	2025-04-25 14:10:33 +08:00
Junnan Liu	97010dc4ce	[Update] Update dataset repeat concatenation (#2039 )	2025-04-23 16:16:28 +08:00
Linchen Xiao	dcbf899369	[Bug] Fix SmolInstruct logger import (#2036 )	2025-04-23 11:10:30 +08:00
Linchen Xiao	bf74f26603	[Update] Safe SmolInstruct meteor calculation (#2033 )	2025-04-22 18:27:48 +08:00
Linchen Xiao	455bb05d1b	[Update] Update dataset configs (#2030 ) * [Update] Update dataset configs * Fix lint	2025-04-21 18:55:06 +08:00
Taolin Zhang	c69110361b	[Add] add rewardbench (#2029 ) * add rewardbench * add rewardbench	2025-04-21 17:18:51 +08:00
JuchengHu	a2093a81ef	[Dataset] Matbench (#2021 ) * add support for matbench * fix dataset path * fix data load * fix * fix lint --------- Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk> Co-authored-by: Myhs-phz <demarcia2014@126.com>	2025-04-21 15:50:47 +08:00
Linchen Xiao	b2da1c08a8	[Dataset] Add SmolInstruct, Update Chembench (#2025 ) * [Dataset] Add SmolInstruct, Update Chembench * Add dataset metadata * update * update * update	2025-04-18 17:21:29 +08:00
Linchen Xiao	65ff602cf5	[Update] Fix LLM Judge metrics calculation & Add reasoning content concat to OpenAI SDK	2025-04-15 11:33:16 +08:00
Myhs_phz	75e7834b59	[Feature] Add Datasets: ClimateQA,Physics (#2017 ) * feat ClimateQA * feat PHYSICS * fix * fix * fix * fix	2025-04-14 20:18:47 +08:00
Linchen Xiao	6a6a1a5c0b	[Feature] LLM Judge sanity check (#2012 ) * update * update	2025-04-11 19:01:39 +08:00
bittersweet1999	3f50b1dc49	[Fix] fix order bug Update arena_hard.py (#2015 )	2025-04-11 16:59:40 +08:00
Junnan Liu	20660ab507	[Fix] Fix compare error when k is list in base_evaluator (#2010 ) * fix gpass compare error of list k * fix compare error in 177	2025-04-10 19:47:21 +08:00
Linchen Xiao	12213207b6	[Refactor] Refactorize openicl eval task (#1990 ) * [Refactor] Refactorize openicl eval task * update	2025-04-09 15:52:23 +08:00
zhulinJulia24	6ac9b06bc2	[ci] update baseline for kernel change of vllm and lmdeploy (#2011 ) * update * update * update * update * update * update * update	2025-04-09 14:09:35 +08:00
Linchen Xiao	a05f9da134	[Feature] Make dump-eval-details default behavior (#1999 ) * Update * update * update	2025-04-08 14:42:26 +08:00
Myhs_phz	fd82bea747	[Fix] OpenICL Math Evaluator Config (#2007 ) * fix * fix recommended * fix * fix * fix * fix	2025-04-08 14:38:35 +08:00
Linchen Xiao	bb58cfc85d	[Feature] Add CascadeEvaluator (#1992 ) * [Feature] Add CascadeEvaluator * update * updat	2025-04-08 11:58:14 +08:00
Jin Ye	b564e608b1	[Dataset] Add MedXpertQA (#2002 ) * Add MedXpertQA * Add MedXpertQA * Add MedXpertQA * Fix lint --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-04-08 10:44:48 +08:00
shijinpjlab	828fb745c9	[Dataset] Update dingo 1.5.0 (#2008 ) Co-authored-by: shiin <shijin@pjlab.org.cn>	2025-04-07 17:21:15 +08:00
zhulinJulia24	f982d6278e	[CI] fix baseline score (#2000 ) * update * update * update * update * update * update * update * updaste * update * update * updaste * updaste * update * update * update * update * update * update * update * update	2025-04-03 19:32:36 +08:00
Myhs_phz	3a9a384173	[Doc] Fix links between zh & en (#2001 ) * test * test * test	2025-04-03 17:37:53 +08:00
Myhs_phz	9b489e9ea0	[Update] Revert math500 dataset configs (#1998 )	2025-04-03 15:11:02 +08:00
Linchen Xiao	dc8deb6af0	[BUMP] Bump version to 0.4.2 (#1997 )	2025-04-02 17:47:15 +08:00
liushz	32d6859679	[Feature] Add olymmath dataset (#1982 ) * Add olymmath dataset * Add olymmath dataset * Add olymmath dataset * Update olymmath dataset	2025-04-02 17:34:07 +08:00
zhulinJulia24	97236c8e97	[CI] Fix baseline score (#1996 ) * update * update * update * update	2025-04-02 14:25:16 +08:00
Linchen Xiao	f66b0b347a	[Update] Requirements update (#1993 )	2025-04-02 12:03:45 +08:00
Dongsheng Zhu	330a6e5ca7	[Update] Add InternVL-8b&38b model configs (#1978 )	2025-04-01 11:51:37 +08:00
Myhs_phz	f71eb78c72	[Doc] Add TBD Token in Datasets Statistics (#1986 ) * feat * doc * doc * doc * doc	2025-03-31 19:08:55 +08:00
Linchen Xiao	0f46c35211	[Bug] Aime2024 config fix (#1974 ) * [Bug] Aime2024 config fix * fix	2025-03-25 17:57:11 +08:00
Myhs_phz	6118596362	[Feature] Add recommendation configs for datasets (#1937 ) * feat datasetrefine drop * fix datasets in fullbench_int3 * fix * fix * back * fix * fix and doc * feat * fix hook * fix * fix * fix * fix * fix * fix * fix * fix * fix * doc * fix * fix * Update dataset-index.yml	2025-03-25 14:54:13 +08:00
Linchen Xiao	07930b854a	[Update] Add Korbench config with no max_out_len (#1968 ) * Add Korbench no max_out_len * Add Korbench no max_out_len	2025-03-24 18:38:06 +08:00
Myhs_phz	37307fa996	[Update] Add QWQ32b model config (#1959 ) * feat qwq-32b * fix * feat phi_4 --------- Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2025-03-24 14:51:39 +08:00
Linchen Xiao	db96161a4e	[Update] Add SuperGPQA subset metrics (#1966 )	2025-03-24 14:25:12 +08:00
Linchen Xiao	aa05993922	[Update] Add dataset configurations of no max_out_len (#1967 ) * [Update] Add dataset configurations of no max_out_len * update test torch version * update test torch version * update test torch version * update test torch version	2025-03-24 14:24:12 +08:00
Linchen Xiao	64128916d0	[Update] Increase memory size for CPU job of VOLC Runner (#1962 ) * [Update] Increase memory size for CPU job of VOLC Runner * [Update] Increase memory size for CPU job of VOLC Runner	2025-03-24 11:21:14 +08:00

931 Commits