OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Hanxu Hu	b504576fc1	Merge `56fc5748d8` into `d572761cef`	2025-05-29 14:38:31 +08:00
tcheng	3d1760aba2	[Dataset] Add Scieval (#2089 ) * style: pass all formatting hooks (yapf & quote fixer) * revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader) * revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) * Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) * all categories of SciEval (datasets + configs + loader+dataset-index.yml) * revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) * revise :SciEval 5shot --------- Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>	2025-05-14 10:25:03 +08:00
Wei Li	b84518c656	[Dataset] Support MedMCQA and MedBullets benchmark (#2054 ) * support medmcqa and medbullets benchmark * Add Medbullets data folder for benchmark support * revise gen name * revise config file & remove csv file & add dataset info to dataset-index.yml * remove csv file * remove print in medbullets.py * revise class name * update_oss_info --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-05-13 17:10:50 +08:00
Dongsheng Zhu	2c79dc5227	[Dataset] Add human_eval/mbpp pro (#2092 ) * add bench * update * bug fix * time update * add index * fix repeat bug	2025-05-12 18:38:13 +08:00
huihui1999	345674f700	[Dataset] Add SciknowEval Dataset (#2070 ) * first * first * first * first * SciKnowEval * fix hash * fix dataset-index & use official llm_judge_postprocess * fix dataset-index.yml * use official llmjudge_postprocess * fix lint * fix lint * fix lint * fix lint * fix lint * merge with main --------- Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2025-05-12 17:23:44 +08:00
huihui1999	44a7024ed5	[Dataset] MedCalc_Bench (#2072 ) * MedCalc_Bench * MedCal_Bench * add hash * fix hash * fix comments &dataset-index yml * fix lint * fix lint * fix lint * fix lint * fix lint --------- Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2025-05-09 16:58:55 +08:00
Jin Ye	6097186a95	[Datasets] MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 (#2064 ) * Add Datasets: MedQA, ProteinLMBench; Add Models: huatuogpt, baichuanM1 * Fix bugs for MedQA. Add info in dataset-index * Add version code for MedQA and ProteinLMBench * Add version code for MedQA and ProteinLMBench	2025-05-09 14:47:44 +08:00
Linchen Xiao	d72df59363	[Revert] Add Lifescience Sub-set Support for SciEval (#2059 ) (#2087 ) This reverts commit `c5048bfec7`.	2025-05-09 14:46:27 +08:00
tcheng	c5048bfec7	[Dataset] Add Lifescience Sub-set Support for SciEval (#2059 ) * style: pass all formatting hooks (yapf & quote fixer) * revise name:Add Lifescience Sub-set Support for MMLU & SciEval (datasets + configs + loader) * revise name:Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) * Add Lifescience SciEval (datasets + configs + loader+dataset-index.yml) --------- Co-authored-by: root <tangcheng231@mails.ucas.edu.cn>	2025-05-09 14:31:12 +08:00
huihui1999	a7f3ac20b2	[Dataset] Add CARDBiomedBench (#2071 ) * CARDBiomedBench * fix hash * fix dataset-index * use official llmjudge postprocess * use official llmjudge_postprocess * fix lint * fix init	2025-05-08 19:44:46 +08:00
Wei Li	a685ed7daf	[Dataset] Add nejm ai benchmark (#2063 ) * support nejm ai benchmark * add dataset files * revise gen name * revise gen name * revise class name & remove csv file & add dataset-index.yml info * update * update --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-05-08 16:44:05 +08:00
Dongsheng Zhu	ba0e32292c	[Feature] Support InternSandbox (#2049 ) * internsandbox init * internsandbox * dataset_index * dataset_index_add	2025-05-07 16:42:09 +08:00
Taolin Zhang	c69110361b	[Add] add rewardbench (#2029 ) * add rewardbench * add rewardbench	2025-04-21 17:18:51 +08:00
JuchengHu	a2093a81ef	[Dataset] Matbench (#2021 ) * add support for matbench * fix dataset path * fix data load * fix * fix lint --------- Co-authored-by: Jucheng Hu <jucheng.hu.20@ucl.ac.uk> Co-authored-by: Myhs-phz <demarcia2014@126.com>	2025-04-21 15:50:47 +08:00
Linchen Xiao	b2da1c08a8	[Dataset] Add SmolInstruct, Update Chembench (#2025 ) * [Dataset] Add SmolInstruct, Update Chembench * Add dataset metadata * update * update * update	2025-04-18 17:21:29 +08:00
Myhs_phz	75e7834b59	[Feature] Add Datasets: ClimateQA,Physics (#2017 ) * feat ClimateQA * feat PHYSICS * fix * fix * fix * fix	2025-04-14 20:18:47 +08:00
Jin Ye	b564e608b1	[Dataset] Add MedXpertQA (#2002 ) * Add MedXpertQA * Add MedXpertQA * Add MedXpertQA * Fix lint --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-04-08 10:44:48 +08:00
liushz	32d6859679	[Feature] Add olymmath dataset (#1982 ) * Add olymmath dataset * Add olymmath dataset * Add olymmath dataset * Update olymmath dataset	2025-04-02 17:34:07 +08:00
Dongsheng Zhu	8a5029b121	[Feature] Add MultiPL-E & Code Evaluator (#1963 ) * multiple_code develop * multiple_code update * comments upadate * index upadate	2025-03-21 20:09:25 +08:00
Yufeng Zhao	bc2969dba8	[Feature] Add support for BBEH dataset (#1925 ) * bbeh * bbeh * fix_smallbugs_bbeh * removeprint * results --------- Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>	2025-03-12 10:53:31 +08:00
Kangreen	59e49aedf1	[Feature] Support SuperGPQA (#1924 ) * support supergpqa * remove unnecessary code * remove unnecessary code * Add Readme * Add Readme * fix lint * fix lint * update * update --------- Co-authored-by: mkj3085003 <mkj3085003@gmail.com> Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2025-03-11 19:32:08 +08:00
liushz	198c08632e	[Feature] Add HLE (Humanity's Last Exam) dataset (#1902 ) * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * update dataset path * Update olmpiadBench * Update olmpiadBench * Update olmpiadBench * Add HLE dataset * Add HLE dataset * Add HLE dataset --------- Co-authored-by: sudanl <sudanl@foxmail.com>	2025-03-04 16:42:37 +08:00
Hanxu Hu	8ea13bde6a	add humaneval	2025-02-19 04:43:17 +01:00
Hanxu Hu	707ef2fef9	add benchmax part 1	2025-02-10 03:27:55 +01:00
Shudong Liu	412199f802	[Feature] Support OlympiadBench Benchmark (#1841 ) * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * Support OlympiadBench Benchmark * update dataset path * Update olmpiadBench * Update olmpiadBench * Update olmpiadBench --------- Co-authored-by: liushz <qq1791167085@163.com>	2025-01-24 10:00:01 +08:00
Zhao Qihao	e039f3efa0	[Feature] Support MMLU-CF Benchmark (#1775 ) * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * Update mmlu-cf * Update mmlu-cf * Update mmlu-cf * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * [Feature] Support MMLU-CF Benchmark * Remove outside configs --------- Co-authored-by: liushz <qq1791167085@163.com>	2025-01-09 14:11:20 +08:00
Linchen Xiao	117dc500ad	[Feature] Add Longbenchv2 support (#1801 ) * Create eval_longbenchv2.py * Create longbenchv2_gen.py * Update __init__.py * Create longbenchv2.py * Update datasets_info.py * update * update * update * update * update * update --------- Co-authored-by: abrohamLee <146956824+abrohamLee@users.noreply.github.com>	2025-01-03 12:04:29 +08:00
liushz	9c980cbc62	[Feature] Add LiveStemBench Dataset (#1794 ) * [Fix] Fix vllm max_seq_len parameter transfer * [Fix] Fix vllm max_seq_len parameter transfer * Add livestembench dataset * Add livestembench dataset * Add livestembench dataset * Update livestembench_gen_3e3c50.py * Update eval_livestembench.py * Update eval_livestembench.py	2024-12-31 15:17:39 +08:00
OpenStellarTeam	1a5b3fc11e	Add Chinese SimpleQA config (#1697 ) * add chinese simpleqa config * add chinese simpleqa config * add chinese simpleqa config * add chinese simpleqa config * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * Update CsimpleQA * pdate Csimpleqa --------- Co-authored-by: 明念 <heyancheng.hyc@taobao.com> Co-authored-by: liushz <qq1791167085@163.com>	2024-12-11 18:03:39 +08:00
Songyang Zhang	fb43dd1906	[Update] Update Skywork/Qwen-QwQ (#1728 ) * Update JuderBench * Support O1-style Prompts * Update Code * Update OpenAI * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update BigCodeBench * Update	2024-12-05 19:30:43 +08:00
Linchen Xiao	9de27b4d85	[Update] Update max_out_len for datasets (#1726 ) * [Update] Update max_out_len for datasets * Update eval_regression_chat_objective_fullbench.py * Update eval_regression_chat.py * Update eval_regression_chat.py * Update oc_score_baseline_fullbench.yaml --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>	2024-12-02 11:42:07 +08:00
liushz	c437135fad	[Feature] Add Openai Simpleqa dataset (#1720 ) * Add Openai SimpleQA dataset * Add Openai SimpleQA dataset * Add Openai SimpleQA dataset * Update eval_simpleqa.py --------- Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2024-11-28 19:16:07 +08:00
Yufeng Zhao	300adc31e8	[Feature] Add Korbench dataset (#1713 ) * first version for korbench * first stage for korbench * korbench_1 * korbench_1 * korbench_1 * korbench_1 * korbench_1_revised * korbench_combined_1 * korbench_combined_1 * kor_combined * kor_combined * update --------- Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>	2024-11-25 20:11:27 +08:00
abrohamLee	e9e4b69ddb	[Feature] MuSR Datset Evaluation (#1689 ) * MuSR Datset Evaluation * MuSR Datset Evaluation Add an assertion and a Readme.md	2024-11-14 20:42:12 +08:00
Linchen Xiao	e92a5d4230	[Feature] BABILong Dataset added (#1684 ) * update * update * update * update	2024-11-14 15:32:43 +08:00
Junnan Liu	645c5f3b2c	[Datasets] Add datasets CMO&AIME (#1610 ) * add datasets cmo&aime * delete unused modules * modify prompt * update __init__ * update data load and add README * update data load * update performance * update md5 * remove indents * add indent * fix log for debug mode	2024-10-28 18:08:02 +08:00
Songyang Zhang	a4d5a6c81b	[Feature] Support LiveCodeBench (#1617 ) * Update * Update LCB * Update * Update * Update * Update * Update	2024-10-21 20:50:39 +08:00
Bob Tsang	dd0b655bd0	[Feature] Support MMMLU & MMMLU-lite Benchmark (#1565 ) * rm folder * modify format according to reviewer * modify format according to reviewer * modify format according to reviewer * add some files requirement * fix some bug * fix bug * change load type * Update MMMLU Dataset * Update MMMLU Dataset * Add MMMLU-Lite Dataset * update MMMMLU datast * update MMMMLU datast * update MMMMLU datast --------- Co-authored-by: BobTsang <BobTsang1995@gmail.com> Co-authored-by: liushz <qq1791167085@163.com>	2024-10-17 19:09:34 +08:00
liushz	5faee929db	[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589 ) * Add GaoKaoMath Dataset * Add MATH LLM Eval * Update GAOKAO Math Eval Dataset * Update GAOKAO Math Eval Dataset	2024-10-12 19:13:06 +08:00
bittersweet1999	3f7a3730d7	[Fix] fix Flames (#1599 ) * fix pip version * fix pip version * fix flames * fix flames	2024-10-12 14:34:59 +08:00
shijinpjlab	7528b8ab8a	[Feature] Add dingo test (#1529 ) * add qa dingo * update * change name qa to dingo * eval model: llm_base * update path * change name and move path * add eval_dingo * update import * add for pip * add dingo package * change import place * update import place * fix lint fail * isort * double quoted --------- Co-authored-by: sj <shijin@pjlab.org.cn>	2024-09-29 19:24:58 +08:00
Hari Seldon	14b4b735cb	[Feature] Add support for SciCode (#1417 ) * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode w/ bg * add scicode * Update README.md * Update README.md * Delete configs/eval_SciCode.py * rename * 1 * rename * Update README.md * Update scicode.py * Update scicode.py * fix some bugs * Update * Update --------- Co-authored-by: root <HariSeldon0> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-22 13:42:25 +08:00
Linchen Xiao	a4b54048ae	[Feature] Add Ruler datasets (#1310 ) * [Feature] Add Ruler datasets * pre-commit fixed * Add model specific tokenizer to dataset * pre-commit modified * remove unused import * fix linting * add trust_remote to tokenizer load * lint fix * comments resolved * fix lint * Add readme * Fix lint * ruler refactorize * fix lint * lint fix * updated * lint fix * fix wonderwords import issue * prompt modified * update * readme updated * update * ruler dataset added * Update --------- Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-20 11:40:11 +08:00
Peng Bo	07c96ac659	Calm dataset (#1385 ) * Add CALM Dataset	2024-08-01 10:03:21 +08:00
Songyang Zhang	704853e5e7	[Feature] Update pip install (#1324 ) * [Feature] Update pip install * Update Configuration * Update * Update * Update * Update Internal Config * Update collect env	2024-07-29 18:32:50 +08:00
Xingjun.Wang	edab1c07ba	[Feature] Support ModelScope datasets (#1289 ) * add ceval, gsm8k modelscope surpport * update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest * update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets * format file * format file * update dataset format * support ms_dataset * udpate dataset for modelscope support * merge myl_dev and update test_ms_dataset * udpate dataset for modelscope support * update readme * update eval_api_zhipu_v2 * remove unused code * add get_data_path function * update readme * remove tydiqa japanese subset * add ceval, gsm8k modelscope surpport * update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest * update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets * format file * format file * update dataset format * support ms_dataset * udpate dataset for modelscope support * merge myl_dev and update test_ms_dataset * update readme * udpate dataset for modelscope support * update eval_api_zhipu_v2 * remove unused code * add get_data_path function * remove tydiqa japanese subset * update util * remove .DS_Store * fix md format * move util into package * update docs/get_started.md * restore eval_api_zhipu_v2.py, add environment setting * Update dataset * Update * Update * Update * Update --------- Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local> Co-authored-by: Yunnglin <mao.looper@qq.com> Co-authored-by: Yun lin <yunlin@laptop.local> Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn> Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-29 13:48:32 +08:00
bittersweet1999	d3782c1d47	Revert "Calm dataset (#1287 )" (#1366 ) This reverts commit `edd0ffdf70`.	2024-07-26 18:27:29 +08:00
Peng Bo	edd0ffdf70	Calm dataset (#1287 ) * add calm dataset * modify config max_out_len * update README * Modify README * update README * update README * update README * update README * update README * add summarizer and modify readme * delete summarizer config comment * update summarizer * modify same response to all questions * update README	2024-07-26 11:48:16 +08:00
Que Haoran	a244453d9e	[Feature] Support inference ppl datasets (#1315 ) * commit inference ppl datasets * revised format * revise * revise * revise * revise * revise * revise	2024-07-22 17:59:30 +08:00
Fengzhe Zhou	a32f21a356	[Sync] Sync with internal codes 2024.06.28 (#1279 )	2024-06-28 14:16:34 +08:00


110 Commits