OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
bittersweet1999	1fe152b3e8	[Feature] Support AlignmentBench infer and judge (#697 ) * alignmentbench infer and judge * alignmentbench * alignmentbench done * alignment all done * alignment all done	2023-12-13 19:59:30 +08:00
bittersweet1999	465308e430	[Feature] Add Subjective Evaluation (#680 ) * new version of subject * fixed draw * fixed draw * fixed draw * done * done * done * done * fixed lint	2023-12-11 22:22:11 +08:00
Hubert	e78857ac36	[Sync] minor test (#683 )	2023-12-11 17:42:53 +08:00
Jingming	dd4318f6ab	[Feature] enhance the ability of humaneval_postprocess (#676 ) * [Feature] enhance the ability of humaneval_postprocess * refactor * [Feature] Keep the old version of the function and realize the new function in humaneval_postprocess_v2. * Update opencompass/datasets/humaneval.py --------- Co-authored-by: Leymore <zfz-960727@163.com> Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>	2023-12-11 14:39:56 +08:00
Xiaoming Shi	1bf85949ef	[Feature] Add medbench (#678 ) * update medbench * medbench update * format medbench * format --------- Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-09 16:05:46 +08:00
liyucheng09	05bbce8b08	[Feature] Add Data Contamination Analysis (#639 ) * add contamination analysis to ceval * fix bugs * add contamination docs * to pass CI check * update --------- Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-08 10:00:11 +08:00
bittersweet1999	1c95790fdd	New subjective judgement (#660 ) * TabMWP * TabMWP * fixed * fixed * fixed * done * done * done * add new subjective judgement * add new subjective judgement * add new subjective judgement * add new subjective judgement * add new subjective judgement * modified to a more general way * modified to a more general way * final * final * add summarizer * add new summarize * fixed * fixed * fixed --------- Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>	2023-12-06 13:28:33 +08:00
rolellm	e10f1c9139	added rolebench dataset. (#633 ) * added rolebench * renamed poorly chosen variables * renamed the variables in the comments	2023-12-01 22:54:42 +08:00
Hubert	9eb5cadcac	[Feat] update gsm8k and math agent config (#652 ) * [Feat] update gsm8k and math agent config * minor fix	2023-12-01 15:08:38 +08:00
liushz	a331c9abfd	[Feature] Add wikibench dataset (#655 ) * Add WikiBench * Add WikiBench * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-01 14:56:54 +08:00
liushz	e019c831fe	[Feature] Add Chinese version: commonsenseqa, crowspairs and nq (#144 ) * add Chinese version: csqa crowspairs nq * Update cn_data * Update cn_data * update format --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-30 15:33:02 +08:00
Ma Zerun	6aaf3b91ec	[Feature] Support chat style inferencer. (#643 ) * [Feature] Support chat style inferencer. * [Fix] use new prompt * [Fix] use new prompt --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-11-30 14:00:06 +08:00
liushz	6d0d78986c	[Feature] Add GSM_Hard dataset (#619 ) * Add SVAMP dataset * Add SVAMP dataset * Add SVAMP dataset * Add gsm_hard dataset * Add gsm_hard dataset * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-27 17:40:34 +08:00
Fengzhe Zhou	9083dea683	[Sync] some renaming (#641 )	2023-11-27 16:06:49 +08:00
Fengzhe Zhou	d949e3c003	[Feature] Add circular eval (#610 ) * refactor default, add circular summarizer * add circular * update impl * update doc * minor update * no more to be added	2023-11-23 16:45:47 +08:00
Fengzhe Zhou	d4d1330a5a	[Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes (#625 )	2023-11-23 14:05:59 +08:00
liushz	048775192b	[Feature] Add SVAMP dataset (#604 ) * Add SVAMP dataset * Add SVAMP dataset * Add SVAMP dataset	2023-11-22 14:54:39 +08:00
liushz	dbacd36379	Add aritch to mathbench (#607 )	2023-11-20 19:40:41 +08:00
liushz	c9c5c5d92e	Mathbench update postprocess (#600 ) * Update mathbench * Update mathbench	2023-11-20 16:48:55 +08:00
Hubert	91fba2c2e9	[Feat] support humaneval and mbpp pass@k (#598 ) * [Feat] support pass@ k * [Feat] support pass@k * [Feat] support pass@k * [Feat] support pass@k * [Feat] support pass@k * [Feat] support pass@k docs * update naming --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-16 21:22:06 +08:00
Raymond Zhang	c0acd06b05	[Feature] Add FinanceIQ dataset (#596 )	2023-11-16 17:47:57 +08:00
Fengzhe Zhou	19ad7f9613	fix cmb dataset (#587 )	2023-11-14 16:13:39 +08:00
Wei Jueqi	14e6fe6f13	Fix bugs in subjective evaluation (#589 ) * rename * fix sub bugs and update docs * update * update	2023-11-14 16:11:55 +08:00
Fengzhe Zhou	d3de5c41fb	[Sync] update model configs (#574 )	2023-11-13 15:15:34 +08:00
Fengzhe Zhou	689ffe5b63	[Feature] Use dataset in local path (#570 ) * update commonsenseqa * update drop * update flores_first100 * update gsm8k * update humaneval * update lambda * update obqa * update piqa * update race * update siqa * update story_cloze * update strategyqa * update tydiqa * update winogrande * update doc * update hellaswag * fix obqa * update collections * update .zip name	2023-11-13 13:00:37 +08:00
Fengzhe Zhou	d6aaac22e7	[Feature] Update cmb (#571 )	2023-11-13 00:09:05 +08:00
jingmingzhuo	b3cbef3226	[Feature] Add py150 and maxmin (#562 ) * [feat] add clozeTest_maxmin dataset * [feat] add py150 datasets * [feat] change __init__.py in opencompass/datasets * [fix] pre-commit check * [fix] rename py150 and maxmin datasets in configs * [feat] add gen.py of py150 and maxmin in configs/datasets	2023-11-09 22:05:25 +08:00
Hubert	cf5a6d1ab7	[Fix] fix unnecessary import and update requirements (#555 )	2023-11-08 17:58:49 +08:00
Hubert	bb2ecf416e	[Feat] Support cibench (#538 ) * [Feat] support cidataset * [Feat] support cidataset * [Feat] support cidataset * [Feat] support cidataset * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * rename cibench * rename cibench * rename cibench * rename cibench * minor fix * minor fix * minor fix	2023-11-07 19:11:44 +08:00
bittersweet1999	f25a980043	[Feat] Add an opensource dataset Tabmwp (#505 ) * TabMWP * TabMWP * fixed * fixed * fixed * done * done * done --------- Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>	2023-11-03 11:15:46 +08:00
Fengzhe Zhou	dbb20b8270	[Sync] update (#517 )	2023-10-27 20:31:22 +08:00
Wei Jueqi	b62842335d	[Doc] Update Subjective docs (#510 ) * rename * add en subdoc * fix name * fix writing * update --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-10-27 16:27:24 +08:00
Leymore	4dd9a3fc10	[Sync] sync with internal codes 20231019 (#488 )	2023-10-18 23:37:35 -05:00
liushz	2737249f31	[Feature] Add mathbench dataset and circular evaluator (#408 ) * add_mathbench * update mathbench * support non circular eval dataset --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-10-18 04:08:31 -05:00
Leymore	861942ab1b	[Feature] Add lawbench (#460 ) * add lawbench * update requirements * update	2023-10-13 06:51:36 -05:00
Leymore	fbf5089c40	[Sync] update github token (#475 )	2023-10-13 06:50:54 -05:00
Leymore	9db5652638	[Feature] re-implement ceval load dataset (#446 )	2023-09-27 21:18:48 +08:00
philipwangOvO	3bb3d330eb	[Sync] Update LongEval (#443 )	2023-09-27 16:32:40 +08:00
liushz	c5224c2a91	[Feature] Add kaoshi dataset (#392 ) * Add ToT method * Update ToT * Update ToT * Update ToT * Update ToT * Update ToT * Add Kaoshi * Update Kaoshi * Update Kaoshi * Update kaoshi * Update kaoshi * Update Kaoshi * Update Kaoshi * Update Kaoshi * Update Kaoshi * update Kaoshi * update * update * fix --------- Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>	2023-09-22 18:46:33 +08:00
TTTTTiam	2a62bea1a4	add evaluation of scibench (#393 ) * add evaluation of scibench * add evaluation of scibench * update scibench * remove scibench evaluator --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-09-22 17:42:08 +08:00
Tong Gao	a1ea3c094a	[Sync] Initial support of subjective evaluation (#421 ) Co-authored-by: Leymore <zfz-960727@163.com>	2023-09-22 15:42:31 +08:00
Hubert	8803f7f7a6	[Feat] support anthropics evals dataset (#422 ) * [Feat] support anthropics ai risk dataset * [Feat] support anthropics evals dataset * [Feat] support anthropics evals dataset	2023-09-20 18:36:44 +08:00
Hubert	2c15a0c01d	[Feat] refine docs and codes for more user guides (#409 )	2023-09-18 16:12:13 +08:00
Hubert	a11cb45c83	[Feat] implementation for support promptbench (#239 ) * [Feat] support adv_glue dataset for adversarial robustness * reorg files * minor fix * minor fix * support prompt bench demo * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix	2023-09-15 15:06:53 +08:00
Hubert	de8a154795	[Feat] support ds1000 dataset (#395 ) * [Feat] support ds1000 dataset	2023-09-15 12:50:27 +08:00
Xidong Wang	47a752cd56	[Dataset] Add CMB (#376 ) * Add CMB * modify CMB --------- Co-authored-by: wangxidong <xidongw@163.com>	2023-09-12 19:16:41 +08:00
Hubert	ddb8197212	[Feat] support wizardcoder series (#344 ) * [Feat] support wizardcoder series * minor fix	2023-09-06 17:52:35 +08:00
Leymore	a1782f9a08	[Fix] triviaqa & nq postprocess (#350 )	2023-09-04 15:24:52 +08:00
Leymore	7ca6ba625e	[Feature] Add qwen & qwen-chat support (#286 ) * add and apply update suffix tool * add tool doc * add qwen configs * add cmmlu * rename bbh * update datasets * delete * update hf_qwen_7b.py	2023-08-31 11:29:05 +08:00
Hubert	fd389e2d78	[Feat] support codellama and preds collection tools (#335 )	2023-08-31 11:14:42 +08:00

78 Commits