OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Fengzhe Zhou	32f40a8f83	[Sync] Sync with internal codes 2023.01.08 (#777 )	2024-01-08 14:07:24 +00:00
liyucheng09	0b2863039e	[Feature] Contamination analysis for MMLU, Hellaswag, and ARC_c (#699 ) * Contamination analysis for ARC_c, mmlu, and Hellaswag * update `eval_contamination.py` * update `contamination.py` summarizer * fix `eval_contamination.py` * add mmlu groups for contamination analysis	2024-01-08 15:51:48 +08:00
Yuchen Yan	11f3b91e78	[Fix] fix typos in drop prompt (#773 ) Co-authored-by: yanyuchen04 <yanyuchen04@meituan.com>	2024-01-08 14:22:35 +08:00
Connor-Shen	30a90d8dd8	Support Mbpp_plus dataset (#770 ) * support mbpp+ * support mbpp+ * minor fix * [Feat] minor fix --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2024-01-05 22:01:57 +08:00
bittersweet1999	2163f9398f	[Feature] add subject ir dataset (#755 ) * add subject ir * Add ir dataset * Add ir dataset	2024-01-05 12:00:57 +00:00
bittersweet1999	be369c3e06	[Feature] Add multi_round dataset evaluation (#766 ) * multi_round dataset * add multi_round evaluation	2024-01-04 10:37:52 +00:00
Mo Li	33f8df1ca3	[Update] Change NeedleInAHaystackDataset to dynamic dataset loading (#754 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues * add English version support * change NeedleInAHaystackDataset to dynamic loading * change NeedleInAHaystackDataset to dynamic loading * fix needleinahaystack test eval bug * fix needleinahaystack config bug --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-02 17:22:56 +08:00
Francis-llgg	b69fe2343b	[Feature] Add GPQA Dataset (#729 ) * check * message * add * change prompt * change a parameter name * modify name of the file * delete a useless file	2024-01-01 15:54:40 +08:00
Francis-llgg	ef3ae63539	[Feature] Add new dataset mastermath2024v1 (#744 ) * add new dataset mastermath2024v1 * change it to simplified chinese prompt * change file name	2024-01-01 15:53:24 +08:00
Mo Li	17b8e929dd	[Feature] Update plot function in tools_needleinahaystack.py (#747 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-29 18:51:09 +08:00
Hubert	327951087f	[Feat] update code config (#749 ) * [Feat] update code dataset * [Feat] update code dataset * [Feat] update code dataset	2023-12-29 18:46:34 +08:00
bittersweet1999	fe0b717033	add creationbench (#753 )	2023-12-29 10:03:44 +00:00
bittersweet1999	8728287a55	fix error in configs (#750 )	2023-12-28 11:53:07 +00:00
Connor-Shen	81098722d2	add chinese version of humaneval, mbpp (#743 ) * add chinese_version of humaneval,mbpp * add humaneval&mbpp gen.py * minor fix * minor add --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-12-28 14:47:56 +08:00
Hubert	0a525985e8	[Feature] Support sanitized MBPP dataset (#745 )	2023-12-27 22:17:23 +08:00
bittersweet1999	dfd9ac0fd9	[Feature] Add other judgelm prompts for Alignbench (#731 ) * add judgellm prompts * add judgelm prompts * update import info * fix situation that no abbr in config * fix situation that no abbr in config * add summarizer for other judgellm * change config name * add maxlen * add maxlen * dict assert * dict assert * fix strings * fix strings	2023-12-27 17:54:53 +08:00
Yang Yong	54345c56b7	Update LightllmApi and Fix mmlu bug (#738 ) * Update LightllmApi and Fix mmlu bug * checkout mmlu_gen_a484b3.py --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-27 13:49:08 +08:00
philipwangOvO	34561ececb	[Feature] Add InfiniteBench (#739 ) * add InfiniteBench * add InfiniteBench --------- Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>	2023-12-26 15:36:27 +08:00
Fengzhe Zhou	3a68083ecc	[Sync] update configs (#734 )	2023-12-25 21:59:16 +08:00
Mo Li	0e24f4213e	[Feature] Add NeedleInAHaystack Test Support (#714 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-23 12:00:51 +08:00
Skyfall-xzz	b35d991786	[Feature] Add ReasonBench(Internal) dataset (#577 ) * [Feature] Add reasonbench dataset * add configs for supporting generative inference & merge datasets in the same category * modify config filename to prompt version * fix codes to meet pre-commit requirements * lint the code to meet pre-commit requirements * Align Load_data Sourcecode Briefly * fix bugs * reduce code redundancy	2023-12-20 17:57:42 +08:00
Jingming	76a95e9e81	[Feature] Support the use of humaneval_plus. (#720 ) * [Feature] Support the use of humaneval_plus. * [Feature] Add humaneval_plus_gen.py * minor check * [Fix] Fix bug --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-12-20 17:25:17 +08:00
bittersweet1999	47e745d748	quick fix for maxoutlen (#719 )	2023-12-20 00:00:28 +08:00
Hubert	5e8b838f51	[Feat] Update math/agent (#716 ) * minor add * minor add * minor fix	2023-12-19 21:20:42 +08:00
Songyang Zhang	bfe4aa2af5	[Fix] Update alignmentbench (#704 ) * update alignmentbench * update alignmentbench * update alignmentbench	2023-12-14 18:24:21 +08:00
bittersweet1999	1fe152b3e8	[Feature] Support AlignmentBench infer and judge (#697 ) * alignmentbench infer and judge * alignmentbench * alignmentbench done * alignment all done * alignment all done	2023-12-13 19:59:30 +08:00
bittersweet1999	6130394165	[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692 ) * add features * add doc string * add doc string	2023-12-12 20:58:17 +08:00
bittersweet1999	465308e430	[Feature] Add Subjective Evaluation (#680 ) * new version of subject * fixed draw * fixed draw * fixed draw * done * done * done * done * fixed lint	2023-12-11 22:22:11 +08:00
Hubert	e78857ac36	[Sync] minor test (#683 )	2023-12-11 17:42:53 +08:00
Xiaoming Shi	1bf85949ef	[Feature] Add medbench (#678 ) * update medbench * medbench update * format medbench * format --------- Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-09 16:05:46 +08:00
liyucheng09	05bbce8b08	[Feature] Add Data Contamination Analysis (#639 ) * add contamination analysis to ceval * fix bugs * add contamination docs * to pass CI check * update --------- Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-08 10:00:11 +08:00
bittersweet1999	1c95790fdd	New subjective judgement (#660 ) * TabMWP * TabMWP * fixed * fixed * fixed * done * done * done * add new subjective judgement * add new subjective judgement * add new subjective judgement * add new subjective judgement * add new subjective judgement * modified to a more general way * modified to a more general way * final * final * add summarizer * add new summarize * fixed * fixed * fixed --------- Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>	2023-12-06 13:28:33 +08:00
rolellm	e10f1c9139	added rolebench dataset. (#633 ) * added rolebench * renamed poorly chosen variables * renamed variables in comments	2023-12-01 22:54:42 +08:00
liushz	f4bbff6537	[Feature] Update MathBench CodeInterpreter & fix MathBench Bug (#657 ) * Update MathBench CodeInterpreter & fix MathBench Bug * Fix errors * update --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>	2023-12-01 22:27:24 +08:00
Hubert	9eb5cadcac	[Feat] update gsm8k and math agent config (#652 ) * [Feat] update gsm8k and math agent config * minor fix	2023-12-01 15:08:38 +08:00
liushz	a331c9abfd	[Feature] Add wikibench dataset (#655 ) * Add WikiBench * Add WikiBench * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-01 14:56:54 +08:00
liushz	e019c831fe	[Feature] Add Chinese version: commonsenseqa, crowspairs and nq (#144 ) * add Chinese version: csqa crowspairs nq * Update cn_data * Update cn_data * update format --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-30 15:33:02 +08:00
Ma Zerun	6aaf3b91ec	[Feature] Support chat style inferencer. (#643 ) * [Feature] Support chat style inferencer. * [Fix] use new prompt * [Fix] use new prompt --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-11-30 14:00:06 +08:00
Fengzhe Zhou	5933c04fda	fix hellaswag_ppl_47bff9 (#648 )	2023-11-29 16:51:44 +08:00
liushz	6d0d78986c	[Feature] Add GSM_Hard dataset (#619 ) * Add SVAMP dataset * Add SVAMP dataset * Add SVAMP dataset * Add gsm_hard dataset * Add gsm_hard dataset * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-27 17:40:34 +08:00
Fengzhe Zhou	9083dea683	[Sync] some renaming (#641 )	2023-11-27 16:06:49 +08:00
Fengzhe Zhou	d4d1330a5a	[Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes (#625 )	2023-11-23 14:05:59 +08:00
liushz	048775192b	[Feature] Add SVAMP dataset (#604 ) * Add SVAMP dataset * Add SVAMP dataset * Add SVAMP dataset	2023-11-22 14:54:39 +08:00
Songyang Zhang	d925748266	[Feature] Support 360API and FixKRetriever for CSQA dataset (#601 ) * [Feature] Support 360API and FixKRetriever for CSQA dataset * Update API * Update API * [Feature] Support 360API and FixKRetriever for CSQA dataset * Update API * Update API * rm mathbench * fix_lint * Update opencompass/models/bytedance_api.py Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com> * update * update * update --------- Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>	2023-11-21 20:25:47 +08:00
liushz	dbacd36379	Add aritch to mathbench (#607 )	2023-11-20 19:40:41 +08:00
liushz	c9c5c5d92e	Mathbench update postprocess (#600 ) * Update mathbench * Update mathbench	2023-11-20 16:48:55 +08:00
Jingming	5e75e29711	[Feature] Add multi-prompt generation demo (#568 ) * [Feature] Add multi-prompt generation demo * [Fix] change form in winogrande_gen_XXX.py * [Fix] make multi prompt demo more directly * [Fix] fix bug * [Fix] minor fix --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-11-20 16:16:37 +08:00
Raymond Zhang	c0acd06b05	[Feature] Add FinanceIQ dataset (#596 )	2023-11-16 17:47:57 +08:00
Yu	8160cb84e3	update word spell (#594 )	2023-11-15 15:23:58 +08:00
Songyang Zhang	c8cb38e822	[Feature] Update mathbench (#580 ) * update xunfei api * fix lint * update mathbench to avoid incomplete prediction	2023-11-14 16:04:02 +08:00

117 Commits