OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
liyucheng09	0b2863039e	[Feature] Contamination analysis for MMLU, Hellaswag, and ARC_c (#699 ) * Contamination analysis for ARC_c, mmlu, and Hellaswag * update `eval_contamination.py` * update `contamination.py` summarizer * fix `eval_contamination.py` * add mmlu groups for contamination analysis	2024-01-08 15:51:48 +08:00
Connor-Shen	30a90d8dd8	Support Mbpp_plus dataset (#770 ) * support mbpp+ * support mbpp+ * minor fix * [Feat] minor fix --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2024-01-05 22:01:57 +08:00
bittersweet1999	3c606cb712	quick fix for postprocess pred extraction (#771 )	2024-01-05 21:10:18 +08:00
bittersweet1999	2163f9398f	[Feature] add subject ir dataset (#755 ) * add subject ir * Add ir dataset * Add ir dataset	2024-01-05 12:00:57 +00:00
bittersweet1999	be369c3e06	[Feature] Add multi_round dataset evaluation (#766 ) * multi_round dataset * add multi_round evaluation	2024-01-04 10:37:52 +00:00
bittersweet1999	7cd65d49d8	[Fix] Fix small bug in alignbench (#764 ) * fix small bugs * fix small bugs	2024-01-03 07:44:53 +00:00
Chris Liu	3eb225a5e6	[Feature] Support LLaMA2-Accessory (#732 ) * Support LLaMA2-Accessory * remove strip * clear imports * reformat * fix lint * fix lint * update readme * update readme * update readme * update readme	2024-01-02 20:48:51 +08:00
HUANG Fei	ba027eeeac	[Feature] Add support of qwen api (#735 )	2024-01-02 20:47:12 +08:00
Mo Li	33f8df1ca3	[Update] Change NeedleInAHaystackDataset to dynamic dataset loading (#754 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues * add English version support * change NeedleInAHaystackDataset to dynamic loading * change NeedleInAHaystackDataset to dynamic loading * fix needleinahaystack test eval bug * fix needleinahaystack config bug --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-02 17:22:56 +08:00
Francis-llgg	b69fe2343b	[Feature] Add GPQA Dataset (#729 ) * check * message * add * change prompt * change a para name * modify name of the file * delete a useless file	2024-01-01 15:54:40 +08:00
Francis-llgg	ef3ae63539	[Feature] Add new dataset mastermath2024v1 (#744 ) * add new dataset mastermath2024v1 * change it to simplified chinese prompt * change file name	2024-01-01 15:53:24 +08:00
Mo Li	17b8e929dd	[Feature] Update plot function in tools_needleinahaystack.py (#747 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-29 18:51:09 +08:00
Hubert	327951087f	[Feat] update code config (#749 ) * [Feat] update code dataset * [Feat] update code dataset * [Feat] update code dataset	2023-12-29 18:46:34 +08:00
bittersweet1999	fe0b717033	add creationbench (#753 )	2023-12-29 10:03:44 +00:00
Connor-Shen	81098722d2	add chinese version of humaneval, mbpp (#743 ) * add chinese_version of humaneval,mbpp * add humaneval&mbpp gen.py * minor fix * minor add --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-12-28 14:47:56 +08:00
bittersweet1999	db919f0191	[Fix] SubSizePartition fix (#746 ) * fix subjective_eval * subject_eval partition situation fixed * subject_eval partition situation fixed	2023-12-28 11:46:46 +08:00
Hubert	0a525985e8	[Feature] Support sanitized MBPP dataset (#745 )	2023-12-27 22:17:23 +08:00
bittersweet1999	dfd9ac0fd9	[Feature] Add other judgelm prompts for Alignbench (#731 ) * add judgellm prompts * add judgelm prompts * update import info * fix situation that no abbr in config * fix situation that no abbr in config * add summarizer for other judgellm * change config name * add maxlen * add maxlen * dict assert * dict assert * fix strings * fix strings	2023-12-27 17:54:53 +08:00
Yang Yong	54345c56b7	Update LightllmApi and Fix mmlu bug (#738 ) * Update LightllmApi and Fix mmlu bug * checkout mmlu_gen_a484b3.py --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-27 13:49:08 +08:00
philipwangOvO	34561ececb	[Feature] Add InfiniteBench (#739 ) * add InfiniteBench * add InfiniteBench --------- Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>	2023-12-26 15:36:27 +08:00
Fengzhe Zhou	3a68083ecc	[Sync] update configs (#734 )	2023-12-25 21:59:16 +08:00
AllentDan	336d8d76ff	add turbomind restful api support (#693 ) * add turbomind restful api support * config * top_p 0.8 * top_k = 1	2023-12-24 01:40:00 +08:00
bittersweet1999	e985100cd1	[Fix] Fix subjective alignbench (#730 )	2023-12-23 20:06:53 +08:00
Mo Li	0e24f4213e	[Feature] Add NeedleInAHaystack Test Support (#714 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-23 12:00:51 +08:00
RunningLeon	e34c552282	[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend (#721 ) * add llama2 test * fix * test qwen chat-7b * test w4 * add baichuan2 * update * update * update configs and docs * update	2023-12-21 18:22:17 +08:00
bittersweet1999	fbb912ddf3	[Feature] Add abbr for judgemodel in subjective evaluation (#724 ) * add_judgemodel_abbr * add judgemodel abbr	2023-12-21 15:58:20 +08:00
Skyfall-xzz	b35d991786	[Feature] Add ReasonBench(Internal) dataset (#577 ) * [Feature] Add reasonbench dataset * add configs for supporting generative inference & merge datasets in the same category * modify config filename to prompt version * fix codes to meet pre-commit requirements * lint the code to meet pre-commit requirements * Align Load_data Sourcecode Briefly * fix bugs * reduce code redundancy	2023-12-20 17:57:42 +08:00
Jingming	76a95e9e81	[Feature] Support the use of humaneval_plus. (#720 ) * [Feature] Support the use of humaneval_plus. * [Feature] Add humaneval_plus_gen.py * minor check * [Fix] Fix bug --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-12-20 17:25:17 +08:00
bittersweet1999	97c2068bd9	[Feature] Add JudgeLLMs (#710 ) * add judgellms * add judgellms * add sub_size_partition * add docs * add ref	2023-12-19 18:40:25 +08:00
Hubert	eda72e756e	[Fix] minor fix openai (#711 )	2023-12-18 15:45:31 +08:00
Songyang Zhang	637628a70f	[Doc] Update Doc for Alignbench (#707 ) * update alignmentbench * update alignmentbench * update doc * update * update	2023-12-15 15:07:25 +08:00
DseidLi	db2920326a	[Fix] remove redundant in gsm8k.py (#700 ) Removed redundant code in GSM8KDataset.load method.	2023-12-14 19:55:58 +08:00
Songyang Zhang	bfe4aa2af5	[Fix] Update alignmentbench (#704 ) * update alignmentbench * update alignmentbench * update alignmentbench	2023-12-14 18:24:21 +08:00
bittersweet1999	1fe152b3e8	[Feature] Support AlignmentBench infer and judge (#697 ) * alignmentbench infer and judge * alignmentbench * alignmentbench done * alignment all done * alignment all done	2023-12-13 19:59:30 +08:00
Hubert	a94598d921	[Feat] update python action and slurm (#694 )	2023-12-13 10:41:10 +08:00
bittersweet1999	6130394165	[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692 ) * add features * add doc string * add doc string	2023-12-12 20:58:17 +08:00
Hubert	4780b39eda	[Sync] format (#690 ) Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-12 14:03:45 +08:00
bittersweet1999	3e77175720	[Fix] Hotfix for Subjective Evaluation (#686 )	2023-12-12 09:22:08 +08:00
bittersweet1999	465308e430	[Feature] Add Subjective Evaluation (#680 ) * new version of subject * fixed draw * fixed draw * fixed draw * done * done * done * done * fixed lint	2023-12-11 22:22:11 +08:00
Hubert	4f0b373a0a	[Fix] fix docstring (#684 )	2023-12-11 19:12:01 +08:00
Hubert	e78857ac36	[Sync] minor test (#683 )	2023-12-11 17:42:53 +08:00
Jingming	dd4318f6ab	[Feature] enhance the ability of humaneval_postprocess (#676 ) * [Feature] enhance the ability of humaneval_postprocess * refactor * [Feature] Keep the old version of the function and realize the new function in humaneval_postprocess_v2. * Update opencompass/datasets/humaneval.py --------- Co-authored-by: Leymore <zfz-960727@163.com> Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>	2023-12-11 14:39:56 +08:00
Songyang Zhang	e25c5f9525	[Enhancement] Update API Interface and Mixtral (#681 ) * [Enhancement] Update API interface * [Enhancement] Update API interface * Update mixtral * Update readme	2023-12-10 13:29:26 +08:00
Xiaoming Shi	1bf85949ef	[Feature] Add medbench (#678 ) * update medbench * medbench update * format medbench * format --------- Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-09 16:05:46 +08:00
Jingming	7cb53a95fa	[Fix] fix bug on standard_deviation summarizer (#675 )	2023-12-08 13:38:07 +08:00
liyucheng09	05bbce8b08	[Feature] Add Data Contamination Analysis (#639 ) * add contamination analysis to ceval * fix bugs * add contamination docs * to pass CI check * update --------- Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-08 10:00:11 +08:00
bittersweet1999	1c95790fdd	New subjective judgement (#660 ) * TabMWP * TabMWP * fixed * fixed * fixed * done * done * done * add new subjective judgement * add new subjective judgement * add new subjective judgement * add new subjective judgement * add new subjective judgement * modified to a more general way * modified to a more general way * final * final * add summarizer * add new summarize * fixed * fixed * fixed --------- Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>	2023-12-06 13:28:33 +08:00
rolellm	e10f1c9139	added rolebench dataset. (#633 ) * added rolebench * renamed unreasonable variable names * changed variable names in comments	2023-12-01 22:54:42 +08:00
Hubert	9eb5cadcac	[Feat] update gsm8k and math agent config (#652 ) * [Feat] update gsm8k and math agent config * minor fix	2023-12-01 15:08:38 +08:00
liushz	a331c9abfd	[Feature] Add wikibench dataset (#655 ) * Add WikiBench * Add WikiBench * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-01 14:56:54 +08:00


241 Commits