OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
RunningLeon	61fe873c89	[Fix] Fix turbomind and update docs (#808) * update * update docs * add engine_config and gen_config in eval_config * update * fix * fix * fix * fix docstr * fix url	2024-01-18 14:41:35 +08:00
Fengzhe Zhou	9e5746d3d8	[Doc] Update News (#810)	2024-01-17 18:22:12 +08:00
Fengzhe Zhou	b4afe3e7c1	[Sync] Add InternLM2 Keyset Evaluation Demo (#807) Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>	2024-01-17 13:48:12 +08:00
Mo Li	acae560911	Added support for multi-needle testing in needle-in-a-haystack test (#802) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues * add English version support * change NeedleInAHaystackDataset to dynamic loading * change NeedleInAHaystackDataset to dynamic loading * fix needleinahaystack test eval bug * fix needleinahaystack config bug * Added support for multi-needle testing in needle-in-a-haystack test * Optimize the code for plotting in the needle-in-a-haystack test. * Correct the typo in the dataset parameters. * update needleinahaystack test docs --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-17 13:47:34 +08:00
RunningLeon	0836aec67b	[Feature] Update evaluate turbomind (#804) * update * fix * fix * fix	2024-01-17 11:09:50 +08:00
bittersweet1999	814b3f73bd	reorganize subject files (#801)	2024-01-16 18:03:11 +08:00
zhulinJulia24	2cd091647c	Add test runner, one case, daily and pr trigger (#751) * init test yaml * add simple pr * update * update * change name * Update pr-run-test.yml * Update pr-run-test.yml --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-01-16 11:59:22 +08:00
bittersweet1999	83d6c48378	[Feature] Add configs for creationbench (#791) * add creationv2_zh * add creationv2_zh * add eng config for creationbench * add eng config for creationbench * add eng config for creationbench	2024-01-12 14:20:21 +08:00
Hubert	d0dc3534e5	[Fix] hot fix for requirements (#789)	2024-01-11 15:48:32 +08:00
Songyang Zhang	467ad0ac21	Update gsm8k agent prompt (#788)	2024-01-11 14:07:36 +08:00
notoschord	d3a0ddc3ef	[Feature] Add support for Nanbeige API (#786) Co-authored-by: notoschord <wangzekai@kanzhun.com>	2024-01-11 13:54:27 +08:00
bittersweet1999	5679edb490	add temperature in alles (#787)	2024-01-11 03:57:24 +00:00
Xiaoming Shi	ad872a5dc2	[Feature] Update MedBench (#779) * update medbench * medbench update * format medbench * format * Update * update * update * update suffix --------- Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org> Co-authored-by: Leymore <zfz-960727@163.com>	2024-01-09 11:42:44 +08:00
Fengzhe Zhou	a74e4c1a8d	[Sync] Bump version to 0.2.1 (#778)	2024-01-08 14:56:28 +00:00
Fengzhe Zhou	32f40a8f83	[Sync] Sync with internal codes 2023.01.08 (#777)	2024-01-08 14:07:24 +00:00
jiangjin1999	8194199d79	[Feature] _batch_generate function, add the MultiTokenEOSCriteria (#772) * jiangjin1999: in the _batch_generate function, add the MultiTokenEOSCriteria feature to speed up inference. * jiangjin1999: in the _batch_generate function, add the MultiTokenEOSCriteria feature to speed up inference. --------- Co-authored-by: jiangjin08 <jiangjin08@MBP-2F32S5MD6P-0029.local> Co-authored-by: jiangjin08 <jiangjin08@a.sh.vip.dianping.com>	2024-01-08 16:40:02 +08:00
Fengzhe Zhou	f78fcf6eeb	[Docs] Update contamination docs (#775)	2024-01-08 16:37:28 +08:00
liyucheng09	0b2863039e	[Feature] Contamination analysis for MMLU, Hellaswag, and ARC_c (#699) * Contamination analysis for ARC_c, mmlu, and Hellaswag * update `eval_contamination.py` * update `contamination.py` summarizer * fix `eval_contamination.py` * add mmlu groups for contamination analysis	2024-01-08 15:51:48 +08:00
tpoisonooo	ba1b684fec	typo(installation.md): fix unzip commands (#774) * Update installation.md * Update installation.md	2024-01-08 14:23:35 +08:00
Yuchen Yan	11f3b91e78	[Fix] fix typos in drop prompt (#773) Co-authored-by: yanyuchen04 <yanyuchen04@meituan.com>	2024-01-08 14:22:35 +08:00
Connor-Shen	30a90d8dd8	Support Mbpp_plus dataset (#770) * support mbpp+ * support mbpp+ * minor fix * [Feat] minor fix --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2024-01-05 22:01:57 +08:00
bittersweet1999	3c606cb712	quick fix for postprocess pred extraction (#771)	2024-01-05 21:10:18 +08:00
Songyang Zhang	0c75f0f95a	[Update] Update introduction of CompassBench-2024-Q1 (#769) * [Doc] Update Example of CompassBench * [Doc] Update Example of CompassBench * [Doc] Update Example of CompassBench * update * Update docs/zh_cn/advanced_guides/compassbench_intro.md Co-authored-by: Fengzhe Zhou <zfz-960727@163.com> --------- Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>	2024-01-05 20:39:36 +08:00
bittersweet1999	2163f9398f	[Feature] add subject ir dataset (#755) * add subject ir * Add ir dataset * Add ir dataset	2024-01-05 12:00:57 +00:00
bittersweet1999	be369c3e06	[Feature] Add multi_round dataset evaluation (#766) * multi_round dataset * add multi_round evaluation	2024-01-04 10:37:52 +00:00
bittersweet1999	7cd65d49d8	[Fix] Fix small bug in alignbench (#764) * fix small bugs * fix small bugs	2024-01-03 07:44:53 +00:00
Chris Liu	3eb225a5e6	[Feature] Support LLaMA2-Accessory (#732) * Support LLaMA2-Accessory * remove strip * clear imports * reformat * fix lint * fix lint * update readme * update readme * update readme * update readme	2024-01-02 20:48:51 +08:00
HUANG Fei	ba027eeeac	[Feature] Add support of qwen api (#735)	2024-01-02 20:47:12 +08:00
Mo Li	33f8df1ca3	[Update] Change NeedleInAHaystackDataset to dynamic dataset loading (#754) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues * add English version support * change NeedleInAHaystackDataset to dynamic loading * change NeedleInAHaystackDataset to dynamic loading * fix needleinahaystack test eval bug * fix needleinahaystack config bug --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-02 17:22:56 +08:00
Francis-llgg	b69fe2343b	[Feature] Add GPQA Dataset (#729) * check * message * add * change prompt * change a para nameq * modify name of the file * delete an useless file	2024-01-01 15:54:40 +08:00
Francis-llgg	ef3ae63539	[Feature] Add new dataset mastermath2024v1 (#744) * add new dataset mastermath2024v1 * change it to simplified chinese prompt * change file name	2024-01-01 15:53:24 +08:00
Mo Li	17b8e929dd	[Feature] Update plot function in tools_needleinahaystack.py (#747) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-29 18:51:09 +08:00
Hubert	327951087f	[Feat] update code config (#749) * [Feat] update code dataset * [Feat] update code dataset * [Feat] update code dataset	2023-12-29 18:46:34 +08:00
bittersweet1999	fe0b717033	add creationbench (#753)	2023-12-29 10:03:44 +00:00
bittersweet1999	8728287a55	fix erro in configs (#750)	2023-12-28 11:53:07 +00:00
Connor-Shen	81098722d2	add chinese version of humaneval, mbpp (#743) * add chinese_version of humaneval,mbpp * add humaneval&mbpp gen.py * minor fix * minor add --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-12-28 14:47:56 +08:00
bittersweet1999	db919f0191	[Fix] SubSizePartition fix (#746) * fix subjective_eval * subject_eval partition situation fixed * subject_eval partition situation fixed	2023-12-28 11:46:46 +08:00
Hubert	0a525985e8	[Feature] Support sanitized MBPP dataset (#745)	2023-12-27 22:17:23 +08:00
bittersweet1999	dfd9ac0fd9	[Feature] Add other judgelm prompts for Alignbench (#731) * add judgellm prompts * add judgelm prompts * update import info * fix situation that no abbr in config * fix situation that no abbr in config * add summarizer for other judgellm * change config name * add maxlen * add maxlen * dict assert * dict assert * fix strings * fix strings	2023-12-27 17:54:53 +08:00
Yang Yong	54345c56b7	Update LightllmApi and Fix mmlu bug (#738) * Update LightllmApi and Fix mmlu bug * checkout mmlu_gen_a484b3.py --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-27 13:49:08 +08:00
philipwangOvO	34561ececb	[Feature] Add InfiniteBench (#739) * add InfiniteBench * add InfiniteBench --------- Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>	2023-12-26 15:36:27 +08:00
Fengzhe Zhou	3a68083ecc	[Sync] update configs (#734)	2023-12-25 21:59:16 +08:00
Songyang Zhang	ad96f2156f	Update merge script (#733)	2023-12-25 16:45:22 +08:00
AllentDan	336d8d76ff	add turbomind restful api support (#693) * add turbomind restful api support * config * top_p 0.8 * top_k = 1	2023-12-24 01:40:00 +08:00
bittersweet1999	e985100cd1	[Fix] Fix subjective alignbench (#730)	2023-12-23 20:06:53 +08:00
Mo Li	0e24f4213e	[Feature] Add NeedleInAHaystack Test Support (#714) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-23 12:00:51 +08:00
loveSnowBest	4a2d1926a2	[News] add news for T-Eval (#727) * add news for teval * update * update doc for cz&en	2023-12-22 19:58:24 +08:00
RunningLeon	e34c552282	[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend (#721) * add llama2 test * fix * test qwen chat-7b * test w4 * add baichuan2 * update * update * update configs and docs * update	2023-12-21 18:22:17 +08:00
bittersweet1999	fbb912ddf3	[Feature] Add abbr for judgemodel in subjective evaluation (#724) * add_judgemodel_abbr * add judgemodel abbr	2023-12-21 15:58:20 +08:00
Skyfall-xzz	b35d991786	[Feature] Add ReasonBench(Internal) dataset (#577) * [Feature] Add reasonbench dataset * add configs for supporting generative inference & merge datasets in the same category * modify config filename to prompt version * fix codes to meet pre-commit requirements * lint the code to meet pre-commit requirements * Align Load_data Sourcecode Briefly * fix bugs * reduce code redundancy	2023-12-20 17:57:42 +08:00


398 Commits