OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Fengzhe Zhou	3a68083ecc	[Sync] update configs (#734 )	2023-12-25 21:59:16 +08:00
Songyang Zhang	ad96f2156f	Update merge script (#733 )	2023-12-25 16:45:22 +08:00
AllentDan	336d8d76ff	add turbomind restful api support (#693 ) * add turbomind restful api support * config * top_p 0.8 * top_k = 1	2023-12-24 01:40:00 +08:00
bittersweet1999	e985100cd1	[Fix] Fix subjective alignbench (#730 )	2023-12-23 20:06:53 +08:00
Mo Li	0e24f4213e	[Feature] Add NeedleInAHaystack Test Support (#714 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-23 12:00:51 +08:00
loveSnowBest	4a2d1926a2	[News] add news for T-Eval (#727 ) * add news for teval * update * update doc for cz&en	2023-12-22 19:58:24 +08:00
RunningLeon	e34c552282	[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend (#721 ) * add llama2 test * fix * test qwen chat-7b * test w4 * add baichuan2 * update * update * update configs and docs * update	2023-12-21 18:22:17 +08:00
bittersweet1999	fbb912ddf3	[Feature] Add abbr for judgemodel in subjective evaluation (#724 ) * add_judgemodel_abbr * add judgemodel abbr	2023-12-21 15:58:20 +08:00
Skyfall-xzz	b35d991786	[Feature] Add ReasonBench(Internal) dataset (#577 ) * [Feature] Add reasonbench dataset * add configs for supporting generative inference & merge datasets in the same category * modify config filename to prompt version * fix codes to meet pre-commit requirements * lint the code to meet pre-commit requirements * Align Load_data Sourcecode Briefly * fix bugs * reduce code redundancy	2023-12-20 17:57:42 +08:00
Jingming	76a95e9e81	[Feature] Support the use of humaneval_plus. (#720 ) * [Feature] Support the use of humaneval_plus. * [Feature] Add humaneval_plus_gen.py * minor check * [Fix] Fix bug --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-12-20 17:25:17 +08:00
bittersweet1999	47e745d748	quick fix for maxoutlen (#719 )	2023-12-20 00:00:28 +08:00
Hubert	fdf18a3238	[Docs] Update Docker docs (#718 ) * [Docs] update docker docs * [Docs] update docker docs	2023-12-19 23:29:43 +08:00
Hubert	5e8b838f51	[Feat] Update math/agent (#716 ) * minor add * minor add * minor fix	2023-12-19 21:20:42 +08:00
bittersweet1999	97c2068bd9	[Feature] Add JudgeLLMs (#710 ) * add judgellms * add judgellms * add sub_size_partition * add docs * add ref	2023-12-19 18:40:25 +08:00
Hubert	eda72e756e	[Fix] minor fix openai (#711 )	2023-12-18 15:45:31 +08:00
Songyang Zhang	637628a70f	[Doc] Update Doc for Alignbench (#707 ) * update alignmentbench * update alignmentbench * update doc * update * update	2023-12-15 15:07:25 +08:00
Jingming	d7e7a637a5	[Fix] fix a bug on configs/eval_mixtral_8x7b.py (#706 )	2023-12-15 14:15:32 +08:00
DseidLi	db2920326a	[Fix] remove redundant in gsm8k.py (#700 ) Removed redundant code in GSM8KDataset.load method.	2023-12-14 19:55:58 +08:00
Songyang Zhang	bfe4aa2af5	[Fix] Update alignmentbench (#704 ) * update alignmentbench * update alignmentbench * update alignmentbench	2023-12-14 18:24:21 +08:00
bittersweet1999	1fe152b3e8	[Feature] Support AlignmentBench infer and judge (#697 ) * alignmentbench infer and judge * alignmentbench * alignmentbench done * alignment all done * alignment all done	2023-12-13 19:59:30 +08:00
Fengzhe Zhou	cadab9474f	[Doc] Update contamination docs (#698 ) * update contamination docs * add citation * Update contamination_eval.md * Update contamination_eval.md --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-13 18:03:39 +08:00
Hubert	a94598d921	[Feat] update python action and slurm (#694 )	2023-12-13 10:41:10 +08:00
bittersweet1999	6130394165	[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692 ) * add features * add doc string * add doc string	2023-12-12 20:58:17 +08:00
Xiaoyu Zhang	82a533a690	add rwkv-5-3b model (#666 ) * support rwkv5-3b learnboard * update rwkv-5-3b config * update config * refine * fix bug * update config * refine * reduce batch size * refine * reduce batch size to avoid oom in special datasets * Update huggingface.py * Update huggingface.py	2023-12-12 18:15:19 +08:00
Hubert	4780b39eda	[Sync] format (#690 ) Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-12 14:03:45 +08:00
bittersweet1999	3e77175720	[Fix] Hotfix for Subjective Evaluation (#686 )	2023-12-12 09:22:08 +08:00
bittersweet1999	465308e430	[Feature] Add Subjective Evaluation (#680 ) * new version of subject * fixed draw * fixed draw * fixed draw * done * done * done * done * fixed lint	2023-12-11 22:22:11 +08:00
Hubert	4f0b373a0a	[Fix] fix docstring (#684 )	2023-12-11 19:12:01 +08:00
Hubert	e78857ac36	[Sync] minor test (#683 )	2023-12-11 17:42:53 +08:00
Jingming	dd4318f6ab	[Feature] enhance the ability of humaneval_postprocess (#676 ) * [Feature] enhance the ability of humaneval_postprocess * refactor * [Feature] Keep the old version of the function and realize the new function in humaneval_postprocess_v2. * Update opencompass/datasets/humaneval.py --------- Co-authored-by: Leymore <zfz-960727@163.com> Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>	2023-12-11 14:39:56 +08:00
Hubert	1029119e39	[Feat] support pr merge test ci (#669 ) * [Feat] support ci * [Feat] support ci * [Feat] support ci * [Feat] support ci * init docs * init docs * init docs	2023-12-11 14:12:04 +08:00
Haodong Duan	6a928b996a	[Doc] Update README (#682 )	2023-12-10 21:27:46 +08:00
Songyang Zhang	e25c5f9525	[Enhancement] Update API Interface and Mixtral (#681 ) * [Enhancement] Update API interface * [Enhancement] Update API interface * Update mixtral * Update readme	2023-12-10 13:29:26 +08:00
Xiaoming Shi	1bf85949ef	[Feature] Add medbench (#678 ) * update medbench * medbench update * format medbench * format --------- Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-09 16:05:46 +08:00
Jingming	7cb53a95fa	[Fix] fix bug on standart_deviation summarizer (#675 )	2023-12-08 13:38:07 +08:00
liyucheng09	05bbce8b08	[Feature] Add Data Contamination Analysis (#639 ) * add contamination analysis to ceval * fix bugs * add contamination docs * to pass CI check * update --------- Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-08 10:00:11 +08:00
Fengzhe Zhou	3a354bd1da	add qwen and deepseek configs (#672 )	2023-12-07 20:29:00 +08:00
bittersweet1999	1c95790fdd	New subjective judgement (#660 ) * TabMWP * TabMWP * fixed * fixed * fixed * done * done * done * add new subjective judgement * add new subjective judgement * add new subjective judgement * add new subjective judgement * add new subjective judgement * modified to a more general way * modified to a more general way * final * final * add summarizer * add new summarize * fixed * fixed * fixed --------- Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>	2023-12-06 13:28:33 +08:00
rolellm	e10f1c9139	added rolebench dataset. (#633 ) * added rolebench * fixed unreasonable variable names * changed the variable names in the comments	2023-12-01 22:54:42 +08:00
liushz	f4bbff6537	[Feature] Update MathBench CodeInterpreter & fix MathBench Bug (#657 ) * Update MathBench CodeInterpreter & fix MathBench Bug * Fix errors * update --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>	2023-12-01 22:27:24 +08:00
Hubert	9eb5cadcac	[Feat] update gsm8k and math agent config (#652 ) * [Feat] update gsm8k and math agent config * minor fix	2023-12-01 15:08:38 +08:00
liushz	a331c9abfd	[Feature] Add wikibench dataset (#655 ) * Add WikiBench * Add WikiBench * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-01 14:56:54 +08:00
liushz	e019c831fe	[Feature] Add Chinese version: commonsenseqa, crowspairs and nq (#144 ) * add Chinese version: csqa crowspairs nq * Update cn_data * Update cn_data * update format --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-30 15:33:02 +08:00
Ma Zerun	6aaf3b91ec	[Feature] Support chat style inferencer. (#643 ) * [Feature] Support chat style inferencer. * [Fix] use new prompt * [Fix] use new prompt --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-11-30 14:00:06 +08:00
Fengzhe Zhou	5933c04fda	fix hellaswag_ppl_47bff9 (#648 )	2023-11-29 16:51:44 +08:00
Hubert	e9e75fb4eb	[Fix] remove colossalai dependency (#645 )	2023-11-28 14:09:44 +08:00
Fengzhe Zhou	e20d654c18	[Sync] Bump version to 0.1.9 (#644 )	2023-11-28 11:42:43 +08:00
Hubert	d4af31bab4	[Feat] support zhipu post process (#642 ) * [Feat] support zhipu post * [Feat] support zhipu post * [Feat] support zhipu post	2023-11-27 19:57:36 +08:00
liushz	6d0d78986c	[Feature] Add GSM_Hard dataset (#619 ) * Add SVAMP dataset * Add SVAMP dataset * Add SVAMP dataset * Add gsm_hard dataset * Add gsm_hard dataset * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-27 17:40:34 +08:00
Fengzhe Zhou	9083dea683	[Sync] some renaming (#641 )	2023-11-27 16:06:49 +08:00

357 Commits