OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Jingming Zhuo	41196c48ae	Add humaneval prompt from simple_evals, openai (#1076 ) * [Feature] Add IFEval * add humaneval prompt from simple_evals, openai	2024-04-24 17:40:50 +08:00
liushz	17735f0c13	Fix Llama-3 meta template (#1079 ) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-24 16:46:25 +08:00
Ke Bao	81d0e4d793	[Feature] Add lmdeploy tis python backend model (#1014 ) * add lmdeploy tis python backend model * fix pr check * update	2024-04-23 14:27:11 +08:00
Fengzhe Zhou	004ed79593	[Feature] Add TheoremQA with 5-shot (#1048 ) * add TheoremQA with 5-shot * cherry pick from add-huggingface-above-v4.33, good TheoremQA results	2024-04-22 15:22:04 +08:00
Fengzhe Zhou	a256753221	[Feature] Add LLaMA-3 Series Configs (#1065 ) * add LLaMA-3 Series configs * update readme	2024-04-22 14:39:31 +08:00
bittersweet1999	6f98c8d9ab	[Fix] Fix MultiRound Subjective Evaluation(#1043 ) * fix multiround * fix	2024-04-22 12:06:03 +08:00
Fengzhe Zhou	8c85edd1cd	[Sync] deprecate old mbpps (#1064 )	2024-04-19 20:49:46 +08:00
liuwei130	a00e57296f	[Feature] Add ChemBench (#1032 ) * add ChemBench * update results * molbench -> ChemBench --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-12 08:46:26 +08:00
Fengzhe Zhou	b39f501563	[Sync] update taco (#1030 )	2024-04-09 17:50:23 +08:00
Mo Li	16f29b25f1	[Fix] Simplify needlebench summarizer (#1024 ) * Conflicts: configs/summarizers/needlebench.py * fix lint problems	2024-04-07 17:51:13 +08:00
Mo Li	f2af49337d	[Feature] Add ATC Choice Version (#1019 ) * Squashed commit of the following: commit c48ad194c3976dc63d1b60d8c8ab2d5ff9e1cbfe Author: DseidLi <2568818204@qq.com> Date: Tue Apr 2 16:57:43 2024 +0800 add atc_choice commit 3ac6efea29619573e6fac8fa3cce464853dcead0 Merge: 2d4e559 8e3a9c3 Author: DseidLi <2568818204@qq.com> Date: Tue Apr 2 16:41:38 2024 +0800 Merge branch 'atc_choice' into atc_add_choice commit 8e3a9c396a3e5546d3faf584183f6fd60b974d5e Merge: 150a036 0a6a03f Author: DseidLi <2568818204@qq.com> Date: Tue Mar 26 04:47:07 2024 +0800 Merge branch 'main' into atc_choice Conflicts: configs/summarizers/needlebench.py opencompass/datasets/needlebench/multi.py opencompass/datasets/needlebench/origin.py opencompass/datasets/needlebench/parallel.py commit 150a036d6d990f26a57c974d1af83d88c31a0f9d Merge: 8d6ac9a 940dd18 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 20 03:49:08 2024 +0800 Merge branch 'needlebench_fix' into atc_choice commit 8d6ac9a1a43b1c9d0f0ea27e7d58968a203ea898 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 20 03:41:49 2024 +0800 optimize needlebench code commit 940dd18a4270f24bc69edd2a780182c68918e1a9 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 20 03:39:46 2024 +0800 fix vllm commit d8be6877bc41051f3edcc0421c462c834c0f1c9a Merge: ecad78a 2527fda Author: DseidLi <2568818204@qq.com> Date: Tue Mar 19 21:07:08 2024 +0800 Merge remote-tracking branch 'origin/add_1M_dataset' into atc_choice commit 2527fda8a5 Author: DseidLi <2568818204@qq.com> Date: Tue Mar 19 16:03:40 2024 +0800 add model configs commit 75425acdf8 Author: DseidLi <2568818204@qq.com> Date: Tue Mar 19 16:02:15 2024 +0800 add prompt postion args commit 367ba1ba61 Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 21:40:00 2024 +0800 add Needlebench-1000K configs commit ecad78af14c4bb00fe325779114b384c57ab30bf Author: DseidLi <2568818204@qq.com> Date: Thu Mar 14 22:08:32 2024 +0800 fix atc commit 08772c0787b18872abadc9ffec3223941a5ee0c2 Merge: 9f3f8cf caf1cf8 Author: DseidLi <2568818204@qq.com> Date: Thu Mar 14 22:07:28 2024 +0800 Merge branch 'main' into atc_choice Conflicts: configs/datasets/needlebench/readme.md configs/datasets/needlebench/readme_zh-CN.md configs/summarizers/needlebench.py opencompass/datasets/needlebench/atc.py opencompass/summarizers/needlebench.py commit 9f3f8cfb4452722734d334114ac1d14110e57406 Author: DseidLi <2568818204@qq.com> Date: Thu Mar 14 21:35:53 2024 +0800 add atc-choice test commit 52be7c1202376b4e09821188b826f1a805328129 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 6 02:54:15 2024 +0800 update needlebench randomseed and add vllm qwen14b commit fc1effce596ae2e5ece4933e8cd34aef8e64a6f9 Merge: 4e747ed caf1cf8 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 6 02:51:14 2024 +0800 Merge branch 'main' into add_model_configs commit 31834f9b23af3354ac3581ec86d693d0f05cdd1c Merge: 7dabc82 120bf8b Author: DseidLi <2568818204@qq.com> Date: Sun Mar 3 23:29:42 2024 +0800 Merge branch 'main' of https://github.com/open-compass/opencompass into atc_choice commit 4e747ed1988ddbcfcc7fff334601259ade72d363 Author: DseidLi <2568818204@qq.com> Date: Sun Mar 3 22:15:25 2024 +0800 add internlm2-lmdeploy model and gemma configs commit 7dabc828123d711c8cf834d6aab4137bb55e85ed Author: DseidLi <2568818204@qq.com> Date: Sat Mar 2 17:26:15 2024 +0800 add atc choice version -ZH commit 996f8ae43d Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:58:56 2024 +0800 update readme for needlebench commit f7266e873c Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:44:53 2024 +0800 move readme.md commit 1c7375681d Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:38:31 2024 +0800 fix linting error commit b6524f3ebf Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:33:51 2024 +0800 lint summarizer commit c0d1190e39 Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:29:03 2024 +0800 add needlebench intro, fix summarizer commit 0965baf785 Author: DseidLi <2568818204@qq.com> Date: Mon Feb 26 13:31:26 2024 +0800 fix bug in needlebench summarizer commit 5d32b31eb8 Author: DseidLi <2568818204@qq.com> Date: Sat Feb 24 03:19:08 2024 +0800 update act prompt commit af82a7f085 Merge: 32bf9fe 53fe788 Author: DseidLi <2568818204@qq.com> Date: Fri Feb 23 17:50:32 2024 +0800 Merge remote-tracking branch 'upstream/main' into needlebench commit 32bf9fe802 Author: DseidLi <2568818204@qq.com> Date: Fri Feb 23 17:31:32 2024 +0800 simplify needlebench 32k, 128k, 200k for eval commit a7cb025e05 Author: DseidLi <2568818204@qq.com> Date: Fri Feb 23 14:48:58 2024 +0800 add needlebench * fix summarizer * remove repeated code * remove chinese comments	2024-04-07 15:46:20 +08:00
Mo Li	b50d163265	[Fix] Refactor Needlebench Configs for CLI Testing Support (#1020 ) * add needlebench datasets suffix * fix import * update run.py args for summarizer key and dataset suffix * update utils/run.py	2024-04-07 15:12:56 +08:00
bittersweet1999	2d4e559763	[Feature] Add multi-model judge and fix some problems (#1016 ) * support multi-model judge and moe judge * test_moe * test_moe * test * add moe judge * support multi-judge-model	2024-04-02 11:52:06 +08:00
bittersweet1999	02e7eec911	[Feature] Support AlpacaEval_V2 (#1006 ) * support alpacaeval_v2 * support alpacaeval * update docs * update docs	2024-03-28 16:49:04 +08:00
Mo Li	0a6a03fe1a	[Feature] update needlebench and configs (#986 ) * add Needlebench-1000K configs * add prompt postion args * add model configs * Update parallel.py * fix lint	2024-03-25 18:05:01 +08:00
bittersweet1999	0665bb91a8	[Fix] Quick fix (#995 )	2024-03-22 19:54:19 +08:00
Ke Bao	e415ddf96a	[Fix] Fix turbomind_tis (#992 )	2024-03-22 15:50:12 +08:00
bittersweet1999	054e9fa7e5	[Feature] add one script for subjective (#993 ) * add one script for subjective * add one script for subjective * add one script for subjective * add one script for subjective --------- Co-authored-by: thebestannie <1290646445@qq.com>	2024-03-20 23:20:41 +08:00
Connor-Shen	0221d30877	[Fix] Update APPS/TACO (#988 ) * [Feature] update apps/taco * [Feature] update apps/taco	2024-03-19 20:21:39 +08:00
Connor-Shen	8a3c6e51ed	[Feature] Update APPS (#985 ) * update post process * update post process	2024-03-19 15:47:05 +08:00
Connor-Shen	d92595b671	[Feat] Support TACO (#966 ) * [Feat] Support TACO * update README * update README	2024-03-19 15:39:16 +08:00
bittersweet1999	c78a4df923	add support for set prediction path (#984 )	2024-03-19 14:32:15 +08:00
Jingming	89a8a8917b	[Feature] Add the implement of QuALITY datasets (#976 ) #976	2024-03-15 21:22:38 +08:00
Jingming	c2d4717be2	[Fix] Fix a bug in internlm2 series configs (#977 )	2024-03-15 15:21:35 +08:00
Connor-Shen	3098d78845	[Bench] Support APPS (#963 ) * [Feat] support apps * [Feat] support apps * [Feat] support apps * update README	2024-03-13 16:09:23 +08:00
Jingming	4c1533e59e	[Fix] fix the config's name of deepseek-coder (#964 )	2024-03-12 19:36:52 +08:00
Fengzhe Zhou	bdd85358cc	[Sync] update 20240308 (#953 )	2024-03-11 22:34:19 +08:00
bittersweet1999	848e7c8a76	[fix] add different temp for different question in mtbench (#954 ) * add temp for mtbench * add document for mtbench * add document for mtbench	2024-03-11 17:24:39 +08:00
Yang Yong	107e022cf4	Support prompt template for LightllmApi. Update LightllmApi token bucket. (#945 )	2024-03-06 15:33:53 +08:00
RunningLeon	c54a5d3b0f	Support get_ppl for TurbomindModel (#878 ) * update ppl for turbomindmodel * update api_server * rename config and set thread_safe for pytorch engine if possible	2024-03-06 11:44:19 +08:00
Xu Song	2e993989a6	[Fix] FinanceIQ_datasets import error (#939 ) * [Fix] Fix KeyError: 'FinanceIQ_datasets' * [Fix] Fix KeyError: 'FinanceIQ_datasets'	2024-03-05 20:32:24 +08:00
Jingming	66d3aa4c01	[Feature] Add configs of deepseek-coder (#943 )	2024-03-05 11:38:28 +08:00
Jingming	d0550268f3	[Fix] fix a bug of humanevalplus config (#944 )	2024-03-05 11:37:17 +08:00
Fengzhe Zhou	b03d5dc531	[Sync] Sync Internal (#941 )	2024-03-04 14:42:36 +08:00
yuantao2108	bbec7d8733	[Feature] add lveval benchmark (#914 ) * add lveval benchmark * add LVEval readme file * update LVEval readme file * Update configs/eval_bluelm_32k_lveval.py * Update configs/eval_llama2_7b_lveval.py --------- Co-authored-by: yuantao <yuantao@infini-ai.com> Co-authored-by: Mo Li <82895469+DseidLi@users.noreply.github.com>	2024-03-04 11:22:03 +08:00
Mo Li	8142f399a8	[Feature] Upgrade the needle-in-a-haystack experiment to Needlebench (#913 ) * add needlebench * simplify needlebench 32k, 128k, 200k for eval * update act prompt * fix bug in needlebench summarizer * add needlebench intro, fix summarizer * lint summarizer * fix linting error * move readme.md * update readme for needlebench * update docs of needlebench * simplify needlebench summarizers	2024-03-04 11:10:52 +08:00
Mo Li	120bf8b399	add vllm model configs (#938 )	2024-03-01 17:31:51 +08:00
Skyfall-xzz	4c45a71bbc	[Feature] Support OpenFinData (#896 ) * [Feature] Support OpenFinData * add README for OpenFinData * update README	2024-02-29 12:55:07 +08:00
bittersweet1999	001e77fea2	[Feature] add support for gemini (#931 ) * add gemini * add gemini * add gemini	2024-02-28 19:38:34 +08:00
Jingming	53fe788d27	[Fix] fix ifeval (#909 )	2024-02-23 16:52:03 +08:00
bittersweet1999	45c606bcd0	[Fix] Fix IFEval (#906 ) * fix ifeval * fix ifeval * fix ifeval * fix ifeval	2024-02-22 16:51:34 +08:00
RunningLeon	32ba0b074e	Support lmdeploy pytorch engine (#875 ) * add lmdeploy pytorch model * fix * speed up encoding and decoding * fix * change tokenizer	2024-02-22 03:46:07 -03:00
Xu Song	6d04decab4	[Fix] Fix moss template config (#897 )	2024-02-21 11:19:24 +08:00
Fengzhe Zhou	2b7d376e3d	[Fix] Fix chatglm2 config (#893 )	2024-02-19 14:55:53 +08:00
Fengzhe Zhou	9119e2ac39	[Fix] rename qwen2-beta -> qwen1.5 (#894 )	2024-02-19 14:55:35 +08:00
Yang Yong	b6e21ece38	Support LightllmApi input_format (#888 )	2024-02-19 10:02:59 +08:00
hailsham	e257254b00	[Feature] add global retriever config (#842 ) * add global retriever config * give zero shot overwrite example * give zero shot overwrite example --------- Co-authored-by: Lei Fei <SENSETIME\leifei1@cn3114002087l.domain.sensetime.com> Co-authored-by: Leymore <zfz-960727@163.com>	2024-02-07 00:30:20 +08:00
hailsham	dd444685bb	fix bug of gsm8k_postprocess (#863 ) * fix bug of gsm8k_postprocess * update postprocess --------- Co-authored-by: Lei Fei <SENSETIME\leifei1@cn3114002087l.domain.sensetime.com> Co-authored-by: Leymore <zfz-960727@163.com>	2024-02-06 23:52:47 +08:00
Connor-Shen	444d8d9507	[feat] support multipl-e (#846 ) * [feat] support humaneval_multipl-e * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-02-06 23:30:28 +08:00
bittersweet1999	1c8e193de8	[Fix] hotfix for mtbench (#877 ) * hotfix for mtbench * hotfix	2024-02-06 21:26:47 +08:00
Fengzhe Zhou	d34ba11106	[Sync] Merge branch 'dev' into zfz/update-keyset-demo (#876 )	2024-02-05 23:29:10 +08:00
bittersweet1999	32b5948f4e	[Fix] add do sample demo for subjective dataset (#873 ) * add do sample demo for subjective dataset * fix strings * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-02-05 15:55:58 +08:00
Skyfall-xzz	7ad1168062	Support NPHardEval (#835 ) * support NPHardEval * add .md file and fix minor bugs * refactor and minor fix --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-02-05 15:52:28 +08:00
bittersweet1999	7806cd0f64	[Feature] support alpacaeval (#809 ) * support alpacaeval_v1 * Update opencompass/summarizers/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/summarizers/subjective/alpacaeval_v1.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix conflict * support alpacaeval v2 * support alpacav2 --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-02-04 14:18:36 +08:00
RunningLeon	4c87e777d8	[Feature] Add end_str for turbomind (#859 ) * fix * update * fix internlm1 * fix docs * remove sys	2024-02-01 22:31:14 +08:00
bittersweet1999	5c6dc908cd	fix compass arena (#854 )	2024-01-30 16:34:38 +08:00
Songyang Zhang	cdca59ff49	[Fix] Update Zhipu API and Fix issue min_out_len issue of API models (#847 ) * Update zhipu api and fix min_out_len issue of API class * Update example * Update example	2024-01-28 14:52:43 +08:00
Jingming	2801883351	[Fix] Fix acc of IFEval (#849 ) * [Feature] Add IFEval * [Fix] Changing the Score Rule.	2024-01-27 22:27:07 +08:00
Xiaoming Shi	35aace776a	[Fix] Update MedBench (#845 )	2024-01-26 17:56:13 +08:00
Songyang Zhang	8ed022b4c4	Update Sensetime API (#844 )	2024-01-26 16:40:49 +08:00
Hubert	4aa74565e2	[Feat] minor update agent related (#839 ) * [Feat] update cibench * [Feat] Support CIBench * [Feat] Support CIBench * [Feat] Support CIBench * [Feat] Support CIBench	2024-01-26 14:15:51 +08:00
bittersweet1999	77be07dbb5	[Fix] fix corev2 (#838 ) * fix corev2 * fix corev2	2024-01-24 18:15:29 +08:00
Fengzhe Zhou	0991dd33a0	[Sync] Updata dataset cfg for internMath (#837 ) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-01-24 16:30:32 +08:00
bittersweet1999	2ee8e8a1a1	[Feature] add mtbench (#829 ) * add mtbench * add mtbench * Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/mtbench.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix mtbench --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-24 12:11:47 +08:00
Jingming	e059a5c2bf	[Feature] Add IFEval (#813 ) * [Feature] Add IFEval * [Doc] add introduction of IFEval	2024-01-23 20:07:49 +08:00
bittersweet1999	2d4da8dd02	[Feature] Add CompassArena (#828 ) * add compass arena * add compass_arena * add compass arena * Update opencompass/summarizers/subjective/compass_arena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/summarizers/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/compass_arena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/eval_subjective_compassarena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/compassarena/compassarena_compare.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/eval_subjective_compassarena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/compassarena/compassarena_compare.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix check position bias --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-23 15:12:46 +08:00
RangiLyu	40a2441deb	Update hf_internlm2_chat template (#823 ) * Update hf_internlm2_chat template * Update 20B	2024-01-19 18:21:47 +08:00
Guo Qipeng	e975a96fa1	Update cdme config and evaluator (#812 ) * update cdme config and evaluator * fix cdme prompt * move CDME trim post-processor as a separate evaluator --------- Co-authored-by: 郭琦鹏 <guoqipeng@pjlab.org.cn>	2024-01-19 11:29:27 +08:00
Mo Li	dcc32ed856	[Fix] Update yi 200k config (#815 )	2024-01-18 20:54:24 +08:00
RunningLeon	61fe873c89	[Fix] Fix turbomind and update docs (#808 ) * update * update docs * add engine_config and gen_config in eval_config * update * fix * fix * fix * fix docstr * fix url	2024-01-18 14:41:35 +08:00
Fengzhe Zhou	b4afe3e7c1	[Sync] Add InternLM2 Keyset Evaluation Demo (#807 ) Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>	2024-01-17 13:48:12 +08:00
Mo Li	acae560911	Added support for multi-needle testing in needle-in-a-haystack test (#802 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues * add English version support * change NeedleInAHaystackDataset to dynamic loading * change NeedleInAHaystackDataset to dynamic loading * fix needleinahaystack test eval bug * fix needleinahaystack config bug * Added support for multi-needle testing in needle-in-a-haystack test * Optimize the code for plotting in the needle-in-a-haystack test. * Correct the typo in the dataset parameters. * update needleinahaystack test docs --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-17 13:47:34 +08:00
RunningLeon	0836aec67b	[Feature] Update evaluate turbomind (#804 ) * update * fix * fix * fix	2024-01-17 11:09:50 +08:00
bittersweet1999	814b3f73bd	reorganize subject files (#801 )	2024-01-16 18:03:11 +08:00
bittersweet1999	83d6c48378	[Feature] Add configs for creationbench (#791 ) * add creationv2_zh * add creationv2_zh * add eng config for creationbench * add eng config for creationbench * add eng config for creationbench	2024-01-12 14:20:21 +08:00
Songyang Zhang	467ad0ac21	Update gsm8k agent prompt (#788 )	2024-01-11 14:07:36 +08:00
notoschord	d3a0ddc3ef	[Feature] Add support for Nanbeige API (#786 ) Co-authored-by: notoschord <wangzekai@kanzhun.com>	2024-01-11 13:54:27 +08:00
Xiaoming Shi	ad872a5dc2	[Feature] Update MedBench (#779 ) * update medbench * medbench update * format medbench * format * Update * update * update * update suffix --------- Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org> Co-authored-by: Leymore <zfz-960727@163.com>	2024-01-09 11:42:44 +08:00
Fengzhe Zhou	32f40a8f83	[Sync] Sync with internal codes 2023.01.08 (#777 )	2024-01-08 14:07:24 +00:00
Fengzhe Zhou	f78fcf6eeb	[Docs] Update contamination docs (#775 )	2024-01-08 16:37:28 +08:00
liyucheng09	0b2863039e	[Feature] Contamination analysis for MMLU, Hellaswag, and ARC_c (#699 ) * Contamination analysis for ARC_c, mmlu, and Hellaswag * update `eval_contamination.py` * update `contamination.py` summarizer * fix `eval_contamination.py` * add mmlu groups for contamination analysis	2024-01-08 15:51:48 +08:00
Yuchen Yan	11f3b91e78	[Fix] fix typos in drop prompt (#773 ) Co-authored-by: yanyuchen04 <yanyuchen04@meituan.com>	2024-01-08 14:22:35 +08:00
Connor-Shen	30a90d8dd8	Support Mbpp_plus dataset (#770 ) * support mbpp+ * support mbpp+ * minor fix * [Feat] minor fix --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2024-01-05 22:01:57 +08:00
bittersweet1999	2163f9398f	[Feature] add subject ir dataset (#755 ) * add subject ir * Add ir dataset * Add ir dataset	2024-01-05 12:00:57 +00:00
bittersweet1999	be369c3e06	[Feature] Add multi_round dataset evaluation (#766 ) * multi_round dataset * add multi_round evaluation	2024-01-04 10:37:52 +00:00
Chris Liu	3eb225a5e6	[Feature] Support LLaMA2-Accessory (#732 ) * Support LLaMA2-Accessory * remove strip * clear imports * reformat * fix lint * fix lint * update readme * update readme * update readme * update readme	2024-01-02 20:48:51 +08:00
HUANG Fei	ba027eeeac	[Feature] Add support of qwen api (#735 )	2024-01-02 20:47:12 +08:00
Mo Li	33f8df1ca3	[Update] Change NeedleInAHaystackDataset to dynamic dataset loading (#754 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues * add English version support * change NeedleInAHaystackDataset to dynamic loading * change NeedleInAHaystackDataset to dynamic loading * fix needleinahaystack test eval bug * fix needleinahaystack config bug --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-02 17:22:56 +08:00
Francis-llgg	b69fe2343b	[Feature] Add GPQA Dataset (#729 ) * check * message * add * change prompt * change a para nameq * modify name of the file * delete an useless file	2024-01-01 15:54:40 +08:00
Francis-llgg	ef3ae63539	[Feature] Add new dataset mastermath2024v1 (#744 ) * add new dataset mastermath2024v1 * change it to simplified chinese prompt * change file name	2024-01-01 15:53:24 +08:00
Mo Li	17b8e929dd	[Feature] Update plot function in tools_needleinahaystack.py (#747 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-29 18:51:09 +08:00
Hubert	327951087f	[Feat] update code config (#749 ) * [Feat] update code dataset * [Feat] update code dataset * [Feat] update code dataset	2023-12-29 18:46:34 +08:00
bittersweet1999	fe0b717033	add creationbench (#753 )	2023-12-29 10:03:44 +00:00
bittersweet1999	8728287a55	fix erro in configs (#750 )	2023-12-28 11:53:07 +00:00
Connor-Shen	81098722d2	add chinese version of humaneval, mbpp (#743 ) * add chinese_version of humaneval,mbpp * add humaneval&mbpp gen.py * minor fix * minor add --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-12-28 14:47:56 +08:00
Hubert	0a525985e8	[Feature] Support sanitized MBPP dataset (#745 )	2023-12-27 22:17:23 +08:00
bittersweet1999	dfd9ac0fd9	[Feature] Add other judgelm prompts for Alignbench (#731 ) * add judgellm prompts * add judgelm prompts * update import info * fix situation that no abbr in config * fix situation that no abbr in config * add summarizer for other judgellm * change config name * add maxlen * add maxlen * dict assert * dict assert * fix strings * fix strings	2023-12-27 17:54:53 +08:00
Yang Yong	54345c56b7	Update LightllmApi and Fix mmlu bug (#738 ) * Update LightllmApi and Fix mmlu bug * checkout mmlu_gen_a484b3.py --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-27 13:49:08 +08:00
philipwangOvO	34561ececb	[Feature] Add InfiniteBench (#739 ) * add InfiniteBench * add InfiniteBench --------- Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>	2023-12-26 15:36:27 +08:00
Fengzhe Zhou	3a68083ecc	[Sync] update configs (#734 )	2023-12-25 21:59:16 +08:00
AllentDan	336d8d76ff	add turbomind restful api support (#693 ) * add turbomind restful api support * config * top_p 0.8 * top_k = 1	2023-12-24 01:40:00 +08:00
Mo Li	0e24f4213e	[Feature] Add NeedleInAHaystack Test Support (#714 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-23 12:00:51 +08:00
RunningLeon	e34c552282	[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend (#721 ) * add llama2 test * fix * test qwen chat-7b * test w4 * add baichuan2 * update * update * update configs and docs * update	2023-12-21 18:22:17 +08:00
Skyfall-xzz	b35d991786	[Feature] Add ReasonBench(Internal) dataset (#577 ) * [Feature] Add reasonbench dataset * add configs for supporting generative inference & merge datasets in the same category * modify config filename to prompt version * fix codes to meet pre-commit requirements * lint the code to meet pre-commit requirements * Align Load_data Sourcecode Briefly * fix bugs * reduce code redundancy	2023-12-20 17:57:42 +08:00
Jingming	76a95e9e81	[Feature] Support the use of humaneval_plus. (#720 ) * [Feature] Support the use of humaneval_plus. * [Feature] Add humaneval_plus_gen.py * minor check * [Fix] Fix bug --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-12-20 17:25:17 +08:00
bittersweet1999	47e745d748	quick fix for maxoutlen (#719 )	2023-12-20 00:00:28 +08:00
Hubert	5e8b838f51	[Feat] Update math/agent (#716 ) * minor add * minor add * minor fix	2023-12-19 21:20:42 +08:00
bittersweet1999	97c2068bd9	[Feature] Add JudgeLLMs (#710 ) * add judgellms * add judgellms * add sub_size_partition * add docs * add ref	2023-12-19 18:40:25 +08:00
Songyang Zhang	637628a70f	[Doc] Update Doc for Alignbench (#707 ) * update alignmentbench * update alignmentbench * update doc * update * update	2023-12-15 15:07:25 +08:00
Jingming	d7e7a637a5	[Fix] fix a bug on configs/eval_mixtral_8x7b.py (#706 )	2023-12-15 14:15:32 +08:00
Songyang Zhang	bfe4aa2af5	[Fix] Update alignmentbench (#704 ) * update alignmentbench * update alignmentbench * update alignmentbench	2023-12-14 18:24:21 +08:00
bittersweet1999	1fe152b3e8	[Feature] Support AlignmentBench infer and judge (#697 ) * alignmentbench infer and judge * alignmentbench * alignmentbench done * alignment all done * alignment all done	2023-12-13 19:59:30 +08:00
bittersweet1999	6130394165	[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692 ) * add features * add doc string * add doc string	2023-12-12 20:58:17 +08:00
Xiaoyu Zhang	82a533a690	add rwkv-5-3b model (#666 ) * support rwkv5-3b learnboard * update rwkv-5-3b config * update config * refine * fix bug * update config * refine * reduce batch size * refine * reduce batch size to avoid oom in special datasets * Update huggingface.py * Update huggingface.py	2023-12-12 18:15:19 +08:00
bittersweet1999	3e77175720	[Fix] Hotfix for Subjective Evaluation (#686 )	2023-12-12 09:22:08 +08:00
bittersweet1999	465308e430	[Feature] Add Subjective Evaluation (#680 ) * new version of subject * fixed draw * fixed draw * fixed draw * done * done * done * done * fixed lint	2023-12-11 22:22:11 +08:00
Hubert	e78857ac36	[Sync] minor test (#683 )	2023-12-11 17:42:53 +08:00
Songyang Zhang	e25c5f9525	[Enhancement] Update API Interface and Mixtral (#681 ) * [Enhancement] Update API interface * [Enhancement] Update API interface * Update mixtral * Update readme	2023-12-10 13:29:26 +08:00
Xiaoming Shi	1bf85949ef	[Feature] Add medbench (#678 ) * update medbench * medbench update * format medbench * format --------- Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-09 16:05:46 +08:00
liyucheng09	05bbce8b08	[Feature] Add Data Contamination Analysis (#639 ) * add contamination analysis to ceval * fix bugs * add contamination docs * to pass CI check * update --------- Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-08 10:00:11 +08:00
Fengzhe Zhou	3a354bd1da	add qwen and deepseek configs (#672 )	2023-12-07 20:29:00 +08:00
bittersweet1999	1c95790fdd	New subjective judgement (#660 ) * TabMWP * TabMWP * fixed * fixed * fixed * done * done * done * add new subjective judgement * add new subjective judgement * add new subjective judgement * add new subjective judgement * add new subjective judgement * modified to a more general way * modified to a more general way * final * final * add summarizer * add new summarize * fixed * fixed * fixed --------- Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>	2023-12-06 13:28:33 +08:00
rolellm	e10f1c9139	added rolebench dataset. (#633 ) * added rolebench * modified the unreasonable variable names * modified the variable names in the comments	2023-12-01 22:54:42 +08:00
liushz	f4bbff6537	[Feature] Update MathBench CodeInterpreter & fix MathBench Bug (#657 ) * Update MathBench CodeInterpreter & fix MathBench Bug * Fix errors * update --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>	2023-12-01 22:27:24 +08:00
Hubert	9eb5cadcac	[Feat] update gsm8k and math agent config (#652 ) * [Feat] update gsm8k and math agent config * minor fix	2023-12-01 15:08:38 +08:00
liushz	a331c9abfd	[Feature] Add wikibench dataset (#655 ) * Add WikiBench * Add WikiBench * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-01 14:56:54 +08:00
liushz	e019c831fe	[Feature] Add Chinese version: commonsenseqa, crowspairs and nq (#144 ) * add Chinese version: csqa crowspairs nq * Update cn_data * Update cn_data * update format --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-30 15:33:02 +08:00
Ma Zerun	6aaf3b91ec	[Feature] Support chat style inferencer. (#643 ) * [Feature] Support chat style inferencer. * [Fix] use new prompt * [Fix] use new prompt --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-11-30 14:00:06 +08:00
Fengzhe Zhou	5933c04fda	fix hellaswag_ppl_47bff9 (#648 )	2023-11-29 16:51:44 +08:00
Hubert	d4af31bab4	[Feat] support zhipu post process (#642 ) * [Feat] support zhipu post * [Feat] support zhipu post * [Feat] support zhipu post	2023-11-27 19:57:36 +08:00
liushz	6d0d78986c	[Feature] Add GSM_Hard dataset (#619 ) * Add SVAMP dataset * Add SVAMP dataset * Add SVAMP dataset * Add gsm_hard dataset * Add gsm_hard dataset * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-27 17:40:34 +08:00
Fengzhe Zhou	9083dea683	[Sync] some renaming (#641 )	2023-11-27 16:06:49 +08:00
Fengzhe Zhou	d949e3c003	[Feature] Add circular eval (#610 ) * refactor default, add circular summarizer * add circular * update impl * update doc * minor update * no more to be added	2023-11-23 16:45:47 +08:00
Songyang Zhang	5202456b4c	[API] Update API (#624 ) * update api * update generation_kwargs impl * update api * refactor --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-23 15:06:20 +08:00
Fengzhe Zhou	d4d1330a5a	[Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes (#625 )	2023-11-23 14:05:59 +08:00
Kevin Wang	c0785e53d8	[Feature] support download from modelscope (#534 ) * [Feature] download from modelscope * [Feature] download from modelscope * minor fix --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-11-22 15:32:21 +08:00
liushz	048775192b	[Feature] Add SVAMP dataset (#604 ) * Add SVAMP dataset * Add SVAMP dataset * Add SVAMP dataset	2023-11-22 14:54:39 +08:00
Lyu Han	eb56fd6d16	Integrate turbomind python api (#484 ) * integrate turbomind python api * update * update user guide * update * fix according to reviewer's comments * fix error * fix linting * update user guide * remove debug log --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-11-21 22:34:46 +08:00
Songyang Zhang	d925748266	[Feature] Support 360API and FixKRetriever for CSQA dataset (#601 ) * [Feature] Support 360API and FixKRetriever for CSQA dataset * Update API * Update API * [Feature] Support 360API and FixKRetriever for CSQA dataset * Update API * Update API * rm mathbench * fix_lint * Update opencompass/models/bytedance_api.py Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com> * update * update * update --------- Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>	2023-11-21 20:25:47 +08:00
Yang Yong	d3b0d5c4ce	[Feature] Support Lightllm API (#613 ) * [Feature] Support Lightllm api * formatting & renaming --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-21 19:18:40 +08:00
Yuan Feng	7199acc25d	Add support for DataCanvas Alaya LM (#612 ) * Support for Alaya * Remove useless requirements	2023-11-21 17:51:30 +08:00
liushz	dbacd36379	Add aritch to mathbench (#607 )	2023-11-20 19:40:41 +08:00
liushz	c9c5c5d92e	Mathbench update postprocess (#600 ) * Update mathbench * Update mathbench	2023-11-20 16:48:55 +08:00
Jingming	5e75e29711	[Feature] Add multi-prompt generation demo (#568 ) * [Feature] Add multi-prompt generation demo * [Fix] change form in winogrande_gen_XXX.py * [Fix] make multi prompt demo more directly * [Fix] fix bug * [Fix] minor fix --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-11-20 16:16:37 +08:00
Hubert	91fba2c2e9	[Feat] support humaneval and mbpp pass@k (#598 ) * [Feat] support pass@ k * [Feat] support pass@k * [Feat] support pass@k * [Feat] support pass@k * [Feat] support pass@k * [Feat] support pass@k docs * update naming --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-11-16 21:22:06 +08:00
Raymond Zhang	c0acd06b05	[Feature] Add FinanceIQ dataset (#596 )	2023-11-16 17:47:57 +08:00
Yu	8160cb84e3	update word spell (#594 )	2023-11-15 15:23:58 +08:00
Wei Jueqi	14e6fe6f13	Fix bugs in subjective evaluation (#589 ) * rename * fix sub bugs and update docs * update * update	2023-11-14 16:11:55 +08:00
Songyang Zhang	c8cb38e822	[Feature] Update mathbench (#580 ) * update xunfei api * fix lint * update mathbench to avoid incomplete prediction	2023-11-14 16:04:02 +08:00
Fengzhe Zhou	1ea88d5822	[Sync] Bump version to 0.1.8 (#576 )	2023-11-13 16:00:38 +08:00
Fengzhe Zhou	d3de5c41fb	[Sync] update model configs (#574 )	2023-11-13 15:15:34 +08:00
Fengzhe Zhou	689ffe5b63	[Feature] Use dataset in local path (#570 ) * update commonsenseqa * update drop * update flores_first100 * update gsm8k * update humaneval * update lambda * update obqa * update piqa * update race * update siqa * update story_cloze * update strategyqa * update tydiqa * update winogrande * update doc * update hellaswag * fix obqa * update collections * update .zip name	2023-11-13 13:00:37 +08:00
Fengzhe Zhou	d6aaac22e7	[Feature] Update cmb (#571 )	2023-11-13 00:09:05 +08:00
Kevin Wang	7f77e8dae5	[Docs] fix dataset name error (#533 )	2023-11-10 18:54:20 +08:00
Hubert	95e0da0173	[Docs] add humanevalx dataset link in config (#559 ) * [Docs] add humanevalx dataset link in config * [Docs] add humanevalx dataset link in config * minor fix	2023-11-10 18:18:58 +08:00
jingmingzhuo	b3cbef3226	[Feature] Add py150 and maxmin (#562 ) * [feat] add clozeTesst_maxmin dataset * [feat] add py150 datasets * [feat] change __init__.py in opencompass/datasets * [fix] pre-commit check * [fix] rename py150 and masxmin datasets in configs * [feat] add gen.py of py150 and maxmin in configs/datasets	2023-11-09 22:05:25 +08:00
Hubert	889a6b26ae	[Fix] fix log re-direct (#564 )	2023-11-09 19:34:19 +08:00
Hubert	bb2ecf416e	[Feat] Support cibench (#538 ) * [Feat] support cidataset * [Feat] support cidataset * [Feat] support cidataset * [Feat] support cidataset * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * rename cibench * rename cibench * rename cibench * rename cibench * minor fix * minor fix * minor fix	2023-11-07 19:11:44 +08:00
Hubert	36360bdfc3	[Fix] fix filename typo (#549 )	2023-11-07 14:00:26 +08:00
liushz	214a34f0b8	【Feature】Update Mathbench dataset prompt and fix small errors (#546 ) * Update mathbench * Update mathbench * Update mathbench	2023-11-06 21:58:31 +08:00
Songyang Zhang	239c2a346e	[Feature] Add support for MiniMax API (#548 ) * update requirement * update requirement * update with minimax * update api model * Update readme * fix error --------- Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2023-11-06 21:57:32 +08:00
bittersweet1999	f25a980043	[fFeat] Add an opensource dataset Tabmwp (#505 ) * TabMWP * TabMWP * fixed * fixed * fixed * done * done * done --------- Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>	2023-11-03 11:15:46 +08:00
Surav Shrestha	e5ae86221c	docs: fix typos in markdown files (#530 ) * fix typos in configs/multimodal/llava/README.md * fix typos in configs/multimodal/minigpt_4/README.md	2023-11-01 16:16:16 +08:00
Qing	229a65f305	[Fix] Fix typo in WSC prompt (#520 ) Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>	2023-10-30 12:16:26 +08:00
Fengzhe Zhou	dbb20b8270	[Sync] update (#517 )	2023-10-27 20:31:22 +08:00
Wei Jueqi	b62842335d	[Doc] Update Subjective docs (#510 ) * rename * add en subdoc * fix name * fix writing * update --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-10-27 16:27:24 +08:00
Hubert	b3f5d9e421	[Feat] support math/gms8k agent config (#494 ) * support math agent * support gsm8k agent * support gsm8k agent * minor fix * minor fix * minor fix * Update configs/eval_codeagent.py	2023-10-25 23:05:15 +08:00
liushz	2737249f31	[Feature] Add mathbench dataset and circular evaluator (#408 ) * add_mathbench * update mathbench * support non circular eval dataset --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-10-18 04:08:31 -05:00
Leymore	861942ab1b	[Feature] Add lawbench (#460 ) * add lawbench * update requirements * update	2023-10-13 06:51:36 -05:00
Leymore	fbf5089c40	[Sync] update github token (#475 )	2023-10-13 06:50:54 -05:00
Leymore	d7ff933a73	[Fix] Use jieba rouge in lcsts (#459 ) * use jieba rouge in lcsts * use rouge_chinese	2023-10-09 10:10:33 +08:00
Tong Gao	119bfd1569	[Refactor] Move fix_id_list to Retriever (#442 ) * [Refactor] Move fix_id_list to Retriever * update * move to base * fix	2023-10-07 12:53:41 +08:00
Lyu Han	6738247142	Integrate turbomind inference via its RPC API instead of its python API (#414 ) * support tis * integrate turbomind inference via its RPC API instead of its python API * update guide * update ip address spec * update according to reviewer's comments	2023-10-07 10:27:48 +08:00
Leymore	9db5652638	[Feature] re-implement ceval load dataset (#446 )	2023-09-27 21:18:48 +08:00
philipwangOvO	3bb3d330eb	[Sync] Update LongEval (#443 )	2023-09-27 16:32:40 +08:00
Kevin Wang	dc1b82c346	[SIG] add GLUE_MRPC dataset (#440 )	2023-09-27 11:44:54 +08:00
Kevin Wang	14fdecfecc	[Dataset] add GLUE QQP dataset (#438 )	2023-09-27 11:36:43 +08:00
Kevin Wang	d8354fe5d8	[SIG] add GLUE_CoLA dataset (#406 ) * [Dataset] add GLUE_CoLA dataset * [update] use HFDataset to load glue/cola dataset * update --------- Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>	2023-09-27 11:30:44 +08:00
Kevin Wang	012546666b	[SIG] add WikiText-2&103 (#397 ) * fix conflict * add eval_cfg	2023-09-26 14:31:15 +08:00
liushz	c5224c2a91	[Feature] Add kaoshi dataset (#392 ) * Add ToT method * Update ToT * Update ToT * Update ToT * Update ToT * Update ToT * Add Koashi * Update Kaoshi * Update Kaoshi * Update kaoshi * Update kaoshi * Update Kaoshi * Update Kaoshi * Update Kaoshi * Update Kaoshi * update Kaoshi * update * update * fix --------- Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>	2023-09-22 18:46:33 +08:00
TTTTTiam	2a62bea1a4	add evaluation of scibench (#393 ) * add evaluation of scibench * add evaluation of scibench * update scibench * remove scibench evaluator --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-09-22 17:42:08 +08:00
Ma Zerun	0f2c388280	Support GSM8k evaluation with tools by Lagent and LangChain (#277 ) * Support GSM8k evaluation with tools by Lagent and LangChain * Avoid to use MMEngine new feature * update document --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-09-22 15:28:22 +08:00
Yike Yuan	97fdc51102	[Fix] Fix performance issue of visualglm. (#424 ) * [Fix] Visualglm performance fixed. * [Fix] Hide ckpt path.	2023-09-21 19:54:23 +08:00
Hubert	8803f7f7a6	[Feat] support antropics evals dataset (#422 ) * [Feat] support anthropics ai risk dataset * [Feat] support anthropics evals dataset * [Feat] support anthropics evals dataset	2023-09-20 18:36:44 +08:00
Yike Yuan	bd50bad8b5	[Feat] Support mm models on public dataset and fix several issues. (#412 ) * [Feat] Add public dataset support for visualglm, qwenvl, and flamingo * [Fix] MMBench related changes. * [Fix] Openflamingo inference. * [Fix] Hide ckpt path. * [Fix] Pre-commit. --------- Co-authored-by: Haodong Duan <dhd.efz@gmail.com>	2023-09-19 19:08:44 +08:00
Yuanhan Zhang	7c2726c23b	[Model] Yhzhang/add mlugowl llamaadapter (#405 ) * refine gitignore * [Feature]: Add minigpt-4 * [Feature]: Add mm local runner * [Feature]: Add instructblip * add otter and llama-adapter * add owl * add llama2-adapter and owl * lint * [Feature]: Add minigpt-4 * [Feature]: Add instructblip * add otter and llama-adapter * add owl * add llama2-adapter and owl * lint * lint * update * lint * lint * add __init__.py * update * update * update * update * [Feature]: Add minigpt-4 * [Feature]: Add mm local runner * [Feature]: Add instructblip * add otter and llama-adapter * add owl * add llama2-adapter and owl * lint * [Feature]: Add minigpt-4 * [Feature]: Add instructblip * add otter and llama-adapter * add owl * add llama2-adapter and owl * lint * lint * update * lint * lint * add __init__.py * update * update * update * update * optimize mmbench dataset args * update * update * run commit hook --------- Co-authored-by: liuyuan <3463423099@qq.com> Co-authored-by: kennymckormick <dhd@pku.edu.cn> Co-authored-by: kennymckormick <dhd.efz@gmail.com>	2023-09-19 14:21:26 +08:00
Hubert	2c15a0c01d	[Feat] refine docs and codes for more user guides (#409 )	2023-09-18 16:12:13 +08:00
Hubert	a11cb45c83	[Feat] implementation for support promptbench (#239 ) * [Feat] support adv_glue dataset for adversarial robustness * reorg files * minor fix * minor fix * support prompt bench demo * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix	2023-09-15 15:06:53 +08:00
Hubert	de8a154795	[Feat] support ds1000 dataset (#395 ) * [Feat] support ds1000 datase	2023-09-15 12:50:27 +08:00
Yuan Liu	545d50a4c0	[Fix]: Add has_image to scienceqa (#391 ) Co-authored-by: bensenliu <bensenliu@tencent.com>	2023-09-13 13:07:14 +08:00
Xidong Wang	47a752cd56	[Dataset] Add CMB (#376 ) * Add CMB * modify CMB --------- Co-authored-by: wangxidong <xidongw@163.com>	2023-09-12 19:16:41 +08:00
Tong Gao	b9b145c335	[Docs] Fix incorrect name in get_started (#380 )	2023-09-11 16:10:09 +08:00
Leymore	2c915218e8	[Feaure] Add new models: baichuan2, tigerbot, vicuna v1.5 (#373 ) * add bag of new models: baichuan2, tigerbot, vicuna v1.5 * update * re-organize models * update readme * update	2023-09-08 15:41:20 +08:00
Leymore	b48d084020	[Fix] update bbh implement & fix bbh suffix (#371 )	2023-09-08 15:14:30 +08:00
Yixiao Fang	fada77a31c	[Feature] Add open source dataset eval config of instruct-blip (#370 ) * add configs * refactor model * add post processor and prompt constructor	2023-09-08 15:07:09 +08:00
Tong Gao	b11838f80a	[Feature] Update claude2 postprocessor (#365 ) * [Feature] Update claude2 config * [Feature] Update claude2 postprocessor	2023-09-07 11:26:26 +08:00
Yike Yuan	b885ec84df	[Feat] Support Qwen-VL-Chat on MMBench. (#312 ) * [Feat] Support Qwen-VL base. * [Feat] Support Qwen-VL-Chat on MMBench. * [Fix] Add postprocessor and fix format. * [Fix] Add type hint and remove redundant codes. * [Fix] fix bugs in postprocessor. * [Fix] Use given commit id.	2023-09-06 18:42:19 +08:00
Hubert	ddb8197212	[Feat] support wizardcoder series (#344 ) * [Feat] support wizardcoder series * minor fix	2023-09-06 17:52:35 +08:00
Leymore	764c2f799a	[Fix] update qwen config (#358 )	2023-09-05 10:15:19 +08:00
Yuanhan Zhang	f2dd98ca7a	[Feat] Support LLaVA and mPLUG-Owl (#331 ) * refine gitignore * [Feature]: Add minigpt-4 * [Feature]: Add mm local runner * [Feature]: Add instructblip * add otter and llama-adapter * add owl * add llama2-adapter and owl * lint * [Feature]: Add minigpt-4 * [Feature]: Add instructblip * add otter and llama-adapter * add owl * add llama2-adapter and owl * lint * lint * update * lint * lint * add __init__.py * update * update * update --------- Co-authored-by: liuyuan <3463423099@qq.com>	2023-09-01 23:32:05 +08:00

419 Commits