OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
klein	153c4fc988	[Feature] update drop dataset from openai simple eval (#1092 ) * [Feature] update drop dataset from openai simple eval * update drop template presentation * update --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-06 13:37:08 +08:00
Fengzhe Zhou	d43392a3bb	[Feature] Add mmlu prompt from simple_evals, openai (#1074 ) * add mmlu prompt from simple_evals, openai * return empty str on failure	2024-05-06 13:26:26 +08:00
Yang Yong	53fe390454	fix LightllmApi workers bug (#1113 )	2024-04-30 22:09:22 +08:00
Fengzhe Zhou	baed2ed9b8	update pre-commit (#891 )	2024-04-30 10:59:41 +08:00
Alexander Lam	35c94d0cde	[Feature] Adding support for LLM Compression Evaluation (#1108 ) * fixed formatting based on pre-commit tests * fixed typo in comments; reduced the number of models in the eval config * fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset * removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English	2024-04-30 10:51:01 +08:00
Ikko Eltociear Ashimine	9c79224b39	[Docs] Update README.md (#1110 ) requiresments -> requirements	2024-04-30 00:45:33 +08:00
bittersweet1999	3de48e9b35	[Bug] Fix CMB dataset (#1106 )	2024-04-30 00:33:43 +08:00
Songyang Zhang	063f5f5f49	[Update] Update performance of common benchmarks (#1109 ) * [Update] Update performance of common benchmarks * [Update] Update performance of common benchmarks * [Update] Update performance of common benchmarks	2024-04-30 00:09:08 +08:00
liushz	a6f67e1a65	[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-28 21:58:58 +08:00
bittersweet1999	0b7de67c4a	fix prompt template (#1104 )	2024-04-28 21:54:30 +08:00
Lyu Han	1013dce60c	adapt to lmdeploy v0.4.0 (#1073 ) * adapt to lmdeploy v0.4.0 * compatible	2024-04-28 19:57:40 +08:00
Yggdrasill7D6	58a57a4c45	[Feature] add support for Flames datasets (#1093 ) * add flames datasets * fix lint * rm quota * add judgemodel info and fix os path * support flames dataset * support flames dataset --------- Co-authored-by: bittersweet1999 <1487910649@qq.com>	2024-04-28 18:56:24 +08:00
Mo Li	76dd814c4d	[Doc] Update NeedleInAHaystack Docs (#1102 ) * update NeedleInAHaystack Test Docs * update docs	2024-04-28 18:51:47 +08:00
dmitrysarov	cce5b6fbb6	fix output typing, change mutable list to immutable tuple (#989 ) * fix output typing, change mutable list to immutable tuple * import missed type * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 23:07:34 +08:00
binary-husky	701ecbb292	[Fix] python path bug (#1063 ) * fix relative path bug * format --------- Co-authored-by: hmp <505030475@qq.com> Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 21:58:45 +08:00
Wang Xingjin	048d41a1c4	add vllm get_ppl (#1003 ) * add vllm get_ppl * add vllm get_ppl * format --------- Co-authored-by: xingjin.wang <xingjin.wang@mihoyo.com> Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 21:31:56 +08:00
Haodong Duan	3a232db471	[Deperecate] Remove multi-modal related stuff (#1072 ) * Remove MultiModal * update index.rst * update README * remove mmbench codes * update news --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 21:20:14 +08:00
Francis-llgg	f1ee11de14	[Feature] Add gpqa prompt from simple_evals, openai (#1080 ) * add gpqa_openai_simple_eval * trigger CI build * reorg --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 20:13:00 +08:00
klein	e4830a6926	Update CIBench (#1089 ) * modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 * update cibench: dataset and evluation * cibench summarizer bug * update cibench * move extract_code import --------- Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 18:46:02 +08:00
bittersweet1999	e404b72c52	[Feature] support arenahard evaluation (#1096 ) * support arenahard * support arenahard * support arenahard	2024-04-26 15:42:00 +08:00
bittersweet1999	6ba1c4937d	[Feature] Support Math evaluation via judgemodel (#1094 ) * support openai math evaluation * support openai math evaluation * support openai math evaluation * support math llm judge * support math llm judge	2024-04-26 14:56:23 +08:00
Jingming Zhuo	41196c48ae	Add humaneval prompt from simple_evals, openai (#1076 ) * [Feature] Add IFEval * add humaneval prompt from simple_evals, openai	2024-04-24 17:40:50 +08:00
liushz	17735f0c13	Fix Llama-3 meta template (#1079 ) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-24 16:46:25 +08:00
Ke Bao	81d0e4d793	[Feature] Add lmdeploy tis python backend model (#1014 ) * add lmdeploy tis python backend model * fix pr check * update	2024-04-23 14:27:11 +08:00
Fengzhe Zhou	8fe7b271cc	[Fix] Fix sequential runner (#1070 )	2024-04-23 11:31:10 +08:00
Fengzhe Zhou	004ed79593	[Feature] Add TheoremQA with 5-shot (#1048 ) * add TheoremQA with 5-shot * cherry pick from add-huggingface-above-v4.33, good TheoremQA results	2024-04-22 15:22:04 +08:00
Fengzhe Zhou	a256753221	[Feature] Add LLaMA-3 Series Configs (#1065 ) * add LLaMA-3 Series configs * update readme	2024-04-22 14:39:31 +08:00
bittersweet1999	6f98c8d9ab	[Fix] Fix MultiRound Subjective Evaluation(#1043 ) * fix multiround * fix	2024-04-22 12:06:03 +08:00
Fengzhe Zhou	8c85edd1cd	[Sync] deprecate old mbpps (#1064 )	2024-04-19 20:49:46 +08:00
Robin Chen	c172401323	[Fix] Fixed repeated loading of VLLM (#1051 ) * [fix]Fixed the issue caused by the repeated loading of VLLM model during task segmentation. * [fix] avoid TypeError: VLLM.__init__() got an unexpected keyword argument 'tokenizer_only' * restore .pre-commit-config.yaml * restore opencompass/tasks/openicl_infer.py --------- Co-authored-by: IcyFeather <mengzhuo.happy@gmail.com> Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-17 20:36:08 +08:00
Songyang Zhang	629836146a	[Doc] Update README (#1053 ) * [Update] Update readme * [Update] Update readme * [Update] Update readme	2024-04-16 19:54:12 +08:00
Fengzhe Zhou	881bdbf6bd	[Sync] Bump version to 0.2.4 (#1052 ) (cherry picked from commit 16ac6306c72fa202173289b55eaefe85e0fcb73c) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-16 18:09:46 +08:00
Fengzhe Zhou	7a41951dda	[Fix] logger.error -> logger.debug in OpenAI wrapper (#1050 ) * logger.error -> logger.info in OpenAI * logger.info -> logger.debug in OpenAI	2024-04-15 21:08:13 +08:00
liuwei130	a00e57296f	[Feature] Add ChemBench (#1032 ) * add ChemBench * update results * molbench -> ChemBench --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-12 08:46:26 +08:00
Fengzhe Zhou	bd7c11bb89	[Fix] Update setup.py install_requires (#1036 )	2024-04-11 11:11:34 +08:00
Fengzhe Zhou	b39f501563	[Sync] update taco (#1030 )	2024-04-09 17:50:23 +08:00
Mo Li	16f29b25f1	[Fix] Simplify needlebench summarizer (#1024 ) * Conflicts: configs/summarizers/needlebench.py * fix lint problems	2024-04-07 17:51:13 +08:00
Mo Li	f2af49337d	[Feature] Add ATC Choice Version (#1019 ) * Squashed commit of the following: commit c48ad194c3976dc63d1b60d8c8ab2d5ff9e1cbfe Author: DseidLi <2568818204@qq.com> Date: Tue Apr 2 16:57:43 2024 +0800 add atc_choice commit 3ac6efea29619573e6fac8fa3cce464853dcead0 Merge: 2d4e559 8e3a9c3 Author: DseidLi <2568818204@qq.com> Date: Tue Apr 2 16:41:38 2024 +0800 Merge branch 'atc_choice' into atc_add_choice commit 8e3a9c396a3e5546d3faf584183f6fd60b974d5e Merge: 150a036 0a6a03f Author: DseidLi <2568818204@qq.com> Date: Tue Mar 26 04:47:07 2024 +0800 Merge branch 'main' into atc_choice Conflicts: configs/summarizers/needlebench.py opencompass/datasets/needlebench/multi.py opencompass/datasets/needlebench/origin.py opencompass/datasets/needlebench/parallel.py commit 150a036d6d990f26a57c974d1af83d88c31a0f9d Merge: 8d6ac9a 940dd18 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 20 03:49:08 2024 +0800 Merge branch 'needlebench_fix' into atc_choice commit 8d6ac9a1a43b1c9d0f0ea27e7d58968a203ea898 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 20 03:41:49 2024 +0800 optimize needlebench code commit 940dd18a4270f24bc69edd2a780182c68918e1a9 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 20 03:39:46 2024 +0800 fix vllm commit d8be6877bc41051f3edcc0421c462c834c0f1c9a Merge: ecad78a 2527fda Author: DseidLi <2568818204@qq.com> Date: Tue Mar 19 21:07:08 2024 +0800 Merge remote-tracking branch 'origin/add_1M_dataset' into atc_choice commit 2527fda8a5 Author: DseidLi <2568818204@qq.com> Date: Tue Mar 19 16:03:40 2024 +0800 add model configs commit 75425acdf8 Author: DseidLi <2568818204@qq.com> Date: Tue Mar 19 16:02:15 2024 +0800 add prompt postion args commit 367ba1ba61 Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 21:40:00 2024 +0800 add Needlebench-1000K configs commit ecad78af14c4bb00fe325779114b384c57ab30bf Author: DseidLi <2568818204@qq.com> Date: Thu Mar 14 22:08:32 2024 +0800 fix atc commit 08772c0787b18872abadc9ffec3223941a5ee0c2 Merge: 9f3f8cf caf1cf8 Author: DseidLi <2568818204@qq.com> Date: Thu Mar 14 22:07:28 2024 +0800 Merge branch 'main' into atc_choice Conflicts: configs/datasets/needlebench/readme.md configs/datasets/needlebench/readme_zh-CN.md configs/summarizers/needlebench.py opencompass/datasets/needlebench/atc.py opencompass/summarizers/needlebench.py commit 9f3f8cfb4452722734d334114ac1d14110e57406 Author: DseidLi <2568818204@qq.com> Date: Thu Mar 14 21:35:53 2024 +0800 add atc-choice test commit 52be7c1202376b4e09821188b826f1a805328129 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 6 02:54:15 2024 +0800 update needlebench randomseed and add vllm qwen14b commit fc1effce596ae2e5ece4933e8cd34aef8e64a6f9 Merge: 4e747ed caf1cf8 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 6 02:51:14 2024 +0800 Merge branch 'main' into add_model_configs commit 31834f9b23af3354ac3581ec86d693d0f05cdd1c Merge: 7dabc82 120bf8b Author: DseidLi <2568818204@qq.com> Date: Sun Mar 3 23:29:42 2024 +0800 Merge branch 'main' of https://github.com/open-compass/opencompass into atc_choice commit 4e747ed1988ddbcfcc7fff334601259ade72d363 Author: DseidLi <2568818204@qq.com> Date: Sun Mar 3 22:15:25 2024 +0800 add internlm2-lmdeploy model and gemma configs commit 7dabc828123d711c8cf834d6aab4137bb55e85ed Author: DseidLi <2568818204@qq.com> Date: Sat Mar 2 17:26:15 2024 +0800 add atc choice version -ZH commit 996f8ae43d Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:58:56 2024 +0800 update readme for needlebench commit f7266e873c Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:44:53 2024 +0800 move readme.md commit 1c7375681d Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:38:31 2024 +0800 fix linting error commit b6524f3ebf Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:33:51 2024 +0800 lint summarizer commit c0d1190e39 Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:29:03 2024 +0800 add needlebench intro, fix summarizer commit 0965baf785 Author: DseidLi <2568818204@qq.com> Date: Mon Feb 26 13:31:26 2024 +0800 fix bug in needlebench summarizer commit 5d32b31eb8 Author: DseidLi <2568818204@qq.com> Date: Sat Feb 24 03:19:08 2024 +0800 update act prompt commit af82a7f085 Merge: 32bf9fe 53fe788 Author: DseidLi <2568818204@qq.com> Date: Fri Feb 23 17:50:32 2024 +0800 Merge remote-tracking branch 'upstream/main' into needlebench commit 32bf9fe802 Author: DseidLi <2568818204@qq.com> Date: Fri Feb 23 17:31:32 2024 +0800 simplify needlebench 32k, 128k, 200k for eval commit a7cb025e05 Author: DseidLi <2568818204@qq.com> Date: Fri Feb 23 14:48:58 2024 +0800 add needlebench * fix summarizer * remove repeated code * remove chinese comments	2024-04-07 15:46:20 +08:00
Mo Li	b50d163265	[Fix] Refactor Needlebench Configs for CLI Testing Support (#1020 ) * add needlebench datasets suffix * fix import * update run.py args for summarizer key and dataset suffix * update utils/run.py	2024-04-07 15:12:56 +08:00
bittersweet1999	2d4e559763	[Feature] Add multi-model judge and fix some problems (#1016 ) * support multi-model judge and moe judge * test_moe * test_moe * test * add moe judge * support multi-judge-model	2024-04-02 11:52:06 +08:00
Y0oMu	c220550fb9	updates docs (#1015 ) Co-authored-by: youmuspc <yejiayi2004@outlook.com>	2024-04-02 10:30:04 +08:00
bittersweet1999	02e7eec911	[Feature] Support AlpacaEval_V2 (#1006 ) * support alpacaeval_v2 * support alpacaeval * update docs * update docs	2024-03-28 16:49:04 +08:00
Mo Li	0a6a03fe1a	[Feature] update needlebench and configs (#986 ) * add Needlebench-1000K configs * add prompt postion args * add model configs * Update parallel.py * fix lint	2024-03-25 18:05:01 +08:00
bittersweet1999	0665bb91a8	[Fix] Quick fix (#995 )	2024-03-22 19:54:19 +08:00
Chaseldot	1d3198554b	[Fix] base.py change status into list (#994 )	2024-03-22 17:06:34 +08:00
Ke Bao	e415ddf96a	[Fix] Fix turbomind_tis (#992 )	2024-03-22 15:50:12 +08:00
bittersweet1999	054e9fa7e5	[Feature] add one script for subjective (#993 ) * add one script for subjective * add one script for subjective * add one script for subjective * add one script for subjective --------- Co-authored-by: thebestannie <1290646445@qq.com>	2024-03-20 23:20:41 +08:00
Connor-Shen	0221d30877	[Fix] Update APPS/TACO (#988 ) * [Feature] update apps/taco * [Feature] update apps/taco	2024-03-19 20:21:39 +08:00
Connor-Shen	8a3c6e51ed	[Feature] Update APPS (#985 ) * update post process * update post process	2024-03-19 15:47:05 +08:00
Connor-Shen	d92595b671	[Feat] Support TACO (#966 ) * [Feat] Support TACO * update README * update README	2024-03-19 15:39:16 +08:00


724 Commits