OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
bittersweet1999	e0d7808b4e	[Fix] fix pip version (#1228 ) * fix pip version * fix pip version	2024-06-06 11:48:07 +08:00
bittersweet1999	982e024540	[Feature] add dataset Fofo (#1224 ) * add fofo dataset * add dataset fofo	2024-06-06 11:40:48 +08:00
Xingyuan Bu	02a0a4e857	MT-Bench-101 (#1215 ) * add mt-bench-101 * add readme and requirements * add mt-bench-101 data * Update readme_mtbench101.md * update readme * update leaderboard * fix typo * Update readme_mtbench101.md * fit newest opencompass * update readme.md * mtbench101 to opencompass * mtbench101 to opencompass * for code review * for code review * for code review * hook * hook --------- Co-authored-by: liujie <ljie@buaa.edu.cn>	2024-06-03 14:52:12 +08:00
bittersweet1999	7c381e5be8	[Fix] fix summarizer (#1217 ) * fix summarizer * fix summarizer	2024-05-31 11:40:47 +08:00
Fengzhe Zhou	a77b8a5cec	[Sync] format (#1214 )	2024-05-30 00:21:58 +08:00
Fengzhe Zhou	d59189b87f	[Doc] Update running command in README (#1206 )	2024-05-30 00:06:39 +08:00
Fengzhe Zhou	0b50112dc1	[Fix] Rollback opt model configs (#1213 )	2024-05-30 00:03:22 +08:00
Xu Song	808582d952	Fix VLLM argument error (#1207 )	2024-05-29 10:14:08 +08:00
Fengzhe Zhou	2954913d9b	[Sync] bump version (#1204 )	2024-05-28 23:09:59 +08:00
Fengzhe Zhou	9fa80b0f93	[Feat] Update charm summary (#1194 )	2024-05-27 16:17:01 +08:00
jxd	608ff5810d	support CHARM (https://github.com/opendatalab/CHARM ) reasoning tasks (#1190 ) * support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks * fix lint error * add dataset card for CHARM * minor refactor * add txt --------- Co-authored-by: wujiang <wujiang@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-27 13:48:22 +08:00
bittersweet1999	07a6dacf33	fix length (#1180 )	2024-05-24 23:30:01 +08:00
klein	5eb8f14d97	[Fix] Fix drop_gen.py (#1191 ) Fix the bug in drop_gen: wrong import	2024-05-24 23:17:50 +08:00
bittersweet1999	31afe87026	fix yi-chat template (#1178 )	2024-05-21 18:14:12 +08:00
liushz	1448be00e2	Update MathBench (#1176 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Update acclerator * Update MathBench --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-05-21 14:45:43 +08:00
Fengzhe Zhou	2b3d4150f3	[Sync] update evaluator (#1175 )	2024-05-21 14:22:46 +08:00
Fengzhe Zhou	5de85406ce	[Sync] add OC16 entry (#1171 )	2024-05-17 16:50:58 +08:00
Fengzhe Zhou	8ea2c404d7	[Feat] enable HuggingFacewithChatTemplate with --accelerator via cli (#1163 ) * enable HuggingFacewithChatTemplate with --accelerator via cli * rm vllm_internlm2_chat_7b	2024-05-15 21:51:07 +08:00
Fengzhe Zhou	62dbf04708	[Sync] update github workflow (#1156 )	2024-05-14 22:42:23 +08:00
Fengzhe Zhou	aa2dd2b58c	[Format] Add config lints (#892 )	2024-05-14 15:35:58 +08:00
Xu Song	3dbba11945	[Feat] Support dataset_suffix check for mixed configs (#973 ) * [Feat] Support dataset_suffix check for mixed configs * update mixed suffix * update suffix --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-14 15:03:28 +08:00
Fengzhe Zhou	7505b3cadf	[Feature] Add huggingface apply_chat_template (#1098 ) * add TheoremQA with 5-shot * add huggingface_above_v4_33 classes * use num_worker partitioner in cli * update theoremqa * update TheoremQA * add TheoremQA * rename theoremqa -> TheoremQA * update TheoremQA output path * rewrite many model configs * update huggingface * further update * refine configs * update configs * update configs * add configs/eval_llama3_instruct.py * add summarizer multi faceted * update bbh datasets * update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py * rename class * update readme * update hf above v4.33	2024-05-14 14:50:16 +08:00
Mo Li	6c711cb262	[Fix] Fix Needlebench Summarizer (#1143 ) * update few-shot example * add 128k	2024-05-13 15:59:34 +08:00
bittersweet1999	5432dfc1ff	fix multiround (#1146 )	2024-05-13 15:58:39 +08:00
bittersweet1999	833a35140b	[Fix] fix alpacaeval while add caching path (#1139 ) * fix alpacaeval * fix alpacaeval	2024-05-11 14:02:26 +08:00
Alexander Lam	a71122ee18	[Feature] Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs (#1123 ) * added qwen moe and mixtral 8x22 model configs * updated README files news section	2024-05-09 11:04:26 +08:00
Mo Li	cb080fa7de	[Fix] Fix NeedleBench Summarizer Typo (#1125 ) * update needleinahaystack eval docs * update needlebench summarizer * fix english docs typo	2024-05-08 20:00:15 +08:00
JuhaoLiang	d2c40e5648	[Feature] Add AceGPT-MMLUArabic benchmark (#1099 ) * add AceGPT-MMLUArabic benchmark * update readme and fix lint issue * remove unused package * add MMLUArabic zero-shot settings * rename filename and update readme	2024-05-08 15:00:26 +08:00
Fangyu Lei	862044fb7d	[Feature] Add S3Eval Dataset (#916 ) * s3eval_branch * update s3eval	2024-05-06 19:41:52 +08:00
Xu Song	d501710155	[Fix] Fix AGIEval chinese sets (#972 ) * [Fix] Fix AGIEval chinese sets * Create agieval_gen_617738.py * [Fix] Fix AGIEval chinese sets * Restore agieval_gen_64afd3.py * Update agieval_gen.py * Create agieval_mixed_0fa998.py * Update agieval_mixed.py	2024-05-06 15:31:42 +08:00
Yggdrasill7D6	af10ecc272	add mgsm datasets (#1081 ) * add mgsm datasets * fix lint * fix lint * update mgsm * update mgsm * ease code spell * update * update * update --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-06 15:29:34 +08:00
klein	153c4fc988	[Feature] update drop dataset from openai simple eval (#1092 ) * [Feature] update drop dataset from openai simple eval * update drop template presentation * update --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-06 13:37:08 +08:00
Fengzhe Zhou	d43392a3bb	[Feature] Add mmlu prompt from simple_evals, openai (#1074 ) * add mmlu prompt from simple_evals, openai * return empty str on failure	2024-05-06 13:26:26 +08:00
Yang Yong	53fe390454	fix LightllmApi workers bug (#1113 )	2024-04-30 22:09:22 +08:00
Alexander Lam	35c94d0cde	[Feature] Adding support for LLM Compression Evaluation (#1108 ) * fixed formatting based on pre-commit tests * fixed typo in comments; reduced the number of models in the eval config * fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset * removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English	2024-04-30 10:51:01 +08:00
liushz	a6f67e1a65	[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-28 21:58:58 +08:00
bittersweet1999	0b7de67c4a	fix prompt template (#1104 )	2024-04-28 21:54:30 +08:00
Yggdrasill7D6	58a57a4c45	[Feature] add support for Flames datasets (#1093 ) * add flames datasets * fix lint * rm quota * add judgemodel info and fix os path * support flames dataset * support flames dataset --------- Co-authored-by: bittersweet1999 <1487910649@qq.com>	2024-04-28 18:56:24 +08:00
Mo Li	76dd814c4d	[Doc] Update NeedleInAHaystack Docs (#1102 ) * update NeedleInAHaystack Test Docs * update docs	2024-04-28 18:51:47 +08:00
Haodong Duan	3a232db471	[Deperecate] Remove multi-modal related stuff (#1072 ) * Remove MultiModal * update index.rst * update README * remove mmbench codes * update news --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 21:20:14 +08:00
Francis-llgg	f1ee11de14	[Feature] Add gpqa prompt from simple_evals, openai (#1080 ) * add gpqa_openai_simple_eval * trigger CI build * reorg --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 20:13:00 +08:00
klein	e4830a6926	Update CIBench (#1089 ) * modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 * update cibench: dataset and evluation * cibench summarizer bug * update cibench * move extract_code import --------- Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 18:46:02 +08:00
bittersweet1999	e404b72c52	[Feature] support arenahard evaluation (#1096 ) * support arenahard * support arenahard * support arenahard	2024-04-26 15:42:00 +08:00
bittersweet1999	6ba1c4937d	[Feature] Support Math evaluation via judgemodel (#1094 ) * support openai math evaluation * support openai math evaluation * support openai math evaluation * support math llm judge * support math llm judge	2024-04-26 14:56:23 +08:00
Jingming Zhuo	41196c48ae	Add humaneval prompt from simple_evals, openai (#1076 ) * [Feature] Add IFEval * add humaneval prompt from simple_evals, openai	2024-04-24 17:40:50 +08:00
liushz	17735f0c13	Fix Llama-3 meta template (#1079 ) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-24 16:46:25 +08:00
Ke Bao	81d0e4d793	[Feature] Add lmdeploy tis python backend model (#1014 ) * add lmdeploy tis python backend model * fix pr check * update	2024-04-23 14:27:11 +08:00
Fengzhe Zhou	004ed79593	[Feature] Add TheoremQA with 5-shot (#1048 ) * add TheoremQA with 5-shot * cherry pick from add-huggingface-above-v4.33, good TheoremQA results	2024-04-22 15:22:04 +08:00
Fengzhe Zhou	a256753221	[Feature] Add LLaMA-3 Series Configs (#1065 ) * add LLaMA-3 Series configs * update readme	2024-04-22 14:39:31 +08:00
bittersweet1999	6f98c8d9ab	[Fix] Fix MultiRound Subjective Evaluation(#1043 ) * fix multiround * fix	2024-04-22 12:06:03 +08:00

313 Commits