OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
hailsham	a81bbb85bf	[FIX] Added handling for the "begin section" in meta_template to APITemplateParser (#1405 ) Co-authored-by: leifei <nuuooo@icloud.com>	2024-09-19 18:12:04 +08:00
Songyang Zhang	5a27c2bd6f	[Model] Support Qwen2.5 Instruct (#1543 )	2024-09-19 16:16:07 +08:00
Songyang Zhang	be460fbb21	[Feature] Support OpenAI O1 models (#1539 ) * [Feature] Support OpenAI O1 models * Update README.md --------- Co-authored-by: liushz <qq1791167085@163.com>	2024-09-18 22:41:17 +08:00
liushz	2e9db77d57	[Feature] Add custom model postprocess function (#1519 ) Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-18 14:40:51 +08:00
liushz	c9a7026f59	[Feature] Update MathBench & WikiBench for FullBench (#1521 ) * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-18 14:35:30 +08:00
Linchen Xiao	90279b6461	[Feature] Dataset prompts update for ARC, BoolQ, Race (#1527 )	2024-09-13 10:30:43 +08:00
Songyang Zhang	6997990c93	[Feature] Update Models (#1518 ) * Update Models * Update * Update humanevalx * Update * Update	2024-09-12 23:35:30 +08:00
zhulinJulia24	3754dc1b67	update (#1522 ) Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-09-12 15:00:52 +08:00
bittersweet1999	7c7fa36235	[Feature] add support for internal Followbench (#1511 ) * fix pip version * fix pip version * add internal followbench * add internal followbench * fix lint * fix lint	2024-09-11 13:32:34 +08:00
Linchen Xiao	317763381c	update (#1517 )	2024-09-11 13:31:20 +08:00
bittersweet1999	c2bcd8725e	[Fix] Fix wildbench (#1508 ) * fix pip version * fix pip version * fix_wildbench	2024-09-10 17:35:07 +08:00
Alexander Lam	a31a77c5c1	[Feature] Add SciCode summarizer config (#1514 ) * [Feature] added SciCode summarizer config and dataset config for with background evaluation * fix lint issues * removed unnecessary type in summarizer group	2024-09-10 16:06:02 +08:00
Linchen Xiao	b5f8afb57b	[Bump] Bump version to 0.3.2.post1	2024-09-06 19:09:30 +08:00
Linchen Xiao	f04f3546bc	[Fix] Import fix (#1500 )	2024-09-06 18:29:24 +08:00
Linchen Xiao	ff18545f0e	[Bump] Bump version to 0.3.2 (#1497 )	2024-09-06 16:10:45 +08:00
Linchen Xiao	87ffa71d68	[Feature] Longbench dataset update	2024-09-06 15:50:12 +08:00
Albert Yan	928d0cfc3a	[Feature] Add support for Rendu API (#1468 ) * Add support for Rendu API * fix lint issue * fix lint issue * fix lint issue * Update --------- Co-authored-by: 13190 <zeyu.yan@transn.com> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-09-06 01:00:43 +08:00
Hari Seldon	faf5260155	[Feature] Optimize Evaluation Speed of SciCode (#1489 ) * update scicode * update comments * remove redundant variable * Update --------- Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-09-06 00:59:41 +08:00
liushz	00fc8da5be	[Feature] Add model postprocess function (#1484 ) * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-05 21:10:29 +08:00
Maxime SHE	45efdc994d	[Feature] Add an attribute api_key into TurboMindAPIModel default None (#1475 ) Co-authored-by: Maxime <maximeshe@163.com> Add an attribute api_key into TurboMindAPIModel default None then we can set the api_key while using lmdeploy to deploy the llm model	2024-09-05 17:51:16 +08:00
Linchen Xiao	6c9cd9a260	[Feature] Needlebench auto-download update (#1480 ) * update * update * update	2024-09-05 17:22:42 +08:00
zhulinJulia24	716d46e1f5	[ci] fix badcase and add env info (#1491 ) * update * update --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-09-05 16:43:45 +08:00
zhulinJulia24	fb6a0df652	[ci] fix test env for vllm and add vllm baselines (#1481 ) * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-09-04 19:24:09 +08:00
Linchen Xiao	da74cbfa39	[Fix] Model configs update	2024-09-04 18:57:10 +08:00
Linchen Xiao	9693be46b7	[Feature] Mmlu-pro auto-download (#1464 ) * update * update * update * update * update	2024-08-30 10:03:40 +08:00
Alexander Lam	8b39225259	[Feature] Added `extra_body` support for OpenAISDK; Added support for proxy URL when connecting to OpenAI's API. (#1467 ) * fix lint issues * fix lint issues	2024-08-29 00:43:43 +08:00
Guoli Yin	a488b9b4f5	[Feature] Make OPENAI_API_BASE compatible with openai default env (#1461 ) * Make OPENAI_API_BASE compatible with openai default env * Make OPENAI_API_BASE compatible with openai default env --------- Co-authored-by: Guoli Yin <gyin@icloud.com>	2024-08-28 23:14:41 +08:00
Songyang Zhang	e5a8eb2283	[Feature] Update Lint and Leaderboard (#1458 ) * [Feature] Update Lint and Leaderboard * Update * Update	2024-08-28 22:36:42 +08:00
Linchen Xiao	245664f4c0	[Feature] Fullbench v0.1 language update (#1463 ) * update * update * update * update	2024-08-28 14:01:05 +08:00
CHEN PENGAN	463231c651	[Feature] Add icl_sliding_k_retriever.py and update __init__.py (#1305 ) * Add icl_sliding_k_retriever.py and update __init__.py * Fix flake8, isort, and yapf issues for Sliding Window Retriever	2024-08-23 17:18:31 +08:00
Linchen Xiao	94b6bd65fc	[Fix] Fix cli evaluation for multiple models (#1454 ) * update * update	2024-08-23 17:15:36 +08:00
Songyang Zhang	5485207fbe	[Bump] Bump version to 0.3.1 (#1450 ) * [Bump] Bump version 0.3.1 * Update	2024-08-23 10:47:57 +08:00
Songyang Zhang	7c2d25b557	[Fix] Update SciCode and Gemma model (#1449 ) * [Fix] Update SciCode and Gemma model * Update * Update	2024-08-23 10:42:27 +08:00
Xu Song	ad3931aa32	Update openicl_infer.py (#1308 )	2024-08-23 10:39:22 +08:00
liushz	9fdbc744dc	[Fix] Update option postprocess & mathbench language summarizer (#1413 ) * Update option postprocess & mathbench language summarizer * Update option postprocess & mathbench language summarizer --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-08-22 14:49:07 +08:00
Linchen Xiao	0fe9756c5d	[Doc] Update Readme (#1439 ) * update * update * update * update * update * update * update * update * update * update * update * update	2024-08-22 14:48:45 +08:00
Hari Seldon	14b4b735cb	[Feature] Add support for SciCode (#1417 ) * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode w/ bg * add scicode * Update README.md * Update README.md * Delete configs/eval_SciCode.py * rename * 1 * rename * Update README.md * Update scicode.py * Update scicode.py * fix some bugs * Update * Update --------- Co-authored-by: root <HariSeldon0> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-22 13:42:25 +08:00
liushz	d3963bceae	[Bug] Add model support for 'huggingface_above_v4_33' when using '-a' (#1430 ) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-08-22 13:40:24 +08:00
seetimee	ac093fce53	[Update] Update openai_api.py (#1438) Most models' token limits are above 32k. It will fix the long context dataset test bug of skipping some data.	2024-08-21 18:57:49 +08:00
liushz	e076dc5acf	[Fix] Fix openai api tiktoken bug for api server (#1433 ) * Fix openai api tiktoken * Fix openai api tiktoken --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-08-20 22:02:14 +08:00
Linchen Xiao	a4b54048ae	[Feature] Add Ruler datasets (#1310 ) * [Feature] Add Ruler datasets * pre-commit fixed * Add model specific tokenizer to dataset * pre-commit modified * remove unused import * fix linting * add trust_remote to tokenizer load * lint fix * comments resolved * fix lint * Add readme * Fix lint * ruler refactorize * fix lint * lint fix * updated * lint fix * fix wonderwords import issue * prompt modified * update * readme updated * update * ruler dataset added * Update --------- Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-20 11:40:11 +08:00
Xu Song	99b5122ed5	[Feature] Add abbr for rolebench dataset (#1431 ) * Add abbr for rolebench dataset * add	2024-08-20 11:22:48 +08:00
Linchen Xiao	ecf9bb3e4c	[Bug] Commonsenseqa dataset fix (#1425 ) * longbench dataset load fix * update * Update * Update * Update * update * update --------- Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-16 15:54:07 +08:00
Songyang Zhang	9b3613f10b	[Update] Support auto-download of FOFO/MT-Bench-101 (#1423 ) * [Update] Support auto-download of FOFO/MT-Bench-101 * Update wildbench	2024-08-16 11:57:41 +08:00
bittersweet1999	ce7f4853ce	[Fix] Sub summarizer order fix (#1426 ) * fix pip version * fix pip version * fix sub summarizer order * fix order	2024-08-15 21:08:18 +08:00
Linchen Xiao	2596f226f4	[Fix] longbench dataset load fix (#1422 )	2024-08-15 11:30:30 +08:00
Linchen Xiao	8e55c9c6ee	[Update] Compassbench v1.3 (#1396 ) * stash files * compassbench subjective evaluation added * evaluation update * fix lint * update docs * Update lint * changes saved * changes saved * CompassBench subjective summarizer added (#1349) * subjective summarizer added * fix lint [Fix] Fix MathBench (#1351) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> [Update] Update model support list (#1353) * fix pip version * fix pip version * update model support subjective summarizer updated knowledge, math objective done (data need update) remove secrets objective changes saved knowledge data added * secrets removed * changed added * summarizer modified * summarizer modified * compassbench coding added * fix lint * objective summarizer updated * compass_bench_v1.3 updated * update files in config folder * remove unused model * lcbench modified * removed model evaluation configs * remove duplicated sdk implementation --------- Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-08-12 19:09:19 +08:00
changyeyu	59586a8b4a	[Feature] Enable Truncation of Mid-Section for Long Prompts in `huggingface_above_v4_33.py` (#1373 ) * Retain the first and last halves of the tokens from the prompt, discarding the middle, to avoid exceeding the model's maximum length. * Add default parameter: mode * Modified a comment. * Modified variable names. * fix yapf lint	2024-08-09 11:36:30 +08:00
Songyang Zhang	88eb91219b	[Doc] Update README (#1404 ) * [Doc] Update README * Update	2024-08-08 16:18:33 +08:00
yaoyingyy	decb621ff6	[Fix] the issue where scores are negative in the Lawbench dataset evaluation(#1402 ) (#1403 )	2024-08-08 16:08:26 +08:00
Yunlin Mao	818d72a650	[Fix] modelscope dataset load problem (#1406 ) * fix modelscope dataset load * fix lint	2024-08-08 14:01:06 +08:00
Songyang Zhang	264fd23129	[Bump] Bump version for v0.3.0 (#1398 )	2024-08-07 01:25:24 +08:00
Songyang Zhang	fed1a4998b	[Fix] Fix CaLM import (#1395 )	2024-08-06 12:17:45 +08:00
Songyang Zhang	c81329b548	[Fix] Fix Slurm ENV (#1392 ) 1. Support Slurm Cluster 2. Support automatic data download 3. Update InternLM2.5-1.8B/20B-Chat	2024-08-06 01:35:20 +08:00
Songyang Zhang	c09fc79ba8	[Feature] Support OpenAI ChatCompletion (#1389 ) * [Feature] Support import configs/models/summarizers from whl * Update * Update openai sdk * Update * Update gemma	2024-08-01 19:10:13 +08:00
Peng Bo	07c96ac659	Calm dataset (#1385 ) * Add CALM Dataset	2024-08-01 10:03:21 +08:00
Songyang Zhang	46cc7894e1	[Feature] Support import configs/models/summarizers from whl (#1376 ) * [Feature] Support import configs/models/summarizers from whl * Update LCBench configs * Update * Update * Update * Update * update * Update * Update * Update * Update * Update	2024-08-01 00:42:48 +08:00
Songyang Zhang	33ceaa0eb8	[Bug] Fix bug in turbomind (#1377 )	2024-07-30 09:37:50 +08:00
Songyang Zhang	eee5a5be23	[Fix] Update get_data_path for LCBench and HumanEval (#1375 )	2024-07-29 19:28:09 +08:00
Songyang Zhang	704853e5e7	[Feature] Update pip install (#1324 ) * [Feature] Update pip install * Update Configuration * Update * Update * Update * Update Internal Config * Update collect env	2024-07-29 18:32:50 +08:00
Xingjun.Wang	edab1c07ba	[Feature] Support ModelScope datasets (#1289) * add ceval, gsm8k modelscope support * update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest * update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets * format file * format file * update dataset format * support ms_dataset * update dataset for modelscope support * merge myl_dev and update test_ms_dataset * update dataset for modelscope support * update readme * update eval_api_zhipu_v2 * remove unused code * add get_data_path function * update readme * remove tydiqa japanese subset * add ceval, gsm8k modelscope support * update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest * update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets * format file * format file * update dataset format * support ms_dataset * update dataset for modelscope support * merge myl_dev and update test_ms_dataset * update readme * update dataset for modelscope support * update eval_api_zhipu_v2 * remove unused code * add get_data_path function * remove tydiqa japanese subset * update util * remove .DS_Store * fix md format * move util into package * update docs/get_started.md * restore eval_api_zhipu_v2.py, add environment setting * Update dataset * Update * Update * Update * Update --------- Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local> Co-authored-by: Yunnglin <mao.looper@qq.com> Co-authored-by: Yun lin <yunlin@laptop.local> Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn> Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-29 13:48:32 +08:00
jxd	12b84aeb3b	[Feature] Update CHARM Memorization (#1230) * update gemini api and add gemini models * add openai models * update CHARM evaluation * add CHARM memorization tasks * add CharmMemSummarizer (output eval details for memorization-independent reasoning analysis) * update CHARM readme --------- Co-authored-by: wujiang <wujiang@pjlab.org.cn>	2024-07-26 18:42:30 +08:00
bittersweet1999	d3782c1d47	Revert "Calm dataset (#1287 )" (#1366 ) This reverts commit `edd0ffdf70`.	2024-07-26 18:27:29 +08:00
Peng Bo	edd0ffdf70	Calm dataset (#1287 ) * add calm dataset * modify config max_out_len * update README * Modify README * update README * update README * update README * update README * update README * add summarizer and modify readme * delete summarizer config comment * update summarizer * modify same response to all questions * update README	2024-07-26 11:48:16 +08:00
mqy004	a08931f214	[Fix] origin_prompt should be None in llm-compression task (#1225 ) Co-authored-by: Qinyang Mou <qinyang_mou@intsig.net>	2024-07-26 11:46:02 +08:00
LeavittLang	8ee7fecb68	Adding support for Doubao API (#1218 ) * Adding support for Doubao API * Update doubao_api.py Fixed the bug that the connection would be retried even if it was normal. * Update doubao_api.py --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-26 11:44:51 +08:00
klein	65fad8e2ac	[Fix] minor update wildbench (#1335 ) * update crb * update crbbench * update crbbench * update crbbench * minor update wildbench * [Fix] Update doc of wildbench, and merge wildbench into subjective * [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench * Update crb.md * Update crb_pair_judge.py * Update crb_single_judge.py * Update subjective_evaluation.md * Update openai_api.py * [Update] update wildbench readme * [Update] update wildbench readme * [Update] update wildbench readme, remove crb * Delete configs/eval_subjective_wildbench_pair.py * Delete configs/eval_subjective_wildbench_single.py * Update __init__.py --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-26 11:19:04 +08:00
baymax591	51a94aee01	[Bug] fix bug: delete & (#1365 ) Co-authored-by: 白超 <baichao19@huawei.com>	2024-07-26 11:03:55 +08:00
Mo Li	69aa2f2d57	[Feature] Make NeedleBench available on HF (#1364 ) * update_lint * update_huggingface format * fix bug * update docs	2024-07-25 19:01:56 +08:00
Fengzhe Zhou	c3c02c2960	update docs (#1318) * update docs * Efficient Evaluation -> Data Sharding * update * update * Update faq.md --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-25 18:44:25 +08:00
heya5	73aa55af6d	[Fix] Support HF models deployed with an OpenAI-compatible API. (#1352) * Support HF models deployed with an OpenAI-compatible API. * resolve lint issue * add extra_body arguments There are many other arguments when using an OpenAI-compatible API like this: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters-for-chat-api * fix linting issue * fix yapf linting issue	2024-07-25 18:38:23 +08:00
WANG WENJIN	0aad8199c7	Fix the summary error in subjective.py (#1363 )	2024-07-25 18:36:13 +08:00
Linchen Xiao	8127fc3518	CompassBench subjective summarizer added (#1349 ) * subjective summarizer added * fix lint	2024-07-23 12:29:57 +08:00
Que Haoran	a244453d9e	[Feature] Support inference ppl datasets (#1315 ) * commit inference ppl datasets * revised format * revise * revise * revise * revise * revise * revise	2024-07-22 17:59:30 +08:00
liushz	98c58f8a6c	[Feature] Add compassbench knowledge&math part (#1342) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Update accelerator * Update MathBench * Update accelerator * Add Doc for accelerator * Add Doc for accelerator * Add Doc for accelerator * Add Doc for accelerator * Update compassbench august wiki&math * Update compassbench august wiki&math * Update compassbench august wiki&math * Update compassbench_aug_gen_068af0.py * Update compassbench_aug_gen_068af0.py * Update --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-19 22:54:46 +08:00
bittersweet1999	1f9f728f22	[Feature] support compassbench Checklist evaluation (#1339 ) * fix pip version * fix pip version * support checklist eval * init * add lan * fix typo	2024-07-19 16:40:44 +08:00
Mo Li	f40add2596	[Fix] Fix lint (#1334 ) * update needlebench docs * update model_name_mapping dict * update README * fix_lint	2024-07-18 17:15:06 +08:00
Xu Song	1bfb4217ff	Fix typing and typo (#1331 )	2024-07-18 13:41:24 +08:00
Mo Li	104bddf647	[Doc] Update NeedleBench Docs (#1330 ) * update needlebench docs * update model_name_mapping dict * update README * Update README_zh-CN.md --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-07-18 13:16:19 +08:00
bittersweet1999	8e7ad2e981	[Fix] add bc for alignbench summarizer (#1306 ) * fix pip version * fix pip version * fix alignbench * fix import error	2024-07-12 11:06:20 +08:00
Fengzhe Zhou	62f55987f1	force register (#1311 )	2024-07-11 19:59:35 +08:00
Fengzhe Zhou	a62c613d3e	[Sync] bump version 0.2.6+local (#1294 )	2024-07-06 00:44:06 +08:00
Fengzhe Zhou	1d3a26c732	[Doc] quick start swap tabs (#1263 ) * [doc] quick start swap tabs * update docs * update * update * update * update * update * update * update	2024-07-05 23:51:42 +08:00
bittersweet1999	68ca48496b	[Refactor] Reorganize subjective eval (#1284 ) * fix pip version * fix pip version * reorganize subjective eval * reorg sub * reorg subeval * reorg subeval * update subjective doc * reorg subeval * reorg subeval	2024-07-05 22:11:37 +08:00
baymax591	28eba6fe34	NPU adaptation (#1250) * NPU adaptation * Add support for Ascend NPU * format --------- Co-authored-by: baymax591 <14428251+baymax591@user.noreply.gitee.com> Co-authored-by: Leymore <zfz-960727@163.com>	2024-07-03 18:55:19 +08:00
Fengzhe Zhou	a32f21a356	[Sync] Sync with internal codes 2024.06.28 (#1279 )	2024-06-28 14:16:34 +08:00
Xingyuan Bu	842fb1cd70	Update mtbench101.py (#1276 ) fix wrong-used import from torch.utils.data import DataLoader, Dataset	2024-06-26 00:40:22 +08:00
klein	1fa62c4a42	Support wildbench (#1266 ) Co-authored-by: Leymore <zfz-960727@163.com>	2024-06-24 13:16:27 +08:00
bittersweet1999	982e024540	[Feature] add dataset Fofo (#1224 ) * add fofo dataset * add dataset fofo	2024-06-06 11:40:48 +08:00
Xingyuan Bu	02a0a4e857	MT-Bench-101 (#1215 ) * add mt-bench-101 * add readme and requirements * add mt-bench-101 data * Update readme_mtbench101.md * update readme * update leaderboard * fix typo * Update readme_mtbench101.md * fit newest opencompass * update readme.md * mtbench101 to opencompass * mtbench101 to opencompass * for code review * for code review * for code review * hook * hook --------- Co-authored-by: liujie <ljie@buaa.edu.cn>	2024-06-03 14:52:12 +08:00
mqy004	b272803d8a	Fix failure to import opencompass.cli.main after installing the release version (#1221) * Create __init__.py * Create __init__.py * Create __init__.py * Create __init__.py * Create __init__.py * Create __init__.py * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-31 13:23:33 +08:00
bittersweet1999	7c381e5be8	[Fix] fix summarizer (#1217 ) * fix summarizer * fix summarizer	2024-05-31 11:40:47 +08:00
Fengzhe Zhou	a77b8a5cec	[Sync] format (#1214 )	2024-05-30 00:21:58 +08:00
Fengzhe Zhou	d656e818f8	[Docs] Remove --no-batch-padding and Use --hf-num-gpus (#1205 ) * [Docs] Remove --no-batch-padding and Use -hf-num-gpus * update	2024-05-29 16:30:10 +08:00
Fengzhe Zhou	2954913d9b	[Sync] bump version (#1204 )	2024-05-28 23:09:59 +08:00
liushz	ba620c4afe	Update accelerator (#1195) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Update accelerator * Update MathBench * Update accelerator --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-05-28 17:17:54 +08:00
jxd	608ff5810d	support CHARM (https://github.com/opendatalab/CHARM ) reasoning tasks (#1190 ) * support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks * fix lint error * add dataset card for CHARM * minor refactor * add txt --------- Co-authored-by: wujiang <wujiang@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-27 13:48:22 +08:00
bittersweet1999	88c14d3d04	add support for lmdeploy api judge (#1193 )	2024-05-24 23:28:56 +08:00
yaoyingyy	749e4cea71	[Fix] temporary files using tempfile (#1186 ) Co-authored-by: yaoying <yaoying@kingsoft.com>	2024-05-24 23:27:37 +08:00
Fengzhe Zhou	2b3d4150f3	[Sync] update evaluator (#1175 )	2024-05-21 14:22:46 +08:00
Fengzhe Zhou	5de85406ce	[Sync] add OC16 entry (#1171 )	2024-05-17 16:50:58 +08:00
Fengzhe Zhou	8ea2c404d7	[Feat] enable HuggingFacewithChatTemplate with --accelerator via cli (#1163 ) * enable HuggingFacewithChatTemplate with --accelerator via cli * rm vllm_internlm2_chat_7b	2024-05-15 21:51:07 +08:00
liushz	e3c0448bbc	Update accelerator (#1152) * Update accelerator * update run --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>	2024-05-15 14:31:47 +08:00
Fengzhe Zhou	f10dd48f9c	[Fix] Update stop_words in huggingface_above_v4_33 (#1160 )	2024-05-15 14:10:33 +08:00
Fengzhe Zhou	80f831b425	[Fix] use ProcessPoolExecutor during mbpp eval (#1159 )	2024-05-15 13:48:29 +08:00
bittersweet1999	8a8987be0b	fix arenahard summarizer (#1154 ) Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-15 13:31:29 +08:00
Fengzhe Zhou	62dbf04708	[Sync] update github workflow (#1156 )	2024-05-14 22:42:23 +08:00
Fengzhe Zhou	7505b3cadf	[Feature] Add huggingface apply_chat_template (#1098 ) * add TheoremQA with 5-shot * add huggingface_above_v4_33 classes * use num_worker partitioner in cli * update theoremqa * update TheoremQA * add TheoremQA * rename theoremqa -> TheoremQA * update TheoremQA output path * rewrite many model configs * update huggingface * further update * refine configs * update configs * update configs * add configs/eval_llama3_instruct.py * add summarizer multi faceted * update bbh datasets * update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py * rename class * update readme * update hf above v4.33	2024-05-14 14:50:16 +08:00
Mo Li	6c711cb262	[Fix] Fix Needlebench Summarizer (#1143 ) * update few-shot example * add 128k	2024-05-13 15:59:34 +08:00
bittersweet1999	833a35140b	[Fix] fix alpacaeval while add caching path (#1139 ) * fix alpacaeval * fix alpacaeval	2024-05-11 14:02:26 +08:00
Fengzhe Zhou	19d7e630d6	[Sync] Update accelerator (#1122 ) (cherry picked from commit 4beb6d9ab655d8a626971841b7acfd9fae9d438f) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-05-09 14:32:31 +08:00
bittersweet1999	826d8307ac	fix links (#1120 )	2024-05-08 15:13:18 +08:00
JuhaoLiang	d2c40e5648	[Feature] Add AceGPT-MMLUArabic benchmark (#1099 ) * add AceGPT-MMLUArabic benchmark * update readme and fix lint issue * remove unused package * add MMLUArabic zero-shot settings * rename filename and update readme	2024-05-08 15:00:26 +08:00
Fangyu Lei	862044fb7d	[Feature] Add S3Eval Dataset (#916 ) * s3eval_branch * update s3eval	2024-05-06 19:41:52 +08:00
Yggdrasill7D6	af10ecc272	add mgsm datasets (#1081 ) * add mgsm datasets * fix lint * fix lint * update mgsm * update mgsm * ease code spell * update * update * update --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-06 15:29:34 +08:00
klein	153c4fc988	[Feature] update drop dataset from openai simple eval (#1092 ) * [Feature] update drop dataset from openai simple eval * update drop template presentation * update --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-06 13:37:08 +08:00
Fengzhe Zhou	d43392a3bb	[Feature] Add mmlu prompt from simple_evals, openai (#1074 ) * add mmlu prompt from simple_evals, openai * return empty str on failure	2024-05-06 13:26:26 +08:00
Yang Yong	53fe390454	fix LightllmApi workers bug (#1113 )	2024-04-30 22:09:22 +08:00
Alexander Lam	35c94d0cde	[Feature] Adding support for LLM Compression Evaluation (#1108 ) * fixed formatting based on pre-commit tests * fixed typo in comments; reduced the number of models in the eval config * fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset * removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English	2024-04-30 10:51:01 +08:00
bittersweet1999	3de48e9b35	[Bug] Fix CMB dataset (#1106 )	2024-04-30 00:33:43 +08:00
liushz	a6f67e1a65	[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-28 21:58:58 +08:00
Lyu Han	1013dce60c	adapt to lmdeploy v0.4.0 (#1073 ) * adapt to lmdeploy v0.4.0 * compatible	2024-04-28 19:57:40 +08:00
Yggdrasill7D6	58a57a4c45	[Feature] add support for Flames datasets (#1093 ) * add flames datasets * fix lint * rm quota * add judgemodel info and fix os path * support flames dataset * support flames dataset --------- Co-authored-by: bittersweet1999 <1487910649@qq.com>	2024-04-28 18:56:24 +08:00
dmitrysarov	cce5b6fbb6	fix output typing, change mutable list to immutable tuple (#989 ) * fix output typing, change mutable list to immutable tuple * import missed type * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 23:07:34 +08:00
binary-husky	701ecbb292	[Fix] python path bug (#1063 ) * fix relative path bug * format --------- Co-authored-by: hmp <505030475@qq.com> Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 21:58:45 +08:00
Wang Xingjin	048d41a1c4	add vllm get_ppl (#1003 ) * add vllm get_ppl * add vllm get_ppl * format --------- Co-authored-by: xingjin.wang <xingjin.wang@mihoyo.com> Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 21:31:56 +08:00
Haodong Duan	3a232db471	[Deprecate] Remove multi-modal related stuff (#1072) * Remove MultiModal * update index.rst * update README * remove mmbench codes * update news --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 21:20:14 +08:00
Francis-llgg	f1ee11de14	[Feature] Add gpqa prompt from simple_evals, openai (#1080) * add gpqa_openai_simple_eval * trigger CI build * reorg --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 20:13:00 +08:00
klein	e4830a6926	Update CIBench (#1089) * modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 * update cibench: dataset and evaluation * cibench summarizer bug * update cibench * move extract_code import --------- Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn> Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-26 18:46:02 +08:00
bittersweet1999	e404b72c52	[Feature] support arenahard evaluation (#1096 ) * support arenahard * support arenahard * support arenahard	2024-04-26 15:42:00 +08:00
bittersweet1999	6ba1c4937d	[Feature] Support Math evaluation via judgemodel (#1094 ) * support openai math evaluation * support openai math evaluation * support openai math evaluation * support math llm judge * support math llm judge	2024-04-26 14:56:23 +08:00
Ke Bao	81d0e4d793	[Feature] Add lmdeploy tis python backend model (#1014 ) * add lmdeploy tis python backend model * fix pr check * update	2024-04-23 14:27:11 +08:00
Fengzhe Zhou	8fe7b271cc	[Fix] Fix sequential runner (#1070 )	2024-04-23 11:31:10 +08:00
Fengzhe Zhou	004ed79593	[Feature] Add TheoremQA with 5-shot (#1048 ) * add TheoremQA with 5-shot * cherry pick from add-huggingface-above-v4.33, good TheoremQA results	2024-04-22 15:22:04 +08:00
bittersweet1999	6f98c8d9ab	[Fix] Fix MultiRound Subjective Evaluation(#1043 ) * fix multiround * fix	2024-04-22 12:06:03 +08:00
Fengzhe Zhou	8c85edd1cd	[Sync] deprecate old mbpps (#1064 )	2024-04-19 20:49:46 +08:00
Robin Chen	c172401323	[Fix] Fixed repeated loading of VLLM (#1051 ) * [fix]Fixed the issue caused by the repeated loading of VLLM model during task segmentation. * [fix] avoid TypeError: VLLM.__init__() got an unexpected keyword argument 'tokenizer_only' * restore .pre-commit-config.yaml * restore opencompass/tasks/openicl_infer.py --------- Co-authored-by: IcyFeather <mengzhuo.happy@gmail.com> Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-17 20:36:08 +08:00
Fengzhe Zhou	881bdbf6bd	[Sync] Bump version to 0.2.4 (#1052 ) (cherry picked from commit 16ac6306c72fa202173289b55eaefe85e0fcb73c) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-04-16 18:09:46 +08:00
Fengzhe Zhou	7a41951dda	[Fix] logger.error -> logger.debug in OpenAI wrapper (#1050 ) * logger.error -> logger.info in OpenAI * logger.info -> logger.debug in OpenAI	2024-04-15 21:08:13 +08:00
liuwei130	a00e57296f	[Feature] Add ChemBench (#1032 ) * add ChemBench * update results * molbench -> ChemBench --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-04-12 08:46:26 +08:00
Fengzhe Zhou	b39f501563	[Sync] update taco (#1030 )	2024-04-09 17:50:23 +08:00
Mo Li	16f29b25f1	[Fix] Simplify needlebench summarizer (#1024 ) * Conflicts: configs/summarizers/needlebench.py * fix lint problems	2024-04-07 17:51:13 +08:00
Mo Li	f2af49337d	[Feature] Add ATC Choice Version (#1019 ) * Squashed commit of the following: commit c48ad194c3976dc63d1b60d8c8ab2d5ff9e1cbfe Author: DseidLi <2568818204@qq.com> Date: Tue Apr 2 16:57:43 2024 +0800 add atc_choice commit 3ac6efea29619573e6fac8fa3cce464853dcead0 Merge: `2d4e559` 8e3a9c3 Author: DseidLi <2568818204@qq.com> Date: Tue Apr 2 16:41:38 2024 +0800 Merge branch 'atc_choice' into atc_add_choice commit 8e3a9c396a3e5546d3faf584183f6fd60b974d5e Merge: 150a036 `0a6a03f` Author: DseidLi <2568818204@qq.com> Date: Tue Mar 26 04:47:07 2024 +0800 Merge branch 'main' into atc_choice Conflicts: configs/summarizers/needlebench.py opencompass/datasets/needlebench/multi.py opencompass/datasets/needlebench/origin.py opencompass/datasets/needlebench/parallel.py commit 150a036d6d990f26a57c974d1af83d88c31a0f9d Merge: 8d6ac9a 940dd18 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 20 03:49:08 2024 +0800 Merge branch 'needlebench_fix' into atc_choice commit 8d6ac9a1a43b1c9d0f0ea27e7d58968a203ea898 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 20 03:41:49 2024 +0800 optimize needlebench code commit 940dd18a4270f24bc69edd2a780182c68918e1a9 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 20 03:39:46 2024 +0800 fix vllm commit d8be6877bc41051f3edcc0421c462c834c0f1c9a Merge: ecad78a `2527fda` Author: DseidLi <2568818204@qq.com> Date: Tue Mar 19 21:07:08 2024 +0800 Merge remote-tracking branch 'origin/add_1M_dataset' into atc_choice commit `2527fda8a5` Author: DseidLi <2568818204@qq.com> Date: Tue Mar 19 16:03:40 2024 +0800 add model configs commit `75425acdf8` Author: DseidLi <2568818204@qq.com> Date: Tue Mar 19 16:02:15 2024 +0800 add prompt postion args commit `367ba1ba61` Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 21:40:00 2024 +0800 add Needlebench-1000K configs commit ecad78af14c4bb00fe325779114b384c57ab30bf Author: DseidLi <2568818204@qq.com> Date: Thu Mar 14 22:08:32 2024 +0800 fix atc commit 08772c0787b18872abadc9ffec3223941a5ee0c2 Merge: 
9f3f8cf `caf1cf8` Author: DseidLi <2568818204@qq.com> Date: Thu Mar 14 22:07:28 2024 +0800 Merge branch 'main' into atc_choice Conflicts: configs/datasets/needlebench/readme.md configs/datasets/needlebench/readme_zh-CN.md configs/summarizers/needlebench.py opencompass/datasets/needlebench/atc.py opencompass/summarizers/needlebench.py commit 9f3f8cfb4452722734d334114ac1d14110e57406 Author: DseidLi <2568818204@qq.com> Date: Thu Mar 14 21:35:53 2024 +0800 add atc-choice test commit 52be7c1202376b4e09821188b826f1a805328129 Author: DseidLi <2568818204@qq.com> Date: Wed Mar 6 02:54:15 2024 +0800 update needlebench randomseed and add vllm qwen14b commit fc1effce596ae2e5ece4933e8cd34aef8e64a6f9 Merge: 4e747ed `caf1cf8` Author: DseidLi <2568818204@qq.com> Date: Wed Mar 6 02:51:14 2024 +0800 Merge branch 'main' into add_model_configs commit 31834f9b23af3354ac3581ec86d693d0f05cdd1c Merge: 7dabc82 `120bf8b` Author: DseidLi <2568818204@qq.com> Date: Sun Mar 3 23:29:42 2024 +0800 Merge branch 'main' of https://github.com/open-compass/opencompass into atc_choice commit 4e747ed1988ddbcfcc7fff334601259ade72d363 Author: DseidLi <2568818204@qq.com> Date: Sun Mar 3 22:15:25 2024 +0800 add internlm2-lmdeploy model and gemma configs commit 7dabc828123d711c8cf834d6aab4137bb55e85ed Author: DseidLi <2568818204@qq.com> Date: Sat Mar 2 17:26:15 2024 +0800 add atc choice version -ZH commit `996f8ae43d` Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:58:56 2024 +0800 update readme for needlebench commit `f7266e873c` Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:44:53 2024 +0800 move readme.md commit `1c7375681d` Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:38:31 2024 +0800 fix linting error commit `b6524f3ebf` Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:33:51 2024 +0800 lint summarizer commit `c0d1190e39` Author: DseidLi <2568818204@qq.com> Date: Wed Feb 28 16:29:03 2024 +0800 add needlebench intro, fix summarizer commit `0965baf785` Author: DseidLi 
<2568818204@qq.com> Date: Mon Feb 26 13:31:26 2024 +0800 fix bug in needlebench summarizer commit `5d32b31eb8` Author: DseidLi <2568818204@qq.com> Date: Sat Feb 24 03:19:08 2024 +0800 update act prompt commit `af82a7f085` Merge: `32bf9fe` `53fe788` Author: DseidLi <2568818204@qq.com> Date: Fri Feb 23 17:50:32 2024 +0800 Merge remote-tracking branch 'upstream/main' into needlebench commit `32bf9fe802` Author: DseidLi <2568818204@qq.com> Date: Fri Feb 23 17:31:32 2024 +0800 simplify needlebench 32k, 128k, 200k for eval commit `a7cb025e05` Author: DseidLi <2568818204@qq.com> Date: Fri Feb 23 14:48:58 2024 +0800 add needlebench * fix summarizer * remove repeated code * remove chinese comments	2024-04-07 15:46:20 +08:00
Mo Li	b50d163265	[Fix] Refactor Needlebench Configs for CLI Testing Support (#1020 ) * add needlebench datasets suffix * fix import * update run.py args for summarizer key and dataset suffix * update utils/run.py	2024-04-07 15:12:56 +08:00
bittersweet1999	2d4e559763	[Feature] Add multi-model judge and fix some problems (#1016 ) * support multi-model judge and moe judge * test_moe * test_moe * test * add moe judge * support multi-judge-model	2024-04-02 11:52:06 +08:00
bittersweet1999	02e7eec911	[Feature] Support AlpacaEval_V2 (#1006 ) * support alpacaeval_v2 * support alpacaeval * update docs * update docs	2024-03-28 16:49:04 +08:00
Mo Li	0a6a03fe1a	[Feature] update needlebench and configs (#986 ) * add Needlebench-1000K configs * add prompt position args * add model configs * Update parallel.py * fix lint	2024-03-25 18:05:01 +08:00
Chaseldot	1d3198554b	[Fix] base.py change status into list (#994 )	2024-03-22 17:06:34 +08:00
Ke Bao	e415ddf96a	[Fix] Fix turbomind_tis (#992 )	2024-03-22 15:50:12 +08:00
Connor-Shen	0221d30877	[Fix] Update APPS/TACO (#988 ) * [Feature] update apps/taco * [Feature] update apps/taco	2024-03-19 20:21:39 +08:00
Connor-Shen	8a3c6e51ed	[Feature] Update APPS (#985 ) * update post process * update post process	2024-03-19 15:47:05 +08:00
Connor-Shen	d92595b671	[Feat] Support TACO (#966 ) * [Feat] Support TACO * update README * update README	2024-03-19 15:39:16 +08:00
bittersweet1999	c78a4df923	add support for set prediction path (#984 )	2024-03-19 14:32:15 +08:00
Jingming	89a8a8917b	[Feature] Add the implement of QuALITY datasets (#976 ) #976	2024-03-15 21:22:38 +08:00
Connor-Shen	3098d78845	[Bench] Support APPS (#963 ) * [Feat] support apps * [Feat] support apps * [Feat] support apps * update README	2024-03-13 16:09:23 +08:00
Fengzhe Zhou	ab6cdb2be8	[Sync] Bump version 0.2.3 (#957 )	2024-03-12 11:51:56 +08:00
Fengzhe Zhou	64fde73b15	[Fix] Use logger.error on failure (#960 )	2024-03-12 11:51:39 +08:00
Fengzhe Zhou	bdd85358cc	[Sync] update 20240308 (#953 )	2024-03-11 22:34:19 +08:00
bittersweet1999	848e7c8a76	[fix] add different temp for different question in mtbench (#954 ) * add temp for mtbench * add document for mtbench * add document for mtbench	2024-03-11 17:24:39 +08:00
Yang Yong	3829be87b1	Fix LightllmApi ppl test (#951 )	2024-03-08 12:04:44 +08:00
Yang Yong	107e022cf4	Support prompt template for LightllmApi. Update LightllmApi token bucket. (#945 )	2024-03-06 15:33:53 +08:00
RunningLeon	c54a5d3b0f	Support get_ppl for TurbomindModel (#878 ) * update ppl for turbomindmodel * update api_server * rename config and set thread_safe for pytorch engine if possible	2024-03-06 11:44:19 +08:00
Fengzhe Zhou	b03d5dc531	[Sync] Sync Internal (#941 )	2024-03-04 14:42:36 +08:00
yuantao2108	bbec7d8733	[Feature] add lveval benchmark (#914 ) * add lveval benchmark * add LVEval readme file * update LVEval readme file * Update configs/eval_bluelm_32k_lveval.py * Update configs/eval_llama2_7b_lveval.py --------- Co-authored-by: yuantao <yuantao@infini-ai.com> Co-authored-by: Mo Li <82895469+DseidLi@users.noreply.github.com>	2024-03-04 11:22:03 +08:00
Mo Li	8142f399a8	[Feature] Upgrade the needle-in-a-haystack experiment to Needlebench (#913 ) * add needlebench * simplify needlebench 32k, 128k, 200k for eval * update act prompt * fix bug in needlebench summarizer * add needlebench intro, fix summarizer * lint summarizer * fix linting error * move readme.md * update readme for needlebench * update docs of needlebench * simplify needlebench summarizers	2024-03-04 11:10:52 +08:00
Kdump	3e9844ed33	[Fix] Fixed the problem of never entering task.run() mode in local scheduling mode. (#930 ) * Fixed the problem of never entering task.run() mode in local scheduling mode. The get_command_template method adds CUDA_VISIBLE_DEVICES= or set CUDA_VISIBLE_DEVICES= to the command line prefix, which causes the task.run() branch to fail.	2024-02-29 14:35:45 +08:00
Skyfall-xzz	4c45a71bbc	[Feature] Support OpenFinData (#896 ) * [Feature] Support OpenFinData * add README for OpenFinData * update README	2024-02-29 12:55:07 +08:00
bittersweet1999	001e77fea2	[Feature] add support for gemini (#931 ) * add gemini * add gemini * add gemini	2024-02-28 19:38:34 +08:00
Fengzhe Zhou	9afbfa3639	[Sync] Fix TEvalEvaluator (#929 )	2024-02-28 16:05:30 +08:00
Fengzhe Zhou	5ce8e0450e	[Fix] Fix type hint in IFEval (#915 )	2024-02-28 10:53:40 +08:00
Jingming	53fe788d27	[Fix] fix ifeval (#909 )	2024-02-23 16:52:03 +08:00
bittersweet1999	45c606bcd0	[Fix] Fix IFEval (#906 ) * fix ifeval * fix ifeval * fix ifeval * fix ifeval	2024-02-22 16:51:34 +08:00
RunningLeon	32ba0b074e	Support lmdeploy pytorch engine (#875 ) * add lmdeploy pytorch model * fix * speed up encoding and decoding * fix * change tokenizer	2024-02-22 03:46:07 -03:00
Yang Yong	b6e21ece38	Support LightllmApi input_format (#888 )	2024-02-19 10:02:59 +08:00
Fengzhe Zhou	08133e060a	[Sync] Bump version to 0.2.2 (#880 )	2024-02-07 10:45:48 +08:00
hailsham	dd444685bb	fix bug of gsm8k_postprocess (#863 ) * fix bug of gsm8k_postprocess * update postprocess --------- Co-authored-by: Lei Fei <SENSETIME\leifei1@cn3114002087l.domain.sensetime.com> Co-authored-by: Leymore <zfz-960727@163.com>	2024-02-06 23:52:47 +08:00
Connor-Shen	444d8d9507	[feat] support multipl-e (#846 ) * [feat] support humaneval_multipl-e * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-02-06 23:30:28 +08:00
Yggdrasill7D6	a6c49f15ce	fix lawbench 2-1 f0.5 score calculation bug (#795 ) * fix lawbench 2-1 f0.5 score calculation bug * use path in overall datasets folder --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-02-06 22:20:11 +08:00
bittersweet1999	1c8e193de8	[Fix] hotfix for mtbench (#877 ) * hotfix for mtbench * hotfix	2024-02-06 21:26:47 +08:00
Fengzhe Zhou	d34ba11106	[Sync] Merge branch 'dev' into zfz/update-keyset-demo (#876 )	2024-02-05 23:29:10 +08:00
Skyfall-xzz	7ad1168062	Support NPHardEval (#835 ) * support NPHardEval * add .md file and fix minor bugs * refactor and minor fix --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-02-05 15:52:28 +08:00
Yuchen Yan	fed7d800c6	[Fix] Fix error in gsm8k evaluator (#782 ) Co-authored-by: jiangjin1999 <1261842974@qq.com>	2024-02-04 22:55:11 +08:00
bittersweet1999	7806cd0f64	[Feature] support alpacaeval (#809 ) * support alpacaeval_v1 * Update opencompass/summarizers/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/summarizers/subjective/alpacaeval_v1.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix conflict * support alpacaeval v2 * support alpacav2 --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-02-04 14:18:36 +08:00
RunningLeon	4c87e777d8	[Feature] Add end_str for turbomind (#859 ) * fix * update * fix internlm1 * fix docs * remove sys	2024-02-01 22:31:14 +08:00
bittersweet1999	5c6dc908cd	fix compass arena (#854 )	2024-01-30 16:34:38 +08:00
Songyang Zhang	cdca59ff49	[Fix] Update Zhipu API and Fix issue min_out_len issue of API models (#847 ) * Update zhipu api and fix min_out_len issue of API class * Update example * Update example	2024-01-28 14:52:43 +08:00
Jingming	2801883351	[Fix] Fix acc of IFEval (#849 ) * [Feature] Add IFEval * [Fix] Changing the Score Rule.	2024-01-27 22:27:07 +08:00
Xiaoming Shi	35aace776a	[Fix] Update MedBench (#845 )	2024-01-26 17:56:13 +08:00
Songyang Zhang	8ed022b4c4	Update Sensetime API (#844 )	2024-01-26 16:40:49 +08:00
Hubert	4aa74565e2	[Feat] minor update agent related (#839 ) * [Feat] update cibench * [Feat] Support CIBench * [Feat] Support CIBench * [Feat] Support CIBench * [Feat] Support CIBench	2024-01-26 14:15:51 +08:00
Fengzhe Zhou	0991dd33a0	[Sync] Update dataset cfg for internMath (#837 ) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-01-24 16:30:32 +08:00
Songyang Zhang	793e32c9cc	[Feature] Update API implementation (#834 )	2024-01-24 13:35:21 +08:00
bittersweet1999	2ee8e8a1a1	[Feature] add mtbench (#829 ) * add mtbench * add mtbench * Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/mtbench.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix mtbench --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-24 12:11:47 +08:00
Jingming	e059a5c2bf	[Feature] Add IFEval (#813 ) * [Feature] Add IFEval * [Doc] add introduction of IFEval	2024-01-23 20:07:49 +08:00
bittersweet1999	3d9bb4aed7	[Fix] fix strings (#833 ) * add compass arena * add compass_arena * add compass arena * Update opencompass/summarizers/subjective/compass_arena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/summarizers/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/compass_arena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/eval_subjective_compassarena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/compassarena/compassarena_compare.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/eval_subjective_compassarena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/compassarena/compassarena_compare.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix check position bias * fix string --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-23 10:57:26 +00:00
bittersweet1999	2d4da8dd02	[Feature] Add CompassArena (#828 ) * add compass arena * add compass_arena * add compass arena * Update opencompass/summarizers/subjective/compass_arena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/summarizers/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/compass_arena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update opencompass/datasets/subjective/__init__.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/eval_subjective_compassarena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/compassarena/compassarena_compare.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/eval_subjective_compassarena.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * Update configs/datasets/subjective/compassarena/compassarena_compare.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * fix check position bias --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-23 15:12:46 +08:00
Guo Qipeng	e975a96fa1	Update cdme config and evaluator (#812 ) * update cdme config and evaluator * fix cdme prompt * move CDME trim post-processor as a separate evaluator --------- Co-authored-by: 郭琦鹏 <guoqipeng@pjlab.org.cn>	2024-01-19 11:29:27 +08:00
Yang Yong	f09a2ff418	Add LightllmApi KeyError log & Update doc (#816 ) * Add LightllmApi KeyError log * Update LightllmApi doc	2024-01-18 22:23:38 +08:00
RunningLeon	61fe873c89	[Fix] Fix turbomind and update docs (#808 ) * update * update docs * add engine_config and gen_config in eval_config * update * fix * fix * fix * fix docstr * fix url	2024-01-18 14:41:35 +08:00
Fengzhe Zhou	b4afe3e7c1	[Sync] Add InternLM2 Keyset Evaluation Demo (#807 ) Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>	2024-01-17 13:48:12 +08:00
Mo Li	acae560911	Added support for multi-needle testing in needle-in-a-haystack test (#802 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues * add English version support * change NeedleInAHaystackDataset to dynamic loading * change NeedleInAHaystackDataset to dynamic loading * fix needleinahaystack test eval bug * fix needleinahaystack config bug * Added support for multi-needle testing in needle-in-a-haystack test * Optimize the code for plotting in the needle-in-a-haystack test. * Correct the typo in the dataset parameters. * update needleinahaystack test docs --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-17 13:47:34 +08:00
RunningLeon	0836aec67b	[Feature] Update evaluate turbomind (#804 ) * update * fix * fix * fix	2024-01-17 11:09:50 +08:00
bittersweet1999	814b3f73bd	reorganize subject files (#801 )	2024-01-16 18:03:11 +08:00
bittersweet1999	83d6c48378	[Feature] Add configs for creationbench (#791 ) * add creationv2_zh * add creationv2_zh * add eng config for creationbench * add eng config for creationbench * add eng config for creationbench	2024-01-12 14:20:21 +08:00
notoschord	d3a0ddc3ef	[Feature] Add support for Nanbeige API (#786 ) Co-authored-by: notoschord <wangzekai@kanzhun.com>	2024-01-11 13:54:27 +08:00
bittersweet1999	5679edb490	add temperature in alles (#787 )	2024-01-11 03:57:24 +00:00
Xiaoming Shi	ad872a5dc2	[Feature] Update MedBench (#779 ) * update medbench * medbench update * format medbench * format * Update * update * update * update suffix --------- Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org> Co-authored-by: Leymore <zfz-960727@163.com>	2024-01-09 11:42:44 +08:00
Fengzhe Zhou	a74e4c1a8d	[Sync] Bump version to 0.2.1 (#778 )	2024-01-08 14:56:28 +00:00
Fengzhe Zhou	32f40a8f83	[Sync] Sync with internal codes 2023.01.08 (#777 )	2024-01-08 14:07:24 +00:00
jiangjin1999	8194199d79	[Feature] _batch_generate function, add the MultiTokenEOSCriteria (#772 ) * jiangjin1999: in the _batch_generate function, add the MultiTokenEOSCriteria feature to speed up inference. * jiangjin1999: in the _batch_generate function, add the MultiTokenEOSCriteria feature to speed up inference. --------- Co-authored-by: jiangjin08 <jiangjin08@MBP-2F32S5MD6P-0029.local> Co-authored-by: jiangjin08 <jiangjin08@a.sh.vip.dianping.com>	2024-01-08 16:40:02 +08:00
liyucheng09	0b2863039e	[Feature] Contamination analysis for MMLU, Hellaswag, and ARC_c (#699 ) * Contamination analysis for ARC_c, mmlu, and Hellaswag * update `eval_contamination.py` * update `contamination.py` summarizer * fix `eval_contamination.py` * add mmlu groups for contamination analysis	2024-01-08 15:51:48 +08:00
Connor-Shen	30a90d8dd8	Support Mbpp_plus dataset (#770 ) * support mbpp+ * support mbpp+ * minor fix * [Feat] minor fix --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2024-01-05 22:01:57 +08:00
bittersweet1999	3c606cb712	quick fix for postprocess pred extraction (#771 )	2024-01-05 21:10:18 +08:00
bittersweet1999	2163f9398f	[Feature] add subject ir dataset (#755 ) * add subject ir * Add ir dataset * Add ir dataset	2024-01-05 12:00:57 +00:00
bittersweet1999	be369c3e06	[Feature] Add multi_round dataset evaluation (#766 ) * multi_round dataset * add multi_round evaluation	2024-01-04 10:37:52 +00:00
bittersweet1999	7cd65d49d8	[Fix] Fix small bug in alignbench (#764 ) * fix small bugs * fix small bugs	2024-01-03 07:44:53 +00:00
Chris Liu	3eb225a5e6	[Feature] Support LLaMA2-Accessory (#732 ) * Support LLaMA2-Accessory * remove strip * clear imports * reformat * fix lint * fix lint * update readme * update readme * update readme * update readme	2024-01-02 20:48:51 +08:00
HUANG Fei	ba027eeeac	[Feature] Add support of qwen api (#735 )	2024-01-02 20:47:12 +08:00
Mo Li	33f8df1ca3	[Update] Change NeedleInAHaystackDataset to dynamic dataset loading (#754 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues * add English version support * change NeedleInAHaystackDataset to dynamic loading * change NeedleInAHaystackDataset to dynamic loading * fix needleinahaystack test eval bug * fix needleinahaystack config bug --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-01-02 17:22:56 +08:00
Francis-llgg	b69fe2343b	[Feature] Add GPQA Dataset (#729 ) * check * message * add * change prompt * change a para nameq * modify name of the file * delete an useless file	2024-01-01 15:54:40 +08:00
Francis-llgg	ef3ae63539	[Feature] Add new dataset mastermath2024v1 (#744 ) * add new dataset mastermath2024v1 * change it to simplified chinese prompt * change file name	2024-01-01 15:53:24 +08:00
Mo Li	17b8e929dd	[Feature] Update plot function in tools_needleinahaystack.py (#747 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test * update plot function in tools_needleinahaystack.py * optimizing needleinahaystack dataset generation strategy * modify minor formatting issues --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-29 18:51:09 +08:00
Hubert	327951087f	[Feat] update code config (#749 ) * [Feat] update code dataset * [Feat] update code dataset * [Feat] update code dataset	2023-12-29 18:46:34 +08:00
bittersweet1999	fe0b717033	add creationbench (#753 )	2023-12-29 10:03:44 +00:00
Connor-Shen	81098722d2	add chinese version of humaneval, mbpp (#743 ) * add chinese_version of humaneval,mbpp * add humaneval&mbpp gen.py * minor fix * minor add --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-12-28 14:47:56 +08:00
bittersweet1999	db919f0191	[Fix] SubSizePartition fix (#746 ) * fix subjective_eval * subject_eval partition situation fixed * subject_eval partition situation fixed	2023-12-28 11:46:46 +08:00
Hubert	0a525985e8	[Feature] Support sanitized MBPP dataset (#745 )	2023-12-27 22:17:23 +08:00
bittersweet1999	dfd9ac0fd9	[Feature] Add other judgelm prompts for Alignbench (#731 ) * add judgellm prompts * add judgelm prompts * update import info * fix situation that no abbr in config * fix situation that no abbr in config * add summarizer for other judgellm * change config name * add maxlen * add maxlen * dict assert * dict assert * fix strings * fix strings	2023-12-27 17:54:53 +08:00
Yang Yong	54345c56b7	Update LightllmApi and Fix mmlu bug (#738 ) * Update LightllmApi and Fix mmlu bug * checkout mmlu_gen_a484b3.py --------- Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-27 13:49:08 +08:00
philipwangOvO	34561ececb	[Feature] Add InfiniteBench (#739 ) * add InfiniteBench * add InfiniteBench --------- Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>	2023-12-26 15:36:27 +08:00
Fengzhe Zhou	3a68083ecc	[Sync] update configs (#734 )	2023-12-25 21:59:16 +08:00
AllentDan	336d8d76ff	add turbomind restful api support (#693 ) * add turbomind restful api support * config * top_p 0.8 * top_k = 1	2023-12-24 01:40:00 +08:00
bittersweet1999	e985100cd1	[Fix] Fix subjective alignbench (#730 )	2023-12-23 20:06:53 +08:00
Mo Li	0e24f4213e	[Feature] Add NeedleInAHaystack Test Support (#714 ) * Add NeedleInAHaystack Test * Apply pre-commit formatting * Update configs/eval_hf_internlm_chat_20b_cdme.py Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> * add needle in haystack test * update needle in haystack test --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2023-12-23 12:00:51 +08:00
RunningLeon	e34c552282	[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend (#721 ) * add llama2 test * fix * test qwen chat-7b * test w4 * add baichuan2 * update * update * update configs and docs * update	2023-12-21 18:22:17 +08:00
bittersweet1999	fbb912ddf3	[Feature] Add abbr for judgemodel in subjective evaluation (#724 ) * add_judgemodel_abbr * add judgemodel abbr	2023-12-21 15:58:20 +08:00
Skyfall-xzz	b35d991786	[Feature] Add ReasonBench(Internal) dataset (#577 ) * [Feature] Add reasonbench dataset * add configs for supporting generative inference & merge datasets in the same category * modify config filename to prompt version * fix codes to meet pre-commit requirements * lint the code to meet pre-commit requirements * Align Load_data Sourcecode Briefly * fix bugs * reduce code redundancy	2023-12-20 17:57:42 +08:00
Jingming	76a95e9e81	[Feature] Support the use of humaneval_plus. (#720 ) * [Feature] Support the use of humaneval_plus. * [Feature] Add humaneval_plus_gen.py * minor check * [Fix] Fix bug --------- Co-authored-by: yingfhu <yingfhu@gmail.com>	2023-12-20 17:25:17 +08:00
bittersweet1999	97c2068bd9	[Feature] Add JudgeLLMs (#710 ) * add judgellms * add judgellms * add sub_size_partition * add docs * add ref	2023-12-19 18:40:25 +08:00
Hubert	eda72e756e	[Fix] minor fix openai (#711 )	2023-12-18 15:45:31 +08:00
Songyang Zhang	637628a70f	[Doc] Update Doc for Alignbench (#707 ) * update alignmentbench * update alignmentbench * update doc * update * update	2023-12-15 15:07:25 +08:00
DseidLi	db2920326a	[Fix] remove redundant in gsm8k.py (#700 ) Removed redundant code in GSM8KDataset.load method.	2023-12-14 19:55:58 +08:00
Songyang Zhang	bfe4aa2af5	[Fix] Update alignmentbench (#704 ) * update alignmentbench * update alignmentbench * update alignmentbench	2023-12-14 18:24:21 +08:00
bittersweet1999	1fe152b3e8	[Feature] Support AlignmentBench infer and judge (#697 ) * alignmentbench infer and judge * alignmentbench * alignmentbench done * alignment all done * alignment all done	2023-12-13 19:59:30 +08:00
Hubert	a94598d921	[Feat] update python action and slurm (#694 )	2023-12-13 10:41:10 +08:00
bittersweet1999	6130394165	[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692 ) * add features * add doc string * add doc string	2023-12-12 20:58:17 +08:00
Hubert	4780b39eda	[Sync] format (#690 ) Co-authored-by: Leymore <zfz-960727@163.com>	2023-12-12 14:03:45 +08:00
bittersweet1999	3e77175720	[Fix] Hotfix for Subjective Evaluation (#686 )	2023-12-12 09:22:08 +08:00
bittersweet1999	465308e430	[Feature] Add Subjective Evaluation (#680 ) * new version of subject * fixed draw * fixed draw * fixed draw * done * done * done * done * fixed lint	2023-12-11 22:22:11 +08:00
Hubert	4f0b373a0a	[Fix] fix docstring (#684 )	2023-12-11 19:12:01 +08:00

651 Commits