OpenCompass

mirror of https://github.com/open-compass/opencompass.git synced 2025-05-30 16:03:24 +08:00

Author	SHA1	Message	Date
Songyang Zhang	c789ce5698	[Fix] the automatically download for several datasets (#1652 ) * [Fix] the automatically download for several datasets * Update * Update * Update CI	2024-11-01 15:57:18 +08:00
Linchen Xiao	695738a89b	[Update] Add lmdeploy DeepSeek configs (#1656 ) * [Update] Add lmdeploy DeepSeek configs * update max out length	2024-11-01 15:34:23 +08:00
bittersweet1999	a0853c939d	[Add] Add CompassArenaSubjectiveBench (#1645 ) * fix pip version * fix pip version * add compassarenasubjectivebench * add compassarenasubjectivebench * add compassarenabench	2024-11-01 13:52:22 +08:00
Linchen Xiao	5212ffe8e2	[Update] Add new model configs (#1653 )	2024-10-30 17:24:53 +08:00
Linchen Xiao	df57c08ccf	[Feature] Update Models, Summarizers (#1600 )	2024-10-29 18:37:15 +08:00
Linchen Xiao	d91d66792a	[Update] Update Needlebench OSS path (#1651 )	2024-10-29 18:05:44 +08:00
Chang Lan	46affab882	[Fix] Fix ruler_16k_gen (#1643 )	2024-10-29 17:58:43 +08:00
Linchen Xiao	8172af49bb	[Update] Update wildbench max_seq_len (#1648 ) * [Update] Wildbench max_seq_len update * [Update] Wildbench max_seq_len update	2024-10-29 13:21:31 +08:00
Junnan Liu	645c5f3b2c	[Datasets] Add datasets CMO&AIME (#1610 ) * add datasets cmo&aime * delete unused modules * modify prompt * update __init__ * update data load and add README * update data load * update performance * update md5 * remove indents * add indent * fix log for debug mode	2024-10-28 18:08:02 +08:00
Linchen Xiao	9c39cb68d4	[Bump] Bump version to 0.3.4 (#1639 )	2024-10-25 20:10:16 +08:00
Linchen Xiao	a61e8a0803	[Update] Internal humaneval add (#1641 ) * [Update] internal_humaneval_add * update	2024-10-25 19:08:42 +08:00
Songyang Zhang	84be90669b	[Update] Fix issue of *_param.py, avoid name conflict;add keep_tmp_file flag to support keep the temp config file. (#1640 )	2024-10-25 16:39:25 +08:00
BigDong	2542bc6907	[Feature] Support results saving as md format table (#1638 )	2024-10-25 15:50:33 +08:00
Linchen Xiao	22fdea4bf2	[Update] Update DLC runner (#1637 )	2024-10-24 21:36:16 +08:00
Lyu Han	fb12c3f98a	[Update] strip stop_words (#1635 )	2024-10-24 20:39:20 +08:00
Linchen Xiao	662dddf41a	[Update] Add internal humaneval postprocess (#1636 )	2024-10-24 17:45:21 +08:00
Linchen Xiao	be3c06a158	[Fix] Update common summarizer regex extraction (#1631 )	2024-10-22 14:35:45 +08:00
Chang Lan	a927bba1cf	[Fix] Fix RULER datasets (#1628 ) We need to ensure that we don't import anything that ends with "_datasets", or they will be picked up by the runner, leading to duplicate / unwanted datasets being evaluated.	2024-10-22 11:59:02 +08:00
Songyang Zhang	a4d5a6c81b	[Feature] Support LiveCodeBench (#1617 ) * Update * Update LCB * Update * Update * Update * Update * Update	2024-10-21 20:50:39 +08:00
Chenguang Li	5868d5afa4	[Bug] Fix-NPU-Support (#1618 ) * bugfix NPU support * formatting --------- Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2024-10-21 17:42:53 +08:00
liushz	500b44ba2d	[Fix] gpqa_few_shot_ppl prompt bug (#1627 )	2024-10-21 16:59:06 +08:00
Linchen Xiao	096c347e7d	[Fix] Qwen 2.5 model config (#1626 ) * [Fix] Fix Qwen 2.5 model config * [Fix] Fix Qwen 2.5 model config * [Fix] Fix Qwen 2.5 model config	2024-10-21 16:58:18 +08:00
bittersweet1999	a11e2b2fd4	[Fix] Compatible with old versions (#1616 ) * fix pip version * fix pip version * Compatible with old versions * compati old version * compati old version * compati old version * update configs	2024-10-21 10:16:29 +08:00
Lyu Han	6e8adf5221	[Bug] Remove prefix bos_token from messages when using lmdeploy as the accelerator (#1623 ) * remove prefix bos_token from messages when using lmdeploy as the accelerator * update	2024-10-19 20:03:47 +08:00
Bob Tsang	dd0b655bd0	[Feature] Support MMMLU & MMMLU-lite Benchmark (#1565 ) * rm folder * modify format according to reviewer * modify format according to reviewer * modify format according to reviewer * add some files requirement * fix some bug * fix bug * change load type * Update MMMLU Dataset * Update MMMLU Dataset * Add MMMLU-Lite Dataset * update MMMMLU datast * update MMMMLU datast * update MMMMLU datast --------- Co-authored-by: BobTsang <BobTsang1995@gmail.com> Co-authored-by: liushz <qq1791167085@163.com>	2024-10-17 19:09:34 +08:00
bittersweet1999	f0d436496e	[Update] update docs and add compassarena (#1614 ) * fix pip version * fix pip version * update docs and add compassarena * update docs	2024-10-17 14:39:06 +08:00
Haoran Que	4fe251729b	Upload HelloBench (#1607 ) * upload hellobench * update hellobench * update readme.md * update eval_hellobench.py * update lastest --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-10-15 17:11:37 +08:00
bittersweet1999	fa54aa62f6	[Feature] Add Judgerbench and reorg subeval (#1593 ) * fix pip version * fix pip version * update (#1522) Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn> * [Feature] Update Models (#1518) * Update Models * Update * Update humanevalx * Update * Update * [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527) add judgerbench and reorg sub add judgerbench and reorg subeval add judgerbench and reorg subeval * add judgerbench and reorg subeval * add judgerbench and reorg subeval * add judgerbench and reorg subeval * add judgerbench and reorg subeval --------- Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com> Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn> Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>	2024-10-15 16:36:05 +08:00
x54-729	2b1afa7d1e	[Fix] fix interntrain's tokenizer truncate (#1605 ) Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>	2024-10-15 16:03:57 +08:00
Linchen Xiao	f390697a5e	[Fix] Update dlc runner python env (#1604 )	2024-10-14 15:50:21 +08:00
Lyu Han	4fde41036f	[Feature] Update TurboMindModel by integrating lmdeploy pipeline API (#1556 ) * integrate lmdeploy's pipeline api * fix linting * update user guide * rename * update * update * update * rollback class name * update * remove unused code * update * update * use pipeline * fix ci check * compatibility * compatibility * remove concurrency * update * fix table content * update	2024-10-14 15:33:40 +08:00
liushz	5faee929db	[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589 ) * Add GaoKaoMath Dataset * Add MATH LLM Eval * Update GAOKAO Math Eval Dataset * Update GAOKAO Math Eval Dataset	2024-10-12 19:13:06 +08:00
bittersweet1999	3f7a3730d7	[Fix] fix Flames (#1599 ) * fix pip version * fix pip version * fix flames * fix flames	2024-10-12 14:34:59 +08:00
Lyu Han	b52ba65c26	[Feature] Integrate lmdeploy pipeline api (#1198 ) * integrate lmdeploy's pipeline api * fix linting * update user guide * rename * update * update * update * rollback class name * update * remove unused code * update * update * fix ci check * compatibility * remove concurrency * Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py * Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md * [Bug] fix lint --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-10-09 22:58:06 +08:00
x54-729	4d6349dfe1	[FIX] fix interntrain get_loglikelihood (#1584 )	2024-10-08 11:34:04 +08:00
Linchen Xiao	22a4e76511	[BUMP] Bump version to 0.3.3 (#1581 )	2024-09-30 16:57:41 +08:00
x54-729	bbdca5eb4c	[BUG] Fix eos token handling and add comments for InternTrain (#1569 ) Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>	2024-09-30 15:46:06 +08:00
Linchen Xiao	763d7755b6	[BUG]GaokaoBench dataset fix (#1583 )	2024-09-30 15:13:26 +08:00
shijinpjlab	7528b8ab8a	[Feature] Add dingo test (#1529 ) * add qa dingo * update * change name qa to dingo * eval model: llm_base * update path * change name and move path * add eval_dingo * update import * add for pip * add dingo package * change import place * update import place * fix lint fail * isort * double quoted --------- Co-authored-by: sj <shijin@pjlab.org.cn>	2024-09-29 19:24:58 +08:00
Yi Ding	85a28874aa	[BUG]: Fix Bailing API configs (#1570 )	2024-09-27 11:56:57 +08:00
Songyang Zhang	e8437db98f	[Feature] Update BailingLM/OpenAI verbose (#1568 ) * [Feature] 1. Update CoreBench Base\n 2. Fix lint issue in BalingAPI * Update * [Feature] Update API * Update	2024-09-27 11:15:25 +08:00
Songyang Zhang	7d50294117	[Feature] Update Bailing (#1567 ) * [Feature] 1. Update CoreBench Base\n 2. Fix lint issue in BalingAPI * Update * Update * Update	2024-09-26 18:56:17 +08:00
Songyang Zhang	a7bacfdf7e	[Feature] Update CoreBench 2.0 (#1566 ) * [Feature] 1. Update CoreBench Base\n 2. Fix lint issue in BalingAPI * Update * Update	2024-09-26 18:44:00 +08:00
Yi Ding	3f833186dc	[Feature] Support the reasoning from BaiLing LLM (#1541 ) * [Feature] Support the reasoning from BaiLing LLM This commit includes the access to BaiLing LLM and gets the reasoning. * Add the api example The example of evalute bailing api * Revise the generation arguments Based on current experiment, we update some generation arguments for better reasoning * [fix] set the batch size * Retry under flowcontrol of serverside * add dependent package into requirement.txt add dependent package retrying to clean up the pre-comment check. * correct the file names and make the file copy correct the file names. copy the files under configs to opencompass * fix the lint issue --------- Co-authored-by: christopher.dy <christopher.dy@antgroup.com>	2024-09-26 16:49:52 +08:00
Linchen Xiao	80cda1980e	[BUG] fix followbench dataset config (#1564 ) * [BUG] fix followbench dataset config * [BUG] fix followbench dataset config	2024-09-25 20:58:34 +08:00
zhulinJulia24	87df8a73a3	[CI] add a common summarizer for qabench summarizer (#1545 ) * update * update * update --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-09-25 13:40:47 +08:00
Linchen Xiao	c3fb9065db	[Feature] Add dlc sleep time (#1562 )	2024-09-25 11:53:48 +08:00
liushz	83eeb52b09	[Feature] Update WikiBench base model config (#1553 ) * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & Math base config * Update WikiBench base model config --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-25 11:26:36 +08:00
Songyang Zhang	e7681943f3	[Feature] Update the max_out_len for many models (#1559 )	2024-09-24 21:52:28 +08:00
bittersweet1999	a2e9bc0c41	[Fix] fix duplicate error in partitioner (#1552 ) * fix pip version * fix pip version * fix duplicate error in paritioner * fix duplicate error in paritioner	2024-09-23 19:45:21 +08:00
x54-729	335667183a	[Feature] Add Interntrain model support (#1548 ) Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>	2024-09-23 19:10:26 +08:00
klein	24915aeb3f	[BUG] Update CIbench config(#1544 ) * BUG: Update cibench.py * BUG: Update cibench.py	2024-09-23 18:32:27 +08:00
liushz	a0cfd61129	[Feature] Update MathBench & Math base model config (#1550 ) * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & Math base config --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-23 14:03:59 +08:00
Songyang Zhang	ee058e25b2	[Feature] Support verbose for OpenAI API (#1546 )	2024-09-20 17:12:52 +08:00
hailsham	a81bbb85bf	[FIX] Added handling for the "begin section" in meta_template to APITemplateParser (#1405 ) Co-authored-by: leifei <nuuooo@icloud.com>	2024-09-19 18:12:04 +08:00
Songyang Zhang	5a27c2bd6f	[Model] Support Qwen2.5 Instruct (#1543 )	2024-09-19 16:16:07 +08:00
Songyang Zhang	be460fbb21	[Feature] Support OpenAI O1 models (#1539 ) * [Feature] Support OpenAI O1 models * Update README.md --------- Co-authored-by: liushz <qq1791167085@163.com>	2024-09-18 22:41:17 +08:00
liushz	2e9db77d57	[Feature] Add custom model postprocess function (#1519 ) Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-18 14:40:51 +08:00
liushz	c9a7026f59	[Feature] Update MathBench & WikiBench for FullBench (#1521 ) * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update GPQA & MMLU_Pro * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench * Update MathBench & WikiBench for FullBench --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-18 14:35:30 +08:00
Linchen Xiao	90279b6461	[Feature] Dataset prompts update for ARC, BoolQ, Race (#1527 )	2024-09-13 10:30:43 +08:00
Songyang Zhang	6997990c93	[Feature] Update Models (#1518 ) * Update Models * Update * Update humanevalx * Update * Update	2024-09-12 23:35:30 +08:00
zhulinJulia24	3754dc1b67	update (#1522 ) Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-09-12 15:00:52 +08:00
bittersweet1999	7c7fa36235	[Feature] add support for internal Followbench (#1511 ) * fix pip version * fix pip version * add internal followbench * add internal followbench * fix lint * fix lint	2024-09-11 13:32:34 +08:00
Linchen Xiao	317763381c	update (#1517 )	2024-09-11 13:31:20 +08:00
bittersweet1999	c2bcd8725e	[Fix] Fix wildbench (#1508 ) * fix pip version * fix pip version * fix_wildbench	2024-09-10 17:35:07 +08:00
Alexander Lam	a31a77c5c1	[Feature] Add SciCode summarizer config (#1514 ) * [Feature] added SciCode summarizer config and dataset config for with background evaluation * fix lint issues * removed unnecessary type in summarizer group	2024-09-10 16:06:02 +08:00
Linchen Xiao	b5f8afb57b	[Bump] Bump version to 0.3.2.post1	2024-09-06 19:09:30 +08:00
Linchen Xiao	f04f3546bc	[Fix] Import fix (#1500 )	2024-09-06 18:29:24 +08:00
Linchen Xiao	ff18545f0e	[Bump] Bump version to 0.3.2 (#1497 )	2024-09-06 16:10:45 +08:00
Linchen Xiao	87ffa71d68	[Feature] Longbench dataset update	2024-09-06 15:50:12 +08:00
Albert Yan	928d0cfc3a	[Feature] Add support for Rendu API (#1468 ) * Add support for Rendu API * fix lint issue * fix lint issue * fix lint issue * Update --------- Co-authored-by: 13190 <zeyu.yan@transn.com> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-09-06 01:00:43 +08:00
Hari Seldon	faf5260155	[Feature] Optimize Evaluation Speed of SciCode (#1489 ) * update scicode * update comments * remove redundant variable * Update --------- Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-09-06 00:59:41 +08:00
liushz	00fc8da5be	[Feature] Add model postprocess function (#1484 ) * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function * Add model postprocess function --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-09-05 21:10:29 +08:00
Maxime SHE	45efdc994d	[Feature] Add an attribute api_key into TurboMindAPIModel default None (#1475 ) Co-authored-by: Maxime <maximeshe@163.com> Add an attribute api_key into TurboMindAPIModel default None then we can set the api_key while using lmdeploy to deploy the llm model	2024-09-05 17:51:16 +08:00
Linchen Xiao	6c9cd9a260	[Feature] Needlebench auto-download update (#1480 ) * update * update * update	2024-09-05 17:22:42 +08:00
zhulinJulia24	716d46e1f5	[ci] fix badcase and add env info (#1491 ) * update * update --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-09-05 16:43:45 +08:00
zhulinJulia24	fb6a0df652	[ci] fix test env for vllm and add vllm baselines (#1481 ) * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>	2024-09-04 19:24:09 +08:00
Linchen Xiao	da74cbfa39	[Fix] Model configs update	2024-09-04 18:57:10 +08:00
Linchen Xiao	9693be46b7	[Feature] Mmlu-pro auto-download (#1464 ) * update * update * update * update * update	2024-08-30 10:03:40 +08:00
Alexander Lam	8b39225259	[Feature] Added `extra_body` support for OpenAISDK; Added support for proxy URL when connecting to OpenAI's API. (#1467 ) * fix lint issues * fix lint issues	2024-08-29 00:43:43 +08:00
Guoli Yin	a488b9b4f5	[Feature] Make OPENAI_API_BASE compatible with openai default env (#1461 ) * Make OPENAI_API_BASE compatible with openai default env * Make OPENAI_API_BASE compatible with openai default env --------- Co-authored-by: Guoli Yin <gyin@icloud.com>	2024-08-28 23:14:41 +08:00
Songyang Zhang	e5a8eb2283	[Feature] Update Lint and Leaderboard (#1458 ) * [Feature] Update Lint and Leaderboard * Update * Update	2024-08-28 22:36:42 +08:00
Linchen Xiao	245664f4c0	[Feature] Fullbench v0.1 language update (#1463 ) * update * update * update * update	2024-08-28 14:01:05 +08:00
CHEN PENGAN	463231c651	[Feature] Add icl_sliding_k_retriever.py and update __init__.py (#1305 ) * Add icl_sliding_k_retriever.py and update __init__.py * Fix flake8, isort, and yapf issues for Sliding Window Retriever	2024-08-23 17:18:31 +08:00
Linchen Xiao	94b6bd65fc	[Fix] Fix cli evaluation for multiple models (#1454 ) * update * update	2024-08-23 17:15:36 +08:00
Songyang Zhang	5485207fbe	[Bump] Bump version to 0.3.1 (#1450 ) * [Bump] Bump version 0.3.1 * Update	2024-08-23 10:47:57 +08:00
Songyang Zhang	7c2d25b557	[Fix] Update SciCode and Gemma model (#1449 ) * [Fix] Update SciCode and Gemma model * Update * Update	2024-08-23 10:42:27 +08:00
Xu Song	ad3931aa32	Update openicl_infer.py (#1308 )	2024-08-23 10:39:22 +08:00
liushz	9fdbc744dc	[Fix] Update option postprocess & mathbench language summarizer (#1413 ) * Update option postprocess & mathbench language summarizer * Update option postprocess & mathbench language summarizer --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-08-22 14:49:07 +08:00
Linchen Xiao	0fe9756c5d	[Doc] Update Readme (#1439 ) * update * update * update * update * update * update * update * update * update * update * update * update	2024-08-22 14:48:45 +08:00
Hari Seldon	14b4b735cb	[Feature] Add support for SciCode (#1417 ) * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode * add SciCode w/ bg * add scicode * Update README.md * Update README.md * Delete configs/eval_SciCode.py * rename * 1 * rename * Update README.md * Update scicode.py * Update scicode.py * fix some bugs * Update * Update --------- Co-authored-by: root <HariSeldon0> Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-22 13:42:25 +08:00
liushz	d3963bceae	[Bug] Add model support for 'huggingface_above_v4_33' when using '-a' (#1430 ) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-08-22 13:40:24 +08:00
seetimee	ac093fce53	[Update] Update openai_api.py (#1438 ) Most models' token limits are above 32k. It will fix long context dataset test bug of skiping some data.	2024-08-21 18:57:49 +08:00
liushz	e076dc5acf	[Fix] Fix openai api tiktoken bug for api server (#1433 ) * Fix openai api tiktoken * Fix openai api tiktoken --------- Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>	2024-08-20 22:02:14 +08:00
Linchen Xiao	a4b54048ae	[Feature] Add Ruler datasets (#1310 ) * [Feature] Add Ruler datasets * pre-commit fixed * Add model specific tokenizer to dataset * pre-commit modified * remove unused import * fix linting * add trust_remote to tokenizer load * lint fix * comments resolved * fix lint * Add readme * Fix lint * ruler refactorize * fix lint * lint fix * updated * lint fix * fix wonderwords import issue * prompt modified * update * readme updated * update * ruler dataset added * Update --------- Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-20 11:40:11 +08:00
Xu Song	99b5122ed5	[Feature] Add abbr for rolebench dataset (#1431 ) * Add abbr for rolebench dataset * add	2024-08-20 11:22:48 +08:00
Linchen Xiao	ecf9bb3e4c	[Bug] Commonsenseqa dataset fix (#1425 ) * longbench dataset load fix * update * Update * Update * Update * update * update --------- Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>	2024-08-16 15:54:07 +08:00
Songyang Zhang	9b3613f10b	[Update] Support auto-download of FOFO/MT-Bench-101 (#1423 ) * [Update] Support auto-download of FOFO/MT-Bench-101 * Update wildbench	2024-08-16 11:57:41 +08:00
bittersweet1999	ce7f4853ce	[Fix] Sub summarizer order fix (#1426 ) * fix pip version * fix pip version * fix sub summarizer order * fix order	2024-08-15 21:08:18 +08:00
Linchen Xiao	2596f226f4	[Fix] longbench dataset load fix (#1422 )	2024-08-15 11:30:30 +08:00
Linchen Xiao	8e55c9c6ee	[Update] Compassbench v1.3 (#1396 ) * stash files * compassbench subjective evaluation added * evaluation update * fix lint * update docs * Update lint * changes saved * changes saved * CompassBench subjective summarizer added (#1349) * subjective summarizer added * fix lint [Fix] Fix MathBench (#1351) Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> [Update] Update model support list (#1353) * fix pip version * fix pip version * update model support subjective summarizer updated knowledge, math objective done (data need update) remove secrets objective changes saved knowledge data added * secrets removed * changed added * summarizer modified * summarizer modified * compassbench coding added * fix lint * objective summarizer updated * compass_bench_v1.3 updated * update files in config folder * remove unused model * lcbench modified * removed model evaluation configs * remove duplicated sdk implementation --------- Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-08-12 19:09:19 +08:00
changyeyu	59586a8b4a	[Feature] Enable Truncation of Mid-Section for Long Prompts in `huggingface_above_v4_33.py` (#1373 ) * Retain the first and last halves of the tokens from the prompt, discarding the middle, to avoid exceeding the model's maximum length. * Add default parameter: mode * Modified a comment. * Modified variable names. * fix yapf lint	2024-08-09 11:36:30 +08:00
Songyang Zhang	88eb91219b	[Doc] Update README (#1404 ) * [Doc] Update README * Update	2024-08-08 16:18:33 +08:00
yaoyingyy	decb621ff6	[Fix] the issue where scores are negative in the Lawbench dataset evaluation(#1402 ) (#1403 )	2024-08-08 16:08:26 +08:00
Yunlin Mao	818d72a650	[Fix] modelscope dataset load problem (#1406 ) * fix modelscope dataset load * fix lint	2024-08-08 14:01:06 +08:00
Songyang Zhang	264fd23129	[Bump] Bump version for v0.3.0 (#1398 )	2024-08-07 01:25:24 +08:00
Songyang Zhang	fed1a4998b	[Fix] Fix CaLM import (#1395 )	2024-08-06 12:17:45 +08:00
Songyang Zhang	c81329b548	[Fix] Fix Slurm ENV (#1392 ) 1. Support Slurm Cluster 2. Support automatic data download 3. Update InternLM2.5-1.8B/20B-Chat	2024-08-06 01:35:20 +08:00
Songyang Zhang	c09fc79ba8	[Feature] Support OpenAI ChatCompletion (#1389 ) * [Feature] Support import configs/models/summarizers from whl * Update * Update openai sdk * Update * Update gemma	2024-08-01 19:10:13 +08:00
Peng Bo	07c96ac659	Calm dataset (#1385 ) * Add CALM Dataset	2024-08-01 10:03:21 +08:00
Songyang Zhang	46cc7894e1	[Feature] Support import configs/models/summarizers from whl (#1376 ) * [Feature] Support import configs/models/summarizers from whl * Update LCBench configs * Update * Update * Update * Update * update * Update * Update * Update * Update * Update	2024-08-01 00:42:48 +08:00
Songyang Zhang	33ceaa0eb8	[Bug] Fix bug in turbomind (#1377 )	2024-07-30 09:37:50 +08:00
Songyang Zhang	eee5a5be23	[Fix] Update get_data_path for LCBench and HumanEval (#1375 )	2024-07-29 19:28:09 +08:00
Songyang Zhang	704853e5e7	[Feature] Update pip install (#1324 ) * [Feature] Update pip install * Update Configuration * Update * Update * Update * Update Internal Config * Update collect env	2024-07-29 18:32:50 +08:00
Xingjun.Wang	edab1c07ba	[Feature] Support ModelScope datasets (#1289 ) * add ceval, gsm8k modelscope surpport * update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest * update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets * format file * format file * update dataset format * support ms_dataset * udpate dataset for modelscope support * merge myl_dev and update test_ms_dataset * udpate dataset for modelscope support * update readme * update eval_api_zhipu_v2 * remove unused code * add get_data_path function * update readme * remove tydiqa japanese subset * add ceval, gsm8k modelscope surpport * update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest * update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets * format file * format file * update dataset format * support ms_dataset * udpate dataset for modelscope support * merge myl_dev and update test_ms_dataset * update readme * udpate dataset for modelscope support * update eval_api_zhipu_v2 * remove unused code * add get_data_path function * remove tydiqa japanese subset * update util * remove .DS_Store * fix md format * move util into package * update docs/get_started.md * restore eval_api_zhipu_v2.py, add environment setting * Update dataset * Update * Update * Update * Update --------- Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local> Co-authored-by: Yunnglin <mao.looper@qq.com> Co-authored-by: Yun lin <yunlin@laptop.local> Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn> Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-29 13:48:32 +08:00
jxd	12b84aeb3b	[Feature] Update CHARM Memeorziation (#1230 ) * update gemini api and add gemini models * add openai models * update CHARM evaluation * add CHARM memorization tasks * add CharmMemSummarizer (output eval details for memorization-independent reasoning analysis * update CHARM readme --------- Co-authored-by: wujiang <wujiang@pjlab.org.cn>	2024-07-26 18:42:30 +08:00
bittersweet1999	d3782c1d47	Revert "Calm dataset (#1287 )" (#1366 ) This reverts commit `edd0ffdf70`.	2024-07-26 18:27:29 +08:00
Peng Bo	edd0ffdf70	Calm dataset (#1287 ) * add calm dataset * modify config max_out_len * update README * Modify README * update README * update README * update README * update README * update README * add summarizer and modify readme * delete summarizer config comment * update summarizer * modify same response to all questions * update README	2024-07-26 11:48:16 +08:00
mqy004	a08931f214	[Fix] origin_prompt should be None in llm-compression task (#1225 ) Co-authored-by: Qinyang Mou <qinyang_mou@intsig.net>	2024-07-26 11:46:02 +08:00
LeavittLang	8ee7fecb68	Adding support for Doubao API (#1218 ) * Adding support for Doubao API * Update doubao_api.py Fixed the bug that the connection would be retried even if it was normal. * Update doubao_api.py --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-26 11:44:51 +08:00
klein	65fad8e2ac	[Fix] minor update wildbench (#1335 ) * update crb * update crbbench * update crbbench * update crbbench * minor update wildbench * [Fix] Update doc of wildbench, and merge wildbench into subjective * [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench * Update crb.md * Update crb_pair_judge.py * Update crb_single_judge.py * Update subjective_evaluation.md * Update openai_api.py * [Update] update wildbench readme * [Update] update wildbench readme * [Update] update wildbench readme, remove crb * Delete configs/eval_subjective_wildbench_pair.py * Delete configs/eval_subjective_wildbench_single.py * Update __init__.py --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-26 11:19:04 +08:00
baymax591	51a94aee01	[Bug] fix bug: delete & (#1365 ) Co-authored-by: 白超 <baichao19@huawei.com>	2024-07-26 11:03:55 +08:00
Mo Li	69aa2f2d57	[Feature] Make NeedleBench available on HF (#1364 ) * update_lint * update_huggingface format * fix bug * update docs	2024-07-25 19:01:56 +08:00
Fengzhe Zhou	c3c02c2960	update docs (#1318 ) * update docs * Efficient evaluation -> Data sharding * update * update * Update faq.md --------- Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>	2024-07-25 18:44:25 +08:00
heya5	73aa55af6d	[Fix] Support HF models deployed with an OpenAI-compatible API. (#1352 ) * Support HF models deployed with an OpenAI-compatible API. * resolve lint issue * add extra_body arguments There are many other arguments when using openi-compatiable API like this: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters-for-chat-api * fix linting issue * fix yapf linting issue	2024-07-25 18:38:23 +08:00
WANG WENJIN	0aad8199c7	Fix the summary error in subjective.py (#1363 )	2024-07-25 18:36:13 +08:00
Linchen Xiao	8127fc3518	CompassBench subjective summarizer added (#1349 ) * subjective summarizer added * fix lint	2024-07-23 12:29:57 +08:00
Que Haoran	a244453d9e	[Feature] Support inference ppl datasets (#1315 ) * commit inference ppl datasets * revised format * revise * revise * revise * revise * revise * revise	2024-07-22 17:59:30 +08:00
liushz	98c58f8a6c	[Feature] Add compassbench knowledge&math part (#1342 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Update acclerator * Update MathBench * Update accelerator * Add Doc for accelerator * Add Doc for accelerator * Add Doc for accelerator * Add Doc for accelerator * Update compassbench august wiki&math * Update compassbench august wiki&math * Update compassbench august wiki&math * Update compassbench_aug_gen_068af0.py * Update compassbench_aug_gen_068af0.py * Update --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn> Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>	2024-07-19 22:54:46 +08:00
bittersweet1999	1f9f728f22	[Feature] support compassbench Checklist evaluation (#1339 ) * fix pip version * fix pip version * support checklist eval * init * add lan * fix typo	2024-07-19 16:40:44 +08:00
Mo Li	f40add2596	[Fix] Fix lint (#1334 ) * update needlebench docs * update model_name_mapping dict * update README * fix_lint	2024-07-18 17:15:06 +08:00
Xu Song	1bfb4217ff	Fix typing and typo (#1331 )	2024-07-18 13:41:24 +08:00
Mo Li	104bddf647	[Doc] Update NeedleBench Docs (#1330 ) * update needlebench docs * update model_name_mapping dict * update README * Update README_zh-CN.md --------- Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>	2024-07-18 13:16:19 +08:00
bittersweet1999	8e7ad2e981	[Fix] add bc for alignbench summarizer (#1306 ) * fix pip version * fix pip version * fix alignbench * fix import error	2024-07-12 11:06:20 +08:00
Fengzhe Zhou	62f55987f1	force register (#1311 )	2024-07-11 19:59:35 +08:00
Fengzhe Zhou	a62c613d3e	[Sync] bump version 0.2.6+local (#1294 )	2024-07-06 00:44:06 +08:00
Fengzhe Zhou	1d3a26c732	[Doc] quick start swap tabs (#1263 ) * [doc] quick start swap tabs * update docs * update * update * update * update * update * update * update	2024-07-05 23:51:42 +08:00
bittersweet1999	68ca48496b	[Refactor] Reorganize subjective eval (#1284 ) * fix pip version * fix pip version * reorganize subjective eval * reorg sub * reorg subeval * reorg subeval * update subjective doc * reorg subeval * reorg subeval	2024-07-05 22:11:37 +08:00
baymax591	28eba6fe34	NPU adaptation (#1250 ) * NPU adaptation * Add suport for Ascend NPU * format --------- Co-authored-by: baymax591 <14428251+baymax591@user.noreply.gitee.com> Co-authored-by: Leymore <zfz-960727@163.com>	2024-07-03 18:55:19 +08:00
Fengzhe Zhou	a32f21a356	[Sync] Sync with internal codes 2024.06.28 (#1279 )	2024-06-28 14:16:34 +08:00
Xingyuan Bu	842fb1cd70	Update mtbench101.py (#1276 ) fix wrong-used import from torch.utils.data import DataLoader, Dataset	2024-06-26 00:40:22 +08:00
klein	1fa62c4a42	Support wildbench (#1266 ) Co-authored-by: Leymore <zfz-960727@163.com>	2024-06-24 13:16:27 +08:00
bittersweet1999	982e024540	[Feature] add dataset Fofo (#1224 ) * add fofo dataset * add dataset fofo	2024-06-06 11:40:48 +08:00
Xingyuan Bu	02a0a4e857	MT-Bench-101 (#1215 ) * add mt-bench-101 * add readme and requirements * add mt-bench-101 data * Update readme_mtbench101.md * update readme * update leaderboard * fix typo * Update readme_mtbench101.md * fit newest opencompass * update readme.md * mtbench101 to opencompass * mtbench101 to opencompass * for code review * for code review * for code review * hook * hook --------- Co-authored-by: liujie <ljie@buaa.edu.cn>	2024-06-03 14:52:12 +08:00
mqy004	b272803d8a	Fix opencompass.cli.main not being importable after installing the release version (#1221 ) * Create __init__.py * Create __init__.py * Create __init__.py * Create __init__.py * Create __init__.py * Create __init__.py * format --------- Co-authored-by: Leymore <zfz-960727@163.com>	2024-05-31 13:23:33 +08:00
bittersweet1999	7c381e5be8	[Fix] fix summarizer (#1217 ) * fix summarizer * fix summarizer	2024-05-31 11:40:47 +08:00
Fengzhe Zhou	a77b8a5cec	[Sync] format (#1214 )	2024-05-30 00:21:58 +08:00
Fengzhe Zhou	d656e818f8	[Docs] Remove --no-batch-padding and Use --hf-num-gpus (#1205 ) * [Docs] Remove --no-batch-padding and Use -hf-num-gpus * update	2024-05-29 16:30:10 +08:00
Fengzhe Zhou	2954913d9b	[Sync] bump version (#1204 )	2024-05-28 23:09:59 +08:00
liushz	ba620c4afe	Update accelerator (#1195 ) * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Add Math Evaluation with Judge Model Evaluator * Fix Llama-3 meta template * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Fix MATH with JudgeLM Evaluation * Update acclerator * Update MathBench * Update accelerator --------- Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>	2024-05-28 17:17:54 +08:00
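
Commit 59586a8b4a above describes retaining the first and last halves of a prompt's tokens and discarding the middle so the prompt fits within the model's maximum length. The idea can be sketched as follows. This is only an illustration, not the code from `huggingface_above_v4_33.py`: the function name `truncate_middle` and its parameters are hypothetical, and how the real implementation splits the budget between head and tail is an assumption.

```python
from typing import List


def truncate_middle(token_ids: List[int], max_len: int) -> List[int]:
    """Keep the head and tail of a token sequence, dropping the middle.

    Hypothetical helper for illustration; not OpenCompass's actual API.
    """
    if max_len <= 0:
        return []
    if len(token_ids) <= max_len:
        # Already short enough: return an unchanged copy.
        return list(token_ids)
    head = max_len // 2      # tokens kept from the start
    tail = max_len - head    # tokens kept from the end (at least 1)
    return list(token_ids[:head]) + list(token_ids[len(token_ids) - tail:])


if __name__ == "__main__":
    # Ten tokens cut down to four: two from the start, two from the end.
    print(truncate_middle(list(range(10)), 4))
```

Keeping both ends preserves the instructions that usually open a long prompt and the question that usually closes it, at the cost of losing context from the middle.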

605 Commits