Commit Graph

605 Commits

Author SHA1 Message Date
Songyang Zhang
c789ce5698
[Fix] automatic download for several datasets (#1652)
* [Fix] automatic download for several datasets

* Update

* Update

* Update CI
2024-11-01 15:57:18 +08:00
Linchen Xiao
695738a89b
[Update] Add lmdeploy DeepSeek configs (#1656)
* [Update] Add lmdeploy DeepSeek configs

* update max out length
2024-11-01 15:34:23 +08:00
bittersweet1999
a0853c939d
[Add] Add CompassArenaSubjectiveBench (#1645)
* fix pip version

* fix pip version

* add compassarenasubjectivebench

* add compassarenasubjectivebench

* add compassarenabench
2024-11-01 13:52:22 +08:00
Linchen Xiao
5212ffe8e2
[Update] Add new model configs (#1653) 2024-10-30 17:24:53 +08:00
Linchen Xiao
df57c08ccf
[Feature] Update Models, Summarizers (#1600) 2024-10-29 18:37:15 +08:00
Linchen Xiao
d91d66792a
[Update] Update Needlebench OSS path (#1651) 2024-10-29 18:05:44 +08:00
Chang Lan
46affab882
[Fix] Fix ruler_16k_gen (#1643) 2024-10-29 17:58:43 +08:00
Linchen Xiao
8172af49bb
[Update] Update wildbench max_seq_len (#1648)
* [Update] Wildbench max_seq_len update

* [Update] Wildbench max_seq_len update
2024-10-29 13:21:31 +08:00
Junnan Liu
645c5f3b2c
[Datasets] Add datasets CMO&AIME (#1610)
* add datasets cmo&aime

* delete unused modules

* modify prompt

* update __init__

* update data load and add README

* update data load

* update performance

* update md5

* remove indents

* add indent

* fix log for debug mode
2024-10-28 18:08:02 +08:00
Linchen Xiao
9c39cb68d4
[Bump] Bump version to 0.3.4 (#1639) 2024-10-25 20:10:16 +08:00
Linchen Xiao
a61e8a0803
[Update] Internal humaneval add (#1641)
* [Update] internal_humaneval_add

* update
2024-10-25 19:08:42 +08:00
Songyang Zhang
84be90669b
[Update] Fix issue of *_param.py to avoid name conflicts; add keep_tmp_file flag to support keeping the temp config file. (#1640) 2024-10-25 16:39:25 +08:00
BigDong
2542bc6907
[Feature] Support results saving as md format table (#1638) 2024-10-25 15:50:33 +08:00
Linchen Xiao
22fdea4bf2
[Update] Update DLC runner (#1637) 2024-10-24 21:36:16 +08:00
Lyu Han
fb12c3f98a
[Update] strip stop_words (#1635) 2024-10-24 20:39:20 +08:00
Linchen Xiao
662dddf41a
[Update] Add internal humaneval postprocess (#1636) 2024-10-24 17:45:21 +08:00
Linchen Xiao
be3c06a158
[Fix] Update common summarizer regex extraction (#1631) 2024-10-22 14:35:45 +08:00
Chang Lan
a927bba1cf
[Fix] Fix RULER datasets (#1628)
We need to ensure that we don't import anything that ends with "_datasets",
or they will be picked up by the runner, leading to duplicate / unwanted datasets
being evaluated.
2024-10-22 11:59:02 +08:00
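The commit body above explains that every imported name ending in "_datasets" is picked up by the runner. A minimal sketch of how suffix-based collection can produce duplicates, assuming the config is exposed as a plain namespace dict; `collect_datasets` and the dataset abbreviations are hypothetical and are not OpenCompass's actual code.

```python
from typing import Any, Dict, List


def collect_datasets(namespace: Dict[str, Any]) -> List[dict]:
    """Hypothetical collector: concatenate every list whose name ends with '_datasets'."""
    collected: List[dict] = []
    for name, value in namespace.items():
        if name.endswith('_datasets') and isinstance(value, list):
            collected.extend(value)
    return collected


# A config that also imports a helper list whose name happens to end in '_datasets'.
ruler_4k_datasets = [{'abbr': 'ruler_niah_4k'}]
helper_datasets = ruler_4k_datasets  # unintended: matches the suffix too

buggy_config = {'ruler_4k_datasets': ruler_4k_datasets,
                'helper_datasets': helper_datasets}
fixed_config = {'ruler_4k_datasets': ruler_4k_datasets,
                '_ruler_helper': helper_datasets}  # renamed: no longer collected

print([d['abbr'] for d in collect_datasets(buggy_config)])  # duplicate entry
print([d['abbr'] for d in collect_datasets(fixed_config)])  # single entry
```

Renaming (or not importing) the helper is what keeps the collected list free of duplicates; the collector itself is unchanged.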
Songyang Zhang
a4d5a6c81b
[Feature] Support LiveCodeBench (#1617)
* Update

* Update LCB

* Update

* Update

* Update

* Update

* Update
2024-10-21 20:50:39 +08:00
Chenguang Li
5868d5afa4
[Bug] Fix-NPU-Support (#1618)
* bugfix NPU support

* formatting

---------

Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-10-21 17:42:53 +08:00
liushz
500b44ba2d
[Fix] gpqa_few_shot_ppl prompt bug (#1627) 2024-10-21 16:59:06 +08:00
Linchen Xiao
096c347e7d
[Fix] Qwen 2.5 model config (#1626)
* [Fix] Fix Qwen 2.5 model config

* [Fix] Fix Qwen 2.5 model config

* [Fix] Fix Qwen 2.5 model config
2024-10-21 16:58:18 +08:00
bittersweet1999
a11e2b2fd4
[Fix] Compatible with old versions (#1616)
* fix pip version

* fix pip version

* Compatible with old versions

* compatibility with old versions

* compatibility with old versions

* compatibility with old versions

* update configs
2024-10-21 10:16:29 +08:00
Lyu Han
6e8adf5221
[Bug] Remove prefix bos_token from messages when using lmdeploy as the accelerator (#1623)
* remove prefix bos_token from messages when using lmdeploy as the accelerator

* update
2024-10-19 20:03:47 +08:00
Bob Tsang
dd0b655bd0
[Feature] Support MMMLU & MMMLU-lite Benchmark (#1565)
* rm folder

* modify format according to reviewer

* modify format according to reviewer

* modify format according to reviewer

* add some files requirement

* fix some bug

* fix bug

* change load type

* Update MMMLU Dataset

* Update MMMLU Dataset

* Add MMMLU-Lite Dataset

* update MMMLU dataset

* update MMMLU dataset

* update MMMLU dataset

---------

Co-authored-by: BobTsang <BobTsang1995@gmail.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-10-17 19:09:34 +08:00
bittersweet1999
f0d436496e
[Update] update docs and add compassarena (#1614)
* fix pip version

* fix pip version

* update docs and add compassarena

* update docs
2024-10-17 14:39:06 +08:00
Haoran Que
4fe251729b
Upload HelloBench (#1607)
* upload hellobench

* update hellobench

* update readme.md

* update eval_hellobench.py

* update latest

---------

Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-10-15 17:11:37 +08:00
bittersweet1999
fa54aa62f6
[Feature] Add Judgerbench and reorg subeval (#1593)
* fix pip version

* fix pip version

* update (#1522)

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>

* [Feature] Update Models (#1518)

* Update Models

* Update

* Update humanevalx

* Update

* Update

* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527)

add judgerbench and reorg sub

add judgerbench and reorg subeval

add judgerbench and reorg subeval

* add judgerbench and reorg subeval

* add judgerbench and reorg subeval

* add judgerbench and reorg subeval

* add judgerbench and reorg subeval

---------

Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-10-15 16:36:05 +08:00
x54-729
2b1afa7d1e
[Fix] fix interntrain's tokenizer truncate (#1605)
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-10-15 16:03:57 +08:00
Linchen Xiao
f390697a5e
[Fix] Update dlc runner python env (#1604) 2024-10-14 15:50:21 +08:00
Lyu Han
4fde41036f
[Feature] Update TurboMindModel by integrating lmdeploy pipeline API (#1556)
* integrate lmdeploy's pipeline api

* fix linting

* update user guide

* rename

* update

* update

* update

* rollback class name

* update

* remove unused code

* update

* update

* use pipeline

* fix ci check

* compatibility

* compatibility

* remove concurrency

* update

* fix table content

* update
2024-10-14 15:33:40 +08:00
liushz
5faee929db
[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589)
* Add GaoKaoMath Dataset

* Add MATH LLM Eval

* Update GAOKAO Math Eval Dataset

* Update GAOKAO Math Eval Dataset
2024-10-12 19:13:06 +08:00
bittersweet1999
3f7a3730d7
[Fix] fix Flames (#1599)
* fix pip version

* fix pip version

* fix flames

* fix flames
2024-10-12 14:34:59 +08:00
Lyu Han
b52ba65c26
[Feature] Integrate lmdeploy pipeline api (#1198)
* integrate lmdeploy's pipeline api

* fix linting

* update user guide

* rename

* update

* update

* update

* rollback class name

* update

* remove unused code

* update

* update

* fix ci check

* compatibility

* remove concurrency

* Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py

* Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md

* [Bug] fix lint

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-10-09 22:58:06 +08:00
x54-729
4d6349dfe1
[FIX] fix interntrain get_loglikelihood (#1584) 2024-10-08 11:34:04 +08:00
Linchen Xiao
22a4e76511
[BUMP] Bump version to 0.3.3 (#1581) 2024-09-30 16:57:41 +08:00
x54-729
bbdca5eb4c
[BUG] Fix eos token handling and add comments for InternTrain (#1569)
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-09-30 15:46:06 +08:00
Linchen Xiao
763d7755b6
[BUG]GaokaoBench dataset fix (#1583) 2024-09-30 15:13:26 +08:00
shijinpjlab
7528b8ab8a
[Feature] Add dingo test (#1529)
* add qa dingo

* update

* change name qa to dingo

* eval model: llm_base

* update path

* change name and move path

* add eval_dingo

* update import

* add for pip

* add dingo package

* change import place

* update import place

* fix lint fail

* isort

* double quoted

---------

Co-authored-by: sj <shijin@pjlab.org.cn>
2024-09-29 19:24:58 +08:00
Yi Ding
85a28874aa
[BUG]: Fix Bailing API configs (#1570) 2024-09-27 11:56:57 +08:00
Songyang Zhang
e8437db98f
[Feature] Update BailingLM/OpenAI verbose (#1568)
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI

* Update

* [Feature] Update API

* Update
2024-09-27 11:15:25 +08:00
Songyang Zhang
7d50294117
[Feature] Update Bailing (#1567)
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI

* Update

* Update

* Update
2024-09-26 18:56:17 +08:00
Songyang Zhang
a7bacfdf7e
[Feature] Update CoreBench 2.0 (#1566)
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI

* Update

* Update
2024-09-26 18:44:00 +08:00
Yi Ding
3f833186dc
[Feature] Support the reasoning from BaiLing LLM (#1541)
* [Feature] Support the reasoning from BaiLing LLM

This commit includes the access to BaiLing LLM and gets the reasoning.

* Add the api example

An example of evaluating the Bailing API

* Revise the generation arguments

Based on current experiments, we updated some generation arguments for better reasoning

* [fix] set the batch size

* Retry under server-side flow control

* add dependent package into requirement.txt

add dependent package retrying to clean up the pre-commit check.

* correct the file names and make the file copy

correct the file names.
copy the files under configs to opencompass

* fix the lint issue

---------

Co-authored-by: christopher.dy <christopher.dy@antgroup.com>
2024-09-26 16:49:52 +08:00
Linchen Xiao
80cda1980e
[BUG] fix followbench dataset config (#1564)
* [BUG] fix followbench dataset config

* [BUG] fix followbench dataset config
2024-09-25 20:58:34 +08:00
zhulinJulia24
87df8a73a3
[CI] add a common summarizer for qabench summarizer (#1545)
* update

* update

* update

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-25 13:40:47 +08:00
Linchen Xiao
c3fb9065db
[Feature] Add dlc sleep time (#1562) 2024-09-25 11:53:48 +08:00
liushz
83eeb52b09
[Feature] Update WikiBench base model config (#1553)
* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update GPQA & MMLU_Pro

* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update MathBench & Math base config

* Update WikiBench base model config

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-25 11:26:36 +08:00
Songyang Zhang
e7681943f3
[Feature] Update the max_out_len for many models (#1559) 2024-09-24 21:52:28 +08:00
bittersweet1999
a2e9bc0c41
[Fix] fix duplicate error in partitioner (#1552)
* fix pip version

* fix pip version

* fix duplicate error in partitioner

* fix duplicate error in partitioner
2024-09-23 19:45:21 +08:00
x54-729
335667183a
[Feature] Add Interntrain model support (#1548)
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-09-23 19:10:26 +08:00
klein
24915aeb3f
[BUG] Update CIbench config (#1544)
* BUG: Update cibench.py

* BUG: Update cibench.py
2024-09-23 18:32:27 +08:00
liushz
a0cfd61129
[Feature] Update MathBench & Math base model config (#1550)
* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update GPQA & MMLU_Pro

* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update MathBench & Math base config

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-23 14:03:59 +08:00
Songyang Zhang
ee058e25b2
[Feature] Support verbose for OpenAI API (#1546) 2024-09-20 17:12:52 +08:00
hailsham
a81bbb85bf
[FIX] Added handling for the "begin section" in meta_template to APITemplateParser (#1405)
Co-authored-by: leifei <nuuooo@icloud.com>
2024-09-19 18:12:04 +08:00
Songyang Zhang
5a27c2bd6f
[Model] Support Qwen2.5 Instruct (#1543) 2024-09-19 16:16:07 +08:00
Songyang Zhang
be460fbb21
[Feature] Support OpenAI O1 models (#1539)
* [Feature] Support OpenAI O1 models

* Update README.md

---------

Co-authored-by: liushz <qq1791167085@163.com>
2024-09-18 22:41:17 +08:00
liushz
2e9db77d57
[Feature] Add custom model postprocess function (#1519)
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-18 14:40:51 +08:00
liushz
c9a7026f59
[Feature] Update MathBench & WikiBench for FullBench (#1521)
* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update GPQA & MMLU_Pro

* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

* Update MathBench & WikiBench for FullBench

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-18 14:35:30 +08:00
Linchen Xiao
90279b6461
[Feature] Dataset prompts update for ARC, BoolQ, Race (#1527) 2024-09-13 10:30:43 +08:00
Songyang Zhang
6997990c93
[Feature] Update Models (#1518)
* Update Models

* Update

* Update humanevalx

* Update

* Update
2024-09-12 23:35:30 +08:00
zhulinJulia24
3754dc1b67
update (#1522)
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-12 15:00:52 +08:00
bittersweet1999
7c7fa36235
[Feature] add support for internal Followbench (#1511)
* fix pip version

* fix pip version

* add internal followbench

* add internal followbench

* fix lint

* fix lint
2024-09-11 13:32:34 +08:00
Linchen Xiao
317763381c
update (#1517) 2024-09-11 13:31:20 +08:00
bittersweet1999
c2bcd8725e
[Fix] Fix wildbench (#1508)
* fix pip version

* fix pip version

* fix_wildbench
2024-09-10 17:35:07 +08:00
Alexander Lam
a31a77c5c1
[Feature] Add SciCode summarizer config (#1514)
* [Feature] added SciCode summarizer config and dataset config for evaluation with background

* fix lint issues

* removed unnecessary type in summarizer group
2024-09-10 16:06:02 +08:00
Linchen Xiao
b5f8afb57b
[Bump] Bump version to 0.3.2.post1 2024-09-06 19:09:30 +08:00
Linchen Xiao
f04f3546bc
[Fix] Import fix (#1500) 2024-09-06 18:29:24 +08:00
Linchen Xiao
ff18545f0e
[Bump] Bump version to 0.3.2 (#1497) 2024-09-06 16:10:45 +08:00
Linchen Xiao
87ffa71d68
[Feature] Longbench dataset update 2024-09-06 15:50:12 +08:00
Albert Yan
928d0cfc3a
[Feature] Add support for Rendu API (#1468)
* Add support for Rendu API

* fix lint issue

* fix lint issue

* fix lint issue

* Update

---------

Co-authored-by: 13190 <zeyu.yan@transn.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-09-06 01:00:43 +08:00
Hari Seldon
faf5260155
[Feature] Optimize Evaluation Speed of SciCode (#1489)
* update scicode

* update comments

* remove redundant variable

* Update

---------

Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-09-06 00:59:41 +08:00
liushz
00fc8da5be
[Feature] Add model postprocess function (#1484)
* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-05 21:10:29 +08:00
Maxime SHE
45efdc994d
[Feature] Add an api_key attribute to TurboMindAPIModel (default None) (#1475)
Co-authored-by: Maxime <maximeshe@163.com>
Add an api_key attribute to TurboMindAPIModel (default None) so that the API key can be set when the LLM is deployed with lmdeploy.
2024-09-05 17:51:16 +08:00
Linchen Xiao
6c9cd9a260
[Feature] Needlebench auto-download update (#1480)
* update

* update

* update
2024-09-05 17:22:42 +08:00
zhulinJulia24
716d46e1f5
[ci] fix badcase and add env info (#1491)
* update

* update

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-05 16:43:45 +08:00
zhulinJulia24
fb6a0df652
[ci] fix test env for vllm and add vllm baselines (#1481)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-04 19:24:09 +08:00
Linchen Xiao
da74cbfa39
[Fix] Model configs update 2024-09-04 18:57:10 +08:00
Linchen Xiao
9693be46b7
[Feature] Mmlu-pro auto-download (#1464)
* update

* update

* update

* update

* update
2024-08-30 10:03:40 +08:00
Alexander Lam
8b39225259
[Feature] Added extra_body support for OpenAISDK; Added support for proxy URL when connecting to OpenAI's API. (#1467)
* fix lint issues

* fix lint issues
2024-08-29 00:43:43 +08:00
Guoli Yin
a488b9b4f5
[Feature] Make OPENAI_API_BASE compatible with openai default env (#1461)
* Make OPENAI_API_BASE compatible with openai default env

* Make OPENAI_API_BASE compatible with openai default env

---------

Co-authored-by: Guoli Yin <gyin@icloud.com>
2024-08-28 23:14:41 +08:00
Songyang Zhang
e5a8eb2283
[Feature] Update Lint and Leaderboard (#1458)
* [Feature] Update Lint and Leaderboard

* Update

* Update
2024-08-28 22:36:42 +08:00
Linchen Xiao
245664f4c0
[Feature] Fullbench v0.1 language update (#1463)
* update

* update

* update

* update
2024-08-28 14:01:05 +08:00
CHEN PENGAN
463231c651
[Feature] Add icl_sliding_k_retriever.py and update __init__.py (#1305)
* Add icl_sliding_k_retriever.py and update __init__.py

* Fix flake8, isort, and yapf issues for Sliding Window Retriever
2024-08-23 17:18:31 +08:00
Linchen Xiao
94b6bd65fc
[Fix] Fix cli evaluation for multiple models (#1454)
* update

* update
2024-08-23 17:15:36 +08:00
Songyang Zhang
5485207fbe
[Bump] Bump version to 0.3.1 (#1450)
* [Bump] Bump version 0.3.1

* Update
2024-08-23 10:47:57 +08:00
Songyang Zhang
7c2d25b557
[Fix] Update SciCode and Gemma model (#1449)
* [Fix] Update SciCode and Gemma model

* Update

* Update
2024-08-23 10:42:27 +08:00
Xu Song
ad3931aa32
Update openicl_infer.py (#1308) 2024-08-23 10:39:22 +08:00
liushz
9fdbc744dc
[Fix] Update option postprocess & mathbench language summarizer (#1413)
* Update option postprocess & mathbench language summarizer

* Update option postprocess & mathbench language summarizer

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-08-22 14:49:07 +08:00
Linchen Xiao
0fe9756c5d
[Doc] Update Readme (#1439)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
2024-08-22 14:48:45 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode (#1417)
* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode w/ bg

* add scicode

* Update README.md

* Update README.md

* Delete configs/eval_SciCode.py

* rename

* 1

* rename

* Update README.md

* Update scicode.py

* Update scicode.py

* fix some bugs

* Update

* Update

---------

Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
liushz
d3963bceae
[Bug] Add model support for 'huggingface_above_v4_33' when using '-a' (#1430)
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-08-22 13:40:24 +08:00
seetimee
ac093fce53
[Update] Update openai_api.py (#1438)
Most models' token limits are above 32k. This fixes a bug in long-context dataset tests where some data was skipped.
2024-08-21 18:57:49 +08:00
liushz
e076dc5acf
[Fix] Fix openai api tiktoken bug for api server (#1433)
* Fix openai api tiktoken

* Fix openai api tiktoken

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-08-20 22:02:14 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets (#1310)
* [Feature] Add Ruler datasets

* pre-commit fixed

* Add model specific tokenizer to dataset

* pre-commit modified

* remove unused import

* fix linting

* add trust_remote to tokenizer load

* lint fix

* comments resolved

* fix lint

* Add readme

* Fix lint

* ruler refactorize

* fix lint

* lint fix

* updated

* lint fix

* fix wonderwords import issue

* prompt modified

* update

* readme updated

* update

* ruler dataset added

* Update

---------

Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Xu Song
99b5122ed5
[Feature] Add abbr for rolebench dataset (#1431)
* Add abbr for rolebench dataset

* add
2024-08-20 11:22:48 +08:00
Linchen Xiao
ecf9bb3e4c
[Bug] Commonsenseqa dataset fix (#1425)
* longbench dataset load fix

* update

* Update

* Update

* Update

* update

* update

---------

Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-16 15:54:07 +08:00
Songyang Zhang
9b3613f10b
[Update] Support auto-download of FOFO/MT-Bench-101 (#1423)
* [Update] Support auto-download of FOFO/MT-Bench-101

* Update wildbench
2024-08-16 11:57:41 +08:00
bittersweet1999
ce7f4853ce
[Fix] Sub summarizer order fix (#1426)
* fix pip version

* fix pip version

* fix sub summarizer order

* fix order
2024-08-15 21:08:18 +08:00
Linchen Xiao
2596f226f4
[Fix] longbench dataset load fix (#1422) 2024-08-15 11:30:30 +08:00
Linchen Xiao
8e55c9c6ee
[Update] Compassbench v1.3 (#1396)
* stash files

* compassbench subjective evaluation added

* evaluation update

* fix lint

* update docs

* Update lint

* changes saved

* changes saved

* CompassBench subjective summarizer added (#1349)

* subjective summarizer added

* fix lint

[Fix] Fix MathBench (#1351)

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>

[Update] Update model support list (#1353)

* fix pip version

* fix pip version

* update model support

subjective summarizer updated

knowledge, math objective done (data need update)

remove secrets

objective changes saved

knowledge data added

* secrets removed

* changed added

* summarizer modified

* summarizer modified

* compassbench coding added

* fix lint

* objective summarizer updated

* compass_bench_v1.3 updated

* update files in config folder

* remove unused model

* lcbench modified

* removed model evaluation configs

* remove duplicated sdk implementation

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-08-12 19:09:19 +08:00
changyeyu
59586a8b4a
[Feature] Enable Truncation of Mid-Section for Long Prompts in huggingface_above_v4_33.py (#1373)
* Retain the first and last halves of the tokens from the prompt, discarding the middle, to avoid exceeding the model's maximum length.

* Add default parameter: mode

* Modified a comment.

* Modified variable names.

* fix yapf lint
2024-08-09 11:36:30 +08:00
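The commit above keeps the first and last halves of an over-long prompt and discards the middle. A minimal sketch of that idea on a plain list of token ids, assuming the budget is a single `max_len` value; `truncate_middle` is a hypothetical helper and not the function or signature used in huggingface_above_v4_33.py.

```python
from typing import List


def truncate_middle(token_ids: List[int], max_len: int) -> List[int]:
    """Keep the head and tail of a token sequence, dropping the middle.

    Hypothetical illustration of mid-section truncation; not the
    actual OpenCompass implementation.
    """
    if max_len <= 0:
        return []
    if len(token_ids) <= max_len:
        return list(token_ids)  # already fits: nothing to drop
    head = max_len // 2
    tail = max_len - head  # tail gets the extra token when max_len is odd
    return token_ids[:head] + token_ids[-tail:]


print(truncate_middle(list(range(10)), 5))
```

Splitting the budget between head and tail keeps both the instruction at the start of a prompt and the question at its end, which is usually where the most important context sits in long-context benchmarks.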
Songyang Zhang
88eb91219b
[Doc] Update README (#1404)
* [Doc] Update README

* Update
2024-08-08 16:18:33 +08:00
yaoyingyy
decb621ff6
[Fix] the issue where scores are negative in the Lawbench dataset evaluation (#1402) (#1403) 2024-08-08 16:08:26 +08:00
Yunlin Mao
818d72a650
[Fix] modelscope dataset load problem (#1406)
* fix modelscope dataset load

* fix lint
2024-08-08 14:01:06 +08:00
Songyang Zhang
264fd23129
[Bump] Bump version for v0.3.0 (#1398) 2024-08-07 01:25:24 +08:00
Songyang Zhang
fed1a4998b
[Fix] Fix CaLM import (#1395) 2024-08-06 12:17:45 +08:00
Songyang Zhang
c81329b548
[Fix] Fix Slurm ENV (#1392)
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
2024-08-06 01:35:20 +08:00
Songyang Zhang
c09fc79ba8
[Feature] Support OpenAI ChatCompletion (#1389)
* [Feature] Support import configs/models/summarizers from whl

* Update

* Update openai sdk

* Update

* Update gemma
2024-08-01 19:10:13 +08:00
Peng Bo
07c96ac659
Calm dataset (#1385)
* Add CALM Dataset
2024-08-01 10:03:21 +08:00
Songyang Zhang
46cc7894e1
[Feature] Support import configs/models/summarizers from whl (#1376)
* [Feature] Support import configs/models/summarizers from whl

* Update LCBench configs

* Update

* Update

* Update

* Update

* update

* Update

* Update

* Update

* Update

* Update
2024-08-01 00:42:48 +08:00
Songyang Zhang
33ceaa0eb8
[Bug] Fix bug in turbomind (#1377) 2024-07-30 09:37:50 +08:00
Songyang Zhang
eee5a5be23
[Fix] Update get_data_path for LCBench and HumanEval (#1375) 2024-07-29 19:28:09 +08:00
Songyang Zhang
704853e5e7
[Feature] Update pip install (#1324)
* [Feature] Update pip install

* Update Configuration

* Update

* Update

* Update

* Update Internal Config

* Update collect env
2024-07-29 18:32:50 +08:00
Xingjun.Wang
edab1c07ba
[Feature] Support ModelScope datasets (#1289)
* add ceval, gsm8k modelscope support

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* update dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* update dataset for modelscope support

* update readme

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* update readme

* remove tydiqa japanese subset

* add ceval, gsm8k modelscope support

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* update dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* update readme

* update dataset for modelscope support

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* remove tydiqa japanese subset

* update util

* remove .DS_Store

* fix md format

* move util into package

* update docs/get_started.md

* restore eval_api_zhipu_v2.py, add environment setting

* Update dataset

* Update

* Update

* Update

* Update

---------

Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-29 13:48:32 +08:00
jxd
12b84aeb3b
[Feature] Update CHARM Memorization (#1230)
* update gemini api and add gemini models

* add openai models

* update CHARM evaluation

* add CHARM memorization tasks

* add CharmMemSummarizer (output eval details for memorization-independent reasoning analysis)

* update CHARM readme

---------

Co-authored-by: wujiang <wujiang@pjlab.org.cn>
2024-07-26 18:42:30 +08:00
bittersweet1999
d3782c1d47
Revert "Calm dataset (#1287)" (#1366)
This reverts commit edd0ffdf70.
2024-07-26 18:27:29 +08:00
Peng Bo
edd0ffdf70
Calm dataset (#1287)
* add calm dataset

* modify config max_out_len

* update README

* Modify README

* update README

* update README

* update README

* update README

* update README

* add summarizer and modify readme

* delete summarizer config comment

* update summarizer

* modify same response to all questions

* update README
2024-07-26 11:48:16 +08:00
mqy004
a08931f214
[Fix] origin_prompt should be None in llm-compression task (#1225)
Co-authored-by: Qinyang Mou <qinyang_mou@intsig.net>
2024-07-26 11:46:02 +08:00
LeavittLang
8ee7fecb68
Adding support for Doubao API (#1218)
* Adding support for Doubao API

* Update doubao_api.py

Fixed a bug where the connection was retried even when it succeeded.

* Update doubao_api.py

---------

Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:44:51 +08:00
klein
65fad8e2ac
[Fix] minor update wildbench (#1335)
* update crb

* update crbbench

* update crbbench

* update crbbench

* minor update wildbench

* [Fix] Update doc of wildbench, and merge wildbench into subjective

* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench

* Update crb.md

* Update crb_pair_judge.py

* Update crb_single_judge.py

* Update subjective_evaluation.md

* Update openai_api.py

* [Update] update wildbench readme

* [Update] update wildbench readme

* [Update] update wildbench readme, remove crb

* Delete configs/eval_subjective_wildbench_pair.py

* Delete configs/eval_subjective_wildbench_single.py

* Update __init__.py

---------

Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:19:04 +08:00
baymax591
51a94aee01
[Bug] fix bug: delete & (#1365)
Co-authored-by: 白超 <baichao19@huawei.com>
2024-07-26 11:03:55 +08:00
Mo Li
69aa2f2d57
[Feature] Make NeedleBench available on HF (#1364)
* update_lint

* update_huggingface format

* fix bug

* update docs
2024-07-25 19:01:56 +08:00
Fengzhe Zhou
c3c02c2960
update docs (#1318)
* update docs

* Efficient evaluation -> Data sharding

* update

* update

* Update faq.md

---------

Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-25 18:44:25 +08:00
heya5
73aa55af6d
[Fix] Support HF models deployed with an OpenAI-compatible API. (#1352)
* Support HF models deployed with an OpenAI-compatible API.

* resolve lint issue

* add extra_body arguments

There are many other arguments when using an OpenAI-compatible API, like these: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters-for-chat-api

* fix linting issue

* fix yapf linting issue
2024-07-25 18:38:23 +08:00
WANG WENJIN
0aad8199c7
Fix the summary error in subjective.py (#1363) 2024-07-25 18:36:13 +08:00
Linchen Xiao
8127fc3518
CompassBench subjective summarizer added (#1349)
* subjective summarizer added

* fix lint
2024-07-23 12:29:57 +08:00
Que Haoran
a244453d9e
[Feature] Support inference ppl datasets (#1315)
* commit inference ppl datasets

* revised format

* revise

* revise

* revise

* revise

* revise

* revise
2024-07-22 17:59:30 +08:00
liushz
98c58f8a6c
[Feature] Add compassbench knowledge&math part (#1342)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

* Update accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Update compassbench august wiki&math

* Update compassbench august wiki&math

* Update compassbench august wiki&math

* Update compassbench_aug_gen_068af0.py

* Update compassbench_aug_gen_068af0.py

* Update

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 22:54:46 +08:00
bittersweet1999
1f9f728f22
[Feature] support compassbench Checklist evaluation (#1339)
* fix pip version

* fix pip version

* support checklist eval

* init

* add lan

* fix typo
2024-07-19 16:40:44 +08:00
Mo Li
f40add2596
[Fix] Fix lint (#1334)
* update needlebench docs

* update model_name_mapping dict

* update README

* fix_lint
2024-07-18 17:15:06 +08:00
Xu Song
1bfb4217ff
Fix typing and typo (#1331) 2024-07-18 13:41:24 +08:00
Mo Li
104bddf647
[Doc] Update NeedleBench Docs (#1330)
* update needlebench docs

* update model_name_mapping dict

* update README

* Update README_zh-CN.md

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-07-18 13:16:19 +08:00
bittersweet1999
8e7ad2e981
[Fix] add bc for alignbench summarizer (#1306)
* fix pip version

* fix pip version

* fix alignbench

* fix import error
2024-07-12 11:06:20 +08:00
Fengzhe Zhou
62f55987f1
force register (#1311) 2024-07-11 19:59:35 +08:00
Fengzhe Zhou
a62c613d3e
[Sync] bump version 0.2.6+local (#1294) 2024-07-06 00:44:06 +08:00
Fengzhe Zhou
1d3a26c732
[Doc] quick start swap tabs (#1263)
* [doc] quick start swap tabs

* update docs

* update

* update

* update

* update

* update

* update

* update
2024-07-05 23:51:42 +08:00
bittersweet1999
68ca48496b
[Refactor] Reorganize subjective eval (#1284)
* fix pip version

* fix pip version

* reorganize subjective eval

* reorg sub

* reorg subeval

* reorg subeval

* update subjective doc

* reorg subeval

* reorg subeval
2024-07-05 22:11:37 +08:00
baymax591
28eba6fe34
NPU adaptation (#1250)
* NPU adaptation

* Add support for Ascend NPU

* format

---------

Co-authored-by: baymax591 <14428251+baymax591@user.noreply.gitee.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-07-03 18:55:19 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 (#1279) 2024-06-28 14:16:34 +08:00
Xingyuan Bu
842fb1cd70
Update mtbench101.py (#1276)
fix incorrectly used import:
from torch.utils.data import DataLoader, Dataset
2024-06-26 00:40:22 +08:00
klein
1fa62c4a42
Support wildbench (#1266)
Co-authored-by: Leymore <zfz-960727@163.com>
2024-06-24 13:16:27 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo (#1224)
* add fofo dataset

* add dataset fofo
2024-06-06 11:40:48 +08:00
Xingyuan Bu
02a0a4e857
MT-Bench-101 (#1215)
* add mt-bench-101

* add readme and requirements

* add mt-bench-101 data

* Update readme_mtbench101.md

* update readme

* update leaderboard

* fix typo

* Update readme_mtbench101.md

* adapt to the latest opencompass

* update readme.md

* mtbench101 to opencompass

* mtbench101 to opencompass

* for code review

* for code review

* for code review

* hook

* hook

---------

Co-authored-by: liujie <ljie@buaa.edu.cn>
2024-06-03 14:52:12 +08:00
mqy004
b272803d8a
Fix the issue where opencompass.cli.main cannot be imported after installing the release version (#1221)
* Create __init__.py

* Create __init__.py

* Create __init__.py

* Create __init__.py

* Create __init__.py

* Create __init__.py

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-31 13:23:33 +08:00
bittersweet1999
7c381e5be8
[Fix] fix summarizer (#1217)
* fix summarizer

* fix summarizer
2024-05-31 11:40:47 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format (#1214) 2024-05-30 00:21:58 +08:00
Fengzhe Zhou
d656e818f8
[Docs] Remove --no-batch-padding and Use --hf-num-gpus (#1205)
* [Docs] Remove --no-batch-padding and Use --hf-num-gpus

* update
2024-05-29 16:30:10 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version (#1204) 2024-05-28 23:09:59 +08:00
liushz
ba620c4afe
Update accelerator (#1195)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

* Update accelerator

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-28 17:17:54 +08:00