Songyang Zhang
c789ce5698
[Fix] the automatic download for several datasets (#1652)
* [Fix] the automatic download for several datasets
* Update
* Update
* Update CI
2024-11-01 15:57:18 +08:00
bittersweet1999
a0853c939d
[Add] Add CompassArenaSubjectiveBench (#1645)
* fix pip version
* fix pip version
* add compassarenasubjectivebench
* add compassarenasubjectivebench
* add compassarenabench
2024-11-01 13:52:22 +08:00
Chang Lan
46affab882
[Fix] Fix ruler_16k_gen (#1643)
2024-10-29 17:58:43 +08:00
Linchen Xiao
8172af49bb
[Update] Update wildbench max_seq_len (#1648)
* [Update] Wildbench max_seq_len update
* [Update] Wildbench max_seq_len update
2024-10-29 13:21:31 +08:00
Chang Lan
a927bba1cf
[Fix] Fix RULER datasets (#1628)
We need to ensure that we don't import anything whose name ends with "_datasets",
or it will be picked up by the runner, leading to duplicate or unwanted datasets
being evaluated.
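To make the failure mode concrete, here is a minimal Python sketch. It assumes, as many OpenCompass configs do, that the dataset list is built by concatenating every top-level variable whose name ends in `_datasets`; the namespace dicts and dataset entries below are hypothetical stand-ins, not the real RULER configs.

```python
from typing import Any


def collect_datasets(namespace: dict[str, Any]) -> list[dict]:
    """Concatenate every list bound to a name ending in '_datasets'.

    Assumed to mirror the config idiom
    sum((v for k, v in locals().items() if k.endswith('_datasets')), []).
    """
    return sum(
        (v for k, v in namespace.items() if k.endswith('_datasets')), []
    )


# Hypothetical per-task dataset entries.
niah_datasets = [{'abbr': 'ruler_niah_4k'}]
vt_datasets = [{'abbr': 'ruler_vt_4k'}]

# A combined list that leaked into the config under a *_datasets name.
ruler_datasets = niah_datasets + vt_datasets

clean_ns = {'niah_datasets': niah_datasets, 'vt_datasets': vt_datasets}
leaky_ns = dict(clean_ns, ruler_datasets=ruler_datasets)

# The leaky namespace yields every entry twice.
print([d['abbr'] for d in collect_datasets(clean_ns)])
print([d['abbr'] for d in collect_datasets(leaky_ns)])
```

Under this assumption, importing a helper list under an alias that does not end in `_datasets` keeps it out of the collected list.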
2024-10-22 11:59:02 +08:00
Songyang Zhang
a4d5a6c81b
[Feature] Support LiveCodeBench (#1617)
* Update
* Update LCB
* Update
* Update
* Update
* Update
* Update
2024-10-21 20:50:39 +08:00
liushz
500b44ba2d
[Fix] gpqa_few_shot_ppl prompt bug (#1627)
2024-10-21 16:59:06 +08:00
Linchen Xiao
096c347e7d
[Fix] Qwen 2.5 model config (#1626)
* [Fix] Fix Qwen 2.5 model config
* [Fix] Fix Qwen 2.5 model config
* [Fix] Fix Qwen 2.5 model config
2024-10-21 16:58:18 +08:00
bittersweet1999
1188e1ecf0
[Update] eval_judgerbench.py (#1625)
2024-10-21 15:30:29 +08:00
bittersweet1999
a11e2b2fd4
[Fix] Compatible with old versions (#1616)
* fix pip version
* fix pip version
* Compatible with old versions
* compatibility with old versions
* compatibility with old versions
* compatibility with old versions
* update configs
2024-10-21 10:16:29 +08:00
bittersweet1999
f0d436496e
[Update] update docs and add compassarena (#1614)
* fix pip version
* fix pip version
* update docs and add compassarena
* update docs
2024-10-17 14:39:06 +08:00
Haoran Que
4fe251729b
Upload HelloBench (#1607)
* upload hellobench
* update hellobench
* update readme.md
* update eval_hellobench.py
* update latest
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-10-15 17:11:37 +08:00
bittersweet1999
fa54aa62f6
[Feature] Add Judgerbench and reorg subeval (#1593)
* fix pip version
* fix pip version
* update (#1522)
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
* [Feature] Update Models (#1518)
* Update Models
* Update
* Update humanevalx
* Update
* Update
* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527)
add judgerbench and reorg sub
add judgerbench and reorg subeval
add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-10-15 16:36:05 +08:00
liushz
5faee929db
[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589)
* Add GaoKaoMath Dataset
* Add MATH LLM Eval
* Update GAOKAO Math Eval Dataset
* Update GAOKAO Math Eval Dataset
2024-10-12 19:13:06 +08:00
bittersweet1999
3f7a3730d7
[Fix] fix Flames (#1599)
* fix pip version
* fix pip version
* fix flames
* fix flames
2024-10-12 14:34:59 +08:00
Lyu Han
b52ba65c26
[Feature] Integrate lmdeploy pipeline api (#1198)
* integrate lmdeploy's pipeline api
* fix linting
* update user guide
* rename
* update
* update
* update
* rollback class name
* update
* remove unused code
* update
* update
* fix ci check
* compatibility
* remove concurrency
* Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py
* Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md
* [Bug] fix lint
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-10-09 22:58:06 +08:00
shijinpjlab
7528b8ab8a
[Feature] Add dingo test (#1529)
* add qa dingo
* update
* change name qa to dingo
* eval model: llm_base
* update path
* change name and move path
* add eval_dingo
* update import
* add for pip
* add dingo package
* change import place
* update import place
* fix lint fail
* isort
* double quoted
---------
Co-authored-by: sj <shijin@pjlab.org.cn>
2024-09-29 19:24:58 +08:00
Songyang Zhang
e8437db98f
[Feature] Update BailingLM/OpenAI verbose (#1568)
* [Feature] 1. Update CoreBench Base
  2. Fix lint issue in BailingAPI
* Update
* [Feature] Update API
* Update
2024-09-27 11:15:25 +08:00
Songyang Zhang
a7bacfdf7e
[Feature] Update CoreBench 2.0 (#1566)
* [Feature] 1. Update CoreBench Base
  2. Fix lint issue in BailingAPI
* Update
* Update
2024-09-26 18:44:00 +08:00
Yi Ding
3f833186dc
[Feature] Support the reasoning from BaiLing LLM (#1541)
* [Feature] Support the reasoning from BaiLing LLM
This commit adds access to the BaiLing LLM and obtains its reasoning results.
* Add the api example
An example of evaluating the BaiLing API
* Revise the generation arguments
Based on current experiments, we updated some generation arguments for better reasoning.
* [fix] set the batch size
* Retry under server-side flow control
* add dependent package to requirement.txt
Add the dependent package retrying to pass the pre-commit check.
* correct the file names and copy the files
correct the file names.
copy the files under configs to opencompass
* fix the lint issue
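Retrying under server-side flow control typically means backing off exponentially between attempts. The sketch below uses only the Python standard library rather than the `retrying` package the commit adds; `FlowControlError` and `call_with_retry` are hypothetical names, not part of the BaiLing API or OpenCompass.

```python
import time
from typing import Callable, TypeVar

T = TypeVar('T')


class FlowControlError(Exception):
    """Hypothetical error raised when the server signals rate limiting."""


def call_with_retry(fn: Callable[[], T], max_attempts: int = 5,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying with exponential backoff on FlowControlError."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except FlowControlError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    raise ValueError('max_attempts must be positive')


# Simulated server that throttles twice, then answers.
delays: list[float] = []
_responses = iter([FlowControlError(), FlowControlError(), 'ok'])


def flaky() -> str:
    r = next(_responses)
    if isinstance(r, Exception):
        raise r
    return r


result = call_with_retry(flaky, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

Injecting `sleep` keeps the backoff testable without real waiting.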
---------
Co-authored-by: christopher.dy <christopher.dy@antgroup.com>
2024-09-26 16:49:52 +08:00
Linchen Xiao
80cda1980e
[BUG] fix followbench dataset config (#1564)
* [BUG] fix followbench dataset config
* [BUG] fix followbench dataset config
2024-09-25 20:58:34 +08:00
Songyang Zhang
fe84bbd9a0
[Feature] Add Config for CoreBench (#1547)
* [Feature] Add Config for CoreBench
* Update
2024-09-25 11:36:43 +08:00
liushz
83eeb52b09
[Feature] Update WikiBench base model config (#1553)
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update GPQA & MMLU_Pro
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & Math base config
* Update WikiBench base model config
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-25 11:26:36 +08:00
Songyang Zhang
e7681943f3
[Feature] Update the max_out_len for many models (#1559)
2024-09-24 21:52:28 +08:00
klein
24915aeb3f
[BUG] Update CIbench config (#1544)
* BUG: Update cibench.py
* BUG: Update cibench.py
2024-09-23 18:32:27 +08:00
liushz
a0cfd61129
[Feature] Update MathBench & Math base model config (#1550)
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update GPQA & MMLU_Pro
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & Math base config
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-23 14:03:59 +08:00
Songyang Zhang
5a27c2bd6f
[Model] Support Qwen2.5 Instruct (#1543)
2024-09-19 16:16:07 +08:00
Songyang Zhang
be460fbb21
[Feature] Support OpenAI O1 models (#1539)
* [Feature] Support OpenAI O1 models
* Update README.md
---------
Co-authored-by: liushz <qq1791167085@163.com>
2024-09-18 22:41:17 +08:00
liushz
2e9db77d57
[Feature] Add custom model postprocess function (#1519)
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-18 14:40:51 +08:00
liushz
c9a7026f59
[Feature] Update MathBench & WikiBench for FullBench (#1521)
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update GPQA & MMLU_Pro
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-18 14:35:30 +08:00
Linchen Xiao
90279b6461
[Feature] Dataset prompts update for ARC, BoolQ, Race (#1527)
2024-09-13 10:30:43 +08:00
Songyang Zhang
6997990c93
[Feature] Update Models (#1518)
* Update Models
* Update
* Update humanevalx
* Update
* Update
2024-09-12 23:35:30 +08:00
bittersweet1999
7c7fa36235
[Feature] add support for internal Followbench (#1511)
* fix pip version
* fix pip version
* add internal followbench
* add internal followbench
* fix lint
* fix lint
2024-09-11 13:32:34 +08:00
bittersweet1999
c2bcd8725e
[Fix] Fix wildbench (#1508)
* fix pip version
* fix pip version
* fix_wildbench
2024-09-10 17:35:07 +08:00
Alexander Lam
a31a77c5c1
[Feature] Add SciCode summarizer config (#1514)
* [Feature] added SciCode summarizer config and dataset config for evaluation with background
* fix lint issues
* removed unnecessary type in summarizer group
2024-09-10 16:06:02 +08:00
Linchen Xiao
87ffa71d68
[Feature] Longbench dataset update
2024-09-06 15:50:12 +08:00
Albert Yan
928d0cfc3a
[Feature] Add support for Rendu API (#1468)
* Add support for Rendu API
* fix lint issue
* fix lint issue
* fix lint issue
* Update
---------
Co-authored-by: 13190 <zeyu.yan@transn.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-09-06 01:00:43 +08:00
liushz
00fc8da5be
[Feature] Add model postprocess function (#1484)
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-05 21:10:29 +08:00
Maxime SHE
45efdc994d
[Feature] Add an api_key attribute to TurboMindAPIModel, defaulting to None (#1475)
Co-authored-by: Maxime <maximeshe@163.com>
Add an api_key attribute to TurboMindAPIModel, defaulting to None, so that the api_key can be set when deploying the LLM with lmdeploy.
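A minimal sketch of the optional-key pattern, assuming an OpenAI-compatible server that expects the key as a Bearer token; `build_headers` is a hypothetical helper, not TurboMindAPIModel's actual code.

```python
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict[str, str]:
    """Build HTTP headers for a (hypothetical) OpenAI-style endpoint.

    api_key defaults to None, so servers deployed without authentication
    keep working; when a key is given it is sent as a Bearer token.
    """
    headers = {'Content-Type': 'application/json'}
    if api_key is not None:
        headers['Authorization'] = f'Bearer {api_key}'
    return headers


print(build_headers())
print(build_headers('sk-example'))
```

Defaulting to `None` rather than an empty string keeps "no key configured" distinct from "empty key".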
2024-09-05 17:51:16 +08:00
Linchen Xiao
6c9cd9a260
[Feature] Needlebench auto-download update (#1480)
* update
* update
* update
2024-09-05 17:22:42 +08:00
Linchen Xiao
da74cbfa39
[Fix] Model configs update
2024-09-04 18:57:10 +08:00
Linchen Xiao
9693be46b7
[Feature] Mmlu-pro auto-download (#1464)
* update
* update
* update
* update
* update
2024-08-30 10:03:40 +08:00
Songyang Zhang
e5a8eb2283
[Feature] Update Lint and Leaderboard (#1458)
* [Feature] Update Lint and Leaderboard
* Update
* Update
2024-08-28 22:36:42 +08:00
Linchen Xiao
245664f4c0
[Feature] Fullbench v0.1 language update (#1463)
* update
* update
* update
* update
2024-08-28 14:01:05 +08:00
Songyang Zhang
7c2d25b557
[Fix] Update SciCode and Gemma model (#1449)
* [Fix] Update SciCode and Gemma model
* Update
* Update
2024-08-23 10:42:27 +08:00
liushz
9fdbc744dc
[Fix] Update option postprocess & mathbench language summarizer (#1413)
* Update option postprocess & mathbench language summarizer
* Update option postprocess & mathbench language summarizer
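Option postprocessing maps free-form model output to a single choice letter. The sketch below is one hypothetical way to do it with regular expressions; `extract_option` and its patterns are illustrative, not the postprocessor OpenCompass ships.

```python
import re
from typing import Optional


def extract_option(prediction: str, options: str = 'ABCD') -> Optional[str]:
    """Return the first option letter found in a model prediction.

    Tries an explicit 'answer is X' pattern first, then falls back to the
    first capital option letter not attached to other letters.
    """
    letters = re.escape(options)
    patterns = [
        rf'answer is[:\s]*\(?([{letters}])\)?',
        rf'(?<![A-Za-z])([{letters}])(?![A-Za-z])',
    ]
    for pattern in patterns:
        match = re.search(pattern, prediction)
        if match:
            return match.group(1)
    return None


print(extract_option('The answer is (C).'))  # C
print(extract_option('B. Paris'))            # B
print(extract_option('no option here'))      # None
```

The fallback pattern can misfire on an English article, as in "A cat", which is why explicit patterns are tried first.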
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-08-22 14:49:07 +08:00
Linchen Xiao
0fe9756c5d
[Doc] Update Readme (#1439)
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
2024-08-22 14:48:45 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode (#1417)
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode w/ bg
* add scicode
* Update README.md
* Update README.md
* Delete configs/eval_SciCode.py
* rename
* 1
* rename
* Update README.md
* Update scicode.py
* Update scicode.py
* fix some bugs
* Update
* Update
---------
Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets (#1310)
* [Feature] Add Ruler datasets
* pre-commit fixed
* Add model specific tokenizer to dataset
* pre-commit modified
* remove unused import
* fix linting
* add trust_remote to tokenizer load
* lint fix
* comments resolved
* fix lint
* Add readme
* Fix lint
* ruler refactorize
* fix lint
* lint fix
* updated
* lint fix
* fix wonderwords import issue
* prompt modified
* update
* readme updated
* update
* ruler dataset added
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Xu Song
99b5122ed5
[Feature] Add abbr for rolebench dataset (#1431)
* Add abbr for rolebench dataset
* add
2024-08-20 11:22:48 +08:00