Yi Ding
bcb707dbfc
[Fix] Fix BailingAPI model ( #1707 )
...
* [fix] sequence under the multiple samples
* resolve the lint problems
* change the parameter name
* add another error code for retry
* output the log for invalid response
* format correction
* update
* update
* update
* update
* add two model python files
* update the default parameter
* use random for delay
* update the api example of bailing
* remove the unnecessary parameter
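The retry-related bullets above (an extra error code that triggers a retry, and a randomized delay) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Bailing client code: `call_with_retry`, the `(code, payload)` return shape, and the example status codes are all hypothetical.

```python
import random
import time


def call_with_retry(request_fn, retryable_codes, max_attempts=3,
                    max_delay=1.0, sleep=time.sleep):
    """Call request_fn until it returns a non-retryable code.

    request_fn is a hypothetical callable returning (code, payload).
    Between attempts we wait a random delay in [0, max_delay] so that
    many clients hitting the same rate limit do not retry in lockstep.
    """
    code, payload = None, None
    for attempt in range(1, max_attempts + 1):
        code, payload = request_fn()
        if code not in retryable_codes:
            return code, payload
        if attempt < max_attempts:
            sleep(random.uniform(0, max_delay))
    # All attempts were retryable failures; return the last result.
    return code, payload
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.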
2024-11-26 19:24:47 +08:00
Linchen Xiao
ef695e28e5
[Bug] Fix Korbench dataset module ( #1717 )
2024-11-26 17:13:28 +08:00
Songyang Zhang
f97c4eae42
[Update] Update Fullbench ( #1712 )
...
* Update JudgerBench
* Support O1-style Prompts
* Update Code
2024-11-26 14:26:55 +08:00
Yufeng Zhao
300adc31e8
[Feature] Add Korbench dataset ( #1713 )
...
* first version for korbench
* first stage for korbench
* korbench_1
* korbench_1
* korbench_1
* korbench_1
* korbench_1_revised
* korbench_combined_1
* korbench_combined_1
* kor_combined
* kor_combined
* update
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2024-11-25 20:11:27 +08:00
Chang Lan
5c1916ea4c
[Update] Add RULER 64k config ( #1709 )
2024-11-25 19:35:27 +08:00
liushz
e49fcfd3a3
[Update] Update MATH dataset with model judge ( #1711 )
...
* Update math with llm judge
* Update math with llm judge
* Update math with llm judge
* Update math with llm judge
* Update math with llm judge
2024-11-25 15:14:55 +08:00
Linchen Xiao
80e3b9ef37
[Update] Add math prm 800k ( #1708 )
2024-11-21 21:29:43 +08:00
Linchen Xiao
500fb1032a
[Update] Update configurations ( #1704 )
2024-11-21 16:51:18 +08:00
Yi Ding
05044dfaf2
[Update] Support new error code for Bailing model ( #1702 )
...
* support new error code
* fix the lint problems
2024-11-20 16:40:22 +08:00
Linchen Xiao
ff831b153e
[BUMP] Bump version to 0.3.6 ( #1694 )
2024-11-18 20:24:50 +08:00
Linchen Xiao
ab8fdbbaab
[Update] Update Math auto-download data ( #1700 )
2024-11-18 20:24:35 +08:00
Linchen Xiao
98242ff1d1
[Update] first_option_postprocess ( #1699 )
...
* update first_option_postprocess
* update
2024-11-18 20:14:29 +08:00
Linchen Xiao
4653f6976e
[Update] update volc CPU flavor ( #1698 )
2024-11-18 12:33:51 +08:00
Linchen Xiao
40a9f0be0d
[Update] MUSR dataset config prefix update ( #1692 )
2024-11-15 11:06:30 +08:00
abrohamLee
e9e4b69ddb
[Feature] MuSR Dataset Evaluation ( #1689 )
...
* MuSR Dataset Evaluation
* MuSR Dataset Evaluation
Add an assertion and a Readme.md
2024-11-14 20:42:12 +08:00
Linchen Xiao
d415439f9b
[Fix] Fix bug for first_option_postprocess ( #1688 )
2024-11-14 16:45:59 +08:00
Linchen Xiao
e92a5d4230
[Feature] BABILong Dataset added ( #1684 )
...
* update
* update
* update
* update
2024-11-14 15:32:43 +08:00
Linchen Xiao
2fee63f537
[Update] Auto-download for followbench ( #1685 )
2024-11-13 15:47:29 +08:00
bittersweet1999
aca8ec3c6a
[Hotfix] Hotfix ( #1683 )
...
* fix pip version
* fix pip version
* fix lint
* hotfix
2024-11-13 10:14:27 +08:00
sobeit
3ec178f4a9
add single lora adapter support for vLLM inference. ( #1679 )
2024-11-12 17:31:36 +08:00
bittersweet1999
17b5e52f6c
[Hotfix] lmdeploy temp ( #1674 )
...
* fix pip version
* fix pip version
* hotfix
2024-11-12 16:10:16 +08:00
Linchen Xiao
a0ef2fd3b4
[Update] Dingo Dataset update ( #1670 )
...
* [Update] Dingo Dataset update
* update
2024-11-08 14:38:43 +08:00
Linchen Xiao
835bf75a36
[Feature] Add long context evaluation for base models ( #1666 )
...
* [Update] Add base long context evaluation
* update
2024-11-08 10:53:29 +08:00
Chang Cheng
fd7aa83c01
[Update] Update DLC Runner ( #1662 )
...
* push interntrain hard code
* push interntrain hard code
* remove redundant post process
---------
Co-authored-by: changcheng <changcheng@pjlab.org.cb>
Co-authored-by: changcheng <changcheng@pjlab.org.cn>
2024-11-07 15:45:35 +08:00
Linchen Xiao
db258eb7d5
[Bump] Bump version to v0.3.5 ( #1657 )
2024-11-03 21:23:35 +08:00
Lyu Han
888f1f3bef
[Fix] Update loglikelihood compatibility ( #1659 )
2024-11-02 17:19:11 +08:00
liushz
f7d899823c
[Update] Update mmmlu_lite dataload ( #1658 )
...
* update mmmlu_lite dataload from oss
* update mmmlu_lite dataload from oss
2024-11-01 17:32:29 +08:00
Songyang Zhang
c789ce5698
[Fix] the automatically download for several datasets ( #1652 )
...
* [Fix] the automatically download for several datasets
* Update
* Update
* Update CI
2024-11-01 15:57:18 +08:00
Linchen Xiao
695738a89b
[Update] Add lmdeploy DeepSeek configs ( #1656 )
...
* [Update] Add lmdeploy DeepSeek configs
* update max out length
2024-11-01 15:34:23 +08:00
bittersweet1999
a0853c939d
[Add] Add CompassArenaSubjectiveBench ( #1645 )
...
* fix pip version
* fix pip version
* add compassarenasubjectivebench
* add compassarenasubjectivebench
* add compassarenabench
2024-11-01 13:52:22 +08:00
Linchen Xiao
5212ffe8e2
[Update] Add new model configs ( #1653 )
2024-10-30 17:24:53 +08:00
Linchen Xiao
df57c08ccf
[Feature] Update Models, Summarizers ( #1600 )
2024-10-29 18:37:15 +08:00
Linchen Xiao
d91d66792a
[Update] Update Needlebench OSS path ( #1651 )
2024-10-29 18:05:44 +08:00
Chang Lan
46affab882
[Fix] Fix ruler_16k_gen ( #1643 )
2024-10-29 17:58:43 +08:00
Linchen Xiao
8172af49bb
[Update] Update wildbench max_seq_len ( #1648 )
...
* [Update] Wildbench max_seq_len update
* [Update] Wildbench max_seq_len update
2024-10-29 13:21:31 +08:00
Junnan Liu
645c5f3b2c
[Datasets] Add datasets CMO&AIME ( #1610 )
...
* add datasets cmo&aime
* delete unused modules
* modify prompt
* update __init__
* update data load and add README
* update data load
* update performance
* update md5
* remove indents
* add indent
* fix log for debug mode
2024-10-28 18:08:02 +08:00
Linchen Xiao
9c39cb68d4
[Bump] Bump version to 0.3.4 ( #1639 )
2024-10-25 20:10:16 +08:00
Linchen Xiao
a61e8a0803
[Update] Internal humaneval add ( #1641 )
...
* [Update] internal_humaneval_add
* update
2024-10-25 19:08:42 +08:00
Songyang Zhang
84be90669b
[Update] Fix issue of *_param.py, avoid name conflict;add keep_tmp_file flag to support keep the temp config file. ( #1640 )
2024-10-25 16:39:25 +08:00
BigDong
2542bc6907
[Feature] Support results saving as md format table ( #1638 )
2024-10-25 15:50:33 +08:00
Linchen Xiao
22fdea4bf2
[Update] Update DLC runner ( #1637 )
2024-10-24 21:36:16 +08:00
Lyu Han
fb12c3f98a
[Update] strip stop_words ( #1635 )
2024-10-24 20:39:20 +08:00
Linchen Xiao
662dddf41a
[Update] Add internal humaneval postprocess ( #1636 )
2024-10-24 17:45:21 +08:00
Linchen Xiao
be3c06a158
[Fix] Update common summarizer regex extraction ( #1631 )
2024-10-22 14:35:45 +08:00
Chang Lan
a927bba1cf
[Fix] Fix RULER datasets ( #1628 )
...
We need to ensure that we don't import anything that ends with "_datasets",
or they will be picked up by the runner, leading to duplicate / unwanted datasets
being evaluated.
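The pitfall described above can be sketched as follows, assuming the runner gathers every config variable whose name ends with `_datasets`. `collect_datasets` and the variable names below are hypothetical and only illustrate the suffix-based pickup.

```python
def collect_datasets(namespace):
    """Gather every list bound to a name ending in '_datasets'.

    Hypothetical stand-in for a suffix-based collection step.
    """
    collected = []
    for name, value in namespace.items():
        if name.endswith('_datasets'):
            collected.extend(value)
    return collected


niah = [{'abbr': 'ruler_niah_4k'}]

# A helper import that keeps the '_datasets' suffix is collected a second time.
leaky_config = {
    'niah_datasets': niah,          # imported helper, not meant to be evaluated
    'ruler_4k_datasets': niah[:],   # the list the config actually wants
}

# Importing the helper under a name without the suffix avoids the duplicate.
clean_config = {
    'niah': niah,
    'ruler_4k_datasets': niah[:],
}

print(len(collect_datasets(leaky_config)))
print(len(collect_datasets(clean_config)))
```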
2024-10-22 11:59:02 +08:00
Songyang Zhang
a4d5a6c81b
[Feature] Support LiveCodeBench ( #1617 )
...
* Update
* Update LCB
* Update
* Update
* Update
* Update
* Update
2024-10-21 20:50:39 +08:00
Chenguang Li
5868d5afa4
[Bug] Fix-NPU-Support ( #1618 )
...
* bugfix NPU support
* formatting
---------
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-10-21 17:42:53 +08:00
liushz
500b44ba2d
[Fix] gpqa_few_shot_ppl prompt bug ( #1627 )
2024-10-21 16:59:06 +08:00
Linchen Xiao
096c347e7d
[Fix] Qwen 2.5 model config ( #1626 )
...
* [Fix] Fix Qwen 2.5 model config
* [Fix] Fix Qwen 2.5 model config
* [Fix] Fix Qwen 2.5 model config
2024-10-21 16:58:18 +08:00
bittersweet1999
a11e2b2fd4
[Fix] Compatible with old versions ( #1616 )
...
* fix pip version
* fix pip version
* Compatible with old versions
* Compatible with old versions
* Compatible with old versions
* Compatible with old versions
* update configs
2024-10-21 10:16:29 +08:00
Lyu Han
6e8adf5221
[Bug] Remove prefix bos_token from messages when using lmdeploy as the accelerator ( #1623 )
...
* remove prefix bos_token from messages when using lmdeploy as the accelerator
* update
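A minimal sketch of the idea in this fix, assuming the accelerator adds the BOS token itself so that a literal BOS prefix in the message would be duplicated. `strip_leading_bos` and the `<s>` token are hypothetical; the actual change may differ.

```python
def strip_leading_bos(prompt, bos_token):
    """Remove one leading BOS token string from the prompt, if present."""
    if bos_token and prompt.startswith(bos_token):
        return prompt[len(bos_token):]
    return prompt


print(strip_leading_bos('<s>Hello', '<s>'))
```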
2024-10-19 20:03:47 +08:00
Bob Tsang
dd0b655bd0
[Feature] Support MMMLU & MMMLU-lite Benchmark ( #1565 )
...
* rm folder
* modify format according to reviewer
* modify format according to reviewer
* modify format according to reviewer
* add some files requirement
* fix some bug
* fix bug
* change load type
* Update MMMLU Dataset
* Update MMMLU Dataset
* Add MMMLU-Lite Dataset
* update MMMLU dataset
* update MMMLU dataset
* update MMMLU dataset
---------
Co-authored-by: BobTsang <BobTsang1995@gmail.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-10-17 19:09:34 +08:00
bittersweet1999
f0d436496e
[Update] update docs and add compassarena ( #1614 )
...
* fix pip version
* fix pip version
* update docs and add compassarena
* update docs
2024-10-17 14:39:06 +08:00
Haoran Que
4fe251729b
Upload HelloBench ( #1607 )
...
* upload hellobench
* update hellobench
* update readme.md
* update eval_hellobench.py
* update latest
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-10-15 17:11:37 +08:00
bittersweet1999
fa54aa62f6
[Feature] Add Judgerbench and reorg subeval ( #1593 )
...
* fix pip version
* fix pip version
* update (#1522 )
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
* [Feature] Update Models (#1518 )
* Update Models
* Update
* Update humanevalx
* Update
* Update
* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527 )
add judgerbench and reorg sub
add judgerbench and reorg subeval
add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-10-15 16:36:05 +08:00
x54-729
2b1afa7d1e
[Fix] fix interntrain's tokenizer truncate ( #1605 )
...
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-10-15 16:03:57 +08:00
Linchen Xiao
f390697a5e
[Fix] Update dlc runner python env ( #1604 )
2024-10-14 15:50:21 +08:00
Lyu Han
4fde41036f
[Feature] Update TurboMindModel by integrating lmdeploy pipeline API ( #1556 )
...
* integrate lmdeploy's pipeline api
* fix linting
* update user guide
* rename
* update
* update
* update
* rollback class name
* update
* remove unused code
* update
* update
* use pipeline
* fix ci check
* compatibility
* compatibility
* remove concurrency
* update
* fix table content
* update
2024-10-14 15:33:40 +08:00
liushz
5faee929db
[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config ( #1589 )
...
* Add GaoKaoMath Dataset
* Add MATH LLM Eval
* Update GAOKAO Math Eval Dataset
* Update GAOKAO Math Eval Dataset
2024-10-12 19:13:06 +08:00
bittersweet1999
3f7a3730d7
[Fix] fix Flames ( #1599 )
...
* fix pip version
* fix pip version
* fix flames
* fix flames
2024-10-12 14:34:59 +08:00
Lyu Han
b52ba65c26
[Feature] Integrate lmdeploy pipeline api ( #1198 )
...
* integrate lmdeploy's pipeline api
* fix linting
* update user guide
* rename
* update
* update
* update
* rollback class name
* update
* remove unused code
* update
* update
* fix ci check
* compatibility
* remove concurrency
* Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py
* Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md
* [Bug] fix lint
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-10-09 22:58:06 +08:00
x54-729
4d6349dfe1
[FIX] fix interntrain get_loglikelihood ( #1584 )
2024-10-08 11:34:04 +08:00
Linchen Xiao
22a4e76511
[BUMP] Bump version to 0.3.3 ( #1581 )
2024-09-30 16:57:41 +08:00
x54-729
bbdca5eb4c
[BUG] Fix eos token handling and add comments for InternTrain ( #1569 )
...
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-09-30 15:46:06 +08:00
Linchen Xiao
763d7755b6
[BUG] GaokaoBench dataset fix ( #1583 )
2024-09-30 15:13:26 +08:00
shijinpjlab
7528b8ab8a
[Feature] Add dingo test ( #1529 )
...
* add qa dingo
* update
* change name qa to dingo
* eval model: llm_base
* update path
* change name and move path
* add eval_dingo
* update import
* add for pip
* add dingo package
* change import place
* update import place
* fix lint fail
* isort
* double quoted
---------
Co-authored-by: sj <shijin@pjlab.org.cn>
2024-09-29 19:24:58 +08:00
Yi Ding
85a28874aa
[BUG]: Fix Bailing API configs ( #1570 )
2024-09-27 11:56:57 +08:00
Songyang Zhang
e8437db98f
[Feature] Update BailingLM/OpenAI verbose ( #1568 )
...
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI
* Update
* [Feature] Update API
* Update
2024-09-27 11:15:25 +08:00
Songyang Zhang
7d50294117
[Feature] Update Bailing ( #1567 )
...
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI
* Update
* Update
* Update
2024-09-26 18:56:17 +08:00
Songyang Zhang
a7bacfdf7e
[Feature] Update CoreBench 2.0 ( #1566 )
...
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI
* Update
* Update
2024-09-26 18:44:00 +08:00
Yi Ding
3f833186dc
[Feature] Support the reasoning from BaiLing LLM ( #1541 )
...
* [Feature] Support the reasoning from BaiLing LLM
This commit adds access to the BaiLing LLM and gets its reasoning results.
* Add the api example
An example of evaluating the Bailing API
* Revise the generation arguments
Based on current experiments, we update some generation arguments for better reasoning
* [fix] set the batch size
* Retry under server-side flow control
* add dependent package into requirement.txt
add the dependent package retrying to clean up the pre-commit check.
* correct the file names and make the file copy
correct the file names.
copy the files under configs to opencompass
* fix the lint issue
---------
Co-authored-by: christopher.dy <christopher.dy@antgroup.com>
2024-09-26 16:49:52 +08:00
Linchen Xiao
80cda1980e
[BUG] fix followbench dataset config ( #1564 )
...
* [BUG] fix followbench dataset config
* [BUG] fix followbench dataset config
2024-09-25 20:58:34 +08:00
zhulinJulia24
87df8a73a3
[CI] add a common summarizer for qabench summarizer ( #1545 )
...
* update
* update
* update
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-25 13:40:47 +08:00
Linchen Xiao
c3fb9065db
[Feature] Add dlc sleep time ( #1562 )
2024-09-25 11:53:48 +08:00
liushz
83eeb52b09
[Feature] Update WikiBench base model config ( #1553 )
...
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update GPQA & MMLU_Pro
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & Math base config
* Update WikiBench base model config
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-25 11:26:36 +08:00
Songyang Zhang
e7681943f3
[Feature] Update the max_out_len for many models ( #1559 )
2024-09-24 21:52:28 +08:00
bittersweet1999
a2e9bc0c41
[Fix] fix duplicate error in partitioner ( #1552 )
...
* fix pip version
* fix pip version
* fix duplicate error in partitioner
* fix duplicate error in partitioner
2024-09-23 19:45:21 +08:00
x54-729
335667183a
[Feature] Add Interntrain model support ( #1548 )
...
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-09-23 19:10:26 +08:00
klein
24915aeb3f
[BUG] Update CIbench config ( #1544 )
...
* BUG: Update cibench.py
* BUG: Update cibench.py
2024-09-23 18:32:27 +08:00
liushz
a0cfd61129
[Feature] Update MathBench & Math base model config ( #1550 )
...
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update GPQA & MMLU_Pro
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & Math base config
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-23 14:03:59 +08:00
Songyang Zhang
ee058e25b2
[Feature] Support verbose for OpenAI API ( #1546 )
2024-09-20 17:12:52 +08:00
hailsham
a81bbb85bf
[FIX] Added handling for the "begin section" in meta_template to APITemplateParser ( #1405 )
...
Co-authored-by: leifei <nuuooo@icloud.com>
2024-09-19 18:12:04 +08:00
Songyang Zhang
5a27c2bd6f
[Model] Support Qwen2.5 Instruct ( #1543 )
2024-09-19 16:16:07 +08:00
Songyang Zhang
be460fbb21
[Feature] Support OpenAI O1 models ( #1539 )
...
* [Feature] Support OpenAI O1 models
* Update README.md
---------
Co-authored-by: liushz <qq1791167085@163.com>
2024-09-18 22:41:17 +08:00
liushz
2e9db77d57
[Feature] Add custom model postprocess function ( #1519 )
...
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-18 14:40:51 +08:00
liushz
c9a7026f59
[Feature] Update MathBench & WikiBench for FullBench ( #1521 )
...
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update GPQA & MMLU_Pro
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-18 14:35:30 +08:00
Linchen Xiao
90279b6461
[Feature] Dataset prompts update for ARC, BoolQ, Race ( #1527 )
2024-09-13 10:30:43 +08:00
Songyang Zhang
6997990c93
[Feature] Update Models ( #1518 )
...
* Update Models
* Update
* Update humanevalx
* Update
* Update
2024-09-12 23:35:30 +08:00
zhulinJulia24
3754dc1b67
update ( #1522 )
...
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-12 15:00:52 +08:00
bittersweet1999
7c7fa36235
[Feature] add support for internal Followbench ( #1511 )
...
* fix pip version
* fix pip version
* add internal followbench
* add internal followbench
* fix lint
* fix lint
2024-09-11 13:32:34 +08:00
Linchen Xiao
317763381c
update ( #1517 )
2024-09-11 13:31:20 +08:00
bittersweet1999
c2bcd8725e
[Fix] Fix wildbench ( #1508 )
...
* fix pip version
* fix pip version
* fix_wildbench
2024-09-10 17:35:07 +08:00
Alexander Lam
a31a77c5c1
[Feature] Add SciCode summarizer config ( #1514 )
...
* [Feature] added SciCode summarizer config and dataset config for with background evaluation
* fix lint issues
* removed unnecessary type in summarizer group
2024-09-10 16:06:02 +08:00
Linchen Xiao
b5f8afb57b
[Bump] Bump version to 0.3.2.post1
2024-09-06 19:09:30 +08:00
Linchen Xiao
f04f3546bc
[Fix] Import fix ( #1500 )
2024-09-06 18:29:24 +08:00
Linchen Xiao
ff18545f0e
[Bump] Bump version to 0.3.2 ( #1497 )
2024-09-06 16:10:45 +08:00
Linchen Xiao
87ffa71d68
[Feature] Longbench dataset update
2024-09-06 15:50:12 +08:00
Albert Yan
928d0cfc3a
[Feature] Add support for Rendu API ( #1468 )
...
* Add support for Rendu API
* fix lint issue
* fix lint issue
* fix lint issue
* Update
---------
Co-authored-by: 13190 <zeyu.yan@transn.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-09-06 01:00:43 +08:00
Hari Seldon
faf5260155
[Feature] Optimize Evaluation Speed of SciCode ( #1489 )
...
* update scicode
* update comments
* remove redundant variable
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-09-06 00:59:41 +08:00
liushz
00fc8da5be
[Feature] Add model postprocess function ( #1484 )
...
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-05 21:10:29 +08:00
Maxime SHE
45efdc994d
[Feature] Add an attribute api_key into TurboMindAPIModel default None ( #1475 )
...
Co-authored-by: Maxime <maximeshe@163.com>
Add an api_key attribute to TurboMindAPIModel, defaulting to None, so that the API key can be set when the LLM is deployed with lmdeploy.
2024-09-05 17:51:16 +08:00
Linchen Xiao
6c9cd9a260
[Feature] Needlebench auto-download update ( #1480 )
...
* update
* update
* update
2024-09-05 17:22:42 +08:00
zhulinJulia24
716d46e1f5
[ci] fix badcase and add env info ( #1491 )
...
* update
* update
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-05 16:43:45 +08:00
zhulinJulia24
fb6a0df652
[ci] fix test env for vllm and add vllm baselines ( #1481 )
...
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-04 19:24:09 +08:00
Linchen Xiao
da74cbfa39
[Fix] Model configs update
2024-09-04 18:57:10 +08:00
Linchen Xiao
9693be46b7
[Feature] Mmlu-pro auto-download ( #1464 )
...
* update
* update
* update
* update
* update
2024-08-30 10:03:40 +08:00
Alexander Lam
8b39225259
[Feature] Added extra_body support for OpenAISDK; Added support for proxy URL when connecting to OpenAI's API. ( #1467 )
...
* fix lint issues
* fix lint issues
2024-08-29 00:43:43 +08:00
Guoli Yin
a488b9b4f5
[Feature] Make OPENAI_API_BASE compatible with openai default env ( #1461 )
...
* Make OPENAI_API_BASE compatible with openai default env
* Make OPENAI_API_BASE compatible with openai default env
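One way to read this change is as a lookup order for the API base URL: an explicit setting first, then the `OPENAI_API_BASE` environment variable, then a built-in default. The sketch below assumes that order; `resolve_api_base` and `DEFAULT_API_BASE` are hypothetical names, and the default value is only a placeholder.

```python
import os

# Placeholder default, used only for illustration.
DEFAULT_API_BASE = 'https://api.openai.com/v1'


def resolve_api_base(explicit=None, env=None):
    """Pick the API base: explicit argument, then env var, then default."""
    env = os.environ if env is None else env
    if explicit:
        return explicit
    return env.get('OPENAI_API_BASE', DEFAULT_API_BASE)


print(resolve_api_base(env={'OPENAI_API_BASE': 'http://localhost:8000/v1'}))
```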
---------
Co-authored-by: Guoli Yin <gyin@icloud.com>
2024-08-28 23:14:41 +08:00
Songyang Zhang
e5a8eb2283
[Feature] Update Lint and Leaderboard ( #1458 )
...
* [Feature] Update Lint and Leaderboard
* Update
* Update
2024-08-28 22:36:42 +08:00
Linchen Xiao
245664f4c0
[Feature] Fullbench v0.1 language update ( #1463 )
...
* update
* update
* update
* update
2024-08-28 14:01:05 +08:00
CHEN PENGAN
463231c651
[Feature] Add icl_sliding_k_retriever.py and update __init__.py ( #1305 )
...
* Add icl_sliding_k_retriever.py and update __init__.py
* Fix flake8, isort, and yapf issues for Sliding Window Retriever
2024-08-23 17:18:31 +08:00
Linchen Xiao
94b6bd65fc
[Fix] Fix cli evaluation for multiple models ( #1454 )
...
* update
* update
2024-08-23 17:15:36 +08:00
Songyang Zhang
5485207fbe
[Bump] Bump version to 0.3.1 ( #1450 )
...
* [Bump] Bump version 0.3.1
* Update
2024-08-23 10:47:57 +08:00
Songyang Zhang
7c2d25b557
[Fix] Update SciCode and Gemma model ( #1449 )
...
* [Fix] Update SciCode and Gemma model
* Update
* Update
2024-08-23 10:42:27 +08:00
Xu Song
ad3931aa32
Update openicl_infer.py ( #1308 )
2024-08-23 10:39:22 +08:00
liushz
9fdbc744dc
[Fix] Update option postprocess & mathbench language summarizer ( #1413 )
...
* Update option postprocess & mathbench language summarizer
* Update option postprocess & mathbench language summarizer
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-08-22 14:49:07 +08:00
Linchen Xiao
0fe9756c5d
[Doc] Update Readme ( #1439 )
...
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
2024-08-22 14:48:45 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode ( #1417 )
...
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode w/ bg
* add scicode
* Update README.md
* Update README.md
* Delete configs/eval_SciCode.py
* rename
* 1
* rename
* Update README.md
* Update scicode.py
* Update scicode.py
* fix some bugs
* Update
* Update
---------
Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
liushz
d3963bceae
[Bug] Add model support for 'huggingface_above_v4_33' when using '-a' ( #1430 )
...
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-08-22 13:40:24 +08:00
seetimee
ac093fce53
[Update] Update openai_api.py ( #1438 )
...
Most models' token limits are above 32k. This fixes a bug where some data was skipped when testing long-context datasets.
2024-08-21 18:57:49 +08:00
liushz
e076dc5acf
[Fix] Fix openai api tiktoken bug for api server ( #1433 )
...
* Fix openai api tiktoken
* Fix openai api tiktoken
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-08-20 22:02:14 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets ( #1310 )
...
* [Feature] Add Ruler datasets
* pre-commit fixed
* Add model specific tokenizer to dataset
* pre-commit modified
* remove unused import
* fix linting
* add trust_remote to tokenizer load
* lint fix
* comments resolved
* fix lint
* Add readme
* Fix lint
* ruler refactorize
* fix lint
* lint fix
* updated
* lint fix
* fix wonderwords import issue
* prompt modified
* update
* readme updated
* update
* ruler dataset added
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Xu Song
99b5122ed5
[Feature] Add abbr for rolebench dataset ( #1431 )
...
* Add abbr for rolebench dataset
* add
2024-08-20 11:22:48 +08:00
Linchen Xiao
ecf9bb3e4c
[Bug] Commonsenseqa dataset fix ( #1425 )
...
* longbench dataset load fix
* update
* Update
* Update
* Update
* update
* update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-16 15:54:07 +08:00
Songyang Zhang
9b3613f10b
[Update] Support auto-download of FOFO/MT-Bench-101 ( #1423 )
...
* [Update] Support auto-download of FOFO/MT-Bench-101
* Update wildbench
2024-08-16 11:57:41 +08:00
bittersweet1999
ce7f4853ce
[Fix] Sub summarizer order fix ( #1426 )
...
* fix pip version
* fix pip version
* fix sub summarizer order
* fix order
2024-08-15 21:08:18 +08:00
Linchen Xiao
2596f226f4
[Fix] longbench dataset load fix ( #1422 )
2024-08-15 11:30:30 +08:00
Linchen Xiao
8e55c9c6ee
[Update] Compassbench v1.3 ( #1396 )
...
* stash files
* compassbench subjective evaluation added
* evaluation update
* fix lint
* update docs
* Update lint
* changes saved
* changes saved
* CompassBench subjective summarizer added (#1349 )
* subjective summarizer added
* fix lint
[Fix] Fix MathBench (#1351 )
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
[Update] Update model support list (#1353 )
* fix pip version
* fix pip version
* update model support
subjective summarizer updated
knowledge, math objective done (data need update)
remove secrets
objective changes saved
knowledge data added
* secrets removed
* changed added
* summarizer modified
* summarizer modified
* compassbench coding added
* fix lint
* objective summarizer updated
* compass_bench_v1.3 updated
* update files in config folder
* remove unused model
* lcbench modified
* removed model evaluation configs
* remove duplicated sdk implementation
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-08-12 19:09:19 +08:00
changyeyu
59586a8b4a
[Feature] Enable Truncation of Mid-Section for Long Prompts in huggingface_above_v4_33.py ( #1373 )
...
* Retain the first and last halves of the tokens from the prompt, discarding the middle, to avoid exceeding the model's maximum length.
* Add default parameter: mode
* Modified a comment.
* Modified variable names.
* fix yapf lint
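The mid-section truncation described in the first bullet above can be sketched like this: keep the first and last parts of the token sequence and drop the middle so the result fits within a length budget. `truncate_middle` is a hypothetical helper, not the code added to huggingface_above_v4_33.py, and how that file splits an odd budget is not stated here.

```python
def truncate_middle(token_ids, max_len):
    """Keep the head and tail of token_ids, dropping the middle.

    If the sequence already fits, it is returned unchanged. For an odd
    max_len the extra token goes to the tail (an arbitrary choice here).
    """
    if max_len <= 0:
        return []
    if len(token_ids) <= max_len:
        return list(token_ids)
    head = max_len // 2
    tail = max_len - head
    return list(token_ids[:head]) + list(token_ids[-tail:])


print(truncate_middle(list(range(10)), 4))
```

Keeping both ends preserves the instruction at the start of a prompt and the question at the end, which plain right-truncation would lose.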
2024-08-09 11:36:30 +08:00
Songyang Zhang
88eb91219b
[Doc] Update README ( #1404 )
...
* [Doc] Update README
* Update
2024-08-08 16:18:33 +08:00
yaoyingyy
decb621ff6
[Fix] the issue where scores are negative in the Lawbench dataset evaluation ( #1402 ) ( #1403 )
2024-08-08 16:08:26 +08:00
Yunlin Mao
818d72a650
[Fix] modelscope dataset load problem ( #1406 )
...
* fix modelscope dataset load
* fix lint
2024-08-08 14:01:06 +08:00
Songyang Zhang
264fd23129
[Bump] Bump version for v0.3.0 ( #1398 )
2024-08-07 01:25:24 +08:00
Songyang Zhang
fed1a4998b
[Fix] Fix CaLM import ( #1395 )
2024-08-06 12:17:45 +08:00
Songyang Zhang
c81329b548
[Fix] Fix Slurm ENV ( #1392 )
...
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
2024-08-06 01:35:20 +08:00
Songyang Zhang
c09fc79ba8
[Feature] Support OpenAI ChatCompletion ( #1389 )
...
* [Feature] Support import configs/models/summarizers from whl
* Update
* Update openai sdk
* Update
* Update gemma
2024-08-01 19:10:13 +08:00
Peng Bo
07c96ac659
Calm dataset ( #1385 )
...
* Add CALM Dataset
2024-08-01 10:03:21 +08:00
Songyang Zhang
46cc7894e1
[Feature] Support import configs/models/summarizers from whl ( #1376 )
...
* [Feature] Support import configs/models/summarizers from whl
* Update LCBench configs
* Update
* Update
* Update
* Update
* update
* Update
* Update
* Update
* Update
* Update
2024-08-01 00:42:48 +08:00
Songyang Zhang
33ceaa0eb8
[Bug] Fix bug in turbomind ( #1377 )
2024-07-30 09:37:50 +08:00
Songyang Zhang
eee5a5be23
[Fix] Update get_data_path for LCBench and HumanEval ( #1375 )
2024-07-29 19:28:09 +08:00
Songyang Zhang
704853e5e7
[Feature] Update pip install ( #1324 )
...
* [Feature] Update pip install
* Update Configuration
* Update
* Update
* Update
* Update Internal Config
* Update collect env
2024-07-29 18:32:50 +08:00
Xingjun.Wang
edab1c07ba
[Feature] Support ModelScope datasets ( #1289 )
...
* add ceval, gsm8k modelscope support
* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest
* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets
* format file
* format file
* update dataset format
* support ms_dataset
* update dataset for modelscope support
* merge myl_dev and update test_ms_dataset
* update dataset for modelscope support
* update readme
* update eval_api_zhipu_v2
* remove unused code
* add get_data_path function
* update readme
* remove tydiqa japanese subset
* add ceval, gsm8k modelscope support
* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest
* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets
* format file
* format file
* update dataset format
* support ms_dataset
* update dataset for modelscope support
* merge myl_dev and update test_ms_dataset
* update readme
* update dataset for modelscope support
* update eval_api_zhipu_v2
* remove unused code
* add get_data_path function
* remove tydiqa japanese subset
* update util
* remove .DS_Store
* fix md format
* move util into package
* update docs/get_started.md
* restore eval_api_zhipu_v2.py, add environment setting
* Update dataset
* Update
* Update
* Update
* Update
---------
Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-29 13:48:32 +08:00
jxd
12b84aeb3b
[Feature] Update CHARM Memorization ( #1230 )
...
* update gemini api and add gemini models
* add openai models
* update CHARM evaluation
* add CHARM memorization tasks
* add CharmMemSummarizer (output eval details for memorization-independent reasoning analysis)
* update CHARM readme
---------
Co-authored-by: wujiang <wujiang@pjlab.org.cn>
2024-07-26 18:42:30 +08:00
bittersweet1999
d3782c1d47
Revert "Calm dataset ( #1287 )" ( #1366 )
...
This reverts commit edd0ffdf70.
2024-07-26 18:27:29 +08:00
Peng Bo
edd0ffdf70
Calm dataset ( #1287 )
...
* add calm dataset
* modify config max_out_len
* update README
* Modify README
* update README
* update README
* update README
* update README
* update README
* add summarizer and modify readme
* delete summarizer config comment
* update summarizer
* modify same response to all questions
* update README
2024-07-26 11:48:16 +08:00
mqy004
a08931f214
[Fix] origin_prompt should be None in llm-compression task ( #1225 )
...
Co-authored-by: Qinyang Mou <qinyang_mou@intsig.net>
2024-07-26 11:46:02 +08:00
LeavittLang
8ee7fecb68
Adding support for Doubao API ( #1218 )
...
* Adding support for Doubao API
* Update doubao_api.py
Fixed the bug that the connection would be retried even if it was normal.
* Update doubao_api.py
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:44:51 +08:00
klein
65fad8e2ac
[Fix] minor update wildbench ( #1335 )
...
* update crb
* update crbbench
* update crbbench
* update crbbench
* minor update wildbench
* [Fix] Update doc of wildbench, and merge wildbench into subjective
* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench
* Update crb.md
* Update crb_pair_judge.py
* Update crb_single_judge.py
* Update subjective_evaluation.md
* Update openai_api.py
* [Update] update wildbench readme
* [Update] update wildbench readme
* [Update] update wildbench readme, remove crb
* Delete configs/eval_subjective_wildbench_pair.py
* Delete configs/eval_subjective_wildbench_single.py
* Update __init__.py
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:19:04 +08:00
baymax591
51a94aee01
[Bug] fix bug: delete & ( #1365 )
...
Co-authored-by: 白超 <baichao19@huawei.com>
2024-07-26 11:03:55 +08:00
Mo Li
69aa2f2d57
[Feature] Make NeedleBench available on HF ( #1364 )
...
* update_lint
* update_huggingface format
* fix bug
* update docs
2024-07-25 19:01:56 +08:00
Fengzhe Zhou
c3c02c2960
update docs ( #1318 )
...
* update docs
* Efficient Evaluation -> Data Sharding
* update
* update
* Update faq.md
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-25 18:44:25 +08:00
heya5
73aa55af6d
[Fix] Support HF models deployed with an OpenAI-compatible API. ( #1352 )
...
* Support HF models deployed with an OpenAI-compatible API.
* resolve lint issue
* add extra_body arguments
Many other arguments are available when using an OpenAI-compatible API, for example: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters-for-chat-api
* fix linting issue
* fix yapf linting issue
2024-07-25 18:38:23 +08:00
WANG WENJIN
0aad8199c7
Fix the summary error in subjective.py ( #1363 )
2024-07-25 18:36:13 +08:00
Linchen Xiao
8127fc3518
CompassBench subjective summarizer added ( #1349 )
...
* subjective summarizer added
* fix lint
2024-07-23 12:29:57 +08:00
Que Haoran
a244453d9e
[Feature] Support inference ppl datasets ( #1315 )
...
* commit inference ppl datasets
* revised format
* revise
* revise
* revise
* revise
* revise
* revise
2024-07-22 17:59:30 +08:00
liushz
98c58f8a6c
[Feature] Add compassbench knowledge&math part ( #1342 )
...
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Update accelerator
* Update MathBench
* Update accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Update compassbench august wiki&math
* Update compassbench august wiki&math
* Update compassbench august wiki&math
* Update compassbench_aug_gen_068af0.py
* Update compassbench_aug_gen_068af0.py
* Update
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 22:54:46 +08:00
bittersweet1999
1f9f728f22
[Feature] support compassbench Checklist evaluation ( #1339 )
...
* fix pip version
* fix pip version
* support checklist eval
* init
* add lan
* fix typo
2024-07-19 16:40:44 +08:00
Mo Li
f40add2596
[Fix] Fix lint ( #1334 )
...
* update needlebench docs
* update model_name_mapping dict
* update README
* fix_lint
2024-07-18 17:15:06 +08:00
Xu Song
1bfb4217ff
Fix typing and typo ( #1331 )
2024-07-18 13:41:24 +08:00
Mo Li
104bddf647
[Doc] Update NeedleBench Docs ( #1330 )
...
* update needlebench docs
* update model_name_mapping dict
* update README
* Update README_zh-CN.md
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-07-18 13:16:19 +08:00
bittersweet1999
8e7ad2e981
[Fix] add bc for alignbench summarizer ( #1306 )
...
* fix pip version
* fix pip version
* fix alignbench
* fix import error
2024-07-12 11:06:20 +08:00
Fengzhe Zhou
62f55987f1
force register ( #1311 )
2024-07-11 19:59:35 +08:00
Fengzhe Zhou
a62c613d3e
[Sync] bump version 0.2.6+local ( #1294 )
2024-07-06 00:44:06 +08:00
Fengzhe Zhou
1d3a26c732
[Doc] quick start swap tabs ( #1263 )
...
* [doc] quick start swap tabs
* update docs
* update
* update
* update
* update
* update
* update
* update
2024-07-05 23:51:42 +08:00
bittersweet1999
68ca48496b
[Refactor] Reorganize subjective eval ( #1284 )
...
* fix pip version
* fix pip version
* reorganize subjective eval
* reorg sub
* reorg subeval
* reorg subeval
* update subjective doc
* reorg subeval
* reorg subeval
2024-07-05 22:11:37 +08:00
baymax591
28eba6fe34
NPU adaptation ( #1250 )
...
* NPU adaptation
* Add support for Ascend NPU
* format
---------
Co-authored-by: baymax591 <14428251+baymax591@user.noreply.gitee.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-07-03 18:55:19 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 ( #1279 )
2024-06-28 14:16:34 +08:00
Xingyuan Bu
842fb1cd70
Update mtbench101.py ( #1276 )
...
fix incorrectly used import
from torch.utils.data import DataLoader, Dataset
2024-06-26 00:40:22 +08:00
klein
1fa62c4a42
Support wildbench ( #1266 )
...
Co-authored-by: Leymore <zfz-960727@163.com>
2024-06-24 13:16:27 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo ( #1224 )
...
* add fofo dataset
* add dataset fofo
2024-06-06 11:40:48 +08:00
Xingyuan Bu
02a0a4e857
MT-Bench-101 ( #1215 )
...
* add mt-bench-101
* add readme and requirements
* add mt-bench-101 data
* Update readme_mtbench101.md
* update readme
* update leaderboard
* fix typo
* Update readme_mtbench101.md
* fit newest opencompass
* update readme.md
* mtbench101 to opencompass
* mtbench101 to opencompass
* for code review
* for code review
* for code review
* hook
* hook
---------
Co-authored-by: liujie <ljie@buaa.edu.cn>
2024-06-03 14:52:12 +08:00
mqy004
b272803d8a
Fix opencompass.cli.main not being importable after installing the release version ( #1221 )
...
* Create __init__.py
* Create __init__.py
* Create __init__.py
* Create __init__.py
* Create __init__.py
* Create __init__.py
* format
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-31 13:23:33 +08:00
bittersweet1999
7c381e5be8
[Fix] fix summarizer ( #1217 )
...
* fix summarizer
* fix summarizer
2024-05-31 11:40:47 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format ( #1214 )
2024-05-30 00:21:58 +08:00
Fengzhe Zhou
d656e818f8
[Docs] Remove --no-batch-padding and Use --hf-num-gpus ( #1205 )
...
* [Docs] Remove --no-batch-padding and Use --hf-num-gpus
* update
2024-05-29 16:30:10 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version ( #1204 )
2024-05-28 23:09:59 +08:00
liushz
ba620c4afe
Update accelerator ( #1195 )
...
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Update accelerator
* Update MathBench
* Update accelerator
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-28 17:17:54 +08:00
jxd
608ff5810d
support CHARM ( https://github.com/opendatalab/CHARM ) reasoning tasks ( #1190 )
...
* support CHARM (https://github.com/opendatalab/CHARM ) reasoning tasks
* fix lint error
* add dataset card for CHARM
* minor refactor
* add txt
---------
Co-authored-by: wujiang <wujiang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-27 13:48:22 +08:00
bittersweet1999
88c14d3d04
add support for lmdeploy api judge ( #1193 )
2024-05-24 23:28:56 +08:00
yaoyingyy
749e4cea71
[Fix] temporary files using tempfile ( #1186 )
...
Co-authored-by: yaoying <yaoying@kingsoft.com>
2024-05-24 23:27:37 +08:00
Fengzhe Zhou
2b3d4150f3
[Sync] update evaluator ( #1175 )
2024-05-21 14:22:46 +08:00
Fengzhe Zhou
5de85406ce
[Sync] add OC16 entry ( #1171 )
2024-05-17 16:50:58 +08:00
Fengzhe Zhou
8ea2c404d7
[Feat] enable HuggingFacewithChatTemplate with --accelerator via cli ( #1163 )
...
* enable HuggingFacewithChatTemplate with --accelerator via cli
* rm vllm_internlm2_chat_7b
2024-05-15 21:51:07 +08:00
liushz
e3c0448bbc
Update accelerator ( #1152 )
...
* Update accelerator
* update run
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2024-05-15 14:31:47 +08:00
Fengzhe Zhou
f10dd48f9c
[Fix] Update stop_words in huggingface_above_v4_33 ( #1160 )
2024-05-15 14:10:33 +08:00
Fengzhe Zhou
80f831b425
[Fix] use ProcessPoolExecutor during mbpp eval ( #1159 )
2024-05-15 13:48:29 +08:00
bittersweet1999
8a8987be0b
fix arenahard summarizer ( #1154 )
...
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-15 13:31:29 +08:00
Fengzhe Zhou
62dbf04708
[Sync] update github workflow ( #1156 )
2024-05-14 22:42:23 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template ( #1098 )
...
* add TheoremQA with 5-shot
* add huggingface_above_v4_33 classes
* use num_worker partitioner in cli
* update theoremqa
* update TheoremQA
* add TheoremQA
* rename theoremqa -> TheoremQA
* update TheoremQA output path
* rewrite many model configs
* update huggingface
* further update
* refine configs
* update configs
* update configs
* add configs/eval_llama3_instruct.py
* add summarizer multi faceted
* update bbh datasets
* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py
* rename class
* update readme
* update hf above v4.33
2024-05-14 14:50:16 +08:00
Mo Li
6c711cb262
[Fix] Fix Needlebench Summarizer ( #1143 )
...
* update few-shot example
* add 128k
2024-05-13 15:59:34 +08:00
bittersweet1999
833a35140b
[Fix] fix alpacaeval while adding caching path ( #1139 )
...
* fix alpacaeval
* fix alpacaeval
2024-05-11 14:02:26 +08:00
Fengzhe Zhou
19d7e630d6
[Sync] Update accelerator ( #1122 )
...
(cherry picked from commit 4beb6d9ab655d8a626971841b7acfd9fae9d438f)
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-09 14:32:31 +08:00
bittersweet1999
826d8307ac
fix links ( #1120 )
2024-05-08 15:13:18 +08:00
JuhaoLiang
d2c40e5648
[Feature] Add AceGPT-MMLUArabic benchmark ( #1099 )
...
* add AceGPT-MMLUArabic benchmark
* update readme and fix lint issue
* remove unused package
* add MMLUArabic zero-shot settings
* rename filename and update readme
2024-05-08 15:00:26 +08:00
Fangyu Lei
862044fb7d
[Feature] Add S3Eval Dataset ( #916 )
...
* s3eval_branch
* update s3eval
2024-05-06 19:41:52 +08:00
Yggdrasill7D6
af10ecc272
add mgsm datasets ( #1081 )
...
* add mgsm datasets
* fix lint
* fix lint
* update mgsm
* update mgsm
* ease code spell
* update
* update
* update
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 15:29:34 +08:00
klein
153c4fc988
[Feature] update drop dataset from openai simple eval ( #1092 )
...
* [Feature] update drop dataset from openai simple eval
* update drop template presentation
* update
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 13:37:08 +08:00
Fengzhe Zhou
d43392a3bb
[Feature] Add mmlu prompt from simple_evals, openai ( #1074 )
...
* add mmlu prompt from simple_evals, openai
* return empty str on failure
2024-05-06 13:26:26 +08:00
Yang Yong
53fe390454
fix LightllmApi workers bug ( #1113 )
2024-04-30 22:09:22 +08:00
Alexander Lam
35c94d0cde
[Feature] Adding support for LLM Compression Evaluation ( #1108 )
...
* fixed formatting based on pre-commit tests
* fixed typo in comments; reduced the number of models in the eval config
* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset
* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
2024-04-30 10:51:01 +08:00