abrohamLee
e9e4b69ddb
[Feature] MuSR Dataset Evaluation ( #1689 )
...
* MuSR Dataset Evaluation
* MuSR Dataset Evaluation
Add an assertion and a Readme.md
2024-11-14 20:42:12 +08:00
Linchen Xiao
d415439f9b
[Fix] Fix bug for first_option_postprocess ( #1688 )
2024-11-14 16:45:59 +08:00
Linchen Xiao
e92a5d4230
[Feature] BABILong Dataset added ( #1684 )
...
* update
* update
* update
* update
2024-11-14 15:32:43 +08:00
Linchen Xiao
2fee63f537
[Update] Auto-download for followbench ( #1685 )
2024-11-13 15:47:29 +08:00
zhulinJulia24
f8a1c1f487
[CI] update ( #1682 )
...
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-11-13 10:48:05 +08:00
bittersweet1999
aca8ec3c6a
[Hotfix] Hotfix ( #1683 )
...
* fix pip version
* fix pip version
* fix lint
* hotfix
2024-11-13 10:14:27 +08:00
zhulinJulia24
a9d6b6461f
[ci] react daily test ( #1668 )
...
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* refactor summarize
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* Update daily-run-test.yml
* Update daily-run-test.yml
* update
* update
* update
* update
* update
* Update daily-run-test.yml
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* Update daily-run-test.yml
* Update daily-run-test.yml
* update
* update
* Update daily-run-test.yml
* update
* update
* update
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-11-12 18:40:27 +08:00
sobeit
3ec178f4a9
Add single LoRA adapter support for vLLM inference ( #1679 )
2024-11-12 17:31:36 +08:00
bittersweet1999
17b5e52f6c
[Hotfix] lmdeploy temp ( #1674 )
...
* fix pip version
* fix pip version
* hotfix
2024-11-12 16:10:16 +08:00
Linchen Xiao
a0ef2fd3b4
[Update] Dingo Dataset update ( #1670 )
...
* [Update] Dingo Dataset update
* update
2024-11-08 14:38:43 +08:00
Linchen Xiao
835bf75a36
[Feature] Add long context evaluation for base models ( #1666 )
...
* [Update] Add base long context evaluation
* update
2024-11-08 10:53:29 +08:00
Chang Cheng
fd7aa83c01
[Update] Update DLC Runner ( #1662 )
...
* push interntrain hard code
* push interntrain hard code
* remove redundant post process
---------
Co-authored-by: changcheng <changcheng@pjlab.org.cb>
Co-authored-by: changcheng <changcheng@pjlab.org.cn>
2024-11-07 15:45:35 +08:00
Linchen Xiao
db258eb7d5
[Bump] Bump version to v0.3.5 ( #1657 )
2024-11-03 21:23:35 +08:00
Lyu Han
888f1f3bef
[Fix] Update loglikelihood compatibility ( #1659 )
2024-11-02 17:19:11 +08:00
liushz
f7d899823c
[Update] Update mmmlu_lite dataload ( #1658 )
...
* update mmmlu_lite dataload from oss
* update mmmlu_lite dataload from oss
2024-11-01 17:32:29 +08:00
Songyang Zhang
c789ce5698
[Fix] Fix automatic download for several datasets ( #1652 )
...
* [Fix] Fix automatic download for several datasets
* Update
* Update
* Update CI
2024-11-01 15:57:18 +08:00
Linchen Xiao
695738a89b
[Update] Add lmdeploy DeepSeek configs ( #1656 )
...
* [Update] Add lmdeploy DeepSeek configs
* update max out length
2024-11-01 15:34:23 +08:00
bittersweet1999
a0853c939d
[Add] Add CompassArenaSubjectiveBench ( #1645 )
...
* fix pip version
* fix pip version
* add compassarenasubjectivebench
* add compassarenasubjectivebench
* add compassarenabench
2024-11-01 13:52:22 +08:00
Songyang Zhang
d611907d14
[Doc] Update Doc ( #1655 )
2024-10-31 18:08:09 +08:00
Linchen Xiao
5212ffe8e2
[Update] Add new model configs ( #1653 )
2024-10-30 17:24:53 +08:00
Linchen Xiao
df57c08ccf
[Feature] Update Models, Summarizers ( #1600 )
2024-10-29 18:37:15 +08:00
Linchen Xiao
d91d66792a
[Update] Update Needlebench OSS path ( #1651 )
2024-10-29 18:05:44 +08:00
Chang Lan
46affab882
[Fix] Fix ruler_16k_gen ( #1643 )
2024-10-29 17:58:43 +08:00
Linchen Xiao
8172af49bb
[Update] Update wildbench max_seq_len ( #1648 )
...
* [Update] Wildbench max_seq_len update
* [Update] Wildbench max_seq_len update
2024-10-29 13:21:31 +08:00
Junnan Liu
645c5f3b2c
[Datasets] Add datasets CMO&AIME ( #1610 )
...
* add datasets cmo&aime
* delete unused modules
* modify prompt
* update __init__
* update data load and add README
* update data load
* update performance
* update md5
* remove indents
* add indent
* fix log for debug mode
2024-10-28 18:08:02 +08:00
Linchen Xiao
9c39cb68d4
[Bump] Bump version to 0.3.4 ( #1639 )
2024-10-25 20:10:16 +08:00
Linchen Xiao
a61e8a0803
[Update] Internal humaneval add ( #1641 )
...
* [Update] internal_humaneval_add
* update
2024-10-25 19:08:42 +08:00
Songyang Zhang
84be90669b
[Update] Fix name conflict in *_param.py; add keep_tmp_file flag to keep the temp config file ( #1640 )
2024-10-25 16:39:25 +08:00
BigDong
2542bc6907
[Feature] Support saving results as a Markdown table ( #1638 )
2024-10-25 15:50:33 +08:00
Linchen Xiao
22fdea4bf2
[Update] Update DLC runner ( #1637 )
2024-10-24 21:36:16 +08:00
Lyu Han
fb12c3f98a
[Update] strip stop_words ( #1635 )
2024-10-24 20:39:20 +08:00
Linchen Xiao
662dddf41a
[Update] Add internal humaneval postprocess ( #1636 )
2024-10-24 17:45:21 +08:00
Linchen Xiao
be3c06a158
[Fix] Update common summarizer regex extraction ( #1631 )
2024-10-22 14:35:45 +08:00
Chang Lan
a927bba1cf
[Fix] Fix RULER datasets ( #1628 )
...
We need to ensure that we don't import anything that ends with "_datasets",
or they will be picked up by the runner, leading to duplicate / unwanted datasets
being evaluated.
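
The effect described above can be illustrated with a minimal sketch. It assumes the runner gathers every list bound to a name ending in "_datasets"; the `collect_datasets` helper, the namespace dictionaries, and the dataset entries are hypothetical and are not taken from the actual runner code.

```python
def collect_datasets(namespace):
    """Collect every list bound to a name ending in '_datasets'.

    This mimics the suffix-based pickup rule assumed in the lead-in; it is
    not the real runner implementation.
    """
    collected = []
    for name, value in namespace.items():
        if name.endswith("_datasets") and isinstance(value, list):
            collected.extend(value)
    return collected


# Hypothetical dataset entries, for illustration only.
ruler_4k = [{"abbr": "ruler_4k"}]
ruler_8k = [{"abbr": "ruler_8k"}]

# Leaky config: the helper lists are imported under names ending in
# "_datasets" and the combined list is defined too, so each entry is
# collected twice.
leaky_namespace = {
    "ruler_4k_datasets": ruler_4k,
    "ruler_8k_datasets": ruler_8k,
    "ruler_combined_datasets": ruler_4k + ruler_8k,
}

# Fixed config: the helper lists use names without the suffix, so only the
# combined list matches the pickup rule.
fixed_namespace = {
    "ruler_4k_ds": ruler_4k,
    "ruler_8k_ds": ruler_8k,
    "ruler_combined_datasets": ruler_4k + ruler_8k,
}

leaky = [d["abbr"] for d in collect_datasets(leaky_namespace)]
fixed = [d["abbr"] for d in collect_datasets(fixed_namespace)]
print(leaky)
print(fixed)
```

Renaming the imported helper lists is only one way to avoid the suffix match; the point is that no unintended name ending in "_datasets" should be visible in the config namespace.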
2024-10-22 11:59:02 +08:00
Songyang Zhang
a4d5a6c81b
[Feature] Support LiveCodeBench ( #1617 )
...
* Update
* Update LCB
* Update
* Update
* Update
* Update
* Update
2024-10-21 20:50:39 +08:00
Chenguang Li
5868d5afa4
[Bug] Fix-NPU-Support ( #1618 )
...
* bugfix NPU support
* formatting
---------
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-10-21 17:42:53 +08:00
liushz
500b44ba2d
[Fix] gpqa_few_shot_ppl prompt bug ( #1627 )
2024-10-21 16:59:06 +08:00
Linchen Xiao
096c347e7d
[Fix] Qwen 2.5 model config ( #1626 )
...
* [Fix] Fix Qwen 2.5 model config
* [Fix] Fix Qwen 2.5 model config
* [Fix] Fix Qwen 2.5 model config
2024-10-21 16:58:18 +08:00
bittersweet1999
1188e1ecf0
[Update] eval_judgerbench.py ( #1625 )
2024-10-21 15:30:29 +08:00
zhulinJulia24
825d3388d5
[CI] Test PR staging fixed ( #1624 )
...
* Update oc_score_baseline.yaml
* Update runtime.txt
2024-10-21 11:02:37 +08:00
bittersweet1999
a11e2b2fd4
[Fix] Compatible with old versions ( #1616 )
...
* fix pip version
* fix pip version
* Compatible with old versions
* compatible with old versions
* compatible with old versions
* compatible with old versions
* update configs
2024-10-21 10:16:29 +08:00
Lyu Han
6e8adf5221
[Bug] Remove prefix bos_token from messages when using lmdeploy as the accelerator ( #1623 )
...
* remove prefix bos_token from messages when using lmdeploy as the accelerator
* update
2024-10-19 20:03:47 +08:00
zhulinJulia24
b89c7b2fc3
[CI] Update daily-run-test.yml ( #1620 )
2024-10-18 18:30:35 +08:00
Bob Tsang
dd0b655bd0
[Feature] Support MMMLU & MMMLU-lite Benchmark ( #1565 )
...
* rm folder
* modify format according to reviewer
* modify format according to reviewer
* modify format according to reviewer
* add some required files
* fix some bugs
* fix bug
* change load type
* Update MMMLU Dataset
* Update MMMLU Dataset
* Add MMMLU-Lite Dataset
* update MMMLU dataset
* update MMMLU dataset
* update MMMLU dataset
---------
Co-authored-by: BobTsang <BobTsang1995@gmail.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-10-17 19:09:34 +08:00
bittersweet1999
f0d436496e
[Update] update docs and add compassarena ( #1614 )
...
* fix pip version
* fix pip version
* update docs and add compassarena
* update docs
2024-10-17 14:39:06 +08:00
Haoran Que
4fe251729b
Upload HelloBench ( #1607 )
...
* upload hellobench
* update hellobench
* update readme.md
* update eval_hellobench.py
* update latest
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-10-15 17:11:37 +08:00
bittersweet1999
fa54aa62f6
[Feature] Add Judgerbench and reorg subeval ( #1593 )
...
* fix pip version
* fix pip version
* update (#1522 )
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
* [Feature] Update Models (#1518 )
* Update Models
* Update
* Update humanevalx
* Update
* Update
* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527 )
add judgerbench and reorg sub
add judgerbench and reorg subeval
add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-10-15 16:36:05 +08:00
x54-729
2b1afa7d1e
[Fix] fix interntrain's tokenizer truncate ( #1605 )
...
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-10-15 16:03:57 +08:00
zhulinJulia24
8aba547e06
[ci] fix stability issue of daily test ( #1602 )
...
* update
* update
* update
* Update daily-run-test.yml
* update
* Update daily-run-test.yml
* update
* update
* update
* Update pr-run-test.yml
* Update pr-run-test.yml
* update
* update
* Update daily-run-test.yml
* update
* update
* update
* update
* Update daily-run-test.yml
* Update daily-run-test.yml
* update
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-10-15 10:14:49 +08:00
Linchen Xiao
f390697a5e
[Fix] Update dlc runner python env ( #1604 )
2024-10-14 15:50:21 +08:00