932 Commits

Author SHA1 Message Date
liushz
c437135fad
[Feature] Add Openai Simpleqa dataset (#1720)
* Add Openai SimpleQA dataset

* Add Openai SimpleQA dataset

* Add Openai SimpleQA dataset

* Update eval_simpleqa.py

---------

Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-11-28 19:16:07 +08:00
liushz
06ab27861e
[Fix] Fix pmmeval_gen config (#1719)
* Update with PMMEval

* Update

* Update __init__.py

* Fix Bugs

* Delete .pre-commit-config.yaml

* Pull merge

* Fix pmmeval_gen config

---------

Co-authored-by: wanyu <wanyu2018umac@gmail.com>
Co-authored-by: wanyu2018umac <42405907+wanyu2018umac@users.noreply.github.com>
2024-11-28 11:53:36 +08:00
wanyu2018umac
90efcf2216
[Feature] Add P-MMEval (#1714)
* Update with PMMEval

* Update

* Update __init__.py

* Fix Bugs

* Delete .pre-commit-config.yaml

* Pull merge

---------

Co-authored-by: liushz <qq1791167085@163.com>
2024-11-27 21:26:18 +08:00
Junnan Liu
f7dbe6bb7d
[Feature] Add Arc Prize Public Evaluation (#1690)
* support arc prize

* update arc-prize dataset info & update arc-prize evaluation performance
2024-11-27 15:44:41 +08:00
Yi Ding
bcb707dbfc
[Fix] Fix BailingAPI model (#1707)
* [fix] sequence under the multiple samples

* resolve the lint problems

* change the parameter name

* add another error code for retry

* output the log for invalid response

* format correction

* update

* update

* update

* update

* add two model python files

* update the default parameter

* use random for delay

* update the api example of bailing

* remove the unnecessary parameter
2024-11-26 19:24:47 +08:00
Linchen Xiao
ef695e28e5
[Bug] Fix Korbench dataset module (#1717) 2024-11-26 17:13:28 +08:00
Songyang Zhang
f97c4eae42
[Update] Update Fullbench (#1712)
* Update JuderBench

* Support O1-style Prompts

* Update Code
2024-11-26 14:26:55 +08:00
Yufeng Zhao
300adc31e8
[Feature] Add Korbench dataset (#1713)
* first version for korbench

* first stage for korbench

* korbench_1

* korbench_1

* korbench_1

* korbench_1

* korbench_1_revised

* korbench_combined_1

* korbench_combined_1

* kor_combined

* kor_combined

* update

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2024-11-25 20:11:27 +08:00
Chang Lan
5c1916ea4c
[Update] Add RULER 64k config (#1709) 2024-11-25 19:35:27 +08:00
liushz
e49fcfd3a3
[Update] Update MATH dataset with model judge (#1711)
* Update math with llm judge

* Update math with llm judge

* Update math with llm judge

* Update math with llm judge

* Update math with llm judge
2024-11-25 15:14:55 +08:00
Linchen Xiao
80e3b9ef37
[Update] Add math prm 800k (#1708) 2024-11-21 21:29:43 +08:00
Linchen Xiao
500fb1032a
[Update] Update configurations (#1704) 2024-11-21 16:51:18 +08:00
zhulinJulia24
ed81f9df30
[CI] update torch version and add more datasets into daily testcase (#1701)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-11-21 10:37:33 +08:00
Yi Ding
05044dfaf2
[Update] Support new error code for Bailing model (#1702)
* support new error code

* fix the lint problems
2024-11-20 16:40:22 +08:00
Linchen Xiao
ff831b153e
[BUMP] Bump version to 0.3.6 (#1694) 2024-11-18 20:24:50 +08:00
Linchen Xiao
ab8fdbbaab
[Update] Update Math auto-download data (#1700) 2024-11-18 20:24:35 +08:00
Linchen Xiao
98242ff1d1
[Update] first_option_postprocess (#1699)
* update first_option_postprocess

* update
2024-11-18 20:14:29 +08:00
Linchen Xiao
4653f6976e
[Update] update volc CPU flavor (#1698) 2024-11-18 12:33:51 +08:00
zhulinJulia24
4a20e1176d
[CI] Update baselines (#1693)
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-11-15 14:46:29 +08:00
Linchen Xiao
40a9f0be0d
[Update] MUSR dataset config prefix update (#1692) 2024-11-15 11:06:30 +08:00
abrohamLee
e9e4b69ddb
[Feature] MuSR Dataset Evaluation (#1689)
* MuSR Dataset Evaluation

* MuSR Dataset Evaluation

Add an assertion and a README.md
2024-11-14 20:42:12 +08:00
Linchen Xiao
d415439f9b
[Fix] Fix bug for first_option_postprocess (#1688) 2024-11-14 16:45:59 +08:00
Linchen Xiao
e92a5d4230
[Feature] BABILong Dataset added (#1684)
* update

* update

* update

* update
2024-11-14 15:32:43 +08:00
Linchen Xiao
2fee63f537
[Update] Auto-download for followbench (#1685) 2024-11-13 15:47:29 +08:00
zhulinJulia24
f8a1c1f487
[CI] update (#1682)
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-11-13 10:48:05 +08:00
bittersweet1999
aca8ec3c6a
[Hotfix] Hotfix (#1683)
* fix pip version

* fix pip version

* fix lint

* hotfix
2024-11-13 10:14:27 +08:00
zhulinJulia24
a9d6b6461f
[ci] react daily test (#1668)
* updaste

* update

* update

* update

* update

* update

* update

* update

* update

* update

* updaste

* update

* update

* refactor summarize

* update

* update

* update

* update

* update

* updaste

* update

* update

* update

* update

* updaste

* update

* update

* update

* update

* update

* updaste

* updaste

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update daily-run-test.yml

* Update daily-run-test.yml

* update

* update

* update

* update

* update

* Update daily-run-test.yml

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update daily-run-test.yml

* Update daily-run-test.yml

* update

* update

* Update daily-run-test.yml

* update

* update

* update

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-11-12 18:40:27 +08:00
sobeit
3ec178f4a9
add single lora adapter support for vLLM inference. (#1679) 2024-11-12 17:31:36 +08:00
bittersweet1999
17b5e52f6c
[Hotfix] lmdeploy temp (#1674)
* fix pip version

* fix pip version

* hotfix
2024-11-12 16:10:16 +08:00
Linchen Xiao
a0ef2fd3b4
[Update] Dingo Dataset update (#1670)
* [Update] Dingo Dataset update

* update
2024-11-08 14:38:43 +08:00
Linchen Xiao
835bf75a36
[Feature] Add long context evaluation for base models (#1666)
* [Update] Add base long context evaluation

* update
2024-11-08 10:53:29 +08:00
Chang Cheng
fd7aa83c01
[Update] Update DLC Runner (#1662)
* push interntrain hard code

* push interntrain hard code

* remove redundant post process

---------

Co-authored-by: changcheng <changcheng@pjlab.org.cb>
Co-authored-by: changcheng <changcheng@pjlab.org.cn>
2024-11-07 15:45:35 +08:00
Linchen Xiao
db258eb7d5
[Bump] Bump version to v0.3.5 (#1657) 2024-11-03 21:23:35 +08:00
Lyu Han
888f1f3bef
[Fix] Update loglikelihood compatibility (#1659) 2024-11-02 17:19:11 +08:00
liushz
f7d899823c
[Update] Update mmmlu_lite dataload (#1658)
* update mmmlu_lite dataload from oss

* update mmmlu_lite dataload from oss
2024-11-01 17:32:29 +08:00
Songyang Zhang
c789ce5698
[Fix] the automatically download for several datasets (#1652)
* [Fix] the automatically download for several datasets

* Update

* Update

* Update CI
2024-11-01 15:57:18 +08:00
Linchen Xiao
695738a89b
[Update] Add lmdeploy DeepSeek configs (#1656)
* [Update] Add lmdeploy DeepSeek configs

* update max out length
2024-11-01 15:34:23 +08:00
bittersweet1999
a0853c939d
[Add] Add CompassArenaSubjectiveBench (#1645)
* fix pip version

* fix pip version

* add compassarenasubjectivebench

* add compassarenasubjectivebench

* add compassarenabench
2024-11-01 13:52:22 +08:00
Songyang Zhang
d611907d14
[Doc] Update Doc (#1655) 2024-10-31 18:08:09 +08:00
Linchen Xiao
5212ffe8e2
[Update] Add new model configs (#1653) 2024-10-30 17:24:53 +08:00
Linchen Xiao
df57c08ccf
[Feature] Update Models, Summarizers (#1600) 2024-10-29 18:37:15 +08:00
Linchen Xiao
d91d66792a
[Update] Update Needlebench OSS path (#1651) 2024-10-29 18:05:44 +08:00
Chang Lan
46affab882
[Fix] Fix ruler_16k_gen (#1643) 2024-10-29 17:58:43 +08:00
Linchen Xiao
8172af49bb
[Update] Update wildbench max_seq_len (#1648)
* [Update] Wildbench max_seq_len update

* [Update] Wildbench max_seq_len update
2024-10-29 13:21:31 +08:00
Junnan Liu
645c5f3b2c
[Datasets] Add datasets CMO&AIME (#1610)
* add datasets cmo&aime

* delete unused modules

* modify prompt

* update __init__

* update data load and add README

* update data load

* update performance

* update md5

* remove indents

* add indent

* fix log for debug mode
2024-10-28 18:08:02 +08:00
Linchen Xiao
9c39cb68d4
[Bump] Bump version to 0.3.4 (#1639) 2024-10-25 20:10:16 +08:00
Linchen Xiao
a61e8a0803
[Update] Internal humaneval add (#1641)
* [Update] internal_humaneval_add

* update
2024-10-25 19:08:42 +08:00
Songyang Zhang
84be90669b
[Update] Fix issue of *_param.py to avoid name conflicts; add keep_tmp_file flag to support keeping the temp config file. (#1640) 2024-10-25 16:39:25 +08:00
BigDong
2542bc6907
[Feature] Support results saving as md format table (#1638) 2024-10-25 15:50:33 +08:00
Linchen Xiao
22fdea4bf2
[Update] Update DLC runner (#1637) 2024-10-24 21:36:16 +08:00