Commit Graph

894 Commits

Author SHA1 Message Date
bittersweet1999
08d63b5bf3
[Fix] Fix error in subjective default summarizer (#1740)
* fix pip version

* fix pip version

* fix summarizer bug
2024-12-06 11:03:53 +08:00
Songyang Zhang
fb43dd1906
[Update] Update Skywork/Qwen-QwQ (#1728)
* Update JudgerBench

* Support O1-style Prompts

* Update Code

* Update OpenAI

* Update BigCodeBench

* Update BigCodeBench

* Update BigCodeBench

* Update BigCodeBench

* Update BigCodeBench

* Update
2024-12-05 19:30:43 +08:00
Junnan Liu
6181ac1122
[Update] Update LiveMathBench Evaluation to Support Single Dataset Split Metric Computation (#1730)
* upload dataset definitions & configs

* add single dataset split specific metrics

* add k-pass@threshold & MATH500
2024-12-05 16:54:16 +08:00
Linchen Xiao
4f317d1bd5
[Update] Update Manifest (#1738)
2024-12-05 13:59:56 +08:00
Linchen Xiao
ac23f0ce1f
[Update] Update init file for Korbench (#1737)
2024-12-05 11:26:00 +08:00
Yufeng Zhao
4d773904d4
[Update] Korbench readme supplementation (#1734)
* renewed

* readme

---------

Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
2024-12-05 11:24:35 +08:00
Linchen Xiao
a011be6798
[Feature] DLC runner Lark report (#1735)
* [Bump] Bump version to 0.3.7

* DLC lark report update
2024-12-04 18:03:12 +08:00
Linchen Xiao
e2a290fd46
[Bump] Bump version to 0.3.7 (#1733)
2024-12-03 19:34:57 +08:00
Yufeng Zhao
98c4666d65
[Update] Update Korbench dataset abbr (#1729)
Co-authored-by: yufeng zhao <zhaoyufeng@pjlab.org.cn>
2024-12-02 16:20:58 +08:00
Linchen Xiao
9de27b4d85
[Update] Update max_out_len for datasets (#1726)
* [Update] Update max_out_len for datasets

* Update eval_regression_chat_objective_fullbench.py

* Update eval_regression_chat.py

* Update eval_regression_chat.py

* Update oc_score_baseline_fullbench.yaml

---------

Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
2024-12-02 11:42:07 +08:00
Junnan Liu
fe6d76fb13
[Feature] Support LiveMathBench (#1727)
2024-11-30 00:07:19 +08:00
liushz
b063779034
[Fix] Update P-MMEVAL OSS data (#1722)
* Update with PMMEval

* Update

* Update __init__.py

* Fix Bugs

* Delete .pre-commit-config.yaml

* Pull merge

* Fix pmmeval_gen config

* Update P-MMEVAL data

---------

Co-authored-by: wanyu <wanyu2018umac@gmail.com>
Co-authored-by: wanyu2018umac <42405907+wanyu2018umac@users.noreply.github.com>
2024-11-28 20:55:46 +08:00
liushz
c437135fad
[Feature] Add Openai Simpleqa dataset (#1720)
* Add Openai SimpleQA dataset

* Add Openai SimpleQA dataset

* Add Openai SimpleQA dataset

* Update eval_simpleqa.py

---------

Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-11-28 19:16:07 +08:00
liushz
06ab27861e
[Fix] Fix pmmeval_gen config (#1719)
* Update with PMMEval

* Update

* Update __init__.py

* Fix Bugs

* Delete .pre-commit-config.yaml

* Pull merge

* Fix pmmeval_gen config

---------

Co-authored-by: wanyu <wanyu2018umac@gmail.com>
Co-authored-by: wanyu2018umac <42405907+wanyu2018umac@users.noreply.github.com>
2024-11-28 11:53:36 +08:00
wanyu2018umac
90efcf2216
[Feature] Add P-MMEval (#1714)
* Update with PMMEval

* Update

* Update __init__.py

* Fix Bugs

* Delete .pre-commit-config.yaml

* Pull merge

---------

Co-authored-by: liushz <qq1791167085@163.com>
2024-11-27 21:26:18 +08:00
Junnan Liu
f7dbe6bb7d
[Feature] Add Arc Prize Public Evaluation (#1690)
* support arc prize

* update arc-prize dataset info & update arc-prize evaluation performance
2024-11-27 15:44:41 +08:00
Yi Ding
bcb707dbfc
[Fix] Fix BailingAPI model (#1707)
* [fix] sequence under the multiple samples

* resolve the lint problems

* change the parameter name

* add another error code for retry

* output the log for invalid response

* format correction

* update

* update

* update

* update

* add two model python files

* update the default parameter

* use random for delay

* update the api example of bailing

* remove the unnecessary parameter
2024-11-26 19:24:47 +08:00
Linchen Xiao
ef695e28e5
[Bug] Fix Korbench dataset module (#1717)
2024-11-26 17:13:28 +08:00
Songyang Zhang
f97c4eae42
[Update] Update Fullbench (#1712)
* Update JudgerBench

* Support O1-style Prompts

* Update Code
2024-11-26 14:26:55 +08:00
Yufeng Zhao
300adc31e8
[Feature] Add Korbench dataset (#1713)
* first version for korbench

* first stage for korbench

* korbench_1

* korbench_1

* korbench_1

* korbench_1

* korbench_1_revised

* korbench_combined_1

* korbench_combined_1

* kor_combined

* kor_combined

* update

---------

Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2024-11-25 20:11:27 +08:00
Chang Lan
5c1916ea4c
[Update] Add RULER 64k config (#1709)
2024-11-25 19:35:27 +08:00
liushz
e49fcfd3a3
[Update] Update MATH dataset with model judge (#1711)
* Update math with llm judge

* Update math with llm judge

* Update math with llm judge

* Update math with llm judge

* Update math with llm judge
2024-11-25 15:14:55 +08:00
Linchen Xiao
80e3b9ef37
[Update] Add math prm 800k (#1708)
2024-11-21 21:29:43 +08:00
Linchen Xiao
500fb1032a
[Update] Update configurations (#1704)
2024-11-21 16:51:18 +08:00
zhulinJulia24
ed81f9df30
[CI] update torch version and add more datasets into daily testcase (#1701)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-11-21 10:37:33 +08:00
Yi Ding
05044dfaf2
[Update] Support new error code for Bailing model (#1702)
* support new error code

* fix the lint problems
2024-11-20 16:40:22 +08:00
Linchen Xiao
ff831b153e
[BUMP] Bump version to 0.3.6 (#1694)
2024-11-18 20:24:50 +08:00
Linchen Xiao
ab8fdbbaab
[Update] Update Math auto-download data (#1700)
2024-11-18 20:24:35 +08:00
Linchen Xiao
98242ff1d1
[Update] first_option_postprocess (#1699)
* update first_option_postprocess

* update
2024-11-18 20:14:29 +08:00
Linchen Xiao
4653f6976e
[Update] update volc CPU flavor (#1698)
2024-11-18 12:33:51 +08:00
zhulinJulia24
4a20e1176d
[CI] Update baselines (#1693)
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-11-15 14:46:29 +08:00
Linchen Xiao
40a9f0be0d
[Update] MUSR dataset config prefix update (#1692)
2024-11-15 11:06:30 +08:00
abrohamLee
e9e4b69ddb
[Feature] MuSR Dataset Evaluation (#1689)
* MuSR Dataset Evaluation

* MuSR Dataset Evaluation

Add an assertion and a Readme.md
2024-11-14 20:42:12 +08:00
Linchen Xiao
d415439f9b
[Fix] Fix bug for first_option_postprocess (#1688)
2024-11-14 16:45:59 +08:00
Linchen Xiao
e92a5d4230
[Feature] BABILong Dataset added (#1684)
* update

* update

* update

* update
2024-11-14 15:32:43 +08:00
Linchen Xiao
2fee63f537
[Update] Auto-download for followbench (#1685)
2024-11-13 15:47:29 +08:00
zhulinJulia24
f8a1c1f487
[CI] update (#1682)
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-11-13 10:48:05 +08:00
bittersweet1999
aca8ec3c6a
[Hotfix] Hotfix (#1683)
* fix pip version

* fix pip version

* fix lint

* hotfix
2024-11-13 10:14:27 +08:00
zhulinJulia24
a9d6b6461f
[ci] react daily test (#1668)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* refactor summarize

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update daily-run-test.yml

* Update daily-run-test.yml

* update

* update

* update

* update

* update

* Update daily-run-test.yml

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update daily-run-test.yml

* Update daily-run-test.yml

* update

* update

* Update daily-run-test.yml

* update

* update

* update

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-11-12 18:40:27 +08:00
sobeit
3ec178f4a9
Add single LoRA adapter support for vLLM inference (#1679)
2024-11-12 17:31:36 +08:00
bittersweet1999
17b5e52f6c
[Hotfix] lmdeploy temp (#1674)
* fix pip version

* fix pip version

* hotfix
2024-11-12 16:10:16 +08:00
Linchen Xiao
a0ef2fd3b4
[Update] Dingo Dataset update (#1670)
* [Update] Dingo Dataset update

* update
2024-11-08 14:38:43 +08:00
Linchen Xiao
835bf75a36
[Feature] Add long context evaluation for base models (#1666)
* [Update] Add base long context evaluation

* update
2024-11-08 10:53:29 +08:00
Chang Cheng
fd7aa83c01
[Update] Update DLC Runner (#1662)
* push interntrain hard code

* push interntrain hard code

* remove redundant post process

---------

Co-authored-by: changcheng <changcheng@pjlab.org.cb>
Co-authored-by: changcheng <changcheng@pjlab.org.cn>
2024-11-07 15:45:35 +08:00
Linchen Xiao
db258eb7d5
[Bump] Bump version to v0.3.5 (#1657)
2024-11-03 21:23:35 +08:00
Lyu Han
888f1f3bef
[Fix] Update loglikelihood compatibility (#1659)
2024-11-02 17:19:11 +08:00
liushz
f7d899823c
[Update] Update mmmlu_lite dataload (#1658)
* update mmmlu_lite dataload from oss

* update mmmlu_lite dataload from oss
2024-11-01 17:32:29 +08:00
Songyang Zhang
c789ce5698
[Fix] Fix the automatic download for several datasets (#1652)
* [Fix] Fix the automatic download for several datasets

* Update

* Update

* Update CI
2024-11-01 15:57:18 +08:00
Linchen Xiao
695738a89b
[Update] Add lmdeploy DeepSeek configs (#1656)
* [Update] Add lmdeploy DeepSeek configs

* update max out length
2024-11-01 15:34:23 +08:00
bittersweet1999
a0853c939d
[Add] Add CompassArenaSubjectiveBench (#1645)
* fix pip version

* fix pip version

* add compassarenasubjectivebench

* add compassarenasubjectivebench

* add compassarenabench
2024-11-01 13:52:22 +08:00