953 Commits

Author SHA1 Message Date
Linchen Xiao
a0ef2fd3b4
[Update] Dingo Dataset update (#1670)
* [Update] Dingo Dataset update

* update
2024-11-08 14:38:43 +08:00
Linchen Xiao
835bf75a36
[Feature] Add long context evaluation for base models (#1666)
* [Update] Add base long context evaluation

* update
2024-11-08 10:53:29 +08:00
Chang Cheng
fd7aa83c01
[Update] Update DLC Runner(#1662)
* push interntrain hard code

* push interntrain hard code

* remove redundant post process

---------

Co-authored-by: changcheng <changcheng@pjlab.org.cb>
Co-authored-by: changcheng <changcheng@pjlab.org.cn>
2024-11-07 15:45:35 +08:00
Linchen Xiao
db258eb7d5
[Bump] Bump version to v0.3.5 (#1657) 2024-11-03 21:23:35 +08:00
Lyu Han
888f1f3bef
[Fix] Update loglikehood compatibility (#1659) 2024-11-02 17:19:11 +08:00
liushz
f7d899823c
[Update] Update mmmlu_lite dataload (#1658)
* update mmmlu_lite dataload from oss

* update mmmlu_lite dataload from oss
2024-11-01 17:32:29 +08:00
Songyang Zhang
c789ce5698
[Fix] the automatically download for several datasets (#1652)
* [Fix] the automatically download for several datasets

* Update

* Update

* Update CI
2024-11-01 15:57:18 +08:00
Linchen Xiao
695738a89b
[Update] Add lmdeploy DeepSeek configs (#1656)
* [Update] Add lmdeploy DeepSeek configs

* update max out length
2024-11-01 15:34:23 +08:00
bittersweet1999
a0853c939d
[Add] Add CompassArenaSubjectiveBench (#1645)
* fix pip version

* fix pip version

* add compassarenasubjectivebench

* add compassarenasubjectivebench

* add compassarenabench
2024-11-01 13:52:22 +08:00
Songyang Zhang
d611907d14
[Doc] Update Doc (#1655) 2024-10-31 18:08:09 +08:00
Linchen Xiao
5212ffe8e2
[Update] Add new model configs (#1653) 2024-10-30 17:24:53 +08:00
Linchen Xiao
df57c08ccf
[Feature] Update Models, Summarizers (#1600) 2024-10-29 18:37:15 +08:00
Linchen Xiao
d91d66792a
[Update] Update Needlebench OSS path (#1651) 2024-10-29 18:05:44 +08:00
Chang Lan
46affab882
[Fix] Fix ruler_16k_gen (#1643) 2024-10-29 17:58:43 +08:00
Linchen Xiao
8172af49bb
[Update] Update wildbench max_seq_len (#1648)
* [Update] Wildbench max_seq_len update

* [Update] Wildbench max_seq_len update
2024-10-29 13:21:31 +08:00
Junnan Liu
645c5f3b2c
[Datasets] Add datasets CMO&AIME (#1610)
* add datasets cmo&aime

* delete unused modules

* modify prompt

* update __init__

* update data load and add README

* update data load

* update performance

* update md5

* remove indents

* add indent

* fix log for debug mode
2024-10-28 18:08:02 +08:00
Linchen Xiao
9c39cb68d4
[Bump] Bump version to 0.3.4 (#1639) 2024-10-25 20:10:16 +08:00
Linchen Xiao
a61e8a0803
[Update] Internal humaneval add (#1641)
* [Update] internal_humaneval_add

* update
2024-10-25 19:08:42 +08:00
Songyang Zhang
84be90669b
[Update] Fix issue of *_param.py, avoid name conflict;add keep_tmp_file flag to support keep the temp config file. (#1640) 2024-10-25 16:39:25 +08:00
BigDong
2542bc6907
[Feature] Support results saving as md format table (#1638) 2024-10-25 15:50:33 +08:00
Linchen Xiao
22fdea4bf2
[Update] Update DLC runner (#1637) 2024-10-24 21:36:16 +08:00
Lyu Han
fb12c3f98a
[Update] strip stop_words (#1635) 2024-10-24 20:39:20 +08:00
Linchen Xiao
662dddf41a
[Update] Add internal humaneval postprocess (#1636) 2024-10-24 17:45:21 +08:00
Linchen Xiao
be3c06a158
[Fix] Update common summarizer regex extraction (#1631) 2024-10-22 14:35:45 +08:00
Chang Lan
a927bba1cf
[Fix] Fix RULER datasets (#1628)
We need to ensure that we don't import anything that ends with "_datasets",
or they will be picked up by the runner, leading to duplicate / unwanted datasets
being evaluated.
2024-10-22 11:59:02 +08:00
Songyang Zhang
a4d5a6c81b
[Feature] Support LiveCodeBench (#1617)
* Update

* Update LCB

* Update

* Update

* Update

* Update

* Update
2024-10-21 20:50:39 +08:00
Chenguang Li
5868d5afa4
[Bug] Fix-NPU-Support (#1618)
* bugfix NPU support

* formatting

---------

Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-10-21 17:42:53 +08:00
liushz
500b44ba2d
[Fix] gpqa_few_shot_ppl prompt bug (#1627) 2024-10-21 16:59:06 +08:00
Linchen Xiao
096c347e7d
[Fix] Qwen 2.5 model config (#1626)
* [Fix] Fix Qwen 2.5 model config

* [Fix] Fix Qwen 2.5 model config

* [Fix] Fix Qwen 2.5 model config
2024-10-21 16:58:18 +08:00
bittersweet1999
1188e1ecf0
[Update] eval_judgerbench.py (#1625) 2024-10-21 15:30:29 +08:00
zhulinJulia24
825d3388d5
[CI] Test PR staging fixed (#1624)
* Update oc_score_baseline.yaml

* Update runtime.txt
2024-10-21 11:02:37 +08:00
bittersweet1999
a11e2b2fd4
[Fix] Compatible with old versions (#1616)
* fix pip version

* fix pip version

* Compatible with old versions

* compati old version

* compati old version

* compati old version

* update configs
2024-10-21 10:16:29 +08:00
Lyu Han
6e8adf5221
[Bug] Remove prefix bos_token from messages when using lmdeploy as the accelerator (#1623)
* remove prefix bos_token from messages when using lmdeploy as the accelerator

* update
2024-10-19 20:03:47 +08:00
zhulinJulia24
b89c7b2fc3
[CI] Update daily-run-test.yml (#1620) 2024-10-18 18:30:35 +08:00
Bob Tsang
dd0b655bd0
[Feature] Support MMMLU & MMMLU-lite Benchmark (#1565)
* rm folder

* modify format according to reviewer

* modify format according to reviewer

* modify format according to reviewer

* add some files requirement

* fix some bug

* fix bug

* change load type

* Update MMMLU Dataset

* Update MMMLU Dataset

* Add MMMLU-Lite Dataset

* update MMMMLU datast

* update MMMMLU datast

* update MMMMLU datast

---------

Co-authored-by: BobTsang <BobTsang1995@gmail.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-10-17 19:09:34 +08:00
bittersweet1999
f0d436496e
[Update] update docs and add compassarena (#1614)
* fix pip version

* fix pip version

* update docs and add compassarena

* update docs
2024-10-17 14:39:06 +08:00
Haoran Que
4fe251729b
Upload HelloBench (#1607)
* upload hellobench

* update hellobench

* update readme.md

* update eval_hellobench.py

* update lastest

---------

Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-10-15 17:11:37 +08:00
bittersweet1999
fa54aa62f6
[Feature] Add Judgerbench and reorg subeval (#1593)
* fix pip version

* fix pip version

* update (#1522)

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>

* [Feature] Update Models (#1518)

* Update Models

* Update

* Update humanevalx

* Update

* Update

* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527)

add judgerbench and reorg sub

add judgerbench and reorg subeval

add judgerbench and reorg subeval

* add judgerbench and reorg subeval

* add judgerbench and reorg subeval

* add judgerbench and reorg subeval

* add judgerbench and reorg subeval

---------

Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-10-15 16:36:05 +08:00
x54-729
2b1afa7d1e
[Fix] fix interntrain's tokenizer truncate (#1605)
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-10-15 16:03:57 +08:00
zhulinJulia24
8aba547e06
[ci] fix stable issue of daily test (#1602)
* update

* update

* update

* Update daily-run-test.yml

* update

* Update daily-run-test.yml

* update

* update

* update

* Update pr-run-test.yml

* Update pr-run-test.yml

* update

* update

* Update daily-run-test.yml

* update

* update

* update

* update

* Update daily-run-test.yml

* Update daily-run-test.yml

* updaste

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-10-15 10:14:49 +08:00
Linchen Xiao
f390697a5e
[Fix] Update dlc runner python env (#1604) 2024-10-14 15:50:21 +08:00
Lyu Han
4fde41036f
[Feature] Update TurboMindModel by integrating lmdeploy pipeline API (#1556)
* integrate lmdeploy's pipeline api

* fix linting

* update user guide

* rename

* update

* update

* update

* rollback class name

* update

* remove unused code

* update

* update

* use pipeline

* fix ci check

* compatibility

* compatibility

* remove concurrency

* update

* fix table content

* update
2024-10-14 15:33:40 +08:00
liushz
5faee929db
[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config (#1589)
* Add GaoKaoMath Dataset

* Add MATH LLM Eval

* Update GAOKAO Math Eval Dataset

* Update GAOKAO Math Eval Dataset
2024-10-12 19:13:06 +08:00
Linchen Xiao
69997f11f8
[Feature] Update requirements.txt (#1601)
* update crb

* update crbbench

* update crbbench

* update crbbench

* minor update wildbench

* [Fix] Update doc of wildbench, and merge wildbench into subjective

* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench

* Update crb.md

* Update crb_pair_judge.py

* Update crb_single_judge.py

* Update subjective_evaluation.md

* Update openai_api.py

* [Update] update wildbench readme

* [Update] update wildbench readme

* [Update] update wildbench readme, remove crb

* Delete configs/eval_subjective_wildbench_pair.py

* Delete configs/eval_subjective_wildbench_single.py

* Update __init__.py

* [Fix] fix version mismatch for CIBench

* [Fix] fix version mismatch for CIBench, local runer

* [Fix] fix version mismatch for CIBench, local runer, remove oracle mode

* BUG: Update cibench.py

* BUG: Update cibench.py

* [Bug] Update agent.txt

* update agent

* Update agent.txt

* update readme

* update

---------

Co-authored-by: kleinzcy <zhangchy2@shanghaitech.edu.cn>
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-10-12 18:26:57 +08:00
bittersweet1999
3f7a3730d7
[Fix] fix Flames (#1599)
* fix pip version

* fix pip version

* fix flames

* fix flames
2024-10-12 14:34:59 +08:00
Lyu Han
b52ba65c26
[Feature] Integrate lmdeploy pipeline api (#1198)
* integrate lmdeploy's pipeline api

* fix linting

* update user guide

* rename

* update

* update

* update

* rollback class name

* update

* remove unused code

* update

* update

* fix ci check

* compatibility

* remove concurrency

* Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py

* Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md

* [Bug] fix lint

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-10-09 22:58:06 +08:00
Songyang Zhang
d2ab51abbd
[Bug] Fix pre-commit hook (#1592) 2024-10-09 17:09:48 +08:00
x54-729
4d6349dfe1
[FIX] fix interntrain get_loglikelihood (#1584) 2024-10-08 11:34:04 +08:00
zhulinJulia24
89abcba486
[CI] Fix testcase failure (#1582)
* update

* Update oc_score_baseline.yaml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-10-02 12:30:38 +08:00
Linchen Xiao
22a4e76511
[BUMP] Bump version to 0.3.3 (#1581) 2024-09-30 16:57:41 +08:00