Linchen Xiao
9c39cb68d4
[Bump] Bump version to 0.3.4 ( #1639 )
2024-10-25 20:10:16 +08:00
Linchen Xiao
a61e8a0803
[Update] Internal humaneval add ( #1641 )
...
* [Update] internal_humaneval_add
* update
2024-10-25 19:08:42 +08:00
Songyang Zhang
84be90669b
[Update] Fix *_param.py name conflict; add keep_tmp_file flag to keep the temporary config file ( #1640 )
2024-10-25 16:39:25 +08:00
BigDong
2542bc6907
[Feature] Support results saving as md format table ( #1638 )
2024-10-25 15:50:33 +08:00
Linchen Xiao
22fdea4bf2
[Update] Update DLC runner ( #1637 )
2024-10-24 21:36:16 +08:00
Lyu Han
fb12c3f98a
[Update] strip stop_words ( #1635 )
2024-10-24 20:39:20 +08:00
Linchen Xiao
662dddf41a
[Update] Add internal humaneval postprocess ( #1636 )
2024-10-24 17:45:21 +08:00
Linchen Xiao
be3c06a158
[Fix] Update common summarizer regex extraction ( #1631 )
2024-10-22 14:35:45 +08:00
Chang Lan
a927bba1cf
[Fix] Fix RULER datasets ( #1628 )
...
We need to ensure that we don't import anything that ends with "_datasets",
or they will be picked up by the runner, leading to duplicate / unwanted datasets
being evaluated.
2024-10-22 11:59:02 +08:00
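The RULER fix above concerns suffix-based dataset collection. The following is a minimal sketch of that behaviour, not the actual runner code: it assumes datasets are gathered from every config variable whose name ends with "_datasets". The helper `collect_datasets` and the names `niah_datasets` and `niah_base` are hypothetical and used only for illustration.

```python
def collect_datasets(namespace):
    """Gather every list bound to a name ending in '_datasets' (assumed rule)."""
    collected = []
    for name, value in namespace.items():
        if name.endswith("_datasets") and isinstance(value, list):
            collected.extend(value)
    return collected


# One hypothetical dataset definition shared by two names.
ruler_4k = [{"abbr": "ruler_4k"}]

# A helper list imported under a name ending in "_datasets" is collected too,
# so the same dataset ends up being evaluated twice.
polluted_config = {
    "ruler_4k_datasets": ruler_4k,
    "niah_datasets": ruler_4k,
}

# Importing the helper under a different name keeps it out of the collection.
clean_config = {
    "ruler_4k_datasets": ruler_4k,
    "niah_base": ruler_4k,
}

polluted = collect_datasets(polluted_config)
clean = collect_datasets(clean_config)
print(len(polluted), len(clean))  # prints "2 1"
```

Under this assumption, renaming or not importing intermediate lists is enough to avoid the duplicate evaluation the commit message describes.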
Songyang Zhang
a4d5a6c81b
[Feature] Support LiveCodeBench ( #1617 )
...
* Update
* Update LCB
* Update
* Update
* Update
* Update
* Update
2024-10-21 20:50:39 +08:00
Chenguang Li
5868d5afa4
[Bug] Fix-NPU-Support ( #1618 )
...
* bugfix NPU support
* formatting
---------
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-10-21 17:42:53 +08:00
liushz
500b44ba2d
[Fix] gpqa_few_shot_ppl prompt bug ( #1627 )
2024-10-21 16:59:06 +08:00
Linchen Xiao
096c347e7d
[Fix] Qwen 2.5 model config ( #1626 )
...
* [Fix] Fix Qwen 2.5 model config
* [Fix] Fix Qwen 2.5 model config
* [Fix] Fix Qwen 2.5 model config
2024-10-21 16:58:18 +08:00
bittersweet1999
1188e1ecf0
[Update] eval_judgerbench.py ( #1625 )
2024-10-21 15:30:29 +08:00
zhulinJulia24
825d3388d5
[CI] Test PR staging fixed ( #1624 )
...
* Update oc_score_baseline.yaml
* Update runtime.txt
2024-10-21 11:02:37 +08:00
bittersweet1999
a11e2b2fd4
[Fix] Compatible with old versions ( #1616 )
...
* fix pip version
* fix pip version
* Compatible with old versions
* compatible with old versions
* compatible with old versions
* compatible with old versions
* update configs
2024-10-21 10:16:29 +08:00
Lyu Han
6e8adf5221
[Bug] Remove prefix bos_token from messages when using lmdeploy as the accelerator ( #1623 )
...
* remove prefix bos_token from messages when using lmdeploy as the accelerator
* update
2024-10-19 20:03:47 +08:00
zhulinJulia24
b89c7b2fc3
[CI] Update daily-run-test.yml ( #1620 )
2024-10-18 18:30:35 +08:00
Bob Tsang
dd0b655bd0
[Feature] Support MMMLU & MMMLU-lite Benchmark ( #1565 )
...
* rm folder
* modify format according to reviewer
* modify format according to reviewer
* modify format according to reviewer
* add some files requirement
* fix some bug
* fix bug
* change load type
* Update MMMLU Dataset
* Update MMMLU Dataset
* Add MMMLU-Lite Dataset
* update MMMLU dataset
* update MMMLU dataset
* update MMMLU dataset
---------
Co-authored-by: BobTsang <BobTsang1995@gmail.com>
Co-authored-by: liushz <qq1791167085@163.com>
2024-10-17 19:09:34 +08:00
bittersweet1999
f0d436496e
[Update] update docs and add compassarena ( #1614 )
...
* fix pip version
* fix pip version
* update docs and add compassarena
* update docs
2024-10-17 14:39:06 +08:00
Haoran Que
4fe251729b
Upload HelloBench ( #1607 )
...
* upload hellobench
* update hellobench
* update readme.md
* update eval_hellobench.py
* update latest
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-10-15 17:11:37 +08:00
bittersweet1999
fa54aa62f6
[Feature] Add Judgerbench and reorg subeval ( #1593 )
...
* fix pip version
* fix pip version
* update (#1522 )
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
* [Feature] Update Models (#1518 )
* Update Models
* Update
* Update humanevalx
* Update
* Update
* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527 )
add judgerbench and reorg sub
add judgerbench and reorg subeval
add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-10-15 16:36:05 +08:00
x54-729
2b1afa7d1e
[Fix] fix interntrain's tokenizer truncate ( #1605 )
...
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-10-15 16:03:57 +08:00
zhulinJulia24
8aba547e06
[ci] fix stable issue of daily test ( #1602 )
...
* update
* update
* update
* Update daily-run-test.yml
* update
* Update daily-run-test.yml
* update
* update
* update
* Update pr-run-test.yml
* Update pr-run-test.yml
* update
* update
* Update daily-run-test.yml
* update
* update
* update
* update
* Update daily-run-test.yml
* Update daily-run-test.yml
* update
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-10-15 10:14:49 +08:00
Linchen Xiao
f390697a5e
[Fix] Update dlc runner python env ( #1604 )
2024-10-14 15:50:21 +08:00
Lyu Han
4fde41036f
[Feature] Update TurboMindModel by integrating lmdeploy pipeline API ( #1556 )
...
* integrate lmdeploy's pipeline api
* fix linting
* update user guide
* rename
* update
* update
* update
* rollback class name
* update
* remove unused code
* update
* update
* use pipeline
* fix ci check
* compatibility
* compatibility
* remove concurrency
* update
* fix table content
* update
2024-10-14 15:33:40 +08:00
liushz
5faee929db
[Feature] Add GaoKaoMath Dataset for Evaluation & MATH Model Eval Config ( #1589 )
...
* Add GaoKaoMath Dataset
* Add MATH LLM Eval
* Update GAOKAO Math Eval Dataset
* Update GAOKAO Math Eval Dataset
2024-10-12 19:13:06 +08:00
Linchen Xiao
69997f11f8
[Feature] Update requirements.txt ( #1601 )
...
* update crb
* update crbbench
* update crbbench
* update crbbench
* minor update wildbench
* [Fix] Update doc of wildbench, and merge wildbench into subjective
* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench
* Update crb.md
* Update crb_pair_judge.py
* Update crb_single_judge.py
* Update subjective_evaluation.md
* Update openai_api.py
* [Update] update wildbench readme
* [Update] update wildbench readme
* [Update] update wildbench readme, remove crb
* Delete configs/eval_subjective_wildbench_pair.py
* Delete configs/eval_subjective_wildbench_single.py
* Update __init__.py
* [Fix] fix version mismatch for CIBench
* [Fix] fix version mismatch for CIBench, local runner
* [Fix] fix version mismatch for CIBench, local runner, remove oracle mode
* BUG: Update cibench.py
* BUG: Update cibench.py
* [Bug] Update agent.txt
* update agent
* Update agent.txt
* update readme
* update
---------
Co-authored-by: kleinzcy <zhangchy2@shanghaitech.edu.cn>
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-10-12 18:26:57 +08:00
bittersweet1999
3f7a3730d7
[Fix] fix Flames ( #1599 )
...
* fix pip version
* fix pip version
* fix flames
* fix flames
2024-10-12 14:34:59 +08:00
Lyu Han
b52ba65c26
[Feature] Integrate lmdeploy pipeline api ( #1198 )
...
* integrate lmdeploy's pipeline api
* fix linting
* update user guide
* rename
* update
* update
* update
* rollback class name
* update
* remove unused code
* update
* update
* fix ci check
* compatibility
* remove concurrency
* Update configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py
* Update docs/zh_cn/advanced_guides/evaluation_lmdeploy.md
* [Bug] fix lint
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-10-09 22:58:06 +08:00
Songyang Zhang
d2ab51abbd
[Bug] Fix pre-commit hook ( #1592 )
2024-10-09 17:09:48 +08:00
x54-729
4d6349dfe1
[FIX] fix interntrain get_loglikelihood ( #1584 )
2024-10-08 11:34:04 +08:00
zhulinJulia24
89abcba486
[CI] Fix testcase failure ( #1582 )
...
* update
* Update oc_score_baseline.yaml
* Update daily-run-test.yml
* Update daily-run-test.yml
* Update daily-run-test.yml
* Update daily-run-test.yml
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-10-02 12:30:38 +08:00
Linchen Xiao
22a4e76511
[BUMP] Bump version to 0.3.3 ( #1581 )
2024-09-30 16:57:41 +08:00
x54-729
bbdca5eb4c
[BUG] Fix eos token handling and add comments for InternTrain ( #1569 )
...
Co-authored-by: x54-729 <xingshuhao.dispatch@pjlab.org.cn>
2024-09-30 15:46:06 +08:00
Linchen Xiao
763d7755b6
[BUG]GaokaoBench dataset fix ( #1583 )
2024-09-30 15:13:26 +08:00
shijinpjlab
7528b8ab8a
[Feature] Add dingo test ( #1529 )
...
* add qa dingo
* update
* change name qa to dingo
* eval model: llm_base
* update path
* change name and move path
* add eval_dingo
* update import
* add for pip
* add dingo package
* change import place
* update import place
* fix lint fail
* isort
* double quoted
---------
Co-authored-by: sj <shijin@pjlab.org.cn>
2024-09-29 19:24:58 +08:00
Yi Ding
85a28874aa
[BUG]: Fix Bailing API configs ( #1570 )
2024-09-27 11:56:57 +08:00
Songyang Zhang
e8437db98f
[Feature] Update BailingLM/OpenAI verbose ( #1568 )
...
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI
* Update
* [Feature] Update API
* Update
2024-09-27 11:15:25 +08:00
Songyang Zhang
7d50294117
[Feature] Update Bailing ( #1567 )
...
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI
* Update
* Update
* Update
2024-09-26 18:56:17 +08:00
Songyang Zhang
a7bacfdf7e
[Feature] Update CoreBench 2.0 ( #1566 )
...
* [Feature] 1. Update CoreBench Base; 2. Fix lint issue in BailingAPI
* Update
* Update
2024-09-26 18:44:00 +08:00
Yi Ding
3f833186dc
[Feature] Support the reasoning from BaiLing LLM ( #1541 )
...
* [Feature] Support the reasoning from BaiLing LLM
This commit adds access to the BaiLing LLM and retrieves its reasoning output.
* Add the api example
An example of evaluating the Bailing API
* Revise the generation arguments
Based on current experiments, we updated some generation arguments for better reasoning
* [fix] set the batch size
* Retry under server-side flow control
* add dependent package into requirement.txt
add the dependent package retrying so that the pre-commit check passes.
* correct the file names and make the file copy
correct the file names.
copy the files under configs to opencompass
* fix the lint issue
---------
Co-authored-by: christopher.dy <christopher.dy@antgroup.com>
2024-09-26 16:49:52 +08:00
Linchen Xiao
80cda1980e
[BUG] fix followbench dataset config ( #1564 )
...
* [BUG] fix followbench dataset config
* [BUG] fix followbench dataset config
2024-09-25 20:58:34 +08:00
zhulinJulia24
aa43eaf267
[CI] add more models into testcase and test env of cu12 ( #1558 )
...
* update
* update
* Update pr-run-test.yml
* update
* update
* update
* update
* Update daily-run-test.yml
* update
* update
* update
* update
* update
* Update daily-run-test.yml
* update
* update
* Update daily-run-test.yml
* Update daily-run-test.yml
* update
* update
* update
* update
* update
* Update daily-run-test.yml
* update
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-25 17:07:27 +08:00
zhulinJulia24
87df8a73a3
[CI] add a common summarizer for qabench summarizer ( #1545 )
...
* update
* update
* update
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-09-25 13:40:47 +08:00
Linchen Xiao
c3fb9065db
[Feature] Add dlc sleep time ( #1562 )
2024-09-25 11:53:48 +08:00
Songyang Zhang
fe84bbd9a0
[Feature] Add Config for CoreBench ( #1547 )
...
* [Feature] Add Config for CoreBench
* Update
2024-09-25 11:36:43 +08:00
Chuanyang Jin
17eefc0e1e
[Fix] Correct typos ( #1561 )
2024-09-25 11:27:17 +08:00
liushz
83eeb52b09
[Feature] Update WikiBench base model config ( #1553 )
...
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update GPQA & MMLU_Pro
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & WikiBench for FullBench
* Update MathBench & Math base config
* Update WikiBench base model config
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-25 11:26:36 +08:00
Songyang Zhang
e7681943f3
[Feature] Update the max_out_len for many models ( #1559 )
2024-09-24 21:52:28 +08:00