Connor-Shen
444d8d9507
[feat] support multipl-e ( #846 )
* [feat] support humaneval_multipl-e
* format
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-06 23:30:28 +08:00
Yggdrasill7D6
a6c49f15ce
fix lawbench 2-1 f0.5 score calculation bug ( #795 )
* fix lawbench 2-1 f0.5 score calculation bug
* use path in overall datasets folder
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-06 22:20:11 +08:00
bittersweet1999
1c8e193de8
[Fix] hotfix for mtbench ( #877 )
* hotfix for mtbench
* hotfix
2024-02-06 21:26:47 +08:00
Fengzhe Zhou
d34ba11106
[Sync] Merge branch 'dev' into zfz/update-keyset-demo ( #876 )
2024-02-05 23:29:10 +08:00
bittersweet1999
32b5948f4e
[Fix] add do sample demo for subjective dataset ( #873 )
* add do sample demo for subjective dataset
* fix strings
* format
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-05 15:55:58 +08:00
Skyfall-xzz
7ad1168062
Support NPHardEval ( #835 )
* support NPHardEval
* add .md file and fix minor bugs
* refactor and minor fix
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-05 15:52:28 +08:00
zhulinJulia24
b4a9acd7be
Update daily test ( #871 )
* add daily test case
* Update pr-run-test.yml
* Update daily-run-test.yml
* Update daily-run-test.yml
* Update pr-run-test.yml
* Update daily-run-test.yml
* Update oc_score_assert.py
* Update daily-run-test.yml
* Update daily-run-test.yml
* Update daily-run-test.yml
* update testcase baseline
* fix test case name
* add more models into daily test
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-05 15:52:00 +08:00
Fengzhe Zhou
fc84aff963
[CI] Update github workflow cuda image ( #874 )
* update workflow
* another trial
* another trial
* another trial
2024-02-05 15:22:59 +08:00
Yuchen Yan
fed7d800c6
[Fix] Fix error in gsm8k evaluator ( #782 )
Co-authored-by: jiangjin1999 <1261842974@qq.com>
2024-02-04 22:55:11 +08:00
bittersweet1999
7806cd0f64
[Feature] support alpacaeval ( #809 )
* support alpacaeval_v1
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/alpacaeval_v1.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix conflict
* support alpacaeval v2
* support alpacav2
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-02-04 14:18:36 +08:00
zhulinJulia24
0919b08ec8
[Feature] Add daily test case ( #864 )
* add daily test case
* Update pr-run-test.yml
* Update daily-run-test.yml
* Update daily-run-test.yml
* Update pr-run-test.yml
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-02-02 12:03:05 +08:00
RunningLeon
4c87e777d8
[Feature] Add end_str for turbomind ( #859 )
* fix
* update
* fix internlm1
* fix docs
* remove sys
2024-02-01 22:31:14 +08:00
bittersweet1999
5c6dc908cd
fix compass arena ( #854 )
2024-01-30 16:34:38 +08:00
Guo Qipeng
4f78388c71
Update runtime.txt to fix rouge_chinese bugs. ( #803 )
* Update runtime.txt to fix rouge_chinese bugs.
The wheel file of rouge_chinese overwrites the rouge package, causing bugs. Replace it with the GitHub source, which is the correct version.
* fix PEP format issues
* fix PEP format issues
* enable pip install
---------
Co-authored-by: 郭琦鹏 <guoqipeng@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-01-29 19:18:22 +08:00
del-zhenwu
e8067ac456
Create link-check.yml ( #853 )
* Create link-check.yml
* Update link-check.yml
2024-01-29 19:16:52 +08:00
Songyang Zhang
cdca59ff49
[Fix] Update Zhipu API and fix min_out_len issue of API models ( #847 )
* Update zhipu api and fix min_out_len issue of API class
* Update example
* Update example
2024-01-28 14:52:43 +08:00
Jingming
2801883351
[Fix] Fix acc of IFEval ( #849 )
* [Feature] Add IFEval
* [Fix] Changing the Score Rule.
2024-01-27 22:27:07 +08:00
Xiaoming Shi
35aace776a
[Fix] Update MedBench ( #845 )
2024-01-26 17:56:13 +08:00
Songyang Zhang
8ed022b4c4
Update Sensetime API ( #844 )
2024-01-26 16:40:49 +08:00
Hubert
4aa74565e2
[Feat] minor update agent related ( #839 )
* [Feat] update cibench
* [Feat] Support CIBench
* [Feat] Support CIBench
* [Feat] Support CIBench
* [Feat] Support CIBench
2024-01-26 14:15:51 +08:00
bittersweet1999
77be07dbb5
[Fix] fix corev2 ( #838 )
* fix corev2
* fix corev2
2024-01-24 18:15:29 +08:00
Fengzhe Zhou
0991dd33a0
[Sync] Update dataset cfg for internMath ( #837 )
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-01-24 16:30:32 +08:00
zhulinJulia24
f7d7837ac0
add fail notify ( #836 )
2024-01-24 14:26:30 +08:00
Fengzhe Zhou
f367551668
update doc ( #830 )
2024-01-24 13:39:28 +08:00
Songyang Zhang
793e32c9cc
[Feature] Update API implementation ( #834 )
2024-01-24 13:35:21 +08:00
bittersweet1999
2ee8e8a1a1
[Feature] add mtbench ( #829 )
* add mtbench
* add mtbench
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/mtbench.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix mtbench
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-24 12:11:47 +08:00
Jingming
e059a5c2bf
[Feature] Add IFEval ( #813 )
* [Feature] Add IFEval
* [Doc] add introduction of IFEval
2024-01-23 20:07:49 +08:00
bittersweet1999
3d9bb4aed7
[Fix] fix strings ( #833 )
* add compass arena
* add compass_arena
* add compass arena
* Update opencompass/summarizers/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix check position bias
* fix string
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-23 10:57:26 +00:00
bittersweet1999
2d4da8dd02
[Feature] Add CompassArena ( #828 )
* add compass arena
* add compass_arena
* add compass arena
* Update opencompass/summarizers/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix check position bias
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-23 15:12:46 +08:00
RangiLyu
40a2441deb
Update hf_internlm2_chat template ( #823 )
* Update hf_internlm2_chat template
* Update 20B
2024-01-19 18:21:47 +08:00
Guo Qipeng
e975a96fa1
Update cdme config and evaluator ( #812 )
* update cdme config and evaluator
* fix cdme prompt
* move CDME trim post-processor as a separate evaluator
---------
Co-authored-by: 郭琦鹏 <guoqipeng@pjlab.org.cn>
2024-01-19 11:29:27 +08:00
Yang Yong
f09a2ff418
Add LightllmApi KeyError log & Update doc ( #816 )
* Add LightllmApi KeyError log
* Update LightllmApi doc
2024-01-18 22:23:38 +08:00
zhulinJulia24
8b5c467cc5
Test runner update - split step, change schedule time and disable hf cache ( #814 )
* Update pr-run-test.yml
* Update pr-run-test.yml
* Update pr-run-test.yml
* split step and change order, change schedule time and disable hf cache
2024-01-18 21:04:41 +08:00
Mo Li
dcc32ed856
[Fix] Update yi 200k config ( #815 )
2024-01-18 20:54:24 +08:00
RunningLeon
61fe873c89
[Fix] Fix turbomind and update docs ( #808 )
* update
* update docs
* add engine_config and gen_config in eval_config
* update
* fix
* fix
* fix
* fix docstr
* fix url
2024-01-18 14:41:35 +08:00
Fengzhe Zhou
9e5746d3d8
[Doc] Update News ( #810 )
2024-01-17 18:22:12 +08:00
Fengzhe Zhou
b4afe3e7c1
[Sync] Add InternLM2 Keyset Evaluation Demo ( #807 )
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
2024-01-17 13:48:12 +08:00
Mo Li
acae560911
Added support for multi-needle testing in needle-in-a-haystack test ( #802 )
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
* update plot function in tools_needleinahaystack.py
* optimizing needleinahaystack dataset generation strategy
* modify minor formatting issues
* add English version support
* change NeedleInAHaystackDataset to dynamic loading
* change NeedleInAHaystackDataset to dynamic loading
* fix needleinahaystack test eval bug
* fix needleinahaystack config bug
* Added support for multi-needle testing in needle-in-a-haystack test
* Optimize the code for plotting in the needle-in-a-haystack test.
* Correct the typo in the dataset parameters.
* update needleinahaystack test docs
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-17 13:47:34 +08:00
RunningLeon
0836aec67b
[Feature] Update evaluate turbomind ( #804 )
* update
* fix
* fix
* fix
2024-01-17 11:09:50 +08:00
bittersweet1999
814b3f73bd
reorganize subject files ( #801 )
2024-01-16 18:03:11 +08:00
zhulinJulia24
2cd091647c
Add test runner, one case, daily and pr trigger ( #751 )
* init test yaml
* add simple pr
* update
* update
* change name
* Update pr-run-test.yml
* Update pr-run-test.yml
---------
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-01-16 11:59:22 +08:00
bittersweet1999
83d6c48378
[Feature] Add configs for creationbench ( #791 )
* add creationv2_zh
* add creationv2_zh
* add eng config for creationbench
* add eng config for creationbench
* add eng config for creationbench
2024-01-12 14:20:21 +08:00
Hubert
d0dc3534e5
[Fix] hot fix for requirements ( #789 )
2024-01-11 15:48:32 +08:00
Songyang Zhang
467ad0ac21
Update gsm8k agent prompt ( #788 )
2024-01-11 14:07:36 +08:00
notoschord
d3a0ddc3ef
[Feature] Add support for Nanbeige API ( #786 )
Co-authored-by: notoschord <wangzekai@kanzhun.com>
2024-01-11 13:54:27 +08:00
bittersweet1999
5679edb490
add temperature in alles ( #787 )
2024-01-11 03:57:24 +00:00
Xiaoming Shi
ad872a5dc2
[Feature] Update MedBench ( #779 )
* update medbench
* medbench update
* format medbench
* format
* Update
* update
* update
* update suffix
---------
Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-01-09 11:42:44 +08:00
Fengzhe Zhou
a74e4c1a8d
[Sync] Bump version to 0.2.1 ( #778 )
2024-01-08 14:56:28 +00:00
Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2023.01.08 ( #777 )
2024-01-08 14:07:24 +00:00
jiangjin1999
8194199d79
[Feature] *_batch_generate* function, add the MultiTokenEOSCriteria ( #772 )
* jiangjin1999: in the _batch_generate function, add the MultiTokenEOSCriteria feature to speed up inference.
* jiangjin1999: in the _batch_generate function, add the MultiTokenEOSCriteria feature to speed up inference.
---------
Co-authored-by: jiangjin08 <jiangjin08@MBP-2F32S5MD6P-0029.local>
Co-authored-by: jiangjin08 <jiangjin08@a.sh.vip.dianping.com>
2024-01-08 16:40:02 +08:00