Mo Li
acae560911
Added support for multi-needle testing in needle-in-a-haystack test ( #802 )
...
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
* update plot function in tools_needleinahaystack.py
* optimizing needleinahaystack dataset generation strategy
* modify minor formatting issues
* add English version support
* change NeedleInAHaystackDataset to dynamic loading
* change NeedleInAHaystackDataset to dynamic loading
* fix needleinahaystack test eval bug
* fix needleinahaystack config bug
* Added support for multi-needle testing in needle-in-a-haystack test
* Optimize the code for plotting in the needle-in-a-haystack test.
* Correct the typo in the dataset parameters.
* update needleinahaystack test docs
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-17 13:47:34 +08:00
bittersweet1999
814b3f73bd
reorganize subject files ( #801 )
2024-01-16 18:03:11 +08:00
bittersweet1999
83d6c48378
[Feature] Add configs for creationbench ( #791 )
...
* add creationv2_zh
* add creationv2_zh
* add eng config for creationbench
* add eng config for creationbench
* add eng config for creationbench
2024-01-12 14:20:21 +08:00
Xiaoming Shi
ad872a5dc2
[Feature] Update MedBench ( #779 )
...
* update medbench
* medbench update
* format medbench
* format
* Update
* update
* update
* update suffix
---------
Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-01-09 11:42:44 +08:00
Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2023.01.08 ( #777 )
2024-01-08 14:07:24 +00:00
liyucheng09
0b2863039e
[Feature] Contamination analysis for MMLU, Hellaswag, and ARC_c ( #699 )
...
* Contamination analysis for ARC_c, mmlu, and Hellaswag
* update `eval_contamination.py`
* update `contamination.py` summarizer
* fix `eval_contamination.py`
* add mmlu groups for contamination analysis
2024-01-08 15:51:48 +08:00
Connor-Shen
30a90d8dd8
Support Mbpp_plus dataset ( #770 )
...
* support mbpp+
* support mbpp+
* minor fix
* [Feat] minor fix
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2024-01-05 22:01:57 +08:00
bittersweet1999
2163f9398f
[Feature] add subject ir dataset ( #755 )
...
* add subject ir
* Add ir dataset
* Add ir dataset
2024-01-05 12:00:57 +00:00
bittersweet1999
be369c3e06
[Feature] Add multi_round dataset evaluation ( #766 )
...
* multi_round dataset
* add multi_round evaluation
2024-01-04 10:37:52 +00:00
Mo Li
33f8df1ca3
[Update] Change NeedleInAHaystackDataset to dynamic dataset loading ( #754 )
...
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
* update plot function in tools_needleinahaystack.py
* optimizing needleinahaystack dataset generation strategy
* modify minor formatting issues
* add English version support
* change NeedleInAHaystackDataset to dynamic loading
* change NeedleInAHaystackDataset to dynamic loading
* fix needleinahaystack test eval bug
* fix needleinahaystack config bug
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-02 17:22:56 +08:00
Francis-llgg
b69fe2343b
[Feature] Add GPQA Dataset ( #729 )
...
* check
* message
* add
* change prompt
* change a parameter name
* modify name of the file
* delete a useless file
2024-01-01 15:54:40 +08:00
Francis-llgg
ef3ae63539
[Feature] Add new dataset mastermath2024v1 ( #744 )
...
* add new dataset mastermath2024v1
* change it to simplified chinese prompt
* change file name
2024-01-01 15:53:24 +08:00
Mo Li
17b8e929dd
[Feature] Update plot function in tools_needleinahaystack.py ( #747 )
...
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
* update plot function in tools_needleinahaystack.py
* optimizing needleinahaystack dataset generation strategy
* modify minor formatting issues
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-29 18:51:09 +08:00
bittersweet1999
fe0b717033
add creationbench ( #753 )
2023-12-29 10:03:44 +00:00
Connor-Shen
81098722d2
add chinese version of humaneval, mbpp ( #743 )
...
* add chinese_version of humaneval,mbpp
* add humaneval&mbpp gen.py
* minor fix
* minor add
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-12-28 14:47:56 +08:00
Hubert
0a525985e8
[Feature] Support sanitized MBPP dataset ( #745 )
2023-12-27 22:17:23 +08:00
bittersweet1999
dfd9ac0fd9
[Feature] Add other judgelm prompts for Alignbench ( #731 )
...
* add judgelm prompts
* add judgelm prompts
* update import info
* fix situation that no abbr in config
* fix situation that no abbr in config
* add summarizer for other judgelm
* change config name
* add maxlen
* add maxlen
* dict assert
* dict assert
* fix strings
* fix strings
2023-12-27 17:54:53 +08:00
philipwangOvO
34561ececb
[Feature] Add InfiniteBench ( #739 )
...
* add InfiniteBench
* add InfiniteBench
---------
Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>
2023-12-26 15:36:27 +08:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs ( #734 )
2023-12-25 21:59:16 +08:00
Mo Li
0e24f4213e
[Feature] Add NeedleInAHaystack Test Support ( #714 )
...
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-23 12:00:51 +08:00
Skyfall-xzz
b35d991786
[Feature] Add ReasonBench(Internal) dataset ( #577 )
...
* [Feature] Add reasonbench dataset
* add configs for supporting generative inference & merge datasets in the same category
* modify config filename to prompt version
* fix codes to meet pre-commit requirements
* lint the code to meet pre-commit requirements
* Align Load_data Sourcecode Briefly
* fix bugs
* reduce code redundancy
2023-12-20 17:57:42 +08:00
Jingming
76a95e9e81
[Feature] Support the use of humaneval_plus. ( #720 )
...
* [Feature] Support the use of humaneval_plus.
* [Feature] Add humaneval_plus_gen.py
* minor check
* [Fix] Fix bug
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-12-20 17:25:17 +08:00
DseidLi
db2920326a
[Fix] remove redundant in gsm8k.py ( #700 )
...
Removed redundant code in GSM8KDataset.load method.
2023-12-14 19:55:58 +08:00
bittersweet1999
1fe152b3e8
[Feature] Support AlignmentBench infer and judge ( #697 )
...
* alignmentbench infer and judge
* alignmentbench
* alignmentbench done
* alignment all done
* alignment all done
2023-12-13 19:59:30 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation ( #680 )
...
* new version of subject
* fixed draw
* fixed draw
* fixed draw
* done
* done
* done
* done
* fixed lint
2023-12-11 22:22:11 +08:00
Hubert
e78857ac36
[Sync] minor test ( #683 )
2023-12-11 17:42:53 +08:00
Jingming
dd4318f6ab
[Feature] enhance the ability of humaneval_postprocess ( #676 )
...
* [Feature] enhance the ability of humaneval_postprocess
* refactor
* [Feature] Keep the old version of the function and realize the new function in humaneval_postprocess_v2.
* Update opencompass/datasets/humaneval.py
---------
Co-authored-by: Leymore <zfz-960727@163.com>
Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
2023-12-11 14:39:56 +08:00
Xiaoming Shi
1bf85949ef
[Feature] Add medbench ( #678 )
...
* update medbench
* medbench update
* format medbench
* format
---------
Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-09 16:05:46 +08:00
liyucheng09
05bbce8b08
[Feature] Add Data Contamination Analysis ( #639 )
...
* add contamination analysis to ceval
* fix bugs
* add contamination docs
* to pass CI check
* update
---------
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-08 10:00:11 +08:00
bittersweet1999
1c95790fdd
New subjective judgement ( #660 )
...
* TabMWP
* TabMWP
* fixed
* fixed
* fixed
* done
* done
* done
* add new subjective judgement
* add new subjective judgement
* add new subjective judgement
* add new subjective judgement
* add new subjective judgement
* modified to a more general way
* modified to a more general way
* final
* final
* add summarizer
* add new summarize
* fixed
* fixed
* fixed
---------
Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>
2023-12-06 13:28:33 +08:00
rolellm
e10f1c9139
added rolebench dataset. ( #633 )
...
* added rolebench
* renamed unreasonable variable names
* renamed variable names in the comments
2023-12-01 22:54:42 +08:00
Hubert
9eb5cadcac
[Feat] update gsm8k and math agent config ( #652 )
...
* [Feat] update gsm8k and math agent config
* minor fix
2023-12-01 15:08:38 +08:00
liushz
a331c9abfd
[Feature] Add wikibench dataset ( #655 )
...
* Add WikiBench
* Add WikiBench
* format
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-01 14:56:54 +08:00
liushz
e019c831fe
[Feature] Add Chinese version: commonsenseqa, crowspairs and nq ( #144 )
...
* add Chinese version: csqa crowspairs nq
* Update cn_data
* Update cn_data
* update format
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-30 15:33:02 +08:00
Ma Zerun
6aaf3b91ec
[Feature] Support chat style inferencer. ( #643 )
...
* [Feature] Support chat style inferencer.
* [Fix] use new prompt
* [Fix] use new prompt
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-30 14:00:06 +08:00
liushz
6d0d78986c
[Feature] Add GSM_Hard dataset ( #619 )
...
* Add SVAMP dataset
* Add SVAMP dataset
* Add SVAMP dataset
* Add gsm_hard dataset
* Add gsm_hard dataset
* format
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-27 17:40:34 +08:00
Fengzhe Zhou
9083dea683
[Sync] some renaming ( #641 )
2023-11-27 16:06:49 +08:00
Fengzhe Zhou
d949e3c003
[Feature] Add circular eval ( #610 )
...
* refactor default, add circular summarizer
* add circular
* update impl
* update doc
* minor update
* no more to be added
2023-11-23 16:45:47 +08:00
Fengzhe Zhou
d4d1330a5a
[Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes ( #625 )
2023-11-23 14:05:59 +08:00
liushz
048775192b
[Feature] Add SVAMP dataset ( #604 )
...
* Add SVAMP dataset
* Add SVAMP dataset
* Add SVAMP dataset
2023-11-22 14:54:39 +08:00
liushz
dbacd36379
Add aritch to mathbench ( #607 )
2023-11-20 19:40:41 +08:00
liushz
c9c5c5d92e
Mathbench update postprocess ( #600 )
...
* Update mathbench
* Update mathbench
2023-11-20 16:48:55 +08:00
Hubert
91fba2c2e9
[Feat] support humaneval and mbpp pass@k ( #598 )
...
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k docs
* update naming
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-16 21:22:06 +08:00
Raymond Zhang
c0acd06b05
[Feature] Add FinanceIQ dataset ( #596 )
2023-11-16 17:47:57 +08:00
Fengzhe Zhou
19ad7f9613
fix cmb dataset ( #587 )
2023-11-14 16:13:39 +08:00
Wei Jueqi
14e6fe6f13
Fix bugs in subjective evaluation ( #589 )
...
* rename
* fix sub bugs and update docs
* update
* update
2023-11-14 16:11:55 +08:00
Fengzhe Zhou
d3de5c41fb
[Sync] update model configs ( #574 )
2023-11-13 15:15:34 +08:00
Fengzhe Zhou
689ffe5b63
[Feature] Use dataset in local path ( #570 )
...
* update commonsenseqa
* update drop
* update flores_first100
* update gsm8k
* update humaneval
* update lambda
* update obqa
* update piqa
* update race
* update siqa
* update story_cloze
* update strategyqa
* update tydiqa
* update winogrande
* update doc
* update hellaswag
* fix obqa
* update collections
* update .zip name
2023-11-13 13:00:37 +08:00
Fengzhe Zhou
d6aaac22e7
[Feature] Update cmb ( #571 )
2023-11-13 00:09:05 +08:00
jingmingzhuo
b3cbef3226
[Feature] Add py150 and maxmin ( #562 )
...
* [feat] add clozeTest_maxmin dataset
* [feat] add py150 datasets
* [feat] change __init__.py in opencompass/datasets
* [fix] pre-commit check
* [fix] rename py150 and maxmin datasets in configs
* [feat] add gen.py of py150 and maxmin in configs/datasets
2023-11-09 22:05:25 +08:00