Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2023.01.08 ( #777 )
2024-01-08 14:07:24 +00:00
jiangjin1999
8194199d79
[Feature] *_batch_generate* function, add the MultiTokenEOSCriteria ( #772 )
...
* jiangjin1999: in the _batch_generate function, add the MultiTokenEOSCriteria feature to speed up inference.
* jiangjin1999: in the _batch_generate function, add the MultiTokenEOSCriteria feature to speed up inference.
---------
Co-authored-by: jiangjin08 <jiangjin08@MBP-2F32S5MD6P-0029.local>
Co-authored-by: jiangjin08 <jiangjin08@a.sh.vip.dianping.com>
2024-01-08 16:40:02 +08:00
Fengzhe Zhou
f78fcf6eeb
[Docs] Update contamination docs ( #775 )
2024-01-08 16:37:28 +08:00
liyucheng09
0b2863039e
[Feature] Contamination analysis for MMLU, Hellaswag, and ARC_c ( #699 )
...
* Contamination analysis for ARC_c, mmlu, and Hellaswag
* update `eval_contamination.py`
* update `contamination.py` summarizer
* fix `eval_contamination.py`
* add mmlu groups for contamination analysis
2024-01-08 15:51:48 +08:00
tpoisonooo
ba1b684fec
typo(installation.md): fix unzip commands ( #774 )
...
* Update installation.md
* Update installation.md
2024-01-08 14:23:35 +08:00
Yuchen Yan
11f3b91e78
[Fix] fix typos in drop prompt ( #773 )
...
Co-authored-by: yanyuchen04 <yanyuchen04@meituan.com>
2024-01-08 14:22:35 +08:00
Connor-Shen
30a90d8dd8
Support Mbpp_plus dataset ( #770 )
...
* support mbpp+
* support mbpp+
* minor fix
* [Feat] minor fix
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2024-01-05 22:01:57 +08:00
bittersweet1999
3c606cb712
quick fix for postprocess pred extraction ( #771 )
2024-01-05 21:10:18 +08:00
Songyang Zhang
0c75f0f95a
[Update] Update introduction of CompassBench-2024-Q1 ( #769 )
...
* [Doc] Update Example of CompassBench
* [Doc] Update Example of CompassBench
* [Doc] Update Example of CompassBench
* update
* Update docs/zh_cn/advanced_guides/compassbench_intro.md
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
---------
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2024-01-05 20:39:36 +08:00
bittersweet1999
2163f9398f
[Feature] add subject ir dataset ( #755 )
...
* add subject ir
* Add ir dataset
* Add ir dataset
2024-01-05 12:00:57 +00:00
bittersweet1999
be369c3e06
[Feature] Add multi_round dataset evaluation ( #766 )
...
* multi_round dataset
* add multi_round evaluation
2024-01-04 10:37:52 +00:00
bittersweet1999
7cd65d49d8
[Fix] Fix small bug in alignbench ( #764 )
...
* fix small bugs
* fix small bugs
2024-01-03 07:44:53 +00:00
Chris Liu
3eb225a5e6
[Feature] Support LLaMA2-Accessory ( #732 )
...
* Support LLaMA2-Accessory
* remove strip
* clear imports
* reformat
* fix lint
* fix lint
* update readme
* update readme
* update readme
* update readme
2024-01-02 20:48:51 +08:00
HUANG Fei
ba027eeeac
[Feature] Add support of qwen api ( #735 )
2024-01-02 20:47:12 +08:00
Mo Li
33f8df1ca3
[Update] Change NeedleInAHaystackDataset to dynamic dataset loading ( #754 )
...
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
* update plot function in tools_needleinahaystack.py
* optimizing needleinahaystack dataset generation strategy
* modify minor formatting issues
* add English version support
* change NeedleInAHaystackDataset to dynamic loading
* change NeedleInAHaystackDataset to dynamic loading
* fix needleinahaystack test eval bug
* fix needleinahaystack config bug
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-02 17:22:56 +08:00
Francis-llgg
b69fe2343b
[Feature] Add GPQA Dataset ( #729 )
...
* check
* message
* add
* change prompt
* change a param name
* modify name of the file
* delete a useless file
2024-01-01 15:54:40 +08:00
Francis-llgg
ef3ae63539
[Feature] Add new dataset mastermath2024v1 ( #744 )
...
* add new dataset mastermath2024v1
* change it to simplified chinese prompt
* change file name
2024-01-01 15:53:24 +08:00
Mo Li
17b8e929dd
[Feature] Update plot function in tools_needleinahaystack.py ( #747 )
...
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
* update plot function in tools_needleinahaystack.py
* optimizing needleinahaystack dataset generation strategy
* modify minor formatting issues
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-29 18:51:09 +08:00
Hubert
327951087f
[Feat] update code config ( #749 )
...
* [Feat] update code dataset
* [Feat] update code dataset
* [Feat] update code dataset
2023-12-29 18:46:34 +08:00
bittersweet1999
fe0b717033
add creationbench ( #753 )
2023-12-29 10:03:44 +00:00
bittersweet1999
8728287a55
fix error in configs ( #750 )
2023-12-28 11:53:07 +00:00
Connor-Shen
81098722d2
add chinese version of humaneval, mbpp ( #743 )
...
* add chinese_version of humaneval,mbpp
* add humaneval&mbpp gen.py
* minor fix
* minor add
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-12-28 14:47:56 +08:00
bittersweet1999
db919f0191
[Fix] SubSizePartition fix ( #746 )
...
* fix subjective_eval
* subject_eval partition situation fixed
* subject_eval partition situation fixed
2023-12-28 11:46:46 +08:00
Hubert
0a525985e8
[Feature] Support sanitized MBPP dataset ( #745 )
2023-12-27 22:17:23 +08:00
bittersweet1999
dfd9ac0fd9
[Feature] Add other judgelm prompts for Alignbench ( #731 )
...
* add judgellm prompts
* add judgelm prompts
* update import info
* fix situation that no abbr in config
* fix situation that no abbr in config
* add summarizer for other judgellm
* change config name
* add maxlen
* add maxlen
* dict assert
* dict assert
* fix strings
* fix strings
2023-12-27 17:54:53 +08:00
Yang Yong
54345c56b7
Update LightllmApi and Fix mmlu bug ( #738 )
...
* Update LightllmApi and Fix mmlu bug
* checkout mmlu_gen_a484b3.py
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-27 13:49:08 +08:00
philipwangOvO
34561ececb
[Feature] Add InfiniteBench ( #739 )
...
* add InfiniteBench
* add InfiniteBench
---------
Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>
2023-12-26 15:36:27 +08:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs ( #734 )
2023-12-25 21:59:16 +08:00
Songyang Zhang
ad96f2156f
Update merge script ( #733 )
2023-12-25 16:45:22 +08:00
AllentDan
336d8d76ff
add turbomind restful api support ( #693 )
...
* add turbomind restful api support
* config
* top_p 0.8
* top_k = 1
2023-12-24 01:40:00 +08:00
bittersweet1999
e985100cd1
[Fix] Fix subjective alignbench ( #730 )
2023-12-23 20:06:53 +08:00
Mo Li
0e24f4213e
[Feature] Add NeedleInAHaystack Test Support ( #714 )
...
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-23 12:00:51 +08:00
loveSnowBest
4a2d1926a2
[News] add news for T-Eval ( #727 )
...
* add news for teval
* update
* update doc for cz&en
2023-12-22 19:58:24 +08:00
RunningLeon
e34c552282
[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend ( #721 )
...
* add llama2 test
* fix
* test qwen chat-7b
* test w4
* add baichuan2
* update
* update
* update configs and docs
* update
2023-12-21 18:22:17 +08:00
bittersweet1999
fbb912ddf3
[Feature] Add abbr for judgemodel in subjective evaluation ( #724 )
...
* add_judgemodel_abbr
* add judgemodel abbr
2023-12-21 15:58:20 +08:00
Skyfall-xzz
b35d991786
[Feature] Add ReasonBench(Internal) dataset ( #577 )
...
* [Feature] Add reasonbench dataset
* add configs for supporting generative inference & merge datasets in the same category
* modify config filename to prompt version
* fix codes to meet pre-commit requirements
* lint the code to meet pre-commit requirements
* Align Load_data Sourcecode Briefly
* fix bugs
* reduce code redundancy
2023-12-20 17:57:42 +08:00
Jingming
76a95e9e81
[Feature] Support the use of humaneval_plus. ( #720 )
...
* [Feature] Support the use of humaneval_plus.
* [Feature] Add humaneval_plus_gen.py
* minor check
* [Fix] Fix bug
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-12-20 17:25:17 +08:00
bittersweet1999
47e745d748
quick fix for maxoutlen ( #719 )
2023-12-20 00:00:28 +08:00
Hubert
fdf18a3238
[Docs] Update Docker docs ( #718 )
...
* [Docs] update docker docs
* [Docs] update docker docs
2023-12-19 23:29:43 +08:00
Hubert
5e8b838f51
[Feat] Update math/agent ( #716 )
...
* minor add
* minor add
* minor fix
2023-12-19 21:20:42 +08:00
bittersweet1999
97c2068bd9
[Feature] Add JudgeLLMs ( #710 )
...
* add judgellms
* add judgellms
* add sub_size_partition
* add docs
* add ref
2023-12-19 18:40:25 +08:00
Hubert
eda72e756e
[Fix] minor fix openai ( #711 )
2023-12-18 15:45:31 +08:00
Songyang Zhang
637628a70f
[Doc] Update Doc for Alignbench ( #707 )
...
* update alignmentbench
* update alignmentbench
* update doc
* update
* update
2023-12-15 15:07:25 +08:00
Jingming
d7e7a637a5
[Fix] fix a bug on configs/eval_mixtral_8x7b.py ( #706 )
2023-12-15 14:15:32 +08:00
DseidLi
db2920326a
[Fix] remove redundant code in gsm8k.py ( #700 )
...
Removed redundant code in GSM8KDataset.load method.
2023-12-14 19:55:58 +08:00
Songyang Zhang
bfe4aa2af5
[Fix] Update alignmentbench ( #704 )
...
* update alignmentbench
* update alignmentbench
* update alignmentbench
2023-12-14 18:24:21 +08:00
bittersweet1999
1fe152b3e8
[Feature] Support AlignmentBench infer and judge ( #697 )
...
* alignmentbench infer and judge
* alignmentbench
* alignmentbench done
* alignment all done
* alignment all done
2023-12-13 19:59:30 +08:00
Fengzhe Zhou
cadab9474f
[Doc] Update contamination docs ( #698 )
...
* update contamination docs
* add citation
* Update contamination_eval.md
* Update contamination_eval.md
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-13 18:03:39 +08:00
Hubert
a94598d921
[Feat] update python action and slurm ( #694 )
2023-12-13 10:41:10 +08:00
bittersweet1999
6130394165
[Feature] Add double-order subjective evaluation and remove duplicated responses between two models ( #692 )
...
* add features
* add doc string
* add doc string
2023-12-12 20:58:17 +08:00