Commit Graph

379 Commits

Author SHA1 Message Date
RunningLeon
32ba0b074e
Support lmdeploy pytorch engine (#875)
* add lmdeploy pytorch model

* fix

* speed up encoding and decoding

* fix

* change tokenizer
2024-02-22 03:46:07 -03:00
Yang Yong
b6e21ece38
Support LightllmApi input_format (#888) 2024-02-19 10:02:59 +08:00
Fengzhe Zhou
08133e060a
[Sync] Bump version to 0.2.2 (#880) 2024-02-07 10:45:48 +08:00
hailsham
dd444685bb
fix bug of gsm8k_postprocess (#863)
* fix bug of gsm8k_postprocess

* update postprocess

---------

Co-authored-by: Lei Fei <SENSETIME\leifei1@cn3114002087l.domain.sensetime.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-06 23:52:47 +08:00
Connor-Shen
444d8d9507
[feat] support multipl-e (#846)
* [feat] support humaneval_multipl-e

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-06 23:30:28 +08:00
Yggdrasill7D6
a6c49f15ce
fix lawbench 2-1 f0.5 score calculation bug (#795)
* fix lawbench 2-1 f0.5 score calculation bug

* use path in overall datasets folder

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-06 22:20:11 +08:00
bittersweet1999
1c8e193de8
[Fix] hotfix for mtbench (#877)
* hotfix for mtbench

* hotfix
2024-02-06 21:26:47 +08:00
Fengzhe Zhou
d34ba11106
[Sync] Merge branch 'dev' into zfz/update-keyset-demo (#876) 2024-02-05 23:29:10 +08:00
Skyfall-xzz
7ad1168062
Support NPHardEval (#835)
* support NPHardEval

* add .md file and fix minor bugs

* refactor and minor fix

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-02-05 15:52:28 +08:00
Yuchen Yan
fed7d800c6
[Fix] Fix error in gsm8k evaluator (#782)
Co-authored-by: jiangjin1999 <1261842974@qq.com>
2024-02-04 22:55:11 +08:00
bittersweet1999
7806cd0f64
[Feature] support alpacaeval (#809)
* support alpacaeval_v1

* Update opencompass/summarizers/subjective/__init__.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/summarizers/subjective/alpacaeval_v1.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* fix conflict

* support alpacaeval v2

* support alpacav2

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-02-04 14:18:36 +08:00
RunningLeon
4c87e777d8
[Feature] Add end_str for turbomind (#859)
* fix

* update

* fix internlm1

* fix docs

* remove sys
2024-02-01 22:31:14 +08:00
bittersweet1999
5c6dc908cd
fix compass arena (#854) 2024-01-30 16:34:38 +08:00
Songyang Zhang
cdca59ff49
[Fix] Update Zhipu API and Fix issue min_out_len issue of API models (#847)
* Update zhipu api and fix min_out_len issue of API class

* Update example

* Update example
2024-01-28 14:52:43 +08:00
Jingming
2801883351
[Fix] Fix acc of IFEval (#849)
* [Feature] Add IFEval

* [Fix] Changing the Score Rule.
2024-01-27 22:27:07 +08:00
Xiaoming Shi
35aace776a
[Fix] Update MedBench (#845) 2024-01-26 17:56:13 +08:00
Songyang Zhang
8ed022b4c4
Update Sensetime API (#844) 2024-01-26 16:40:49 +08:00
Hubert
4aa74565e2
[Feat] minor update agent related (#839)
* [Feat] update cibench

* [Feat] Support CIBench

* [Feat] Support CIBench

* [Feat] Support CIBench

* [Feat] Support CIBench
2024-01-26 14:15:51 +08:00
Fengzhe Zhou
0991dd33a0
[Sync] Updata dataset cfg for internMath (#837)
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-01-24 16:30:32 +08:00
Songyang Zhang
793e32c9cc
[Feature] Update API implementation (#834) 2024-01-24 13:35:21 +08:00
bittersweet1999
2ee8e8a1a1
[Feature] add mtbench (#829)
* add mtbench

* add mtbench

* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/datasets/subjective/__init__.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/datasets/subjective/mtbench.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* fix mtbench

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-24 12:11:47 +08:00
Jingming
e059a5c2bf
[Feature] Add IFEval (#813)
* [Feature] Add IFEval

* [Doc] add introduction of IFEval
2024-01-23 20:07:49 +08:00
bittersweet1999
3d9bb4aed7
[Fix] fix strings (#833)
* add compass arena

* add compass_arena

* add compass arena

* Update opencompass/summarizers/subjective/compass_arena.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/summarizers/subjective/__init__.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/datasets/subjective/compass_arena.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/datasets/subjective/__init__.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/eval_subjective_compassarena.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/datasets/subjective/compassarena/compassarena_compare.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/eval_subjective_compassarena.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/datasets/subjective/compassarena/compassarena_compare.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* fix check position bias

* fix string

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-23 10:57:26 +00:00
bittersweet1999
2d4da8dd02
[Feature] Add CompassArena (#828)
* add compass arena

* add compass_arena

* add compass arena

* Update opencompass/summarizers/subjective/compass_arena.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/summarizers/subjective/__init__.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/datasets/subjective/compass_arena.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/datasets/subjective/__init__.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/eval_subjective_compassarena.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/datasets/subjective/compassarena/compassarena_compare.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/eval_subjective_compassarena.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/datasets/subjective/compassarena/compassarena_compare.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* fix check position bias

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-23 15:12:46 +08:00
Guo Qipeng
e975a96fa1
Update cdme config and evaluator (#812)
* update cdme config and evaluator

* fix cdme prompt

* move CDME trim post-processor as a separate evaluator

---------

Co-authored-by: 郭琦鹏 <guoqipeng@pjlab.org.cn>
2024-01-19 11:29:27 +08:00
Yang Yong
f09a2ff418
Add LightllmApi KeyError log & Update doc (#816)
* Add LightllmApi KeyError log

* Update LightllmApi doc
2024-01-18 22:23:38 +08:00
RunningLeon
61fe873c89
[Fix] Fix turbomind and update docs (#808)
* update

* update docs

* add engine_config and gen_config in eval_config

* update

* fix

* fix

* fix

* fix docstr

* fix url
2024-01-18 14:41:35 +08:00
Fengzhe Zhou
b4afe3e7c1
[Sync] Add InternLM2 Keyset Evaluation Demo (#807)
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
2024-01-17 13:48:12 +08:00
Mo Li
acae560911
Added support for multi-needle testing in needle-in-a-haystack test (#802)
* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

* update plot function in tools_needleinahaystack.py

* optimizing needleinahaystack dataset generation strategy

* modify minor formatting issues

* add English version support

* change NeedleInAHaystackDataset to dynamic loading

* change NeedleInAHaystackDataset to dynamic loading

* fix needleinahaystack test eval bug

* fix needleinahaystack config bug

* Added support for multi-needle testing in needle-in-a-haystack test

* Optimize the code for plotting in the needle-in-a-haystack test.

* Correct the typo in the dataset parameters.

* update needleinahaystack test docs

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-17 13:47:34 +08:00
RunningLeon
0836aec67b
[Feature] Update evaluate turbomind (#804)
* update

* fix

* fix

* fix
2024-01-17 11:09:50 +08:00
bittersweet1999
814b3f73bd
reorganize subject files (#801) 2024-01-16 18:03:11 +08:00
bittersweet1999
83d6c48378
[Feature] Add configs for creationbench (#791)
* add creationv2_zh

* add creationv2_zh

* add eng config for creationbench

* add eng config for creationbench

* add eng config for creationbench
2024-01-12 14:20:21 +08:00
notoschord
d3a0ddc3ef
[Feature] Add support for Nanbeige API (#786)
Co-authored-by: notoschord <wangzekai@kanzhun.com>
2024-01-11 13:54:27 +08:00
bittersweet1999
5679edb490
add temperature in alles (#787) 2024-01-11 03:57:24 +00:00
Xiaoming Shi
ad872a5dc2
[Feature] Update MedBench (#779)
* update medbench

* medbench update

* format medbench

* format

* Update

* update

* update

* update suffix

---------

Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-01-09 11:42:44 +08:00
Fengzhe Zhou
a74e4c1a8d
[Sync] Bump version to 0.2.1 (#778) 2024-01-08 14:56:28 +00:00
Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2023.01.08 (#777) 2024-01-08 14:07:24 +00:00
jiangjin1999
8194199d79
[Feature] *_batch_generate* function, add the MultiTokenEOSCriteria (#772)
* jiangjin1999: in the _batch_generate function, add the MultiTokenEOSCriteria feature to speed up inference.

* jiangjin1999: in the _batch_generate function, add the MultiTokenEOSCriteria feature to speed up inference.

---------

Co-authored-by: jiangjin08 <jiangjin08@MBP-2F32S5MD6P-0029.local>
Co-authored-by: jiangjin08 <jiangjin08@a.sh.vip.dianping.com>
2024-01-08 16:40:02 +08:00
liyucheng09
0b2863039e
[Feature] Contamination analysis for MMLU, Hellaswag, and ARC_c (#699)
* Contamination analysis for ARC_c, mmlu, and Hellaswag

* update `eval_contamination.py`

* update `contamination.py` summarizer

* fix `eval_contamination.py`

* add mmlu groups for contamination analysis
2024-01-08 15:51:48 +08:00
Connor-Shen
30a90d8dd8
Support Mbpp_plus dataset (#770)
* support mbpp+

* support mbpp+

* minor fix

* [Feat] minor fix

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2024-01-05 22:01:57 +08:00
bittersweet1999
3c606cb712
quick fix for postprocess pred extraction (#771) 2024-01-05 21:10:18 +08:00
bittersweet1999
2163f9398f
[Feature] add subject ir dataset (#755)
* add subject ir

* Add ir dataset

* Add ir dataset
2024-01-05 12:00:57 +00:00
bittersweet1999
be369c3e06
[Feature] Add multi_round dataset evaluation (#766)
* multi_round dataset

* add multi_round evaluation
2024-01-04 10:37:52 +00:00
bittersweet1999
7cd65d49d8
[Fix] Fix small bug in alignbench (#764)
* fix small bugs

* fix small bugs
2024-01-03 07:44:53 +00:00
Chris Liu
3eb225a5e6
[Feature] Support LLaMA2-Accessory (#732)
* Support LLaMA2-Accessory

* remove strip

* clear imports

* reformat

* fix lint

* fix lint

* update readme

* update readme

* update readme

* update readme
2024-01-02 20:48:51 +08:00
HUANG Fei
ba027eeeac
[Feature] Add support of qwen api (#735) 2024-01-02 20:47:12 +08:00
Mo Li
33f8df1ca3
[Update] Change NeedleInAHaystackDataset to dynamic dataset loading (#754)
* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

* update plot function in tools_needleinahaystack.py

* optimizing needleinahaystack dataset generation strategy

* modify minor formatting issues

* add English version support

* change NeedleInAHaystackDataset to dynamic loading

* change NeedleInAHaystackDataset to dynamic loading

* fix needleinahaystack test eval bug

* fix needleinahaystack config bug

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-02 17:22:56 +08:00
Francis-llgg
b69fe2343b
[Feature] Add GPQA Dataset (#729)
* check

* message

* add

* change prompt

* change a para nameq

* modify name of the file

* delete an useless file
2024-01-01 15:54:40 +08:00
Francis-llgg
ef3ae63539
[Feature] Add new dataset mastermath2024v1 (#744)
* add new dataset mastermath2024v1

* change it to simplified chinese prompt

* change file name
2024-01-01 15:53:24 +08:00
Mo Li
17b8e929dd
[Feature] Update plot function in tools_needleinahaystack.py (#747)
* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

* update plot function in tools_needleinahaystack.py

* optimizing needleinahaystack dataset generation strategy

* modify minor formatting issues

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-29 18:51:09 +08:00
Hubert
327951087f
[Feat] update code config (#749)
* [Feat] update code dataset

* [Feat] update code dataset

* [Feat] update code dataset
2023-12-29 18:46:34 +08:00
bittersweet1999
fe0b717033
add creationbench (#753) 2023-12-29 10:03:44 +00:00
Connor-Shen
81098722d2
add chinese version of humaneval, mbpp (#743)
* add chinese_version of humaneval,mbpp

* add humaneval&mbpp gen.py

* minor fix

* minor add

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-12-28 14:47:56 +08:00
bittersweet1999
db919f0191
[Fix] SubSizePartition fix (#746)
* fix subjective_eval

* subject_eval partition situation fixed

* subject_eval partition situation fixed
2023-12-28 11:46:46 +08:00
Hubert
0a525985e8
[Feature] Support sanitized MBPP dataset (#745) 2023-12-27 22:17:23 +08:00
bittersweet1999
dfd9ac0fd9
[Feature] Add other judgelm prompts for Alignbench (#731)
* add judgellm prompts

* add judgelm prompts

* update import info

* fix situation that no abbr in config

* fix situation that no abbr in config

* add summarizer for other judgellm

* change config name

* add maxlen

* add maxlen

* dict assert

* dict assert

* fix strings

* fix strings
2023-12-27 17:54:53 +08:00
Yang Yong
54345c56b7
Update LightllmApi and Fix mmlu bug (#738)
* Update LightllmApi and Fix mmlu bug

* checkout mmlu_gen_a484b3.py

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-27 13:49:08 +08:00
philipwangOvO
34561ececb
[Feature] Add InfiniteBench (#739)
* add InfiniteBench

* add InfiniteBench

---------

Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>
2023-12-26 15:36:27 +08:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs (#734) 2023-12-25 21:59:16 +08:00
AllentDan
336d8d76ff
add turbomind restful api support (#693)
* add turbomind restful api support

* config

* top_p 0.8

* top_k = 1
2023-12-24 01:40:00 +08:00
bittersweet1999
e985100cd1
[Fix] Fix subjective alignbench (#730) 2023-12-23 20:06:53 +08:00
Mo Li
0e24f4213e
[Feature] Add NeedleInAHaystack Test Support (#714)
* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-23 12:00:51 +08:00
RunningLeon
e34c552282
[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend (#721)
* add llama2 test

* fix

* test qwen chat-7b

* test w4

* add baichuan2

* update

* update

* update configs and docs

* update
2023-12-21 18:22:17 +08:00
bittersweet1999
fbb912ddf3
[Feature] Add abbr for judgemodel in subjective evaluation (#724)
* add_judgemodel_abbr

* add judgemodel abbr
2023-12-21 15:58:20 +08:00
Skyfall-xzz
b35d991786
[Feature] Add ReasonBench(Internal) dataset (#577)
* [Feature] Add reasonbench dataset

* add configs for supporting generative inference & merge datasets in the same category

* modify config filename to prompt version

* fix codes to meet pre-commit requirements

* lint the code to meet pre-commit requirements

* Align Load_data Sourcecode Briefly

* fix bugs

* reduce code redundancy
2023-12-20 17:57:42 +08:00
Jingming
76a95e9e81
[Feature] Support the use of humaneval_plus. (#720)
* [Feature] Support the use of humaneval_plus.

* [Feature] Add humaneval_plus_gen.py

* minor check

* [Fix] Fix bug

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-12-20 17:25:17 +08:00
bittersweet1999
97c2068bd9
[Feature] Add JudgeLLMs (#710)
* add judgellms

* add judgellms

* add sub_size_partition

* add docs

* add ref
2023-12-19 18:40:25 +08:00
Hubert
eda72e756e
[Fix] minor fix openai (#711) 2023-12-18 15:45:31 +08:00
Songyang Zhang
637628a70f
[Doc] Update Doc for Alignbench (#707)
* update alignmentbench

* update alignmentbench

* update doc

* update

* update
2023-12-15 15:07:25 +08:00
DseidLi
db2920326a
[Fix] remove redundant in gsm8k.py (#700)
Removed redundant code in GSM8KDataset.load method.
2023-12-14 19:55:58 +08:00
Songyang Zhang
bfe4aa2af5
[Fix] Update alignmentbench (#704)
* update alignmentbench

* update alignmentbench

* update alignmentbench
2023-12-14 18:24:21 +08:00
bittersweet1999
1fe152b3e8
[Feature] Support AlignmentBench infer and judge (#697)
* alignmentbench infer and judge

* alignmentbench

* alignmentbench done

* alignment all done

* alignment all done
2023-12-13 19:59:30 +08:00
Hubert
a94598d921
[Feat] update python action and slurm (#694) 2023-12-13 10:41:10 +08:00
bittersweet1999
6130394165
[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692)
* add features

* add doc string

* add doc string
2023-12-12 20:58:17 +08:00
Hubert
4780b39eda
[Sync] format (#690)
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-12 14:03:45 +08:00
bittersweet1999
3e77175720
[Fix] Hotfix for Subjective Evaluation (#686) 2023-12-12 09:22:08 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation (#680)
* new version of subject

* fixed draw

* fixed draw

* fixed draw

* done

* done

* done

* done

* fixed lint
2023-12-11 22:22:11 +08:00
Hubert
4f0b373a0a
[Fix] fix docstring (#684) 2023-12-11 19:12:01 +08:00
Hubert
e78857ac36
[Sync] minor test (#683) 2023-12-11 17:42:53 +08:00
Jingming
dd4318f6ab
[Feature] enhance the ability of humaneval_postprocess (#676)
* [Feature] enhance the ability of humaneval_postprocess

* refactor

* [Feature] Keep the old version of the function and realize the new function in humaneval_postprocess_v2.

* Update opencompass/datasets/humaneval.py

---------

Co-authored-by: Leymore <zfz-960727@163.com>
Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
2023-12-11 14:39:56 +08:00
Songyang Zhang
e25c5f9525
[Enhancement] Update API Interface and Mixtral (#681)
* [Enhancement] Update API interface

* [Enhancement] Update API interface

* Update mixtral

* Update readme
2023-12-10 13:29:26 +08:00
Xiaoming Shi
1bf85949ef
[Feature] Add medbench (#678)
* update medbench

* medbench update

* format medbench

* format

---------

Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-09 16:05:46 +08:00
Jingming
7cb53a95fa
[Fix] fix bug on standart_deviation summarizer (#675) 2023-12-08 13:38:07 +08:00
liyucheng09
05bbce8b08
[Feature] Add Data Contamination Analysis (#639)
* add contamination analysis to ceval

* fix bugs

* add contamination docs

* to pass CI check

* update

---------

Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-08 10:00:11 +08:00
bittersweet1999
1c95790fdd
New subjective judgement (#660)
* TabMWP

* TabMWP

* fixed

* fixed

* fixed

* done

* done

* done

* add new subjective judgement

* add new subjective judgement

* add new subjective judgement

* add new subjective judgement

* add new subjective judgement

* modified to a more general way

* modified to a more general way

* final

* final

* add summarizer

* add new summarize

* fixed

* fixed

* fixed

---------

Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>
2023-12-06 13:28:33 +08:00
rolellm
e10f1c9139
added rolebench dataset. (#633)
* added rolebench

* fixed the poorly chosen variable names

* fixed the variable names in the comments
2023-12-01 22:54:42 +08:00
Hubert
9eb5cadcac
[Feat] update gsm8k and math agent config (#652)
* [Feat] update gsm8k and math agent config

* minor fix
2023-12-01 15:08:38 +08:00
liushz
a331c9abfd
[Feature] Add wikibench dataset (#655)
* Add WikiBench

* Add WikiBench

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-01 14:56:54 +08:00
liushz
e019c831fe
[Feature] Add Chinese version: commonsenseqa, crowspairs and nq (#144)
* add Chinese version: csqa crowspairs nq

* Update cn_data

* Update cn_data

* update format

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-30 15:33:02 +08:00
Ma Zerun
6aaf3b91ec
[Feature] Support chat style inferencer. (#643)
* [Feature] Support chat style inferencer.

* [Fix] use new prompt

* [Fix] use new prompt

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-30 14:00:06 +08:00
Fengzhe Zhou
e20d654c18
[Sync] Bump version to 0.1.9 (#644) 2023-11-28 11:42:43 +08:00
Hubert
d4af31bab4
[Feat] support zhipu post process (#642)
* [Feat] support zhipu post

* [Feat] support zhipu post

* [Feat] support zhipu post
2023-11-27 19:57:36 +08:00
liushz
6d0d78986c
[Feature] Add GSM_Hard dataset (#619)
* Add SVAMP dataset

* Add SVAMP dataset

* Add SVAMP dataset

* Add gsm_hard dataset

* Add gsm_hard dataset

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-27 17:40:34 +08:00
Fengzhe Zhou
9083dea683
[Sync] some renaming (#641) 2023-11-27 16:06:49 +08:00
Yang Yong
522241a8c9
[Fix] Fix lightllmapi list bug (#635) 2023-11-24 14:24:13 +08:00
Hubert
1884912674
[Bug] fix icl eval with nested list (#632) 2023-11-24 13:43:26 +08:00
Fengzhe Zhou
79f6449d85
[Doc] Update FAQ (#628)
* update faq

* Update docs/zh_cn/get_started/faq.md

* Update docs/en/get_started/faq.md

* Update docs/zh_cn/get_started/faq.md

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-11-23 18:19:17 +08:00
Fengzhe Zhou
d949e3c003
[Feature] Add circular eval (#610)
* refactor default, add circular summarizer

* add circular

* update impl

* update doc

* minor update

* no more to be added
2023-11-23 16:45:47 +08:00
Songyang Zhang
5202456b4c
[API] Update API (#624)
* update api

* update generation_kwargs impl

* update api

* refactor

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-23 15:06:20 +08:00
Fengzhe Zhou
d4d1330a5a
[Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes (#625) 2023-11-23 14:05:59 +08:00
Kevin Wang
c0785e53d8
[Feature] support download from modelscope (#534)
* [Feature] download from modelscope

* [Feature] download from modelscope

* minor fix

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-22 15:32:21 +08:00
liushz
048775192b
[Feature] Add SVAMP dataset (#604)
* Add SVAMP dataset

* Add SVAMP dataset

* Add SVAMP dataset
2023-11-22 14:54:39 +08:00
Fengzhe Zhou
fb30b7c7a2
[Fix] Fix gen inferencer (#615) 2023-11-22 12:04:31 +08:00
Songyang Zhang
721a45c68f
[Bug] Update api with generation_kargs (#614)
* update api

* update generation_kwargs impl

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-22 10:02:57 +08:00
Lyu Han
eb56fd6d16
Integrate turbomind python api (#484)
* integrate turbomind python api

* update

* update user guide

* update

* fix according to reviewer's comments

* fix error

* fix linting

* update user guide

* remove debug log

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-11-21 22:34:46 +08:00
Songyang Zhang
d925748266
[Feature] Support 360API and FixKRetriever for CSQA dataset (#601)
* [Feature] Support 360API and FixKRetriever for CSQA dataset

* Update API

* Update API

* [Feature] Support 360API and FixKRetriever for CSQA dataset

* Update API

* Update API

* rm mathbench

* fix_lint

* Update opencompass/models/bytedance_api.py

Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>

* update

* update

* update

---------

Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
2023-11-21 20:25:47 +08:00
Yang Yong
d3b0d5c4ce
[Feature] Support Lightllm API (#613)
* [Feature] Support Lightllm api

* formatting & renaming

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-21 19:18:40 +08:00
Yuan Feng
7199acc25d
Add support for DataCanvas Alaya LM (#612)
* Support for Alaya

* Remove useless requirements
2023-11-21 17:51:30 +08:00
liushz
dbacd36379
Add aritch to mathbench (#607) 2023-11-20 19:40:41 +08:00
liushz
c9c5c5d92e
Mathbench update postprocess (#600)
* Update mathbench

* Update mathbench
2023-11-20 16:48:55 +08:00
Jingming
5e75e29711
[Feature] Add multi-prompt generation demo (#568)
* [Feature] Add multi-prompt generation demo

* [Fix] change form in winogrande_gen_XXX.py

* [Fix] make multi prompt demo more directly

* [Fix] fix bug

* [Fix] minor fix

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-20 16:16:37 +08:00
Hubert
91fba2c2e9
[Feat] support humaneval and mbpp pass@k (#598)
* [Feat] support pass@ k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k docs

* update naming

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-16 21:22:06 +08:00
Raymond Zhang
c0acd06b05
[Feature] Add FinanceIQ dataset (#596) 2023-11-16 17:47:57 +08:00
Hubert
fcab30f82e
[Fix] change save_every defaults to 1 (#592) 2023-11-15 13:00:25 +08:00
Fengzhe Zhou
19ad7f9613
fix cmb dataset (#587) 2023-11-14 16:13:39 +08:00
Wei Jueqi
14e6fe6f13
Fix bugs in subjective evaluation (#589)
* rename

* fix sub bugs and update docs

* update

* update
2023-11-14 16:11:55 +08:00
Fengzhe Zhou
1ea88d5822
[Sync] Bump version to 0.1.8 (#576) 2023-11-13 16:00:38 +08:00
Fengzhe Zhou
d3de5c41fb
[Sync] update model configs (#574) 2023-11-13 15:15:34 +08:00
Fengzhe Zhou
689ffe5b63
[Feature] Use dataset in local path (#570)
* update commonsenseqa

* update drop

* update flores_first100

* update gsm8k

* update humaneval

* update lambda

* update obqa

* update piqa

* update race

* update siqa

* update story_cloze

* update strategyqa

* update tydiqa

* update winogrande

* update doc

* update hellaswag

* fix obqa

* update collections

* update .zip name
2023-11-13 13:00:37 +08:00
Fengzhe Zhou
d6aaac22e7
[Feature] Update cmb (#571) 2023-11-13 00:09:05 +08:00
Songyang Zhang
9e42cb163b
[Feature] Update xunfei api (#572)
* update xunfei api

* fix lint

* avoid warning
2023-11-10 22:46:06 +08:00
jingmingzhuo
b3cbef3226
[Feature] Add py150 and maxmin (#562)
* [feat] add clozeTesst_maxmin dataset

* [feat] add py150 datasets

* [feat] change __init__.py in opencompass/datasets

* [fix] pre-commit check

* [fix] rename py150 and masxmin datasets in configs

* [feat] add gen.py of py150 and maxmin in configs/datasets
2023-11-09 22:05:25 +08:00
Hubert
889a6b26ae
[Fix] fix log re-direct (#564) 2023-11-09 19:34:19 +08:00
Hubert
cf5a6d1ab7
[Fix] fix unnecessary import and update requirements (#555) 2023-11-08 17:58:49 +08:00
Hubert
9f8a721313
[Fix] fix registry error with internal (#551)
* [Fix] fix conflict with internal

* [Fix] fix conflict with internal
2023-11-07 20:01:23 +08:00
Hubert
bb2ecf416e
[Feat] Support cibench (#538)
* [Feat] support cidataset

* [Feat] support cidataset

* [Feat] support cidataset

* [Feat] support cidataset

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* rename cibench

* rename cibench

* rename cibench

* rename cibench

* minor fix

* minor fix

* minor fix
2023-11-07 19:11:44 +08:00
Songyang Zhang
239c2a346e
[Feature] Add support for MiniMax API (#548)
* update requirement

* update requirement

* update with minimax

* update api model

* Update readme

* fix error

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2023-11-06 21:57:32 +08:00
Hubert
1ccdfaa623
[Feat] support xunfei api (#547) 2023-11-06 19:29:26 +08:00
Yuan Liu
6e31520128
[Feature]: To be compatible with the latest version of MiniGPT-4 (#539)
* [Feature]: To be compatible with the latest version of MiniGPT-4

* [Feature]: User try and except

Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>

* [Fix]: Fix lint

---------

Co-authored-by: bensenliu <bensenliu@tencent.com>
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2023-11-04 09:50:36 +08:00
bittersweet1999
f25a980043
[fFeat] Add an opensource dataset Tabmwp (#505)
* TabMWP

* TabMWP

* fixed

* fixed

* fixed

* done

* done

* done

---------

Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>
2023-11-03 11:15:46 +08:00
Hubert
b9270c3a60
[Fix] Fix local debug mode not restrict the resources (#522)
* [Fix] fix local debug mode not restrict the resources

* minor fix
2023-10-30 18:13:43 +08:00
Qing
e2355a2ede
[Feature] Add multi model viz (#509)
* add viz_multi_model.py tool

* Modify the viz_multi_model.py script according to the review

* highlight multiple optimal scores

---------

Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-10-30 12:11:33 +08:00
Fengzhe Zhou
6a398d171c
Bump version to 0.1.7 (#518) 2023-10-27 20:32:27 +08:00
Fengzhe Zhou
dbb20b8270
[Sync] update (#517) 2023-10-27 20:31:22 +08:00
Hubert
6f07af3039
[Feat] Support local runner for windows (#515) 2023-10-27 17:16:22 +08:00
Fengzhe Zhou
df07391ed8
[Fix] Enforce do_sample=False in HF model (#506)
* update hf model wrapper

* patch llama

---------

Co-authored-by: bot <bot@bot.com>
2023-10-27 16:54:19 +08:00
Wei Jueqi
b62842335d
[Doc] Update Subjective docs (#510)
* rename

* add en subdoc

* fix name

* fix writing

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-10-27 16:27:24 +08:00
Fengzhe Zhou
e3d4901bed
[Feat] Add _set_model_kwargs_torch_dtype for HF model (#507)
* add _set_model_kwargs_torch_dtype for hf models

* add logger
2023-10-27 11:45:41 +08:00
Fengzhe Zhou
6405cd2db5
use example summarizer by default (#508) 2023-10-27 11:45:29 +08:00
Hubert
b3f5d9e421
[Feat] support math/gms8k agent config (#494)
* support math agent

* support gsm8k agent

* support gsm8k agent

* minor fix

* minor fix

* minor fix

* Update configs/eval_codeagent.py
2023-10-25 23:05:15 +08:00
Hubert
ac3a2c4501
[Feat] local api speed up with fixed concurrent users (#497)
* [Feat] local api speed up

* fix lint

* fix lint

* minor fix

* add example api
2023-10-25 21:12:20 +08:00
Leymore
4dd9a3fc10
[Sync] sync with internal codes 20231019 (#488) 2023-10-18 23:37:35 -05:00
liushz
2737249f31
[Feature] Add mathbench dataset and circular evaluator (#408)
* add_mathbench

* update mathbench

* support non circular eval dataset

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-10-18 04:08:31 -05:00
Leymore
fccfcb6f5b
fix summary default (#483) 2023-10-17 11:32:38 +08:00
Leymore
6317da08b3
Bump version to 0.1.6 (#478) 2023-10-13 06:54:51 -05:00
Leymore
7d9e386821
[Fix] Split if and only if complete eos string shows up (#477) 2023-10-13 06:52:20 -05:00
Leymore
861942ab1b
[Feature] Add lawbench (#460)
* add lawbench

* update requirements

* update
2023-10-13 06:51:36 -05:00
Leymore
fbf5089c40
[Sync] update github token (#475) 2023-10-13 06:50:54 -05:00
Leymore
362c33dff4
fix jieba rouge (#467) 2023-10-12 10:25:19 +08:00
Leymore
d7ff933a73
[Fix] Use jieba rouge in lcsts (#459)
* use jieba rouge in lcsts

* use rouge_chinese
2023-10-09 10:10:33 +08:00