AllentDan
336d8d76ff
add turbomind restful api support ( #693 )
...
* add turbomind restful api support
* config
* top_p 0.8
* top_k = 1
2023-12-24 01:40:00 +08:00
Mo Li
0e24f4213e
[Feature] Add NeedleInAHaystack Test Support ( #714 )
...
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-23 12:00:51 +08:00
RunningLeon
e34c552282
[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend ( #721 )
...
* add llama2 test
* fix
* test qwen chat-7b
* test w4
* add baichuan2
* update
* update
* update configs and docs
* update
2023-12-21 18:22:17 +08:00
Skyfall-xzz
b35d991786
[Feature] Add ReasonBench(Internal) dataset ( #577 )
...
* [Feature] Add reasonbench dataset
* add configs for supporting generative inference & merge datasets in the same category
* modify config filename to prompt version
* fix codes to meet pre-commit requirements
* lint the code to meet pre-commit requirements
* Align Load_data Sourcecode Briefly
* fix bugs
* reduce code redundancy
2023-12-20 17:57:42 +08:00
Jingming
76a95e9e81
[Feature] Support the use of humaneval_plus. ( #720 )
...
* [Feature] Support the use of humaneval_plus.
* [Feature] Add humaneval_plus_gen.py
* minor check
* [Fix] Fix bug
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-12-20 17:25:17 +08:00
bittersweet1999
47e745d748
quick fix for maxoutlen ( #719 )
2023-12-20 00:00:28 +08:00
Hubert
5e8b838f51
[Feat] Update math/agent ( #716 )
...
* minor add
* minor add
* minor fix
2023-12-19 21:20:42 +08:00
bittersweet1999
97c2068bd9
[Feature] Add JudgeLLMs ( #710 )
...
* add judgellms
* add judgellms
* add sub_size_partition
* add docs
* add ref
2023-12-19 18:40:25 +08:00
Songyang Zhang
637628a70f
[Doc] Update Doc for Alignbench ( #707 )
...
* update alignmentbench
* update alignmentbench
* update doc
* update
* update
2023-12-15 15:07:25 +08:00
Jingming
d7e7a637a5
[Fix] fix a bug on configs/eval_mixtral_8x7b.py ( #706 )
2023-12-15 14:15:32 +08:00
Songyang Zhang
bfe4aa2af5
[Fix] Update alignmentbench ( #704 )
...
* update alignmentbench
* update alignmentbench
* update alignmentbench
2023-12-14 18:24:21 +08:00
bittersweet1999
1fe152b3e8
[Feature] Support AlignmentBench infer and judge ( #697 )
...
* alignmentbench infer and judge
* alignmentbench
* alignmentbench done
* alignment all done
* alignment all done
2023-12-13 19:59:30 +08:00
bittersweet1999
6130394165
[Feature] Add double order of subjective evaluation and removing duplicated response among two models ( #692 )
...
* add features
* add doc string
* add doc string
2023-12-12 20:58:17 +08:00
Xiaoyu Zhang
82a533a690
add rwkv-5-3b model ( #666 )
...
* support rwkv5-3b learnboard
* update rwkv-5-3b config
* update config
* refine
* fix bug
* update config
* refine
* reduce batch size
* refine
* reduce batch size to avoid oom in special datasets
* Update huggingface.py
* Update huggingface.py
2023-12-12 18:15:19 +08:00
bittersweet1999
3e77175720
[Fix] Hotfix for Subjective Evaluation ( #686 )
2023-12-12 09:22:08 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation ( #680 )
...
* new version of subject
* fixed draw
* fixed draw
* fixed draw
* done
* done
* done
* done
* fixed lint
2023-12-11 22:22:11 +08:00
Hubert
e78857ac36
[Sync] minor test ( #683 )
2023-12-11 17:42:53 +08:00
Songyang Zhang
e25c5f9525
[Enhancement] Update API Interface and Mixtral ( #681 )
...
* [Enhancement] Update API interface
* [Enhancement] Update API interface
* Update mixtral
* Update readme
2023-12-10 13:29:26 +08:00
Xiaoming Shi
1bf85949ef
[Feature] Add medbench ( #678 )
...
* update medbench
* medbench update
* format medbench
* format
---------
Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-09 16:05:46 +08:00
liyucheng09
05bbce8b08
[Feature] Add Data Contamination Analysis ( #639 )
...
* add contamination analysis to ceval
* fix bugs
* add contamination docs
* to pass CI check
* update
---------
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-08 10:00:11 +08:00
Fengzhe Zhou
3a354bd1da
add qwen and deepseek configs ( #672 )
2023-12-07 20:29:00 +08:00
bittersweet1999
1c95790fdd
New subjective judgement ( #660 )
...
* TabMWP
* TabMWP
* fixed
* fixed
* fixed
* done
* done
* done
* add new subjective judgement
* add new subjective judgement
* add new subjective judgement
* add new subjective judgement
* add new subjective judgement
* modified to a more general way
* modified to a more general way
* final
* final
* add summarizer
* add new summarize
* fixed
* fixed
* fixed
---------
Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>
2023-12-06 13:28:33 +08:00
rolellm
e10f1c9139
added rolebench dataset. ( #633 )
...
* added rolebench
* Renamed poorly chosen variable names
* Renamed variables in comments
2023-12-01 22:54:42 +08:00
liushz
f4bbff6537
[Feature] Update MathBench CodeInterpreter & fix MathBench Bug ( #657 )
...
* Update MathBench CodeInterpreter & fix MathBench Bug
* Fix errors
* update
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2023-12-01 22:27:24 +08:00
Hubert
9eb5cadcac
[Feat] update gsm8k and math agent config ( #652 )
...
* [Feat] update gsm8k and math agent config
* minor fix
2023-12-01 15:08:38 +08:00
liushz
a331c9abfd
[Feature] Add wikibench dataset ( #655 )
...
* Add WikiBench
* Add WikiBench
* format
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-01 14:56:54 +08:00
liushz
e019c831fe
[Feature] Add Chinese version: commonsenseqa, crowspairs and nq ( #144 )
...
* add Chinese version: csqa crowspairs nq
* Update cn_data
* Update cn_data
* update format
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-30 15:33:02 +08:00
Ma Zerun
6aaf3b91ec
[Feature] Support chat style inferencer. ( #643 )
...
* [Feature] Support chat style inferencer.
* [Fix] use new prompt
* [Fix] use new prompt
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-30 14:00:06 +08:00
Fengzhe Zhou
5933c04fda
fix hellaswag_ppl_47bff9 ( #648 )
2023-11-29 16:51:44 +08:00
Hubert
d4af31bab4
[Feat] support zhipu post process ( #642 )
...
* [Feat] support zhipu post
* [Feat] support zhipu post
* [Feat] support zhipu post
2023-11-27 19:57:36 +08:00
liushz
6d0d78986c
[Feature] Add GSM_Hard dataset ( #619 )
...
* Add SVAMP dataset
* Add SVAMP dataset
* Add SVAMP dataset
* Add gsm_hard dataset
* Add gsm_hard dataset
* format
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-27 17:40:34 +08:00
Fengzhe Zhou
9083dea683
[Sync] some renaming ( #641 )
2023-11-27 16:06:49 +08:00
Fengzhe Zhou
d949e3c003
[Feature] Add circular eval ( #610 )
...
* refactor default, add circular summarizer
* add circular
* update impl
* update doc
* minor update
* no more to be added
2023-11-23 16:45:47 +08:00
Songyang Zhang
5202456b4c
[API] Update API ( #624 )
...
* update api
* update generation_kwargs impl
* update api
* refactor
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-23 15:06:20 +08:00
Fengzhe Zhou
d4d1330a5a
[Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes ( #625 )
2023-11-23 14:05:59 +08:00
Kevin Wang
c0785e53d8
[Feature] support download from modelscope ( #534 )
...
* [Feature] download from modelscope
* [Feature] download from modelscope
* minor fix
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-22 15:32:21 +08:00
liushz
048775192b
[Feature] Add SVAMP dataset ( #604 )
...
* Add SVAMP dataset
* Add SVAMP dataset
* Add SVAMP dataset
2023-11-22 14:54:39 +08:00
Lyu Han
eb56fd6d16
Integrate turbomind python api ( #484 )
...
* integrate turbomind python api
* update
* update user guide
* update
* fix according to reviewer's comments
* fix error
* fix linting
* update user guide
* remove debug log
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-11-21 22:34:46 +08:00
Songyang Zhang
d925748266
[Feature] Support 360API and FixKRetriever for CSQA dataset ( #601 )
...
* [Feature] Support 360API and FixKRetriever for CSQA dataset
* Update API
* Update API
* [Feature] Support 360API and FixKRetriever for CSQA dataset
* Update API
* Update API
* rm mathbench
* fix_lint
* Update opencompass/models/bytedance_api.py
Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
* update
* update
* update
---------
Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
2023-11-21 20:25:47 +08:00
Yang Yong
d3b0d5c4ce
[Feature] Support Lightllm API ( #613 )
...
* [Feature] Support Lightllm api
* formatting & renaming
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-21 19:18:40 +08:00
Yuan Feng
7199acc25d
Add support for DataCanvas Alaya LM ( #612 )
...
* Support for Alaya
* Remove useless requirements
2023-11-21 17:51:30 +08:00
liushz
dbacd36379
Add aritch to mathbench ( #607 )
2023-11-20 19:40:41 +08:00
liushz
c9c5c5d92e
Mathbench update postprocess ( #600 )
...
* Update mathbench
* Update mathbench
2023-11-20 16:48:55 +08:00
Jingming
5e75e29711
[Feature] Add multi-prompt generation demo ( #568 )
...
* [Feature] Add multi-prompt generation demo
* [Fix] change form in winogrande_gen_XXX.py
* [Fix] make multi prompt demo more direct
* [Fix] fix bug
* [Fix] minor fix
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-20 16:16:37 +08:00
Hubert
91fba2c2e9
[Feat] support humaneval and mbpp pass@k ( #598 )
...
* [Feat] support pass@ k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k docs
* update naming
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-16 21:22:06 +08:00
Raymond Zhang
c0acd06b05
[Feature] Add FinanceIQ dataset ( #596 )
2023-11-16 17:47:57 +08:00
Yu
8160cb84e3
update word spell ( #594 )
2023-11-15 15:23:58 +08:00
Wei Jueqi
14e6fe6f13
Fix bugs in subjective evaluation ( #589 )
...
* rename
* fix sub bugs and update docs
* update
* update
2023-11-14 16:11:55 +08:00
Songyang Zhang
c8cb38e822
[Feature] Update mathbench ( #580 )
...
* update xunfei api
* fix lint
* update mathbench to avoid incomplete prediction
2023-11-14 16:04:02 +08:00
Fengzhe Zhou
1ea88d5822
[Sync] Bump version to 0.1.8 ( #576 )
2023-11-13 16:00:38 +08:00
Fengzhe Zhou
d3de5c41fb
[Sync] update model configs ( #574 )
2023-11-13 15:15:34 +08:00
Fengzhe Zhou
689ffe5b63
[Feature] Use dataset in local path ( #570 )
...
* update commonsenseqa
* update drop
* update flores_first100
* update gsm8k
* update humaneval
* update lambda
* update obqa
* update piqa
* update race
* update siqa
* update story_cloze
* update strategyqa
* update tydiqa
* update winogrande
* update doc
* update hellaswag
* fix obqa
* update collections
* update .zip name
2023-11-13 13:00:37 +08:00
Fengzhe Zhou
d6aaac22e7
[Feature] Update cmb ( #571 )
2023-11-13 00:09:05 +08:00
Kevin Wang
7f77e8dae5
[Docs] fix dataset name error ( #533 )
2023-11-10 18:54:20 +08:00
Hubert
95e0da0173
[Docs] add humanevalx dataset link in config ( #559 )
...
* [Docs] add humanevalx dataset link in config
* [Docs] add humanevalx dataset link in config
* minor fix
2023-11-10 18:18:58 +08:00
jingmingzhuo
b3cbef3226
[Feature] Add py150 and maxmin ( #562 )
...
* [feat] add clozeTest_maxmin dataset
* [feat] add py150 datasets
* [feat] change __init__.py in opencompass/datasets
* [fix] pre-commit check
* [fix] rename py150 and maxmin datasets in configs
* [feat] add gen.py of py150 and maxmin in configs/datasets
2023-11-09 22:05:25 +08:00
Hubert
889a6b26ae
[Fix] fix log re-direct ( #564 )
2023-11-09 19:34:19 +08:00
Hubert
bb2ecf416e
[Feat] Support cibench ( #538 )
...
* [Feat] support cidataset
* [Feat] support cidataset
* [Feat] support cidataset
* [Feat] support cidataset
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* rename cibench
* rename cibench
* rename cibench
* rename cibench
* minor fix
* minor fix
* minor fix
2023-11-07 19:11:44 +08:00
Hubert
36360bdfc3
[Fix] fix filename typo ( #549 )
2023-11-07 14:00:26 +08:00
liushz
214a34f0b8
[Feature] Update Mathbench dataset prompt and fix small errors ( #546 )
...
* Update mathbench
* Update mathbench
* Update mathbench
2023-11-06 21:58:31 +08:00
Songyang Zhang
239c2a346e
[Feature] Add support for MiniMax API ( #548 )
...
* update requirement
* update requirement
* update with minimax
* update api model
* Update readme
* fix error
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2023-11-06 21:57:32 +08:00
bittersweet1999
f25a980043
[Feat] Add an open-source dataset TabMWP ( #505 )
...
* TabMWP
* TabMWP
* fixed
* fixed
* fixed
* done
* done
* done
---------
Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>
2023-11-03 11:15:46 +08:00
Surav Shrestha
e5ae86221c
docs: fix typos in markdown files ( #530 )
...
* fix typos in configs/multimodal/llava/README.md
* fix typos in configs/multimodal/minigpt_4/README.md
2023-11-01 16:16:16 +08:00
Qing
229a65f305
[Fix] Fix typo in WSC prompt ( #520 )
...
Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
2023-10-30 12:16:26 +08:00
Fengzhe Zhou
dbb20b8270
[Sync] update ( #517 )
2023-10-27 20:31:22 +08:00
Wei Jueqi
b62842335d
[Doc] Update Subjective docs ( #510 )
...
* rename
* add en subdoc
* fix name
* fix writing
* update
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-10-27 16:27:24 +08:00
Hubert
b3f5d9e421
[Feat] support math/gsm8k agent config ( #494 )
...
* support math agent
* support gsm8k agent
* support gsm8k agent
* minor fix
* minor fix
* minor fix
* Update configs/eval_codeagent.py
2023-10-25 23:05:15 +08:00
liushz
2737249f31
[Feature] Add mathbench dataset and circular evaluator ( #408 )
...
* add_mathbench
* update mathbench
* support non circular eval dataset
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-10-18 04:08:31 -05:00
Leymore
861942ab1b
[Feature] Add lawbench ( #460 )
...
* add lawbench
* update requirements
* update
2023-10-13 06:51:36 -05:00
Leymore
fbf5089c40
[Sync] update github token ( #475 )
2023-10-13 06:50:54 -05:00
Leymore
d7ff933a73
[Fix] Use jieba rouge in lcsts ( #459 )
...
* use jieba rouge in lcsts
* use rouge_chinese
2023-10-09 10:10:33 +08:00
Tong Gao
119bfd1569
[Refactor] Move fix_id_list to Retriever ( #442 )
...
* [Refactor] Move fix_id_list to Retriever
* update
* move to base
* fix
2023-10-07 12:53:41 +08:00
Lyu Han
6738247142
Integrate turbomind inference via its RPC API instead of its python API ( #414 )
...
* support tis
* integrate turbomind inference via its RPC API instead of its python API
* update guide
* update ip address spec
* update according to reviewer's comments
2023-10-07 10:27:48 +08:00
Leymore
9db5652638
[Feature] re-implement ceval load dataset ( #446 )
2023-09-27 21:18:48 +08:00
philipwangOvO
3bb3d330eb
[Sync] Update LongEval ( #443 )
2023-09-27 16:32:40 +08:00
Kevin Wang
dc1b82c346
[SIG] add GLUE_MRPC dataset ( #440 )
2023-09-27 11:44:54 +08:00
Kevin Wang
14fdecfecc
[Dataset] add GLUE QQP dataset ( #438 )
2023-09-27 11:36:43 +08:00
Kevin Wang
d8354fe5d8
[SIG] add GLUE_CoLA dataset ( #406 )
...
* [Dataset] add GLUE_CoLA dataset
* [update] use HFDataset to load glue/cola dataset
* update
---------
Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2023-09-27 11:30:44 +08:00
Kevin Wang
012546666b
[SIG] add WikiText-2&103 ( #397 )
...
* fix conflict
* add eval_cfg
2023-09-26 14:31:15 +08:00
liushz
c5224c2a91
[Feature] Add kaoshi dataset ( #392 )
...
* Add ToT method
* Update ToT
* Update ToT
* Update ToT
* Update ToT
* Update ToT
* Add Kaoshi
* Update Kaoshi
* Update Kaoshi
* Update kaoshi
* Update kaoshi
* Update Kaoshi
* Update Kaoshi
* Update Kaoshi
* Update Kaoshi
* update Kaoshi
* update
* update
* fix
---------
Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2023-09-22 18:46:33 +08:00
TTTTTiam
2a62bea1a4
add evaluation of scibench ( #393 )
...
* add evaluation of scibench
* add evaluation of scibench
* update scibench
* remove scibench evaluator
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-09-22 17:42:08 +08:00
Ma Zerun
0f2c388280
Support GSM8k evaluation with tools by Lagent and LangChain ( #277 )
...
* Support GSM8k evaluation with tools by Lagent and LangChain
* Avoid using new MMEngine features
* update document
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-09-22 15:28:22 +08:00
Yike Yuan
97fdc51102
[Fix] Fix performance issue of visualglm. ( #424 )
...
* [Fix] Visualglm performance fixed.
* [Fix] Hide ckpt path.
2023-09-21 19:54:23 +08:00
Hubert
8803f7f7a6
[Feat] support anthropics evals dataset ( #422 )
...
* [Feat] support anthropics ai risk dataset
* [Feat] support anthropics evals dataset
* [Feat] support anthropics evals dataset
2023-09-20 18:36:44 +08:00
Yike Yuan
bd50bad8b5
[Feat] Support mm models on public dataset and fix several issues. ( #412 )
...
* [Feat] Add public dataset support for visualglm, qwenvl, and flamingo
* [Fix] MMBench related changes.
* [Fix] Openflamingo inference.
* [Fix] Hide ckpt path.
* [Fix] Pre-commit.
---------
Co-authored-by: Haodong Duan <dhd.efz@gmail.com>
2023-09-19 19:08:44 +08:00
Yuanhan Zhang
7c2726c23b
[Model] Yhzhang/add mlugowl llamaadapter ( #405 )
...
* refine gitignore
* [Feature]: Add minigpt-4
* [Feature]: Add mm local runner
* [Feature]: Add instructblip
* add otter and llama-adapter
* add owl
* add llama2-adapter and owl
* lint
* [Feature]: Add minigpt-4
* [Feature]: Add instructblip
* add otter and llama-adapter
* add owl
* add llama2-adapter and owl
* lint
* lint
* update
* lint
* lint
* add __init__.py
* update
* update
* update
* update
* [Feature]: Add minigpt-4
* [Feature]: Add mm local runner
* [Feature]: Add instructblip
* add otter and llama-adapter
* add owl
* add llama2-adapter and owl
* lint
* [Feature]: Add minigpt-4
* [Feature]: Add instructblip
* add otter and llama-adapter
* add owl
* add llama2-adapter and owl
* lint
* lint
* update
* lint
* lint
* add __init__.py
* update
* update
* update
* update
* optimize mmbench dataset args
* update
* update
* run commit hook
---------
Co-authored-by: liuyuan <3463423099@qq.com>
Co-authored-by: kennymckormick <dhd@pku.edu.cn>
Co-authored-by: kennymckormick <dhd.efz@gmail.com>
2023-09-19 14:21:26 +08:00
Hubert
2c15a0c01d
[Feat] refine docs and codes for more user guides ( #409 )
2023-09-18 16:12:13 +08:00
Hubert
a11cb45c83
[Feat] implementation for support promptbench ( #239 )
...
* [Feat] support adv_glue dataset for adversarial robustness
* reorg files
* minor fix
* minor fix
* support prompt bench demo
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
2023-09-15 15:06:53 +08:00
Hubert
de8a154795
[Feat] support ds1000 dataset ( #395 )
...
* [Feat] support ds1000 dataset
2023-09-15 12:50:27 +08:00
Yuan Liu
545d50a4c0
[Fix]: Add has_image to scienceqa ( #391 )
...
Co-authored-by: bensenliu <bensenliu@tencent.com>
2023-09-13 13:07:14 +08:00
Xidong Wang
47a752cd56
[Dataset] Add CMB ( #376 )
...
* Add CMB
* modify CMB
---------
Co-authored-by: wangxidong <xidongw@163.com>
2023-09-12 19:16:41 +08:00
Tong Gao
b9b145c335
[Docs] Fix incorrect name in get_started ( #380 )
2023-09-11 16:10:09 +08:00
Leymore
2c915218e8
[Feature] Add new models: baichuan2, tigerbot, vicuna v1.5 ( #373 )
...
* add bag of new models: baichuan2, tigerbot, vicuna v1.5
* update
* re-organize models
* update readme
* update
2023-09-08 15:41:20 +08:00
Leymore
b48d084020
[Fix] update bbh implement & fix bbh suffix ( #371 )
2023-09-08 15:14:30 +08:00
Yixiao Fang
fada77a31c
[Feature] Add open source dataset eval config of instruct-blip ( #370 )
...
* add configs
* refactor model
* add post processor and prompt constructor
2023-09-08 15:07:09 +08:00
Tong Gao
b11838f80a
[Feature] Update claude2 postprocessor ( #365 )
...
* [Feature] Update claude2 config
* [Feature] Update claude2 postprocessor
2023-09-07 11:26:26 +08:00
Yike Yuan
b885ec84df
[Feat] Support Qwen-VL-Chat on MMBench. ( #312 )
...
* [Feat] Support Qwen-VL base.
* [Feat] Support Qwen-VL-Chat on MMBench.
* [Fix] Add postprocessor and fix format.
* [Fix] Add type hint and remove redundant codes.
* [Fix] fix bugs in postprocessor.
* [Fix] Use given commit id.
2023-09-06 18:42:19 +08:00
Hubert
ddb8197212
[Feat] support wizardcoder series ( #344 )
...
* [Feat] support wizardcoder series
* minor fix
2023-09-06 17:52:35 +08:00
Leymore
764c2f799a
[Fix] update qwen config ( #358 )
2023-09-05 10:15:19 +08:00
Yuanhan Zhang
f2dd98ca7a
[Feat] Support LLaVA and mPLUG-Owl ( #331 )
...
* refine gitignore
* [Feature]: Add minigpt-4
* [Feature]: Add mm local runner
* [Feature]: Add instructblip
* add otter and llama-adapter
* add owl
* add llama2-adapter and owl
* lint
* [Feature]: Add minigpt-4
* [Feature]: Add instructblip
* add otter and llama-adapter
* add owl
* add llama2-adapter and owl
* lint
* lint
* update
* lint
* lint
* add __init__.py
* update
* update
* update
---------
Co-authored-by: liuyuan <3463423099@qq.com>
2023-09-01 23:32:05 +08:00
Tong Gao
166022f568
[Docs] Update docs for new entry script ( #246 )
...
* update docs
* update docs
* update
* update en docs
* update
* update
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-08-31 16:43:55 +08:00
Li Bo
a4d6840739
[Feat] Add Otter to OpenCompass MMBench Evaluation ( #232 )
...
* add otter model for opencompass mmbench
* add docs
* add readme docs
* debug for otter opencompass eval
* delete unused folders
* change to default data path
* remove unused files
* remove unused files
* update
* update config file
* flake8 lint formatted and add prompt generator
* add prompt generator to config
* add a specific postprocess
* add post processor
* add post processor
* add post processor
* update according to suggestions
* remove unused redefinition
2023-08-31 12:55:53 +08:00
Leymore
7ca6ba625e
[Feature] Add qwen & qwen-chat support ( #286 )
...
* add and apply update suffix tool
* add tool doc
* add qwen configs
* add cmmlu
* rename bbh
* update datasets
* delete
* update hf_qwen_7b.py
2023-08-31 11:29:05 +08:00
Hubert
fd389e2d78
[Feat] support codellama and preds collection tools ( #335 )
2023-08-31 11:14:42 +08:00
Leymore
c26ecdb1b0
[Feature] Add and apply update suffix tool ( #280 )
...
* add and apply update suffix tool
* add dataset suffix updater as precommit hook
* update workflow
* update scripts
* update ci
* update
* ci with py3.8
* run in serial
* update bbh
* use py 3.10
* update pre commit zh cn
2023-08-28 17:35:04 +08:00
Tong Gao
9058be07b8
[Feature] Simplify entry script ( #204 )
...
* [Feature] Simplify entry script
* update
2023-08-25 17:36:30 +08:00
Tong Gao
f480b72703
[Feature] Support model-bound prediction postprocessor, use it in Claude ( #268 )
...
* [Feature] Support model-bound text postprocessor, add claude as an example
* update
* update
* minor fix
---------
Co-authored-by: zhoufengzhe <zhoufengzhe@pjlab.org.cn>
2023-08-25 16:12:21 +08:00
Tong Gao
fda42fd5fd
[Fix] wrong path in dataset collections ( #272 )
2023-08-25 15:50:30 +08:00
Yike Yuan
3f601f420b
[Feat] Support public dataset of visualglm and llava. ( #265 )
...
* [Feat] Add public dataset support of VisualGLM.
* [Feat] Refactor LLaVA.
* [Feat] Add public dataset support of LlaVA.
* [Fix] Add arg.
2023-08-25 15:44:32 +08:00
Yuan Liu
dc6e54f6f4
[Feature]: Verify the acc of these public datasets ( #269 )
...
* [Feature]: Refactor public dataset eval
* [Feature]: Verify public dataset acc
2023-08-25 15:01:58 +08:00
philipwangOvO
3f37c40aa3
[Dataset] Refactor LEval
2023-08-25 11:46:23 +08:00
Tong Gao
60c2d3d76b
[Feature] Add Claude support ( #253 )
...
* [Feature] Add Claude support
* [Feature] Add Claude support
* Update opencompass/models/claude_api.py
Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
* raise import error
---------
Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
2023-08-24 14:29:45 +08:00
Yuan Liu
343f785b07
[Feature]: Add Flamingo ( #258 )
...
* [Feature]: Add Openflamingo MMBench
* [Fix]: Fix import error
* [Fix]: Revert task config
* [Fix]: Fix path bug
2023-08-24 14:11:29 +08:00
Yixiao Fang
1034c487ef
[Refactor] Refactor instructblip ( #227 )
...
* refactor instructblip
* add post processor
* add forward
* fix lint
* update
* update
2023-08-23 15:33:59 +08:00
liushz
02ce139bc6
[Feature] Add Tree-of-Thought method ( #173 )
...
* Add ToT method
* Update ToT
* Update ToT
* Update ToT
* Update ToT
* Update ToT
* Update ToT
* Update ToT
* Update chain_of_thought.md
* Update icl_tot_inferencer.py
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2023-08-23 12:23:05 +08:00
Leymore
ff5ab92331
[Feature] Add llama2 native implements ( #235 )
...
* add llama2 native implements
* rename configs/eval_llama_7b.py
---------
Co-authored-by: zhoufengzhe <zhoufengzhe@pjlab.org.cn>
2023-08-23 11:33:25 +08:00
Yike Yuan
8d368d1cd6
[Feat] Support visualglm and llava for MMBench evaluation. ( #211 )
...
* [Feat] Support visualglm inference on MMBench.
* [Feat] Support llava inference on MMBench.
* [Fix] Fix pre-commit format.
* [Fix] Add docstring for llava
* [Fix] Fix multi-process inference error of LlaVA and add comments.
1. Set `low_cpu_mem_usage` to False to address device issue.
2. Add docstring and type hints.
3. Rename class and remove registry.
* [Fix] Pre-commit fix.
* [Fix] add forward entry, add dynamic import to seedbench
* [Fix] Fix pre-commit.
* [Fix] Fix missing context.
* [Fix] Fix docstring.
2023-08-21 15:57:30 +08:00
Yike Yuan
a6552224cb
[Feat] Support multi-modal evaluation on MME benchmark. ( #197 )
...
* [Feat] Support multi-modal evaluation on MME benchmark.
* [Fix] Remove debug code.
* [Fix] Remove redundant codes and add type hints.
* [Fix] Rename in config.
* [Fix] Rebase main.
* [Fix] Fix isort and yapf conflict.
2023-08-21 15:53:20 +08:00
philipwangOvO
655a807f4b
[Dataset] LongBench ( #236 )
...
Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>
2023-08-21 14:15:20 +08:00
Yuan Liu
90c07a3dfd
[Fix]: Fix name ( #223 )
2023-08-17 18:30:48 +08:00
Yuan Liu
3d49a20b95
[Feature]: Add launch script ( #222 )
2023-08-17 18:26:01 +08:00
Yixiao Fang
0fa2482661
[Feature] Support SEED-Bench ( #203 )
...
* support seedbench
* update docstrings
* update
* update
* update
* update according to review
* rebase
* fix lint
* update
2023-08-17 17:24:02 +08:00
Yuan Liu
ae3c1869da
[Feature]: Add other public datasets config ( #214 )
...
* [Feature]: Add flickr30k
* [Feature]: Add GQA
* [Feature]: Add OCR VQA
* [Feature]: Add OK VQA
* [Feature]: Add text vqa
* [Feature]: Add other vqa
2023-08-17 11:11:26 +08:00
Ezra-Yu
17ccaa5980
[Feat] Add codegeex2 and Humanevalx ( #210 )
...
* add codegeex2
* add humanevalx dataset
* add evaluator
* update evaluator
* update configs
* update clean code
* update configs
* fix lint
* remove sleep
* fix lint
* update docs
* fix lint
2023-08-17 11:03:16 +08:00
Hubert
0fe2366a72
[Feat] support adv_glue dataset for adversarial robustness ( #205 )
...
* [Feat] support adv_glue dataset for adversarial robustness
* reorg files
* minor fix
* minor fix
2023-08-16 18:42:06 +08:00
Yuan Liu
78df9bd0cb
[Feature]: Add other public datasets ( #206 )
...
* [Feature]: Refactor class name
* [Feature]: Add minigpt-4 coco caption
* [Feature]: Update minigpt-4 coco caption
* [Feature]: Add MiniGPT-4 ScienceQA
* [Feature]: Add minigpt-4 vqav2
* [Feature]: Add VSR
* [Feature]: Revert task to previous version
2023-08-16 11:37:26 +08:00
Hubert
7c393192af
[Fix] fix bug for postprocessor ( #195 )
...
* [Fix] fix bug for postprocessor
* minor fix
2023-08-11 18:41:12 +08:00
Tong Gao
bf79ff1c6d
[Feature] Add LEval datasets
...
Co-authored-by: kennymckormick <dhd@pku.edu.cn>
2023-08-11 17:38:31 +08:00
Hubert
8d9cee060f
[Feat] update postprocessor to get first option more accurately ( #193 )
...
* [Feat] update postprocessor to get first option
* minor fix
* minor fix
2023-08-11 17:33:00 +08:00
Leymore
14332e08fd
[Feature] add llama-oriented dataset configs ( #82 )
...
* add llama-oriented dataset configs
* update
* revert cvalues & update llama_example
2023-08-11 12:48:05 +08:00
Hubert
5a9539f375
[Feat] add safety to collections ( #185 )
...
* [Feat] add safety to collections
* minor fix
2023-08-11 11:19:26 +08:00
Tong Gao
2931f3dcb8
[Enhancement] Add humaneval postprocessor for GPT models & eval config for GPT4, enhance the original humaneval postprocessor ( #129 )
...
* [Enhancement] Enhance humaneval postprocessor
* add human-eval testcase
* update
* update
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-08-10 16:31:12 +08:00
Songyang Zhang
3f36db3b06
[Feature] Support turbomind ( #166 )
...
* support turbomind
* update doc
* Update docs/en/advanced_guides/evaluation_turbomind.md
Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
* Update docs/zh_cn/advanced_guides/evaluation_turbomind.md
Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
* Update docs/zh_cn/advanced_guides/evaluation_turbomind.md
Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
* Update docs/en/advanced_guides/evaluation_turbomind.md
Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
* update
---------
Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2023-08-10 16:25:11 +08:00
Leymore
e7fc54baf1
[Feature] Add Xiezhi SQuAD2.0 ANLI ( #101 )
...
* add Xiezhi SQuAD2.0 ANLI; update WSC
* update
* update
* update doc string
2023-08-10 14:04:18 +08:00
Yuan Liu
a205629ff3
[Feature]: Refactor input and output ( #176 )
...
* [Feature]: Refactor input and output
* [Feature]: Update tasks
2023-08-10 14:01:28 +08:00
Leymore
876ade71a5
[Fix] Fix AGIEval multiple choice ( #137 )
...
* update agieval data
* rename variables
2023-08-10 11:38:24 +08:00
Zaida Zhou
af436f5951
[Feature] Calculate max_out_len without hard code for OpenAI model ( #158 )
...
* calculate max_out_len without hard code
* set default value
* update configs
* Update configs/eval_gpt3.5.py
Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
---------
Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2023-08-08 15:16:56 +08:00
Yuan Liu
2f1949e7a1
[Feature]: Add mm support for local ( #169 )
2023-08-08 14:21:58 +08:00
Yuan Liu
191a3f6f9d
[Feature]: Use multimodal ( #73 )
...
* [Feature]: Add minigpt-4
* [Feature]: Add mm local runner
* [Feature]: Add instructblip
* [Feature]: Delete redundant file
* [Feature]: Delete redundant file
* [Feature]: Add README to InstructBLIP
* [Feature]: Update MiniGPT-4
* [Fix]: Fix lint
* [Feature] add omnibenchmark readme ( #49 )
* add omnibenchmark readme
* fix
* Update OmniMMBench.md
* Update OmniMMBench.md
* Update OmniMMBench.md
* [Fix]: Refine name (#54 )
* [Feature]: Unify out and err
* [Fix]: Fix lint
* [Feature]: Rename to mmbench and change weight path
* [Feature]: Delete Omni in instructblip
* [Feature]: Check the availability of lavis
* [Fix]: Fix lint
* [Feature]: Refactor MM
* [Refactor]: Refactor path
* [Feature]: Delete redundant files
* [Refactor]: Delete redundant files
---------
Co-authored-by: Wangbo Zhao(黑色枷锁) <56866854+wangbo-zhao@users.noreply.github.com>
2023-08-03 11:07:50 +08:00
Tong Gao
c00179d46b
[Feature] Evaluating acc based on minimum edit distance, update SIQA ( #130 )
...
* [Feature] Support evaluating acc based on minimum edit distance, update SIQA
* update
2023-08-01 14:24:27 +08:00
Leymore
d862f570aa
[Feature] Add SC ( #126 )
...
* add self-consistency
* add CoT method Self-Consistency
* fix typo error and update openicl_eval
* add tydiQA-GoldP task
* fix sc
* rename gsm8k_sc
* fix sc
* add self-consistency doc
* refine sc
---------
Authored-by: liushz <qq1791167085@163.com>
2023-07-28 17:29:37 +08:00
gowithme
57fcfc975a
[Feature] Support intern language model ( #51 )
...
* support internLM
* support internLM
* simplify intern model files
* update storage_manager
* support internLM
* Modify the file organization structure
* support internLM
* support internLM
* support internLM
* support internLM
* change some details
2023-07-27 18:49:36 +08:00
Hubert
b7184e9db5
[Refactor] Update crows-pairs evaluation ( #98 )
...
* [Refactor] Update crows-pairs evaluation
* [Refactor] Update crows-pairs evaluation
* minor
2023-07-26 11:21:32 +08:00
Tong Gao
3715be6595
[Fix] Fix llama configs ( #72 )
...
Co-authored-by: Leymore <zfz-960727@163.com>
2023-07-25 10:21:31 +08:00
Haonan Li
e9cdb24ddd
[Feature] Add CMMLU dataset ( #91 )
...
* add CMMLU
* debug cmmlu
* add slurm args `qos`
* fix format: space before comment
* remove unused variable
* change the location of `answer is`
---------
Co-authored-by: 李浩楠 <lihaonan@lihaonandeMacBook-Air.local>
Co-authored-by: 李浩楠 <haonan.li>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-07-25 10:14:27 +08:00
Leymore
eea8b04417
[Feature] Add llama-2 models ( #81 )
...
* add llama-2 models
* update docs
---------
Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2023-07-19 19:51:29 +08:00
Hubert
f83e125e5a
[Feat] Support CValues Responsibility dataset ( #78 )
...
* [Feat] support CValues
* minor fix
2023-07-18 18:45:15 +08:00
liushz
f36c0496f3
[Feature] Add tydiqa-goldp ( #75 )
...
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2023-07-18 14:54:35 +08:00
Hubert
29598e3619
[Feat] add falcon-40b ( #76 )
...
* [Feat] add falcon-40b
* minor fix
2023-07-18 14:40:16 +08:00
Leymore
9a16448905
[Fix] eval_llama_7b ( #68 )
2023-07-17 15:28:21 +08:00
Leymore
edb23d15d1
[Feature] Add baichuan13b model configs ( #60 )
...
* [Feature] Add baichuan13b
* update num_gpus
2023-07-17 14:38:12 +08:00
Leymore
1326aff77e
[Feature] Add logger info and remove dataset bugs ( #61 )
...
* Add logger info and remove dataset bugs
* fix typo
2023-07-17 14:26:30 +08:00
Tong Gao
7ee5a86fee
[Feature] Enhance OpenAI API, add example config for GPT evaluation ( #53 )
...
* [Feature] Enhance OpenAI API, add example config for GPT evaluation
* fix
2023-07-12 16:43:46 +08:00
Leymore
50b658d234
[Fix] Update HF configs ( #42 )
2023-07-11 10:51:49 +08:00
Ezra-Yu
0c6fb6cf67
[Doc] Update logo icon ( #32 )
...
* update logo_icon and fix typo in docs
* rebase:
* update get_started
* update .gitignore
* remove extra lines
* remove extra 'S'
* update
* update
* update docs
* update docs
* update docs
---------
Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2023-07-08 16:40:24 +08:00
Leymore
5f2e7c3469
Add models ( #18 )
...
* Add models
* Add comments
2023-07-06 16:02:39 +08:00
Tong Gao
18ace3d549
Add docs ( #8 )
...
* Add docs
* update
* update
2023-07-06 12:58:58 +08:00
Ezra-Yu
83dac269bd
update docs ( #14 )
...
* update docs
* update docs
* update docs
2023-07-06 12:41:17 +08:00
Leymore
86d5ec3d0f
Update configs ( #9 )
...
* Update implements
* Update
2023-07-06 12:27:41 +08:00
Tong Gao
16e759b996
Align prompt files with their hash ( #1 )
...
* fix bbh
* fix bbh
* rename
2023-07-05 18:28:58 +08:00
yuzhaohui
dcf11cf8fd
New logo and update setup.py
2023-07-05 06:54:06 +00:00
mzr1996
04dd01a235
Update configs and code
2023-07-05 11:45:08 +08:00
Leymore
c94cc94348
Add release contribution
2023-07-05 03:15:31 +00:00
tonysy
e6b5bdcb87
OpenCompass Public MR
2023-07-05 03:15:21 +00:00
Ezra-Yu
cbe9fe2cdb
Add Release Contribution
2023-07-05 02:22:40 +00:00
cky
36f111100f
update datasets
2023-07-05 01:45:26 +00:00
mzr1996
3cfe73de3f
Support a batch of datasets.
2023-07-05 01:30:27 +00:00
yingfhu
fb11108723
[Feat] support opencompass
2023-07-04 22:11:33 +08:00
gaotongxiao
7d346000bb
initial commit
2023-07-04 21:34:55 +08:00