Commit Graph

182 Commits

Author SHA1 Message Date
Lyu Han
eb56fd6d16
Integrate turbomind python api (#484)
* integrate turbomind python api

* update

* update user guide

* update

* fix according to reviewer's comments

* fix error

* fix linting

* update user guide

* remove debug log

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-11-21 22:34:46 +08:00
Songyang Zhang
d925748266
[Feature] Support 360API and FixKRetriever for CSQA dataset (#601)
* [Feature] Support 360API and FixKRetriever for CSQA dataset

* Update API

* Update API

* [Feature] Support 360API and FixKRetriever for CSQA dataset

* Update API

* Update API

* rm mathbench

* fix_lint

* Update opencompass/models/bytedance_api.py

Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>

* update

* update

* update

---------

Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
2023-11-21 20:25:47 +08:00
Yang Yong
d3b0d5c4ce
[Feature] Support Lightllm API (#613)
* [Feature] Support Lightllm api

* formatting & renaming

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-21 19:18:40 +08:00
Yuan Feng
7199acc25d
Add support for DataCanvas Alaya LM (#612)
* Support for Alaya

* Remove useless requirements
2023-11-21 17:51:30 +08:00
liushz
dbacd36379
Add aritch to mathbench (#607) 2023-11-20 19:40:41 +08:00
liushz
c9c5c5d92e
Mathbench update postprocess (#600)
* Update mathbench

* Update mathbench
2023-11-20 16:48:55 +08:00
Jingming
5e75e29711
[Feature] Add multi-prompt generation demo (#568)
* [Feature] Add multi-prompt generation demo

* [Fix] change form in winogrande_gen_XXX.py

* [Fix] make multi prompt demo more direct

* [Fix] fix bug

* [Fix] minor fix

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-20 16:16:37 +08:00
Hubert
91fba2c2e9
[Feat] support humaneval and mbpp pass@k (#598)
* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k docs

* update naming

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-16 21:22:06 +08:00
Raymond Zhang
c0acd06b05
[Feature] Add FinanceIQ dataset (#596) 2023-11-16 17:47:57 +08:00
Yu
8160cb84e3
update word spell (#594) 2023-11-15 15:23:58 +08:00
Wei Jueqi
14e6fe6f13
Fix bugs in subjective evaluation (#589)
* rename

* fix sub bugs and update docs

* update

* update
2023-11-14 16:11:55 +08:00
Songyang Zhang
c8cb38e822
[Feature] Update mathbench (#580)
* update xunfei api

* fix lint

* update mathbench to avoid incomplete prediction
2023-11-14 16:04:02 +08:00
Fengzhe Zhou
1ea88d5822
[Sync] Bump version to 0.1.8 (#576) 2023-11-13 16:00:38 +08:00
Fengzhe Zhou
d3de5c41fb
[Sync] update model configs (#574) 2023-11-13 15:15:34 +08:00
Fengzhe Zhou
689ffe5b63
[Feature] Use dataset in local path (#570)
* update commonsenseqa

* update drop

* update flores_first100

* update gsm8k

* update humaneval

* update lambda

* update obqa

* update piqa

* update race

* update siqa

* update story_cloze

* update strategyqa

* update tydiqa

* update winogrande

* update doc

* update hellaswag

* fix obqa

* update collections

* update .zip name
2023-11-13 13:00:37 +08:00
Fengzhe Zhou
d6aaac22e7
[Feature] Update cmb (#571) 2023-11-13 00:09:05 +08:00
Kevin Wang
7f77e8dae5
[Docs] fix dataset name error (#533) 2023-11-10 18:54:20 +08:00
Hubert
95e0da0173
[Docs] add humanevalx dataset link in config (#559)
* [Docs] add humanevalx dataset link in config

* [Docs] add humanevalx dataset link in config

* minor fix
2023-11-10 18:18:58 +08:00
jingmingzhuo
b3cbef3226
[Feature] Add py150 and maxmin (#562)
* [feat] add clozeTest_maxmin dataset

* [feat] add py150 datasets

* [feat] change __init__.py in opencompass/datasets

* [fix] pre-commit check

* [fix] rename py150 and maxmin datasets in configs

* [feat] add gen.py of py150 and maxmin in configs/datasets
2023-11-09 22:05:25 +08:00
Hubert
889a6b26ae
[Fix] fix log re-direct (#564) 2023-11-09 19:34:19 +08:00
Hubert
bb2ecf416e
[Feat] Support cibench (#538)
* [Feat] support cidataset

* [Feat] support cidataset

* [Feat] support cidataset

* [Feat] support cidataset

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* rename cibench

* rename cibench

* rename cibench

* rename cibench

* minor fix

* minor fix

* minor fix
2023-11-07 19:11:44 +08:00
Hubert
36360bdfc3
[Fix] fix filename typo (#549) 2023-11-07 14:00:26 +08:00
liushz
214a34f0b8
[Feature] Update Mathbench dataset prompt and fix small errors (#546)
* Update mathbench

* Update mathbench

* Update mathbench
2023-11-06 21:58:31 +08:00
Songyang Zhang
239c2a346e
[Feature] Add support for MiniMax API (#548)
* update requirement

* update requirement

* update with minimax

* update api model

* Update readme

* fix error

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2023-11-06 21:57:32 +08:00
bittersweet1999
f25a980043
[Feat] Add an open-source dataset TabMWP (#505)
* TabMWP

* TabMWP

* fixed

* fixed

* fixed

* done

* done

* done

---------

Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>
2023-11-03 11:15:46 +08:00
Surav Shrestha
e5ae86221c
docs: fix typos in markdown files (#530)
* fix typos in configs/multimodal/llava/README.md

* fix typos in configs/multimodal/minigpt_4/README.md
2023-11-01 16:16:16 +08:00
Qing
229a65f305
[Fix] Fix typo in WSC prompt (#520)
Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
2023-10-30 12:16:26 +08:00
Fengzhe Zhou
dbb20b8270
[Sync] update (#517) 2023-10-27 20:31:22 +08:00
Wei Jueqi
b62842335d
[Doc] Update Subjective docs (#510)
* rename

* add en subdoc

* fix name

* fix writing

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-10-27 16:27:24 +08:00
Hubert
b3f5d9e421
[Feat] support math/gsm8k agent config (#494)
* support math agent

* support gsm8k agent

* support gsm8k agent

* minor fix

* minor fix

* minor fix

* Update configs/eval_codeagent.py
2023-10-25 23:05:15 +08:00
liushz
2737249f31
[Feature] Add mathbench dataset and circular evaluator (#408)
* add_mathbench

* update mathbench

* support non circular eval dataset

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-10-18 04:08:31 -05:00
Leymore
861942ab1b
[Feature] Add lawbench (#460)
* add lawbench

* update requirements

* update
2023-10-13 06:51:36 -05:00
Leymore
fbf5089c40
[Sync] update github token (#475) 2023-10-13 06:50:54 -05:00
Leymore
d7ff933a73
[Fix] Use jieba rouge in lcsts (#459)
* use jieba rouge in lcsts

* use rouge_chinese
2023-10-09 10:10:33 +08:00
Tong Gao
119bfd1569
[Refactor] Move fix_id_list to Retriever (#442)
* [Refactor] Move fix_id_list to Retriever

* update

* move to base

* fix
2023-10-07 12:53:41 +08:00
Lyu Han
6738247142
Integrate turbomind inference via its RPC API instead of its python API (#414)
* support tis

* integrate turbomind inference via its RPC API instead of its python API

* update guide

* update ip address spec

* update according to reviewer's comments
2023-10-07 10:27:48 +08:00
Leymore
9db5652638
[Feature] re-implement ceval load dataset (#446) 2023-09-27 21:18:48 +08:00
philipwangOvO
3bb3d330eb
[Sync] Update LongEval (#443) 2023-09-27 16:32:40 +08:00
Kevin Wang
dc1b82c346
[SIG] add GLUE_MRPC dataset (#440) 2023-09-27 11:44:54 +08:00
Kevin Wang
14fdecfecc
[Dataset] add GLUE QQP dataset (#438) 2023-09-27 11:36:43 +08:00
Kevin Wang
d8354fe5d8
[SIG] add GLUE_CoLA dataset (#406)
* [Dataset] add GLUE_CoLA dataset

* [update] use HFDataset to load glue/cola dataset

* update

---------

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2023-09-27 11:30:44 +08:00
Kevin Wang
012546666b
[SIG] add WikiText-2&103 (#397)
* fix conflict

* add eval_cfg
2023-09-26 14:31:15 +08:00
liushz
c5224c2a91
[Feature] Add kaoshi dataset (#392)
* Add ToT method

* Update ToT

* Update ToT

* Update ToT

* Update ToT

* Update ToT

* Add Kaoshi

* Update Kaoshi

* Update Kaoshi

* Update kaoshi

* Update kaoshi

* Update Kaoshi

* Update Kaoshi

* Update Kaoshi

* Update Kaoshi

* update Kaoshi

* update

* update

* fix

---------

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2023-09-22 18:46:33 +08:00
TTTTTiam
2a62bea1a4
add evaluation of scibench (#393)
* add evaluation of scibench

* add evaluation of scibench

* update scibench

* remove scibench evaluator

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-09-22 17:42:08 +08:00
Ma Zerun
0f2c388280
Support GSM8k evaluation with tools by Lagent and LangChain (#277)
* Support GSM8k evaluation with tools by Lagent and LangChain

* Avoid using MMEngine's new feature

* update document

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-09-22 15:28:22 +08:00
Yike Yuan
97fdc51102
[Fix] Fix performance issue of visualglm. (#424)
* [Fix] Visualglm performance fixed.

* [Fix] Hide ckpt path.
2023-09-21 19:54:23 +08:00
Hubert
8803f7f7a6
[Feat] support anthropics evals dataset (#422)
* [Feat] support anthropics ai risk dataset

* [Feat] support anthropics evals dataset

* [Feat] support anthropics evals dataset
2023-09-20 18:36:44 +08:00
Yike Yuan
bd50bad8b5
[Feat] Support mm models on public dataset and fix several issues. (#412)
* [Feat] Add public dataset support for visualglm, qwenvl, and flamingo

* [Fix] MMBench related changes.

* [Fix] Openflamingo inference.

* [Fix] Hide ckpt path.

* [Fix] Pre-commit.

---------

Co-authored-by: Haodong Duan <dhd.efz@gmail.com>
2023-09-19 19:08:44 +08:00
Yuanhan Zhang
7c2726c23b
[Model] Yhzhang/add mlugowl llamaadapter (#405)
* refine gitignore

* [Feature]: Add minigpt-4

* [Feature]: Add mm local runner

* [Feature]: Add instructblip

* add otter and llama-adapter

* add owl

* add llama2-adapter and owl

* lint

* [Feature]: Add minigpt-4

* [Feature]: Add instructblip

* add otter and llama-adapter

* add owl

* add llama2-adapter and owl

* lint

* lint

* update

* lint

* lint

* add __init__.py

* update

* update

* update

* update

* [Feature]: Add minigpt-4

* [Feature]: Add mm local runner

* [Feature]: Add instructblip

* add otter and llama-adapter

* add owl

* add llama2-adapter and owl

* lint

* [Feature]: Add minigpt-4

* [Feature]: Add instructblip

* add otter and llama-adapter

* add owl

* add llama2-adapter and owl

* lint

* lint

* update

* lint

* lint

* add __init__.py

* update

* update

* update

* update

* optimize mmbench dataset args

* update

* update

* run commit hook

---------

Co-authored-by: liuyuan <3463423099@qq.com>
Co-authored-by: kennymckormick <dhd@pku.edu.cn>
Co-authored-by: kennymckormick <dhd.efz@gmail.com>
2023-09-19 14:21:26 +08:00
Hubert
2c15a0c01d
[Feat] refine docs and codes for more user guides (#409) 2023-09-18 16:12:13 +08:00
Hubert
a11cb45c83
[Feat] implementation for support promptbench (#239)
* [Feat] support adv_glue dataset for adversarial robustness

* reorg files

* minor fix

* minor fix

* support prompt bench demo

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix

* minor fix
2023-09-15 15:06:53 +08:00
Hubert
de8a154795
[Feat] support ds1000 dataset (#395)
* [Feat] support ds1000 dataset
2023-09-15 12:50:27 +08:00
Yuan Liu
545d50a4c0
[Fix]: Add has_image to scienceqa (#391)
Co-authored-by: bensenliu <bensenliu@tencent.com>
2023-09-13 13:07:14 +08:00
Xidong Wang
47a752cd56
[Dataset] Add CMB (#376)
* Add CMB

* modify CMB

---------

Co-authored-by: wangxidong <xidongw@163.com>
2023-09-12 19:16:41 +08:00
Tong Gao
b9b145c335
[Docs] Fix incorrect name in get_started (#380) 2023-09-11 16:10:09 +08:00
Leymore
2c915218e8
[Feature] Add new models: baichuan2, tigerbot, vicuna v1.5 (#373)
* add bag of new models: baichuan2, tigerbot, vicuna v1.5

* update

* re-organize models

* update readme

* update
2023-09-08 15:41:20 +08:00
Leymore
b48d084020
[Fix] update bbh implement & fix bbh suffix (#371) 2023-09-08 15:14:30 +08:00
Yixiao Fang
fada77a31c
[Feature] Add open source dataset eval config of instruct-blip (#370)
* add configs

* refactor model

* add post processor and prompt constructor
2023-09-08 15:07:09 +08:00
Tong Gao
b11838f80a
[Feature] Update claude2 postprocessor (#365)
* [Feature] Update claude2 config

* [Feature] Update claude2 postprocessor
2023-09-07 11:26:26 +08:00
Yike Yuan
b885ec84df
[Feat] Support Qwen-VL-Chat on MMBench. (#312)
* [Feat] Support Qwen-VL base.

* [Feat] Support Qwen-VL-Chat on MMBench.

* [Fix] Add postprocessor and fix format.

* [Fix] Add type hint and remove redundant codes.

* [Fix] fix bugs in postprocessor.

* [Fix] Use given commit id.
2023-09-06 18:42:19 +08:00
Hubert
ddb8197212
[Feat] support wizardcoder series (#344)
* [Feat] support wizardcoder series

* minor fix
2023-09-06 17:52:35 +08:00
Leymore
764c2f799a
[Fix] update qwen config (#358) 2023-09-05 10:15:19 +08:00
Yuanhan Zhang
f2dd98ca7a
[Feat] Support LLaVA and mPLUG-Owl (#331)
* refine gitignore

* [Feature]: Add minigpt-4

* [Feature]: Add mm local runner

* [Feature]: Add instructblip

* add otter and llama-adapter

* add owl

* add llama2-adapter and owl

* lint

* [Feature]: Add minigpt-4

* [Feature]: Add instructblip

* add otter and llama-adapter

* add owl

* add llama2-adapter and owl

* lint

* lint

* update

* lint

* lint

* add __init__.py

* update

* update

* update

---------

Co-authored-by: liuyuan <3463423099@qq.com>
2023-09-01 23:32:05 +08:00
Tong Gao
166022f568
[Docs] Update docs for new entry script (#246)
* update docs

* update docs

* update

* update en docs

* update

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-08-31 16:43:55 +08:00
Li Bo
a4d6840739
[Feat] Add Otter to OpenCompass MMBench Evaluation (#232)
* add otter model for opencompass mmbench

* add docs

* add readme docs

* debug for otter opencompass eval

* delete unused folders

* change to default data path

* remove unused files

* remove unused files

* update

* update config file

* flake8 lint formatted and add prompt generator

* add prompt generator to config

* add a specific postprocess

* add post processor

* add post processor

* add post processor

* update according to suggestions

* remove unused redefinition
2023-08-31 12:55:53 +08:00
Leymore
7ca6ba625e
[Feature] Add qwen & qwen-chat support (#286)
* add and apply update suffix tool

* add tool doc

* add qwen configs

* add cmmlu

* rename bbh

* update datasets

* delete

* update hf_qwen_7b.py
2023-08-31 11:29:05 +08:00
Hubert
fd389e2d78
[Feat] support codellama and preds collection tools (#335) 2023-08-31 11:14:42 +08:00
Leymore
c26ecdb1b0
[Feature] Add and apply update suffix tool (#280)
* add and apply update suffix tool

* add dataset suffix updater as precommit hook

* update workflow

* update scripts

* update ci

* update

* ci with py3.8

* run in serial

* update bbh

* use py 3.10

* update pre commit zh cn
2023-08-28 17:35:04 +08:00
Tong Gao
9058be07b8
[Feature] Simplify entry script (#204)
* [Feature] Simplify entry script

* update
2023-08-25 17:36:30 +08:00
Tong Gao
f480b72703
[Feature] Support model-bound prediction postprocessor, use it in Claude (#268)
* [Feature] Support model-bound text postprocessor, add claude as an example

* update

* update

* minor fix

---------

Co-authored-by: zhoufengzhe <zhoufengzhe@pjlab.org.cn>
2023-08-25 16:12:21 +08:00
Tong Gao
fda42fd5fd
[Fix] wrong path in dataset collections (#272) 2023-08-25 15:50:30 +08:00
Yike Yuan
3f601f420b
[Feat] Support public dataset of visualglm and llava. (#265)
* [Feat] Add public dataset support of VisualGLM.

* [Feat] Refactor LLaVA.

* [Feat] Add public dataset support of LlaVA.

* [Fix] Add  arg.
2023-08-25 15:44:32 +08:00
Yuan Liu
dc6e54f6f4
[Feature]: Verify the acc of these public datasets (#269)
* [Feature]: Refactor public dataset eval

* [Feature]: Verify public dataset acc
2023-08-25 15:01:58 +08:00
philipwangOvO
3f37c40aa3
[Dataset] Refactor LEval 2023-08-25 11:46:23 +08:00
Tong Gao
60c2d3d76b
[Feature] Add Claude support (#253)
* [Feature] Add Claude support

* [Feature] Add Claude support

* Update opencompass/models/claude_api.py

Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>

* raise import error

---------

Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
2023-08-24 14:29:45 +08:00
Yuan Liu
343f785b07
[Feature]: Add Flamingo (#258)
* [Feature]: Add Openflamingo MMBench

* [Fix]: Fix import error

* [Fix]: Revert task config

* [Fix]: Fix path bug
2023-08-24 14:11:29 +08:00
Yixiao Fang
1034c487ef
[Refactor] Refactor instructblip (#227)
* refactor instructblip

* add post processor

* add forward

* fix lint

* update

* update
2023-08-23 15:33:59 +08:00
liushz
02ce139bc6
[Feature] Add Tree-of-Thought method (#173)
* Add ToT method

* Update ToT

* Update ToT

* Update ToT

* Update ToT

* Update ToT

* Update ToT

* Update ToT

* Update chain_of_thought.md

* Update icl_tot_inferencer.py

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2023-08-23 12:23:05 +08:00
Leymore
ff5ab92331
[Feature] Add llama2 native implements (#235)
* add llama2 native implements

* rename configs/eval_llama_7b.py

---------

Co-authored-by: zhoufengzhe <zhoufengzhe@pjlab.org.cn>
2023-08-23 11:33:25 +08:00
Yike Yuan
8d368d1cd6
[Feat] Support visualglm and llava for MMBench evaluation. (#211)
* [Feat] Support visualglm inference on MMBench.

* [Feat] Support llava inference on MMBench.

* [Fix] Fix pre-commit format.

* [Fix] Add docstring for llava

* [Fix] Fix multi-process inference error of LlaVA and add comments.
1. Set `low_cpu_mem_usage` to False to address device issue.
2. Add docstring and type hints.
3. Rename class and remove registry.

* [Fix] Pre-commit fix.

* [Fix] add forward entry, add dynamic import to seedbench

* [Fix] Fix pre-commit.

* [Fix] Fix missing context.

* [Fix] Fix docstring.
2023-08-21 15:57:30 +08:00
Yike Yuan
a6552224cb
[Feat] Support multi-modal evaluation on MME benchmark. (#197)
* [Feat] Support multi-modal evaluation on MME benchmark.

* [Fix] Remove debug code.

* [Fix] Remove redundant codes and add type hints.

* [Fix] Rename in config.

* [Fix] Rebase main.

* [Fix] Fix isort and yapf conflict.
2023-08-21 15:53:20 +08:00
philipwangOvO
655a807f4b
[Dataset] LongBench (#236)
Co-authored-by: wangchonghua <wangchonghua@pjlab.org.cn>
2023-08-21 14:15:20 +08:00
Yuan Liu
90c07a3dfd
[Fix]: Fix name (#223) 2023-08-17 18:30:48 +08:00
Yuan Liu
3d49a20b95
[Feature]: Add launch script (#222) 2023-08-17 18:26:01 +08:00
Yixiao Fang
0fa2482661
[Feature] Support SEED-Bench (#203)
* support seedbench

* update docstrings

* update

* update

* update

* update according to review

* rebase

* fix lint

* update
2023-08-17 17:24:02 +08:00
Yuan Liu
ae3c1869da
[Feature]: Add other public datasets config (#214)
* [Feature]: Add flickr30k

* [Feature]: Add GQA

* [Feature]: Add OCR VQA

* [Feature]: Add OK VQA

* [Feature]: Add text vqa

* [Feature]: Add other vqa
2023-08-17 11:11:26 +08:00
Ezra-Yu
17ccaa5980
[Feat] Add codegeex2 and Humanevalx (#210)
* add codegeex2

* add humanevalx dataset

* add evaluator

* update evaluator

* update configs

* update clean code

* update configs

* fix lint

* remove sleep

* fix lint

* update docs

* fix lint
2023-08-17 11:03:16 +08:00
Hubert
0fe2366a72
[Feat] support adv_glue dataset for adversarial robustness (#205)
* [Feat] support adv_glue dataset for adversarial robustness

* reorg files

* minor fix

* minor fix
2023-08-16 18:42:06 +08:00
Yuan Liu
78df9bd0cb
[Feature]: Add other public datasets (#206)
* [Feature]: Refactor class name

* [Feature]: Add minigpt-4 coco caption

* [Feature]: Update minigpt-4 coco caption

* [Feature]: Add MiniGPT-4 ScienceQA

* [Feature]: Add minigpt-4 vqav2

* [Feature]: Add VSR

* [Feature]: Revert task to previous version
2023-08-16 11:37:26 +08:00
Hubert
7c393192af
[Fix] fix bug for postprocessor (#195)
* [Fix] fix bug for postprocessor

* minor fix
2023-08-11 18:41:12 +08:00
Tong Gao
bf79ff1c6d
[Feature] Add LEval datasets
Co-authored-by: kennymckormick <dhd@pku.edu.cn>
2023-08-11 17:38:31 +08:00
Hubert
8d9cee060f
[Feat] update postprocessor to get first option more accurately (#193)
* [Feat] update postprocessor to get first option

* minor fix

* minor fix
2023-08-11 17:33:00 +08:00
Leymore
14332e08fd
[Feature] add llama-oriented dataset configs (#82)
* add llama-oriented dataset configs

* update

* revert cvalues & update llama_example
2023-08-11 12:48:05 +08:00
Hubert
5a9539f375
[Feat] add safety to collections (#185)
* [Feat] add safety to collections

* minor fix
2023-08-11 11:19:26 +08:00
Tong Gao
2931f3dcb8
[Enhancement] Add humaneval postprocessor for GPT models & eval config for GPT4, enhance the original humaneval postprocessor (#129)
* [Enhancement] Enhance humaneval postprocessor

* add human-eval testcase

* update

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-08-10 16:31:12 +08:00
Songyang Zhang
3f36db3b06
[Feature] Support turbomind (#166)
* support turbomind

* update doc

* Update docs/en/advanced_guides/evaluation_turbomind.md

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* Update docs/zh_cn/advanced_guides/evaluation_turbomind.md

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* Update docs/zh_cn/advanced_guides/evaluation_turbomind.md

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* Update docs/en/advanced_guides/evaluation_turbomind.md

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* update

---------

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2023-08-10 16:25:11 +08:00
Leymore
e7fc54baf1
[Feature] Add Xiezhi SQuAD2.0 ANLI (#101)
* add Xiezhi SQuAD2.0 ANLI; update WSC

* update

* update

* update doc string
2023-08-10 14:04:18 +08:00
Yuan Liu
a205629ff3
[Feature]: Refactor input and output (#176)
* [Feature]: Refactor input and output

* [Feature]: Update tasks
2023-08-10 14:01:28 +08:00
Leymore
876ade71a5
[Fix] Fix AGIEval multiple choice (#137)
* update agieval data

* rename variables
2023-08-10 11:38:24 +08:00
Zaida Zhou
af436f5951
[Feature] Calculate max_out_len without hard code for OpenAI model (#158)
* calculate max_out_len without hard code

* set default value

* update configs

* Update configs/eval_gpt3.5.py

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

---------

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2023-08-08 15:16:56 +08:00