138 Commits

Author SHA1 Message Date
Songyang Zhang
6997990c93
[Feature] Update Models (#1518)
* Update Models

* Update

* Update humanevalx

* Update

* Update
2024-09-12 23:35:30 +08:00
Linchen Xiao
317763381c
update (#1517) 2024-09-11 13:31:20 +08:00
Linchen Xiao
f04f3546bc
[Fix] Import fix (#1500) 2024-09-06 18:29:24 +08:00
Linchen Xiao
87ffa71d68
[Feature] Longbench dataset update 2024-09-06 15:50:12 +08:00
Hari Seldon
faf5260155
[Feature] Optimize Evaluation Speed of SciCode (#1489)
* update scicode

* update comments

* remove redundant variable

* Update

---------

Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-09-06 00:59:41 +08:00
liushz
00fc8da5be
[Feature] Add model postprocess function (#1484)
* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

* Add model postprocess function

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-05 21:10:29 +08:00
Linchen Xiao
6c9cd9a260
[Feature] Needlebench auto-download update (#1480)
* update

* update

* update
2024-09-05 17:22:42 +08:00
Linchen Xiao
9693be46b7
[Feature] Mmlu-pro auto-download (#1464)
* update

* update

* update

* update

* update
2024-08-30 10:03:40 +08:00
Songyang Zhang
e5a8eb2283
[Feature] Update Lint and Leaderboard (#1458)
* [Feature] Update Lint and Leaderboard

* Update

* Update
2024-08-28 22:36:42 +08:00
Linchen Xiao
245664f4c0
[Feature] Fullbench v0.1 language update (#1463)
* update

* update

* update

* update
2024-08-28 14:01:05 +08:00
Linchen Xiao
94b6bd65fc
[Fix] Fix cli evaluation for multiple models (#1454)
* update

* update
2024-08-23 17:15:36 +08:00
Songyang Zhang
5485207fbe
[Bump] Bump version to 0.3.1 (#1450)
* [Bump] Bump version 0.3.1

* Update
2024-08-23 10:47:57 +08:00
Songyang Zhang
7c2d25b557
[Fix] Update SciCode and Gemma model (#1449)
* [Fix] Update SciCode and Gemma model

* Update

* Update
2024-08-23 10:42:27 +08:00
liushz
9fdbc744dc
[Fix] Update option postprocess & mathbench language summarizer (#1413)
* Update option postprocess & mathbench language summarizer

* Update option postprocess & mathbench language summarizer

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-08-22 14:49:07 +08:00
Linchen Xiao
0fe9756c5d
[Doc] Update Readme (#1439)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
2024-08-22 14:48:45 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode (#1417)
* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode

* add SciCode w/ bg

* add scicode

* Update README.md

* Update README.md

* Delete configs/eval_SciCode.py

* rename

* 1

* rename

* Update README.md

* Update scicode.py

* Update scicode.py

* fix some bugs

* Update

* Update

---------

Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
liushz
d3963bceae
[Bug] Add model support for 'huggingface_above_v4_33' when using '-a' (#1430)
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-08-22 13:40:24 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets (#1310)
* [Feature] Add Ruler datasets

* pre-commit fixed

* Add model specific tokenizer to dataset

* pre-commit modified

* remove unused import

* fix linting

* add trust_remote to tokenizer load

* lint fix

* comments resolved

* fix lint

* Add readme

* Fix lint

* ruler refactorize

* fix lint

* lint fix

* updated

* lint fix

* fix wonderwords import issue

* prompt modified

* update

* readme updated

* update

* ruler dataset added

* Update

---------

Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Songyang Zhang
9b3613f10b
[Update] Support auto-download of FOFO/MT-Bench-101 (#1423)
* [Update] Support auto-download of FOFO/MT-Bench-101

* Update wildbench
2024-08-16 11:57:41 +08:00
Linchen Xiao
8e55c9c6ee
[Update] Compassbench v1.3 (#1396)
* stash files

* compassbench subjective evaluation added

* evaluation update

* fix lint

* update docs

* Update lint

* changes saved

* changes saved

* CompassBench subjective summarizer added (#1349)

* subjective summarizer added

* fix lint

[Fix] Fix MathBench (#1351)

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>

[Update] Update model support list (#1353)

* fix pip version

* fix pip version

* update model support

subjective summarizer updated

knowledge, math objective done (data need update)

remove secrets

objective changes saved

knowledge data added

* secrets removed

* changed added

* summarizer modified

* summarizer modified

* compassbench coding added

* fix lint

* objective summarizer updated

* compass_bench_v1.3 updated

* update files in config folder

* remove unused model

* lcbench modified

* removed model evaluation configs

* remove duplicated sdk implementation

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-08-12 19:09:19 +08:00
Songyang Zhang
c81329b548
[Fix] Fix Slurm ENV (#1392)
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
2024-08-06 01:35:20 +08:00
Songyang Zhang
704853e5e7
[Feature] Update pip install (#1324)
* [Feature] Update pip install

* Update Configuration

* Update

* Update

* Update

* Update Internal Config

* Update collect env
2024-07-29 18:32:50 +08:00
Xingjun.Wang
edab1c07ba
[Feature] Support ModelScope datasets (#1289)
* add ceval, gsm8k modelscope surpport

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* udpate dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* udpate dataset for modelscope support

* update readme

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* update readme

* remove tydiqa japanese subset

* add ceval, gsm8k modelscope surpport

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* udpate dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* update readme

* udpate dataset for modelscope support

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* remove tydiqa japanese subset

* update util

* remove .DS_Store

* fix md format

* move util into package

* update docs/get_started.md

* restore eval_api_zhipu_v2.py, add environment setting

* Update dataset

* Update

* Update

* Update

* Update

---------

Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-29 13:48:32 +08:00
Fengzhe Zhou
1d3a26c732
[Doc] quick start swap tabs (#1263)
* [doc] quick start swap tabs

* update docs

* update

* update

* update

* update

* update

* update

* update
2024-07-05 23:51:42 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 (#1279) 2024-06-28 14:16:34 +08:00
klein
1fa62c4a42
Support wildbench (#1266)
Co-authored-by: Leymore <zfz-960727@163.com>
2024-06-24 13:16:27 +08:00
bittersweet1999
7c381e5be8
[Fix] fix summarizer (#1217)
* fix summarizer

* fix summarizer
2024-05-31 11:40:47 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format (#1214) 2024-05-30 00:21:58 +08:00
Fengzhe Zhou
d656e818f8
[Docs] Remove --no-batch-padding and Use --hf-num-gpus (#1205)
* [Docs] Remove --no-batch-padding and Use -hf-num-gpus

* update
2024-05-29 16:30:10 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version (#1204) 2024-05-28 23:09:59 +08:00
liushz
ba620c4afe
Update accelerator (#1195)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update acclerator

* Update MathBench

* Update accelerator

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-28 17:17:54 +08:00
Fengzhe Zhou
5de85406ce
[Sync] add OC16 entry (#1171) 2024-05-17 16:50:58 +08:00
Fengzhe Zhou
8ea2c404d7
[Feat] enable HuggingFacewithChatTemplate with --accelerator via cli (#1163)
* enable HuggingFacewithChatTemplate with --accelerator via cli

* rm vllm_internlm2_chat_7b
2024-05-15 21:51:07 +08:00
liushz
e3c0448bbc
Update accelerator (#1152)
* Update acclerator

* update run

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2024-05-15 14:31:47 +08:00
Fengzhe Zhou
62dbf04708
[Sync] update github workflow (#1156) 2024-05-14 22:42:23 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template (#1098)
* add TheoremQA with 5-shot

* add huggingface_above_v4_33 classes

* use num_worker partitioner in cli

* update theoremqa

* update TheoremQA

* add TheoremQA

* rename theoremqa -> TheoremQA

* update TheoremQA output path

* rewrite many model configs

* update huggingface

* further update

* refine configs

* update configs

* update configs

* add configs/eval_llama3_instruct.py

* add summarizer multi faceted

* update bbh datasets

* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py

* rename class

* update readme

* update hf above v4.33
2024-05-14 14:50:16 +08:00
Fengzhe Zhou
19d7e630d6
[Sync] Update accelerator (#1122)
(cherry picked from commit 4beb6d9ab655d8a626971841b7acfd9fae9d438f)

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-09 14:32:31 +08:00
Fengzhe Zhou
d43392a3bb
[Feature] Add mmlu prompt from simple_evals, openai (#1074)
* add mmlu prompt from simple_evals, openai

* return empty str on failure
2024-05-06 13:26:26 +08:00
dmitrysarov
cce5b6fbb6
fix output typing, change mutable list to immutable tuple (#989)
* fix output typing, change mutable list to immutable tuple

* import missed type

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 23:07:34 +08:00
Haodong Duan
3a232db471
[Deperecate] Remove multi-modal related stuff (#1072)
* Remove MultiModal

* update index.rst

* update README

* remove mmbench codes

* update news

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:20:14 +08:00
bittersweet1999
6ba1c4937d
[Feature] Support Math evaluation via judgemodel (#1094)
* support openai math evaluation

* support openai math evaluation

* support openai math evaluation

* support math llm judge

* support math llm judge
2024-04-26 14:56:23 +08:00
Fengzhe Zhou
8c85edd1cd
[Sync] deprecate old mbpps (#1064) 2024-04-19 20:49:46 +08:00
Robin Chen
c172401323
[Fix] Fixed repeated loading of VLLM (#1051)
* [fix]Fixed the issue caused by the repeated loading of VLLM model during task segmentation.

* [fix] avoid TypeError: VLLM.__init__() got an unexpected keyword argument 'tokenizer_only'

* restore .pre-commit-config.yaml

* restore opencompass/tasks/openicl_infer.py

---------

Co-authored-by: IcyFeather <mengzhuo.happy@gmail.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-17 20:36:08 +08:00
Fengzhe Zhou
b39f501563
[Sync] update taco (#1030) 2024-04-09 17:50:23 +08:00
Mo Li
b50d163265
[Fix] Refactor Needlebench Configs for CLI Testing Support (#1020)
* add needlebench datasets suffix

* fix import

* update run.py args for summarizer key and dataset suffix

* update utils/run.py
2024-04-07 15:12:56 +08:00
bittersweet1999
2d4e559763
[Feature] Add multi-model judge and fix some problems (#1016)
* support multi-model judge and moe judge

* test_moe

* test_moe

* test

* add moe judge

* support multi-judge-model
2024-04-02 11:52:06 +08:00
Fengzhe Zhou
d34ba11106
[Sync] Merge branch 'dev' into zfz/update-keyset-demo (#876) 2024-02-05 23:29:10 +08:00
Fengzhe Zhou
0991dd33a0
[Sync] Updata dataset cfg for internMath (#837)
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-01-24 16:30:32 +08:00
Fengzhe Zhou
b4afe3e7c1
[Sync] Add InternLM2 Keyset Evaluation Demo (#807)
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
2024-01-17 13:48:12 +08:00
Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2023.01.08 (#777) 2024-01-08 14:07:24 +00:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs (#734) 2023-12-25 21:59:16 +08:00
Hubert
e78857ac36
[Sync] minor test (#683) 2023-12-11 17:42:53 +08:00
Hubert
d4af31bab4
[Feat] support zhipu post process (#642)
* [Feat] support zhipu post

* [Feat] support zhipu post

* [Feat] support zhipu post
2023-11-27 19:57:36 +08:00
Fengzhe Zhou
9083dea683
[Sync] some renaming (#641) 2023-11-27 16:06:49 +08:00
liushz
c9c5c5d92e
Mathbench update postprocess (#600)
* Update mathbench

* Update mathbench
2023-11-20 16:48:55 +08:00
Hubert
91fba2c2e9
[Feat] support humaneval and mbpp pass@k (#598)
* [Feat] support pass@ k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k docs

* update naming

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-16 21:22:06 +08:00
Fengzhe Zhou
d6aaac22e7
[Feature] Update cmb (#571) 2023-11-13 00:09:05 +08:00
Hubert
6f07af3039
[Feat] Support local runner for windows (#515) 2023-10-27 17:16:22 +08:00
Fengzhe Zhou
6405cd2db5
use example summarizer by default (#508) 2023-10-27 11:45:29 +08:00
Leymore
fccfcb6f5b
fix summary default (#483) 2023-10-17 11:32:38 +08:00
Leymore
fbf5089c40
[Sync] update github token (#475) 2023-10-13 06:50:54 -05:00
Tong Gao
119bfd1569
[Refactor] Move fix_id_list to Retriever (#442)
* [Refactor] Move fix_id_list to Retriever

* update

* move to base

* fix
2023-10-07 12:53:41 +08:00
chenbohua3
b2926eac8f
[Feature] support customize config path (#423)
* support customize config path

* support customize config path

* support customize config path
2023-09-22 19:12:02 +08:00
Tong Gao
a1ea3c094a
[Sync] Initial support of subjective evaluation (#421)
Co-authored-by: Leymore <zfz-960727@163.com>
2023-09-22 15:42:31 +08:00
Zequn Liu
ff2c15a09f
[fix] summarizer debug logger (#417) 2023-09-20 15:29:26 +08:00
Yuanhan Zhang
7c2726c23b
[Model] Yhzhang/add mlugowl llamaadapter (#405)
* refine gitignore

* [Feature]: Add minigpt-4

* [Feature]: Add mm local runner

* [Feature]: Add instructblip

* add otter and llama-adapter

* add owl

* add llama2-adapter and owl

* lint

* [Feature]: Add minigpt-4

* [Feature]: Add instructblip

* add otter and llama-adapter

* add owl

* add llama2-adapter and owl

* lint

* lint

* update

* lint

* lint

* add __init__.py

* update

* update

* update

* update

* [Feature]: Add minigpt-4

* [Feature]: Add mm local runner

* [Feature]: Add instructblip

* add otter and llama-adapter

* add owl

* add llama2-adapter and owl

* lint

* [Feature]: Add minigpt-4

* [Feature]: Add instructblip

* add otter and llama-adapter

* add owl

* add llama2-adapter and owl

* lint

* lint

* update

* lint

* lint

* add __init__.py

* update

* update

* update

* update

* optimize mmbench dataset args

* update

* update

* run commit hook

---------

Co-authored-by: liuyuan <3463423099@qq.com>
Co-authored-by: kennymckormick <dhd@pku.edu.cn>
Co-authored-by: kennymckormick <dhd.efz@gmail.com>
2023-09-19 14:21:26 +08:00
so2liu
267401bded
[Feat] add custom summarizer argument in CLI run mode (#411)
* feat: add custom summarizer in CLI run mode

* feat: search local config by match_cfg_file
2023-09-18 18:11:22 +08:00
Mashiro
ab21f3be66
[Enhance] Supress warning raised by get_logger (#353) 2023-09-04 15:27:08 +08:00
Tong Gao
ce65d3393b
[Sync] Use finally to clean up temp files (#337) 2023-09-04 15:20:16 +08:00
Leymore
e810974068
[Fix] Fix when missing both pad and eos token (#287)
* fix when missing both pad and eos token

* update pad_token_id impl
2023-08-31 16:53:39 +08:00
Tong Gao
9058be07b8
[Feature] Simplify entry script (#204)
* [Feature] Simply entry script

* update
2023-08-25 17:36:30 +08:00
Tong Gao
f480b72703
[Feature] Support model-bound prediction postprocessor, use it in Claude (#268)
* [Feature] Support model-bound text postprocessor, add claude as an example

* update

* update

* minor fix

---------

Co-authored-by: zhoufengzhe <zhoufengzhe@pjlab.org.cn>
2023-08-25 16:12:21 +08:00
Hubert
7c393192af
[Fix] fix bug for postprocessor (#195)
* [Fix] fix bug for postprocessor

* minor fix
2023-08-11 18:41:12 +08:00
Hubert
8d9cee060f
[Feat] update postprocessor to get first option more accurately (#193)
* [Feat] update postprocessor to get first option

* minor fix

* minor fix
2023-08-11 17:33:00 +08:00
Zaida Zhou
f4c70ba6c3
[Feature] Support filtering specified levels message (#187)
* Support filtering message

* minor fix
2023-08-11 10:46:46 +08:00
Yuan Liu
191a3f6f9d
[Feature]: Use multimodal (#73)
* [Feature]: Add minigpt-4

* [Feature]: Add mm local runner

* [Feature]: Add instructblip

* [Feature]: Delete redundant file

* [Feature]: Delete redundant file

* [Feature]: Add README to InstructBLIP

* [Feature]: Update MiniGPT-4

* [Fix]: Fix lint

* [Feature]add omnibenchmark readme (#49)

* add omnibenchmark readme

* fix

* Update OmniMMBench.md

* Update OmniMMBench.md

* Update OmniMMBench.md

* [Fix]: Refine name (#54)

* [Feature]: Unify out and err

* [Fix]: Fix lint

* [Feature]: Rename to mmbench and change weight path

* [Feature]: Delete Omni in instructblip

* [Feature]: Check the avaliablity of lavis

* [Fix]: Fix lint

* [Feature]: Refactor MM

* [Refactor]: Refactor path

* [Feature]: Delete redundant files

* [Refactor]: Delete redundant files

---------

Co-authored-by: Wangbo Zhao(黑色枷锁) <56866854+wangbo-zhao@users.noreply.github.com>
2023-08-03 11:07:50 +08:00
Tong Gao
8b163bd8e9
[Feature] Several enhancements (#142) 2023-08-01 18:19:49 +08:00
Haodong Duan
6e885d668b
force utf-8 encoding for all non-dataset fileios (#97) 2023-07-25 10:06:01 +08:00
Tong Gao
1e44541730
[Enhancement] Test linting in CI and fix existing linting errors (#69)
* [Enhancement] Test linting in CI

* fix linting
2023-07-17 15:59:10 +08:00
Hubert
7f8eee4725
[Docs] add en docs (#15)
* add en docs

* update

---------

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2023-07-06 12:58:44 +08:00
Leymore
86d5ec3d0f
Update configs (#9)
* Update implements

* Update
2023-07-06 12:27:41 +08:00
Hubert
5c19c8c5fc
[Docs] add issue and pr template (#12)
* [Feat] add issue and pr template

* minor add utils

* minor fix
2023-07-06 11:55:01 +08:00
Tong Gao
719ba34d1b
[Enhancement] Update prompt hash computation (#2) 2023-07-05 18:29:07 +08:00
Ma Zerun
5840c7655c
Update start guide (#4) 2023-07-05 18:26:26 +08:00
Leymore
c94cc94348
Add release contribution 2023-07-05 03:15:31 +00:00
Ezra-Yu
cbe9fe2cdb
Add Release Contraibution 2023-07-05 02:22:40 +00:00
yingfhu
fb11108723
[Feat] support opencompass 2023-07-04 22:11:33 +08:00
gaotongxiao
7d346000bb
initial commit 2023-07-04 21:34:55 +08:00