Commit Graph

175 Commits

Author SHA1 Message Date
Songyang Zhang
46cc7894e1
[Feature] Support import configs/models/summarizers from whl (#1376)
* [Feature] Support import configs/models/summarizers from whl

* Update LCBench configs

* Update

* Update

* Update

* Update

* update

* Update

* Update

* Update

* Update

* Update
2024-08-01 00:42:48 +08:00
Songyang Zhang
eee5a5be23
[Fix] Update get_data_path for LCBench and HumanEval (#1375) 2024-07-29 19:28:09 +08:00
Songyang Zhang
704853e5e7
[Feature] Update pip install (#1324)
* [Feature] Update pip install

* Update Configuration

* Update

* Update

* Update

* Update Internal Config

* Update collect env
2024-07-29 18:32:50 +08:00
Xingjun.Wang
edab1c07ba
[Feature] Support ModelScope datasets (#1289)
* add ceval, gsm8k modelscope support

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* update dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* update dataset for modelscope support

* update readme

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* update readme

* remove tydiqa japanese subset

* add ceval, gsm8k modelscope support

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* update dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* update readme

* update dataset for modelscope support

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* remove tydiqa japanese subset

* update util

* remove .DS_Store

* fix md format

* move util into package

* update docs/get_started.md

* restore eval_api_zhipu_v2.py, add environment setting

* Update dataset

* Update

* Update

* Update

* Update

---------

Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-29 13:48:32 +08:00
jxd
12b84aeb3b
[Feature] Update CHARM Memorization (#1230)
* update gemini api and add gemini models

* add openai models

* update CHARM evaluation

* add CHARM memorization tasks

* add CharmMemSummarizer (output eval details for memorization-independent reasoning analysis)

* update CHARM readme

---------

Co-authored-by: wujiang <wujiang@pjlab.org.cn>
2024-07-26 18:42:30 +08:00
bittersweet1999
d3782c1d47
Revert "Calm dataset (#1287)" (#1366)
This reverts commit edd0ffdf70.
2024-07-26 18:27:29 +08:00
Peng Bo
edd0ffdf70
Calm dataset (#1287)
* add calm dataset

* modify config max_out_len

* update README

* Modify README

* update README

* update README

* update README

* update README

* update README

* add summarizer and modify readme

* delete summarizer config comment

* update summarizer

* modify same response to all questions

* update README
2024-07-26 11:48:16 +08:00
klein
65fad8e2ac
[Fix] minor update wildbench (#1335)
* update crb

* update crbbench

* update crbbench

* update crbbench

* minor update wildbench

* [Fix] Update doc of wildbench, and merge wildbench into subjective

* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench

* Update crb.md

* Update crb_pair_judge.py

* Update crb_single_judge.py

* Update subjective_evaluation.md

* Update openai_api.py

* [Update] update wildbench readme

* [Update] update wildbench readme

* [Update] update wildbench readme, remove crb

* Delete configs/eval_subjective_wildbench_pair.py

* Delete configs/eval_subjective_wildbench_single.py

* Update __init__.py

---------

Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:19:04 +08:00
Mo Li
69aa2f2d57
[Feature] Make NeedleBench available on HF (#1364)
* update_lint

* update_huggingface format

* fix bug

* update docs
2024-07-25 19:01:56 +08:00
Que Haoran
a244453d9e
[Feature] Support inference ppl datasets (#1315)
* commit inference ppl datasets

* revised format

* revise

* revise

* revise

* revise

* revise

* revise
2024-07-22 17:59:30 +08:00
liushz
98c58f8a6c
[Feature] Add compassbench knowledge&math part (#1342)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

* Update accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Update compassbench august wiki&math

* Update compassbench august wiki&math

* Update compassbench august wiki&math

* Update compassbench_aug_gen_068af0.py

* Update compassbench_aug_gen_068af0.py

* Update

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 22:54:46 +08:00
bittersweet1999
1f9f728f22
[Feature] support compassbench Checklist evaluation (#1339)
* fix pip version

* fix pip version

* support checklist eval

* init

* add lan

* fix typo
2024-07-19 16:40:44 +08:00
Fengzhe Zhou
a62c613d3e
[Sync] bump version 0.2.6+local (#1294) 2024-07-06 00:44:06 +08:00
bittersweet1999
68ca48496b
[Refactor] Reorganize subjective eval (#1284)
* fix pip version

* fix pip version

* reorganize subjective eval

* reorg sub

* reorg subeval

* reorg subeval

* update subjective doc

* reorg subeval

* reorg subeval
2024-07-05 22:11:37 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 (#1279) 2024-06-28 14:16:34 +08:00
Xingyuan Bu
842fb1cd70
Update mtbench101.py (#1276)
fix incorrectly used import
from torch.utils.data import DataLoader, Dataset
2024-06-26 00:40:22 +08:00
klein
1fa62c4a42
Support wildbench (#1266)
Co-authored-by: Leymore <zfz-960727@163.com>
2024-06-24 13:16:27 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo (#1224)
* add fofo dataset

* add dataset fofo
2024-06-06 11:40:48 +08:00
Xingyuan Bu
02a0a4e857
MT-Bench-101 (#1215)
* add mt-bench-101

* add readme and requirements

* add mt-bench-101 data

* Update readme_mtbench101.md

* update readme

* update leaderboard

* fix typo

* Update readme_mtbench101.md

* fit newest opencompass

* update readme.md

* mtbench101 to opencompass

* mtbench101 to opencompass

* for code review

* for code review

* for code review

* hook

* hook

---------

Co-authored-by: liujie <ljie@buaa.edu.cn>
2024-06-03 14:52:12 +08:00
mqy004
b272803d8a
Fix opencompass.cli.main not being importable after installing the release version (#1221)
* Create __init__.py

* Create __init__.py

* Create __init__.py

* Create __init__.py

* Create __init__.py

* Create __init__.py

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-31 13:23:33 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format (#1214) 2024-05-30 00:21:58 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version (#1204) 2024-05-28 23:09:59 +08:00
jxd
608ff5810d
support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks (#1190)
* support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks

* fix lint error

* add dataset card for CHARM

* minor refactor

* add txt

---------

Co-authored-by: wujiang <wujiang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-27 13:48:22 +08:00
yaoyingyy
749e4cea71
[Fix] temporary files using tempfile (#1186)
Co-authored-by: yaoying <yaoying@kingsoft.com>
2024-05-24 23:27:37 +08:00
Fengzhe Zhou
2b3d4150f3
[Sync] update evaluator (#1175) 2024-05-21 14:22:46 +08:00
Fengzhe Zhou
5de85406ce
[Sync] add OC16 entry (#1171) 2024-05-17 16:50:58 +08:00
Fengzhe Zhou
80f831b425
[Fix] use ProcessPoolExecutor during mbpp eval (#1159) 2024-05-15 13:48:29 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template (#1098)
* add TheoremQA with 5-shot

* add huggingface_above_v4_33 classes

* use num_worker partitioner in cli

* update theoremqa

* update TheoremQA

* add TheoremQA

* rename theoremqa -> TheoremQA

* update TheoremQA output path

* rewrite many model configs

* update huggingface

* further update

* refine configs

* update configs

* update configs

* add configs/eval_llama3_instruct.py

* add summarizer multi faceted

* update bbh datasets

* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py

* rename class

* update readme

* update hf above v4.33
2024-05-14 14:50:16 +08:00
bittersweet1999
826d8307ac
fix links (#1120) 2024-05-08 15:13:18 +08:00
JuhaoLiang
d2c40e5648
[Feature] Add AceGPT-MMLUArabic benchmark (#1099)
* add AceGPT-MMLUArabic benchmark

* update readme and fix lint issue

* remove unused package

* add MMLUArabic zero-shot settings

* rename filename and update readme
2024-05-08 15:00:26 +08:00
Fangyu Lei
862044fb7d
[Feature] Add S3Eval Dataset (#916)
* s3eval_branch

* update s3eval
2024-05-06 19:41:52 +08:00
Yggdrasill7D6
af10ecc272
add mgsm datasets (#1081)
* add mgsm datasets

* fix lint

* fix lint

* update mgsm

* update mgsm

* ease code spell

* update

* update

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 15:29:34 +08:00
klein
153c4fc988
[Feature] update drop dataset from openai simple eval (#1092)
* [Feature] update drop dataset from openai simple eval

* update drop template presentation

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 13:37:08 +08:00
Alexander Lam
35c94d0cde
[Feature] Adding support for LLM Compression Evaluation (#1108)
* fixed formatting based on pre-commit tests

* fixed typo in comments; reduced the number of models in the eval config

* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset

* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
2024-04-30 10:51:01 +08:00
bittersweet1999
3de48e9b35
[Bug] Fix CMB dataset (#1106) 2024-04-30 00:33:43 +08:00
Yggdrasill7D6
58a57a4c45
[Feature] add support for Flames datasets (#1093)
* add flames datasets

* fix lint

* rm quota

* add judgemodel info and fix os path

* support flames dataset

* support flames dataset

---------

Co-authored-by: bittersweet1999 <1487910649@qq.com>
2024-04-28 18:56:24 +08:00
Francis-llgg
f1ee11de14
[Feature] Add gpqa prompt from simple_evals, openai (#1080)
* add gpqa_openai_simple_eval

* trigger CI build

* reorg

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 20:13:00 +08:00
klein
e4830a6926
Update CIBench (#1089)
* modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4

* update cibench: dataset and evaluation

* cibench summarizer bug

* update cibench

* move extract_code import

---------

Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 18:46:02 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation (#1096)
* support arenahard

* support arenahard

* support arenahard
2024-04-26 15:42:00 +08:00
bittersweet1999
6ba1c4937d
[Feature] Support Math evaluation via judgemodel (#1094)
* support openai math evaluation

* support openai math evaluation

* support openai math evaluation

* support math llm judge

* support math llm judge
2024-04-26 14:56:23 +08:00
Fengzhe Zhou
004ed79593
[Feature] Add TheoremQA with 5-shot (#1048)
* add TheoremQA with 5-shot

* cherry pick from add-huggingface-above-v4.33, good TheoremQA results
2024-04-22 15:22:04 +08:00
Fengzhe Zhou
8c85edd1cd
[Sync] deprecate old mbpps (#1064) 2024-04-19 20:49:46 +08:00
liuwei130
a00e57296f
[Feature] Add ChemBench (#1032)
* add ChemBench

* update results

* molbench -> ChemBench

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-12 08:46:26 +08:00
Fengzhe Zhou
b39f501563
[Sync] update taco (#1030) 2024-04-09 17:50:23 +08:00
Mo Li
f2af49337d
[Feature] Add ATC Choice Version (#1019)
* Squashed commit of the following:

commit c48ad194c3976dc63d1b60d8c8ab2d5ff9e1cbfe
Author: DseidLi <2568818204@qq.com>
Date:   Tue Apr 2 16:57:43 2024 +0800

    add atc_choice

commit 3ac6efea29619573e6fac8fa3cce464853dcead0
Merge: 2d4e559 8e3a9c3
Author: DseidLi <2568818204@qq.com>
Date:   Tue Apr 2 16:41:38 2024 +0800

    Merge branch 'atc_choice' into atc_add_choice

commit 8e3a9c396a3e5546d3faf584183f6fd60b974d5e
Merge: 150a036 0a6a03f
Author: DseidLi <2568818204@qq.com>
Date:   Tue Mar 26 04:47:07 2024 +0800

    Merge branch 'main' into atc_choice

    Conflicts:
    	configs/summarizers/needlebench.py
    	opencompass/datasets/needlebench/multi.py
    	opencompass/datasets/needlebench/origin.py
    	opencompass/datasets/needlebench/parallel.py

commit 150a036d6d990f26a57c974d1af83d88c31a0f9d
Merge: 8d6ac9a 940dd18
Author: DseidLi <2568818204@qq.com>
Date:   Wed Mar 20 03:49:08 2024 +0800

    Merge branch 'needlebench_fix' into atc_choice

commit 8d6ac9a1a43b1c9d0f0ea27e7d58968a203ea898
Author: DseidLi <2568818204@qq.com>
Date:   Wed Mar 20 03:41:49 2024 +0800

    optimize needlebench code

commit 940dd18a4270f24bc69edd2a780182c68918e1a9
Author: DseidLi <2568818204@qq.com>
Date:   Wed Mar 20 03:39:46 2024 +0800

    fix vllm

commit d8be6877bc41051f3edcc0421c462c834c0f1c9a
Merge: ecad78a 2527fda
Author: DseidLi <2568818204@qq.com>
Date:   Tue Mar 19 21:07:08 2024 +0800

    Merge remote-tracking branch 'origin/add_1M_dataset' into atc_choice

commit 2527fda8a5
Author: DseidLi <2568818204@qq.com>
Date:   Tue Mar 19 16:03:40 2024 +0800

    add model configs

commit 75425acdf8
Author: DseidLi <2568818204@qq.com>
Date:   Tue Mar 19 16:02:15 2024 +0800

    add prompt position args

commit 367ba1ba61
Author: DseidLi <2568818204@qq.com>
Date:   Wed Feb 28 21:40:00 2024 +0800

    add Needlebench-1000K configs

commit ecad78af14c4bb00fe325779114b384c57ab30bf
Author: DseidLi <2568818204@qq.com>
Date:   Thu Mar 14 22:08:32 2024 +0800

    fix atc

commit 08772c0787b18872abadc9ffec3223941a5ee0c2
Merge: 9f3f8cf caf1cf8
Author: DseidLi <2568818204@qq.com>
Date:   Thu Mar 14 22:07:28 2024 +0800

    Merge branch 'main' into atc_choice

    Conflicts:
    	configs/datasets/needlebench/readme.md
    	configs/datasets/needlebench/readme_zh-CN.md
    	configs/summarizers/needlebench.py
    	opencompass/datasets/needlebench/atc.py
    	opencompass/summarizers/needlebench.py

commit 9f3f8cfb4452722734d334114ac1d14110e57406
Author: DseidLi <2568818204@qq.com>
Date:   Thu Mar 14 21:35:53 2024 +0800

    add atc-choice test

commit 52be7c1202376b4e09821188b826f1a805328129
Author: DseidLi <2568818204@qq.com>
Date:   Wed Mar 6 02:54:15 2024 +0800

    update needlebench randomseed and add vllm qwen14b

commit fc1effce596ae2e5ece4933e8cd34aef8e64a6f9
Merge: 4e747ed caf1cf8
Author: DseidLi <2568818204@qq.com>
Date:   Wed Mar 6 02:51:14 2024 +0800

    Merge branch 'main' into add_model_configs

commit 31834f9b23af3354ac3581ec86d693d0f05cdd1c
Merge: 7dabc82 120bf8b
Author: DseidLi <2568818204@qq.com>
Date:   Sun Mar 3 23:29:42 2024 +0800

    Merge branch 'main' of https://github.com/open-compass/opencompass into atc_choice

commit 4e747ed1988ddbcfcc7fff334601259ade72d363
Author: DseidLi <2568818204@qq.com>
Date:   Sun Mar 3 22:15:25 2024 +0800

    add internlm2-lmdeploy model and gemma configs

commit 7dabc828123d711c8cf834d6aab4137bb55e85ed
Author: DseidLi <2568818204@qq.com>
Date:   Sat Mar 2 17:26:15 2024 +0800

    add atc choice version -ZH

commit 996f8ae43d
Author: DseidLi <2568818204@qq.com>
Date:   Wed Feb 28 16:58:56 2024 +0800

    update readme for needlebench

commit f7266e873c
Author: DseidLi <2568818204@qq.com>
Date:   Wed Feb 28 16:44:53 2024 +0800

    move readme.md

commit 1c7375681d
Author: DseidLi <2568818204@qq.com>
Date:   Wed Feb 28 16:38:31 2024 +0800

    fix linting error

commit b6524f3ebf
Author: DseidLi <2568818204@qq.com>
Date:   Wed Feb 28 16:33:51 2024 +0800

    lint summarizer

commit c0d1190e39
Author: DseidLi <2568818204@qq.com>
Date:   Wed Feb 28 16:29:03 2024 +0800

    add needlebench intro, fix summarizer

commit 0965baf785
Author: DseidLi <2568818204@qq.com>
Date:   Mon Feb 26 13:31:26 2024 +0800

    fix bug in needlebench summarizer

commit 5d32b31eb8
Author: DseidLi <2568818204@qq.com>
Date:   Sat Feb 24 03:19:08 2024 +0800

    update act prompt

commit af82a7f085
Merge: 32bf9fe 53fe788
Author: DseidLi <2568818204@qq.com>
Date:   Fri Feb 23 17:50:32 2024 +0800

    Merge remote-tracking branch 'upstream/main' into needlebench

commit 32bf9fe802
Author: DseidLi <2568818204@qq.com>
Date:   Fri Feb 23 17:31:32 2024 +0800

    simplify needlebench 32k, 128k, 200k for eval

commit a7cb025e05
Author: DseidLi <2568818204@qq.com>
Date:   Fri Feb 23 14:48:58 2024 +0800

    add needlebench

* fix summarizer

* remove repeated code

* remove chinese comments
2024-04-07 15:46:20 +08:00
Mo Li
0a6a03fe1a
[Feature] update needlebench and configs (#986)
* add Needlebench-1000K configs

* add prompt position args

* add model configs

* Update parallel.py

* fix lint
2024-03-25 18:05:01 +08:00
Connor-Shen
0221d30877
[Fix] Update APPS/TACO (#988)
* [Feature] update apps/taco

* [Feature] update apps/taco
2024-03-19 20:21:39 +08:00
Connor-Shen
8a3c6e51ed
[Feature] Update APPS (#985)
* update post process

* update post process
2024-03-19 15:47:05 +08:00
Connor-Shen
d92595b671
[Feat] Support TACO (#966)
* [Feat] Support TACO

* update README

* update README
2024-03-19 15:39:16 +08:00
Jingming
89a8a8917b
[Feature] Add the implement of QuALITY datasets (#976)
#976
2024-03-15 21:22:38 +08:00