Commit Graph

366 Commits

Author SHA1 Message Date
disperaller
c86f9a2e66
Merge branch 'open-compass:main' into main 2024-06-07 13:32:38 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo (#1224)
* add fofo dataset

* add dataset fofo
2024-06-06 11:40:48 +08:00
Xingyuan Bu
02a0a4e857
MT-Bench-101 (#1215)
* add mt-bench-101

* add readme and requirements

* add mt-bench-101 data

* Update readme_mtbench101.md

* update readme

* update leaderboard

* fix typo

* Update readme_mtbench101.md

* fit newest opencompass

* update readme.md

* mtbench101 to opencompass

* mtbench101 to opencompass

* for code review

* for code review

* for code review

* hook

* hook

---------

Co-authored-by: liujie <ljie@buaa.edu.cn>
2024-06-03 14:52:12 +08:00
mqy004
b272803d8a
Fix inability to import opencompass.cli.main after installing the release version (#1221)
* Create __init__.py

* Create __init__.py

* Create __init__.py

* Create __init__.py

* Create __init__.py

* Create __init__.py

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-31 13:23:33 +08:00
bittersweet1999
7c381e5be8
[Fix] fix summarizer (#1217)
* fix summarizer

* fix summarizer
2024-05-31 11:40:47 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format (#1214) 2024-05-30 00:21:58 +08:00
Fengzhe Zhou
d656e818f8
[Docs] Remove --no-batch-padding and Use --hf-num-gpus (#1205)
* [Docs] Remove --no-batch-padding and Use --hf-num-gpus

* update
2024-05-29 16:30:10 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version (#1204) 2024-05-28 23:09:59 +08:00
liushz
ba620c4afe
Update accelerator (#1195)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

* Update accelerator

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-28 17:17:54 +08:00
jxd
608ff5810d
support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks (#1190)
* support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks

* fix lint error

* add dataset card for CHARM

* minor refactor

* add txt

---------

Co-authored-by: wujiang <wujiang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-27 13:48:22 +08:00
bittersweet1999
88c14d3d04
add support for lmdeploy api judge (#1193) 2024-05-24 23:28:56 +08:00
yaoyingyy
749e4cea71
[Fix] temporary files using tempfile (#1186)
Co-authored-by: yaoying <yaoying@kingsoft.com>
2024-05-24 23:27:37 +08:00
disperaller
78e89ce8b5
change max_task_size to dynamic value 2024-05-22 13:55:05 +08:00
Shiyao Ma
db8d9a9798
change max_task_size to dynamic 2024-05-22 11:28:09 +08:00
Fengzhe Zhou
2b3d4150f3
[Sync] update evaluator (#1175) 2024-05-21 14:22:46 +08:00
Fengzhe Zhou
5de85406ce
[Sync] add OC16 entry (#1171) 2024-05-17 16:50:58 +08:00
Fengzhe Zhou
8ea2c404d7
[Feat] enable HuggingFacewithChatTemplate with --accelerator via cli (#1163)
* enable HuggingFacewithChatTemplate with --accelerator via cli

* rm vllm_internlm2_chat_7b
2024-05-15 21:51:07 +08:00
liushz
e3c0448bbc
Update accelerator (#1152)
* Update accelerator

* update run

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2024-05-15 14:31:47 +08:00
Fengzhe Zhou
f10dd48f9c
[Fix] Update stop_words in huggingface_above_v4_33 (#1160) 2024-05-15 14:10:33 +08:00
Fengzhe Zhou
80f831b425
[Fix] use ProcessPoolExecutor during mbpp eval (#1159) 2024-05-15 13:48:29 +08:00
bittersweet1999
8a8987be0b
fix arenahard summarizer (#1154)
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-15 13:31:29 +08:00
Fengzhe Zhou
62dbf04708
[Sync] update github workflow (#1156) 2024-05-14 22:42:23 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template (#1098)
* add TheoremQA with 5-shot

* add huggingface_above_v4_33 classes

* use num_worker partitioner in cli

* update theoremqa

* update TheoremQA

* add TheoremQA

* rename theoremqa -> TheoremQA

* update TheoremQA output path

* rewrite many model configs

* update huggingface

* further update

* refine configs

* update configs

* update configs

* add configs/eval_llama3_instruct.py

* add summarizer multi faceted

* update bbh datasets

* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py

* rename class

* update readme

* update hf above v4.33
2024-05-14 14:50:16 +08:00
Mo Li
6c711cb262
[Fix] Fix Needlebench Summarizer (#1143)
* update few-shot example

* add 128k
2024-05-13 15:59:34 +08:00
bittersweet1999
833a35140b
[Fix] fix alpacaeval while add caching path (#1139)
* fix alpacaeval

* fix alpacaeval
2024-05-11 14:02:26 +08:00
Fengzhe Zhou
19d7e630d6
[Sync] Update accelerator (#1122)
(cherry picked from commit 4beb6d9ab655d8a626971841b7acfd9fae9d438f)

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-09 14:32:31 +08:00
bittersweet1999
826d8307ac
fix links (#1120) 2024-05-08 15:13:18 +08:00
JuhaoLiang
d2c40e5648
[Feature] Add AceGPT-MMLUArabic benchmark (#1099)
* add AceGPT-MMLUArabic benchmark

* update readme and fix lint issue

* remove unused package

* add MMLUArabic zero-shot settings

* rename filename and update readme
2024-05-08 15:00:26 +08:00
Fangyu Lei
862044fb7d
[Feature] Add S3Eval Dataset (#916)
* s3eval_branch

* update s3eval
2024-05-06 19:41:52 +08:00
Yggdrasill7D6
af10ecc272
add mgsm datasets (#1081)
* add mgsm datasets

* fix lint

* fix lint

* update mgsm

* update mgsm

* ease code spell

* update

* update

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 15:29:34 +08:00
klein
153c4fc988
[Feature] update drop dataset from openai simple eval (#1092)
* [Feature] update drop dataset from openai simple eval

* update drop template presentation

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 13:37:08 +08:00
Fengzhe Zhou
d43392a3bb
[Feature] Add mmlu prompt from simple_evals, openai (#1074)
* add mmlu prompt from simple_evals, openai

* return empty str on failure
2024-05-06 13:26:26 +08:00
Yang Yong
53fe390454
fix LightllmApi workers bug (#1113) 2024-04-30 22:09:22 +08:00
Alexander Lam
35c94d0cde
[Feature] Adding support for LLM Compression Evaluation (#1108)
* fixed formatting based on pre-commit tests

* fixed typo in comments; reduced the number of models in the eval config

* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset

* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
2024-04-30 10:51:01 +08:00
bittersweet1999
3de48e9b35
[Bug] Fix CMB dataset (#1106) 2024-04-30 00:33:43 +08:00
liushz
a6f67e1a65
[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-04-28 21:58:58 +08:00
Lyu Han
1013dce60c
adapt to lmdeploy v0.4.0 (#1073)
* adapt to lmdeploy v0.4.0

* compatible
2024-04-28 19:57:40 +08:00
Yggdrasill7D6
58a57a4c45
[Feature] add support for Flames datasets (#1093)
* add flames datasets

* fix lint

* rm quota

* add judgemodel info and fix os path

* support flames dataset

* support flames dataset

---------

Co-authored-by: bittersweet1999 <1487910649@qq.com>
2024-04-28 18:56:24 +08:00
dmitrysarov
cce5b6fbb6
fix output typing, change mutable list to immutable tuple (#989)
* fix output typing, change mutable list to immutable tuple

* import missed type

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 23:07:34 +08:00
binary-husky
701ecbb292
[Fix] python path bug (#1063)
* fix relative path bug

* format

---------

Co-authored-by: hmp <505030475@qq.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:58:45 +08:00
Wang Xingjin
048d41a1c4
add vllm get_ppl (#1003)
* add vllm get_ppl

* add vllm get_ppl

* format

---------

Co-authored-by: xingjin.wang <xingjin.wang@mihoyo.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:31:56 +08:00
Haodong Duan
3a232db471
[Deprecate] Remove multi-modal related stuff (#1072)
* Remove MultiModal

* update index.rst

* update README

* remove mmbench codes

* update news

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:20:14 +08:00
Francis-llgg
f1ee11de14
[Feature] Add gpqa prompt from simple_evals, openai (#1080)
* add gpqa_openai_simple_eval

* trigger CI build

* reorg

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 20:13:00 +08:00
klein
e4830a6926
Update CIBench (#1089)
* modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4

* update cibench: dataset and evaluation

* cibench summarizer bug

* update cibench

* move extract_code import

---------

Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 18:46:02 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation (#1096)
* support arenahard

* support arenahard

* support arenahard
2024-04-26 15:42:00 +08:00
bittersweet1999
6ba1c4937d
[Feature] Support Math evaluation via judgemodel (#1094)
* support openai math evaluation

* support openai math evaluation

* support openai math evaluation

* support math llm judge

* support math llm judge
2024-04-26 14:56:23 +08:00
Ke Bao
81d0e4d793
[Feature] Add lmdeploy tis python backend model (#1014)
* add lmdeploy tis python backend model

* fix pr check

* update
2024-04-23 14:27:11 +08:00
Fengzhe Zhou
8fe7b271cc
[Fix] Fix sequential runner (#1070) 2024-04-23 11:31:10 +08:00
Fengzhe Zhou
004ed79593
[Feature] Add TheoremQA with 5-shot (#1048)
* add TheoremQA with 5-shot

* cherry pick from add-huggingface-above-v4.33, good TheoremQA results
2024-04-22 15:22:04 +08:00
bittersweet1999
6f98c8d9ab
[Fix] Fix MultiRound Subjective Evaluation (#1043)
* fix multiround

* fix
2024-04-22 12:06:03 +08:00