Commit Graph

339 Commits

Author SHA1 Message Date
Xingjun.Wang
edab1c07ba
[Feature] Support ModelScope datasets (#1289)
* add ceval, gsm8k modelscope support

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* update dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* update dataset for modelscope support

* update readme

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* update readme

* remove tydiqa japanese subset

* add ceval, gsm8k modelscope support

* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest

* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets

* format file

* format file

* update dataset format

* support ms_dataset

* update dataset for modelscope support

* merge myl_dev and update test_ms_dataset

* update readme

* update dataset for modelscope support

* update eval_api_zhipu_v2

* remove unused code

* add get_data_path function

* remove tydiqa japanese subset

* update util

* remove .DS_Store

* fix md format

* move util into package

* update docs/get_started.md

* restore eval_api_zhipu_v2.py, add environment setting

* Update dataset

* Update

* Update

* Update

* Update

---------

Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-29 13:48:32 +08:00
jxd
12b84aeb3b
[Feature] Update CHARM Memorization (#1230)
* update gemini api and add gemini models

* add openai models

* update CHARM evaluation

* add CHARM memorization tasks

* add CharmMemSummarizer (output eval details for memorization-independent reasoning analysis)

* update CHARM readme

---------

Co-authored-by: wujiang <wujiang@pjlab.org.cn>
2024-07-26 18:42:30 +08:00
bittersweet1999
d3782c1d47
Revert "Calm dataset (#1287)" (#1366)
This reverts commit edd0ffdf70.
2024-07-26 18:27:29 +08:00
Xu Song
9b9855a008
Add en and zh groups to longbench summarizer; Fix longbench overall score (#1216)
* Add longbench groups

* update

* update
2024-07-26 11:50:41 +08:00
Peng Bo
edd0ffdf70
Calm dataset (#1287)
* add calm dataset

* modify config max_out_len

* update README

* Modify README

* update README

* update README

* update README

* update README

* update README

* add summarizer and modify readme

* delete summarizer config comment

* update summarizer

* modify same response to all questions

* update README
2024-07-26 11:48:16 +08:00
LeavittLang
8ee7fecb68
Adding support for Doubao API (#1218)
* Adding support for Doubao API

* Update doubao_api.py

Fixed a bug where the connection was retried even when it was working normally.

* Update doubao_api.py

---------

Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:44:51 +08:00
klein
65fad8e2ac
[Fix] minor update wildbench (#1335)
* update crb

* update crbbench

* update crbbench

* update crbbench

* minor update wildbench

* [Fix] Update doc of wildbench, and merge wildbench into subjective

* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench

* Update crb.md

* Update crb_pair_judge.py

* Update crb_single_judge.py

* Update subjective_evaluation.md

* Update openai_api.py

* [Update] update wildbench readme

* [Update] update wildbench readme

* [Update] update wildbench readme, remove crb

* Delete configs/eval_subjective_wildbench_pair.py

* Delete configs/eval_subjective_wildbench_single.py

* Update __init__.py

---------

Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:19:04 +08:00
bittersweet1999
8fe75e9937
[Update] update Subeval demo config (#1358)
* fix pip version

* fix pip version

* update demo config
2024-07-24 15:48:28 +08:00
liushz
cf3e942f73
[Fix] Fix MathBench (#1351)
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-07-23 13:35:38 +08:00
Linchen Xiao
8127fc3518
CompassBench subjective summarizer added (#1349)
* subjective summarizer added

* fix lint
2024-07-23 12:29:57 +08:00
Que Haoran
a244453d9e
[Feature] Support inference ppl datasets (#1315)
* commit inference ppl datasets

* revised format

* revise

* revise

* revise

* revise

* revise

* revise
2024-07-22 17:59:30 +08:00
Xu Song
e9384823f2
Upgrade default math pred_postprocessor (#1340)
* Change default math postprocessor

* Update math_gen_265cce.py
2024-07-22 14:00:49 +08:00
Songyang Zhang
96f644de69
[Fix] Update path and folder (#1344)
* Update path and folder

* Update path

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-21 08:18:14 +08:00
Linchen Xiao
a56678190b
[Feature] CompassBench v1_3 subjective evaluation (#1341)
* stash files

* compassbench subjective evaluation added

* evaluation update

* remove unneeded content

* fix lint

* update docs

* Update lint

* Update

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 23:12:23 +08:00
liushz
98c58f8a6c
[Feature] Add compassbench knowledge&math part (#1342)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

* Update accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Update compassbench august wiki&math

* Update compassbench august wiki&math

* Update compassbench august wiki&math

* Update compassbench_aug_gen_068af0.py

* Update compassbench_aug_gen_068af0.py

* Update

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 22:54:46 +08:00
bittersweet1999
1f9f728f22
[Feature] support compassbench Checklist evaluation (#1339)
* fix pip version

* fix pip version

* support checklist eval

* init

* add lan

* fix typo
2024-07-19 16:40:44 +08:00
Xu Song
0a1c89e618
[Fix] Fix rouge evaluator of rolebench_zh (#1322)
2024-07-16 16:18:13 +08:00
bittersweet1999
8e7ad2e981
[Fix] add bc for alignbench summarizer (#1306)
* fix pip version

* fix pip version

* fix alignbench

* fix import error
2024-07-12 11:06:20 +08:00
bittersweet1999
889e7e1140
[Fix] Change abbr for arenahard dataset (#1302)
* fix pip version

* fix pip version

* change abbr for arenahard
2024-07-11 12:42:03 +08:00
Fengzhe Zhou
1d3a26c732
[Doc] quick start swap tabs (#1263)
* [doc] quick start swap tabs

* update docs

* update

* update

* update

* update

* update

* update

* update
2024-07-05 23:51:42 +08:00
bittersweet1999
68ca48496b
[Refactor] Reorganize subjective eval (#1284)
* fix pip version

* fix pip version

* reorganize subjective eval

* reorg sub

* reorg subeval

* reorg subeval

* update subjective doc

* reorg subeval

* reorg subeval
2024-07-05 22:11:37 +08:00
Songyang Zhang
409a042d93
[Feature] Add InternLM2.5 (#1286)
* [Feature] Add InternLM2.5

* Update

* update readme

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-07-04 20:10:31 +08:00
zhulinJulia24
167cfdcca3
[ci] update daily testcase (#1285)
* Update daily-run-test.yml

* Create eval_regression_chat.py

* Delete .github/scripts/.github/scripts/eval_regression_chat.py

* Create eval_regression_chat.py

* Update pr-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update oc_score_baseline.yaml

* Update oc_score_assert.py

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update oc_score_baseline.yaml

* Update oc_score_assert.py

* Update oc_score_assert.py

* fix lint

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update daily-run-test.yml

* update

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-07-03 18:56:09 +08:00
liushz
fc2c9dea8c
Update MathBench summarizer & fix cot setting (#1282)
* Update MathBench

* Update MathBench

* Update MathBench

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-07-01 21:51:17 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 (#1279)
2024-06-28 14:16:34 +08:00
klein
1fa62c4a42
Support wildbench (#1266)
Co-authored-by: Leymore <zfz-960727@163.com>
2024-06-24 13:16:27 +08:00
bittersweet1999
e0d7808b4e
[Fix] fix pip version (#1228)
* fix pip version

* fix pip version
2024-06-06 11:48:07 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo (#1224)
* add fofo dataset

* add dataset fofo
2024-06-06 11:40:48 +08:00
Xingyuan Bu
02a0a4e857
MT-Bench-101 (#1215)
* add mt-bench-101

* add readme and requirements

* add mt-bench-101 data

* Update readme_mtbench101.md

* update readme

* update leaderboard

* fix typo

* Update readme_mtbench101.md

* fit newest opencompass

* update readme.md

* mtbench101 to opencompass

* mtbench101 to opencompass

* for code review

* for code review

* for code review

* hook

* hook

---------

Co-authored-by: liujie <ljie@buaa.edu.cn>
2024-06-03 14:52:12 +08:00
bittersweet1999
7c381e5be8
[Fix] fix summarizer (#1217)
* fix summarizer

* fix summarizer
2024-05-31 11:40:47 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format (#1214)
2024-05-30 00:21:58 +08:00
Fengzhe Zhou
d59189b87f
[Doc] Update running command in README (#1206)
2024-05-30 00:06:39 +08:00
Fengzhe Zhou
0b50112dc1
[Fix] Rollback opt model configs (#1213)
2024-05-30 00:03:22 +08:00
Xu Song
808582d952
Fix VLLM argument error (#1207)
2024-05-29 10:14:08 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version (#1204)
2024-05-28 23:09:59 +08:00
Fengzhe Zhou
9fa80b0f93
[Feat] Update charm summary (#1194)
2024-05-27 16:17:01 +08:00
jxd
608ff5810d
support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks (#1190)
* support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks

* fix lint error

* add dataset card for CHARM

* minor refactor

* add txt

---------

Co-authored-by: wujiang <wujiang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-27 13:48:22 +08:00
bittersweet1999
07a6dacf33
fix length (#1180)
2024-05-24 23:30:01 +08:00
klein
5eb8f14d97
[Fix] Fix drop_gen.py (#1191)
Fix the bug in drop_gen: wrong import
2024-05-24 23:17:50 +08:00
bittersweet1999
31afe87026
fix yi-chat template (#1178)
2024-05-21 18:14:12 +08:00
liushz
1448be00e2
Update MathBench (#1176)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-21 14:45:43 +08:00
Fengzhe Zhou
2b3d4150f3
[Sync] update evaluator (#1175)
2024-05-21 14:22:46 +08:00
Fengzhe Zhou
5de85406ce
[Sync] add OC16 entry (#1171)
2024-05-17 16:50:58 +08:00
Fengzhe Zhou
8ea2c404d7
[Feat] enable HuggingFacewithChatTemplate with --accelerator via cli (#1163)
* enable HuggingFacewithChatTemplate with --accelerator via cli

* rm vllm_internlm2_chat_7b
2024-05-15 21:51:07 +08:00
Fengzhe Zhou
62dbf04708
[Sync] update github workflow (#1156)
2024-05-14 22:42:23 +08:00
Fengzhe Zhou
aa2dd2b58c
[Format] Add config lints (#892)
2024-05-14 15:35:58 +08:00
Xu Song
3dbba11945
[Feat] Support dataset_suffix check for mixed configs (#973)
* [Feat] Support dataset_suffix check for mixed configs

* update mixed suffix

* update suffix

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-14 15:03:28 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template (#1098)
* add TheoremQA with 5-shot

* add huggingface_above_v4_33 classes

* use num_worker partitioner in cli

* update theoremqa

* update TheoremQA

* add TheoremQA

* rename theoremqa -> TheoremQA

* update TheoremQA output path

* rewrite many model configs

* update huggingface

* further update

* refine configs

* update configs

* update configs

* add configs/eval_llama3_instruct.py

* add summarizer multi faceted

* update bbh datasets

* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py

* rename class

* update readme

* update hf above v4.33
2024-05-14 14:50:16 +08:00
Mo Li
6c711cb262
[Fix] Fix Needlebench Summarizer (#1143)
* update few-shot example

* add 128k
2024-05-13 15:59:34 +08:00
bittersweet1999
5432dfc1ff
fix multiround (#1146)
2024-05-13 15:58:39 +08:00