Commit Graph

595 Commits

Author SHA1 Message Date
Xu Song
e9384823f2
Upgrade default math pred_postprocessor (#1340)
* Change default math postprocessor

* Update math_gen_265cce.py
2024-07-22 14:00:49 +08:00
Songyang Zhang
96f644de69
[Fix] Update path and folder (#1344)
* Update path and folder

* Update path

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-21 08:18:14 +08:00
Linchen Xiao
a56678190b
[Feature] CompassBench v1_3 subjective evaluation (#1341)
* stash files

* compassbench subjective evaluation added

* evaluation update

* remove unneeded content

* fix lint

* update docs

* Update lint

* Update

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 23:12:23 +08:00
liushz
98c58f8a6c
[Feature] Add compassbench knowledge&math part (#1342)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

* Update accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Update compassbench august wiki&math

* Update compassbench august wiki&math

* Update compassbench august wiki&math

* Update compassbench_aug_gen_068af0.py

* Update compassbench_aug_gen_068af0.py

* Update

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 22:54:46 +08:00
bittersweet1999
1f9f728f22
[Feature] support compassbench Checklist evaluation (#1339)
* fix pip version

* fix pip version

* support checklist eval

* init

* add lan

* fix typo
2024-07-19 16:40:44 +08:00
Mo Li
f40add2596
[Fix] Fix lint (#1334)
* update needlebench docs

* update model_name_mapping dict

* update README

* fix_lint
2024-07-18 17:15:06 +08:00
Xu Song
1bfb4217ff
Fix typing and typo (#1331)
2024-07-18 13:41:24 +08:00
Mo Li
104bddf647
[Doc] Update NeedleBench Docs (#1330)
* update needlebench docs

* update model_name_mapping dict

* update README

* Update README_zh-CN.md

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-07-18 13:16:19 +08:00
Xu Song
0a1c89e618
[Fix] Fix rouge evaluator of rolebench_zh (#1322)
2024-07-16 16:18:13 +08:00
bittersweet1999
3aeabbc427
[Fix] update Faq (#1313)
* fix pip version

* fix pip version

* update faq

* update faq

* update faq

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-07-12 11:29:26 +08:00
bittersweet1999
8e7ad2e981
[Fix] add bc for alignbench summarizer (#1306)
* fix pip version

* fix pip version

* fix alignbench

* fix import error
2024-07-12 11:06:20 +08:00
Fengzhe Zhou
62f55987f1
force register (#1311)
2024-07-11 19:59:35 +08:00
bittersweet1999
889e7e1140
[Fix] Change abbr for arenahard dataset (#1302)
* fix pip version

* fix pip version

* change abbr for arenahard
2024-07-11 12:42:03 +08:00
Fengzhe Zhou
a62c613d3e
[Sync] bump version 0.2.6+local (#1294)
2024-07-06 00:44:06 +08:00
Fengzhe Zhou
1d3a26c732
[Doc] quick start swap tabs (#1263)
* [doc] quick start swap tabs

* update docs

* update

* update

* update

* update

* update

* update

* update
2024-07-05 23:51:42 +08:00
bittersweet1999
68ca48496b
[Refactor] Reorganize subjective eval (#1284)
* fix pip version

* fix pip version

* reorganize subjective eval

* reorg sub

* reorg subeval

* reorg subeval

* update subjective doc

* reorg subeval

* reorg subeval
2024-07-05 22:11:37 +08:00
Songyang Zhang
aadcfa625f
[Feat] Update owners for issues (#1293)
* [Feat] Update owners for issues

* update owners

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-07-05 18:27:30 +08:00
Songyang Zhang
409a042d93
[Feature] Add InternLM2.5 (#1286)
* [Feature] Add InternLM2.5

* Update

* update readme

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-07-04 20:10:31 +08:00
zhulinJulia24
167cfdcca3
[ci] update daily testcase (#1285)
* Update daily-run-test.yml

* Create eval_regression_chat.py

* Delete .github/scripts/.github/scripts/eval_regression_chat.py

* Create eval_regression_chat.py

* Update pr-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update oc_score_baseline.yaml

* Update oc_score_assert.py

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update oc_score_baseline.yaml

* Update oc_score_assert.py

* Update oc_score_assert.py

* fix lint

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update daily-run-test.yml

* update

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-07-03 18:56:09 +08:00
baymax591
28eba6fe34
NPU adaptation (#1250)
* NPU adaptation

* Add support for Ascend NPU

* format

---------

Co-authored-by: baymax591 <14428251+baymax591@user.noreply.gitee.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-07-03 18:55:19 +08:00
liushz
fc2c9dea8c
Update MathBench summarizer & fix cot setting (#1282)
* Update MathBench

* Update MathBench

* Update MathBench

---------

Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-07-01 21:51:17 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 (#1279)
2024-06-28 14:16:34 +08:00
Xingyuan Bu
842fb1cd70
Update mtbench101.py (#1276)
fix incorrectly used import
from torch.utils.data import DataLoader, Dataset
2024-06-26 00:40:22 +08:00
zhulinJulia24
26d077b080
flash attn installation in daily testcase (#1272)
* Update daily-run-test.yml

* Update daily-run-test.yml

* Update oc_score_baseline.yaml
2024-06-24 18:22:46 +08:00
liushz
e5ee1647fb
Add doc for accelerator function (#1252)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

* Update accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

* Add Doc for accelerator

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-06-24 14:53:51 +08:00
klein
1fa62c4a42
Support wildbench (#1266)
Co-authored-by: Leymore <zfz-960727@163.com>
2024-06-24 13:16:27 +08:00
LIU Xiao
83b9fd9eaa
add ",<2.0.0" to "numpy>=1.23.4" in requirements/runtime.txt, as pandas<2.0.0 isn't compatible with numpy>=2.0.0 (#1267)
2024-06-24 11:03:42 +08:00
bittersweet1999
e0d7808b4e
[Fix] fix pip version (#1228)
* fix pip version

* fix pip version
2024-06-06 11:48:07 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo (#1224)
* add fofo dataset

* add dataset fofo
2024-06-06 11:40:48 +08:00
Xingyuan Bu
02a0a4e857
MT-Bench-101 (#1215)
* add mt-bench-101

* add readme and requirements

* add mt-bench-101 data

* Update readme_mtbench101.md

* update readme

* update leaderboard

* fix typo

* Update readme_mtbench101.md

* fit newest opencompass

* update readme.md

* mtbench101 to opencompass

* mtbench101 to opencompass

* for code review

* for code review

* for code review

* hook

* hook

---------

Co-authored-by: liujie <ljie@buaa.edu.cn>
2024-06-03 14:52:12 +08:00
mqy004
b272803d8a
Fix being unable to import opencompass.cli.main after installing the release version (#1221)
* Create __init__.py

* Create __init__.py

* Create __init__.py

* Create __init__.py

* Create __init__.py

* Create __init__.py

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-31 13:23:33 +08:00
bittersweet1999
7c381e5be8
[Fix] fix summarizer (#1217)
* fix summarizer

* fix summarizer
2024-05-31 11:40:47 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format (#1214)
2024-05-30 00:21:58 +08:00
Fengzhe Zhou
d59189b87f
[Doc] Update running command in README (#1206)
2024-05-30 00:06:39 +08:00
Fengzhe Zhou
0b50112dc1
[Fix] Rollback opt model configs (#1213)
2024-05-30 00:03:22 +08:00
Fengzhe Zhou
d656e818f8
[Docs] Remove --no-batch-padding and Use --hf-num-gpus (#1205)
* [Docs] Remove --no-batch-padding and Use --hf-num-gpus

* update
2024-05-29 16:30:10 +08:00
Xu Song
808582d952
Fix VLLM argument error (#1207)
2024-05-29 10:14:08 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version (#1204)
2024-05-28 23:09:59 +08:00
liushz
ba620c4afe
Update accelerator (#1195)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

* Update accelerator

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-28 17:17:54 +08:00
Fengzhe Zhou
9fa80b0f93
[Feat] Update charm summary (#1194)
2024-05-27 16:17:01 +08:00
jxd
608ff5810d
support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks (#1190)
* support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks

* fix lint error

* add dataset card for CHARM

* minor refactor

* add txt

---------

Co-authored-by: wujiang <wujiang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-27 13:48:22 +08:00
bittersweet1999
07a6dacf33
fix length (#1180)
2024-05-24 23:30:01 +08:00
bittersweet1999
88c14d3d04
add support for lmdeploy api judge (#1193)
2024-05-24 23:28:56 +08:00
yaoyingyy
749e4cea71
[Fix] temporary files using tempfile (#1186)
Co-authored-by: yaoying <yaoying@kingsoft.com>
2024-05-24 23:27:37 +08:00
klein
5eb8f14d97
[Fix] Fix drop_gen.py (#1191)
Fix the bug in drop_gen: wrong import
2024-05-24 23:17:50 +08:00
bittersweet1999
31afe87026
fix yi-chat template (#1178)
2024-05-21 18:14:12 +08:00
liushz
1448be00e2
Update MathBench (#1176)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-21 14:45:43 +08:00
Fengzhe Zhou
2b3d4150f3
[Sync] update evaluator (#1175)
2024-05-21 14:22:46 +08:00
zhulinJulia24
296ea59931
Update daily-run-test.yml (#1173)
2024-05-20 14:04:58 +08:00
Fengzhe Zhou
5de85406ce
[Sync] add OC16 entry (#1171)
2024-05-17 16:50:58 +08:00