liushz
00fc8da5be
[Feature] Add model postprocess function ( #1484 )
...
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
* Add model postprocess function
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-09-05 21:10:29 +08:00
Linchen Xiao
6c9cd9a260
[Feature] Needlebench auto-download update ( #1480 )
...
* update
* update
* update
2024-09-05 17:22:42 +08:00
Linchen Xiao
9693be46b7
[Feature] Mmlu-pro auto-download ( #1464 )
...
* update
* update
* update
* update
* update
2024-08-30 10:03:40 +08:00
Linchen Xiao
245664f4c0
[Feature] Fullbench v0.1 language update ( #1463 )
...
* update
* update
* update
* update
2024-08-28 14:01:05 +08:00
Songyang Zhang
7c2d25b557
[Fix] Update SciCode and Gemma model ( #1449 )
...
* [Fix] Update SciCode and Gemma model
* Update
* Update
2024-08-23 10:42:27 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode ( #1417 )
...
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode w/ bg
* add scicode
* Update README.md
* Update README.md
* Delete configs/eval_SciCode.py
* rename
* 1
* rename
* Update README.md
* Update scicode.py
* Update scicode.py
* fix some bugs
* Update
* Update
---------
Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
Linchen Xiao
a4b54048ae
[Feature] Add Ruler datasets ( #1310 )
...
* [Feature] Add Ruler datasets
* pre-commit fixed
* Add model specific tokenizer to dataset
* pre-commit modified
* remove unused import
* fix linting
* add trust_remote to tokenizer load
* lint fix
* comments resolved
* fix lint
* Add readme
* Fix lint
* ruler refactorize
* fix lint
* lint fix
* updated
* lint fix
* fix wonderwords import issue
* prompt modified
* update
* readme updated
* update
* ruler dataset added
* Update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-20 11:40:11 +08:00
Xu Song
99b5122ed5
[Feature] Add abbr for rolebench dataset ( #1431 )
...
* Add abbr for rolebench dataset
* add
2024-08-20 11:22:48 +08:00
Linchen Xiao
ecf9bb3e4c
[Bug] Commonsenseqa dataset fix ( #1425 )
...
* longbench dataset load fix
* update
* Update
* Update
* Update
* update
* update
---------
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-16 15:54:07 +08:00
Songyang Zhang
9b3613f10b
[Update] Support auto-download of FOFO/MT-Bench-101 ( #1423 )
...
* [Update] Support auto-download of FOFO/MT-Bench-101
* Update wildbench
2024-08-16 11:57:41 +08:00
Linchen Xiao
8e55c9c6ee
[Update] Compassbench v1.3 ( #1396 )
...
* stash files
* compassbench subjective evaluation added
* evaluation update
* fix lint
* update docs
* Update lint
* changes saved
* changes saved
* CompassBench subjective summarizer added (#1349 )
* subjective summarizer added
* fix lint
[Fix] Fix MathBench (#1351 )
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
[Update] Update model support list (#1353 )
* fix pip version
* fix pip version
* update model support
subjective summarizer updated
knowledge, math objective done (data need update)
remove secrets
objective changes saved
knowledge data added
* secrets removed
* changed added
* summarizer modified
* summarizer modified
* compassbench coding added
* fix lint
* objective summarizer updated
* compass_bench_v1.3 updated
* update files in config folder
* remove unused model
* lcbench modified
* removed model evaluation configs
* remove duplicated sdk implementation
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-08-12 19:09:19 +08:00
Songyang Zhang
c81329b548
[Fix] Fix Slurm ENV ( #1392 )
...
1. Support Slurm Cluster
2. Support automatic data download
3. Update InternLM2.5-1.8B/20B-Chat
2024-08-06 01:35:20 +08:00
Peng Bo
07c96ac659
Calm dataset ( #1385 )
...
* Add CALM Dataset
2024-08-01 10:03:21 +08:00
Songyang Zhang
46cc7894e1
[Feature] Support import configs/models/summarizers from whl ( #1376 )
...
* [Feature] Support import configs/models/summarizers from whl
* Update LCBench configs
* Update
* Update
* Update
* Update
* update
* Update
* Update
* Update
* Update
* Update
2024-08-01 00:42:48 +08:00
Songyang Zhang
704853e5e7
[Feature] Update pip install ( #1324 )
...
* [Feature] Update pip install
* Update Configuration
* Update
* Update
* Update
* Update Internal Config
* Update collect env
2024-07-29 18:32:50 +08:00
Xingjun.Wang
edab1c07ba
[Feature] Support ModelScope datasets ( #1289 )
...
* add ceval, gsm8k modelscope surpport
* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest
* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets
* format file
* format file
* update dataset format
* support ms_dataset
* udpate dataset for modelscope support
* merge myl_dev and update test_ms_dataset
* udpate dataset for modelscope support
* update readme
* update eval_api_zhipu_v2
* remove unused code
* add get_data_path function
* update readme
* remove tydiqa japanese subset
* add ceval, gsm8k modelscope surpport
* update race, mmlu, arc, cmmlu, commonsenseqa, humaneval and unittest
* update bbh, flores, obqa, siqa, storycloze, summedits, winogrande, xsum datasets
* format file
* format file
* update dataset format
* support ms_dataset
* udpate dataset for modelscope support
* merge myl_dev and update test_ms_dataset
* update readme
* udpate dataset for modelscope support
* update eval_api_zhipu_v2
* remove unused code
* add get_data_path function
* remove tydiqa japanese subset
* update util
* remove .DS_Store
* fix md format
* move util into package
* update docs/get_started.md
* restore eval_api_zhipu_v2.py, add environment setting
* Update dataset
* Update
* Update
* Update
* Update
---------
Co-authored-by: Yun lin <yunlin@U-Q9X2K4QV-1904.local>
Co-authored-by: Yunnglin <mao.looper@qq.com>
Co-authored-by: Yun lin <yunlin@laptop.local>
Co-authored-by: Yunnglin <maoyl@smail.nju.edu.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-29 13:48:32 +08:00
jxd
12b84aeb3b
[Feature] Update CHARM Memeorziation ( #1230 )
...
* update gemini api and add gemini models
* add openai models
* update CHARM evaluation
* add CHARM memorization tasks
* add CharmMemSummarizer (output eval details for memorization-independent reasoning analysis
* update CHARM readme
---------
Co-authored-by: wujiang <wujiang@pjlab.org.cn>
2024-07-26 18:42:30 +08:00
bittersweet1999
d3782c1d47
Revert "Calm dataset ( #1287 )" ( #1366 )
...
This reverts commit edd0ffdf70
.
2024-07-26 18:27:29 +08:00
Peng Bo
edd0ffdf70
Calm dataset ( #1287 )
...
* add calm dataset
* modify config max_out_len
* update README
* Modify README
* update README
* update README
* update README
* update README
* update README
* add summarizer and modify readme
* delete summarizer config comment
* update summarizer
* modify same response to all questions
* update README
2024-07-26 11:48:16 +08:00
klein
65fad8e2ac
[Fix] minor update wildbench ( #1335 )
...
* update crb
* update crbbench
* update crbbench
* update crbbench
* minor update wildbench
* [Fix] Update doc of wildbench, and merge wildbench into subjective
* [Fix] Update doc of wildbench, and merge wildbench into subjective, fix crbbench
* Update crb.md
* Update crb_pair_judge.py
* Update crb_single_judge.py
* Update subjective_evaluation.md
* Update openai_api.py
* [Update] update wildbench readme
* [Update] update wildbench readme
* [Update] update wildbench readme, remove crb
* Delete configs/eval_subjective_wildbench_pair.py
* Delete configs/eval_subjective_wildbench_single.py
* Update __init__.py
---------
Co-authored-by: bittersweet1999 <148421775+bittersweet1999@users.noreply.github.com>
2024-07-26 11:19:04 +08:00
liushz
cf3e942f73
[Fix] Fix MathBench ( #1351 )
...
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-07-23 13:35:38 +08:00
Que Haoran
a244453d9e
[Feature] Support inference ppl datasets ( #1315 )
...
* commit inference ppl datasets
* revised format
* revise
* revise
* revise
* revise
* revise
* revise
2024-07-22 17:59:30 +08:00
Xu Song
e9384823f2
Upgrade default math pred_postprocessor
( #1340 )
...
* Change default math postprocessor
* Update math_gen_265cce.py
2024-07-22 14:00:49 +08:00
Songyang Zhang
96f644de69
[Fix] Update path and folder ( #1344 )
...
* Update path and folder
* Update path
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-21 08:18:14 +08:00
Linchen Xiao
a56678190b
[Feature] CompassBench v1_3 subjective evaluation ( #1341 )
...
* stash files
* compassbench subjective evaluation added
* evaluation update
* remove unneeded content
* fix lint
* update docs
* Update lint
* Update
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 23:12:23 +08:00
liushz
98c58f8a6c
[Feature] Add compassbench knowledge&math part ( #1342 )
...
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Update acclerator
* Update MathBench
* Update accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Update compassbench august wiki&math
* Update compassbench august wiki&math
* Update compassbench august wiki&math
* Update compassbench_aug_gen_068af0.py
* Update compassbench_aug_gen_068af0.py
* Update
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 22:54:46 +08:00
bittersweet1999
1f9f728f22
[Feature] support compassbench Checklist evaluation ( #1339 )
...
* fix pip version
* fix pip version
* support checklist eval
* init
* add lan
* fix typo
2024-07-19 16:40:44 +08:00
Xu Song
0a1c89e618
[Fix] Fix rouge evaluator of rolebench_zh ( #1322 )
2024-07-16 16:18:13 +08:00
bittersweet1999
8e7ad2e981
[Fix] add bc for alignbench summarizer ( #1306 )
...
* fix pip version
* fix pip version
* fix alignbench
* fix import error
2024-07-12 11:06:20 +08:00
bittersweet1999
889e7e1140
[Fix] Change abbr for arenahard dataset ( #1302 )
...
* fix pip version
* fix pip version
* change abbr for arenahard
2024-07-11 12:42:03 +08:00
Fengzhe Zhou
1d3a26c732
[Doc] quick start swap tabs ( #1263 )
...
* [doc] quick start swap tabs
* update docs
* update
* update
* update
* update
* update
* update
* update
2024-07-05 23:51:42 +08:00
bittersweet1999
68ca48496b
[Refactor] Reorganize subjective eval ( #1284 )
...
* fix pip version
* fix pip version
* reorganize subjective eval
* reorg sub
* reorg subeval
* reorg subeval
* update subjective doc
* reorg subeval
* reorg subeval
2024-07-05 22:11:37 +08:00
liushz
fc2c9dea8c
Update MathBench summarizer & fix cot setting ( #1282 )
...
* Update MathBench
* Update MathBench
* Update MathBench
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-07-01 21:51:17 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 ( #1279 )
2024-06-28 14:16:34 +08:00
klein
1fa62c4a42
Support wildbench ( #1266 )
...
Co-authored-by: Leymore <zfz-960727@163.com>
2024-06-24 13:16:27 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo ( #1224 )
...
* add fofo dataset
* add dataset fofo
2024-06-06 11:40:48 +08:00
Xingyuan Bu
02a0a4e857
MT-Bench-101 ( #1215 )
...
* add mt-bench-101
* add readme and requirements
* add mt-bench-101 data
* Update readme_mtbench101.md
* update readme
* update leaderboard
* fix typo
* Update readme_mtbench101.md
* fit newest opencompass
* update readme.md
* mtbench101 to opencompass
* mtbench101 to opencompass
* for code review
* for code review
* for code review
* hook
* hook
---------
Co-authored-by: liujie <ljie@buaa.edu.cn>
2024-06-03 14:52:12 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format ( #1214 )
2024-05-30 00:21:58 +08:00
Fengzhe Zhou
d59189b87f
[Doc] Update running command in README ( #1206 )
2024-05-30 00:06:39 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version ( #1204 )
2024-05-28 23:09:59 +08:00
Fengzhe Zhou
9fa80b0f93
[Feat] Update charm summary ( #1194 )
2024-05-27 16:17:01 +08:00
jxd
608ff5810d
support CHARM ( https://github.com/opendatalab/CHARM ) reasoning tasks ( #1190 )
...
* support CHARM (https://github.com/opendatalab/CHARM ) reasoning tasks
* fix lint error
* add dataset card for CHARM
* minor refactor
* add txt
---------
Co-authored-by: wujiang <wujiang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-27 13:48:22 +08:00
bittersweet1999
07a6dacf33
fix length ( #1180 )
2024-05-24 23:30:01 +08:00
klein
5eb8f14d97
[Fix] Fix drop_gen.py ( #1191 )
...
Fix the bug in drop_gen: wrong import
2024-05-24 23:17:50 +08:00
liushz
1448be00e2
Update MathBench ( #1176 )
...
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Update acclerator
* Update MathBench
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-21 14:45:43 +08:00
Fengzhe Zhou
2b3d4150f3
[Sync] update evaluator ( #1175 )
2024-05-21 14:22:46 +08:00
Fengzhe Zhou
5de85406ce
[Sync] add OC16 entry ( #1171 )
2024-05-17 16:50:58 +08:00
Fengzhe Zhou
62dbf04708
[Sync] update github workflow ( #1156 )
2024-05-14 22:42:23 +08:00
Fengzhe Zhou
aa2dd2b58c
[Format] Add config lints ( #892 )
2024-05-14 15:35:58 +08:00
Xu Song
3dbba11945
[Feat] Support dataset_suffix check for mixed configs ( #973 )
...
* [Feat] Support dataset_suffix check for mixed configs
* update mixed suffix
* update suffix
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-14 15:03:28 +08:00