Xu Song
e9384823f2
Upgrade default math pred_postprocessor (#1340)
* Change default math postprocessor
* Update math_gen_265cce.py
2024-07-22 14:00:49 +08:00
Songyang Zhang
96f644de69
[Fix] Update path and folder (#1344)
* Update path and folder
* Update path
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-21 08:18:14 +08:00
Linchen Xiao
a56678190b
[Feature] CompassBench v1_3 subjective evaluation (#1341)
* stash files
* compassbench subjective evaluation added
* evaluation update
* remove unneeded content
* fix lint
* update docs
* Update lint
* Update
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 23:12:23 +08:00
liushz
98c58f8a6c
[Feature] Add compassbench knowledge&math part (#1342)
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Update accelerator
* Update MathBench
* Update accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Add Doc for accelerator
* Update compassbench august wiki&math
* Update compassbench august wiki&math
* Update compassbench august wiki&math
* Update compassbench_aug_gen_068af0.py
* Update compassbench_aug_gen_068af0.py
* Update
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2024-07-19 22:54:46 +08:00
bittersweet1999
1f9f728f22
[Feature] support compassbench Checklist evaluation (#1339)
* fix pip version
* fix pip version
* support checklist eval
* init
* add lan
* fix typo
2024-07-19 16:40:44 +08:00
Xu Song
0a1c89e618
[Fix] Fix rouge evaluator of rolebench_zh (#1322)
2024-07-16 16:18:13 +08:00
bittersweet1999
8e7ad2e981
[Fix] add bc for alignbench summarizer (#1306)
* fix pip version
* fix pip version
* fix alignbench
* fix import error
2024-07-12 11:06:20 +08:00
bittersweet1999
889e7e1140
[Fix] Change abbr for arenahard dataset (#1302)
* fix pip version
* fix pip version
* change abbr for arenahard
2024-07-11 12:42:03 +08:00
Fengzhe Zhou
1d3a26c732
[Doc] quick start swap tabs (#1263)
* [doc] quick start swap tabs
* update docs
* update
* update
* update
* update
* update
* update
* update
2024-07-05 23:51:42 +08:00
bittersweet1999
68ca48496b
[Refactor] Reorganize subjective eval (#1284)
* fix pip version
* fix pip version
* reorganize subjective eval
* reorg sub
* reorg subeval
* reorg subeval
* update subjective doc
* reorg subeval
* reorg subeval
2024-07-05 22:11:37 +08:00
liushz
fc2c9dea8c
Update MathBench summarizer & fix cot setting (#1282)
* Update MathBench
* Update MathBench
* Update MathBench
---------
Co-authored-by: liushz <liuhongwei@pjlab.rog.cn>
2024-07-01 21:51:17 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 (#1279)
2024-06-28 14:16:34 +08:00
klein
1fa62c4a42
Support wildbench (#1266)
Co-authored-by: Leymore <zfz-960727@163.com>
2024-06-24 13:16:27 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo (#1224)
* add fofo dataset
* add dataset fofo
2024-06-06 11:40:48 +08:00
Xingyuan Bu
02a0a4e857
MT-Bench-101 (#1215)
* add mt-bench-101
* add readme and requirements
* add mt-bench-101 data
* Update readme_mtbench101.md
* update readme
* update leaderboard
* fix typo
* Update readme_mtbench101.md
* fit newest opencompass
* update readme.md
* mtbench101 to opencompass
* mtbench101 to opencompass
* for code review
* for code review
* for code review
* hook
* hook
---------
Co-authored-by: liujie <ljie@buaa.edu.cn>
2024-06-03 14:52:12 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format (#1214)
2024-05-30 00:21:58 +08:00
Fengzhe Zhou
d59189b87f
[Doc] Update running command in README (#1206)
2024-05-30 00:06:39 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version (#1204)
2024-05-28 23:09:59 +08:00
Fengzhe Zhou
9fa80b0f93
[Feat] Update charm summary (#1194)
2024-05-27 16:17:01 +08:00
jxd
608ff5810d
support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks (#1190)
* support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks
* fix lint error
* add dataset card for CHARM
* minor refactor
* add txt
---------
Co-authored-by: wujiang <wujiang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-27 13:48:22 +08:00
bittersweet1999
07a6dacf33
fix length (#1180)
2024-05-24 23:30:01 +08:00
klein
5eb8f14d97
[Fix] Fix drop_gen.py (#1191)
Fix the bug in drop_gen: wrong import
2024-05-24 23:17:50 +08:00
liushz
1448be00e2
Update MathBench (#1176)
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Update accelerator
* Update MathBench
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-21 14:45:43 +08:00
Fengzhe Zhou
2b3d4150f3
[Sync] update evaluator (#1175)
2024-05-21 14:22:46 +08:00
Fengzhe Zhou
5de85406ce
[Sync] add OC16 entry (#1171)
2024-05-17 16:50:58 +08:00
Fengzhe Zhou
62dbf04708
[Sync] update github workflow (#1156)
2024-05-14 22:42:23 +08:00
Fengzhe Zhou
aa2dd2b58c
[Format] Add config lints (#892)
2024-05-14 15:35:58 +08:00
Xu Song
3dbba11945
[Feat] Support dataset_suffix check for mixed configs (#973)
* [Feat] Support dataset_suffix check for mixed configs
* update mixed suffix
* update suffix
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-14 15:03:28 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template (#1098)
* add TheoremQA with 5-shot
* add huggingface_above_v4_33 classes
* use num_worker partitioner in cli
* update theoremqa
* update TheoremQA
* add TheoremQA
* rename theoremqa -> TheoremQA
* update TheoremQA output path
* rewrite many model configs
* update huggingface
* further update
* refine configs
* update configs
* update configs
* add configs/eval_llama3_instruct.py
* add summarizer multi faceted
* update bbh datasets
* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py
* rename class
* update readme
* update hf above v4.33
2024-05-14 14:50:16 +08:00
Mo Li
6c711cb262
[Fix] Fix Needlebench Summarizer (#1143)
* update few-shot example
* add 128k
2024-05-13 15:59:34 +08:00
bittersweet1999
5432dfc1ff
fix multiround (#1146)
2024-05-13 15:58:39 +08:00
JuhaoLiang
d2c40e5648
[Feature] Add AceGPT-MMLUArabic benchmark (#1099)
* add AceGPT-MMLUArabic benchmark
* update readme and fix lint issue
* remove unused package
* add MMLUArabic zero-shot settings
* rename filename and update readme
2024-05-08 15:00:26 +08:00
Fangyu Lei
862044fb7d
[Feature] Add S3Eval Dataset (#916)
* s3eval_branch
* update s3eval
2024-05-06 19:41:52 +08:00
Xu Song
d501710155
[Fix] Fix AGIEval chinese sets (#972)
* [Fix] Fix AGIEval chinese sets
* Create agieval_gen_617738.py
* [Fix] Fix AGIEval chinese sets
* Restore agieval_gen_64afd3.py
* Update agieval_gen.py
* Create agieval_mixed_0fa998.py
* Update agieval_mixed.py
2024-05-06 15:31:42 +08:00
Yggdrasill7D6
af10ecc272
add mgsm datasets (#1081)
* add mgsm datasets
* fix lint
* fix lint
* update mgsm
* update mgsm
* ease code spell
* update
* update
* update
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 15:29:34 +08:00
klein
153c4fc988
[Feature] update drop dataset from openai simple eval (#1092)
* [Feature] update drop dataset from openai simple eval
* update drop template presentation
* update
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 13:37:08 +08:00
Fengzhe Zhou
d43392a3bb
[Feature] Add mmlu prompt from simple_evals, openai (#1074)
* add mmlu prompt from simple_evals, openai
* return empty str on failure
2024-05-06 13:26:26 +08:00
Alexander Lam
35c94d0cde
[Feature] Adding support for LLM Compression Evaluation (#1108)
* fixed formatting based on pre-commit tests
* fixed typo in comments; reduced the number of models in the eval config
* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset
* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
2024-04-30 10:51:01 +08:00
liushz
a6f67e1a65
[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103)
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-04-28 21:58:58 +08:00
Yggdrasill7D6
58a57a4c45
[Feature] add support for Flames datasets (#1093)
* add flames datasets
* fix lint
* rm quota
* add judgemodel info and fix os path
* support flames dataset
* support flames dataset
---------
Co-authored-by: bittersweet1999 <1487910649@qq.com>
2024-04-28 18:56:24 +08:00
Francis-llgg
f1ee11de14
[Feature] Add gpqa prompt from simple_evals, openai (#1080)
* add gpqa_openai_simple_eval
* Trigger CI build
* reorg
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 20:13:00 +08:00
klein
e4830a6926
Update CIBench (#1089)
* modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4
* update cibench: dataset and evaluation
* cibench summarizer bug
* update cibench
* move extract_code import
---------
Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 18:46:02 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation (#1096)
* support arenahard
* support arenahard
* support arenahard
2024-04-26 15:42:00 +08:00
bittersweet1999
6ba1c4937d
[Feature] Support Math evaluation via judgemodel (#1094)
* support openai math evaluation
* support openai math evaluation
* support openai math evaluation
* support math llm judge
* support math llm judge
2024-04-26 14:56:23 +08:00
Jingming Zhuo
41196c48ae
Add humaneval prompt from simple_evals, openai (#1076)
* [Feature] Add IFEval
* add humaneval prompt from simple_evals, openai
2024-04-24 17:40:50 +08:00
Fengzhe Zhou
004ed79593
[Feature] Add TheoremQA with 5-shot (#1048)
* add TheoremQA with 5-shot
* cherry pick from add-huggingface-above-v4.33, good TheoremQA results
2024-04-22 15:22:04 +08:00
bittersweet1999
6f98c8d9ab
[Fix] Fix MultiRound Subjective Evaluation (#1043)
* fix multiround
* fix
2024-04-22 12:06:03 +08:00
Fengzhe Zhou
8c85edd1cd
[Sync] deprecate old mbpps (#1064)
2024-04-19 20:49:46 +08:00
liuwei130
a00e57296f
[Feature] Add ChemBench (#1032)
* add ChemBench
* update results
* molbench -> ChemBench
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-12 08:46:26 +08:00
Fengzhe Zhou
b39f501563
[Sync] update taco (#1030)
2024-04-09 17:50:23 +08:00