Commit Graph

558 Commits

Author SHA1 Message Date
Fengzhe Zhou
2954913d9b
[Sync] bump version (#1204) 2024-05-28 23:09:59 +08:00
liushz
ba620c4afe
Update accelerator (#1195)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

* Update accelerator

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-28 17:17:54 +08:00
Fengzhe Zhou
9fa80b0f93
[Feat] Update charm summary (#1194) 2024-05-27 16:17:01 +08:00
jxd
608ff5810d
support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks (#1190)
* support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks

* fix lint error

* add dataset card for CHARM

* minor refactor

* add txt

---------

Co-authored-by: wujiang <wujiang@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-27 13:48:22 +08:00
bittersweet1999
07a6dacf33
fix length (#1180) 2024-05-24 23:30:01 +08:00
bittersweet1999
88c14d3d04
add support for lmdeploy api judge (#1193) 2024-05-24 23:28:56 +08:00
yaoyingyy
749e4cea71
[Fix] temporary files using tempfile (#1186)
Co-authored-by: yaoying <yaoying@kingsoft.com>
2024-05-24 23:27:37 +08:00
klein
5eb8f14d97
[Fix] Fix drop_gen.py (#1191)
Fix the bug in drop_gen: wrong import
2024-05-24 23:17:50 +08:00
bittersweet1999
31afe87026
fix yi-chat template (#1178) 2024-05-21 18:14:12 +08:00
liushz
1448be00e2
Update MathBench (#1176)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update accelerator

* Update MathBench

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-21 14:45:43 +08:00
Fengzhe Zhou
2b3d4150f3
[Sync] update evaluator (#1175) 2024-05-21 14:22:46 +08:00
zhulinJulia24
296ea59931
Update daily-run-test.yml (#1173) 2024-05-20 14:04:58 +08:00
Fengzhe Zhou
5de85406ce
[Sync] add OC16 entry (#1171) 2024-05-17 16:50:58 +08:00
zhulinJulia24
94eb90569f
update test workflow (#1167)
* Update pr-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update pr-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update oc_score_baseline.yaml

* Update daily-run-test.yml

* Update oc_score_assert.py

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-05-16 15:32:57 +08:00
Fengzhe Zhou
8ea2c404d7
[Feat] enable HuggingFacewithChatTemplate with --accelerator via cli (#1163)
* enable HuggingFacewithChatTemplate with --accelerator via cli

* rm vllm_internlm2_chat_7b
2024-05-15 21:51:07 +08:00
liushz
e3c0448bbc
Update accelerator (#1152)
* Update accelerator

* update run

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2024-05-15 14:31:47 +08:00
Fengzhe Zhou
f10dd48f9c
[Fix] Update stop_words in huggingface_above_v4_33 (#1160) 2024-05-15 14:10:33 +08:00
Fengzhe Zhou
80f831b425
[Fix] use ProcessPoolExecutor during mbpp eval (#1159) 2024-05-15 13:48:29 +08:00
bittersweet1999
8a8987be0b
fix arenahard summarizer (#1154)
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-15 13:31:29 +08:00
Fengzhe Zhou
62dbf04708
[Sync] update github workflow (#1156) 2024-05-14 22:42:23 +08:00
Fengzhe Zhou
aa2dd2b58c
[Format] Add config lints (#892) 2024-05-14 15:35:58 +08:00
Xu Song
3dbba11945
[Feat] Support dataset_suffix check for mixed configs (#973)
* [Feat] Support dataset_suffix check for mixed configs

* update mixed suffix

* update suffix

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-14 15:03:28 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template (#1098)
* add TheoremQA with 5-shot

* add huggingface_above_v4_33 classes

* use num_worker partitioner in cli

* update theoremqa

* update TheoremQA

* add TheoremQA

* rename theoremqa -> TheoremQA

* update TheoremQA output path

* rewrite many model configs

* update huggingface

* further update

* refine configs

* update configs

* update configs

* add configs/eval_llama3_instruct.py

* add summarizer multi faceted

* update bbh datasets

* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py

* rename class

* update readme

* update hf above v4.33
2024-05-14 14:50:16 +08:00
Mo Li
6c711cb262
[Fix] Fix Needlebench Summarizer (#1143)
* update few-shot example

* add 128k
2024-05-13 15:59:34 +08:00
bittersweet1999
5432dfc1ff
fix multiround (#1146) 2024-05-13 15:58:39 +08:00
bittersweet1999
833a35140b
[Fix] fix alpacaeval while add caching path (#1139)
* fix alpacaeval

* fix alpacaeval
2024-05-11 14:02:26 +08:00
Fengzhe Zhou
19d7e630d6
[Sync] Update accelerator (#1122)
(cherry picked from commit 4beb6d9ab655d8a626971841b7acfd9fae9d438f)

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-09 14:32:31 +08:00
Alexander Lam
a71122ee18
[Feature] Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs (#1123)
* added qwen moe and mixtral 8x22 model configs

* updated README files news section
2024-05-09 11:04:26 +08:00
Mo Li
cb080fa7de
[Fix] Fix NeedleBench Summarizer Typo (#1125)
* update needleinahaystack eval docs

* update needlebench summarizer

* fix english docs typo
2024-05-08 20:00:15 +08:00
bittersweet1999
826d8307ac
fix links (#1120) 2024-05-08 15:13:18 +08:00
JuhaoLiang
d2c40e5648
[Feature] Add AceGPT-MMLUArabic benchmark (#1099)
* add AceGPT-MMLUArabic benchmark

* update readme and fix lint issue

* remove unused package

* add MMLUArabic zero-shot settings

* rename filename and update readme
2024-05-08 15:00:26 +08:00
Fangyu Lei
862044fb7d
[Feature] Add S3Eval Dataset (#916)
* s3eval_branch

* update s3eval
2024-05-06 19:41:52 +08:00
Xu Song
d501710155
[Fix] Fix AGIEval chinese sets (#972)
* [Fix] Fix AGIEval chinese sets

* Create agieval_gen_617738.py

* [Fix] Fix AGIEval chinese sets

* Restore agieval_gen_64afd3.py

* Update agieval_gen.py

* Create agieval_mixed_0fa998.py

* Update agieval_mixed.py
2024-05-06 15:31:42 +08:00
Yggdrasill7D6
af10ecc272
add mgsm datasets (#1081)
* add mgsm datasets

* fix lint

* fix lint

* update mgsm

* update mgsm

* ease code spell

* update

* update

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 15:29:34 +08:00
klein
153c4fc988
[Feature] update drop dataset from openai simple eval (#1092)
* [Feature] update drop dataset from openai simple eval

* update drop template presentation

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 13:37:08 +08:00
Fengzhe Zhou
d43392a3bb
[Feature] Add mmlu prompt from simple_evals, openai (#1074)
* add mmlu prompt from simple_evals, openai

* return empty str on failure
2024-05-06 13:26:26 +08:00
Yang Yong
53fe390454
fix LightllmApi workers bug (#1113) 2024-04-30 22:09:22 +08:00
Fengzhe Zhou
baed2ed9b8
update pre-commit (#891) 2024-04-30 10:59:41 +08:00
Alexander Lam
35c94d0cde
[Feature] Adding support for LLM Compression Evaluation (#1108)
* fixed formatting based on pre-commit tests

* fixed typo in comments; reduced the number of models in the eval config

* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset

* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
2024-04-30 10:51:01 +08:00
Ikko Eltociear Ashimine
9c79224b39
[Docs] Update README.md (#1110)
requiresments -> requirements
2024-04-30 00:45:33 +08:00
bittersweet1999
3de48e9b35
[Bug] Fix CMB dataset (#1106) 2024-04-30 00:33:43 +08:00
Songyang Zhang
063f5f5f49
[Update] Update performance of common benchmarks (#1109)
* [Update] Update performance of common benchmarks

* [Update] Update performance of common benchmarks

* [Update] Update performance of common benchmarks
2024-04-30 00:09:08 +08:00
liushz
a6f67e1a65
[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-04-28 21:58:58 +08:00
bittersweet1999
0b7de67c4a
fix prompt template (#1104) 2024-04-28 21:54:30 +08:00
Lyu Han
1013dce60c
adapt to lmdeploy v0.4.0 (#1073)
* adapt to lmdeploy v0.4.0

* compatible
2024-04-28 19:57:40 +08:00
Yggdrasill7D6
58a57a4c45
[Feature] add support for Flames datasets (#1093)
* add flames datasets

* fix lint

* rm quota

* add judgemodel info and fix os path

* support flames dataset

* support flames dataset

---------

Co-authored-by: bittersweet1999 <1487910649@qq.com>
2024-04-28 18:56:24 +08:00
Mo Li
76dd814c4d
[Doc] Update NeedleInAHaystack Docs (#1102)
* update NeedleInAHaystack Test Docs

* update docs
2024-04-28 18:51:47 +08:00
dmitrysarov
cce5b6fbb6
fix output typing, change mutable list to immutable tuple (#989)
* fix output typing, change mutable list to immutable tuple

* import missed type

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 23:07:34 +08:00
binary-husky
701ecbb292
[Fix] python path bug (#1063)
* fix relative path bug

* format

---------

Co-authored-by: hmp <505030475@qq.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:58:45 +08:00
Wang Xingjin
048d41a1c4
add vllm get_ppl (#1003)
* add vllm get_ppl

* add vllm get_ppl

* format

---------

Co-authored-by: xingjin.wang <xingjin.wang@mihoyo.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:31:56 +08:00