Commit Graph

550 Commits

Author SHA1 Message Date
bittersweet1999
31afe87026
fix yi-chat template (#1178) 2024-05-21 18:14:12 +08:00
liushz
1448be00e2
Update MathBench (#1176)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Update acclerator

* Update MathBench

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-21 14:45:43 +08:00
Fengzhe Zhou
2b3d4150f3
[Sync] update evaluator (#1175) 2024-05-21 14:22:46 +08:00
zhulinJulia24
296ea59931
Update daily-run-test.yml (#1173) 2024-05-20 14:04:58 +08:00
Fengzhe Zhou
5de85406ce
[Sync] add OC16 entry (#1171) 2024-05-17 16:50:58 +08:00
zhulinJulia24
94eb90569f
update test workflow (#1167)
* Update pr-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update pr-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update daily-run-test.yml

* Update oc_score_baseline.yaml

* Update daily-run-test.yml

* Update oc_score_assert.py

---------

Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
2024-05-16 15:32:57 +08:00
Fengzhe Zhou
8ea2c404d7
[Feat] enable HuggingFacewithChatTemplate with --accelerator via cli (#1163)
* enable HuggingFacewithChatTemplate with --accelerator via cli

* rm vllm_internlm2_chat_7b
2024-05-15 21:51:07 +08:00
liushz
e3c0448bbc
Update accelerator (#1152)
* Update acclerator

* update run

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2024-05-15 14:31:47 +08:00
Fengzhe Zhou
f10dd48f9c
[Fix] Update stop_words in huggingface_above_v4_33 (#1160) 2024-05-15 14:10:33 +08:00
Fengzhe Zhou
80f831b425
[Fix] use ProcessPoolExecutor during mbpp eval (#1159) 2024-05-15 13:48:29 +08:00
bittersweet1999
8a8987be0b
fix arenahard summarizer (#1154)
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-15 13:31:29 +08:00
Fengzhe Zhou
62dbf04708
[Sync] update github workflow (#1156) 2024-05-14 22:42:23 +08:00
Fengzhe Zhou
aa2dd2b58c
[Format] Add config lints (#892) 2024-05-14 15:35:58 +08:00
Xu Song
3dbba11945
[Feat] Support dataset_suffix check for mixed configs (#973)
* [Feat] Support dataset_suffix check for mixed configs

* update mixed suffix

* update suffix

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-14 15:03:28 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template (#1098)
* add TheoremQA with 5-shot

* add huggingface_above_v4_33 classes

* use num_worker partitioner in cli

* update theoremqa

* update TheoremQA

* add TheoremQA

* rename theoremqa -> TheoremQA

* update TheoremQA output path

* rewrite many model configs

* update huggingface

* further update

* refine configs

* update configs

* update configs

* add configs/eval_llama3_instruct.py

* add summarizer multi faceted

* update bbh datasets

* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py

* rename class

* update readme

* update hf above v4.33
2024-05-14 14:50:16 +08:00
Mo Li
6c711cb262
[Fix] Fix Needlebench Summarizer (#1143)
* update few-shot example

* add 128k
2024-05-13 15:59:34 +08:00
bittersweet1999
5432dfc1ff
fix multiround (#1146) 2024-05-13 15:58:39 +08:00
bittersweet1999
833a35140b
[Fix] fix alpacaeval while add caching path (#1139)
* fix alpacaeval

* fix alpacaeval
2024-05-11 14:02:26 +08:00
Fengzhe Zhou
19d7e630d6
[Sync] Update accelerator (#1122)
(cherry picked from commit 4beb6d9ab655d8a626971841b7acfd9fae9d438f)

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-05-09 14:32:31 +08:00
Alexander Lam
a71122ee18
[Feature] Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs (#1123)
* added qwen moe and mixtral 8x22 model configs

* updated README files news section
2024-05-09 11:04:26 +08:00
Mo Li
cb080fa7de
[Fix] Fix NeedleBench Summarizer Typo (#1125)
* update needleinahaystack eval docs

* update needlebench summarizer

* fix english docs typo
2024-05-08 20:00:15 +08:00
bittersweet1999
826d8307ac
fix links (#1120) 2024-05-08 15:13:18 +08:00
JuhaoLiang
d2c40e5648
[Feature] Add AceGPT-MMLUArabic benchmark (#1099)
* add AceGPT-MMLUArabic benchmark

* update readme and fix lint issue

* remove unused package

* add MMLUArabic zero-shot settings

* rename filename and update readme
2024-05-08 15:00:26 +08:00
Fangyu Lei
862044fb7d
[Feature] Add S3Eval Dataset (#916)
* s3eval_branch

* update s3eval
2024-05-06 19:41:52 +08:00
Xu Song
d501710155
[Fix] Fix AGIEval chinese sets (#972)
* [Fix] Fix AGIEval chinese sets

* Create agieval_gen_617738.py

* [Fix] Fix AGIEval chinese sets

* Restore agieval_gen_64afd3.py

* Update agieval_gen.py

* Create agieval_mixed_0fa998.py

* Update agieval_mixed.py
2024-05-06 15:31:42 +08:00
Yggdrasill7D6
af10ecc272
add mgsm datasets (#1081)
* add mgsm datasets

* fix lint

* fix lint

* update mgsm

* update mgsm

* ease code spell

* update

* update

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 15:29:34 +08:00
klein
153c4fc988
[Feature] update drop dataset from openai simple eval (#1092)
* [Feature] update drop dataset from openai simple eval

* update drop template presentation

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-06 13:37:08 +08:00
Fengzhe Zhou
d43392a3bb
[Feature] Add mmlu prompt from simple_evals, openai (#1074)
* add mmlu prompt from simple_evals, openai

* return empty str on failure
2024-05-06 13:26:26 +08:00
Yang Yong
53fe390454
fix LightllmApi workers bug (#1113) 2024-04-30 22:09:22 +08:00
Fengzhe Zhou
baed2ed9b8
update pre-commit (#891) 2024-04-30 10:59:41 +08:00
Alexander Lam
35c94d0cde
[Feature] Adding support for LLM Compression Evaluation (#1108)
* fixed formatting based on pre-commit tests

* fixed typo in comments; reduced the number of models in the eval config

* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset

* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
2024-04-30 10:51:01 +08:00
Ikko Eltociear Ashimine
9c79224b39
[Docs] Update README.md (#1110)
requiresments -> requirements
2024-04-30 00:45:33 +08:00
bittersweet1999
3de48e9b35
[Bug] Fix CMB dataset (#1106) 2024-04-30 00:33:43 +08:00
Songyang Zhang
063f5f5f49
[Update] Update performance of common benchmarks (#1109)
* [Update] Update performance of common benchmarks

* [Update] Update performance of common benchmarks

* [Update] Update performance of common benchmarks
2024-04-30 00:09:08 +08:00
liushz
a6f67e1a65
[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103)
* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Add Math Evaluation with Judge Model Evaluator

* Fix Llama-3 meta template

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

* Fix MATH with JudgeLM Evaluation

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-04-28 21:58:58 +08:00
bittersweet1999
0b7de67c4a
fix prompt template (#1104) 2024-04-28 21:54:30 +08:00
Lyu Han
1013dce60c
adapt to lmdeploy v0.4.0 (#1073)
* adapt to lmdeploy v0.4.0

* compatible
2024-04-28 19:57:40 +08:00
Yggdrasill7D6
58a57a4c45
[Feature] add support for Flames datasets (#1093)
* add flames datasets

* fix lint

* rm quota

* add judgemodel info and fix os path

* support flames dataset

* support flames dataset

---------

Co-authored-by: bittersweet1999 <1487910649@qq.com>
2024-04-28 18:56:24 +08:00
Mo Li
76dd814c4d
[Doc] Update NeedleInAHaystack Docs (#1102)
* update NeedleInAHaystack Test Docs

* update docs
2024-04-28 18:51:47 +08:00
dmitrysarov
cce5b6fbb6
fix output typing, change mutable list to immutable tuple (#989)
* fix output typing, change mutable list to immutable tuple

* import missed type

* format

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 23:07:34 +08:00
binary-husky
701ecbb292
[Fix] python path bug (#1063)
* fix relative path bug

* format

---------

Co-authored-by: hmp <505030475@qq.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:58:45 +08:00
Wang Xingjin
048d41a1c4
add vllm get_ppl (#1003)
* add vllm get_ppl

* add vllm get_ppl

* format

---------

Co-authored-by: xingjin.wang <xingjin.wang@mihoyo.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:31:56 +08:00
Haodong Duan
3a232db471
[Deperecate] Remove multi-modal related stuff (#1072)
* Remove MultiModal

* update index.rst

* update README

* remove mmbench codes

* update news

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:20:14 +08:00
Francis-llgg
f1ee11de14
[Feature] Add gpqa prompt from simple_evals, openai (#1080)
* add gpqa_openai_simple_eval

* 触发CI构建

* reorg

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 20:13:00 +08:00
klein
e4830a6926
Update CIBench (#1089)
* modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4

* update cibench: dataset and evluation

* cibench summarizer bug

* update cibench

* move extract_code import

---------

Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 18:46:02 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation (#1096)
* support arenahard

* support arenahard

* support arenahard
2024-04-26 15:42:00 +08:00
bittersweet1999
6ba1c4937d
[Feature] Support Math evaluation via judgemodel (#1094)
* support openai math evaluation

* support openai math evaluation

* support openai math evaluation

* support math llm judge

* support math llm judge
2024-04-26 14:56:23 +08:00
Jingming Zhuo
41196c48ae
Add humaneval prompt from simple_evals, openai (#1076)
* [Feature] Add IFEval

* add humaneval prompt from simple_evals, openai
2024-04-24 17:40:50 +08:00
liushz
17735f0c13
Fix Llama-3 meta template (#1079)
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-04-24 16:46:25 +08:00
Ke Bao
81d0e4d793
[Feature] Add lmdeploy tis python backend model (#1014)
* add lmdeploy tis python backend model

* fix pr check

* update
2024-04-23 14:27:11 +08:00