adv_glue
[Feat] support adv_glue dataset for adversarial robustness ( #205 )
2023-08-16 18:42:06 +08:00
agieval
[Sync] update ( #517 )
2023-10-27 20:31:22 +08:00
anli
[Feature] Add Xiezhi SQuAD2.0 ANLI ( #101 )
2023-08-10 14:04:18 +08:00
anthropics_evals
[Feat] support antropics evals dataset ( #422 )
2023-09-20 18:36:44 +08:00
apps
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
ARC_c
[Feature] Contamination analysis for MMLU, Hellaswag, and ARC_c ( #699 )
2024-01-08 15:51:48 +08:00
ARC_e
[Feat] update postprocessor to get first option more accurately ( #193 )
2023-08-11 17:33:00 +08:00
bbh
[Sync] Sync Internal ( #941 )
2024-03-04 14:42:36 +08:00
ceval
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
ChemBench
[Feature] Add ChemBench ( #1032 )
2024-04-12 08:46:26 +08:00
CIBench
Update CIBench ( #1089 )
2024-04-26 18:46:02 +08:00
civilcomments
[Feat] add safety to collections ( #185 )
2023-08-11 11:19:26 +08:00
clozeTest_maxmin
[Feature] Add py150 and maxmin ( #562 )
2023-11-09 22:05:25 +08:00
CLUE_afqmc
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
CLUE_C3
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
CLUE_cmnli
[Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes ( #625 )
2023-11-23 14:05:59 +08:00
CLUE_CMRC
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
CLUE_DRCD
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
CLUE_ocnli
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
cmb
[Feature] Update cmb ( #571 )
2023-11-13 00:09:05 +08:00
cmmlu
[Sync] Sync Internal ( #941 )
2024-03-04 14:42:36 +08:00
collections
[Sync] deprecate old mbpps ( #1064 )
2024-04-19 20:49:46 +08:00
commonsenseqa
[Sync] some renaming ( #641 )
2023-11-27 16:06:49 +08:00
commonsenseqa_cn
[Feature] Add Chinese version: commonsenseqa, crowspairs and nq ( #144 )
2023-11-30 15:33:02 +08:00
contamination
[Sync] Merge branch 'dev' into zfz/update-keyset-demo ( #876 )
2024-02-05 23:29:10 +08:00
crowspairs
[Feature] Add LEval datasets
2023-08-11 17:38:31 +08:00
crowspairs_cn
[Feature] Add Chinese version: commonsenseqa, crowspairs and nq ( #144 )
2023-11-30 15:33:02 +08:00
cvalues
[Feature] Add LEval datasets
2023-08-11 17:38:31 +08:00
drop
[Fix] fix typos in drop prompt ( #773 )
2024-01-08 14:22:35 +08:00
ds1000
[Sync] minor test ( #683 )
2023-12-11 17:42:53 +08:00
FewCLUE_bustm
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
FewCLUE_chid
[Feature] Add logger info and remove dataset bugs ( #61 )
2023-07-17 14:26:30 +08:00
FewCLUE_cluewsc
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
FewCLUE_csl
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
FewCLUE_eprstmt
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
FewCLUE_ocnli_fc
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
FewCLUE_tnews
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
FinanceIQ
[Fix] FinanceIQ_datasets import error ( #939 )
2024-03-05 20:32:24 +08:00
flames
[Feature] add support for Flames datasets ( #1093 )
2024-04-28 18:56:24 +08:00
flores
[Feature] Use dataset in local path ( #570 )
2023-11-13 13:00:37 +08:00
game24
[Feature] Add and apply update suffix tool ( #280 )
2023-08-28 17:35:04 +08:00
GaokaoBench
[Sync] Sync Internal ( #941 )
2024-03-04 14:42:36 +08:00
GLUE_CoLA
[Docs] fix dataset name error ( #533 )
2023-11-10 18:54:20 +08:00
GLUE_MRPC
[Sync] update model configs ( #574 )
2023-11-13 15:15:34 +08:00
GLUE_QQP
[Refactor] Move fix_id_list to Retriever ( #442 )
2023-10-07 12:53:41 +08:00
govrepcrs
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
gpqa
[Feature] Add gpqa prompt from simple_evals, openai ( #1080 )
2024-04-26 20:13:00 +08:00
gsm8k
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
gsm8k_contamination
[Sync] minor test ( #683 )
2023-12-11 17:42:53 +08:00
gsm_hard
[Feature] Add GSM_Hard dataset ( #619 )
2023-11-27 17:40:34 +08:00
hellaswag
[Sync] Sync Internal ( #941 )
2024-03-04 14:42:36 +08:00
humaneval
Add humaneval prompt from simple_evals, openai ( #1076 )
2024-04-24 17:40:50 +08:00
humaneval_cn
[Feat] update code config ( #749 )
2023-12-29 18:46:34 +08:00
humaneval_multi
[feat] support multipl-e ( #846 )
2024-02-06 23:30:28 +08:00
humaneval_plus
[Fix] fix a bug of humanevalplus config ( #944 )
2024-03-05 11:37:17 +08:00
humanevalx
[Sync] Add InternLM2 Keyset Evaluation Demo ( #807 )
2024-01-17 13:48:12 +08:00
hungarian_exam
[Sync] Sync with internal codes 2023.01.08 ( #777 )
2024-01-08 14:07:24 +00:00
IFEval
[Fix] fix ifeval ( #909 )
2024-02-23 16:52:03 +08:00
infinitebench
[Feature] Add InfiniteBench ( #739 )
2023-12-26 15:36:27 +08:00
iwslt2017
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
jigsawmultilingual
[Feat] add safety to collections ( #185 )
2023-08-11 11:19:26 +08:00
kaoshi
[Feature] Add kaoshi dataset ( #392 )
2023-09-22 18:46:33 +08:00
lambada
[Feature] Use dataset in local path ( #570 )
2023-11-13 13:00:37 +08:00
lawbench
[Feature] Add lawbench ( #460 )
2023-10-13 06:51:36 -05:00
lcsts
[Fix] Use jieba rouge in lcsts ( #459 )
2023-10-09 10:10:33 +08:00
leval
[Sync] Update LongEval ( #443 )
2023-09-27 16:32:40 +08:00
longbench
[Sync] Sync with internal codes 2023.01.08 ( #777 )
2024-01-08 14:07:24 +00:00
lveval
[Feature] add lveval benchmark ( #914 )
2024-03-04 11:22:03 +08:00
mastermath2024v1
[Feature] Add new dataset mastermath2024v1 ( #744 )
2024-01-01 15:53:24 +08:00
math
[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README ( #1103 )
2024-04-28 21:58:58 +08:00
math401
[Sync] Sync with internal codes 2023.01.08 ( #777 )
2024-01-08 14:07:24 +00:00
MathBench
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
mbpp
[Sync] deprecate old mbpps ( #1064 )
2024-04-19 20:49:46 +08:00
mbpp_cn
[Sync] deprecate old mbpps ( #1064 )
2024-04-19 20:49:46 +08:00
mbpp_plus
[Sync] deprecate old mbpps ( #1064 )
2024-04-19 20:49:46 +08:00
MedBench
[Fix] Update MedBench ( #845 )
2024-01-26 17:56:13 +08:00
mmlu
[Sync] Sync Internal ( #941 )
2024-03-04 14:42:36 +08:00
narrativeqa
Align prompt files with their hash ( #1 )
2023-07-05 18:28:58 +08:00
needlebench
[Sync] deprecate old mbpps ( #1064 )
2024-04-19 20:49:46 +08:00
NPHardEval
Support NPHardEval ( #835 )
2024-02-05 15:52:28 +08:00
nq
[Sync] Sync Internal ( #941 )
2024-03-04 14:42:36 +08:00
nq_cn
[Feature] Add Chinese version: commonsenseqa, crowspairs and nq ( #144 )
2023-11-30 15:33:02 +08:00
obqa
[Feature] Use dataset in local path ( #570 )
2023-11-13 13:00:37 +08:00
OpenFinData
[Feature] Support OpenFinData ( #896 )
2024-02-29 12:55:07 +08:00
piqa
[Feature] Use dataset in local path ( #570 )
2023-11-13 13:00:37 +08:00
PJExam
Align prompt files with their hash ( #1 )
2023-07-05 18:28:58 +08:00
promptbench
[Feat] implementation for support promptbench ( #239 )
2023-09-15 15:06:53 +08:00
py150
[Feature] Add py150 and maxmin ( #562 )
2023-11-09 22:05:25 +08:00
qabench
Align prompt files with their hash ( #1 )
2023-07-05 18:28:58 +08:00
qasper
Align prompt files with their hash ( #1 )
2023-07-05 18:28:58 +08:00
qaspercut
Align prompt files with their hash ( #1 )
2023-07-05 18:28:58 +08:00
QuALITY
[Feature] Add the implement of QuALITY datasets ( #976 )
2024-03-15 21:22:38 +08:00
race
[Sync] Sync Internal ( #941 )
2024-03-04 14:42:36 +08:00
realtoxicprompts
[Feat] add safety to collections ( #185 )
2023-08-11 11:19:26 +08:00
rolebench
added rolebench dataset. ( #633 )
2023-12-01 22:54:42 +08:00
safety
Align prompt files with their hash ( #1 )
2023-07-05 18:28:58 +08:00
scibench
add evaluation of scibench ( #393 )
2023-09-22 17:42:08 +08:00
siqa
[Sync] Merge branch 'dev' into zfz/update-keyset-demo ( #876 )
2024-02-05 23:29:10 +08:00
squad20
[Feature] Add Xiezhi SQuAD2.0 ANLI ( #101 )
2023-08-10 14:04:18 +08:00
storycloze
[Feature] Use dataset in local path ( #570 )
2023-11-13 13:00:37 +08:00
strategyqa
[Feature] Use dataset in local path ( #570 )
2023-11-13 13:00:37 +08:00
subjective
[Feature] support arenahard evaluation ( #1096 )
2024-04-26 15:42:00 +08:00
summedits
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
summscreen
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
SuperGLUE_AX_b
[Feat] update postprocessor to get first option more accurately ( #193 )
2023-08-11 17:33:00 +08:00
SuperGLUE_AX_g
[Feat] update postprocessor to get first option more accurately ( #193 )
2023-08-11 17:33:00 +08:00
SuperGLUE_BoolQ
[Feature] add llama-oriented dataset configs ( #82 )
2023-08-11 12:48:05 +08:00
SuperGLUE_CB
[Feat] update postprocessor to get first option more accurately ( #193 )
2023-08-11 17:33:00 +08:00
SuperGLUE_COPA
[Feat] update postprocessor to get first option more accurately ( #193 )
2023-08-11 17:33:00 +08:00
SuperGLUE_MultiRC
[Feat] update postprocessor to get first option more accurately ( #193 )
2023-08-11 17:33:00 +08:00
SuperGLUE_ReCoRD
[Feature] Add qwen & qwen-chat support ( #286 )
2023-08-31 11:29:05 +08:00
SuperGLUE_RTE
[Feat] update postprocessor to get first option more accurately ( #193 )
2023-08-11 17:33:00 +08:00
SuperGLUE_WiC
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
SuperGLUE_WSC
[Fix] Fix typo in WSC prompt ( #520 )
2023-10-30 12:16:26 +08:00
SVAMP
[Feature] Add SVAMP dataset ( #604 )
2023-11-22 14:54:39 +08:00
TabMWP
[fFeat] Add an opensource dataset Tabmwp ( #505 )
2023-11-03 11:15:46 +08:00
taco
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
teval
[Sync] Merge branch 'dev' into zfz/update-keyset-demo ( #876 )
2024-02-05 23:29:10 +08:00
TheoremQA
[Feature] Add TheoremQA with 5-shot ( #1048 )
2024-04-22 15:22:04 +08:00
triviaqa
[Sync] Sync Internal ( #941 )
2024-03-04 14:42:36 +08:00
triviaqarc
Align prompt files with their hash ( #1 )
2023-07-05 18:28:58 +08:00
truthfulqa
[Feat] refine docs and codes for more user guides ( #409 )
2023-09-18 16:12:13 +08:00
tydiqa
update word spell ( #594 )
2023-11-15 15:23:58 +08:00
wikibench
[Feature] Add wikibench dataset ( #655 )
2023-12-01 14:56:54 +08:00
wikitext
[SIG] add WikiText-2&103 ( #397 )
2023-09-26 14:31:15 +08:00
winograd
Align prompt files with their hash ( #1 )
2023-07-05 18:28:58 +08:00
winogrande
[Sync] update 20240308 ( #953 )
2024-03-11 22:34:19 +08:00
XCOPA
Align prompt files with their hash ( #1 )
2023-07-05 18:28:58 +08:00
xiezhi
[Feature] Add Xiezhi SQuAD2.0 ANLI ( #101 )
2023-08-10 14:04:18 +08:00
XLSum
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
Xsum
Update configs ( #9 )
2023-07-06 12:27:41 +08:00
z_bench
[Feature] Add and apply update suffix tool ( #280 )
2023-08-28 17:35:04 +08:00