adv_glue | [Feat] support adv_glue dataset for adversarial robustness (#205) | 2023-08-16 18:42:06 +08:00
agieval | [Fix] Fix AGIEval chinese sets (#972) | 2024-05-06 15:31:42 +08:00
anli | [Feature] Add Xiezhi SQuAD2.0 ANLI (#101) | 2023-08-10 14:04:18 +08:00
anthropics_evals | [Feat] support antropics evals dataset (#422) | 2023-09-20 18:36:44 +08:00
apps | [Sync] update taco (#1030) | 2024-04-09 17:50:23 +08:00
ARC_c | [Feature] Contamination analysis for MMLU, Hellaswag, and ARC_c (#699) | 2024-01-08 15:51:48 +08:00
ARC_e | [Feat] update postprocessor to get first option more accurately (#193) | 2023-08-11 17:33:00 +08:00
bbh | [Sync] Sync Internal (#941) | 2024-03-04 14:42:36 +08:00
ceval | [Sync] update taco (#1030) | 2024-04-09 17:50:23 +08:00
ChemBench | [Feature] Add ChemBench (#1032) | 2024-04-12 08:46:26 +08:00
CIBench | Update CIBench (#1089) | 2024-04-26 18:46:02 +08:00
civilcomments | [Feat] add safety to collections (#185) | 2023-08-11 11:19:26 +08:00
clozeTest_maxmin | [Feature] Add py150 and maxmin (#562) | 2023-11-09 22:05:25 +08:00
CLUE_afqmc | Update configs (#9) | 2023-07-06 12:27:41 +08:00
CLUE_C3 | Update configs (#9) | 2023-07-06 12:27:41 +08:00
CLUE_cmnli | [Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes (#625) | 2023-11-23 14:05:59 +08:00
CLUE_CMRC | Update configs (#9) | 2023-07-06 12:27:41 +08:00
CLUE_DRCD | Update configs (#9) | 2023-07-06 12:27:41 +08:00
CLUE_ocnli | Update configs (#9) | 2023-07-06 12:27:41 +08:00
cmb | [Feature] Update cmb (#571) | 2023-11-13 00:09:05 +08:00
cmmlu | [Sync] Sync Internal (#941) | 2024-03-04 14:42:36 +08:00
collections | [Sync] deprecate old mbpps (#1064) | 2024-04-19 20:49:46 +08:00
commonsenseqa | [Sync] some renaming (#641) | 2023-11-27 16:06:49 +08:00
commonsenseqa_cn | [Feature] Add Chinese version: commonsenseqa, crowspairs and nq (#144) | 2023-11-30 15:33:02 +08:00
contamination | [Sync] Merge branch 'dev' into zfz/update-keyset-demo (#876) | 2024-02-05 23:29:10 +08:00
crowspairs | [Feature] Add LEval datasets | 2023-08-11 17:38:31 +08:00
crowspairs_cn | [Feature] Add Chinese version: commonsenseqa, crowspairs and nq (#144) | 2023-11-30 15:33:02 +08:00
cvalues | [Feature] Add LEval datasets | 2023-08-11 17:38:31 +08:00
drop | [Feature] update drop dataset from openai simple eval (#1092) | 2024-05-06 13:37:08 +08:00
ds1000 | [Sync] minor test (#683) | 2023-12-11 17:42:53 +08:00
FewCLUE_bustm | Update configs (#9) | 2023-07-06 12:27:41 +08:00
FewCLUE_chid | [Feature] Add logger info and remove dataset bugs (#61) | 2023-07-17 14:26:30 +08:00
FewCLUE_cluewsc | Update configs (#9) | 2023-07-06 12:27:41 +08:00
FewCLUE_csl | Update configs (#9) | 2023-07-06 12:27:41 +08:00
FewCLUE_eprstmt | Update configs (#9) | 2023-07-06 12:27:41 +08:00
FewCLUE_ocnli_fc | Update configs (#9) | 2023-07-06 12:27:41 +08:00
FewCLUE_tnews | Update configs (#9) | 2023-07-06 12:27:41 +08:00
FinanceIQ | [Fix] FinanceIQ_datasets import error (#939) | 2024-03-05 20:32:24 +08:00
flames | [Feature] add support for Flames datasets (#1093) | 2024-04-28 18:56:24 +08:00
flores | [Feature] Use dataset in local path (#570) | 2023-11-13 13:00:37 +08:00
game24 | [Feature] Add and apply update suffix tool (#280) | 2023-08-28 17:35:04 +08:00
GaokaoBench | [Sync] Sync Internal (#941) | 2024-03-04 14:42:36 +08:00
GLUE_CoLA | [Docs] fix dataset name error (#533) | 2023-11-10 18:54:20 +08:00
GLUE_MRPC | [Sync] update model configs (#574) | 2023-11-13 15:15:34 +08:00
GLUE_QQP | [Refactor] Move fix_id_list to Retriever (#442) | 2023-10-07 12:53:41 +08:00
govrepcrs | Update configs (#9) | 2023-07-06 12:27:41 +08:00
gpqa | [Feature] Add gpqa prompt from simple_evals, openai (#1080) | 2024-04-26 20:13:00 +08:00
gsm8k | [Sync] update taco (#1030) | 2024-04-09 17:50:23 +08:00
gsm8k_contamination | [Sync] minor test (#683) | 2023-12-11 17:42:53 +08:00
gsm_hard | [Feature] Add GSM_Hard dataset (#619) | 2023-11-27 17:40:34 +08:00
hellaswag | [Sync] Sync Internal (#941) | 2024-03-04 14:42:36 +08:00
humaneval | Add humaneval prompt from simple_evals, openai (#1076) | 2024-04-24 17:40:50 +08:00
humaneval_cn | [Feat] update code config (#749) | 2023-12-29 18:46:34 +08:00
humaneval_multi | [feat] support multipl-e (#846) | 2024-02-06 23:30:28 +08:00
humaneval_plus | [Fix] fix a bug of humanevalplus config (#944) | 2024-03-05 11:37:17 +08:00
humanevalx | [Sync] Add InternLM2 Keyset Evaluation Demo (#807) | 2024-01-17 13:48:12 +08:00
hungarian_exam | [Sync] Sync with internal codes 2023.01.08 (#777) | 2024-01-08 14:07:24 +00:00
IFEval | [Fix] fix ifeval (#909) | 2024-02-23 16:52:03 +08:00
infinitebench | [Feature] Add InfiniteBench (#739) | 2023-12-26 15:36:27 +08:00
iwslt2017 | Update configs (#9) | 2023-07-06 12:27:41 +08:00
jigsawmultilingual | [Feat] add safety to collections (#185) | 2023-08-11 11:19:26 +08:00
kaoshi | [Feature] Add kaoshi dataset (#392) | 2023-09-22 18:46:33 +08:00
lambada | [Feature] Use dataset in local path (#570) | 2023-11-13 13:00:37 +08:00
lawbench | [Feature] Add lawbench (#460) | 2023-10-13 06:51:36 -05:00
lcsts | [Fix] Use jieba rouge in lcsts (#459) | 2023-10-09 10:10:33 +08:00
leval | [Sync] Update LongEval (#443) | 2023-09-27 16:32:40 +08:00
llm_compression | [Feature] Adding support for LLM Compression Evaluation (#1108) | 2024-04-30 10:51:01 +08:00
longbench | [Sync] Sync with internal codes 2023.01.08 (#777) | 2024-01-08 14:07:24 +00:00
lveval | [Feature] add lveval benchmark (#914) | 2024-03-04 11:22:03 +08:00
mastermath2024v1 | [Feature] Add new dataset mastermath2024v1 (#744) | 2024-01-01 15:53:24 +08:00
math | [Fix] Fix Math Evaluation with Judge Model Evaluator & Add README (#1103) | 2024-04-28 21:58:58 +08:00
math401 | [Sync] Sync with internal codes 2023.01.08 (#777) | 2024-01-08 14:07:24 +00:00
MathBench | [Sync] update taco (#1030) | 2024-04-09 17:50:23 +08:00
mbpp | [Sync] deprecate old mbpps (#1064) | 2024-04-19 20:49:46 +08:00
mbpp_cn | [Sync] deprecate old mbpps (#1064) | 2024-04-19 20:49:46 +08:00
mbpp_plus | [Sync] deprecate old mbpps (#1064) | 2024-04-19 20:49:46 +08:00
MedBench | [Fix] Update MedBench (#845) | 2024-01-26 17:56:13 +08:00
mgsm | add mgsm datasets (#1081) | 2024-05-06 15:29:34 +08:00
mmlu | [Feature] Add mmlu prompt from simple_evals, openai (#1074) | 2024-05-06 13:26:26 +08:00
MMLUArabic | [Feature] Add AceGPT-MMLUArabic benchmark (#1099) | 2024-05-08 15:00:26 +08:00
narrativeqa | Align prompt files with their hash (#1) | 2023-07-05 18:28:58 +08:00
needlebench | [Fix] Fix Needlebench Summarizer (#1143) | 2024-05-13 15:59:34 +08:00
NPHardEval | Support NPHardEval (#835) | 2024-02-05 15:52:28 +08:00
nq | [Sync] Sync Internal (#941) | 2024-03-04 14:42:36 +08:00
nq_cn | [Feature] Add Chinese version: commonsenseqa, crowspairs and nq (#144) | 2023-11-30 15:33:02 +08:00
obqa | [Feature] Use dataset in local path (#570) | 2023-11-13 13:00:37 +08:00
OpenFinData | [Feature] Support OpenFinData (#896) | 2024-02-29 12:55:07 +08:00
piqa | [Feature] Use dataset in local path (#570) | 2023-11-13 13:00:37 +08:00
PJExam | Align prompt files with their hash (#1) | 2023-07-05 18:28:58 +08:00
promptbench | [Feat] implementation for support promptbench (#239) | 2023-09-15 15:06:53 +08:00
py150 | [Feature] Add py150 and maxmin (#562) | 2023-11-09 22:05:25 +08:00
qabench | Align prompt files with their hash (#1) | 2023-07-05 18:28:58 +08:00
qasper | Align prompt files with their hash (#1) | 2023-07-05 18:28:58 +08:00
qaspercut | Align prompt files with their hash (#1) | 2023-07-05 18:28:58 +08:00
QuALITY | [Feature] Add the implement of QuALITY datasets (#976) | 2024-03-15 21:22:38 +08:00
race | [Sync] Sync Internal (#941) | 2024-03-04 14:42:36 +08:00
realtoxicprompts | [Feat] add safety to collections (#185) | 2023-08-11 11:19:26 +08:00
rolebench | added rolebench dataset. (#633) | 2023-12-01 22:54:42 +08:00
s3eval | [Feature] Add S3Eval Dataset (#916) | 2024-05-06 19:41:52 +08:00
safety | Align prompt files with their hash (#1) | 2023-07-05 18:28:58 +08:00
scibench | add evaluation of scibench (#393) | 2023-09-22 17:42:08 +08:00
siqa | [Sync] Merge branch 'dev' into zfz/update-keyset-demo (#876) | 2024-02-05 23:29:10 +08:00
squad20 | [Feature] Add Xiezhi SQuAD2.0 ANLI (#101) | 2023-08-10 14:04:18 +08:00
storycloze | [Feature] Use dataset in local path (#570) | 2023-11-13 13:00:37 +08:00
strategyqa | [Feature] Use dataset in local path (#570) | 2023-11-13 13:00:37 +08:00
subjective | fix multiround (#1146) | 2024-05-13 15:58:39 +08:00
summedits | Update configs (#9) | 2023-07-06 12:27:41 +08:00
summscreen | Update configs (#9) | 2023-07-06 12:27:41 +08:00
SuperGLUE_AX_b | [Feat] update postprocessor to get first option more accurately (#193) | 2023-08-11 17:33:00 +08:00
SuperGLUE_AX_g | [Feat] update postprocessor to get first option more accurately (#193) | 2023-08-11 17:33:00 +08:00
SuperGLUE_BoolQ | [Feature] add llama-oriented dataset configs (#82) | 2023-08-11 12:48:05 +08:00
SuperGLUE_CB | [Feat] update postprocessor to get first option more accurately (#193) | 2023-08-11 17:33:00 +08:00
SuperGLUE_COPA | [Feat] update postprocessor to get first option more accurately (#193) | 2023-08-11 17:33:00 +08:00
SuperGLUE_MultiRC | [Feat] update postprocessor to get first option more accurately (#193) | 2023-08-11 17:33:00 +08:00
SuperGLUE_ReCoRD | [Feature] Add qwen & qwen-chat support (#286) | 2023-08-31 11:29:05 +08:00
SuperGLUE_RTE | [Feat] update postprocessor to get first option more accurately (#193) | 2023-08-11 17:33:00 +08:00
SuperGLUE_WiC | Update configs (#9) | 2023-07-06 12:27:41 +08:00
SuperGLUE_WSC | [Fix] Fix typo in WSC prompt (#520) | 2023-10-30 12:16:26 +08:00
SVAMP | [Feature] Add SVAMP dataset (#604) | 2023-11-22 14:54:39 +08:00
TabMWP | [fFeat] Add an opensource dataset Tabmwp (#505) | 2023-11-03 11:15:46 +08:00
taco | [Sync] update taco (#1030) | 2024-04-09 17:50:23 +08:00
teval | [Sync] Merge branch 'dev' into zfz/update-keyset-demo (#876) | 2024-02-05 23:29:10 +08:00
TheoremQA | [Feature] Add TheoremQA with 5-shot (#1048) | 2024-04-22 15:22:04 +08:00
triviaqa | [Sync] Sync Internal (#941) | 2024-03-04 14:42:36 +08:00
triviaqarc | Align prompt files with their hash (#1) | 2023-07-05 18:28:58 +08:00
truthfulqa | [Feat] refine docs and codes for more user guides (#409) | 2023-09-18 16:12:13 +08:00
tydiqa | update word spell (#594) | 2023-11-15 15:23:58 +08:00
wikibench | [Feature] Add wikibench dataset (#655) | 2023-12-01 14:56:54 +08:00
wikitext | [SIG] add WikiText-2&103 (#397) | 2023-09-26 14:31:15 +08:00
winograd | Align prompt files with their hash (#1) | 2023-07-05 18:28:58 +08:00
winogrande | [Sync] update 20240308 (#953) | 2024-03-11 22:34:19 +08:00
XCOPA | Align prompt files with their hash (#1) | 2023-07-05 18:28:58 +08:00
xiezhi | [Feature] Add Xiezhi SQuAD2.0 ANLI (#101) | 2023-08-10 14:04:18 +08:00
XLSum | Update configs (#9) | 2023-07-06 12:27:41 +08:00
Xsum | Update configs (#9) | 2023-07-06 12:27:41 +08:00
z_bench | [Feature] Add and apply update suffix tool (#280) | 2023-08-28 17:35:04 +08:00