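| Language | Knowledge | Reasoning | Examination |
| --- | --- | --- | --- |
| **Word Definition**<br>- WiC<br>- SummEdits<br><br>**Idiom Learning**<br>- CHID<br><br>**Semantic Similarity**<br>- AFQMC<br>- BUSTM<br><br>**Coreference Resolution**<br>- CLUEWSC<br>- WSC<br>- WinoGrande<br><br>**Translation**<br>- Flores<br>- IWSLT2017<br><br>**Multilingual Question Answering**<br>- TyDi-QA<br>- XCOPA<br><br>**Multilingual Summarization**<br>- XLSum | **Knowledge Question Answering**<br>- BoolQ<br>- CommonSenseQA<br>- NaturalQuestions<br>- TriviaQA | **Textual Entailment**<br>- CMNLI<br>- OCNLI<br>- OCNLI_FC<br>- AX-b<br>- AX-g<br>- CB<br>- RTE<br>- ANLI<br><br>**Commonsense Reasoning**<br>- StoryCloze<br>- COPA<br>- ReCoRD<br>- HellaSwag<br>- PIQA<br>- SIQA<br><br>**Mathematical Reasoning**<br>- MATH<br>- GSM8K<br><br>**Theorem Application**<br>- TheoremQA<br>- StrategyQA<br>- SciBench<br><br>**Comprehensive Reasoning**<br>- BBH | **Junior High, High School, University, and Professional Exams**<br>- C-Eval<br>- AGIEval<br>- MMLU<br>- GAOKAO-Bench<br>- CMMLU<br>- ARC<br>- Xiezhi<br><br>**Medical Exams**<br>- CMB |
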
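| Understanding | Long Context | Safety | Code |
| --- | --- | --- | --- |
| **Reading Comprehension**<br>- C3<br>- CMRC<br>- DRCD<br>- MultiRC<br>- RACE<br>- DROP<br>- OpenBookQA<br>- SQuAD2.0<br><br>**Content Summarization**<br>- CSL<br>- LCSTS<br>- XSum<br>- SummScreen<br><br>**Content Analysis**<br>- EPRSTMT<br>- LAMBADA<br>- TNEWS | **Long Context Understanding**<br>- LEval<br>- LongBench<br>- GovReports<br>- NarrativeQA<br>- Qasper | **Safety**<br>- CivilComments<br>- CrowsPairs<br>- CValues<br>- JigsawMultilingual<br>- TruthfulQA<br><br>**Robustness**<br>- AdvGLUE | **Code**<br>- HumanEval<br>- HumanEvalX<br>- MBPP<br>- APPs<br>- DS1000 |
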
| Open-source Models | API Models |
| --- | --- |
| - InternLM<br>- LLaMA<br>- Vicuna<br>- Alpaca<br>- Baichuan<br>- WizardLM<br>- ChatGLM2<br>- Falcon<br>- TigerBot<br>- Qwen<br>- … | - OpenAI<br>- Claude<br>- PaLM (coming soon)<br>- … |