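| Dimension | Task | Datasets |
|---|---|---|
| Language | Word definition | WiC, SummEdits |
| Language | Idiom learning | CHID |
| Language | Semantic similarity | AFQMC, BUSTM |
| Language | Coreference resolution | CLUEWSC, WSC, WinoGrande |
| Language | Translation | Flores, IWSLT2017 |
| Language | Multilingual question answering | TyDi-QA, XCOPA |
| Language | Multilingual summarization | XLSum |
| Knowledge | Knowledge question answering | BoolQ, CommonSenseQA, NaturalQuestions, TriviaQA |
| Reasoning | Textual entailment | CMNLI, OCNLI, OCNLI_FC, AX-b, AX-g, CB, RTE, ANLI |
| Reasoning | Commonsense reasoning | StoryCloze, COPA, ReCoRD, HellaSwag, PIQA, SIQA |
| Reasoning | Mathematical reasoning | MATH, GSM8K |
| Reasoning | Theorem application | TheoremQA, StrategyQA, SciBench |
| Reasoning | Comprehensive reasoning | BBH |
| Examination | Junior high, high school, university, and professional examinations | C-Eval, AGIEval, MMLU, GAOKAO-Bench, CMMLU, ARC, Xiezhi |
| Examination | Medical examinations | CMB |
| Understanding | Reading comprehension | C3, CMRC, DRCD, MultiRC, RACE, DROP, OpenBookQA, SQuAD2.0 |
| Understanding | Content summarization | CSL, LCSTS, XSum, SummScreen |
| Understanding | Content analysis | EPRSTMT, LAMBADA, TNEWS |
| Long context | Long-context understanding | LEval, LongBench, GovReports, NarrativeQA, Qasper |
| Safety | Safety | CivilComments, CrowsPairs, CValues, JigsawMultilingual, TruthfulQA |
| Safety | Robustness | AdvGLUE |
| Code | Code | HumanEval, HumanEvalX, MBPP, APPs, DS1000 |
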

| Type | Models |
|---|---|
| Open-source models | [InternLM](https://github.com/InternLM/InternLM), [LLaMA](https://github.com/facebookresearch/llama), [LLaMA3](https://github.com/meta-llama/llama3), [Vicuna](https://github.com/lm-sys/FastChat), [Alpaca](https://github.com/tatsu-lab/stanford_alpaca), [Baichuan](https://github.com/baichuan-inc), [WizardLM](https://github.com/nlpxucan/WizardLM), [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B), [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B), [TigerBot](https://github.com/TigerResearch/TigerBot), [Qwen](https://github.com/QwenLM/Qwen), [BlueLM](https://github.com/vivo-ai-lab/BlueLM), [Gemma](https://huggingface.co/google/gemma-7b), …… |
| API models | OpenAI, Gemini, Claude, ZhipuAI (ChatGLM), Baichuan, ByteDance (YunQue), Huawei (PanGu), 360, Baidu (ERNIEBot), MiniMax (ABAB-Chat), SenseTime (nova), Xunfei (Spark), …… |