diff --git a/README.md b/README.md
index 887fcb4c..736968eb 100644
--- a/README.md
+++ b/README.md
@@ -279,263 +279,13 @@ OpenCompass is a one-stop platform for large model evaluation, aiming to provide
 ## 📖 Dataset Support
-
-| Language | Knowledge | Reasoning | Examination |
-
-
+Please refer to the dataset statistics chapter of the [official documentation](https://opencompass.org.cn/doc) for details.
-Word Definition
+The documentation on the OpenCompass website provides a statistical list of all datasets that can be used on this platform.

-- WiC
-- SummEdits

+You can quickly find the dataset you need in the list by sorting, filtering, and searching.

-
-
-
-Idiom Learning
-
-- CHID
-
-Semantic Similarity
-
-- AFQMC
-- BUSTM
-
-Coreference Resolution
-
-- CLUEWSC
-- WSC
-- WinoGrande
-
-Translation
-
-- Flores
-- IWSLT2017
-
-Multi-language Question Answering
-
-- TyDi-QA
-- XCOPA
-
-Multi-language Summary
-
-- XLSum
-
-
-
-
-Knowledge Question Answering
-
-- BoolQ
-- CommonSenseQA
-- NaturalQuestions
-- TriviaQA
-
-
-
-
-
-Textual Entailment
-
-- CMNLI
-- OCNLI
-- OCNLI_FC
-- AX-b
-- AX-g
-- CB
-- RTE
-- ANLI
-
-Commonsense Reasoning
-
-- StoryCloze
-- COPA
-- ReCoRD
-- HellaSwag
-- PIQA
-- SIQA
-
-Mathematical Reasoning
-
-- MATH
-- GSM8K
-
-Theorem Application
-
-- TheoremQA
-- StrategyQA
-- SciBench
-
-Comprehensive Reasoning
-
-- BBH
-
-
-
-
-
-Junior High, High School, University, Professional Examinations
-
-- C-Eval
-- AGIEval
-- MMLU
-- GAOKAO-Bench
-- CMMLU
-- ARC
-- Xiezhi
-
-Medical Examinations
-
-- CMB
-
-
-| Understanding | Long Context | Safety | Code |
-
-
-
-Reading Comprehension
-
-- C3
-- CMRC
-- DRCD
-- MultiRC
-- RACE
-- DROP
-- OpenBookQA
-- SQuAD2.0
-
-Content Summary
-
-- CSL
-- LCSTS
-- XSum
-- SummScreen
-
-Content Analysis
-
-- EPRSTMT
-- LAMBADA
-- TNEWS
-
-
-
-
-Long Context Understanding
-
-- LEval
-- LongBench
-- GovReports
-- NarrativeQA
-- Qasper
-
-
-
-
-Safety
-
-- CivilComments
-- CrowsPairs
-- CValues
-- JigsawMultilingual
-- TruthfulQA
-
-Robustness
-
-- AdvGLUE
-
-
-
-
-Code
-
-- HumanEval
-- HumanEvalX
-- MBPP
-- APPs
-- DS1000
-
-
-| Language | Knowledge | Reasoning | Examination |
-
-
-
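-Word Definition
+The documentation on the OpenCompass website provides a statistical list of all datasets that can be used on this platform.

-- WiC
-- SummEdits

+You can quickly find the dataset you need in the list by sorting, filtering, and searching.

-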
-
-
-Idiom Learning
-
-- CHID
-
-Semantic Similarity
-
-- AFQMC
-- BUSTM
-
-Coreference Resolution
-
-- CLUEWSC
-- WSC
-- WinoGrande
-
-Translation
-
-- Flores
-- IWSLT2017
-
-Multi-language Question Answering
-
-- TyDi-QA
-- XCOPA
-
-Multi-language Summary
-
-- XLSum
-
-
-
-
-Knowledge Question Answering
-
-- BoolQ
-- CommonSenseQA
-- NaturalQuestions
-- TriviaQA
-
-
-
-
-
-Textual Entailment
-
-- CMNLI
-- OCNLI
-- OCNLI_FC
-- AX-b
-- AX-g
-- CB
-- RTE
-- ANLI
-
-Commonsense Reasoning
-
-- StoryCloze
-- COPA
-- ReCoRD
-- HellaSwag
-- PIQA
-- SIQA
-
-Mathematical Reasoning
-
-- MATH
-- GSM8K
-
-Theorem Application
-
-- TheoremQA
-- StrategyQA
-- SciBench
-
-Comprehensive Reasoning
-
-- BBH
-
-
-
-
-
-Junior High, High School, University, Professional Examinations
-
-- C-Eval
-- AGIEval
-- MMLU
-- GAOKAO-Bench
-- CMMLU
-- ARC
-- Xiezhi
-
-Medical Examinations
-
-- CMB
-
-
-| Understanding | Long Context | Safety | Code |
-
-
-
-Reading Comprehension
-
-- C3
-- CMRC
-- DRCD
-- MultiRC
-- RACE
-- DROP
-- OpenBookQA
-- SQuAD2.0
-
-Content Summary
-
-- CSL
-- LCSTS
-- XSum
-- SummScreen
-
-Content Analysis
-
-- EPRSTMT
-- LAMBADA
-- TNEWS
-
-
-
-
-Long Context Understanding
-
-- LEval
-- LongBench
-- GovReports
-- NarrativeQA
-- Qasper
-
-
-
-
-Safety
-
-- CivilComments
-- CrowsPairs
-- CValues
-- JigsawMultilingual
-- TruthfulQA
-
-Robustness
-
-- AdvGLUE
-
-
-
-
-Code
-
-- HumanEval
-- HumanEvalX
-- MBPP
-- APPs
-- DS1000
-
-