From 30a988a6203bc726ad1591bfc054bb1872b0d928 Mon Sep 17 00:00:00 2001
From: Tong Gao <gaotongxiao@gmail.com>
Date: Thu, 6 Jul 2023 15:47:09 +0800
Subject: [PATCH] [Docs] Update dataset docs (#19)

* [Docs] Update dataset docs
---
 README.md                                 |  6 +++---
 README_zh-CN.md                           |  3 ++-
 docs/en/get_started.md                    |  2 +-
 docs/en/index.rst                         |  4 ++--
 docs/en/user_guides/dataset_prepare.md    | 12 +++++++++---
 docs/zh_cn/get_started.md                 |  6 +++---
 docs/zh_cn/user_guides/dataset_prepare.md | 10 ++++++++--
 7 files changed, 28 insertions(+), 15 deletions(-)
diff --git a/README.md b/README.md
index 166b9484..a39c083a 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ English | [简体中文](README_zh-CN.md)
 
 </div>
 
-Welcome to **OpenCompass**! 
+Welcome to **OpenCompass**!
 
 Just like a compass guides us on our journey, OpenCompass will guide you through the complex landscape of evaluating large language models. With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models.
 
@@ -37,7 +37,6 @@ OpenCompass is a one-stop platform for large model evaluation, aiming to provide
 
 We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for community to rank all public models and API models. If you would like to join the evaluation, please provide the model repository URL or a standard API interface to the email address `opencompass@pjlab.org.cn`.
 
-
 [![image](https://github.com/InternLM/OpenCompass/assets/7881589/475b0c8e-28b8-43e9-b2fd-4dd558e22491)](https://opencompass.org.cn/rank)
 
 ## Dataset Support
@@ -289,7 +288,8 @@ git clone https://github.com/InternLM/opencompass opencompass
 cd opencompass
 pip install -e .
 # Download dataset to data/ folder
-# TODO: ....
+wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompassData.zip
+unzip OpenCompassData.zip
 ```
 
 ## Evaluation
diff --git a/README_zh-CN.md b/README_zh-CN.md
index 21371e5a..5e81db6f 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -290,7 +290,8 @@ git clone https://github.com/InternLM/opencompass opencompass
 cd opencompass
 pip install -e .
 # 下载数据集到 data/ 处
-# TODO: ....
+wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompassData.zip
+unzip OpenCompassData.zip
 ```
 
 ## 评测
diff --git a/docs/en/get_started.md b/docs/en/get_started.md
index e773ff31..23c84777 100644
--- a/docs/en/get_started.md
+++ b/docs/en/get_started.md
@@ -60,7 +60,7 @@ Here's a detailed step-by-step explanation of this case study:
 <details>
 <summary>prepare datasets</summary>
 
-The SiQA and PiQA benchmarks can be automatically downloaded through their respective links here and here, so no manual downloading is required here. However, some other datasets may require manual downloads. Please refer to the documentation [Prepare Datasets](docs/zh_cn/user_guides/dataset_prepare.md) for more information.
+The [SiQA](https://huggingface.co/datasets/siqa) and [PiQA](https://huggingface.co/datasets/piqa) benchmarks are downloaded automatically, so no manual download is required here. However, some other datasets may need to be downloaded manually. Please refer to [Prepare Datasets](./user_guides/dataset_prepare.md) for more information.
 
 Create a '.py' configuration file and add the following content:
 
diff --git a/docs/en/index.rst b/docs/en/index.rst
index 0b46b751..6056926c 100644
--- a/docs/en/index.rst
+++ b/docs/en/index.rst
@@ -29,7 +29,7 @@ We always welcome *PRs* and *Issues* for the betterment of MMPretrain.
 .. _UserGuides:
 .. toctree::
    :maxdepth: 1
-   :caption: UserGuides
+   :caption: User Guides
 
    user_guides/config.md
    user_guides/dataset_prepare.md
@@ -40,7 +40,7 @@ We always welcome *PRs* and *Issues* for the betterment of MMPretrain.
 .. _AdvancedGuides:
 .. toctree::
    :maxdepth: 1
-   :caption: AdvancedGuides
+   :caption: Advanced Guides
 
    advanced_guides/new_dataset.md
    advanced_guides/new_model.md
diff --git a/docs/en/user_guides/dataset_prepare.md b/docs/en/user_guides/dataset_prepare.md
index faca61b6..58357882 100644
--- a/docs/en/user_guides/dataset_prepare.md
+++ b/docs/en/user_guides/dataset_prepare.md
@@ -39,11 +39,17 @@ The datasets supported by OpenCompass mainly include two parts:
 
 [Huggingface Dataset](https://huggingface.co/datasets) provides a large number of datasets. OpenCompass has supported most of the datasets commonly used for performance comparison, please refer to `configs/dataset` for the specific list of supported datasets.
 
-2. OpenCompass Self-built Datasets
+2. Third-party Datasets
 
-In addition to supporting Huggingface's existing datasets, OpenCompass also provides some self-built CN datasets. In the future, a dataset-related link will be provided for users to download and use. Following the instructions in the document to place the datasets uniformly in the `./data` directory can complete dataset preparation.
+In addition to supporting Huggingface's existing datasets, OpenCompass also provides some third-party and self-built datasets. Run the following commands to download these datasets into the `./data` directory, which completes dataset preparation.
 
-It is important to note that the Repo not only contains self-built datasets, but also includes some HF-supported datasets for testing convenience.
+```bash
+# Run in the OpenCompass directory
+wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompassData.zip
+unzip OpenCompassData.zip
+```
+
+Note that the repo contains not only self-built datasets but also some HF-supported datasets, for testing convenience.
 
 ## Dataset Selection
 
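For environments without `wget` or `unzip`, the two commands in the hunk above can be approximated with the Python standard library. This is a minimal sketch, not part of OpenCompass: the helper names `download`, `extract`, and `data_dir_ready` are hypothetical, and it assumes (as the docs imply) that running it from the OpenCompass root unpacks the archive into a top-level `data/` folder.

```python
import os
import urllib.request
import zipfile

# URL copied from the wget command in the docs.
DATA_URL = "https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompassData.zip"


def download(url: str, dest: str) -> str:
    """Hypothetical helper: fetch `url` to `dest`, skipping the download if `dest` exists."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


def extract(zip_path: str, target_dir: str = ".") -> list:
    """Hypothetical helper: unzip into `target_dir`; return sorted top-level entry names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target_dir)
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})


def data_dir_ready(root: str = ".") -> bool:
    """Hypothetical check: True if `<root>/data` exists and is not empty."""
    data = os.path.join(root, "data")
    return os.path.isdir(data) and bool(os.listdir(data))


# Usage (from the OpenCompass root; this downloads a large archive):
# extract(download(DATA_URL, "OpenCompassData.zip"))
# assert data_dir_ready()
```

Skipping the download when the file already exists keeps reruns cheap; delete `OpenCompassData.zip` to force a fresh copy.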
diff --git a/docs/zh_cn/get_started.md b/docs/zh_cn/get_started.md
index 046dfc3c..183f3eeb 100644
--- a/docs/zh_cn/get_started.md
+++ b/docs/zh_cn/get_started.md
@@ -55,7 +55,7 @@ python run.py configs/eval_llama_7b.py --debug
 <details>
 <summary>准备数据集及其配置</summary>
 
-因为 [siqa](https://huggingface.co/datasets/siqa)， [piqa](https://huggingface.co/datasets/piqa) 支持自动下载，所以这里不需要手动下载数据集，但有部分数据集可能需要手动下载，详细查看文档 [准备数据集](docs/zh_cn/user_guides/dataset_prepare.md).
+因为 [siqa](https://huggingface.co/datasets/siqa)， [piqa](https://huggingface.co/datasets/piqa) 支持自动下载，所以这里不需要手动下载数据集，但有部分数据集可能需要手动下载，详细查看文档 [准备数据集](./user_guides/dataset_prepare.md).
 
 创建一个 '.py' 配置文件， 添加以下内容：
 
@@ -66,7 +66,7 @@ with read_base():
     # 直接从预设数据集配置中读取需要的数据集配置
     from .datasets.piqa.piqa_ppl import piqa_datasets
     from .datasets.siqa.siqa_gen import siqa_datasets
-                                          
+
 datasets = [*piqa_datasets, *siqa_datasets]          # 最后 config 需要包含所需的评测数据集列表 datasets
 ```
 
@@ -97,7 +97,7 @@ llama_7b = dict(
         batch_size=16,              # 批次大小
         run_cfg=dict(num_gpus=1),   # 运行配置，用于指定资源需求
     )
- 
+
 models = [llama_7b]                                     # 最后 config 需要包含所需的模型列表 models
 ```
 
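The config comments above say that the final config must define a `datasets` list and a `models` list. The sketch below mimics that shape in plain Python so it runs without OpenCompass or `read_base()`. `piqa_datasets` and `siqa_datasets` are hypothetical stand-ins with an illustrative `abbr` field, and `check_config` is a hypothetical helper, not an OpenCompass API.

```python
# Hypothetical stand-ins for the lists the real config imports inside
# `with read_base():`; actual OpenCompass entries carry many more fields.
piqa_datasets = [dict(abbr="piqa")]
siqa_datasets = [dict(abbr="siqa")]

# Star-unpacking flattens both presets into a single list of dataset configs.
datasets = [*piqa_datasets, *siqa_datasets]

# Model entry trimmed to the fields shown in the docs (`abbr` is illustrative).
llama_7b = dict(abbr="llama-7b", batch_size=16, run_cfg=dict(num_gpus=1))
models = [llama_7b]


def check_config(namespace: dict) -> bool:
    """Hypothetical check: require non-empty `datasets` and `models` lists."""
    for key in ("datasets", "models"):
        value = namespace.get(key)
        if not isinstance(value, list) or not value:
            raise ValueError(f"config must define a non-empty list named {key!r}")
    return True


check_config(globals())
```

Using `[*a, *b]` rather than `[a, b]` matters: the latter would produce a list of two lists instead of one flat list of dataset configs.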
diff --git a/docs/zh_cn/user_guides/dataset_prepare.md b/docs/zh_cn/user_guides/dataset_prepare.md
index 989fdb78..9c4a8eaf 100644
--- a/docs/zh_cn/user_guides/dataset_prepare.md
+++ b/docs/zh_cn/user_guides/dataset_prepare.md
@@ -39,9 +39,15 @@ OpenCompass 支持的数据集主要包括两个部分：
 
 [Huggingface Dataset](https://huggingface.co/datasets) 提供了大量的数据集。OpenCompass 已经支持了大多数常用于性能比较的数据集，具体支持的数据集列表请直接在 `configs/dataset` 下进行查找。
 
-2. OpenCompass 自建数据集
+2. 第三方数据集
 
-除了支持 Huggingface 已有的数据集， OpenCompass 还提供了一些自建CN数据集，未来将会提供一个数据集相关的链接供用户下载使用。按照文档指示将数据集统一放置在`./data`目录下即可完成数据集准备。
+除了支持 Huggingface 已有的数据集， OpenCompass 还提供了一些第三方数据集及自建CN数据集。运行以下命令，将数据集统一下载并放置在`./data`目录下即可完成数据集准备。
+
+```bash
+# 在 OpenCompass 目录下运行
+wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompassData.zip
+unzip OpenCompassData.zip
+```
 
 需要注意的是，Repo中不仅包含自建的数据集，为了方便也加入了部分HF已支持的数据集方便测试。