Mirror of https://github.com/open-compass/opencompass.git, synced 2025-05-30 16:03:24 +08:00.
[Fix] Fix NeedleBench Summarizer Typo (#1125)

* update needleinahaystack eval docs
* update needlebench summarizer
* fix english docs typo

This commit is contained in:
parent 826d8307ac
commit cb080fa7de
```diff
@@ -107,14 +107,14 @@ def create_summarizer(context_lengths, depths, dataset_size,
             'Multi-Needle-Reasoning(M-RS)',
             'Multi-Needle-Reasoning-EN',
             'Multi-Needle-Reasoning-ZH',
-            '2-Needle-EN-4K',
-            '2-Needle-ZH-4K',
-            '3-Needle-EN-4K',
-            '3-Needle-ZH-4K',
-            '4-Needle-EN-4K',
-            '4-Needle-ZH-4K',
-            '5-Needle-EN-4K',
-            '5-Needle-ZH-4K',
+            f'2-Needle-EN-{dataset_size.upper()}',
+            f'2-Needle-ZH-{dataset_size.upper()}',
+            f'3-Needle-EN-{dataset_size.upper()}',
+            f'3-Needle-ZH-{dataset_size.upper()}',
+            f'4-Needle-EN-{dataset_size.upper()}',
+            f'4-Needle-ZH-{dataset_size.upper()}',
+            f'5-Needle-EN-{dataset_size.upper()}',
+            f'5-Needle-ZH-{dataset_size.upper()}',
         ]
     }
     return summarizer_config
```
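The effect of this change can be sketched in isolation: the subset names are now derived from the `dataset_size` argument instead of a hardcoded `4K` suffix. The helper below is a hypothetical standalone reproduction, not the actual OpenCompass function.

```python
# Hypothetical sketch of the fixed naming scheme: subset names are built
# from dataset_size via f-strings, so sizes other than 4K work correctly.
def needle_subset_names(dataset_size):
    """Build multi-needle subset names for a given dataset size."""
    return [
        f'{n}-Needle-{lang}-{dataset_size.upper()}'
        for n in (2, 3, 4, 5)
        for lang in ('EN', 'ZH')
    ]

print(needle_subset_names('8k'))  # 2-Needle-EN-8K ... 5-Needle-ZH-8K
```

With `dataset_size='4k'` this reproduces the previously hardcoded names, while any other size now yields matching names instead of the stale `-4K` suffix.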
|
````diff
@@ -60,28 +60,42 @@ pip install -e .
 
 We have pre-configured datasets for common text lengths (4k, 8k, 32k, 128k, 200k, 1000k) in `configs/datasets/needlebench`, allowing you to flexibly create datasets that meet your needs by defining related parameters in the configuration files.
 
-### Example of Evaluation
+### Evaluation Example
 
-#### Evaluating using the `InternLM2-7B` model deployed with `LMDeploy`
+#### Evaluating `InternLM2-7B` Model Deployed Using `LMDeploy`
 
-For instance, to evaluate all tasks in NeedleBench-4K using the `InternLM2-7B` model deployed with `LMDeploy`, use the following command line command that calls the predefined model and dataset configuration files, without needing to write additional configuration files:
+For example, to evaluate the `InternLM2-7B` model deployed using `LMDeploy` on all tasks in NeedleBench-4K, you can directly use the following command in the command line. This command calls the pre-defined model and dataset configuration files without needing to write additional configuration files:
+
+##### Local Evaluation
+
+If you are evaluating the model locally, the command below will utilize all available GPUs on your machine. You can limit the GPU access for `OpenCompass` by setting the `CUDA_VISIBLE_DEVICES` environment variable. For instance, using `CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py ...` will only expose the first four GPUs to OpenCompass, ensuring that it does not use more than these four GPUs.
 
 ```bash
-python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+# Local Evaluation
+python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer
 ```
 
-If you only want to test the original Needle In A Haystack task setup, you can change the dataset parameter to `needlebench_single_4k`, such as:
+##### Evaluation on a Slurm Cluster
+
+If using `Slurm`, you can add parameters such as `--slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000`, as shown below:
+
+```bash
+# Slurm Evaluation
+python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+```
+
+##### Evaluating a Subdataset Only
+
+If you only want to test the original NeedleInAHaystack task setup, you could change the dataset parameter to `needlebench_single_4k`, which corresponds to the single needle version of the NeedleInAHaystack test at 4k length:
 
 ```bash
-python run.py --dataset needlebench_single_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+python run.py --dataset needlebench_single_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
 ```
 
-You can also choose sub-datasets, such as changing the `--datasets` parameter to `needlebench_single_4k/needlebench_zh_datasets` for only testing the Chinese version of the single needle task, where the parameter after `/` represents the sub-dataset. You can find the optional sub-dataset variables in `configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py`, such as:
+You can also choose to evaluate a specific subdataset, such as changing the `--datasets` parameter to `needlebench_single_4k/needlebench_zh_datasets` for testing just the Chinese version of the single needle 4K length NeedleInAHaystack task. The parameter after `/` represents the subdataset, which can be found in the dataset variable of `configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py`:
 
 ```bash
 python run.py --dataset needlebench_single_4k/needlebench_zh_datasets --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
 ```
 
 Be sure to install the [LMDeploy](https://github.com/InternLM/lmdeploy) tool before starting the evaluation:
````
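The `CUDA_VISIBLE_DEVICES` mechanism the docs describe can be illustrated with a small sketch. This is an illustrative helper, not OpenCompass code: the variable holds a comma-separated list of physical GPU indices, and a process launched with it set sees only those GPUs (renumbered from 0 by most frameworks).

```python
# Illustrative sketch (not OpenCompass code): parse CUDA_VISIBLE_DEVICES
# the way a CUDA-aware process would interpret it.
def visible_gpus(env):
    """Return the list of physical GPU indices a process may use,
    or None if the variable is unset (no restriction)."""
    value = env.get('CUDA_VISIBLE_DEVICES')
    if value is None:
        return None  # unset: all GPUs are visible
    return [int(i) for i in value.split(',') if i]

print(visible_gpus({'CUDA_VISIBLE_DEVICES': '0,1,2,3'}))  # [0, 1, 2, 3]
```

So `CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py ...` caps the run at the first four GPUs regardless of how many the machine has.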
The same change applied to the Chinese documentation (translated here; `/`-paths, commands, and identifiers unchanged):

````diff
@@ -66,17 +66,33 @@ pip install -e .
 
 For example, to evaluate all tasks of NeedleBench-4K with the `InternLM2-7B` model deployed using `LMDeploy`, you can directly use the following command in the command line. This command calls the pre-defined model and dataset configuration files without needing to write additional configuration files:
 
+##### Local Evaluation
+
+If you are evaluating the model locally, the command below will use all available GPUs on the machine. You can limit `OpenCompass`'s GPU access by setting the `CUDA_VISIBLE_DEVICES` environment variable. For example, `CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py ...` exposes only the first four GPUs to OpenCompass, ensuring that it uses no more than these four GPUs at a time.
 
 ```bash
-python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+# Local Evaluation
+python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer
 ```
 
-If you only want to test the original needle-in-a-haystack task setup, you can change the dataset parameter to `needlebench_single_4k`, for example:
+##### Evaluation on a Slurm Cluster
+
+If using `Slurm`, you can add parameters such as `--slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000`, for example:
+
+```bash
+# Slurm Evaluation
+python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+```
+
+##### Evaluating a Subdataset Only
+
+If you only want to test the original needle-in-a-haystack task setup, you can change the dataset parameter to `needlebench_single_4k`, which corresponds to the single-needle version of the needle-in-a-haystack test at 4k length:
 
 ```bash
 python run.py --dataset needlebench_single_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
 ```
 
-You can also further select a subdataset, for example by changing the `--datasets` parameter to `needlebench_single_4k/needlebench_zh_datasets` to run only the Chinese version of the single-needle needle-in-a-haystack task. The parameter after `/` denotes the subdataset, and the available subdataset variables can be found in `configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py`, for example:
+You can also further select a subdataset, for example by changing the `--datasets` parameter to `needlebench_single_4k/needlebench_zh_datasets` to run only the Chinese version of the single-needle 4K-length needle-in-a-haystack task. The parameter after `/` denotes the subdataset, and the available subdataset variables can be found in `configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py`, for example:
 
 ```bash
 python run.py --dataset needlebench_single_4k/needlebench_zh_datasets --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
 ```
````
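The `config/variable` convention used by the `--datasets` argument above can be sketched as a simple split on `/`: the part before it names the dataset config file, the part after it names a dataset-list variable inside that file. The helper below is a hypothetical illustration of that convention, not OpenCompass's actual argument parser.

```python
# Hypothetical sketch of the `--datasets config/variable` convention:
# 'needlebench_single_4k/needlebench_zh_datasets' selects the
# needlebench_zh_datasets variable inside the needlebench_single_4k config.
def split_dataset_arg(arg):
    """Split a dataset argument into (config_name, subdataset_variable)."""
    config, _, variable = arg.partition('/')
    return config, (variable or None)  # None: run the whole config

print(split_dataset_arg('needlebench_single_4k/needlebench_zh_datasets'))
```

With no `/` present (e.g. `needlebench_4k`), the whole config's datasets are evaluated.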