[Fix] Fix NeedleBench Summarizer Typo (#1125)

* Update NeedleInAHaystack eval docs

* Update NeedleBench summarizer

* Fix English docs typo
Mo Li 2024-05-08 20:00:15 +08:00 committed by GitHub
parent 826d8307ac
commit cb080fa7de
3 changed files with 50 additions and 20 deletions


@@ -107,14 +107,14 @@ def create_summarizer(context_lengths, depths, dataset_size,
'Multi-Needle-Reasoning(M-RS)',
'Multi-Needle-Reasoning-EN',
'Multi-Needle-Reasoning-ZH',
-'2-Needle-EN-4K',
-'2-Needle-ZH-4K',
-'3-Needle-EN-4K',
-'3-Needle-ZH-4K',
-'4-Needle-EN-4K',
-'4-Needle-ZH-4K',
-'5-Needle-EN-4K',
-'5-Needle-ZH-4K',
+f'2-Needle-EN-{dataset_size.upper()}',
+f'2-Needle-ZH-{dataset_size.upper()}',
+f'3-Needle-EN-{dataset_size.upper()}',
+f'3-Needle-ZH-{dataset_size.upper()}',
+f'4-Needle-EN-{dataset_size.upper()}',
+f'4-Needle-ZH-{dataset_size.upper()}',
+f'5-Needle-EN-{dataset_size.upper()}',
+f'5-Needle-ZH-{dataset_size.upper()}',
]
}
return summarizer_config
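To make the effect of this change concrete, here is a minimal illustrative sketch (the helper below is hypothetical, not the repository's `create_summarizer`): replacing the hardcoded `4K` suffix with `dataset_size.upper()` means the multi-needle summary group names pick up the correct length tag for any NeedleBench size (4k, 8k, 32k, 128k, 200k, 1000k) instead of always being labelled `-4K`.

```python
# Illustrative sketch only, not the repository's actual function: shows how the
# multi-needle summary group names now follow the dataset_size argument
# instead of the previously hardcoded '4K' suffix.
def multi_needle_group_names(dataset_size):
    return [
        f'{needles}-Needle-{lang}-{dataset_size.upper()}'
        for needles in (2, 3, 4, 5)
        for lang in ('EN', 'ZH')
    ]

print(multi_needle_group_names('4k'))    # ['2-Needle-EN-4K', '2-Needle-ZH-4K', ...]
print(multi_needle_group_names('128k'))  # ['2-Needle-EN-128K', '2-Needle-ZH-128K', ...]
```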


@@ -60,28 +60,42 @@ pip install -e .
We have pre-configured datasets for common text lengths (4k, 8k, 32k, 128k, 200k, 1000k) in `configs/datasets/needlebench`, allowing you to flexibly create datasets that meet your needs by defining related parameters in the configuration files.
-### Example of Evaluation
+### Evaluation Example
-#### Evaluating using the `InternLM2-7B` model deployed with `LMDeploy`
+#### Evaluating `InternLM2-7B` Model Deployed Using `LMDeploy`
-For instance, to evaluate all tasks in NeedleBench-4K using the `InternLM2-7B` model deployed with `LMDeploy`, use the following command line command that calls the predefined model and dataset configuration files, without needing to write additional configuration files:
+For example, to evaluate the `InternLM2-7B` model deployed using `LMDeploy` for all tasks in NeedleBench-4K, you can directly use the following command in the command line. This command calls the pre-defined model and dataset configuration files without needing to write additional configuration files:
+##### Local Evaluation
+If you are evaluating the model locally, the command below will utilize all available GPUs on your machine. You can limit the GPU access for `OpenCompass` by setting the `CUDA_VISIBLE_DEVICES` environment variable. For instance, using `CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py ...` will only expose the first four GPUs to OpenCompass, ensuring that it does not use more than these four GPUs.
```bash
-python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+# Local Evaluation
+python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer
```
-If you only want to test the original Needle In A Haystack task setup, you can change the dataset parameter to `needlebench_single_4k`, such as:
+##### Evaluation on a Slurm Cluster
+If using `Slurm`, you can add parameters such as `--slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000`, as shown below:
```bash
-python run.py --dataset needlebench_single_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --sl
-urm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+# Slurm Evaluation
+python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
```
-You can also choose sub-datasets, such as changing the `--datasets` parameter to `needlebench_single_4k/needlebench_zh_datasets` for only testing the Chinese version of the single needle task, where the parameter after `/` represents the sub-dataset. You can find the optional sub-dataset variables in the `configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py`, such as:
+##### Evaluating a Subdataset Only
+If you only want to test the original NeedleInAHaystack task setup, you could change the dataset parameter to `needlebench_single_4k`, which corresponds to the single needle version of the NeedleInAHaystack test at 4k length:
```bash
+python run.py --dataset needlebench_single_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
```
+You can also choose to evaluate a specific subdataset, such as changing the `--datasets` parameter to `needlebench_single_4k/needlebench_zh_datasets` for testing just the Chinese version of the single needle 4K length NeedleInAHaystack task. The parameter after `/` represents the subdataset, which can be found in the dataset variable of `configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py`:
+```bash
+python run.py --dataset needlebench_single_4k/needlebench_zh_datasets --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+```
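As a side note to the subdataset instructions added above (not part of the diff itself), the sketch below is one way to list which `*_datasets` variables a NeedleBench config file defines, i.e. the names that can follow the `/` in `--dataset`. The config path comes from the docs above; the printed variable names are only examples, since only `needlebench_zh_datasets` is confirmed here.

```python
# Sketch: scan a NeedleBench config file for *_datasets variables, i.e. the
# subdataset names usable after '/' in --dataset. Assumes you run it from the
# OpenCompass repository root; the printed names are examples only.
import re
from pathlib import Path

config = Path("configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py")
names = re.findall(r"^(\w+_datasets)\s*=", config.read_text(), flags=re.MULTILINE)
print(names)  # e.g. ['needlebench_zh_datasets', ...]
```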
Be sure to install the [LMDeploy](https://github.com/InternLM/lmdeploy) tool before starting the evaluation:


@@ -66,17 +66,33 @@ pip install -e .
For example, to evaluate the `InternLM2-7B` model deployed with `LMDeploy` on all tasks in NeedleBench-4K, you can use the following command directly in the command line. It calls the pre-defined model and dataset configuration files without needing to write additional configuration files:
+##### Local Evaluation
+If you are evaluating the model locally, the command below will use all available GPUs on the machine. You can limit `OpenCompass`'s GPU access by setting the `CUDA_VISIBLE_DEVICES` environment variable. For example, `CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py ...` exposes only the first four GPUs to OpenCompass, ensuring it uses no more than these four GPUs at a time.
```bash
+# Local Evaluation
+python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer
```
+##### Evaluation on a Slurm Cluster
+If using `Slurm`, you can add parameters such as `--slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000`, for example:
+```bash
+# Slurm Evaluation
+python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+```
-If you only want to test the original Needle In A Haystack task setup, you can change the dataset parameter to `needlebench_single_4k`, for example:
+##### Evaluating a Subdataset Only
+If you only want to test the original Needle In A Haystack task setup, you can change the dataset parameter to `needlebench_single_4k`, which corresponds to the single-needle version of the Needle In A Haystack test at 4k length:
```bash
python run.py --dataset needlebench_single_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
```
-You can also go further and select a subdataset, for example by changing the `--datasets` parameter to `needlebench_single_4k/needlebench_zh_datasets` to test only the Chinese version of the single-needle Needle In A Haystack task. The parameter after `/` denotes the subdataset; the available subdataset variables can be found in `configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py`, for example:
+You can also go further and select a subdataset, for example by changing the `--datasets` parameter to `needlebench_single_4k/needlebench_zh_datasets` to test only the Chinese version of the single-needle 4K-length Needle In A Haystack task. The parameter after `/` denotes the subdataset; the available subdataset variables can be found in `configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py`, for example:
```bash
python run.py --dataset needlebench_single_4k/needlebench_zh_datasets --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000