Mirror of https://github.com/open-compass/opencompass.git, synced 2025-05-30 16:03:24 +08:00.
[Fix] Fix NeedleBench Summarizer Typo (#1125)

* update needleinahaystack eval docs
* update needlebench summarizer
* fix english docs typo

This commit is contained in:
parent 826d8307ac
commit cb080fa7de
```diff
@@ -107,14 +107,14 @@ def create_summarizer(context_lengths, depths, dataset_size,
             'Multi-Needle-Reasoning(M-RS)',
             'Multi-Needle-Reasoning-EN',
             'Multi-Needle-Reasoning-ZH',
-            '2-Needle-EN-4K',
-            '2-Needle-ZH-4K',
-            '3-Needle-EN-4K',
-            '3-Needle-ZH-4K',
-            '4-Needle-EN-4K',
-            '4-Needle-ZH-4K',
-            '5-Needle-EN-4K',
-            '5-Needle-ZH-4K',
+            f'2-Needle-EN-{dataset_size.upper()}',
+            f'2-Needle-ZH-{dataset_size.upper()}',
+            f'3-Needle-EN-{dataset_size.upper()}',
+            f'3-Needle-ZH-{dataset_size.upper()}',
+            f'4-Needle-EN-{dataset_size.upper()}',
+            f'4-Needle-ZH-{dataset_size.upper()}',
+            f'5-Needle-EN-{dataset_size.upper()}',
+            f'5-Needle-ZH-{dataset_size.upper()}',
         ]
     }
     return summarizer_config
```
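The effect of this change can be sketched in isolation: the subset names are now derived from the `dataset_size` argument instead of a hardcoded `4K` suffix. The helper below is a hypothetical standalone reproduction, not the actual OpenCompass function.

```python
# Hypothetical sketch of the fixed naming scheme: subset names are built
# from dataset_size via f-strings, so sizes other than 4K work correctly.
def needle_subset_names(dataset_size):
    """Build multi-needle subset names for a given dataset size."""
    return [
        f'{n}-Needle-{lang}-{dataset_size.upper()}'
        for n in (2, 3, 4, 5)
        for lang in ('EN', 'ZH')
    ]

print(needle_subset_names('8k'))  # 2-Needle-EN-8K ... 5-Needle-ZH-8K
```

With `dataset_size='4k'` this reproduces the previously hardcoded names, while any other size now yields matching names instead of the stale `-4K` suffix.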
|
````diff
@@ -60,28 +60,42 @@ pip install -e .
 
 We have pre-configured datasets for common text lengths (4k, 8k, 32k, 128k, 200k, 1000k) in `configs/datasets/needlebench`, allowing you to flexibly create datasets that meet your needs by defining related parameters in the configuration files.
 
-### Example of Evaluation
+### Evaluation Example
 
-#### Evaluating using the `InternLM2-7B` model deployed with `LMDeploy`
+#### Evaluating `InternLM2-7B` Model Deployed Using `LMDeploy`
 
-For instance, to evaluate all tasks in NeedleBench-4K using the `InternLM2-7B` model deployed with `LMDeploy`, use the following command line command that calls the predefined model and dataset configuration files, without needing to write additional configuration files:
+For example, to evaluate the `InternLM2-7B` model deployed using `LMDeploy` on all tasks in NeedleBench-4K, you can directly use the following command in the command line. This command calls the pre-defined model and dataset configuration files without needing to write additional configuration files:
+
+##### Local Evaluation
+
+If you are evaluating the model locally, the command below will utilize all available GPUs on your machine. You can limit the GPU access for `OpenCompass` by setting the `CUDA_VISIBLE_DEVICES` environment variable. For instance, using `CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py ...` will only expose the first four GPUs to OpenCompass, ensuring that it does not use more than these four GPUs.
 
 ```bash
-python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+# Local Evaluation
+python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer
 ```
 
-If you only want to test the original Needle In A Haystack task setup, you can change the dataset parameter to `needlebench_single_4k`, such as:
+##### Evaluation on a Slurm Cluster
+
+If using `Slurm`, you can add parameters such as `--slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000`, as shown below:
+
+```bash
+# Slurm Evaluation
+python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+```
+
+##### Evaluating a Subdataset Only
+
+If you only want to test the original NeedleInAHaystack task setup, you could change the dataset parameter to `needlebench_single_4k`, which corresponds to the single needle version of the NeedleInAHaystack test at 4k length:
 
 ```bash
-python run.py --dataset needlebench_single_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+python run.py --dataset needlebench_single_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
 ```
 
-You can also choose sub-datasets, such as changing the `--datasets` parameter to `needlebench_single_4k/needlebench_zh_datasets` for only testing the Chinese version of the single needle task, where the parameter after `/` represents the sub-dataset. You can find the optional sub-dataset variables in `configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py`, such as:
+You can also choose to evaluate a specific subdataset, such as changing the `--datasets` parameter to `needlebench_single_4k/needlebench_zh_datasets` for testing just the Chinese version of the single needle 4K length NeedleInAHaystack task. The parameter after `/` represents the subdataset, which can be found in the dataset variable of `configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py`:
 
 ```bash
 python run.py --dataset needlebench_single_4k/needlebench_zh_datasets --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
 ```
 
 Be sure to install the [LMDeploy](https://github.com/InternLM/lmdeploy) tool before starting the evaluation:
````
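The `CUDA_VISIBLE_DEVICES` mechanism the docs describe can be illustrated with a small sketch. This is an illustrative helper, not OpenCompass code: the variable holds a comma-separated list of physical GPU indices, and a process launched with it set sees only those GPUs (renumbered from 0 by most frameworks).

```python
# Illustrative sketch (not OpenCompass code): parse CUDA_VISIBLE_DEVICES
# the way a CUDA-aware process would interpret it.
def visible_gpus(env):
    """Return the list of physical GPU indices a process may use,
    or None if the variable is unset (no restriction)."""
    value = env.get('CUDA_VISIBLE_DEVICES')
    if value is None:
        return None  # unset: all GPUs are visible
    return [int(i) for i in value.split(',') if i]

print(visible_gpus({'CUDA_VISIBLE_DEVICES': '0,1,2,3'}))  # [0, 1, 2, 3]
```

So `CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py ...` caps the run at the first four GPUs regardless of how many the machine has.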
The same change applied to the Chinese documentation (translated here; `/`-paths, commands, and identifiers unchanged):

````diff
@@ -66,17 +66,33 @@ pip install -e .
 
 For example, to evaluate all tasks of NeedleBench-4K with the `InternLM2-7B` model deployed using `LMDeploy`, you can directly use the following command in the command line. This command calls the pre-defined model and dataset configuration files without needing to write additional configuration files:
 
+##### Local Evaluation
+
+If you are evaluating the model locally, the command below will use all available GPUs on the machine. You can limit `OpenCompass`'s GPU access by setting the `CUDA_VISIBLE_DEVICES` environment variable. For example, `CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py ...` exposes only the first four GPUs to OpenCompass, ensuring that it uses no more than these four GPUs at a time.
 
 ```bash
-python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+# Local Evaluation
+python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer
 ```
 
-If you only want to test the original needle-in-a-haystack task setup, you can change the dataset parameter to `needlebench_single_4k`, for example:
+##### Evaluation on a Slurm Cluster
+
+If using `Slurm`, you can add parameters such as `--slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000`, for example:
+
+```bash
+# Slurm Evaluation
+python run.py --dataset needlebench_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
+```
+
+##### Evaluating a Subdataset Only
+
+If you only want to test the original needle-in-a-haystack task setup, you can change the dataset parameter to `needlebench_single_4k`, which corresponds to the single-needle version of the needle-in-a-haystack test at 4k length:
 
 ```bash
 python run.py --dataset needlebench_single_4k --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
 ```
 
-You can also further select a subdataset, for example by changing the `--datasets` parameter to `needlebench_single_4k/needlebench_zh_datasets` to run only the Chinese version of the single-needle needle-in-a-haystack task. The parameter after `/` denotes the subdataset, and the available subdataset variables can be found in `configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py`, for example:
+You can also further select a subdataset, for example by changing the `--datasets` parameter to `needlebench_single_4k/needlebench_zh_datasets` to run only the Chinese version of the single-needle 4K-length needle-in-a-haystack task. The parameter after `/` denotes the subdataset, and the available subdataset variables can be found in `configs/datasets/needlebench/needlebench_4k/needlebench_single_4k.py`, for example:
 
 ```bash
 python run.py --dataset needlebench_single_4k/needlebench_zh_datasets --models lmdeploy_internlm2_chat_7b --summarizer needlebench/needlebench_4k_summarizer --slurm -p partition_name -q reserved --max-num-workers 32 --max-partition-size 8000
 ```
````
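The `config/variable` convention used by the `--datasets` argument above can be sketched as a simple split on `/`: the part before it names the dataset config file, the part after it names a dataset-list variable inside that file. The helper below is a hypothetical illustration of that convention, not OpenCompass's actual argument parser.

```python
# Hypothetical sketch of the `--datasets config/variable` convention:
# 'needlebench_single_4k/needlebench_zh_datasets' selects the
# needlebench_zh_datasets variable inside the needlebench_single_4k config.
def split_dataset_arg(arg):
    """Split a dataset argument into (config_name, subdataset_variable)."""
    config, _, variable = arg.partition('/')
    return config, (variable or None)  # None: run the whole config

print(split_dataset_arg('needlebench_single_4k/needlebench_zh_datasets'))
```

With no `/` present (e.g. `needlebench_4k`), the whole config's datasets are evaluated.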