# Evaluation with LMDeploy

We now support evaluating models accelerated by [LMDeploy](https://github.com/InternLM/lmdeploy). LMDeploy is a toolkit designed for compressing, deploying, and serving LLMs, and it delivers remarkable inference performance. This guide illustrates how to evaluate a model with LMDeploy support in OpenCompass.

## Setup

### Install OpenCompass

Please follow the [instructions](https://opencompass.readthedocs.io/en/latest/get_started/installation.html) to install OpenCompass and prepare the evaluation datasets.
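
After installation, it may help to confirm that the package is importable before moving on. The snippet below is an optional sanity check, not part of the official instructions; it only assumes OpenCompass was installed via pip:

```python
# Optional sanity check: confirm OpenCompass is importable and report
# the installed version (uses only the standard library for the lookup).
from importlib.metadata import version

import opencompass  # raises ImportError if the installation failed

print(version('opencompass'))
```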

### Install LMDeploy

Install lmdeploy via pip (Python 3.8+):

```shell
pip install lmdeploy
```

The default prebuilt package is compiled with CUDA 12. If you need a CUDA 11+ build instead, install lmdeploy with:

```shell
export LMDEPLOY_VERSION=0.6.0
export PYTHON_VERSION=310
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
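
Either way, you can verify the result from Python before moving on. A small, optional check (the CUDA version shown assumes the cu118 wheel from above):

```python
# Optional sanity check: print the lmdeploy version (it should match
# LMDEPLOY_VERSION above) and the CUDA runtime that PyTorch was built
# against (e.g. '11.8' for the cu118 wheel).
import lmdeploy
import torch

print(lmdeploy.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
```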

## Evaluation

When evaluating a model, you need to prepare an evaluation configuration that specifies the evaluation datasets, the model, and the inference parameters.

Taking [internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) as an example, the evaluation config is as follows:

```python
# configure the datasets
from mmengine.config import read_base

with read_base():
    # choose a list of datasets
    from .datasets.mmlu.mmlu_gen_a484b3 import mmlu_datasets
    from .datasets.ceval.ceval_gen_5f30c7 import ceval_datasets
    from .datasets.triviaqa.triviaqa_gen_2121ce import triviaqa_datasets
    from opencompass.configs.datasets.gsm8k.gsm8k_0shot_v2_gen_a58960 import \
        gsm8k_datasets
    # and output the results in a chosen format
    from .summarizers.medium import summarizer

# gather all the *_datasets lists imported above into a single list
datasets = sum((v for k, v in locals().items() if k.endswith('_datasets')), [])

# configure lmdeploy
from opencompass.models import TurboMindModelwithChatTemplate

# configure the model
models = [
    dict(
        type=TurboMindModelwithChatTemplate,
        abbr='internlm2-chat-7b-lmdeploy',
        # model path, which can be the ID of a model repository on the
        # Hugging Face Hub or a local path
        path='internlm/internlm2-chat-7b',
        # inference backend of LMDeploy. It can be either 'turbomind' or
        # 'pytorch'. If the model is not supported by 'turbomind', it falls
        # back to 'pytorch'
        backend='turbomind',
        # for the detailed engine config and generation config, please refer to
        # https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/messages.py
        engine_config=dict(tp=1),
        gen_config=dict(do_sample=False),
        # the max size of the context window
        max_seq_len=7168,
        # the max number of new tokens
        max_out_len=1024,
        # the max number of prompts that LMDeploy receives in the
        # `generate` function
        batch_size=5000,
        run_cfg=dict(num_gpus=1),
    )
]
```
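
Before launching a full evaluation, it can be useful to smoke-test the LMDeploy backend on its own. The sketch below uses lmdeploy's `pipeline` API directly, outside of OpenCompass; the prompt is illustrative, and the engine and generation settings mirror the config above:

```python
# A minimal standalone smoke test of the turbomind backend; not required
# for the evaluation itself. The settings mirror engine_config/gen_config
# from the OpenCompass config above.
from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline

pipe = pipeline('internlm/internlm2-chat-7b',
                backend_config=TurbomindEngineConfig(tp=1, session_len=7168))
outputs = pipe(['Briefly introduce yourself.'],
               gen_config=GenerationConfig(max_new_tokens=64, do_sample=False))
print(outputs[0].text)
```

If this prints a sensible reply, the backend and the model weights are set up correctly.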

Place the above configuration in a file, e.g. `configs/eval_internlm2_lmdeploy.py`. Then, from the root folder of OpenCompass, start the evaluation with the following command:

```shell
python run.py configs/eval_internlm2_lmdeploy.py -w outputs
```

Once inference and evaluation complete, the results are written to the work directory specified by `-w`, here `outputs/`.