# Evaluation with LMDeploy

We now support evaluating models accelerated by [LMDeploy](https://github.com/InternLM/lmdeploy), a toolkit for compressing, deploying, and serving LLMs. **TurboMind** is the efficient inference engine provided by LMDeploy, and OpenCompass is compatible with it. This guide shows how to evaluate a model in OpenCompass with TurboMind.

## Setup

### Install OpenCompass

Please follow the [instructions](https://opencompass.readthedocs.io/en/latest/get_started.html) to install OpenCompass and prepare the evaluation datasets.

### Install LMDeploy

Install LMDeploy via pip (Python 3.8+):

```shell
pip install lmdeploy
```
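Optionally, you can check the environment before running an evaluation. The helper below is a hypothetical sketch (it is not part of OpenCompass or LMDeploy): it checks the Python 3.8+ requirement stated above and whether the `lmdeploy` package is importable, using only the standard library.

```python
import importlib.util
import sys
from typing import List


def check_environment(package: str = "lmdeploy") -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only.
    """
    problems = []
    # LMDeploy is installed via pip on Python 3.8 or newer.
    if sys.version_info < (3, 8):
        problems.append("Python %d.%d is older than the required 3.8" % sys.version_info[:2])
    # find_spec returns None when no importable top-level module has this name.
    if importlib.util.find_spec(package) is None:
        problems.append(f"{package} is not importable; try `pip install {package}`")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```

Running it with `python` from the same environment you will launch OpenCompass from ensures the check reflects that interpreter.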

## Evaluation

OpenCompass integrates TurboMind's Python API for evaluation.

We take InternLM-20B as an example. First, prepare the evaluation config `configs/eval_internlm_turbomind.py`:

```python
from mmengine.config import read_base
from opencompass.models.turbomind import TurboMindModel


with read_base():
    # choose a list of datasets
    from .datasets.mmlu.mmlu_gen_a484b3 import mmlu_datasets
    from .datasets.ceval.ceval_gen_5f30c7 import ceval_datasets
    from .datasets.SuperGLUE_WiC.SuperGLUE_WiC_gen_d06864 import WiC_datasets
    from .datasets.triviaqa.triviaqa_gen_2121ce import triviaqa_datasets
    from .datasets.gsm8k.gsm8k_gen_1d7fe4 import gsm8k_datasets
    from .datasets.humaneval.humaneval_gen_8e312c import humaneval_datasets
    # and output the results in a chosen format
    from .summarizers.medium import summarizer

datasets = sum((v for k, v in locals().items() if k.endswith('_datasets')), [])

# config for internlm-20b model
internlm_20b = dict(
    type=TurboMindModel,
    abbr='internlm-20b-turbomind',
    path="internlm/internlm-20b",  # should match the model ID on Hugging Face
    engine_config=dict(session_len=2048,
                       max_batch_size=8,
                       rope_scaling_factor=1.0),
    # top_k=1 selects the highest-probability token, i.e. greedy decoding
    gen_config=dict(top_k=1, top_p=0.8,
                    temperature=1.0,
                    max_new_tokens=100),
    max_out_len=100,
    max_seq_len=2048,
    batch_size=8,
    concurrency=8,
    run_cfg=dict(num_gpus=1, num_procs=1),
)

models = [internlm_20b]
```
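The `datasets = sum(...)` line collects every imported variable whose name ends in `_datasets` into one flat list; `sum` with a `[]` start value concatenates the lists. Below is a minimal standalone sketch of the same idiom. It uses a plain dict in place of `locals()`, and the dataset entries are made-up placeholders, not the real configs imported through `read_base()`.

```python
def collect_datasets(namespace: dict) -> list:
    """Concatenate every list bound to a name ending in '_datasets'.

    Same expression as in the config above, applied to an explicit dict
    instead of locals().
    """
    return sum((v for k, v in namespace.items() if k.endswith("_datasets")), [])


# Hypothetical placeholders standing in for the imported dataset config lists.
namespace = {
    "mmlu_datasets": [{"abbr": "mmlu-1"}, {"abbr": "mmlu-2"}],
    "gsm8k_datasets": [{"abbr": "gsm8k"}],
    "summarizer": {"dataset_abbrs": []},  # name does not end in '_datasets', so it is skipped
}

datasets = collect_datasets(namespace)
print([d["abbr"] for d in datasets])  # ['mmlu-1', 'mmlu-2', 'gsm8k']
```

Because the filter works on variable names, adding one more `from .datasets... import xxx_datasets` line inside `read_base()` is enough to include another benchmark.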

Then, from the root directory of OpenCompass, start the evaluation with the following command:

```shell
python run.py configs/eval_internlm_turbomind.py -w outputs/turbomind/internlm-20b
```

Once inference and evaluation finish, the results are reported and saved under the work directory specified by `-w`.
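To find the outputs of the most recent run, a small helper can scan the work directory. This is a sketch under an assumption not stated in this guide: that each run is stored in its own subdirectory named with a `YYYYMMDD_HHMMSS` timestamp (e.g. `20240118_144135`). The function `latest_run_dir` is hypothetical and not part of OpenCompass.

```python
import os
import re
from typing import Optional

# Assumed run-directory naming, e.g. 20240118_144135 (YYYYMMDD_HHMMSS).
RUN_DIR_PATTERN = re.compile(r"\d{8}_\d{6}")


def latest_run_dir(work_dir: str) -> Optional[str]:
    """Return the newest timestamp-named subdirectory of `work_dir`, or None.

    Zero-padded YYYYMMDD_HHMMSS names sort chronologically as strings,
    so the lexicographic maximum is the most recent run.
    """
    if not os.path.isdir(work_dir):
        return None
    runs = [
        name
        for name in os.listdir(work_dir)
        if RUN_DIR_PATTERN.fullmatch(name)
        and os.path.isdir(os.path.join(work_dir, name))
    ]
    return os.path.join(work_dir, max(runs)) if runs else None


if __name__ == "__main__":
    run = latest_run_dir("outputs/turbomind/internlm-20b")
    print(run if run else "No runs found under the work directory yet.")
```

If your OpenCompass version lays out the work directory differently, adjust `RUN_DIR_PATTERN` accordingly.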

**Note**:

- For more `engine_config` and `gen_config` arguments in the evaluation config file, refer to [TurbomindEngineConfig](https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html#turbomindengineconfig)
  and [EngineGenerationConfig](https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html#generationconfig).
- To evaluate the InternLM Chat model, use the configuration file `eval_internlm_chat_turbomind.py`.
- To evaluate the InternLM 7B model, edit `eval_internlm_turbomind.py` or `eval_internlm_chat_turbomind.py` and change the last line to `models = [internlm_7b]` (the file must define an `internlm_7b` model dict).
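The example config earlier on this page defines only `internlm_20b`, so `models = [internlm_7b]` needs a matching dict. A minimal sketch, assuming the 20B engine and generation settings can be reused unchanged for 7B, copies the 20B dict and overrides only `abbr` and `path`. Here `type` is a string placeholder so the snippet runs without OpenCompass installed; in a real config keep `type=TurboMindModel`.

```python
internlm_20b = dict(
    type="TurboMindModel",  # placeholder; use the imported TurboMindModel class in a real config
    abbr="internlm-20b-turbomind",
    path="internlm/internlm-20b",
    engine_config=dict(session_len=2048, max_batch_size=8, rope_scaling_factor=1.0),
    gen_config=dict(top_k=1, top_p=0.8, temperature=1.0, max_new_tokens=100),
    max_out_len=100,
    max_seq_len=2048,
    batch_size=8,
    concurrency=8,
    run_cfg=dict(num_gpus=1, num_procs=1),
)

# dict(base, **overrides) makes a shallow copy with the given keys replaced.
internlm_7b = dict(
    internlm_20b,
    abbr="internlm-7b-turbomind",
    path="internlm/internlm-7b",  # Hugging Face model ID of InternLM-7B
)

models = [internlm_7b]
```

Because the copy is shallow, nested dicts such as `engine_config` are shared between the two entries; if you change them for one model only, pass a fresh `engine_config=dict(...)` in the overrides instead of mutating the shared one.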