# Evaluation with LMDeploy

We now support evaluation of models accelerated by [LMDeploy](https://github.com/InternLM/lmdeploy). LMDeploy is a toolkit designed for compressing, deploying, and serving LLMs, and it delivers remarkable inference performance. This guide illustrates how to evaluate a model with the support of LMDeploy in OpenCompass.

## Setup

### Install OpenCompass

Please follow the [instructions](https://opencompass.readthedocs.io/en/latest/get_started/installation.html) to install OpenCompass and prepare the evaluation datasets.

### Install LMDeploy

Install lmdeploy via pip (Python 3.8+):

```shell
pip install lmdeploy
```

The default prebuilt package is compiled with CUDA 12. If a CUDA 11+ build is required, install lmdeploy with:

```shell
export LMDEPLOY_VERSION=0.6.0
export PYTHON_VERSION=310
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```

## Evaluation

When evaluating a model, it is necessary to prepare an evaluation configuration that specifies information such as the evaluation datasets, the model, and the inference parameters. Taking [internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) as an example, the evaluation config is as follows:

```python
# configure the datasets
from mmengine.config import read_base

with read_base():
    # choose a list of datasets
    from .datasets.mmlu.mmlu_gen_a484b3 import mmlu_datasets
    from .datasets.ceval.ceval_gen_5f30c7 import ceval_datasets
    from .datasets.triviaqa.triviaqa_gen_2121ce import triviaqa_datasets
    from .datasets.gsm8k.gsm8k_0shot_v2_gen_a58960 import gsm8k_datasets
    # and output the results in a chosen format
    from .summarizers.medium import summarizer

datasets = sum((v for k, v in locals().items() if k.endswith('_datasets')), [])

# configure lmdeploy
from opencompass.models import TurboMindModelwithChatTemplate

# configure the model
models = [
    dict(
        type=TurboMindModelwithChatTemplate,
        abbr='internlm2-chat-7b-lmdeploy',
        # model path, which can be the address of a model repository on the Hugging Face Hub or a local path
        path='internlm/internlm2-chat-7b',
        # inference backend of LMDeploy. It can be either 'turbomind' or 'pytorch'.
        # If the model is not supported by 'turbomind', it will fall back to
        # the 'pytorch' backend
        backend='turbomind',
        # For the detailed engine config and generation config, please refer to
        # https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/messages.py
        engine_config=dict(tp=1),
        gen_config=dict(do_sample=False),
        # the max size of the context window
        max_seq_len=7168,
        # the max number of new tokens
        max_out_len=1024,
        # the max number of prompts that LMDeploy receives
        # in the `generate` function
        batch_size=5000,
        run_cfg=dict(num_gpus=1),
    )
]
```

Place the aforementioned configuration in a file, such as "configs/eval_internlm2_lmdeploy.py". Then, in the home folder of OpenCompass, start the evaluation with the following command:

```shell
python run.py configs/eval_internlm2_lmdeploy.py -w outputs
```

The evaluation results will be reported under the specified work directory ("outputs") once inference and evaluation are complete.
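
For models that do not fit on a single GPU, the same configuration scales by raising the tensor-parallel degree of the engine and the number of GPUs allocated per task. The snippet below is a sketch of the relevant changes, using a larger model (`internlm/internlm2-chat-20b`) purely as an illustration; keep the remaining fields as in the example above.

```python
# sketch: scale the example config to two GPUs via tensor parallelism
# (the model path here is illustrative; adapt it to the model you evaluate)
from opencompass.models import TurboMindModelwithChatTemplate

models = [
    dict(
        type=TurboMindModelwithChatTemplate,
        abbr='internlm2-chat-20b-lmdeploy',
        path='internlm/internlm2-chat-20b',
        backend='turbomind',
        # shard the model weights across two GPUs ...
        engine_config=dict(tp=2),
        gen_config=dict(do_sample=False),
        max_seq_len=7168,
        max_out_len=1024,
        batch_size=5000,
        # ... and ask OpenCompass to allocate two GPUs for this inference task
        run_cfg=dict(num_gpus=2),
    )
]
```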
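
If the evaluation stalls or the engine fails to load, it can help to confirm that LMDeploy itself runs correctly outside of OpenCompass. The following is a minimal sanity-check sketch using LMDeploy's `pipeline` API, assuming the same `internlm/internlm2-chat-7b` model and a single GPU; adjust the model path and engine options to your environment.

```python
# minimal sanity check of the LMDeploy turbomind backend (a sketch, not part
# of the OpenCompass config); run it in a plain Python session
from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline

# load the model with a single-GPU turbomind engine, mirroring
# engine_config=dict(tp=1) and max_seq_len=7168 in the evaluation config
pipe = pipeline('internlm/internlm2-chat-7b',
                backend_config=TurbomindEngineConfig(tp=1, session_len=7168))

# greedy decoding, mirroring gen_config=dict(do_sample=False)
responses = pipe(['Briefly introduce yourself.'],
                 gen_config=GenerationConfig(do_sample=False, max_new_tokens=128))
print(responses[0].text)
```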