# Evaluation with LMDeploy

We now support evaluating models accelerated by [LMDeploy](https://github.com/InternLM/lmdeploy). LMDeploy is a toolkit designed for compressing, deploying, and serving LLMs, and it delivers remarkable inference performance. This guide illustrates how to evaluate a model with LMDeploy support in OpenCompass.

## Setup

### Install OpenCompass

Please follow the [instructions](https://opencompass.readthedocs.io/en/latest/get_started/installation.html) to install OpenCompass and prepare the evaluation datasets.
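
After installation, it may help to confirm that the package is importable before moving on. The snippet below is an optional sanity check, not part of the official instructions; it only assumes OpenCompass was installed via pip:

```python
# Optional sanity check: confirm OpenCompass is importable and report
# the installed version (uses only the standard library for the lookup).
from importlib.metadata import version

import opencompass  # raises ImportError if the installation failed

print(version('opencompass'))
```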

### Install LMDeploy

Install lmdeploy via pip (Python 3.8+):

```shell
pip install lmdeploy
```

The default prebuilt package is compiled with CUDA 12. If you need a CUDA 11+ build instead, install lmdeploy with:

```shell
export LMDEPLOY_VERSION=0.6.0
export PYTHON_VERSION=310
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
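
Either way, you can verify the result from Python before moving on. A small, optional check (the CUDA version shown assumes the cu118 wheel from above):

```python
# Optional sanity check: print the lmdeploy version (it should match
# LMDEPLOY_VERSION above) and the CUDA runtime that PyTorch was built
# against (e.g. '11.8' for the cu118 wheel).
import lmdeploy
import torch

print(lmdeploy.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
```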

## Evaluation

When evaluating a model, you need to prepare an evaluation configuration that specifies the evaluation datasets, the model, and the inference parameters.

Taking [internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) as an example, the evaluation config is as follows:

```python
# configure the datasets
from mmengine.config import read_base

with read_base():
    # choose a list of datasets
    from .datasets.mmlu.mmlu_gen_a484b3 import mmlu_datasets
    from .datasets.ceval.ceval_gen_5f30c7 import ceval_datasets
    from .datasets.triviaqa.triviaqa_gen_2121ce import triviaqa_datasets
    from opencompass.configs.datasets.gsm8k.gsm8k_0shot_v2_gen_a58960 import \
        gsm8k_datasets
    # and output the results in a chosen format
    from .summarizers.medium import summarizer

# gather all the *_datasets lists imported above into a single list
datasets = sum((v for k, v in locals().items() if k.endswith('_datasets')), [])

# configure lmdeploy
from opencompass.models import TurboMindModelwithChatTemplate

# configure the model
models = [
    dict(
        type=TurboMindModelwithChatTemplate,
        abbr='internlm2-chat-7b-lmdeploy',
        # model path, which can be the ID of a model repository on the
        # Hugging Face Hub or a local path
        path='internlm/internlm2-chat-7b',
        # inference backend of LMDeploy. It can be either 'turbomind' or
        # 'pytorch'. If the model is not supported by 'turbomind', it falls
        # back to 'pytorch'
        backend='turbomind',
        # for the detailed engine config and generation config, please refer to
        # https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/messages.py
        engine_config=dict(tp=1),
        gen_config=dict(do_sample=False),
        # the max size of the context window
        max_seq_len=7168,
        # the max number of new tokens
        max_out_len=1024,
        # the max number of prompts that LMDeploy receives in the
        # `generate` function
        batch_size=5000,
        run_cfg=dict(num_gpus=1),
    )
]
```
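
Before launching a full evaluation, it can be useful to smoke-test the LMDeploy backend on its own. The sketch below uses lmdeploy's `pipeline` API directly, outside of OpenCompass; the prompt is illustrative, and the engine and generation settings mirror the config above:

```python
# A minimal standalone smoke test of the turbomind backend; not required
# for the evaluation itself. The settings mirror engine_config/gen_config
# from the OpenCompass config above.
from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline

pipe = pipeline('internlm/internlm2-chat-7b',
                backend_config=TurbomindEngineConfig(tp=1, session_len=7168))
outputs = pipe(['Briefly introduce yourself.'],
               gen_config=GenerationConfig(max_new_tokens=64, do_sample=False))
print(outputs[0].text)
```

If this prints a sensible reply, the backend and the model weights are set up correctly.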

Place the above configuration in a file, e.g. `configs/eval_internlm2_lmdeploy.py`. Then, from the root folder of OpenCompass, start the evaluation with the following command:

```shell
python run.py configs/eval_internlm2_lmdeploy.py -w outputs
```

Once inference and evaluation complete, the results are written to the work directory specified by `-w`, here `outputs/`.