
# Evaluation with LMDeploy
We now support the evaluation of models accelerated by LMDeploy. LMDeploy is a toolkit designed for compressing, deploying, and serving LLMs. TurboMind is an efficient inference engine proposed by LMDeploy, and OpenCompass is compatible with it. This guide illustrates how to evaluate a model with TurboMind support in OpenCompass.
## Setup
### Install OpenCompass
Please follow the instructions to install OpenCompass and prepare the evaluation datasets.
### Install LMDeploy
Install lmdeploy via pip (Python 3.8+):

```shell
pip install lmdeploy
```
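
You can verify the installation with a quick check (assuming the package exposes `__version__`, as recent lmdeploy releases do):

```python
# Sanity check: lmdeploy imports cleanly and reports its version.
import lmdeploy

print(lmdeploy.__version__)
```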
## Evaluation
OpenCompass integrates both turbomind's Python API and its gRPC API for evaluation. The former is highly recommended.
We take InternLM-20B as an example. Please download it from Huggingface and convert it to turbomind's model format:
```shell
# 1. Download InternLM model (or use the cached model's checkpoint)
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/internlm/internlm-20b /path/to/internlm-20b

# 2. Convert InternLM model to turbomind's format, and save it in the home folder of opencompass
lmdeploy convert internlm /path/to/internlm-20b \
    --dst-path {/home/folder/of/opencompass}/turbomind
```
**Note:**

If evaluating the InternLM Chat model, make sure to pass `internlm-chat` as the model name instead of `internlm` when converting the model format. The specific command is:

```shell
lmdeploy convert internlm-chat /path/to/internlm-20b-chat \
    --dst-path {/home/folder/of/opencompass}/turbomind
```
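
After conversion, you can sanity-check that the workspace was actually written. A minimal sketch, using the placeholder path from the commands above (the exact file layout inside it depends on your lmdeploy version):

```python
# Confirm the converted TurboMind workspace exists and is non-empty.
# Replace the placeholder with your own OpenCompass home folder.
import os

workspace = '/home/folder/of/opencompass/turbomind'
assert os.path.isdir(workspace), f'{workspace} not found; rerun lmdeploy convert'
print(os.listdir(workspace))
```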
### Evaluation with Turbomind Python API (recommended)
In the home folder of OpenCompass, start evaluation by the following command:

```shell
python run.py configs/eval_internlm_turbomind.py -w outputs/turbomind/internlm-20b
```
You will get the evaluation results once inference and evaluation complete.
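
For reference, the model entry in such a config looks roughly like the sketch below. The field values are illustrative assumptions rather than the shipped defaults, so consult the actual config file:

```python
# Sketch of a TurboMind-backed model entry in an OpenCompass config.
# All concrete values are illustrative assumptions; see the shipped
# configs/eval_internlm_turbomind.py for the authoritative settings.
from opencompass.models.turbomind import TurboMindModel

models = [
    dict(
        type=TurboMindModel,
        abbr='internlm-20b-turbomind',
        path='./turbomind',  # the workspace produced by `lmdeploy convert`
        max_out_len=100,
        max_seq_len=2048,
        batch_size=8,
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]
```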
**Note:**

- If you evaluate the InternLM Chat model, please use the configuration file `eval_internlm_chat_turbomind.py`.
- If you evaluate the InternLM 7B model, please modify `eval_internlm_turbomind.py` or `eval_internlm_chat_turbomind.py` by commenting out the configuration for the 20B model and enabling the configuration for the 7B model, as sketched below.
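
The 7B/20B switch mentioned above amounts to toggling which entry stays uncommented; a sketch, with `internlm_20b` and `internlm_7b` standing in for the two model dicts defined in the config:

```python
# Sketch: select the evaluated model by (un)commenting entries.
# internlm_20b / internlm_7b are assumed names for the two model dicts.
models = [
    # internlm_20b,
    internlm_7b,
]
```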
### Evaluation with Turbomind gRPC API (optional)
In the home folder of OpenCompass, launch the Triton Inference Server:

```shell
bash turbomind/service_docker_up.sh
```
Then start evaluation with the following command:

```shell
python run.py configs/eval_internlm_turbomind_tis.py -w outputs/turbomind-tis/internlm-20b
```
**Note:**

- If the InternLM Chat model is to be evaluated, please use the config file `eval_internlm_chat_turbomind_tis.py`.
- In `eval_internlm_turbomind_tis.py`, the configured Triton Inference Server (TIS) address is `tis_addr='0.0.0.0:33337'`. Please modify `tis_addr` to the IP address of the machine where the server is launched, as sketched after this list.
- If evaluating the InternLM 7B model, please modify the config file by commenting out the configuration for the 20B model and enabling the configuration for the 7B model.
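
A minimal sketch of the relevant part of `eval_internlm_turbomind_tis.py`, assuming the TIS-backed model class is named `TurboMindTisModel` (verify the class name and fields against the shipped config); only `tis_addr` comes from the note above:

```python
# Sketch: point the TIS-backed model at the machine running the
# Triton Inference Server. Only tis_addr is taken from the note above;
# the class name and other fields are assumptions for illustration.
from opencompass.models.turbomind_tis import TurboMindTisModel

models = [
    dict(
        type=TurboMindTisModel,
        abbr='internlm-20b-turbomind-tis',
        path='internlm',
        tis_addr='10.0.0.5:33337',  # replace 0.0.0.0:33337 with the server's IP
        max_out_len=100,
        batch_size=8,
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]
```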