## Short Usage Introduction for the Naive Model Postprocessor with a Custom Model
### Step 1: Deploy an API server using vLLM or LMDeploy
```bash
lmdeploy serve api_server meta-llama/Meta-Llama-3-8B-Instruct --model-name llama3-8b-instruct --server-port 23333 --backend turbomind --tp 1
```
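
Step 1 above uses LMDeploy; if you prefer vLLM (the other option named in the heading), a roughly equivalent launch of its OpenAI-compatible server is sketched below. Verify the flag names against your installed vLLM version.

```bash
# Sketch: serve the same model with vLLM's OpenAI-compatible API server.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --served-model-name llama3-8b-instruct \
    --port 23333 \
    --tensor-parallel-size 1
```

Either way, you can confirm the server is reachable with `curl http://0.0.0.0:23333/v1/models` before moving on.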
### Step 2: Add the Naive Model Postprocessor to the configuration file
Taking GSM8K as an example, add the following lines to the configuration file and replace `api_url` with the actual address of your API server.
```python
...
from opencompass.utils.model_postprocessors import navie_model_postprocess
from opencompass.utils.postprocessors.naive import MATH_NAVIE_PROMPT_TEMPLATE

...

gsm8k_eval_cfg = dict(
    evaluator=dict(type=MATHEvaluator, version='v2'),
    pred_postprocessor=dict(type=math_postprocess_v2),
    dataset_postprocessor=dict(type=gsm8k_dataset_postprocess),
    # Add the following line to use the naive model postprocessor
    model_postprocessor=dict(
        type=navie_model_postprocess,
        custom_instruction=MATH_NAVIE_PROMPT_TEMPLATE,
        model_name='llama3-8b-instruct',
        api_url='http://0.0.0.0:23333/v1,http://0.0.0.0:23334/v1')
)
...
```
The extraction prompt can also be customized through the `custom_instruction` parameter. Two default templates are currently supported: `MATH_NAVIE_PROMPT_TEMPLATE` for extracting answers to math problems such as GSM8K and MATH, and `OPTION_NAVIE_PROMPT_TEMPLATE` for extracting answers to multiple-choice problems such as MMLU. You can also write your own prompt template, for example:
```python
OPTION_NAVIE_PROMPT_TEMPLATE = """
There is a detailed explanation of the final answer you should extract:
1. You should extract the final answer option like 'A', 'B', 'C', 'D' ... from the given output sentences.
2. The question is a single choice question, so the final answer option should be one of the options, not a combination of options.
"""
```
Your prompt should start with `There is a detailed explanation of the final answer you should extract:`, followed by your customized instructions.
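
Under the hood, the naive model postprocessor sends the instruction together with each raw prediction to the served model through its OpenAI-compatible API and treats the reply as the extracted answer. The snippet below is only a conceptual sketch of that round trip using the `openai` Python client; it is not OpenCompass's actual implementation, and the exact prompt assembly may differ.

```python
from openai import OpenAI

# Point an OpenAI-compatible client at the locally served model
# (both LMDeploy and vLLM expose /v1 endpoints).
client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='none')

raw_prediction = "The reasoning shows only choice B is consistent, so the answer is B."
# OPTION_NAVIE_PROMPT_TEMPLATE is the template shown above (or your own custom one).
prompt = OPTION_NAVIE_PROMPT_TEMPLATE + '\nOutput sentences: ' + raw_prediction

response = client.chat.completions.create(
    model='llama3-8b-instruct',
    messages=[{'role': 'user', 'content': prompt}],
)
extracted_answer = response.choices[0].message.content  # e.g. 'B'
```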
### Step 3: Run the Evaluation as Usual
Now you can run the evaluation as usual with the modified configuration file. The evaluation will use the custom model as the postprocessing model, and the final result is reported as `model_postprocess_accuracy` in the evaluation output.
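
For reference, a minimal launch might look like the sketch below; the config filename `configs/eval_llama3_gsm8k_model_postprocess.py` is a hypothetical placeholder for whichever configuration file you modified.

```bash
# Run OpenCompass with the modified config; -w sets the output work directory.
python run.py configs/eval_llama3_gsm8k_model_postprocess.py -w outputs/model_postprocess
```

The evaluation output will then contain both the regex-based and the model-based accuracy, for example: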
```Markdown
dataset    version    metric                        mode    llama-3-8b-instruct-turbomind
---------  ---------  ----------------------------  ------  -------------------------------
gsm8k      a58960     accuracy                      gen     73.46
gsm8k      a58960     model_postprocess_accuracy    gen     78.77
```
## Experiment Results
We tested the model postprocessing method with different postprocessing models (Qwen2-72B-Chat and Llama3-8b-Instruct) on the GSM8K and MMLU datasets for `Meta-Llama-3-8B-Instruct` with the settings above; the results are as follows:
| Dataset | Type   | Config ID | Regex Postprocess Score | Model Postprocess Score (Llama3-8b-Instruct) | Model Postprocess Score (Qwen2-72B-Chat) |
| ------- | ------ | --------- | ----------------------- | -------------------------------------------- | ---------------------------------------- |
| gsm8k   | math   | a58960    | 73.46                   | 79.08                                        | 78.77                                    |
| mmlu    | option | 4d595a    | 67.89                   | 65.26                                        | 67.94                                    |
In the evaluation output, the `model_postprocess_accuracy` metric gives the final result after the Naive Model Postprocessor is applied.