## Short Usage Introduction for the Naive Model Postprocessor with a Custom Model
### Step 1: Deploy an API server using vLLM or LMDeploy
```bash
lmdeploy serve api_server meta-llama/Meta-Llama-3-8B-Instruct --model-name llama3-8b-instruct --server-port 23333 --backend turbomind --tp 1
```
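
Step 1 above uses LMDeploy; if you prefer vLLM (the other option named in the heading), a roughly equivalent launch of its OpenAI-compatible server is sketched below. Verify the flag names against your installed vLLM version.

```bash
# Sketch: serve the same model with vLLM's OpenAI-compatible API server.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --served-model-name llama3-8b-instruct \
    --port 23333 \
    --tensor-parallel-size 1
```

Either way, you can confirm the server is reachable with `curl http://0.0.0.0:23333/v1/models` before moving on.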
### Step 2: Add the Naive Model Postprocessor to the configuration file
Taking GSM8K as an example, add the following lines to the configuration file and replace `api_url` with the actual address of your API server.
```python
...
from opencompass.utils.model_postprocessors import navie_model_postprocess
from opencompass.utils.postprocessors.naive import MATH_NAVIE_PROMPT_TEMPLATE

...

gsm8k_eval_cfg = dict(
    evaluator=dict(type=MATHEvaluator, version='v2'),
    pred_postprocessor=dict(type=math_postprocess_v2),
    dataset_postprocessor=dict(type=gsm8k_dataset_postprocess),
    # Add the following line to use the naive model postprocessor
    model_postprocessor=dict(
        type=navie_model_postprocess,
        custom_instruction=MATH_NAVIE_PROMPT_TEMPLATE,
        model_name='llama3-8b-instruct',
        api_url='http://0.0.0.0:23333/v1,http://0.0.0.0:23334/v1')
)
...
```
The extraction prompt can also be customized through the `custom_instruction` parameter. Two default templates are currently supported: `MATH_NAVIE_PROMPT_TEMPLATE` for extracting answers to math problems such as GSM8K and MATH, and `OPTION_NAVIE_PROMPT_TEMPLATE` for extracting answers to multiple-choice problems such as MMLU. You can also write your own prompt template, for example:
```python
OPTION_NAVIE_PROMPT_TEMPLATE = """
There is a detailed explanation of the final answer you should extract:
1. You should extract the final answer option like 'A', 'B', 'C', 'D' ... from the given output sentences.
2. The question is a single choice question, so the final answer option should be one of the options, not a combination of options.
"""
```
Your prompt should start with `There is a detailed explanation of the final answer you should extract:`, followed by your customized instructions.
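
Under the hood, the naive model postprocessor sends the instruction together with each raw prediction to the served model through its OpenAI-compatible API and treats the reply as the extracted answer. The snippet below is only a conceptual sketch of that round trip using the `openai` Python client; it is not OpenCompass's actual implementation, and the exact prompt assembly may differ.

```python
from openai import OpenAI

# Point an OpenAI-compatible client at the locally served model
# (both LMDeploy and vLLM expose /v1 endpoints).
client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='none')

raw_prediction = "The reasoning shows only choice B is consistent, so the answer is B."
# OPTION_NAVIE_PROMPT_TEMPLATE is the template shown above (or your own custom one).
prompt = OPTION_NAVIE_PROMPT_TEMPLATE + '\nOutput sentences: ' + raw_prediction

response = client.chat.completions.create(
    model='llama3-8b-instruct',
    messages=[{'role': 'user', 'content': prompt}],
)
extracted_answer = response.choices[0].message.content  # e.g. 'B'
```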
### Step 3: Run the Evaluation as Usual
Now you can run the evaluation as usual with the modified configuration file. The evaluation will use the custom model as the postprocessing model, and the final result is reported as `model_postprocess_accuracy` in the evaluation output.
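
For reference, a minimal launch might look like the sketch below; the config filename `configs/eval_llama3_gsm8k_model_postprocess.py` is a hypothetical placeholder for whichever configuration file you modified.

```bash
# Run OpenCompass with the modified config; -w sets the output work directory.
python run.py configs/eval_llama3_gsm8k_model_postprocess.py -w outputs/model_postprocess
```

The evaluation output will then contain both the regex-based and the model-based accuracy, for example: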
```Markdown
dataset    version    metric                        mode    llama-3-8b-instruct-turbomind
---------  ---------  ----------------------------  ------  -------------------------------
gsm8k      a58960     accuracy                      gen     73.46
gsm8k      a58960     model_postprocess_accuracy    gen     78.77
```
## Experiment Results
We tested the model postprocessing method with different postprocessing models (Qwen2-72B-Chat and Llama3-8b-Instruct) on the GSM8K and MMLU datasets for `Meta-Llama-3-8B-Instruct` with the settings above; the results are as follows:
| Dataset | Type   | Config ID | Regex Postprocess Score | Model Postprocess Score (Llama3-8b-Instruct) | Model Postprocess Score (Qwen2-72B-Chat) |
| ------- | ------ | --------- | ----------------------- | -------------------------------------------- | ---------------------------------------- |
| gsm8k   | math   | a58960    | 73.46                   | 79.08                                        | 78.77                                    |
| mmlu    | option | 4d595a    | 67.89                   | 65.26                                        | 67.94                                    |
In the evaluation output, the `model_postprocess_accuracy` metric gives the final result after the Naive Model Postprocessor is applied.