
# Short Usage Introduction for Naive Model Postprocessor with Custom Model

## Step 1: Deploy an API server using vLLM or LMDeploy

```bash
lmdeploy serve api_server meta-llama/Meta-Llama-3-8B-Instruct --model-name llama3-8b-instruct --server-port 23333 --backend turbomind --tp 1
```
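If you prefer vLLM, an equivalent OpenAI-compatible server can be started with vLLM's API server. The command below is a sketch; the exact flags may vary with your vLLM version:

```bash
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --served-model-name llama3-8b-instruct \
    --port 23333
```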

## Step 2: Add Naive Model Postprocessor to the configuration file

Taking GSM8K as an example, add the following lines to the configuration file and replace `api_url` with the correct address of your API server.

```python
...
from opencompass.utils.model_postprocessors import navie_model_postprocess
from opencompass.utils.postprocessors.naive import MATH_NAVIE_PROMPT_TEMPLATE

...

gsm8k_eval_cfg = dict(
    evaluator=dict(type=MATHEvaluator, version='v2'),
    pred_postprocessor=dict(type=math_postprocess_v2),
    dataset_postprocessor=dict(type=gsm8k_dataset_postprocess),
    # Add the following lines to use the naive model postprocessor
    model_postprocessor=dict(
        type=navie_model_postprocess,
        custom_instruction=MATH_NAVIE_PROMPT_TEMPLATE,
        model_name='llama3-8b-instruct',
        api_url='http://0.0.0.0:23333/v1,http://0.0.0.0:23334/v1'),
)
...
```
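Before launching the evaluation, it can help to confirm that the deployed server answers OpenAI-style chat requests. A minimal sanity-check sketch with the `openai` Python client, assuming the base URL and model name used in the deployment above (the API key is a dummy value):

```python
from openai import OpenAI

# OpenAI-compatible servers started by LMDeploy/vLLM typically do not validate the API key.
client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='none')
response = client.chat.completions.create(
    model='llama3-8b-instruct',
    messages=[{'role': 'user', 'content': 'The calculation gives 18 apples, so the answer is 18.'}],
)
print(response.choices[0].message.content)
```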

The prompt for extraction can also be customized by changing the `custom_instruction` parameter. Two default templates are currently supported: `MATH_NAVIE_PROMPT_TEMPLATE` for extracting answers to math problems such as GSM8K and MATH, and `OPTION_NAVIE_PROMPT_TEMPLATE` for extracting answers to multiple-choice questions such as MMLU. You can also write your own prompt template, for example:

```python
OPTION_NAVIE_PROMPT_TEMPLATE = """
There is a detailed explanation of the final answer you should extract:
1. You should extract the final answer option like 'A', 'B', 'C', 'D' ... from the given output sentences.
2. The question is a single choice question, so the final answer option should be one of the options, not a combination of options.
"""
```

Your prompt should start with `There is a detailed explanation of the final answer you should extract:`, followed by your customized instructions.
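For example, a hypothetical custom template for yes/no style questions (the template name and wording below are illustrative, not part of OpenCompass) could follow the same convention:

```python
# Hypothetical example of a user-defined extraction prompt; only the opening sentence is required.
YESNO_CUSTOM_PROMPT_TEMPLATE = """
There is a detailed explanation of the final answer you should extract:
1. You should extract the final answer 'Yes' or 'No' from the given output sentences.
2. If the output sentences give no clear answer, extract 'No answer'.
"""
```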

## Step 3: Run the Evaluation as Usual
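For example, assuming the modified configuration is saved as `configs/eval_gsm8k_model_postprocess.py` (a hypothetical path), the run can be launched with the standard OpenCompass entry point:

```bash
python run.py configs/eval_gsm8k_model_postprocess.py
```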

The evaluation then runs as usual with the configuration file you modified. The custom model is used as the post-processing model to obtain the final result, which is reported as `model_postprocess_accuracy` in the evaluation summary, like:

```text
dataset                                            version    metric                      mode      llama-3-8b-instruct-turbomind
-------------------------------------------------  ---------  --------------------------  ------  -------------------------------
gsm8k                                              a58960     accuracy                    gen                               73.46
gsm8k                                              a58960     model_postprocess_accuracy  gen                               78.77
```

## Experiment Results

We have tested the model postprocess method with different models (Qwen2-72B-Chat, Llama3-8b-Chat) as the post-processing model on the GSM8K and MMLU datasets for Meta-Llama-3-8B-Instruct with the settings above, and the results are as follows:

| Dataset | Type   | Config ID | Regex Postprocess Score | Model Postprocess Score (Llama3-8b-Instruct) | Model Postprocess Score (Qwen2-72B-Chat) |
| ------- | ------ | --------- | ----------------------- | -------------------------------------------- | ---------------------------------------- |
| gsm8k   | math   | a58960    | 73.46                   | 79.08                                        | 78.77                                    |
| mmlu    | option | 4d595a    | 67.89                   | 65.26                                        | 67.94                                    |

The `model_postprocess_accuracy` metric is the final result after the Naive Model Postprocessor is applied.