Mirror of https://github.com/open-compass/opencompass.git (synced 2025-05-30 16:03:24 +08:00)

[Docs] Update prompt docs (#46)

* [Docs] Update prompt docs
* update
* [Docs] Prompt docs (#112)
* update docs
* update
* update
* Update en prompt template
* Update en prompt doc
* fix
* fix

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

Parent: e04f88424d
Commit: 262ab794fb
@@ -27,7 +27,7 @@ models = [
```

Next, we will introduce how to configure the Meta Template on the two types of models.

You are recommended to read [here](./prompt_template.md#dialogue-prompt) for the basic syntax of the dialogue template before reading this chapter.

```{note}
In some cases (such as when testing a base model), we don't need to inject any instructions into the dialogue, in which case we can leave the meta template empty. In that case, the prompt received by the model is defined solely by the dataset configuration and is a regular string. If the dataset configuration uses a dialogue template, the utterances of the different roles will be concatenated with \n.
```
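To make this concrete, here is a minimal sketch (plain Python rather than OpenCompass internals) of the degenerate case described in the note:

```python
# Hypothetical illustration: with an empty meta template, the utterances of a
# dialogue-style prompt are simply joined with "\n" into a plain string.
prompt_list = [
    dict(role='HUMAN', prompt='1+1=?'),
    dict(role='BOT', prompt='2'),
]
plain_prompt = '\n'.join(item['prompt'] for item in prompt_list)
print(plain_prompt)  # "1+1=?\n2"
```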
@@ -43,11 +43,13 @@ We will explain how to define the meta template with several examples.

Suppose that, according to the dialogue template of the dataset, the following `PromptList` was produced:

```python
PromptList([
    dict(role='HUMAN', prompt='1+1=?'),
    dict(role='BOT', prompt='2'),
    dict(role='HUMAN', prompt='2+2=?'),
    dict(role='BOT', prompt='4'),
])
```

We want to pass this dialogue to a model that has already gone through SFT. The model's convention is that each role's utterance begins with `<Role Name>: ` and ends with a special token followed by `\n`. Here is the complete string the model expects to receive:
@@ -75,12 +77,14 @@ ______________________________________________________________________

Some datasets may introduce SYSTEM-level roles:

```python
PromptList([
    dict(role='SYSTEM', fallback_role='HUMAN', prompt='Solve the following math questions'),
    dict(role='HUMAN', prompt='1+1=?'),
    dict(role='BOT', prompt='2'),
    dict(role='HUMAN', prompt='2+2=?'),
    dict(role='BOT', prompt='4'),
])
```

Assuming the model also accepts the SYSTEM role and expects the input to be:
@@ -253,3 +257,7 @@ Even though different API models accept different data structures, there are com

In this regard, OpenCompass has preset three `api_role` values for API models: `HUMAN`, `BOT`, and `SYSTEM`. It also stipulates that, in addition to regular strings, the input accepted by API models includes an intermediate dialogue format represented by `PromptList`. The API model repackages the dialogue in a multi-turn dialogue format and sends it to the backend. To activate this feature, however, users need to map the `role`s used in the dataset prompt template to the corresponding `api_role`s in the meta template above. The following figure illustrates the relationship between the input accepted by the API model, the Prompt Template, and the Meta Template.

![](https://user-images.githubusercontent.com/22607038/251195872-63aa7d30-045a-4837-8ff1-c27359b96977.png)
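As an illustration, a meta template for an API model might perform this mapping as in the following sketch. The `reserved_roles` and `generate` fields are assumed here from OpenCompass's model configs and are not defined in this excerpt; the exact set of supported fields may vary:

```python
# A sketch of a meta template mapping dataset-side roles to the preset
# api_role values. `generate=True` (an assumption based on OpenCompass's
# model configs) marks the role whose reply the model should produce.
meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
    reserved_roles=[
        dict(role='SYSTEM', api_role='SYSTEM'),
    ],
)
```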
## Debugging

If you need to debug the prompt, it is recommended to use the `tools/prompt_viewer.py` script to preview the actual prompt received by the model after preparing the configuration file. Read [here](../tools.md#prompt-viewer) for more.
@@ -1 +1,9 @@
# Prompt Overview

The prompt is the input to the large language model (LLM), used to guide the model to generate text or to calculate perplexity (PPL). The selection of prompts can significantly impact the accuracy of the evaluated model. The process of converting a dataset into a series of prompts is defined by templates.

In OpenCompass, we split the template into two parts: the data-side template and the model-side template. When evaluating a model, the data passes through both the data-side template and the model-side template, and is ultimately transformed into the input required by the model.

The data-side template is referred to as [prompt_template](./prompt_template.md), which represents the process of converting the fields of the dataset into prompts.

The model-side template is referred to as [meta_template](./meta_template.md), which represents how the model transforms these prompts into its expected input.
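Conceptually, the two stages compose as in the following sketch (plain Python for illustration only; the dialogue markers below are made up, and the actual templates are configured as described in the pages linked above):

```python
# Stage 1 (data side): a prompt_template turns dataset fields into a prompt.
data = {'question': '1+1=?'}
prompt = 'Question: {question}\nAnswer: '.format(**data)

# Stage 2 (model side): a meta_template wraps the prompt into the dialogue
# format the model was trained on (hypothetical markers).
model_input = '<HUMAN>: {}<eoh>\n<BOT>: '.format(prompt)
```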
@@ -1,3 +1,497 @@
# Prompt Template

## Background

In language model evaluation, we often construct prompts from the original dataset according to certain rules, so that the model can answer questions as required.

Typically, we place instructions at the beginning of the prompt, followed by several in-context examples, and finally the question. For example:

```text
Solve the following questions.
1+1=?
2
3+9=?
12
5+6=?
```
Extensive experiments have shown that even with the same original test questions, different ways of constructing the prompt can affect the model's performance. Factors that may influence this include:

- The composition of the prompt itself, including the instruction, the in-context examples, and the format of the question.
- The selection of in-context examples, including their number and the method of selection.
- The manner in which the prompt is used: should the model complete the prompt based on the given context, or should it choose the best prompt among the candidate prompts?

OpenCompass defines the prompt construction strategy in the `infer_cfg` section of the dataset configuration. A typical `infer_cfg` is shown below:
```python
infer_cfg = dict(
    ice_template=dict(  # Template used to construct in-context examples (ice).
        type=PromptTemplate,
        template='{question}\n{answer}',
    ),
    prompt_template=dict(  # Template used to construct the main prompt.
        type=PromptTemplate,
        template='Solve the following questions.\n</E>{question}\n{answer}',
        ice_token="</E>",
    ),
    retriever=dict(type=FixKRetriever),  # Definition of how to retrieve in-context examples.
    inferencer=dict(type=GenInferencer, fix_id_list=[0, 1]),  # Method used to generate predictions.
)
```
In this document, we will mainly introduce the definitions of `ice_template`, `prompt_template`, and `inferencer`. For information on the `retriever`, please refer to the other documents.

Let's start by introducing the basic syntax of the prompt.

## String-Based Prompt

The string-based prompt is a classic form of template. Consider the following template:
```python
prompt_template=dict(
    type=PromptTemplate,
    template="{anything}\nQuestion: {question}\nAnswer: {answer}"
)
```
At runtime, the fields within `{}` will be replaced with the corresponding fields from the data sample. If a field does not exist in the data sample, it will be kept as-is in the output.

For example, let's consider a data sample as follows:

```python
example = {
    'question': '1+1=?',
    'answer': '2',  # Assume the answer is in reader_cfg.output_column
    'irrelevant_infos': 'blabla',
}
```
After filling in the template, the result will be:

```text
{anything}
Question: 1+1=?
Answer: 
```

As you can see, the actual answer to the question, the `answer` field, does not appear in the generated result. This is because OpenCompass masks the fields listed in `reader_cfg.output_column` to prevent answer leakage. For a detailed explanation of `reader_cfg`, please refer to the documentation on dataset configuration.
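For reference, a `reader_cfg` that produces the masking above might look like this (a minimal sketch following the field names used in OpenCompass's dataset configs):

```python
reader_cfg = dict(
    input_columns=['question'],  # fields available to the templates
    output_column='answer',      # ground truth; masked when filling prompts
)
```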
## Dialogue-Based Prompt

In practical testing, simple completion tasks may not effectively test the performance of chat-based models. Therefore, we prefer prompts in the form of dialogues. In addition, different models define dialogue formats differently, so we need the prompts generated on the dataset side to be more general, with the model-specific prompts generated at test time.

To achieve this, OpenCompass extends string-based prompts to dialogue-based prompts. Dialogue-based prompts are more flexible: they can be combined with different model-side [meta_templates](./meta_template.md) to generate prompts in various dialogue formats, and they are applicable to both base and chat models. Their definition, however, is relatively complex.

Now, let's assume we have a data sample as follows:
```python
example = {
    'question': '1+1=?',
    'answer': '2',  # Assume the answer is in reader_cfg.output_column
    'irrelevant_infos': 'blabla',
}
```
Next, let's showcase a few examples:

`````{tabs}
````{tab} Single-round Dialogue

```python
prompt_template=dict(
    type=PromptTemplate,
    template=dict(
        round=[
            dict(role="HUMAN", prompt="Question: {question}"),
            dict(role="BOT", prompt="Answer: {answer}"),
        ]
    )
)
```

The intermediate result obtained by OpenCompass after filling the data into the template is:

```python
PromptList([
    dict(role='HUMAN', prompt='Question: 1+1=?'),
    dict(role='BOT', prompt='Answer: '),
])
```

````
````{tab} Multi-round Dialogue

```python
prompt_template=dict(
    type=PromptTemplate,
    template=dict(
        round=[
            dict(role="HUMAN", prompt="Question: 2+2=?"),
            dict(role="BOT", prompt="Answer: 4"),
            dict(role="HUMAN", prompt="Question: 3+3=?"),
            dict(role="BOT", prompt="Answer: 6"),
            dict(role="HUMAN", prompt="Question: {question}"),
            dict(role="BOT", prompt="Answer: {answer}"),
        ]
    )
)
```

The intermediate result obtained by OpenCompass after filling the data into the template is:

```python
PromptList([
    dict(role='HUMAN', prompt='Question: 2+2=?'),
    dict(role='BOT', prompt='Answer: 4'),
    dict(role='HUMAN', prompt='Question: 3+3=?'),
    dict(role='BOT', prompt='Answer: 6'),
    dict(role='HUMAN', prompt='Question: 1+1=?'),
    dict(role='BOT', prompt='Answer: '),
])
```

````
````{tab} Dialogue with SYSTEM instruction

```python
prompt_template=dict(
    type=PromptTemplate,
    template=dict(
        begin=[
            dict(role='SYSTEM', fallback_role='HUMAN', prompt='Solve the following questions.'),
        ],
        round=[
            dict(role="HUMAN", prompt="Question: {question}"),
            dict(role="BOT", prompt="Answer: {answer}"),
        ]
    )
)
```

The intermediate result obtained by OpenCompass after filling the data into the template is:

```python
PromptList([
    dict(role='SYSTEM', fallback_role='HUMAN', prompt='Solve the following questions.'),
    dict(role='HUMAN', prompt='Question: 1+1=?'),
    dict(role='BOT', prompt='Answer: '),
])
```

When the meta template processes this prompt, the template designated for the SYSTEM role is used if the meta template defines one; otherwise, the template of the `fallback_role` is used instead, which in this example corresponds to the HUMAN role.

````

`````
In dialogue-based templates, prompts are organized in the form of conversations between different roles (`role`). In OpenCompass's current predefined dataset configurations, the roles commonly used in a prompt include:

- `HUMAN`: Represents a human, usually the one asking questions.
- `BOT`: Represents the language model, usually the one providing answers.
- `SYSTEM`: Represents the system, typically used at the beginning of prompts to give instructions.

Furthermore, unlike string-based templates, the prompts generated by dialogue-based templates are transformed into an intermediate structure called `PromptList`. This structure is further combined with the model-side [meta_template](./meta_template.md) to assemble the final prompt. If no meta template is specified, the prompts in the `PromptList` are directly concatenated into a single string.
```{note}
The content of the `PromptList` in the example above is not the final input to the model; it depends on the processing of the meta template. One potential source of misunderstanding is that, in generative evaluations, the prompt of the last `BOT` role, `Answer: `, **will not** be input to the model. Since API models generally cannot customize the beginning of their responses, this setting keeps the evaluation behavior of language models and API models consistent. For more information, please refer to the documentation on [meta template](./meta_template.md).
```
<details>
<summary>Expand the complete parameter descriptions</summary>

- `begin`, `end`: (list, optional) The beginning and end of the prompt, typically containing system-level instructions. Each item inside can be **a dictionary or a string**.
- `round`: (list) The dialogue format of the template. Each item in the list must be a dictionary.

Each dictionary accepts the following parameters:

- `role` (str): The name of the role participating in the dialogue. It is used to associate with the names in the meta_template and does not affect the actual generated prompt.
- `fallback_role` (str): The role name to fall back to if the associated role is not found in the meta_template. Defaults to `None`.
- `prompt` (str): The dialogue content of the role.

</details>
## Prompt Templates and `inferencer`

Once we understand the basic definition of prompt templates, we also need to organize them according to the type of the `inferencer`.

OpenCompass mainly supports two types of inferencers: `GenInferencer` and `PPLInferencer`, corresponding to two different inference methods.

`GenInferencer` corresponds to generative inference. During inference, the model is asked to continue generating text based on the input prompt. In this case, `template` is a single template that applies to each data sample, for example:
`````{tabs}

````{group-tab} String-based Prompt
```python
prompt_template=dict(
    type=PromptTemplate,
    template='Solve the following questions.\n{question}\n{answer}'
)
```
````

````{group-tab} Dialogue-Based Prompt
```python
prompt_template=dict(
    type=PromptTemplate,
    template=dict(
        begin=[
            dict(role='SYSTEM', fallback_role='HUMAN', prompt='Solve the following questions.'),
        ],
        round=[
            dict(role="HUMAN", prompt="{question}"),
            dict(role="BOT", prompt="{answer}"),
        ]
    )
)
```
````

`````
Then, the model's inference result will be a continuation of the concatenated string.

`PPLInferencer` corresponds to discriminative inference. During inference, the model is asked to compute the perplexity (PPL) of each input string, and the item with the lowest perplexity is selected as the model's inference result. In this case, `template` is a `dict` representing the template for each candidate sentence, for example:
`````{tabs}

````{group-tab} String-based Prompt
```python
prompt_template=dict(
    type=PromptTemplate,
    template=dict(
        A="Question: Which is true?\nA. {A}\nB. {B}\nC. {C}\nAnswer: A",
        B="Question: Which is true?\nA. {A}\nB. {B}\nC. {C}\nAnswer: B",
        C="Question: Which is true?\nA. {A}\nB. {B}\nC. {C}\nAnswer: C",
        UNK="Question: Which is true?\nA. {A}\nB. {B}\nC. {C}\nAnswer: None of them is true.",
    )
)
```
````

````{group-tab} Dialogue-Based Prompt
```python
prompt_template=dict(
    type=PromptTemplate,
    template=dict(
        A=dict(
            round=[
                dict(role="HUMAN", prompt="Question: Which is true?\nA. {A}\nB. {B}\nC. {C}"),
                dict(role="BOT", prompt="Answer: A"),
            ]
        ),
        B=dict(
            round=[
                dict(role="HUMAN", prompt="Question: Which is true?\nA. {A}\nB. {B}\nC. {C}"),
                dict(role="BOT", prompt="Answer: B"),
            ]
        ),
        C=dict(
            round=[
                dict(role="HUMAN", prompt="Question: Which is true?\nA. {A}\nB. {B}\nC. {C}"),
                dict(role="BOT", prompt="Answer: C"),
            ]
        ),
        UNK=dict(
            round=[
                dict(role="HUMAN", prompt="Question: Which is true?\nA. {A}\nB. {B}\nC. {C}"),
                dict(role="BOT", prompt="Answer: None of them is true."),
            ]
        ),
    )
)
```
````

`````
In this case, the model's inference result will be one of the four keys of the `template` ("A" / "B" / "C" / "UNK").

## `ice_template` and `prompt_template`

In OpenCompass, for 0-shot evaluation, we usually only need to define the `prompt_template` field to complete the prompt construction. However, for few-shot evaluation, we also need to define the `ice_template` field, which manages the prompt template for the in-context examples.

Both `ice_template` and `prompt_template` follow the same syntax and rules. The complete prompt construction process can be represented by the following pseudo-code:
```python
def build_prompt():
    ice = ice_template.format(**ice_example)
    prompt = prompt_template.replace(prompt_template.ice_token, ice).format(**prompt_example)
    return prompt
```
Now, let's assume there are two training samples (ex1, ex2) and one test sample (ex3):

```python
ex1 = {
    'question': '2+2=?',
    'answer': '4',
    'irrelevant_infos': 'blabla',
}
ex2 = {
    'question': '3+3=?',
    'answer': '6',
    'irrelevant_infos': 'blabla',
}
ex3 = {
    'question': '1+1=?',
    'answer': '2',  # Assume the answer is in reader_cfg.output_column
    'irrelevant_infos': 'blabla',
}
```
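As a runnable illustration of the `build_prompt` pseudo-code, the following reproduces the string-based 2-shot prompt shown below using plain Python string formatting (not OpenCompass internals), with `ex1`–`ex3` as defined above:

```python
ice_template = '{question}\n{answer}'
prompt_template = 'Solve the following questions.\n</E>{question}\n{answer}'
ice_token = '</E>'

# Each in-context example ends with "\n" so the examples stack cleanly.
ice = ''.join(ice_template.format(**ex) + '\n' for ex in (ex1, ex2))
# The test sample's answer is masked, as it sits in reader_cfg.output_column.
prompt = prompt_template.replace(ice_token, ice).format(
    question=ex3['question'], answer='')
print(prompt)
```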
Next, let's take a look at the actual effects of the different prompt construction methods:

`````{tabs}
````{group-tab} String-based Prompt

The template configurations are as follows:

```python
infer_cfg=dict(
    ice_template=dict(
        type=PromptTemplate,
        template='{question}\n{answer}',
    ),
    prompt_template=dict(
        type=PromptTemplate,
        template='Solve the following questions.\n</E>{question}\n{answer}',
        ice_token='</E>',
    ),
)
```

The resulting string is as follows:

```text
Solve the following questions.
2+2=?
4
3+3=?
6
1+1=?

```

````
````{group-tab} Dialogue-Based Prompt

The template configurations are as follows:

```python
infer_cfg=dict(
    ice_template=dict(
        type=PromptTemplate,
        template=dict(
            round=[
                dict(role="HUMAN", prompt="{question}"),
                dict(role="BOT", prompt="{answer}"),
            ]
        )
    ),
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(
            begin=[
                dict(role='SYSTEM', fallback_role='HUMAN', prompt='Solve the following questions.'),
                '</E>',
            ],
            round=[
                dict(role="HUMAN", prompt="{question}"),
                dict(role="BOT", prompt="{answer}"),
            ],
        ),
        ice_token='</E>',
    )
)
```

The intermediate result obtained by OpenCompass after filling the data into the templates is:

```python
PromptList([
    dict(role='SYSTEM', fallback_role='HUMAN', prompt='Solve the following questions.'),
    dict(role='HUMAN', prompt='2+2=?'),
    dict(role='BOT', prompt='4'),
    dict(role='HUMAN', prompt='3+3=?'),
    dict(role='BOT', prompt='6'),
    dict(role='HUMAN', prompt='1+1=?'),
    dict(role='BOT', prompt=''),
])
```

````

`````
### Abbreviated Usage

It is worth noting that, to keep configuration files simple, the `prompt_template` field can be omitted. When it is omitted, the `ice_template` is also used as the `prompt_template` to assemble the complete prompt. The following two `infer_cfg`s are equivalent:
<table class="docutils">
<thead>
<tr>
<th>Complete Form</th>
<th>Abbreviated Form</th>
</tr>
</thead>
<tbody>
<tr>
<td>

```python
infer_cfg=dict(
    ice_template=dict(
        type=PromptTemplate,
        template="Q: {question}\nA: {answer}",
    ),
    prompt_template=dict(
        type=PromptTemplate,
        template="</E>Q: {question}\nA: {answer}",
        ice_token="</E>",
    ),
    # ...
)
```

</td>
<td>

```python
infer_cfg=dict(
    ice_template=dict(
        type=PromptTemplate,
        template="</E>Q: {question}\nA: {answer}",
        ice_token="</E>",
    ),
    # ...
)
```

</td>
</tr>
</tbody>
</table>
More generally, even in the case of 0-shot learning (i.e., when the `retriever` is `ZeroRetriever`), this mechanism still applies. Therefore, the following configuration is also valid:

```python
datasets = [
    dict(
        infer_cfg=dict(
            ice_template=dict(
                type=PromptTemplate,
                template="Q: {question}\nA: {answer}",
            ),
            retriever=dict(type=ZeroRetriever),
            inferencer=dict(type=GenInferencer),
        )
    ),
]
```
## Usage Suggestion

It is suggested to use the [Prompt Viewer](../tools.md) tool to visualize the completed prompts, confirm the correctness of the templates, and ensure that the results meet expectations.
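When run from the repository root, the invocation is typically of the form `python tools/prompt_viewer.py <your_config>.py`, where the config path is a placeholder; see the linked tools page for the exact options.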