Mirror of https://github.com/open-compass/opencompass.git
Commit 18ace3d549 (parent 7f8eee4725)
@ -1,3 +1,4 @@
from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM
@ -1,3 +1,255 @@
# Meta Template

## Background
In the Supervised Fine-Tuning (SFT) of Large Language Models (LLMs), we often inject predefined strings into the conversation according to actual requirements, so that the model outputs content following certain guidelines. For example, when fine-tuning some `chat` models, we may add system-level instructions at the beginning of each dialogue and establish a fixed format for the conversation between the user and the model. In a conversation, the model may expect text in the following format:
```bash
Meta instruction: You are now a helpful and harmless AI assistant.
HUMAN: Hi!<eoh>\n
Bot: Hello! How may I assist you?<eob>\n
```
During evaluation, we also need to input questions in the agreed format for the model to perform at its best.

In addition, a similar situation exists for API models. API dialogue models generally allow users to pass in the conversation history when calling them, and some also accept SYSTEM-level instructions. To better evaluate the capabilities of API models, we want the data to stay as close as possible to each API model's own multi-round dialogue template during evaluation, rather than stuffing all the content into a single instruction.

Therefore, we need to specify different parsing templates for different models. In OpenCompass, we call this set of parsing templates the **Meta Template**. The meta template is tied to the model's configuration and is combined with the dataset's dialogue template at runtime to produce the prompt best suited to the current model.
```python
# When specifying, just pass the meta_template field into the model
models = [
    dict(
        type='AnyModel',
        meta_template = ...,  # meta template
    )
]
```

Next, we will introduce how to configure the meta template for the two types of models.
This article mainly introduces the usage of the meta template. If you need to debug the prompt, it is recommended to use the `tools/prompt_viewer.py` script to preview the actual prompt received by the model after preparing the configuration file. Read [here](../tools.md#prompt-viewer) for more.

```{note}
In some cases (such as testing a base model), we don't need to inject any instructions into the normal dialogue, in which case we can leave the meta template empty. In this case, the prompt received by the model is defined only by the dataset configuration and is a regular string. If the dataset configuration uses a dialogue template, speeches from different roles will be concatenated with \n.
```
## Application on Language Models

The following figure shows, in the 2-shot learning case, several ways the data from the dataset is built into a prompt through the prompt template and the meta template. Readers can use this figure as a reference to help understand the following sections.



We will explain how to define the meta template with several examples.

Suppose that, according to the dataset's dialogue template, the following dialogue was produced:
```Plain
HUMAN: 1+1=?
BOT: 2
HUMAN: 2+2=?
BOT: 4
```

We want to pass this dialogue to a model that has already gone through SFT. In the model's agreed dialogue format, each role's speech begins with `<Role Name>: ` and ends with a special token and \n. Here is the complete string the model expects to receive:

```Plain
<HUMAN>: 1+1=?<eoh>
<BOT>: 2<eob>
<HUMAN>: 2+2=?<eoh>
<BOT>: 4<eob>
```
In the meta template, we only need to abstract the format of each round of dialogue into the following configuration:

```python
# model meta template
meta_template = dict(
    round=[
        dict(role='HUMAN', begin='<HUMAN>: ', end='<eoh>\n'),
        dict(role='BOT', begin='<BOT>: ', end='<eob>\n'),
    ],
)
```
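To make the combination concrete, here is a minimal, hypothetical sketch (not OpenCompass's internal implementation) of how such a `round` configuration turns the dataset dialogue above into the final string:

```python
# Hypothetical sketch: wrap each dataset turn with the begin/end strings of its role.
meta_template = dict(
    round=[
        dict(role='HUMAN', begin='<HUMAN>: ', end='<eoh>\n'),
        dict(role='BOT', begin='<BOT>: ', end='<eob>\n'),
    ],
)

dialogue = [
    dict(role='HUMAN', prompt='1+1=?'),
    dict(role='BOT', prompt='2'),
    dict(role='HUMAN', prompt='2+2=?'),
    dict(role='BOT', prompt='4'),
]

def render(dialogue, meta_template):
    """Concatenate begin + prompt + end for every turn, per its role's format."""
    fmt = {item['role']: item for item in meta_template['round']}
    parts = []
    for turn in dialogue:
        role = fmt[turn['role']]
        parts.append(role.get('begin', '') + turn['prompt'] + role.get('end', ''))
    return ''.join(parts)

print(render(dialogue, meta_template))
# <HUMAN>: 1+1=?<eoh>
# <BOT>: 2<eob>
# <HUMAN>: 2+2=?<eoh>
# <BOT>: 4<eob>
```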
______________________________________________________________________

Some datasets may introduce SYSTEM-level roles:

```
SYSTEM: Solve the following math questions
HUMAN: 1+1=?
BOT: 2
HUMAN: 2+2=?
BOT: 4
```

Assuming the model also accepts the SYSTEM role, and expects the input to be:
```
<SYSTEM>: Solve the following math questions<eosys>\n
<HUMAN>: 1+1=?<eoh>\n
<BOT>: 2<eob>\n
<HUMAN>: 2+2=?<eoh>\n
<BOT>: 4<eob>\n
end of conversation
```

We can put the definition of the SYSTEM role into `reserved_roles`. Roles in `reserved_roles` do not appear in regular conversations, but the dataset configuration's dialogue template can call them in its `begin` or `end`.
```python
# model meta template
meta_template = dict(
    round=[
        dict(role='HUMAN', begin='<HUMAN>: ', end='<eoh>\n'),
        dict(role='BOT', begin='<BOT>: ', end='<eob>\n'),
    ],
    reserved_roles=[dict(role='SYSTEM', begin='<SYSTEM>: ', end='<eosys>\n'),],
)
```
If the model does not accept the SYSTEM role, this item does not need to be configured, and everything still runs normally. In this case, the string received by the model becomes:

```
<HUMAN>: Solve the following math questions<eoh>\n
<HUMAN>: 1+1=?<eoh>\n
<BOT>: 2<eob>\n
<HUMAN>: 2+2=?<eoh>\n
<BOT>: 4<eob>\n
end of conversation
```
This is because, in the datasets predefined in OpenCompass, each `SYSTEM` speech has `fallback_role='HUMAN'`: if the `SYSTEM` role does not exist in the meta template, the speaker is switched to the `HUMAN` role, as illustrated by the sketch below.
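For illustration, a dataset-side prompt template can declare such a SYSTEM speech like the hypothetical sketch below (placeholder syntax and exact field names vary across OpenCompass versions, so treat it as an outline rather than a verbatim config):

```python
# Hypothetical dataset prompt template (illustrative only).
prompt_template = dict(
    type='PromptTemplate',
    template=dict(
        begin=[
            # If the model's meta template has no SYSTEM role in reserved_roles,
            # this speech falls back to the HUMAN role.
            dict(role='SYSTEM', fallback_role='HUMAN',
                 prompt='Solve the following math questions'),
        ],
        round=[
            dict(role='HUMAN', prompt='{question}'),
            dict(role='BOT', prompt='{answer}'),
        ],
    ),
)
```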
______________________________________________________________________

Some models may need to consider embedding other strings at the beginning or end of the conversation, such as system instructions:
```
Meta instruction: You are now a helpful and harmless AI assistant.
<SYSTEM>: Solve the following math questions<eosys>\n
<HUMAN>: 1+1=?<eoh>\n
<BOT>: 2<eob>\n
<HUMAN>: 2+2=?<eoh>\n
<BOT>: 4<eob>\n
end of conversation
```

In this case, we can add these strings through the `begin` and `end` parameters:
```python
meta_template = dict(
    round=[
        dict(role='HUMAN', begin='<HUMAN>: ', end='<eoh>\n'),
        dict(role='BOT', begin='<BOT>: ', end='<eob>\n'),
    ],
    reserved_roles=[dict(role='SYSTEM', begin='<SYSTEM>: ', end='<eosys>\n'),],
    begin="Meta instruction: You are now a helpful and harmless AI assistant.",
    end="end of conversation",
)
```
______________________________________________________________________

In **generative** task evaluation, we do not feed the answer to the model directly. Instead, we truncate the prompt, keeping the preceding text while leaving the model's answer blank:
```
Meta instruction: You are now a helpful and harmless AI assistant.
<SYSTEM>: Solve the following math questions<eosys>\n
<HUMAN>: 1+1=?<eoh>\n
<BOT>: 2<eob>\n
<HUMAN>: 2+2=?<eoh>\n
<BOT>: 
```

We only need to set the `generate` field in BOT's configuration to True, and OpenCompass will automatically leave the last utterance of BOT blank:
```python
# model meta template
meta_template = dict(
    round=[
        dict(role='HUMAN', begin='<HUMAN>: ', end='<eoh>\n'),
        dict(role='BOT', begin='<BOT>: ', end='<eob>\n', generate=True),
    ],
    reserved_roles=[dict(role='SYSTEM', begin='<SYSTEM>: ', end='<eosys>\n'),],
    begin="Meta instruction: You are now a helpful and harmless AI assistant.",
    end="end of conversation",
)
```
Note that `generate` only affects generative inference. When performing discriminative inference, the prompt received by the model is still complete.

### Full Definition
```python
models = [
    dict(meta_template = dict(
            begin="Meta instruction: You are now a helpful and harmless AI assistant.",
            round=[
                dict(role='HUMAN', begin='HUMAN: ', end='<eoh>\n'),  # begin and end can be a list of strings or integers.
                dict(role='THOUGHTS', begin='THOUGHTS: ', end='<eot>\n', prompt='None'),  # Here we can set the default prompt, which may be overridden by the specific dataset
                dict(role='BOT', begin='BOT: ', generate=True, end='<eob>\n'),
            ],
            end="end of conversation",
            reserved_roles=[dict(role='SYSTEM', begin='SYSTEM: ', end='\n'),],
            eos_token_id=10000,
        ),
    )
]
```
The `meta_template` is a dictionary that can contain the following fields:

- `begin`, `end`: (str, optional) The beginning and ending of the prompt, typically some system-level instructions.

- `round`: (list) The template format of each round of dialogue. The content of each round's prompt is controlled by the dialogue template configured in the dataset.

- `reserved_roles`: (list, optional) Roles that do not appear in `round` but may be used in the dataset configuration, such as the `SYSTEM` role.

- `eos_token_id`: (int, optional) The ID of the model's eos token. If not set, it defaults to the eos token id of the tokenizer. Its main role is to trim the model's output in generative tasks, so it should generally be set to the first token id of the `end` string of the item with `generate=True`. See the sketch after this list for one way to look it up.
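As an illustration only (this is not part of the OpenCompass config itself, and it assumes a HuggingFace tokenizer), the first token id of the generating role's `end` string could be looked up like this:

```python
# Hypothetical helper: look up the token id to use as eos_token_id for a model
# whose BOT turns end with '<eob>\n'.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('huggyllama/llama-7b')  # example model
end_ids = tokenizer('<eob>\n', add_special_tokens=False).input_ids
eos_token_id = end_ids[0]  # first token id of the end string
# For a model actually fine-tuned with <eob> as a single special token, this
# returns that token's id; otherwise it returns the id of the first sub-token.
print(eos_token_id)
```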
The `round` of the `meta_template` specifies the format of each role's speech in a round of dialogue. It accepts a list of dictionaries, each with the following keys:

- `role` (str): The name of the role participating in the dialogue. This string does not affect the actual prompt.

- `begin`, `end` (str): The fixed beginning or end of this role's speech.

- `prompt` (str): The role's prompt. It may be left blank in the meta template, but in that case it must be specified in the prompt of the dataset configuration.

- `generate` (bool): When set to True, this role is the one the model plays. In generation tasks, the prompt received by the model is cut off at this role's `begin`, and the remaining content is filled in by the model.
## Application to API Models

The meta template of an API model is similar to that of a general model, but its configuration is simpler. Depending on their needs, users can directly use one of the two configurations below to evaluate an API model in a multi-turn dialogue manner:
```python
# If the API model does not support system instructions
meta_template=dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True)
    ],
)

# If the API model supports system instructions
meta_template=dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True)
    ],
    reserved_roles=[
        dict(role='SYSTEM', api_role='SYSTEM'),
    ],
)
```
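For context, such a meta template is then attached to the API model's entry in `models`, in the same way as for the `AnyModel` placeholder earlier. The sketch below is purely illustrative: `AnyAPIModel` is a placeholder type, and the exact parameters depend on the concrete API model class you use.

```python
# Hypothetical API model entry (sketch; not a real OpenCompass class).
api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
)

models = [
    dict(
        type='AnyAPIModel',       # placeholder type name
        abbr='my-api-model',      # abbreviation used in result display
        max_out_len=100,
        batch_size=8,
        meta_template=api_meta_template,
    )
]
```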
### Principle

Even though different API models accept different data structures, there are commonalities overall. Interfaces that accept a dialogue history generally allow users to pass in prompts from the following three roles:

- User
- Bot
- System (optional)
In this regard, OpenCompass presets three `api_role` values for API models: `HUMAN`, `BOT`, and `SYSTEM`. It also stipulates that, in addition to regular strings, the input accepted by API models includes an intermediate dialogue format represented by a `PromptList`. The API model repackages the dialogue in a multi-turn format and sends it to the backend. To activate this feature, however, users need to use the meta template above to map each `role` in the dataset prompt template to the corresponding `api_role`. The following figure illustrates the relationship between the input accepted by the API model, the prompt template, and the meta template.


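As a rough illustration of the idea (the exact `PromptList` item fields are an assumption here; check the OpenCompass source for the authoritative structure), an API wrapper conceptually performs a conversion like the following:

```python
# Hypothetical sketch: mapping a PromptList-style dialogue to an OpenAI-style
# messages list. Field names are illustrative, not OpenCompass's actual API.
prompt_list = [
    dict(role='SYSTEM', prompt='Solve the following math questions'),
    dict(role='HUMAN', prompt='1+1=?'),
    dict(role='BOT', prompt=''),  # left blank for the model to generate
]

api_role_map = {'SYSTEM': 'system', 'HUMAN': 'user', 'BOT': 'assistant'}
messages = [
    {'role': api_role_map[item['role']], 'content': item['prompt']}
    for item in prompt_list
    if item['prompt']  # the blank generating turn is not sent
]
print(messages)
```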
@ -1 +1,59 @@
# Useful Tools

## Prompt Viewer
This tool allows you to directly view the generated prompt without starting the full training process. If the configuration passed in is only a dataset configuration (such as `configs/datasets/nq/nq_gen.py`), it displays the original prompt defined in the dataset configuration. If it is a complete evaluation configuration (including the model and the dataset), it displays the prompt actually received by the selected model at runtime.

Running method:
```bash
python tools/prompt_viewer.py CONFIG_PATH [-n] [-a] [-p PATTERN]
```
- `-n`: Do not enter interactive mode; select the first model (if any) and the first dataset by default.
- `-a`: View the prompts received by every model and dataset combination in the configuration.
- `-p PATTERN`: Do not enter interactive mode; select all datasets that match the given regular expression.
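For example, to preview the prompts of the demo configuration non-interactively (assuming a `configs/eval_demo.py` exists in your checkout, as referenced later in this document):

```bash
python tools/prompt_viewer.py configs/eval_demo.py -n
```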
## Case Analyzer (To be updated)

Based on existing evaluation results, this tool produces inference error samples and full samples with annotation information.

Running method:
```bash
python tools/case_analyzer.py CONFIG_PATH [-w WORK_DIR]
```

- `-w`: Work path, default is `'./outputs/default'`.
## Lark Bot

Users can configure the Lark bot to monitor task status in real time. Please refer to [this document](https://open.feishu.cn/document/ukTMukTMukTM/ucTM5YjL3ETO24yNxkjN?lang=zh-CN#7a28964d) for setting up the Lark bot.

Configuration method:
- Open the `configs/secrets.py` file, and add the following line to it:

```python
lark_bot_url = 'YOUR_WEBHOOK_URL'
```

- Normally, the webhook URL looks like https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxxxxxxxxxx.

- Inherit this file in the complete evaluation configuration (see the sketch after this list).
- To avoid the bot sending messages too frequently and causing disturbance, the running status is not reported automatically by default. If needed, you can enable status reporting with `-l` or `--lark`:

```bash
python run.py configs/eval_demo.py -l
```
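A minimal sketch of the inheritance step mentioned above, assuming `configs/secrets.py` sits next to your evaluation config (the relative import path is an assumption; adjust it to your layout):

```python
# In your evaluation config, e.g. configs/eval_demo.py (sketch).
from mmengine.config import read_base

with read_base():
    from .secrets import lark_bot_url  # brings the webhook URL into this config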
## API Model Tester

This tool can quickly test whether an API model is functioning normally.

Running method:

```bash
python tools/test_api_model.py [CONFIG_PATH] -n
```
@ -1,11 +1,11 @@
# Learn About Config

OpenCompass uses the OpenMMLab modern style configuration files. If you are familiar with the OpenMMLab style configuration files, you can directly refer to [A Pure Python style Configuration File (Beta)](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#a-pure-python-style-configuration-file-beta) to understand the differences between the new-style and original configuration files. If you have not encountered OpenMMLab style configuration files before, we will explain the usage of configuration files using a simple example. Make sure you have installed the latest version of MMEngine to support the new-style configuration files.
## Basic Format
@ -36,8 +36,8 @@ When reading the configuration file, use `Config.fromfile` from MMEngine for par
```python
>>> from mmengine.config import Config
>>> cfg = Config.fromfile('./model_cfg.py')
>>> print(cfg.models[0].type)
<class 'opencompass.models.huggingface.HuggingFaceCausalLM'>
>>> print(cfg.models[0])
{'type': HuggingFaceCausalLM, 'path': 'huggyllama/llama-7b', 'model_kwargs': {'device_map': 'auto'}, ...}
```
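For reference, a `model_cfg.py` consistent with the output above might look like the following. This is a sketch assembled only from fields shown elsewhere in these docs; it is not the verbatim file.

```python
from opencompass.models import HuggingFaceCausalLM

models = [
    dict(
        type=HuggingFaceCausalLM,
        path='huggyllama/llama-7b',
        tokenizer_path='huggyllama/llama-7b',
        tokenizer_kwargs=dict(padding_side='left', truncation_side='left'),
        model_kwargs=dict(device_map='auto'),
        max_seq_len=2048,
        abbr='llama-7b',   # model abbreviation used in result display
        max_out_len=100,   # maximum number of generated tokens
        batch_size=16,
    )
]
```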
## Inheritance Mechanism
@ -58,8 +58,8 @@ Parse the configuration file using `Config.fromfile`:
```python
>>> from mmengine.config import Config
>>> cfg = Config.fromfile('./inherit.py')
>>> print(cfg.models[0].type)
<class 'opencompass.models.huggingface.HuggingFaceCausalLM'>
>>> print(cfg.models[0])
{'type': HuggingFaceCausalLM, 'path': 'huggyllama/llama-7b', 'model_kwargs': {'device_map': 'auto'}, ...}
```
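The `inherit.py` parsed here is elided from this diff. A minimal sketch of what such a file typically looks like, assuming the model config above is importable as `.model_cfg` (the relative module path is an assumption):

```python
from mmengine.config import read_base

with read_base():
    # Re-use the models defined in model_cfg.py.
    from .model_cfg import models
```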
## Evaluation Configuration Example
@ -1,163 +1,256 @@
# Meta Template

## Background

In the Supervised Fine-Tuning (SFT) of LLMs, we often inject predefined strings into the dialogue according to actual requirements, so that the model outputs content following certain guidelines. For example, when fine-tuning some `chat` models, we may add system-level instructions at the beginning of each dialogue and establish a fixed format for the conversation between the user and the model. In a conversation, the model may expect text in the following format:

```Bash
Meta instruction: You are now a helpful and harmless AI assistant.
HUMAN: Hi!<eoh>\n
Bot: Hello! How may I assist you?<eob>\n
```

During evaluation, we also need to input questions in the agreed format for the model to perform at its best.

In addition, a similar situation exists for API models. API dialogue models generally allow users to pass in the conversation history when calling them, and some also accept SYSTEM-level instructions. To better evaluate the capabilities of API models, we want the data to stay as close as possible to each API model's own multi-round dialogue template during evaluation, rather than stuffing all the content into a single instruction.

Therefore, we need to specify different parsing templates for different models. In OpenCompass, we call this set of parsing templates the **Meta Template**. The meta template is tied to the model's configuration and is combined with the dataset's dialogue template at runtime to produce the prompt best suited to the current model.

```Python
# When specifying, just pass the meta_template field into the model
models = [
    dict(
        type='AnyModel',
        meta_template = ...,  # meta template
    )
]
```
Next, we will introduce how to configure the meta template for the two types of models.

This article mainly introduces the usage of the meta template. If you need to debug the prompt, it is recommended to use the `tools/prompt_viewer.py` script to preview the actual prompt received by the model after preparing the configuration file. Read [here](../tools.md#prompt-viewer) for more.

```{note}
In some cases (such as testing a base model), we don't need to inject any instructions into the normal dialogue, in which case we can leave the meta template empty. In this case, the prompt received by the model is defined only by the dataset configuration and is a regular string. If the dataset configuration uses a dialogue template, speeches from different roles will be concatenated with \n.
```

## Application on Language Models

The following figure shows, in the 2-shot learning case, several ways the data from the dataset is built into a prompt through the prompt template and the meta template. Readers can use this figure as a reference to help understand the following sections.



We will explain how to define the meta template with several examples.

Suppose that, according to the dataset's dialogue template, the following dialogue was produced:

```Plain
HUMAN: 1+1=?
BOT: 2
HUMAN: 2+2=?
BOT: 4
```

We want to pass this dialogue to a model that has already gone through SFT. In the model's agreed dialogue format, each role's speech begins with `<Role Name>: ` and ends with a special token and \n. Here is the complete string the model expects to receive:

```Plain
<HUMAN>: 1+1=?<eoh>
<BOT>: 2<eob>
<HUMAN>: 2+2=?<eoh>
<BOT>: 4<eob>
```

In the meta template, we only need to abstract the format of each round of dialogue into the following configuration:

```Python
# model meta template
meta_template = dict(
    round=[
        dict(role='HUMAN', begin='<HUMAN>: ', end='<eoh>\n'),
        dict(role='BOT', begin='<BOT>: ', end='<eob>\n'),
    ],
)
```
______________________________________________________________________

Some datasets may introduce SYSTEM-level roles:

```Plain
SYSTEM: Solve the following math questions
HUMAN: 1+1=?
BOT: 2
HUMAN: 2+2=?
BOT: 4
```

Assume the model also accepts the SYSTEM role and expects the input to be:

```Bash
<SYSTEM>: Solve the following math questions<eosys>\n
<HUMAN>: 1+1=?<eoh>\n
<BOT>: 2<eob>\n
<HUMAN>: 2+2=?<eoh>\n
<BOT>: 4<eob>\n
end of conversation
```

We can then put the definition of the SYSTEM role into `reserved_roles`. Roles in `reserved_roles` do not appear in regular conversations, but the dataset configuration's dialogue template is allowed to call them in its `begin` or `end`.

```Python
# model meta template
meta_template = dict(
    round=[
        dict(role='HUMAN', begin='<HUMAN>: ', end='<eoh>\n'),
        dict(role='BOT', begin='<BOT>: ', end='<eob>\n'),
    ],
    reserved_roles=[dict(role='SYSTEM', begin='<SYSTEM>: ', end='<eosys>\n'),],
)
```
If the model does not accept the SYSTEM role, this item does **not** need to be configured, and everything still runs normally. In this case, the string received by the model becomes:

```Plain
<HUMAN>: Solve the following math questions<eoh>\n
<HUMAN>: 1+1=?<eoh>\n
<BOT>: 2<eob>\n
<HUMAN>: 2+2=?<eoh>\n
<BOT>: 4<eob>\n
end of conversation
```

This is because, in the datasets predefined in OpenCompass, each `SYSTEM` speech has `fallback_role='HUMAN'`: if the `SYSTEM` role does not exist in the meta template, the speaker is switched to the `HUMAN` role.

______________________________________________________________________

Some models may also need other strings embedded at the beginning or end of the conversation, such as system instructions:

```Bash
Meta instruction: You are now a helpful and harmless AI assistant.
<SYSTEM>: Solve the following math questions<eosys>\n
<HUMAN>: 1+1=?<eoh>\n
<BOT>: 2<eob>\n
<HUMAN>: 2+2=?<eoh>\n
<BOT>: 4<eob>\n
end of conversation
```

In this case, we can specify these strings through the `begin` and `end` parameters.

```Python
meta_template = dict(
    round=[
        dict(role='HUMAN', begin='<HUMAN>: ', end='<eoh>\n'),
        dict(role='BOT', begin='<BOT>: ', end='<eob>\n'),
    ],
    reserved_roles=[dict(role='SYSTEM', begin='<SYSTEM>: ', end='<eosys>\n'),],
    begin="Meta instruction: You are now a helpful and harmless AI assistant.",
    end="end of conversation",
)
```
______________________________________________________________________

In **generative** task evaluation, we also do not feed the answer to the model directly. Instead, we truncate the prompt, keeping the preceding text while leaving the model's answer blank:

```Bash
Meta instruction: You are now a helpful and harmless AI assistant.
<SYSTEM>: Solve the following math questions<eosys>\n
<HUMAN>: 1+1=?<eoh>\n
<BOT>: 2<eob>\n
<HUMAN>: 2+2=?<eoh>\n
<BOT>: 
```

We only need to set the `generate` field to True in BOT's configuration, and OpenCompass will leave the last BOT utterance for the model to generate:

```Python
meta_template = dict(
    round=[
        dict(role='HUMAN', begin='<HUMAN>: ', end='<eoh>\n'),
        dict(role='BOT', begin='<BOT>: ', end='<eob>\n', generate=True),
    ],
    reserved_roles=[dict(role='SYSTEM', begin='<SYSTEM>: ', end='<eosys>\n'),],
    begin="Meta instruction: You are now a helpful and harmless AI assistant.",
    end="end of conversation",
)
```

Note that `generate` only affects generative inference. When performing discriminative inference, the prompt received by the model is still complete.

### Full Field Description

```Python
models = [
    dict(meta_template = dict(
            begin="Meta instruction: You are now a helpful and harmless AI assistant.",
            round=[
                dict(role='HUMAN', begin='HUMAN: ', end='<eoh>\n'),  # begin and end can be a list of strings or integers.
                dict(role='THOUGHTS', begin='THOUGHTS: ', end='<eot>\n', prompt='None'),  # Here we can set the default prompt, which may be overridden by the specific dataset
                dict(role='BOT', begin='BOT: ', generate=True, end='<eob>\n'),
            ],
            end="end of conversation",
            reserved_roles=[dict(role='SYSTEM', begin='SYSTEM: ', end='\n'),],
            eos_token_id=10000,
        ),
    )
]
```
The `meta_template` is a dictionary that can contain the following fields:

- `begin`, `end`: (str, optional) The beginning and ending of the prompt, typically some system-level instructions.

- `round`: (list) The template format of each round of dialogue. The content of each round's prompt is controlled by the dataset configuration's dialogue template.

- `reserved_roles`: (list, optional) Reserved roles that do not appear in `round` but may be used in the dataset configuration, such as the `SYSTEM` role.

- `eos_token_id`: (int, optional) The ID of the model's eos token. If not set, it defaults to the eos token id of the tokenizer. Its main role is to trim the model's output in generative tasks, so it should generally be set to the first token id of the `end` string of the item with `generate=True`.

The `round` of the `meta_template` specifies the format of each role's speech in a round of dialogue. It accepts a list of dictionaries, each with the following keys:

- `role` (str): The name of the role participating in the dialogue. This string does not affect the actual prompt.

- `begin`, `end` (str): The fixed beginning or end of this role's speech.

- `prompt` (str): The role's prompt. It may be left blank in the meta template, but in that case it must be specified in the prompt of the dataset configuration.

- `generate` (bool): When set to True, this role is the one the model plays. In generation tasks, the prompt received by the model is cut off at this role's `begin`, and the remaining content is filled in by the model.

## Application to API Models

The meta template of an API model is similar to that of a general model, but its configuration is simpler. Depending on the situation, users can directly use one of the two configurations below to evaluate an API model in a multi-turn dialogue manner:

```Python
# If the API model does not support system instructions
meta_template=dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True)
    ],
)

# If the API model supports system instructions
meta_template=dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True)
    ],
    reserved_roles=[
        dict(role='SYSTEM', api_role='SYSTEM'),
    ],
)
```
### Principle

Even though different API models accept different data structures, there are commonalities overall. Interfaces that accept a dialogue history generally allow users to pass in prompts from the following three roles:

- User
- Bot
- System (optional)

Accordingly, OpenCompass presets three `api_role` values for API models: `HUMAN`, `BOT`, and `SYSTEM`. It also stipulates that, in addition to regular strings, the input accepted by API models includes an intermediate dialogue format represented by a `PromptList`. The API model repackages the dialogue in a multi-turn format and sends it to the backend. To activate this feature, however, users need to use the meta template above to map each `role` in the dataset prompt template to the corresponding `api_role`. The following figure illustrates the relationship between the input accepted by the API model, the prompt template, and the meta template.



Next, we make further agreements in the dataset config.

## Dataset: Prompt Template

After the meta template required by a model has been agreed in the model config, the format of the prompt template in the dataset also changes. At the same time, this direction keeps the prompt backward compatible as much as possible.

Before the change, `PromptTemplate` accepted a str or a dict as input. The dict form mapped label strings to the corresponding prompts (str) and was usually used as input to `PPLInferencer`. In essence, the old implementation of `PromptTemplate` had only one way of representing a prompt: `str`.

After the change, the basic form of prompt accepted by the prompt template is extended from str to dict.

The format of this dict is similar to the meta template, and users can likewise specify the `begin`, `end` and `round` keywords:

```Python
mmlu_prompt_template = dict(
    type='PromptTemplate',
    template=dict(
        begin=[dict(role='SYSTEM', fallback_role='HUMAN', prompt='The following are '
                    'multiple choice questions (with answers) about physics.'),
            '</E>',
        ],
        round=[
            dict(role='HUMAN', prompt='</input>\nA. </A>\nB. </B>\nC. </C>\nD. </D>\nAnswer: '),
            dict(role='BOT', prompt='</target>'),
        ],
        end="end of dataset prompt template."
    ),
    column_token_map={
        'input': '</input>',
        'A': '</A>',
        'B': '</B>',
        'C': '</C>',
        'D': '</D>',
        'target': '</target>'
    },
    ice_token='</E>',
)
```

Here, `round` specifies the prompt format of each role in every round of dialogue. It also echoes and completes the configuration in the meta template, so the parameters and rules it accepts are the same as those of `round` in the meta template. **At runtime, the two prompt configurations are merged, and if a field is defined in both places, the definition in the dataset config takes precedence.**

Besides str input, `begin` and `end` also support list input, in which users can combine dicts and strings to bring in system roles. Notice that the example introduces a `fallback_role`: if the role named in `role` cannot be found in the `reserved_roles` of the meta template, it is automatically replaced by the role in `fallback_role`. This feature exists to keep prompt templates as generic as possible.

Combined with the meta template, the finally generated prompt template is:

```Plain
meta instruction
You are an AI assistant.
<|SYSTEM|>: The following are multiple choice questions (with answers) about college biology.
<|HUMAN|>: Which of the following is NOT a characteristic of an oligotrophic lake?
A. Low nutrient levels
B. High altitudes
C. Shallow water
D. Sand or gravel bottom
Answer: 脷\n
<|Inner Thoughts|>: None茔\n
<|Commands|>: None蝮\n
<|Results|>: None兒\n
<|MOSS|>: A氡\n
end of dataset prompt template.
end of conversion
```

In particular, since the data structure of this kind of prompt (a dict) is the same as the old label -> prompt mapping, this implementation decodes the input with the new rules only when the dict's keys are a subset of {`begin`, `round`, `end`}; otherwise, the dict is still decoded as a label -> prompt mapping. In addition, this scheme also allows a new-style prompt dict to be nested inside an old-style label -> prompt dict. For example, the following expression is also legal (taken from `configs/datasets/mmlu.py`):

```Python
prompt_template={
    target:
        dict(
            begin=[dict(role='SYSTEM', fallback_role='HUMAN', prompt='The following are '
                        'multiple choice questions (with answers) about '
                        f'{name.replace("_", " ")}.\n'),
                '</E>',
            ],
            round=[
                dict(role='HUMAN', prompt='</input>\nA. </A>\nB. </B>\nC. </C>\nD. </D>\nAnswer: '),
                dict(role='BOT', prompt=f'{target}'),
            ]
        )
    for target in ['A', 'B', 'C', 'D']  # use the actual answer
}
```

### When No Meta Template Is Specified

To ensure backward compatibility, when the user does not specify a meta template in the model config, `ICLPromptTemplate` concatenates each dict into a regular string in the order `begin`, `prompt`, `end`.

### Multi-Round Dialogue Example

Sometimes a complete interaction may need to contain multiple rounds of dialogue. Users can refer to `configs/datasets/gsm8k.py` to configure their own templates.
@ -2,28 +2,30 @@
## Prompt Viewer

This tool allows you to directly view the generated prompt without starting the full training process. If the configuration passed in is only a dataset configuration (such as `configs/datasets/nq/nq_gen_3dcea1.py`), it displays the original prompt defined in the dataset configuration. If it is a complete evaluation configuration (including the model and the dataset), it displays the prompt actually received by the selected model at runtime.

Running method:

```bash
python tools/prompt_viewer.py CONFIG_PATH [-n] [-a] [-p PATTERN]
```

- `-n`: Do not enter interactive mode; select the first model (if any) and the first dataset by default.
- `-a`: View the prompts received by every model and dataset combination in the configuration.
- `-p PATTERN`: Do not enter interactive mode; select all datasets that match the given regular expression.

## Case Analyzer

Based on existing evaluation results, this tool produces inference error samples and full samples with annotation information.

Running method:

```bash
python tools/case_analyzer.py CONFIG_PATH [-w WORK_DIR]
```

- `-w`: Work path, default is `'./outputs/default'`.

See the [Feishu document](https://aicarrier.feishu.cn/docx/SgrLdwinion00Kxkzh2czz29nIh) for more details.

## Lark Bot

Users can configure the Lark bot to monitor task status in real time. Please refer to [this document](https://open.feishu.cn/document/ukTMukTMukTM/ucTM5YjL3ETO24yNxkjN?lang=zh-CN#7a28964d) for setting up the Lark bot.
@ -52,15 +54,15 @@ python tools/case_analyzer.py [CONFIG_PATH] [-w WORK_DIR]
- To avoid the bot sending messages too frequently and causing disturbance, the running status is not reported automatically by default. If needed, you can enable status reporting with `-l` or `--lark`:

```bash
python run.py configs/eval_demo.py -l
```

## API Model Tester

This tool can quickly test whether an API model is functioning normally.

Running method:

```bash
python tools/test_api_model.py [CONFIG_PATH] -n
```
@ -3,7 +3,7 @@
OpenCompass uses the OpenMMLab new-style configuration files. If you are already familiar with OpenMMLab-style configuration files, you can read [A Pure Python style Configuration File (Beta)](https://mmengine.readthedocs.io/zh_CN/latest/advanced_tutorials/config.html#python-beta) directly to understand the differences between the new-style and the original configuration files. If you have not encountered OpenMMLab-style configuration files before, we will explain how configuration files are used through a simple example below. Make sure you have installed the latest version of MMEngine to support the new-style configuration files.

## Basic Format
@ -28,13 +28,13 @@ models = [
]
```

When reading a configuration file, use `Config.fromfile` from MMEngine to parse it:

```python
>>> from mmengine.config import Config
>>> cfg = Config.fromfile('./model_cfg.py')
>>> print(cfg.models[0].type)
<class 'opencompass.models.huggingface.HuggingFaceCausalLM'>
>>> print(cfg.models[0])
{'type': HuggingFaceCausalLM, 'path': 'huggyllama/llama-7b', 'model_kwargs': {'device_map': 'auto'}, ...}
```

## Inheritance Mechanism
@ -55,8 +55,8 @@ with read_base():
```python
>>> from mmengine.config import Config
>>> cfg = Config.fromfile('./inherit.py')
>>> print(cfg.models[0].type)
<class 'opencompass.models.huggingface.HuggingFaceCausalLM'>
>>> print(cfg.models[0])
{'type': HuggingFaceCausalLM, 'path': 'huggyllama/llama-7b', 'model_kwargs': {'device_map': 'auto'}, ...}
```

## Evaluation Configuration Example
@ -84,7 +84,7 @@ models = [
        tokenizer_path='huggyllama/llama-7b',
        tokenizer_kwargs=dict(padding_side='left', truncation_side='left'),
        max_seq_len=2048,
        # The following parameters must be set for every type of model and are not
        # initialization parameters of HuggingFaceCausalLM
        abbr='llama-7b',      # model abbreviation, used when displaying results
        max_out_len=100,      # maximum number of generated tokens
        batch_size=16,        # batch size
@ -96,7 +96,7 @@ models = [
## Dataset Configuration File Example

In the example configuration file above, we obtained the dataset-related configuration directly through inheritance. Next, we will use the PIQA dataset configuration file as an example to show the meaning of each field in a dataset configuration file. If you do not intend to modify the prompt used for model testing, or to add new datasets, you can skip this section.

The PIQA dataset [configuration file](https://github.com/InternLM/opencompass/blob/main/configs/datasets/piqa/piqa_ppl_1cf9f0.py)
@ -147,6 +147,7 @@ piqa_datasets = [
        reader_cfg=piqa_reader_cfg,
        infer_cfg=piqa_infer_cfg,
        eval_cfg=piqa_eval_cfg)
]
```

For the detailed configuration of the **prompt generation configuration**, see [Prompt Template](../prompt/prompt_template.md).