This commit is contained in:
gaotongxiao 2023-07-06 13:50:44 +08:00
parent e50ea4f312
commit cbaa1bc8f9
2 changed files with 260 additions and 2 deletions

# Efficient Evaluation
OpenCompass supports custom task partitioners (`Partitioner`), which enable flexible division of evaluation tasks. In conjunction with `Runner`, which controls the platform for task execution, such as a local machine or a cluster, OpenCompass can distribute large evaluation tasks to a vast number of computing nodes. This helps utilize computational resources efficiently and significantly accelerates the evaluation process.
## Task Division (Partitioner)
Due to the long inference time of large language models and the vast amount of evaluation datasets, serial execution of a single evaluation task can be quite time-consuming. OpenCompass allows custom task partitioners (`Partitioner`) to divide large evaluation tasks into numerous independent smaller tasks, thus fully utilizing computational resources via parallel execution. Users can configure the task partitioning strategies for the inference and evaluation stages via `infer.partitioner` and `eval.partitioner`. Below, we will introduce all the partitioning strategies supported by OpenCompass.
### `NaivePartitioner`
This partitioner dispatches each combination of a model and dataset as an independent task. It is the most basic partitioning strategy and does not have any additional parameters.
```python
from opencompass.partitioners import NaivePartitioner
infer = dict(
    partitioner=dict(type=NaivePartitioner),
    # ...
)
```
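To illustrate the idea, here is a minimal sketch of one-task-per-combination partitioning. This is not the actual OpenCompass implementation; `naive_partition` and the model/dataset names are hypothetical:

```python
from itertools import product

def naive_partition(models, datasets):
    # One independent task per (model, dataset) combination.
    return [{"model": m, "dataset": d} for m, d in product(models, datasets)]

# 2 models x 3 datasets -> 6 independent tasks
tasks = naive_partition(["model-a", "model-b"], ["gsm8k", "mmlu", "ceval"])
```

Since every combination becomes its own task, the task count grows multiplicatively with the number of models and datasets.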
### `SizePartitioner`
```{warning}
This partitioner is not suitable for evaluation-stage tasks (`OpenICLEvalTask`).
```
This partitioner estimates the inference cost (time) of a dataset according to its size, multiplied by an expansion coefficient. It then creates tasks by splitting larger datasets and merging smaller ones to ensure the inference costs of each sub-task are as equal as possible.
The commonly used parameters for this partitioner are as follows:
```python
from opencompass.partitioners import SizePartitioner
infer = dict(
    partitioner=dict(
        type=SizePartitioner,
        max_task_size=2000,  # Maximum size of a single task
        gen_task_coef=20,    # Expansion coefficient for generative tasks
    ),
    # ...
)
```
`SizePartitioner` estimates the inference cost of a dataset based on the type of the inference task and selects different expansion coefficients accordingly. For generative tasks, such as those using `GenInferencer`, a larger `gen_task_coef` is set; for discriminative tasks, like those using `PPLInferencer`, the number of labels in the prompt is used.
```{note}
Currently, this partitioning strategy is still rather crude and does not accurately reflect the computational difference between generative and discriminative tasks. We look forward to the community proposing better partitioning strategies :)
```
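The split-and-merge behavior described above can be sketched as follows. This is a simplified illustration under assumed semantics; `size_partition` and its chunking details are hypothetical, not OpenCompass internals:

```python
def size_partition(datasets, max_task_size=2000, gen_task_coef=20):
    """datasets: list of (name, num_samples) pairs. Returns a list of
    tasks, each a list of dataset slices with roughly equal cost."""
    pieces = []
    for name, length in datasets:
        cost = length * gen_task_coef          # estimated inference cost
        n_chunks = -(-cost // max_task_size)   # ceil: split large datasets
        chunk = -(-length // n_chunks)
        for start in range(0, length, chunk):
            end = min(start + chunk, length)
            pieces.append((f"{name}[{start}:{end}]",
                           (end - start) * gen_task_coef))
    # Merge small pieces greedily so each task stays under max_task_size.
    tasks, current, current_cost = [], [], 0
    for name, cost in sorted(pieces, key=lambda p: -p[1]):
        if current and current_cost + cost > max_task_size:
            tasks.append(current)
            current, current_cost = [], 0
        current.append(name)
        current_cost += cost
    if current:
        tasks.append(current)
    return tasks
```

For example, a 300-sample generative dataset has an estimated cost of 6000, so it is split into three slices of cost 2000 each, while several small datasets are packed together into a single task.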
## Execution Backend (Runner)
In a multi-GPU, multi-node cluster environment, running multiple tasks in parallel usually requires a cluster management system (such as Slurm) for task allocation and scheduling. In OpenCompass, task allocation and execution are handled uniformly by the `Runner`. Two scheduling backends are currently supported, Slurm and PAI-DLC, and a `LocalRunner` is also provided to launch tasks directly on the local machine.
### `LocalRunner`
`LocalRunner` is the most basic runner; it runs tasks serially on the local machine.
```python
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask
infer = dict(
    # ...
    runner=dict(
        type=LocalRunner,
        task=dict(type=OpenICLInferTask),  # Task to be run
    ),
)
```
In the future, we plan to enhance the capabilities of `LocalRunner` to effectively utilize multi-card resources on a single machine.
### `SlurmRunner`
`SlurmRunner` submits tasks to run on the Slurm cluster. The commonly used configuration fields are as follows:
```python
from opencompass.runners import SlurmRunner
from opencompass.tasks import OpenICLInferTask
infer = dict(
    # ...
    runner=dict(
        type=SlurmRunner,
        task=dict(type=OpenICLInferTask),  # Task to be run
        max_num_workers=16,  # Maximum number of concurrently running tasks
        partition='lm',      # Slurm partition to run tasks on
        quotatype='auto',    # Task priority (supported only by some Slurm deployments; may be left unset)
        retry=2,             # Number of retries for failed tasks; guards against transient errors
    ),
)
```
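Conceptually, `max_num_workers` and `retry` behave like a bounded worker pool with resubmission on failure. A rough sketch of that contract in plain Python (not the `SlurmRunner` internals; `run_with_retry` and `launch` are hypothetical names):

```python
import concurrent.futures

def run_with_retry(task_fn, task, retry=2):
    # Re-submit a failed task up to `retry` extra times, mirroring the
    # `retry` field, which guards against transient cluster errors.
    for attempt in range(retry + 1):
        try:
            return task_fn(task)
        except Exception:
            if attempt == retry:
                raise

def launch(tasks, task_fn, max_num_workers=16, retry=2):
    # Cap concurrency at max_num_workers, as the runner caps the number
    # of simultaneously running evaluation tasks.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_num_workers) as pool:
        futures = [pool.submit(run_with_retry, task_fn, t, retry) for t in tasks]
        return [f.result() for f in futures]
```

A task that fails more than `retry` times raises its last error, so persistent failures still surface instead of being silently swallowed.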
### `DLCRunner`
`DLCRunner` submits tasks to run on Alibaba Cloud's Deep Learning Containers (PAI-DLC). This runner depends on the `dlc` command-line tool, which must first be set up in the environment:
```bash
cd ~
wget https://dlc-cli.oss-cn-zhangjiakou.aliyuncs.com/light/binary/linux/amd64/dlc
chmod +x ./dlc
sudo ln -rs dlc /usr/local/bin
./dlc config
```
Fill in the required information at the prompts to generate the `dlc` configuration file (e.g. `/user/.dlc/config`); this completes the preparation. Then specify the `DLCRunner` configuration in the OpenCompass config file in the following format:
```python
from opencompass.runners import DLCRunner
from opencompass.tasks import OpenICLInferTask
infer = dict(
    # ...
    runner=dict(
        type=DLCRunner,
        task=dict(type=OpenICLInferTask),  # Task to be run
        max_num_workers=16,  # Maximum number of concurrently running tasks
        aliyun_cfg=dict(
            bashrc_path="/user/.bashrc",  # Path to the bashrc used to initialize the running environment
            conda_env_name='opencompass',  # Conda environment for OpenCompass
            dlc_config_path="/user/.dlc/config",  # Configuration file for dlc
            workspace_id='ws-xxx',  # DLC workspace ID
            worker_image='xxx',     # Image URL for running tasks
        ),
        retry=2,  # Number of retries for failed tasks; guards against transient errors
    ),
)
```
