
# Task Execution and Monitoring

## Launching an Evaluation Task

The program entry for the evaluation task is `run.py`. The usage is as follows:

```bash
python run.py $EXP {--slurm | --dlc | None} [-p PARTITION] [-q QUOTATYPE] [--debug] [-m MODE] [-r [REUSE]] [-w WORKDIR] [-l] [--dry-run]
```

Task Configuration (`$EXP`):

- `run.py` accepts a `.py` configuration file as the task-related parameter, which must include the `datasets` and `models` fields (a minimal sketch of such a configuration file follows this list).

  ```bash
  python run.py configs/eval_demo.py
  ```

- If no configuration file is provided, users can also specify models and datasets using `--models MODEL1 MODEL2 ...` and `--datasets DATASET1 DATASET2 ...`:

  ```bash
  python run.py --models hf_opt_350m hf_opt_125m --datasets siqa_gen winograd_ppl
  ```

- For HuggingFace related models, users can also define a model quickly on the command line through HuggingFace parameters and then specify datasets using `--datasets DATASET1 DATASET2 ...`.

  ```bash
  python run.py --datasets siqa_gen winograd_ppl \
  --hf-path huggyllama/llama-7b \  # HuggingFace model path
  --model-kwargs device_map='auto' \  # Parameters for constructing the model
  --tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \  # Parameters for constructing the tokenizer
  --max-out-len 100 \  # Maximum number of generated tokens
  --max-seq-len 2048 \  # Maximum sequence length the model can accept
  --batch-size 8 \  # Batch size
  --no-batch-padding \  # Disable batch padding and infer through a for loop to avoid accuracy loss
  --num-gpus 1  # Number of required GPUs
  ```

  Complete HuggingFace parameter descriptions:

  - `--hf-path`: HuggingFace model path
  - `--peft-path`: PEFT model path
  - `--tokenizer-path`: HuggingFace tokenizer path (if it is the same as the model path, it can be omitted)
  - `--model-kwargs`: Parameters for constructing the model
  - `--tokenizer-kwargs`: Parameters for constructing the tokenizer
  - `--max-out-len`: Maximum number of generated tokens
  - `--max-seq-len`: Maximum sequence length the model can accept
  - `--no-batch-padding`: Disable batch padding and infer through a for loop to avoid accuracy loss
  - `--batch-size`: Batch size
  - `--num-gpus`: Number of GPUs required to run the model
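
For reference, a minimal sketch of such a configuration file might look like the following. The import paths and variable names are placeholders and are not guaranteed to match the configs shipped with your installation; point them at dataset and model config files that exist under your `configs/` directory.

```python
# Minimal sketch of a task configuration file ($EXP).
# Import paths and variable names below are illustrative placeholders.
from mmengine.config import read_base

with read_base():
    from .datasets.siqa.siqa_gen import siqa_datasets              # a list of dataset configs
    from .datasets.winograd.winograd_ppl import winograd_datasets  # a list of dataset configs
    from .models.hf_opt_125m import opt125m                        # a list of model configs

datasets = [*siqa_datasets, *winograd_datasets]  # required field
models = [*opt125m]                              # required field
```
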
Starting Methods:

- Running on local machine: `run.py $EXP`.
- Running with slurm: `run.py $EXP --slurm -p $PARTITION_name`.
- Running with dlc: `run.py $EXP --dlc --aliyun-cfg $AliYun_Cfg`.
- Customized starting: `run.py $EXP`. Here, `$EXP` is the configuration file which includes the `eval` and `infer` fields (see the sketch after this list). For detailed configurations, please refer to Efficient Evaluation.
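
As a rough, hedged sketch of what such a customized configuration can declare: the partitioner/runner classes and their arguments below are assumptions chosen for illustration, and the actually supported options are described in Efficient Evaluation.

```python
# Hedged sketch of custom `infer` / `eval` fields inside $EXP.
# Class choices and arguments are illustrative assumptions; consult the
# Efficient Evaluation documentation for the options your version supports.
from opencompass.partitioners import NaivePartitioner, SizePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLEvalTask, OpenICLInferTask

infer = dict(
    partitioner=dict(type=SizePartitioner, max_task_size=2000),  # split inference work into tasks
    runner=dict(
        type=LocalRunner,
        max_num_workers=8,                 # how many tasks to run in parallel
        task=dict(type=OpenICLInferTask),  # each inference task
    ),
)

eval = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        max_num_workers=8,
        task=dict(type=OpenICLEvalTask),   # each evaluation task
    ),
)
```
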
The parameter explanation is as follows:

- `-p`: Specify the slurm partition;
- `-q`: Specify the slurm quotatype (default is None), with optional values being reserved, auto, spot. This parameter may only be used in some slurm variants;
- `--debug`: When enabled, inference and evaluation tasks will run in single-process mode, and output will be echoed in real time for debugging;
- `-m`: Running mode, default is `all`. It can be specified as `infer` to only run inference and obtain output results; if there are already model outputs in `{WORKDIR}`, it can be specified as `eval` to only run evaluation and obtain evaluation results; if the evaluation results are ready, it can be specified as `viz` to only run visualization, which summarizes the results in tables; if specified as `all`, a full run will be performed, which includes inference, evaluation, and visualization;
- `-r`: Reuse existing inference results and skip the finished tasks. If followed by a timestamp, the results under that timestamp in the workspace path will be reused; otherwise, the latest results in the specified workspace path will be reused;
- `-w`: Specify the working path, default is `./outputs/default`;
- `-l`: Enable status reporting via Lark bot;
- `--dry-run`: When enabled, inference and evaluation tasks will be dispatched but won't actually run, for easier debugging.

Using run mode `-m all` as an example, the overall execution flow is as follows:

- Read the configuration file and parse out the model, dataset, evaluator, and other configuration information.
- The evaluation task mainly consists of three stages: inference `infer`, evaluation `eval`, and visualization `viz`. After task division by the Partitioner, they are handed over to the Runner for parallel execution. Individual inference and evaluation tasks are abstracted into `OpenICLInferTask` and `OpenICLEvalTask` respectively (a toy sketch of this flow follows the list).
- After each stage ends, the visualization stage will read the evaluation results in `results/` to generate a table.
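
For orientation, the toy snippet below mirrors that flow in plain Python. Nothing in it is real OpenCompass code; every class and function is made up purely to illustrate how partitioning, running, and visualization relate.

```python
# Toy illustration of the `-m all` flow: partition -> infer -> eval -> viz.
# All names here are invented for illustration and do not exist in OpenCompass.
from dataclasses import dataclass


@dataclass
class Task:
    model: str
    dataset: str
    kind: str  # "infer" (cf. OpenICLInferTask) or "eval" (cf. OpenICLEvalTask)


def partition(models, datasets, kind):
    """Stand-in for the Partitioner: one task per (model, dataset) pair."""
    return [Task(m, d, kind) for m in models for d in datasets]


def run(tasks):
    """Stand-in for the Runner: would execute the tasks in parallel."""
    for t in tasks:
        print(f"running {t.kind} for {t.model} on {t.dataset}")


models, datasets = ["MODEL1"], ["siqa_gen", "winograd_ppl"]
run(partition(models, datasets, "infer"))  # inference stage writes predictions/
run(partition(models, datasets, "eval"))   # evaluation stage writes results/
# The viz stage then reads results/ and summarizes everything into a table.
```
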
## Task Monitoring: Lark Bot

Users can enable real-time monitoring of task status by setting up a Lark bot. Please refer to this document for setting up the Lark bot.

Configuration method:

- Open the `configs/lark.py` file, and add the following line:

  ```python
  lark_bot_url = 'YOUR_WEBHOOK_URL'
  ```

  Typically, the Webhook URL is formatted like this: https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxxxxxxxxxx .

- Inherit this file in the complete evaluation configuration:

  ```python
  from mmengine.config import read_base

  with read_base():
      from .lark import lark_bot_url
  ```

- To avoid frequent messages from the bot becoming a nuisance, status updates are not reported automatically by default. You can enable status reporting using `-l` or `--lark` when needed:

  ```bash
  python run.py configs/eval_demo.py -p {PARTITION} -l
  ```

## Run Results

All run results will be placed in the `outputs/default/` directory by default. The directory structure is shown below:

```
outputs/default/
├── 20200220_120000
├── ...
├── 20230220_183030
│   ├── configs
│   ├── logs
│   │   ├── eval
│   │   └── infer
│   ├── predictions
│   │   └── MODEL1
│   └── results
│       └── MODEL1
```

Each timestamped folder contains the following content:

- configs folder, which stores the configuration files corresponding to each run with this timestamp as the output directory;
- logs folder, which stores the output log files of the inference and evaluation phases; each phase stores its logs in subfolders by model;
- predictions folder, which stores the inference results in JSON format, with one subfolder per model;
- results folder, which stores the evaluation results in JSON format, with one subfolder per model.

Also, when `-r` is used without specifying a corresponding timestamp, the newest folder (found by sorting the folder names) will be selected as the output directory.
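
As a small illustration, assuming the default work directory and relying on the fact that `YYYYMMDD_HHMMSS` folder names sort lexicographically, that fallback amounts to:

```python
# Illustrative only: pick the newest timestamped run directory by sorting names,
# mirroring what a bare `-r` (no timestamp) falls back to.
from pathlib import Path

workdir = Path("outputs/default")
latest = max(p.name for p in workdir.iterdir() if p.is_dir())
print(latest)  # e.g. 20230220_183030
```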