diff --git a/docs/en/user_guides/datasets.md b/docs/en/user_guides/datasets.md
index 891cb2b9..c392fefb 100644
--- a/docs/en/user_guides/datasets.md
+++ b/docs/en/user_guides/datasets.md
@@ -102,9 +102,11 @@ afqmc_datasets = [
 ```

 Additionally, for binary evaluation metrics (such as accuracy, pass-rate, etc.), you can also set the parameter `k` in conjunction with `n` for [G-Pass@k](http://arxiv.org/abs/2412.13147) evaluation. The formula for G-Pass@$k$ is:
+
 $$
 \text{G-Pass@}k_\tau=\mathbb{E}_{\text{Data}}\left[ \sum_{j=\lceil \tau \cdot k \rceil}^c \frac{{c \choose j} \cdot {n - c \choose k - j}}{{n \choose k}} \right],
 $$
+
 where $n$ is the number of evaluations, and $c$ is the number of times that passed or were correct out of $n$ runs. An example configuration is as follows:

 ```python
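For reference, the hypergeometric sum in the G-Pass@$k_\tau$ formula above can be evaluated directly for a single problem. The sketch below is illustrative only (the function name `g_pass_at_k` is hypothetical, not part of the library); it computes the inner term for one problem's `n` generations with `c` correct ones, before averaging over the dataset.

```python
from math import ceil, comb


def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """Per-problem G-Pass@k_tau: probability that, when sampling k of the
    n generations without replacement, at least ceil(tau * k) of them are
    among the c correct ones (hypergeometric tail)."""
    threshold = ceil(tau * k)
    # Terms with j > k contribute nothing, so clamp the upper bound to
    # keep comb() arguments non-negative.
    return sum(
        comb(c, j) * comb(n - c, k - j)
        for j in range(threshold, min(c, k) + 1)
    ) / comb(n, k)


# With n=4 generations, c=2 correct, and k=2 sampled:
# tau=1.0 requires both samples correct -> C(2,2)*C(2,0)/C(4,2) = 1/6;
# tau=0.5 requires at least one correct -> 5/6.
print(g_pass_at_k(4, 2, 2, 1.0))
print(g_pass_at_k(4, 2, 2, 0.5))
```

The dataset-level score is the mean of this value over all problems, matching the expectation $\mathbb{E}_{\text{Data}}[\cdot]$ in the formula.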