Mirror of https://github.com/open-compass/opencompass.git (synced 2025-05-30 16:03:24 +08:00)
[Docs] add humanevalx dataset link in config (#559)
* [Docs] add humanevalx dataset link in config
* minor fix
This commit is contained in:
parent
32884f2e39
commit
95e0da0173
@@ -44,12 +44,15 @@ humanevalx_eval_cfg_dict = {
         ]  # do not support rust now
 }
 
+# Please download the needed `xx.jsonl.gz` from
+# https://github.com/THUDM/CodeGeeX2/tree/main/benchmark/humanevalx
+# and move them into `data/humanevalx/` folder
 humanevalx_datasets = [
     dict(
         type=HumanevalXDataset,
         abbr=f'humanevalx-{lang}',
         language=lang,
-        path='./backup_data/humanevalx',
+        path='./data/humanevalx',
         reader_cfg=humanevalx_reader_cfg,
         infer_cfg=humanevalx_infer_cfg[lang],
         eval_cfg=humanevalx_eval_cfg_dict[lang])
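The config pattern in the hunk above builds one dataset entry per language, all reading from the shared `./data/humanevalx` folder. A minimal runnable sketch of that pattern follows; `HumanevalXDataset` and the `*_cfg` objects are placeholder stand-ins here, not the real OpenCompass classes:

```python
# Sketch of the humanevalx config pattern: one dict per language.
# The class and cfg objects below are placeholders, not OpenCompass code.

class HumanevalXDataset:  # stand-in for the real dataset class
    pass

langs = ['python', 'cpp', 'go', 'java', 'js']  # rust is not supported
humanevalx_reader_cfg = {}                     # placeholder reader config
humanevalx_infer_cfg = {lang: {} for lang in langs}
humanevalx_eval_cfg_dict = {lang: {} for lang in langs}

humanevalx_datasets = [
    dict(
        type=HumanevalXDataset,
        abbr=f'humanevalx-{lang}',
        language=lang,
        path='./data/humanevalx',
        reader_cfg=humanevalx_reader_cfg,
        infer_cfg=humanevalx_infer_cfg[lang],
        eval_cfg=humanevalx_eval_cfg_dict[lang])
    for lang in langs
]

print([d['abbr'] for d in humanevalx_datasets])
```

Because every entry shares the same `path`, this commit's one-line change (`./backup_data/humanevalx` to `./data/humanevalx`) repoints all five language datasets at once.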
@@ -24,6 +24,9 @@ humanevalx_eval_cfg_dict = {
     for lang in ['python', 'cpp', 'go', 'java', 'js']  # do not support rust now
 }
 
+# Please download the needed `xx.jsonl.gz` from
+# https://github.com/THUDM/CodeGeeX2/tree/main/benchmark/humanevalx
+# and move them into `data/humanevalx/` folder
 humanevalx_datasets = [
     dict(
         type=HumanevalXDataset,
@@ -2,6 +2,10 @@
 
 To complete LLM code capability evaluation, we need to set up an independent evaluation environment to avoid executing erroneous code in the development environment, which could cause unavoidable losses. The Code Evaluation Service currently used in OpenCompass is based on the [code-evaluator](https://github.com/open-compass/code-evaluator.git) project, which already supports evaluating the multi-language dataset [humaneval-x](https://huggingface.co/datasets/THUDM/humaneval-x). The following tutorials introduce how to run the code evaluation service under different requirements.
 
+Dataset [download address](https://github.com/THUDM/CodeGeeX2/tree/main/benchmark/humanevalx). Please download the needed files (xx.jsonl.gz) into the `./data/humanevalx` folder.
+
+Supported languages are `python`, `cpp`, `go`, `java`, `js`.
+
 ## Launching the Code Evaluation Service
 
 1. Ensure you have installed Docker; please refer to the [Docker installation document](https://docs.docker.com/engine/install/).
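Before launching an evaluation, it can help to confirm the downloaded archives are actually present and readable. The sketch below is not part of OpenCompass; it assumes the `humaneval_<lang>.jsonl.gz` naming used in the CodeGeeX2 repository, so adjust the pattern if your copies are named differently:

```python
# Hedged sketch: check that each language's .jsonl.gz file exists in the
# data folder and that its first record parses as JSON. The file name
# pattern is an assumption based on the CodeGeeX2 repo layout.
import gzip
import json
import os

LANGS = ['python', 'cpp', 'go', 'java', 'js']

def missing_humanevalx_files(data_dir='./data/humanevalx', langs=LANGS):
    """Return the languages whose .jsonl.gz file is absent or unreadable."""
    missing = []
    for lang in langs:
        path = os.path.join(data_dir, f'humaneval_{lang}.jsonl.gz')
        try:
            with gzip.open(path, 'rt', encoding='utf-8') as f:
                json.loads(f.readline())  # first line must be valid JSON
        except (OSError, ValueError):
            missing.append(lang)
    return missing
```

Running `missing_humanevalx_files()` before the evaluation surfaces a missing or truncated download immediately, rather than partway through a run inside the evaluation container.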
@@ -2,6 +2,10 @@
 
 To complete LLM code capability evaluation, we need to set up an independent evaluation environment to avoid executing erroneous code in the development environment, which could cause unavoidable losses. The Code Evaluation Service currently used in OpenCompass is based on the [code-evaluator](https://github.com/open-compass/code-evaluator) project, which already supports evaluating the multi-language dataset [humaneval-x](https://huggingface.co/datasets/THUDM/humaneval-x). The following tutorials cover the evaluation workflow around the code evaluation service under different requirements.
 
+Dataset [download address](https://github.com/THUDM/CodeGeeX2/tree/main/benchmark/humanevalx). Please download the files for the languages to be evaluated (xx.jsonl.gz) and place them in the `./data/humanevalx` folder.
+
+Currently supported languages are `python`, `cpp`, `go`, `java`, `js`.
+
 ## Launching the Code Evaluation Service
 
 1. Ensure you have installed Docker; see the [Docker installation document](https://docs.docker.com/engine/install/).