
# Installation
- Set up the OpenCompass environment:
````{tab} Open-source Models with GPU
```bash
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
```
If you want to customize the PyTorch version or related CUDA version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.
````
````{tab} API Models with CPU-only
```bash
conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
conda activate opencompass
# If needed, also install the required packages for API models via `pip install -r requirements/api.txt`.
```
If you want to customize the PyTorch version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.
````
- Install OpenCompass:

```bash
git clone https://github.com/open-compass/opencompass.git
cd opencompass
pip install -e .
```

- Install humaneval (Optional):

If you want to evaluate your model's coding ability on the humaneval dataset, follow this step.

```bash
git clone https://github.com/openai/human-eval.git
cd human-eval
pip install -r requirements.txt
pip install -e .
cd ..
```

Please read the comments in `human_eval/execution.py`, lines 48-57, to understand the potential risks of executing model-generated code. If you accept these risks, uncomment line 58 to enable code execution evaluation.

- Install Llama (Optional):

If you want to evaluate Llama / Llama-2 / Llama-2-chat with its official implementation, follow this step.

```bash
git clone https://github.com/facebookresearch/llama.git
cd llama
pip install -r requirements.txt
pip install -e .
cd ..
```

You can find example configs in `configs/models`.

- Install alpaca-eval (Optional):

If you want to evaluate alpaca-eval in the official way, follow this step.

```bash
pip install alpaca-eval
```
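The environment setup step notes that OpenCompass requires `pytorch>=1.13`. The following is a minimal sketch of how that requirement could be checked from the shell after installation. The helper names `version_ge` and `check_torch` are hypothetical and not part of OpenCompass, and the comparison assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, e.g. from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report whether the PyTorch visible to `python`
# satisfies the minimum version stated in this guide.
check_torch() {
    local installed
    # Strip any local build suffix such as "+cu121" before comparing.
    installed="$(python -c 'import torch; print(torch.__version__.split("+")[0])' 2>/dev/null)" || {
        echo "PyTorch is not importable in this environment"
        return 1
    }
    if version_ge "$installed" "1.13"; then
        echo "PyTorch $installed satisfies pytorch>=1.13"
    else
        echo "PyTorch $installed is older than 1.13"
        return 1
    fi
}
```

Running `check_torch` inside the activated `opencompass` environment reports whether the installed version meets the requirement. The function only reads the version and does not modify the environment.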
# Dataset Preparation
The datasets supported by OpenCompass fall into three main categories:

- Huggingface datasets: The Huggingface Datasets hub provides a large number of datasets, which are downloaded automatically when this option is used.

- ModelScope datasets: The ModelScope OpenCompass Dataset collection supports automatic downloading of datasets from ModelScope.

To enable this feature, set the environment variable: `export DATASET_SOURCE=ModelScope`. The available datasets (sourced from OpenCompassData-core.zip) include: humaneval, triviaqa, commonsenseqa, tydiqa, strategyqa, cmmlu, lambada, piqa, ceval, math, LCSTS, Xsum, winogrande, openbookqa, AGIEval, gsm8k, nq, race, siqa, mbpp, mmlu, hellaswag, ARC, BBH, xstory_cloze, summedits, GAOKAO-BENCH, OCNLI, cmnli

- Custom datasets: OpenCompass also provides some self-built Chinese datasets, which must be downloaded and extracted manually.
To complete dataset preparation, run the following commands to download the datasets and place them in the `${OpenCompass}/data` directory:
```bash
# Run in the OpenCompass directory
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip
```
If you need the more comprehensive dataset (~500M) provided by OpenCompass, you can download and unzip it using the following commands:
```bash
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-complete-20240207.zip
unzip OpenCompassData-complete-20240207.zip
cd ./data
find . -name "*.zip" -exec unzip "{}" \;
```
The list of datasets included in both `.zip` files can be found here.
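After downloading and unzipping, it can be useful to confirm that the datasets really ended up in the `data` directory before starting an evaluation. The sketch below is one possible check. `check_data_dir` is a hypothetical helper, not an OpenCompass command, and it assumes a `find` that supports `-mindepth` and `-maxdepth`.

```bash
# Hypothetical helper: confirm that <root>/data exists and is not empty.
# <root> defaults to the current directory (assumed to be the OpenCompass checkout).
check_data_dir() {
    local root="${1:-.}"
    local data_dir="$root/data"
    if [ ! -d "$data_dir" ]; then
        echo "missing: $data_dir (download and unzip the dataset archive first)"
        return 1
    fi
    # Count the top-level entries so that an empty directory is reported too.
    local count
    count="$(find "$data_dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')"
    if [ "$count" -eq 0 ]; then
        echo "empty: $data_dir"
        return 1
    fi
    echo "ok: $data_dir contains $count top-level entries"
}
```

Calling `check_data_dir` with no argument from the OpenCompass directory inspects `./data`. The function only confirms that something was extracted there; it does not verify individual datasets.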
OpenCompass supports most of the datasets commonly used for performance comparison. Please refer to `configs/dataset` for the specific list of supported datasets.
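For the ModelScope option described earlier, `export DATASET_SOURCE=ModelScope` sets the variable for the whole shell session. If the setting should apply to a single command only, one possible approach is a small wrapper like the sketch below. `with_modelscope` is a hypothetical name, not something OpenCompass provides. It relies on the standard `env` utility, so it works for external commands rather than shell functions.

```bash
# Hypothetical wrapper: run one command with DATASET_SOURCE=ModelScope
# without changing the variable in the surrounding shell session.
with_modelscope() {
    env DATASET_SOURCE=ModelScope "$@"
}
```

For example, `with_modelscope sh -c 'echo "$DATASET_SOURCE"'` prints `ModelScope`, while the calling shell's own value of `DATASET_SOURCE` is left untouched.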
For the next step, please read Quick Start.