mirror of
https://github.com/open-compass/opencompass.git
synced 2025-05-30 16:03:24 +08:00
[Doc] Update README and requirements. (#622)
* update readme * update doc
This commit is contained in:
parent
c0785e53d8
commit
5329724b65
71
README.md
@@ -38,15 +38,15 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
|
|||||||
|
|
||||||
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
|
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
|
||||||
|
|
||||||
|
- **\[2023.11.22\]** We now support many API-based models, including **Baidu, ByteDance, Huawei, and 360**. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥
|
||||||
|
- **\[2023.11.20\]** Thanks to [helloyongyang](https://github.com/helloyongyang) for supporting evaluation with [LightLLM](https://github.com/ModelTC/lightllm) as the backend. See [Evaluation With LightLLM](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_lightllm.html) for more details. 🔥🔥🔥
|
||||||
- **\[2023.11.13\]** We are delighted to announce the release of OpenCompass v0.1.8. This version enables local loading of evaluation benchmarks, thereby eliminating the need for an internet connection. Please note that with this update, **you must re-download all evaluation datasets** to ensure accurate and up-to-date results.🔥🔥🔥.
|
- **\[2023.11.13\]** We are delighted to announce the release of OpenCompass v0.1.8. This version enables local loading of evaluation benchmarks, thereby eliminating the need for an internet connection. Please note that with this update, **you must re-download all evaluation datasets** to ensure accurate and up-to-date results.🔥🔥🔥.
|
||||||
- **\[2023.11.06\]** We have supported several API-based models, including ChatGLM Pro@Zhipu, ABAB-Chat@MiniMax and Xunfei. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥
|
- **\[2023.11.06\]** We have supported several API-based models, including **ChatGLM Pro@Zhipu, ABAB-Chat@MiniMax and Xunfei**. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥
|
||||||
- **\[2023.10.24\]** We release a new benchmark for evaluating LLMs’ capabilities of having multi-turn dialogues. Welcome to [BotChat](https://github.com/open-compass/BotChat) for more details. 🔥🔥🔥.
|
- **\[2023.10.24\]** We release a new benchmark for evaluating LLMs’ capabilities of having multi-turn dialogues. Welcome to [BotChat](https://github.com/open-compass/BotChat) for more details.
|
||||||
- **\[2023.09.26\]** We update the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available, welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥.
|
- **\[2023.09.26\]** We update the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available, welcome to our [homepage](https://opencompass.org.cn) for more details.
|
||||||
- **\[2023.09.20\]** We update the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM), welcome to our [homepage](https://opencompass.org.cn) for more details.
|
- **\[2023.09.20\]** We update the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM), welcome to our [homepage](https://opencompass.org.cn) for more details.
|
||||||
- **\[2023.09.19\]** We update the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B, welcome to our [homepage](https://opencompass.org.cn) for more details.
|
- **\[2023.09.19\]** We update the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B, welcome to our [homepage](https://opencompass.org.cn) for more details.
|
||||||
- **\[2023.09.18\]** We have released [long context evaluation guidance](docs/en/advanced_guides/longeval.md).
|
- **\[2023.09.18\]** We have released [long context evaluation guidance](docs/en/advanced_guides/longeval.md).
|
||||||
- **\[2023.09.08\]** We update the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5, welcome to our [homepage](https://opencompass.org.cn) for more details.
|
|
||||||
- **\[2023.09.06\]** The [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
|
|
||||||
|
|
||||||
> [More](docs/en/notes/news.md)
|
> [More](docs/en/notes/news.md)
|
||||||
|
|
||||||
@@ -76,12 +76,32 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the co
|
|||||||
|
|
||||||
Below are the steps for quick installation and dataset preparation.
|
Below are the steps for quick installation and dataset preparation.
|
||||||
|
|
||||||
```Python
|
### 💻 Environment Setup
|
||||||
|
|
||||||
|
#### Open-source Models with GPU
|
||||||
|
|
||||||
|
```bash
|
||||||
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
|
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
|
||||||
conda activate opencompass
|
conda activate opencompass
|
||||||
git clone https://github.com/open-compass/opencompass opencompass
|
git clone https://github.com/open-compass/opencompass opencompass
|
||||||
cd opencompass
|
cd opencompass
|
||||||
pip install -e .
|
pip install -e .
|
||||||
|
```
|
||||||
|
|
||||||
|
#### API Models with CPU-only
|
||||||
|
|
||||||
|
```bash
|
||||||
|
conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
|
||||||
|
conda activate opencompass
|
||||||
|
git clone https://github.com/open-compass/opencompass opencompass
|
||||||
|
cd opencompass
|
||||||
|
pip install -e .
|
||||||
|
# If needed, also install the required packages for API models via `pip install -r requirements/api.txt`.
|
||||||
|
```
|
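The two setups above differ in whether a CUDA-enabled PyTorch build is installed. A minimal sketch for checking which one the active interpreter provides, assuming PyTorch may or may not be present; the helper name `detect_environment` is hypothetical and is not part of OpenCompass:

```python
def detect_environment() -> str:
    """Return 'gpu', 'cpu-only', or 'torch-missing' for the current interpreter."""
    try:
        # Imported lazily so the script still runs when PyTorch is absent.
        import torch
    except ImportError:
        return "torch-missing"

    # torch.cuda.is_available() is False for a cpuonly build or when no GPU is visible.
    return "gpu" if torch.cuda.is_available() else "cpu-only"


if __name__ == "__main__":
    print(detect_environment())
```

A result of `cpu-only` is sufficient for the API-model setup; the open-source model setup is expected to report `gpu`.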
||||||
|
|
||||||
|
### 📂 Data Preparation
|
||||||
|
|
||||||
|
```bash
|
||||||
# Download dataset to data/ folder
|
# Download dataset to data/ folder
|
||||||
wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
|
wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
|
||||||
unzip OpenCompassData-core-20231110.zip
|
unzip OpenCompassData-core-20231110.zip
|
||||||
@@ -411,16 +431,17 @@ Through the command line or configuration files, OpenCompass also supports evalu
|
|||||||
<tr valign="top">
|
<tr valign="top">
|
||||||
<td>
|
<td>
|
||||||
|
|
||||||
- InternLM
|
- [InternLM](https://github.com/InternLM/InternLM)
|
||||||
- LLaMA
|
- [LLaMA](https://github.com/facebookresearch/llama)
|
||||||
- Vicuna
|
- [Vicuna](https://github.com/lm-sys/FastChat)
|
||||||
- Alpaca
|
- [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
|
||||||
- Baichuan
|
- [Baichuan](https://github.com/baichuan-inc)
|
||||||
- WizardLM
|
- [WizardLM](https://github.com/nlpxucan/WizardLM)
|
||||||
- ChatGLM2
|
- [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B)
|
||||||
- Falcon
|
- [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B)
|
||||||
- TigerBot
|
- [TigerBot](https://github.com/TigerResearch/TigerBot)
|
||||||
- Qwen
|
- [Qwen](https://github.com/QwenLM/Qwen)
|
||||||
|
- [BlueLM](https://github.com/vivo-ai-lab/BlueLM)
|
||||||
- ...
|
- ...
|
||||||
|
|
||||||
</td>
|
</td>
|
||||||
@@ -428,7 +449,15 @@ Through the command line or configuration files, OpenCompass also supports evalu
|
|||||||
|
|
||||||
- OpenAI
|
- OpenAI
|
||||||
- Claude
|
- Claude
|
||||||
- PaLM (coming soon)
|
- ZhipuAI(ChatGLM)
|
||||||
|
- Baichuan
|
||||||
|
- ByteDance(YunQue)
|
||||||
|
- Huawei(PanGu)
|
||||||
|
- 360
|
||||||
|
- Baidu(ERNIEBot)
|
||||||
|
- MiniMax(ABAB-Chat)
|
||||||
|
- SenseTime(nova)
|
||||||
|
- Xunfei(Spark)
|
||||||
- ……
|
- ……
|
||||||
|
|
||||||
</td>
|
</td>
|
||||||
@@ -444,17 +473,17 @@ Through the command line or configuration files, OpenCompass also supports evalu
|
|||||||
- [ ] Subjective Evaluation
|
- [ ] Subjective Evaluation
|
||||||
- [ ] Release CompassArena
|
- [ ] Release CompassArena
|
||||||
- [ ] Subjective evaluation dataset.
|
- [ ] Subjective evaluation dataset.
|
||||||
- [ ] Long-context
|
- [x] Long-context
|
||||||
- [ ] Long-context evaluation with extensive datasets.
|
- [ ] Long-context evaluation with extensive datasets.
|
||||||
- [ ] Long-context leaderboard.
|
- [ ] Long-context leaderboard.
|
||||||
- [ ] Coding
|
- [ ] Coding
|
||||||
- [ ] Coding evaluation leaderboard.
|
- [ ] Coding evaluation leaderboard.
|
||||||
- [ ] Non-python language evaluation service.
|
- [x] Non-python language evaluation service.
|
||||||
- [ ] Agent
|
- [ ] Agent
|
||||||
- [ ] Support various agent frameworks.
|
- [ ] Support various agent frameworks.
|
||||||
- [ ] Evaluation of tool use of the LLMs.
|
- [ ] Evaluation of tool use of the LLMs.
|
||||||
- [ ] Robustness
|
- [x] Robustness
|
||||||
- [ ] Support various attack methods
|
- [x] Support various attack methods
|
||||||
|
|
||||||
## 👷♂️ Contributing
|
## 👷♂️ Contributing
|
||||||
|
|
||||||
|
@@ -38,15 +38,15 @@
|
|||||||
|
|
||||||
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
|
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
|
||||||
|
|
||||||
|
- **\[2023.11.22\]** We now support many API-based models, including **Baidu, ByteDance, Huawei, and 360**. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥
|
||||||
|
- **\[2023.11.20\]** Thanks to [helloyongyang](https://github.com/helloyongyang) for supporting evaluation with [LightLLM](https://github.com/ModelTC/lightllm) as the backend. See [Evaluation With LightLLM](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_lightllm.html) for more details. 🔥🔥🔥
|
||||||
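- **\[2023.11.13\]** We are delighted to announce the release of OpenCompass v0.1.8. This version supports local loading of evaluation benchmarks, eliminating the need for an internet connection. Please note that with this update, **you must re-download all evaluation datasets** to ensure accurate and up-to-date results. 🔥🔥🔥
|
- **\[2023.11.13\]** We are delighted to announce the release of OpenCompass v0.1.8. This version supports local loading of evaluation benchmarks, eliminating the need for an internet connection. Please note that with this update, **you must re-download all evaluation datasets** to ensure accurate and up-to-date results. 🔥🔥🔥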
|
||||||
- **\[2023.11.06\]** We now support several API-based models, including ChatGLM Pro@Zhipu, ABAB-Chat@MiniMax, and Xunfei. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥
|
- **\[2023.11.06\]** We now support several API-based models, including ChatGLM Pro@Zhipu, ABAB-Chat@MiniMax, and Xunfei. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥
|
||||||
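- **\[2023.10.24\]** We have released BotChat, a new benchmark for evaluating the multi-turn dialogue capabilities of large language models. See [BotChat](https://github.com/open-compass/BotChat) for more information. 🔥🔥🔥
|
- **\[2023.10.24\]** We have released BotChat, a new benchmark for evaluating the multi-turn dialogue capabilities of large language models. See [BotChat](https://github.com/open-compass/BotChat) for more information.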
|
||||||
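- **\[2023.09.26\]** We have updated the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available. Visit the [official website](https://opencompass.org.cn) for details. 🔥🔥🔥
|
- **\[2023.09.26\]** We have updated the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available. Visit the [official website](https://opencompass.org.cn) for details.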
|
||||||
- **\[2023.09.20\]** We have updated the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM). Visit the [official website](https://opencompass.org.cn) for details.
|
- **\[2023.09.20\]** We have updated the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM). Visit the [official website](https://opencompass.org.cn) for details.
|
||||||
- **\[2023.09.19\]** We have updated the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B. Visit the [official website](https://opencompass.org.cn) for details.
|
- **\[2023.09.19\]** We have updated the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B. Visit the [official website](https://opencompass.org.cn) for details.
|
||||||
- **\[2023.09.18\]** We have released the [long-context evaluation guide](docs/zh_cn/advanced_guides/longeval.md).
|
- **\[2023.09.18\]** We have released the [long-context evaluation guide](docs/zh_cn/advanced_guides/longeval.md).
|
||||||
- **\[2023.09.08\]** We have updated the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Visit the [official website](https://opencompass.org.cn) for details.
|
|
||||||
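- **\[2023.09.06\]** We welcome the [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team's adoption of OpenCompass to evaluate their models systematically. We deeply appreciate the community's efforts to improve transparency and reproducibility in LLM evaluation.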
|
|
||||||
|
|
||||||
> [More](docs/zh_cn/notes/news.md)
|
> [More](docs/zh_cn/notes/news.md)
|
||||||
|
|
||||||
@@ -78,12 +78,32 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows
|
|||||||
|
|
||||||
Below are the steps for quick installation and dataset preparation.
|
Below are the steps for quick installation and dataset preparation.
|
||||||
|
|
||||||
```Python
|
### 💻 Environment Setup
|
||||||
|
|
||||||
|
#### GPU Environment for Open-source Models
|
||||||
|
|
||||||
|
```bash
|
||||||
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
|
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
|
||||||
conda activate opencompass
|
conda activate opencompass
|
||||||
git clone https://github.com/open-compass/opencompass opencompass
|
git clone https://github.com/open-compass/opencompass opencompass
|
||||||
cd opencompass
|
cd opencompass
|
||||||
pip install -e .
|
pip install -e .
|
||||||
|
```
|
||||||
|
|
||||||
|
#### CPU Environment for API Model Testing
|
||||||
|
|
||||||
|
```bash
|
||||||
|
conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
|
||||||
|
conda activate opencompass
|
||||||
|
git clone https://github.com/open-compass/opencompass opencompass
|
||||||
|
cd opencompass
|
||||||
|
pip install -e .
|
||||||
|
# To use the API models, run `pip install -r requirements/api.txt` to install their dependencies.
|
||||||
|
```
|
||||||
|
|
||||||
|
### 📂 Data Preparation
|
||||||
|
|
||||||
|
```bash
|
||||||
# Download the dataset to the data/ folder
|
# Download the dataset to the data/ folder
|
||||||
wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
|
wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
|
||||||
unzip OpenCompassData-core-20231110.zip
|
unzip OpenCompassData-core-20231110.zip
|
||||||
@@ -413,16 +433,17 @@ python run.py --datasets ceval_ppl mmlu_ppl \
|
|||||||
<tr valign="top">
|
<tr valign="top">
|
||||||
<td>
|
<td>
|
||||||
|
|
||||||
- InternLM
|
- [InternLM](https://github.com/InternLM/InternLM)
|
||||||
- LLaMA
|
- [LLaMA](https://github.com/facebookresearch/llama)
|
||||||
- Vicuna
|
- [Vicuna](https://github.com/lm-sys/FastChat)
|
||||||
- Alpaca
|
- [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
|
||||||
- Baichuan
|
- [Baichuan](https://github.com/baichuan-inc)
|
||||||
- WizardLM
|
- [WizardLM](https://github.com/nlpxucan/WizardLM)
|
||||||
- ChatGLM2
|
- [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B)
|
||||||
- Falcon
|
- [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B)
|
||||||
- TigerBot
|
- [TigerBot](https://github.com/TigerResearch/TigerBot)
|
||||||
- Qwen
|
- [Qwen](https://github.com/QwenLM/Qwen)
|
||||||
|
- [BlueLM](https://github.com/vivo-ai-lab/BlueLM)
|
||||||
- ……
|
- ……
|
||||||
|
|
||||||
</td>
|
</td>
|
||||||
@@ -430,7 +451,15 @@ python run.py --datasets ceval_ppl mmlu_ppl \
|
|||||||
|
|
||||||
- OpenAI
|
- OpenAI
|
||||||
- Claude
|
- Claude
|
||||||
- PaLM (coming soon)
|
- ZhipuAI(ChatGLM)
|
||||||
|
- Baichuan
|
||||||
|
- ByteDance(YunQue)
|
||||||
|
- Huawei(PanGu)
|
||||||
|
- 360
|
||||||
|
- Baidu(ERNIEBot)
|
||||||
|
- MiniMax(ABAB-Chat)
|
||||||
|
- SenseTime(nova)
|
||||||
|
- Xunfei(Spark)
|
||||||
- ……
|
- ……
|
||||||
|
|
||||||
</td>
|
</td>
|
||||||
@@ -446,17 +475,17 @@ python run.py --datasets ceval_ppl mmlu_ppl \
|
|||||||
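- [ ] Subjective Evaluation
|
- [ ] Subjective Evaluation
|
||||||
- [ ] Release the subjective evaluation leaderboard
|
- [ ] Release the subjective evaluation leaderboard
|
||||||
- [ ] Release the subjective evaluation dataset
|
- [ ] Release the subjective evaluation dataset
|
||||||
- [ ] Long-context
|
- [x] Long-context
|
||||||
- [ ] Support a wide range of long-context evaluation datasets
|
- [ ] Support a wide range of long-context evaluation datasets
|
||||||
- [ ] Release the long-context evaluation leaderboard
|
- [ ] Release the long-context evaluation leaderboard
|
||||||
- [ ] Coding
|
- [ ] Coding
|
||||||
- [ ] Release the coding evaluation leaderboard
|
- [ ] Release the coding evaluation leaderboard
|
||||||
- [ ] Provide an evaluation service for non-Python languages
|
- [x] Provide an evaluation service for non-Python languages
|
||||||
- [ ] Agent
|
- [ ] Agent
|
||||||
- [ ] Support a rich set of agent frameworks
|
- [ ] Support a rich set of agent frameworks
|
||||||
- [ ] Provide an agent evaluation leaderboard
|
- [ ] Provide an agent evaluation leaderboard
|
||||||
- [ ] Robustness
|
- [x] Robustness
|
||||||
- [ ] Support various attack methods
|
- [x] Support various attack methods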
|
||||||
|
|
||||||
## 👷♂️ Contributing
|
## 👷♂️ Contributing
|
||||||
|
|
||||||
|
@@ -2,6 +2,9 @@
|
|||||||
|
|
||||||
1. Set up the OpenCompass environment:
|
1. Set up the OpenCompass environment:
|
||||||
|
|
||||||
|
`````{tabs}
|
||||||
|
````{tab} Open-source Models with GPU
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
|
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
|
||||||
conda activate opencompass
|
conda activate opencompass
|
||||||
@@ -9,6 +12,20 @@
|
|||||||
|
|
||||||
If you want to customize the PyTorch version or related CUDA version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.
|
If you want to customize the PyTorch version or related CUDA version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.
|
||||||
|
|
||||||
|
````
|
||||||
|
````{tab} API Models with CPU-only
|
||||||
|
|
||||||
|
```bash
|
||||||
|
conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
|
||||||
|
conda activate opencompass
|
||||||
|
# If needed, also install the required packages for API models via `pip install -r requirements/api.txt`.
|
||||||
|
```
|
||||||
|
|
||||||
|
If you want to customize the PyTorch version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.
|
||||||
|
|
||||||
|
````
|
||||||
|
`````
|
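Both tabs state that OpenCompass requires `pytorch>=1.13`. A minimal sketch of how that constraint could be checked against a version string such as `torch.__version__`; the helper names `parse_version` and `meets_minimum` are hypothetical, and the parser deliberately ignores local build suffixes like `+cu118`:

```python
import re


def parse_version(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple = (1, 13)) -> bool:
    """Return True when the numeric release is at least `minimum`."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    # Sample version strings; in practice, pass torch.__version__ instead.
    for candidate in ("1.12.1", "1.13.0", "2.1.0+cu118"):
        print(candidate, meets_minimum(candidate))
```

Comparing integer tuples rather than raw strings avoids the lexicographic trap where `"1.9"` would sort after `"1.13"`.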
||||||
|
|
||||||
2. Install OpenCompass:
|
2. Install OpenCompass:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
@@ -1,5 +1,7 @@
|
|||||||
# News
|
# News
|
||||||
|
|
||||||
|
- **\[2023.09.08\]** We update the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5, welcome to our [homepage](https://opencompass.org.cn) for more details.
|
||||||
|
- **\[2023.09.06\]** The [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
|
||||||
- **\[2023.09.02\]** We have supported the evaluation of [Qwen-VL](https://github.com/QwenLM/Qwen-VL) in OpenCompass.
|
- **\[2023.09.02\]** We have supported the evaluation of [Qwen-VL](https://github.com/QwenLM/Qwen-VL) in OpenCompass.
|
||||||
- **\[2023.08.25\]** The [**TigerBot**](https://github.com/TigerResearch/TigerBot) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
|
- **\[2023.08.25\]** The [**TigerBot**](https://github.com/TigerResearch/TigerBot) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
|
||||||
- **\[2023.08.21\]** [**Lagent**](https://github.com/InternLM/lagent) has been released, which is a lightweight framework for building LLM-based agents. We are working with Lagent team to support the evaluation of general tool-use capability, stay tuned!
|
- **\[2023.08.21\]** [**Lagent**](https://github.com/InternLM/lagent) has been released, which is a lightweight framework for building LLM-based agents. We are working with Lagent team to support the evaluation of general tool-use capability, stay tuned!
|
||||||
|
@@ -2,6 +2,9 @@
|
|||||||
|
|
||||||
1. Set up the OpenCompass environment:
|
1. Set up the OpenCompass environment:
|
||||||
|
|
||||||
|
`````{tabs}
|
||||||
|
````{tab} GPU Environment for Open-source Models
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
|
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
|
||||||
conda activate opencompass
|
conda activate opencompass
|
||||||
@@ -9,6 +12,21 @@
|
|||||||
|
|
||||||
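If you want to customize the PyTorch version or the related CUDA version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.
|
If you want to customize the PyTorch version or the related CUDA version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.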
|
||||||
|
|
||||||
|
````
|
||||||
|
|
||||||
|
````{tab} CPU Environment for API Model Testing
|
||||||
|
|
||||||
|
```bash
|
||||||
|
conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
|
||||||
|
conda activate opencompass
|
||||||
|
# To use the API models, run `pip install -r requirements/api.txt` to install their dependencies.
|
||||||
|
```
|
||||||
|
|
||||||
|
If you want to customize the PyTorch version, please refer to the [official documentation](https://pytorch.org/get-started/locally/) to set up the PyTorch environment. Note that OpenCompass requires `pytorch>=1.13`.
|
||||||
|
|
||||||
|
````
|
||||||
|
`````
|
||||||
|
|
||||||
2. Install OpenCompass:
|
2. Install OpenCompass:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
@@ -1,5 +1,7 @@
|
|||||||
# News
|
# News
|
||||||
|
|
||||||
|
- **\[2023.09.08\]** We have updated the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Visit the [official website](https://opencompass.org.cn) for details.
|
||||||
|
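- **\[2023.09.06\]** We welcome the [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team's adoption of OpenCompass to evaluate their models systematically. We deeply appreciate the community's efforts to improve transparency and reproducibility in LLM evaluation.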
|
||||||
- **\[2023.09.02\]** We have added evaluation support for [Qwen-VL](https://github.com/QwenLM/Qwen-VL).
|
- **\[2023.09.02\]** We have added evaluation support for [Qwen-VL](https://github.com/QwenLM/Qwen-VL).
|
||||||
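- **\[2023.08.25\]** We welcome the [**TigerBot**](https://github.com/TigerResearch/TigerBot) team's adoption of OpenCompass to evaluate their models systematically. We deeply appreciate the community's efforts to improve transparency and reproducibility in LLM evaluation.
|
- **\[2023.08.25\]** We welcome the [**TigerBot**](https://github.com/TigerResearch/TigerBot) team's adoption of OpenCompass to evaluate their models systematically. We deeply appreciate the community's efforts to improve transparency and reproducibility in LLM evaluation.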
|
||||||
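- **\[2023.08.21\]** [**Lagent**](https://github.com/InternLM/lagent) has been officially released. It is a lightweight, open-source framework for building LLM-based agents. We are working closely with the Lagent team to support Lagent-based evaluation of LLM tool-use capabilities!
|
- **\[2023.08.21\]** [**Lagent**](https://github.com/InternLM/lagent) has been officially released. It is a lightweight, open-source framework for building LLM-based agents. We are working closely with the Lagent team to support Lagent-based evaluation of LLM tool-use capabilities!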
|
||||||
|
@@ -11,7 +11,7 @@ fairscale
|
|||||||
fuzzywuzzy
|
fuzzywuzzy
|
||||||
jieba
|
jieba
|
||||||
ltp
|
ltp
|
||||||
mmengine>=0.8.2
|
mmengine-lite
|
||||||
nltk==3.8
|
nltk==3.8
|
||||||
numpy==1.23.4
|
numpy==1.23.4
|
||||||
openai
|
openai
|
||||||
|
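The requirements hunk above mixes unpinned names (`mmengine-lite`), exact pins (`nltk==3.8`), and lower bounds (`mmengine>=0.8.2`). A minimal sketch of how such lines split into name, operator, and version, assuming only the simple forms seen in this hunk; `parse_requirement` is a hypothetical helper that does not handle extras, environment markers, or multiple specifiers:

```python
import re

# Matches "name", "name==1.2.3", "name>=1.2.3", and similar single-specifier lines.
_REQUIREMENT = re.compile(
    r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*"
    r"(?:(==|>=|<=|~=|!=|<|>)\s*([A-Za-z0-9.*+!_-]+))?$"
)


def parse_requirement(line: str):
    """Return (name, operator, version); operator and version are None when unpinned."""
    text = line.split("#", 1)[0].strip()  # drop trailing comments
    if not text:
        return None  # blank or comment-only line
    match = _REQUIREMENT.match(text)
    if match is None:
        raise ValueError(f"unsupported requirement line: {line!r}")
    return match.groups()


if __name__ == "__main__":
    for entry in ("mmengine>=0.8.2", "mmengine-lite", "nltk==3.8", "numpy==1.23.4"):
        print(parse_requirement(entry))
```

For real dependency resolution, a full requirement parser is the safer choice; this sketch only illustrates the three forms that appear in the hunk.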