mirror of
https://github.com/open-compass/opencompass.git
synced 2025-05-30 16:03:24 +08:00
Update news (#241)
parent fdc69f9d58
commit 8f7bdb4b36
README.md: 25 changes
README.md: 25 changes
@@ -29,16 +29,16 @@ Just like a compass guides us on our journey, OpenCompass will guide you through

## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

- **\[2023.08.11\]** [Model comparison](https://opencompass.org.cn/model-compare/GPT-4,ChatGPT,LLaMA-2-70B,LLaMA-65B) is now online. We hope this feature offers deeper insights! 🔥🔥🔥
- **\[2023.08.11\]** We have supported [LEval](https://github.com/OpenLMLab/LEval). 🔥🔥🔥
- **\[2023.08.18\]** We have supported evaluation for **multi-modality learning**, including **MMBench, SEED-Bench, COCO-Caption, Flickr-30K, OCR-VQA, ScienceQA** and more. A leaderboard is on the way. Feel free to try multi-modality evaluation with OpenCompass! 🔥🔥🔥
- **\[2023.08.18\]** [Dataset card](https://opencompass.org.cn/dataset-detail/MMLU) is now online. New evaluation benchmarks are welcome to join OpenCompass! 🔥🔥🔥
- **\[2023.08.11\]** [Model comparison](https://opencompass.org.cn/model-compare/GPT-4,ChatGPT,LLaMA-2-70B,LLaMA-65B) is now online. We hope this feature offers deeper insights!
- **\[2023.08.11\]** We have supported [LEval](https://github.com/OpenLMLab/LEval).
- **\[2023.08.10\]** OpenCompass is compatible with [LMDeploy](https://github.com/InternLM/lmdeploy). You can now follow this [instruction](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_turbomind.html#) to evaluate models accelerated by **Turbomind**.
- **\[2023.08.10\]** We have supported [Qwen-7B](https://github.com/QwenLM/Qwen-7B) and [XVERSE-13B](https://github.com/xverse-ai/XVERSE-13B)! Go to our [leaderboard](https://opencompass.org.cn/leaderboard-llm) for more results! More models are welcome to join OpenCompass.
- **\[2023.08.09\]** Several new datasets (**CMMLU, TydiQA, SQuAD2.0, DROP**) have been updated on our [leaderboard](https://opencompass.org.cn/leaderboard-llm)! More datasets are welcome to join OpenCompass.
- **\[2023.08.07\]** We have added a [script](tools/eval_mmbench.py) for users to evaluate the inference results of [MMBench](https://opencompass.org.cn/MMBench)-dev.
- **\[2023.08.05\]** We have supported [GPT-4](https://openai.com/gpt-4)! Go to our [leaderboard](https://opencompass.org.cn/leaderboard-llm) for more results! More models are welcome to join OpenCompass.
- **\[2023.07.27\]** We have supported [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcome to join OpenCompass.
- **\[2023.07.21\]** Performance of Llama-2 is available on the [OpenCompass leaderboard](https://opencompass.org.cn/leaderboard-llm)!
- **\[2023.07.13\]** We release [MMBench](https://opencompass.org.cn/MMBench), a meticulously curated dataset to comprehensively evaluate the different abilities of multi-modality models.

## ✨ Introduction
@@ -326,6 +326,23 @@ Make sure you have installed OpenCompass correctly and prepared your datasets ac

For more tutorials, please check our [Documentation](https://opencompass.readthedocs.io/en/latest/index.html).

## 🔜 Roadmap
- [ ] Subjective Evaluation
  - [ ] Release CompassArena
  - [ ] Subjective evaluation datasets
- [ ] Long-context
  - [ ] Long-context evaluation with extensive datasets
  - [ ] Long-context leaderboard
- [ ] Coding
  - [ ] Coding evaluation leaderboard
  - [ ] Non-Python language evaluation service
- [ ] Agent
  - [ ] Support various agent frameworks
  - [ ] Evaluation of LLM tool use
- [ ] Robustness
  - [ ] Support various attack methods
## 👷‍♂️ Contributing

We appreciate all contributions to improve OpenCompass. Please refer to the [contributing guideline](https://opencompass.readthedocs.io/en/latest/notes/contribution_guide.html) for best practices.
@@ -29,6 +29,8 @@

## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
- **\[2023.08.18\]** OpenCompass now supports **multi-modality evaluation** with 10+ multi-modality datasets, including **MMBench, SEED-Bench, COCO-Caption, Flickr-30K, OCR-VQA, ScienceQA** and more. The multi-modality leaderboard is coming soon, stay tuned! 🔥🔥🔥
- **\[2023.08.18\]** The [dataset card](https://opencompass.org.cn/dataset-detail/MMLU) page is now live on the OpenCompass website. More community evaluation datasets are welcome to join OpenCompass! 🔥🔥🔥
- **\[2023.08.11\]** The official leaderboard now offers a [model comparison](https://opencompass.org.cn/model-compare/GPT-4,ChatGPT,LLaMA-2-70B,LLaMA-65B) feature. We hope it helps surface more insights! 🔥🔥🔥
- **\[2023.08.11\]** Added support for [LEval](https://github.com/OpenLMLab/LEval). 🔥🔥🔥
- **\[2023.08.10\]** OpenCompass is now compatible with [LMDeploy](https://github.com/InternLM/lmdeploy). Please refer to the [evaluation guide](https://opencompass.readthedocs.io/zh_CN/latest/advanced_guides/evaluation_turbomind.html) to evaluate models accelerated by **Turbomind**.
@@ -37,9 +39,6 @@

- **\[2023.08.07\]** Added an [MMBench evaluation script](tools/eval_mmbench.py) so users can obtain test results for [MMBench](https://opencompass.org.cn/MMBench)-dev on their own.
- **\[2023.08.05\]** Evaluation results for [GPT-4](https://openai.com/gpt-4) have been updated on the OpenCompass [LLM leaderboard](https://opencompass.org.cn/leaderboard-llm)!
- **\[2023.07.27\]** Added [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcome to join OpenCompass.
- **\[2023.07.21\]** Evaluation results for Llama-2 have been updated on the OpenCompass [LLM leaderboard](https://opencompass.org.cn/leaderboard-llm)!
- **\[2023.07.19\]** Added [Llama-2](https://ai.meta.com/llama/)! Its evaluation results will be published soon. \[[docs](./docs/zh_cn/get_started.md#安装)\]
- **\[2023.07.13\]** Released [MMBench](https://opencompass.org.cn/MMBench), a meticulously curated dataset for comprehensively evaluating the various abilities of multi-modality models.

## ✨ Introduction
@@ -327,6 +326,23 @@ unzip OpenCompassData.zip

For more tutorials, please check our [documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html).

## 🔜 Roadmap
- [ ] Subjective Evaluation
  - [ ] Release subjective evaluation leaderboard
  - [ ] Release subjective evaluation datasets
- [ ] Long-context
  - [ ] Support extensive long-context evaluation datasets
  - [ ] Release long-context leaderboard
- [ ] Coding
  - [ ] Release coding evaluation leaderboard
  - [ ] Provide non-Python language evaluation service
- [ ] Agent
  - [ ] Support various agent frameworks
  - [ ] Release agent evaluation leaderboard
- [ ] Robustness
  - [ ] Support various attack methods
## 👷‍♂️ Contributing

We appreciate all contributors for their efforts to improve OpenCompass. Please refer to the [contributing guideline](https://opencompass.readthedocs.io/zh_CN/latest/notes/contribution_guide.html) for guidance on participating in the project.