👋 Join us on Discord and WeChat
## 📣 OpenCompass 2.0

We are thrilled to introduce OpenCompass 2.0, an advanced suite featuring three key components: [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home).

**CompassRank** has been significantly enhanced: its leaderboards now incorporate both open-source and proprietary benchmarks. This upgrade allows for a more comprehensive evaluation of models across the industry.

**CompassHub** presents a pioneering benchmark browser interface, designed to simplify and expedite the exploration and use of an extensive array of benchmarks for researchers and practitioners alike. To enhance the visibility of your own benchmark within the community, we warmly invite you to contribute it to CompassHub. You can start the submission process by clicking [here](https://hub.opencompass.org.cn/dataset-submit).

**CompassKit** is a powerful collection of evaluation toolkits specifically tailored for Large Language Models and Large Vision-language Models. It provides an extensive set of tools to assess and measure the performance of these complex models effectively. You are welcome to try our toolkits in your research and products.

## 🧭 Welcome

to **OpenCompass**!

Just like a compass guides us on our journey, OpenCompass will guide you through the complex landscape of evaluating large language models. With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models.

🚩🚩🚩 Explore opportunities at OpenCompass! We're currently **hiring full-time researchers/engineers and interns**. If you're passionate about LLMs and OpenCompass, don't hesitate to reach out to us via [email](mailto:zhangsongyang@pjlab.org.cn). We'd love to hear from you!
🔥🔥🔥 We are delighted to announce that **OpenCompass has been recommended by Meta AI**. See Llama's [Get Started](https://ai.meta.com/llama/get-started/#validation) guide for more information.

> **Attention**

| Capability | Task | Datasets |
| --- | --- | --- |
| Language | Word Definition | WiC, SummEdits |
| Language | Idiom Learning | CHID |
| Language | Semantic Similarity | AFQMC, BUSTM |
| Language | Coreference Resolution | CLUEWSC, WSC, WinoGrande |
| Language | Translation | Flores, IWSLT2017 |
| Language | Multi-language Question Answering | TyDi-QA, XCOPA |
| Language | Multi-language Summary | XLSum |
| Knowledge | Knowledge Question Answering | BoolQ, CommonSenseQA, NaturalQuestions, TriviaQA |
| Reasoning | Textual Entailment | CMNLI, OCNLI, OCNLI_FC, AX-b, AX-g, CB, RTE, ANLI |
| Reasoning | Commonsense Reasoning | StoryCloze, COPA, ReCoRD, HellaSwag, PIQA, SIQA |
| Reasoning | Mathematical Reasoning | MATH, GSM8K |
| Reasoning | Theorem Application | TheoremQA, StrategyQA, SciBench |
| Reasoning | Comprehensive Reasoning | BBH |
| Examination | Junior High, High School, University, Professional Examinations | C-Eval, AGIEval, MMLU, GAOKAO-Bench, CMMLU, ARC, Xiezhi |
| Examination | Medical Examinations | CMB |
| Understanding | Reading Comprehension | C3, CMRC, DRCD, MultiRC, RACE, DROP, OpenBookQA, SQuAD2.0 |
| Understanding | Content Summary | CSL, LCSTS, XSum, SummScreen |
| Understanding | Content Analysis | EPRSTMT, LAMBADA, TNEWS |
| Long Context | Long Context Understanding | LEval, LongBench, GovReports, NarrativeQA, Qasper |
| Safety | Safety | CivilComments, CrowsPairs, CValues, JigsawMultilingual, TruthfulQA |
| Safety | Robustness | AdvGLUE |
| Code | Code | HumanEval, HumanEvalX, MBPP, APPs, DS1000 |

| Open-source Models | API Models |
| --- | --- |
| [InternLM](https://github.com/InternLM/InternLM) | OpenAI |
| [LLaMA](https://github.com/facebookresearch/llama) | Gemini |
| [Vicuna](https://github.com/lm-sys/FastChat) | Claude |
| [Alpaca](https://github.com/tatsu-lab/stanford_alpaca) | ZhipuAI (ChatGLM) |
| [Baichuan](https://github.com/baichuan-inc) | Baichuan |
| [WizardLM](https://github.com/nlpxucan/WizardLM) | ByteDance (YunQue) |
| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | Huawei (PanGu) |
| [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B) | 360 |
| [TigerBot](https://github.com/TigerResearch/TigerBot) | Baidu (ERNIEBot) |
| [Qwen](https://github.com/QwenLM/Qwen) | MiniMax (ABAB-Chat) |
| [BlueLM](https://github.com/vivo-ai-lab/BlueLM) | SenseTime (nova) |
| [Gemma](https://huggingface.co/google/gemma-7b) | Xunfei (Spark) |
| ... | ... |