👋 Join us on Discord and WeChat
## 🧭 Welcome to **OpenCompass**!

Just as a compass guides us on our journey, OpenCompass will guide you through the complex landscape of evaluating large language models. With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models.

🚩🚩🚩 Explore opportunities at OpenCompass! We're currently **hiring full-time researchers/engineers and interns**. If you're passionate about LLMs and OpenCompass, don't hesitate to reach out to us via [email](mailto:zhangsongyang@pjlab.org.cn). We'd love to hear from you!

🔥🔥🔥 We are delighted to announce that **OpenCompass has been recommended by Meta AI**. See the [Get Started](https://ai.meta.com/llama/get-started/#validation) page of Llama for more information.

> **Attention**

| Language | Knowledge | Reasoning | Examination |
| --- | --- | --- | --- |
| **Word Definition:** WiC, SummEdits<br>**Idiom Learning:** CHID<br>**Semantic Similarity:** AFQMC, BUSTM<br>**Coreference Resolution:** CLUEWSC, WSC, WinoGrande<br>**Translation:** Flores, IWSLT2017<br>**Multi-language Question Answering:** TyDi-QA, XCOPA<br>**Multi-language Summary:** XLSum | **Knowledge Question Answering:** BoolQ, CommonSenseQA, NaturalQuestions, TriviaQA | **Textual Entailment:** CMNLI, OCNLI, OCNLI_FC, AX-b, AX-g, CB, RTE, ANLI<br>**Commonsense Reasoning:** StoryCloze, COPA, ReCoRD, HellaSwag, PIQA, SIQA<br>**Mathematical Reasoning:** MATH, GSM8K<br>**Theorem Application:** TheoremQA, StrategyQA, SciBench<br>**Comprehensive Reasoning:** BBH | **Junior High, High School, University, Professional Examinations:** C-Eval, AGIEval, MMLU, GAOKAO-Bench, CMMLU, ARC, Xiezhi<br>**Medical Examinations:** CMB |

| Understanding | Long Context | Safety | Code |
| --- | --- | --- | --- |
| **Reading Comprehension:** C3, CMRC, DRCD, MultiRC, RACE, DROP, OpenBookQA, SQuAD2.0<br>**Content Summary:** CSL, LCSTS, XSum, SummScreen<br>**Content Analysis:** EPRSTMT, LAMBADA, TNEWS | **Long Context Understanding:** LEval, LongBench, GovReports, NarrativeQA, Qasper | **Safety:** CivilComments, CrowsPairs, CValues, JigsawMultilingual, TruthfulQA<br>**Robustness:** AdvGLUE | **Code:** HumanEval, HumanEvalX, MBPP, APPs, DS1000 |

| Open-source Models | API Models |
| --- | --- |
| InternLM, LLaMA, Vicuna, Alpaca, Baichuan, WizardLM, ChatGLM2, Falcon, TigerBot, Qwen, ... | OpenAI, Claude, PaLM (coming soon), ... |