
# LiveCodeBench

## Dataset

LiveCodeBench provides a holistic and contamination-free evaluation of the coding capabilities of LLMs. In particular, it continuously collects new problems over time from contests on three competition platforms -- LeetCode, AtCoder, and CodeForces. Beyond code generation, LiveCodeBench also covers a broader range of code-related capabilities, such as self-repair, code execution, and test output prediction. It currently hosts four hundred high-quality coding problems published between May 2023 and March 2024.

## Setting

| Model Type | Code Generation | Test Output Prediction | Code Execution |
|------------|-----------------|------------------------|----------------|
| Base Model |                 |                        |                |
| Chat Model |                 |                        |                |
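The config files in this directory can be pulled into a top-level OpenCompass config via `read_base()`. A minimal sketch is shown below; the exported dataset variable name (`LCB_datasets` here) and the model config path are assumptions and may differ between config versions, so check the imported file for the actual names:

```python
# eval_livecodebench.py -- hypothetical top-level OpenCompass config
from mmengine.config import read_base

with read_base():
    # Code-generation split of LiveCodeBench; swap in one of the
    # *_o1_gen_* config files for O1-style prompting.
    from opencompass.configs.datasets.livecodebench.livecodebench_gen import \
        LCB_datasets  # assumed export name; verify against the config file
    # Any HF chat model config works here; this path is illustrative.
    from opencompass.configs.models.qwen2_5.hf_qwen2_5_7b_instruct import \
        models

datasets = LCB_datasets
```

The config would then be run with `python run.py eval_livecodebench.py` from the OpenCompass repository root.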

## Baseline Performance

| Model | Code Generation (pass@1) | Test Output Prediction (pass@1) | Code Execution (pass@1) |
|-------|--------------------------|---------------------------------|-------------------------|
| Qwen2.5-7B-Instruct (HF) | 39.25 | 48.64 | 41.96 |
| Meta-Llama-3.1-8B-Instruct (HF) | 20.25 | 24.66 | 17.12 |
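The pass@1 scores above follow the pass@k convention common to code benchmarks. As an illustration of the metric (not OpenCompass's evaluator code), here is a sketch of the unbiased pass@k estimator from Chen et al. (2021), which for k = 1 reduces to the fraction of correct samples:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k completions drawn from n total samples (c of them correct)
    passes all tests for a problem."""
    if n - c < k:
        # Fewer than k incorrect samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With n samples per problem, pass@1 is simply c / n.
print(pass_at_k(10, 3, 1))  # ~0.3
```

Per-benchmark pass@1 is the mean of this quantity over all problems.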

## Citation

```bibtex
@article{jain2024livecodebench,
  author  = {Naman Jain and King Han and Alex Gu and Wen-Ding Li and Fanjia Yan and Tianjun Zhang and Sida Wang and Armando Solar-Lezama and Koushik Sen and Ion Stoica},
  title   = {LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code},
  year    = {2024},
  journal = {arXiv preprint},
}

@misc{2023opencompass,
  title        = {OpenCompass: A Universal Evaluation Platform for Foundation Models},
  author       = {OpenCompass Contributors},
  howpublished = {\url{https://github.com/open-compass/opencompass}},
  year         = {2023}
}
```