Guideline for evaluating HelloBench on Diverse LLMs
HelloBench is a comprehensive, in-the-wild, and open-ended benchmark for evaluating LLMs' long-text generation capabilities. More details can be found in the 🌐Github Repo and 📖Paper.
Detailed instructions for evaluating HelloBench in OpenCompass
- Git clone OpenCompass
```bash
cd ~
git clone git@github.com:open-compass/opencompass.git
cd opencompass
```
- Download the HelloBench data from the Google Drive Url, unzip it, and place it under OPENCOMPASS_PATH/data/HelloBench, so that the directory structure looks like this:
```
~/opencompass/data/
└── HelloBench
    ├── chat.jsonl
    ├── heuristic_text_generation.jsonl
    ├── length_constrained_data
    │   ├── heuristic_text_generation_16k.jsonl
    │   ├── heuristic_text_generation_2k.jsonl
    │   ├── heuristic_text_generation_4k.jsonl
    │   └── heuristic_text_generation_8k.jsonl
    ├── open_ended_qa.jsonl
    ├── summarization.jsonl
    └── text_completion.jsonl
```
- Set up OpenCompass
```bash
cd ~/opencompass
pip install -e .
```
- Configure your launch in configs/eval_hellobench.py:
  - set the models to be evaluated
  - set the judge model (we recommend gpt-4o-mini); see the sketch below
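As a rough illustration, the two pieces you typically edit look something like the following. This is a minimal sketch, not the actual contents of configs/eval_hellobench.py: the exact variable names (e.g. `judge_models`), surrounding boilerplate, model path, and API key are placeholders or assumptions and should be adapted to the real config file.

```python
# Illustrative sketch only -- adapt to the actual structure of configs/eval_hellobench.py.
from opencompass.models import HuggingFaceCausalLM, OpenAI

# Models to be evaluated (example: a local HuggingFace model; the path is a placeholder).
models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='my-chat-model',
        path='path/to/your/model',
        max_out_len=16384,   # HelloBench targets long generations
        batch_size=1,
        run_cfg=dict(num_gpus=1),
    ),
]

# Judge model (we recommend gpt-4o-mini); the variable name may differ in the actual config.
judge_models = [
    dict(
        type=OpenAI,
        abbr='gpt-4o-mini',
        path='gpt-4o-mini',
        key='YOUR_OPENAI_API_KEY',  # placeholder; or read the key from your environment
        max_out_len=4096,
        batch_size=8,
    ),
]
```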
- Launch it!
```bash
python run.py configs/eval_hellobench.py
```
- After that, you can find the results in outputs/hellobench/xxx/summary
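If you want to locate the most recent summary programmatically, a minimal sketch (assuming the default outputs/hellobench/<timestamp>/summary layout mentioned above):

```python
from pathlib import Path

# Find and print the most recent summary file produced by the run above.
summaries = sorted(Path('outputs/hellobench').glob('*/summary/*'),
                   key=lambda p: p.stat().st_mtime)
if summaries:
    print(f'Latest summary: {summaries[-1]}')
    print(summaries[-1].read_text())
else:
    print('No summary files found yet.')
```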