# Guideline for evaluating HelloBench on Diverse LLMs

HelloBench is a comprehensive, in-the-wild, and open-ended benchmark for evaluating LLMs' performance in generating long text. More details can be found in the 🌐 GitHub repo and 📖 paper.

## Detailed instructions for evaluating HelloBench in OpenCompass

1. Clone OpenCompass

   ```bash
   cd ~
   git clone git@github.com:open-compass/opencompass.git
   cd opencompass
   ```
2. Download the HelloBench data from the Google Drive URL, unzip it, and place it under `OPENCOMPASS_PATH/data/HelloBench` (an example command is sketched after the tree), so that the layout looks like this:

   ```text
   ~/opencompass/data/
   └── HelloBench
       ├── chat.jsonl
       ├── heuristic_text_generation.jsonl
       ├── length_constrained_data
       │   ├── heuristic_text_generation_16k.jsonl
       │   ├── heuristic_text_generation_2k.jsonl
       │   ├── heuristic_text_generation_4k.jsonl
       │   └── heuristic_text_generation_8k.jsonl
       ├── open_ended_qa.jsonl
       ├── summarization.jsonl
       └── text_completion.jsonl
   ```
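   For example, assuming the archive downloaded from Google Drive is named `HelloBench.zip` (the actual file name may differ), something like the following puts the data in place:

   ```bash
   # Assumes the archive is named HelloBench.zip and unpacks into a HelloBench/
   # directory; adjust the name or move files if your download differs.
   mkdir -p ~/opencompass/data
   unzip HelloBench.zip -d ~/opencompass/data/
   ```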
3. Set up OpenCompass

   ```bash
   cd ~/opencompass
   pip install -e .
   ```
4. Configure your launch in `configs/eval_hellobench.py` (a minimal sketch is shown below):

   - set the models to be evaluated
   - set the judge model (we recommend using gpt-4o-mini)
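   The snippet below is only a sketch of those two settings, following common OpenCompass subjective-evaluation configs: the `HuggingFacewithChatTemplate` and `OpenAI` classes come from `opencompass.models`, while the `abbr`/`path` values are placeholders, and the variable names (`models`, `judge_models`) should match whatever the shipped `eval_hellobench.py` already uses.

   ```python
   # Sketch only: edit the corresponding entries in configs/eval_hellobench.py;
   # keep the dataset / inference / evaluation settings already in that file.
   from opencompass.models import HuggingFacewithChatTemplate, OpenAI

   # Models to be evaluated (abbr/path are placeholders).
   models = [
       dict(
           type=HuggingFacewithChatTemplate,
           abbr='my-chat-model',          # hypothetical name
           path='path/to/your/model',     # hypothetical HF repo or local dir
           max_out_len=16384,             # HelloBench asks for long outputs
           batch_size=1,
           run_cfg=dict(num_gpus=1),
       ),
   ]

   # Judge model (gpt-4o-mini is recommended).
   judge_models = [
       dict(
           type=OpenAI,
           abbr='gpt-4o-mini',
           path='gpt-4o-mini',
           key='ENV',                     # reads OPENAI_API_KEY from the environment
           max_out_len=4096,
           batch_size=8,
           temperature=0,
       ),
   ]
   ```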
5. Launch it!

   ```bash
   python run.py configs/eval_hellobench.py
   ```
6. After that, you can find the results in `outputs/hellobench/xxx/summary`, e.g. as shown below.
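   The run directory name (shown as `xxx` above) is generated per launch, so a shell glob is a convenient way to locate the summary files:

   ```bash
   # List the summary files of all HelloBench runs (xxx is the per-run directory).
   ls outputs/hellobench/*/summary/
   ```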