# IFEval

```bash
python3 run.py --models hf_internlm2_chat_7b --datasets IFEval_gen_3321a3 --debug
```

## Chat Models

| model | Prompt-level-strict-accuracy | Inst-level-strict-accuracy | Prompt-level-loose-accuracy | Inst-level-loose-accuracy |
|:-----------------------------:|-------------------------------:|-----------------------------:|------------------------------:|----------------------------:|
| qwen1.5-0.5b-chat-hf | 13.12 | 23.26 | 15.71 | 26.38 |
| qwen1.5-1.8b-chat-hf | 16.08 | 26.26 | 18.30 | 29.02 |
| qwen1.5-4b-chat-hf | 25.51 | 35.97 | 28.84 | 39.81 |
| qwen1.5-7b-chat-hf | 38.82 | 50.00 | 42.70 | 53.48 |
| qwen1.5-14b-chat-hf | 42.51 | 54.20 | 49.17 | 59.95 |
| qwen1.5-32b-chat-hf | 49.54 | 60.43 | 53.97 | 64.39 |
| qwen1.5-72b-chat-hf | 51.02 | 61.99 | 57.12 | 67.27 |
| qwen1.5-110b-chat-hf | 55.08 | 65.59 | 61.18 | 70.86 |
| internlm2-chat-1.8b-hf | 18.30 | 28.78 | 21.44 | 32.01 |
| internlm2-chat-1.8b-sft-hf | 18.67 | 31.18 | 19.78 | 32.85 |
| internlm2-chat-7b-hf | 34.75 | 46.28 | 40.48 | 51.44 |
| internlm2-chat-7b-sft-hf | 39.19 | 50.12 | 42.33 | 52.76 |
| internlm2-chat-20b-hf | 36.41 | 48.68 | 40.67 | 53.24 |
| internlm2-chat-20b-sft-hf | 44.55 | 55.64 | 46.77 | 58.03 |
| llama-3-8b-instruct-hf | 68.02 | 76.74 | 75.42 | 82.85 |
| llama-3-70b-instruct-hf | 78.00 | 84.65 | 84.29 | 89.21 |
| llama-3-8b-instruct-lmdeploy | 69.13 | 77.46 | 77.26 | 83.93 |
| llama-3-70b-instruct-lmdeploy | 75.97 | 82.97 | 83.18 | 88.37 |
| mistral-7b-instruct-v0.1-hf | 40.30 | 50.96 | 41.96 | 53.48 |
| mistral-7b-instruct-v0.2-hf | 49.17 | 60.43 | 51.94 | 64.03 |
| mixtral-8x7b-instruct-v0.1-hf | 50.09 | 60.67 | 55.64 | 65.83 |
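
The command above uses CLI abbreviations for the model and dataset configs; the same evaluation can also be launched from a Python config file. The sketch below is a minimal example only: the import paths and the `ifeval_datasets` / `models` names are assumptions that depend on your OpenCompass version, so check the layout of your local `configs/` directory before using it.

```python
# eval_ifeval_internlm2.py -- minimal config sketch (import paths are assumptions;
# adjust them to match the configs/ layout of your OpenCompass version).
from mmengine.config import read_base

with read_base():
    # Dataset config corresponding to --datasets IFEval_gen_3321a3 (assumed path/name)
    from opencompass.configs.datasets.IFEval.IFEval_gen_3321a3 import ifeval_datasets
    # Model config corresponding to --models hf_internlm2_chat_7b (assumed path/name)
    from opencompass.configs.models.hf_internlm.hf_internlm2_chat_7b import models as internlm2_chat_7b

datasets = ifeval_datasets
models = internlm2_chat_7b
```

With such a config in place, the run would be started as `python3 run.py eval_ifeval_internlm2.py --debug`.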