7 Commits

Author SHA1 Message Date
Hoter Young
25b25c8b78 [Feature] Support eval WildBench-Score 2025-03-12 17:29:46 +08:00
Hoter Young
6b1671e029 [Chores] Change datasets path 2025-03-12 17:29:01 +08:00
Hoter Young
b3b5bacc4f [Feature] Ensure QwQ pred are processed before evaluation for configed datasets 2025-02-15 14:12:16 +08:00
Hoter Young
6f5c16edc5 [Chores] do some minor changes to HuLifeQA (#27) 2025-02-12 21:43:11 +08:00
    1. enlarge token size
    2. add two r1 distill models
hoteryoung
f2c17190c9 enable tested reasoning model 2025-02-10 16:51:48 +08:00
wujiang
8ec47e2b93 add openai model 2025-02-07 14:43:53 +08:00
wujiang
3c93a98e91 update HuLifeQA 2025-02-04 12:24:35 +08:00