Commit Graph

30 Commits

Author SHA1 Message Date
wangjingchao
a3bac3611a [Fix] Fix bugs when adding QwQ models 2025-03-13 17:16:16 +08:00
Hoter Young
4ef3c5083b [Feature] Support QwQ-32B and QwQ-Plus 2025-03-13 17:07:15 +08:00
Hoter Young
25b25c8b78 [Feature] Support eval WildBench-Score 2025-03-12 17:29:46 +08:00
Hoter Young
6b1671e029 [Chores] Change datasets path 2025-03-12 17:29:01 +08:00
Hoter Young
b3b5bacc4f [Feature] Ensure QwQ pred are processed before evaluation for configed datasets 2025-02-15 14:12:16 +08:00
Hoter Young
c7e89aa3db [Feature] Support answer extraction of QwQ when evaluating HuStandardFIB (#36) 2025-02-15 12:09:54 +08:00
Hoter Young
879b181c1b
add some features (#32)
* [Feature] Support answer extraction of QwQ when evaluating HuSimpleQA

* [Feature] Support multi-language summarization in HuSimpleQASummarizer

* [Feature] Support DeepSeek-R1-Distill-Qwen_32B_turbomind
2025-02-14 20:44:53 +08:00
Hoter Young
b6c8165ca3 [Feature] Support answer extraction of QwQ when evaluating HuProverbRea_OE (#31) 2025-02-14 10:16:05 +08:00
Hoter Young
4114079aed [Fix] Fix HuSimpleQASummarizer bug (#28) 2025-02-13 11:28:49 +08:00
Hoter Young
6f5c16edc5
[Chores] do some minor changes to HuLifeQA (#27)
1. enlarge token size
2. add two r1 distill models
2025-02-12 21:43:11 +08:00
hoteryoung
23210e089a [Refactor] Change HuSimpleQA to subjective evaluation 2025-02-12 20:25:03 +08:00
wujiang
b4ecd718a0 update examples and configs 2025-02-10 23:08:43 +08:00
wujiang
f55810ae48 [Update] OpenHuEval examples 2025-02-10 23:08:43 +08:00
wujiang
1e1acf9236 add HuSimpleQA 2025-02-10 21:22:45 +08:00
hoteryoung
f2c17190c9 enable tested reasoning model 2025-02-10 16:51:48 +08:00
weixingjian
9ae714a577 update hustandard and eval details using data version 250205 2025-02-07 18:51:14 +08:00
weixingjian
9395dc2b60 update humatching and eval details using data version 250205 2025-02-07 14:52:51 +08:00
wujiang
8ec47e2b93 add openai model 2025-02-07 14:43:53 +08:00
wujiang
08712f49f2 update HuProverb config and eval 2025-02-04 16:10:50 +08:00
wujiang
7586186897 add deepseek api models 2025-02-04 15:07:34 +08:00
wujiang
3c93a98e91 update HuLifeQA 2025-02-04 12:24:35 +08:00
gaojunyuan
f152ccf127 add HuProverbRea dataset (20250203) 2025-02-04 11:06:10 +08:00
wujiang
794ab7c372 add & update openai models 2025-02-02 15:53:55 +08:00
wujiang
2abf6ca795 update HuMatchingFIB 2025-02-02 14:48:58 +08:00
wujiang
273e609b53 update hu_matching_fib_250126 2025-02-02 13:48:40 +08:00
Hoter Young
3939915349 [Update] Update HuLifeQA primary tags (#6) 2025-02-01 14:18:05 +08:00
wujiang
d4df622e02 update HuMatchingFIB config and dataset 2025-01-26 13:48:35 +08:00
Hoter Young
116a24632c [Feature] Add OpenHuEval-HuLifeQA (#4) 2025-01-24 10:32:17 +08:00
weixingjian
6527fdf70a add HuMatchingFIB under new paradigm 2025-01-22 19:32:44 +08:00
Linchen Xiao
a6193b4c02
[Refactor] Code refactorization (#1831)
* Update

* fix lint

* update

* fix lint
2025-01-20 19:17:38 +08:00