wangjingchao
a3bac3611a
[Fix] Fix bugs when adding QwQ models
2025-03-13 17:16:16 +08:00
Hoter Young
4ef3c5083b
[Feature] Support QwQ-32B and QwQ-Plus
2025-03-13 17:07:15 +08:00
Hoter Young
25b25c8b78
[Feature] Support eval WildBench-Score
2025-03-12 17:29:46 +08:00
Hoter Young
6b1671e029
[Chores] Change datasets path
2025-03-12 17:29:01 +08:00
Hoter Young
b3b5bacc4f
[Feature] Ensure QwQ predictions are processed before evaluation for configured datasets
2025-02-15 14:12:16 +08:00
Hoter Young
c7e89aa3db
[Feature] Support answer extraction of QwQ when evaluating HuStandardFIB ( #36 )
2025-02-15 12:09:54 +08:00
Hoter Young
879b181c1b
add some features ( #32 )
* [Feature] Support answer extraction of QwQ when evaluating HuSimpleQA
* [Feature] Support multi-language summarization in HuSimpleQASummarizer
* [Feature] Support DeepSeek-R1-Distill-Qwen_32B_turbomind
2025-02-14 20:44:53 +08:00
Hoter Young
b6c8165ca3
[Feature] Support answer extraction of QwQ when evaluating HuProverbRea_OE ( #31 )
2025-02-14 10:16:05 +08:00
Hoter Young
4114079aed
[Fix] Fix HuSimpleQASummarizer bug ( #28 )
2025-02-13 11:28:49 +08:00
Hoter Young
6f5c16edc5
[Chores] do some minor changes to HuLifeQA ( #27 )
1. enlarge token size
2. add two r1 distill models
2025-02-12 21:43:11 +08:00
hoteryoung
23210e089a
[Refactor] Change HuSimpleQA to subjective evaluation
2025-02-12 20:25:03 +08:00
wujiang
b4ecd718a0
update examples and configs
2025-02-10 23:08:43 +08:00
wujiang
f55810ae48
[Update] OpenHuEval examples
2025-02-10 23:08:43 +08:00
wujiang
1e1acf9236
add HuSimpleQA
2025-02-10 21:22:45 +08:00
hoteryoung
f2c17190c9
enable tested reasoning model
2025-02-10 16:51:48 +08:00
weixingjian
9ae714a577
update hustandard and eval details using data version 250205
2025-02-07 18:51:14 +08:00
weixingjian
9395dc2b60
update humatching and eval details using data version 250205
2025-02-07 14:52:51 +08:00
wujiang
8ec47e2b93
add openai model
2025-02-07 14:43:53 +08:00
wujiang
08712f49f2
update HuProverb config and eval
2025-02-04 16:10:50 +08:00
wujiang
7586186897
add deepseek api models
2025-02-04 15:07:34 +08:00
wujiang
3c93a98e91
update HuLifeQA
2025-02-04 12:24:35 +08:00
gaojunyuan
f152ccf127
add HuProverbRea dataset (20250203)
2025-02-04 11:06:10 +08:00
wujiang
794ab7c372
add & update openai models
2025-02-02 15:53:55 +08:00
wujiang
2abf6ca795
update HuMatchingFIB
2025-02-02 14:48:58 +08:00
wujiang
273e609b53
update hu_matching_fib_250126
2025-02-02 13:48:40 +08:00
Hoter Young
3939915349
[Update] Update HuLifeQA primary tags ( #6 )
2025-02-01 14:18:05 +08:00
wujiang
d4df622e02
update HuMatchingFIB config and dataset
2025-01-26 13:48:35 +08:00
Hoter Young
116a24632c
[Feature] Add OpenHuEval-HuLifeQA ( #4 )
2025-01-24 10:32:17 +08:00
weixingjian
6527fdf70a
add HuMatchingFIB under new paradigm
2025-01-22 19:32:44 +08:00
Linchen Xiao
a6193b4c02
[Refactor] Code refactorization ( #1831 )
* Update
* fix lint
* update
* fix lint
2025-01-20 19:17:38 +08:00