__init__.py | [Refactor] Refactorize openicl eval task (#1990) | 2025-04-09 15:52:23 +08:00
abbr.py | [Feature] Add multi-model judge and fix some problems (#1016) | 2024-04-02 11:52:06 +08:00
auxiliary.py | [Feat] support humaneval and mbpp pass@k (#598) | 2023-11-16 21:22:06 +08:00
build.py | [Feature] Support Dataset Repeat and G-Pass Compute for Each Evaluator (#1886) | 2025-02-26 19:43:12 +08:00
collect_env.py | [Feature] Update pip install (#1324) | 2024-07-29 18:32:50 +08:00
datasets_info.py | [Dataset] Support MedMCQA and MedBullets benchmark (#2054) | 2025-05-13 17:10:50 +08:00
datasets.py | [Update] Update o1 eval prompt (#1806) | 2025-01-07 00:14:32 +08:00
dependency.py | [Feature]: Use multimodal (#73) | 2023-08-03 11:07:50 +08:00
dict_postprocessors.py | [Feature] Add Judgerbench and reorg subeval (#1593) | 2024-10-15 16:36:05 +08:00
file.py | fix output typing, change mutable list to immutable tuple (#989) | 2024-04-26 23:07:34 +08:00
fileio.py | [Update] Update Skywork/Qwen-QwQ (#1728) | 2024-12-05 19:30:43 +08:00
lark.py | [Feature] Several enhancements (#142) | 2023-08-01 18:19:49 +08:00
logging.py | [Update] Add CascadeEvaluator with Data Replica (#2022) | 2025-05-20 16:46:55 +08:00
menu.py | [Feat] Support local runner for windows (#515) | 2023-10-27 17:16:22 +08:00
network.py | [Update] Update Skywork/Qwen-QwQ (#1728) | 2024-12-05 19:30:43 +08:00
prompt.py | Support wildbench (#1266) | 2024-06-24 13:16:27 +08:00
result_station.py | [Fix] Fix CLI option for results persistence (#1920) | 2025-03-07 18:24:30 +08:00
run.py | [Update] Add CascadeEvaluator with Data Replica (#2022) | 2025-05-20 16:46:55 +08:00
text_postprocessors.py | [Feature] Math Verify with model post_processor (#1881) | 2025-02-20 19:32:12 +08:00
types.py | [Sync] Initial support of subjective evaluation (#421) | 2023-09-22 15:42:31 +08:00