api_examples
[Fix] Fix BailingAPI model ( #1707 )
2024-11-26 19:24:47 +08:00
dataset_collections
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
datasets
[Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model ( #1751 )
2024-12-16 13:41:28 +08:00
models
[Fix] Fix BailingAPI model ( #1707 )
2024-11-26 19:24:47 +08:00
summarizers
[Update] Add RULER 64k config ( #1709 )
2024-11-25 19:35:27 +08:00
eval_academic_leaderboard_202407.py
[Feature] Update Lint and Leaderboard ( #1458 )
2024-08-28 22:36:42 +08:00
eval_academic_leaderboard_202412.py
[Feature] Add OC academic 2412 ( #1750 )
2024-12-10 21:53:06 +08:00
eval_alaya.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_api_demo.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_attack.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_babilong.py
[Feature] BABILong Dataset added ( #1684 )
2024-11-14 15:32:43 +08:00
eval_base_demo.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_bluelm_32k_lveval.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_charm_mem.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_charm_rea.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_chat_agent_baseline.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_chat_agent.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_chat_demo.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_chat_last.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_chembench.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_chinese_simpleqa.py
Add Chinese SimpleQA config ( #1697 )
2024-12-11 18:03:39 +08:00
eval_cibench_api.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_cibench.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_circular.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_claude.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_code_passk_repeat_dataset.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_code_passk.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_codeagent.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_codegeex2.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_compassarena_subjectivebench_bradleyterry.py
[Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model ( #1751 )
2024-12-16 13:41:28 +08:00
eval_compassarena_subjectivebench.py
[Add] Add CompassArenaSubjectiveBench ( #1645 )
2024-11-01 13:52:22 +08:00
eval_contamination.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_corebench_2409_base_objective.py
[Feature] Update BailingLM/OpenAI verbose ( #1568 )
2024-09-27 11:15:25 +08:00
eval_corebench_2409_chat_objective.py
[Feature] Update CoreBench 2.0 ( #1566 )
2024-09-26 18:44:00 +08:00
eval_corebench_2409_longcontext.py
[Feature] Add Config for CoreBench ( #1547 )
2024-09-25 11:36:43 +08:00
eval_corebench_2409_subjective.py
[Feature] Add Config for CoreBench ( #1547 )
2024-09-25 11:36:43 +08:00
eval_dingo.py
[Feature] Add dingo test ( #1529 )
2024-09-29 19:24:58 +08:00
eval_ds1000_interpreter.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_edgellm_demo.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_gpt3.5.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_gpt4.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_hellobench.py
Upload HelloBench ( #1607 )
2024-10-15 17:11:37 +08:00
eval_hf_llama2.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_hf_llama_7b.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_inference_ppl.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_internlm2_chat_keyset.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_internlm2_keyset.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_internlm_7b.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_internlm_chat_lmdeploy_apiserver.py
[Feature] Add an attribute api_key into TurboMindAPIModel default None ( #1475 )
2024-09-05 17:51:16 +08:00
eval_internlm_chat_turbomind.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_internlm_flames_chat.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_internlm_lmdeploy_apiserver.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_internlm_math_chat.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_internlm_turbomind.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_internLM.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_judgerbench.py
[Update] eval_judgerbench.py ( #1625 )
2024-10-21 15:30:29 +08:00
eval_korbench.py
[Feature] Add Korbench dataset ( #1713 )
2024-11-25 20:11:27 +08:00
eval_lightllm.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_llama2_7b_lveval.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_llama2_7b.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_llama3_instruct.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_llm_compression.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_lmdeploy_demo.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_math_llm_judge_internal.py
[Update] Update MATH dataset with model judge ( #1711 )
2024-11-25 15:14:55 +08:00
eval_math_llm_judge.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_mathbench.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_mmlu_pro.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_mmlu_with_zero_retriever_overwritten.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_modelscope_datasets.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_multi_prompt_demo.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_musr.py
[Update] MUSR dataset config prefix update ( #1692 )
2024-11-15 11:06:30 +08:00
eval_needlebench.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_PMMEval.py
[Feature] Add P-MMEval ( #1714 )
2024-11-27 21:26:18 +08:00
eval_qwen_7b_chat_lawbench.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_qwen_7b_chat.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_qwen_7b.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_ruler_fix_tokenizer.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_ruler.py
[Fix] Fix ruler_16k_gen ( #1643 )
2024-10-29 17:58:43 +08:00
eval_rwkv5_3b.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_simpleqa.py
[Feature] Add Openai Simpleqa dataset ( #1720 )
2024-11-28 19:16:07 +08:00
eval_subjective_alpacaeval_official.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_subjective.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_teval.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_TheoremQA.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00
eval_with_model_dataset_combinations.py
[Doc] Update Readme ( #1439 )
2024-08-22 14:48:45 +08:00