
T-Eval

T-Eval decomposes tool utilization into multiple sub-processes: instruction following, planning, reasoning, retrieval, understanding, and review. Building on this decomposition, T-Eval evaluates tool-utilization capability step by step, splitting the evaluation into sub-domains that correspond to distinct model capabilities. This makes it possible to examine both the holistic and the isolated tool-use competencies of LLMs.
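The relationship between isolated and holistic scores can be illustrated with a minimal sketch. It assumes one numeric score per sub-process and combines them with an unweighted mean. The dictionary keys, the `summarize` helper, and the example scores are hypothetical and used only for illustration. They are not T-Eval's or OpenCompass's API, and T-Eval's own summarizer may weight or group the sub-scores differently.

```python
from statistics import mean

# Hypothetical keys for the six sub-processes named above:
# instruction following, planning, reasoning, retrieval, understanding, review.
SUB_PROCESSES = ("instruct", "plan", "reason", "retrieve", "understand", "review")


def summarize(scores: dict[str, float]) -> dict[str, float]:
    """Return the isolated per-sub-process scores plus a holistic score.

    Assumption: the holistic ("overall") score is an unweighted mean of the
    six sub-process scores. This is an illustrative choice, not T-Eval's formula.
    """
    missing = [p for p in SUB_PROCESSES if p not in scores]
    if missing:
        raise ValueError(f"missing sub-process scores: {missing}")
    isolated = {p: scores[p] for p in SUB_PROCESSES}
    return {**isolated, "overall": mean(isolated.values())}


# Hypothetical scores for illustration only (not real evaluation results).
example = {
    "instruct": 80.0, "plan": 60.0, "reason": 50.0,
    "retrieve": 70.0, "understand": 65.0, "review": 55.0,
}
print(summarize(example))
```

Keeping the per-sub-process entries next to the aggregate lets a reader see which individual capability pulls the overall number down, which is the point of evaluating step by step.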

Citation

@article{chen2023t,
  title={T-Eval: Evaluating the Tool Utilization Capability Step by Step},
  author={Chen, Zehui and Du, Weihua and Zhang, Wenwei and Liu, Kuikun and Liu, Jiangning and Zheng, Miao and Zhuo, Jingming and Zhang, Songyang and Lin, Dahua and Chen, Kai and others},
  journal={arXiv preprint arXiv:2312.14033},
  year={2023}
}