# T-Eval

Tool utilization can be decomposed into multiple sub-processes, including instruction following, planning, reasoning, retrieval, understanding, and review. Building on this decomposition, T-Eval evaluates the tool-utilization capability of LLMs step by step. It splits the evaluation into sub-domains aligned with these model capabilities, which makes it possible to assess both the holistic and the isolated competencies of LLMs.

[Paper](https://arxiv.org/abs/2312.14033)

[Project Page](https://open-compass.github.io/T-Eval/)

[LeaderBoard](https://open-compass.github.io/T-Eval/leaderboard.html)

[HuggingFace](https://huggingface.co/datasets/lovesnowbest/T-Eval)

## Citation

```
@article{chen2023t,
title={T-Eval: Evaluating the Tool Utilization Capability Step by Step},
author={Chen, Zehui and Du, Weihua and Zhang, Wenwei and Liu, Kuikun and Liu, Jiangning and Zheng, Miao and Zhuo, Jingming and Zhang, Songyang and Lin, Dahua and Chen, Kai and others},
journal={arXiv preprint arXiv:2312.14033},
year={2023}
}
```