74 Commits

Author SHA1 Message Date
Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2023.01.08 (#777)
2024-01-08 14:07:24 +00:00
bittersweet1999
2163f9398f
[Feature] add subject ir dataset (#755)
* add subject ir

* Add ir dataset

* Add ir dataset
2024-01-05 12:00:57 +00:00
bittersweet1999
be369c3e06
[Feature] Add multi_round dataset evaluation (#766)
* multi_round dataset

* add multi_round evaluation
2024-01-04 10:37:52 +00:00
bittersweet1999
7cd65d49d8
[Fix] Fix small bug in alignbench (#764)
* fix small bugs

* fix small bugs
2024-01-03 07:44:53 +00:00
bittersweet1999
fe0b717033
add creationbench (#753)
2023-12-29 10:03:44 +00:00
bittersweet1999
dfd9ac0fd9
[Feature] Add other judgelm prompts for Alignbench (#731)
* add judgellm prompts

* add judgelm prompts

* update import info

* fix situation that no abbr in config

* fix situation that no abbr in config

* add summarizer for other judgellm

* change config name

* add maxlen

* add maxlen

* dict assert

* dict assert

* fix strings

* fix strings
2023-12-27 17:54:53 +08:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs (#734)
2023-12-25 21:59:16 +08:00
bittersweet1999
e985100cd1
[Fix] Fix subjective alignbench (#730)
2023-12-23 20:06:53 +08:00
bittersweet1999
fbb912ddf3
[Feature] Add abbr for judgemodel in subjective evaluation (#724)
* add_judgemodel_abbr

* add judgemodel abbr
2023-12-21 15:58:20 +08:00
Songyang Zhang
bfe4aa2af5
[Fix] Update alignmentbench (#704)
* update alignmentbench

* update alignmentbench

* update alignmentbench
2023-12-14 18:24:21 +08:00
bittersweet1999
1fe152b3e8
[Feature] Support AlignmentBench infer and judge (#697)
* alignmentbench infer and judge

* alignmentbench

* alignmentbench done

* alignment all done

* alignment all done
2023-12-13 19:59:30 +08:00
Hubert
4780b39eda
[Sync] format (#690)
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-12 14:03:45 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation (#680)
* new version of subject

* fixed draw

* fixed draw

* fixed draw

* done

* done

* done

* done

* fixed lint
2023-12-11 22:22:11 +08:00
Jingming
7cb53a95fa
[Fix] fix bug on standart_deviation summarizer (#675)
2023-12-08 13:38:07 +08:00
liyucheng09
05bbce8b08
[Feature] Add Data Contamination Analysis (#639)
* add contamination analysis to ceval

* fix bugs

* add contamination docs

* to pass CI check

* update

---------

Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-08 10:00:11 +08:00
bittersweet1999
1c95790fdd
New subjective judgement (#660)
* TabMWP

* TabMWP

* fixed

* fixed

* fixed

* done

* done

* done

* add new subjective judgement

* add new subjective judgement

* add new subjective judgement

* add new subjective judgement

* add new subjective judgement

* modified to a more general way

* modified to a more general way

* final

* final

* add summarizer

* add new summarize

* fixed

* fixed

* fixed

---------

Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>
2023-12-06 13:28:33 +08:00
Fengzhe Zhou
9083dea683
[Sync] some renaming (#641)
2023-11-27 16:06:49 +08:00
Fengzhe Zhou
79f6449d85
[Doc] Update FAQ (#628)
* update faq

* Update docs/zh_cn/get_started/faq.md

* Update docs/en/get_started/faq.md

* Update docs/zh_cn/get_started/faq.md

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-11-23 18:19:17 +08:00
Fengzhe Zhou
d949e3c003
[Feature] Add circular eval (#610)
* refactor default, add circular summarizer

* add circular

* update impl

* update doc

* minor update

* no more to be added
2023-11-23 16:45:47 +08:00
Jingming
5e75e29711
[Feature] Add multi-prompt generation demo (#568)
* [Feature] Add multi-prompt generation demo

* [Fix] change form in winogrande_gen_XXX.py

* [Fix] make multi prompt demo more directly

* [Fix] fix bug

* [Fix] minor fix

---------

Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-20 16:16:37 +08:00
Fengzhe Zhou
d3de5c41fb
[Sync] update model configs (#574)
2023-11-13 15:15:34 +08:00
Qing
e2355a2ede
[Feature] Add multi model viz (#509)
* add viz_multi_model.py tool

* Modify the viz_multi_model.py script according to the review

* highlight multiple optimal scores

---------

Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-10-30 12:11:33 +08:00
Fengzhe Zhou
dbb20b8270
[Sync] update (#517)
2023-10-27 20:31:22 +08:00
Leymore
fbf5089c40
[Sync] update github token (#475)
2023-10-13 06:50:54 -05:00