30 Commits

Author SHA1 Message Date
Fengzhe Zhou
9afbfa3639
[Sync] Fix TEvalEvaluator (#929) 2024-02-28 16:05:30 +08:00
bittersweet1999
2ee8e8a1a1
[Feature] add mtbench (#829)
* add mtbench

* add mtbench

* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/datasets/subjective/__init__.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* Update opencompass/datasets/subjective/mtbench.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* fix mtbench

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-24 12:11:47 +08:00
Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2023.01.08 (#777) 2024-01-08 14:07:24 +00:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs (#734) 2023-12-25 21:59:16 +08:00
bittersweet1999
1fe152b3e8
[Feature] Support AlignmentBench infer and judge (#697)
* alignmentbench infer and judge

* alignmentbench

* alignmentbench done

* alignment all done

* alignment all done
2023-12-13 19:59:30 +08:00
bittersweet1999
6130394165
[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692)
* add features

* add doc string

* add doc string
2023-12-12 20:58:17 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation (#680)
* new version of subject

* fixed draw

* fixed draw

* fixed draw

* done

* done

* done

* done

* fixed lint
2023-12-11 22:22:11 +08:00
Hubert
e78857ac36
[Sync] minor test (#683) 2023-12-11 17:42:53 +08:00
liyucheng09
05bbce8b08
[Feature] Add Data Contamination Analysis (#639)
* add contamination analysis to ceval

* fix bugs

* add contamination docs

* to pass CI check

* update

---------

Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-08 10:00:11 +08:00
Fengzhe Zhou
d3de5c41fb
[Sync] update model configs (#574) 2023-11-13 15:15:34 +08:00
Fengzhe Zhou
dbb20b8270
[Sync] update (#517) 2023-10-27 20:31:22 +08:00
liushz
2737249f31
[Feature] Add mathbench dataset and circular evaluator (#408)
* add_mathbench

* update mathbench

* support non circular eval dataset

---------

Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-10-18 04:08:31 -05:00
Leymore
fbf5089c40
[Sync] update github token (#475) 2023-10-13 06:50:54 -05:00
Leymore
362c33dff4
fix jieba rouge (#467) 2023-10-12 10:25:19 +08:00
Leymore
d7ff933a73
[Fix] Use jieba rouge in lcsts (#459)
* use jieba rouge in lcsts

* use rouge_chinese
2023-10-09 10:10:33 +08:00
Tong Gao
a1ea3c094a
[Sync] Initial support of subjective evaluation (#421)
Co-authored-by: Leymore <zfz-960727@163.com>
2023-09-22 15:42:31 +08:00
Ma Zerun
0f2c388280
Support GSM8k evaluation with tools by Lagent and LangChain (#277)
* Support GSM8k evaluation with tools by Lagent and LangChain

* Avoid to use MMEngine new feature

* update document

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-09-22 15:28:22 +08:00
Leymore
ae0cd8752f
[Feature] Use local accuracy from hf implements (#416)
* use local accuracy from hf implements

* add load from hf fallback
2023-09-20 16:35:22 +08:00
Haodong Duan
d17a5b94fa
[Refine] Refine PR #122 (#123)
* update

* update
2023-08-03 14:54:38 +08:00
Yuan Liu
191a3f6f9d
[Feature]: Use multimodal (#73)
* [Feature]: Add minigpt-4

* [Feature]: Add mm local runner

* [Feature]: Add instructblip

* [Feature]: Delete redundant file

* [Feature]: Delete redundant file

* [Feature]: Add README to InstructBLIP

* [Feature]: Update MiniGPT-4

* [Fix]: Fix lint

* [Feature]add omnibenchmark readme (#49)

* add omnibenchmark readme

* fix

* Update OmniMMBench.md

* Update OmniMMBench.md

* Update OmniMMBench.md

* [Fix]: Refine name (#54)

* [Feature]: Unify out and err

* [Fix]: Fix lint

* [Feature]: Rename to mmbench and change weight path

* [Feature]: Delete Omni in instructblip

* [Feature]: Check the avaliablity of lavis

* [Fix]: Fix lint

* [Feature]: Refactor MM

* [Refactor]: Refactor path

* [Feature]: Delete redundant files

* [Refactor]: Delete redundant files

---------

Co-authored-by: Wangbo Zhao(黑色枷锁) <56866854+wangbo-zhao@users.noreply.github.com>
2023-08-03 11:07:50 +08:00
Tong Gao
c00179d46b
[Feature] Evaluating acc based on minimum edit distance, update SIQA (#130)
* [Feature] Support evaluating acc based on minimum edit distance, update SIQA

* update
2023-08-01 14:24:27 +08:00
Haodong Duan
538b439302
[Fix] Fix seed in HFEvaluator (#122) 2023-07-28 11:29:01 +08:00
Tong Gao
311bf0daa7
[Fix] Fix CI (#70)
* [Fix] Fix CI

* [Fix] Fix CI

* [Fix] Fix CI

* update
2023-07-17 19:10:59 +08:00
Tong Gao
1e44541730
[Enhancement] Test linting in CI and fix existing linting errors (#69)
* [Enhancement] Test linting in CI

* fix linting
2023-07-17 15:59:10 +08:00
Hubert
f5103f93dd
[Feat] add bs for perspective api eval (#50)
* [Feat] add bs for perspective api eval

* fix according to comments

* fix according to comments
2023-07-12 16:26:01 +08:00
Leymore
86d5ec3d0f
Update configs (#9)
* Update implements

* Update
2023-07-06 12:27:41 +08:00
Ezra-Yu
cbe9fe2cdb
Add Release Contraibution 2023-07-05 02:22:40 +00:00
cky
36f111100f
update datasets 2023-07-05 01:45:26 +00:00
yingfhu
fb11108723
[Feat] support opencompass 2023-07-04 22:11:33 +08:00
gaotongxiao
7d346000bb
initial commit 2023-07-04 21:34:55 +08:00