Alexander Lam
1bd594fc62
[Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model ( #1751 )
...
* fix lint issues
* updated gitignore
* changed infer_order from random to double for the pairwise_judge.py (not changing for pairwise_bt_judge.py
* added return statement to CompassArenaBradleyTerrySummarizer to return overall score for each judger model
2024-12-16 13:41:28 +08:00
Yufeng Zhao
300adc31e8
[Feature] Add Korbench dataset ( #1713 )
...
* first version for korbench
* first stage for korbench
* korbench_1
* korbench_1
* korbench_1
* korbench_1
* korbench_1_revised
* korbench_combined_1
* korbench_combined_1
* kor_combined
* kor_combined
* update
---------
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
2024-11-25 20:11:27 +08:00
bittersweet1999
a0853c939d
[Add] Add CompassArenaSubjectiveBench ( #1645 )
...
* fix pip version
* fix pip version
* add compassarenasubjectivebench
* add compassarenasubjectivebench
* add compassarenabench
2024-11-01 13:52:22 +08:00
Songyang Zhang
a4d5a6c81b
[Feature] Support LiveCodeBench ( #1617 )
...
* Update
* Update LCB
* Update
* Update
* Update
* Update
* Update
2024-10-21 20:50:39 +08:00
bittersweet1999
a11e2b2fd4
[Fix] Compatible with old versions ( #1616 )
...
* fix pip version
* fix pip version
* Compatible with old versions
* compati old version
* compati old version
* compati old version
* update configs
2024-10-21 10:16:29 +08:00
bittersweet1999
fa54aa62f6
[Feature] Add Judgerbench and reorg subeval ( #1593 )
...
* fix pip version
* fix pip version
* update (#1522 )
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
* [Feature] Update Models (#1518 )
* Update Models
* Update
* Update humanevalx
* Update
* Update
* [Feature] Dataset prompts update for ARC, BoolQ, Race (#1527 )
add judgerbench and reorg sub
add judgerbench and reorg subeval
add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
* add judgerbench and reorg subeval
---------
Co-authored-by: zhulinJulia24 <145004780+zhulinJulia24@users.noreply.github.com>
Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
Co-authored-by: Linchen Xiao <xxllcc1993@gmail.com>
2024-10-15 16:36:05 +08:00
Songyang Zhang
ee058e25b2
[Feature] Support verbose for OpenAI API ( #1546 )
2024-09-20 17:12:52 +08:00
Linchen Xiao
245664f4c0
[Feature] Fullbench v0.1 language update ( #1463 )
...
* update
* update
* update
* update
2024-08-28 14:01:05 +08:00
CHEN PENGAN
463231c651
[Feature] Add icl_sliding_k_retriever.py and update __init__.py ( #1305 )
...
* Add icl_sliding_k_retriever.py and update __init__.py
* Fix flake8, isort, and yapf issues for Sliding Window Retriever
2024-08-23 17:18:31 +08:00
Hari Seldon
14b4b735cb
[Feature] Add support for SciCode ( #1417 )
...
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode
* add SciCode w/ bg
* add scicode
* Update README.md
* Update README.md
* Delete configs/eval_SciCode.py
* rename
* 1
* rename
* Update README.md
* Update scicode.py
* Update scicode.py
* fix some bugs
* Update
* Update
---------
Co-authored-by: root <HariSeldon0>
Co-authored-by: tonysy <sy.zhangbuaa@gmail.com>
2024-08-22 13:42:25 +08:00
Que Haoran
a244453d9e
[Feature] Support inference ppl datasets ( #1315 )
...
* commit inference ppl datasets
* revised format
* revise
* revise
* revise
* revise
* revise
* revise
2024-07-22 17:59:30 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 ( #1279 )
2024-06-28 14:16:34 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo ( #1224 )
...
* add fofo dataset
* add dataset fofo
2024-06-06 11:40:48 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version ( #1204 )
2024-05-28 23:09:59 +08:00
Fengzhe Zhou
2b3d4150f3
[Sync] update evaluator ( #1175 )
2024-05-21 14:22:46 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template ( #1098 )
...
* add TheoremQA with 5-shot
* add huggingface_above_v4_33 classes
* use num_worker partitioner in cli
* update theoremqa
* update TheoremQA
* add TheoremQA
* rename theoremqa -> TheoremQA
* update TheoremQA output path
* rewrite many model configs
* update huggingface
* further update
* refine configs
* update configs
* update configs
* add configs/eval_llama3_instruct.py
* add summarizer multi faceted
* update bbh datasets
* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py
* rename class
* update readme
* update hf above v4.33
2024-05-14 14:50:16 +08:00
Alexander Lam
35c94d0cde
[Feature] Adding support for LLM Compression Evaluation ( #1108 )
...
* fixed formatting based on pre-commit tests
* fixed typo in comments; reduced the number of models in the eval config
* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset
* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
2024-04-30 10:51:01 +08:00
bittersweet1999
6ba1c4937d
[Feature] Support Math evaluation via judgemodel ( #1094 )
...
* support openai math evaluation
* support openai math evaluation
* support openai math evaluation
* support math llm judge
* support math llm judge
2024-04-26 14:56:23 +08:00
bittersweet1999
6f98c8d9ab
[Fix] Fix MultiRound Subjective Evaluation( #1043 )
...
* fix multiround
* fix
2024-04-22 12:06:03 +08:00
Fengzhe Zhou
b39f501563
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
bittersweet1999
2d4e559763
[Feature] Add multi-model judge and fix some problems ( #1016 )
...
* support multi-model judge and moe judge
* test_moe
* test_moe
* test
* add moe judge
* support multi-judge-model
2024-04-02 11:52:06 +08:00
Fengzhe Zhou
ab6cdb2be8
[Sync] Bump version 0.2.3 ( #957 )
2024-03-12 11:51:56 +08:00
bittersweet1999
848e7c8a76
[fix] add different temp for different question in mtbench ( #954 )
...
* add temp for mtbench
* add document for mtbench
* add document for mtbench
2024-03-11 17:24:39 +08:00
Yang Yong
3829be87b1
Fix LightllmApi ppl test ( #951 )
2024-03-08 12:04:44 +08:00
Fengzhe Zhou
9afbfa3639
[Sync] Fix TEvalEvaluator ( #929 )
2024-02-28 16:05:30 +08:00
Hubert
4aa74565e2
[Feat] minor update agent related ( #839 )
...
* [Feat] update cibench
* [Feat] Support CIBench
* [Feat] Support CIBench
* [Feat] Support CIBench
* [Feat] Support CIBench
2024-01-26 14:15:51 +08:00
bittersweet1999
2ee8e8a1a1
[Feature] add mtbench ( #829 )
...
* add mtbench
* add mtbench
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/mtbench.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix mtbench
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-24 12:11:47 +08:00
Fengzhe Zhou
b4afe3e7c1
[Sync] Add InternLM2 Keyset Evaluation Demo ( #807 )
...
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
2024-01-17 13:48:12 +08:00
Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2023.01.08 ( #777 )
2024-01-08 14:07:24 +00:00
bittersweet1999
be369c3e06
[Feature] Add multi_round dataset evaluation ( #766 )
...
* multi_round dataset
* add multi_round evaluation
2024-01-04 10:37:52 +00:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs ( #734 )
2023-12-25 21:59:16 +08:00
bittersweet1999
1fe152b3e8
[Feature] Support AlignmentBench infer and judge ( #697 )
...
* alignmentbench infer and judge
* alignmentbench
* alignmentbench done
* alignment all done
* alignment all done
2023-12-13 19:59:30 +08:00
bittersweet1999
6130394165
[Feature] Add double order of subjective evaluation and removing duplicated response among two models ( #692 )
...
* add features
* add doc string
* add doc string
2023-12-12 20:58:17 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation ( #680 )
...
* new version of subject
* fixed draw
* fixed draw
* fixed draw
* done
* done
* done
* done
* fixed lint
2023-12-11 22:22:11 +08:00
Hubert
e78857ac36
[Sync] minor test ( #683 )
2023-12-11 17:42:53 +08:00
liyucheng09
05bbce8b08
[Feature] Add Data Contamination Analysis ( #639 )
...
* add contamination analysis to ceval
* fix bugs
* add contamination docs
* to pass CI check
* update
---------
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-08 10:00:11 +08:00
Ma Zerun
6aaf3b91ec
[Feature] Support chat style inferencer. ( #643 )
...
* [Feature] Support chat style inferencer.
* [Fix] use new prompt
* [Fix] use new prompt
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-30 14:00:06 +08:00
Fengzhe Zhou
d4d1330a5a
[Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes ( #625 )
2023-11-23 14:05:59 +08:00
Fengzhe Zhou
fb30b7c7a2
[Fix] Fix gen inferencer ( #615 )
2023-11-22 12:04:31 +08:00
Songyang Zhang
721a45c68f
[Bug] Update api with generation_kargs ( #614 )
...
* update api
* update generation_kwargs impl
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-22 10:02:57 +08:00
Hubert
91fba2c2e9
[Feat] support humaneval and mbpp pass@k ( #598 )
...
* [Feat] support pass@ k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k docs
* update naming
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-16 21:22:06 +08:00
Hubert
fcab30f82e
[Fix] change save_every defaults to 1 ( #592 )
2023-11-15 13:00:25 +08:00
Fengzhe Zhou
d3de5c41fb
[Sync] update model configs ( #574 )
2023-11-13 15:15:34 +08:00
Hubert
bb2ecf416e
[Feat] Support cibench ( #538 )
...
* [Feat] support cidataset
* [Feat] support cidataset
* [Feat] support cidataset
* [Feat] support cidataset
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* rename cibench
* rename cibench
* rename cibench
* rename cibench
* minor fix
* minor fix
* minor fix
2023-11-07 19:11:44 +08:00
Songyang Zhang
239c2a346e
[Feature] Add support for MiniMax API ( #548 )
...
* update requirement
* update requirement
* update with minimax
* update api model
* Update readme
* fix error
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2023-11-06 21:57:32 +08:00
Fengzhe Zhou
dbb20b8270
[Sync] update ( #517 )
2023-10-27 20:31:22 +08:00
Hubert
b3f5d9e421
[Feat] support math/gms8k agent config ( #494 )
...
* support math agent
* support gsm8k agent
* support gsm8k agent
* minor fix
* minor fix
* minor fix
* Update configs/eval_codeagent.py
2023-10-25 23:05:15 +08:00
liushz
2737249f31
[Feature] Add mathbench dataset and circular evaluator ( #408 )
...
* add_mathbench
* update mathbench
* support non circular eval dataset
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-10-18 04:08:31 -05:00
Leymore
fbf5089c40
[Sync] update github token ( #475 )
2023-10-13 06:50:54 -05:00
Leymore
362c33dff4
fix jieba rouge ( #467 )
2023-10-12 10:25:19 +08:00