Songyang Zhang
704853e5e7
[Feature] Update pip install ( #1324 )
...
* [Feature] Update pip install
* Update Configuration
* Update
* Update
* Update
* Update Internal Config
* Update collect env
2024-07-29 18:32:50 +08:00
jxd
12b84aeb3b
[Feature] Update CHARM Memeorziation ( #1230 )
...
* update gemini api and add gemini models
* add openai models
* update CHARM evaluation
* add CHARM memorization tasks
* add CharmMemSummarizer (output eval details for memorization-independent reasoning analysis
* update CHARM readme
---------
Co-authored-by: wujiang <wujiang@pjlab.org.cn>
2024-07-26 18:42:30 +08:00
WANG WENJIN
0aad8199c7
Fix the summary error in subjective.py ( #1363 )
2024-07-25 18:36:13 +08:00
Linchen Xiao
8127fc3518
CompassBench subjective summarizer added ( #1349 )
...
* subjective summarizer added
* fix lint
2024-07-23 12:29:57 +08:00
Mo Li
104bddf647
[Doc] Update NeedleBench Docs ( #1330 )
...
* update needlebench docs
* update model_name_mapping dict
* update README
* Update README_zh-CN.md
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-07-18 13:16:19 +08:00
bittersweet1999
8e7ad2e981
[Fix] add bc for alignbench summarizer ( #1306 )
...
* fix pip version
* fix pip version
* fix alignbench
* fix import error
2024-07-12 11:06:20 +08:00
bittersweet1999
68ca48496b
[Refactor] Reorganize subjective eval ( #1284 )
...
* fix pip version
* fix pip version
* reorganize subjective eval
* reorg sub
* reorg subeval
* reorg subeval
* update subjective doc
* reorg subeval
* reorg subeval
2024-07-05 22:11:37 +08:00
Fengzhe Zhou
a32f21a356
[Sync] Sync with internal codes 2024.06.28 ( #1279 )
2024-06-28 14:16:34 +08:00
klein
1fa62c4a42
Support wildbench ( #1266 )
...
Co-authored-by: Leymore <zfz-960727@163.com>
2024-06-24 13:16:27 +08:00
bittersweet1999
982e024540
[Feature] add dataset Fofo ( #1224 )
...
* add fofo dataset
* add dataset fofo
2024-06-06 11:40:48 +08:00
Xingyuan Bu
02a0a4e857
MT-Bench-101 ( #1215 )
...
* add mt-bench-101
* add readme and requirements
* add mt-bench-101 data
* Update readme_mtbench101.md
* update readme
* update leaderboard
* fix typo
* Update readme_mtbench101.md
* fit newest opencompass
* update readme.md
* mtbench101 to opencompass
* mtbench101 to opencompass
* for code review
* for code review
* for code review
* hook
* hook
---------
Co-authored-by: liujie <ljie@buaa.edu.cn>
2024-06-03 14:52:12 +08:00
bittersweet1999
7c381e5be8
[Fix] fix summarizer ( #1217 )
...
* fix summarizer
* fix summarizer
2024-05-31 11:40:47 +08:00
Fengzhe Zhou
a77b8a5cec
[Sync] format ( #1214 )
2024-05-30 00:21:58 +08:00
Fengzhe Zhou
2954913d9b
[Sync] bump version ( #1204 )
2024-05-28 23:09:59 +08:00
bittersweet1999
8a8987be0b
fix arenahard summarizer ( #1154 )
...
Co-authored-by: Leymore <zfz-960727@163.com>
2024-05-15 13:31:29 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template ( #1098 )
...
* add TheoremQA with 5-shot
* add huggingface_above_v4_33 classes
* use num_worker partitioner in cli
* update theoremqa
* update TheoremQA
* add TheoremQA
* rename theoremqa -> TheoremQA
* update TheoremQA output path
* rewrite many model configs
* update huggingface
* further update
* refine configs
* update configs
* update configs
* add configs/eval_llama3_instruct.py
* add summarizer multi faceted
* update bbh datasets
* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py
* rename class
* update readme
* update hf above v4.33
2024-05-14 14:50:16 +08:00
Mo Li
6c711cb262
[Fix] Fix Needlebench Summarizer ( #1143 )
...
* update few-shot example
* add 128k
2024-05-13 15:59:34 +08:00
Alexander Lam
35c94d0cde
[Feature] Adding support for LLM Compression Evaluation ( #1108 )
...
* fixed formatting based on pre-commit tests
* fixed typo in comments; reduced the number of models in the eval config
* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset
* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
2024-04-30 10:51:01 +08:00
liushz
a6f67e1a65
[Fix] Fix Math Evaluation with Judge Model Evaluator & Add README ( #1103 )
...
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Add Math Evaluation with Judge Model Evaluator
* Fix Llama-3 meta template
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
* Fix MATH with JudgeLM Evaluation
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2024-04-28 21:58:58 +08:00
Yggdrasill7D6
58a57a4c45
[Feature] add support for Flames datasets ( #1093 )
...
* add flames datasets
* fix lint
* rm quota
* add judgemodel info and fix os path
* support flames dataset
* support flames dataset
---------
Co-authored-by: bittersweet1999 <1487910649@qq.com>
2024-04-28 18:56:24 +08:00
klein
e4830a6926
Update CIBench ( #1089 )
...
* modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4
* update cibench: dataset and evluation
* cibench summarizer bug
* update cibench
* move extract_code import
---------
Co-authored-by: zhangchuyu@pjlab.org.cn <zhangchuyu@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 18:46:02 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation ( #1096 )
...
* support arenahard
* support arenahard
* support arenahard
2024-04-26 15:42:00 +08:00
bittersweet1999
6ba1c4937d
[Feature] Support Math evaluation via judgemodel ( #1094 )
...
* support openai math evaluation
* support openai math evaluation
* support openai math evaluation
* support math llm judge
* support math llm judge
2024-04-26 14:56:23 +08:00
bittersweet1999
6f98c8d9ab
[Fix] Fix MultiRound Subjective Evaluation( #1043 )
...
* fix multiround
* fix
2024-04-22 12:06:03 +08:00
Fengzhe Zhou
8c85edd1cd
[Sync] deprecate old mbpps ( #1064 )
2024-04-19 20:49:46 +08:00
Fengzhe Zhou
b39f501563
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
Mo Li
16f29b25f1
[Fix] Simplify needlebench summarizer ( #1024 )
...
* Conflicts:
configs/summarizers/needlebench.py
* fix lint problems
2024-04-07 17:51:13 +08:00
bittersweet1999
2d4e559763
[Feature] Add multi-model judge and fix some problems ( #1016 )
...
* support multi-model judge and moe judge
* test_moe
* test_moe
* test
* add moe judge
* support multi-judge-model
2024-04-02 11:52:06 +08:00
bittersweet1999
848e7c8a76
[fix] add different temp for different question in mtbench ( #954 )
...
* add temp for mtbench
* add document for mtbench
* add document for mtbench
2024-03-11 17:24:39 +08:00
Mo Li
8142f399a8
[Feature] Upgrade the needle-in-a-haystack experiment to Needlebench ( #913 )
...
* add needlebench
* simplify needlebench 32k, 128k, 200k for eval
* update act prompt
* fix bug in needlebench summarizer
* add needlebench intro, fix summarizer
* lint summarizer
* fix linting error
* move readme.md
* update readme for needlebench
* update docs of needlebench
* simplify needlebench summarizers
2024-03-04 11:10:52 +08:00
bittersweet1999
1c8e193de8
[Fix] hotfix for mtbench ( #877 )
...
* hotfix for mtbench
* hotfix
2024-02-06 21:26:47 +08:00
Fengzhe Zhou
d34ba11106
[Sync] Merge branch 'dev' into zfz/update-keyset-demo ( #876 )
2024-02-05 23:29:10 +08:00
bittersweet1999
7806cd0f64
[Feature] support alpacaeval ( #809 )
...
* support alpacaeval_v1
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/alpacaeval_v1.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix conflict
* support alpacaeval v2
* support alpacav2
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-02-04 14:18:36 +08:00
bittersweet1999
5c6dc908cd
fix compass arena ( #854 )
2024-01-30 16:34:38 +08:00
bittersweet1999
2ee8e8a1a1
[Feature] add mtbench ( #829 )
...
* add mtbench
* add mtbench
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/mtbench.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix mtbench
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-24 12:11:47 +08:00
bittersweet1999
3d9bb4aed7
[Fix] fix strings ( #833 )
...
* add compass arena
* add compass_arena
* add compass arena
* Update opencompass/summarizers/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix check position bias
* fix string
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-23 10:57:26 +00:00
bittersweet1999
2d4da8dd02
[Feature] Add CompassArena ( #828 )
...
* add compass arena
* add compass_arena
* add compass arena
* Update opencompass/summarizers/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/summarizers/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/compass_arena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/eval_subjective_compassarena.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/compassarena/compassarena_compare.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix check position bias
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-23 15:12:46 +08:00
bittersweet1999
814b3f73bd
reorganize subject files ( #801 )
2024-01-16 18:03:11 +08:00
Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2023.01.08 ( #777 )
2024-01-08 14:07:24 +00:00
bittersweet1999
2163f9398f
[Feature] add subject ir dataset ( #755 )
...
* add subject ir
* Add ir dataset
* Add ir dataset
2024-01-05 12:00:57 +00:00
bittersweet1999
be369c3e06
[Feature] Add multi_round dataset evaluation ( #766 )
...
* multi_round dataset
* add multi_round evaluation
2024-01-04 10:37:52 +00:00
bittersweet1999
7cd65d49d8
[Fix] Fix small bug in alignbench ( #764 )
...
* fix small bugs
* fix small bugs
2024-01-03 07:44:53 +00:00
bittersweet1999
fe0b717033
add creationbench ( #753 )
2023-12-29 10:03:44 +00:00
bittersweet1999
dfd9ac0fd9
[Feature] Add other judgelm prompts for Alignbench ( #731 )
...
* add judgellm prompts
* add judgelm prompts
* update import info
* fix situation that no abbr in config
* fix situation that no abbr in config
* add summarizer for other judgellm
* change config name
* add maxlen
* add maxlen
* dict assert
* dict assert
* fix strings
* fix strings
2023-12-27 17:54:53 +08:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs ( #734 )
2023-12-25 21:59:16 +08:00
bittersweet1999
e985100cd1
[Fix] Fix subjective alignbench ( #730 )
2023-12-23 20:06:53 +08:00
bittersweet1999
fbb912ddf3
[Feature] Add abbr for judgemodel in subjective evaluation ( #724 )
...
* add_judgemodel_abbr
* add judgemodel abbr
2023-12-21 15:58:20 +08:00
Songyang Zhang
bfe4aa2af5
[Fix] Update alignmentbench ( #704 )
...
* update alignmentbench
* update alignmentbench
* update alignmentbench
2023-12-14 18:24:21 +08:00
bittersweet1999
1fe152b3e8
[Feature] Support AlignmentBench infer and judge ( #697 )
...
* alignmentbench infer and judge
* alignmentbench
* alignmentbench done
* alignment all done
* alignment all done
2023-12-13 19:59:30 +08:00
Hubert
4780b39eda
[Sync] format ( #690 )
...
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-12 14:03:45 +08:00