Fengzhe Zhou
2b3d4150f3
[Sync] update evaluator ( #1175 )
2024-05-21 14:22:46 +08:00
Fengzhe Zhou
7505b3cadf
[Feature] Add huggingface apply_chat_template ( #1098 )
...
* add TheoremQA with 5-shot
* add huggingface_above_v4_33 classes
* use num_worker partitioner in cli
* update theoremqa
* update TheoremQA
* add TheoremQA
* rename theoremqa -> TheoremQA
* update TheoremQA output path
* rewrite many model configs
* update huggingface
* further update
* refine configs
* update configs
* update configs
* add configs/eval_llama3_instruct.py
* add summarizer multi faceted
* update bbh datasets
* update configs/models/hf_llama/lmdeploy_llama3_8b_instruct.py
* rename class
* update readme
* update hf above v4.33
2024-05-14 14:50:16 +08:00
Alexander Lam
35c94d0cde
[Feature] Adding support for LLM Compression Evaluation ( #1108 )
...
* fixed formatting based on pre-commit tests
* fixed typo in comments; reduced the number of models in the eval config
* fixed a bug in LLMCompressionDataset, where setting samples=None would result in passing test[:None] to load_dataset
* removed unnecessary variable in _format_table_pivot; changed lark_reporter message to English
2024-04-30 10:51:01 +08:00
bittersweet1999
6ba1c4937d
[Feature] Support Math evaluation via judgemodel ( #1094 )
...
* support openai math evaluation
* support openai math evaluation
* support openai math evaluation
* support math llm judge
* support math llm judge
2024-04-26 14:56:23 +08:00
bittersweet1999
6f98c8d9ab
[Fix] Fix MultiRound Subjective Evaluation ( #1043 )
...
* fix multiround
* fix
2024-04-22 12:06:03 +08:00
Fengzhe Zhou
b39f501563
[Sync] update taco ( #1030 )
2024-04-09 17:50:23 +08:00
bittersweet1999
2d4e559763
[Feature] Add multi-model judge and fix some problems ( #1016 )
...
* support multi-model judge and moe judge
* test_moe
* test_moe
* test
* add moe judge
* support multi-judge-model
2024-04-02 11:52:06 +08:00
Fengzhe Zhou
ab6cdb2be8
[Sync] Bump version 0.2.3 ( #957 )
2024-03-12 11:51:56 +08:00
bittersweet1999
848e7c8a76
[fix] add different temp for different question in mtbench ( #954 )
...
* add temp for mtbench
* add document for mtbench
* add document for mtbench
2024-03-11 17:24:39 +08:00
Yang Yong
3829be87b1
Fix LightllmApi ppl test ( #951 )
2024-03-08 12:04:44 +08:00
Fengzhe Zhou
9afbfa3639
[Sync] Fix TEvalEvaluator ( #929 )
2024-02-28 16:05:30 +08:00
Hubert
4aa74565e2
[Feat] minor update agent related ( #839 )
...
* [Feat] update cibench
* [Feat] Support CIBench
* [Feat] Support CIBench
* [Feat] Support CIBench
* [Feat] Support CIBench
2024-01-26 14:15:51 +08:00
bittersweet1999
2ee8e8a1a1
[Feature] add mtbench ( #829 )
...
* add mtbench
* add mtbench
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update configs/datasets/subjective/multiround/mtbench_judgeby_gpt4.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/__init__.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* Update opencompass/datasets/subjective/mtbench.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* fix mtbench
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-24 12:11:47 +08:00
Fengzhe Zhou
b4afe3e7c1
[Sync] Add InternLM2 Keyset Evaluation Demo ( #807 )
...
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
2024-01-17 13:48:12 +08:00
Fengzhe Zhou
32f40a8f83
[Sync] Sync with internal codes 2024.01.08 ( #777 )
2024-01-08 14:07:24 +00:00
bittersweet1999
be369c3e06
[Feature] Add multi_round dataset evaluation ( #766 )
...
* multi_round dataset
* add multi_round evaluation
2024-01-04 10:37:52 +00:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs ( #734 )
2023-12-25 21:59:16 +08:00
bittersweet1999
1fe152b3e8
[Feature] Support AlignmentBench infer and judge ( #697 )
...
* alignmentbench infer and judge
* alignmentbench
* alignmentbench done
* alignment all done
* alignment all done
2023-12-13 19:59:30 +08:00
bittersweet1999
6130394165
[Feature] Add double order of subjective evaluation and removing duplicated response among two models ( #692 )
...
* add features
* add doc string
* add doc string
2023-12-12 20:58:17 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation ( #680 )
...
* new version of subject
* fixed draw
* fixed draw
* fixed draw
* done
* done
* done
* done
* fixed lint
2023-12-11 22:22:11 +08:00
Hubert
e78857ac36
[Sync] minor test ( #683 )
2023-12-11 17:42:53 +08:00
liyucheng09
05bbce8b08
[Feature] Add Data Contamination Analysis ( #639 )
...
* add contamination analysis to ceval
* fix bugs
* add contamination docs
* to pass CI check
* update
---------
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-08 10:00:11 +08:00
Ma Zerun
6aaf3b91ec
[Feature] Support chat style inferencer. ( #643 )
...
* [Feature] Support chat style inferencer.
* [Fix] use new prompt
* [Fix] use new prompt
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-11-30 14:00:06 +08:00
Fengzhe Zhou
d4d1330a5a
[Sync] Fix cmnli, fix vicuna meta template, fix longbench postprocess and other minor fixes ( #625 )
2023-11-23 14:05:59 +08:00
Fengzhe Zhou
fb30b7c7a2
[Fix] Fix gen inferencer ( #615 )
2023-11-22 12:04:31 +08:00
Songyang Zhang
721a45c68f
[Bug] Update api with generation_kwargs ( #614 )
...
* update api
* update generation_kwargs impl
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-22 10:02:57 +08:00
Hubert
91fba2c2e9
[Feat] support humaneval and mbpp pass@k ( #598 )
...
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k docs
* update naming
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-16 21:22:06 +08:00
Hubert
fcab30f82e
[Fix] change save_every defaults to 1 ( #592 )
2023-11-15 13:00:25 +08:00
Fengzhe Zhou
d3de5c41fb
[Sync] update model configs ( #574 )
2023-11-13 15:15:34 +08:00
Hubert
bb2ecf416e
[Feat] Support cibench ( #538 )
...
* [Feat] support cidataset
* [Feat] support cidataset
* [Feat] support cidataset
* [Feat] support cidataset
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* rename cibench
* rename cibench
* rename cibench
* rename cibench
* minor fix
* minor fix
* minor fix
2023-11-07 19:11:44 +08:00
Songyang Zhang
239c2a346e
[Feature] Add support for MiniMax API ( #548 )
...
* update requirement
* update requirement
* update with minimax
* update api model
* Update readme
* fix error
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2023-11-06 21:57:32 +08:00
Fengzhe Zhou
dbb20b8270
[Sync] update ( #517 )
2023-10-27 20:31:22 +08:00
Hubert
b3f5d9e421
[Feat] support math/gsm8k agent config ( #494 )
...
* support math agent
* support gsm8k agent
* support gsm8k agent
* minor fix
* minor fix
* minor fix
* Update configs/eval_codeagent.py
2023-10-25 23:05:15 +08:00
liushz
2737249f31
[Feature] Add mathbench dataset and circular evaluator ( #408 )
...
* add_mathbench
* update mathbench
* support non circular eval dataset
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-10-18 04:08:31 -05:00
Leymore
fbf5089c40
[Sync] update github token ( #475 )
2023-10-13 06:50:54 -05:00
Leymore
362c33dff4
fix jieba rouge ( #467 )
2023-10-12 10:25:19 +08:00
Leymore
d7ff933a73
[Fix] Use jieba rouge in lcsts ( #459 )
...
* use jieba rouge in lcsts
* use rouge_chinese
2023-10-09 10:10:33 +08:00
Tong Gao
119bfd1569
[Refactor] Move fix_id_list to Retriever ( #442 )
...
* [Refactor] Move fix_id_list to Retriever
* update
* move to base
* fix
2023-10-07 12:53:41 +08:00
Hubert
d9f3e88dfe
[Fix] fix clp potential error and support bs>1 ( #439 )
...
* [Fix] fix clp potential error and support bs>1
* [Fix] fix clp potential error and support bs>1
* minor fix
* minor fix
2023-09-27 16:32:57 +08:00
Tong Gao
a1ea3c094a
[Sync] Initial support of subjective evaluation ( #421 )
...
Co-authored-by: Leymore <zfz-960727@163.com>
2023-09-22 15:42:31 +08:00
Ma Zerun
0f2c388280
Support GSM8k evaluation with tools by Lagent and LangChain ( #277 )
...
* Support GSM8k evaluation with tools by Lagent and LangChain
* Avoid using MMEngine new feature
* update document
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-09-22 15:28:22 +08:00
Tong Gao
681d3013de
[Feature] Log gold answer in prediction output ( #419 )
...
* [Feature] Log gold answer in prediction output
* support clp golden ans
* minor fix
---------
Co-authored-by: yingfhu <yingfhu@gmail.com>
2023-09-22 12:44:40 +08:00
Leymore
ae0cd8752f
[Feature] Use local accuracy from hf implements ( #416 )
...
* use local accuracy from hf implements
* add load from hf fallback
2023-09-20 16:35:22 +08:00
Hubert
a11cb45c83
[Feat] implementation for support promptbench ( #239 )
...
* [Feat] support adv_glue dataset for adversarial robustness
* reorg files
* minor fix
* minor fix
* support prompt bench demo
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
2023-09-15 15:06:53 +08:00
cdpath
722eb39526
fix potential oom issue ( #387 )
2023-09-12 10:41:03 +08:00
Leymore
880b34e759
[Fix] Quick lint fix ( #362 )
...
* add default value
* lint fix
* use None
2023-09-06 14:33:13 +08:00
Leymore
b8bf16e81c
[Fix] zero retriever add default value ( #361 )
2023-09-05 10:37:42 +08:00
Leymore
8774465a8f
[Enhancement] ignore ZeroRetriever error when id_list provided ( #340 )
2023-09-04 11:12:16 +08:00
Leymore
e810974068
[Fix] Fix when missing both pad and eos token ( #287 )
...
* fix when missing both pad and eos token
* update pad_token_id impl
2023-08-31 16:53:39 +08:00
liushz
02ce139bc6
[Feature] Add Tree-of-Thought method ( #173 )
...
* Add ToT method
* Update ToT
* Update ToT
* Update ToT
* Update ToT
* Update ToT
* Update ToT
* Update ToT
* Update chain_of_thought.md
* Update icl_tot_inferencer.py
---------
Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
2023-08-23 12:23:05 +08:00