Mo Li
76dd814c4d
[Doc] Update NeedleInAHaystack Docs ( #1102 )
* update NeedleInAHaystack Test Docs
* update docs
2024-04-28 18:51:47 +08:00
Haodong Duan
3a232db471
[Deprecate] Remove multi-modal related stuff ( #1072 )
* Remove MultiModal
* update index.rst
* update README
* remove mmbench codes
* update news
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-04-26 21:20:14 +08:00
bittersweet1999
e404b72c52
[Feature] support arenahard evaluation ( #1096 )
* support arenahard
* support arenahard
* support arenahard
2024-04-26 15:42:00 +08:00
Fengzhe Zhou
a256753221
[Feature] Add LLaMA-3 Series Configs ( #1065 )
* add LLaMA-3 Series configs
* update readme
2024-04-22 14:39:31 +08:00
Fengzhe Zhou
8c85edd1cd
[Sync] deprecate old mbpps ( #1064 )
2024-04-19 20:49:46 +08:00
bittersweet1999
02e7eec911
[Feature] Support AlpacaEval_V2 ( #1006 )
* support alpacaeval_v2
* support alpacaeval
* update docs
* update docs
2024-03-28 16:49:04 +08:00
seanzhang-zhichen
7baa711fc7
[Fix] Fix doc problem ( #975 )
Co-authored-by: zhangzc <2608882093@qq.com>
2024-03-15 13:44:46 +08:00
Fengzhe Zhou
2a741477fe
update links and checkers ( #890 )
2024-03-13 11:01:35 +08:00
Songyang Zhang
47cb75a3f7
[Docs] Update README ( #956 )
* [Docs] Update README
* Update README.md
* [Docs] Update README
2024-03-12 11:40:34 +08:00
bittersweet1999
848e7c8a76
[fix] add different temp for different question in mtbench ( #954 )
* add temp for mtbench
* add document for mtbench
* add document for mtbench
2024-03-11 17:24:39 +08:00
Songyang Zhang
7c1a819bb4
[Fix] Chinese version of ReadTheDoc ( #947 )
* [Fix] Chinese version of ReadTheDoc
* rename
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2024-03-08 18:10:05 +08:00
Yang Yong
107e022cf4
Support prompt template for LightllmApi. Update LightllmApi token bucket. ( #945 )
2024-03-06 15:33:53 +08:00
Fengzhe Zhou
ba7cd58da3
[Update] Rename dataset pack ( #922 )
2024-02-28 10:54:04 +08:00
RunningLeon
4c87e777d8
[Feature] Add end_str for turbomind ( #859 )
* fix
* update
* fix internlm1
* fix docs
* remove sys
2024-02-01 22:31:14 +08:00
Fengzhe Zhou
f367551668
update doc ( #830 )
2024-01-24 13:39:28 +08:00
Yang Yong
f09a2ff418
Add LightllmApi KeyError log & Update doc ( #816 )
* Add LightllmApi KeyError log
* Update LightllmApi doc
2024-01-18 22:23:38 +08:00
RunningLeon
61fe873c89
[Fix] Fix turbomind and update docs ( #808 )
* update
* update docs
* add engine_config and gen_config in eval_config
* update
* fix
* fix
* fix
* fix docstr
* fix url
2024-01-18 14:41:35 +08:00
Fengzhe Zhou
9e5746d3d8
[Doc] Update News ( #810 )
2024-01-17 18:22:12 +08:00
Mo Li
acae560911
Added support for multi-needle testing in needle-in-a-haystack test ( #802 )
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
* update plot function in tools_needleinahaystack.py
* optimizing needleinahaystack dataset generation strategy
* modify minor formatting issues
* add English version support
* change NeedleInAHaystackDataset to dynamic loading
* change NeedleInAHaystackDataset to dynamic loading
* fix needleinahaystack test eval bug
* fix needleinahaystack config bug
* Added support for multi-needle testing in needle-in-a-haystack test
* Optimize the code for plotting in the needle-in-a-haystack test.
* Correct the typo in the dataset parameters.
* update needleinahaystack test docs
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-17 13:47:34 +08:00
RunningLeon
0836aec67b
[Feature] Update evaluate turbomind ( #804 )
* update
* fix
* fix
* fix
2024-01-17 11:09:50 +08:00
Fengzhe Zhou
f78fcf6eeb
[Docs] Update contamination docs ( #775 )
2024-01-08 16:37:28 +08:00
tpoisonooo
ba1b684fec
typo(installation.md): fix unzip commands ( #774 )
* Update installation.md
* Update installation.md
2024-01-08 14:23:35 +08:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs ( #734 )
2023-12-25 21:59:16 +08:00
Mo Li
0e24f4213e
[Feature] Add NeedleInAHaystack Test Support ( #714 )
* Add NeedleInAHaystack Test
* Apply pre-commit formatting
* Update configs/eval_hf_internlm_chat_20b_cdme.py
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
* add needle in haystack test
* update needle in haystack test
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-23 12:00:51 +08:00
RunningLeon
e34c552282
[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend ( #721 )
* add llama2 test
* fix
* test qwen chat-7b
* test w4
* add baichuan2
* update
* update
* update configs and docs
* update
2023-12-21 18:22:17 +08:00
Hubert
fdf18a3238
[Docs] Update Docker docs ( #718 )
* [Docs] update docker docs
* [Docs] update docker docs
2023-12-19 23:29:43 +08:00
bittersweet1999
97c2068bd9
[Feature] Add JudgeLLMs ( #710 )
* add judgellms
* add judgellms
* add sub_size_partition
* add docs
* add ref
2023-12-19 18:40:25 +08:00
Songyang Zhang
637628a70f
[Doc] Update Doc for Alignbench ( #707 )
* update alignmentbench
* update alignmentbench
* update doc
* update
* update
2023-12-15 15:07:25 +08:00
Fengzhe Zhou
cadab9474f
[Doc] Update contamination docs ( #698 )
* update contamination docs
* add citation
* Update contamination_eval.md
* Update contamination_eval.md
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-13 18:03:39 +08:00
bittersweet1999
6130394165
[Feature] Add double order of subjective evaluation and removing duplicated response among two models ( #692 )
* add features
* add doc string
* add doc string
2023-12-12 20:58:17 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation ( #680 )
* new version of subject
* fixed draw
* fixed draw
* fixed draw
* done
* done
* done
* done
* fixed lint
2023-12-11 22:22:11 +08:00
liyucheng09
05bbce8b08
[Feature] Add Data Contamination Analysis ( #639 )
* add contamination analysis to ceval
* fix bugs
* add contamination docs
* to pass CI check
* update
---------
Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-08 10:00:11 +08:00
Fengzhe Zhou
79f6449d85
[Doc] Update FAQ ( #628 )
* update faq
* Update docs/zh_cn/get_started/faq.md
* Update docs/en/get_started/faq.md
* Update docs/zh_cn/get_started/faq.md
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-11-23 18:19:17 +08:00
Fengzhe Zhou
d949e3c003
[Feature] Add circular eval ( #610 )
* refactor default, add circular summarizer
* add circular
* update impl
* update doc
* minor update
* no more to be added
2023-11-23 16:45:47 +08:00
Songyang Zhang
5329724b65
[Doc] Update README and requirements. ( #622 )
* update readme
* update doc
2023-11-22 19:16:54 +08:00
Hubert
8c1483e3ce
[Docs] update ds1000 code eval docs ( #618 )
2023-11-22 13:37:53 +08:00
Lyu Han
eb56fd6d16
Integrate turbomind python api ( #484 )
* integrate turbomind python api
* update
* update user guide
* update
* fix according to reviewer's comments
* fix error
* fix linting
* update user guide
* remove debug log
---------
Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-11-21 22:34:46 +08:00
Yang Yong
d3b0d5c4ce
[Feature] Support Lightllm API ( #613 )
* [Feature] Support Lightllm api
* formatting & renaming
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-21 19:18:40 +08:00
Hubert
91fba2c2e9
[Feat] support humaneval and mbpp pass@k ( #598 )
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k
* [Feat] support pass@k docs
* update naming
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-16 21:22:06 +08:00
Songyang Zhang
01a0f2f3c7
[Doc] Update README ( #582 )
2023-11-13 20:39:43 +08:00
Fengzhe Zhou
689ffe5b63
[Feature] Use dataset in local path ( #570 )
* update commonsenseqa
* update drop
* update flores_first100
* update gsm8k
* update humaneval
* update lambda
* update obqa
* update piqa
* update race
* update siqa
* update story_cloze
* update strategyqa
* update tydiqa
* update winogrande
* update doc
* update hellaswag
* fix obqa
* update collections
* update .zip name
2023-11-13 13:00:37 +08:00
Hubert
95e0da0173
[Docs] add humanevalx dataset link in config ( #559 )
* [Docs] add humanevalx dataset link in config
* [Docs] add humanevalx dataset link in config
* minor fix
2023-11-10 18:18:58 +08:00
Hubert
36360bdfc3
[Fix] fix filename typo ( #549 )
2023-11-07 14:00:26 +08:00
Songyang Zhang
239c2a346e
[Feature] Add support for MiniMax API ( #548 )
* update requirement
* update requirement
* update with minimax
* update api model
* Update readme
* fix error
---------
Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2023-11-06 21:57:32 +08:00
Songyang Zhang
987a711232
[Doc] Update README and FAQ ( #535 )
* update readme
* update readme and faq
2023-11-02 15:16:37 +08:00
Fengzhe Zhou
dbb20b8270
[Sync] update ( #517 )
2023-10-27 20:31:22 +08:00
Wei Jueqi
b62842335d
[Doc] Update Subjective docs ( #510 )
* rename
* add en subdoc
* fix name
* fix writing
* update
---------
Co-authored-by: Leymore <zfz-960727@163.com>
2023-10-27 16:27:24 +08:00
Hubert
44c8d6cc60
[Docs] update invalid link in docs ( #499 )
2023-10-25 13:15:42 +08:00
Leymore
d7ff933a73
[Fix] Use jieba rouge in lcsts ( #459 )
* use jieba rouge in lcsts
* use rouge_chinese
2023-10-09 10:10:33 +08:00
Tong Gao
119bfd1569
[Refactor] Move fix_id_list to Retriever ( #442 )
* [Refactor] Move fix_id_list to Retriever
* update
* move to base
* fix
2023-10-07 12:53:41 +08:00