Commit Graph

114 Commits

Author SHA1 Message Date
Fengzhe Zhou
a256753221
[Feature] Add LLaMA-3 Series Configs (#1065)
* add LLaMA-3 Series configs

* update readme
2024-04-22 14:39:31 +08:00
Fengzhe Zhou
8c85edd1cd
[Sync] deprecate old mbpps (#1064) 2024-04-19 20:49:46 +08:00
Y0oMu
c220550fb9
updates docs (#1015)
Co-authored-by: youmuspc <yejiayi2004@outlook.com>
2024-04-02 10:30:04 +08:00
bittersweet1999
02e7eec911
[Feature] Support AlpacaEval_V2 (#1006)
* support alpacaeval_v2

* support alpacaeval

* update docs

* update docs
2024-03-28 16:49:04 +08:00
seanzhang-zhichen
7baa711fc7
[Fix] Fix doc problem (#975)
Co-authored-by: zhangzc <2608882093@qq.com>
2024-03-15 13:44:46 +08:00
Fengzhe Zhou
2a741477fe
update links and checkers (#890) 2024-03-13 11:01:35 +08:00
Songyang Zhang
47cb75a3f7
[Docs] Update README (#956)
* [Docs] Update README

* Update README.md

* [Docs] Update README
2024-03-12 11:40:34 +08:00
bittersweet1999
848e7c8a76
[fix] add different temp for different question in mtbench (#954)
* add temp for mtbench

* add document for mtbench

* add document for mtbench
2024-03-11 17:24:39 +08:00
Songyang Zhang
7c1a819bb4
[Fix] Chinese version of ReadTheDoc (#947)
* [Fix] Chinese version of ReadTheDoc

* rename

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2024-03-08 18:10:05 +08:00
Yang Yong
107e022cf4
Support prompt template for LightllmApi. Update LightllmApi token bucket. (#945) 2024-03-06 15:33:53 +08:00
Fengzhe Zhou
ba7cd58da3
[Update] Rename dataset pack (#922) 2024-02-28 10:54:04 +08:00
RunningLeon
4c87e777d8
[Feature] Add end_str for turbomind (#859)
* fix

* update

* fix internlm1

* fix docs

* remove sys
2024-02-01 22:31:14 +08:00
Fengzhe Zhou
f367551668
update doc (#830) 2024-01-24 13:39:28 +08:00
Yang Yong
f09a2ff418
Add LightllmApi KeyError log & Update doc (#816)
* Add LightllmApi KeyError log

* Update LightllmApi doc
2024-01-18 22:23:38 +08:00
RunningLeon
61fe873c89
[Fix] Fix turbomind and update docs (#808)
* update

* update docs

* add engine_config and gen_config in eval_config

* update

* fix

* fix

* fix

* fix docstr

* fix url
2024-01-18 14:41:35 +08:00
Fengzhe Zhou
9e5746d3d8
[Doc] Update News (#810) 2024-01-17 18:22:12 +08:00
Mo Li
acae560911
Added support for multi-needle testing in needle-in-a-haystack test (#802)
* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

* update plot function in tools_needleinahaystack.py

* optimizing needleinahaystack dataset generation strategy

* modify minor formatting issues

* add English version support

* change NeedleInAHaystackDataset to dynamic loading

* change NeedleInAHaystackDataset to dynamic loading

* fix needleinahaystack test eval bug

* fix needleinahaystack config bug

* Added support for multi-needle testing in needle-in-a-haystack test

* Optimize the code for plotting in the needle-in-a-haystack test.

* Correct the typo in the dataset parameters.

* update needleinahaystack test docs

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2024-01-17 13:47:34 +08:00
RunningLeon
0836aec67b
[Feature] Update evaluate turbomind (#804)
* update

* fix

* fix

* fix
2024-01-17 11:09:50 +08:00
Fengzhe Zhou
f78fcf6eeb
[Docs] Update contamination docs (#775) 2024-01-08 16:37:28 +08:00
tpoisonooo
ba1b684fec
typo(installation.md): fix unzip commands (#774)
* Update installation.md

* Update installation.md
2024-01-08 14:23:35 +08:00
Songyang Zhang
0c75f0f95a
[Update] Update introduction of CompassBench-2024-Q1 (#769)
* [Doc] Update Example of CompassBench

* [Doc] Update Example of CompassBench

* [Doc] Update Example of CompassBench

* update

* Update docs/zh_cn/advanced_guides/compassbench_intro.md

Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>

---------

Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
2024-01-05 20:39:36 +08:00
Fengzhe Zhou
3a68083ecc
[Sync] update configs (#734) 2023-12-25 21:59:16 +08:00
Mo Li
0e24f4213e
[Feature] Add NeedleInAHaystack Test Support (#714)
* Add NeedleInAHaystack Test

* Apply pre-commit formatting

* Update configs/eval_hf_internlm_chat_20b_cdme.py

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>

* add needle in haystack test

* update needle in haystack test

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-23 12:00:51 +08:00
RunningLeon
e34c552282
[Feature] Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend (#721)
* add llama2 test

* fix

* test qwen chat-7b

* test w4

* add baichuan2

* update

* update

* update configs and docs

* update
2023-12-21 18:22:17 +08:00
Hubert
fdf18a3238
[Docs] Update Docker docs (#718)
* [Docs] update docker docs

* [Docs] update docker docs
2023-12-19 23:29:43 +08:00
bittersweet1999
97c2068bd9
[Feature] Add JudgeLLMs (#710)
* add judgellms

* add judgellms

* add sub_size_partition

* add docs

* add ref
2023-12-19 18:40:25 +08:00
Songyang Zhang
637628a70f
[Doc] Update Doc for Alignbench (#707)
* update alignmentbench

* update alignmentbench

* update doc

* update

* update
2023-12-15 15:07:25 +08:00
Fengzhe Zhou
cadab9474f
[Doc] Update contamination docs (#698)
* update contamination docs

* add citation

* Update contamination_eval.md

* Update contamination_eval.md

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-12-13 18:03:39 +08:00
bittersweet1999
6130394165
[Feature] Add double order of subjective evaluation and removing duplicated response among two models (#692)
* add features

* add doc string

* add doc string
2023-12-12 20:58:17 +08:00
bittersweet1999
465308e430
[Feature] Add Subjective Evaluation (#680)
* new version of subject

* fixed draw

* fixed draw

* fixed draw

* done

* done

* done

* done

* fixed lint
2023-12-11 22:22:11 +08:00
liyucheng09
05bbce8b08
[Feature] Add Data Contamination Analysis (#639)
* add contamination analysis to ceval

* fix bugs

* add contamination docs

* to pass CI check

* update

---------

Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
Co-authored-by: Leymore <zfz-960727@163.com>
2023-12-08 10:00:11 +08:00
Fengzhe Zhou
79f6449d85
[Doc] Update FAQ (#628)
* update faq

* Update docs/zh_cn/get_started/faq.md

* Update docs/en/get_started/faq.md

* Update docs/zh_cn/get_started/faq.md

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-11-23 18:19:17 +08:00
Fengzhe Zhou
d949e3c003
[Feature] Add circular eval (#610)
* refactor default, add circular summarizer

* add circular

* update impl

* update doc

* minor update

* no more to be added
2023-11-23 16:45:47 +08:00
Songyang Zhang
5329724b65
[Doc] Update README and requirements. (#622)
* update readme

* update doc
2023-11-22 19:16:54 +08:00
Hubert
8c1483e3ce
[Docs] update ds1000 code eval docs (#618) 2023-11-22 13:37:53 +08:00
Lyu Han
eb56fd6d16
Integrate turbomind python api (#484)
* integrate turbomind python api

* update

* update user guide

* update

* fix according to reviewer's comments

* fix error

* fix linting

* update user guide

* remove debug log

---------

Co-authored-by: Songyang Zhang <tonysy@users.noreply.github.com>
2023-11-21 22:34:46 +08:00
Yang Yong
d3b0d5c4ce
[Feature] Support Lightllm API (#613)
* [Feature] Support Lightllm api

* formatting & renaming

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-21 19:18:40 +08:00
Hubert
91fba2c2e9
[Feat] support humaneval and mbpp pass@k (#598)
* [Feat] support pass@ k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k

* [Feat] support pass@k docs

* update naming

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-11-16 21:22:06 +08:00
Wei Jueqi
14e6fe6f13
Fix bugs in subjective evaluation (#589)
* rename

* fix sub bugs and update docs

* update

* update
2023-11-14 16:11:55 +08:00
Songyang Zhang
01a0f2f3c7
[Doc] Update README (#582) 2023-11-13 20:39:43 +08:00
Fengzhe Zhou
689ffe5b63
[Feature] Use dataset in local path (#570)
* update commonsenseqa

* update drop

* update flores_first100

* update gsm8k

* update humaneval

* update lambda

* update obqa

* update piqa

* update race

* update siqa

* update story_cloze

* update strategyqa

* update tydiqa

* update winogrande

* update doc

* update hellaswag

* fix obqa

* update collections

* update .zip name
2023-11-13 13:00:37 +08:00
Hubert
95e0da0173
[Docs] add humanevalx dataset link in config (#559)
* [Docs] add humanevalx dataset link in config

* [Docs] add humanevalx dataset link in config

* minor fix
2023-11-10 18:18:58 +08:00
Hubert
36360bdfc3
[Fix] fix filename typo (#549) 2023-11-07 14:00:26 +08:00
Songyang Zhang
239c2a346e
[Feature] Add support for MiniMax API (#548)
* update requirement

* update requirement

* update with minimax

* update api model

* Update readme

* fix error

---------

Co-authored-by: zhangsongyang <zhangsongyang@pjlab.org.cn>
2023-11-06 21:57:32 +08:00
Songyang Zhang
987a711232
[Doc] Update README and FAQ (#535)
* update readme

* update readme and faq
2023-11-02 15:16:37 +08:00
Fengzhe Zhou
dbb20b8270
[Sync] update (#517) 2023-10-27 20:31:22 +08:00
Wei Jueqi
b62842335d
[Doc] Update Subjective docs (#510)
* rename

* add en subdoc

* fix name

* fix writing

* update

---------

Co-authored-by: Leymore <zfz-960727@163.com>
2023-10-27 16:27:24 +08:00
Hubert
44c8d6cc60
[Docs] update invalid link in docs (#499) 2023-10-25 13:15:42 +08:00
Leymore
d7ff933a73
[Fix] Use jieba rouge in lcsts (#459)
* use jieba rouge in lcsts

* use rouge_chinese
2023-10-09 10:10:33 +08:00
Tong Gao
119bfd1569
[Refactor] Move fix_id_list to Retriever (#442)
* [Refactor] Move fix_id_list to Retriever

* update

* move to base

* fix
2023-10-07 12:53:41 +08:00