Merge branch 'open-compass:main' into main

2025-05-30 16:03:24 +08:00 · 2025-02-14 13:13:06 +08:00
commit cf8bb1a9e8
parent f046d49e92 3fd8b4e0cd
1773 changed files with 3295 additions and 66327 deletions
--- a/.github/workflows/daily-run-test.yml
+++ b/.github/workflows/daily-run-test.yml
@@ -70,10 +70,10 @@ jobs:
        with:
          repository: ${{ github.event.inputs.repo_org || 'open-compass/opencompass' }}
          ref: ${{github.event.inputs.repo_ref || 'main'}}
-      - name: Set up Python 3.x
-        uses: actions/setup-python@v2
+      - name: Set up Python 3.10
+        uses: actions/setup-python@v4
        with:
-          python-version: 3.x
+          python-version: '3.10'
      - name: Build lagent
        run: |
          pip install wheel setuptools
--- a/.github/workflows/lint.yml
+++ b/.github/workflows/lint.yml
@@ -17,7 +17,7 @@ jobs:
          python-version: '3.10'
      - name: Install pre-commit hook
        run: |
-          pip install pre-commit==3.8.0 mmengine
+          pip install pre-commit==3.8.0 mmengine==0.10.5
          pre-commit install
      - name: Linting
        run: pre-commit run --all-files
--- a/.pre-commit-config-zh-cn.yaml
+++ b/.pre-commit-config-zh-cn.yaml
@@ -1,6 +1,7 @@
 exclude: |
    (?x)^(
      tests/data/|
+      tests/dataset/|
      opencompass/models/internal/|
      opencompass/utils/internal/|
      opencompass/openicl/icl_evaluator/hf_metrics/|
@@ -10,12 +11,9 @@ exclude: |
      opencompass/datasets/teval/|
      opencompass/datasets/NPHardEval/|
      opencompass/datasets/TheoremQA|
+      opencompass/datasets/subjective/mtbench101.py|
      docs/zh_cn/advanced_guides/compassbench_intro.md |
      docs/zh_cn/advanced_guides/compassbench_v2_0.md |
-      opencompass/configs/datasets/ |
-      opencompass/configs/models/|
-      opencompass/configs/summarizers/|
-      opencompass/configs/dataset_collections/ |
      opencompass/utils/datasets.py |
      opencompass/utils/datasets_info.py
    )
@@ -26,8 +24,8 @@ repos:
      - id: flake8
        exclude: |
            (?x)^(
-                configs/ |
-                example_scripts/
+                opencompass/configs/|
+                examples/
            )
  - repo: https://gitee.com/openmmlab/mirrors-isort
    rev: 5.11.5
@@ -35,8 +33,8 @@ repos:
      - id: isort
        exclude: |
            (?x)^(
-                configs/ |
-                example_scripts/
+                opencompass/configs/|
+                examples/
            )
  - repo: https://gitee.com/openmmlab/mirrors-yapf
    rev: v0.32.0
@@ -44,8 +42,8 @@ repos:
      - id: yapf
        exclude: |
            (?x)^(
-                configs/ |
-                example_scripts/
+                opencompass/configs/|
+                examples/
            )
  - repo: https://gitee.com/openmmlab/mirrors-codespell
    rev: v2.2.1
@@ -55,9 +53,8 @@ repos:
            (?x)^(
                .*\.jsonl|
                .*\.md.template|
-                configs/ |
                opencompass/configs/ |
-                example_scripts/
+                examples/
            )
  - repo: https://gitee.com/openmmlab/mirrors-pre-commit-hooks
    rev: v4.3.0
@@ -67,7 +64,6 @@ repos:
            (?x)^(
              dicts/|
              projects/.*?/dicts/|
-              configs/.*?/.*\.txt
            )
      - id: check-yaml
      - id: end-of-file-fixer
@@ -75,7 +71,6 @@ repos:
            (?x)^(
              dicts/|
              projects/.*?/dicts/|
-              configs/.*?/.*\.txt
            )
      - id: requirements-txt-fixer
      - id: double-quote-string-fixer
@@ -107,7 +102,7 @@ repos:
        language: script
        pass_filenames: true
        require_serial: true
-        files: ^configs/datasets
+        files: ^opencompass/configs/datasets
  - repo: local
    hooks:
    -   id: update-dataset-suffix-pacakge
@@ -120,41 +115,6 @@ repos:
        args:
          - --root_folder
          - opencompass/configs/datasets
-  - repo: local
-    hooks:
-    -   id: compare-configs-datasets
-        name: compare configs datasets
-        entry: ./tools/compare_configs.py
-        language: script
-        pass_filenames: false
-        # require_serial: true
-        args:
-          - configs/datasets
-          - opencompass/configs/datasets
-  - repo: local
-    hooks:
-    -   id: compare-configs-models
-        name: compare configs models
-        entry: ./tools/compare_configs.py
-        language: script
-        pass_filenames: false
-        # require_serial: true
-        args:
-          - configs/models
-          - opencompass/configs/models
-          - --ignore
-          - llama
-  - repo: local
-    hooks:
-    -   id: compare-configs-summarizers
-        name: compare configs summarizers
-        entry: ./tools/compare_configs.py
-        language: script
-        pass_filenames: false
-        # require_serial: true
-        args:
-          - configs/summarizers
-          - opencompass/configs/summarizers
  # - repo: https://github.com/open-mmlab/pre-commit-hooks
  #   rev: v0.2.0  # Use the ref you want to point at
  #   hooks:
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -14,10 +14,6 @@ exclude: |
      opencompass/datasets/subjective/mtbench101.py|
      docs/zh_cn/advanced_guides/compassbench_intro.md |
      docs/zh_cn/advanced_guides/compassbench_v2_0.md |
-      opencompass/configs/datasets/ |
-      opencompass/configs/models/|
-      opencompass/configs/summarizers/ |
-      opencompass/configs/dataset_collections/ |
      opencompass/utils/datasets.py |
      opencompass/utils/datasets_info.py
    )
@@ -28,8 +24,8 @@ repos:
      - id: flake8
        exclude: |
            (?x)^(
-                configs/ |
-                example_scripts/
+                opencompass/configs/|
+                examples/
            )
  - repo: https://github.com/PyCQA/isort
    rev: 5.11.5
@@ -37,8 +33,8 @@ repos:
      - id: isort
        exclude: |
            (?x)^(
-                configs/ |
-                example_scripts/
+                opencompass/configs/|
+                examples/
            )
  - repo: https://github.com/pre-commit/mirrors-yapf
    rev: v0.32.0
@@ -46,8 +42,8 @@ repos:
      - id: yapf
        exclude: |
            (?x)^(
-                configs/ |
-                example_scripts/
+                opencompass/configs/|
+                examples/
            )
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.1
@@ -57,9 +53,8 @@ repos:
            (?x)^(
                .*\.jsonl|
                .*\.md.template|
-                configs/ |
                opencompass/configs/ |
-                example_scripts/
+                examples/
            )
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
@@ -69,7 +64,6 @@ repos:
            (?x)^(
              dicts/|
              projects/.*?/dicts/|
-              configs/.*?/.*\.txt
            )
      - id: check-yaml
      - id: end-of-file-fixer
@@ -77,7 +71,6 @@ repos:
            (?x)^(
              dicts/|
              projects/.*?/dicts/|
-              configs/.*?/.*\.txt
            )
      - id: requirements-txt-fixer
      - id: double-quote-string-fixer
@@ -109,7 +102,7 @@ repos:
        language: script
        pass_filenames: true
        require_serial: true
-        files: ^configs/datasets
+        files: ^opencompass/configs/datasets
  - repo: local
    hooks:
    -   id: update-dataset-suffix-pacakge
@@ -122,42 +115,6 @@ repos:
        args:
          - --root_folder
          - opencompass/configs/datasets
-  - repo: local
-    hooks:
-    -   id: compare-configs-datasets
-        name: compare configs datasets
-        entry: ./tools/compare_configs.py
-        language: script
-        pass_filenames: false
-        # require_serial: true
-        args:
-          - configs/datasets
-          - opencompass/configs/datasets
-  - repo: local
-    hooks:
-    -   id: compare-configs-models
-        name: compare configs models
-        entry: ./tools/compare_configs.py
-        language: script
-        pass_filenames: false
-        # require_serial: true
-        args:
-          - configs/models
-          - opencompass/configs/models
-          - --ignore
-          - llama
-  - repo: local
-    hooks:
-    -   id: compare-configs-summarizers
-        name: compare configs summarizers
-        entry: ./tools/compare_configs.py
-        language: script
-        pass_filenames: false
-        # require_serial: true
-        args:
-          - configs/summarizers
-          - opencompass/configs/summarizers
-
  # - repo: https://github.com/open-mmlab/pre-commit-hooks
  #   rev: v0.2.0  # Use the ref you want to point at
  #   hooks:
--- a/README.md
+++ b/README.md
@@ -57,9 +57,10 @@ Just like a compass guides us on our journey, OpenCompass will guide you through

 ## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

- **\[2024.12.17\]** We have provided the evaluation script for the December [CompassAcademic](configs/eval_academic_leaderboard_202412.py), which allows users to easily reproduce the official evaluation results by configuring it.
- **\[2024.11.14\]** OpenCompass now offers support for a sophisticated benchmark designed to evaluate complex reasoning skills — [MuSR](https://arxiv.org/pdf/2310.16049). Check out the [demo](configs/eval_musr.py) and give it a spin! 🔥🔥🔥
- **\[2024.11.14\]** OpenCompass now supports the brand new long-context language model evaluation benchmark — [BABILong](https://arxiv.org/pdf/2406.10149). Have a look at the [demo](configs/eval_babilong.py) and give it a try! 🔥🔥🔥
+- **\[2025.01.16\]** We now support the [InternLM3-8B-Instruct](https://huggingface.co/internlm/internlm3-8b-instruct) model which has enhanced performance on reasoning and knowledge-intensive tasks.
+- **\[2024.12.17\]** We have provided the evaluation script for the December [CompassAcademic](examples/eval_academic_leaderboard_202412.py), which allows users to easily reproduce the official evaluation results by configuring it.
+- **\[2024.11.14\]** OpenCompass now offers support for a sophisticated benchmark designed to evaluate complex reasoning skills — [MuSR](https://arxiv.org/pdf/2310.16049). Check out the [demo](examples/eval_musr.py) and give it a spin! 🔥🔥🔥
+- **\[2024.11.14\]** OpenCompass now supports the brand new long-context language model evaluation benchmark — [BABILong](https://arxiv.org/pdf/2406.10149). Have a look at the [demo](examples/eval_babilong.py) and give it a try! 🔥🔥🔥
 - **\[2024.10.14\]** We now support the OpenAI multilingual QA dataset [MMMLU](https://huggingface.co/datasets/openai/MMMLU). Feel free to give it a try! 🔥🔥🔥
 - **\[2024.09.19\]** We now support [Qwen2.5](https://huggingface.co/Qwen)(0.5B to 72B) with multiple backend(huggingface/vllm/lmdeploy). Feel free to give them a try! 🔥🔥🔥
 - **\[2024.09.17\]** We now support OpenAI o1(`o1-mini-2024-09-12` and `o1-preview-2024-09-12`). Feel free to give them a try! 🔥🔥🔥
@@ -181,10 +182,10 @@ After ensuring that OpenCompass is installed correctly according to the above st
  opencompass --models hf_internlm2_5_1_8b_chat --datasets demo_gsm8k_chat_gen

  # Python scripts
-  opencompass ./configs/eval_chat_demo.py
+  opencompass examples/eval_chat_demo.py
  ```

-  You can find more script examples under [configs](./configs) folder.
+  You can find more script examples under [examples](./examples) folder.

 - API evaluation

@@ -196,7 +197,7 @@ After ensuring that OpenCompass is installed correctly according to the above st
  opencompass --models gpt_4o_2024_05_13 --datasets demo_gsm8k_chat_gen

  # Python scripts
-  opencompass ./configs/eval_api_demo.py
+  opencompass examples/eval_api_demo.py

  # You can use o1_mini_2024_09_12/o1_preview_2024_09_12  for o1 models, we set max_completion_tokens=8192 as default.
  ```
@@ -210,7 +211,7 @@ After ensuring that OpenCompass is installed correctly according to the above st
  opencompass --models hf_internlm2_5_1_8b_chat --datasets demo_gsm8k_chat_gen -a lmdeploy

  # Python scripts
-  opencompass ./configs/eval_lmdeploy_demo.py
+  opencompass examples/eval_lmdeploy_demo.py
  ```

 - Supported Models
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -57,6 +57,7 @@

 ## 🚀 最新进展 <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

+- **\[2025.01.16\]** 我们现已支持 [InternLM3-8B-Instruct](https://huggingface.co/internlm/internlm3-8b-instruct) 模型，该模型在推理、知识类任务上取得同量级最优性能，欢迎尝试。
 - **\[2024.12.17\]** 我们提供了12月CompassAcademic学术榜单评估脚本 [CompassAcademic](configs/eval_academic_leaderboard_202412.py)，你可以通过简单地配置复现官方评测结果。
 - **\[2024.10.14\]** 现已支持OpenAI多语言问答数据集[MMMLU](https://huggingface.co/datasets/openai/MMMLU)，欢迎尝试! 🔥🔥🔥
 - **\[2024.09.19\]** 现已支持[Qwen2.5](https://huggingface.co/Qwen)(0.5B to 72B) ，可以使用多种推理后端(huggingface/vllm/lmdeploy), 欢迎尝试! 🔥🔥🔥
@@ -168,17 +169,17 @@ humaneval, triviaqa, commonsenseqa, tydiqa, strategyqa, cmmlu, lambada, piqa, ce

 - ### 首次评测

-  OpenCompass 支持通过命令行界面 (CLI) 或 Python 脚本来设置配置。对于简单的评估设置，我们推荐使用 CLI；而对于更复杂的评估，则建议使用脚本方式。你可以在configs文件夹下找到更多脚本示例。
+  OpenCompass 支持通过命令行界面 (CLI) 或 Python 脚本来设置配置。对于简单的评估设置，我们推荐使用 CLI；而对于更复杂的评估，则建议使用脚本方式。你可以在examples文件夹下找到更多脚本示例。

  ```bash
  # 命令行界面 (CLI)
  opencompass --models hf_internlm2_5_1_8b_chat --datasets demo_gsm8k_chat_gen

  # Python 脚本
-  opencompass ./configs/eval_chat_demo.py
+  opencompass examples/eval_chat_demo.py
  ```

-  你可以在[configs](./configs) 文件夹下找到更多的脚本示例。
+  你可以在[examples](./examples) 文件夹下找到更多的脚本示例。

 - ### API评测

@@ -190,7 +191,7 @@ humaneval, triviaqa, commonsenseqa, tydiqa, strategyqa, cmmlu, lambada, piqa, ce
  opencompass --models gpt_4o_2024_05_13 --datasets demo_gsm8k_chat_gen

  # Python 脚本
-  opencompass  ./configs/eval_api_demo.py
+  opencompass  examples/eval_api_demo.py


  # 现已支持 o1_mini_2024_09_12/o1_preview_2024_09_12  模型, 默认情况下 max_completion_tokens=8192.
--- a/configs/api_examples/eval_api_360.py
+++ b/configs/api_examples/eval_api_360.py
@@ -1,43 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import AI360GPT
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-    dict(
-        abbr='360GPT_S2_V9',
-        type=AI360GPT,
-        path='360GPT_S2_V9',
-        key='xxxxxxxxxxxx',
-        generation_kwargs={
-            'temperature': 0.9,
-            'max_tokens': 2048,
-            'top_p': 0.5,
-            'tok_k': 0,
-            'repetition_penalty': 1.05,
-        },
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8),
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=2,
-        concurrent_users=2,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir ='./output/api_360GPT_S2_V9'
--- a/configs/api_examples/eval_api_baichuan.py
+++ b/configs/api_examples/eval_api_baichuan.py
@@ -1,44 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import BaiChuan
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-    dict(
-        abbr='Baichuan2-53B',
-        type=BaiChuan,
-        path='Baichuan2-53B',
-        api_key='xxxxxx',
-        secret_key='xxxxx',
-        url='xxxxx',
-        generation_kwargs={
-            'temperature': 0.3,
-            'top_p': 0.85,
-            'top_k': 5,
-            'with_search_enhance': False,
-        },
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8),
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=2,
-        concurrent_users=2,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir = 'outputs/api_baichuan53b/'
--- a/configs/api_examples/eval_api_baidu.py
+++ b/configs/api_examples/eval_api_baidu.py
@@ -1,42 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import ERNIEBot
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-    dict(
-        abbr='erniebot',
-        type=ERNIEBot,
-        path='erniebot',
-        key='xxxxxx',  # please give you key
-        secretkey='xxxxxxxxx',  # please give your group_id
-        url='xxxxxxxxx',
-        generation_kwargs = {
-            'temperature': 0.8,
-        },
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8
-    ),
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=2,
-        concurrent_users=2,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir = 'outputs/api_erniebot/'
--- a/configs/api_examples/eval_api_bailing.py
+++ b/configs/api_examples/eval_api_bailing.py
@@ -1,44 +0,0 @@
-from mmengine.config import read_base
-
-from opencompass.models import BailingAPI
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-    from opencompass.configs.summarizers.medium import summarizer
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-    dict(
-        path='Bailing-Lite-1116',
-        token='xxxxxx',  # set your key here or in environment variable BAILING_API_KEY
-        url='https://bailingchat.alipay.com/chat/completions',
-        type=BailingAPI,
-        max_out_len=11264,
-        batch_size=1,
-        generation_kwargs={
-            'temperature': 0.01,
-            'top_p': 1.0,
-            'top_k': -1,
-            'n': 1,
-            'logprobs': 1,
-        },
-    ),
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=2,
-        concurrent_users=2,
-        task=dict(type=OpenICLInferTask),
-    ),
-)
-
-work_dir = 'outputs/api_bailing/'
--- a/configs/api_examples/eval_api_bytedance.py
+++ b/configs/api_examples/eval_api_bytedance.py
@@ -1,44 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import ByteDance
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    # from opencompass.configs.datasets.collections.chat_medium import datasets
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-    dict(
-        abbr='skylark-pro-public',
-        type=ByteDance,
-        path='skylark-pro-public',
-        accesskey='xxxxxxx',
-        secretkey='xxxxxxx',
-        url='xxxxxx',
-        generation_kwargs={
-            'temperature': 0.7,
-            'top_p': 0.9,
-            'top_k': 0,
-        },
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8),
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=2,
-        concurrent_users=2,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir = 'outputs/api_bytedance/'
--- a/configs/api_examples/eval_api_doubao.py
+++ b/configs/api_examples/eval_api_doubao.py
@@ -1,40 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import Doubao
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    # from opencompass.configs.datasets.collections.chat_medium import datasets
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-    dict(
-        abbr='Doubao-pro-128k',
-        type=Doubao,
-        path='ep-xxxxxx',
-        accesskey='Your_AK',
-        secretkey='Your_SK',
-        generation_kwargs={
-            'temperature': 0.1,
-            'top_p': 0.9,
-        },
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8),
-]
-
-infer = dict(partitioner=dict(type=NaivePartitioner),
-             runner=dict(
-                 type=LocalAPIRunner,
-                 max_num_workers=2,
-                 concurrent_users=2,
-                 task=dict(type=OpenICLInferTask)), )
-
-work_dir = 'outputs/api_doubao/'
--- a/configs/api_examples/eval_api_minimax.py
+++ b/configs/api_examples/eval_api_minimax.py
@@ -1,37 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import MiniMax
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-    dict(
-        abbr='minimax_abab5.5-chat',
-        type=MiniMax,
-        path='abab5.5-chat',
-        key='xxxxxxx', # please give you key
-        group_id='xxxxxxxx', # please give your group_id
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8),
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=4,
-        concurrent_users=4,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir = 'outputs/api_minimax/'
--- a/configs/api_examples/eval_api_moonshot.py
+++ b/configs/api_examples/eval_api_moonshot.py
@@ -1,40 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import MoonShot
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-    dict(
-        abbr='moonshot-v1-32k',
-        type=MoonShot,
-        path='moonshot-v1-32k',
-        key='xxxxxxx',
-        url= 'xxxxxxxx',
-        system_prompt= '你是 Kimi，由 Moonshot AI 提供的人工智能助手，你更擅长中文和英文的对话。'
-        '你会为用户提供安全，有帮助，准确的回答。同时，你会拒绝一些涉及恐怖主义，种族歧视，'
-        '黄色暴力等问题的回答。Moonshot AI 为专有名词，不可翻译成其他语言。',
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8),
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=4,
-        concurrent_users=4,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir = 'outputs/api_moonshot/'
--- a/configs/api_examples/eval_api_nanbeige.py
+++ b/configs/api_examples/eval_api_nanbeige.py
@@ -1,36 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import Nanbeige
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-
-with read_base():
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-    dict(
-        abbr='nanbeige-plus',
-        type=Nanbeige,
-        path='nanbeige-plus',
-        key='xxxxxx',
-        query_per_second=1,
-        max_out_len=2048,
-        batch_size=8),
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=2,
-        concurrent_users=2,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir ='./output/nanbeige-plus'
--- a/configs/api_examples/eval_api_pangu.py
+++ b/configs/api_examples/eval_api_pangu.py
@@ -1,42 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import PanGu
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-dict(
-        abbr='pangu',
-        type=PanGu,
-        path='pangu',
-        access_key='xxxxxx',
-        secret_key='xxxxxx',
-        url = 'xxxxxx',
-        # url of token sever, used for generate token, like "https://xxxxxx.myhuaweicloud.com/v3/auth/tokens",
-        token_url = 'xxxxxx',
-        # scope-project-name, used for generate token
-        project_name = 'xxxxxx',
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8),
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=2,
-        concurrent_users=2,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir = 'outputs/api_pangu/'
--- a/configs/api_examples/eval_api_qwen.py
+++ b/configs/api_examples/eval_api_qwen.py
@@ -1,40 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import Qwen
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-    dict(
-        abbr='qwen-max',
-        type=Qwen,
-        path='qwen-max',
-        key='xxxxxxxxxxxxxxxx',  # please give you key
-        generation_kwargs={
-            'enable_search': False,
-        },
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8
-    ),
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=1,
-        concurrent_users=1,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir = 'outputs/api_qwen/'
--- a/configs/api_examples/eval_api_rendu.py
+++ b/configs/api_examples/eval_api_rendu.py
@@ -1,39 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import Rendu
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    from ..summarizers.medium import summarizer
-    from ..datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets
-]
-
-models = [
-    dict(
-        abbr='Rendu',
-        type=Rendu,
-        path='rendu',
-        key='xxxxxx',
-        url='xxxxxx',
-        generation_kwargs={
-            'temperature': 0.1,
-            'top_p': 0.9,
-        },
-        query_per_second=10,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8),
-]
-
-infer = dict(partitioner=dict(type=NaivePartitioner),
-             runner=dict(
-                 type=LocalAPIRunner,
-                 max_num_workers=1,
-                 concurrent_users=1,
-                 task=dict(type=OpenICLInferTask)), )
-
-work_dir = 'outputs/api_rendu/'
--- a/configs/api_examples/eval_api_sensetime.py
+++ b/configs/api_examples/eval_api_sensetime.py
@@ -1,52 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import SenseTime
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-    dict(
-        abbr='nova-ptc-xl-v1',
-        type=SenseTime,
-        path='nova-ptc-xl-v1',
-        key='xxxxxxxxxxxxxx',
-        url='xxxxxxxxxxx',
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8,
-        parameters={
-            'temperature': 0.8,
-            'top_p': 0.7,
-            'max_new_tokens': 1024,
-            'repetition_penalty': 1.05,
-            'know_ids': [],
-            'stream': True,
-            'user': '#*#***TestUser***#*#',
-            'knowledge_config': {
-                'control_level': 'normal',
-                'knowledge_base_result': False,
-                'online_search_result': False
-            }
-        }
-    )
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=2,
-        concurrent_users=2,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir = 'outputs/api_sensetime/'
--- a/configs/api_examples/eval_api_xunfei.py
+++ b/configs/api_examples/eval_api_xunfei.py
@@ -1,51 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models.xunfei_api import XunFei
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    # from opencompass.configs.datasets.collections.chat_medium import datasets
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-models = [
-    dict(
-        abbr='Spark-v1-1',
-        type=XunFei,
-        appid='xxxx',
-        path='ws://spark-api.xf-yun.com/v1.1/chat',
-        api_secret = 'xxxxxxx',
-        api_key = 'xxxxxxx',
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8),
-    dict(
-        abbr='Spark-v3-1',
-        type=XunFei,
-        appid='xxxx',
-        domain='generalv3',
-        path='ws://spark-api.xf-yun.com/v3.1/chat',
-        api_secret = 'xxxxxxxx',
-        api_key = 'xxxxxxxxx',
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8),
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=2,
-        concurrent_users=2,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir = 'outputs/api_xunfei/'
--- a/configs/api_examples/eval_api_zhipu.py
+++ b/configs/api_examples/eval_api_zhipu.py
@ -1,48 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import ZhiPuAI
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    # from opencompass.configs.datasets.collections.chat_medium import datasets
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-# needs a special postprocessor for all
-# except 'gsm8k' and 'strategyqa'
-from opencompass.utils import general_eval_wrapper_postprocess
-for _dataset in datasets:
-    if _dataset['abbr'] not in ['gsm8k', 'strategyqa']:
-        if hasattr(_dataset['eval_cfg'], 'pred_postprocessor'):
-            _dataset['eval_cfg']['pred_postprocessor']['postprocess'] = _dataset['eval_cfg']['pred_postprocessor']['type']
-            _dataset['eval_cfg']['pred_postprocessor']['type'] = general_eval_wrapper_postprocess
-        else:
-            _dataset['eval_cfg']['pred_postprocessor'] = {'type': general_eval_wrapper_postprocess}
-
-models = [
-    dict(
-        abbr='chatglm_pro',
-        type=ZhiPuAI,
-        path='chatglm_pro',
-        key='xxxxxxxxxxxx',
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8),
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=2,
-        concurrent_users=2,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir = 'outputs/api_zhipu/'
--- a/configs/api_examples/eval_api_zhipu_v2.py
+++ b/configs/api_examples/eval_api_zhipu_v2.py
@ -1,67 +0,0 @@
-from mmengine.config import read_base
-from opencompass.models import ZhiPuV2AI
-from opencompass.partitioners import NaivePartitioner
-from opencompass.runners.local_api import LocalAPIRunner
-from opencompass.tasks import OpenICLInferTask
-
-with read_base():
-    # from opencompass.configs.datasets.collections.chat_medium import datasets
-    from opencompass.configs.summarizers.medium import summarizer
-    from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
-
-datasets = [
-    *ceval_datasets,
-]
-
-# needs a special postprocessor for all
-# except 'gsm8k' and 'strategyqa'
-from opencompass.utils import general_eval_wrapper_postprocess
-for _dataset in datasets:
-    if _dataset['abbr'] not in ['gsm8k', 'strategyqa']:
-        if hasattr(_dataset['eval_cfg'], 'pred_postprocessor'):
-            _dataset['eval_cfg']['pred_postprocessor']['postprocess'] = _dataset['eval_cfg']['pred_postprocessor']['type']
-            _dataset['eval_cfg']['pred_postprocessor']['type'] = general_eval_wrapper_postprocess
-        else:
-            _dataset['eval_cfg']['pred_postprocessor'] = {'type': general_eval_wrapper_postprocess}
-
-
-api_meta_template = dict(
-    round=[
-            dict(role='HUMAN', api_role='HUMAN'),
-            dict(role='BOT', api_role='BOT', generate=True),
-    ],
-)
-
-models = [
-     dict(
-        abbr='glm4_notools',
-        type=ZhiPuV2AI,
-        path='glm-4',
-        key='xxxxxx',
-        generation_kwargs={
-            'tools': [
-                {
-                    'type': 'web_search',
-                    'web_search': {
-                        'enable': False # turn off the search
-                    }
-                }
-            ]
-        },
-        meta_template=api_meta_template,
-        query_per_second=1,
-        max_out_len=2048,
-        max_seq_len=2048,
-        batch_size=8)
-]
-
-infer = dict(
-    partitioner=dict(type=NaivePartitioner),
-    runner=dict(
-        type=LocalAPIRunner,
-        max_num_workers=2,
-        concurrent_users=2,
-        task=dict(type=OpenICLInferTask)),
-)
-
-work_dir = 'outputs/api_zhipu_v2/'
--- a/configs/dataset_collections/chat_OC15.py
+++ b/configs/dataset_collections/chat_OC15.py
@ -1,22 +0,0 @@
-from mmengine.config import read_base
-
-with read_base():
-    from opencompass.configs.datasets.mmlu.mmlu_gen_4d595a import mmlu_datasets
-    from opencompass.configs.datasets.cmmlu.cmmlu_gen_c13365 import cmmlu_datasets
-    from opencompass.configs.datasets.ceval.ceval_gen_5f30c7 import ceval_datasets
-    from opencompass.configs.datasets.GaokaoBench.GaokaoBench_no_subjective_gen_4c31db import GaokaoBench_datasets
-    from opencompass.configs.datasets.triviaqa.triviaqa_wiki_1shot_gen_bc5f21 import triviaqa_datasets
-    from opencompass.configs.datasets.nq.nq_open_1shot_gen_2e45e5 import nq_datasets
-    from opencompass.configs.datasets.race.race_gen_69ee4f import race_datasets
-    from opencompass.configs.datasets.winogrande.winogrande_5shot_gen_b36770 import winogrande_datasets
-    from opencompass.configs.datasets.hellaswag.hellaswag_10shot_gen_e42710 import hellaswag_datasets
-    from opencompass.configs.datasets.bbh.bbh_gen_2879b0 import bbh_datasets
-    from opencompass.configs.datasets.gsm8k.gsm8k_gen_1d7fe4 import gsm8k_datasets
-    from opencompass.configs.datasets.math.math_0shot_gen_393424 import math_datasets
-    from opencompass.configs.datasets.TheoremQA.TheoremQA_5shot_gen_6f0af8 import TheoremQA_datasets
-    from opencompass.configs.datasets.humaneval.humaneval_gen_8e312c import humaneval_datasets
-    from opencompass.configs.datasets.mbpp.sanitized_mbpp_gen_830460 import sanitized_mbpp_datasets
-    from opencompass.configs.datasets.gpqa.gpqa_gen_4baadb import gpqa_datasets
-    from opencompass.configs.datasets.IFEval.IFEval_gen_3321a3 import ifeval_datasets
-
-datasets = sum((v for k, v in locals().items() if k.endswith('_datasets')), [])
--- a/configs/datasets/ARC_c/ARC_c_clean_ppl.py
+++ b/configs/datasets/ARC_c/ARC_c_clean_ppl.py
@ -1,55 +0,0 @@
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import PPLInferencer
-from opencompass.openicl.icl_evaluator import AccContaminationEvaluator
-from opencompass.datasets import ARCDatasetClean as ARCDataset
-
-ARC_c_reader_cfg = dict(
-    input_columns=['question', 'textA', 'textB', 'textC', 'textD'],
-    output_column='answerKey')
-
-ARC_c_infer_cfg = dict(
-    prompt_template=dict(
-        type=PromptTemplate,
-        template={
-            'A':
-            dict(
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textA}')
-                ], ),
-            'B':
-            dict(
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textB}')
-                ], ),
-            'C':
-            dict(
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textC}')
-                ], ),
-            'D':
-            dict(
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textD}')
-                ], ),
-        }),
-    retriever=dict(type=ZeroRetriever),
-    inferencer=dict(type=PPLInferencer))
-
-ARC_c_eval_cfg = dict(evaluator=dict(type=AccContaminationEvaluator),
-                      analyze_contamination=True)
-
-ARC_c_datasets = [
-    dict(
-        type=ARCDataset,
-        abbr='ARC-c-test',
-        path='opencompass/ai2_arc-test',
-        name='ARC-Challenge',
-        reader_cfg=ARC_c_reader_cfg,
-        infer_cfg=ARC_c_infer_cfg,
-        eval_cfg=ARC_c_eval_cfg)
-]
--- a/configs/datasets/ARC_c/ARC_c_cot_gen_926652.py
+++ b/configs/datasets/ARC_c/ARC_c_cot_gen_926652.py
@ -1,53 +0,0 @@
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import GenInferencer
-from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import ARCDataset
-from opencompass.utils.text_postprocessors import first_option_postprocess, match_answer_pattern
-
-QUERY_TEMPLATE = """
-Answer the following multiple choice question. The last line of your response should be of the following format: 'ANSWER: $LETTER' (without quotes) where LETTER is one of ABCD. Think step by step before answering.
-
-{question}
-
-A. {textA}
-B. {textB}
-C. {textC}
-D. {textD}
-""".strip()
-
-ARC_c_reader_cfg = dict(
-    input_columns=['question', 'textA', 'textB', 'textC', 'textD'],
-    output_column='answerKey')
-
-ARC_c_infer_cfg = dict(
-    prompt_template=dict(
-        type=PromptTemplate,
-        template=dict(
-            round=[
-                dict(
-                    role='HUMAN',
-                    prompt=QUERY_TEMPLATE)
-            ], ),
-    ),
-    retriever=dict(type=ZeroRetriever),
-    inferencer=dict(type=GenInferencer),
-)
-
-ARC_c_eval_cfg = dict(
-    evaluator=dict(type=AccEvaluator),
-    pred_role='BOT',
-    pred_postprocessor=dict(type=first_option_postprocess, options='ABCD'),
-)
-
-ARC_c_datasets = [
-    dict(
-        abbr='ARC-c',
-        type=ARCDataset,
-        path='opencompass/ai2_arc-dev',
-        name='ARC-Challenge',
-        reader_cfg=ARC_c_reader_cfg,
-        infer_cfg=ARC_c_infer_cfg,
-        eval_cfg=ARC_c_eval_cfg,
-    )
-]
--- a/configs/datasets/ARC_c/ARC_c_few_shot_gen_e9b043.py
+++ b/configs/datasets/ARC_c/ARC_c_few_shot_gen_e9b043.py
@ -1,48 +0,0 @@
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever, FixKRetriever
-from opencompass.openicl.icl_inferencer import GenInferencer
-from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import ARCDataset
-from opencompass.utils.text_postprocessors import first_capital_postprocess
-
-ARC_c_reader_cfg = dict(
-    input_columns=['question', 'textA', 'textB', 'textC', 'textD'],
-    output_column='answerKey',
-)
-
-ARC_c_infer_cfg = dict(
-    ice_template=dict(
-        type=PromptTemplate,
-        template=dict(
-            begin='</E>',
-            round=[
-                dict(
-                    role='HUMAN',
-                    prompt='Question: {question}\nA. {textA}\nB. {textB}\nC. {textC}\nD. {textD}\nAnswer:',
-                ),
-                dict(role='BOT', prompt='{answerKey}'),
-            ],
-        ),
-        ice_token='</E>',
-    ),
-    retriever=dict(type=FixKRetriever, fix_id_list=[0, 2, 4, 6, 8]),
-    inferencer=dict(type=GenInferencer, max_out_len=50),
-)
-
-ARC_c_eval_cfg = dict(
-    evaluator=dict(type=AccEvaluator),
-    pred_role='BOT',
-    pred_postprocessor=dict(type=first_capital_postprocess),
-)
-
-ARC_c_datasets = [
-    dict(
-        abbr='ARC-c',
-        type=ARCDataset,
-        path='opencompass/ai2_arc-dev',
-        name='ARC-Challenge',
-        reader_cfg=ARC_c_reader_cfg,
-        infer_cfg=ARC_c_infer_cfg,
-        eval_cfg=ARC_c_eval_cfg,
-    )
-]
--- a/configs/datasets/ARC_c/ARC_c_few_shot_ppl.py
+++ b/configs/datasets/ARC_c/ARC_c_few_shot_ppl.py
@ -1,63 +0,0 @@
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever, FixKRetriever
-from opencompass.openicl.icl_inferencer import PPLInferencer
-from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import ARCDataset
-
-ARC_c_reader_cfg = dict(
-    input_columns=['question', 'textA', 'textB', 'textC', 'textD'],
-    output_column='answerKey',
-)
-
-ARC_c_infer_cfg = dict(
-    ice_template=dict(
-        type=PromptTemplate,
-        template={
-            'A': dict(
-                begin='</E>',
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textA}'),
-                ],
-            ),
-            'B': dict(
-                begin='</E>',
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textB}'),
-                ],
-            ),
-            'C': dict(
-                begin='</E>',
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textC}'),
-                ],
-            ),
-            'D': dict(
-                begin='</E>',
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textD}'),
-                ],
-            ),
-        },
-        ice_token='</E>',
-    ),
-    retriever=dict(type=FixKRetriever, fix_id_list=[0, 2, 4, 6, 8]),
-    inferencer=dict(type=PPLInferencer),
-)
-
-ARC_c_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
-
-ARC_c_datasets = [
-    dict(
-        type=ARCDataset,
-        abbr='ARC-c',
-        path='opencompass/ai2_arc-dev',
-        name='ARC-Challenge',
-        reader_cfg=ARC_c_reader_cfg,
-        infer_cfg=ARC_c_infer_cfg,
-        eval_cfg=ARC_c_eval_cfg,
-    )
-]
--- a/configs/datasets/ARC_c/ARC_c_gen.py
+++ b/configs/datasets/ARC_c/ARC_c_gen.py
@ -1,4 +0,0 @@
-from mmengine.config import read_base
-
-with read_base():
-    from .ARC_c_gen_1e0de5 import ARC_c_datasets  # noqa: F401, F403
--- a/configs/datasets/ARC_c/ARC_c_gen_1e0de5.py
+++ b/configs/datasets/ARC_c/ARC_c_gen_1e0de5.py
@ -1,44 +0,0 @@
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import GenInferencer
-from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import ARCDataset
-from opencompass.utils.text_postprocessors import first_option_postprocess
-
-ARC_c_reader_cfg = dict(
-    input_columns=['question', 'textA', 'textB', 'textC', 'textD'],
-    output_column='answerKey')
-
-ARC_c_infer_cfg = dict(
-    prompt_template=dict(
-        type=PromptTemplate,
-        template=dict(
-            round=[
-                dict(
-                    role='HUMAN',
-                    prompt=
-                    'Question: {question}\nA. {textA}\nB. {textB}\nC. {textC}\nD. {textD}\nAnswer:'
-                )
-            ], ),
-    ),
-    retriever=dict(type=ZeroRetriever),
-    inferencer=dict(type=GenInferencer),
-)
-
-ARC_c_eval_cfg = dict(
-    evaluator=dict(type=AccEvaluator),
-    pred_role='BOT',
-    pred_postprocessor=dict(type=first_option_postprocess, options='ABCD'),
-)
-
-ARC_c_datasets = [
-    dict(
-        abbr='ARC-c',
-        type=ARCDataset,
-        path='opencompass/ai2_arc-dev',
-        name='ARC-Challenge',
-        reader_cfg=ARC_c_reader_cfg,
-        infer_cfg=ARC_c_infer_cfg,
-        eval_cfg=ARC_c_eval_cfg,
-    )
-]
--- a/configs/datasets/ARC_c/ARC_c_ppl.py
+++ b/configs/datasets/ARC_c/ARC_c_ppl.py
@ -1,4 +0,0 @@
-from mmengine.config import read_base
-
-with read_base():
-    from .ARC_c_ppl_a450bd import ARC_c_datasets  # noqa: F401, F403
--- a/configs/datasets/ARC_c/ARC_c_ppl_2ef631.py
+++ b/configs/datasets/ARC_c/ARC_c_ppl_2ef631.py
@ -1,37 +0,0 @@
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import PPLInferencer
-from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import ARCDataset
-
-ARC_c_reader_cfg = dict(
-    input_columns=['question', 'textA', 'textB', 'textC', 'textD'],
-    output_column='answerKey')
-
-ARC_c_infer_cfg = dict(
-    prompt_template=dict(
-        type=PromptTemplate,
-        template={
-            opt: dict(
-                round=[
-                    dict(role='HUMAN', prompt=f'{{question}}\nA. {{textA}}\nB. {{textB}}\nC. {{textC}}\nD. {{textD}}'),
-                    dict(role='BOT', prompt=f'Answer: {opt}'),
-                ]
-            ) for opt in ['A', 'B', 'C', 'D']
-        },
-    ),
-    retriever=dict(type=ZeroRetriever),
-    inferencer=dict(type=PPLInferencer))
-
-ARC_c_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
-
-ARC_c_datasets = [
-    dict(
-        type=ARCDataset,
-        abbr='ARC-c',
-        path='opencompass/ai2_arc-dev',
-        name='ARC-Challenge',
-        reader_cfg=ARC_c_reader_cfg,
-        infer_cfg=ARC_c_infer_cfg,
-        eval_cfg=ARC_c_eval_cfg)
-]
--- a/configs/datasets/ARC_c/ARC_c_ppl_a450bd.py
+++ b/configs/datasets/ARC_c/ARC_c_ppl_a450bd.py
@ -1,54 +0,0 @@
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import PPLInferencer
-from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import ARCDataset
-
-ARC_c_reader_cfg = dict(
-    input_columns=['question', 'textA', 'textB', 'textC', 'textD'],
-    output_column='answerKey')
-
-ARC_c_infer_cfg = dict(
-    prompt_template=dict(
-        type=PromptTemplate,
-        template={
-            'A':
-            dict(
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textA}')
-                ], ),
-            'B':
-            dict(
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textB}')
-                ], ),
-            'C':
-            dict(
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textC}')
-                ], ),
-            'D':
-            dict(
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textD}')
-                ], ),
-        }),
-    retriever=dict(type=ZeroRetriever),
-    inferencer=dict(type=PPLInferencer))
-
-ARC_c_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
-
-ARC_c_datasets = [
-    dict(
-        type=ARCDataset,
-        abbr='ARC-c',
-        path='opencompass/ai2_arc-dev',
-        name='ARC-Challenge',
-        reader_cfg=ARC_c_reader_cfg,
-        infer_cfg=ARC_c_infer_cfg,
-        eval_cfg=ARC_c_eval_cfg)
-]
--- a/configs/datasets/ARC_c/ARC_c_ppl_d52a21.py
+++ b/configs/datasets/ARC_c/ARC_c_ppl_d52a21.py
@ -1,36 +0,0 @@
-from mmengine.config import read_base
-# with read_base():
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import PPLInferencer
-from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import ARCDataset
-
-ARC_c_reader_cfg = dict(
-    input_columns=['question', 'textA', 'textB', 'textC', 'textD'],
-    output_column='answerKey')
-
-ARC_c_infer_cfg = dict(
-    prompt_template=dict(
-        type=PromptTemplate,
-        template={
-            'A': 'Question: {question}\nAnswer: {textA}',
-            'B': 'Question: {question}\nAnswer: {textB}',
-            'C': 'Question: {question}\nAnswer: {textC}',
-            'D': 'Question: {question}\nAnswer: {textD}'
-        }),
-    retriever=dict(type=ZeroRetriever),
-    inferencer=dict(type=PPLInferencer))
-
-ARC_c_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
-
-ARC_c_datasets = [
-    dict(
-        type=ARCDataset,
-        abbr='ARC-c',
-        path='opencompass/ai2_arc-dev',
-        name='ARC-Challenge',
-        reader_cfg=ARC_c_reader_cfg,
-        infer_cfg=ARC_c_infer_cfg,
-        eval_cfg=ARC_c_eval_cfg)
-]
--- a/configs/datasets/ARC_e/ARC_e_gen.py
+++ b/configs/datasets/ARC_e/ARC_e_gen.py
@ -1,4 +0,0 @@
-from mmengine.config import read_base
-
-with read_base():
-    from .ARC_e_gen_1e0de5 import ARC_e_datasets  # noqa: F401, F403
--- a/configs/datasets/ARC_e/ARC_e_gen_1e0de5.py
+++ b/configs/datasets/ARC_e/ARC_e_gen_1e0de5.py
@ -1,44 +0,0 @@
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import GenInferencer
-from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import ARCDataset
-from opencompass.utils.text_postprocessors import first_option_postprocess
-
-ARC_e_reader_cfg = dict(
-    input_columns=['question', 'textA', 'textB', 'textC', 'textD'],
-    output_column='answerKey')
-
-ARC_e_infer_cfg = dict(
-    prompt_template=dict(
-        type=PromptTemplate,
-        template=dict(
-            round=[
-                dict(
-                    role='HUMAN',
-                    prompt=
-                    'Question: {question}\nA. {textA}\nB. {textB}\nC. {textC}\nD. {textD}\nAnswer:'
-                )
-            ], ),
-    ),
-    retriever=dict(type=ZeroRetriever),
-    inferencer=dict(type=GenInferencer),
-)
-
-ARC_e_eval_cfg = dict(
-    evaluator=dict(type=AccEvaluator),
-    pred_role='BOT',
-    pred_postprocessor=dict(type=first_option_postprocess, options='ABCD'),
-)
-
-ARC_e_datasets = [
-    dict(
-        abbr='ARC-e',
-        type=ARCDataset,
-        path='opencompass/ai2_arc-easy-dev',
-        name='ARC-Easy',
-        reader_cfg=ARC_e_reader_cfg,
-        infer_cfg=ARC_e_infer_cfg,
-        eval_cfg=ARC_e_eval_cfg,
-    )
-]
--- a/configs/datasets/ARC_e/ARC_e_ppl.py
+++ b/configs/datasets/ARC_e/ARC_e_ppl.py
@ -1,4 +0,0 @@
-from mmengine.config import read_base
-
-with read_base():
-    from .ARC_e_ppl_a450bd import ARC_e_datasets  # noqa: F401, F403
--- a/configs/datasets/ARC_e/ARC_e_ppl_2ef631.py
+++ b/configs/datasets/ARC_e/ARC_e_ppl_2ef631.py
@ -1,37 +0,0 @@
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import PPLInferencer
-from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import ARCDataset
-
-ARC_e_reader_cfg = dict(
-    input_columns=['question', 'textA', 'textB', 'textC', 'textD'],
-    output_column='answerKey')
-
-ARC_e_infer_cfg = dict(
-    prompt_template=dict(
-        type=PromptTemplate,
-        template={
-            opt: dict(
-                round=[
-                    dict(role='HUMAN', prompt=f'{{question}}\nA. {{textA}}\nB. {{textB}}\nC. {{textC}}\nD. {{textD}}'),
-                    dict(role='BOT', prompt=f'Answer: {opt}'),
-                ]
-            ) for opt in ['A', 'B', 'C', 'D']
-        },
-    ),
-    retriever=dict(type=ZeroRetriever),
-    inferencer=dict(type=PPLInferencer))
-
-ARC_e_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
-
-ARC_e_datasets = [
-    dict(
-        type=ARCDataset,
-        abbr='ARC-e',
-        path='opencompass/ai2_arc-easy-dev',
-        name='ARC-Easy',
-        reader_cfg=ARC_e_reader_cfg,
-        infer_cfg=ARC_e_infer_cfg,
-        eval_cfg=ARC_e_eval_cfg)
-]
--- a/configs/datasets/ARC_e/ARC_e_ppl_a450bd.py
+++ b/configs/datasets/ARC_e/ARC_e_ppl_a450bd.py
@ -1,54 +0,0 @@
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import PPLInferencer
-from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import ARCDataset
-
-ARC_e_reader_cfg = dict(
-    input_columns=['question', 'textA', 'textB', 'textC', 'textD'],
-    output_column='answerKey')
-
-ARC_e_infer_cfg = dict(
-    prompt_template=dict(
-        type=PromptTemplate,
-        template={
-            'A':
-            dict(
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textA}')
-                ], ),
-            'B':
-            dict(
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textB}')
-                ], ),
-            'C':
-            dict(
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textC}')
-                ], ),
-            'D':
-            dict(
-                round=[
-                    dict(role='HUMAN', prompt='Question: {question}\nAnswer: '),
-                    dict(role='BOT', prompt='{textD}')
-                ], ),
-        }),
-    retriever=dict(type=ZeroRetriever),
-    inferencer=dict(type=PPLInferencer))
-
-ARC_e_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
-
-ARC_e_datasets = [
-    dict(
-        type=ARCDataset,
-        abbr='ARC-e',
-        path='opencompass/ai2_arc-easy-dev',
-        name='ARC-Easy',
-        reader_cfg=ARC_e_reader_cfg,
-        infer_cfg=ARC_e_infer_cfg,
-        eval_cfg=ARC_e_eval_cfg)
-]
--- a/configs/datasets/ARC_e/ARC_e_ppl_d52a21.py
+++ b/configs/datasets/ARC_e/ARC_e_ppl_d52a21.py
@ -1,34 +0,0 @@
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import PPLInferencer
-from opencompass.openicl.icl_evaluator import AccEvaluator
-from opencompass.datasets import ARCDataset
-
-ARC_e_reader_cfg = dict(
-    input_columns=['question', 'textA', 'textB', 'textC', 'textD'],
-    output_column='answerKey')
-
-ARC_e_infer_cfg = dict(
-    prompt_template=dict(
-        type=PromptTemplate,
-        template={
-            'A': 'Question: {question}\nAnswer: {textA}',
-            'B': 'Question: {question}\nAnswer: {textB}',
-            'C': 'Question: {question}\nAnswer: {textC}',
-            'D': 'Question: {question}\nAnswer: {textD}'
-        }),
-    retriever=dict(type=ZeroRetriever),
-    inferencer=dict(type=PPLInferencer))
-
-ARC_e_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
-
-ARC_e_datasets = [
-    dict(
-        type=ARCDataset,
-        abbr='ARC-e',
-        path='opencompass/ai2_arc-easy-dev',
-        name='ARC-Easy',
-        reader_cfg=ARC_e_reader_cfg,
-        infer_cfg=ARC_e_infer_cfg,
-        eval_cfg=ARC_e_eval_cfg)
-]
--- a/configs/datasets/CHARM/README.md
+++ b/configs/datasets/CHARM/README.md
@ -1,164 +0,0 @@
-# CHARM✨ Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations [ACL2024]
-[![arXiv](https://img.shields.io/badge/arXiv-2403.14112-b31b1b.svg)](https://arxiv.org/abs/2403.14112)
-[![license](https://img.shields.io/github/license/InternLM/opencompass.svg)](./LICENSE)
-<div align="center">
-
-📃[Paper](https://arxiv.org/abs/2403.14112)
-🏰[Project Page](https://opendatalab.github.io/CHARM/)
-🏆[Leaderboard](https://opendatalab.github.io/CHARM/leaderboard.html)
-✨[Findings](https://opendatalab.github.io/CHARM/findings.html)
-
-</div>
-
-<div align="center">
-    📖 <a href="./README_ZH.md">   中文</a> | <a href="./README.md">English</a>
-</div>
-
-## Dataset Description
-
-**CHARM** is the first benchmark for comprehensive, in-depth evaluation of the commonsense reasoning ability of large language models (LLMs) in Chinese, covering both globally known and Chinese-specific commonsense. In addition, CHARM can evaluate LLMs' memorization-independent reasoning abilities and analyze their typical errors.
-
-## Comparison of commonsense reasoning benchmarks
-<html lang="en">
-        <table align="center">
-            <thead class="fixed-header">
-                <tr>
-                    <th>Benchmarks</th>
-                    <th>CN-Lang</th>
-                    <th>CSR</th>
-                    <th>CN-specifics</th>
-                    <th>Dual-Domain</th>
-                    <th>Rea-Mem</th>
-                </tr>
-            </thead>
-            <tr>
-                <td>Most benchmarks in <a href="https://arxiv.org/abs/2302.04752"> davis2023benchmarks</a></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-            </tr>
-            <tr>
-                <td><a href="https://arxiv.org/abs/1809.05053"> XNLI</a>, <a
-                        href="https://arxiv.org/abs/2005.00333">XCOPA</a>,<a
-                        href="https://arxiv.org/abs/2112.10668">XStoryCloze</a></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-            </tr>
-            <tr>
-                <td><a href="https://arxiv.org/abs/2007.08124">LogiQA</a>, <a
-                        href="https://arxiv.org/abs/2004.05986">CLUE</a>, <a
-                        href="https://arxiv.org/abs/2306.09212">CMMLU</a></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-            </tr>
-            <tr>
-                <td><a href="https://arxiv.org/abs/2312.12853">CORECODE</a> </td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-            </tr>
-            <tr>
-                <td><strong><a href="https://arxiv.org/abs/2403.14112">CHARM (ours)</a> </strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-            </tr>
-        </table>
-
-"CN-Lang" indicates the benchmark is presented in Chinese. "CSR" means the benchmark is designed to focus on <strong>C</strong>ommon<strong>S</strong>ense <strong>R</strong>easoning. "CN-specifics" indicates the benchmark includes elements that are unique to Chinese culture, language, regional characteristics, history, etc. "Dual-Domain" indicates the benchmark encompasses both Chinese-specific and global domain tasks, with questions presented in a similar style and format. "Rea-Mem" indicates the benchmark includes closely interconnected <strong>rea</strong>soning and <strong>mem</strong>orization tasks.
-
-
-## 🛠️ How to Use
-Below are the steps for quickly downloading CHARM and using OpenCompass for evaluation.
-
-### 1. Download CHARM
-```bash
-git clone https://github.com/opendatalab/CHARM ${path_to_CHARM_repo}
-
-cd ${path_to_opencompass}
-mkdir -p data
-ln -snf ${path_to_CHARM_repo}/data/CHARM ./data/CHARM
-```
-### 2. Run Inference and Evaluation
-```bash
-cd ${path_to_opencompass}
-
-# modify config file `configs/eval_charm_rea.py`: uncomment or add models you want to evaluate
-python run.py configs/eval_charm_rea.py -r --dump-eval-details
-
-# modify config file `configs/eval_charm_mem.py`: uncomment or add models you want to evaluate
-python run.py configs/eval_charm_mem.py -r --dump-eval-details
-```
-The inference and evaluation results are written to `${path_to_opencompass}/outputs`, laid out like this:
-```bash
-outputs
-├── CHARM_mem
-│   └── chat
-│       └── 20240605_151442
-│           ├── predictions
-│           │   ├── internlm2-chat-1.8b-turbomind
-│           │   ├── llama-3-8b-instruct-lmdeploy
-│           │   └── qwen1.5-1.8b-chat-hf
-│           ├── results
-│           │   ├── internlm2-chat-1.8b-turbomind_judged-by--GPT-3.5-turbo-0125
-│           │   ├── llama-3-8b-instruct-lmdeploy_judged-by--GPT-3.5-turbo-0125
-│           │   └── qwen1.5-1.8b-chat-hf_judged-by--GPT-3.5-turbo-0125
-│           └── summary
-│               └── 20240605_205020 # MEMORY_SUMMARY_DIR
-│                   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Anachronisms_Judgment
-│                   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Movie_and_Music_Recommendation
-│                   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Sport_Understanding
-│                   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Time_Understanding
-│                   └── judged-by--GPT-3.5-turbo-0125.csv # MEMORY_SUMMARY_CSV
-└── CHARM_rea
-    └── chat
-        └── 20240605_152359
-            ├── predictions
-            │   ├── internlm2-chat-1.8b-turbomind
-            │   ├── llama-3-8b-instruct-lmdeploy
-            │   └── qwen1.5-1.8b-chat-hf
-            ├── results # REASON_RESULTS_DIR
-            │   ├── internlm2-chat-1.8b-turbomind
-            │   ├── llama-3-8b-instruct-lmdeploy
-            │   └── qwen1.5-1.8b-chat-hf
-            └── summary
-                ├── summary_20240605_205328.csv # REASON_SUMMARY_CSV
-                └── summary_20240605_205328.txt
-```
-### 3. Generate Analysis Results
-```bash
-cd ${path_to_CHARM_repo}
-
-# generate Table5, Table6, Table9 and Table10 in https://arxiv.org/abs/2403.14112
-PYTHONPATH=. python tools/summarize_reasoning.py ${REASON_SUMMARY_CSV}
-
-# generate Figure3 and Figure9 in https://arxiv.org/abs/2403.14112
-PYTHONPATH=. python tools/summarize_mem_rea.py ${REASON_SUMMARY_CSV} ${MEMORY_SUMMARY_CSV}
-
-# generate Table7, Table12, Table13 and Figure11 in https://arxiv.org/abs/2403.14112
-PYTHONPATH=. python tools/analyze_mem_indep_rea.py data/CHARM ${REASON_RESULTS_DIR} ${MEMORY_SUMMARY_DIR} ${MEMORY_SUMMARY_CSV}
-```
-
-## 🖊️ Citation
-```bibtex
-@misc{sun2024benchmarking,
-      title={Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations},
-      author={Jiaxing Sun and Weiquan Huang and Jiang Wu and Chenya Gu and Wei Li and Songyang Zhang and Hang Yan and Conghui He},
-      year={2024},
-      eprint={2403.14112},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL}
-}
-```
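Step 3 of the README above passes `${REASON_SUMMARY_CSV}` and `${MEMORY_SUMMARY_CSV}` to the analysis scripts by hand. The following is a minimal sketch of how those two paths could be located automatically, assuming the output layout matches the tree shown in step 2. `latest_summary_csvs` is a hypothetical helper written for illustration; it is not part of OpenCompass or the CHARM repository.

```python
from pathlib import Path


def latest_summary_csvs(outputs_dir):
    """Return (reason_csv, memory_csv) from the newest runs, or None where absent.

    Hypothetical helper: it assumes the directory layout shown in the README tree.
    """
    root = Path(outputs_dir)
    # Assumed layout: CHARM_rea/chat/<run>/summary/summary_<timestamp>.csv
    reason = sorted(root.glob('CHARM_rea/chat/*/summary/summary_*.csv'))
    # Assumed layout: CHARM_mem/chat/<run>/summary/<timestamp>/judged-by--<judge>.csv
    memory = sorted(root.glob('CHARM_mem/chat/*/summary/*/judged-by--*.csv'))
    # Timestamped names sort chronologically, so the last entry is the newest.
    return (reason[-1] if reason else None, memory[-1] if memory else None)
```

The two returned paths can then be supplied as the arguments of `tools/summarize_mem_rea.py`.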
--- a/configs/datasets/CHARM/README_ZH.md
+++ b/configs/datasets/CHARM/README_ZH.md
@ -1,162 +0,0 @@
-# CHARM✨ Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations [ACL2024]
-[![arXiv](https://img.shields.io/badge/arXiv-2403.14112-b31b1b.svg)](https://arxiv.org/abs/2403.14112)
-[![license](https://img.shields.io/github/license/InternLM/opencompass.svg)](./LICENSE)
-<div align="center">
-
-📃[Paper](https://arxiv.org/abs/2403.14112)
-🏰[Project Page](https://opendatalab.github.io/CHARM/)
-🏆[Leaderboard](https://opendatalab.github.io/CHARM/leaderboard.html)
-✨[Findings](https://opendatalab.github.io/CHARM/findings.html)
-</div>
-
-<div align="center">
-    📖 <a href="./README_ZH.md">   中文</a> | <a href="./README.md">English</a>
-</div>
-
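-## Dataset Description
-
-**CHARM** is the first benchmark for comprehensively and thoroughly evaluating the Chinese commonsense reasoning ability of large language models (LLMs). It covers both globally recognized commonsense and commonsense specific to Chinese culture. In addition, CHARM can evaluate the memorization-independent reasoning ability of LLMs and analyze their typical errors.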
-
-
-## Comparison with Other Commonsense Reasoning Benchmarks
-<html lang="en">
-        <table align="center">
-            <thead class="fixed-header">
-                <tr>
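-                    <th>Benchmark</th>
-                    <th>Chinese language</th>
-                    <th>Commonsense reasoning</th>
-                    <th>Chinese-specific knowledge</th>
-                    <th>Chinese and global knowledge domains</th>
-                    <th>Relationship between reasoning and memorization</th>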
-                </tr>
-            </thead>
-            <tr>
-                <td>Benchmarks mentioned in <a href="https://arxiv.org/abs/2302.04752">davis2023benchmarks</a></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-            </tr>
-            <tr>
-                <td><a href="https://arxiv.org/abs/1809.05053"> XNLI</a>, <a
-                        href="https://arxiv.org/abs/2005.00333">XCOPA</a>,<a
-                        href="https://arxiv.org/abs/2112.10668">XStoryCloze</a></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-            </tr>
-            <tr>
-                <td><a href="https://arxiv.org/abs/2007.08124">LogiQA</a>,<a
-                        href="https://arxiv.org/abs/2004.05986">CLUE</a>, <a
-                        href="https://arxiv.org/abs/2306.09212">CMMLU</a></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-            </tr>
-            <tr>
-                <td><a href="https://arxiv.org/abs/2312.12853">CORECODE</a> </td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-                <td><strong><span style="color: red;">&#x2718;</span></strong></td>
-            </tr>
-            <tr>
-                <td><strong><a href="https://arxiv.org/abs/2403.14112">CHARM (ours)</a> </strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-                <td><strong><span style="color: green;">&#x2714;</span></strong></td>
-            </tr>
-        </table>
-
-
-## 🛠️ How to Use
-Below are the steps for quickly downloading CHARM and evaluating it with OpenCompass.
-
-### 1. Download CHARM
-```bash
-git clone https://github.com/opendatalab/CHARM ${path_to_CHARM_repo}
-
-cd ${path_to_opencompass}
-mkdir -p data
-ln -snf ${path_to_CHARM_repo}/data/CHARM ./data/CHARM
-```
-### 2. Run Inference and Evaluation
-```bash
-cd ${path_to_opencompass}
-
-# modify config file `configs/eval_charm_rea.py`: uncomment the existing models, or add the models you want to evaluate
-python run.py configs/eval_charm_rea.py -r --dump-eval-details
-
-# modify config file `configs/eval_charm_mem.py`: uncomment the existing models, or add the models you want to evaluate
-python run.py configs/eval_charm_mem.py -r --dump-eval-details
-```
-The inference and evaluation results are written to `${path_to_opencompass}/outputs`, laid out like this:
-```bash
-outputs
-├── CHARM_mem
-│   └── chat
-│       └── 20240605_151442
-│           ├── predictions
-│           │   ├── internlm2-chat-1.8b-turbomind
-│           │   ├── llama-3-8b-instruct-lmdeploy
-│           │   └── qwen1.5-1.8b-chat-hf
-│           ├── results
-│           │   ├── internlm2-chat-1.8b-turbomind_judged-by--GPT-3.5-turbo-0125
-│           │   ├── llama-3-8b-instruct-lmdeploy_judged-by--GPT-3.5-turbo-0125
-│           │   └── qwen1.5-1.8b-chat-hf_judged-by--GPT-3.5-turbo-0125
-│           └── summary
-│               └── 20240605_205020 # MEMORY_SUMMARY_DIR
-│                   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Anachronisms_Judgment
-│                   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Movie_and_Music_Recommendation
-│                   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Sport_Understanding
-│                   ├── judged-by--GPT-3.5-turbo-0125-charm-memory-Chinese_Time_Understanding
-│                   └── judged-by--GPT-3.5-turbo-0125.csv # MEMORY_SUMMARY_CSV
-└── CHARM_rea
-    └── chat
-        └── 20240605_152359
-            ├── predictions
-            │   ├── internlm2-chat-1.8b-turbomind
-            │   ├── llama-3-8b-instruct-lmdeploy
-            │   └── qwen1.5-1.8b-chat-hf
-            ├── results # REASON_RESULTS_DIR
-            │   ├── internlm2-chat-1.8b-turbomind
-            │   ├── llama-3-8b-instruct-lmdeploy
-            │   └── qwen1.5-1.8b-chat-hf
-            └── summary
-                ├── summary_20240605_205328.csv # REASON_SUMMARY_CSV
-                └── summary_20240605_205328.txt
-```
-### 3. Generate Analysis Results
-```bash
-cd ${path_to_CHARM_repo}
-
-# generate Table5, Table6, Table9 and Table10 of the paper, see https://arxiv.org/abs/2403.14112
-PYTHONPATH=. python tools/summarize_reasoning.py ${REASON_SUMMARY_CSV}
-
-# generate Figure3 and Figure9 of the paper, see https://arxiv.org/abs/2403.14112
-PYTHONPATH=. python tools/summarize_mem_rea.py ${REASON_SUMMARY_CSV} ${MEMORY_SUMMARY_CSV}
-
-# generate Table7, Table12, Table13 and Figure11 of the paper, see https://arxiv.org/abs/2403.14112
-PYTHONPATH=. python tools/analyze_mem_indep_rea.py data/CHARM ${REASON_RESULTS_DIR} ${MEMORY_SUMMARY_DIR} ${MEMORY_SUMMARY_CSV}
-```
-
-## 🖊️ Citation
-```bibtex
-@misc{sun2024benchmarking,
-      title={Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations},
-      author={Jiaxing Sun and Weiquan Huang and Jiang Wu and Chenya Gu and Wei Li and Songyang Zhang and Hang Yan and Conghui He},
-      year={2024},
-      eprint={2403.14112},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL}
-}
-```
--- a/configs/datasets/CHARM/charm_memory_gen_bbbd53.py
+++ b/configs/datasets/CHARM/charm_memory_gen_bbbd53.py
@ -1,63 +0,0 @@
-import os
-from mmengine.config import read_base
-
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import GenInferencer
-from opencompass.datasets import CharmDataset, CharmMemoryEvaluator, LMEvaluator
-
-with read_base():
-    from .charm_memory_settings import charm_memory_tasks, judge_system_prompts, dataset_path
-
-charm_memory_datasets = []
-
-for _task in charm_memory_tasks:
-
-    charm_memory_reader_cfg = dict(input_columns=['input'],
-                                   output_column='target')
-
-    charm_memory_infer_cfg = dict(
-        prompt_template=dict(
-            type=PromptTemplate,
-            template=dict(round=[
-                dict(role='HUMAN', prompt='请尽可能简短地回答下述问题。\n问题：{input}\n答：')
-            ]),
-        ),
-        retriever=dict(type=ZeroRetriever),
-        inferencer=dict(type=GenInferencer, max_out_len=512),
-    )
-
-    if _task == 'Chinese_Movie_and_Music_Recommendation':
-        charm_memory_eval_cfg = dict(
-            evaluator=dict(type=CharmMemoryEvaluator),
-            pred_role='BOT',
-        )
-    else:
-        judge_system_prompt = judge_system_prompts[_task]
-        charm_memory_eval_cfg = dict(
-            evaluator=dict(
-                type=LMEvaluator,
-                prompt_template=dict(
-                    type=PromptTemplate,
-                    template=dict(round=[
-                        dict(
-                            role='HUMAN',
-                            prompt=judge_system_prompt +
-                            "\n\n[Question]\n{input}\n[The Start of Reference Answer]\n{target}\n[The End of Reference Answer]\n\n[The Start of Assistant's Answer]\n{prediction}\n[The End of Assistant's Answer]"  # noqa
-                        ),
-                    ]),
-                ),
-            ),
-            pred_role='BOT',
-        )
-
-    charm_memory_datasets.append(
-        dict(
-            type=CharmDataset,
-            path=dataset_path,
-            name=_task,
-            abbr='charm-memory-' + _task,
-            reader_cfg=charm_memory_reader_cfg,
-            infer_cfg=charm_memory_infer_cfg.copy(),
-            eval_cfg=charm_memory_eval_cfg.copy(),
-        ))
--- a/configs/datasets/CHARM/charm_memory_settings.py
+++ b/configs/datasets/CHARM/charm_memory_settings.py
@ -1,31 +0,0 @@
-import os
-
-charm_memory_tasks = [
-    'Chinese_Anachronisms_Judgment',
-    'Chinese_Movie_and_Music_Recommendation',
-    'Chinese_Sport_Understanding',
-    'Chinese_Time_Understanding',
-]
-
-dataset_path = 'data/CHARM/memorization'
-
-system_prompt_template = """Please act as an impartial judge, comparing the responses of the AI assistants to the reference answer and determining if the answers are correct.
-You will receive the reference answer provided by a human and the responses of the AI assistants.
-Your task is to judge whether the AI assistant's answer is correct.
-{task_specific_prompt}
-After providing your explanation, strictly output your final judgment in the following format: “[正确]” if the AI assistant's response is correct, “[错误]” if the AI assistant's response is incorrect.
-"""
-
-task_specific_prompts = {
-    'Chinese_Anachronisms_Judgment':
-    "If the provided reference answer is a list, the model's prediction is considered correct if it matches any item in the list.",
-    'Chinese_Time_Understanding':
-    "When evaluating the AI assistant's response regarding Chinese solar terms, as long as the AI assistant's response falls within the time frame provided in the reference answer, consider it correct.",
-    'Chinese_Sport_Understanding':
-    "If the provided reference answer is a list, the model's prediction is considered correct if it matches any item in the list."
-}
-
-judge_system_prompts = {
-    k: system_prompt_template.format(task_specific_prompt=v)
-    for k, v in task_specific_prompts.items()
-}
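`task_specific_prompts` has entries for three of the four memorization tasks, so `judge_system_prompts` has no key for `Chinese_Movie_and_Music_Recommendation`. That is consistent with `charm_memory_gen_bbbd53.py`, which routes that task to `CharmMemoryEvaluator` instead of the LLM judge. The sketch below reproduces the dictionary construction with shortened placeholder strings (the prompt texts here are stand-ins, not the real prompts) to make the gap visible.

```python
charm_memory_tasks = [
    'Chinese_Anachronisms_Judgment',
    'Chinese_Movie_and_Music_Recommendation',
    'Chinese_Sport_Understanding',
    'Chinese_Time_Understanding',
]

# Placeholder template; the real one is much longer.
system_prompt_template = 'Judge the answer.\n{task_specific_prompt}\nOutput [正确] or [错误].'

# Placeholder rules, keyed exactly as in the settings file.
task_specific_prompts = {
    'Chinese_Anachronisms_Judgment': 'Any item of a reference list counts.',
    'Chinese_Time_Understanding': 'Any date inside the reference time frame counts.',
    'Chinese_Sport_Understanding': 'Any item of a reference list counts.',
}

judge_system_prompts = {
    k: system_prompt_template.format(task_specific_prompt=v)
    for k, v in task_specific_prompts.items()
}

# Tasks that cannot be looked up in judge_system_prompts.
tasks_without_judge = [t for t in charm_memory_tasks if t not in judge_system_prompts]
print(tasks_without_judge)
```

Indexing `judge_system_prompts[_task]` for a task in `tasks_without_judge` would raise `KeyError`, which is why the generation config checks the task name before the lookup.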
--- a/configs/datasets/CHARM/charm_reason_cot_only_gen_f7b7d3.py
+++ b/configs/datasets/CHARM/charm_reason_cot_only_gen_f7b7d3.py
@ -1,50 +0,0 @@
-import os
-from mmengine.config import read_base
-
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import GenInferencer
-from opencompass.datasets import CharmDataset, charm_reason_postprocess, CharmReasonEvaluator
-
-with read_base():
-    from .charm_reason_settings import charm_tasks, settings
-
-settings = [s for s in settings if s[0] in ['ZH-CoT', 'EN-CoT']]
-
-charm_reason_datasets = []
-
-for _cot, _cot_prefix, dataset_path, fewshot_example_path, prompt_template in settings:
-    for _task in charm_tasks:
-        _fewshot_example_file = os.path.join(fewshot_example_path, f'{_task}_{_cot}.txt')
-        with open(_fewshot_example_file, 'r', encoding='utf-8') as f:
-            _hint = f.read()
-
-        charm_reason_reader_cfg = dict(input_columns=['input'], output_column='target')
-
-        charm_reason_infer_cfg = dict(
-            prompt_template=dict(
-                type=PromptTemplate,
-                template=dict(round=[dict(role='HUMAN', prompt=prompt_template.format(_hint=_hint) + _cot_prefix)]),
-            ),
-            retriever=dict(type=ZeroRetriever),
-            inferencer=dict(type=GenInferencer, max_out_len=512),
-        )
-
-        charm_reason_eval_cfg = dict(
-            evaluator=dict(type=CharmReasonEvaluator),
-            pred_role='BOT',
-            pred_postprocessor=dict(type=charm_reason_postprocess),
-            dataset_postprocessor=dict(type=charm_reason_postprocess),
-        )
-
-        charm_reason_datasets.append(
-            dict(
-                type=CharmDataset,
-                path=dataset_path,
-                name=_task,
-                abbr='charm-reason-' + _task + '_' + _cot,
-                reader_cfg=charm_reason_reader_cfg,
-                infer_cfg=charm_reason_infer_cfg.copy(),
-                eval_cfg=charm_reason_eval_cfg.copy(),
-            )
-        )
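The CoT-only config above keeps two of the five settings and crosses them with every task. As a rough illustration of the resulting dataset names, the sketch below rebuilds only the `abbr` strings; the task and setting names are copied from `charm_reason_settings.py`, and everything else in the real config is omitted.

```python
from itertools import product

charm_tasks = [
    'Chinese_Anachronisms_Judgment',
    'Chinese_Movie_and_Music_Recommendation',
    'Chinese_Natural_Language_Inference',
    'Chinese_Reading_Comprehension',
    'Chinese_Sequence_Understanding',
    'Chinese_Sport_Understanding',
    'Chinese_Time_Understanding',
    'Global_Anachronisms_Judgment',
    'Global_Movie_and_Music_Recommendation',
    'Global_Natural_Language_Inference',
    'Global_Reading_Comprehension',
    'Global_Sequence_Understanding',
    'Global_Sport_Understanding',
    'Global_Time_Understanding',
]

# First element of each tuple in `settings`.
setting_names = ['Direct', 'ZH-CoT', 'EN-CoT', 'XLT', 'Translate-EN']

# Same filter as in the config: keep only the two CoT variants.
selected = [name for name in setting_names if name in ['ZH-CoT', 'EN-CoT']]

# Outer loop over settings, inner loop over tasks, as in the config.
abbrs = ['charm-reason-' + task + '_' + cot for cot, task in product(selected, charm_tasks)]

print(len(abbrs))
print(abbrs[0])
```

Under these assumptions the filter yields 2 settings times 14 tasks, so 28 dataset entries.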
--- a/configs/datasets/CHARM/charm_reason_gen.py
+++ b/configs/datasets/CHARM/charm_reason_gen.py
@ -1,4 +0,0 @@
-from mmengine.config import read_base
-
-with read_base():
-    from .charm_reason_gen_f8fca2 import charm_reason_datasets  # noqa: F401, F403
--- a/configs/datasets/CHARM/charm_reason_gen_f8fca2.py
+++ b/configs/datasets/CHARM/charm_reason_gen_f8fca2.py
@ -1,49 +0,0 @@
-import os
-from mmengine.config import read_base
-
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import GenInferencer
-from opencompass.datasets import CharmDataset, charm_reason_postprocess, CharmReasonEvaluator
-
-with read_base():
-    from .charm_reason_settings import charm_tasks, settings
-
-
-charm_reason_datasets = []
-
-for _cot, _cot_prefix, dataset_path, fewshot_example_path, prompt_template in settings:
-    for _task in charm_tasks:
-        _fewshot_example_file = os.path.join(fewshot_example_path, f'{_task}_{_cot}.txt')
-        with open(_fewshot_example_file, 'r', encoding='utf-8') as f:
-            _hint = f.read()
-
-        charm_reason_reader_cfg = dict(input_columns=['input'], output_column='target')
-
-        charm_reason_infer_cfg = dict(
-            prompt_template=dict(
-                type=PromptTemplate,
-                template=dict(round=[dict(role='HUMAN', prompt=prompt_template.format(_hint=_hint) + _cot_prefix)]),
-            ),
-            retriever=dict(type=ZeroRetriever),
-            inferencer=dict(type=GenInferencer, max_out_len=512),
-        )
-
-        charm_reason_eval_cfg = dict(
-            evaluator=dict(type=CharmReasonEvaluator),
-            pred_role='BOT',
-            pred_postprocessor=dict(type=charm_reason_postprocess),
-            dataset_postprocessor=dict(type=charm_reason_postprocess),
-        )
-
-        charm_reason_datasets.append(
-            dict(
-                type=CharmDataset,
-                path=dataset_path,
-                name=_task,
-                abbr='charm-reason-' + _task + '_' + _cot,
-                reader_cfg=charm_reason_reader_cfg,
-                infer_cfg=charm_reason_infer_cfg.copy(),
-                eval_cfg=charm_reason_eval_cfg.copy(),
-            )
-        )
--- a/configs/datasets/CHARM/charm_reason_ppl_3da4de.py
+++ b/configs/datasets/CHARM/charm_reason_ppl_3da4de.py
@ -1,57 +0,0 @@
-import os
-
-from opencompass.openicl.icl_prompt_template import PromptTemplate
-from opencompass.openicl.icl_retriever import ZeroRetriever
-from opencompass.openicl.icl_inferencer import PPLInferencer
-from opencompass.datasets import CharmDataset
-from opencompass.openicl.icl_evaluator import AccwithDetailsEvaluator
-
-charm_tasks = [
-    ['Chinese_Anachronisms_Judgment', 'AB'],
-    ['Chinese_Movie_and_Music_Recommendation', 'ABCD'],
-    ['Chinese_Natural_Language_Inference', 'ABC'],
-    ['Chinese_Reading_Comprehension', 'ABCD'],
-    ['Chinese_Sequence_Understanding', 'ABCD'],
-    ['Chinese_Sport_Understanding', 'AB'],
-    ['Chinese_Time_Understanding', 'ABCD'],
-    ['Global_Anachronisms_Judgment', 'AB'],
-    ['Global_Movie_and_Music_Recommendation', 'ABCD'],
-    ['Global_Natural_Language_Inference', 'ABC'],
-    ['Global_Reading_Comprehension', 'ABCD'],
-    ['Global_Sequence_Understanding', 'ABCD'],
-    ['Global_Sport_Understanding', 'AB'],
-    ['Global_Time_Understanding', 'ABCDEF'],
-]
-
-charm_reason_datasets = []
-for task_name, options in charm_tasks:
-
-    with open(os.path.join(os.path.dirname(__file__), 'few-shot-examples', f'{task_name}_Direct.txt'), 'r', encoding='utf-8') as f:
-        few_shot_example = f.read()
-
-    charm_reason_reader_cfg = dict(input_columns=['input'], output_column='target')
-
-    charm_reason_infer_cfg = dict(
-        prompt_template=dict(
-            type=PromptTemplate,
-            template={
-                f'({opt})': f'{few_shot_example}\n{{input}}\nA: {opt}' for opt in options
-            },
-        ),
-        retriever=dict(type=ZeroRetriever),
-        inferencer=dict(type=PPLInferencer),
-    )
-
-    charm_reason_eval_cfg = dict(evaluator=dict(type=AccwithDetailsEvaluator))
-
-    charm_reason_datasets.append(
-        dict(
-            type=CharmDataset,
-            abbr=f'charm-reason-{task_name}_Direct',
-            path='data/CHARM/reasoning',
-            name=task_name,
-            reader_cfg=charm_reason_reader_cfg,
-            infer_cfg=charm_reason_infer_cfg,
-            eval_cfg=charm_reason_eval_cfg,
-        )
-    )
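In the PPL config above, the dict comprehension builds one full prompt per answer option, keyed by the parenthesized label. The sketch below shows the expansion for a two-option task; the few-shot text is a placeholder, and the statement that the inferencer compares the perplexity of these candidate prompts is an assumption based on the `PPLInferencer` name rather than something shown in this diff.

```python
# Placeholder for the contents of a `<task>_Direct.txt` few-shot file.
few_shot_example = 'FEW-SHOT EXAMPLES'
options = 'AB'

# Iterating over the string 'AB' yields 'A' and 'B'.
# Inside an f-string, `{{input}}` is emitted as the literal text `{input}`,
# which stays in the template as a placeholder for the question.
template = {
    f'({opt})': f'{few_shot_example}\n{{input}}\nA: {opt}' for opt in options
}

for label, prompt in template.items():
    print(label, repr(prompt))
```

Note that the key carries parentheses (`(A)`) while the candidate answer appended to the prompt does not (`A: A`).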
--- a/configs/datasets/CHARM/charm_reason_settings.py
+++ b/configs/datasets/CHARM/charm_reason_settings.py
@ -1,36 +0,0 @@
-import os
-
-charm_tasks = [
-    'Chinese_Anachronisms_Judgment',
-    'Chinese_Movie_and_Music_Recommendation',
-    'Chinese_Natural_Language_Inference',
-    'Chinese_Reading_Comprehension',
-    'Chinese_Sequence_Understanding',
-    'Chinese_Sport_Understanding',
-    'Chinese_Time_Understanding',
-    'Global_Anachronisms_Judgment',
-    'Global_Movie_and_Music_Recommendation',
-    'Global_Natural_Language_Inference',
-    'Global_Reading_Comprehension',
-    'Global_Sequence_Understanding',
-    'Global_Sport_Understanding',
-    'Global_Time_Understanding',
-]
-
-XLT_template = 'Follow the given examples and answer the question.\n{_hint}\n\n I want you to act as a commonsense reasoning expert for Chinese. \n Request: {{input}}\n'
-Translate_EN_template = 'Follow the given examples and answer the question.\n{_hint}\n\nQ: {{input}}\nA: '
-Other_template = '请按照给定的例子回答问题。\n{_hint}\n\nQ：{{input}}\nA：'
-
-data_dir = 'data/CHARM'
-dataset_path_ZH = f'{data_dir}/reasoning'
-dataset_path_TransEn = f'{data_dir}/reasoning_Translate-EN'
-fewshot_example_path_ZH = os.path.join(os.path.dirname(__file__), 'few-shot-examples')
-fewshot_example_path_TransEn = os.path.join(os.path.dirname(__file__), 'few-shot-examples_Translate-EN')
-
-settings = [
-    ('Direct', '', dataset_path_ZH, fewshot_example_path_ZH, Other_template),
-    ('ZH-CoT', '让我们一步一步来思考。', dataset_path_ZH, fewshot_example_path_ZH, Other_template),
-    ('EN-CoT', "Let's think step by step.", dataset_path_ZH, fewshot_example_path_ZH, Other_template),
-    ('XLT', """You should retell the request in English.\nYou should do the answer step by step to choose the right answer.\nYou should step-by-step answer the request.\nYou should tell me the answer in this format 'So the answer is'.""", dataset_path_ZH, fewshot_example_path_ZH, XLT_template),
-    ('Translate-EN', "Let's think step by step.", dataset_path_TransEn, fewshot_example_path_TransEn, Translate_EN_template),
-]
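The templates in this settings file mix single braces (`{_hint}`) and doubled braces (`{{input}}`) because they are filled in two stages: the config calls `prompt_template.format(_hint=_hint)` first, which substitutes the few-shot text and turns `{{input}}` into a literal `{input}` for a later step. The sketch below walks through both stages with `Other_template`; the hint text is a short placeholder, and the final substitution is simulated with `str.replace` because the way OpenCompass fills `{input}` is not shown in this diff.

```python
# Template copied from charm_reason_settings.py.
other_template = '请按照给定的例子回答问题。\n{_hint}\n\nQ：{{input}}\nA：'

hint = '识别给定陈述是否包含时代错误。'  # placeholder for a few-shot file
cot_prefix = '让我们一步一步来思考。'

# Stage one (done in the dataset config): fill the hint, append the CoT prefix.
stage_one = other_template.format(_hint=hint) + cot_prefix

# Stage two (simulated here): fill the question into the remaining placeholder.
filled = stage_one.replace('{input}', '李白用钢笔写诗。')

print(stage_one)
print(filled)
```

After stage one the prompt still contains exactly one placeholder, `{input}`, which is what the reader config's `input_columns=['input']` refers to.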
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Anachronisms_Judgment_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Anachronisms_Judgment_Direct.txt
@ -1,22 +0,0 @@
-识别给定陈述是否包含时代错误。
-
-Q：以下陈述是否包含时代错误，请选择正确选项。一个接受了义务教育、具备基本常识的人会如何选择？
-李白用钢笔写诗。
-选项：
-(A) 是
-(B) 否
-A:(A)
-
-Q：以下陈述是否包含时代错误，请选择正确选项。一个接受了义务教育、具备基本常识的人会如何选择？
-唐朝时期，人们可以在街头买到《红楼梦》。
-选项：
-(A) 是
-(B) 否
-A:(A)
-
-Q：以下陈述是否包含时代错误，请选择正确选项。一个接受了义务教育、具备基本常识的人会如何选择？
-1998年10月1日，人们手举五星红旗，在天安门广场上庆祝国庆
-选项：
-(A) 是
-(B) 否
-A:(B)
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Anachronisms_Judgment_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Anachronisms_Judgment_EN-CoT.txt
@ -1,25 +0,0 @@
-识别给定陈述是否包含时代错误。
-
-Q：以下陈述是否包含时代错误，请选择正确选项。一个接受了义务教育、具备基本常识的人会如何选择？
-李白用钢笔写诗。
-选项：
-(A) 是
-(B) 否
-A：Let's think step by step.
-This statement mentions "Li Bai", a poet from the Tang Dynasty in China. The "fountain pen" mentioned in the statement is a modern invention, so it is impossible for Li Bai to have written poetry with one. This statement contains an anachronism. So the answer is (A).
-
-Q：以下陈述是否包含时代错误，请选择正确选项。一个接受了义务教育、具备基本常识的人会如何选择？
-唐朝时期，人们可以在街头买到《红楼梦》。
-选项：
-(A) 是
-(B) 否
-A：Let's think step by step.
-This statement mentions "Dream of the Red Chamber", which was written by Qing Dynasty writer Cao Xueqin. There was no "Dream of the Red Chamber" during the Tang Dynasty, so this statement contains historical errors. So the answer is (A).
-
-Q：以下陈述是否包含时代错误，请选择正确选项。一个接受了义务教育、具备基本常识的人会如何选择？
-1998年10月1日，人们手举五星红旗，在天安门广场上庆祝国庆
-选项：
-(A) 是
-(B) 否
-A：Let's think step by step.
-This statement mentions 1998. New China was established in 1949, and the five-star red flag was designated as the national flag of China in the same year. Therefore, October 1, 1998 is National Day, and it is reasonable for people to celebrate National Day at Tiananmen Square, so the statement does not contain an anachronism. So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Anachronisms_Judgment_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Anachronisms_Judgment_XLT.txt
@ -1,63 +0,0 @@
-识别给定陈述是否包含时代错误。
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：以下陈述是否包含时代错误，请选择正确选项。一个接受了义务教育、具备基本常识的人会如何选择？
-李白用钢笔写诗。
-选项：
-(A) 是
-(B) 否
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: How would a typical person answer each of the following statement whether contains an anachronism?
-Li Bai writes poetry with a pen.
-Option:
-(A) Yes
-(B) No
-Step-by-step answer:
-1.This statement mentions "Li Bai", a poet from the Tang Dynasty in China.
-2.The pen mentioned in the statement is a modern device.
-3.so it is impossible for Li Bai to write poetry with a pen. This statement contains errors from the times. 
-So the answer is (A).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：以下陈述是否包含时代错误，请选择正确选项。一个接受了义务教育、具备基本常识的人会如何选择？
-唐朝时期，人们可以在街头买到《红楼梦》。
-选项：
-(A) 是
-(B) 否
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: How would a typical person answer each of the following statement whether contains an anachronism?
-During the Tang Dynasty, people could buy "Dream of the Red Chamber" on the streets.
-Option:
-(A) Yes
-(B) No
-Step-by-step answer:
-1.This statement mentions "Dream of the Red Chamber", which was written by Qing Dynasty writer Cao Xueqin
-2.During the Tang Dynasty, there was no "Dream of the Red Chamber", so this statement contains historical errors.
-So the answer is (A).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：以下陈述是否包含时代错误，请选择正确选项。一个接受了义务教育、具备基本常识的人会如何选择？
-1998年10月1日，人们手举五星红旗，在天安门广场上庆祝国庆
-选项：
-(A) 是
-(B) 否
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: How would a typical person answer each of the following statement whether contains an anachronism?
-On October 1, 1998, people held five-star red flags and celebrated National Day on Tiananmen Square
-Option:
-(A) Yes
-(B) No
-Step-by-step answer:
-1.This statement mentions that in 1998, New China was established in 1949
-2.The Five Star Red Flag was designated as the national flag of China in 1949.
-3.So October 1st, 1998 is National Day, and it is reasonable for people to celebrate National Day at Tiananmen Square, so the statement does not contain any historical errors.
-So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Anachronisms_Judgment_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Anachronisms_Judgment_ZH-CoT.txt
@ -1,25 +0,0 @@
-识别给定陈述是否包含时代错误。
-
-Q：以下陈述是否包含时代错误，请选择正确选项。一个接受了义务教育、具备基本常识的人会如何选择？
-李白用钢笔写诗。
-选项：
-(A) 是
-(B) 否
-A：让我们一步一步来思考。
-这个陈述提到了“李白”，他是中国唐朝时期的诗人。而陈述中提到的“钢笔”是现代设备，因此李白不可能使用钢笔写诗，该陈述包含时代错误。所以答案是(A)。
-
-Q：以下陈述是否包含时代错误，请选择正确选项。一个接受了义务教育、具备基本常识的人会如何选择？
-唐朝时期，人们可以在街头买到《红楼梦》。
-选项：
-(A) 是
-(B) 否
-A：让我们一步一步来思考。
-这个陈述提到了《红楼梦》，《红楼梦》是清代作家曹雪芹所写，唐朝时还没有《红楼梦》，因此该陈述包含时代错误。所以答案是(A)。
-
-Q：以下陈述是否包含时代错误，请选择正确选项。一个接受了义务教育、具备基本常识的人会如何选择？
-1998年10月1日，人们手举五星红旗，在天安门广场上庆祝国庆
-选项：
-(A) 是
-(B) 否
-A：让我们一步一步来思考。
-这个陈述提到了1998年，新中国是1949年成立的，五星红旗在1949年被确定为中国国旗，因此1998年10月1日是国庆节，人们在天安门庆祝国庆是合理的，因此陈述不包含时代错误。所以答案是(B)。
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Movie_and_Music_Recommendation_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Movie_and_Music_Recommendation_Direct.txt
@ -1,25 +0,0 @@
-给根据给定艺术作品清单，找出最类似的。
-
-Q: 和这些电影《疯狂的外星人》、《斗牛》、《杀生》、《疯狂的石头》有共同点的电影是：
-选项:
-(A)《泰囧》
-(B)《少年派》
-(C)《江湖儿女》
-(D)《湄公河行动》
-A: (A)
-
-Q: 和这些电影《红高梁》、《活着》、《大红灯笼高高挂》、《英雄》有共同点的电影是：
-选项:
-(A)《一个都不能少》
-(B)《让子弹飞》
-(C)《阿飞正传》
-(D)《东邪西毒》
-A: (A)
-
-Q: 和这些歌曲《夜曲》、《本草纲目》、《听妈妈的话》、《七里香》有共同点的歌曲是：
-选项:
-(A)《双节棍》
-(B)《年少有为》
-(C)《浮夸》
-(D)《三人游》
-A: (A)
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Movie_and_Music_Recommendation_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Movie_and_Music_Recommendation_EN-CoT.txt
@ -1,40 +0,0 @@
-给根据给定艺术作品清单，找出最类似的。
-
-Q: 和这些电影《疯狂的外星人》、《斗牛》、《杀生》、《疯狂的石头》有共同点的电影是：
-选项:
-(A)《泰囧》
-(B)《少年派》
-(C)《江湖儿女》
-(D)《湄公河行动》
-A：Let's think step by step.
-"Crazy Alien" is a comedy science fiction film directed by Ning Hao, written by Liu Cixin and Sun Xiaohang, and starring Huang Bo, Shen Teng, and Xu Zheng. It was released in 2019.
-"Cow" is a dark comedy film directed by Guan Hu, starring Huang Bo and Yan Ni. It was released in 2009.
-"Design of Death" is an absurd and suspenseful comedy film directed by Guan Hu, featuring Huang Bo, Simon Yam, Su Youpeng, and Yu Nan. It was released in 2012.
-"Crazy Stone" is a dark comedy film directed by Ning Hao, featuring Guo Tao, Liu Hua, Lian Jin, Liu Gang, Xu Zheng, and Huang Bo. It was released in 2006.
-These are all famous classic Chinese comedy films featuring Huang Bo. The only film among the options that seems to have something in common with these films is "Lost in Thailand" (directed by Xu Zheng, starring Huang Bo, Xu Zheng, and Wang Baoqiang), a comedy film released in 2012. So the answer is (A).
-
-Q: 和这些电影《红高梁》、《活着》、《大红灯笼高高挂》、《英雄》有共同点的电影是：
-选项:
-(A)《一个都不能少》
-(B)《让子弹飞》
-(C)《阿飞正传》
-(D)《东邪西毒》
-A:Let's think step by step.
-"Red Sorghum," directed by Zhang Yimou and starring Jiang Wen, Gong Li, and Teng Rujun, is a war drama film that was released in China in 1987.
-"To Live," directed by Zhang Yimou and starring Ge You and Gong Li, is a drama film that was released in China in 1994.
-"Raise the Red Lantern," directed by Zhang Yimou and starring Gong Li, He Saifei, Ma Jingwu, Cao Cuifen, Kong Lin, and Jin Shuyuan, is a drama film that was released in China in 1991.
-"Hero," directed by Zhang Yimou and starring Jet Li, Tony Leung, Maggie Cheung, Chen Daoming, Zhang Ziyi, and Donnie Yen, is a wuxia film that was released in China in 2002.
-These are all famous classic Chinese films directed by Zhang Yimou. The only film among the options that seems to have something in common with these films is "Not One Less" (directed by Zhang Yimou, starring Wei Minzhi and Zhang Huike), a drama film released in 1999. So the answer is (A).
-
-Q: 和这些歌曲《夜曲》、《本草纲目》、《听妈妈的话》、《七里香》有共同点的歌曲是：
-选项:
-(A)《双节棍》
-(B)《年少有为》
-(C)《浮夸》
-(D)《三人游》
-A：Let's think step by step.
-"Nocturne" is a song performed by Jay Chou, with lyrics by Vincent Fang, music by Jay Chou, and arrangement by Michael Lin. It is included in Jay Chou's 2005 album "November's Chopin."
-"Herbalist's Manual" is a song performed by Jay Chou, with lyrics by Vincent Fang, music by Jay Chou, and arrangement by Michael Lin. It is included in Jay Chou's 2006 album "Still Fantasy."
-"Listen to Your Mother" is a song performed by Jay Chou, with lyrics and music by Jay Chou, arrangement by Michael Lin and Hong Jingyao. It is included in Jay Chou's 2006 album "Still Fantasy."
-"Common Jasmine Orange" is a song performed by Jay Chou, with lyrics by Vincent Fang, music by Jay Chou, and arrangement by Chung Hsin-min. It is included in Jay Chou's self-titled album "Common Jasmine Orange" released in 2004.
-These are all famous pop songs performed by Jay Chou. The only song among the options that seems to have something in common with these songs is "Nunchucks" (performed by Jay Chou, composed by Jay Chou, lyrics by Vincent Fang, arrangement by Chung Hsin-min, included in Jay Chou's 2001 album "Fantasy"). So the answer is (A).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Movie_and_Music_Recommendation_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Movie_and_Music_Recommendation_XLT.txt
@ -1,76 +0,0 @@
-根据给定艺术作品清单，找出最类似的。
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：和这些电影《疯狂的外星人》、《斗牛》、《杀生》、《疯狂的石头》有共同点的电影是：
-选项:
-(A)《泰囧》
-(B)《少年派》
-(C)《江湖儿女》
-(D)《湄公河行动》
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：The movie that has something in common with these movies "Crazy Alien", "Cow", "Design of Death" and "Crazy Stone" is:
-Options:
-(A) "Lost in Thailand"
-(B) "Life of Pi"
-(C) "Ash Is Purest White"
-(D) "Operation Mekong"
-Step-by-step answer:
-1."Crazy Alien" is a comedy science fiction film directed by Ning Hao, written by Liu Cixin and Sun Xiaohang, and starring Huang Bo, Shen Teng, and Xu Zheng. It was released in 2019.
-2."Cow" is a dark comedy film directed by Guan Hu, starring Huang Bo and Yan Ni. It was released in 2009.
-3."Design of Death" is an absurd and suspenseful comedy film directed by Guan Hu, featuring Huang Bo, Simon Yam, Su Youpeng, and Yu Nan. It was released in 2012.
-4."Crazy Stone" is a dark comedy film directed by Ning Hao, featuring Guo Tao, Liu Hua, Lian Jin, Liu Gang, Xu Zheng, and Huang Bo. It was released in 2006.
-5.These are all famous classic Chinese comedy films featuring Huang Bo. The only film among the options that seems to have something in common with these films is "Lost in Thailand" (directed by Xu Zheng, starring Huang Bo, Xu Zheng, and Wang Baoqiang), a comedy film released in 2012. 
-So the answer is (A).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：和这些电影《红高粱》、《活着》、《大红灯笼高高挂》、《英雄》有共同点的电影是：
-选项:
-(A)《一个都不能少》
-(B)《让子弹飞》
-(C)《阿飞正传》
-(D)《东邪西毒》
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：The movie that has something in common with these movies "Red Sorghum", "To Live", "Raise the Red Lantern" and "Hero" is:
-Options:
-(A) "Not One Less"
-(B) "Let the Bullets Fly"
-(C) "Days of Being Wild"
-(D) "Ashes of Time"
-Step-by-step answer:
-1."Red Sorghum," directed by Zhang Yimou and starring Jiang Wen, Gong Li, and Teng Rujun, is a war drama film that was released in China in 1987.
-2."To Live," directed by Zhang Yimou and starring Ge You and Gong Li, is a drama film that was released in China in 1994.
-3."Raise the Red Lantern," directed by Zhang Yimou and starring Gong Li, He Saifei, Ma Jingwu, Cao Cuifen, Kong Lin, and Jin Shuyuan, is a drama film that was released in China in 1991.
-4."Hero," directed by Zhang Yimou and starring Jet Li, Tony Leung, Maggie Cheung, Chen Daoming, Zhang Ziyi, and Donnie Yen, is a wuxia film that was released in China in 2002.
-5.These are all famous classic Chinese films directed by Zhang Yimou. The only film among the options that seems to have something in common with these films is "Not One Less" (directed by Zhang Yimou, starring Wei Minzhi and Zhang Huike), a drama film released in 1999.
-So the answer is (A).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：和这些歌曲《夜曲》、《本草纲目》、《听妈妈的话》、《七里香》有共同点的歌曲是：
-选项:
-(A)《双节棍》
-(B)《年少有为》
-(C)《浮夸》
-(D)《三人游》
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：The song that has something in common with these songs "Nocturne", "Herbalist's Manual", "Listen to Your Mother" and "Common Jasmine Orange" is:
-Options:
-(A) "Nunchucks"
-(B) "If I Were Young"
-(C) "Exaggerated"
-(D) "Three's a Crowd"
-Step-by-step answer:
-1."Nocturne" is a song performed by Jay Chou, with lyrics by Vincent Fang, music by Jay Chou, and arrangement by Michael Lin. It is included in Jay Chou's 2005 album "November's Chopin."
-2."Herbalist's Manual" is a song performed by Jay Chou, with lyrics by Vincent Fang, music by Jay Chou, and arrangement by Michael Lin. It is included in Jay Chou's 2006 album "Still Fantasy."
-3."Listen to Your Mother" is a song performed by Jay Chou, with lyrics and music by Jay Chou, arrangement by Michael Lin and Hong Jingyao. It is included in Jay Chou's 2006 album "Still Fantasy."
-4."Common Jasmine Orange" is a song performed by Jay Chou, with lyrics by Vincent Fang, music by Jay Chou, and arrangement by Chung Hsin-min. It is included in Jay Chou's self-titled album "Common Jasmine Orange" released in 2004.
-5.These are all famous pop songs performed by Jay Chou. The only song among the options that seems to have something in common with these songs is "Nunchucks" (performed by Jay Chou, composed by Jay Chou, lyrics by Vincent Fang, arrangement by Chung Hsin-min, included in Jay Chou's 2001 album "Fantasy").
-So the answer is (A).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Movie_and_Music_Recommendation_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Movie_and_Music_Recommendation_ZH-CoT.txt
@ -1,40 +0,0 @@
-根据给定艺术作品清单，找出最类似的。
-
-Q: 和这些电影《疯狂的外星人》、《斗牛》、《杀生》、《疯狂的石头》有共同点的电影是：
-选项:
-(A)《泰囧》
-(B)《少年派》
-(C)《江湖儿女》
-(D)《湄公河行动》
-A: 让我们一步一步来思考。
-《疯狂的外星人》是由宁浩执导，刘慈欣、孙小杭编剧，黄渤、沈腾、徐峥等主演的喜剧科幻片，2019年上映。
-《斗牛》是由管虎执导，黄渤、闫妮等主演的黑色喜剧电影，2009年上映。
-《杀生》是由管虎执导，黄渤、任达华、苏有朋、余男等联袂主演的荒诞悬疑喜剧片，2012年上映。
-《疯狂的石头》是宁浩执导，郭涛、刘桦、连晋、刘刚、徐峥、黄渤等出演的黑色喜剧片，2006年上映。
-这些都是有黄渤出演的著名经典中国喜剧电影，在所有选项中，唯一与这些电影有相同点的电影似乎是《泰囧》（徐峥执导，黄渤、徐峥、王宝强主演的喜剧片，2012年上映）。所以答案是(A)。
-
-Q: 和这些电影《红高粱》、《活着》、《大红灯笼高高挂》、《英雄》有共同点的电影是：
-选项:
-(A)《一个都不能少》
-(B)《让子弹飞》
-(C)《阿飞正传》
-(D)《东邪西毒》
-A: 让我们一步一步来思考。
-《红高粱》由张艺谋执导，姜文、巩俐、滕汝骏等主演的战争文艺片，1987年在中国上映。
-《活着》是由张艺谋执导，葛优、巩俐等主演的剧情片，1994年在中国上映。
-《大红灯笼高高挂》是由张艺谋执导，巩俐、何赛飞、马精武、曹翠芬、孔琳、金淑媛等主演的剧情片，1991年在中国上映。
-《英雄》是张艺谋执导，由李连杰、梁朝伟、张曼玉、陈道明、章子怡及甄子丹主演的武侠电影，2002年在中国上映。
-这些都是由张艺谋执导的著名经典中国电影，在所有选项中，唯一与这些电影有相同点的电影似乎是《一个都不能少》（张艺谋执导，魏敏芝、张慧科主演的剧情电影，1999年上映）。所以答案是(A)。
-
-Q: 和这些歌曲《夜曲》、《本草纲目》、《听妈妈的话》、《七里香》有共同点的歌曲是：
-选项:
-(A)《双节棍》
-(B)《年少有为》
-(C)《浮夸》
-(D)《三人游》
-A: 让我们一步一步来思考。
-《夜曲》是周杰伦演唱的一首歌曲，由方文山作词，周杰伦作曲，林迈可编曲，收录在周杰伦2005年发行的专辑《11月的萧邦》中。
-《本草纲目》是周杰伦演唱的一首歌曲，由方文山作词，周杰伦作曲，林迈可编曲，收录在周杰伦2006年发行的专辑《依然范特西》中。
-《听妈妈的话》是周杰伦演唱的一首歌曲，由周杰伦作词、作曲，林迈可、洪敬尧编曲，收录在周杰伦2006年发行的专辑《依然范特西》中。
-《七里香》是周杰伦演唱的一首歌曲，由方文山作词，周杰伦谱曲，钟兴民编曲，收录在周杰伦2004年发行的同名专辑《七里香》中。
-这些都是由周杰伦演唱的中国著名流行音乐歌曲，在所有选项中，唯一与这些歌曲有相同点的歌曲似乎是《双节棍》（由周杰伦演唱，由周杰伦作曲，方文山作词，钟兴民编曲，收录于周杰伦2001年发行的专辑《范特西》中）。所以答案是(A)。
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Natural_Language_Inference_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Natural_Language_Inference_Direct.txt
@ -1,25 +0,0 @@
-请根据题目中两句话的关系选择正确答案。
-
-Q:语句一：小明刚刚去什刹海滑冰
-语句二：北京现在是冬季
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A:(A)
-
-Q:语句一：下周，一股强降水将影响整个河北省
-语句二：下周，上海天气很好
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A:(C)
-
-Q:语句一：昨天，小明在山上看落日，感叹道："夕阳无限好，只是近黄昏"
-语句二：昨天下雨，小明没有出门
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A:(B)
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Natural_Language_Inference_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Natural_Language_Inference_EN-CoT.txt
@ -1,28 +0,0 @@
-请根据题目中两句话的关系选择正确答案。
-
-Q:语句一：小明刚刚去什刹海滑冰
-语句二：北京现在是冬季
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A: Let's think step by step.
-The first sentence mentions that Xiaoming went ice skating at Shichahai, an activity that usually takes place in winter. Moreover, Shichahai is located in Beijing, so the first sentence entails the second sentence's claim that it is currently winter in Beijing. So the answer is (A).
-
-Q:语句一：下周，一股强降水将影响整个河北省
-语句二：下周，上海天气很好
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A: Let's think step by step.
-These two sentences describe the weather conditions in two geographical locations, one in Hebei Province and the other in Shanghai. Hebei Province and Shanghai are geographically far apart, so the weather conditions in these two places may not necessarily be directly related. So, the relationship between these two sentences is irrelevant. So the answer is (C).
-
-Q:语句一：昨天，小明在山上看落日，感叹道："夕阳无限好，只是近黄昏"
-语句二：昨天下雨，小明没有出门
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A: Let's think step by step.
-The first sentence states that Xiaoming saw the sunset on the mountain yesterday, while the second sentence states that it rained yesterday and Xiaoming did not go out. There is a contradiction between these two sentences, because if Xiaoming had not gone out, he could not have seen the sunset on the mountain. So, the relationship between these two sentences is contradictory. So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Natural_Language_Inference_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Natural_Language_Inference_XLT.txt
@ -1,67 +0,0 @@
-请根据题目中两句话的关系选择正确答案。
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：语句一：小明刚刚去什刹海滑冰
-语句二：北京现在是冬季
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：Sentence 1: Xiaoming has just gone ice-skating in Shichahai
-Sentence 2: It's winter in Beijing
-What is the relationship between these two statements?
-(A) Entailment
-(B) Contradictory
-(C) Irrelevant
-Step-by-step answer:
-1.The first sentence mentions that Xiaoming went ice skating at Shichahai, an activity that usually takes place in winter.
-2.Moreover, Shichahai is located in Beijing, so the first sentence entails the second sentence's claim that it is currently winter in Beijing.
-So the answer is (A).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：语句一：下周，一股强降水将影响整个河北省
-语句二：下周，上海天气很好
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：Sentence 1: Next week, heavy rainfall will affect the whole of Hebei Province
-Sentence 2: Next week, the weather in Shanghai will be fine.
-What is the relationship between these two statements?
-(A) Entailment
-(B) Contradictory
-(C) Irrelevant
-Step-by-step answer:
-1.These two sentences describe the weather conditions in two geographical locations, one in Hebei Province and the other in Shanghai.
-2.Hebei Province and Shanghai are geographically far apart, so the weather conditions in these two places may not necessarily be directly related. So, the relationship between these two sentences is irrelevant.
-So the answer is (C).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：语句一：昨天，小明在山上看落日，感叹道："夕阳无限好，只是近黄昏"
-语句二：昨天下雨，小明没有出门
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：Sentence 1: Yesterday, Xiaoming watched the sunset on a mountain and sighed, "The setting sun is infinitely beautiful, only dusk is near"
-Sentence 2: Yesterday it rained and Xiaoming didn't go out
-What is the relationship between these two statements?
-(A) Entailment
-(B) Contradictory
-(C) Irrelevant
-Step-by-step answer:
-1.The first sentence states that Xiaoming saw the sunset on the mountain yesterday, while the second sentence states that it rained yesterday and Xiaoming did not go out.
-2.There is a contradiction between these two sentences, because if Xiaoming had not gone out, he could not have seen the sunset on the mountain. So, the relationship between these two sentences is contradictory.
-So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Natural_Language_Inference_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Natural_Language_Inference_ZH-CoT.txt
@ -1,28 +0,0 @@
-请根据题目中两句话的关系选择正确答案。
-
-Q:语句一：小明刚刚去什刹海滑冰
-语句二：北京现在是冬季
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A:让我们一步一步来思考。
-第一句话提到小明去什刹海滑冰，而滑冰通常在冬季进行，而且什刹海位于北京，这蕴含了第二句话的信息，即当前是冬季。所以答案是(A)。
-
-Q:语句一：下周，一股强降水将影响整个河北省
-语句二：下周，上海天气很好
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A:让我们一步一步来思考。
-这两句话描述的是两个地理位置的天气情况，一个是河北省，一个是上海。河北省和上海在地理位置上相距较远，因此，这两个地方的天气情况并不一定有直接关联。所以，这两句话之间的关系是无关的。所以答案是(C)。
-
-Q:语句一：昨天，小明在山上看落日，感叹道："夕阳无限好，只是近黄昏"
-语句二：昨天下雨，小明没有出门
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A:让我们一步一步来思考。
-第一句话说小明昨天在山上看到了落日，而第二句话说昨天下雨，小明没有出门。这两句话之间存在矛盾，因为如果小明没有出门，那么他就不可能在山上看到落日。所以，这两句话之间的关系是矛盾的。所以答案是(B)。
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Reading_Comprehension_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Reading_Comprehension_Direct.txt
@ -1,23 +0,0 @@
-请理解题目含义并选择正确答案。
-
-Q:有些广东人不爱吃辣椒.因此,有些南方人不爱吃辣椒. 以下哪项能保证上述论证的成立？
-(A) 有些广东人爱吃辣椒
-(B) 爱吃辣椒的有些是南方人
-(C) 所有的广东人都是南方人
-(D) 有些广东人不爱吃辣椒也不爱吃甜食
-A：(C)
-
-
-Q:唐卡是极富藏族文化特色的一种绘画形式,自吐蕃王朝兴起至今已有1300多年的历史,是雪域高原的文化瑰宝.它的题材除宗教外,还有历史和民俗内容,故又被称为了解西藏的“百科全书”.所以,想要了解西藏的历史,除了正襟危坐地阅读严谨但略显呆板的史书外,你还可以选择一种惬意和愉悦的方式--欣赏唐卡,与众多的古人对话,想象曾经的历史事件,体味藏族人丰富的精神世界,了解独特的藏族民俗,这是一个让历史变得立体可感的过程. 这段文字意在说明:
-(A) 唐卡可以给大家提供一种惬意轻松的了解西藏的方式
-(B) 唐卡中记录了独特的藏族民俗和曾经的历史事件
-(C) 唐卡是了解西藏文化和历史的“百科全书”式的绘画形式
-(D) 唐卡是极富藏族文化特色且历史悠久的一种绘画形式
-A：(A)
-
-Q:“知人论世”作为一种文学批评的原则和方法,最早由战国时期的思想家孟子提出.孟子认为,后人要交结古人,只是读其诗书是不行的,还必须了解他们的为人行事以及他们的生活的时代,这样,才能读懂古人的诗书,才能和古人心契神交,成为知音. 对这段话的理解,不正确的是?
-(A) 人的心灵是可以互通和共鸣的
-(B) “知人论世”作为一种文学评论法沿用至今并显现了强大的生命力
-(C) “知人论世”可以帮助后人交结古人和古人成为知音
-(D) 了解古人和他所处的时代,有助于理解他的作品
-A: (B)
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Reading_Comprehension_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Reading_Comprehension_EN-CoT.txt
@ -1,25 +0,0 @@
-请理解题目含义并选择正确答案。
-
-Q:有些广东人不爱吃辣椒.因此,有些南方人不爱吃辣椒. 以下哪项能保证上述论证的成立？
-(A) 有些广东人爱吃辣椒
-(B) 爱吃辣椒的有些是南方人
-(C) 所有的广东人都是南方人
-(D) 有些广东人不爱吃辣椒也不爱吃甜食
-A: Let's think step by step.
-In this argument, we infer from "some Cantonese people do not like to eat chili peppers" that "some southerners do not like to eat chili peppers". The establishment of this reasoning depends on the relationship between Cantonese and Southerners. In order for this reasoning to be valid, we need to ensure that at least a portion of Cantonese people are from the south. Therefore, option (C) "All Cantonese are southerners" can ensure the validity of this argument. So the answer is (C).
-
-Q:唐卡是极富藏族文化特色的一种绘画形式,自吐蕃王朝兴起至今已有1300多年的历史,是雪域高原的文化瑰宝.它的题材除宗教外,还有历史和民俗内容,故又被称为了解西藏的“百科全书”.所以,想要了解西藏的历史,除了正襟危坐地阅读严谨但略显呆板的史书外,你还可以选择一种惬意和愉悦的方式--欣赏唐卡,与众多的古人对话,想象曾经的历史事件,体味藏族人丰富的精神世界,了解独特的藏族民俗,这是一个让历史变得立体可感的过程. 这段文字意在说明:
-(A) 唐卡可以给大家提供一种惬意轻松的了解西藏的方式
-(B) 唐卡中记录了独特的藏族民俗和曾经的历史事件
-(C) 唐卡是了解西藏文化和历史的“百科全书”式的绘画形式
-(D) 唐卡是极富藏族文化特色且历史悠久的一种绘画形式
-A: Let's think step by step.
-It is explicitly mentioned in the article that besides reading rigorous but somewhat rigid historical books, appreciating thangkas is a comfortable and enjoyable way for people to converse with numerous ancient people, imagine past historical events, appreciate the rich spiritual world of Tibetans, and understand unique Tibetan customs. So the main purpose of this passage is (A) "Thangka can provide a comfortable and easy way for everyone to understand Xizang". So the answer is (A).
-
-Q:“知人论世”作为一种文学批评的原则和方法,最早由战国时期的思想家孟子提出.孟子认为,后人要交结古人,只是读其诗书是不行的,还必须了解他们的为人行事以及他们的生活的时代,这样,才能读懂古人的诗书,才能和古人心契神交,成为知音. 对这段话的理解,不正确的是?
-(A) 人的心灵是可以互通和共鸣的
-(B) “知人论世”作为一种文学评论法沿用至今并显现了强大的生命力
-(C) “知人论世”可以帮助后人交结古人和古人成为知音
-(D) 了解古人和他所处的时代,有助于理解他的作品
-A: Let's think step by step.
-From this passage, we cannot conclude (B), that "knowing people and discussing the world" as a method of literary criticism has been used to this day and has shown strong vitality. Although "knowing people and discussing the world" was indeed proposed by the philosopher Mencius during the Warring States period as a principle and method of literary criticism, this passage does not mention that "knowing people and discussing the world" is still in use today, or that it has shown strong vitality. Therefore, option (B) is an incorrect understanding. So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Reading_Comprehension_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Reading_Comprehension_XLT.txt
@ -1,62 +0,0 @@
-请理解题目含义并选择正确答案。
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：有些广东人不爱吃辣椒.因此,有些南方人不爱吃辣椒. 以下哪项能保证上述论证的成立？
-(A) 有些广东人爱吃辣椒
-(B) 爱吃辣椒的有些是南方人
-(C) 所有的广东人都是南方人
-(D) 有些广东人不爱吃辣椒也不爱吃甜食
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: Some Cantonese people don't like chili peppers. Therefore, some southerners don't like chili peppers. Which of the following ensures the validity of the above argument?
-(A) Some Cantonese people love chili peppers
-(B) Some Southerners love chili peppers.
-(C) All Cantonese are Southerners.
-(D) Some Cantonese people do not love chili or sweets.
-Step-by-step answer:
-1.In this argument, we infer from "some Cantonese people do not like to eat chili peppers" that "some southerners do not like to eat chili peppers". 
-2.The establishment of this reasoning depends on the relationship between Cantonese and Southerners. In order for this reasoning to be valid, we need to ensure that at least a portion of Cantonese people are from the south. Therefore, option (C) "All Cantonese are southerners" can ensure the validity of this argument.
-So the answer is (C).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：唐卡是极富藏族文化特色的一种绘画形式,自吐蕃王朝兴起至今已有1300多年的历史,是雪域高原的文化瑰宝.它的题材除宗教外,还有历史和民俗内容,故又被称为了解西藏的“百科全书”.所以,想要了解西藏的历史,除了正襟危坐地阅读严谨但略显呆板的史书外,你还可以选择一种惬意和愉悦的方式--欣赏唐卡,与众多的古人对话,想象曾经的历史事件,体味藏族人丰富的精神世界,了解独特的藏族民俗,这是一个让历史变得立体可感的过程. 这段文字意在说明:
-(A) 唐卡可以给大家提供一种惬意轻松的了解西藏的方式
-(B) 唐卡中记录了独特的藏族民俗和曾经的历史事件
-(C) 唐卡是了解西藏文化和历史的“百科全书”式的绘画形式
-(D) 唐卡是极富藏族文化特色且历史悠久的一种绘画形式
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: Thangka is a form of painting rich in Tibetan cultural characteristics. It has a history of more than 1,300 years since the rise of the Tubo Dynasty and is a cultural treasure of the Snowy Plateau. Its subject matter covers not only religion but also history and folk customs, so it is also known as the "encyclopedia" for understanding Tibet. Therefore, if you want to understand the history of Tibet, in addition to sitting upright and reading rigorous but somewhat dull history books, you can also choose a pleasant and enjoyable way: appreciating thangkas, conversing with many ancient people, imagining past historical events, savoring the rich spiritual world of the Tibetans, and understanding unique Tibetan folk customs. This is a process that makes history three-dimensional and palpable. This passage is intended to illustrate:
-(A) Thangkas can provide a cozy and relaxing way to learn about Tibet.
-(B) The thangkas are a unique record of Tibetan folklore and historical events.
-(C) The thangka is an "encyclopedic" form of painting for understanding Tibetan culture and history.
-(D) The thangka is a form of painting that is rich in Tibetan cultural characteristics and has a long history.
-Step-by-step answer:
-1.It is explicitly mentioned in the article that besides reading rigorous but somewhat rigid historical books, appreciating thangkas is a comfortable and enjoyable way for people to converse with numerous ancient people, imagine past historical events, appreciate the rich spiritual world of Tibetans, and understand unique Tibetan customs.
-2.So the main purpose of this passage is (A) "Thangka can provide a comfortable and easy way for everyone to understand Xizang".
-So the answer is (A).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：“知人论世”作为一种文学批评的原则和方法,最早由战国时期的思想家孟子提出.孟子认为,后人要交结古人,只是读其诗书是不行的,还必须了解他们的为人行事以及他们的生活的时代,这样,才能读懂古人的诗书,才能和古人心契神交,成为知音. 对这段话的理解,不正确的是?
-(A) 人的心灵是可以互通和共鸣的
-(B) “知人论世”作为一种文学评论法沿用至今并显现了强大的生命力
-(C) “知人论世”可以帮助后人交结古人和古人成为知音
-(D) 了解古人和他所处的时代,有助于理解他的作品
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: As a principle and method of literary criticism, the concept of "knowing people and discussing the world" was first proposed by Mencius, a thinker of the Warring States period. According to Mencius, in order to make friends with the ancients, it is not enough just to read their poems and books, but also to understand their behavior and the times in which they lived, so as to read and understand their poems and books, and to make friends with them and become their soulmates. What is the incorrect understanding of this passage?
-(A) People's hearts can communicate and resonate with each other.
-(B) "Knowing people and discussing the world" has been used as a method of literary criticism to this day and has shown strong vitality.
-(C) "Knowing people and discussing the world" can help later generations befriend the ancients and become their soulmates.
-(D) Knowing about the ancients and their times helps to understand their works.
-Step-by-step answer:
-1.From this passage, we cannot conclude (B), that "knowing people and discussing the world" as a method of literary criticism has been used to this day and has shown strong vitality.
-2.Although "knowing people and discussing the world" was indeed proposed by the philosopher Mencius during the Warring States period as a principle and method of literary criticism, this passage does not mention that "knowing people and discussing the world" is still in use today, or that it has shown strong vitality.
-3.Therefore, option (B) is an incorrect understanding.
-So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Reading_Comprehension_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Reading_Comprehension_ZH-CoT.txt
@ -1,26 +0,0 @@
-请理解题目含义并选择正确答案。
-
-Q:有些广东人不爱吃辣椒.因此,有些南方人不爱吃辣椒. 以下哪项能保证上述论证的成立？
-(A) 有些广东人爱吃辣椒
-(B) 爱吃辣椒的有些是南方人
-(C) 所有的广东人都是南方人
-(D) 有些广东人不爱吃辣椒也不爱吃甜食
-A：让我们一步一步来思考。
-在这个论证中，我们从"有些广东人不爱吃辣椒"推断出"有些南方人不爱吃辣椒"。这个推理的成立依赖于广东人和南方人的关系。为了使这个推理有效，我们需要保证至少一部分广东人是南方人。因此，选项 (C) "所有的广东人都是南方人" 是能保证这个论证成立的。所以答案是(C)。
-
-
-Q:唐卡是极富藏族文化特色的一种绘画形式,自吐蕃王朝兴起至今已有1300多年的历史,是雪域高原的文化瑰宝.它的题材除宗教外,还有历史和民俗内容,故又被称为了解西藏的“百科全书”.所以,想要了解西藏的历史,除了正襟危坐地阅读严谨但略显呆板的史书外,你还可以选择一种惬意和愉悦的方式--欣赏唐卡,与众多的古人对话,想象曾经的历史事件,体味藏族人丰富的精神世界,了解独特的藏族民俗,这是一个让历史变得立体可感的过程. 这段文字意在说明:
-(A) 唐卡可以给大家提供一种惬意轻松的了解西藏的方式
-(B) 唐卡中记录了独特的藏族民俗和曾经的历史事件
-(C) 唐卡是了解西藏文化和历史的“百科全书”式的绘画形式
-(D) 唐卡是极富藏族文化特色且历史悠久的一种绘画形式
-A：让我们一步一步来思考。
-文中明确提到了，除了阅读严谨但略显呆板的史书外，欣赏唐卡是一种惬意和愉悦的方式，可以让人与众多的古人对话，想象曾经的历史事件，体味藏族人丰富的精神世界，了解独特的藏族民俗。所以这段文字的主要意图是 (A) "唐卡可以给大家提供一种惬意轻松的了解西藏的方式"。所以答案是(A)。
-
-Q:“知人论世”作为一种文学批评的原则和方法,最早由战国时期的思想家孟子提出.孟子认为,后人要交结古人,只是读其诗书是不行的,还必须了解他们的为人行事以及他们的生活的时代,这样,才能读懂古人的诗书,才能和古人心契神交,成为知音. 对这段话的理解,不正确的是?
-(A) 人的心灵是可以互通和共鸣的
-(B) “知人论世”作为一种文学评论法沿用至今并显现了强大的生命力
-(C) “知人论世”可以帮助后人交结古人和古人成为知音
-(D) 了解古人和他所处的时代,有助于理解他的作品
-A:让我们一步一步来思考。
-从这段话中，我们看不到（B）“知人论世”作为一种文学批评已经沿用至今，并显示出强大的生命力。虽然“知人论世”确实是战国时期哲学家孟子提出的一种文学批评的原则和方法，但这段话并没有提到“知人论世”在今天仍在使用，也没有提到它已经显示出强大的生命力。因此，选项（B）是一种错误的理解。所以答案是（B）。
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Sequence_Understanding_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Sequence_Understanding_Direct.txt
@ -1,22 +0,0 @@
-根据上下文选择正确答案
-
-Q: 下列人物按时间先后顺序排序正确的是？选项：
-(A) 秦始皇、诸葛亮、刘邦、白居易
-(B) 诸葛亮、秦始皇、刘邦、白居易
-(C) 秦始皇、刘邦、诸葛亮、白居易
-(D) 白居易、诸葛亮、刘邦、秦始皇
-A：(C)
-
-Q:下列描述年龄的词语按照年龄从小到大的顺序排序正确的是？选项：
-(A) 不惑、而立、知天命、花甲
-(B) 而立、不惑、知天命、花甲
-(C) 花甲、知天命、而立、不惑
-(D) 而立、花甲、不惑、知天命
-A：(B)
-
-Q:下列制作老式棒棒糖的步骤正确的是？选项：
-(A) 准备材料、将糖浆倒入模具、制作糖浆、冷却定型
-(B) 准备材料、制作糖浆、将糖浆倒入模具、冷却定型
-(C) 准备材料、将糖浆倒入模具、冷却定型、制作糖浆
-(D) 准备材料、冷却定型、制作糖浆、将糖浆倒入模具
-A：(B)
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Sequence_Understanding_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Sequence_Understanding_EN-CoT.txt
@ -1,25 +0,0 @@
-根据上下文选择正确答案
-
-Q: 下列人物按时间先后顺序排序正确的是？选项：
-(A) 秦始皇、诸葛亮、刘邦、白居易
-(B) 诸葛亮、秦始皇、刘邦、白居易
-(C) 秦始皇、刘邦、诸葛亮、白居易
-(D) 白居易、诸葛亮、刘邦、秦始皇
-A: Let's think step by step.
-There are four characters mentioned in the options, among which Qin Shi Huang is from the Qin Dynasty, Zhuge Liang is from the Three Kingdoms period, Liu Bang is from the Han Dynasty period, and Bai Juyi is from the Tang Dynasty period. They are sorted in chronological order as Qin Dynasty, Han Dynasty, Three Kingdoms period, and Tang Dynasty. Therefore, the characters are sorted in chronological order as Qin Shi Huang, Liu Bang, Zhuge Liang, and Bai Juyi. So the answer is (C).
-
-Q:下列描述年龄的词语按照年龄从小到大的顺序排序正确的是？选项：
-(A) 不惑、而立、知天命、花甲
-(B) 而立、不惑、知天命、花甲
-(C) 花甲、知天命、而立、不惑
-(D) 而立、花甲、不惑、知天命
-A: Let's think step by step.
-The options mention four words that describe age, among which "Erli" refers to 30 years old, "Bu Huo" refers to 40 years old, "Zhi Tian Ming" refers to 50 years old, and "Hua Jia" refers to 60 years old. Therefore, in order of age, they are Erli, Bu Huo, Zhi Tian Ming, and Hua Jia. So the answer is (B).
-
-Q:下列制作老式棒棒糖的步骤正确的是？选项：
-(A) 准备材料、将糖浆倒入模具、制作糖浆、冷却定型
-(B) 准备材料、制作糖浆、将糖浆倒入模具、冷却定型
-(C) 准备材料、将糖浆倒入模具、冷却定型、制作糖浆
-(D) 准备材料、冷却定型、制作糖浆、将糖浆倒入模具
-A: Let's think step by step.
-The question mentions the steps to make old-fashioned lollipops, and the options include "preparing materials", "pouring syrup into the mold", "making syrup", and "cooling and shaping". According to the steps to make old-fashioned lollipops, the first step should be to prepare the materials, then make syrup, pour syrup into the mold, and finally cool and shape. So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Sequence_Understanding_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Sequence_Understanding_XLT.txt
@ -1,62 +0,0 @@
-根据上下文选择正确答案
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request: 下列人物按时间先后顺序排序正确的是？选项：
-(A) 秦始皇、诸葛亮、刘邦、白居易
-(B) 诸葛亮、秦始皇、刘邦、白居易
-(C) 秦始皇、刘邦、诸葛亮、白居易
-(D) 白居易、诸葛亮、刘邦、秦始皇
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: Which of the following lists these historical figures in correct chronological order? Options:
-(A) Qin Shi Huang, Zhuge Liang, Liu Bang, Bai Juyi
-(B) Zhuge Liang, Qin Shi Huang, Liu Bang, Bai Juyi
-(C) Qin Shi Huang, Liu Bang, Zhuge Liang, Bai Juyi
-(D) Bai Juyi, Zhuge Liang, Liu Bang, Qin Shi Huang
-Step-by-step answer:
-1.There are four characters mentioned in the options, among which Qin Shi Huang is from the Qin Dynasty, Zhuge Liang is from the Three Kingdoms period, Liu Bang is from the Han Dynasty period, and Bai Juyi is from the Tang Dynasty period.
-2.They are sorted in chronological order as Qin Dynasty, Han Dynasty, Three Kingdoms period, and Tang Dynasty.
-3.Therefore, the characters are sorted in chronological order as Qin Shi Huang, Liu Bang, Zhuge Liang, and Bai Juyi.
-So the answer is (C).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request: 下列描述年龄的词语按照年龄从小到大的顺序排序正确的是？选项：
-(A) 不惑、而立、知天命、花甲
-(B) 而立、不惑、知天命、花甲
-(C) 花甲、知天命、而立、不惑
-(D) 而立、花甲、不惑、知天命
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: Which of the following lists these words describing age in correct order from youngest to oldest? Options:
-(A) Bu Huo, Erli, Zhi Tian Ming, Hua Jia
-(B) Erli, Bu Huo, Zhi Tian Ming, Hua Jia
-(C) Hua Jia, Zhi Tian Ming, Erli, Bu Huo
-(D) Erli, Hua Jia, Bu Huo, Zhi Tian Ming
-Step-by-step answer:
-1.The options mention four words that describe age, among which "Erli" refers to 30 years old, "Bu Huo" refers to 40 years old, "Zhi Tian Ming" refers to 50 years old, and "Hua Jia" refers to 60 years old.
-2.Therefore, in order of age, they are Erli, Bu Huo, Zhi Tian Ming, and Hua Jia.
-So the answer is (B).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request: 下列制作老式棒棒糖的步骤正确的是？选项：
-(A) 准备材料、将糖浆倒入模具、制作糖浆、冷却定型
-(B) 准备材料、制作糖浆、将糖浆倒入模具、冷却定型
-(C) 准备材料、将糖浆倒入模具、冷却定型、制作糖浆
-(D) 准备材料、冷却定型、制作糖浆、将糖浆倒入模具
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: Which of the following steps is correct for making old-fashioned lollipops? Options:
-(A) Preparing the ingredients, pouring the syrup into the molds, making the syrup, cooling to set the shape
-(B) Prepare ingredients, make syrup, pour syrup into molds, cool to set
-(C) Prepare ingredients, pour syrup into mold, cool and set, make syrup
-(D) Prepare ingredients, cool and set, make syrup, pour syrup into molds
-Step-by-step answer:
-1.The question mentions the steps to make old-fashioned lollipops, and the options include "preparing materials", "pouring syrup into the mold", "making syrup", and "cooling and shaping".
-2.According to the steps to make old-fashioned lollipops, the first step should be to prepare the materials, then make syrup, pour syrup into the mold, and finally cool and shape.
-So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Sequence_Understanding_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Sequence_Understanding_ZH-CoT.txt
@ -1,25 +0,0 @@
-根据上下文选择正确答案
-
-Q: 下列人物按时间先后顺序排序正确的是？选项：
-(A) 秦始皇、诸葛亮、刘邦、白居易
-(B) 诸葛亮、秦始皇、刘邦、白居易
-(C) 秦始皇、刘邦、诸葛亮、白居易
-(D) 白居易、诸葛亮、刘邦、秦始皇
-A：让我们一步一步来思考。
-选项中提到了4个人物，其中秦始皇是秦朝的，诸葛亮是三国时期的，刘邦是汉朝时期的，白居易是唐朝时期的，按照时间先后排序为秦朝，汉朝，三国时期，唐朝，因此人物先后排序为秦始皇、刘邦、诸葛亮、白居易。所以答案是(C)。
-
-Q:下列描述年龄的词语按照年龄从小到大的顺序排序正确的是？选项：
-(A) 不惑、而立、知天命、花甲
-(B) 而立、不惑、知天命、花甲
-(C) 花甲、知天命、而立、不惑
-(D) 而立、花甲、不惑、知天命
-A：让我们一步一步来思考。
-选项中提到了四个描述年龄的词语，其中“而立”指30岁，“不惑”指40岁，“知天命”指50岁，“花甲”指60岁，因此按年龄从小到大排序是而立、不惑、知天命、花甲。所以答案是(B)。
-
-Q:下列制作老式棒棒糖的步骤正确的是？选项：
-(A) 准备材料、将糖浆倒入模具、制作糖浆、冷却定型
-(B) 准备材料、制作糖浆、将糖浆倒入模具、冷却定型
-(C) 准备材料、将糖浆倒入模具、冷却定型、制作糖浆
-(D) 准备材料、冷却定型、制作糖浆、将糖浆倒入模具
-A：让我们一步一步来思考。
-题目中提到了制作老式棒棒糖的步骤，选项中包含“准备材料”、“将糖浆倒入模具”、“制作糖浆”、“冷却定型”四个步骤，按照制作老式棒棒糖的步骤第一步应该是准备材料，然后需要制作糖浆，再把糖浆倒入模具，最后冷却定型。所以答案是(B)。
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Sport_Understanding_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Sport_Understanding_Direct.txt
@ -1,19 +0,0 @@
-判断一句人为构造的关于体育的句子是否可信。
-
-Q: 下面的句子可信吗? "运动员张培萌一传稳定如山，使得进攻端占据着主动"
-选项：
-(A)可信
-(B)不可信
-A: (B)
-
-Q: 下面的句子可信吗? "运动员郭艾伦打入任意球"
-选项：
-(A)可信
-(B)不可信
-A: (B)
-
-Q: 下面的句子可信吗? "运动员范智毅射门得分"
-选项：
-(A)可信
-(B)不可信
-A: (A)
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Sport_Understanding_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Sport_Understanding_EN-CoT.txt
@ -1,22 +0,0 @@
-判断一句人为构造的关于体育的句子是否可信。
-
-Q: 下面的句子可信吗? "运动员张培萌一传稳定如山，使得进攻端占据着主动"
-选项：
-(A)可信
-(B)不可信
-A: Let's think step by step.
-Zhang Peimeng is an athlete from China who competes in track and field. First pass is a term in volleyball. Since he is not a volleyball player, he won't be involved in the passing part of a volleyball game. So the answer is (B).
-
-Q: 下面的句子可信吗? "运动员郭艾伦打入任意球"
-选项：
-(A)可信
-(B)不可信
-A: Let's think step by step.
-Guo Ailun is a Chinese basketball player, and "free kick" is a term used in soccer. Since he is not a soccer player, he won't be involved in the free kick phase of a soccer match. So the answer is (B).
-
-Q: 下面的句子可信吗? "运动员范智毅射门得分"
-选项：
-(A)可信
-(B)不可信
-A: Let's think step by step.
-Fan Zhiyi is a Chinese football player, and "scoring a goal" is a term used in football. Since he is a professional football player, he is very likely to score a goal. So the answer is (A).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Sport_Understanding_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Sport_Understanding_XLT.txt
@ -1,56 +0,0 @@
-判断一句人为构造的关于体育的句子是否可信。
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：下面的句子可信吗? "运动员张培萌一传稳定如山，使得进攻端占据着主动"
-选项：
-(A)可信
-(B)不可信
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：Is the following sentence credible? "Athlete Zhang Peimeng's first pass was as steady as a mountain, allowing the offense to take the initiative."
-Option:
-(A) Credible
-(B) Not credible
-Step-by-step answer:
-1.Zhang Peimeng is an athlete from China who competes in track and field. First pass is a term in volleyball.
-2.Since he is not a volleyball player, he won't be involved in the passing part of a volleyball game.
-So the answer is (B).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：下面的句子可信吗? "运动员郭艾伦打入任意球"
-选项：
-(A)可信
-(B)不可信
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：Is the following sentence credible? "Guo Ailun scored a free kick"
-Option:
-(A) Credible
-(B) Not credible
-Step-by-step answer:
-1.Guo Ailun is a Chinese basketball player, and "free kick" is a term used in soccer.
-2.Since he is not a soccer player, he won't be involved in the free kick phase of a soccer match.
-So the answer is (B).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：下面的句子可信吗? "运动员范智毅射门得分"
-选项：
-(A)可信
-(B)不可信
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：Is the following sentence credible? "Athlete Fan Zhiyi shot and scored"
-Option:
-(A) Credible
-(B) Not credible
-Step-by-step answer:
-1.Fan Zhiyi is a Chinese football player, and "scoring a goal" is a term used in football.
-2.Since he is a professional football player, he is very likely to score a goal.
-So the answer is (A).
-
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Sport_Understanding_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Sport_Understanding_ZH-CoT.txt
@ -1,22 +0,0 @@
-判断一句人为构造的关于体育的句子是否可信。
-
-Q: 下面的句子可信吗? "运动员张培萌一传稳定如山，使得进攻端占据着主动"
-选项：
-(A)可信
-(B)不可信
-A: 让我们一步一步来思考。
-张培萌是一位中国的田径运动员，一传是排球运动术语。由于他并不是排球运动员，因此他不会参与到排球比赛中的传球环节。所以答案是(B)。
-
-Q: 下面的句子可信吗? "运动员郭艾伦打入任意球"
-选项：
-(A)可信
-(B)不可信
-A: 让我们一步一步来思考。
-郭艾伦是一位中国的篮球运动员，任意球是足球运动术语。由于他并不是足球运动员，因此他不会在参与到足球比赛的任意球环节，所以答案是(B)。
-
-Q: 下面的句子可信吗? "运动员范智毅射门得分"
-选项：
-(A)可信
-(B)不可信
-A: 让我们一步一步来思考。
-范智毅是一位中国的足球运动员，射门得分是足球运动术语。由于他是一名专业的足球运动员，因此他很可能射门得分，所以答案是(A)。
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Time_Understanding_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Time_Understanding_Direct.txt
@ -1,25 +0,0 @@
-根据上下文选择正确答案。
-
-Q：小华在丙申年出生，他的哥哥比他大6岁，所以他的哥哥出生在哪一年？
-选项：
-(A) 己卯年
-(B) 庚寅年
-(C) 丙申年
-(D) 乙丑年
-A：(B)
-
-Q：如果今年是甲辰年，李华的爷爷说：“今年是我的知天命之年”，请问赵婷爷爷的属相是什么？
-选项：
-(A) 狗
-(B) 虎
-(C) 鸡
-(D) 鼠
-A：(B)
-
-Q：小李在亥时三刻开始制作画画，他知道他需要10个小时才能完成。那么他最早在什么时辰可以完成？
-选项：
- (A) 辰时
- (B) 卯时
- (C) 午时
- (D) 未时
-A：(A)
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Time_Understanding_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Time_Understanding_EN-CoT.txt
@ -1,28 +0,0 @@
-根据上下文选择正确答案。
-
-Q：小华在丙申年出生，他的哥哥比他大6岁，所以他的哥哥出生在哪一年？
-选项：
-(A) 己卯年
-(B) 庚寅年
-(C) 丙申年
-(D) 乙丑年
-A：Let's think step by step.
-Xiaohua was born in the year of Bingshen, and his older brother was 6 years older than him. The sixth year before Bingshen was in the year of Gengyin, so his older brother was born in the year of Gengyin. So the answer is (B).
-
-Q：如果今年是甲辰年，李华的爷爷说：“今年是我的知天命之年”，请问赵婷爷爷的属相是什么？
-选项：
-(A) 狗
-(B) 虎
-(C) 鸡
-(D) 鼠
-A：Let's think step by step.
-The question says that this year is Grandpa's "year of knowing the destiny of heaven", a term that in ancient China referred to the age of 50. Therefore, Grandpa is 50 years old this year, which is the year of Jiachen. According to the Chinese Tiangan Dizhi chronology, the year 50 years before Jiachen is the year of Jiayin, so Grandpa was born in the year of Jiayin, which is the year of the Tiger. Therefore, Grandpa belongs to the Year of the Tiger. So the answer is (B).
-
-Q：小李在亥时三刻开始制作画画，他知道他需要10个小时才能完成。那么他最早在什么时辰可以完成？
-选项：
- (A) 辰时
- (B) 卯时
- (C) 午时
- (D) 未时
-A：Let's think step by step.
-According to the ancient Chinese timing method, the third quarter of the pig hour (Hai) is approximately 21:45, and 10 hours later it is 7:45, which falls in the Dragon Hour (Chen). So the answer is (A).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Time_Understanding_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Time_Understanding_XLT.txt
@ -1,68 +0,0 @@
-根据上下文选择正确答案。
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：小华在丙申年出生，他的哥哥比他大6岁，所以他的哥哥出生在哪一年？
-选项：
-(A) 己卯年
-(B) 庚寅年
-(C) 丙申年
-(D) 乙丑年
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: Xiaohua was born in the year of Bingshen, and his brother is 6 years older than him, so in which year was his brother born?
-Option:
-(A) Jimao Year
-(B) Gengyin Year
-(C) Bingshen Year
-(D) Yichou Year
-Step-by-step answer:
-1.Xiaohua was born in the year of Bingshen, and his older brother is 6 years older than him. According to the Chinese Tian Gan Di Zhi chronology, the sixth year before Bingshen is the year of Gengyin.
-2.So his brother was born in the year of Gengyin.
-So the answer is (B).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：如果今年是甲辰年，李华的爷爷说：“今年是我的知天命之年”，请问赵婷爷爷的属相是什么？
-选项：
-(A) 狗
-(B) 虎
-(C) 鸡
-(D) 鼠
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: If this year is the year of Jiachen and Li Hua's grandfather says, "This year is my year of knowing fate." What is the zodiac sign of Grandpa Zhao Ting?
-Option:
-(A) Dog
-(B) Tiger
-(C) Rooster
-(D) Rat
-Step-by-step answer:
-1.The question says that this year is Grandpa's "year of knowing destiny", and in ancient China, "year of knowing destiny" referred to the age of 50, so Grandpa is 50 years old this year.
-2.This year is the year of Jiachen. According to the Chinese Tiangan and Dizhi chronology, the year Grandpa was born, 50 years earlier, is the year of Jiayin.
-3.The year of Jiayin is the year of the Tiger, so Grandpa's zodiac sign is the Tiger.
-So the answer is (B).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：小李在亥时三刻开始制作画画，他知道他需要10个小时才能完成。那么他最早在什么时辰可以完成？
-选项：
- (A) 辰时
- (B) 卯时
- (C) 午时
- (D) 未时
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：Xiao Li started making paintings at the third quarter of the pig hour, and he knew it would take him 10 hours to complete. So when can he finish it at the earliest?
-Option:
-(A) Dragon Hour
-(B) Rabbit Hour
-(C) Horse Hour
-(D) Sheep Hour
-Step-by-step answer:
-1.According to the ancient Chinese timing method, the third quarter of the hour of Hai (the pig hour) refers to approximately 21:45.
-2.Ten hours later, it will be 7:45, which falls in the Dragon Hour.
-So the answer is (A).
--- a/configs/datasets/CHARM/few-shot-examples/Chinese_Time_Understanding_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Chinese_Time_Understanding_ZH-CoT.txt
@ -1,28 +0,0 @@
-根据上下文选择正确答案。
-
-Q：小华在丙申年出生，他的哥哥比他大6岁，所以他的哥哥出生在哪一年？
-选项：
-(A) 己卯年
-(B) 庚寅年
-(C) 丙申年
-(D) 乙丑年
-A：让我们一步一步来思考。
-小华在丙申年出生，他的哥哥比他大6岁，按照中国天干地支纪年法，丙申年之前的第六年是庚寅年，所以他的哥哥在庚寅年出生。所以答案是(B)。
-
-Q：如果今年是甲辰年，李华的爷爷说：“今年是我的知天命之年”，请问赵婷爷爷的属相是什么？
-选项：
-(A) 狗
-(B) 虎
-(C) 鸡
-(D) 鼠
-A：让我们一步一步来思考。
-题目中提到爷爷是“知天命之年”，中国古代“知天命之年”代指50岁，所以爷爷今年50岁，今年是甲辰年，根据中国天干地支纪年法推算，爷爷出生那年是甲寅年，甲寅年是虎年,所以爷爷属虎。所以答案是(B)。
-
-Q：小李在亥时三刻开始制作画画，他知道他需要10个小时才能完成。那么他最早在什么时辰可以完成？
-选项：
- (A) 辰时
- (B) 卯时
- (C) 午时
- (D) 未时
-A：让我们一步一步来思考。
-按照中国古代计时方法，亥时三刻大约指的是21点45分，10个小时后是7点45分，是辰时。所以答案是(A)。
--- a/configs/datasets/CHARM/few-shot-examples/Global_Anachronisms_Judgment_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Anachronisms_Judgment_Direct.txt
@ -1,22 +0,0 @@
-识别给定陈述是否包含时代错误。
-
-Q：以下陈述是否包含时代错误，一个接受了义务教育、具备基本常识的人会如何回答？
-在硫磺岛登陆作战期间，拉尔夫大声对着收音机说话。
-选项：
-(A) 是
-(B) 否
-A：(B)
-
-Q：以下陈述是否包含时代错误，一个接受了义务教育、具备基本常识的人会如何回答？
-在硫磺岛登陆作战期间，拉尔夫大声对着他的 iPhone 说话。
-选项：
-(A) 是
-(B) 否
-A：(A)
-
-Q：以下陈述是否包含时代错误，一个接受了义务教育、具备基本常识的人会如何回答？
-没有什么比萨莉·海明斯边看 HBO 的《真探》边织毛衣更令人满足。
-选项：
-(A) 是
-(B) 否
-A：(A)
--- a/configs/datasets/CHARM/few-shot-examples/Global_Anachronisms_Judgment_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Anachronisms_Judgment_EN-CoT.txt
@ -1,25 +0,0 @@
-识别给定陈述是否包含时代错误。
-
-Q：以下陈述是否包含时代错误，一个接受了义务教育、具备基本常识的人会如何回答？
-在硫磺岛登陆作战期间，拉尔夫大声对着收音机说话。
-选项：
-(A) 是
-(B) 否
-A: Let's think step by step.
-The statement mentions “the landing operations on Iwo Jima,” which refers to a historical event during World War II. The use of radios for communication among military personnel during that time is accurate and appropriate. So the answer is (B).
-
-Q：以下陈述是否包含时代错误，一个接受了义务教育、具备基本常识的人会如何回答？
-在硫磺岛登陆作战期间，拉尔夫大声对着他的 iPhone 说话。
-选项：
-(A) 是
-(B) 否
-A: Let's think step by step.
-The statement mentions “the landing operations on Iwo Jima,” which refers to a historical event during World War II. However, the mention of Ralph speaking loudly into his iPhone introduces an anachronism. The iPhone is a modern-day smartphone that was not available during the landing operations on Iwo Jima in 1945. So the answer is (A).
-
-Q：以下陈述是否包含时代错误，一个接受了义务教育、具备基本常识的人会如何回答？
-没有什么比萨莉·海明斯边看 HBO 的《真探》边织毛衣更令人满足。
-选项：
-(A) 是
-(B) 否
-A: Let's think step by step.
-The statement mentions Sally Hemings, who was an enslaved woman in the United States during the late 18th and early 19th centuries. However, the mention of watching HBO’s True Detective, which is a modern television show, introduces an anachronism. During Sally Hemings’ time, television did not exist, and the specific mention of watching a specific show like True Detective is clearly out of place for that historical period. So the answer is (A).
--- a/configs/datasets/CHARM/few-shot-examples/Global_Anachronisms_Judgment_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Anachronisms_Judgment_XLT.txt
@ -1,61 +0,0 @@
-识别给定陈述是否包含时代错误。
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：以下陈述是否包含时代错误，一个接受了义务教育、具备基本常识的人会如何回答？
-在硫磺岛登陆作战期间，拉尔夫大声对着收音机说话。
-选项：
-(A) 是
-(B) 否
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：How would a person with compulsory education and basic common sense answer whether the following statement contains an anachronism?
-During the landing operations on Iwo Jima, Ralph spoke loudly into the radio.
-Options:
-(A) Yes
-(B) No
-Step-by-step answer:
-1.The statement mentions “the landing operations on Iwo Jima,” which refers to a historical event during World War II.
-2.The use of radios for communication among military personnel during that time is accurate and appropriate.
-So the answer is (B).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：以下陈述是否包含时代错误，一个接受了义务教育、具备基本常识的人会如何回答？
-在硫磺岛登陆作战期间，拉尔夫大声对着他的 iPhone 说话。
-选项：
-(A) 是
-(B) 否
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：How would a person with compulsory education and basic common sense answer whether the following statement contains an anachronism?
-During the landing operations on Iwo Jima, Ralph spoke loudly into his iPhone.
-Options:
-(A) Yes
-(B) No
-Step-by-step answer:
-1.The statement mentions “the landing operations on Iwo Jima,” which refers to a historical event during World War II.
-2.However, the mention of Ralph speaking loudly into his iPhone introduces an anachronism. The iPhone is a modern-day smartphone that was not available during the landing operations on Iwo Jima in 1945.
-So the answer is (A).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：以下陈述是否包含时代错误，一个接受了义务教育、具备基本常识的人会如何回答？
-没有什么比萨莉·海明斯边看 HBO 的《真探》边织毛衣更令人满足。
-选项：
-(A) 是
-(B) 否
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：How would a person with compulsory education and basic common sense answer whether the following statement contains an anachronism?
-Nothing is more satisfying than Sally Hemings knitting a sweater while watching HBO's True Detective.
-Options:
-(A) Yes
-(B) No
-Step-by-step answer:
-1.The statement mentions Sally Hemings, who was an enslaved woman in the United States during the late 18th and early 19th centuries.
-2.However, the mention of watching HBO’s True Detective, which is a modern television show, introduces an anachronism. During Sally Hemings’ time, television did not exist, and the specific mention of watching a specific show like True Detective is clearly out of place for that historical period.
-So the answer is (A).
--- a/configs/datasets/CHARM/few-shot-examples/Global_Anachronisms_Judgment_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Anachronisms_Judgment_ZH-CoT.txt
@ -1,25 +0,0 @@
-识别给定陈述是否包含时代错误。
-
-Q：以下陈述是否包含时代错误，一个接受了义务教育、具备基本常识的人会如何回答？
-在硫磺岛登陆作战期间，拉尔夫大声对着收音机说话。
-选项：
-(A) 是
-(B) 否
-A：让我们一步一步来思考。
-这个陈述提到了“硫磺岛登陆作战期间”，这是指二战期间的一个历史事件。在当时，军事人员之间使用收音机进行通信是准确和恰当的。所以答案是(B)。
-
-Q：以下陈述是否包含时代错误，一个接受了义务教育、具备基本常识的人会如何回答？
-在硫磺岛登陆作战期间，拉尔夫大声对着他的 iPhone 说话。
-选项：
-(A) 是
-(B) 否
-A：让我们一步一步来思考。
-这个陈述提到了“硫磺岛登陆作战期间”，这是指二战期间的一个历史事件。然而，提到拉尔夫大声对着他的 iPhone 说话引入了一个时代错误。iPhone 是现代的智能手机，在 1945 年硫磺岛登陆作战期间是不存在的。所以答案是(A)。
-
-Q：以下陈述是否包含时代错误，一个接受了义务教育、具备基本常识的人会如何回答？
-没有什么比萨莉·海明斯边看 HBO 的《真探》边织毛衣更令人满足。
-选项：
-(A) 是
-(B) 否
-A：让我们一步一步来思考。
-这个陈述提到了萨莉·海明斯，她是 18 世纪末到 19 世纪初美国的一个被奴役的女性。然而，提到她边看 HBO 的《真探》边织毛衣引入了一个时代错误。在萨莉·海明斯所处的时代，电视是不存在的，而且具体提到观看像《真探》这样的特定节目在那个历史时期显然是不合适的。所以答案是(A)。
--- a/configs/datasets/CHARM/few-shot-examples/Global_Movie_and_Music_Recommendation_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Movie_and_Music_Recommendation_Direct.txt
@ -1,25 +0,0 @@
-给根据给定艺术作品清单，找出最类似的。
-
-Q: 寻找一部与《勇敢的心》、《风月俏佳人》、《辛德勒的名单》、《阿波罗13号》类似的电影：
-选项：
-(A)《星际迷航》
-(B)《我盛大的希腊婚礼2》
-(C)《圣诞老人2》
-(D)《与狼共舞》
-A: (D)
-
-Q: 寻找一部与《勇敢的心》、《风月俏佳人》、《阿波罗13号》、《与狼共舞》类似的电影：
-选项：
-(A)《蝙蝠侠：突袭阿卡姆》
-(B)《肖申克的救赎》
-(C)《玩具总动员》
-(D)《狮子王》
-A: (B)
-
-Q: 寻找一部与《惊世骇案》、《勇敢的心》、《低俗小说》、《辛德勒的名单》类似的电影：
-选项：
-(A)《卡里加里博士的小屋》
-(B)《肖申克的救赎》
-(C)《蜘蛛侠2》
-(D)《出租车》
-A: (B)
--- a/configs/datasets/CHARM/few-shot-examples/Global_Movie_and_Music_Recommendation_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Movie_and_Music_Recommendation_EN-CoT.txt
@ -1,40 +0,0 @@
-给根据给定艺术作品清单，找出最类似的。
-
-Q: 寻找一部与《勇敢的心》、《风月俏佳人》、《辛德勒的名单》、《阿波罗13号》类似的电影：
-选项：
-(A)《星际迷航》
-(B)《我盛大的希腊婚礼2》
-(C)《圣诞老人2》
-(D)《与狼共舞》
-A: Let's think step by step.
-Star Trek is a science fiction film that, despite its depth and complexity, differs significantly in theme and style from the four aforementioned films.
-My Big Fat Greek Wedding 2 is a light-hearted comedy film that differs significantly in theme and style from the four aforementioned films.
-The Santa Clause 2 is a Christmas-themed family movie, which also differs significantly in theme and style from the four aforementioned films.
-Dances with Wolves is a film that depicts relations with Native Americans during the pioneering period of the American West.
-The theme and style of Dances with Wolves are consistent with the four films mentioned above. This movie, like Braveheart, Pretty Woman, Schindler's List, and Apollo 13, is a historical film with depth and seriousness. So the answer is (D).
-
-Q: 寻找一部与《勇敢的心》、《风月俏佳人》、《阿波罗13号》、《与狼共舞》类似的电影：
-选项：
-(A)《蝙蝠侠：突袭阿卡姆》
-(B)《肖申克的救赎》
-(C)《玩具总动员》
-(D)《狮子王》
-A: Let's think step by step.
-Batman: Assault on Arkham is a superhero film that differs significantly in theme and style from the four aforementioned films.
-The Shawshank Redemption is a 1994 American drama film directed by Frank Darabont and starring Tim Robbins and Morgan Freeman. It is a film about hope and perseverance.
-Toy Story is an animated film; although it may have some themes of adventure and friendship, its theme and style differ significantly from the four aforementioned films.
-Although The Lion King is a classic animated film that covers themes of courage and growth, its theme and style differ significantly from the four aforementioned films.
-The Shawshank Redemption, like Braveheart, Pretty Woman, Apollo 13, and Dances with Wolves, is a film with depth and seriousness, and its theme and style are similar to those four films. So the answer is (B).
-
-Q: 寻找一部与《惊世骇案》、《勇敢的心》、《低俗小说》、《辛德勒的名单》类似的电影：
-选项：
-(A)《卡里加里博士的小屋》
-(B)《肖申克的救赎》
-(C)《蜘蛛侠2》
-(D)《出租车》
-A: Let's think step by step.
-The Cabinet of Dr. Caligari is a 1920 German expressionist silent film directed by Robert Wiene. This film is often considered a milestone in German expressionist cinema and one of the earliest horror films.
-The Shawshank Redemption is a 1994 American drama film directed by Frank Darabont and starring Tim Robbins and Morgan Freeman. It is a film about hope and perseverance.
-Spider-Man 2 is a 2004 American superhero film directed by Sam Raimi and starring Tobey Maguire. It is the second installment of the Spider-Man trilogy.
-Taxi is a 2004 American action comedy film directed by Tim Story, starring Jimmy Fallon and Queen Latifah. This movie is an American remake of a 1998 French film.
-The Usual Suspects, Braveheart, Pulp Fiction, and Schindler's List, the films named in the question, are all profound, richly plotted films with strong human themes. They have all won high praise from audiences and critics for their excellent scripts, brilliant performances, and profound themes. The Shawshank Redemption tells the story of a wrongly convicted banker who maintains hope in prison and ultimately escapes. Its plot is deeply moving, its portrayal of human nature is profound, and it has many similarities with the films in the question. So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Global_Movie_and_Music_Recommendation_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Movie_and_Music_Recommendation_XLT.txt
@ -1,76 +0,0 @@
-给根据给定艺术作品清单，找出最类似的。
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：寻找一部与《勇敢的心》、《风月俏佳人》、《辛德勒的名单》、《阿波罗13号》类似的电影：
-选项：
-(A)《星际迷航》
-(B)《我盛大的希腊婚礼2》
-(C)《圣诞老人2》
-(D)《与狼共舞》
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：Find a movie similar to Braveheart, Pretty Woman, Schindler's List, Apollo 13:
-Options:
-(A) Star Trek
-(B) My Big Fat Greek Wedding 2
-(C) The Santa Clause 2
-(D) Dances with Wolves
-Step-by-step answer:
-1.Star Trek is a science fiction film that, despite its depth and complexity, differs significantly in theme and style from the four aforementioned films.
-2.My Big Fat Greek Wedding 2 is a light-hearted comedy film that differs significantly in theme and style from the four aforementioned films.
-3.The Santa Clause 2 is a Christmas-themed family movie, which also differs significantly in theme and style from the four aforementioned films.
-4.Dances with Wolves is a film that depicts relations with Native Americans during the pioneering period of the American West.
-5.The theme and style of Dances with Wolves are consistent with the four films mentioned above. This movie, like Braveheart, Pretty Woman, Schindler's List, and Apollo 13, is a historical film with depth and seriousness.
-So the answer is (D).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：寻找一部与《勇敢的心》、《风月俏佳人》、《阿波罗13号》、《与狼共舞》类似的电影：
-选项：
-(A)《蝙蝠侠：突袭阿卡姆》
-(B)《肖申克的救赎》
-(C)《玩具总动员》
-(D)《狮子王》
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：Find a movie similar to Braveheart, Pretty Woman, Apollo 13, Dances with Wolves:
-Options:
-(A) Batman Assault on Arkham
-(B) The Shawshank Redemption
-(C) Toy Story
-(D) The Lion King
-Step-by-step answer:
-1.Batman: Assault on Arkham is a superhero film that differs significantly in theme and style from the four aforementioned films.
-2.The Shawshank Redemption is a 1994 American drama film directed by Frank Darabont and starring Tim Robbins and Morgan Freeman. It is a film about hope and perseverance.
-3.Toy Story is an animated film; although it may have some themes of adventure and friendship, its theme and style differ significantly from the four aforementioned films.
-4.Although The Lion King is a classic animated film that covers themes of courage and growth, its theme and style differ significantly from the four aforementioned films.
-5.The Shawshank Redemption, like Braveheart, Pretty Woman, Apollo 13, and Dances with Wolves, is a film with depth and seriousness, and its theme and style are similar to those four films.
-So the answer is (B).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：寻找一部与《惊世骇案》、《勇敢的心》、《低俗小说》、《辛德勒的名单》类似的电影：
-选项：
-(A)《卡里加里博士的小屋》
-(B)《肖申克的救赎》
-(C)《蜘蛛侠2》
-(D)《出租车》
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request：Find a movie similar to The Usual Suspects, Braveheart, Pulp Fiction, Schindler's List:
-Options:
-(A) The Cabinet of Dr Caligari
-(B) The Shawshank Redemption
-(C) Spider-Man 2
-(D) Taxi
-Step-by-step answer:
-1.The Cabinet of Dr. Caligari is a 1920 German expressionist silent film directed by Robert Wiene. This film is often considered a milestone in German expressionist cinema and one of the earliest horror films.
-2.The Shawshank Redemption is a 1994 American drama film directed by Frank Darabont and starring Tim Robbins and Morgan Freeman. It is a film about hope and perseverance.
-3.Spider-Man 2 is a 2004 American superhero film directed by Sam Raimi and starring Tobey Maguire. It is the second installment of the Spider-Man trilogy.
-4.Taxi is a 2004 American action comedy film directed by Tim Story, starring Jimmy Fallon and Queen Latifah. This movie is an American remake of a 1998 French film.
-5.The Usual Suspects, Braveheart, Pulp Fiction, and Schindler's List, the films named in the question, are all profound, richly plotted films with strong human themes. They have all won high praise from audiences and critics for their excellent scripts, brilliant performances, and profound themes. The Shawshank Redemption tells the story of a wrongly convicted banker who maintains hope in prison and ultimately escapes. Its plot is deeply moving, its portrayal of human nature is profound, and it has many similarities with the films in the question.
-So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Global_Movie_and_Music_Recommendation_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Movie_and_Music_Recommendation_ZH-CoT.txt
@ -1,40 +0,0 @@
-给根据给定艺术作品清单，找出最类似的。
-
-Q: 寻找一部与《勇敢的心》、《风月俏佳人》、《辛德勒的名单》、《阿波罗13号》类似的电影：
-选项：
-(A)《星际迷航》
-(B)《我盛大的希腊婚礼2》
-(C)《圣诞老人2》
-(D)《与狼共舞》
-A: 让我们一步一步来思考。
-《星际迷航》是一部科幻电影，虽然它具有深度和复杂性，但它的主题和风格与上述四部电影有较大的差异。
-《我盛大的希腊婚礼2》是一部轻松的喜剧电影，与上述四部电影的主题和风格有较大的差异。
-《圣诞老人2》是一部圣诞主题的家庭电影，与上述四部电影的主题和风格也有较大的差异。
-《与狼共舞》是一部描绘美国西部开拓时期与原住民关系的电影.
-其中《与狼共舞》的主题和风格与上述四部电影相符。这部电影与《勇敢的心》、《风月俏佳人》、《辛德勒的名单》、《阿波罗13号》一样，都是历史题材的电影，具有深度和严肃性。所以答案是(D)。
-
-Q: 寻找一部与《勇敢的心》、《风月俏佳人》、《阿波罗13号》、《与狼共舞》类似的电影：
-选项：
-(A)《蝙蝠侠：突袭阿卡姆》
-(B)《肖申克的救赎》
-(C)《玩具总动员》
-(D)《狮子王》
-A: 让我们一步一步来思考。
-《蝙蝠侠：突袭阿卡姆》是一部超级英雄电影，它的主题和风格与上述四部电影有较大的差异。
-《肖申克的救赎》是一部1994年的美国剧情电影，由弗兰克·德拉邦特执导，蒂姆·罗宾斯和摩根·弗里曼主演，这是一部讲述希望和毅力的电影。
-《玩具总动员》是一部动画电影，虽然它可能具有一些冒险和友谊的主题，但其主题和风格与上述四部电影有较大的差异。
-《狮子王》虽然是一部经典的动画电影，涵盖了勇气和成长的主题，但其主题和风格与上述四部电影有较大的差异。
-其中《肖申克的救赎》这部电影与《勇敢的心》、《风月俏佳人》、《阿波罗13号》、《与狼共舞》一样，都是具有深度和严肃性的电影，并且主题和风格与其他三部电影相似。所以答案是(B)。
-
-Q: 寻找一部与《惊世骇案》、《勇敢的心》、《低俗小说》、《辛德勒的名单》类似的电影：
-选项：
-(A)《卡里加里博士的小屋》
-(B)《肖申克的救赎》
-(C)《蜘蛛侠2》
-(D)《出租车》
-A: 让我们一步一步来思考。
-《卡里加里博士的小屋》是一部1920年的德国表现主义默片，由罗伯特·维内执导。这部电影通常被认为是德国表现主义电影的一部里程碑式作品，也是最早的恐怖电影之一。
-《肖申克的救赎》是一部1994年的美国剧情电影，由弗兰克·德拉邦特执导，蒂姆·罗宾斯和摩根·弗里曼主演，这是一部讲述希望和毅力的电影。
-《蜘蛛侠2》是一部2004年的美国超级英雄电影，由萨姆·雷米执导，托比·马奎尔主演，是《蜘蛛侠》三部曲的第二部。
-《出租车》这是一部2004年的美国喜剧动作片，由蒂姆·斯托瑞执导，吉米·福兰和昆汀·拉塔法主演。这部电影是1998年法国电影的美国翻拍版。
-而题目中《惊世骇案》、《勇敢的心》、《低俗小说》和《辛德勒的名单》都是一些非常深刻、情节丰富且具有强烈人性主题的电影。它们都以其出色的剧本、精彩的表演和深刻的主题赢得了观众和评论家的高度赞誉。选项中《肖申克的救赎》讲述了一名被冤枉的银行家如何在监狱中保持希望，并最终逃脱的故事。这部电影的情节深入人心，人性描绘深刻，与题目中的电影有许多相似之处。所以答案是(B)。
--- a/configs/datasets/CHARM/few-shot-examples/Global_Natural_Language_Inference_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Natural_Language_Inference_Direct.txt
@ -1,25 +0,0 @@
-请根据题目中两句话的关系选择正确答案。
-
-Q:语句一：可是老人小心翼翼将蛇挑开,让它爬向草丛,嘴里念念有词:罪过,罪过,这本来是你的家呀
-语句二：老人心里十分难过。
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A:(A)
-
-Q:语句一：她是一个有着丰满的脸、丰满的嘴唇和大牙齿的黑色爆炸头女人。
-语句二：她喜欢抹红色的口红，穿红色的衣服。
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A:(C)
-
-Q:语句一：你不确定你已经清楚你站着谁的一面。
-语句二：你支持谁，这一点显而易见。
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A:(B)
--- a/configs/datasets/CHARM/few-shot-examples/Global_Natural_Language_Inference_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Natural_Language_Inference_EN-CoT.txt
@ -1,28 +0,0 @@
-请根据题目中两句话的关系选择正确答案。
-
-Q:语句一：可是老人小心翼翼将蛇挑开,让它爬向草丛,嘴里念念有词:罪过,罪过,这本来是你的家呀
-语句二：老人心里十分难过。
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A: Let's think step by step.
-The first sentence describes an old man carefully moving a snake aside. The old man says "sin, sin," indicating that he feels guilty and sad for intruding on the snake's territory. The second sentence, that the old man is very sad in his heart, can be inferred from this. Therefore, the first sentence entails the second. So the answer is (A).
-
-Q:语句一：她是一个有着丰满的脸、丰满的嘴唇和大牙齿的黑色爆炸头女人。
-语句二：她喜欢抹红色的口红，穿红色的衣服。
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A: Let's think step by step.
-These two sentences both describe the same woman, but they focus on different characteristics. The first sentence describes her physical characteristics, including face, lips, teeth, and hairstyle. The second sentence describes her aesthetic preferences, including lipstick color and clothing color. These two sentences do not have any obvious implication or contradictory relationship, so we can say that they are unrelated. So the answer is (C).
-
-Q:语句一：你不确定你已经清楚你站着谁的一面。
-语句二：你支持谁，这一点显而易见。
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A: Let's think step by step.
-The first sentence indicates that you are not sure who you support, while the second sentence clearly indicates that your position is obvious, which means you are clear about who you support. Therefore, the content of these two sentences is contradictory to each other. So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Global_Natural_Language_Inference_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Natural_Language_Inference_XLT.txt
@ -1,69 +0,0 @@
-请根据题目中两句话的关系选择正确答案。
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：语句一：可是老人小心翼翼将蛇挑开,让它爬向草丛,嘴里念念有词:罪过,罪过,这本来是你的家呀
-语句二：老人心里十分难过。
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request:Sentence 1: But the old man carefully lifted the snake aside, letting it crawl towards the grass, mumbling: sin, sin, this was your home in the first place!
-Sentence 2: The old man was very sad in his heart.
-What is the relationship between these two statements?
-(A) Entailment
-(B) Contradiction
-(C) Unrelated
-Step-by-step answer:
-1.The first sentence describes an old man carefully moving a snake aside.
-2.The old man says "sin, sin," indicating that he feels guilty and sad for intruding on the snake's territory.
-3.The second sentence, that the old man is very sad in his heart, can be inferred from this. Therefore, the first sentence entails the second.
-So the answer is (A).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：语句一：她是一个有着丰满的脸、丰满的嘴唇和大牙齿的黑色爆炸头女人。
-语句二：她喜欢抹红色的口红，穿红色的衣服。
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request:Sentence 1: She is a woman with a black afro, a full face, full lips, and big teeth.
-Sentence 2: She likes to wear red lipstick and red clothes.
-What is the relationship between these two statements?
-(A) Entailment
-(B) Contradiction
-(C) Unrelated
-Step-by-step answer:
-1.These two sentences both describe the same woman, but they focus on different characteristics.
-2.The first sentence describes her physical characteristics, including face, lips, teeth, and hairstyle.
-3.The second sentence describes her aesthetic preferences, including lipstick color and clothing color. There is no obvious entailment or contradiction between these two sentences, so we can say that they are unrelated.
-So the answer is (C).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：语句一：你不确定你已经清楚你站着谁的一面。
-语句二：你支持谁，这一点显而易见。
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request:Sentence 1: You're not sure you've made it clear whose side you're on.
-Sentence 2: It is obvious whose side you are on.
-What is the relationship between these two statements?
-(A) Entailment
-(B) Contradiction
-(C) Unrelated
-Step-by-step answer:
-1.The first sentence indicates that you are not sure who you support, while the second sentence clearly indicates that your position is obvious, which means you are clear about who you support.
-2.Therefore, the content of these two sentences is contradictory to each other.
-So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Global_Natural_Language_Inference_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Natural_Language_Inference_ZH-CoT.txt
@ -1,28 +0,0 @@
-请根据题目中两句话的关系选择正确答案。
-
-Q:语句一：可是老人小心翼翼将蛇挑开,让它爬向草丛,嘴里念念有词:罪过,罪过,这本来是你的家呀
-语句二：老人心里十分难过。
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A:让我们一步一步来思考。
-第一句话描述了一个老人小心地处理一条蛇的过程，老人说道“罪过，罪过”，说明他因为侵犯了蛇的领地而感到内疚和难过。可以推出第二句话，老人心里十分难过。因此两句话是蕴含关系。所以答案是(A)。
-
-Q:语句一：她是一个有着丰满的脸、丰满的嘴唇和大牙齿的黑色爆炸头女人。
-语句二：她喜欢抹红色的口红，穿红色的衣服。
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A:让我们一步一步来思考。
-这两句话都在描述同一个女人，但是它们关注的特征是不同的。第一句话描述了她的身体特征，包括脸、嘴唇、牙齿和发型。第二句话描述了她的审美喜好，包括口红颜色和衣服颜色。这两句话并没有明显的蕴含或矛盾关系，因此，我们可以说这两句话是无关的。所以答案是(C)。
-
-Q:语句一：你不确定你已经清楚你站着谁的一面。
-语句二：你支持谁，这一点显而易见。
-请问这两句话什么关系？
-(A) 蕴含
-(B) 矛盾
-(C) 无关
-A:让我们一步一步来思考。
-第一句话表示你不确定你支持谁，而第二句话则明确表明你的立场是显而易见的，也就是说你清楚自己支持谁。因此，这两句话的内容是相互矛盾的。所以答案是(B)。
--- a/configs/datasets/CHARM/few-shot-examples/Global_Reading_Comprehension_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Reading_Comprehension_Direct.txt
@ -1,22 +0,0 @@
-请理解题目含义并选择正确答案。
-
-Q:一个人摄入的精制糖和在消化过程中分解成糖的食物几乎是该人所有葡萄糖（一种糖）的饮食来源的血液。然而, 咖啡在消化过程中本身不会分解成糖, 有时会导致人的血糖水平急剧上升, 即使咖啡没有加奶油或任何甜味剂。 以下哪项，如果为真，最有助于解释咖啡对如上所述的血糖水平的影响？ 选项：
-(A) 久坐办公室工作的人更有可能喝大量咖啡和比从事需要持续体育锻炼的工作的人血糖水平更高。
-(B) 人们经常在饭后喝咖啡，这些食物由几种食物组成，这些食物在消化过程中会迅速分解成糖分。
-(C) 对许多人来说，一块巧克力蛋糕的消耗量与一杯普通咖啡相同的刺激效果。
-(D) 一小时内喝两杯以上的咖啡会增加人的压力水平，身体通过将储存的葡萄糖释放到血液中来应对压力。
-A:(D)
-
-Q:某人因为心理疾病尝试了几种不同的心理疗法:精神分析疗法,认知行为疗法以及沙盘游戏疗法.他说:"心理治疗过程让我非常不快乐,因此,这些疗法是无效的." 以下哪项如果为真,将最有力质疑上述的结论?? 选项：
-(A) 几种不同心理疗法所针对的心理疾病是不同的
-(B) 尝试多种心理疗法的人要比只尝试一种疗法的人快乐
-(C) 同时尝试不同心理疗法能够更容易找到可以起作用的方法
-(D) 治疗效果好的人在治疗过程中往往感觉不快乐
-A:(D)
-
-Q:一本书要成为畅销书,必须有可读性或者经过精心的包装. 如果上述断定成立,则以下哪项一定为真？? 选项：
-(A) 没有可读性又没有精心包装的书一定不是畅销书
-(B) 没经过精心包装的书一定不是畅销书
-(C) 有可读性的书一定是畅销书
-(D) 大多数人喜欢有可读性的畅销书
-A:(A)
--- a/configs/datasets/CHARM/few-shot-examples/Global_Reading_Comprehension_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Reading_Comprehension_EN-CoT.txt
@ -1,25 +0,0 @@
-请理解题目含义并选择正确答案。
-
-Q:一个人摄入的精制糖和在消化过程中分解成糖的食物几乎是该人所有葡萄糖（一种糖）的饮食来源的血液。然而, 咖啡在消化过程中本身不会分解成糖, 有时会导致人的血糖水平急剧上升, 即使咖啡没有加奶油或任何甜味剂。 以下哪项，如果为真，最有助于解释咖啡对如上所述的血糖水平的影响？ 选项：
-(A) 久坐办公室工作的人更有可能喝大量咖啡和比从事需要持续体育锻炼的工作的人血糖水平更高。
-(B) 人们经常在饭后喝咖啡，这些食物由几种食物组成，这些食物在消化过程中会迅速分解成糖分。
-(C) 对许多人来说，一块巧克力蛋糕的消耗量与一杯普通咖啡相同的刺激效果。
-(D) 一小时内喝两杯以上的咖啡会增加人的压力水平，身体通过将储存的葡萄糖释放到血液中来应对压力。
-A: Let's think step by step
-The key to the question is to identify a factor that can explain how coffee causes an increase in blood sugar levels, even though coffee itself does not break down into sugar during digestion. This requires us to find a mechanism by which coffee indirectly affects blood sugar levels. Option (D), "Drinking two or more cups of coffee within an hour can increase a person's stress level, and the body responds to stress by releasing stored glucose into the bloodstream," if true, is most helpful in explaining the impact of coffee on blood sugar levels. This option provides a possible mechanism: the stress response caused by coffee may lead the body to release stored glucose, resulting in an increase in blood sugar levels. This explains why coffee (even without cream or sweeteners) can raise blood sugar levels even though it does not break down into sugar during digestion. So the answer is (D).
-
-Q:某人因为心理疾病尝试了几种不同的心理疗法:精神分析疗法,认知行为疗法以及沙盘游戏疗法.他说:"心理治疗过程让我非常不快乐,因此,这些疗法是无效的." 以下哪项如果为真,将最有力质疑上述的结论?? 选项：
-(A) 几种不同心理疗法所针对的心理疾病是不同的
-(B) 尝试多种心理疗法的人要比只尝试一种疗法的人快乐
-(C) 同时尝试不同心理疗法能够更容易找到可以起作用的方法
-(D) 治疗效果好的人在治疗过程中往往感觉不快乐
-A: Let's think step by step
-From the question, it can be inferred that if unhappiness during psychotherapy is a sign of good treatment outcomes, then the person's conclusion that the therapy is ineffective because he felt unhappy during the treatment may be incorrect. He may have mistaken the difficulties and discomfort of the treatment process for signs of ineffective treatment, when in reality they may be a sign that treatment is working. Option (D), "People with good treatment outcomes often feel unhappy during the treatment process," if true, would most strongly challenge the above conclusion. So the answer is (D).
-
-Q:一本书要成为畅销书,必须有可读性或者经过精心的包装. 如果上述断定成立,则以下哪项一定为真？选项：
-(A) 没有可读性又没有精心包装的书一定不是畅销书
-(B) 没经过精心包装的书一定不是畅销书
-(C) 有可读性的书一定是畅销书
-(D) 大多数人喜欢有可读性的畅销书
-A: Let's think step by step
-According to the above assertion, "To become a bestseller, a book must be readable or carefully packaged." Therefore, (A) "A book that is neither readable nor carefully packaged is definitely not a bestseller" must be true, because if a book is neither readable nor carefully packaged, it does not meet the conditions to become a bestseller. So the answer is (A).
--- a/configs/datasets/CHARM/few-shot-examples/Global_Reading_Comprehension_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Reading_Comprehension_XLT.txt
@ -1,61 +0,0 @@
-请理解题目含义并选择正确答案。
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：一个人摄入的精制糖和在消化过程中分解成糖的食物几乎是该人所有葡萄糖（一种糖）的饮食来源的血液。然而, 咖啡在消化过程中本身不会分解成糖, 有时会导致人的血糖水平急剧上升, 即使咖啡没有加奶油或任何甜味剂。 以下哪项，如果为真，最有助于解释咖啡对如上所述的血糖水平的影响？ 选项：
-(A) 久坐办公室工作的人更有可能喝大量咖啡和比从事需要持续体育锻炼的工作的人血糖水平更高。
-(B) 人们经常在饭后喝咖啡，这些食物由几种食物组成，这些食物在消化过程中会迅速分解成糖分。
-(C) 对许多人来说，一块巧克力蛋糕的消耗量与一杯普通咖啡相同的刺激效果。
-(D) 一小时内喝两杯以上的咖啡会增加人的压力水平，身体通过将储存的葡萄糖释放到血液中来应对压力。
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: Refined sugar and foods that break down into sugar during digestion are the dietary source of almost all of the glucose (a type of sugar) in a person's bloodstream. However, coffee, which itself does not break down into sugar during digestion, can sometimes cause a person's blood sugar levels to rise sharply, even if no cream or sweetener has been added to it. Which of the following, if true, would best help explain the effect of coffee on blood sugar levels as described above? Options:
-(A) People with sedentary office jobs are more likely to drink large amounts of coffee and to have higher blood sugar levels than people with jobs that require constant physical activity.
-(B) People often drink coffee after meals that consist of several foods that break down quickly into sugar during digestion.
-(C) For many people, eating a slice of chocolate cake has the same stimulating effect as drinking a regular cup of coffee.
-(D) Drinking more than two cups of coffee in an hour increases a person's stress level, and the body responds to stress by releasing stored glucose into the bloodstream.
-Step-by-step answer:
-1.The key to the question is to identify a factor that can explain how coffee causes an increase in blood sugar levels, even though coffee itself does not break down into sugar during digestion.
-2.This requires us to find a mechanism by which coffee indirectly affects blood sugar levels. Option (D), "Drinking two or more cups of coffee within an hour can increase a person's stress level, and the body responds to stress by releasing stored glucose into the bloodstream," if true, is most helpful in explaining the impact of coffee on blood sugar levels. This option provides a possible mechanism: the stress response caused by coffee may lead the body to release stored glucose, resulting in an increase in blood sugar levels.
-3.This explains why coffee (even without cream or sweeteners) can lead to an increase in blood sugar levels, even if it does not break down into sugar during digestion.
-So the answer is (D).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：某人因为心理疾病尝试了几种不同的心理疗法:精神分析疗法,认知行为疗法以及沙盘游戏疗法.他说:"心理治疗过程让我非常不快乐,因此,这些疗法是无效的." 以下哪项如果为真,将最有力质疑上述的结论?? 选项：
-(A) 几种不同心理疗法所针对的心理疾病是不同的
-(B) 尝试多种心理疗法的人要比只尝试一种疗法的人快乐
-(C) 同时尝试不同心理疗法能够更容易找到可以起作用的方法
-(D) 治疗效果好的人在治疗过程中往往感觉不快乐
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: A person has tried several different psychotherapies for a mental illness:psychoanalytic therapy, cognitive behavioral therapy, and sandplay therapy. He says: "The psychotherapy process made me very unhappy and, therefore, these therapies are ineffective." Which of the following, if true, would most strongly challenge the above conclusion? Options:
-(A) Several different psychotherapies target different mental illnesses.
-(B) People who try multiple psychotherapies are happier than those who try only one.
-(C) Trying different psychotherapies at the same time makes it easier to find the one that works
-(D) People who do well in therapy tend to feel unhappy during the therapy process
-Step-by-step answer:
-1.By understanding the question, it can be inferred that if unhappiness during psychotherapy is a sign of good treatment outcomes, then the person's conclusion that the therapy is ineffective because they feel unhappy during the treatment may be incorrect.
-2.He may have mistaken the difficulties and discomfort of the treatment process for signs of ineffective treatment, when in reality they may be a sign that treatment is working. Option (D), "People with good treatment outcomes often feel unhappy during the treatment process," if true, would most strongly challenge the above conclusion.
-So the answer is (D).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：一本书要成为畅销书,必须有可读性或者经过精心的包装. 如果上述断定成立,则以下哪项一定为真？选项：
-(A) 没有可读性又没有精心包装的书一定不是畅销书
-(B) 没经过精心包装的书一定不是畅销书
-(C) 有可读性的书一定是畅销书
-(D) 大多数人喜欢有可读性的畅销书You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: For a book to be a bestseller,it must be readable or well packaged. If the above assertion holds true, which of the following must be true? Options:
-(A) A book that is not readable and not well-packaged is not a bestseller.
-(B) A book that is not well-packaged must not be a bestseller.
-(C) A book that is readable must be a bestseller
-(D) Most people like bestsellers that are readable.
-Step-by-step answer:
-1.According to the above assertion, "To become a bestseller, a book must be readable or carefully packaged." Therefore, (A) "A book that is neither readable nor carefully packaged is definitely not a bestseller" must be true.
-2.This is because if a book is neither readable nor carefully packaged, it does not meet the conditions to become a bestseller.
-So the answer is (A).
--- a/configs/datasets/CHARM/few-shot-examples/Global_Reading_Comprehension_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Reading_Comprehension_ZH-CoT.txt
@ -1,25 +0,0 @@
-请理解题目含义并选择正确答案。
-
-Q:一个人摄入的精制糖和在消化过程中分解成糖的食物几乎是该人所有葡萄糖（一种糖）的饮食来源的血液。然而, 咖啡在消化过程中本身不会分解成糖, 有时会导致人的血糖水平急剧上升, 即使咖啡没有加奶油或任何甜味剂。 以下哪项，如果为真，最有助于解释咖啡对如上所述的血糖水平的影响？ 选项：
-(A) 久坐办公室工作的人更有可能喝大量咖啡和比从事需要持续体育锻炼的工作的人血糖水平更高。
-(B) 人们经常在饭后喝咖啡，这些食物由几种食物组成，这些食物在消化过程中会迅速分解成糖分。
-(C) 对许多人来说，一块巧克力蛋糕的消耗量与一杯普通咖啡相同的刺激效果。
-(D) 一小时内喝两杯以上的咖啡会增加人的压力水平，身体通过将储存的葡萄糖释放到血液中来应对压力。
-A:让我们一步一步来思考。
-题目的关键是要找出一个能够解释咖啡如何导致血糖水平上升的因素，尽管咖啡在消化过程中本身不会分解成糖。这就需要我们找到一个可以解释咖啡如何间接影响血糖水平的机制。选项 (D) "一小时内喝两杯以上的咖啡会增加人的压力水平，身体通过将储存的葡萄糖释放到血液中来应对压力" 如果为真，最有助于解释咖啡对血糖水平的影响。这个选项提供了一个可能的机制，即咖啡引起的压力反应可能导致身体释放储存的葡萄糖，从而导致血糖水平上升。这解释了为什么咖啡（即使没有加奶油或甜味剂）会导致血糖水平上升，即使它在消化过程中不会分解成糖。所以答案是(D)。
-
-Q:某人因为心理疾病尝试了几种不同的心理疗法:精神分析疗法,认知行为疗法以及沙盘游戏疗法.他说:"心理治疗过程让我非常不快乐,因此,这些疗法是无效的." 以下哪项如果为真,将最有力质疑上述的结论?? 选项：
-(A) 几种不同心理疗法所针对的心理疾病是不同的
-(B) 尝试多种心理疗法的人要比只尝试一种疗法的人快乐
-(C) 同时尝试不同心理疗法能够更容易找到可以起作用的方法
-(D) 治疗效果好的人在治疗过程中往往感觉不快乐
-A:让我们一步一步来思考。
-理解题目可知，如果心理治疗过程中的不快乐是治疗效果好的一个标志，那么这个人的结论——认为疗法无效，因为他在治疗过程中感到不快乐——就可能是错误的。他可能误解了治疗过程的困难和不适是治疗无效的表现，而实际上这可能是治疗正在起作用的标志。(D) "治疗效果好的人在治疗过程中往往感觉不快乐" 如果为真，将最有力质疑上述的结论。所以答案是(D)。
-
-Q:一本书要成为畅销书,必须有可读性或者经过精心的包装. 如果上述断定成立,则以下哪项一定为真？ 选项：
-(A) 没有可读性又没有精心包装的书一定不是畅销书
-(B) 没经过精心包装的书一定不是畅销书
-(C) 有可读性的书一定是畅销书
-(D) 大多数人喜欢有可读性的畅销书
-A:让我们一步一步来思考。
-根据上述断定，“一本书要成为畅销书,必须有可读性或者经过精心的包装”，所以 (A) "没有可读性又没有精心包装的书一定不是畅销书" 是一定为真的。因为如果一本书既没有可读性，也没有经过精心的包装，那么它就不满足成为畅销书的条件。所以答案是(A)。
--- a/configs/datasets/CHARM/few-shot-examples/Global_Sequence_Understanding_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Sequence_Understanding_Direct.txt
@ -1,22 +0,0 @@
-根据上下文选择正确答案
-
-Q:以下哪个列表按照时间顺序排列正确？选项：
-(A) 大学，硕士，博士，博士后
-(B) 大学，博士，硕士，博士后
-(C) 大学，高中，博士后，博士
-(D) 博士后，大学，硕士，博士
-A：(A)
-
-Q:以下哪个列表按照时间顺序排列正确？选项：
-(A) 五月一日，七月十五日，八月十二日，六月十四日
-(B) 七月十五日，八月十二日，五月一日，六月十四日
-(C) 八月十二日，五月一日，七月十五日，六月十四日
-(D) 五月一日，六月十四日，七月十五日，八月十二日
-A：(D)
-
-Q:以下哪个列表按时间顺序排列正确？选项：
-(A) 通往天国的倒计时，世纪末的魔术师，贝克街的亡灵，侦探们的镇魂歌
-(B) 世纪末的魔术师，贝克街的亡灵，通往天国的倒计时，侦探们的镇魂歌
-(C) 世纪末的魔术师，通往天国的倒计时，贝克街的亡灵，侦探们的镇魂歌
-(D) 世纪末的魔术师，贝克街的亡灵，侦探们的镇魂歌，通往天国的倒计时
-A：(C)
--- a/configs/datasets/CHARM/few-shot-examples/Global_Sequence_Understanding_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Sequence_Understanding_EN-CoT.txt
@ -1,25 +0,0 @@
-根据上下文选择正确答案
-
-Q:以下哪个列表按照时间顺序排列正确？选项：
-(A) 大学，硕士，博士，博士后
-(B) 大学，博士，硕士，博士后
-(C) 大学，高中，博士后，博士
-(D) 博士后，大学，硕士，博士
-A: Let's think step by step.
-The options in the question introduce four educational stages: university, master's degree, doctoral degree, and postdoctoral research. They should be ordered by when each stage is completed: first university, followed by a master's degree, then a doctoral degree, and finally postdoctoral research. So the answer is (A).
-
-Q:以下哪个列表按照时间顺序排列正确？选项：
-(A) 五月一日，七月十五日，八月十二日，六月十四日
-(B) 七月十五日，八月十二日，五月一日，六月十四日
-(C) 八月十二日，五月一日，七月十五日，六月十四日
-(D) 五月一日，六月十四日，七月十五日，八月十二日
-A: Let's think step by step.
-The options in the question introduce four dates, which in chronological order within the year are May 1st, June 14th, July 15th, and August 12th. So the answer is (D).
-
-Q:以下哪个列表按时间顺序排列正确？选项：
-(A) 通往天国的倒计时，世纪末的魔术师，贝克街的亡灵，侦探们的镇魂歌
-(B) 世纪末的魔术师，贝克街的亡灵，通往天国的倒计时，侦探们的镇魂歌
-(C) 世纪末的魔术师，通往天国的倒计时，贝克街的亡灵，侦探们的镇魂歌
-(D) 世纪末的魔术师，贝克街的亡灵，侦探们的镇魂歌，通往天国的倒计时
-A: Let's think step by step.
-The options mention the titles of four Detective Conan films: The Magician at the End of the Century was released in 1999, The Undead of Baker Street in 2002, Countdown to Heaven in 2001, and The Detectives' Requiem in 2006. Therefore, in chronological order, they are The Magician at the End of the Century, Countdown to Heaven, The Undead of Baker Street, and The Detectives' Requiem. So the answer is (C).
--- a/configs/datasets/CHARM/few-shot-examples/Global_Sequence_Understanding_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Sequence_Understanding_XLT.txt
@ -1,60 +0,0 @@
-根据上下文选择正确答案
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：以下哪个列表按照时间顺序排列正确？选项：
-(A) 大学，硕士，博士，博士后
-(B) 大学，博士，硕士，博士后
-(C) 大学，高中，博士后，博士
-(D) 博士后，大学，硕士，博士
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request:Which of the following lists is correct in chronological order? Options:
-(A) University, Master's, Ph.D., Post-doctoral
-(B) University, Ph.D., Master's, Post-doctoral
-(C) University, High School, Post-doctoral, Ph.D.
-(D) Post-doctoral, University, Master's, Ph.D.
-Step-by-step answer:
-1.The options in the question introduce four educational stages: university, master's degree, doctoral degree, and postdoctoral research.
-2.They should be ordered by when each stage is completed: first university, followed by a master's degree, then a doctoral degree, and finally postdoctoral research.
-So the answer is (A).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：以下哪个列表按照时间顺序排列正确？选项：
-(A) 五月一日，七月十五日，八月十二日，六月十四日
-(B) 七月十五日，八月十二日，五月一日，六月十四日
-(C) 八月十二日，五月一日，七月十五日，六月十四日
-(D) 五月一日，六月十四日，七月十五日，八月十二日
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request:Which of the following lists is correct in chronological order? Options:
-(A) May 1, July 15, August 12, June 14
-(B) July 15, August 12, May 1, June 14
-(C) August 12, May 1, July 15, June 14
-(D) May 1, June 14, July 15, August 12
-Step-by-step answer:
-1.The options in the question introduce four dates, which in chronological order within the year are May 1st, June 14th, July 15th, and August 12th.
-So the answer is (D).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：以下哪个列表按时间顺序排列正确？选项：
-(A) 通往天国的倒计时，世纪末的魔术师，贝克街的亡灵，侦探们的镇魂歌
-(B) 世纪末的魔术师，贝克街的亡灵，通往天国的倒计时，侦探们的镇魂歌
-(C) 世纪末的魔术师，通往天国的倒计时，贝克街的亡灵，侦探们的镇魂歌
-(D) 世纪末的魔术师，贝克街的亡灵，侦探们的镇魂歌，通往天国的倒计时
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request:Which of the following lists is correct in chronological order? Options:
-(A) Countdown to Heaven, The Magician at the End of the Century, The Undead of Baker Street, The Detectives' Requiem
-(B) The Magician at the End of the Century, The Undead of Baker Street, Countdown to Heaven, The Detectives' Requiem
-(C) The Magician at the End of the Century, Countdown to Heaven, The Undead of Baker Street, The Detectives' Requiem
-(D) The Magician at the End of the Century, The Undead of Baker Street, The Detectives' Requiem, Countdown to Heaven
-Step-by-step answer:
-1.The options mention the titles of four Detective Conan films: The Magician at the End of the Century was released in 1999, The Undead of Baker Street in 2002, Countdown to Heaven in 2001, and The Detectives' Requiem in 2006.
-2.Therefore, in chronological order, they are The Magician at the End of the Century, Countdown to Heaven, The Undead of Baker Street, and The Detectives' Requiem.
-So the answer is (C).
--- a/configs/datasets/CHARM/few-shot-examples/Global_Sequence_Understanding_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Sequence_Understanding_ZH-CoT.txt
@ -1,25 +0,0 @@
-根据上下文选择正确答案
-
-Q:以下哪个列表按照时间顺序排列正确？选项：
-(A) 大学，硕士，博士，博士后
-(B) 大学，博士，硕士，博士后
-(C) 大学，高中，博士后，博士
-(D) 博士后，大学，硕士，博士
-A：让我们一步一步来思考。
-题目选项中引入了四个学历概念“大学，硕士，博士，博士后”，按照取得这些学历的顺序进行排序，首先获得的是大学学历，其次是硕士，然后是博士，最后是博士后。所以答案是(A)。
-
-Q:以下哪个列表按照时间顺序排列正确？选项：
-(A) 五月一日，七月十五日，八月十二日，六月十四日
-(B) 七月十五日，八月十二日，五月一日，六月十四日
-(C) 八月十二日，五月一日，七月十五日，六月十四日
-(D) 五月一日，六月十四日，七月十五日，八月十二日
-A：让我们一步一步来思考。
-题目选项中引入了四个日期，按照一年中的时间顺序排序为五月一日，六月十四日，七月十五日，八月十二日。所以答案是(D)。
-
-Q:以下哪个列表按时间顺序排列正确？选项：
-(A) 通往天国的倒计时，世纪末的魔术师，贝克街的亡灵，侦探们的镇魂歌
-(B) 世纪末的魔术师，贝克街的亡灵，通往天国的倒计时，侦探们的镇魂歌
-(C) 世纪末的魔术师，通往天国的倒计时，贝克街的亡灵，侦探们的镇魂歌
-(D) 世纪末的魔术师，贝克街的亡灵，侦探们的镇魂歌，通往天国的倒计时
-A：让我们一步一步来思考。
-选项中提到了四个柯南剧场版的名称，其中世纪末的魔术师是1999年上映的，贝克街的亡灵是2002年上映的，通往天国的倒计时是2001年上映的，侦探们的镇魂歌是2006年上映的，因此按时间排序是世纪末的魔术师，通往天国的倒计时，贝克街的亡灵，侦探们的镇魂歌。所以答案是(C)。
--- a/configs/datasets/CHARM/few-shot-examples/Global_Sport_Understanding_Direct.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Sport_Understanding_Direct.txt
@ -1,19 +0,0 @@
-判断一句人为构造的关于体育的句子是否可信。
-
-Q: 下面的句子可信吗? "泰森打出右直拳"
-选项：
-(A) 可信
-(B) 不可信
-A: (A)
-
-Q: 下面的句子可信吗? "贝克汉姆完成了一次扣篮"
-选项：
-(A) 可信
-(B) 不可信
-A: (B)
-
-Q: 下面的句子可信吗? "勒布朗詹姆斯打进老鹰球"
-选项：
-(A) 可信
-(B) 不可信
-A: (B)
--- a/configs/datasets/CHARM/few-shot-examples/Global_Sport_Understanding_EN-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Sport_Understanding_EN-CoT.txt
@ -1,22 +0,0 @@
-判断一句人为构造的关于体育的句子是否可信。
-
-Q: 下面的句子可信吗? "泰森打出右直拳"
-选项：
-(A) 可信
-(B) 不可信
-A: Let's think step by step.
-Tyson is an American professional boxer. The right straight punch is a common move in boxing. It's quite normal for a boxer to throw a right straight punch, so this sentence is credible. The correct answer should be (A).
-
-Q: 下面的句子可信吗? "贝克汉姆完成了一次扣篮"
-选项：
-(A) 可信
-(B) 不可信
-A: Let's think step by step.
-Beckham is an English professional footballer. A slam dunk is a technical move in basketball. Therefore, this sentence is not credible. The correct answer should be (B).
-
-Q: 下面的句子可信吗? "勒布朗詹姆斯打进老鹰球"
-选项：
-(A) 可信
-(B) 不可信
-A: Let's think step by step.
-LeBron James is a well-known American professional basketball player. An eagle is a term in golf, meaning two strokes under par. Therefore, this sentence is not credible. The correct answer should be (B).
--- a/configs/datasets/CHARM/few-shot-examples/Global_Sport_Understanding_XLT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Sport_Understanding_XLT.txt
@ -1,57 +0,0 @@
-判断一句人为构造的关于体育的句子是否可信。
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：下面的句子可信吗? "泰森打出右直拳"
-选项：
-(A) 可信
-(B) 不可信
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: Is the following sentence plausible? "Tyson throws a straight right."
-Options:
-(A) Credible
-(B) Not credible
-Step-by-step answer:
-1.Tyson is an American professional boxer. The right straight punch is a common move in boxing.
-2.It's normal for a boxer to throw a right straight punch, so this sentence is credible.
-So the answer is (A).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：下面的句子可信吗? "贝克汉姆完成了一次扣篮"
-选项：
-(A) 可信
-(B) 不可信
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: Is the following sentence plausible? "Beckham completed a slam dunk."
-Options:
-(A) Credible
-(B) Not credible
-Step-by-step answer:
-1.Beckham is a professional footballer from England.
-2.A slam dunk is a technical move in basketball.
-3.Therefore, this sentence is not credible.
-So the answer is (B).
-
-I want you to act as a commonsense reasoning expert for Chinese.
-Request：下面的句子可信吗? "勒布朗詹姆斯打进老鹰球"
-选项：
-(A) 可信
-(B) 不可信
-You should retell the request in English.
-You should do the answer step by step to choose the right answer.
-You should step-by-step answer the request.
-You should tell me the answer in this format 'So the answer is'.
-Request: Is the following sentence plausible? "LeBron James scored an eagle."
-Options:
-(A) Credible
-(B) Not credible
-Step-by-step answer:
-1.LeBron James is a well-known professional basketball player from the United States.
-2.An eagle is a term in golf, meaning two strokes under par.
-3.Therefore, this sentence is not credible.
-So the answer is (B).
--- a/configs/datasets/CHARM/few-shot-examples/Global_Sport_Understanding_ZH-CoT.txt
+++ b/configs/datasets/CHARM/few-shot-examples/Global_Sport_Understanding_ZH-CoT.txt
@ -1,22 +0,0 @@
-判断一句人为构造的关于体育的句子是否可信。
-
-Q: 下面的句子可信吗? "泰森打出右直拳"
-选项：
-(A) 可信
-(B) 不可信
-A: 让我们一步一步来思考。
-泰森是一位美国职业拳击手。右直拳是拳击运动中的常用招式。作为拳击手使出右直拳这样的拳击动作是很正常的，所以这个句子是可信的，正确答案应该是(A)。
-
-Q: 下面的句子可信吗? "贝克汉姆完成了一次扣篮"
-选项：
-(A) 可信
-(B) 不可信
-A: 让我们一步一步来思考。
-贝克汉姆是一位英格兰的职业足球运动员。扣篮是篮球运动的技术动作。所以这个句子是不可信的，正确答案应该是(B)。
-
-Q: 下面的句子可信吗? "勒布朗詹姆斯打进老鹰球"
-选项：
-(A) 可信
-(B) 不可信
-A: 让我们一步一步来思考。
-勒布朗詹姆斯是一位知名的美国职业篮球运动员，老鹰球是高尔夫球术语，就是比标准杆少了两杆的意思。所以这个句子是不可信的，正确答案应该是(B)。