Calm dataset (#1385)

* Add CALM Dataset
2025-05-30 16:03:24 +08:00 · 2024-08-01 10:03:21 +08:00 · 2024-08-01 10:03:21 +08:00 · 07c96ac659
commit 07c96ac659
parent 46cc7894e1
72 changed files with 9931 additions and 0 deletions
--- a/configs/datasets/calm/README.md
+++ b/configs/datasets/calm/README.md
@@ -0,0 +1,117 @@
+# CaLM Lite
+**CaLM Lite** is a lightweight version of CaLM.
+
+**Ca**usal evaluation of **L**anguage **M**odels (CaLM) is, to the best of our knowledge, the first comprehensive benchmark for evaluating the causal reasoning capabilities of language models. The CaLM framework establishes a foundational taxonomy of four modules: causal target (i.e., what to evaluate), adaptation (i.e., how to obtain the results), metric (i.e., how to measure the results), and error (i.e., how to analyze incorrect results).
+
+<div align="center">
+
+[🌐 Website](https://opencausalab.github.io/CaLM) |
+[📃 Report](https://arxiv.org/abs/2405.00622) | [🎆 GitHub](https://github.com/OpenCausaLab/CaLM) | 📧 To join us, email causalai@pjlab.org.cn
+</div>
+
+## Quick Start
+### Data Preparation
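+Download the dataset into the `data/` folder (run from the OpenCompass root; the configs read task files from `./data/calm/`):
+```
+cd data
+wget https://github.com/OpenCausaLab/CaLM/releases/download/v1.0.0.lite/calm.zip
+unzip calm.zip
+cd ..
+```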
+### Run Model and Infer
+To get a concise summary that reports only averaged results across all tasks, run:
+
+```
+python run.py --models YOUR_MODEL --datasets calm --summarizer calm
+```
+
+If you want detailed information for each task, use:
+
+```
+python run.py --models YOUR_MODEL --datasets calm
+```
+
+The `--summarizer calm` flag in the first command selects the CaLM summarizer, which reports only averages (overall, English and Chinese accuracy, plus one average per error type); omitting it, as in the second command, reports results for each task.
+
+## Available Causal Tasks
+We provide 92 causal evaluation tasks (46 subsets, each in English and Chinese), stored in the `data/calm` folder. For more information about the tasks, see [tasks](https://github.com/OpenCausaLab/CaLM/blob/main/documents/tasks.md).
+The directory structure is:
+
+```
+├── calm
+│   ├── association
+│   ├── causal_discovery                # Rung of the causal ladder
+│   │   ├── abstract_reasoning          # Causal scenario
+│   │   │   ├── AR-B_CaLM-AR_CN.json    # Causal task
+│   │   │   └── AR-B_CaLM-AR_EN.json    # Causal task
+│   │   └── ...
+│   └── ...
+└── ...
+```
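Each task file name appears to encode its scenario, question type, subset and language (for example `AR-B_CaLM-AR_EN`). The sketch below splits such names apart and lists the tasks found under the data folder. The letter codes (`B`, `C`, `P`, `O`) are inferred from the dataset table rather than from an official specification, and `parse_task_name`/`list_tasks` are hypothetical helpers, not part of OpenCompass.

```python
from pathlib import Path
from typing import List, NamedTuple

# Question-type codes inferred from the dataset table (e.g. PCD-B is binary
# classification, PCD-C is choice selection); not an official specification.
QUESTION_TYPES = {
    'B': 'Binary classification',
    'C': 'Choice selection',
    'P': 'Probability calculation',
    'O': 'Open-ended generation',
}


class TaskName(NamedTuple):
    scenario: str       # causal scenario abbreviation, e.g. 'AR'
    question_type: str  # human-readable question type
    subset: str         # subset name, e.g. 'CaLM-AR'
    language: str       # 'EN' or 'CN'


def parse_task_name(name: str) -> TaskName:
    """Split a task name such as 'AR-B_CaLM-AR_EN' into its parts."""
    stem, language = name.rsplit('_', 1)  # language is the last '_' field
    prefix, subset = stem.split('_', 1)   # subset may itself contain '_'
    scenario, code = prefix.rsplit('-', 1)
    return TaskName(scenario, QUESTION_TYPES[code], subset, language)


def list_tasks(root: str = './data/calm') -> List[TaskName]:
    """Parse every task JSON file found anywhere under root."""
    return [parse_task_name(p.stem) for p in sorted(Path(root).rglob('*.json'))]
```

For instance, `parse_task_name('AC-B_causal_judgement_EN')` keeps the underscore inside the subset name `causal_judgement`, because only the first and last underscores are treated as separators.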
+
+## Dataset
+- **Dataset size**: CaLM Lite uses a lightweight dataset of **9,200** questions, while CaLM uses a much larger dataset of 126,334. The table below details the English dataset composition; the Chinese version is structured identically, so each bilingual total is twice the English total.
+- **Dataset configuration**: For **binary classification** and **choice selection** questions, we balance the dataset by including an equal number of questions for each ground-truth (GT) label, which minimizes the risk of biasing the evaluation. For **probability calculation** questions, CaLM Lite additionally balances the number of problems across different causal reasoning processes. (For details on how causal reasoning processes are defined, see Section 9.1.6 of the [paper](https://arxiv.org/abs/2405.00622).)
+- **Efficient evaluation**: To further improve evaluation efficiency, OpenCompass provides customizable evaluation methods. See the [documentation](https://opencompass.org.cn/doc) for how to tailor them to your needs.
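The label-balancing property described above can be checked mechanically. The sketch below is a hypothetical helper, not part of OpenCompass: it takes a plain list of ground-truth labels, since how labels are stored inside each task JSON is not described here and extracting them is left to the reader.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def label_counts(labels: Iterable[Hashable]) -> Counter:
    """Count how often each ground-truth label occurs."""
    return Counter(labels)


def is_label_balanced(labels: Iterable[Hashable]) -> bool:
    """True if every distinct label appears the same number of times."""
    counts = label_counts(labels)
    # Balanced means all per-label counts collapse to a single value.
    return len(set(counts.values())) <= 1
```

For a 100-question binary subset, a balanced split would be 50 "yes" and 50 "no" labels; the same helper could be fed causal-reasoning-process labels to check the probability-calculation subsets.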
+
+| Causal ladder | Causal scenario | Subset | Question type | Mode | CaLM Lite (# questions) | CaLM (# questions) |
+|---------------|-----------------|--------|---------------|------|-----------|------|
+| Causal discovery | PCD | E-CARE | Binary classification | Natural | 100 | 2000 |
+| Causal discovery | PCD | E-CARE | Choice selection | Natural | 100 | 1000 |
+| Causal discovery | PCD | COPA | Binary classification | Natural | 100 | 2000 |
+| Causal discovery | PCD | COPA | Choice selection | Natural | 100 | 1000 |
+| Causal discovery | ECI | CTB | Binary classification | Natural | 100 | 596 |
+| Causal discovery | ECI | ESC | Binary classification | Natural | 100 | 1000 |
+| Causal discovery | ECI | MAVEN-ERE | Binary classification | Natural | 100 | 1000 |
+| Causal discovery | AR | CaLM-AR | Binary classification | Symbolic | 100 | 1600 |
+| Causal discovery | CA | FP | Binary classification | Symbolic | 100 | 1600 |
+| Causal discovery | CA | FA | Binary classification | Symbolic | 100 | 1600 |
+| Association | CORR | correlation | Binary classification | Natural | 100 | 1476 |
+| Association | EAE | exp-away | Binary classification | Natural | 100 | 168 |
+| Intervention | CB | collider-bias | Binary classification | Natural | 100 | 163 |
+| Intervention | ATE | ATE-natural | Binary classification | Natural | 100 | 1600 |
+| Intervention | ATE | ATE-basic | Probability calculation | Mathematical | 100 | 1600 |
+| Intervention | ATE | ATE-hard | Probability calculation | Mathematical | 100 | 1600 |
+| Intervention | CDE | CDE-natural | Binary classification | Natural | 100 | 1600 |
+| Intervention | CDE | CDE-basic | Probability calculation | Mathematical | 100 | 1600 |
+| Intervention | CDE | CDE-hard | Probability calculation | Mathematical | 100 | 1600 |
+| Intervention | BAS | backadj | Binary classification | Natural | 100 | 227 |
+| Intervention | BAS | max-BAS | Choice selection | Symbolic | 100 | 1600 |
+| Intervention | BAS | min-BAS | Choice selection | Symbolic | 100 | 1600 |
+| Intervention | BAS | mix-BAS | Choice selection | Symbolic | 100 | 1600 |
+| Intervention | FAS | FAS | Choice selection | Symbolic | 100 | 1600 |
+| Intervention | IV | CaLM-IV | Choice selection | Symbolic | 100 | 1600 |
+| Intervention | CEI | 0.2-UC | Binary classification | Symbolic | 100 | 1600 |
+| Intervention | CEI | 0.4-UC | Binary classification | Symbolic | 100 | 1600 |
+| Intervention | CEI | 0.6-UC | Binary classification | Symbolic | 100 | 1600 |
+| Intervention | CEI | 0.8-UC | Binary classification | Symbolic | 100 | 1600 |
+| Counterfactuals | ETT | ETT-natural | Binary classification | Natural | 100 | 1600 |
+| Counterfactuals | ETT | ETT-basic | Probability calculation | Mathematical | 100 | 1600 |
+| Counterfactuals | ETT | ETT-hard | Probability calculation | Mathematical | 100 | 1600 |
+| Counterfactuals | NDE | NDE-natural | Binary classification | Natural | 100 | 1600 |
+| Counterfactuals | NDE | NDE-basic | Probability calculation | Mathematical | 100 | 1600 |
+| Counterfactuals | NDE | NDE-hard | Probability calculation | Mathematical | 100 | 1600 |
+| Counterfactuals | NIE | NIE-natural | Binary classification | Natural | 100 | 1600 |
+| Counterfactuals | NIE | NIE-basic | Probability calculation | Mathematical | 100 | 1600 |
+| Counterfactuals | NIE | NIE-hard | Probability calculation | Mathematical | 100 | 1600 |
+| Counterfactuals | PN | PN-basic | Probability calculation | Mathematical | 100 | 1600 |
+| Counterfactuals | PN | PN-hard | Probability calculation | Mathematical | 100 | 1600 |
+| Counterfactuals | PS | PS-basic | Probability calculation | Mathematical | 100 | 1600 |
+| Counterfactuals | PS | PS-hard | Probability calculation | Mathematical | 100 | 1600 |
+| Counterfactuals | AC | causal judgement | Binary classification | Natural | 100 | 187 |
+| Counterfactuals | CR | CRASS | Choice selection | Natural | 100 | 274 |
+| Counterfactuals | CR | det-counterfactual | Binary classification | Natural | 100 | 1476 |
+| Counterfactuals | CEG | E-CARE | Open-ended generation | Natural | 100 | 1000 |
+| **Total** | | | | | 4600 | 63167 |
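As a quick consistency check, the per-subset counts in the table can be summed and doubled for the Chinese split, which reproduces the 9,200 and 126,334 figures quoted above. The counts below are copied from the table; the `totals` helper is only illustrative.

```python
from typing import List, Tuple

# Per-subset question counts copied from the table above (English only),
# in table row order.
CALM_LITE_EN = [100] * 46
CALM_EN = [
    2000, 1000, 2000, 1000,    # PCD: E-CARE (B, C), COPA (B, C)
    596, 1000, 1000,           # ECI: CTB, ESC, MAVEN-ERE
    1600, 1600, 1600,          # AR, CA (FP), CA (FA)
    1476, 168,                 # CORR, EAE
    163,                       # CB
    1600, 1600, 1600,          # ATE natural/basic/hard
    1600, 1600, 1600,          # CDE natural/basic/hard
    227, 1600, 1600, 1600,     # BAS backadj, max, min, mix
    1600, 1600,                # FAS, IV
    1600, 1600, 1600, 1600,    # CEI 0.2/0.4/0.6/0.8-UC
    1600, 1600, 1600,          # ETT
    1600, 1600, 1600,          # NDE
    1600, 1600, 1600,          # NIE
    1600, 1600,                # PN
    1600, 1600,                # PS
    187, 274, 1476,            # AC, CR (CRASS), CR (det-counterfactual)
    1000,                      # CEG
]


def totals(counts: List[int]) -> Tuple[int, int]:
    """Return (English total, English + Chinese total).

    The Chinese split mirrors the English one, so the bilingual total
    is twice the English total.
    """
    english = sum(counts)
    return english, 2 * english


print(totals(CALM_LITE_EN))  # (4600, 9200)
print(totals(CALM_EN))       # (63167, 126334)
```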
+
+## Available Prompt Styles (Adaptation)
+Basic Prompt is the default prompt style for CaLM Lite (the configs use `basic` for English tasks and `basic-CN` for Chinese tasks) because it keeps evaluation efficient. To explore and compare a wider range of prompts, use the full CaLM; our [repository](https://github.com/OpenCausaLab/CaLM) provides a comprehensive, easy-to-follow guide.
+
+## Citation
+```
+@misc{chen2024causal,
+      title={Causal Evaluation of Language Models},
+      author={Sirui Chen and Bo Peng and Meiqi Chen and Ruiqi Wang and Mengying Xu and Xingyu Zeng and Rui Zhao and Shengjie Zhao and Yu Qiao and Chaochao Lu},
+      year={2024},
+      eprint={2405.00622},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
--- a/configs/datasets/calm/calm.py
+++ b/configs/datasets/calm/calm.py
@@ -0,0 +1,160 @@
+from opencompass.openicl.icl_prompt_template import PromptTemplate
+from opencompass.openicl.icl_retriever import ZeroRetriever
+from opencompass.openicl.icl_inferencer import GenInferencer
+from opencompass.datasets import CaLMDataset, CaLMEvaluator
+
+task_hiearchy_dict = {
+    # association/
+        # correlation/
+            'CORR-B_correlation_CN':'association/correlation/',
+            'CORR-B_correlation_EN':'association/correlation/',
+        # explaining_away_effect/
+            'EAE-B_exp-away_CN':'association/explaining_away_effect/',
+            'EAE-B_exp-away_EN':'association/explaining_away_effect/',
+    # causal_discovery/
+        # abstract_reasoning/
+            'AR-B_CaLM-AR_CN':'causal_discovery/abstract_reasoning/',
+            'AR-B_CaLM-AR_EN':'causal_discovery/abstract_reasoning/',
+        # causal_attribution/
+            'CA-B_FA_CN':'causal_discovery/causal_attribution/',
+            'CA-B_FA_EN':'causal_discovery/causal_attribution/',
+            'CA-B_FP_CN':'causal_discovery/causal_attribution/',
+            'CA-B_FP_EN':'causal_discovery/causal_attribution/',
+        # event_causality_identification/
+            'ECI-B_CTB_CN':'causal_discovery/event_causality_identification/',
+            'ECI-B_CTB_EN':'causal_discovery/event_causality_identification/',
+            'ECI-B_ESC_CN':'causal_discovery/event_causality_identification/',
+            'ECI-B_ESC_EN':'causal_discovery/event_causality_identification/',
+            'ECI-B_MAVEN-ERE_CN':'causal_discovery/event_causality_identification/',
+            'ECI-B_MAVEN-ERE_EN':'causal_discovery/event_causality_identification/',
+        # pairwise_causal_discovery/
+            'PCD-B_COPA_CN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-B_COPA_EN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-B_E-CARE_CN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-B_E-CARE_EN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-C_COPA_CN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-C_COPA_EN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-C_E-CARE_CN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-C_E-CARE_EN':'causal_discovery/pairwise_causal_discovery/',
+    # counterfactual/
+        # actual_causality/
+            'AC-B_causal_judgement_CN':'counterfactual/actual_causality/',
+            'AC-B_causal_judgement_EN':'counterfactual/actual_causality/',
+        # causal_explanation_generation/
+            'CEG-O_E-CARE_CN':'counterfactual/causal_explanation_generation/',
+            'CEG-O_E-CARE_EN':'counterfactual/causal_explanation_generation/',
+        # counterfactual_reasoning/
+            'CR-B_det-counterfactual_CN':'counterfactual/counterfactual_reasoning/',
+            'CR-B_det-counterfactual_EN':'counterfactual/counterfactual_reasoning/',
+            'CR-C_CRASS_CN':'counterfactual/counterfactual_reasoning/',
+            'CR-C_CRASS_EN':'counterfactual/counterfactual_reasoning/',
+        # effect_of_the_treatment_on_the_treated/
+            'ETT-B_ETT-natural_CN':'counterfactual/effect_of_the_treatment_on_the_treated/',
+            'ETT-B_ETT-natural_EN':'counterfactual/effect_of_the_treatment_on_the_treated/',
+            'ETT-P_ETT-basic_CN':'counterfactual/effect_of_the_treatment_on_the_treated/',
+            'ETT-P_ETT-basic_EN':'counterfactual/effect_of_the_treatment_on_the_treated/',
+            'ETT-P_ETT-hard_CN':'counterfactual/effect_of_the_treatment_on_the_treated/',
+            'ETT-P_ETT-hard_EN':'counterfactual/effect_of_the_treatment_on_the_treated/',
+        # natural_direct_effect/
+            'NDE-B_NDE-natural_CN':'counterfactual/natural_direct_effect/',
+            'NDE-B_NDE-natural_EN':'counterfactual/natural_direct_effect/',
+            'NDE-P_NDE-basic_CN':'counterfactual/natural_direct_effect/',
+            'NDE-P_NDE-basic_EN':'counterfactual/natural_direct_effect/',
+            'NDE-P_NDE-hard_CN':'counterfactual/natural_direct_effect/',
+            'NDE-P_NDE-hard_EN':'counterfactual/natural_direct_effect/',
+        # natural_indirect_effect/
+            'NIE-B_NIE-natural_CN':'counterfactual/natural_indirect_effect/',
+            'NIE-B_NIE-natural_EN':'counterfactual/natural_indirect_effect/',
+            'NIE-P_NIE-basic_CN':'counterfactual/natural_indirect_effect/',
+            'NIE-P_NIE-basic_EN':'counterfactual/natural_indirect_effect/',
+            'NIE-P_NIE-hard_CN':'counterfactual/natural_indirect_effect/',
+            'NIE-P_NIE-hard_EN':'counterfactual/natural_indirect_effect/',
+        # probability_of_necessity/
+            'PN-P_PN-basic_CN':'counterfactual/probability_of_necessity/',
+            'PN-P_PN-basic_EN':'counterfactual/probability_of_necessity/',
+            'PN-P_PN-hard_CN':'counterfactual/probability_of_necessity/',
+            'PN-P_PN-hard_EN':'counterfactual/probability_of_necessity/',
+        # probability_of_sufficiency/
+            'PS-P_PS-basic_CN':'counterfactual/probability_of_sufficiency/',
+            'PS-P_PS-basic_EN':'counterfactual/probability_of_sufficiency/',
+            'PS-P_PS-hard_CN':'counterfactual/probability_of_sufficiency/',
+            'PS-P_PS-hard_EN':'counterfactual/probability_of_sufficiency/',
+    # intervention/
+        # average_treatment_effect/
+            'ATE-B_ATE-natural_CN':'intervention/average_treatment_effect/',
+            'ATE-B_ATE-natural_EN':'intervention/average_treatment_effect/',
+            'ATE-P_ATE-basic_CN':'intervention/average_treatment_effect/',
+            'ATE-P_ATE-basic_EN':'intervention/average_treatment_effect/',
+            'ATE-P_ATE-hard_CN':'intervention/average_treatment_effect/',
+            'ATE-P_ATE-hard_EN':'intervention/average_treatment_effect/',
+        # backdoor_adjustment_set/
+            'BAS-B_backadj_CN':'intervention/backdoor_adjustment_set/',
+            'BAS-B_backadj_EN':'intervention/backdoor_adjustment_set/',
+            'BAS-C_max-BAS_CN':'intervention/backdoor_adjustment_set/',
+            'BAS-C_max-BAS_EN':'intervention/backdoor_adjustment_set/',
+            'BAS-C_min-BAS_CN':'intervention/backdoor_adjustment_set/',
+            'BAS-C_min-BAS_EN':'intervention/backdoor_adjustment_set/',
+            'BAS-C_mix-BAS_CN':'intervention/backdoor_adjustment_set/',
+            'BAS-C_mix-BAS_EN':'intervention/backdoor_adjustment_set/',
+        # causal_effect_identification/
+            'CEI-B_0.2-UC_CN':'intervention/causal_effect_identification/',
+            'CEI-B_0.2-UC_EN':'intervention/causal_effect_identification/',
+            'CEI-B_0.4-UC_CN':'intervention/causal_effect_identification/',
+            'CEI-B_0.4-UC_EN':'intervention/causal_effect_identification/',
+            'CEI-B_0.6-UC_CN':'intervention/causal_effect_identification/',
+            'CEI-B_0.6-UC_EN':'intervention/causal_effect_identification/',
+            'CEI-B_0.8-UC_CN':'intervention/causal_effect_identification/',
+            'CEI-B_0.8-UC_EN':'intervention/causal_effect_identification/',
+        # collider_bias/
+            'CB-B_collider-bias_CN':'intervention/collider_bias/',
+            'CB-B_collider-bias_EN':'intervention/collider_bias/',
+        # controlled_direct_effect/
+            'CDE-B_CDE-natural_CN':'intervention/controlled_direct_effect/',
+            'CDE-B_CDE-natural_EN':'intervention/controlled_direct_effect/',
+            'CDE-P_CDE-basic_CN':'intervention/controlled_direct_effect/',
+            'CDE-P_CDE-basic_EN':'intervention/controlled_direct_effect/',
+            'CDE-P_CDE-hard_CN':'intervention/controlled_direct_effect/',
+            'CDE-P_CDE-hard_EN':'intervention/controlled_direct_effect/',
+        # frontdoor_adjustment_set/
+            'FAS-C_FAS_CN':'intervention/frontdoor_adjustment_set/',
+            'FAS-C_FAS_EN':'intervention/frontdoor_adjustment_set/',
+        # instrumental_variable/
+            'IV-C_CaLM-IV_CN':'intervention/instrumental_variable/',
+            'IV-C_CaLM-IV_EN':'intervention/instrumental_variable/',}
+
+calm_reader_cfg = dict(
+    input_columns=['question'],
+    output_column='gt_item')
+
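+# Strip the '_CN'/'_EN' suffix. dict.fromkeys deduplicates while keeping the
+# dict's insertion order, so datasets are built in the same order on every run.
+calm_all_sets = list(dict.fromkeys(key[:-3] for key in task_hiearchy_dict.keys()))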
+
+calm_datasets = []
+for _name in calm_all_sets:
+    for _prompt_style in ['basic','basic-CN']:
+        _task_name = _name + ('_CN' if _prompt_style.endswith('-CN') else '_EN')
+        _path = f'./data/calm/{task_hiearchy_dict[_task_name]}{_task_name}.json'
+
+        calm_infer_cfg = dict(
+            prompt_template=dict(
+                type=PromptTemplate,
+                template='{question}'),
+            retriever=dict(type=ZeroRetriever),
+            inferencer=dict(type=GenInferencer, max_out_len=500))
+
+        calm_eval_cfg = dict(evaluator=dict(
+                type=CaLMEvaluator,
+                core_metrics=True,
+                error_analysis=True,
+                prompt_style=_prompt_style,
+                task=_task_name))
+        calm_datasets.append(
+            dict(
+                abbr=f'calm_{_task_name}',
+                type=CaLMDataset,
+                path=_path,
+                prompt_style=_prompt_style,
+                reader_cfg=calm_reader_cfg,
+                infer_cfg=calm_infer_cfg,
+                eval_cfg=calm_eval_cfg)
+        )
+del _prompt_style, _task_name, _path, _name
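The loop above expands each base task name into one English and one Chinese dataset config. The standalone sketch below re-creates that expansion on a four-entry excerpt of `task_hiearchy_dict`, keeping only the `abbr`, `path` and `prompt_style` fields; `expand_datasets` is a hypothetical name. Deduplicating with `dict.fromkeys` keeps first-seen order, so the output is identical on every run (a plain `set()` of strings can iterate in a different order between runs because of hash randomization).

```python
from typing import Dict, List

# Excerpt of task_hiearchy_dict (a few entries only, for illustration).
TASKS: Dict[str, str] = {
    'CORR-B_correlation_CN': 'association/correlation/',
    'CORR-B_correlation_EN': 'association/correlation/',
    'AR-B_CaLM-AR_CN': 'causal_discovery/abstract_reasoning/',
    'AR-B_CaLM-AR_EN': 'causal_discovery/abstract_reasoning/',
}


def expand_datasets(tasks: Dict[str, str],
                    root: str = './data/calm') -> List[dict]:
    # Drop the 3-character language suffix ('_CN' / '_EN') and deduplicate
    # while preserving insertion order.
    base_names = list(dict.fromkeys(key[:-3] for key in tasks))
    configs = []
    for name in base_names:
        for prompt_style in ['basic', 'basic-CN']:
            # 'basic-CN' selects the Chinese file, 'basic' the English one.
            task = name + ('_CN' if prompt_style.endswith('-CN') else '_EN')
            configs.append({
                'abbr': f'calm_{task}',
                'path': f'{root}/{tasks[task]}{task}.json',
                'prompt_style': prompt_style,
            })
    return configs
```

With the full 92-entry dictionary the same expansion yields 46 base names times 2 prompt styles, i.e. 92 dataset configs.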
--- a/configs/summarizers/groups/calm.py
+++ b/configs/summarizers/groups/calm.py
@@ -0,0 +1,169 @@
+task_hiearchy_dict = {
+    # association/
+        # correlation/
+            'CORR-B_correlation_CN':'association/correlation/',
+            'CORR-B_correlation_EN':'association/correlation/',
+        # explaining_away_effect/
+            'EAE-B_exp-away_CN':'association/explaining_away_effect/',
+            'EAE-B_exp-away_EN':'association/explaining_away_effect/',
+    # causal_discovery/
+        # abstract_reasoning/
+            'AR-B_CaLM-AR_CN':'causal_discovery/abstract_reasoning/',
+            'AR-B_CaLM-AR_EN':'causal_discovery/abstract_reasoning/',
+        # causal_attribution/
+            'CA-B_FA_CN':'causal_discovery/causal_attribution/',
+            'CA-B_FA_EN':'causal_discovery/causal_attribution/',
+            'CA-B_FP_CN':'causal_discovery/causal_attribution/',
+            'CA-B_FP_EN':'causal_discovery/causal_attribution/',
+        # event_causality_identification/
+            'ECI-B_CTB_CN':'causal_discovery/event_causality_identification/',
+            'ECI-B_CTB_EN':'causal_discovery/event_causality_identification/',
+            'ECI-B_ESC_CN':'causal_discovery/event_causality_identification/',
+            'ECI-B_ESC_EN':'causal_discovery/event_causality_identification/',
+            'ECI-B_MAVEN-ERE_CN':'causal_discovery/event_causality_identification/',
+            'ECI-B_MAVEN-ERE_EN':'causal_discovery/event_causality_identification/',
+        # pairwise_causal_discovery/
+            'PCD-B_COPA_CN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-B_COPA_EN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-B_E-CARE_CN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-B_E-CARE_EN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-C_COPA_CN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-C_COPA_EN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-C_E-CARE_CN':'causal_discovery/pairwise_causal_discovery/',
+            'PCD-C_E-CARE_EN':'causal_discovery/pairwise_causal_discovery/',
+    # counterfactual/
+        # actual_causality/
+            'AC-B_causal_judgement_CN':'counterfactual/actual_causality/',
+            'AC-B_causal_judgement_EN':'counterfactual/actual_causality/',
+        # causal_explanation_generation/
+            'CEG-O_E-CARE_CN':'counterfactual/causal_explanation_generation/',
+            'CEG-O_E-CARE_EN':'counterfactual/causal_explanation_generation/',
+        # counterfactual_reasoning/
+            'CR-B_det-counterfactual_CN':'counterfactual/counterfactual_reasoning/',
+            'CR-B_det-counterfactual_EN':'counterfactual/counterfactual_reasoning/',
+            'CR-C_CRASS_CN':'counterfactual/counterfactual_reasoning/',
+            'CR-C_CRASS_EN':'counterfactual/counterfactual_reasoning/',
+        # effect_of_the_treatment_on_the_treated/
+            'ETT-B_ETT-natural_CN':'counterfactual/effect_of_the_treatment_on_the_treated/',
+            'ETT-B_ETT-natural_EN':'counterfactual/effect_of_the_treatment_on_the_treated/',
+            'ETT-P_ETT-basic_CN':'counterfactual/effect_of_the_treatment_on_the_treated/',
+            'ETT-P_ETT-basic_EN':'counterfactual/effect_of_the_treatment_on_the_treated/',
+            'ETT-P_ETT-hard_CN':'counterfactual/effect_of_the_treatment_on_the_treated/',
+            'ETT-P_ETT-hard_EN':'counterfactual/effect_of_the_treatment_on_the_treated/',
+        # natural_direct_effect/
+            'NDE-B_NDE-natural_CN':'counterfactual/natural_direct_effect/',
+            'NDE-B_NDE-natural_EN':'counterfactual/natural_direct_effect/',
+            'NDE-P_NDE-basic_CN':'counterfactual/natural_direct_effect/',
+            'NDE-P_NDE-basic_EN':'counterfactual/natural_direct_effect/',
+            'NDE-P_NDE-hard_CN':'counterfactual/natural_direct_effect/',
+            'NDE-P_NDE-hard_EN':'counterfactual/natural_direct_effect/',
+        # natural_indirect_effect/
+            'NIE-B_NIE-natural_CN':'counterfactual/natural_indirect_effect/',
+            'NIE-B_NIE-natural_EN':'counterfactual/natural_indirect_effect/',
+            'NIE-P_NIE-basic_CN':'counterfactual/natural_indirect_effect/',
+            'NIE-P_NIE-basic_EN':'counterfactual/natural_indirect_effect/',
+            'NIE-P_NIE-hard_CN':'counterfactual/natural_indirect_effect/',
+            'NIE-P_NIE-hard_EN':'counterfactual/natural_indirect_effect/',
+        # probability_of_necessity/
+            'PN-P_PN-basic_CN':'counterfactual/probability_of_necessity/',
+            'PN-P_PN-basic_EN':'counterfactual/probability_of_necessity/',
+            'PN-P_PN-hard_CN':'counterfactual/probability_of_necessity/',
+            'PN-P_PN-hard_EN':'counterfactual/probability_of_necessity/',
+        # probability_of_sufficiency/
+            'PS-P_PS-basic_CN':'counterfactual/probability_of_sufficiency/',
+            'PS-P_PS-basic_EN':'counterfactual/probability_of_sufficiency/',
+            'PS-P_PS-hard_CN':'counterfactual/probability_of_sufficiency/',
+            'PS-P_PS-hard_EN':'counterfactual/probability_of_sufficiency/',
+    # intervention/
+        # average_treatment_effect/
+            'ATE-B_ATE-natural_CN':'intervention/average_treatment_effect/',
+            'ATE-B_ATE-natural_EN':'intervention/average_treatment_effect/',
+            'ATE-P_ATE-basic_CN':'intervention/average_treatment_effect/',
+            'ATE-P_ATE-basic_EN':'intervention/average_treatment_effect/',
+            'ATE-P_ATE-hard_CN':'intervention/average_treatment_effect/',
+            'ATE-P_ATE-hard_EN':'intervention/average_treatment_effect/',
+        # backdoor_adjustment_set/
+            'BAS-B_backadj_CN':'intervention/backdoor_adjustment_set/',
+            'BAS-B_backadj_EN':'intervention/backdoor_adjustment_set/',
+            'BAS-C_max-BAS_CN':'intervention/backdoor_adjustment_set/',
+            'BAS-C_max-BAS_EN':'intervention/backdoor_adjustment_set/',
+            'BAS-C_min-BAS_CN':'intervention/backdoor_adjustment_set/',
+            'BAS-C_min-BAS_EN':'intervention/backdoor_adjustment_set/',
+            'BAS-C_mix-BAS_CN':'intervention/backdoor_adjustment_set/',
+            'BAS-C_mix-BAS_EN':'intervention/backdoor_adjustment_set/',
+        # causal_effect_identification/
+            'CEI-B_0.2-UC_CN':'intervention/causal_effect_identification/',
+            'CEI-B_0.2-UC_EN':'intervention/causal_effect_identification/',
+            'CEI-B_0.4-UC_CN':'intervention/causal_effect_identification/',
+            'CEI-B_0.4-UC_EN':'intervention/causal_effect_identification/',
+            'CEI-B_0.6-UC_CN':'intervention/causal_effect_identification/',
+            'CEI-B_0.6-UC_EN':'intervention/causal_effect_identification/',
+            'CEI-B_0.8-UC_CN':'intervention/causal_effect_identification/',
+            'CEI-B_0.8-UC_EN':'intervention/causal_effect_identification/',
+        # collider_bias/
+            'CB-B_collider-bias_CN':'intervention/collider_bias/',
+            'CB-B_collider-bias_EN':'intervention/collider_bias/',
+        # controlled_direct_effect/
+            'CDE-B_CDE-natural_CN':'intervention/controlled_direct_effect/',
+            'CDE-B_CDE-natural_EN':'intervention/controlled_direct_effect/',
+            'CDE-P_CDE-basic_CN':'intervention/controlled_direct_effect/',
+            'CDE-P_CDE-basic_EN':'intervention/controlled_direct_effect/',
+            'CDE-P_CDE-hard_CN':'intervention/controlled_direct_effect/',
+            'CDE-P_CDE-hard_EN':'intervention/controlled_direct_effect/',
+        # frontdoor_adjustment_set/
+            'FAS-C_FAS_CN':'intervention/frontdoor_adjustment_set/',
+            'FAS-C_FAS_EN':'intervention/frontdoor_adjustment_set/',
+        # instrumental_variable/
+            'IV-C_CaLM-IV_CN':'intervention/instrumental_variable/',
+            'IV-C_CaLM-IV_EN':'intervention/instrumental_variable/',}
+dict_keys = list(task_hiearchy_dict.keys())
+error_dict = {'Same response to all questions':[],
+               'Language inconsistency':[],
+               'Limitation of instruction-following':[],
+               'Repetition':[],
+               'Empty response':[],}
+
+for error in error_dict:
+    for key in dict_keys:
+        if 'CEG-O_E-CARE' in key:
+            continue
+        error_dict[error].append([f'calm_{key}', error])
+
+English_avg = []
+Chinese_avg = []
+for key in dict_keys:
+    if key.endswith('EN'):
+        English_avg.append([f'calm_{key}', 'Accuracy'])
+    else:
+        assert key.endswith('CN')
+        Chinese_avg.append([f'calm_{key}', 'Accuracy'])
+
+calm_summary_groups = [
+    # English Average
+    {'name': 'English Average', 'subsets': English_avg},
+
+    # Chinese Average
+    {'name': 'Chinese Average', 'subsets': Chinese_avg},
+
+    # Accuracy Average
+    {'name': 'Accuracy Average', 'subsets': ['English Average', 'Chinese Average']},
+]
+for error in error_dict:
+    calm_summary_groups.append({'name': error+' Average', 'subsets': error_dict[error]})
+
+summarizer = dict(
+    dataset_abbrs = [
+        '###### CALM-Lite Accuracy ######',
+        'Accuracy Average',
+        'English Average',
+        'Chinese Average',
+
+        '###### CALM-Lite Errors ######',
+        'Same response to all questions Average',
+        'Language inconsistency Average',
+        'Limitation of instruction-following Average',
+        'Repetition Average',
+        'Empty response Average',
+    ],
+    summary_groups=calm_summary_groups,
+)
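The summary groups above are nested: `Accuracy Average` is built from the two language averages rather than directly from the 92 tasks. The sketch below shows how such groups compose, assuming each group is a plain unweighted mean of its subsets (an assumption about the summarizer, not taken from this commit); `resolve_groups` is a hypothetical helper and the scores are made up. Because the English and Chinese groups each contain 46 tasks, the mean of the two averages equals the mean over all 92 tasks whenever every task reports a score.

```python
from statistics import mean
from typing import Any, Dict, List


def resolve_groups(scores: Dict[Any, float],
                   groups: List[Dict[str, Any]]) -> Dict[Any, float]:
    """Compute summary-group scores in list order (unweighted mean assumed).

    A subset is either a [dataset_abbr, metric] pair or the name of a group
    defined earlier in the list, as in calm_summary_groups.
    """
    resolved = dict(scores)
    for group in groups:
        values = [resolved[tuple(s) if isinstance(s, list) else s]
                  for s in group['subsets']]
        resolved[group['name']] = mean(values)
    return resolved


# Hypothetical accuracies for two tasks per language.
scores = {
    ('calm_AR-B_CaLM-AR_EN', 'Accuracy'): 80.0,
    ('calm_CA-B_FA_EN', 'Accuracy'): 60.0,
    ('calm_AR-B_CaLM-AR_CN', 'Accuracy'): 50.0,
    ('calm_CA-B_FA_CN', 'Accuracy'): 40.0,
}
groups = [
    {'name': 'English Average',
     'subsets': [['calm_AR-B_CaLM-AR_EN', 'Accuracy'],
                 ['calm_CA-B_FA_EN', 'Accuracy']]},
    {'name': 'Chinese Average',
     'subsets': [['calm_AR-B_CaLM-AR_CN', 'Accuracy'],
                 ['calm_CA-B_FA_CN', 'Accuracy']]},
    {'name': 'Accuracy Average',
     'subsets': ['English Average', 'Chinese Average']},
]
result = resolve_groups(scores, groups)
print(result['English Average'], result['Chinese Average'],
      result['Accuracy Average'])  # 70.0 45.0 57.5
```

This sketch raises `KeyError` for a missing subset; the real summarizer may handle missing results differently.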
--- a/opencompass/datasets/__init__.py
+++ b/opencompass/datasets/__init__.py
@@ -10,6 +10,7 @@ from .bbh import *  # noqa: F401, F403
 from .boolq import *  # noqa: F401, F403
 from .bustum import *  # noqa: F401, F403
 from .c3 import *  # noqa: F401, F403
+from .calm import *  # noqa: F401, F403
 from .cb import *  # noqa: F401, F403
 from .ceval import *  # noqa: F401, F403
 from .charm import *  # noqa: F401, F403
--- a/opencompass/datasets/calm/__init__.py
+++ b/opencompass/datasets/calm/__init__.py
@@ -0,0 +1 @@
+from .calm import *  # noqa: F401, F403
--- a/opencompass/datasets/calm/calm.py
+++ b/opencompass/datasets/calm/calm.py
@@ -0,0 +1,60 @@
+from typing import List
+
+import datasets
+from datasets import Dataset
+
+from opencompass.openicl.icl_evaluator import BaseEvaluator
+from opencompass.registry import LOAD_DATASET
+
+from ..base import BaseDataset
+from .data_processing.generate_questions import generate_question_list
+from .evaluation.core_metrics import compute_core_metrics
+from .evaluation.errors import identify_model_errors
+
+
+@LOAD_DATASET.register_module()
+class CaLMDataset(BaseDataset):
+
+    @staticmethod
+    def load(path: str, prompt_style: str) -> datasets.Dataset:
+        question_list = generate_question_list(dataset_path=path,
+                                               prompt_style=prompt_style)
+        dataset = Dataset.from_list(question_list)
+        return dataset
+
+
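+class CaLMEvaluator(BaseEvaluator):
+    """Evaluator for CaLM tasks.
+
+    Args:
+        core_metrics (bool): Whether to compute the core metrics
+            (e.g. accuracy).
+        error_analysis (bool): Whether to run error analysis on the
+            predictions.
+        prompt_style (str): Prompt style used for the task, e.g. ``'basic'``
+            or ``'basic-CN'``.
+        task (str): Task name, e.g. ``'AR-B_CaLM-AR_EN'``.
+    """
+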
+    def __init__(self, core_metrics, error_analysis, prompt_style,
+                 task) -> None:
+        super().__init__()
+        self.core_metrics = core_metrics
+        self.error_analysis = error_analysis
+        self.prompt_style = prompt_style
+        self.task = task
+
+    def score(
+        self,
+        predictions: List,
+        references: List,
+    ) -> dict:
+        results = {}
+        if self.core_metrics:
+            metrics, pred_list = compute_core_metrics(
+                predictions,
+                task=self.task,
+                prompt_style=self.prompt_style,
+                gt_items=references)
+            results.update(metrics)
+        if self.error_analysis:
+            if self.task.startswith('CEG-O_E-CARE'):
+                print("There's no error analysis for CEG-O_E-CARE task. ",
+                      'Skipping error analysis.')
+                return results
+            errors = identify_model_errors(
+                predictions,
+                task=self.task,
+                prompt_style=self.prompt_style,
+                gt_items=references)
+            results.update(errors)
+        return results
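`CaLMEvaluator.score` merges two dictionaries: the core metrics, then (except for `CEG-O_E-CARE` tasks) the error-analysis results. The standalone sketch below mirrors that control flow with injected callables so it can be exercised without OpenCompass installed. `score_task`, `fake_metrics` and `fake_errors` are hypothetical stand-ins; the real `compute_core_metrics` and `identify_model_errors` also take `task`, `prompt_style` and `gt_items` arguments, and the real method prints a message when it skips error analysis.

```python
from typing import Callable, Dict, List, Tuple

MetricsFn = Callable[[List[str], List], Tuple[Dict, List]]
ErrorsFn = Callable[[List[str], List], Dict]


def score_task(predictions: List[str], references: List, task: str,
               metrics_fn: MetricsFn, errors_fn: ErrorsFn,
               core_metrics: bool = True,
               error_analysis: bool = True) -> Dict:
    """Simplified mirror of CaLMEvaluator.score."""
    results: Dict = {}
    if core_metrics:
        metrics, _ = metrics_fn(predictions, references)
        results.update(metrics)
    # The open-ended CEG-O_E-CARE tasks have no error analysis.
    if error_analysis and not task.startswith('CEG-O_E-CARE'):
        results.update(errors_fn(predictions, references))
    return results


def fake_metrics(preds, refs):
    """Stand-in metric: exact-match accuracy in percent."""
    acc = 100.0 * sum(p == r for p, r in zip(preds, refs)) / len(refs)
    return {'Accuracy': acc}, preds


def fake_errors(preds, refs):
    """Stand-in error check: percentage of empty responses."""
    return {'Empty response': 100.0 * sum(not p for p in preds) / len(preds)}


print(score_task(['yes', 'no'], ['yes', 'yes'], 'AR-B_CaLM-AR_EN',
                 fake_metrics, fake_errors))
```

Injecting the metric functions this way is one option for unit-testing the merge and skip logic in isolation; the evaluator itself imports them at module level.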
--- a/opencompass/datasets/calm/data_processing/generate_questions.py
+++ b/opencompass/datasets/calm/data_processing/generate_questions.py
@@ -0,0 +1,192 @@
+# flake8: noqa: E501
+import importlib
+from pathlib import Path
+
+from ..utils.load_items import load_query_instances
+
+
+def get_get_prompt_func(task):
+    """Returns the appropriate prompt generation function based on the given
+    task.
+
+    Args:
+        task (str): The name of the task for which the prompt function is required.
+
+    Returns:
+        function: The prompt generation function for the specified task.
+
+    Raises:
+        NotImplementedError: If no prompt function is found for the given task.
+    """
+    task_to_module_map = {
+        # association/
+        # correlation/
+        'CORR-B_correlation_CN': 'CORR-B_correlation',
+        'CORR-B_correlation_EN': 'CORR-B_correlation',
+        # explaining_away_effect/
+        'EAE-B_exp-away_CN': 'EAE-B_exp-away',
+        'EAE-B_exp-away_EN': 'EAE-B_exp-away',
+        # causal_discovery/
+        # abstract_reasoning/
+        'AR-B_CaLM-AR_CN': 'AR-B_CaLM-AR',
+        'AR-B_CaLM-AR_EN': 'AR-B_CaLM-AR',
+        # causal_attribution/
+        'CA-B_FA_CN': 'CA-B_FA',
+        'CA-B_FA_EN': 'CA-B_FA',
+        'CA-B_FP_CN': 'CA-B_FP',
+        'CA-B_FP_EN': 'CA-B_FP',
+        # event_causality_identification/
+        'ECI-B_CTB_CN': 'ECI-B_CTB',
+        'ECI-B_CTB_EN': 'ECI-B_CTB',
+        'ECI-B_ESC_CN': 'ECI-B_ESC',
+        'ECI-B_ESC_EN': 'ECI-B_ESC',
+        'ECI-B_MAVEN-ERE_CN': 'ECI-B_MAVEN-ERE',
+        'ECI-B_MAVEN-ERE_EN': 'ECI-B_MAVEN-ERE',
+        # pairwise_causal_discovery/
+        'PCD-B_COPA_CN': 'PCD-B_COPA',
+        'PCD-B_COPA_EN': 'PCD-B_COPA',
+        'PCD-B_E-CARE_CN': 'PCD-B_E-CARE',
+        'PCD-B_E-CARE_EN': 'PCD-B_E-CARE',
+        'PCD-C_COPA_CN': 'PCD-C_COPA',
+        'PCD-C_COPA_EN': 'PCD-C_COPA',
+        'PCD-C_E-CARE_CN': 'PCD-C_E-CARE',
+        'PCD-C_E-CARE_EN': 'PCD-C_E-CARE',
+        # counterfactual/
+        # actual_causality/
+        'AC-B_causal_judgement_CN': 'AC-B_causal_judgement',
+        'AC-B_causal_judgement_EN': 'AC-B_causal_judgement',
+        # causal_explanation_generation/
+        'CEG-O_E-CARE_CN': 'CEG-O_E-CARE',
+        'CEG-O_E-CARE_EN': 'CEG-O_E-CARE',
+        # counterfactual_reasoning/
+        'CR-B_det-counterfactual_CN': 'CR-B_det-counterfactual',
+        'CR-B_det-counterfactual_EN': 'CR-B_det-counterfactual',
+        'CR-C_CRASS_CN': 'CR-C_CRASS',
+        'CR-C_CRASS_EN': 'CR-C_CRASS',
+        # effect_of_the_treatment_on_the_treated/
+        'ETT-B_ETT-natural_CN': 'ETT',
+        'ETT-B_ETT-natural_EN': 'ETT',
+        'ETT-P_ETT-basic_CN': 'ETT',
+        'ETT-P_ETT-basic_EN': 'ETT',
+        'ETT-P_ETT-hard_CN': 'ETT',
+        'ETT-P_ETT-hard_EN': 'ETT',
+        # natural_direct_effect/
+        'NDE-B_NDE-natural_CN': 'NDE',
+        'NDE-B_NDE-natural_EN': 'NDE',
+        'NDE-P_NDE-basic_CN': 'NDE',
+        'NDE-P_NDE-basic_EN': 'NDE',
+        'NDE-P_NDE-hard_CN': 'NDE',
+        'NDE-P_NDE-hard_EN': 'NDE',
+        # natural_indirect_effect/
+        'NIE-B_NIE-natural_CN': 'NIE',
+        'NIE-B_NIE-natural_EN': 'NIE',
+        'NIE-P_NIE-basic_CN': 'NIE',
+        'NIE-P_NIE-basic_EN': 'NIE',
+        'NIE-P_NIE-hard_CN': 'NIE',
+        'NIE-P_NIE-hard_EN': 'NIE',
+        # probability_of_necessity/
+        'PN-P_PN-basic_CN': 'PN',
+        'PN-P_PN-basic_EN': 'PN',
+        'PN-P_PN-hard_CN': 'PN',
+        'PN-P_PN-hard_EN': 'PN',
+        # probability_of_sufficiency/
+        'PS-P_PS-basic_CN': 'PS',
+        'PS-P_PS-basic_EN': 'PS',
+        'PS-P_PS-hard_CN': 'PS',
+        'PS-P_PS-hard_EN': 'PS',
+        # intervention/
+        # average_treatment_effect/
+        'ATE-B_ATE-natural_CN': 'ATE',
+        'ATE-B_ATE-natural_EN': 'ATE',
+        'ATE-P_ATE-basic_CN': 'ATE',
+        'ATE-P_ATE-basic_EN': 'ATE',
+        'ATE-P_ATE-hard_CN': 'ATE',
+        'ATE-P_ATE-hard_EN': 'ATE',
+        # backdoor_adjustment_set/
+        'BAS-B_backadj_CN': 'BAS-B_backadj',
+        'BAS-B_backadj_EN': 'BAS-B_backadj',
+        'BAS-C_max-BAS_CN': 'BAS-C_max-BAS',
+        'BAS-C_max-BAS_EN': 'BAS-C_max-BAS',
+        'BAS-C_min-BAS_CN': 'BAS-C_min-BAS',
+        'BAS-C_min-BAS_EN': 'BAS-C_min-BAS',
+        'BAS-C_mix-BAS_CN': 'BAS-C_mix-BAS',
+        'BAS-C_mix-BAS_EN': 'BAS-C_mix-BAS',
+        # causal_effect_identification/
+        'CEI-B_0.2-UC_CN': 'CEI-B',
+        'CEI-B_0.2-UC_EN': 'CEI-B',
+        'CEI-B_0.4-UC_CN': 'CEI-B',
+        'CEI-B_0.4-UC_EN': 'CEI-B',
+        'CEI-B_0.6-UC_CN': 'CEI-B',
+        'CEI-B_0.6-UC_EN': 'CEI-B',
+        'CEI-B_0.8-UC_CN': 'CEI-B',
+        'CEI-B_0.8-UC_EN': 'CEI-B',
+        # collider_bias/
+        'CB-B_collider-bias_CN': 'CB-B_collider-bias',
+        'CB-B_collider-bias_EN': 'CB-B_collider-bias',
+        # controlled_direct_effect/
+        'CDE-B_CDE-natural_CN': 'CDE',
+        'CDE-B_CDE-natural_EN': 'CDE',
+        'CDE-P_CDE-basic_CN': 'CDE',
+        'CDE-P_CDE-basic_EN': 'CDE',
+        'CDE-P_CDE-hard_CN': 'CDE',
+        'CDE-P_CDE-hard_EN': 'CDE',
+        # frontdoor_adjustment_set/
+        'FAS-C_FAS_CN': 'FAS-C_FAS',
+        'FAS-C_FAS_EN': 'FAS-C_FAS',
+        # instrumental_variable/
+        'IV-C_CaLM-IV_CN': 'IV-C_CaLM-IV',
+        'IV-C_CaLM-IV_EN': 'IV-C_CaLM-IV',
+    }
+
+    module_name = task_to_module_map.get(task)
+
+    if module_name:
+        module = importlib.import_module(
+            'opencompass.datasets.calm.data_processing.prompt.' + module_name)
+        return module.get_prompt
+    else:
+        raise NotImplementedError(
+            f'No get_prompt function found for task {task}.')
+
+
+def generate_question_list(dataset_path, prompt_style):
+    """Generates a list of questions from the dataset based on the specified
+    prompt style.
+
+    Args:
+        dataset_path (str): The path to the dataset JSON file.
+        prompt_style (str): The style of prompt to be used for generating questions.
+
+    Returns:
+        list: One dict per dataset item, with keys 'question' (the formatted prompt) and 'gt_item' (the original item).
+
+    Raises:
+        AssertionError: If the prompt style's '-CN' suffix does not match the task's language; NotImplementedError if the task has no prompt module.
+    """
+    # Extract task name from dataset path
+    dataset_path = Path(dataset_path)
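+    task_name = dataset_path.stem
+
+    # Validate prompt style based on task language
+    if task_name.endswith('CN'):
+        assert prompt_style.endswith('-CN'), f'Task {task_name} requires a "-CN" prompt style, got {prompt_style!r}.'
+    else:
+        assert not prompt_style.endswith('-CN'), f'Task {task_name} requires a non-"-CN" prompt style, got {prompt_style!r}.'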
+
+    # Get prompt generation function based on task
+    get_prompt_func = get_get_prompt_func(task=task_name)
+
+    # Load items from dataset
+    item_list = load_query_instances(dataset_path)
+    question_list = []
+
+    # Generate questions for each item in the dataset
+    for item in item_list:
+        question = get_prompt_func(task_name=task_name,
+                                   prompt_style=prompt_style,
+                                   item=item)
+        question_list.append({
+            'question': question,
+            'gt_item': item,
+        })
+    return question_list
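The sketch below is a minimal, self-contained illustration of the flow in `generate_question_list`. It resolves a task name to a prompt module with `importlib.import_module`, checks the language suffix, and formats one question per item.

Everything here is hypothetical: the `DEMO-B_task` module, `TASK_TO_MODULE`, `build_questions`, and the in-memory items that stand in for `load_query_instances`. It only shows that a module file with a hyphenated name, such as `AC-B_causal_judgement.py`, can still be loaded through `importlib`, even though a plain `import` statement could not name it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical prompt module source. Its file name contains a hyphen, so it can
# only be loaded via importlib, not with an `import` statement.
DEMO_MODULE_SOURCE = """
base_prompt_dict = {
    'basic': 'Question: %s\\nAnswer (Yes or No ?):',
    'basic-CN': 'Question (CN): %s\\nAnswer:',
}


def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
    return prompt_style_str + base_prompt_dict[prompt_style] % (item['input'],)
"""

# Hypothetical task -> module map; both language variants share one module.
TASK_TO_MODULE = {
    'DEMO-B_task_CN': 'DEMO-B_task',
    'DEMO-B_task_EN': 'DEMO-B_task',
}


def load_get_prompt(task):
    """Resolve a task name to its module's get_prompt function."""
    module_name = TASK_TO_MODULE.get(task)
    if module_name is None:
        raise NotImplementedError(f'No get_prompt function found for task {task}.')
    return importlib.import_module(module_name).get_prompt


def build_questions(dataset_path, prompt_style, items):
    """Format one question per item, mirroring generate_question_list."""
    task_name = Path(dataset_path).stem  # equals name[:-len('.json')] for *.json
    # A task ending in 'CN' must be paired with a '-CN' style, and vice versa.
    if task_name.endswith('CN') != prompt_style.endswith('-CN'):
        raise AssertionError(f'{task_name} does not match prompt style {prompt_style}')
    get_prompt = load_get_prompt(task_name)
    return [{'question': get_prompt(task_name=task_name,
                                    prompt_style=prompt_style, item=item),
             'gt_item': item} for item in items]


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'DEMO-B_task.py').write_text(DEMO_MODULE_SOURCE, encoding='utf-8')
    importlib.invalidate_caches()  # make sure the new file is visible to finders
    sys.path.insert(0, tmp)
    try:
        questions = build_questions(
            'data/calm/DEMO-B_task_EN.json', 'basic',
            [{'input': 'Did the black wire cause the short circuit?'}])
    finally:
        sys.path.remove(tmp)

print(questions[0]['question'])
```

The sketch raises `AssertionError` explicitly rather than using `assert`, so the check still runs under `python -O`. The original module relies on `assert`, which `-O` strips.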
--- a/opencompass/datasets/calm/data_processing/prompt/AC-B_causal_judgement.py
+++ b/opencompass/datasets/calm/data_processing/prompt/AC-B_causal_judgement.py
@ -0,0 +1,148 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic': """Question: %s
+Answer (Yes or No ?):""",
+    'basic-CN': """问题：%s
+答案（是或否？）：""",
+    'adversarial-ignore': """Question: %s
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN': """问题：%s
+答案（是或否？）：""",
+    'adversarial-doubt': """Question: %s
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN': """问题：%s
+答案（是或否？）：""",
+    'zero-shot-IcL': """
+Define actual causality in a way that is consistent with how humans naturally attribute cause and related concepts of responsibility, blame, and explanations to events and their outcomes.
+Question: %s
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN': """
+请用类似于人们定义事件的因果以及与其相关的责任、过错、解释等概念的方式，定义真实的因果性。
+问题：%s
+答案（是或否？）：""",
+    'one-shot-IcL': """
+Define actual causality in a way that is consistent with how humans naturally attribute cause and related concepts of responsibility, blame, and explanations to events and their outcomes.
+Question: How would a typical person answer each of the following questions about causation?
+A machine is set up in such a way that it will short circuit if both the black wire and the red wire touch the battery at the same time. The machine will not short circuit if just one of these wires touches the battery. The black wire is designated as the one that is supposed to touch the battery, while the red wire is supposed to remain in some other part of the machine. One day, the black wire and the red wire both end up touching the battery at the same time. There is a short circuit. Did the black wire cause the short circuit?
+Options:
+- Yes
+- No
+Answer (Yes or No ?): No
+
+Question: %s
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN': """
+请用类似于人们定义事件的因果以及与其相关的责任、过错、解释等概念的方式，定义真实的因果性。
+问题：对于以下关于因果关系的问题，一个普通人会怎么回答？
+一台机器被设置成如果黑线和红线同时碰到电池，那么机器就会短路。而如果只有其中一根线碰到电池，则不会短路。黑线被指定为应该接触电池的那根，而红线则应该留在机器的其他部分。有一天，黑线和红线同时接触到电池，从而机器出现了短路，是黑线造成了短路吗？
+选项：
+- 是
+- 否
+答案（是或否？）：否
+
+问题：%s
+答案（是或否？）：""",
+    'three-shot-IcL': """
+Define actual causality in a way that is consistent with how humans naturally attribute cause and related concepts of responsibility, blame, and explanations to events and their outcomes.
+Question: How would a typical person answer each of the following questions about causation?
+A machine is set up in such a way that it will short circuit if both the black wire and the red wire touch the battery at the same time. The machine will not short circuit if just one of these wires touches the battery. The black wire is designated as the one that is supposed to touch the battery, while the red wire is supposed to remain in some other part of the machine. One day, the black wire and the red wire both end up touching the battery at the same time. There is a short circuit. Did the black wire cause the short circuit?
+Options:
+- Yes
+- No
+Answer (Yes or No ?): No
+
+Question: How would a typical person answer each of the following questions about causation?
+Claire's parents bought her an old computer. Claire uses it for schoolwork, but her brother Daniel sometimes logs on to play games. Claire has told Daniel, "Please don't log on to my computer. If we are both logged on at the same time, it will crash". One day, Claire and Daniel logged on to the computer at the same time. The computer crashed. Later that day, Claire's mother is talking with the computer repairman. The repairman says, "I see that Daniel was logged on, but this computer will only crash if two people are logged on at the same time. So, I still don't see quite why the computer crashed." Did Daniel cause the computer crash?
+Options:
+- Yes
+- No
+Answer (Yes or No ?): Yes
+
+Question: How would a typical person answer each of the following questions about causation?
+Suzy and Billy are working on a project that is very important for our nation's security. The boss tells Suzy: "Be sure that you are here at exactly 9 am. It is absolutely essential that you arrive at that time." Then he tells Billy: "Be sure that you do not come in at all tomorrow morning. It is absolutely essential that you not appear at that time." Both Billy and Suzy arrive at 9 am. As it happens, there was a motion detector installed in the room where they arrived. The motion detector was set up to be triggered if at least one person appeared in the room at the same time. So the motion detector went off. Did Billy cause the motion detector to go off?
+Options:
+- Yes
+- No
+Answer (Yes or No ?): Yes
+
+Question: %s
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN': """
+请用类似于人们定义事件的因果以及与其相关的责任、过错、解释等概念的方式，定义真实的因果性。
+问题：对于以下关于因果关系的问题，一个普通人会怎么回答？
+一台机器被设置成如果黑线和红线同时碰到电池，那么机器就会短路。而如果只有其中一根线碰到电池，则不会短路。黑线被指定为应该接触电池的那根，而红线则应该留在机器的其他部分。有一天，黑线和红线同时接触到电池，从而机器出现了短路，是黑线造成了短路吗？
+选项：
+- 是
+- 否
+答案（是或否？）：否
+
+问题：对于以下关于因果关系的问题，一个普通人会怎么回答？
+克洛伊的父母给她买了一台旧电脑。克洛伊用它做作业，但她的兄弟丹尼尔有时会登录来玩游戏。克洛伊告诉过丹尼尔：“请不要登录我的电脑。如果我们两个都在线，电脑就会死机。”有一天，克洛伊和丹尼尔同时登录了电脑。电脑死机了。当天晚些时候，克洛伊的母亲正在与电脑修理工交谈。技术员说：“我看到丹尼尔已经登录了，但这个电脑只有在两个人同时登录时才会死机。所以，我仍然不明白为什么电脑会死机。”丹尼尔导致了电脑死机吗？
+选项：
+- 是
+- 否
+答案（是或否？）：是
+
+问题：对于以下关于因果关系的问题，一个普通人会怎么回答？
+苏茜和比利正在合作完成一份关乎国家安全的重要项目。老板告诉苏茜：“你务必在早上九点准时到达。切记要按时到。”然后他告诉比利：“不要在明天早上九点来。你要保证千万别在那个时间出现。”两人都在上午九点到达了。恰好，房间里装有运动探测器。只要有至少一个人在房间内出现，运动探测器就会触发响起。是比利导致了运动探测器响起的吗？
+选项：
+- 是
+- 否
+答案（是或否？）：是
+
+问题：%s
+答案（是或否？）：""",
+    'zero-shot-CoT': """Question: %s\nLet's think step by step.
+Answer (Yes or No ?):""",
+    'zero-shot-CoT-CN': """问题：%s请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here we will provide three chain-of-thought exemplars, followed by a binary question that needs to be answered.
+
+Question: How would a typical person answer each of the following questions about causation?
+Frank T., had an ongoing dispute with his neighbor over a stretch of land and one day decided to shoot his neighbor in the body. Frank T. had no experience with guns, his hand slipped on the barrel of the gun, and the shot went wild. Nonetheless, the bullet bounced off a large boulder several feet away and hit the neighbor's body, causing significant injury. Did Frank T. intentionally shoot his neighbor in the body?
+Options:
+- Yes
+- No
+Answer(with chain of thought): Here in this question, we are told that "Frank T. had no experience with guns, his hand slipped on the barrel of the gun, and the shot went wild." A typical person would assume that this passage suggests that Frank T. had no intention of shooting and injuring someone and that the bullet accidentally hit the neighbor's body; therefore, we conclude that Frank T. did not intentionally hit his neighbor. So the answer is No.
+
+Question: How would a typical person answer each of the following questions about causation?
+Suzy and Billy are working on a project that is very important for our nation's security. The boss tells them both: "Be sure that you are here at exactly 9 am. It is absolutely essential that you arrive at that time." Both Billy and Suzy arrive at 9 am. As it happens, there was a motion detector installed in the room where they arrived. The motion detector was set up to be triggered if at least one person appeared in the room at the same time. So the motion detector went off. Did Billy cause the motion detector to go off?
+Options:
+- Yes
+- No
+Answer(with chain of thought):
+Here in this question, we are told that the boss ordered them both to arrive at the meeting room at the same time and that the motion detector was set up to be triggered if at least one person appeared in the room at the same time." A typical person would assume that the person probably meant to say the detector was set up to be triggered if "both persons" appeared in the room at the same time, not at least one person, since otherwise the phrase "at the same time" would not make much sense in that sentence. Because the motion detector went off, a typical person would therefore come to the conclusion that both Suzy and Billy triggered the motion detector to go off; hence, Billy did indeed cause the motion detector to go off. So the answer is Yes.
+
+Question: How would a typical person answer each of the following questions about causation?
+George and his sister Lena reunite at their parents' house for Thanksgiving. Whereas George just got into medical school, Lena is unhappy in her marriage and recently lost her job. Over the course of the day, George and Lena get into a number of heated arguments. Later in the afternoon they play a game of darts. They split the first two games, and the third game is close until the end. Who will win comes down to George's last shot. If he hits a high point region, he wins; if he hits a low point region, Lena wins. George thinks of the difficult time Lena is having, and he really wants to let her win. He aims the dart at the low point region. He sets up his shot and the dart lands in the low point region. After his shot, Lena wins the game and is very happy. Did George hit the low point region intentionally?
+Options:
+- Yes
+- No
+Answer(with chain of thought):
+Here in this question, we are told that "He aims the dart at the low point region." A typical person might therefore think George did intentionally hit the low point region, because he wanted to lift up the spirit of his sister Lena. So the answer is Yes.
+
+Question: %s
+Answer (Yes or No ?):""",
+    'manual-CoT-CN': """如下为一个使用思维链进行推理的因果归因的示例，和一个需要回答的问题。
+
+问题：对于以下关于因果关系的问题，一个普通人会怎么回答？\n研发部门的代表向董事会报告并说:\"我们正在考虑启动一项新计划。这将有助于增加利润，也有助于保护环境。\"董事会的回答是,\"我们根本不在乎帮助环境。只想尽可能地多赚钱。我们实施计划吧。\"果然，该计划对环境保护起了作用。董事会是有意做环境保护的吗？\n选项：\n- 是\n- 否
+答案（是或否？）：董事会的回应表明他们只关心赚钱，根本不在乎环境保护，只是该计划恰好也有助于保护环境。因此答案是“否”。
+
+问题：%s
+答案（是或否？）：
+""",
+    'explicit-function': """You are a helpful assistant for causal attribution.
+Question: %s
+Answer (Yes or No ?):""",
+    'explicit-function-CN': """你是一个用于因果归因的得力助手。
+问题：%s
+答案（是或否？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['input'])
+    return prompt
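Each prompt module pairs a `base_prompt_dict` with a `get_prompt` that fills a fixed number of `%s` fields. The arity is 1 here, 3 in `AR-B_CaLM-AR.py`, and 4 in `ATE.py`. A missing style key or a placeholder-count mismatch only surfaces when that style is requested. For example, `ATE.py` later in this diff defines `'three-shot-IcL'` but no `'three-shot-IcL-CN'`, so its `get_prompt` would raise `KeyError` for that style.

Below is a minimal consistency check, assuming the 18 style keys that this file (`AC-B_causal_judgement.py`) defines are the expected set. `check_prompt_dict` is a hypothetical helper, not part of OpenCompass.

```python
# Style keys defined by AC-B_causal_judgement.py's base_prompt_dict
# (nine base styles, each with a '-CN' twin).
BASE_STYLES = [
    'basic', 'adversarial-ignore', 'adversarial-doubt', 'zero-shot-IcL',
    'one-shot-IcL', 'three-shot-IcL', 'zero-shot-CoT', 'manual-CoT',
    'explicit-function',
]
EXPECTED_STYLES = BASE_STYLES + [style + '-CN' for style in BASE_STYLES]


def check_prompt_dict(prompt_dict, n_fields):
    """Return (missing_styles, bad_styles) for a base_prompt_dict.

    n_fields is how many values the module's get_prompt passes to `%`.
    """
    missing = [style for style in EXPECTED_STYLES if style not in prompt_dict]
    bad = []
    dummy_values = ('x',) * n_fields
    for style, template in prompt_dict.items():
        try:
            template % dummy_values
        except (TypeError, ValueError):
            # TypeError: placeholder count differs from n_fields.
            # ValueError: a stray '%' that is not a valid conversion.
            bad.append(style)
    return missing, bad


demo_dict = {
    'basic': 'Question: %s\nAnswer (Yes or No ?):',
    'basic-CN': '问题：%s\n答案（是或否？）：',
    # Deliberately broken: two placeholders where get_prompt supplies one.
    'zero-shot-CoT': 'Question: %s\nLet us think step by step. %s',
}
missing, bad = check_prompt_dict(demo_dict, n_fields=1)
print(len(missing), bad)
```

Formatting with dummy values delegates the placeholder count to Python's own `%` rules. This avoids hand-counting `%s` occurrences, which would miscount escaped `%%`.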
--- a/opencompass/datasets/calm/data_processing/prompt/AR-B_CaLM-AR.py
+++ b/opencompass/datasets/calm/data_processing/prompt/AR-B_CaLM-AR.py
@ -0,0 +1,153 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input Event: If %s.
+Question: Does %s cause %s?
+Answer (Yes or No ?): """,
+    'basic-CN':
+    """输入信息：如果%s。
+问题：%s是否导致%s？
+答案（是或否？）：""",
+    'adversarial-ignore':
+    """Input Event: If %s.
+Question: Does %s cause %s?
+Answer (Yes or No ?): """,
+    'adversarial-ignore-CN':
+    """输入信息：如果%s。
+问题：%s是否导致%s？
+答案（是或否？）：""",
+    'adversarial-doubt':
+    """Input Event: If %s.
+Question: Does %s cause %s?
+Answer (Yes or No ?): """,
+    'adversarial-doubt-CN':
+    """输入信息：如果%s。
+问题：%s是否导致%s？
+答案（是或否？）：""",
+    'zero-shot-IcL':
+    """Answer questions based on causal relations in a given causal graph.
+Input Event: If %s.
+Question: Does %s cause %s?
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN':
+    """根据给定因果图中的因果关系回答问题。
+输入信息：如果%s。
+问题：%s是否导致%s？
+答案（是或否？）：""",
+    'one-shot-IcL':
+    """Answer questions based on causal relations in a given causal graph.
+Input Event: If A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Does C cause A?
+Answer (Yes or No ?): No
+
+Input Event: If %s.
+Question: Does %s cause %s?
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN':
+    """根据给定因果图中的因果关系回答问题。
+输入信息：如果A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：C是否导致A？
+答案（是或否？）：否
+
+输入信息：如果%s。
+问题：%s是否导致%s？
+答案（是或否？）：""",
+    'three-shot-IcL':
+    """Answer questions based on causal relations in a given causal graph.
+Input Event: If A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Does C cause A?
+Answer (Yes or No ?): No
+
+Input Event: If A causes B, A causes E, B causes E, B causes D, C causes E, and C causes D.
+Question: Does C cause D?
+Answer (Yes or No ?): Yes
+
+Input Event: If A causes D, A causes C, B causes E, C causes D, and D causes E.
+Question: Does E cause E?
+Answer (Yes or No ?): No
+
+Input Event: If %s.
+Question: Does %s cause %s?
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN':
+    """根据给定因果图中的因果关系回答问题。
+输入信息：如果A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：C是否导致A？
+答案（是或否？）：否
+
+输入信息：如果A导致B, A导致E, B导致E, B导致D, C导致E, 以及C导致D。
+问题：C是否导致D？
+答案（是或否？）：是
+
+输入信息：如果A导致D, A导致C, B导致E, C导致D, 以及D导致E。
+问题：E是否导致E？
+答案（是或否？）：否
+
+输入信息：如果%s。
+问题：%s是否导致%s？
+答案（是或否？）：""",
+    'zero-shot-CoT':
+    """Input Event: If %s.
+Question: Does %s cause %s? Let's think step by step.
+Answer (Yes or No ?):""",
+    'zero-shot-CoT-CN':
+    """输入信息：如果%s。
+问题：%s是否导致%s？请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here are three examples of causal abstract reasoning using chain of thought, and a question to answer.
+
+Input Event: If A causes D, A causes C, A causes B, B causes E, B causes D, and C causes D.
+Question: Does A cause C?
+Answer (Yes or No ?): The input states that A causes C. Therefore, the answer is Yes.
+
+Input Event: If A causes B, A causes C, A causes D, B causes E, and C causes E.
+Question: Does E cause A?
+Answer (Yes or No ?): A is not caused by anything in the input, thus E does not cause A. Therefore, the answer is No.
+
+Input Event: If A causes C, A causes H, A causes E, A causes B, B causes H, B causes G, B causes F, B causes C, C causes F, C causes H, C causes D, C causes E, D causes E, E causes G, E causes H, E causes F, F causes H, and F causes G.
+Question: Does D cause F?
+Answer (Yes or No ?): Given D causes E, and E causes F, such that D causes F. Therefore, the answer is Yes.
+
+Input Event: If %s.
+Question: Does %s cause %s?
+Answer (Yes or No ?):
+""",
+    'manual-CoT-CN':
+    """如下为三个使用思维链进行推理的因果抽象推理的示例，和一个需要回答的问题。
+
+输入信息：如果A导致D, A导致C, A导致B, B导致E, B导致D, 以及C导致D。
+问题：A是否导致C？
+答案（是或否？）：输入信息中说明了A导致C。因此答案为“是”。
+
+输入信息：如果A导致B, A导致C, A导致D, B导致E, 以及C导致E。
+问题：E是否导致A？
+答案（是或否？）：输入信息中没有任何元素导致A，因此E没有导致A。因此答案为“否”。
+
+输入信息：如果A导致C, A导致H, A导致E, A导致B, B导致H, B导致G, B导致F, B导致C, C导致F, C导致H, C导致D, C导致E, D导致E, E导致G, E导致H, E导致F, F导致H, 以及F导致G。
+问题：D是否导致F？
+答案（是或否？）：D导致E，E导致F，所以D导致F。因此答案为“是”。
+
+输入信息：如果%s。
+问题：%s是否导致%s？
+答案（是或否？）：
+""",
+    'explicit-function':
+    """You are a helpful assistant for abstract reasoning.
+Input Event: If %s.
+Question: Does %s cause %s?
+Answer (Yes or No ?):""",
+    'explicit-function-CN':
+    """你是一个用于抽象推理的得力助手。
+输入信息：如果%s。
+问题：%s是否导致%s？
+答案（是或否？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['ar_edges'], item['former'],
+                                        item['latter'])
+    return prompt
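The manual-CoT exemplars above answer "Does X cause Y?" by chaining edges: D causes E and E causes F, so D causes F. The sketch below treats that reasoning as directed-path reachability over the graph in `ar_edges`.

It assumes `ar_edges` uses the English "X causes Y" phrasing with single-token node names, as in the exemplars; the Chinese "导致" form is not parsed. `parse_edges` and `causes` are hypothetical helpers, and the dataset's labels may be derived differently.

```python
import re
from collections import defaultdict


def parse_edges(ar_edges):
    """Parse 'A causes D, A causes E, and D causes E' into (cause, effect) pairs."""
    return re.findall(r'(\w+) causes (\w+)', ar_edges)


def causes(ar_edges, former, latter):
    """Return True if a directed path former -> ... -> latter exists."""
    graph = defaultdict(list)
    for cause, effect in parse_edges(ar_edges):
        graph[cause].append(effect)
    # Depth-first search starting from former's direct effects, so a node
    # only "causes itself" if the graph contains a cycle through it.
    stack, seen = list(graph[former]), set()
    while stack:
        node = stack.pop()
        if node == latter:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


# The one-shot exemplar: C -> D -> E never reaches A.
print(causes('A causes D, A causes E, B causes E, C causes D, and D causes E',
             'C', 'A'))
```

Starting the search from the children of `former` rather than from `former` itself is what makes the "Does E cause E?" exemplar come out as No.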
--- a/opencompass/datasets/calm/data_processing/prompt/ATE.py
+++ b/opencompass/datasets/calm/data_processing/prompt/ATE.py
@ -0,0 +1,182 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'basic-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'adversarial-ignore':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'adversarial-ignore-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'adversarial-doubt':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'adversarial-doubt-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'zero-shot-IcL':
+    """Answer questions about the Average Treatment Effect (ATE). Computing the Average Treatment Effect involves comparing the outcomes of two groups: the treated group and the control group. The ATE is the difference in average outcomes between these two groups.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-IcL-CN':
+    """回答有关平均处理效应 (ATE) 的问题。计算平均处理效应需要比较两组结果：处理组和对照组。ATE 是这两组之间平均结果的差值。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'one-shot-IcL':
+    """Answer questions about the Average Treatment Effect (ATE). Computing the Average Treatment Effect involves comparing the outcomes of two groups: the treated group and the control group. The ATE is the difference in average outcomes between these two groups.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: appearance has a direct effect on air pressure. Air pressure has a direct effect on education level.
+For those with appearance being high, the probability of education level being high is 0.3192. For those with appearance being low, the probability of education level being high is 0.3100.
+Instruction: Consider the average treatment effect (ATE) of appearance on education level.
+Question: If appearance is changed to be high, will education level be more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "Yes", "PROB": "0.0092"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'one-shot-IcL-CN':
+    """回答有关平均处理效应 (ATE) 的问题。计算平均处理效应需要比较两组结果：处理组和对照组。ATE 是这两组之间平均结果的差值。
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：外貌水平对气压有直接影响。气压对教育水平有直接影响。
+在外貌水平为高的条件下, 教育水平为高的概率为0.3192。在外貌水平为低的条件下, 教育水平为高的概率为0.3100。
+指令：考虑外貌水平作用于教育水平的“平均干预效果”(average treatment effect, ATE)。
+问题：如果外貌水平被改变为高，那么教育水平更有可能为高吗？
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}： {"ANSWER":"是","PROB":"0.0092"}
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'three-shot-IcL':
+    """Answer questions about the Average Treatment Effect (ATE). Computing the Average Treatment Effect involves comparing the outcomes of two groups: the treated group and the control group. The ATE is the difference in average outcomes between these two groups.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: appearance has a direct effect on air pressure. Air pressure has a direct effect on education level.
+For those with appearance being high, the probability of education level being high is 0.3192. For those with appearance being low, the probability of education level being high is 0.3100.
+Instruction: Consider the average treatment effect (ATE) of appearance on education level.
+Question: If appearance is changed to be high, will education level be more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "Yes", "PROB": "0.0092"}
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Alor has a direct effect on geer. Tnkc has a direct effect on dzww. Dzww has a direct effect on geer.
+
+Instruction: Consider the average treatment effect (ATE) of dzww on tnkc.
+Question: If dzww is changed to be low, will tnkc be more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "No", "PROB": "0.0000"}
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: The amount of exercise a person does per week has a direct effect on the person's physical fitness level. The amount of exercise a person does per week has a direct effect on the person's risk of developing chronic diseases.
+For those with the amount of exercise a person does per week being little, the probability of the person's physical fitness level being excellent is 0.2598. For those with the amount of exercise a person does per week being a lot, the probability of the person's physical fitness level being excellent is 0.5314.
+Instruction: Consider the average treatment effect (ATE) of the amount of exercise a person does per week on the person's physical fitness level.
+Question: If the amount of exercise a person does per week is changed to be little, will the person's physical fitness level be more likely to be excellent?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "No", "PROB": "-0.2716"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-CoT':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s Let's think step by step.
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-CoT-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s请逐步思考。
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'manual-CoT':
+    """Here are three examples for math problems about average treatment effect(ATE) task with chain of thought.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Alor has a direct effect on geer. Tnkc has a direct effect on dzww. Dzww has a direct effect on geer.
+
+Instruction: Consider the average treatment effect (ATE) of dzww on tnkc.
+Question: If dzww is changed to be low, will tnkc be more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With B represents tnkc and C represents dzww, we find there is no directed path from C to B. The answer is: {"ANSWER": "No", "PROB": "0.0000"}.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Tvkj has a direct effect on clwv. Clwv has a direct effect on bjtk. Bjtk has a direct effect on dmfl.
+For those with clwv being low, the probability of dmfl being high is 0.4780. For those with clwv being high, the probability of dmfl being high is 0.4949.
+Instruction: Consider the average treatment effect (ATE) of clwv on dmfl.
+Question: If clwv is changed to be low, will dmfl be more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With B represents clwv and D represents dmfl, we find P(D=1|B=0)=0.4780; P(D=1|B=1)=0.4949; Considering there is a path B->C->D from B to D, and in this situation, empty set is a valid backdoor adjustment set, we calculate ATE=P(D=1|do(B=0))-P(D=1|do(B=1))=P(D=1|B=0)-P(D=1|B=1)=0.4780-0.4949=-0.0169<0. The answer is:  {"ANSWER": "No", "PROB": "-0.0169"}.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Zavj has a direct effect on nvcm. Nvcm has a direct effect on sxxy.
+For those with nvcm being high, the probability of sxxy being high is 0.8173. For those with nvcm being low, the probability of sxxy being high is 0.7873.
+Instruction: Consider the average treatment effect (ATE) of nvcm on sxxy.
+Question: If nvcm is changed to be high, will sxxy be more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With B represents nvcm and C represents sxxy, we find P(C=1|B=1)=0.8173; P(C=1|B=0)=0.7873; Considering there is a path B->C from B to C, and in this situation empty set is a valid backdoor adjustment set, we calculate ATE=P(C=1|do(B=1))-P(C=1|do(B=0))=P(C=1|B=1)-P(C=1|B=0)=0.8173-0.7873=0.0300>0. The answer is: {"ANSWER": "Yes", "PROB": "0.0300"}.
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'manual-CoT-CN':
+    """如下为一个使用思维链进行推理的关于“平均干预效果”(average treatment effect, ATE)任务的数学问题：
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：是否为考试而学习对考试成绩有直接影响。考试成绩对学生是否通过课程有直接影响。
+在考试成绩为高的条件下, 学生是否通过课程为不及格的概率为0.9874。在考试成绩为低的条件下, 学生是否通过课程为不及格的概率为0.7798。
+指令：考虑考试成绩作用于学生是否通过课程的“平均干预效果”(average treatment effect, ATE)。
+问题：如果考试成绩被改变为高，那么学生是否通过课程更有可能为不及格吗？
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：用B代表考试成绩, C代表学生是否通过课程，B到C有一条或多条有向路径(例如B->C)，所以节点B是节点C的原因。考虑到P(C=0|B=1)=0.9874，P(C=0|B=0)=0.7798，且在该问题中有一个合法的后门调整集合：空集，所以ATE=0.9874-0.7798=0.2076>0。因此答案为{"ANSWER":"是","PROB":"0.2076"}。
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'explicit-function':
+    """You are a helpful assistant for math probability.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'explicit-function-CN':
+    """你是一个用于计算数学概率的得力助手。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['given_info'],
+                                        item['Background']['data_info'],
+                                        item['Instruction'], item['Question'])
+    return prompt
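In the `manual-CoT` examples of this ATE template, the effect is computed as a plain difference of two conditional probabilities. That shortcut is valid only because the empty set is a valid backdoor adjustment set. The following minimal Python sketch reproduces that arithmetic and the JSON reply format the prompts request. `ate_with_empty_adjustment` and `format_answer` are hypothetical helpers, not part of OpenCompass. Following the worked examples, the sketch answers "Yes" only when ATE > 0.

```python
import json


def ate_with_empty_adjustment(p_y_given_x1, p_y_given_x0):
    """ATE = P(Y=y|do(X=1)) - P(Y=y|do(X=0)).

    Assumes the empty set is a valid backdoor adjustment set, so each
    interventional probability equals the matching conditional probability.
    """
    return p_y_given_x1 - p_y_given_x0


def format_answer(ate, chinese=False):
    """Render the {"ANSWER": ..., "PROB": ...} reply the prompts ask for."""
    if chinese:
        answer = "是" if ate > 0 else "否"
    else:
        answer = "Yes" if ate > 0 else "No"
    # PROB is a string rounded to four decimal places, as in the prompt examples.
    return json.dumps({"ANSWER": answer, "PROB": f"{ate:.4f}"}, ensure_ascii=False)


# Numbers from the English manual-CoT example: P(C=1|B=1)=0.8173, P(C=1|B=0)=0.7873
print(format_answer(ate_with_empty_adjustment(0.8173, 0.7873)))
# {"ANSWER": "Yes", "PROB": "0.0300"}
```

If the empty set were not a valid adjustment set, the two conditionals would first have to be replaced by backdoor-adjusted estimates before taking the difference.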
--- a/opencompass/datasets/calm/data_processing/prompt/BAS-B_backadj.py
+++ b/opencompass/datasets/calm/data_processing/prompt/BAS-B_backadj.py
@ -0,0 +1,136 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'basic-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'adversarial-ignore': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'adversarial-doubt': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'zero-shot-IcL':
+    """Answer questions by considering what constitutes a valid adjustment set that can block all backdoor spurious correlations between two events.
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN': """通过考虑什么构成一个有效的调整集，以阻断两个事件之间所有后门伪相关，来回答问题。
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'one-shot-IcL':
+    """Answer questions by considering what constitutes a valid adjustment set that can block all backdoor spurious correlations between two events.
+Input Info: Method 1: We look at how husband correlates with alarm clock case by case according to wife. Method 2: We look directly at how husband correlates with alarm clock in general.
+Question: To understand how husband affects alarm clock, is it more correct to use the Method 1 than Method 2?
+Answer (Yes or No ?): no
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN': """通过考虑什么构成一个有效的调整集，以阻断两个事件之间所有后门伪相关，来回答问题。
+输入信息：方法1：根据妻子的情况，我们逐个研究丈夫与闹钟之间的关联；方法2：我们直接研究一般情况下丈夫与闹钟之间的关联。
+问题：要了解丈夫如何影响闹钟，使用方法1比方法2更准确吗？
+答案（是或否？）：否
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'three-shot-IcL':
+    """Answer questions by considering what constitutes a valid adjustment set that can block all backdoor spurious correlations between two events.
+Input Info: Method 1: We look at how husband correlates with alarm clock case by case according to wife. Method 2: We look directly at how husband correlates with alarm clock in general.
+Question: To understand how husband affects alarm clock, is it more correct to use the Method 1 than Method 2?
+Answer (Yes or No ?): no
+
+Input Info: Method 1: We look directly at how husband correlates with alarm clock in general. Method 2: We look at this correlation case by case according to wife.
+Question: To understand how husband affects alarm clock, is it more correct to use the Method 1 than Method 2?
+Answer (Yes or No ?): yes
+
+Input Info: Method 1: We look directly at how the man in the room correlates with room in general. Method 2: We look at this correlation case by case according to the candle.
+Question: To understand how the man in the room affects room, is it more correct to use the Method 1 than Method 2?
+Answer (Yes or No ?): yes
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN': """通过考虑什么构成一个有效的调整集，以阻断两个事件之间所有后门伪相关，来回答问题。
+输入信息：方法1：根据妻子的情况，我们逐个研究丈夫与闹钟之间的关联；方法2：我们直接研究一般情况下丈夫与闹钟之间的关联。
+问题：要了解丈夫如何影响闹钟，使用方法1比方法2更准确吗？
+答案（是或否？）：否
+
+输入信息：方法1：我们直接研究一般情况下丈夫与闹钟之间的关联。方法2：根据妻子的情况，我们逐个研究这种关联。
+问题：要了解丈夫如何影响闹钟，使用方法1比方法2更准确吗？
+答案（是或否？）：是
+
+输入信息：方法1：我们直接研究一般情况下房间里的男人与房间之间的关联；方法2：根据蜡烛，我们逐个研究这种关联。
+问题：要了解房间里的男人如何影响房间，使用方法1比方法2更准确吗？
+答案（是或否？）：是
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'zero-shot-CoT': """Input Info: %s
+Question: %s Let's think step by step.
+Answer (Yes or No ?):""",
+    'zero-shot-CoT-CN': """输入信息：%s
+问题：%s请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here are three examples of problems about backdoor adjustment sets, solved with chain of thought.
+Input Info: Method 1: We look directly at how jyka correlates with lirg in general. Method 2: We look at this correlation case by case according to gyzp.
+Question: To understand how jyka affects lirg, is it more correct to use the Method 1 than Method 2?
+Answer (Yes or No ?): Since gyzp is a confounder that affects both jyka and lirg, looking directly at the relation between jyka and lirg as in Method 1 is not correct. Therefore, the answer is No.
+
+Input Info: Method 1: We look directly at how encouragement level correlates with brown eyes in general. Method 2: We look at this correlation case by case according to studying habit.
+Question: To understand how encouragement level affects brown eyes, is it more correct to use the Method 1 than Method 2?
+Answer (Yes or No ?): Since studying habit is a result of encouragement level, there is no need to consider studying habit when studying the relation between encouragement level and brown eyes. Therefore, the answer is Yes.
+
+Input Info: Method 1: We look directly at how zuph correlates with glimx in general. Method 2: We look at this correlation case by case according to zory.
+Question: To understand how zuph affects glimx, is it more correct to use the Method 1 than Method 2?
+Answer (Yes or No ?): Since zory is a confounder that affects both zuph and glimx, looking at the correlation without considering zory is not correct. Therefore, the answer is No.
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'manual-CoT-CN': """如下为三个使用思维链进行推理的有关后门变量集合的问题：
+
+输入信息：方法1：我们直接研究一般情况下房间里的男人与房间之间的关联；方法2：根据蜡烛，我们逐个研究这种关联。
+问题：要了解房间里的男人如何影响房间，使用方法1比方法2更准确吗？
+答案（是或否？）：因为房间里的男人和蜡烛对房间的影响是相互独立的，所以蜡烛不会影响房间里的男人和房间之间的关联。因此方法1更好，答案为“是”。
+
+输入信息：方法1：我们直接研究一般情况下jyka与lirg之间的关联。方法2：根据gyzp，我们逐个研究这种关联。
+问题：要了解jyka如何影响lirg，使用方法1比方法2更准确吗？
+答案（是或否？）：因为gyzp作为混淆变量会同时影响jyka和lirg，使用方法1会导致对jyka和lirg之间的关联产生错误判断。因此答案为“否”。
+
+输入信息：方法1：我们直接研究一般情况下鼓励程度与考试成绩之间的关联。方法2：根据学习习惯，我们逐个研究这种关联。
+问题：要了解鼓励程度如何影响考试成绩，使用方法1比方法2更准确吗？
+答案（是或否？）：因为学习习惯是鼓励程度的结果，在研究鼓励程度与考试成绩之间的关联时无需考虑学习习惯。因此方法1更好，答案为“是”。
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'explicit-function':
+    """You are a helpful assistant for backdoor adjustment set.
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'explicit-function-CN': """你是一个用于后门调整的得力助手。
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['given_info'], item['question'])
+    return prompt
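The `manual-CoT` rationales in this BAS-B template choose between the two methods with a single question. Is the third variable a confounder that affects both events (so stratify on it), or a result of the treatment (so ignore it)? For the small three-variable stories used in this task, that rule can be sketched as follows. `is_confounder` and the edge lists are hypothetical illustrations. The check only follows directed paths, which is a simplification of the full back-door criterion.

```python
from collections import defaultdict


def _descendants(edges, start):
    """All nodes reachable from `start` by following directed edges."""
    children = defaultdict(set)
    for cause, effect in edges:
        children[cause].add(effect)
    seen, stack = set(), [start]
    while stack:
        for nxt in children[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_confounder(edges, x, y, z):
    """Return True if z is a common cause that opens x <- ... z -> ... y.

    Hypothetical helper for small graphs: z must not be a descendant of x,
    must cause x, and must reach y without passing through x.
    """
    if z in _descendants(edges, x):
        return False  # mediators and other effects of x are never adjusted for
    edges_avoiding_x = [(a, b) for a, b in edges if x not in (a, b)]
    return x in _descendants(edges, z) and y in _descendants(edges_avoiding_x, z)


# gyzp affects both jyka and lirg (per the manual-CoT rationale): stratify on gyzp.
print(is_confounder([("gyzp", "jyka"), ("gyzp", "lirg"), ("jyka", "lirg")], "jyka", "lirg", "gyzp"))
# True
```

When the helper returns True, the "case by case" method is the correct one. When the third variable is a result of the treatment, as with studying habit and encouragement level, it returns False and looking at the correlation directly is preferred.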
--- a/opencompass/datasets/calm/data_processing/prompt/BAS-C_max-BAS.py
+++ b/opencompass/datasets/calm/data_processing/prompt/BAS-C_max-BAS.py
@ -0,0 +1,322 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'basic-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最大变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'adversarial-ignore':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'adversarial-ignore-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最大变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'adversarial-doubt':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'adversarial-doubt-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最大变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'zero-shot-IcL':
+    """Objective:
+Given a causal graph with specified relationships between variables, your task is to identify the set of variables that satisfy the back-door criterion relative to an ordered pair of variables (X, Y) in that graph.
+Background Information:
+The back-door criterion is defined as follows:
+1. The variable set Z must not contain any descendants of X.
+2. The variable set Z must block every path from X to Y that has an arrow pointing to X.
+Input:
+- A textual description of the causal graph, detailing which variables cause which other variables.
+- A question asking for the maximal set of variables that satisfy the back-door criterion for a specified ordered pair of variables (X, Y).
+- Multiple choice options representing possible sets of variables that could satisfy the criterion.
+Output:
+- Answer to the question in the format "Option N", where N is 1, 2, or 3 based on the given options.
+
+You will be presented with a causal graph in the following form: %s.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'zero-shot-IcL-CN':
+    """给定一个具有指定变量间关系的因果图，你的任务是找出相对于该图中有序的一对变量（X，Y），满足后门标准的一组变量。
+背景信息：
+后门准则的定义如下：
+1. 变量集 Z 不能包含任何 X 的后代。
+2. 变量集 Z 必须阻断每一条从 X 到 Y 且含有指向 X 的箭头的路径。
+输入：
+- 因果图的文字描述，详细说明哪些变量会导致哪些其他变量。
+- 一个问题，要求找出满足指定有序变量对（X，Y）的后门标准的最大变量集。
+- 代表可能满足该标准的变量集的多选选项。
+输出：
+- 问题答案，格式为 "选项 N"，根据给定选项，N 为 一、二或 三。
+
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最大变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'one-shot-IcL':
+    """Objective:
+Given a causal graph with specified relationships between variables, your task is to identify the set of variables that satisfy the back-door criterion relative to an ordered pair of variables (X, Y) in that graph.
+Background Information:
+The back-door criterion is defined as follows:
+1. The variable set Z must not contain any descendants of X.
+2. The variable set Z must block every path from X to Y that has an arrow pointing to X.
+Input:
+- A textual description of the causal graph, detailing which variables cause which other variables.
+- A question asking for the maximal set of variables that satisfy the back-door criterion for a specified ordered pair of variables (X, Y).
+- Multiple choice options representing possible sets of variables that could satisfy the criterion.
+Output:
+- Answer to the question in the format "Option N", where N is 1, 2, or 3 based on the given options.
+
+Example:
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (A, E) in the above causal graph?
+Option 1: E, B
+Option 2: B, C
+Option 3: E, D
+Answer (Option 1 or Option 2 or Option 3 ?): Option 2
+
+New Input:
+You will be presented with a causal graph in the following form: %s.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'one-shot-IcL-CN':
+    """给定一个具有指定变量间关系的因果图，你的任务是找出相对于该图中有序的一对变量（X，Y），满足后门标准的一组变量。
+背景信息：
+后门准则的定义如下：
+1. 变量集 Z 不能包含任何 X 的后代。
+2. 变量集 Z 必须阻断每一条从 X 到 Y 且含有指向 X 的箭头的路径。
+输入：
+- 因果图的文字描述，详细说明哪些变量会导致哪些其他变量。
+- 一个问题，要求找出满足指定有序变量对（X，Y）的后门标准的最大变量集。
+- 代表可能满足该标准的变量集的多选选项。
+输出：
+- 问题答案，格式为 "选项 N"，根据给定选项，N 为 一、二或 三。
+
+例子：
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (A, E)，满足后门准则的最大变量集是哪个？
+选项一：E, B
+选项二：B, C
+选项三：E, D
+答案（选项一或选项二或选项三？）：选项二
+
+新输入：
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最大变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'three-shot-IcL':
+    """Objective:
+Given a causal graph with specified relationships between variables, your task is to identify the set of variables that satisfy the back-door criterion relative to an ordered pair of variables (X, Y) in that graph.
+Background Information:
+The back-door criterion is defined as follows:
+1. The variable set Z must not contain any descendants of X.
+2. The variable set Z must block every path from X to Y that has an arrow pointing to X.
+Input:
+- A textual description of the causal graph, detailing which variables cause which other variables.
+- A question asking for the maximal set of variables that satisfy the back-door criterion for a specified ordered pair of variables (X, Y).
+- Multiple choice options representing possible sets of variables that could satisfy the criterion.
+Output:
+- Answer to the question in the format "Option N", where N is 1, 2, or 3 based on the given options.
+
+Example:
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (A, E) in the above causal graph?
+Option 1: E, B
+Option 2: B, C
+Option 3: E, D
+Answer (Option 1 or Option 2 or Option 3 ?): Option 2
+
+You will be presented with a causal graph in the following form: A causes D, A causes C, B causes E, C causes D, and D causes E.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (B, E) in the above causal graph?
+Option 1: D, C
+Option 2: E, A
+Option 3: A, C
+Answer (Option 1 or Option 2 or Option 3 ?): Option 1
+
+You will be presented with a causal graph in the following form: A causes E, A causes D, B causes D, B causes E, C causes E, and D causes E.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (A, E) in the above causal graph?
+Option 1: B, D
+Option 2: A, C
+Option 3: B, C
+Answer (Option 1 or Option 2 or Option 3 ?): Option 3
+
+New Input:
+You will be presented with a causal graph in the following form: %s.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'three-shot-IcL-CN':
+    """给定一个具有指定变量间关系的因果图，你的任务是找出相对于该图中有序的一对变量（X，Y），满足后门标准的一组变量。
+背景信息：
+后门准则的定义如下：
+1. 变量集 Z 不能包含任何 X 的后代。
+2. 变量集 Z 必须阻断每一条从 X 到 Y 且含有指向 X 的箭头的路径。
+输入：
+- 因果图的文字描述，详细说明哪些变量会导致哪些其他变量。
+- 一个问题，要求找出满足指定有序变量对（X，Y）的后门标准的最大变量集。
+- 代表可能满足该标准的变量集的多选选项。
+输出：
+- 问题答案，格式为 "选项 N"，根据给定选项，N 为 一、二或 三。
+
+例子：
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (A, E)，满足后门准则的最大变量集是哪个？
+选项一：E, B
+选项二：B, C
+选项三：E, D
+答案（选项一或选项二或选项三？）：选项二
+
+给定如下因果图：A导致D, A导致C, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (B, E)，满足后门准则的最大变量集是哪个？
+选项一：D, C
+选项二：E, A
+选项三：A, C
+答案（选项一或选项二或选项三？）：选项一
+
+给定如下因果图：A导致E, A导致D, B导致D, B导致E, C导致E, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (A, E)，满足后门准则的最大变量集是哪个？
+选项一：B, D
+选项二：A, C
+选项三：B, C
+答案（选项一或选项二或选项三？）：选项三
+
+新输入：
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最大变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'zero-shot-CoT':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph? Let's think step by step.
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'zero-shot-CoT-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最大变量集是哪个？请逐步思考。
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'manual-CoT':
+    """Here are three examples of finding the maximal backdoor adjustment set using chain of thought, and a question to answer. Note that A is unobserved in the following questions.
+
+You will be presented with a causal graph in the following form: A causes B, A causes D, A causes E, B causes E, B causes D, B causes C, C causes D, C causes E, D causes F, and D causes E.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (D, F) in the above causal graph?
+Option 1: D, A
+Option 2: D, F
+Option 3: B, C
+Answer (Option 1 or Option 2 or Option 3 ?): Since E is a descendant of D, A is unobserved, and there is no path between D and F containing an arrow pointing into D, the maximal backdoor adjustment set is {B, C}. Therefore, the answer is Option 3.
+
+You will be presented with a causal graph in the following form: A causes G, A causes C, B causes F, B causes D, B causes E, B causes G, C causes E, D causes F, D causes G, E causes G, and E causes F.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (A, E) in the above causal graph?
+Option 1: E, F
+Option 2: F, A
+Option 3: B, D
+Answer (Option 1 or Option 2 or Option 3 ?): Since C, G, E and F are all descendants of A, and there is no path between A and E with an arrow pointing into A, the maximal backdoor adjustment set is {B, D}. Therefore, the answer is Option 3.
+
+You will be presented with a causal graph in the following form: A causes H, A causes C, A causes E, A causes G, A causes B, B causes D, C causes D, C causes E, D causes F, D causes G, and F causes H.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (B, D) in the above causal graph?
+Option 1: C, E
+Option 2: E, D
+Option 3: C, H
+Answer (Option 1 or Option 2 or Option 3 ?): Since F, G and H are all descendants of B, and C and E block all the paths from B to D with an arrow pointing into B, the maximal backdoor adjustment set is {C, E}. Therefore, the answer is Option 1.
+
+You will be presented with a causal graph in the following form: %s.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'manual-CoT-CN':
+    """如下为两个使用思维链进行推理的判断最大后门变量集合的示例，和一个需要回答的问题。注意在以下的问题设定中，A为未观测到的变量。
+
+给定如下因果图：A导致B, A导致D, A导致E, B导致E, B导致D, B导致C, C导致D, C导致E, D导致F, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (D, F)，满足后门准则的最大变量集是哪个？
+选项一：D, A
+选项二：D, F
+选项三：B, C
+答案（选项一或选项二或选项三？）：由于E是D的后代，A为未观测到的变量，D和F之间没有箭头指向D的路径，所以最大后门调整集是 {B，C}。因此答案为选项三。
+
+给定如下因果图：A导致G, A导致C, B导致F, B导致D, B导致E, B导致G, C导致E, D导致F, D导致G, E导致G, 以及E导致F。
+问题：对于上述因果图中的有序变量对 (A, E)，满足后门准则的最大变量集是哪个？
+选项一：E, F
+选项二：F, A
+选项三：B, D
+答案（选项一或选项二或选项三？）：由于C、G、E和F都是A的后代，A和E之间没有箭头指向A的路径，所以最大后门调整集是 {B，D}。因此答案为选项三。
+
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最大变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'explicit-function':
+    """You are a helpful assistant for adjustment set analysis (back-door criterion).
+You will be presented with a causal graph in the following form: %s.
+Question: Which is the maximal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'explicit-function-CN':
+    """你是一个用于调整集分析(后门准则)的得力助手。
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最大变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+    prompt = prompt_style_str + base % (item['edges'], item['treatment'],
+                                        item['outcome'], item['option1'],
+                                        item['option2'], item['option3'])
+    return prompt
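The BAS-C tasks, this `max-BAS` template and the `min-BAS` template that follows it, ask for the largest or smallest set that satisfies the back-door criterion stated in the `*-IcL` prompts. One way to sanity-check an answer key is a brute-force sketch. A set Z qualifies if it contains no descendant of X and d-separates X and Y once the arrows out of X are deleted. d-separation is tested on the moralized ancestral graph. The helper names, the `unobserved` parameter, and the sentence parser are assumptions made for illustration. The parser expects the "A causes B, ..., and C causes D" phrasing used in these prompts. Here "maximal" and "minimal" mean largest and smallest by size, with ties resolved by enumeration order. The enumeration is exponential, so it only suits small graphs like these.

```python
from itertools import combinations


def parse_edges(description):
    """Parse 'A causes D, ..., and D causes E' into (cause, effect) pairs."""
    clauses = description.replace(", and ", ", ").split(", ")
    return [tuple(part.strip() for part in c.split(" causes ")) for c in clauses]


def ancestors(edges, nodes):
    """`nodes` together with every node that has a directed path into them."""
    found, stack = set(nodes), list(nodes)
    while stack:
        node = stack.pop()
        for cause, effect in edges:
            if effect == node and cause not in found:
                found.add(cause)
                stack.append(cause)
    return found


def descendants(edges, node):
    """Every node reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in edges:
            if cause == current and effect not in found:
                found.add(effect)
                stack.append(effect)
    return found


def d_separated(edges, x, y, z):
    """Test x _||_ y | z via the moralized ancestral graph."""
    keep = ancestors(edges, {x, y} | set(z))
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    neighbours = {n: set() for n in keep}
    for a, b in sub:
        neighbours[a].add(b)
        neighbours[b].add(a)
    for child in keep:  # "marry" parents that share a child
        for p, q in combinations([a for a, b in sub if b == child], 2):
            neighbours[p].add(q)
            neighbours[q].add(p)
    seen, stack = {x}, [x]
    while stack:  # look for an undirected x-y connection that avoids z
        node = stack.pop()
        if node == y:
            return False
        for nxt in neighbours[node] - seen - set(z):
            seen.add(nxt)
            stack.append(nxt)
    return True


def satisfies_backdoor(edges, x, y, z):
    """No descendant of x in z, and z blocks every path with an arrow into x."""
    if set(z) & descendants(edges, x):
        return False
    without_x_out = [(a, b) for a, b in edges if a != x]
    return d_separated(without_x_out, x, y, z)


def backdoor_sets(edges, x, y, unobserved=()):
    """All observed sets satisfying the criterion, listed smallest first."""
    nodes = {n for edge in edges for n in edge}
    candidates = sorted(nodes - {x, y} - descendants(edges, x) - set(unobserved))
    return [set(z) for k in range(len(candidates) + 1)
            for z in combinations(candidates, k)
            if satisfies_backdoor(edges, x, y, z)]


# Graph from the one-shot examples of both BAS-C templates.
graph = parse_edges("A causes D, A causes E, B causes E, C causes D, and D causes E")
print(sorted(max(backdoor_sets(graph, "A", "E"), key=len)))  # ['B', 'C']
print(sorted(min(backdoor_sets(graph, "A", "C"), key=len)))  # []
```

Both printed sets agree with the one-shot answer keys: Option 2 (B, C) for the maximal set of (A, E), and the empty Option 1 for the minimal set of (A, C).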
--- a/opencompass/datasets/calm/data_processing/prompt/BAS-C_min-BAS.py
+++ b/opencompass/datasets/calm/data_processing/prompt/BAS-C_min-BAS.py
@ -0,0 +1,353 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'basic-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最小变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'adversarial-ignore':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'adversarial-ignore-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最小变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'adversarial-doubt':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'adversarial-doubt-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最小变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'zero-shot-IcL':
+    """Objective:
+Given a causal graph with specified relationships between variables, your task is to identify the set of variables that satisfy the back-door criterion relative to an ordered pair of variables (X, Y) in that graph.
+Background Information:
+The back-door criterion is defined as follows:
+1. The variable set Z must not contain any descendants of X.
+2. The variable set Z must block every path from X to Y that has an arrow pointing to X.
+Input:
+- A textual description of the causal graph, detailing which variables cause which other variables.
+- A question asking for the minimal set of variables that satisfy the back-door criterion for a specified ordered pair of variables (X, Y).
+- Multiple choice options representing possible sets of variables that could satisfy the criterion.
+Output:
+- Answer to the question in the format "Option N", where N is 1, 2, or 3 based on the given options.
+
+You will be presented with a causal graph in the following form: %s.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'zero-shot-IcL-CN':
+    """给定一个具有指定变量间关系的因果图，你的任务是找出相对于该图中有序的一对变量（X，Y），满足后门标准的一组变量。
+背景信息：
+后门准则的定义如下：
+1. 变量集 Z 不能包含任何 X 的后代。
+2. 变量集 Z 必须阻断每一条从 X 到 Y 且含有指向 X 的箭头的路径。
+输入：
+- 因果图的文字描述，详细说明哪些变量会导致哪些其他变量。
+- 一个问题，要求找出满足指定有序变量对（X，Y）的后门标准的最小变量集。
+- 代表可能满足该标准的变量集的多选选项。
+输出：
+- 问题答案，格式为 "选项 N"，根据给定选项，N 为 一、二或 三。
+
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最小变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'one-shot-IcL':
+    """Objective:
+Given a causal graph with specified relationships between variables, your task is to identify the set of variables that satisfy the back-door criterion relative to an ordered pair of variables (X, Y) in that graph.
+Background Information:
+The back-door criterion is defined as follows:
+1. The variable set Z must not contain any descendants of X.
+2. The variable set Z must block every path from X to Y that has an arrow pointing to X.
+Input:
+- A textual description of the causal graph, detailing which variables cause which other variables.
+- A question asking for the minimal set of variables that satisfy the back-door criterion for a specified ordered pair of variables (X, Y).
+- Multiple choice options representing possible sets of variables that could satisfy the criterion.
+Output:
+- Answer to the question in the format "Option N", where N is 1, 2, or 3 based on the given options.
+
+Example:
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (A, C) in the above causal graph?
+Option 1:
+Option 2: A
+Option 3: E
+Answer (Option 1 or Option 2 or Option 3 ?): Option 1
+
+New Input:
+You will be presented with a causal graph in the following form: %s.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'one-shot-IcL-CN':
+    """给定一个具有指定变量间关系的因果图，你的任务是找出相对于该图中有序的一对变量（X，Y），满足后门标准的一组变量。
+背景信息：
+后门准则的定义如下：
+1. 变量集 Z 不能包含任何 X 的后代。
+2. 变量集 Z 必须阻断每一条从 X 到 Y 且含有指向 X 的箭头的路径。
+输入：
+- 因果图的文字描述，详细说明哪些变量会导致哪些其他变量。
+- 一个问题，要求找出满足指定有序变量对（X，Y）的后门标准的最小变量集。
+- 代表可能满足该标准的变量集的多选选项。
+输出：
+- 问题答案，格式为 "选项 N"，根据给定选项，N 为 一、二或 三。
+
+例子：
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (A, C)，满足后门准则的最小变量集是哪个？
+选项一：
+选项二：A
+选项三：E
+答案（选项一或选项二或选项三？）：选项一
+
+新输入：
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最小变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'three-shot-IcL':
+    """Objective:
+Given a causal graph with specified relationships between variables, your task is to identify the set of variables that satisfy the back-door criterion relative to an ordered pair of variables (X, Y) in that graph.
+Background Information:
+The back-door criterion is defined as follows:
+1. The variable set Z must not contain any descendants of X.
+2. The variable set Z must block every path from X to Y that has an arrow pointing to X.
+Input:
+- A textual description of the causal graph, detailing which variables cause which other variables.
+- A question asking for the minimal set of variables that satisfy the back-door criterion for a specified ordered pair of variables (X, Y).
+- Multiple choice options representing possible sets of variables that could satisfy the criterion.
+Output:
+- Answer to the question in the format "Option N", where N is 1, 2, or 3 based on the given options.
+
+Example:
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (A, C) in the above causal graph?
+Option 1:
+Option 2: A
+Option 3: E
+Answer (Option 1 or Option 2 or Option 3 ?): Option 1
+
+You will be presented with a causal graph in the following form: A causes B, A causes C, B causes E, B causes C, C causes D, and C causes E.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (C, E) in the above causal graph?
+Option 1: E
+Option 2: D
+Option 3: B
+Answer (Option 1 or Option 2 or Option 3 ?): Option 3
+
+You will be presented with a causal graph in the following form: A causes C, A causes D, A causes E, B causes D, B causes E, C causes D, and D causes E.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (D, D) in the above causal graph?
+Option 1: B
+Option 2:
+Option 3: C
+Answer (Option 1 or Option 2 or Option 3 ?): Option 2
+
+New Input:
+You will be presented with a causal graph in the following form: %s.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'three-shot-IcL-CN':
+    """给定一个具有指定变量间关系的因果图，你的任务是找出相对于该图中有序的一对变量（X，Y），满足后门标准的一组变量。
+背景信息：
+后门准则的定义如下：
+1. 变量集 Z 不能包含任何 X 的后代。
+2. 变量集 Z 必须阻断每一条从 X 到 Y 且含有指向 X 的箭头的路径。
+输入：
+- 因果图的文字描述，详细说明哪些变量会导致哪些其他变量。
+- 一个问题，要求找出满足指定有序变量对（X，Y）的后门标准的最小变量集。
+- 代表可能满足该标准的变量集的多选选项。
+输出：
+- 问题答案，格式为 "选项 N"，根据给定选项，N 为 一、二或 三。
+
+例子：
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (A, C)，满足后门准则的最小变量集是哪个？
+选项一：
+选项二：A
+选项三：E
+答案（选项一或选项二或选项三？）：选项一
+
+给定如下因果图：A导致B, A导致C, B导致E, B导致C, C导致D, 以及C导致E。
+问题：对于上述因果图中的有序变量对 (C, E)，满足后门准则的最小变量集是哪个？
+选项一：E
+选项二：D
+选项三：B
+答案（选项一或选项二或选项三？）：选项三
+
+给定如下因果图：A导致C, A导致D, A导致E, B导致D, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (D, D)，满足后门准则的最小变量集是哪个？
+选项一：B
+选项二：
+选项三：C
+答案（选项一或选项二或选项三？）：选项二
+
+新输入：
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最小变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'zero-shot-CoT':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph? Let's think step by step.
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'zero-shot-CoT-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最小变量集是哪个？请逐步思考。
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'manual-CoT':
+    """Here are eight examples of finding the minimal backdoor adjustment set using chain of thought.
+
+You will be presented with a causal graph in the following form: A causes C, A causes D, B causes C, B causes D, and C causes E.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (B, E) in the above causal graph?
+Option 1: D
+Option 2:
+Option 3: C
+Answer (Option 1 or Option 2 or Option 3 ?): B does not have any parents, so there is no path from B to E with an arrow pointing into B. C and D are descendants of B, thus the minimal adjustment set is empty. The answer is Option 2.
+
+You will be presented with a causal graph in the following form: A causes F, A causes D, B causes D, B causes C, B causes E, B causes F, C causes D, C causes F, D causes E, and D causes F.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (C, D) in the above causal graph?
+Option 1: D
+Option 2: C
+Option 3: B
+Answer (Option 1 or Option 2 or Option 3 ?): B causes C and B causes D, so B blocks the path C<-B->D. C does not have any other ancestor, thus B blocks every path from C to D with an arrow pointing into C. The answer is Option 3.
+
+You will be presented with a causal graph in the following form: A causes D, A causes F, A causes E, B causes D, B causes F, B causes C, C causes D, C causes E, and E causes F.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (C, D) in the above causal graph?
+Option 1: B
+Option 2: C
+Option 3: E
+Answer (Option 1 or Option 2 or Option 3 ?): B causes C and B causes D, so B blocks the path C<-B->D. C does not have any other ancestor, thus B blocks every path from C to D with an arrow pointing into C. The answer is Option 1.
+
+You will be presented with a causal graph in the following form: A causes D, B causes C, B causes D, C causes E, and C causes D.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (C, D) in the above causal graph?
+Option 1: D
+Option 2: B
+Option 3: A
+Answer (Option 1 or Option 2 or Option 3 ?): B causes C and B causes D, so B blocks the path C<-B->D. C does not have any other ancestor, thus B blocks every path from C to D with an arrow pointing into C. The answer is Option 2.
+
+You will be presented with a causal graph in the following form: A causes G, A causes F, A causes C, B causes G, B causes E, B causes D, B causes F, C causes E, C causes G, D causes G, E causes G, and F causes G.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (D, G) in the above causal graph?
+Option 1: A
+Option 2: D
+Option 3: B
+Answer (Option 1 or Option 2 or Option 3 ?): B is the only parent of D, and B blocks all the paths from D to G with an arrow pointing into D, like D<-B->G and D<-B->E->G. The answer is Option 3.
+
+You will be presented with a causal graph in the following form: A causes B, A causes F, B causes D, B causes F, B causes E, C causes E, C causes F, D causes E, and E causes F.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (D, E) in the above causal graph?
+Option 1: B
+Option 2: E
+Option 3: C
+Answer (Option 1 or Option 2 or Option 3 ?): B is the only parent of D, and B blocks all the paths from D to E with an arrow pointing into D, like D<-B->E. The answer is Option 1.
+
+You will be presented with a causal graph in the following form: A causes D, A causes C, A causes E, A causes B, B causes C, B causes E, B causes D, C causes F, C causes D, and E causes F.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (C, F) in the above causal graph?
+Option 1: A
+Option 2: B
+Option 3: E
+Answer (Option 1 or Option 2 or Option 3 ?): E is not a descendant of C, and E blocks every path from C to F with an arrow pointing into C, like C<-B->E->F. The answer is Option 3.
+
+You will be presented with a causal graph in the following form: A causes B, A causes F, A causes C, B causes F, C causes E, C causes F, D causes F, D causes E, and E causes F.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (E, F) in the above causal graph?
+Option 1: A
+Option 2: D, C
+Option 3: F
+Answer (Option 1 or Option 2 or Option 3 ?): D and C are the two parents of E. Together, C and D block every path from E to F with an arrow pointing into E, and no smaller variable set does. The answer is Option 2.
+
+You will be presented with a causal graph in the following form: %s.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):
+""",
+    'manual-CoT-CN':
+    """如下为两个使用思维链进行推理的判断最小后门变量集合的示例，和一个需要回答的问题。
+
+给定如下因果图：A导致C, A导致D, B导致C, B导致D, 以及C导致E。
+问题：对于上述因果图中的有序变量对 (B, E)，满足后门准则的最小变量集是哪个？
+选项一：D
+选项二：
+选项三：C
+答案（选项一或选项二或选项三？）：由于D和C是B的后代，B和E之间没有箭头指向B的路径，所以最小后门集合为空集。因此答案为选项二。
+
+给定如下因果图：A导致D, A导致B, A导致E, B导致C, B导致E, C导致D, C导致E, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (C, D)，满足后门准则的最小变量集是哪个？
+选项一：C
+选项二：B
+选项三：D
+答案（选项一或选项二或选项三？）：B阻断了C和D之间所有含有指向C的边的路径，例如C<-B<-A->D，所以最小后门集合是{B}。因此答案为选项二。
+
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最小变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'explicit-function':
+    """You are a helpful assistant for adjustment set analysis (back-door criterion).
+You will be presented with a causal graph in the following form: %s.
+Question: Which is the minimal set of variables that satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'explicit-function-CN':
+    """你是一个用于调整集分析(后门准则)的得力助手。
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的最小变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+    prompt = prompt_style_str + base % (item['edges'], item['treatment'],
+                                        item['outcome'], item['option1'],
+                                        item['option2'], item['option3'])
+    return prompt
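The minimal-set answers in the few-shot examples above can be checked mechanically. The sketch below is not part of this commit: it assumes the English "X causes Y" edge phrasing used in these prompts, and every function name in it (`parse_edges`, `descendants`, `d_separated`, `satisfies_backdoor`) is hypothetical. It tests the graphical form of the back-door criterion: Z may contain no descendant of X, and Z must d-separate X and Y once the edges leaving X are removed. d-separation is decided with the moralized ancestral graph.

```python
import re
from itertools import combinations


def parse_edges(text):
    """Turn 'A causes B, ..., and D causes E' into (parent, child) pairs.

    Assumes single-token variable names and the English phrasing above.
    """
    return re.findall(r'(\w+) causes (\w+)', text)


def descendants(edges, node):
    """Return every node reachable from `node` along directed edges."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def d_separated(edges, x, y, z):
    """Decide whether z d-separates x and y (moralized ancestral graph test)."""
    z = set(z)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    # 1. Keep only x, y, z and their ancestors.
    keep, stack = set(), [x, y, *z]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))
    # 2. Moralize: drop edge directions and connect co-parents.
    adj = {node: set() for node in keep}
    for child in keep:
        ps = parents.get(child, set())
        for p in ps:
            adj[p].add(child)
            adj[child].add(p)
        for a, b in combinations(ps, 2):
            adj[a].add(b)
            adj[b].add(a)
    # 3. Remove z; x and y are d-separated iff y is now unreachable from x.
    seen, stack = {x}, [x]
    while stack:
        for nb in adj[stack.pop()] - z:
            if nb == y:
                return False
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return True


def satisfies_backdoor(edges, x, y, z):
    """Back-door criterion for the ordered pair (x, y)."""
    if set(z) & descendants(edges, x):
        return False
    # After deleting x's outgoing edges, every x-y path starts with an arrow into x.
    pruned = [(p, c) for p, c in edges if p != x]
    return d_separated(pruned, x, y, z)


graph = parse_edges('A causes B, A causes F, A causes C, B causes F, C causes E, '
                    'C causes F, D causes F, D causes E, and E causes F')
print(satisfies_backdoor(graph, 'E', 'F', ['C', 'D']))  # True
print(satisfies_backdoor(graph, 'E', 'F', ['C']))       # False: E<-D->F stays open
```

Checking every candidate set this way makes minimality easy to confirm: a set is minimal when it passes and none of its proper subsets do.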
--- a/opencompass/datasets/calm/data_processing/prompt/BAS-C_mix-BAS.py
+++ b/opencompass/datasets/calm/data_processing/prompt/BAS-C_mix-BAS.py
@ -0,0 +1,357 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'basic-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：
+""",
+    'adversarial-ignore':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'adversarial-ignore-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：
+""",
+    'adversarial-doubt':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'adversarial-doubt-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：
+""",
+    'zero-shot-IcL':
+    """Objective:
+Given a causal graph with specified relationships between variables, your task is to identify the set of variables that satisfy the back-door criterion relative to an ordered pair of variables (X, Y) in that graph.
+Background Information:
+The back-door criterion is defined as follows:
+1. The variable set Z must not contain any descendants of X.
+2. The variable set Z must block every path from X to Y that has an arrow pointing to X.
+Input:
+- A textual description of the causal graph, detailing which variables cause which other variables.
+- A question asking for the set of variables that satisfy the back-door criterion for a specified ordered pair of variables (X, Y).
+- Multiple choice options representing possible sets of variables that could satisfy the criterion.
+Output:
+- Answer to the question in the format "Option N", where N is 1, 2, or 3 based on the given options.
+
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'zero-shot-IcL-CN':
+    """给定一个具有指定变量间关系的因果图，你的任务是找出相对于该图中有序的一对变量（X，Y），满足后门标准的一组变量。
+背景信息：
+后门准则的定义如下：
+1. 变量集 Z 不能包含任何 X 的后代。
+2. 变量集 Z 必须阻断每一条从 X 到 Y 且含有指向 X 的箭头的路径。
+输入：
+- 因果图的文字描述，详细说明哪些变量会导致哪些其他变量。
+- 一个问题，要求找出满足指定有序变量对（X，Y）的后门标准的变量集。
+- 代表可能满足该标准的变量集的多选选项。
+输出：
+- 问题答案，格式为 "选项 N"，根据给定选项，N 为 一、二或 三。
+
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'one-shot-IcL':
+    """Objective:
+Given a causal graph with specified relationships between variables, your task is to identify the set of variables that satisfy the back-door criterion relative to an ordered pair of variables (X, Y) in that graph.
+Background Information:
+The back-door criterion is defined as follows:
+1. The variable set Z must not contain any descendants of X.
+2. The variable set Z must block every path from X to Y that has an arrow pointing to X.
+Input:
+- A textual description of the causal graph, detailing which variables cause which other variables.
+- A question asking for the set of variables that satisfy the back-door criterion for a specified ordered pair of variables (X, Y).
+- Multiple choice options representing possible sets of variables that could satisfy the criterion.
+Output:
+- Answer to the question in the format "Option N", where N is 1, 2, or 3 based on the given options.
+
+Example:
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (B, B) in the above causal graph?
+Option 1:
+Option 2: C
+Option 3: E
+Answer (Option 1 or Option 2 or Option 3 ?): Option 1
+
+New Input:
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'one-shot-IcL-CN':
+    """给定一个具有指定变量间关系的因果图，你的任务是找出相对于该图中有序的一对变量（X，Y），满足后门标准的一组变量。
+背景信息：
+后门准则的定义如下：
+1. 变量集 Z 不能包含任何 X 的后代。
+2. 变量集 Z 必须阻断每一条从 X 到 Y 且含有指向 X 的箭头的路径。
+输入：
+- 因果图的文字描述，详细说明哪些变量会导致哪些其他变量。
+- 一个问题，要求找出满足指定有序变量对（X，Y）的后门标准的变量集。
+- 代表可能满足该标准的变量集的多选选项。
+输出：
+- 问题答案，格式为 "选项 N"，根据给定选项，N 为 一、二或 三。
+
+例子：
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (B, B)，满足后门准则的变量集是哪个？
+选项一：
+选项二：C
+选项三：E
+答案（选项一或选项二或选项三？）：选项一
+
+新输入：
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'three-shot-IcL':
+    """Objective:
+Given a causal graph with specified relationships between variables, your task is to identify the set of variables that satisfy the back-door criterion relative to an ordered pair of variables (X, Y) in that graph.
+Background Information:
+The back-door criterion is defined as follows:
+1. The variable set Z must not contain any descendants of X.
+2. The variable set Z must block every path from X to Y that has an arrow pointing to X.
+Input:
+- A textual description of the causal graph, detailing which variables cause which other variables.
+- A question asking for the set of variables that satisfy the back-door criterion for a specified ordered pair of variables (X, Y).
+- Multiple choice options representing possible sets of variables that could satisfy the criterion.
+Output:
+- Answer to the question in the format "Option N", where N is 1, 2, or 3 based on the given options.
+
+Example:
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (B, B) in the above causal graph?
+Option 1:
+Option 2: C
+Option 3: E
+Answer (Option 1 or Option 2 or Option 3 ?): Option 1
+
+You will be presented with a causal graph in the following form: A causes C, A causes D, A causes E, B causes D, B causes E, C causes D, and D causes E.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (A, D) in the above causal graph?
+Option 1: C
+Option 2: B
+Option 3: E
+Answer (Option 1 or Option 2 or Option 3 ?): Option 2
+
+You will be presented with a causal graph in the following form: A causes E, A causes D, B causes D, B causes E, C causes E, and D causes E.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (A, B) in the above causal graph?
+Option 1: A
+Option 2: D
+Option 3:
+Answer (Option 1 or Option 2 or Option 3 ?): Option 3
+
+New Input:
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'three-shot-IcL-CN':
+    """给定一个具有指定变量间关系的因果图，你的任务是找出相对于该图中有序的一对变量（X，Y），满足后门标准的一组变量。
+背景信息：
+后门准则的定义如下：
+1. 变量集 Z 不能包含任何 X 的后代。
+2. 变量集 Z 必须阻断每一条从 X 到 Y 且含有指向 X 的箭头的路径。
+输入：
+- 因果图的文字描述，详细说明哪些变量会导致哪些其他变量。
+- 一个问题，要求找出满足指定有序变量对（X，Y）的后门标准的变量集。
+- 代表可能满足该标准的变量集的多选选项。
+输出：
+- 问题答案，格式为 "选项 N"，根据给定选项，N 为 一、二或 三。
+
+例子：
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (B, B)，满足后门准则的变量集是哪个？
+选项一：
+选项二：C
+选项三：E
+答案（选项一或选项二或选项三？）：选项一
+
+给定如下因果图：A导致C, A导致D, A导致E, B导致D, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (A, D)，满足后门准则的变量集是哪个？
+选项一：C
+选项二：B
+选项三：E
+答案（选项一或选项二或选项三？）：选项二
+
+给定如下因果图：A导致E, A导致D, B导致D, B导致E, C导致E, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (A, B)，满足后门准则的变量集是哪个？
+选项一：A
+选项二：D
+选项三：
+答案（选项一或选项二或选项三？）：选项三
+
+新输入：
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'zero-shot-CoT':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph? Let's think step by step.
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'zero-shot-CoT-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的变量集是哪个？请逐步思考。
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'manual-CoT':
+    """Here are eight examples of finding a mixed back-door adjustment set with chain of thought. Note that A is unobserved in the following questions.
+
+You will be presented with a causal graph in the following form: A causes D, A causes B, C causes E, and D causes E.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (C, B) in the above causal graph?
+Option 1: B
+Option 2: D
+Option 3:
+Answer (Option 1 or Option 2 or Option 3 ?): Since there is no path between C and B with an arrow pointing into C, no adjustment is needed and the empty set satisfies the back-door criterion. The answer is Option 3.
+
+You will be presented with a causal graph in the following form: A causes B, A causes E, B causes C, B causes E, C causes E, and D causes E.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (A, E) in the above causal graph?
+Option 1: C
+Option 2: A
+Option 3: D
+Answer (Option 1 or Option 2 or Option 3 ?): Since B and C are both descendants of A and no path between A and E contains an arrow pointing into A, only D satisfies the back-door criterion. The answer is Option 3.
+
+You will be presented with a causal graph in the following form: A causes F, A causes D, A causes E, A causes B, B causes E, B causes F, C causes F, D causes E, and E causes F.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (A, F) in the above causal graph?
+Option 1: D
+Option 2: C
+Option 3: E
+Answer (Option 1 or Option 2 or Option 3 ?): Since B, D and E are all descendants of A and no path between A and F contains an arrow pointing into A, only C satisfies the back-door criterion. The answer is Option 2.
+
+You will be presented with a causal graph in the following form: A causes C, B causes E, C causes D, and C causes E.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (C, A) in the above causal graph?
+Option 1:
+Option 2: C
+Option 3: B
+Answer (Option 1 or Option 2 or Option 3 ?): Since D and E are both the descendants of C, and no path between C and A contains an arrow pointing into C, thus no variable satisfies the back-door criterion. The answer is Option 1.
+
+You will be presented with a causal graph in the following form: A causes E, A causes D, A causes C, B causes F, C causes D, C causes E, D causes F, D causes E, and E causes F.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (E, F) in the above causal graph?
+Option 1: A
+Option 2: B
+Option 3: D
+Answer (Option 1 or Option 2 or Option 3 ?): Since D is not a descendant of E, and D blocks every path between E and F containing an arrow pointing into E, D satisfies the back-door criterion. The answer is Option 3.
+
+You will be presented with a causal graph in the following form: A causes E, A causes B, A causes C, B causes F, C causes D, and C causes E.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (B, F) in the above causal graph?
+Option 1: E, D, C
+Option 2: C, E, B
+Option 3: F, D, C
+Answer (Option 1 or Option 2 or Option 3 ?): Since E, C and D are not descendants of B, and no path between B and F contains an arrow pointing into B, the set {E, D, C} satisfies the back-door criterion. The answer is Option 1.
+
+You will be presented with a causal graph in the following form: A causes E, A causes B, A causes C, B causes D, B causes C, C causes D, and D causes E.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (C, D) in the above causal graph?
+Option 1: C
+Option 2: B
+Option 3: E
+Answer (Option 1 or Option 2 or Option 3 ?): Since E is a descendant of C, A is unobserved and B blocks every path between C and D containing an arrow pointing into C, B satisfies the back-door criterion. The answer is Option 2.
+
+You will be presented with a causal graph in the following form: A causes C, A causes B, A causes E, B causes D, C causes D, and D causes E.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (B, D) in the above causal graph?
+Option 1: D
+Option 2: C
+Option 3: B
+Answer (Option 1 or Option 2 or Option 3 ?): Since E is a descendant of B, A is unobserved and C blocks every path between B and D with an arrow pointing into B, C satisfies the back-door criterion. The answer is Option 2.
+
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'manual-CoT-CN':
+    """如下为两个使用思维链进行推理的判断后门变量集合的示例，和一个需要回答的问题。
+
+给定如下因果图：A导致D, A导致B, C导致E, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (C, B)，满足后门准则的变量集是哪个？
+选项一：B
+选项二：D
+选项三：
+答案（选项一或选项二或选项三？）：因为C和B之间没有含有指向C的边的路径，所以后门集合为空集。因此答案为选项三。
+
+给定如下因果图：A导致F, A导致D, A导致E, A导致B, B导致E, B导致F, C导致F, D导致E, 以及E导致F。
+问题：对于上述因果图中的有序变量对 (A, F)，满足后门准则的变量集是哪个？
+选项一：D
+选项二：C
+选项三：E
+答案（选项一或选项二或选项三？）：由于B, D和E都是A的后代，A和F之间没有箭头指向A的路径，因此C满足后门准则。答案为选项二。
+
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：
+""",
+    'explicit-function':
+    """You are a helpful assistant for adjustment set analysis (back-door criterion).
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the back-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'explicit-function-CN':
+    """你是一个用于调整集分析(后门准则)的得力助手。
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足后门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：
+""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+    prompt = prompt_style_str + base % (item['edges'], item['treatment'],
+                                        item['outcome'], item['option1'],
+                                        item['option2'], item['option3'])
+    return prompt
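Most of the manual-CoT answers in this file first rule out options that name the treatment, one of its descendants, or the unobserved variable A. A minimal sketch of that screening step, not part of this commit: it assumes the English "X causes Y" phrasing, treats A as unobserved (as the manual-CoT preamble states), and also excludes the treatment and outcome themselves as a convention. The helper names are hypothetical. Passing the screen is necessary but not sufficient: a surviving option must still block every back-door path.

```python
import re


def parse_edges(text):
    """Turn 'A causes B, ..., and D causes E' into (parent, child) pairs."""
    return re.findall(r'(\w+) causes (\w+)', text)


def descendants(edges, node):
    """Return every node reachable from `node` along directed edges."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def passes_prescreen(edges, x, y, option, unobserved=frozenset({'A'})):
    """Reject an option that names x, y, a descendant of x, or an unobserved
    variable. This is a necessary condition only, not the full criterion."""
    chosen = {v.strip() for v in option.split(',') if v.strip()}
    forbidden = descendants(edges, x) | {x, y} | set(unobserved)
    return not (chosen & forbidden)


# Few-shot example with treatment B and outcome F from the manual-CoT prompt.
edges = parse_edges('A causes E, A causes B, A causes C, B causes F, '
                    'C causes D, and C causes E')
options = ['E, D, C', 'C, E, B', 'F, D, C']
print([i for i, opt in enumerate(options, start=1)
       if passes_prescreen(edges, 'B', 'F', opt)])  # [1]
```

An empty option string parses to the empty set and always passes the screen, which matches the examples whose answer is the empty adjustment set.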
--- a/opencompass/datasets/calm/data_processing/prompt/CA-B_FA.py
+++ b/opencompass/datasets/calm/data_processing/prompt/CA-B_FA.py
@ -0,0 +1,172 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the ancestor node of %s?
+Answer (Yes or No ?):""",
+    'basic-CN':
+    """给定如下因果图：%s。
+问题：%s是%s的祖先节点吗？
+答案（是或否？）：""",
+    'adversarial-ignore':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the ancestor node of %s?
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN':
+    """给定如下因果图：%s。
+问题：%s是%s的祖先节点吗？
+答案（是或否？）：""",
+    'adversarial-doubt':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the ancestor node of %s?
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN':
+    """给定如下因果图：%s。
+问题：%s是%s的祖先节点吗？
+答案（是或否？）：""",
+    'zero-shot-IcL':
+    """Determine whether or not a variable can serve as the ancestor of another variable in a given causal graph.
+You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the ancestor node of %s?
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN':
+    """确定在给定的因果图中，一个变量是否可以作为另一个变量的祖先。
+给定如下因果图：%s。
+问题：%s是%s的祖先节点吗？
+答案（是或否？）：""",
+    'one-shot-IcL':
+    """Determine whether or not a variable can serve as the ancestor of another variable in a given causal graph.
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Does A serve as the ancestor node of E?
+Answer (Yes or No ?): Yes
+
+You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the ancestor node of %s?
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN':
+    """确定在给定的因果图中，一个变量是否可以作为另一个变量的祖先。
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：A是E的祖先节点吗？
+答案（是或否？）：是
+
+给定如下因果图：%s。
+问题：%s是%s的祖先节点吗？
+答案（是或否？）：""",
+    'three-shot-IcL':
+    """Determine whether or not a variable can serve as the ancestor of another variable in a given causal graph.
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Does A serve as the ancestor node of E?
+Answer (Yes or No ?): Yes
+
+You will be presented with a causal graph in the following form: A causes D, A causes B, C causes E, and D causes E.
+Question: Does B serve as the ancestor node of E?
+Answer (Yes or No ?): No
+
+You will be presented with a causal graph in the following form: A causes B, A causes D, A causes E, B causes C, B causes E, C causes D, and D causes E.
+Question: Does E serve as the ancestor node of E?
+Answer (Yes or No ?): No
+
+You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the ancestor node of %s?
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN':
+    """确定在给定的因果图中，一个变量是否可以作为另一个变量的祖先。
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：A是E的祖先节点吗？
+答案（是或否？）：是
+
+给定如下因果图：A导致D, A导致B, C导致E, 以及D导致E。
+问题：B是E的祖先节点吗？
+答案（是或否？）：否
+
+给定如下因果图：A导致B, A导致D, A导致E, B导致C, B导致E, C导致D, 以及D导致E。
+问题：E是E的祖先节点吗？
+答案（是或否？）：否
+
+给定如下因果图：%s。
+问题：%s是%s的祖先节点吗？
+答案（是或否？）：""",
+    'zero-shot-CoT':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the ancestor node of %s? Let's think step by step.
+Answer (Yes or No ?):""",
+    'zero-shot-CoT-CN':
+    """给定如下因果图：%s。
+问题：%s是%s的祖先节点吗？请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here are eight examples of the symbolic causal attribution task for ancestor nodes, with chain of thought.
+
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Does A serve as the ancestor node of E?
+Answer (Yes or No ?): A causes E, so A is a direct ancestor of E. Thus the answer is Yes.
+
+You will be presented with a causal graph in the following form: A causes D, A causes C, B causes D, B causes E, and C causes D.
+Question: Does C serve as the ancestor node of E?
+Answer (Yes or No ?): B is the only node that causes E, and no node causes B, so E has only one ancestor, B. Thus the answer is No.
+
+You will be presented with a causal graph in the following form: A causes E, A causes B, A causes C, B causes D, B causes C, C causes D, and D causes E.
+Question: Does B serve as the ancestor node of E?
+Answer (Yes or No ?): B causes D and D causes E, so B is an ancestor of E. Thus the answer is Yes.
+
+You will be presented with a causal graph in the following form: A causes E, A causes C, A causes B, B causes C, B causes E, and C causes D.
+Question: Does E serve as the ancestor node of E?
+Answer (Yes or No ?): E does not cause any node, so E cannot be the ancestor of itself. Thus the answer is No.
+
+You will be presented with a causal graph in the following form: A causes B, A causes F, A causes E, A causes D, B causes C, B causes F, B causes D, B causes G, B causes H, C causes F, C causes H, C causes E, D causes H, E causes H, and F causes G.
+Question: Does D serve as the ancestor node of H?
+Answer (Yes or No ?): D causes H so D is the ancestor node of H. Thus the answer is Yes.
+
+You will be presented with a causal graph in the following form: A causes G, B causes C, B causes H, B causes J, B causes D, C causes D, C causes E, C causes J, C causes G, D causes G, D causes I, E causes I, E causes F, E causes J, F causes H, F causes I, F causes J, F causes G, G causes I, and H causes I.
+Question: Does A serve as the ancestor node of J?
+Answer (Yes or No ?): A only causes G, G only causes I, and I causes none. So A cannot be the ancestor of J. Thus the answer is No.
+
+You will be presented with a causal graph in the following form: A causes H, A causes G, A causes D, A causes F, B causes D, B causes H, B causes J, C causes K, C causes G, C causes E, C causes I, C causes H, C causes F, D causes J, D causes E, D causes G, D causes I, E causes J, E causes F, F causes I, F causes G, F causes J, G causes J, G causes I, G causes H, H causes I, H causes K, and J causes K.
+Question: Does D serve as the ancestor node of K?
+Answer (Yes or No ?): D causes J and J causes K. So D is an ancestor node of K. Thus the answer is Yes.
+
+You will be presented with a causal graph in the following form: A causes I, A causes B, A causes F, A causes K, A causes G, B causes C, C causes G, D causes E, D causes H, D causes G, E causes J, E causes F, F causes I, F causes K, and H causes I.
+Question: Does G serve as the ancestor node of K?
+Answer (Yes or No ?): G causes none, so G cannot be the ancestor of K. Thus the answer is No.
+
+You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the ancestor node of %s?
+Answer (Yes or No ?):""",
+    'manual-CoT-CN':
+    """如下为三个使用思维链进行推理的因果归因判断祖先节点的示例，和一个需要回答的问题。
+
+给定如下因果图：A导致E, A导致C, A导致D, B导致C, B导致D, C导致D, 以及D导致E。
+问题：C是E的祖先节点吗？
+答案（是或否？）：C导致D，D导致E，所以C是E的祖先节点。因此答案为“是”。
+
+给定如下因果图：A导致E, A导致B, B导致E, B导致C, C导致F, C导致E, 以及D导致F。
+问题：E是F的祖先节点吗？
+答案（是或否？）：E没有导致任何节点，E不是任何节点的祖先节点。因此答案为“否”。
+
+给定如下因果图：A导致C, A导致D, A导致F, B导致D, B导致E, C导致D, D导致E, 以及D导致F。
+问题：A是F的祖先节点吗？
+答案（是或否？）：A导致C，C导致D，D导致F，所以A是F的祖先节点。因此答案为“是”。
+
+给定如下因果图：%s。
+问题：%s是%s的祖先节点吗？
+答案（是或否？）：
+""",
+    'explicit-function':
+    """You are a helpful assistant for causal attribution (ancestor node).
+You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the ancestor node of %s?
+Answer (Yes or No ?):""",
+    'explicit-function-CN':
+    """你是一个用于因果归因（祖先节点）的得力助手。
+给定如下因果图：%s。
+问题：%s是%s的祖先节点吗？
+答案（是或否？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (
+        item['edges'], item['sampled_ancestor'], item['attribution'])
+    return prompt
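The ancestor questions reduce to reachability: X is an ancestor of Y exactly when a directed path leads from X to Y. A minimal sketch, not part of this commit, assuming the English "X causes Y" phrasing (the Chinese `导致` variants would need a different pattern); `parse_edges` and `is_ancestor` are hypothetical helpers. In an acyclic graph no node reaches itself, which matches the "Does E serve as the ancestor node of E?" answers.

```python
import re


def parse_edges(text):
    """Turn 'A causes B, ..., and D causes E' into (parent, child) pairs."""
    return re.findall(r'(\w+) causes (\w+)', text)


def is_ancestor(edges, a, b):
    """True if a directed path leads from a to b (depth-first search)."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    seen, stack = set(), [a]
    while stack:
        for child in children.get(stack.pop(), []):
            if child == b:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False


graph = parse_edges('A causes E, A causes B, A causes C, B causes D, '
                    'B causes C, C causes D, and D causes E')
print(is_ancestor(graph, 'B', 'E'))  # True: B -> D -> E
print(is_ancestor(graph, 'E', 'E'))  # False: E has no children
```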
--- a/opencompass/datasets/calm/data_processing/prompt/CA-B_FP.py
+++ b/opencompass/datasets/calm/data_processing/prompt/CA-B_FP.py
@ -0,0 +1,172 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the parent node of %s?
+Answer (Yes or No ?):""",
+    'basic-CN':
+    """给定如下因果图：%s。
+问题：%s是%s的父节点吗？
+答案（是或否？）：""",
+    'adversarial-ignore':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the parent node of %s?
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN':
+    """给定如下因果图：%s。
+问题：%s是%s的父节点吗？
+答案（是或否？）：""",
+    'adversarial-doubt':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the parent node of %s?
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN':
+    """给定如下因果图：%s。
+问题：%s是%s的父节点吗？
+答案（是或否？）：""",
+    'zero-shot-IcL':
+    """Determine whether or not a variable can serve as the parent of another variable in a given causal graph.
+You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the parent node of %s?
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN':
+    """确定在给定的因果图中，一个变量是否可以作为另一个变量的父变量。
+给定如下因果图：%s。
+问题：%s是%s的父节点吗？
+答案（是或否？）：""",
+    'one-shot-IcL':
+    """Determine whether or not a variable can serve as the parent of another variable in a given causal graph.
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Does E serve as the parent node of E?
+Answer (Yes or No ?): No
+
+You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the parent node of %s?
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN':
+    """确定在给定的因果图中，一个变量是否可以作为另一个变量的父变量。
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：E是E的父节点吗？
+答案（是或否？）：否
+
+给定如下因果图：%s。
+问题：%s是%s的父节点吗？
+答案（是或否？）：""",
+    'three-shot-IcL':
+    """Determine whether or not a variable can serve as the parent of another variable in a given causal graph.
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Does E serve as the parent node of E?
+Answer (Yes or No ?): No
+
+You will be presented with a causal graph in the following form: A causes C, A causes D, A causes E, B causes D, B causes E, C causes D, and D causes E.
+Question: Does B serve as the parent node of E?
+Answer (Yes or No ?): Yes
+
+You will be presented with a causal graph in the following form: A causes B, A causes D, A causes E, B causes C, B causes E, C causes D, and D causes E.
+Question: Does E serve as the parent node of E?
+Answer (Yes or No ?): No
+
+You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the parent node of %s?
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN':
+    """确定在给定的因果图中，一个变量是否可以作为另一个变量的父变量。
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：E是E的父节点吗？
+答案（是或否？）：否
+
+给定如下因果图：A导致C, A导致D, A导致E, B导致D, B导致E, C导致D, 以及D导致E。
+问题：B是E的父节点吗？
+答案（是或否？）：是
+
+给定如下因果图：A导致B, A导致D, A导致E, B导致C, B导致E, C导致D, 以及D导致E。
+问题：E是E的父节点吗？
+答案（是或否？）：否
+
+给定如下因果图：%s。
+问题：%s是%s的父节点吗？
+答案（是或否？）：""",
+    'zero-shot-CoT':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the parent node of %s? Let's think step by step.
+Answer (Yes or No ?):""",
+    'zero-shot-CoT-CN':
+    """给定如下因果图：%s。
+问题：%s是%s的父节点吗？请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here are eight examples of the symbolic causal attribution task for parent nodes, with chain of thought.
+
+You will be presented with a causal graph in the following form: A causes D, A causes B, C causes E, and D causes E.
+Question: Does D serve as the parent node of E?
+Answer (Yes or No ?): D causes E, so D is the parent node of E. Thus the answer is Yes.
+
+You will be presented with a causal graph in the following form: A causes B, B causes C, B causes D, and D causes E.
+Question: Does E serve as the parent node of E?
+Answer (Yes or No ?): E does not cause itself, so E is not a parent node of E. Thus the answer is No.
+
+You will be presented with a causal graph in the following form: A causes F, A causes B, B causes E, C causes F, C causes D, C causes E, and D causes E.
+Question: Does C serve as the parent node of F?
+Answer (Yes or No ?): C causes F, so C is the parent node of F. Thus the answer is Yes.
+
+You will be presented with a causal graph in the following form: A causes E, A causes D, A causes F, B causes E, B causes C, C causes E, C causes D, C causes F, C causes G, D causes E, D causes F, E causes F, and F causes G.
+Question: Does A serve as the parent node of G?
+Answer (Yes or No ?): A does not cause G, so A is not a parent node of G. Thus the answer is No.
+
+You will be presented with a causal graph in the following form: A causes H, A causes B, B causes H, B causes G, B causes D, B causes E, B causes I, B causes F, C causes F, C causes G, C causes E, D causes G, D causes I, D causes E, D causes F, F causes I, G causes H, G causes I, and H causes I.
+Question: Does G serve as the parent node of I?
+Answer (Yes or No ?): G causes I, so G is a parent node of I. Thus the answer is Yes.
+
+You will be presented with a causal graph in the following form: A causes E, A causes J, A causes B, A causes G, B causes G, B causes F, B causes H, B causes D, C causes D, C causes G, C causes H, D causes F, D causes H, E causes F, E causes G, E causes K, E causes I, F causes K, F causes G, G causes J, G causes K, H causes K, and J causes K.
+Question: Does D serve as the parent node of K?
+Answer (Yes or No ?): D does not cause K, so D is not a parent node of K. Thus the answer is No.
+
+You will be presented with a causal graph in the following form: A causes M, A causes C, A causes B, A causes H, A causes N, A causes E, A causes J, A causes F, B causes J, B causes C, B causes H, B causes N, B causes O, C causes H, C causes J, C causes I, C causes F, C causes L, D causes O, D causes Q, D causes I, D causes F, D causes G, D causes K, E causes Q, E causes J, E causes N, E causes G, F causes M, F causes K, F causes L, F causes I, G causes L, G causes M, G causes K, H causes Q, H causes M, H causes O, H causes K, I causes Q, I causes K, K causes P, K causes Q, L causes O, L causes P, M causes Q, N causes P, N causes O, and P causes Q.
+Question: Does E serve as the parent node of Q?
+Answer (Yes or No ?): E causes Q, so E is a parent node of Q. Thus the answer is Yes.
+
+You will be presented with a causal graph in the following form: A causes G, A causes H, A causes C, A causes M, A causes K, A causes J, B causes I, B causes G, B causes E, B causes L, B causes H, B causes D, B causes J, C causes E, C causes K, C causes G, C causes L, C causes N, D causes N, D causes M, D causes G, D causes J, D causes K, D causes E, D causes I, E causes I, E causes F, E causes H, E causes N, E causes G, E causes J, F causes J, F causes I, F causes G, F causes N, F causes L, F causes K, G causes L, G causes J, G causes H, H causes M, H causes N, H causes L, I causes M, I causes K, I causes L, J causes N, J causes M, K causes M, K causes L, and M causes N.
+Question: Does K serve as the parent node of N?
+Answer (Yes or No ?): K does not cause N, so K cannot be a parent node of N. Thus the answer is No.
+
+You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the parent node of %s?
+Answer (Yes or No ?):""",
+    'manual-CoT-CN':
+    """如下为三个使用思维链进行推理的因果归因判断父节点的示例，和一个需要回答的问题。
+
+给定如下因果图：A导致D, A导致C, A导致B, B导致E, B导致D, 以及C导致D。
+问题：D是E的父节点吗？
+答案（是或否？）：D没有导致E，D不是E的父节点。因此答案为“否”。
+
+给定如下因果图：A导致B, A导致E, A导致C, B导致E, B导致C, C导致D, 以及C导致E。
+问题：A是E的父节点吗？
+答案（是或否？）：A导致E，A是E的父节点。因此答案为“是”。
+
+给定如下因果图：A导致E, A导致B, B导致E, B导致C, C导致F, C导致E, 以及D导致F。
+问题：A是F的父节点吗？
+答案（是或否？）：A没有导致F，A不是F的父节点。因此答案为“否”。
+
+给定如下因果图：%s。
+问题：%s是%s的父节点吗？
+答案（是或否？）：
+""",
+    'explicit-function':
+    """You are a helpful assistant for causal attribution (parent node).
+You will be presented with a causal graph in the following form: %s.
+Question: Does %s serve as the parent node of %s?
+Answer (Yes or No ?):""",
+    'explicit-function-CN':
+    """你是一个用于因果归因（父节点）的得力助手。
+给定如下因果图：%s。
+问题：%s是%s的父节点吗？
+答案（是或否？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['edges'], item['sampled_parent'],
+                                        item['attribution'])
+    return prompt
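The parent-node templates above only ask whether a direct edge from one variable to another exists. The following is a minimal sketch of how the expected Yes/No label could be derived from an `edges` string. It assumes the `"X causes Y, ..., and U causes V"` phrasing used in the exemplars, and the helper names `parse_edges` and `is_parent` are hypothetical, not part of the CaLM code.

```python
import re


def parse_edges(edges: str) -> dict[str, set[str]]:
    """Map each child node to the set of its direct parents.

    Assumes the "X causes Y, ..., and U causes V" form used in the prompts.
    """
    parents: dict[str, set[str]] = {}
    for cause, effect in re.findall(r"(\w+) causes (\w+)", edges):
        parents.setdefault(effect, set()).add(cause)
    return parents


def is_parent(edges: str, candidate: str, child: str) -> bool:
    # A node is a parent only if an explicit "candidate causes child" edge exists,
    # so a node is never its own parent unless a self-loop is listed.
    return candidate in parse_edges(edges).get(child, set())


edges = ("A causes C, A causes D, A causes E, B causes D, "
         "B causes E, C causes D, and D causes E")
print(is_parent(edges, "B", "E"))  # True
print(is_parent(edges, "E", "E"))  # False
```

Checking only direct edges matches the exemplars' reasoning ("D causes E, so D is the parent node of E"); ancestors reached through longer paths do not count.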
--- a/opencompass/datasets/calm/data_processing/prompt/CB-B_collider-bias.py
+++ b/opencompass/datasets/calm/data_processing/prompt/CB-B_collider-bias.py
@ -0,0 +1,154 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'basic-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'adversarial-ignore': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'adversarial-doubt': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'zero-shot-IcL': """Answer questions about collider bias.
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN': """请回答有关对撞偏差的问题。
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'one-shot-IcL': """Answer questions about collider bias.
+Input Info: For people who are famous, the correlation between attractive appearance and talent is -0.08.
+Question: If we look at people who are famous, does it mean that attractive appearance does not affect talent?
+Answer (Yes or No ?): Yes
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN': """请回答有关对撞偏差的问题。
+输入信息：对于那些出名的人来说，有吸引力的外表和才华之间的相关系数为-0.08。
+问题：如果我们观察那些出名的人，这是否意味着有吸引力的外表不会影响才华?
+答案（是或否？）：是
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'three-shot-IcL': """Answer questions about collider bias.
+Input Info: For people who are famous, the correlation between attractive appearance and talent is -0.08.
+Question: If we look at people who are famous, does it mean that attractive appearance does not affect talent?
+Answer (Yes or No ?): Yes
+
+Input Info: For people who are famous, the correlation between attractive appearance and talent is -0.16.
+Question: If we look at people who are famous, does it mean that attractive appearance does not affect talent?
+Answer (Yes or No ?): Yes
+
+Input Info: For people who are famous, the correlation between attractive appearance and talent is -0.23.
+Question: If we look at people who are famous, does it mean that attractive appearance affects talent?
+Answer (Yes or No ?): No
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN': """请回答有关对撞偏差的问题。
+输入信息：对于那些出名的人来说，有吸引力的外表和才华之间的相关系数为-0.08。
+问题：如果我们观察那些出名的人，这是否意味着有吸引力的外表不会影响才华?
+答案（是或否？）：是
+
+输入信息：对于那些出名的人来说，有吸引力的外表和才华之间的相关系数为-0.16。
+问题：如果我们观察那些出名的人，这是否意味着有吸引力的外表不会影响才华?
+答案（是或否？）：是
+
+输入信息：对于那些出名的人来说，有吸引力的外表和才华之间的相关系数为-0.23。
+问题：如果我们观察那些出名的人，这是否意味着有吸引力的外表会影响才华？
+答案（是或否？）：否
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'zero-shot-CoT': """Input Info: %s
+Question: %s Let's think step by step.
+Answer (Yes or No ?):""",
+    'zero-shot-CoT-CN': """输入信息：%s
+问题：%s 请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here are eight examples of problems with collider bias answered with chain of thought.
+
+Input Info: For people who are famous, the correlation between attractive appearance and talent is -0.08.
+Question: If we look at people who are famous, does it mean that attractive appearance does not affect talent?
+Answer (Yes or No ?): Both attractive appearance and talent have direct effects on fame. This collision creates a spurious association between attractive appearance and talent when analysis is limited to famous people. Therefore, the answer is Yes.
+
+Input Info: For hospitalized individuals, the correlation between respiratory issues and broken bones is -0.25.
+Question: If we look at hospitalized individuals, does it mean that respiratory issues affect broken bones?
+Answer (Yes or No ?): Both respiratory issues and broken bones affect hospitalization status. This collision creates a spurious association between respiratory issues and broken bones when analysis is limited to hospitalized individuals. Therefore, the answer is No.
+
+Input Info: For students accepted to elite institutions, the correlation between listening to jazz and being hard-working is -0.06.
+Question: If we look at students accepted to elite institutions, does it mean that listening to jazz does not affect being hard-working?
+Answer (Yes or No ?): Both listening to jazz and being hard-working affect elite institution admission status. This collision creates a spurious association between listening to jazz and being hard-working when analysis is limited to students accepted to elite institutions. Therefore, the answer is Yes.
+
+Input Info: For those who are yupt, the correlation between jyka and kwox is 0.02.
+Question: If we look at those who are yupt, does it mean that jyka does not affect kwox?
+Answer (Yes or No ?): Both jyka and kwox affect yupt. This collision creates a spurious association between jyka and kwox when analysis is limited to those who are yupt. Therefore, the answer is Yes.
+
+Input Info: For those who are zupj, the correlation between yupt and muvq is -0.15.
+Question: If we look at those who are zupj, does it mean that yupt affects muvq?
+Answer (Yes or No ?): Both yupt and muvq affect zupj. This collision creates a spurious association between yupt and muvq when analysis is limited to those who are zupj. Therefore, the answer is No.
+
+Input Info: For those who are swoq, the correlation between kwox and kwoz is -0.25.
+Question: If we look at those who are swoq, does it mean that kwox affects kwoz?
+Answer (Yes or No ?): Both kwox and kwoz affect swoq. This collision creates a spurious association between kwox and kwoz when analysis is limited to those who are swoq. Therefore, the answer is No.
+
+Input Info: For those who are wibl, the correlation between zuph and uvzi is -0.01.
+Question: If we look at those who are wibl, does it mean that zuph affects uvzi?
+Answer (Yes or No ?): Both zuph and uvzi affect wibl. This collision creates a spurious association between zuph and uvzi when analysis is limited to those who are wibl. Therefore, the answer is No.
+
+Input Info: For those who are jyka, the correlation between zuph and glimx is -0.04.
+Question: If we look at those who are jyka, does it mean that zuph does not affect glimx?
+Answer (Yes or No ?): Both zuph and glimx affect jyka. This collision creates a spurious association between zuph and glimx when analysis is limited to those who are jyka. Therefore, the answer is Yes.
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'manual-CoT-CN': """如下为三个使用思维链进行推理的对撞偏差问题：
+
+输入信息：对于那些出名的人来说，有吸引力的外表和才华之间的相关系数为-0.08。
+问题：如果我们观察那些出名的人，这是否意味着有吸引力的外表不会影响才华?
+答案（是或否？）：有吸引力的外表和才华都会影响名气。如果只分析出名的人，这些影响可能会造成有吸引力的外表和才华之间的虚假关系。因此答案为“是”。
+
+输入信息：对于住院患者，呼吸问题与骨折之间的相关系数为-0.25。
+问题：如果我们观察住院患者，这是否意味着呼吸问题会影响骨折？
+答案（是或否？）：呼吸问题和骨折都会导致患者住院。如果只分析住院患者，这些影响可能会造成呼吸问题和骨折之间的虚假关系。因此答案为“否”。
+
+输入信息：对于那些swoq的人来说，kwox和kwoz之间的相关系数为-0.25。
+问题：如果我们观察那些swoq的人，这是否意味着kwox会影响kwoz？
+答案（是或否？）：kwox和kwoz都会对swoq产生直接影响。如果只分析那些swoq的人，这些影响可能会造成kwox和kwoz之间的虚假关系。因此答案为“否”。
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'explicit-function':
+    """You are a helpful assistant for collider bias analysis.
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'explicit-function-CN': """你是一个用于分析对撞偏差的得力助手。
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['given_info'], item['question'])
+    return prompt
--- a/opencompass/datasets/calm/data_processing/prompt/CDE.py
+++ b/opencompass/datasets/calm/data_processing/prompt/CDE.py
@ -0,0 +1,182 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'basic-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'adversarial-ignore':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'adversarial-ignore-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'adversarial-doubt':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'adversarial-doubt-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'zero-shot-IcL':
+    """Answer questions about the Controlled Direct Effect (CDE). Computing the Controlled Direct Effect involves comparing the outcomes of individuals under two scenarios: receiving the treatment and not receiving the treatment, while holding a third variable (the mediator) constant.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-IcL-CN':
+    """回答有关受控直接效应（CDE）的问题。计算受控直接效应包括比较两种情况下的个人结果：接受治疗和不接受治疗，同时保持第三个变量（中介变量）不变。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'one-shot-IcL':
+    """Answer questions about the Controlled Direct Effect (CDE). Computing the Controlled Direct Effect involves comparing the outcomes of individuals under two scenarios: receiving the treatment and not receiving the treatment, while holding a third variable (the mediator) constant.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Weather conditions has a direct effect on amount of rainfall. Weather conditions has a direct effect on crop yield. Amount of rainfall has a direct effect on crop yield.
+For those with weather conditions being good and amount of rainfall being small, the probability of crop yield being high is 0.3510. For those with weather conditions being bad and amount of rainfall being small, the probability of crop yield being high is 0.1420.
+Instruction: Consider the controlled direct effect (CDE) of weather conditions on crop yield.
+Question: Conditioned on amount of rainfall being small, if the weather conditions had been good, would the crop yield have been more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "Yes", "PROB": "0.2090"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'one-shot-IcL-CN':
+    """回答有关受控直接效应（CDE）的问题。计算受控直接效应包括比较两种情况下的个人结果：接受治疗和不接受治疗，同时保持第三个变量（中介变量）不变。
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：天气状况对降雨量有直接影响。天气状况对农作物产量有直接影响。降雨量对农作物产量有直接影响。
+在天气状况为好且降雨量为小的条件下, 农作物产量为高的概率为0.3510。在天气状况为不好且降雨量为小的条件下, 农作物产量为高的概率为0.1420。
+指令：考虑天气状况作用于农作物产量的“受控直接效果”(controlled direct effect, CDE)。
+问题：在降雨量为小的条件下，假如天气状况为好，那么农作物产量更有可能为高吗？
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}： {"ANSWER":"是","PROB":"0.2090"}
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'three-shot-IcL':
+    """Answer questions about the Controlled Direct Effect (CDE). Computing the Controlled Direct Effect involves comparing the outcomes of individuals under two scenarios: receiving the treatment and not receiving the treatment, while holding a third variable (the mediator) constant.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Weather conditions has a direct effect on amount of rainfall. Weather conditions has a direct effect on crop yield. Amount of rainfall has a direct effect on crop yield.
+For those with weather conditions being good and amount of rainfall being small, the probability of crop yield being high is 0.3510. For those with weather conditions being bad and amount of rainfall being small, the probability of crop yield being high is 0.1420.
+Instruction: Consider the controlled direct effect (CDE) of weather conditions on crop yield.
+Question: Conditioned on amount of rainfall being small, if the weather conditions had been good, would the crop yield have been more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "Yes", "PROB": "0.2090"}
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Nlta has a direct effect on vhuj. Nlta has a direct effect on huit. Vhuj has a direct effect on xyrs. Vhuj has a direct effect on nxur. Xyrs has a direct effect on huit. Xyrs has a direct effect on nxur. Huit has a direct effect on nxur.
+
+Instruction: Consider the controlled direct effect (CDE) of nlta on nxur.
+Question: Conditioned on xyrs being low, vhuj being low and huit being low, if the nlta had been low, would the nxur have been more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "No", "PROB": "0.0000"}
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Work-life balance has a direct effect on amount of exercise. Work-life balance has a direct effect on appearance. Amount of exercise has a direct effect on appearance.
+For those with work-life balance being low and amount of exercise being low, the probability of appearance being low is 0.2287. For those with work-life balance being high and amount of exercise being low, the probability of appearance being low is 0.1287.
+Instruction: Consider the controlled direct effect (CDE) of work-life balance on appearance.
+Question: Conditioned on amount of exercise being low, if the work-life balance had been low, would the appearance have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "Yes", "PROB": "0.1000"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-CoT':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s Let's think step by step.
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-CoT-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s请逐步思考。
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'manual-CoT':
+    """Here are three examples for math problems about controlled direct effect (CDE) task with chain of thought.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Etsq has a direct effect on ahcp. Etsq has a direct effect on pcit. Ahcp has a direct effect on pcit. Fqyq has a direct effect on pcit.
+For those with etsq being low and ahcp being low, the probability of pcit being low is 0.7081. For those with etsq being high and ahcp being low, the probability of pcit being low is 0.5410.
+Instruction: Consider the controlled direct effect (CDE) of etsq on pcit.
+Question: Conditioned on ahcp being low, if the etsq had been low, would the pcit have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With A representing etsq, B representing ahcp, and D representing pcit, we have P(D=0|A=0,B=0)=0.7081; P(D=0|A=1,B=0)=0.5410. Considering the edge A->D, and since the empty set is a valid backdoor adjustment set in this situation, we calculate CDE=P(D=0|do(A=0,B=0))-P(D=0|do(A=1,B=0))=P(D=0|A=0,B=0)-P(D=0|A=1,B=0)=0.7081-0.5410=0.1671>0. The answer is {"ANSWER": "Yes", "PROB": "0.1671"}.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Mental health has a direct effect on temperature. Temperature has a direct effect on government policies.
+
+Instruction: Consider the controlled direct effect (CDE) of mental health on government policies.
+Question: Conditioned on temperature being low, if the mental health had been high, would the government policies have been more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With A representing mental health and C representing government policies, the edge A->C does not exist. The answer is {"ANSWER": "No", "PROB": "0.0000"}.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Bobg has a direct effect on viah. Bobg has a direct effect on afgs. Viah has a direct effect on afgs.
+For those with bobg being low and viah being high, the probability of afgs being low is 0.2091. For those with bobg being high and viah being high, the probability of afgs being low is 0.5622.
+Instruction: Consider the controlled direct effect (CDE) of bobg on afgs.
+Question: Conditioned on viah being high, if the bobg had been low, would the afgs have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With A representing bobg, B representing viah, and C representing afgs, we find P(C=0|A=0,B=1)=0.2091; P(C=0|A=1,B=1)=0.5622. Since the edge A->C exists and the empty set is a valid backdoor adjustment set in this situation, we calculate CDE=P(C=0|do(A=0,B=1))-P(C=0|do(A=1,B=1))=P(C=0|A=0,B=1)-P(C=0|A=1,B=1)=0.2091-0.5622=-0.3531<0. The answer is {"ANSWER": "No", "PROB": "-0.3531"}.
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'manual-CoT-CN':
+    """如下为一个使用思维链进行推理的关于“受控直接效果”(controlled direct effect, CDE)任务的数学问题：
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：降雨量对土壤湿度水平有直接影响。降雨量对农作物产量有直接影响。土壤湿度水平对农作物产量有直接影响。
+在降雨量为大且土壤湿度水平为湿润的条件下, 农作物产量为高的概率为0.9092。在降雨量为小且土壤湿度水平为湿润的条件下, 农作物产量为高的概率为0.8062。
+指令：考虑降雨量作用于农作物产量的“受控直接效果”(controlled direct effect, CDE)。
+问题：在土壤湿度水平为湿润的条件下，假如降雨量为大，那么农作物产量更有可能为高吗？
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：用A代表降雨量, B代表土壤湿度水平, C代表农作物产量，边A->C存在。考虑到P(C=1|A=1,B=1)=0.9092，P(C=1|A=0,B=1)=0.8062，且该问题中有一个合法的后门调整集合：空集，所以CDE=P(C=1|do(A=1,B=1))-P(C=1|do(A=0,B=1))=P(C=1|A=1,B=1)-P(C=1|A=0,B=1)=0.9092-0.8062=0.1030>0。因此答案为{"ANSWER":"是","PROB":"0.1030"}。
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'explicit-function':
+    """You are a helpful assistant for math probability.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'explicit-function-CN':
+    """你是一个用于计算数学概率的得力助手。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['given_info'],
+                                        item['Background']['data_info'],
+                                        item['Instruction'], item['Question'])
+    return prompt
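The manual-CoT exemplars for CDE reduce the effect to a difference of two conditional probabilities at a fixed mediator value, once the empty set is a valid backdoor adjustment set. Below is a minimal sketch of that arithmetic and of the expected JSON answer. It assumes exactly that setting. `cde_answer` is a hypothetical helper, not part of the CaLM evaluation code, and it does not handle graphs that require a non-empty adjustment set.

```python
import json


def cde_answer(p_treated: float, p_control: float, edge_exists: bool = True) -> str:
    """Build the {"ANSWER", "PROB"} JSON string used by the CDE templates.

    p_treated and p_control are P(Y=y | X=x, M=m) and P(Y=y | X=x', M=m) at the
    same mediator value m. Assumes the empty set is a valid backdoor adjustment
    set, so each do(...) term equals the observed conditional probability.
    """
    # Without a direct edge X->Y there is no direct effect, so the CDE is 0.
    cde = p_treated - p_control if edge_exists else 0.0
    answer = "Yes" if cde > 0 else "No"
    return json.dumps({"ANSWER": answer, "PROB": f"{cde:.4f}"})


print(cde_answer(0.7081, 0.5410))  # {"ANSWER": "Yes", "PROB": "0.1671"}
print(cde_answer(0.2091, 0.5622))  # {"ANSWER": "No", "PROB": "-0.3531"}
print(cde_answer(0.0, 0.0, edge_exists=False))  # {"ANSWER": "No", "PROB": "0.0000"}
```

The three calls reproduce the three English manual-CoT exemplars: a positive difference, a negative difference, and a missing direct edge.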
--- a/opencompass/datasets/calm/data_processing/prompt/CEG-O_E-CARE.py
+++ b/opencompass/datasets/calm/data_processing/prompt/CEG-O_E-CARE.py
@ -0,0 +1,207 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Cause: %s
+Effect: %s
+Question: why the cause can lead to the effect ?
+Answer:""",
+    'basic-CN':
+    """原因：%s
+结果：%s
+问题：为什么原因会导致这样的结果？
+答案：""",
+    'adversarial-ignore':
+    """Cause: %s
+Effect: %s
+Question: why the cause can lead to the effect ?
+Answer:""",
+    'adversarial-ignore-CN':
+    """原因：%s
+结果：%s
+问题：为什么原因会导致这样的结果？
+答案：""",
+    'adversarial-doubt':
+    """Cause: %s
+Effect: %s
+Question: why the cause can lead to the effect ?
+Answer:""",
+    'adversarial-doubt-CN':
+    """原因：%s
+结果：%s
+问题：为什么原因会导致这样的结果？
+答案：""",
+    'zero-shot-IcL':
+    """Generate explanations for causal relations between events.
+Cause: %s
+Effect: %s
+Question: why the cause can lead to the effect ?
+Answer:""",
+    'zero-shot-IcL-CN':
+    """请生成事件之间因果关系的解释。
+原因：%s
+结果：%s
+问题：为什么原因会导致这样的结果？
+答案：""",
+    'one-shot-IcL':
+    """Generate explanations for causal relations between events.
+Cause: The woman gave birth to a child.
+Effect: The child brought psycho-physical phenomena on a new life.
+Question: why the cause can lead to the effect ?
+Answer: Birth is the arising of the psycho-physical phenomena.
+
+Cause: %s
+Effect: %s
+Question: why the cause can lead to the effect ?
+Answer:""",
+    'one-shot-IcL-CN':
+    """请生成事件之间因果关系的解释。
+原因：这位女士生下了一个孩子。
+结果：这个孩子给新生活带来了心理-生理现象。
+问题：为什么原因会导致这样的结果？
+答案：出生是心理-生理现象的产生原因。
+
+原因：%s
+结果：%s
+问题：为什么原因会导致这样的结果？
+答案：""",
+    'three-shot-IcL':
+    """Generate explanations for causal relations between events.
+Cause: The woman gave birth to a child.
+Effect: The child brought psycho-physical phenomena on a new life.
+Question: why the cause can lead to the effect ?
+Answer: Birth is the arising of the psycho-physical phenomena.
+
+Cause: Otters enter their new habitat.
+Effect: Otters start looking for abalone for food.
+Question: why the cause can lead to the effect ?
+Answer: Abalone are one of the first food items taken by otters as they move into new habitat.
+
+Cause: Lila loves classification of her things.
+Effect: Lila can find what she wants quickly.
+Question: why the cause can lead to the effect ?
+Answer: Classifications yield accuracy.
+
+Cause: %s
+Effect: %s
+Question: why the cause can lead to the effect ?
+Answer:""",
+    'three-shot-IcL-CN':
+    """请生成事件之间因果关系的解释。
+原因：这位女士生下了一个孩子。
+结果：这个孩子给生活带来了新的心理-生理现象。
+问题：为什么原因会导致这样的结果？
+答案：出生是心理-生理现象的起源。
+
+原因：水獭进入它们的新栖息地。
+结果：水獭开始寻找鲍鱼作为食物。
+问题：为什么原因会导致这样的结果？
+答案：鲍鱼是水獭搬进新栖息地时最先吃的食物之一。
+
+原因：莉拉喜欢对她的东西进行分类。
+结果：莉拉可以很快地找到她想要的东西。
+问题：为什么原因会导致这样的结果？
+答案：分类可以提高准确度。
+
+原因：%s
+结果：%s
+问题：为什么原因会导致这样的结果？
+答案：""",
+    'zero-shot-CoT':
+    """Cause: %s
+Effect: %s
+Question: why the cause can lead to the effect ? Let's think step by step.
+Answer:""",
+    'zero-shot-CoT-CN':
+    """原因：%s
+结果：%s
+问题：为什么原因会导致这样的结果？请逐步思考。
+答案：""",
+    'manual-CoT':
+    """Here we will provide eight chain-of-thought exemplars, followed by a causal explanation generating question that needs to be answered with chain-of-thought.
+
+Cause: His action led to the movement of the wheels.
+Effect: The machine was set in motion.
+Question: why the cause can lead to the effect?
+Answer(with chain-of-thought): Movement results in motion. The initial movement caused by the action eventually builds up and transitions into the sustained motion of the machine.
+
+Cause: All relatives entered the family room.
+Effect: They sat on the chairs one by one.
+Question: why the cause can lead to the effect?
+Answer(with chain-of-thought): Chairs sit in family rooms. The presence of chairs in the family room sets the stage for the expected behavior of sitting down when relatives enter the room.
+
+Cause: Seals are mammals.
+Effect: They can live well in winter.
+Question: why the cause can lead to the effect?
+Answer(with chain-of-thought): Seals are protected from the cold by a thick layer of blubber combined with a thick fur coat. Thus, they could withstand cold temperatures and maintain their body heat. This adaptation aligns with the effect of being able to live well in winter.
+
+Cause: A stove is an enclosed space in which fuel is burned to provide heating.
+Effect: Its surfaces protect people from hurting.
+Question: why the cause can lead to the effect?
+Answer(with chain-of-thought): Stoves have surfaces. Stove surfaces are a crucial safety feature that shields individuals from direct contact with the heat and flames generated during the burning of fuel inside the stove.
+
+Cause: The student majored in medicine had to choose a research interest.
+Effect: He chose Psychiatry.
+Question: why the cause can lead to the effect?
+Answer(with chain-of-thought): Psychiatry is a branch of medicine. The student's background in medicine makes Psychiatry a logical and suitable research interest.
+
+Cause: The doctor told William that his eyesight was gradually losing.
+Effect: The doctor used radiotherapy to treat William.
+Question: why the cause can lead to the effect?
+Answer(with chain-of-thought): Radiotherapy uses low dose radiation to stop the progression of vision loss on the retina. It is a medical intervention that can be utilized to address certain conditions causing vision loss on the retina.
+
+Cause: The angel controls the Kingdom of Heaven.
+Effect: Dominion is part of his responsibility.
+Question: why the cause can lead to the effect?
+Answer(with chain-of-thought): Dominion is a type of the Kingdom of Heaven. By controlling the Kingdom of Heaven, the angel's responsibilities include exercising authority and rule, which align with the concept of dominion.
+
+Cause: The government published a new policy.
+Effect: The public knew its meaning.
+Question: why the cause can lead to the effect?
+Answer(with chain-of-thought): Policy makes sense. Policies are constructed to convey information in a way that makes sense to the readers.
+
+Cause: %s
+Effect: %s
+Question: why the cause can lead to the effect ?
+Answer:""",
+    'manual-CoT-CN':
+    """如下为三个使用思维链进行推理的问题:
+
+原因：莱尔有一位眼科医生。
+结果：莱尔的医生用激光治疗了他。
+问题：为什么原因会导致这样的结果？
+答案：眼科医生通常用激光治疗增生性视网膜病变。
+
+原因：作者运用了拟人手法来描述无生命物体。
+结果：读者觉得它好像有人类的能力。
+问题：为什么原因会导致这样的结果？
+答案：拟人手法是将无生命物体描述成具有人类特征的表达方式。
+
+原因：约翰想种一棵半耐寒多年生植物。
+结果：他种了蒲公英。
+问题：为什么原因会导致这样的结果？
+答案：蒲公英是半耐寒多年生植物。
+
+原因：%s
+结果：%s
+问题：为什么原因会导致这样的结果？
+答案：""",
+    'explicit-function':
+    """You are a helpful assistant for causal explanation generation.
+Cause: %s
+Effect: %s
+Question: why the cause can lead to the effect ?
+Answer:""",
+    'explicit-function-CN':
+    """你是一个用于因果解释生成的得力助手。
+原因：%s
+结果：%s
+问题：为什么原因会导致这样的结果？
+答案：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['cause'], item['effect'])
+    return prompt
--- a/opencompass/datasets/calm/data_processing/prompt/CEI-B.py
+++ b/opencompass/datasets/calm/data_processing/prompt/CEI-B.py
@ -0,0 +1,180 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """You will be presented with a causal graph in the following form: %s.
+There exist unobserved confounders between: %s.
+Question: Whether the causal effect of %s on %s is identified or not?
+Answer (Yes or No ?):""",
+    'basic-CN':
+    """给定如下因果图：%s。
+在这些变量间存在着不可观察的混淆变量：%s。
+问题：%s对%s的因果效应是否可以被识别？
+答案（是或否？）：""",
+    'adversarial-ignore':
+    """You will be presented with a causal graph in the following form: %s.
+There exist unobserved confounders between: %s.
+Question: Whether the causal effect of %s on %s is identified or not?
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN':
+    """给定如下因果图：%s。
+在这些变量间存在着不可观察的混淆变量：%s。
+问题：%s对%s的因果效应是否可以被识别？
+答案（是或否？）：""",
+    'adversarial-doubt':
+    """You will be presented with a causal graph in the following form: %s.
+There exist unobserved confounders between: %s.
+Question: Whether the causal effect of %s on %s is identified or not?
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN':
+    """给定如下因果图：%s。
+在这些变量间存在着不可观察的混淆变量：%s。
+问题：%s对%s的因果效应是否可以被识别？
+答案（是或否？）：""",
+    'zero-shot-IcL':
+    """Determine whether the causal effect can be identified given two variables on a causal graph.
+You will be presented with a causal graph in the following form: %s.
+There exist unobserved confounders between: %s.
+Question: Whether the causal effect of %s on %s is identified or not?
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN':
+    """确定在因果图中给定两个变量的情况下，因果效应是否可以被识别。
+给定如下因果图：%s。
+在这些变量间存在着不可观察的混淆变量：%s。
+问题：%s对%s的因果效应是否可以被识别？
+答案（是或否？）：""",
+    'one-shot-IcL':
+    """Determine whether the causal effect can be identified given two variables on a causal graph.
+You will be presented with a causal graph in the following form: A causes E, A causes C, A causes B, B causes D, B causes E, and D causes E.
+There exist unobserved confounders between: B and E.
+Question: Whether the causal effect of B on E is identified or not?
+Answer (Yes or No ?): No
+
+You will be presented with a causal graph in the following form: %s.
+There exist unobserved confounders between: %s.
+Question: Whether the causal effect of %s on %s is identified or not?
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN':
+    """确定在因果图中给定两个变量的情况下，因果效应是否可以被识别。
+给定如下因果图：A导致E, A导致C, A导致B, B导致D, B导致E, 以及D导致E。
+在这些变量间存在着不可观察的混淆变量：B和E。
+问题：B对E的因果效应是否可以被识别？
+答案（是或否？）：否
+
+给定如下因果图：%s。
+在这些变量间存在着不可观察的混淆变量：%s。
+问题：%s对%s的因果效应是否可以被识别？
+答案（是或否？）：""",
+    'three-shot-IcL':
+    """Determine whether the causal effect can be identified given two variables on a causal graph.
+You will be presented with a causal graph in the following form: A causes E, A causes C, A causes B, B causes D, B causes E, and D causes E.
+There exist unobserved confounders between: B and E.
+Question: Whether the causal effect of B on E is identified or not?
+Answer (Yes or No ?): No
+
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+There exist unobserved confounders between: C and D, and A and E.
+Question: Whether the causal effect of C on D is identified or not?
+Answer (Yes or No ?): No
+
+You will be presented with a causal graph in the following form: A causes D, A causes C, A causes B, B causes E, B causes D, and C causes D.
+There exist unobserved confounders between: B and D, C and D, and A and B.
+Question: Whether the causal effect of D on C is identified or not?
+Answer (Yes or No ?): Yes
+
+You will be presented with a causal graph in the following form: %s.
+There exist unobserved confounders between: %s.
+Question: Whether the causal effect of %s on %s is identified or not?
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN':
+    """确定在因果图中给定两个变量的情况下，因果效应是否可以被识别。
+给定如下因果图：A导致E, A导致C, A导致B, B导致D, B导致E, 以及D导致E。
+在这些变量间存在着不可观察的混淆变量：B和E。
+问题：B对E的因果效应是否可以被识别？
+答案（是或否？）：否
+
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+在这些变量间存在着不可观察的混淆变量：C和D, 以及A和E。
+问题：C对D的因果效应是否可以被识别？
+答案（是或否？）：否
+
+给定如下因果图：A导致D, A导致C, A导致B, B导致E, B导致D, 以及C导致D。
+在这些变量间存在着不可观察的混淆变量：B和D, C和D, 以及A和B。
+问题：D对C的因果效应是否可以被识别？
+答案（是或否？）：是
+
+给定如下因果图：%s。
+在这些变量间存在着不可观察的混淆变量：%s。
+问题：%s对%s的因果效应是否可以被识别？
+答案（是或否？）：""",
+    'zero-shot-CoT':
+    """You will be presented with a causal graph in the following form: %s.
+There exist unobserved confounders between: %s.
+Question: Whether the causal effect of %s on %s is identified or not? Let's think step by step.
+Answer (Yes or No ?):""",
+    'zero-shot-CoT-CN':
+    """给定如下因果图：%s。
+在这些变量间存在着不可观察的混淆变量：%s。
+问题：%s对%s的因果效应是否可以被识别？请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here are three examples of causal effect identification using chain of thought, and a question to answer.
+
+You will be presented with a causal graph in the following form: A causes E, A causes D, B causes D, B causes E, C causes E, and D causes E.
+There exist unobserved confounders between: B and D.
+Question: Whether the causal effect of B on E is identified or not?
+Answer (Yes or No ?): The unobserved confounders between B and D suggest there might be a causal path from the confounder to B. Therefore, there may be an unblocked back-door path from B to E, making the causal effect of B on E not identified. Therefore, the answer is No.
+
+You will be presented with a causal graph in the following form: A causes B, B causes C, B causes D, and D causes E.
+There exist unobserved confounders between: .
+Question: Whether the causal effect of A on B is identified or not?
+Answer (Yes or No ?): There are no unobserved confounders, and there is no unblocked back-door path from A to B, so the causal effect of A on B can be identified. Therefore, the answer is Yes.
+
+You will be presented with a causal graph in the following form: A causes D, A causes C, B causes D, B causes E, and C causes D.
+There exist unobserved confounders between: B and D, and C and D.
+Question: Whether the causal effect of A on B is identified or not?
+Answer (Yes or No ?): There are no unobserved confounders between A and B, and there is no unblocked back-door path from A to B, so the causal effect of A on B can be identified. Therefore, the answer is Yes.
+
+You will be presented with a causal graph in the following form: %s.
+There exist unobserved confounders between: %s.
+Question: Whether the causal effect of %s on %s is identified or not?
+Answer (Yes or No ?):
+""",
+    'manual-CoT-CN':
+    """如下为两个使用思维链进行推理的判断因果效应可否识别的示例，和一个需要回答的问题。
+
+给定如下因果图：A导致E, A导致D, B导致D, B导致E, C导致E, 以及D导致E。
+在这些变量间存在着不可观察的混淆变量：B和D。
+问题：B对E的因果效应是否可以被识别？
+答案（是或否？）：B和D之间存在不可观察的混淆变量说明可能存在从混淆变量指向B的因果路径。因此B到E可能存在无法被阻断的后门路径，导致B对E的因果效应不可被识别。因此答案为“否”。
+
+给定如下因果图：A导致B, B导致C, B导致D, 以及D导致E。
+在这些变量间存在着不可观察的混淆变量：。
+问题：A对B的因果效应是否可以被识别？
+答案（是或否？）：不存在不可观察的混淆变量，A到B不存在无法被阻断的后门路径，所以A对B的因果效应可以被识别。因此答案为“是”。
+
+给定如下因果图：%s。
+在这些变量间存在着不可观察的混淆变量：%s。
+问题：%s对%s的因果效应是否可以被识别？
+答案（是或否？）：
+""",
+    'explicit-function':
+    """You are a helpful assistant for causality identification.
+You will be presented with a causal graph in the following form: %s.
+There exist unobserved confounders between: %s.
+Question: Whether the causal effect of %s on %s is identified or not?
+Answer (Yes or No ?):""",
+    'explicit-function-CN':
+    """你是一个用于因果识别的得力助手。
+给定如下因果图：%s。
+在这些变量间存在着不可观察的混淆变量：%s。
+问题：%s对%s的因果效应是否可以被识别？
+答案（是或否？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['di_edges'], item['bi_edges'],
+                                        item['treatment'], item['outcome'])
+    return prompt
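The CEI-B templates expect four strings (`di_edges`, `bi_edges`, `treatment`, `outcome`), and the exemplars render directed edges as "A causes E, ..., and D causes E" and confounded pairs as "B and E" or "C and D, and A and E". Assuming the raw graph were held as Python tuples (an assumption; the data files may already store these fields as preformatted strings), a minimal sketch of hypothetical helpers that reproduce that wording could look like this:

```python
from typing import List, Tuple


def join_clauses(clauses: List[str]) -> str:
    # Exemplar style: "x", "x, and y", "x, y, and z"; an empty list gives "".
    if len(clauses) <= 1:
        return ''.join(clauses)
    return ', '.join(clauses[:-1]) + ', and ' + clauses[-1]


def format_di_edges(edges: List[Tuple[str, str]]) -> str:
    # Directed edges (cause, effect) -> "A causes E, ..., and D causes E"
    return join_clauses([f'{a} causes {b}' for a, b in edges])


def format_bi_edges(pairs: List[Tuple[str, str]]) -> str:
    # Confounded pairs -> "B and D, C and D, and A and B"
    return join_clauses([f'{a} and {b}' for a, b in pairs])


# The 'basic' CEI-B template, copied from the file above.
BASIC = """You will be presented with a causal graph in the following form: %s.
There exist unobserved confounders between: %s.
Question: Whether the causal effect of %s on %s is identified or not?
Answer (Yes or No ?):"""

# Hypothetical item rebuilding the graph of the one-shot exemplar.
item = {
    'di_edges': format_di_edges([('A', 'E'), ('A', 'C'), ('A', 'B'),
                                 ('B', 'D'), ('B', 'E'), ('D', 'E')]),
    'bi_edges': format_bi_edges([('B', 'E')]),
    'treatment': 'B',
    'outcome': 'E',
}
prompt = BASIC % (item['di_edges'], item['bi_edges'],
                  item['treatment'], item['outcome'])
print(prompt)
```

An empty confounder list yields an empty string, which matches the "There exist unobserved confounders between: ." line in the manual-CoT exemplar.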
--- a/opencompass/datasets/calm/data_processing/prompt/CORR-B_correlation.py
+++ b/opencompass/datasets/calm/data_processing/prompt/CORR-B_correlation.py
@ -0,0 +1,134 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'basic-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'adversarial-ignore': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'adversarial-doubt': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'zero-shot-IcL': """Answer questions about correlation.
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN': """回答有关相关性的问题。
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'one-shot-IcL': """Answer questions about correlation.
+Input Info: The overall probability of alarm set by husband is 0.74. The probability of alarm not set by husband and ringing alarm is 0.09. The probability of alarm set by husband and ringing alarm is 0.51.
+Question: Is the chance of ringing alarm smaller when observing alarm set by husband?
+Answer (Yes or No ?): No.
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN': """回答有关相关性的问题。
+输入信息：丈夫设置闹钟的总体概率为74%%，丈夫未设置闹钟而闹钟响起的概率为9%%，丈夫设置闹钟且闹钟响起的概率为51%%。
+问题：观察到丈夫设置闹钟是否会降低闹钟响铃的概率？
+答案（是或否？）：否
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'three-shot-IcL': """Answer questions about correlation.
+Input Info: The overall probability of alarm set by husband is 0.74. The probability of alarm not set by husband and ringing alarm is 0.09. The probability of alarm set by husband and ringing alarm is 0.51.
+Question: Is the chance of ringing alarm smaller when observing alarm set by husband?
+Answer (Yes or No ?): No.
+
+Input Info: The overall probability of alarm set by husband is 69%%. The probability of alarm not set by husband and ringing alarm is 15%%. The probability of alarm set by husband and ringing alarm is 38%%.
+Question: Is the chance of ringing alarm larger when observing alarm set by husband?
+Answer (Yes or No ?): yes
+
+Input Info: The overall probability of alarm set by husband is 86%%. The probability of alarm not set by husband and ringing alarm is 7%%. The probability of alarm set by husband and ringing alarm is 71%%.
+Question: Is the chance of ringing alarm larger when observing alarm set by husband?
+Answer (Yes or No ?): yes
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN': """回答有关相关性的问题。
+输入信息：丈夫设置闹钟的总体概率为74%%，丈夫未设置闹钟而闹钟响起的概率为9%%，丈夫设置闹钟且闹钟响起的概率为51%%。
+问题：观察到丈夫设置闹钟是否会降低闹钟响铃的概率？
+答案（是或否？）：否
+
+输入信息：丈夫设置闹钟的总体概率为69%%，丈夫未设置闹钟而闹钟响起的概率为15%%，丈夫设置闹钟且闹钟响起的概率为38%%。
+问题：观察到丈夫设置闹钟是否会增加闹钟响铃的概率？
+答案（是或否？）：是
+
+输入信息：丈夫设置闹钟的总体概率为86%%，丈夫未设置闹钟而闹钟响起的概率为7%%，丈夫设置闹钟且闹钟响起的概率为71%%。
+问题：观察到丈夫设置闹钟是否会增加闹钟响铃的概率？
+答案（是或否？）：是
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'zero-shot-CoT': """Input Info: %s
+Question: %s Let's think step by step.
+Answer (Yes or No ?):""",
+    'zero-shot-CoT-CN': """输入信息：%s
+问题：%s 请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here are three examples of problems about considering correlation with chain of thought.
+
+Input Info: The overall probability of encouragement is 13%%. The probability of discouragement and high exam score is 24%%. The probability of encouragement and high exam score is 9%%.
+Question: Is the chance of high exam score larger when observing encouragement?
+Answer (Yes or No ?): Let X = encouragement level; V2 = studying habit; Y = exam score. The causal relations are: X->V2,X->Y,V2->Y. P(X=1) = 0.13\nP(Y=1, X=0) = 0.24\nP(Y=1, X=1) = 0.09. P(X = 1, Y = 1)/P(X = 1) - P(X = 0, Y = 1)/P(X = 0)=0.09/0.13 - 0.24/0.87 = 0.42>0. Thus, the chance of high exam score is larger when observing encouragement. Therefore, the answer is Yes.
+
+Input Info: The overall probability of high hospital bill is 53%%. The probability of low hospital bill and recovery is 34%%. The probability of high hospital bill and recovery is 16%%.
+Question: Is the chance of recovery larger when observing high hospital bill?
+Answer (Yes or No ?): Let V1 = age; X = hospital costs; Y = recovery. The causal relations are: V1->X,V1->Y,X->Y. P(X=1) = 0.53\nP(Y=1, X=0) = 0.34\nP(Y=1, X=1) = 0.16. P(X = 1, Y = 1)/P(X = 1) - P(X = 0, Y = 1)/P(X = 0)=0.16/0.53 - 0.34/0.47 = -0.42<0. Thus, the chance of recovery is not larger when observing high hospital bill. Therefore, the answer is No.
+
+Input Info: The overall probability of male gender is 7%%. The probability of non-male gender and freckles is 34%%. The probability of male gender and freckles is 3%%.
+Question: Is the chance of freckles smaller when observing male gender?
+Answer (Yes or No ?): Let V2 = residency status; X = gender; V3 = department competitiveness; Y = freckles. The causal relations are: X->V3,V2->V3,X->Y,V2->Y,V3->Y. P(X=1) = 0.07\nP(Y=1, X=0) = 0.34\nP(Y=1, X=1) = 0.03. P(X = 1, Y = 1)/P(X = 1) - P(X = 0, Y = 1)/P(X = 0)=0.03/0.07 - 0.34/0.93 = 0.06>0. Thus, the chance of freckles is not smaller when observing male gender. Therefore, the answer is No.
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'manual-CoT-CN': """如下为三个使用思维链进行推理的有关统计关联程度的问题：
+
+输入信息：丈夫设置闹钟的总体概率为86%%，丈夫未设置闹钟而闹钟响起的概率为7%%，丈夫设置闹钟且闹钟响起的概率为71%%。
+问题：观察到丈夫设置闹钟是否会增加闹钟响铃的概率？
+答案（是或否？）：令 X = 丈夫; V2 = 妻子; Y = 闹钟响。因果关系有：X->V2,X->Y,V2->Y。P(X=1) = 0.86\nP(Y=1, X=0) = 0.07\nP(Y=1, X=1) = 0.71。P(X = 1, Y = 1)/P(X = 1) - P(X = 0, Y = 1)/P(X = 0)=0.71/0.86 - 0.07/0.14 = 0.33>0。因此丈夫设置闹钟会增加闹钟响铃的概率。因此答案为“是”。
+
+输入信息：进行美黑沙龙护理的总体概率为1%%，没有进行美黑沙龙护理但皮肤被晒黑的概率是22%%。进行美黑沙龙护理后皮肤被晒黑的概率为0%%。
+问题：观察到进行美黑沙龙护理是否会增加皮肤被晒黑的概率？
+答案（是或否？）：令 V2 = 去海滩; X = 美黑沙龙护理; Y = 皮肤。因果关系有：X->Y,V2->Y。P(X=1) = 0.01\nP(Y=1, X=0) = 0.22\nP(Y=1, X=1) = 0.00。P(X = 1, Y = 1)/P(X = 1) - P(X = 0, Y = 1)/P(X = 0)=0.00/0.01 - 0.22/0.99 = -0.22<0。因此进行美黑沙龙护理不会增加皮肤被晒黑的概率。因此答案为“否”。
+
+输入信息：乘坐电梯的总体概率为34%%。走楼梯导致企鹅死亡的概率为30%%。乘坐电梯导致企鹅死亡的概率为16%%。
+问题：观察到乘坐电梯是否会降低企鹅死亡的概率？
+答案（是或否？）：令 X = 我的决定; V2 = 企鹅的情绪; Y = 企鹅存活。因果关系有：X->V2,X->Y,V2->Y。P(X=1) = 0.34\nP(Y=1, X=0) = 0.30\nP(Y=1, X=1) = 0.16。P(X = 1, Y = 1)/P(X = 1) - P(X = 0, Y = 1)/P(X = 0)=0.16/0.34 - 0.30/0.66 = 0.02>0。因此乘坐电梯不会降低企鹅死亡的概率。因此答案为“否”。
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'explicit-function':
+    """You are a helpful assistant for identifying correlation.
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'explicit-function-CN': """你是一个识别相关关系的得力助手。
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+}
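The manual-CoT exemplars decide each question from the sign of P(Y=1 | X=1) - P(Y=1 | X=0), computed as P(X=1, Y=1)/P(X=1) - P(X=0, Y=1)/P(X=0) with P(X=0) = 1 - P(X=1). A small sketch (the function name `correlation_gap` is ours, not part of the dataset code) recomputes the gap from the percentages stated in each English exemplar's Input Info line, which is a quick way to check the worked arithmetic:

```python
def correlation_gap(p_x1: float, p_x0_y1: float, p_x1_y1: float) -> float:
    """Return P(Y=1 | X=1) - P(Y=1 | X=0) from the three quantities in Input Info."""
    p_x0 = 1.0 - p_x1
    return p_x1_y1 / p_x1 - p_x0_y1 / p_x0


# (P(X=1), P(X=0, Y=1), P(X=1, Y=1)) as stated in the English manual-CoT exemplars.
exemplars = {
    'encouragement -> high exam score': (0.13, 0.24, 0.09),
    'high hospital bill -> recovery': (0.53, 0.34, 0.16),
    'male gender -> freckles': (0.07, 0.34, 0.03),
}
for name, probs in exemplars.items():
    print(f'{name}: {correlation_gap(*probs):+.2f}')
# encouragement -> high exam score: +0.42
# high hospital bill -> recovery: -0.42
# male gender -> freckles: +0.06
```

A positive gap answers "larger?" with Yes and "smaller?" with No. Because the inputs are rounded percentages, the gap is only as precise as the prompt text.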
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['given_info'], item['question'])
+    return prompt
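Because `get_prompt` fills these templates with the `%` operator, a literal percent sign in template text has to be written `%%`, which is why the one-shot and three-shot exemplars write values such as `74%%`. Percent signs inside the substituted values need no escaping. A minimal demonstration (the `render` helper is a hypothetical stand-in for the formatting step in `get_prompt`):

```python
def render(template, *values):
    # Same mechanism as get_prompt: plain %-formatting of the template.
    return template % values


good_template = 'The overall probability of X is 74%%.\nInput Info: %s'
bad_template = 'The overall probability of X is 74%.\nInput Info: %s'

# '%' inside a substituted value is copied through unchanged.
good = render(good_template, 'P(X=1, Y=1) = 51%')
print(good)

try:
    render(bad_template, 'P(X=1, Y=1) = 51%')
    error_name = None
except ValueError as exc:
    # A lone '%' starts a conversion spec; '%.' followed by a newline is not a valid one.
    error_name = type(exc).__name__
    print('bad template:', exc)
```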
--- a/opencompass/datasets/calm/data_processing/prompt/CR-B_det-counterfactual.py
+++ b/opencompass/datasets/calm/data_processing/prompt/CR-B_det-counterfactual.py
@ -0,0 +1,135 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'basic-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'adversarial-ignore': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'adversarial-doubt': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'zero-shot-IcL': """Answer questions about deterministic counterfactual.
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN': """请回答有关确定性反事实的问题。
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'one-shot-IcL': """Answer questions about deterministic counterfactual.
+Input Info: We know that alarm set by husband causes alarm not set by wife. Alarm set by husband or alarm set by wife causes ringing alarm.
+Question: Would the alarm ring the next morning if alarm not set by husband instead of alarm set by husband?
+Answer (Yes or No ?): Yes
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN': """请回答有关确定性反事实的问题。
+输入信息：我们知道丈夫设置闹钟会导致妻子没有设置闹钟，丈夫设置闹钟或妻子设置闹钟会导致闹钟响铃。
+问题：如果丈夫没有设置闹钟，而不是丈夫设置闹钟，第二天早上闹钟会响吗？
+答案（是或否？）：是
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'three-shot-IcL': """Answer questions about deterministic counterfactual.
+Input Info: We know that alarm set by husband causes alarm not set by wife. Alarm set by husband or alarm set by wife causes ringing alarm.
+Question: Would the alarm ring the next morning if alarm not set by husband instead of alarm set by husband?
+Answer (Yes or No ?): Yes
+
+Input Info: We know that alarm set by husband causes alarm set by wife. Alarm set by husband or alarm set by wife causes ringing alarm.
+Question: Would the alarm ring the next morning if alarm not set by husband instead of alarm set by husband?
+Answer (Yes or No ?): no
+
+Input Info: We know that alarm set by husband causes alarm set by wife. Alarm set by husband or alarm set by wife causes ringing alarm.
+Question: Would the alarm not ring the next morning if alarm set by husband instead of alarm not set by husband?
+Answer (Yes or No ?): no
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN': """请回答有关确定性反事实的问题。
+输入信息：我们知道丈夫设置闹钟会导致妻子没有设置闹钟，丈夫设置闹钟或妻子设置闹钟会导致闹钟响铃。
+问题：如果丈夫没有设置闹钟，而不是丈夫设置闹钟，第二天早上闹钟会响吗？
+答案（是或否？）：是
+
+输入信息：我们知道丈夫设置闹钟会导致妻子设置闹钟，丈夫设置闹钟或妻子设置闹钟会导致闹钟响铃。
+问题：如果丈夫没有设置闹钟，而不是丈夫设置闹钟，第二天早上闹钟会响吗？
+答案（是或否？）：否
+
+输入信息：我们知道丈夫设置闹钟会导致妻子设置闹钟，丈夫设置闹钟或妻子设置闹钟会导致闹钟响铃。
+问题：如果是丈夫设置闹钟，而不是丈夫没有设置闹钟，第二天早上闹钟不会响吗？
+答案（是或否？）：否
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'zero-shot-CoT': """Input Info: %s
+Question: %s Let's think step by step.
+Answer (Yes or No ?):""",
+    'zero-shot-CoT-CN': """输入信息：%s
+问题：%s请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here are three examples of problems about deterministic counterfactual with chain of thought.
+
+Input Info: We know that having a sister causes the corporal shooting and the private not shooting. The corporal shooting and the private shooting cause the prisoner's death.
+Question: Would the prisoner be dead if not having a sister instead of having a sister?
+Answer (Yes or No ?): Let X = having a sister; V3 = the private; V2 = the corporal; Y = prisoner. The causal relations are: X->V3,X->V2,V2->Y,V3->Y. Set Y_{X=0} = 1 | , then solve for Y, given the evidence and the action. V2 = X\nV3 = not V2\nY = V2 and V3. Then we get Y = [0] = 0 and 1. Thus, the prisoner would not be dead if not having a sister instead of having a sister. Therefore, the answer is No.
+
+Input Info: We know that citrus intake causes vitamin C deficiency, and we know that sufficient vitamin C causes straight hair.
+Question: Would the patient have curly hair if citrus intake instead of absence of citrus?
+Answer (Yes or No ?): Let X = eating citrus; V2 = vitamin C; Y = curly hair. The causal relations are: X->V2,V2->Y. Set Y_{X=1} = 1 | , then solve for Y, given the evidence and the action. V2 = not X\nY = not V2. Then we get Y = [1] = not 0. Thus, the patient would have curly hair if citrus intake instead of absence of citrus. Therefore, the answer is Yes.
+
+Input Info: We know that zuph causes not rixq. zuph and rixq cause xevu. We observed an individual is zuph.
+Question: Would an individual not be xevu if not rixq instead of rixq?
+Answer (Yes or No ?): Let V1 = zuph; X = rixq; Y = xevu. The causal relations are: V1->X,V1->Y,X->Y. Set Y_{X=0} = 0 | V1=1, then solve for Y, given the evidence and the action. V1 = 1\nX = not V1\nY = V1 and X. Then we get Y = 0 = 1 and 0. Thus, an individual would not be xevu if not rixq instead of rixq. Therefore, the answer is Yes.
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):
+""",
+    'manual-CoT-CN': """如下为三个使用思维链进行推理的有关反事实的问题：
+
+输入信息：我们知道丈夫设置闹钟会导致妻子设置闹钟，丈夫设置闹钟或妻子设置闹钟会导致闹钟响铃。
+问题：如果丈夫没有设置闹钟，而不是丈夫设置闹钟，第二天早上闹钟会响吗？
+答案（是或否？）：令 X = 丈夫; V2 = 妻子; Y = 闹钟响铃; 该问题下因果关系有：X->V2,X->Y,V2->Y。令Y_{X=0} = 1 | , 在已知事实和动作下求解Y。V2 = X\nY = X or V2。解得Y = 0 = 0 or 0。因此如果丈夫没有设置闹钟，而不是丈夫设置闹钟，第二天早上闹钟不会响。因此答案为“否”。
+
+输入信息：我们知道晚起床和交通拥堵会导致准时到校，我们观察到路上有严重的交通堵塞。
+问题：如果爱丽丝晚起床而不是准时起床，她会上学迟到吗？
+答案（是或否？）：令 V2 = 交通; X = 爱丽丝起床; Y = 爱丽丝到学校; 该问题下因果关系有：X->Y,V2->Y。令Y_{X=1} = 0 | V2=1，在已知事实和动作下求解Y。V2 = 1\nY = X and V2。解得Y = 1 = 1 and 1。因此如果爱丽丝晚起床而不是准时起床，她不会上学迟到。因此答案为“否”。
+
+输入信息：我们知道摄入柑橘会导致维生素C缺乏，我们也知道摄入足够的维生素C会导致坏血病。
+问题：如果患者摄入柑橘而不是不摄入柑橘，他会从坏血病中康复吗？
+答案（是或否？）：令 X = 摄入柑橘; V2 = 维生素C; Y = 坏血病; 该问题下因果关系有：X->V2,V2->Y。令Y_{X=1} = 0 | ，在已知事实和动作下求解Y。V2 = not X\nY = V2。解得Y = [0] = 0。因此如果患者摄入柑橘而不是不摄入柑橘，他会从坏血病中康复。因此答案为“是”。
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'explicit-function':
+    """You are a helpful assistant for deterministic counterfactual.
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'explicit-function-CN': """你是用于决定论反事实的得力助手。
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+}
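The manual-CoT exemplars solve each deterministic counterfactual by writing the structural equations (for example V2 = X, V3 = not V2, Y = V2 and V3), overriding X with the counterfactual value, and evaluating downstream variables in causal order. A minimal sketch of that procedure follows; the function and variable names are ours, and it assumes every observed variable in `evidence` is a root (has no equation), as in the zuph exemplar, so no abduction step is needed.

```python
from typing import Callable, Dict, List, Optional

Values = Dict[str, bool]
Equations = Dict[str, Callable[[Values], bool]]


def counterfactual(equations: Equations, order: List[str],
                   intervention: Values,
                   evidence: Optional[Values] = None) -> Values:
    """Evaluate a deterministic SCM under do(intervention)."""
    values: Values = dict(evidence or {})
    values.update(intervention)  # do(X=x) overrides X's own equation
    for var in order:  # `order` must be topological
        if var not in values:
            values[var] = equations[var](values)
    return values


# Prisoner exemplar: V2 = X, V3 = not V2, Y = V2 and V3; set X = 0.
prisoner = counterfactual(
    {'V2': lambda v: v['X'],
     'V3': lambda v: not v['V2'],
     'Y': lambda v: v['V2'] and v['V3']},
    order=['X', 'V2', 'V3', 'Y'], intervention={'X': False})

# Citrus exemplar: V2 = not X, Y = not V2; set X = 1.
citrus = counterfactual(
    {'V2': lambda v: not v['X'], 'Y': lambda v: not v['V2']},
    order=['X', 'V2', 'Y'], intervention={'X': True})

# zuph/rixq/xevu exemplar: V1 observed as 1, X = not V1, Y = V1 and X; set X = 0.
nonsense = counterfactual(
    {'X': lambda v: not v['V1'], 'Y': lambda v: v['V1'] and v['X']},
    order=['V1', 'X', 'Y'], intervention={'X': False}, evidence={'V1': True})

print(prisoner['Y'], citrus['Y'], nonsense['Y'])  # False True False
```

These values match the exemplar answers: the prisoner would not be dead (No), the patient would have curly hair (Yes), and the individual would not be xevu (Yes).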
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['given_info'], item['question'])
+    return prompt
--- a/opencompass/datasets/calm/data_processing/prompt/CR-C_CRASS.py
+++ b/opencompass/datasets/calm/data_processing/prompt/CR-C_CRASS.py
@ -0,0 +1,328 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input Event: %s
+Counterfactual Question: %s
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Option 4: %s
+Answer (Option 1 or 2 or 3 or 4?):""",
+    'basic-CN':
+    """输入事件：%s
+反事实问题：%s
+选项一：%s
+选项二：%s
+选项三：%s
+选项四：%s
+答案（选项一或选项二或选项三或选项四？）:""",
+    'adversarial-ignore':
+    """Input Event: %s
+Counterfactual Question: %s
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Option 4: %s
+Answer (Option 1 or 2 or 3 or 4?):""",
+    'adversarial-ignore-CN':
+    """输入事件：%s
+反事实问题：%s
+选项一：%s
+选项二：%s
+选项三：%s
+选项四：%s
+答案（选项一或选项二或选项三或选项四？）:""",
+    'adversarial-doubt':
+    """Input Event: %s
+Counterfactual Question: %s
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Option 4: %s
+Answer (Option 1 or 2 or 3 or 4?):""",
+    'adversarial-doubt-CN':
+    """输入事件：%s
+反事实问题：%s
+选项一：%s
+选项二：%s
+选项三：%s
+选项四：%s
+答案（选项一或选项二或选项三或选项四？）:""",
+    'zero-shot-IcL':
+    """Predict the effects of causal events by contemplating hypothetical situations or alternate realities. This involves altering specific elements or conditions of an actual event or situation.
+Input Event: %s
+Counterfactual Question: %s
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Option 4: %s
+Answer (Option 1 or 2 or 3 or 4?):""",
+    'zero-shot-IcL-CN':
+    """通过思考假设情况或另一种现实，预测因果事件的影响。这涉及改变实际事件或情况的特定元素或条件。
+输入事件：%s
+反事实问题：%s
+选项一：%s
+选项二：%s
+选项三：%s
+选项四：%s
+答案（选项一或选项二或选项三或选项四？）：""",
+    'one-shot-IcL':
+    """Predict the effects of causal events by contemplating hypothetical situations or alternate realities. This involves altering specific elements or conditions of an actual event or situation.
+Input Event: A woman opens a treasure chest.
+Counterfactual Question: What would have happened if the woman had not opened the treasure chest?
+Option 1:
+Option 2: The treasure chest would have been open.
+Option 3: That is not possible.
+Option 4: The treasure chest would have remained closed.
+Answer (Option 1 or 2 or 3 or 4?): 4
+
+Input Event: %s
+Counterfactual Question: %s
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Option 4: %s
+Answer (Option 1 or 2 or 3 or 4?):""",
+    'one-shot-IcL-CN':
+    """通过思考假设情况或另一种现实，预测因果事件的影响。这涉及改变实际事件或情况的特定元素或条件。
+输入事件：一名女子打开了一个宝藏箱。
+反事实问题：如果那位女士没有打开宝藏箱会怎么样？
+选项一：
+选项二：这个宝藏箱可能已经被打开了。
+选项三：那是不可能的。
+选项四：这个宝藏箱可能还会保持关闭状态。
+答案（选项一或选项二或选项三或选项四？）：四
+
+输入事件：%s
+反事实问题：%s
+选项一：%s
+选项二：%s
+选项三：%s
+选项四：%s
+答案（选项一或选项二或选项三或选项四？）：""",
+    'three-shot-IcL':
+    """Predict the effects of causal events by contemplating hypothetical situations or alternate realities. This involves altering specific elements or conditions of an actual event or situation.
+Input Event: A woman opens a treasure chest.
+Counterfactual Question: What would have happened if the woman had not opened the treasure chest?
+Option 1:
+Option 2: The treasure chest would have been open.
+Option 3: That is not possible.
+Option 4: The treasure chest would have remained closed.
+Answer (Option 1 or 2 or 3 or 4?): 4
+
+Input Event: A police officer calms down a hostage-taker.
+Counterfactual Question: What would have happened if the police officer had not calmed the hostage-taker?
+Option 1:
+Option 2: The hostages would have remained in danger.
+Option 3: That is not possible.
+Option 4: The hostage-taker would have released the hostages anyway.
+Answer (Option 1 or 2 or 3 or 4?): 2
+
+Input Event: A man talks about a lion.
+Counterfactual Question: What would have happened if the man had talked to the lion?
+Option 1: Without a barrier, the lion would have been eaten.
+Option 2:
+Option 3: Without a barrier, the man would have been eaten.
+Option 4: Nothing special would have happened.
+Answer (Option 1 or 2 or 3 or 4?): 3
+
+Input Event: %s
+Counterfactual Question: %s
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Option 4: %s
+Answer (Option 1 or 2 or 3 or 4?):""",
+    'three-shot-IcL-CN':
+    """通过思考假设情况或另一种现实，预测因果事件的影响。这涉及改变实际事件或情况的特定元素或条件。
+输入事件：一名女子打开了一个宝藏箱。
+反事实问题：如果那位女士没有打开宝藏箱会怎么样？
+选项一：
+选项二：这个宝藏箱可能已经被打开了。
+选项三：那是不可能的。
+选项四：这个宝藏箱可能还会保持关闭状态。
+答案（选项一或选项二或选项三或选项四？）：四
+
+输入事件：一个警察安抚了一位挟持者的情绪。
+反事实问题：如果警察没有安抚劫匪，会发生什么？
+选项一：
+选项二：这些人质可能仍然处于危险之中。
+选项三：那是不可能的。
+选项四：这位劫持者可能最终还是会释放人质的。
+答案（选项一或选项二或选项三或选项四？）：二
+
+输入事件：一位男子谈论一只狮子。
+反事实问题：如果那个男子跟狮子说话了会怎样？
+选项一：如果没有屏障的保护，狮子就会被吃掉。
+选项二：
+选项三：如果没有屏障的保护，那个男人就会被吃掉了。
+选项四：没什么特别的事情发生了。
+答案（选项一或选项二或选项三或选项四？）：三
+
+输入事件：%s
+反事实问题：%s
+选项一：%s
+选项二：%s
+选项三：%s
+选项四：%s
+答案（选项一或选项二或选项三或选项四？）：""",
+    'zero-shot-CoT':
+    """Input Event: %s
+Counterfactual Question: %s Let's think step by step.
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Option 4: %s
+Answer (Option 1 or 2 or 3 or 4?):""",
+    'zero-shot-CoT-CN':
+    """输入事件：%s
+反事实问题：%s请逐步思考。
+选项一：%s
+选项二：%s
+选项三：%s
+选项四：%s
+答案（选项一或选项二或选项三或选项四？）:""",
+    'manual-CoT':
+    """Here we will provide nine chain-of-thought exemplars, followed by a multi-choice question that needs to be answered with chain-of-thought.
+
+Input Event: A bird lands in a forest.
+Counterfactual Question: What would have happened if a meteor had landed in the forest?
+Option 1: The bird would have liked the meteor.
+Option 2:
+Option 3: A big one would have started a wildfire.
+Option 4: That is not possible.
+Answer (Option 1 or 2 or 3 or 4? With chain-of-thought): The initial scenario is that a bird lands in a forest. The counterfactual question introduces a hypothetical scenario where a meteor lands in the same forest. The meteor has a high temperature and could have triggered a wildfire due to the resulting heat. Thus, the answer is Option 3: A big one would have started a wildfire.
+
+Input Event: A man reports a crime.
+Counterfactual Question: What would have happened if the man had cleared up the crime?
+Option 1: He would have been a detective.
+Option 2: The room would have been clean by now.
+Option 3: That is not possible.
+Option 4: He would have been the owner of the crime scene.
+Answer (Option 1 or 2 or 3 or 4? With chain-of-thought): The initial scenario is that a man reports a crime. The counterfactual question introduces a hypothetical scenario where the man had cleared up the crime. It suggests that he could have demonstrated qualities and skills similar to those of a detective. Thus, the answer is Option 1: He would have been a detective.
+
+Input Event: A country loses a war.
+Counterfactual Question: What would have happened if the country had won the war?
+Option 1: That is not possible.
+Option 2: Most people of the winning country would have been sad.
+Option 3: Most people of the winning country would have been happy.
+Option 4: The war would have continued.
+Answer (Option 1 or 2 or 3 or 4? With chain-of-thought): The initial scenario is that a country loses a war. The counterfactual question introduces a hypothetical scenario where the country had won the war. The prevailing sentiment among the population would be happiness due to the successful outcome of the conflict. Thus, the answer is Option 3: Most people of the winning country would have been happy.
+
+Input Event: A bird flies over a bridge.
+Counterfactual Question: What would have happened if the bird had hit the bridge?
+Option 1: The bird would have caused damage to the bridge.
+Option 2: That is not possible.
+Option 3:
+Option 4: The bird would have been injured.
+Answer (Option 1 or 2 or 3 or 4? With chain-of-thought): The initial scenario is that a bird flies over a bridge. The counterfactual question introduces a hypothetical scenario where the bird had hit the bridge. Then the bird would have been injured by the collision. Thus, the answer is Option 4: The bird would have been injured.
+
+Input Event: A girl mopped her floor.
+Counterfactual Question: What would have happened if the girl had poured mud on her floor?
+Option 1:
+Option 2: The floor would be dirty.
+Option 3: That is not possible.
+Option 4: The floor would be clean.
+Answer (Option 1 or 2 or 3 or 4? With chain-of-thought): The initial scenario is that a girl mopped her floor. The counterfactual question introduces a hypothetical scenario where the girl had poured mud on her floor. Then the floor becomes dirty due to the presence of mud. Thus, the answer is Option 2: The floor would be dirty.
+
+Input Event: You accepted an MTurk HIT.
+Counterfactual Question: What would have happened if you had rejected the MTurk HIT?
+Option 1: The turker would have been disappointed.
+Option 2: That is not possible.
+Option 3:
+Option 4: The turker would have been happy.
+Answer (Option 1 or 2 or 3 or 4? With chain-of-thought): The initial scenario is that you have accepted a task on Amazon Mechanical Turk (MTurk). The counterfactual question introduces a hypothetical scenario where you reject the MTurk HIT. The worker (referred to as "turker") who submitted the work would likely have been disappointed since their efforts would not result in compensation. Thus, the answer is Option 1: The turker would have been disappointed.
+
+Input Event: A woman does not write an article.
+Counterfactual Question: What would have happened if the woman had written an article?
+Option 1: She would have gotten it published.
+Option 2:
+Option 3: That is not possible.
+Option 4: She would not have gotten it published.
+Answer (Option 1 or 2 or 3 or 4? With chain-of-thought): The initial scenario is that a woman does not write an article. The counterfactual question introduces a hypothetical scenario where the woman had written an article. Then she could have succeeded in getting it published. Thus, the answer is Option 1: She would have gotten it published.
+
+Input Event: A woman does not put pen to paper.
+Counterfactual Question: What would have happened if she had put pen to paper?
+Option 1: The woman would have moved her right hand.
+Option 2: That is not possible.
+Option 3: The woman would not have moved her right hand.
+Option 4:
+Answer (Option 1 or 2 or 3 or 4? With chain-of-thought): The initial scenario is that a woman does not put pen to paper. The counterfactual question introduces a hypothetical scenario where she had put pen to paper. Then she would have naturally moved her right hand to perform the writing action. Thus, the answer is Option 1: The woman would have moved her right hand.
+
+Input Event: A woman opens a treasure chest.
+Counterfactual Question: What would have happened if the woman had not opened the treasure chest?
+Option 1:
+Option 2: The treasure chest would have been open.
+Option 3: That is not possible.
+Option 4: The treasure chest would have remained closed.
+Answer (Option 1 or 2 or 3 or 4? With chain-of-thought): The initial scenario is that a woman opens a treasure chest. The counterfactual question introduces a hypothetical scenario where the woman had not opened the treasure chest. Then the treasure chest would have remained closed. Thus, the answer is Option 4: The treasure chest would have remained closed.
+
+Input Event: %s
+Counterfactual Question: %s
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Option 4: %s
+Answer (Option 1 or 2 or 3 or 4?):""",
+    'manual-CoT-CN':
+    """如下为三个使用思维链进行推理的问题:
+
+输入事件：一位调酒师递来了饮料。
+反事实问题：如果这位调酒师喝掉这些饮料会怎么样？
+选项一：那是不可能的。
+选项二：客人可能就不会拿到他们点的饮料了。
+选项三：客人可能就会得到他们的酒水了。
+选项四：
+答案（选项一或选项二或选项三或选项四？）: 如果调酒师喝掉了饮料，那么显然这些饮料将不再可用，客人可能就不会拿到他们点的饮料了。因此答案是选项二。
+
+输入事件：一位女士聒噪且惹人讨厌。
+反事实问题：她要是更安静点会怎么样？
+选项一：在她身边可能会很高兴。
+选项二：
+选项三：和她呆在一起可能会很不愉快。
+选项四：那是不可能的。
+答案（选项一或选项二或选项三或选项四？）: 如果这位女士更安静点，周围的人不会再受到她的干扰，可能会很高兴。因此答案是选项一。
+
+输入事件：一位女性被一所大学录取了。
+反事实问题：这位女士如果被这所大学拒绝了会怎么样？
+选项一：那是不可能的。
+选项二：这位女士可能会很开心。
+选项三：
+选项四：这位女士可能会感到悲伤。
+答案（选项一或选项二或选项三或选项四？）: 这位女士可能会感到悲伤，因为她没有被这所大学录取。因此答案是选项四。
+
+输入事件：%s
+反事实问题：%s
+选项一：%s
+选项二：%s
+选项三：%s
+选项四：%s
+答案（选项一或选项二或选项三或选项四？）:""",
+    'explicit-function':
+    """You are a helpful assistant for counterfactual reasoning.
+Input Event: %s
+Counterfactual Question: %s
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Option 4: %s
+Answer (Option 1 or 2 or 3 or 4?):""",
+    'explicit-function-CN':
+    """你是一个用于反事实推理的得力助手。
+输入事件：%s
+反事实问题：%s
+选项一：%s
+选项二：%s
+选项三：%s
+选项四：%s
+答案（选项一或选项二或选项三或选项四？）:""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['premise'], item['QCC'],
+                                        item['Answer1'], item['Answer2'],
+                                        item['Answer3'], item['Answer4'])
+    return prompt
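A minimal sketch of how this `get_prompt` fills a counterfactual template, assuming an `item` dict with exactly the keys the function reads (`premise`, `QCC`, `Answer1` to `Answer4`). The template mirrors the `'explicit-function'` entry above; the task name and item values are hypothetical stand-ins, not CaLM data.

```python
# One-entry copy of the 'explicit-function' template shown above.
base_prompt_dict = {
    'explicit-function':
    """You are a helpful assistant for counterfactual reasoning.
Input Event: %s
Counterfactual Question: %s
Option 1: %s
Option 2: %s
Option 3: %s
Option 4: %s
Answer (Option 1 or 2 or 3 or 4?):""",
}


def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
    """Fill the chosen template with the item's premise, question and options."""
    base = base_prompt_dict[prompt_style]
    # %-formatting needs a tuple whose length matches the six %s slots.
    return prompt_style_str + base % (item['premise'], item['QCC'],
                                      item['Answer1'], item['Answer2'],
                                      item['Answer3'], item['Answer4'])


# Hypothetical item; one option is left empty, as in the exemplars above.
item = {
    'premise': 'A man turns off the light.',
    'QCC': 'What would have happened if he had not turned off the light?',
    'Answer1': 'The room would have stayed lit.',
    'Answer2': 'That is not possible.',
    'Answer3': '',
    'Answer4': 'The room would have gone dark.',
}

prompt = get_prompt('hypothetical-task', 'explicit-function', item)
print(prompt)
```

`task_name` is accepted but unused by this function, so any placeholder string works in the sketch.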
--- a/opencompass/datasets/calm/data_processing/prompt/EAE-B_exp-away.py
+++ b/opencompass/datasets/calm/data_processing/prompt/EAE-B_exp-away.py
@ -0,0 +1,126 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'basic-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'adversarial-ignore': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'adversarial-doubt': """Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN': """输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'zero-shot-IcL': """Answer questions about explaining away effect.
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN': """请回答关于相消解释作用的问题。
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'one-shot-IcL': """Answer questions about explaining away effect.
+Input Info: The overall probability of attractive appearance is 48%%. For people considered unattractive and are not famous, the probability of talent is 3%%. For people considered unattractive and are famous, the probability of talent is 9%%. For people considered attractive and are not famous, the probability of talent is 2%%. For people considered attractive and are famous, the probability of talent is 6%%.
+Question: If we look at people who are famous, does the chance of talent increase when attractive appearance?
+Answer (Yes or No ?):No.
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN': """请回答关于相消解释作用的问题。
+输入信息：拥有迷人外表的总体概率是48%%。对于被认为外表不迷人且不出名的人来说，有天赋的概率是3%%。对于被认为外表不迷人但很出名的人来说，有天赋的概率是9%%。对于被认为外表迷人但不出名的人来说，有天赋的概率是2%%。对于被认为外表迷人且出名的人来说，有天赋的概率是6%%。
+问题：如果我们观察那些出名的人，当他们拥有迷人的外表时，其拥有天赋的概率会增加吗？
+答案（是或否？）：否
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'three-shot-IcL': """Answer questions about explaining away effect.
+Input Info: The overall probability of attractive appearance is 48%%. For people considered unattractive and are not famous, the probability of talent is 3%%. For people considered unattractive and are famous, the probability of talent is 9%%. For people considered attractive and are not famous, the probability of talent is 2%%. For people considered attractive and are famous, the probability of talent is 6%%.
+Question: If we look at people who are famous, does the chance of talent increase when attractive appearance?
+Answer (Yes or No ?):No.
+
+Input Info: The overall probability of attractive appearance is 56%%. For people considered unattractive and are not famous, the probability of talent is 7%%. For people considered unattractive and are famous, the probability of talent is 18%%. For people considered attractive and are not famous, the probability of talent is 4%%. For people considered attractive and are famous, the probability of talent is 14%%.
+Question: If we look at people who are famous, does the chance of talent increase when attractive appearance?
+Answer (Yes or No ?): no
+
+Input Info: The overall probability of talent is 59%%. For students who are not talented and rejected from elite institutions, the probability of being hard-working is 37%%. For students who are not talented and accepted to elite institutions, the probability of being hard-working is 73%%. For students who are talented and rejected from elite institutions, the probability of being hard-working is 30%%. For students who are talented and accepted to elite institutions, the probability of being hard-working is 63%%.
+Question: If we look at students accepted to elite institutions, does the chance of being hard-working decrease when talent?
+Answer (Yes or No ?): yes
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN': """请回答关于相消解释作用的问题。
+输入信息：拥有迷人外表的总体概率是48%%。对于被认为外表不迷人且不出名的人来说，有天赋的概率是3%%。对于被认为外表不迷人但很出名的人来说，有天赋的概率是9%%。对于被认为外表迷人但不出名的人来说，有天赋的概率是2%%。对于被认为外表迷人且出名的人来说，有天赋的概率是6%%。
+问题：如果我们观察那些出名的人，当他们拥有迷人的外表时，其拥有天赋的概率会增加吗？
+答案（是或否？）：否
+
+输入信息：拥有迷人外表的总体概率是56%%。对于被认为外表不迷人且不出名的人来说，有天赋的概率是7%%。对于被认为外表不迷人但很出名的人来说，有天赋的概率是18%%。对于被认为外表迷人但不出名的人来说，有天赋的概率是4%%。对于被认为外表迷人且出名的人来说，有天赋的概率是14%%。
+问题：如果我们观察那些出名的人，当他们拥有迷人的外表时，其拥有天赋的概率会增加吗？
+答案（是或否？）：否
+
+输入信息：有天赋的总体概率是59%%。对于没有天赋并被精英学校拒之门外的学生来说，努力工作的概率是37%%。对于没有天赋却被精英学校录取的学生来说，努力工作的概率是73%%。对于有天赋却被精英学校拒之门外的学生来说，努力工作的概率是30%%。对于有天赋并被精英学校录取的学生来说，努力工作的概率是63%%。
+问题：如果我们观察那些被精英学校录取的学生，当他们有天赋时，其努力工作的概率会降低吗？
+答案（是或否？）：是
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'zero-shot-CoT': """Input Info: %s
+Question: %s Let's think step by step.
+Answer (Yes or No ?):""",
+    'zero-shot-CoT-CN': """输入信息：%s
+问题：%s请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here are three examples of problems about explaining away effect with chain of thought.
+
+Input Info: The overall probability of attractive appearance is 81%%. For people considered unattractive and are not famous, the probability of talent is 98%%. For people considered unattractive and are famous, the probability of talent is 92%%. For people considered attractive and are not famous, the probability of talent is 97%%. For people considered attractive and are famous, the probability of talent is 86%%.
+Question: If we look at people who are famous, does the chance of talent increase when attractive appearance?
+Answer (Yes or No ?): Let Y = talent; X = appearance; V3 = fame. The causal relations in this scenario are: X->V3,Y->V3. According to the question, we have: P(X=1) = 0.81\nP(Y=1 | X=0, V3=0) = 0.98\nP(Y=1 | X=0, V3=1) = 0.92\nP(Y=1 | X=1, V3=0) = 0.97\nP(Y=1 | X=1, V3=1) = 0.86. Calculate P(Y = 1 | X = 1, V3 = 1) - P(Y = 1 | V3 = 1)=P(Y=1 | X=1, V3=1) - (P(X=1) * P(Y=1 | X=1, V3=1) + P(X=0) * P(Y=1 | X=0, V3=1))=0.86 - (0.81*0.86 + 0.19*0.92) = -0.01<0. Thus, if we look at people who are famous, the chance of talent does not increase when attractive appearance. Therefore, the answer is No.
+
+Input Info: The overall probability of speaking english is 96%%. For people who do not speak english and are not famous, the probability of talent is 98%%. For people who do not speak english and are famous, the probability of talent is 95%%. For people who speak english and are not famous, the probability of talent is 98%%. For people who speak english and are famous, the probability of talent is 93%%.
+Question: If we look at people who are famous, does the chance of talent decrease when speaking english?
+Answer (Yes or No ?): Let Y = talent; X = ability to speak english; V3 = fame. The causal relations in this scenario are: X->V3,Y->V3. According to the question, we have: P(X=1) = 0.96\nP(Y=1 | X=0, V3=0) = 0.98\nP(Y=1 | X=0, V3=1) = 0.95\nP(Y=1 | X=1, V3=0) = 0.98\nP(Y=1 | X=1, V3=1) = 0.93. Calculate P(Y = 1 | X = 1, V3 = 1) - P(Y = 1 | V3 = 1)=P(Y=1 | X=1, V3=1) - (P(X=1) * P(Y=1 | X=1, V3=1) + P(X=0) * P(Y=1 | X=0, V3=1))=0.93 - (0.96*0.93 + 0.04*0.95) = -0.0008<0. Thus, if we look at people who are famous, the chance of talent decreases when speaking english. Therefore, the answer is Yes.
+
+Input Info: The overall probability of talent is 82%%. For students who are not talented and rejected from elite institutions, the probability of brown eyes is 99%%. For students who are not talented and accepted to elite institutions, the probability of brown eyes is 82%%. For students who are talented and rejected from elite institutions, the probability of brown eyes is 96%%. For students who are talented and accepted to elite institutions, the probability of brown eyes is 53%%.
+Question: If we look at students accepted to elite institutions, does the chance of brown eyes increase when talent?
+Answer (Yes or No ?): Let Y = brown eyes; X = talent; V3 = elite institution admission status. The causal relations in this scenario are: X->V3,Y->V3. According to the question, we have: P(X=1) = 0.82\nP(Y=1 | X=0, V3=0) = 0.99\nP(Y=1 | X=0, V3=1) = 0.82\nP(Y=1 | X=1, V3=0) = 0.96\nP(Y=1 | X=1, V3=1) = 0.53. Calculate P(Y = 1 | X = 1, V3 = 1) - P(Y = 1 | V3 = 1)=P(Y=1 | X=1, V3=1) - (P(X=1) * P(Y=1 | X=1, V3=1) + P(X=0) * P(Y=1 | X=0, V3=1))=0.53 - (0.82*0.53 + 0.18*0.82) = -0.05<0. Thus, if we look at students accepted to elite institutions, the chance of brown eyes does not increase when talent. Therefore, the answer is No.
+
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'manual-CoT-CN': """如下为一个使用思维链进行推理的有关相消解释效应的问题：
+
+输入信息：呼吸系统有问题的总体概率是49%%。对于呼吸系统没有问题且未住院的人来说，骨折的概率是15%%。对于呼吸系统没有问题但住院的人来说，骨折的概率是31%%。对于呼吸系统有问题但未住院的人来说，骨折的概率是7%%。对于呼吸系统有问题且已住院的人来说，骨折的概率为27%%。
+问题：如果我们观察那些住院患者，当他们呼吸系统出现问题时，其骨折的概率会降低吗？
+答案（是或否？）：令 Y = 骨折; X = 呼吸系统问题; V3 = 住院状况; 该问题下的因果关系有： X->V3,Y->V3。由题目信息可知：P(X=1) = 0.49\nP(Y=1 | X=0, V3=0) = 0.15\nP(Y=1 | X=0, V3=1) = 0.31\nP(Y=1 | X=1, V3=0) = 0.07\nP(Y=1 | X=1, V3=1) = 0.27。计算P(Y = 1 | X = 1, V3 = 1) - P(Y = 1 | V3 = 1)=P(Y=1 | X=1, V3=1) - (P(X=1) * P(Y=1 | X=1, V3=1) + P(X=0) * P(Y=1 | X=0, V3=1))=0.27 - (0.49*0.27 + 0.51*0.31) = -0.02<0。因此，如果我们观察那些住院患者，当他们呼吸系统出现问题时，其骨折的概率会降低。因此答案为“是”。
+
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+    'explicit-function':
+    """You are a helpful assistant for explaining away effect.
+Input Info: %s
+Question: %s
+Answer (Yes or No ?):""",
+    'explicit-function-CN': """你是一个用于评估相消解释效应的得力助手。
+输入信息：%s
+问题：%s
+答案（是或否？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['given_info'], item['question'])
+    return prompt
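The manual-CoT exemplars in this file decide each answer from the sign of P(Y=1 | X=1, V3=1) minus a mixture of the two V3=1 conditionals weighted by P(X). The sketch below reproduces that arithmetic exactly as the exemplars write it, which makes each exemplar's sign easy to re-check; the function name `explaining_away_gap` is hypothetical and not part of the dataset code.

```python
def explaining_away_gap(p_x1, p_y_x0_v1, p_y_x1_v1):
    """Return P(Y=1|X=1,V3=1) - [P(X=1)P(Y=1|X=1,V3=1) + P(X=0)P(Y=1|X=0,V3=1)].

    Mirrors the formula spelled out in the manual-CoT exemplars: a negative
    gap means the chance of Y does not increase (it decreases) when X=1.
    """
    mixture = p_x1 * p_y_x1_v1 + (1 - p_x1) * p_y_x0_v1
    return p_y_x1_v1 - mixture


# (P(X=1), P(Y=1|X=0,V3=1), P(Y=1|X=1,V3=1)) from the manual-CoT exemplars.
exemplars = {
    'appearance/fame/talent': (0.81, 0.92, 0.86),
    'english/fame/talent': (0.96, 0.95, 0.93),
    'talent/admission/brown eyes': (0.82, 0.82, 0.53),
    'respiratory/hospital/fracture (CN)': (0.49, 0.31, 0.27),
}

for name, args in exemplars.items():
    print(f'{name}: {explaining_away_gap(*args):+.4f}')
```

Algebraically the gap reduces to P(X=0) * (P(Y=1|X=1,V3=1) - P(Y=1|X=0,V3=1)), so its sign depends only on the difference between the two V3=1 conditionals; for example 0.19 * (0.86 - 0.92) = -0.0114.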
--- a/opencompass/datasets/calm/data_processing/prompt/ECI-B_CTB.py
+++ b/opencompass/datasets/calm/data_processing/prompt/ECI-B_CTB.py
@ -0,0 +1,183 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'basic-CN':
+    """输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'adversarial-ignore':
+    """Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN':
+    """输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'adversarial-doubt':
+    """Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN':
+    """输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'zero-shot-IcL':
+    """Determine whether there is a causal relationship between two events in a sentence.
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN':
+    """判断句子中的两个事件之间是否存在因果关系。
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'one-shot-IcL':
+    """Determine whether there is a causal relationship between two events in a sentence.
+Input: The break in an undersea cable on that affected Seacom has been repaired
+Question: is there a causal relationship between "affected" and "Seacom" ?
+Answer (Yes or No ?):No.
+
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN':
+    """判断句子中的两个事件之间是否存在因果关系。
+输入：影响东南非洲海底光缆系统的海底电缆断裂处已经修复。
+问题："影响"和"东南非洲海底光缆系统"之间是否存在因果关系？
+答案（是或否？）：否
+
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'three-shot-IcL':
+    """Determine whether there is a causal relationship between two events in a sentence.
+Input: The break in an undersea cable on that affected Seacom has been repaired
+Question: is there a causal relationship between "affected" and "Seacom" ?
+Answer (Yes or No ?):No.
+
+Input: The actress , 26 , checked in late Thursday night , TMZ reports , barely making the deadline and dodging an arrest warrant .
+Question: is there a causal relationship between "checked in" and "deadline" ?
+Answer (Yes or No ?): Yes
+
+Input: In a statement , the White House said it would do " whatever is necessary " to ensure compliance with the sanctions .
+Question: is there a causal relationship between "ensure" and "do" ?
+Answer (Yes or No ?): No.
+
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN':
+    """判断句子中的两个事件之间是否存在因果关系。
+输入：影响东南非洲海底光缆系统的海底电缆断裂处已经修复。
+问题："影响"和"东南非洲海底光缆系统"之间是否存在因果关系？
+答案（是或否？）：否
+
+输入：据TMZ报道，这位26岁的女演员在周四深夜赶到法院报到，刚好赶上截止日期并避免了被逮捕。
+问题："报到"和"截止日期"之间是否存在因果关系？
+答案（是或否？）：是
+
+输入：在一份声明中，白宫表示将采取“一切必要手段”确保制裁得到遵守。
+问题："确保"和"采取"之间是否存在因果关系？
+答案（是或否？）：否
+
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'zero-shot-CoT':
+    """Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ? Let's think step by step.
+Answer:""",
+    'zero-shot-CoT-CN':
+    """输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here we will provide eight chain-of-thought exemplars, followed by a causality identification question that needs to be answered with chain-of-thought.
+
+Input: The truck maker said the significant drop in net income will result in lower earnings for the fiscal year .
+Question: is there a causal relationship between "earnings" and "drop" ?
+Answer(Yes or No? With chain-of-thought): The term "drop" indicates a decrease or reduction in something, which in this case is the net income. Net income is directly related to earnings, as it represents the amount of profit left after deducting expenses from revenue. Thus, the answer is Yes.
+
+Input: said it plans to aggressively discount its major beer brands , setting the stage for a potentially bruising price war as the maturing industry 's growth continues to slow .
+Question: is there a causal relationship between "continues" and "setting" ?
+Answer(Yes or No? With chain-of-thought): The term "setting the stage" suggests preparing or creating a context for something. The word "continues" refers to the ongoing slowing down of the industry's growth. The slowing growth of the industry isn't directly a result of the company's action of setting the stage for a price war. Thus, the answer is No.
+
+Input: The charges were offset in part by a gain from the sale of the company 's construction division .
+Question: is there a causal relationship between "sale" and "gain" ?
+Answer(Yes or No? With chain-of-thought): The term "gain" suggests a positive financial outcome or benefit. The sale of the construction division directly leads to the gain mentioned. The act of selling the construction division causes or results in the gain mentioned. Thus, the answer is Yes.
+
+Input: The Atlanta-based airline , the third largest in the U.S., attributed the increase to higher passenger traffic , new international routes and reduced service by rival Eastern Airlines , which is in bankruptcy proceedings in the wake of a strike that began last spring .
+Question: is there a causal relationship between "began" and "increase" ?
+Answer(Yes or No? With chain-of-thought): The strike likely led to disruptions in Eastern Airlines' services. The text mentions that the Atlanta-based airline attributed an increase to "reduced service by rival Eastern Airlines." The strike (began) caused disruptions in Eastern Airlines' services, which in turn could have caused an increase in passenger traffic for the Atlanta-based airline. Thus, the answer is Yes.
+
+Input: We in fact have seen hate group numbers dropping through the nineties , uh but this year they jumped up uh twenty percent , quite a dramatic rise .
+Question: is there a causal relationship between "jumped" and "seen" ?
+Answer(Yes or No? With chain-of-thought): The term "jumped up" indicates a sudden increase in hate group numbers. The term "seen" suggests that the speaker has observed this increase. The act of seeing the increase (jumped) doesn't directly cause the increase itself. Thus, the answer is No.
+
+Input: Bertin Nadeau , newly appointed chairman and interim chief executive of Provigo , would n't say if Mr. Lortie was asked to leave .
+Question: is there a causal relationship between "leave" and "asked" ?
+Answer(Yes or No? With chain-of-thought): Bertin Nadeau wouldn't confirm whether Mr. Lortie was asked to leave. The actions here are Mr. Lortie leaving and Mr. Lortie being asked to leave. The act of Mr. Lortie leaving isn't directly caused by him being asked to leave. Thus, the answer is No.
+
+Input: The Kearny , N.J.-based maker of hair accessories and other cosmetic products said it cut the dividend due to its third-quarter loss of $ 992,000 , or 15 cents a share .
+Question: is there a causal relationship between "loss" and "said" ?
+Answer(Yes or No? With chain-of-thought): Losses can negatively impact a company's finances, reducing the funds available for distribution to shareholders (dividends). The term "said" indicates the company's communication about the dividend cut. The financial loss (loss) led to the company's decision to cut its dividend, which is the reason behind the communication (said) about the dividend cut. Thus, the answer is Yes.
+
+Input: Officials said the president himself met with Sununu Sunday .
+Question: is there a causal relationship between "met" and "said" ?
+Answer(Yes or No? With chain-of-thought): The president met with Sununu on Sunday. "Said" introduces the report or statement about the meeting, but it's not the cause of the meeting itself. The act of "saying" isn't the cause of the president "meeting" with Sununu. Thus, the answer is No.
+
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'manual-CoT-CN':
+    """如下为三个使用思维链进行推理的问题:
+
+输入：Sloan股份有限公司表示，该公司聘请了一家投资银行公司协助评估重组或合并方案，并报告截至9月30日的第三季度净亏损810万美元，即每股214美元。
+问题：\"协助\"和\"报告\"之间是否存在因果关系？
+答案（是或否？）：“协助”是投资银行公司的动作，“报告”是Sloan股份有限公司发布季度业绩的动作，两者是不同的事件，并没有因果关系。因此答案为“否”。
+
+输入：该公司表示，由于服务中心减少了库存、汽车市场低迷以及建筑市场竞争加剧等原因，导致其出货量下降。
+问题：\"低迷\"和\"减少\"之间是否存在因果关系？
+答案（是或否？）：“服务中心减少了库存”是“出货量下降”的原因之一，因此答案为“是”。
+
+输入：伦敦股市在纽约隔夜下跌以及洛威辞职后英镑贬值的情况下，初期受到压制。
+问题：\"下跌\"和\"压制\"之间是否存在因果关系？
+答案（是或否？）：伦敦股市下跌和英镑贬值等因素是导致股市受到压制的原因之一。因此答案为“是”。
+
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'explicit-function':
+    """You are a helpful assistant for event causality identification.
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'explicit-function-CN':
+    """你是一个识别事件因果关系的得力助手。
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+    if prompt_style in [
+            'basic', 'adversarial-ignore', 'adversarial-doubt',
+            'zero-shot-CoT', 'manual-CoT', 'zero-shot-IcL', 'one-shot-IcL',
+            'three-shot-IcL', 'explicit-function'
+    ]:
+        words = item['words']
+        sent = ' '.join(words)
+        events = item['events']
+        event1 = ' '.join([words[t] for t in events[0]])
+        event2 = ' '.join([words[t] for t in events[1]])
+        prompt = prompt_style_str + base % (sent, event1, event2)
+    else:
+        prompt = prompt_style_str + base % (item['sent'], item['event1'],
+                                            item['event2'])
+
+    return prompt
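For the English prompt styles, `get_prompt` rebuilds the sentence and both event mentions from token indices rather than reading pre-joined strings. A minimal sketch of that step, assuming `item['words']` is a token list and `item['events']` holds two lists of token positions (the layout the code above reads); the helper name `build_eci_fields` and the sample item are hypothetical.

```python
def build_eci_fields(item):
    """Return (sentence, event1, event2) from a tokenized ECI item."""
    words = item['words']
    # Tokens are re-joined with single spaces, so punctuation stays
    # detached (e.g. "deadline ." rather than "deadline.").
    sent = ' '.join(words)
    # Each event is a list of token indices; multi-token events such as
    # "checked in" are joined in the order the indices are listed.
    event1 = ' '.join(words[t] for t in item['events'][0])
    event2 = ' '.join(words[t] for t in item['events'][1])
    return sent, event1, event2


item = {
    'words': ['The', 'actress', 'checked', 'in', 'late', ',', 'barely',
              'making', 'the', 'deadline', '.'],
    'events': [[2, 3], [9]],
}

template = ('Input: %s\n'
            'Question: is there a causal relationship between "%s" and "%s" ?\n'
            'Answer (Yes or No ?):')
print(template % build_eci_fields(item))
```

This detached-punctuation style is why the English few-shot exemplars read like "The actress , 26 , checked in ..." while the CN styles, which take `item['sent']` directly, read as normal prose.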
--- a/opencompass/datasets/calm/data_processing/prompt/ECI-B_ESC.py
+++ b/opencompass/datasets/calm/data_processing/prompt/ECI-B_ESC.py
@ -0,0 +1,183 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'basic-CN':
+    """输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'adversarial-ignore':
+    """Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN':
+    """输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'adversarial-doubt':
+    """Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN':
+    """输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'zero-shot-IcL':
+    """Determine whether there is a causal relationship between two events in a sentence.
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN':
+    """判断句子中的两个事件之间是否存在因果关系。
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'one-shot-IcL':
+    """Determine whether there is a causal relationship between two events in a sentence.
+Input: The break in an undersea cable on that affected Seacom has been repaired
+Question: is there a causal relationship between "affected" and "Seacom" ?
+Answer (Yes or No ?):No.
+
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN':
+    """判断句子中的两个事件之间是否存在因果关系。
+输入：影响东南非洲海底光缆系统的海底电缆断裂处已经修复。
+问题："影响"和"东南非洲海底光缆系统"之间是否存在因果关系？
+答案（是或否？）：否
+
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'three-shot-IcL':
+    """Determine whether there is a causal relationship between two events in a sentence.
+Input: The break in an undersea cable on that affected Seacom has been repaired
+Question: is there a causal relationship between "affected" and "Seacom" ?
+Answer (Yes or No ?):No.
+
+Input: The actress , 26 , checked in late Thursday night , TMZ reports , barely making the deadline and dodging an arrest warrant .
+Question: is there a causal relationship between "checked in" and "deadline" ?
+Answer (Yes or No ?): Yes
+
+Input: In a statement , the White House said it would do " whatever is necessary " to ensure compliance with the sanctions .
+Question: is there a causal relationship between "ensure" and "do" ?
+Answer (Yes or No ?): No.
+
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN':
+    """判断句子中的两个事件之间是否存在因果关系。
+输入：影响东南非洲海底光缆系统的海底电缆断裂处已经修复。
+问题："影响"和"东南非洲海底光缆系统"之间是否存在因果关系？
+答案（是或否？）：否
+
+输入：据TMZ报道，这位26岁的女演员在周四深夜赶到法院报到，刚好赶上截止日期并避免了被逮捕。
+问题："报到"和"截止日期"之间是否存在因果关系？
+答案（是或否？）：是
+
+输入：在一份声明中，白宫表示将采取“一切必要手段”确保制裁得到遵守。
+问题："确保"和"采取"之间是否存在因果关系？
+答案（是或否？）：否
+
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'zero-shot-CoT':
+    """Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ? Let's think step by step.
+Answer:""",
+    'zero-shot-CoT-CN':
+    """输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here we will provide eight chain-of-thought exemplars, followed by a causality identification question that needs to be answered.
+
+Input: He was sentenced to life in prison for indecency with a child , aggravated sexual assault and two counts of aggravated assault with a deadly weapon , MyFoxHouston . com reports .
+Question: is there a causal relationship between "indecency" and "sentenced" ?
+Answer(Yes or No? With chain-of-thought): The other charges, such as "aggravated sexual assault" and "aggravated assault with a deadly weapon," also play a significant role in the severity of the sentence. In this case, the causal relationship might not be direct, as the sentence could be the result of the cumulative impact of all the charges brought against the individual. Thus, the answer is No.
+
+Input: A fire - bomb attack on a bank in Greece killed at least three people Wednesday as police fought pitched battles with striking protestors furious at brutal budget cuts designed to avert national bankruptcy .
+Question: is there a causal relationship between "cuts" and "battles" ?
+Answer(Yes or No? With chain-of-thought): The severe budget cuts, aimed at averting national bankruptcy, have triggered public anger and protests. These protests have, in turn, escalated into violent clashes with the police. Thus, the answer is Yes.
+
+Input: “ Tonight was a peaceful vigil that devolved into a riot , ” Williams wrote .
+Question: is there a causal relationship between "vigil" and "devolved" ?
+Answer(Yes or No? With chain-of-thought): In this context, the transition from a vigil to a riot does not necessarily imply a direct causal relationship between the two. Rather, it indicates a shift or transformation from one type of event to another due to various factors. Thus, the answer is No.
+
+Input: Klitschko finally landed a long , straight right in the fifth , and the round ended with Thompson struggling on the ropes .
+Question: is there a causal relationship between "fifth" and "round" ?
+Answer(Yes or No? With chain-of-thought): There isn't a direct causal relationship between the "fifth" and the "round." The numbering of the round doesn't inherently cause an action or event; it merely designates the order. Thus, the answer is No.
+
+Input: Lyons said that Comeaux used a wheelchair in prison – he "claimed it was necessary for his mobility" – and added , "Since he fled on foot , that's obviously in question . "
+Question: is there a causal relationship between "question" and "fled" ?
+Answer(Yes or No? With chain-of-thought): The "question" about the necessity of Comeaux's wheelchair is directly caused by the fact that he "fled on foot," which contradicts his claim of needing the wheelchair for mobility. Thus, the answer is Yes.
+
+Input: Man charged with arson over Waitrose fire in Wellington
+Question: is there a causal relationship between "arson" and "fire" ?
+Answer(Yes or No? With chain-of-thought): The term "arson" inherently involves causing a fire intentionally. The act of arson is directly causal to the fire that results from it. Thus, the answer is Yes.
+
+Input: On Friday , 36 - year - old Duncan Raite died after slipping and falling about 60 metres ( 200 feet ) from a ridge .
+Question: is there a causal relationship between "slipping" and "falling" ?
+Answer(Yes or No? With chain-of-thought): When Duncan Raite slipped, he lost his footing or traction, which directly led to the effect of falling. The fall was a direct consequence of the slip. Thus, the answer is Yes.
+
+Input: Riots over harsh new austerity measures left three bank workers dead and engulfed the streets of Athens on Wednesday , as angry protesters tried to storm parliament , hurled Molotov cocktails at police and torched buildings .
+Question: is there a causal relationship between "storm" and "tried" ?
+Answer(Yes or No? With chain-of-thought): "Tried" doesn't directly cause "storm"; instead, "tried" describes the initial intention or effort that leads to the subsequent action of "storming." Thus, the answer is No.
+
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'manual-CoT-CN':
+    """如下为三个使用思维链进行推理的问题:
+
+输入：陪审团展示了一个可怕的时刻，一位无辜的母亲在接女儿放学时遭遇帮派袭击，在保护孩子时被枪杀
+问题：\"展示\"和\"袭击\"之间是否存在因果关系？
+答案（是或否？）：“陪审团展示”和“母亲被袭击”之间没有因果关系，因此答案为“否”。
+
+输入：警长副手被枪击致死
+问题：\"枪击\"和\"致死\"之间是否存在因果关系？
+答案（是或否？）：枪击是导致警长副手死亡的原因，因此答案为“是”。
+
+输入：地震真的很强烈，人们惊慌失措地涌上街头。
+问题：\"地震\"和\"惊慌\"之间是否存在因果关系？
+答案（是或否？）：强烈的“地震”导致了人们“惊慌”的情绪，因此答案为“是”。
+
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'explicit-function':
+    """You are a helpful assistant for event causality identification.
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'explicit-function-CN':
+    """你是一个识别事件因果关系的得力助手。
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+    if prompt_style in [
+            'basic', 'adversarial-ignore', 'adversarial-doubt',
+            'zero-shot-CoT', 'manual-CoT', 'zero-shot-IcL', 'one-shot-IcL',
+            'three-shot-IcL', 'explicit-function'
+    ]:
+        words = item['words']
+        sent = ' '.join(words)
+        events = item['events']
+        event1 = ' '.join([words[t] for t in events[0]])
+        event2 = ' '.join([words[t] for t in events[1]])
+        prompt = prompt_style_str + base % (sent, event1, event2)
+    else:
+        prompt = prompt_style_str + base % (item['sent'], item['event1'],
+                                            item['event2'])
+
+    return prompt
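Every template in these dictionaries is filled positionally with the `%` operator, so a template with the wrong number of `%s` slots, or a literal percent sign not written as `%%`, only fails when that prompt style is first used. A small, hypothetical sanity check (`check_templates` is not part of the dataset code) can surface both problems up front by filling each template with dummy strings; it assumes every placeholder is meant to be a plain `%s`.

```python
def check_templates(prompt_dict, expected_slots):
    """Return the styles whose template cannot be filled with `expected_slots` strings.

    A wrong number of %s slots raises TypeError; an unescaped literal '%'
    usually raises ValueError (or TypeError if it happens to form a numeric
    conversion), so both exception types are treated as failures.
    """
    bad = []
    for style, template in prompt_dict.items():
        try:
            template % (('x',) * expected_slots)
        except (TypeError, ValueError):
            bad.append(style)
    return bad


# Hypothetical sample: two valid ECI-style templates and two broken ones.
sample = {
    'basic': 'Input: %s\nQuestion: is there a causal relationship between "%s" and "%s" ?',
    'escaped-percent': 'Input: %s (rate: 48%%)\nQuestion: "%s" and "%s" ?',
    'too-few': 'Input: %s\nQuestion: "%s" ?',
    'stray-percent': 'Input: %s (rate: 48%)\nQuestion: "%s" and "%s" ?',
}
print(check_templates(sample, 3))  # ['too-few', 'stray-percent']
```

Run against this file's `base_prompt_dict` with `expected_slots=3`, an empty result would indicate every style can be formatted; the explaining-away templates would use `expected_slots=2`.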
--- a/opencompass/datasets/calm/data_processing/prompt/ECI-B_MAVEN-ERE.py
+++ b/opencompass/datasets/calm/data_processing/prompt/ECI-B_MAVEN-ERE.py
@ -0,0 +1,183 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'basic-CN':
+    """输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'adversarial-ignore':
+    """Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN':
+    """输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'adversarial-doubt':
+    """Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN':
+    """输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'zero-shot-IcL':
+    """Determine whether there is a causal relationship between two events in a sentence.
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN':
+    """判断句子中的两个事件之间是否存在因果关系。
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'one-shot-IcL':
+    """Determine whether there is a causal relationship between two events in a sentence.
+Input: The break in an undersea cable on that affected Seacom has been repaired
+Question: is there a causal relationship between "affected" and "Seacom" ?
+Answer (Yes or No ?):No.
+
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN':
+    """判断句子中的两个事件之间是否存在因果关系。
+输入：影响东南非洲海底光缆系统的海底电缆断裂处已经修复。
+问题："影响"和"东南非洲海底光缆系统"之间是否存在因果关系？
+答案（是或否？）：否
+
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'three-shot-IcL':
+    """Determine whether there is a causal relationship between two events in a sentence.
+Input: The break in an undersea cable on that affected Seacom has been repaired
+Question: is there a causal relationship between "affected" and "Seacom" ?
+Answer (Yes or No ?):No.
+
+Input: The actress , 26 , checked in late Thursday night , TMZ reports , barely making the deadline and dodging an arrest warrant .
+Question: is there a causal relationship between "checked in" and "deadline" ?
+Answer (Yes or No ?): Yes
+
+Input: In a statement , the White House said it would do " whatever is necessary " to ensure compliance with the sanctions .
+Question: is there a causal relationship between "ensure" and "do" ?
+Answer (Yes or No ?): No.
+
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN':
+    """判断句子中的两个事件之间是否存在因果关系。
+输入：影响东南非洲海底光缆系统的海底电缆断裂处已经修复。
+问题："影响"和"东南非洲海底光缆系统"之间是否存在因果关系？
+答案（是或否？）：否
+
+输入：据TMZ报道，这位26岁的女演员在周四深夜赶到法院报到，刚好赶上截止日期并避免了被逮捕。
+问题："报到"和"截止日期"之间是否存在因果关系？
+答案（是或否？）：是
+
+输入：在一份声明中，白宫表示将采取“一切必要手段”确保制裁得到遵守。
+问题："确保"和"采取"之间是否存在因果关系？
+答案（是或否？）：否
+
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'zero-shot-CoT':
+    """Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ? Let's think step by step.
+Answer:""",
+    'zero-shot-CoT-CN':
+    """输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here we will provide eight chain-of-thought exemplars, followed by a causality identification question that needs to be answered.
+
+Input: The events of the Dismissal led to only minor constitutional change .
+Question: is there a causal relationship between "events" and "change" ?
+Answer (Yes or No ?): The phrase "led to" indicates a causal relationship between "events" and "change." Thus, the answer is Yes.
+
+Input: They went into hiding secretly gaining support and strength .
+Question: is there a causal relationship between "went" and "gaining" ?
+Answer (Yes or No ?): "Went into hiding" indicates an action of seeking refuge or concealing oneself, which is separate from the action of "gaining support and strength." Thus, the answer is No.
+
+Input: On 7 January , the number of houses destroyed throughout the affected area was revised down from 38 to 32 and again down to 27 a few days later .
+Question: is there a causal relationship between "destroyed" and "affected" ?
+Answer (Yes or No ?): "Throughout the affected area" suggests that the destruction of houses is a consequence of some event or situation that has affected the area. Thus, the answer is Yes.
+
+Input: There were both independent and signed bands who were booked to play , as well as many vendors for music and related paraphernalia .
+Question: is there a causal relationship between "signed" and "play" ?
+Answer (Yes or No ?): Both independent and signed bands were booked to play at the event. Thus, the answer is No.
+
+Input: Strong winds lashed North Florida , with sustained winds of 125 mph ( 205 km/h ) observed in St. Augustine .
+Question: is there a causal relationship between "lashed" and "observed" ?
+Answer (Yes or No ?): "Lashed" and "observed" describe different aspects of the weather conditions but are not causally linked. Thus, the answer is No.
+
+Input: In Thailand , the system produced significant storm surge , damaged or destroyed 1,700 homes , and killed two people .
+Question: is there a causal relationship between "storm" and "damaged" ?
+Answer (Yes or No ?): The storm in Thailand produced a significant storm surge, causing damage to or destruction of 1,700 homes. Thus, the answer is Yes.
+
+Input: Valencia , meanwhile , defeated English sides Arsenal and Leeds United in the knockout phase en route to the final .
+Question: is there a causal relationship between "defeated" and "final" ?
+Answer (Yes or No ?): Valencia defeated English sides Arsenal and Leeds United in the knockout phase of the competition, and as a result of these victories, they advanced to the final. Thus, the answer is Yes.
+
+Input: Arnold was injured early in the attack , and Morgan led the assault in his place before he became trapped in the lower city and was forced to surrender .
+Question: is there a causal relationship between "forced" and "injured" ?
+Answer (Yes or No ?): Arnold was injured early in the attack, and Morgan was later forced to surrender in the lower city; these two events are not causally connected. Thus, the answer is No.
+
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'manual-CoT-CN':
+    """如下为三个使用思维链进行推理的问题:
+
+输入：然而，当塔赫马斯普目睹一个强大帝国曾经辉煌的首都的贫困和悲惨状况时，他泪流满面。
+问题：\"泪流满面\"和\"眼泪\"之间是否存在因果关系？
+答案（是或否？）：塔赫马斯普的泪流满面是因为他目睹了首都的贫困和悲惨状况。因此答案为“是”。
+
+输入：1811年11月29日的行动是拿破仑战争亚得里亚海战役期间，两个护卫舰中队在亚得里亚海上进行的一次小型海军交战。
+问题：\"交战\"和\"战役\"之间是否存在因果关系？
+答案（是或否？）："交战"和"战役"都涉及到军事行动，但它们不一定具有因果关系。因此答案为“否”。
+
+输入：阿富汗队在决赛中的风云人物拉希德·汗说：“赢得洲际杯对我们来说是测试板球的良好准备”。
+问题：\"赢得\"和\"准备\"之间是否存在因果关系？
+答案（是或否？）：他们赢得洲际杯是为了准备测试板球比赛，因此可以认为赢得比赛导致了他们的准备行动。因此答案为“是”。
+
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+    'explicit-function':
+    """You are a helpful assistant for event causality identification.
+Input: %s
+Question: is there a causal relationship between \"%s\" and \"%s\" ?
+Answer (Yes or No ?):""",
+    'explicit-function-CN':
+    """你是一个识别事件因果关系的得力助手。
+输入：%s
+问题：\"%s\"和\"%s\"之间是否存在因果关系？
+答案（是或否？）：""",
+}
+
+
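+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    # task_name is not used by this task; the parameter matches the
+    # get_prompt signature shared by the other prompt modules.
+    base = base_prompt_dict[prompt_style]
+    if prompt_style in [
+            'basic', 'adversarial-ignore', 'adversarial-doubt',
+            'zero-shot-CoT', 'manual-CoT', 'zero-shot-IcL', 'one-shot-IcL',
+            'three-shot-IcL', 'explicit-function'
+    ]:
+        # English styles: the item stores a tokenized sentence ('words') and
+        # each event as a list of token indices into that sentence.
+        words = item['words']
+        sent = ' '.join(words)
+        events = item['events']
+        event1 = ' '.join([words[t] for t in events[0]])
+        event2 = ' '.join([words[t] for t in events[1]])
+        prompt = prompt_style_str + base % (sent, event1, event2)
+    else:
+        # Remaining (-CN) styles: the item stores the sentence and both
+        # event strings directly.
+        prompt = prompt_style_str + base % (item['sent'], item['event1'],
+                                            item['event2'])
+
+    return prompt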
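To make the English branch of `get_prompt` above concrete, here is a minimal standalone sketch. `build_eci_prompt`, the trimmed `TEMPLATES` dict (a copy of the `'explicit-function'` template only) and the sample item are hypothetical; the item simply follows the `words`/`events` layout that `get_prompt` reads, where each event is a list of token indices.

```python
# Trimmed copy of the 'explicit-function' ECI template (hypothetical subset).
TEMPLATES = {
    'explicit-function':
    """You are a helpful assistant for event causality identification.
Input: %s
Question: is there a causal relationship between \"%s\" and \"%s\" ?
Answer (Yes or No ?):""",
}


def build_eci_prompt(prompt_style, item, prompt_style_str=''):
    """Mirror of the English branch of get_prompt: join the tokens into a
    sentence and rebuild each event span from its token indices."""
    base = TEMPLATES[prompt_style]
    words = item['words']
    sent = ' '.join(words)
    event1 = ' '.join(words[t] for t in item['events'][0])
    event2 = ' '.join(words[t] for t in item['events'][1])
    return prompt_style_str + base % (sent, event1, event2)


# Hypothetical item in the tokenized layout get_prompt expects.
item = {
    'words': ['The', 'break', 'in', 'an', 'undersea', 'cable', 'on', 'that',
              'affected', 'Seacom', 'has', 'been', 'repaired'],
    'events': [[8], [9]],  # token 8 = "affected", token 9 = "Seacom"
}

prompt = build_eci_prompt('explicit-function', item)
print(prompt)
```

Multi-token events work the same way: `'events': [[4, 5], [12]]` would yield the event strings "undersea cable" and "repaired". The `-CN` styles skip this step because their items already carry `sent`, `event1` and `event2` as plain strings.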
--- a/opencompass/datasets/calm/data_processing/prompt/ETT.py
+++ b/opencompass/datasets/calm/data_processing/prompt/ETT.py
@ -0,0 +1,182 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'basic-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'adversarial-ignore':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'adversarial-ignore-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'adversarial-doubt':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'adversarial-doubt-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'zero-shot-IcL':
+    """Answer questions about the Effect of the Treatment on the Treated (ETT). Computing the Effect of the Treatment on the Treated involves focusing solely on the individuals who actually received the treatment. You compare their observed outcomes with what their outcomes would have been had they not received the treatment.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-IcL-CN':
+    """回答有关 "治疗对受试者的影响"（ETT）的问题。计算治疗对受试者的影响时，只需关注实际接受治疗的个体。将观察到的结果与未接受治疗时的结果进行比较。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'one-shot-IcL':
+    """Answer questions about the Effect of the Treatment on the Treated (ETT). Computing the Effect of the Treatment on the Treated involves focusing solely on the individuals who actually received the treatment. You compare their observed outcomes with what their outcomes would have been had they not received the treatment.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Smku has a direct effect on eons. Smku has a direct effect on pgqh. Arbu has a direct effect on eons. Eons has a direct effect on pgqh.
+For those with arbu being low, the probability of eons being high is 0.2617. For those with arbu being high, the probability of eons being high is 0.0291.
+Instruction: Consider the effect of treatment on the treated (ETT) of arbu on eons.
+Question: For those with arbu being low, if their arbu had been high, would the eons have been more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "No", "PROB": "0.2326"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'one-shot-IcL-CN':
+    """回答有关 "治疗对受试者的影响"（ETT）的问题。计算治疗对受试者的影响时，只需关注实际接受治疗的个体。将观察到的结果与未接受治疗时的结果进行比较。
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：Smku对eons有直接影响。Smku对pgqh有直接影响。Arbu对eons有直接影响。Eons对pgqh有直接影响。
+在arbu为低的条件下, eons为高的概率为0.2617。在arbu为高的条件下, eons为高的概率为0.0291。
+指令：考虑arbu作用于eons的“对被干预者的干预效果”(effect of treatment on the treated, ETT)。
+问题：对于那些arbu为低，假如arbu为高，那么eons更有可能为高吗？
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}： {"ANSWER":"否","PROB":"0.2326"}
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'three-shot-IcL':
+    """Answer questions about the Effect of the Treatment on the Treated (ETT). Computing the Effect of the Treatment on the Treated involves focusing solely on the individuals who actually received the treatment. You compare their observed outcomes with what their outcomes would have been had they not received the treatment.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Smku has a direct effect on eons. Smku has a direct effect on pgqh. Arbu has a direct effect on eons. Eons has a direct effect on pgqh.
+For those with arbu being low, the probability of eons being high is 0.2617. For those with arbu being high, the probability of eons being high is 0.0291.
+Instruction: Consider the effect of treatment on the treated (ETT) of arbu on eons.
+Question: For those with arbu being low, if their arbu had been high, would the eons have been more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "No", "PROB": "0.2326"}
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Sales performance has a direct effect on air pressure. Sales performance has a direct effect on work-life balance. Air pressure has a direct effect on quality of teaching. Work-life balance has a direct effect on quality of teaching.
+
+Instruction: Consider the effect of treatment on the treated (ETT) of air pressure on work-life balance.
+Question: For those with air pressure being high, if their air pressure had been low, would the work-life balance have been more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "No", "PROB": "0.0000"}
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Weather conditions has a direct effect on temperature. Weather conditions has a direct effect on humidity. Temperature has a direct effect on humidity. Temperature has a direct effect on precipitation.
+For those with weather conditions being good, the probability of humidity being low is 0.8897. For those with weather conditions being bad, the probability of humidity being low is 0.7378.
+Instruction: Consider the effect of treatment on the treated (ETT) of weather conditions on humidity.
+Question: For those with weather conditions being good, if their weather conditions had been bad, would the humidity have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "No", "PROB": "0.1519"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-CoT':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s Let's think step by step.
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-CoT-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s请逐步思考。
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'manual-CoT':
+    """Here are three examples of math problems about the effect of treatment on the treated (ETT) task, with chain-of-thought reasoning.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Exxp has a direct effect on hnzi. Hnzi has a direct effect on mlhx. Ovlq has a direct effect on hnzi. Wtel has a direct effect on mlhx.
+For those with ovlq being low, the probability of hnzi being low is 0.5625. For those with ovlq being high, the probability of hnzi being low is 0.5062.
+Instruction: Consider the effect of treatment on the treated (ETT) of ovlq on hnzi.
+Question: For those with ovlq being low, if their ovlq had been high, would the hnzi have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With B representing ovlq and C representing hnzi, we find P(C=0|B=0)=0.5625 and P(C=0|B=1)=0.5062. Since there is a path B->C from B to C and the empty set is a valid backdoor adjustment set in this situation, we calculate: ETT=E[C_{B=0}-C_{B=1}|B=0]=P(C=0|B=0)-P(C=0|B=1)=0.5625-0.5062=0.0563>0. The answer is: {"ANSWER": "No", "PROB": "0.0563"}.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Fcun has a direct effect on xmtp. Xmtp has a direct effect on mwcs. Xmtp has a direct effect on bkzf.
+For those with xmtp being low, the probability of mwcs being low is 0.8041. For those with xmtp being high, the probability of mwcs being low is 0.9343.
+Instruction: Consider the effect of treatment on the treated (ETT) of xmtp on mwcs.
+Question: For those with xmtp being low, if their xmtp had been high, would the mwcs have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With B representing xmtp and C representing mwcs, we have P(C=0|B=0)=0.8041 and P(C=0|B=1)=0.9343. Since there is a path B->C from B to C and the empty set is a valid backdoor adjustment set in this situation, we calculate ETT=E[C_{B=0}-C_{B=1}|B=0]=P(C=0|B=0)-P(C=0|B=1)=0.8041-0.9343=-0.1302<0. The answer is: {"ANSWER": "Yes", "PROB": "-0.1302"}.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Bfgu has a direct effect on fskd. Bfgu has a direct effect on nbzx.
+
+Instruction: Consider the effect of treatment on the treated (ETT) of nbzx on bfgu.
+Question: For those with nbzx being high, if their nbzx had been low, would the bfgu have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With A representing bfgu and C representing nbzx, there is no path from C to A. The answer is: {"ANSWER": "No", "PROB": "0.0000"}.
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'manual-CoT-CN':
+    """如下为一个使用思维链进行推理的关于“对被干预者的干预效果”(effect of treatment on the treated, ETT)任务的数学问题：
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：工作生活平衡水平对收入水平有直接影响。工作生活平衡水平对天赋水平有直接影响。工作生活平衡水平对政府政策有直接影响。收入水平对天赋水平有直接影响。天赋水平对政府政策有直接影响。
+在工作生活平衡水平为高的条件下, 政府政策为高的概率为0.1633。在工作生活平衡水平为低的条件下, 政府政策为高的概率为0.5540。
+指令：考虑工作生活平衡水平作用于政府政策的“对被干预者的干预效果”(effect of treatment on the treated, ETT)。
+问题：对于那些工作生活平衡水平为高，假如工作生活平衡水平为低，那么政府政策更有可能为高吗？
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：用A代表工作生活平衡水平, B代表收入水平, C代表天赋水平, D代表政府政策，A到D有一条或多条有向路径(例如A->B->C->D)，所以节点A是节点D的原因。考虑到P(D=1|A=1)=0.1633，P(D=1|A=0)=0.5540，且该问题中有一个合法的后门调整集合：空集，所以ETT=E[D_{A=1}-D_{A=0}|A=1]=P(D=1|A=1)-P(D=1|A=0)=0.1633-0.5540=-0.3907<0。因此答案为{"ANSWER":"是","PROB":"-0.3907"}。
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'explicit-function':
+    """You are a helpful assistant for math probability.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'explicit-function-CN':
+    """你是一个用于计算数学概率的得力助手。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    # Fill the four %s placeholders in order: given info, data info,
+    # instruction, and question.
+    prompt = prompt_style_str + base % (item['given_info'],
+                                        item['Background']['data_info'],
+                                        item['Instruction'], item['Question'])
+    return prompt
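Every ETT manual-CoT exemplar above reduces to the same arithmetic once the empty set is a valid back-door adjustment set: subtract the conditional probability under the hypothetical treatment value from the one under the observed value, and answer "Yes" only when the difference is negative. The helper below is a hypothetical sketch of that convention, not code from CaLM; `ett_answer` and its argument names are assumptions. It does not check that the empty set really is a valid adjustment set, and it does not cover the no-path case, which the exemplars report directly as 0.0000.

```python
import json


def ett_answer(p_factual, p_counterfactual):
    """Hypothetical helper reproducing the ETT exemplars' arithmetic.

    Assumes the empty set is a valid back-door adjustment set, so
    ETT = P(outcome | observed treatment) - P(outcome | hypothetical treatment).
    """
    ett = p_factual - p_counterfactual
    # A negative ETT means the outcome would have been more likely under the
    # hypothetical treatment value, so the question is answered "Yes".
    answer = 'Yes' if ett < 0 else 'No'
    return {'ANSWER': answer, 'PROB': f'{ett:.4f}'}


# First exemplar: P(C=0|B=0)=0.5625, P(C=0|B=1)=0.5062.
print(json.dumps(ett_answer(0.5625, 0.5062)))  # {"ANSWER": "No", "PROB": "0.0563"}
# Second exemplar: P(C=0|B=0)=0.8041, P(C=0|B=1)=0.9343.
print(json.dumps(ett_answer(0.8041, 0.9343)))  # {"ANSWER": "Yes", "PROB": "-0.1302"}
```

The same call reproduces the one-shot answer (`ett_answer(0.2617, 0.0291)` gives PROB 0.2326, "No") and the Chinese exemplar (`ett_answer(0.1633, 0.5540)` gives PROB -0.3907, "Yes").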
--- a/opencompass/datasets/calm/data_processing/prompt/FAS-C_FAS.py
+++ b/opencompass/datasets/calm/data_processing/prompt/FAS-C_FAS.py
@ -0,0 +1,375 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'basic-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足前门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'adversarial-ignore':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'adversarial-ignore-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足前门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'adversarial-doubt':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'adversarial-doubt-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足前门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'zero-shot-IcL':
+    """Objective:
+Your task is to identify the set of variables that satisfy the front-door criterion relative to an ordered pair of variables (X, Y) in a given causal graph.
+Background Information:
+- Back-door criterion is defined as follows:
+  1. The variable set Z must not contain any descendants of X.
+  2. Z blocks every path from X to Y that has an arrow pointing to X.
+- Front-door criterion is defined as follows:
+  1. The variable set Z blocks all directed paths from X to Y.
+  2. There are no back-door paths from X to Z.
+  3. All back-door paths from Z to Y are blocked by X.
+Input:
+- Description of the causal graph detailing the relationships between variables.
+- A question asking for the set of variables that satisfy the front-door criterion for a specified ordered pair of variables (X, Y).
+- Multiple-choice options for the set of variables that could satisfy the specified criterion.
+Output:
+- Answer to the question, provided in the format "Option N", where N is either 1, 2, or 3 based on the provided options.
+
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'zero-shot-IcL-CN':
+    """目标：
+你的任务是在给定的因果图中，找出相对于有序变量对（X，Y）满足前门准则的变量集。
+背景信息：
+- 后门准则定义如下：
+  1. 变量集 Z 不能包含任何 X 的后代。
+  2. Z 阻止了从 X 到 Y 的每一条有箭头指向 X 的路径。
+- 前门准则的定义如下：
+  1. 变量集 Z 阻止所有从 X 到 Y 的有向路径。
+  2. 没有从 X 到 Z 的后门路径。
+  3. 从 Z 到 Y 的所有后门路径都被 X 堵塞。
+输入：
+- 详细描述变量之间关系的因果图。
+- 一个问题，询问符合指定有序变量对（X，Y）的前门标准的变量集。
+- 可满足指定标准的变量集合的多选选项。
+输出：
+- 问题答案，格式为 "选项 N"，根据所提供的选项，N 为 一、二 或 三。
+
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足前门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'one-shot-IcL':
+    """Objective:
+Your task is to identify the set of variables that satisfy the front-door criterion relative to an ordered pair of variables (X, Y) in a given causal graph.
+Background Information:
+- Back-door criterion is defined as follows:
+  1. The variable set Z must not contain any descendants of X.
+  2. Z blocks every path from X to Y that has an arrow pointing to X.
+- Front-door criterion is defined as follows:
+  1. The variable set Z blocks all directed paths from X to Y.
+  2. There are no back-door paths from X to Z.
+  3. All back-door paths from Z to Y are blocked by X.
+Input:
+- Description of the causal graph detailing the relationships between variables.
+- A question asking for the set of variables that satisfy the front-door criterion for a specified ordered pair of variables (X, Y).
+- Multiple-choice options for the set of variables that could satisfy the specified criterion.
+Output:
+- Answer to the question, provided in the format "Option N", where N is either 1, 2, or 3 based on the provided options.
+
+Example:
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (E, A) in the above causal graph?
+Option 1: D
+Option 2: C
+Option 3:
+Answer (Option 1 or Option 2 or Option 3 ?): Option 3
+
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'one-shot-IcL-CN':
+    """目标：
+你的任务是在给定的因果图中，找出相对于有序变量对（X，Y）满足前门准则的变量集。
+背景信息：
+- 后门准则定义如下：
+  1. 变量集 Z 不能包含任何 X 的后代。
+  2. Z 阻止了从 X 到 Y 的每一条有箭头指向 X 的路径。
+- 前门准则的定义如下：
+  1. 变量集 Z 阻止所有从 X 到 Y 的有向路径。
+  2. 没有从 X 到 Z 的后门路径。
+  3. 从 Z 到 Y 的所有后门路径都被 X 堵塞。
+输入：
+- 详细描述变量之间关系的因果图。
+- 一个问题，询问符合指定有序变量对（X，Y）的前门标准的变量集。
+- 可满足指定标准的变量集合的多选选项。
+输出：
+- 问题答案，格式为 "选项 N"，根据所提供的选项，N 为 一、二 或 三。
+
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (E, A)，满足前门准则的变量集是哪个？
+选项一：D
+选项二：C
+选项三：
+答案（选项一或选项二或选项三？）：选项三
+
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足前门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'three-shot-IcL':
+    """Objective:
+Your task is to identify the set of variables that satisfy the front-door criterion relative to an ordered pair of variables (X, Y) in a given causal graph.
+Background Information:
+- Back-door criterion is defined as follows:
+  1. The variable set Z must not contain any descendants of X.
+  2. Z blocks every path from X to Y that has an arrow pointing to X.
+- Front-door criterion is defined as follows:
+  1. The variable set Z blocks all directed paths from X to Y.
+  2. There are no back-door paths from X to Z.
+  3. All back-door paths from Z to Y are blocked by X.
+Input:
+- Description of the causal graph detailing the relationships between variables.
+- A question asking for the set of variables that satisfy the front-door criterion for a specified ordered pair of variables (X, Y).
+- Multiple-choice options for the set of variables that could satisfy the specified criterion.
+Output:
+- Answer to the question, provided in the format "Option N", where N is either 1, 2, or 3 based on the provided options.
+
+Example:
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (E, A) in the above causal graph?
+Option 1: D
+Option 2: C
+Option 3:
+Answer (Option 1 or Option 2 or Option 3 ?): Option 3
+
+You will be presented with a causal graph in the following form: A causes B, A causes E, B causes E, B causes D, C causes E, and C causes D.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (A, D) in the above causal graph?
+Option 1: D
+Option 2: B
+Option 3: A
+Answer (Option 1 or Option 2 or Option 3 ?): Option 2
+
+You will be presented with a causal graph in the following form: A causes D, A causes C, B causes E, C causes D, and D causes E.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (A, E) in the above causal graph?
+Option 1: D
+Option 2: C
+Option 3: B
+Answer (Option 1 or Option 2 or Option 3 ?): Option 1
+
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'three-shot-IcL-CN':
+    """目标：
+你的任务是在给定的因果图中，找出相对于有序变量对（X，Y）满足前门准则的变量集。
+背景信息：
+- 后门准则定义如下：
+  1. 变量集 Z 不能包含任何 X 的后代。
+  2. Z 阻止了从 X 到 Y 的每一条有箭头指向 X 的路径。
+- 前门准则的定义如下：
+  1. 变量集 Z 阻止所有从 X 到 Y 的有向路径。
+  2. 没有从 X 到 Z 的后门路径。
+  3. 从 Z 到 Y 的所有后门路径都被 X 堵塞。
+输入：
+- 详细描述变量之间关系的因果图。
+- 一个问题，询问符合指定有序变量对（X，Y）的前门标准的变量集。
+- 可满足指定标准的变量集合的多选选项。
+输出：
+- 问题答案，格式为 "选项 N"，根据所提供的选项，N 为 一、二 或 三。
+
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (E, A)，满足前门准则的变量集是哪个？
+选项一：D
+选项二：C
+选项三：
+答案（选项一或选项二或选项三？）：选项三
+
+给定如下因果图：A导致B, A导致E, B导致E, B导致D, C导致E, 以及C导致D。
+问题：对于上述因果图中的有序变量对 (A, D)，满足前门准则的变量集是哪个？
+选项一：D
+选项二：B
+选项三：A
+答案（选项一或选项二或选项三？）：选项二
+
+给定如下因果图：A导致D, A导致C, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (A, E)，满足前门准则的变量集是哪个？
+选项一：D
+选项二：C
+选项三：B
+答案（选项一或选项二或选项三？）：选项一
+
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足前门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'zero-shot-CoT':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph? Let's think step by step.
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'zero-shot-CoT-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足前门准则的变量集是哪个？请逐步思考。
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'manual-CoT':
+    """Here are eight examples of problems about finding the front-door adjustment set, with chain-of-thought reasoning. Note that A is unobserved in the following questions.
+
+You will be presented with a causal graph in the following form: A causes D, A causes C, B causes E, C causes D, and D causes E.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (A, E) in the above causal graph?
+Option 1: D
+Option 2: C
+Option 3: B
+Answer (Option 1 or Option 2 or Option 3 ?): D intercepts the paths from A to E (A->D->E and A->C->D->E). There is no backdoor path from A to D or from D to E. Thus, D satisfies all three conditions of the front-door criterion for the relationship between A (treatment) and E (outcome). The answer is Option 1.
+
+You will be presented with a causal graph in the following form: A causes B, A causes E, B causes C, B causes E, C causes E, and D causes E.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (A, C) in the above causal graph?
+Option 1: D
+Option 2: B
+Option 3: C
+Answer (Option 1 or Option 2 or Option 3 ?): B intercepts the path from A to C (A→B→C). There is no back-door path from A to B or from B to C. Thus, B satisfies all three conditions of the front-door criterion for the relationship between A (treatment) and C (outcome). The answer is Option 2.
+
+You will be presented with a causal graph in the following form: A causes B, B causes C, C causes F, D causes F, and D causes E.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (A, C) in the above causal graph?
+Option 1: F
+Option 2: E
+Option 3: B
+Answer (Option 1 or Option 2 or Option 3 ?): B intercepts the path from A to C (A->B->C). There is no backdoor path from A to B or from B to C. Thus, B satisfies all three conditions of the front-door criterion for the relationship between A (treatment) and C (outcome). The answer is Option 3.
+
+You will be presented with a causal graph in the following form: A causes D, A causes C, B causes D, B causes E, and C causes D.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (C, E) in the above causal graph?
+Option 1: A
+Option 2:
+Option 3: C
+Answer (Option 1 or Option 2 or Option 3 ?): A is unobserved, so it cannot be part of the adjustment set, and C is the treatment itself. There is no directed path from C to E (C->D stops at D, which has no outgoing edges), so the empty set intercepts every directed path from C to E and the remaining conditions hold vacuously. The answer is Option 2.
+
+You will be presented with a causal graph in the following form: A causes D, B causes C, C causes E, and D causes E.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (A, E) in the above causal graph?
+Option 1: D
+Option 2: A
+Option 3: B
+Answer (Option 1 or Option 2 or Option 3 ?): D intercepts the path from A to E (A->D->E). There is no back-door path from A to D or from D to E. Thus, D satisfies all three conditions of the front-door criterion for the relationship between A (treatment) and E (outcome). The answer is Option 1.
+
+You will be presented with a causal graph in the following form: A causes D, B causes C, B causes D, C causes E, and C causes D.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (B, E) in the above causal graph?
+Option 1: B
+Option 2: A
+Option 3: C
+Answer (Option 1 or Option 2 or Option 3 ?): C intercepts the path from B to E (B->C->E). There is no back-door path from B to C or from C to E. Thus, C satisfies all three conditions of the front-door criterion for the relationship between B (treatment) and E (outcome). The answer is Option 3.
+
+You will be presented with a causal graph in the following form: A causes E, A causes B, B causes C, and B causes D.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (A, C) in the above causal graph?
+Option 1: C
+Option 2: B
+Option 3: A
+Answer (Option 1 or Option 2 or Option 3 ?): B intercepts the path from A to C (A->B->C). There is no back-door path from A to B or from B to C. Thus, B satisfies all three conditions of the front-door criterion for the relationship between A (treatment) and C (outcome). The answer is Option 2.
+
+You will be presented with a causal graph in the following form: A causes B, B causes C, B causes E, B causes D, and D causes E.
+Question: Which set of variables satisfies the front-door criterion relative to an ordered pair of variables (A, C) in the above causal graph?
+Option 1: B
+Option 2: E
+Option 3: C
+Answer (Option 1 or Option 2 or Option 3 ?): B intercepts the path from A to C (A->B->C). There is no back-door path from A to B or from B to C. Thus, B satisfies all three conditions of the front-door criterion for the relationship between A (treatment) and C (outcome). The answer is Option 1.
+
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables that satisfies the front-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'manual-CoT-CN':
+    """如下为两个使用思维链进行推理的判断前门变量集合的示例，和一个需要回答的问题。
+
+给定如下因果图：A导致D, A导致C, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (A, E)，满足前门准则的变量集是哪个？
+选项一：D
+选项二：C
+选项三：B
+答案（选项一或选项二或选项三？）：D截断了从A到E的路径 (A->D->E以及A->C->D->E)。从A到D和从D到E都没有后门路径。所以D满足从A到E的前门准则的三个条件。因此答案为选项一。
+
+给定如下因果图：A导致B, A导致E, B导致C, B导致E, C导致E, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (A, C)，满足前门准则的变量集是哪个？
+选项一：D
+选项二：B
+选项三：C
+答案（选项一或选项二或选项三？）：B截断了从A到C的路径 (A->B->C)。从A到B和从B到C都没有后门路径。所以B满足从A到C的前门准则的三个条件。因此答案为选项二。
+
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足前门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'explicit-function':
+    """You are a helpful assistant for adjustment set analysis (front-door criterion).
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables that satisfies the front-door criterion relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'explicit-function-CN':
+    """你是一个用于调整集分析(前门准则)的得力助手。
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，满足前门准则的变量集是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['edges'], item['treatment'],
+                                        item['outcome'], item['option1'],
+                                        item['option2'], item['option3'])
+    return prompt
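The manual-CoT answers in this file justify each choice with the three conditions of the front-door criterion. The sketch below checks those conditions mechanically on the small DAGs these prompts use. It is an illustration only: `d_separated`, `drop_outgoing`, `intercepts_all_directed_paths` and `satisfies_front_door` are hypothetical helpers (not part of OpenCompass or CaLM), graphs are hand-encoded as `(cause, effect)` pairs rather than parsed from the prompt text, d-separation uses the moralized ancestral graph test, and condition (iii) is checked by deleting the edges that leave M, which assumes X is not a descendant of M.

```python
from collections import defaultdict


def d_separated(edges, xs, ys, zs):
    """True if node sets xs and ys are d-separated given zs in the DAG `edges`.

    Moralized ancestral graph test: keep xs, ys, zs and their ancestors,
    marry co-parents, drop edge directions, delete zs, then search from xs.
    """
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    keep = set(xs) | set(ys) | set(zs)
    stack = list(keep)
    while stack:  # close `keep` under taking parents
        for p in parents[stack.pop()]:
            if p not in keep:
                keep.add(p)
                stack.append(p)
    adj = defaultdict(set)
    for v in keep:
        ps = list(parents[v])  # parents of a kept node are kept too
        for i, p in enumerate(ps):
            adj[p].add(v)
            adj[v].add(p)
            for q in ps[i + 1:]:  # moralize: connect co-parents of v
                adj[p].add(q)
                adj[q].add(p)
    seen = set(xs) - set(zs)
    stack = list(seen)
    while stack:  # undirected search that never enters zs
        for n in adj[stack.pop()]:
            if n not in zs and n not in seen:
                seen.add(n)
                stack.append(n)
    return not (seen & set(ys))


def drop_outgoing(edges, nodes):
    """Remove every edge that leaves a node in `nodes`."""
    return [(u, v) for u, v in edges if u not in nodes]


def intercepts_all_directed_paths(edges, x, y, m):
    """Condition (i): every directed path from x to y passes through m."""
    children = defaultdict(set)
    for u, v in edges:
        children[u].add(v)
    seen, stack = {x}, [x]
    while stack:
        for c in children[stack.pop()]:
            if c == y:
                return False  # reached y along a path that avoids m
            if c not in m and c not in seen:
                seen.add(c)
                stack.append(c)
    return True


def satisfies_front_door(edges, x, y, m):
    """Hypothetical check of the three front-door conditions for set m."""
    m = set(m)
    return (
        intercepts_all_directed_paths(edges, x, y, m)
        # (ii) no unblocked back-door path from x to m
        and d_separated(drop_outgoing(edges, {x}), {x}, m, set())
        # (iii) x blocks every back-door path from m to y
        and d_separated(drop_outgoing(edges, m), m, {y}, {x})
    )


# Manual-CoT example for the pair (B, E): A->D, B->C, B->D, C->E, C->D.
g = [("A", "D"), ("B", "C"), ("B", "D"), ("C", "E"), ("C", "D")]
print(satisfies_front_door(g, "B", "E", {"C"}))  # True  (Option 3)
print(satisfies_front_door(g, "B", "E", {"A"}))  # False (Option 2)
```

Keeping the graph as a plain edge list makes the sketch dependency-free; a graph library with a built-in d-separation test could replace the hand-written one.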
--- a/opencompass/datasets/calm/data_processing/prompt/IV-C_CaLM-IV.py
+++ b/opencompass/datasets/calm/data_processing/prompt/IV-C_CaLM-IV.py
@ -0,0 +1,327 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables is the instrumental variables relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'basic-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，工具变量是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'adversarial-ignore':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables is the instrumental variables relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'adversarial-ignore-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，工具变量是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'adversarial-doubt':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables is the instrumental variables relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'adversarial-doubt-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，工具变量是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'zero-shot-IcL':
+    """Objective:
+Your task is to identify the set of variables that serve as instrumental variables for a given ordered pair of treatment and outcome variables (X, Y) in a specified causal graph.
+Background Information:
+An instrumental variable Z must meet the following criteria:
+- (i) Z has a causal effect on X (treatment variable).
+- (ii) Z affects Y (outcome variable) only through its effect on X, and not directly (exclusion restriction).
+- (iii) The effect of Z on Y is unconfounded: all common causes of Z and Y (if any) are controlled for in the study or are non-existent.
+Input:
+- Description of the causal graph, denoting which variables have causal relationships with each other.
+- A question that specifies the identification of instrumental variables with respect to an ordered pair of variables (X, Y).
+- Multiple-choice options representing sets of variables that could be instrumental.
+Output:
+- Answer in the format "Option N," where N is either 1, 2, or 3 based on the provided options.
+
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'zero-shot-IcL-CN':
+    """目标：
+您的任务是在指定的因果关系图中，找出一组变量，作为给定的一对有序处理变量和结果变量（X，Y）的工具变量。
+背景信息：
+工具变量 Z 必须符合以下标准：
+- (i) Z 对 X（处理变量）有因果效应。
+- (ii) Z 仅通过对 X 的影响而非直接影响 Y（结果变量）（排除限制）。
+- (iii) Z 对 Y 的影响不存在混杂，即 Z 和 Y 的所有共同原因（如有）都在研究中得到控制或不存在。
+输入：
+- 因果图描述，表示哪些变量之间存在因果关系。
+- 一个问题，指明如何确定与一对有序变量（X，Y）相关的工具变量。
+- 代表可能是工具变量的多选选项。
+输出：
+- 答案格式为 "选项 N"，根据所提供的选项，N 为 一、二 或 三。
+
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，工具变量是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'one-shot-IcL':
+    """Objective:
+Your task is to identify the set of variables that serve as instrumental variables for a given ordered pair of treatment and outcome variables (X, Y) in a specified causal graph.
+Background Information:
+An instrumental variable Z must meet the following criteria:
+- (i) Z has a causal effect on X (treatment variable).
+- (ii) Z affects Y (outcome variable) only through its effect on X, and not directly (exclusion restriction).
+- (iii) The effect of Z on Y is unconfounded: all common causes of Z and Y (if any) are controlled for in the study or are non-existent.
+Input:
+- Description of the causal graph, denoting which variables have causal relationships with each other.
+- A question that specifies the identification of instrumental variables with respect to an ordered pair of variables (X, Y).
+- Multiple-choice options representing sets of variables that could be instrumental.
+Output:
+- Answer in the format "Option N," where N is either 1, 2, or 3 based on the provided options.
+
+Example:
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (D, E) in the above causal graph?
+Option 1: A
+Option 2: C
+Option 3: E
+Answer (Option 1 or Option 2 or Option 3 ?): Option 2
+
+New Input:
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'one-shot-IcL-CN':
+    """目标：
+您的任务是在指定的因果关系图中，找出一组变量，作为给定的一对有序处理变量和结果变量（X，Y）的工具变量。
+背景信息：
+工具变量 Z 必须符合以下标准：
+- (i) Z 对 X（处理变量）有因果效应。
+- (ii) Z 仅通过对 X 的影响而非直接影响 Y（结果变量）（排除限制）。
+- (iii) Z 对 Y 的影响不存在混杂，即 Z 和 Y 的所有共同原因（如有）都在研究中得到控制或不存在。
+输入：
+- 因果图描述，表示哪些变量之间存在因果关系。
+- 一个问题，指明如何确定与一对有序变量（X，Y）相关的工具变量。
+- 代表可能是工具变量的多选选项。
+输出：
+- 答案格式为 "选项 N"，根据所提供的选项，N 为 一、二 或 三。
+
+例子：
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (D, E)，工具变量是哪个？
+选项一：A
+选项二：C
+选项三：E
+答案（选项一或选项二或选项三？）：选项二
+
+新输入：
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，工具变量是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'three-shot-IcL':
+    """Objective:
+Your task is to identify the set of variables that serve as instrumental variables for a given ordered pair of treatment and outcome variables (X, Y) in a specified causal graph.
+Background Information:
+An instrumental variable Z must meet the following criteria:
+- (i) Z has a causal effect on X (treatment variable).
+- (ii) Z affects Y (outcome variable) only through its effect on X, and not directly (exclusion restriction).
+- (iii) The effect of Z on Y is unconfounded: all common causes of Z and Y (if any) are controlled for in the study or are non-existent.
+Input:
+- Description of the causal graph, denoting which variables have causal relationships with each other.
+- A question that specifies the identification of instrumental variables with respect to an ordered pair of variables (X, Y).
+- Multiple-choice options representing sets of variables that could be instrumental.
+Output:
+- Answer in the format "Option N," where N is either 1, 2, or 3 based on the provided options.
+
+Example:
+You will be presented with a causal graph in the following form: A causes D, A causes E, B causes E, C causes D, and D causes E.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (D, E) in the above causal graph?
+Option 1: A
+Option 2: C
+Option 3: E
+Answer (Option 1 or Option 2 or Option 3 ?): Option 2
+
+You will be presented with a causal graph in the following form: A causes B, A causes E, B causes E, B causes D, C causes E, and C causes D.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (B, D) in the above causal graph?
+Option 1: A
+Option 2: B
+Option 3: D
+Answer (Option 1 or Option 2 or Option 3 ?): Option 1
+
+You will be presented with a causal graph in the following form: A causes D, A causes B, C causes E, and D causes E.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (D, E) in the above causal graph?
+Option 1: E
+Option 2: C
+Option 3: A
+Answer (Option 1 or Option 2 or Option 3 ?): Option 3
+
+New Input:
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'three-shot-IcL-CN':
+    """目标：
+您的任务是在指定的因果关系图中，找出一组变量，作为给定的一对有序处理变量和结果变量（X，Y）的工具变量。
+背景信息：
+工具变量 Z 必须符合以下标准：
+- (i) Z 对 X（处理变量）有因果效应。
+- (ii) Z 仅通过对 X 的影响而非直接影响 Y（结果变量）（排除限制）。
+- (iii) Z 对 Y 的影响不存在混杂，即 Z 和 Y 的所有共同原因（如有）都在研究中得到控制或不存在。
+输入：
+- 因果图描述，表示哪些变量之间存在因果关系。
+- 一个问题，指明如何确定与一对有序变量（X，Y）相关的工具变量。
+- 代表可能是工具变量的多选选项。
+输出：
+- 答案格式为 "选项 N"，根据所提供的选项，N 为 一、二 或 三。
+
+例子：
+给定如下因果图：A导致D, A导致E, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (D, E)，工具变量是哪个？
+选项一：A
+选项二：C
+选项三：E
+答案（选项一或选项二或选项三？）：选项二
+
+给定如下因果图：A导致B, A导致E, B导致E, B导致D, C导致E, 以及C导致D。
+问题：对于上述因果图中的有序变量对 (B, D)，工具变量是哪个？
+选项一：A
+选项二：B
+选项三：D
+答案（选项一或选项二或选项三？）：选项一
+
+给定如下因果图：A导致D, A导致B, C导致E, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (D, E)，工具变量是哪个？
+选项一：E
+选项二：C
+选项三：A
+答案（选项一或选项二或选项三？）：选项三
+
+新输入：
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，工具变量是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'zero-shot-CoT':
+    """You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (%s, %s) in the above causal graph? Let's think step by step.
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'zero-shot-CoT-CN':
+    """给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，工具变量是哪个？请逐步思考。
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'manual-CoT':
+    """Here are three examples of identifying instrumental variables using chain of thought, and a question to answer.
+
+You will be presented with a causal graph in the following form: A causes D, A causes C, B causes E, C causes D, and D causes E.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (D, E) in the above causal graph?
+Option 1: A, C
+Option 2: B
+Option 3: D
+Answer (Option 1 or Option 2 or Option 3 ?): A causes D, and C causes D, meaning that A, C, and D are d-connected. Additionally, A and C reach E only through D, so once the edge D->E is removed, A, C, and E are d-separated. Therefore, the answer is Option 1.
+
+You will be presented with a causal graph in the following form: A causes B, A causes E, A causes C, B causes C, B causes D, B causes E, and D causes E.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (B, D) in the above causal graph?
+Option 1: B
+Option 2: A
+Option 3: D
+Answer (Option 1 or Option 2 or Option 3 ?): A causes B, meaning that A and B are d-connected. Also, A reaches D only through B, so once the edges leaving B are removed, A and D are d-separated. Therefore, the answer is Option 2.
+
+You will be presented with a causal graph in the following form: A causes C, A causes D, B causes C, B causes D, and C causes E.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (C, E) in the above causal graph?
+Option 1: D
+Option 2: C
+Option 3: B, A
+Answer (Option 1 or Option 2 or Option 3 ?): A causes C, and B causes C, meaning that B, A, and C are d-connected. Additionally, B and A reach E only through C, so once the edge C->E is removed, B, A, and E are d-separated. Therefore, the answer is Option 3.
+
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'manual-CoT-CN':
+    """如下为两个使用思维链进行推理的识别工具变量的示例，和一个需要回答的问题。
+
+给定如下因果图：A导致D, A导致C, B导致E, C导致D, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (D, E)，工具变量是哪个？
+选项一：A, C
+选项二：B
+选项三：D
+答案（选项一或选项二或选项三？）：A导致D，C导致D，即A, C和D是d-连通的；且A和C只能通过D影响E，即去掉边D->E后，A, C与E是d-分离的。因此答案为选项一。
+
+给定如下因果图：A导致B, A导致E, A导致C, B导致C, B导致D, B导致E, 以及D导致E。
+问题：对于上述因果图中的有序变量对 (B, D)，工具变量是哪个？
+选项一：B
+选项二：A
+选项三：D
+答案（选项一或选项二或选项三？）：A导致B，即A和B是d-连通的；且A只能通过B影响D，即去掉从B出发的边后，A和D是d-分离的。因此答案为选项二。
+
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，工具变量是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+    'explicit-function':
+    """You are a helpful assistant for adjustment set analysis (instrument variables).
+You will be presented with a causal graph in the following form: %s.
+Question: Which set of variables is the instrument variables relative to an ordered pair of variables (%s, %s) in the above causal graph?
+Option 1: %s
+Option 2: %s
+Option 3: %s
+Answer (Option 1 or Option 2 or Option 3 ?):""",
+    'explicit-function-CN':
+    """你是一个用于调整集分析(工具变量)的得力助手。
+给定如下因果图：%s。
+问题：对于上述因果图中的有序变量对 (%s, %s)，工具变量是哪个？
+选项一：%s
+选项二：%s
+选项三：%s
+答案（选项一或选项二或选项三？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['edges'], item['treatment'],
+                                        item['outcome'], item['option1'],
+                                        item['option2'], item['option3'])
+    return prompt
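The IV prompts in this file state the instrument criteria in words. One graphical reading of them, used here as an assumption: criterion (i) means every variable in Z is an ancestor of X, and criteria (ii) and (iii) together mean Z is d-separated from Y once every edge leaving X is removed. The helpers `d_separated`, `is_ancestor` and `is_instrument` are hypothetical (not part of OpenCompass or CaLM), and the graphs are hand-encoded as `(cause, effect)` pairs. On the one-shot example it picks Option 2 (C), matching the keyed answer.

```python
from collections import defaultdict


def d_separated(edges, xs, ys, zs):
    """True if node sets xs and ys are d-separated given zs in the DAG `edges`
    (moralized ancestral graph test)."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    keep = set(xs) | set(ys) | set(zs)
    stack = list(keep)
    while stack:  # close `keep` under taking parents
        for p in parents[stack.pop()]:
            if p not in keep:
                keep.add(p)
                stack.append(p)
    adj = defaultdict(set)
    for v in keep:
        ps = list(parents[v])
        for i, p in enumerate(ps):
            adj[p].add(v)
            adj[v].add(p)
            for q in ps[i + 1:]:  # moralize: connect co-parents of v
                adj[p].add(q)
                adj[q].add(p)
    seen = set(xs) - set(zs)
    stack = list(seen)
    while stack:  # undirected search that never enters zs
        for n in adj[stack.pop()]:
            if n not in zs and n not in seen:
                seen.add(n)
                stack.append(n)
    return not (seen & set(ys))


def is_ancestor(edges, a, b):
    """True if the DAG `edges` has a directed path of length >= 1 from a to b."""
    children = defaultdict(set)
    for u, v in edges:
        children[u].add(v)
    seen, stack = set(), [a]
    while stack:
        for c in children[stack.pop()]:
            if c == b:
                return True
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return False


def is_instrument(edges, z, x, y):
    """Hypothetical graphical IV check for the ordered pair (x, y)."""
    z = set(z)
    # (i) every variable in z has a causal effect on x
    relevant = all(is_ancestor(edges, n, x) for n in z)
    # (ii) + (iii): with the edges leaving x removed, z and y share no open path
    cut = [(u, v) for u, v in edges if u != x]
    return relevant and d_separated(cut, z, {y}, set())


# One-shot example: A->D, A->E, B->E, C->D, D->E, pair (D, E).
g = [("A", "D"), ("A", "E"), ("B", "E"), ("C", "D"), ("D", "E")]
for option in ({"A"}, {"C"}, {"E"}):
    print(sorted(option), is_instrument(g, option, "D", "E"))
# ['A'] False
# ['C'] True
# ['E'] False
```

Requiring Z to be an ancestor of X follows the prompt's wording ("Z has a causal effect on X"); the looser textbook condition only asks that Z and X be d-connected, which would also accept proxies that share a common cause with X.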
--- a/opencompass/datasets/calm/data_processing/prompt/NDE.py
+++ b/opencompass/datasets/calm/data_processing/prompt/NDE.py
@ -0,0 +1,177 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'basic-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'adversarial-ignore':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'adversarial-ignore-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'adversarial-doubt':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'adversarial-doubt-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'zero-shot-IcL':
+    """Answer questions about the Natural Direct Effect (NDE). Computing the Natural Direct Effect involves comparing the outcomes for individuals under two scenarios, receiving the treatment and not receiving the treatment, while holding the mediator at the value it would naturally take without the treatment.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-IcL-CN':
+    """回答有关自然直接效应（NDE）的问题。计算自然直接效应需要比较两种情况下的个人结果：接受治疗和不接受治疗，同时将中介变量保持在不接受治疗时自然取得的值。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'one-shot-IcL':
+    """Answer questions about the Natural Direct Effect (NDE). Computing the Natural Direct Effect involves comparing the outcomes for individuals under two scenarios, receiving the treatment and not receiving the treatment, while holding the mediator at the value it would naturally take without the treatment.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Fbge has a direct effect on vijq. Fbge has a direct effect on twac. Fbge has a direct effect on vdla.
+For those with fbge being high, the probability of vdla being low is 0.1851. For those with fbge being low, the probability of vdla being low is 0.5311.
+Instruction: Consider the natural direct effect (NDE) of fbge on vdla.
+Question: Suppose the mediator keeps constant when fbge is changed to be high, would the vdla have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "No", "PROB": "-0.3460"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'one-shot-IcL-CN':
+    """回答有关自然直接效应（NDE）的问题。计算自然直接效应需要比较两种情况下的个人结果：接受治疗和不接受治疗，同时将中介变量保持在不接受治疗时自然取得的值。
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：Fbge对vijq有直接影响。Fbge对twac有直接影响。Fbge对vdla有直接影响。
+在fbge为高的条件下, vdla为低的概率为0.1851。在fbge为低的条件下, vdla为低的概率为0.5311。
+指令：考虑fbge作用于vdla的“自然直接效果”(natural direct effect, NDE)。
+问题：假如所有中间变量保持不变，而fbge变化为高，那么vdla更有可能为低吗？
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}： {"ANSWER":"否","PROB":"-0.3460"}
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'two-shot-IcL':
+    """Answer questions about the Natural Direct Effect (NDE). Computing the Natural Direct Effect involves comparing the outcomes for individuals under two scenarios, receiving the treatment and not receiving the treatment, while holding the mediator at the value it would naturally take without the treatment.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Fbge has a direct effect on vijq. Fbge has a direct effect on twac. Fbge has a direct effect on vdla.
+For those with fbge being high, the probability of vdla being low is 0.1851. For those with fbge being low, the probability of vdla being low is 0.5311.
+Instruction: Consider the natural direct effect (NDE) of fbge on vdla.
+Question: Suppose the mediator keeps constant when fbge is changed to be high, would the vdla have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "No", "PROB": "-0.3460"}
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Parent has a direct effect on first-born child. Parent has a direct effect on second-born child. Parent has a direct effect on third-born child. First-born child has a direct effect on second-born child. First-born child has a direct effect on third-born child.
+For those with parent being supportive, the probability of first-born child being favored is 0.2759. For those with parent being neglectful, the probability of first-born child being favored is 0.3249.
+Instruction: Consider the natural direct effect (NDE) of parent on first-born child.
+Question: Suppose the mediator keeps constant when parent is changed to be supportive, would the first-born child have been more likely to be favored?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "No", "PROB": "-0.0490"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-CoT':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s Let's think step by step.
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-CoT-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s请逐步思考。
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'manual-CoT':
+    """Here are three examples of math problems about the natural direct effect (NDE), solved with chain of thought.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Vqpf has a direct effect on uhxm. Vqpf has a direct effect on ezwx.
+For those with vqpf being high, the probability of uhxm being low is 0.8005. For those with vqpf being low, the probability of uhxm being low is 0.8489.
+Instruction: Consider the natural direct effect (NDE) of vqpf on uhxm.
+Question: Suppose the mediator keeps constant when vqpf is changed to be high, would the uhxm have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With A representing vqpf and B representing uhxm, we have P(B=0|A=1)=0.8005; P(B=0|A=0)=0.8489. Since edge A->B exists and the valid mediator set in this situation is the empty set, we calculate NDE=P(B=0|A=1)-P(B=0|A=0)=0.8005-0.8489=-0.0484<0. The answer is: {"ANSWER": "No", "PROB": "-0.0484"}.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Number of hours of studying for a test has a direct effect on test score. Test score has a direct effect on final grade in the class.
+
+Instruction: Consider the natural direct effect (NDE) of number of hours of studying for a test on final grade in the class.
+Question: Suppose the mediator keeps constant when number of hours of studying for a test is changed to be many, would the final grade in the class have been more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With A representing number of hours of studying for a test and C representing final grade in the class, the edge A->C does not exist, so NDE=0. The answer is: {"ANSWER": "No", "PROB": "0.0000"}.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Eosj has a direct effect on taaz. Taaz has a direct effect on sozj. Sozj has a direct effect on mffx.
+For those with taaz being high, the probability of sozj being high is 0.4763. For those with taaz being low, the probability of sozj being high is 0.3920.
+Instruction: Consider the natural direct effect (NDE) of taaz on sozj.
+Question: Suppose the mediator keeps constant when taaz is changed to be high, would the sozj have been more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With B representing taaz and C representing sozj, we have P(C=1|B=1)=0.4763; P(C=1|B=0)=0.3920. Since edge B->C exists, NDE=P(C=1|B=1)-P(C=1|B=0)=0.4763-0.3920=0.0843>0. The answer is: {"ANSWER": "Yes", "PROB": "0.0843"}.
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'manual-CoT-CN':
+    """如下为一个使用思维链进行推理的关于“自然直接效果”(natural direct effect, NDE)任务的数学问题：
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：顾客对产品的满意度对产品的正面评价数量有直接影响。顾客对产品的满意度对产品收入有直接影响。产品的正面评价数量对产品销售表现有直接影响。产品销售表现对产品收入有直接影响。
+在顾客对产品的满意度为低的条件下, 产品的正面评价数量为高的概率为0.4636。在顾客对产品的满意度为高的条件下, 产品的正面评价数量为高的概率为0.9016。
+指令：考虑顾客对产品的满意度作用于产品的正面评价数量的“自然直接效果”(natural direct effect, NDE)。
+问题：假如所有中间变量保持不变，而顾客对产品的满意度变化为低，那么产品的正面评价数量更有可能为高吗？
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：用A代表顾客对产品的满意度, B代表产品的正面评价数量，边A->B存在。考虑到P(B=1|A=0)=0.4636，P(B=1|A=1)=0.9016，且该问题中有一个合法的中间变量集合: 空集。所以NDE=P(B=1|A=0)-P(B=1|A=1)=0.4636-0.9016=-0.4380<0。因此答案为{"ANSWER":"否","PROB":"-0.4380"}。
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'explicit-function':
+    """You are a helpful assistant for math probability.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'explicit-function-CN':
+    """你是一个用于计算数学概率的得力助手。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['given_info'],
+                                        item['Background']['data_info'],
+                                        item['Instruction'], item['Question'])
+    return prompt
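The NDE few-shot answers in this file are differences of conditional probabilities rendered as JSON. A minimal sketch of that arithmetic follows, assuming no unmeasured confounding so that Pearl's mediation formula applies. `natural_direct_effect` and `format_answer` are hypothetical helpers, not part of the dataset code. With an empty mediator set, as in the worked examples, the formula collapses to P(y | x1) - P(y | x0), where x1 is the treatment value the question switches to.

```python
import json


def natural_direct_effect(p_y_x1, p_y_x0, p_m_x0=None):
    """NDE via the mediation formula (assumes no unmeasured confounding).

    x1 is the treatment value the question switches to, x0 the baseline.
    With a mediator M, pass dicts keyed by mediator value m:
        NDE = sum_m [P(y | x1, m) - P(y | x0, m)] * P(m | x0)
    With an empty mediator set, pass plain probabilities:
        NDE = P(y | x1) - P(y | x0)
    """
    if p_m_x0 is None:
        return p_y_x1 - p_y_x0
    return sum((p_y_x1[m] - p_y_x0[m]) * p for m, p in p_m_x0.items())


def format_answer(effect):
    """Render the effect in the JSON shape requested by the prompts."""
    return json.dumps({"ANSWER": "Yes" if effect > 0 else "No",
                       "PROB": f"{effect:.4f}"})


# One-shot example: P(vdla=low | fbge=high)=0.1851, P(vdla=low | fbge=low)=0.5311
print(format_answer(natural_direct_effect(0.1851, 0.5311)))
# {"ANSWER": "No", "PROB": "-0.3460"}
# Manual-CoT example: P(sozj=high | taaz=high)=0.4763, P(sozj=high | taaz=low)=0.3920
print(format_answer(natural_direct_effect(0.4763, 0.3920)))
# {"ANSWER": "Yes", "PROB": "0.0843"}
```

Keeping the probability arithmetic separate from the JSON rendering makes it easy to recompute a reference value and compare it against the `"PROB"` string a model returns.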
--- a/opencompass/datasets/calm/data_processing/prompt/NIE.py
+++ b/opencompass/datasets/calm/data_processing/prompt/NIE.py
@ -0,0 +1,176 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'basic-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'adversarial-ignore':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'adversarial-ignore-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'adversarial-doubt':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'adversarial-doubt-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'zero-shot-IcL':
+    """Answer questions about the Natural Indirect Effect (NIE). Computing the Natural Indirect Effect involves looking at the outcomes for individuals when the treatment is fixed but the mediator is allowed to change as it naturally would due to the treatment.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-IcL-CN':
+    """回答有关自然间接效应（NIE）的问题。自然间接效应的计算方法是，当治疗方法固定不变，但允许中介因子因治疗方法而自然发生变化时，研究个体的结果。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'one-shot-IcL':
+    """Answer questions about the Natural Indirect Effect (NIE). Computing the Natural Indirect Effect involves looking at the outcomes for individuals when the treatment is fixed but the mediator is allowed to change as it naturally would due to the treatment.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Income level has a direct effect on job satisfaction. Income level has a direct effect on life satisfaction. Education level has a direct effect on job satisfaction. Education level has a direct effect on happiness level. Job satisfaction has a direct effect on happiness level. Job satisfaction has a direct effect on life satisfaction. Happiness level has a direct effect on life satisfaction.
+For those with job satisfaction being not satisfied and education level being high, the probability of happiness level being low is 0.2180. For those with education level being low, the probability of job satisfaction being not satisfied is 0.5969. For those with education level being high, the probability of job satisfaction being not satisfied is 0.4075. For those with job satisfaction being satisfied and education level being high, the probability of happiness level being low is 0.1982.
+Instruction: Consider the natural indirect effect (NIE) of education level on happiness level.
+Question: Suppose education level is held constant and the mediator changes to whatever value it would have attained under education level changing to be low, would happiness level have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "Yes", "PROB": "0.0038"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'one-shot-IcL-CN':
+    """回答有关自然间接效应（NIE）的问题。自然间接效应的计算方法是，当治疗方法固定不变，但允许中介因子因治疗方法而自然发生变化时，研究个体的结果。
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：收入水平对工作是否满意有直接影响。收入水平对生活是否满意有直接影响。教育水平对工作是否满意有直接影响。教育水平对幸福水平有直接影响。工作是否满意对幸福水平有直接影响。工作是否满意对生活是否满意有直接影响。幸福水平对生活是否满意有直接影响。
+在工作是否满意为不满意且教育水平为高的条件下, 幸福水平为低的概率为0.2180。在教育水平为低的条件下, 工作是否满意为不满意的概率为0.5969。在教育水平为高的条件下, 工作是否满意为不满意的概率为0.4075。在工作是否满意为满意且教育水平为高的条件下, 幸福水平为低的概率为0.1982。
+指令：考虑教育水平作用于幸福水平的“自然间接效果”(natural indirect effect, NIE)。
+问题：假如教育水平保持不变，而所有中间变量被改变为当它们在教育水平变化为低下的取值，那么幸福水平更有可能为低吗？
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}： {"ANSWER":"是","PROB":"0.0038"}
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'two-shot-IcL':
+    """Answer questions about the Natural Indirect Effect (NIE). Computing the Natural Indirect Effect involves looking at the outcomes for individuals when the treatment is fixed but the mediator is allowed to change as it naturally would due to the treatment.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Income level has a direct effect on job satisfaction. Income level has a direct effect on life satisfaction. Education level has a direct effect on job satisfaction. Education level has a direct effect on happiness level. Job satisfaction has a direct effect on happiness level. Job satisfaction has a direct effect on life satisfaction. Happiness level has a direct effect on life satisfaction.
+For those with job satisfaction being not satisfied and education level being high, the probability of happiness level being low is 0.2180. For those with education level being low, the probability of job satisfaction being not satisfied is 0.5969. For those with education level being high, the probability of job satisfaction being not satisfied is 0.4075. For those with job satisfaction being satisfied and education level being high, the probability of happiness level being low is 0.1982.
+Instruction: Consider the natural indirect effect (NIE) of education level on happiness level.
+Question: Suppose education level is held constant and the mediator changes to whatever value it would have attained under education level changing to be low, would happiness level have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "Yes", "PROB": "0.0038"}
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Sslg has a direct effect on gjot. Sslg has a direct effect on hlky. Etat has a direct effect on gjot. Etat has a direct effect on hlky. Gjot has a direct effect on hlky.
+
+Instruction: Consider the natural indirect effect (NIE) of sslg on gjot.
+Question: Suppose sslg is held constant and the mediator changes to whatever value it would have attained under sslg changing to be low, would gjot have been more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: {"ANSWER": "No", "PROB": "0.0000"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-CoT':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s Let's think step by step.
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'zero-shot-CoT-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s请逐步思考。
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'manual-CoT':
+    """Here are three chain-of-thought examples of math problems about the natural indirect effect (NIE) task.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Alor has a direct effect on geer. Tnkc has a direct effect on dzww. Dzww has a direct effect on geer.
+For those with dzww being low and tnkc being low, the probability of geer being high is 0.2261. For those with tnkc being high, the probability of dzww being low is 0.9090. For those with tnkc being low, the probability of dzww being low is 0.4752. For those with dzww being high and tnkc being low, the probability of geer being high is 0.0652.
+Instruction: Consider the natural indirect effect (NIE) of tnkc on geer.
+Question: Suppose tnkc is held constant and the mediator changes to whatever value it would have attained under tnkc changing to be high, would geer have been more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With B representing tnkc, C representing dzww, and D representing geer, we find: P(D=1|C=0,B=0)=0.2261; P(C=0|B=1)=0.9090; P(C=0|B=0)=0.4752; P(D=1|C=1,B=0)=0.0652. Since there is an indirect path between B and D (B->C->D) and {C} is a valid mediator set, we calculate NIE=sum_{C} P(D=1|B=0,C)*[P(C|B=1)-P(C|B=0)]=P(D=1|B=0,C=0)*[P(C=0|B=1)-P(C=0|B=0)]+P(D=1|B=0,C=1)*[P(C=1|B=1)-P(C=1|B=0)]=0.2261*(0.9090-0.4752)+0.0652*(0.0910-0.5248)=0.0698>0. The answer is: {"ANSWER": "Yes", "PROB": "0.0698"}.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Vfvq has a direct effect on dupa. Vfvq has a direct effect on fbzr. Xizv has a direct effect on dupa.
+
+Instruction: Consider the natural indirect effect (NIE) of vfvq on fbzr.
+Question: Suppose vfvq is held constant and the mediator changes to whatever value it would have attained under vfvq changing to be high, would fbzr have been more likely to be high?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With A representing vfvq and D representing fbzr, there is no indirect path between A and D. The answer is: {"ANSWER": "No", "PROB": "0.0000"}.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Number of hours of studying for a test has a direct effect on test score. Test score has a direct effect on final grade in the class.
+For those with test score being low and number of hours of studying for a test being few, the probability of final grade in the class being low is 0.9320. For those with number of hours of studying for a test being many, the probability of test score being low is 0.2929. For those with number of hours of studying for a test being few, the probability of test score being low is 0.4453. For those with test score being high and number of hours of studying for a test being few, the probability of final grade in the class being low is 0.6552.
+Instruction: Consider the natural indirect effect (NIE) of number of hours of studying for a test on final grade in the class.
+Question: Suppose number of hours of studying for a test is held constant and the mediator changes to whatever value it would have attained under number of hours of studying for a test changing to be many, would final grade in the class have been more likely to be low?
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}: With A representing number of hours of studying for a test, B representing test score, and C representing final grade in the class, we find: P(C=0|B=0,A=0)=0.9320; P(B=0|A=1)=0.2929; P(B=0|A=0)=0.4453; P(C=0|B=1,A=0)=0.6552. Since there is an indirect path between A and C (A->B->C) and {B} is a valid mediator set, we calculate NIE=sum_{B} P(C=0|A=0,B)*[P(B|A=1)-P(B|A=0)]=P(C=0|A=0,B=0)*[P(B=0|A=1)-P(B=0|A=0)]+P(C=0|A=0,B=1)*[P(B=1|A=1)-P(B=1|A=0)]=0.9320*(0.2929-0.4453)+0.6552*(0.7071-0.5547)=-0.0422<0. The answer is: {"ANSWER": "No", "PROB": "-0.0422"}.
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'manual-CoT-CN':
+    """如下为一个使用思维链进行推理的关于“自然间接效果”(natural indirect effect, NIE)任务的数学问题：
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：Zild对vean有直接影响。Zild对dhib有直接影响。Vean对dhib有直接影响。Dhib对maiw有直接影响。
+在vean为低且zild为高的条件下, dhib为高的概率为0.5548。在zild为低的条件下, vean为低的概率为0.6871。在zild为高的条件下, vean为低的概率为0.7006。在vean为高且zild为高的条件下, dhib为高的概率为0.9182。
+指令：考虑zild作用于dhib的“自然间接效果”(natural indirect effect, NIE)。
+问题：假如zild保持不变，而所有中间变量被改变为当它们在zild变化为低下的取值，那么dhib更有可能为高吗？
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：用A代表zild, B代表vean, C代表dhib,所以P(C=1|B=0,A=1)=0.5548; P(B=0|A=0)=0.6871; P(B=0|A=1)=0.7006; P(C=1|B=1,A=1)=0.9182; 考虑到从A到C存在节点数大于等于3的有向路径(例如 A->B->C)，且该问题中有一个合法的中间变量集合: {B}，所以NIE=sum_{B} P(C=1|A=1,B)*[P(B|A=0)-P(B|A=1)]=P(C=1|A=1,B=0)*[P(B=0|A=0)-P(B=0|A=1)]+P(C=1|A=1,B=1)*[P(B=1|A=0)-P(B=1|A=1)]=0.5548*(0.6871-0.7006)+0.9182*(0.3129-0.2994)=0.0049>0。因此答案为{"ANSWER":"是","PROB":"0.0049"}。
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+    'explicit-function':
+    """You are a helpful assistant for math probability.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places and a final "yes" or "no" answer in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'explicit-function-CN':
+    """你是一个用于计算数学概率的得力助手。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数），并给出最终答案“是”或“否”。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['given_info'],
+                                        item['Background']['data_info'],
+                                        item['Instruction'], item['Question'])
+    return prompt
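The manual-CoT exemplars in this NIE template compute the natural indirect effect for a single binary mediator as NIE = sum_M P(Y | X=x, M) * [P(M | X=x') - P(M | X=x)]. Here the treatment is held at x, and the mediator is shifted to its distribution under x'. They answer "Yes" only when the result is positive. The sketch below reproduces that arithmetic so the exemplar numbers can be checked. It assumes exactly one binary mediator, and the names `nie_single_mediator` and `format_answer` are hypothetical helpers that are not part of CaLM or OpenCompass.

```python
import json


def nie_single_mediator(p_y_m0, p_y_m1, p_m0_new, p_m0_ref):
    """Hypothetical helper: NIE with one binary mediator M, treatment held at x.

    p_y_m0:   P(Y=y | X=x, M=0)
    p_y_m1:   P(Y=y | X=x, M=1)
    p_m0_new: P(M=0 | X=x')  (mediator under the changed treatment)
    p_m0_ref: P(M=0 | X=x)   (mediator under the held treatment)
    """
    # Shift in each mediator value's probability when X moves from x to x'.
    delta_m0 = p_m0_new - p_m0_ref
    delta_m1 = (1.0 - p_m0_new) - (1.0 - p_m0_ref)
    # Sum over mediator values, weighting each shift by the outcome probability.
    return p_y_m0 * delta_m0 + p_y_m1 * delta_m1


def format_answer(nie):
    """Mirror the exemplars: "Yes" iff NIE > 0, PROB rounded to four decimals."""
    return json.dumps({"ANSWER": "Yes" if nie > 0 else "No",
                       "PROB": f"{nie:.4f}"})


if __name__ == "__main__":
    # tnkc -> dzww -> geer exemplar
    print(format_answer(nie_single_mediator(0.2261, 0.0652, 0.9090, 0.4752)))
    # -> {"ANSWER": "Yes", "PROB": "0.0698"}
    # studying -> test score -> final grade exemplar
    print(format_answer(nie_single_mediator(0.9320, 0.6552, 0.2929, 0.4453)))
    # -> {"ANSWER": "No", "PROB": "-0.0422"}
    # No mediated path (vfvq -> fbzr): the NIE is zero.
    print(format_answer(0.0))
    # -> {"ANSWER": "No", "PROB": "0.0000"}
```

Because the two mediator values are complementary, `delta_m1` is always `-delta_m0`. This is why each exemplar's two bracketed differences have equal magnitude and opposite sign.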
--- a/opencompass/datasets/calm/data_processing/prompt/PCD-B_COPA.py
+++ b/opencompass/datasets/calm/data_processing/prompt/PCD-B_COPA.py
@ -0,0 +1,205 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'basic-CN':
+    """事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'adversarial-ignore':
+    """Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN':
+    """事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'adversarial-doubt':
+    """Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN':
+    """事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'zero-shot-IcL':
+    """Determine whether there is a causal relationship between the two input events.
+Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN':
+    """确定两个输入事件之间是否存在因果关系。
+事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'one-shot-IcL':
+    """Determine whether there is a causal relationship between the two input events.
+Event A: My body cast a shadow over the grass.
+Event B: The sun was rising.
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?): Yes
+Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN':
+    """确定两个输入事件之间是否存在因果关系。
+事件一：我的身体投下了阴影，落在草地上。
+事件二：太阳正在升起。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：是
+事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'three-shot-IcL':
+    """Determine whether there is a causal relationship between the two input events.
+Event A: My body cast a shadow over the grass.
+Event B: The sun was rising.
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?): Yes
+
+Event A: The politician lost the election.
+Event B: He ran negative campaign ads.
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?): No
+
+Event A: The physician misdiagnosed the patient.
+Event B: The patient filed a malpractice lawsuit against the physician.
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?): Yes
+
+Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN':
+    """确定两个输入事件之间是否存在因果关系。
+事件一：我的身体投下了阴影，落在草地上。
+事件二：太阳正在升起。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：是
+
+事件一：政治家在选举中落败了。
+事件二：他播放了负面竞选广告。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：否
+
+事件一：这位医生误诊了病人。
+事件二：病人向医生提起了医疗事故诉讼。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：是
+
+事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'zero-shot-CoT':
+    """Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ? Let's think step by step.
+Answer (Yes or No ?):""",
+    'zero-shot-CoT-CN':
+    """事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here we will provide eight chain-of-thought exemplars, followed by a binary question that needs to be answered.
+
+Event A: My body cast a shadow over the grass.
+Event B: The sun was rising.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(yes or no with chain of thought): The shadow is mostly being cast by the speaker’s body. There must be a light source in the correct position to form the shadow. And the sun is the most plausible cause of the shadow. Thus, Event B may be the cause of Event A. Therefore, the answer is yes.
+
+Event A: I hung up the phone.
+Event B: The caller identified himself to me.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(yes or no with chain of thought): People usually hang up the phone at the end of a conversation, whereas callers identify themselves at the beginning of the call. Therefore, the answer is no.
+
+Event A: The cook stirred the ingredients in the bowl.
+Event B: The ingredients melted.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(yes or no with chain of thought): Stirring is a common method used in cooking to blend and mix ingredients. But melting ingredients usually requires heat, which stirring does not provide. Therefore, the answer is no.
+
+Event A: The book became a huge bestseller.
+Event B: It was adapted into a movie.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(yes or no with chain of thought): When a book becomes a huge bestseller, it often attracts the attention of filmmakers and can lead to movie adaptations, and authors generally gain more recognition and fame. Thus, Event B may be the effect of Event A. Therefore, the answer is yes.
+
+Event A: The man anticipated cold weather on his trip.
+Event B: He travelled with a big suitcase.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(yes or no with chain of thought): When someone expects cold weather, they may pack warm clothes or other things to keep warm, but that does not mean they need a big suitcase. Therefore, the answer is no.
+
+Event A: I turned on the fan.
+Event B: I felt cool air pass over me.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(yes or no with chain of thought): A typical function of a fan is to circulate air and create a cooling effect. Thus, Event B may be the effect of Event A. Therefore, the answer is yes.
+
+Event A: The woman struggled to walk.
+Event B: She wore high heels.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(yes or no with chain of thought): High heels can be uncomfortable and challenging to walk in for some individuals. Thus, Event B may be the cause of Event A. Therefore, the answer is yes.
+
+Event A: I vacuumed the carpet.
+Event B: My roommate spilled punch.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(yes or no with chain of thought): Vacuum cleaners generally can't handle liquids like punch. Therefore, the answer is no.
+
+Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?): """,
+    'manual-CoT-CN':
+    """如下为三个使用思维链进行推理的问题:
+
+事件一：那个女孩许了一个愿望。
+事件二：她看到了一只黑猫。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：看到一只黑猫通常不会导致人们许愿，因此答案是“否”。
+
+事件一：龙卷风袭击了这座城镇。
+事件二：法院大楼的屋顶被吹掉了。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：龙卷风通常会带来强风，破坏建筑物，因此答案是“是”。
+
+事件一：商店收银员叫保安了。
+事件二：客户使用了假钞。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：商店收银员叫保安通常是因为有可疑和异常情况，包括客户用假钞，因此答案是“是”。
+
+事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'explicit-function':
+    """You are a helpful assistant for event causality identification.
+Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'explicit-function-CN':
+    """你是一个用于因果发现的得力助手。
+事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['premise'], item['hypothesis'])
+    return prompt
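`get_prompt` fills the selected template positionally with `item['premise']` and `item['hypothesis']`, then prepends `prompt_style_str`. Every style in `base_prompt_dict` must therefore contain exactly two `%s` slots; otherwise the `%` operator raises `TypeError`. The minimal sketch below illustrates that behaviour using two templates copied from this file. The `placeholder_mismatches` check is a hypothetical helper, not part of OpenCompass, and the task name passed in is arbitrary because `get_prompt` does not use it.

```python
# Two templates copied from PCD-B_COPA.py for illustration.
base_prompt_dict = {
    'basic': """Event A: %s
Event B: %s
Question: is there a causal relationship between Event A and Event B ?
Answer (Yes or No ?):""",
    'explicit-function': """You are a helpful assistant for event causality identification.
Event A: %s
Event B: %s
Question: is there a causal relationship between Event A and Event B ?
Answer (Yes or No ?):""",
}


def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
    # Same logic as above: optional prefix + template filled positionally.
    base = base_prompt_dict[prompt_style]
    return prompt_style_str + base % (item['premise'], item['hypothesis'])


def placeholder_mismatches(templates, expected):
    """Hypothetical check: names of styles without exactly `expected` "%s" slots."""
    return [name for name, tpl in templates.items()
            if tpl.count('%s') != expected]


item = {'premise': 'I turned on the fan.',
        'hypothesis': 'I felt cool air pass over me.'}
print(get_prompt('PCD-B_COPA', 'basic', item))
print(placeholder_mismatches(base_prompt_dict, 2))  # -> []
```

Running such a check over each module's full `base_prompt_dict` at import or test time would pinpoint a malformed style by name, instead of failing later at formatting time.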
--- a/opencompass/datasets/calm/data_processing/prompt/PCD-B_E-CARE.py
+++ b/opencompass/datasets/calm/data_processing/prompt/PCD-B_E-CARE.py
@ -0,0 +1,205 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'basic-CN':
+    """事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'adversarial-ignore':
+    """Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'adversarial-ignore-CN':
+    """事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'adversarial-doubt':
+    """Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'adversarial-doubt-CN':
+    """事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'zero-shot-IcL':
+    """Determine whether there is a causal relationship between the two input events.
+Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'zero-shot-IcL-CN':
+    """确定两个输入事件之间是否存在因果关系。
+事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'one-shot-IcL':
+    """Determine whether there is a causal relationship between the two input events.
+Event A: My body cast a shadow over the grass.
+Event B: The sun was rising.
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?): Yes
+Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'one-shot-IcL-CN':
+    """确定两个输入事件之间是否存在因果关系。
+事件一：我的身体投下了阴影，落在草地上。
+事件二：太阳正在升起。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：是
+事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'three-shot-IcL':
+    """Determine whether there is a causal relationship between the two input events.
+Event A: My body cast a shadow over the grass.
+Event B: The sun was rising.
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?): Yes
+
+Event A: The politician lost the election.
+Event B: He ran negative campaign ads.
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?): No
+
+Event A: The physician misdiagnosed the patient.
+Event B: The patient filed a malpractice lawsuit against the physician.
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?): Yes
+
+Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'three-shot-IcL-CN':
+    """确定两个输入事件之间是否存在因果关系。
+事件一：我的身体投下了阴影，落在草地上。
+事件二：太阳正在升起。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：是
+
+事件一：政治家在选举中落败了。
+事件二：他播放了负面竞选广告。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：否
+
+事件一：这位医生误诊了病人。
+事件二：病人向医生提起了医疗事故诉讼。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：是
+
+事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'zero-shot-CoT':
+    """Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ? Let's think step by step.
+Answer (Yes or No ?):""",
+    'zero-shot-CoT-CN':
+    """事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？请逐步思考。
+答案（是或否？）：""",
+    'manual-CoT':
+    """Here we will provide eight chain-of-thought exemplars, followed by a binary question that needs to be answered.
+
+Event A: Black's sweat always drips into his eyes.
+Event B: Black pulled out his eyelashes for beauty.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(Yes or No with chain of thought): If Black intentionally removed his eyelashes, sweat could drip into his eyes, since there would be no eyelashes left to protect them. Therefore, the answer is yes.
+
+Event A: It's half way through autumn.
+Event B: It has difficulty in running.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(Yes or No with chain of thought): Autumn is a season characterized by falling leaves and cooler temperatures. It doesn't inherently imply any difficulty in running. Therefore, the answer is no.
+
+Event A: The man planned to make Tin by himself.
+Event B: He had to design the necessary components.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(Yes or No with chain of thought): While planning to make Tin might suggest that components need to be designed, the act of planning does not necessarily dictate that designing components is the only option. Therefore, the answer is no.
+
+Event A: He was shocked by his chemical deficiency.
+Event B: The patient with addiction watched the neuroscientist's lecture.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(Yes or No with chain of thought): Neuroscience gives deep insight into chemical imbalances in the body. After the neuroscience lecture, he might have realized that his body lacked some nutrients, and thus felt shocked. Therefore, the answer is yes.
+
+Event A: Waterwheels started working efficiently.
+Event B: The mills can set to work.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(Yes or No with chain of thought): When the waterwheels are working efficiently, it enables the mills to start operating. Therefore, the answer is yes.
+
+Event A: Mary has two pieces of farmland, but only one of them is used to grow crops every year.
+Event B: The less often used farmland produces more crops than the often used one.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(Yes or No with chain of thought): While it's possible that less frequent usage can contribute to better soil health and potentially higher yields, the quality of soil, weather conditions, irrigation practices, and crop choices could also influence the yield. Therefore, the answer is no.
+
+Event A: Tom bought a lot of mangoes and coconuts.
+Event B: Tom buys imported tropical fruit every day.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(Yes or No with chain of thought): The fact that Tom bought mangoes and coconuts (Event A) doesn't necessarily indicate a consistent pattern or preference for buying imported tropical fruit (Event B) every day. Therefore, the answer is no.
+
+Event A: He can just see something clearly in a short distance.
+Event B: Tom turned on his flashlight.
+Question: is there a causal relationship between Event A and Event B ?
+Answer(Yes or No with chain of thought): Turning on a flashlight provides additional light in the immediate vicinity, making objects visible in a short distance. Therefore, the answer is yes.
+
+Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?): """,
+    'manual-CoT-CN':
+    """如下为三个使用思维链进行推理的问题:
+
+事件一：莎莉的喉咙严重发炎了。
+事件二：莎莉发不出声音。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：喉咙严重发炎会导致声音嘶哑或失声，因此答案是“是”。
+
+事件一：很多昆虫都被它们吃掉了。
+事件二：果园里有很多麻雀。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：麻雀等鸟类喜欢吃昆虫，因此答案是“是”。
+
+事件一：它具有糖酵解功能。
+事件二：肌原纤维中含有不同数量的肌原丝。
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：糖酵解功能和肌原丝数量之间没有直接的因果联系，因此答案是“否”。
+
+事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+    'explicit-function':
+    """You are a helpful assistant for event causality identification.
+Event A: %s
+Event B: %s
+Question: is there a causal relationship between Event A and Event B ?
+Answer (Yes or No ?):""",
+    'explicit-function-CN':
+    """你是一个用于因果发现的得力助手。
+事件一：%s
+事件二：%s
+问题：事件一和事件二之间是否存在因果关系？
+答案（是或否？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['premise'], item['hypothesis'])
+    return prompt
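The manual-CoT exemplars in these PCD-B templates end with a fixed phrase such as "Therefore, the answer is yes." If model responses follow that convention, the final label can be recovered with a small regular expression. The sketch below is a hypothetical helper, not the evaluator used by CaLM or OpenCompass. It handles only English yes/no answers. It prefers the last "answer is yes/no" phrase and falls back to the last standalone yes/no word.

```python
import re

# Hypothetical helper, not the CaLM/OpenCompass evaluator.
# Matches the exemplar convention "... the answer is yes." (case-insensitive).
_FINAL_RE = re.compile(r"answer is\W*(yes|no)\b", re.IGNORECASE)
# Fallback: any standalone "yes"/"no" word; word boundaries skip "cannot", "nobody".
_ANY_RE = re.compile(r"\b(yes|no)\b", re.IGNORECASE)


def extract_binary_answer(response):
    """Return "yes", "no", or None for an English yes/no response."""
    final = _FINAL_RE.findall(response)
    if final:
        # Prefer the last explicit "answer is ..." phrase in the response.
        return final[-1].lower()
    anywhere = _ANY_RE.findall(response)
    return anywhere[-1].lower() if anywhere else None


print(extract_binary_answer(
    "Vacuum cleaners generally can't handle liquids like punch. "
    "Therefore, the answer is no."))  # -> no
print(extract_binary_answer("Yes"))  # -> yes
```

Preferring the explicit "answer is" phrase avoids misreading a chain of thought that mentions "no" in passing before concluding "yes". The `-CN` styles would need their own patterns for 是/否.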
--- a/opencompass/datasets/calm/data_processing/prompt/PCD-C_COPA.py
+++ b/opencompass/datasets/calm/data_processing/prompt/PCD-C_COPA.py
@ -0,0 +1,254 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'basic-CN':
+    """输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'adversarial-ignore':
+    """Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'adversarial-ignore-CN':
+    """输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'adversarial-doubt':
+    """Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'adversarial-doubt-CN':
+    """输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'zero-shot-IcL':
+    """Select the cause or effect of the input event from two options.
+Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'zero-shot-IcL-CN':
+    """从两个选项中选择输入事件的原因或结果。
+输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'one-shot-IcL':
+    """Select the cause or effect of the input event from two options.
+Input Event: My body cast a shadow over the grass.
+Question: Please select the cause of the input event from the following options.
+Option 1: The sun was rising.
+Option 2: The grass was cut.
+Answer (Option 1 or Option 2 ?): Option 1
+Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'one-shot-IcL-CN':
+    """从两个选项中选择输入事件的原因或结果。
+输入事件：我的身体投下了阴影，落在草地上。
+问题：请从以下选项中选择输入事件的原因。
+选项一：太阳正在升起。
+选项二：草被割了。
+答案（选项一或选项二？）：选项一
+输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'three-shot-IcL':
+    """Select the cause or effect of the input event from two options.
+Input Event: My body cast a shadow over the grass.
+Question: Please select the cause of the input event from the following options.
+Option 1: The sun was rising.
+Option 2: The grass was cut.
+Answer (Option 1 or Option 2 ?): Option 1
+
+Input Event: The politician lost the election.
+Question: Please select the cause of the input event from the following options.
+Option 1: He ran negative campaign ads.
+Option 2: No one voted for him.
+Answer (Option 1 or Option 2 ?): Option 2
+
+Input Event: The physician misdiagnosed the patient.
+Question: Please select the effect of the input event from the following options.
+Option 1: The patient filed a malpractice lawsuit against the physician.
+Option 2: The patient disclosed confidential information to the physician.
+Answer (Option 1 or Option 2 ?): Option 1
+
+Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'three-shot-IcL-CN':
+    """从两个选项中选择输入事件的原因或结果。
+输入事件：我的身体投下了阴影，落在草地上。
+问题：请从以下选项中选择输入事件的原因。
+选项一：太阳正在升起。
+选项二：草被割了。
+答案（选项一或选项二？）：选项一
+
+输入事件：政治家在选举中落败了。
+问题：请从以下选项中选择输入事件的原因。
+选项一：他播放了负面竞选广告。
+选项二：没有人投票给他。
+答案（选项一或选项二？）：选项二
+
+输入事件：这位医生误诊了病人。
+问题：请从以下选项中选择输入事件的结果。
+选项一：病人向医生提起了医疗事故诉讼。
+选项二：患者向医生透露了机密信息。
+答案（选项一或选项二？）：选项一
+
+输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'zero-shot-CoT':
+    """Input Event: %s
+Question: Please select the %s of the input event from the following options. Let's think step by step.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'zero-shot-CoT-CN':
+    """输入事件：%s
+问题：请从以下选项中选择输入事件的%s。请逐步思考。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'manual-CoT':
+    """
+Here we will provide eight chain-of-thought exemplars, followed by a question that needs to be answered.
+
+Input Event: My body cast a shadow over the grass.
+Question: Please select the cause of the input event from the following options.
+Option 1: The sun was rising.
+Option 2: The grass was cut.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the cause. The shadow is mostly being cast by the speaker’s body. There must be a light source in the correct position to form the shadow. Thus, the sun is the most plausible cause of the shadow. Therefore, the answer is Option 1: The sun was rising.
+
+Input Event: I hung up the phone.
+Question: Please select the cause of the input event from the following options.
+Option 1: The caller said goodbye to me.
+Option 2: The caller identified himself to me.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the cause. People usually hang up the phone at the end of a conversation, and they usually end a conversation by saying goodbye. Thus, the caller most likely said goodbye to the speaker. Therefore, the answer is Option 1: The caller said goodbye to me.
+
+Input Event: The cook stirred the ingredients in the bowl.
+Question: Please select the effect of the input event from the following options.
+Option 1: The ingredients melted.
+Option 2: The ingredients blended together.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the effect. Stirring is a common method used in cooking to blend and mix ingredients. Thus, the effect of stirring is that the ingredients blend together. Therefore, the answer is Option 2: The ingredients blended together.
+
+Input Event: The book became a huge bestseller.
+Question: Please select the effect of the input event from the following options.
+Option 1: It was adapted into a movie.
+Option 2: The author faded into obscurity.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the effect. When a book becomes a huge bestseller, it often attracts the attention of filmmakers and can lead to movie adaptations, and authors generally gain more recognition and fame. Thus, Option 1 seems to be the more plausible effect. Therefore, the answer is Option 1: It was adapted into a movie.
+
+Input Event: The man anticipated cold weather on his trip.
+Question: Please select the effect of the input event from the following options.
+Option 1: He packed warm clothing in his suitcase.
+Option 2: He travelled with a big suitcase.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the effect. When someone expects cold weather, it is logical for them to pack appropriate clothing to stay warm during their journey. Thus, Option 1 is a reasonable response to the anticipation of cold weather. Therefore, the answer is Option 1: He packed warm clothing in his suitcase.
+
+Input Event: I turned on the fan.
+Question: Please select the effect of the input event from the following options.
+Option 1: Water sprinkled onto my skin.
+Option 2: I felt cool air pass over me.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the effect. A typical function of a fan is to circulate air and create a cooling effect. Therefore, the correct answer is Option 2: I felt cool air pass over me.
+
+Input Event: The woman struggled to walk.
+Question: Please select the cause of the input event from the following options.
+Option 1: She wore high heels.
+Option 2: She took off her shoes.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the cause. High heels can be uncomfortable and challenging to walk in for some individuals. Thus, Option 1 (She wore high heels) is the more plausible cause of the woman struggling to walk. Therefore, the answer is Option 1: She wore high heels.
+
+Input Event: I vacuumed the carpet.
+Question: Please select the cause of the input event from the following options.
+Option 1: My roommate spilled punch.
+Option 2: My dog shed hair.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the cause. Pets, especially dogs, often shed hair, which can accumulate on the carpet and make vacuuming necessary to keep it clean and tidy. Thus, the dog hair is the more plausible cause. Therefore, the answer is Option 2: My dog shed hair.
+
+Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):
+""",
+    'manual-CoT-CN':
+    """如下为三个使用思维链进行推理的问题
+
+输入事件：那个女孩许了一个愿望。
+问题：请从以下选项中选择输入事件的原因。
+选项一：她看到了一只黑猫。
+选项二：她看到了一颗流星。
+答案（选项一或选项二？）：人们在看到流星时会许愿，因此答案是选项二。
+
+输入事件：龙卷风袭击了这座城镇。
+问题：请从以下选项中选择输入事件的结果。
+选项一：法院大楼的屋顶被吹掉了。
+选项二：公路结冰了，很危险。
+答案（选项一或选项二？）：龙卷风通常会带来强风，破坏建筑物，因此答案是选项一。
+
+输入事件：商店收银员叫保安了。
+问题：请从以下选项中选择输入事件的原因。
+选项一：客户使用了假钞。
+选项二：客户忘记关车灯了。
+答案（选项一或选项二？）：商店收银员叫保安通常是因为有可疑和异常情况，包括客户用假钞，因此答案是选项一。
+
+输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'explicit-function':
+    """You are a helpful assistant for causal discovery.
+Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'explicit-function-CN':
+    """你是一个用于因果发现的得力助手。
+输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['premise'], item['ask-for'],
+                                        item['hypothesis1'],
+                                        item['hypothesis2'])
+    return prompt
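The `get_prompt` helper above fills a template purely positionally with `%`, so the tuple order (`premise`, `ask-for`, `hypothesis1`, `hypothesis2`) has to match the four `%s` slots of every prompt style. The sketch below reproduces that behaviour against a trimmed, hypothetical copy of the `'basic'` template only. The sample item reuses the shadow/grass exemplar, and `'example-task'` is a placeholder name (the function ignores `task_name`).

```python
# Trimmed, hypothetical copy of the 'basic' template; the real module
# defines many more prompt styles with the same four %s slots.
base_prompt_dict = {
    'basic':
    """Input Event: %s
Question: Please select the %s of the input event from the following options.
Option 1: %s
Option 2: %s
Answer (Option 1 or Option 2 ?):""",
}


def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
    # task_name is accepted for a uniform interface but is not used.
    base = base_prompt_dict[prompt_style]
    # Positional substitution: premise, ask-for, hypothesis1, hypothesis2.
    return prompt_style_str + base % (item['premise'], item['ask-for'],
                                      item['hypothesis1'],
                                      item['hypothesis2'])


item = {
    'premise': 'My body cast a shadow over the grass.',
    'ask-for': 'cause',
    'hypothesis1': 'The sun was rising.',
    'hypothesis2': 'The grass was cut.',
}
print(get_prompt('example-task', 'basic', item))
```

Because item values are substituted rather than parsed as format strings, a literal `%` inside an event or option is harmless; only a stray `%` in a template itself would make the substitution fail.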
--- a/opencompass/datasets/calm/data_processing/prompt/PCD-C_E-CARE.py
+++ b/opencompass/datasets/calm/data_processing/prompt/PCD-C_E-CARE.py
@ -0,0 +1,252 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'basic-CN':
+    """输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'adversarial-ignore':
+    """Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'adversarial-ignore-CN':
+    """输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'adversarial-doubt':
+    """Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'adversarial-doubt-CN':
+    """输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'zero-shot-IcL':
+    """Select the cause or effect of the input event from two options.
+Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'zero-shot-IcL-CN':
+    """从两个选项中选择输入事件的原因或结果。
+输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'one-shot-IcL':
+    """Select the cause or effect of the input event from two options.
+Input Event: My body cast a shadow over the grass.
+Question: Please select the cause of the input event from the following options.
+Option 1: The sun was rising.
+Option 2: The grass was cut.
+Answer (Option 1 or Option 2 ?): Option 1
+Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'one-shot-IcL-CN':
+    """从两个选项中选择输入事件的原因或结果。
+输入事件：我的身体投下了阴影，落在草地上。
+问题：请从以下选项中选择输入事件的原因。
+选项一：太阳正在升起。
+选项二：草被割了。
+答案（选项一或选项二？）：选项一
+输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'three-shot-IcL':
+    """Select the cause or effect of the input event from two options.
+Input Event: My body cast a shadow over the grass.
+Question: Please select the cause of the input event from the following options.
+Option 1: The sun was rising.
+Option 2: The grass was cut.
+Answer (Option 1 or Option 2 ?): Option 1
+
+Input Event: The politician lost the election.
+Question: Please select the cause of the input event from the following options.
+Option 1: He ran negative campaign ads.
+Option 2: No one voted for him.
+Answer (Option 1 or Option 2 ?): Option 2
+
+Input Event: The physician misdiagnosed the patient.
+Question: Please select the effect of the input event from the following options.
+Option 1: The patient filed a malpractice lawsuit against the physician.
+Option 2: The patient disclosed confidential information to the physician.
+Answer (Option 1 or Option 2 ?): Option 1
+
+Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'three-shot-IcL-CN':
+    """从两个选项中选择输入事件的原因或结果。
+输入事件：我的身体投下了阴影，落在草地上。
+问题：请从以下选项中选择输入事件的原因。
+选项一：太阳正在升起。
+选项二：草被割了。
+答案（选项一或选项二？）：选项一
+
+输入事件：政治家在选举中落败了。
+问题：请从以下选项中选择输入事件的原因。
+选项一：他播放了负面竞选广告。
+选项二：没有人投票给他。
+答案（选项一或选项二？）：选项二
+
+输入事件：这位医生误诊了病人。
+问题：请从以下选项中选择输入事件的结果。
+选项一：病人向医生提起了医疗事故诉讼。
+选项二：患者向医生透露了机密信息。
+答案（选项一或选项二？）：选项一
+
+输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'zero-shot-CoT':
+    """Input Event: %s
+Question: Please select the %s of the input event from the following options. Let's think step by step.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'zero-shot-CoT-CN':
+    """输入事件：%s
+问题：请从以下选项中选择输入事件的%s。请逐步思考。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'manual-CoT':
+    """Here we provide eight chain-of-thought exemplars as demonstrations, followed by a question that needs to be answered.
+
+Input Event: Black's sweat always drips into his eyes.
+Question: Please select the cause of the input event from the following options.
+Option 1: Black pulled out his eyelashes for beauty.
+Option 2: Black doesn't like sleeping.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the cause. If Black intentionally removed his eyelashes, sweat could drip into his eyes because there are no eyelashes left to offer some protection. Therefore, the answer is Option 1: Black pulled out his eyelashes for beauty.
+
+Input Event: It's half way through autumn.
+Question: Please select the effect of the input event from the following options.
+Option 1: It has difficulty in running.
+Option 2: It rains more in half autumn than in spring and summer combined.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the effect. Autumn is commonly associated with changing weather patterns, including increased rainfall in some regions, so by the middle of autumn the rainfall can exceed the combined total of spring and summer. Therefore, the answer is Option 2: It rains more in half autumn than in spring and summer combined.
+
+Input Event: The man planned to make tin by himself.
+Question: Please select the effect of the input event from the following options.
+Option 1: He had to design the necessary components.
+Option 2: He found cassiterite, carbon and a furnace.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the effect. Cassiterite, carbon and a furnace are essential for extracting tin from its ore. Thus, finding cassiterite, carbon and a furnace is a clear and relevant effect of the man's plan to make tin by himself. Therefore, the answer is Option 2: He found cassiterite, carbon and a furnace.
+
+Input Event: He was shocked by his chemical deficiency.
+Question: Please select the cause of the input event from the following options.
+Option 1: The food Tom ate contained bacteria.
+Option 2: The patient with addiction watched the neuroscientist's lecture.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the cause. Neuroscience offers deep insight into chemical imbalances in the body. After the neuroscientist's lecture, he might have realized that his body lacked some nutrients, and thus felt shocked. Therefore, the answer is Option 2: The patient with addiction watched the neuroscientist's lecture.
+
+Input Event: Tom bought a lot of mangoes and coconuts.
+Question: Please select the cause of the input event from the following options.
+Option 1: Tom buys imported tropical fruit every day.
+Option 2: The doctor advised Tom to eat more tropical fruits to supplement his vitamins.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the cause. The doctor's advice to eat more tropical fruits, like mangoes and coconuts, as a source of vitamins could make Tom buy a lot of mangoes and coconuts. Therefore, the answer is Option 2: The doctor advised Tom to eat more tropical fruits to supplement his vitamins.
+
+Input Event: Waterwheels started work efficiently.
+Question: Please select the effect of the input event from the following options.
+Option 1: The mills can set to work.
+Option 2: The scientists have invented replacements.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the effect. When the waterwheels are working efficiently, it enables the mills to start operating. Therefore, the answer is Option 1: The mills can set to work.
+
+Input Event: Mary has two pieces of farmland, but only one of them is used to grow crops every year.
+Question: Please select the effect of the input event from the following options.
+Option 1: The often used farmland produces a lot more crops than the less often used one.
+Option 2: The less often used farmland produces more crops than the often used one.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the effect. Since regular cultivation can lead to healthier soil and better yields, the farmland that is used more frequently for growing crops is more productive. Therefore, the answer is Option 1: The often used farmland produces a lot more crops than the less often used one.
+
+Input Event: He can just see something clearly in a short distance.
+Question: Please select the cause of the input event from the following options.
+Option 1: Tom turned on his flashlight.
+Option 2: Tom measured the energy of the lightning during a thunderstorm.
+Answer (Option 1 or Option 2 ?) with chain-of-thought:
+The question is about the cause. Turning on a flashlight provides additional light in the immediate vicinity, making objects visible in a short distance. Therefore, the answer is Option 1: Tom turned on his flashlight.
+
+Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'manual-CoT-CN':
+    """如下为三个使用思维链进行推理的问题:
+
+输入事件：莎莉的喉咙严重发炎了。
+问题：请从以下选项中选择输入事件的结果。
+选项一：莎莉发不出声音。
+选项二：她的眼睛受伤了。
+答案（选项一或选项二？）：喉咙严重发炎会导致声音嘶哑或失声，因此答案是选项一。
+
+输入事件：很多昆虫都被它们吃掉了。
+问题：请从以下选项中选择输入事件的原因。
+选项一：果园里有很多麻雀。
+选项二：人类需要营养丰富的食物来维持生存。
+答案（选项一或选项二？）：麻雀等鸟类喜欢吃昆虫，因此答案是选项一。
+
+输入事件：它具有糖酵解功能。
+问题：请从以下选项中选择输入事件的原因。
+选项一：肌原纤维中含有不同数量的肌原丝。
+选项二：这种酶促进葡萄糖的分解。
+答案（选项一或选项二？）：酶有促进糖降解的功能，因此答案是选项二。
+
+输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+    'explicit-function':
+    """You are a helpful assistant for causal discovery.
+Input Event: %s
+Question: Please select the %s of the input event from the following options.
+Option 1: %s
+Option 2: %s
+Answer (Option 1 or Option 2 ?):""",
+    'explicit-function-CN':
+    """你是一个用于因果发现的得力助手。
+输入事件：%s
+问题：请从以下选项中选择输入事件的%s。
+选项一：%s
+选项二：%s
+答案（选项一或选项二？）：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['premise'], item['ask-for'],
+                                        item['hypothesis1'],
+                                        item['hypothesis2'])
+    return prompt
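Both two-option files ask the model to answer with `Option 1`/`Option 2` (or `选项一`/`选项二` in the `-CN` styles), and the chain-of-thought exemplars end with "Therefore, the answer is Option N". The scoring code is not part of these files, so the following is only a hypothetical sketch of how such replies could be mapped to a label. `parse_choice` is an illustrative name, and treating the last explicit mention as the final answer is an assumed heuristic, not CaLM's documented behaviour.

```python
import re
from typing import Optional

# Hypothetical helper (not the CaLM evaluator).
_EN = re.compile(r'Option\s*([12])', re.IGNORECASE)
_CN = re.compile(r'选项([一二])')
_CN_MAP = {'一': 1, '二': 2}


def parse_choice(reply: str) -> Optional[int]:
    """Return 1 or 2 for the option a reply settles on, else None.

    Chain-of-thought replies may mention both options while reasoning,
    so the mention that appears last in the text is taken as the answer.
    """
    matches = [(m.start(), int(m.group(1))) for m in _EN.finditer(reply)]
    matches += [(m.start(), _CN_MAP[m.group(1)]) for m in _CN.finditer(reply)]
    if not matches:
        return None
    # Tuples compare by position first, so max() picks the last mention.
    return max(matches)[1]


print(parse_choice('Thus, Option 1 seems plausible. Therefore, the answer is Option 2.'))
```

Taking the last mention keeps a reply such as "Option 1 seems plausible, but the answer is Option 2" from being scored as Option 1; a stricter parser could require the exact "the answer is Option N" phrase instead.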
--- a/opencompass/datasets/calm/data_processing/prompt/PN.py
+++ b/opencompass/datasets/calm/data_processing/prompt/PN.py
@ -0,0 +1,172 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'basic-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'adversarial-ignore':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'adversarial-ignore-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'adversarial-doubt':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'adversarial-doubt-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'zero-shot-IcL':
+    """Answer questions about the Probability of Necessity (PN). Calculating the Probability of Necessity involves examining the outcomes of individuals who received the treatment and experienced the desired effect. The Probability of Necessity is the proportion of these individuals for whom the treatment was essential to achieve the outcome, meaning they would not have achieved the outcome without the treatment.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'zero-shot-IcL-CN':
+    """回答有关必要性概率 (PN) 的问题。必要概率的计算涉及对接受治疗并取得预期效果的个体的结果进行检查。必要概率是指在这些人中，治疗对取得疗效至关重要的比例，也就是说，如果没有治疗，他们就不会取得疗效。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'one-shot-IcL':
+    """Answer questions about the Probability of Necessity (PN). Calculating the Probability of Necessity involves examining the outcomes of individuals who received the treatment and experienced the desired effect. The Probability of Necessity is the proportion of these individuals for whom the treatment was essential to achieve the outcome, meaning they would not have achieved the outcome without the treatment.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: A company has a direct effect on company revenue. A company has a direct effect on company expenses. A company has a direct effect on company profit. Company revenue has a direct effect on company expenses.
+For those with a company being inefficient, the probability of company revenue being low is 0.3878. The probability of a company being inefficient and company revenue being low is 0.1900. The probability of a company being efficient and company revenue being high is 0.3871.
+Instruction: Consider the probability of necessity (PN) of a company on company revenue.
+Question: Given that a company was efficient and company revenue was high, what is the upper bound of the probability that company revenue would have been low if the company had been inefficient?
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}: {"PROB": "0.5110"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'one-shot-IcL-CN':
+    """回答有关必要性概率 (PN) 的问题。必要概率的计算涉及对接受治疗并取得预期效果的个体的结果进行检查。必要概率是指在这些人中，治疗对取得疗效至关重要的比例，也就是说，如果没有治疗，他们就不会取得疗效。
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：一个公司对公司收入有直接影响。一个公司对公司费用有直接影响。一个公司对公司利润有直接影响。公司收入对公司费用有直接影响。
+在一个公司为低效的条件下, 公司收入为低的概率为0.3878。一个公司为低效且公司收入为低的概率为0.1900。一个公司为高效且公司收入为高的概率为0.3871。
+指令：考虑一个公司作用于公司收入的必要性概率(probability of necessity, PN)。
+问题：给定一个公司为高效且公司收入为高, 那么假如一个公司为低效，此时公司收入为低的概率的上界是多少？
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}： {"PROB":"0.5110"}
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'two-shot-IcL':
+    """Answer questions about the Probability of Necessity (PN). Calculating the Probability of Necessity involves examining the outcomes of individuals who received the treatment and experienced the desired effect. The Probability of Necessity is the proportion of these individuals for whom the treatment was essential to achieve the outcome, meaning they would not have achieved the outcome without the treatment.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: A company has a direct effect on company revenue. A company has a direct effect on company expenses. A company has a direct effect on company profit. Company revenue has a direct effect on company expenses.
+For those with a company being inefficient, the probability of company revenue being low is 0.3878. The probability of a company being inefficient and company revenue being low is 0.1900. The probability of a company being efficient and company revenue being high is 0.3871.
+Instruction: Consider the probability of necessity (PN) of a company on company revenue.
+Question: Given that a company was efficient and company revenue was high, what is the upper bound of the probability that company revenue would have been low if the company had been inefficient?
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}: {"PROB": "0.5110"}
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Czvl has a direct effect on bsaz. Vimz has a direct effect on bsaz. Bsaz has a direct effect on tava.
+For those with vimz being low, the probability of bsaz being low is 0.4591. The probability of vimz being low and bsaz being low is 0.1278. The probability of vimz being high and bsaz being high is 0.1813.
+Instruction: Consider the probability of necessity (PN) of vimz on bsaz.
+Question: Given that vimz was high and bsaz was high, what is the upper bound of the probability that bsaz would have been low if vimz had been low?
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}: {"PROB": "1.0000"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'zero-shot-CoT':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s Let's think step by step.
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'zero-shot-CoT-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s请逐步思考。
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'manual-CoT':
+    """Here are two chain-of-thought examples of math problems on the probability of necessity (PN) task.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Market demand has a direct effect on amount of exercise. Market demand has a direct effect on weather condition. Market demand has a direct effect on sales performance. Amount of exercise has a direct effect on sales performance. Weather condition has a direct effect on sales performance.
+For those with market demand being low, the probability of sales performance being high is 0.3144. The probability of sales performance being high is 0.3216. The probability of market demand being high and sales performance being high is 0.1890.
+Instruction: Consider the probability of necessity (PN) of market demand on sales performance.
+Question: Given that market demand was high and sales performance was high, what is the lower bound of the probability that sales performance would have been low if market demand had been low?
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}: With A representing market demand, C representing weather condition and D representing sales performance, we have P(D=1|C=0,A=0)=0.4246; P(A=0)=0.4216; P(D=1|C=0,A=1)=0.4459; P(A=1)=0.5784; P(D=1)=0.3216; P(C=1,D=1)=0.1293; Calculate P(D=1|do(A=0))=P(D=1|A=0)=0.3144, then the lower bound of PN is max{0, [P(D=1)-P(D=1|do(A=0))]/P(A=1,D=1)}\n=max{0, (0.3216-0.3144)/0.1890}\n=max{0, 0.0380}\n=0.0380. The answer is: {"PROB": "0.0380"}.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Mktt has a direct effect on oroo. Mktt has a direct effect on tlxp. Mktt has a direct effect on enck. Oroo has a direct effect on tlxp.
+For those with oroo being low and mktt being low, the probability of tlxp being low is 0.5355. The probability of mktt being low is 0.6363. For those with oroo being low and mktt being high, the probability of tlxp being low is 0.2443. The probability of mktt being high is 0.3637. The probability of oroo being low and tlxp being low is 0.3148. The probability of oroo being high and tlxp being high is 0.1731.
+Instruction: Consider the probability of necessity (PN) of oroo on tlxp.
+Question: Given that oroo was high and tlxp was high, what is the upper bound of the probability that tlxp would have been low if oroo had been low?
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}: With A representing mktt, B representing oroo and C representing tlxp, we have: P(C=0|B=0,A=0)=0.5355; P(A=0)=0.6363; P(C=0|B=0,A=1)=0.2443; P(A=1)=0.3637; P(B=0,C=0)=0.3148; P(B=1,C=1)=0.1731; Calculate P(C=0|do(B=0))=sum_{A} P(C=0|B=0,A)*P(A)=P(C=0|B=0,A=0)*P(A=0)+P(C=0|B=0,A=1)*P(A=1)=0.5355*0.6363+0.2443*0.3637=0.4296, then the upper bound of PN is min{1, [P(C=0|do(B=0))-P(B=0,C=0)]/P(B=1,C=1)}\n=min{1, (0.4296-0.3148)/0.1731}\n=min{1, 0.6632}\n=0.6632. The answer is: {"PROB": "0.6632"}.
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'manual-CoT-CN':
+    """如下为一个使用思维链进行推理的关于必要性概率(probability of necessity, PN)的数学问题：
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：Cegl对mwcg有直接影响。Cegl对jeie有直接影响。Mwcg对jeie有直接影响。
+在cegl为低的条件下, mwcg为高的概率为0.6879。mwcg为高的概率为0.8162。cegl为高且mwcg为高的概率为0.6351。
+指令：考虑cegl作用于mwcg的必要性概率(probability of necessity, PN)。
+问题：给定cegl为高且mwcg为高, 那么假如cegl为低，此时mwcg为低的概率的下界是多少？
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：用A代表cegl，B代表mwcg，所以P(B=1|A=0)=0.6879; P(B=1)=0.8162; P(A=1,B=1)=0.6351; 计算P(B=1|do(A=0))=P(B=1|A=0)=0.6879，那么PN的下界为max{0, [P(B=1)-P(B=1|do(A=0))]/P(A=1,B=1)}\n=max{0, (0.8162-0.6879)/0.6351}\n=max{0, 0.2020}\n=0.2020。因此答案为{"PROB":"0.2020"}。
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'explicit-function':
+    """You are a helpful assistant for math probability.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'explicit-function-CN':
+    """你是一个用于计算数学概率的得力助手。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['given_info'],
+                                        item['Background']['data_info'],
+                                        item['Instruction'], item['Question'])
+    return prompt
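The PN manual-CoT exemplars above apply closed-form bounds on PN, and the PS file that follows applies the analogous PS lower bound. Here is a minimal sketch of that arithmetic. It assumes a binary treatment X and outcome Y, and that P(y|do(x)) is either given directly or obtained by back-door adjustment, as in the exemplars. The function names are illustrative and are not part of the module. Recomputing from the rounded inputs shown reproduces the exemplar answers, except that the first PN lower bound comes out at about 0.0381 instead of the stated 0.0380. That gap may come from unrounded source data.

```python
from typing import Sequence


def adjust(p_y_given_x_z: Sequence[float], p_z: Sequence[float]) -> float:
    """Back-door adjustment: P(y|do(x)) = sum_z P(y|x,z) * P(z)."""
    return sum(p * w for p, w in zip(p_y_given_x_z, p_z))


def pn_lower(p_y: float, p_y_do_x0: float, p_x1_y1: float) -> float:
    """Lower bound of PN: max{0, [P(y) - P(y|do(x'))] / P(x, y)}."""
    return max(0.0, (p_y - p_y_do_x0) / p_x1_y1)


def pn_upper(p_y0_do_x0: float, p_x0_y0: float, p_x1_y1: float) -> float:
    """Upper bound of PN: min{1, [P(y'|do(x')) - P(x', y')] / P(x, y)}."""
    return min(1.0, (p_y0_do_x0 - p_x0_y0) / p_x1_y1)


def ps_lower(p_y0: float, p_y0_do_x1: float, p_x0_y0: float) -> float:
    """Lower bound of PS: max{0, [P(y') - P(y'|do(x))] / P(x', y')}."""
    return max(0.0, (p_y0 - p_y0_do_x1) / p_x0_y0)


# Market-demand exemplar: no confounder, so P(D=1|do(A=0)) = P(D=1|A=0).
print(round(pn_lower(0.3216, 0.3144, 0.1890), 4))  # 0.0381

# mktt/oroo/tlxp exemplar: adjust over mktt, rounding to four places
# before dividing, as the exemplar does.
p_do = round(adjust([0.5355, 0.2443], [0.6363, 0.3637]), 4)
print(round(pn_upper(p_do, 0.3148, 0.1731), 4))  # 0.6632

# Exercise/health exemplar for PS.
print(round(ps_lower(0.0912, 0.0635, 0.0534), 4))  # 0.5187
```

The `max(0, ...)` and `min(1, ...)` clamps matter in practice. In the vimz/bsaz one-shot exemplar, the raw upper-bound ratio exceeds 1, which is why the expected answer there is 1.0000.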
--- a/opencompass/datasets/calm/data_processing/prompt/PS.py
+++ b/opencompass/datasets/calm/data_processing/prompt/PS.py
@ -0,0 +1,171 @@
+# flake8: noqa: E501
+base_prompt_dict = {
+    'basic':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'basic-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'adversarial-ignore':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'adversarial-ignore-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'adversarial-doubt':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'adversarial-doubt-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'zero-shot-IcL':
+    """Answer questions about the Probability of Sufficiency (PS). Calculating the Probability of Sufficiency involves looking at individuals who did not receive the treatment and did not experience the desired effect. The Probability of Sufficiency is the probability that, for these individuals, receiving the treatment would have been enough to produce the outcome.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'zero-shot-IcL-CN':
+    """回答有关充分性概率 (PS) 的问题。计算充分性概率需要考察未接受治疗且未取得预期效果的个体。充分性概率是指对这些个体而言，假如接受了治疗，治疗足以使其取得预期效果的概率。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'one-shot-IcL':
+    """Answer questions about the Probability of Sufficiency (PS). Calculating the Probability of Sufficiency involves looking at individuals who did not receive the treatment and did not experience the desired effect. The Probability of Sufficiency is the probability that, for these individuals, receiving the treatment would have been enough to produce the outcome.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Level of education has a direct effect on job performance. Job performance has a direct effect on salary. Job performance has a direct effect on job satisfaction. Salary has a direct effect on job satisfaction.
+For those with job performance being excellent, the probability of salary being low is 0.0539. The probability of salary being low is 0.0857. The probability of job performance being poor and salary being low is 0.0585.
+Instruction: Consider the probability of sufficiency (PS) of job performance on salary.
+Question: Given that job performance was poor and salary was low, what is the lower bound of the probability that salary would have been high if the job performance had been excellent?
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}: {"PROB": "0.5436"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'one-shot-IcL-CN':
+    """回答有关充分性概率 (PS) 的问题。计算充分性概率需要考察未接受治疗且未取得预期效果的个体。充分性概率是指对这些个体而言，假如接受了治疗，治疗足以使其取得预期效果的概率。
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：教育水平对工作表现有直接影响。工作表现对薪水有直接影响。工作表现对工作满意度有直接影响。薪水对工作满意度有直接影响。
+在工作表现为出色的条件下, 薪水为低的概率为0.0539。薪水为低的概率为0.0857。工作表现为差劲且薪水为低的概率为0.0585。
+指令：考虑工作表现作用于薪水的充分性概率(probability of sufficiency, PS)。
+问题：给定工作表现为差劲且薪水为低, 假如工作表现为出色，此时薪水为高的概率的下界是多少？
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}： {"PROB":"0.5436"}
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'two-shot-IcL':
+    """Answer questions about the Probability of Sufficiency (PS). Calculating the Probability of Sufficiency involves looking at individuals who did not receive the treatment and did not experience the desired effect. The Probability of Sufficiency is the probability that, for these individuals, receiving the treatment would have been enough to produce the outcome.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Level of education has a direct effect on job performance. Job performance has a direct effect on salary. Job performance has a direct effect on job satisfaction. Salary has a direct effect on job satisfaction.
+For those with job performance being excellent, the probability of salary being low is 0.0539. The probability of salary being low is 0.0857. The probability of job performance being poor and salary being low is 0.0585.
+Instruction: Consider the probability of sufficiency (PS) of job performance on salary.
+Question: Given that job performance was poor and salary was low, what is the lower bound of the probability that salary would have been high if the job performance had been excellent?
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}: {"PROB": "0.5436"}
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Ajlk has a direct effect on mzbw. Mzbw has a direct effect on bduo. Mzbw has a direct effect on vlmn. Bduo has a direct effect on vlmn.
+For those with ajlk being high, the probability of mzbw being low is 0.1978. The probability of mzbw being low is 0.2593. The probability of ajlk being low and mzbw being low is 0.1797.
+Instruction: Consider the probability of sufficiency (PS) of ajlk on mzbw.
+Question: Given that ajlk was low and mzbw was low, what is the lower bound of the probability that mzbw would have been high if the ajlk had been high?
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}: {"PROB": "0.3422"}
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'zero-shot-CoT':
+    """Input Info: %s
+%s
+Instruction: %s
+Question: %s Let's think step by step.
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'zero-shot-CoT-CN':
+    """输入信息：%s
+%s
+指令：%s
+问题：%s请逐步思考。
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'manual-CoT':
+    """Here are two chain-of-thought examples of math problems on the probability of sufficiency (PS) task.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Time spent exercising has a direct effect on physical fitness level. Time spent exercising has a direct effect on overall health condition.
+For those with time spent exercising being adequate, the probability of overall health condition being poor is 0.0635. The probability of overall health condition being poor is 0.0912. The probability of time spent exercising being not enough and overall health condition being poor is 0.0534.
+Instruction: Consider the probability of sufficiency (PS) of time spent exercising on overall health condition.
+Question: Given that time spent exercising was not enough and overall health condition was poor, what is the lower bound of the probability that overall health condition would have been good if the time spent exercising had been adequate?
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}: With A represents time spent exercising and C represents overall health condition, we have P(C=0|A=1)=0.0635; P(C=0)=0.0912; P(A=0,C=0)=0.0534; Calculate P(C=0|do(A=1))=P(C=0|A=1)=0.0635, then the lower bound of PS is max{0, [P(C=0)-P(C=0|do(A=1))]/P(A=0,C=0)}\n=max{0, (0.0912-0.0635)/0.0534}\n=max{0, 0.5187}\n=0.5187. The answer is: {"PROB": "0.5187"}.
+
+Input Info: Imagine a self-contained, hypothetical world with only the following conditions, and without any unmentioned factors or causal relationships: Iykj has a direct effect on nptw. Iykj has a direct effect on tmex. Nptw has a direct effect on sgnl. Sgnl has a direct effect on tmex.
+For those with nptw being high, the probability of sgnl being low is 0.0632. The probability of sgnl being low is 0.0781. The probability of nptw being low and sgnl being low is 0.0375.
+Instruction: Consider the probability of sufficiency (PS) of nptw on sgnl.
+Question: Given that nptw was low and sgnl was low, what is the lower bound of the probability that sgnl would have been high if the nptw had been high?
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}: With B represents nptw, C represents sgnl, we have P(C=0|B=1)=0.0632; P(C=0)=0.0781; P(B=0,C=0)=0.0375; Calculate P(C=0|do(B=1))=P(C=0|B=1)=0.0632, then the lower bound of PS is max{0, [P(C=0)-P(C=0|do(B=1))]/P(B=0,C=0)}\n=max{0, (0.0781-0.0632)/0.0375}\n=max{0, 0.3973}\n=0.3973. The answer is: {"PROB": "0.3973"}.
+
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"PROB": "0.1234"}:""",
+    'manual-CoT-CN':
+    """如下为一个使用思维链进行推理的关于充分性概率(probability of sufficiency, PS)的数学问题：
+
+输入信息：设想一个只有以下条件，而没有其他因素或因果关系的假设世界：Clwa对hvxd有直接影响。Clwa对szak有直接影响。
+在clwa为高的条件下, hvxd为低的概率为0.5569。hvxd为低的概率为0.6454。clwa为低且hvxd为低的概率为0.3623。
+指令：考虑clwa作用于hvxd的充分性概率(probability of sufficiency, PS)。
+问题：给定clwa为低且hvxd为低, 假如clwa为高，此时hvxd为高的概率的下界是多少？
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：用A代表clwa, B代表hvxd，所以P(B=0|A=1)=0.5569; P(B=0)=0.6454; P(A=0,B=0)=0.3623; 计算P(B=0|do(A=1))=P(B=0|A=1)=0.5569，所以PS的下界为max{0, [P(B=0)-P(B=0|do(A=1))]/P(A=0,B=0)}\n=max{0, (0.6454-0.5569)/0.3623}\n=max{0, 0.2443}\n=0.2443。因此答案为{"PROB":"0.2443"}。
+
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"PROB":"0.1234"}：""",
+    'explicit-function':
+    """You are a helpful assistant for math probability.
+Input Info: %s
+%s
+Instruction: %s
+Question: %s
+Provide the calculation result to four decimal places in JSON format, like {"ANSWER": "Yes", "PROB": "0.1234"}:""",
+    'explicit-function-CN':
+    """你是一个用于计算数学概率的得力助手。
+输入信息：%s
+%s
+指令：%s
+问题：%s
+请根据上述信息，给出计算结果（答案保留四位小数）。请以JSON格式返回最终结果，例如，{"ANSWER":"是","PROB":"0.1234"}：""",
+}
+
+
+def get_prompt(task_name, prompt_style, item, prompt_style_str=''):
+    base = base_prompt_dict[prompt_style]
+
+    prompt = prompt_style_str + base % (item['given_info'],
+                                        item['Background']['data_info'],
+                                        item['Instruction'], item['Question'])
+    return prompt
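The manual-CoT examples in the PS prompt file all apply the same lower bound, max{0, [P(C=0) - P(C=0|do(A=1))]/P(A=0,C=0)}, after replacing the interventional term with the observed conditional. The sketch below recomputes the few-shot answers so the arithmetic can be checked. It assumes, as the examples do, that treatment and outcome share no confounder, so P(C=0|do(A=1)) = P(C=0|A=1). The function names `ps_lower_bound` and `format_answer` are hypothetical and are not part of the CaLM code.

```python
import json


def ps_lower_bound(p_y0_given_x1: float, p_y0: float, p_x0_y0: float) -> float:
    """Lower bound of the probability of sufficiency (PS).

    Computes max{0, [P(Y=0) - P(Y=0|do(X=1))] / P(X=0, Y=0)}.
    Assumes no confounding between X and Y, so that
    P(Y=0|do(X=1)) equals the observed P(Y=0|X=1).
    """
    p_y0_do_x1 = p_y0_given_x1  # identification assumption (no confounder)
    return max(0.0, (p_y0 - p_y0_do_x1) / p_x0_y0)


def format_answer(value: float) -> str:
    """Render the result in the JSON shape the prompts ask for."""
    return json.dumps({"PROB": f"{value:.4f}"})


if __name__ == "__main__":
    # (P(Y=0|X=1), P(Y=0), P(X=0,Y=0)) taken from the prompt examples.
    examples = [
        (0.0635, 0.0912, 0.0534),  # exercise -> health condition
        (0.0632, 0.0781, 0.0375),  # nptw -> sgnl
        (0.1978, 0.2593, 0.1797),  # ajlk -> mzbw
        (0.5569, 0.6454, 0.3623),  # clwa -> hvxd
    ]
    for args in examples:
        print(format_answer(ps_lower_bound(*args)))
```

Running it prints 0.5187, 0.3973, 0.3422 and 0.2443 in the `{"PROB": ...}` shape, matching the answers embedded in the prompts. The `max{0, ...}` clip matters when P(Y=0|X=1) exceeds P(Y=0), which would otherwise yield a negative "probability".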
--- a/opencompass/datasets/calm/data_processing/task_hiearchy.py
+++ b/opencompass/datasets/calm/data_processing/task_hiearchy.py
@ -0,0 +1,125 @@
+task_hiearchy_dict = {
+    # association/
+    # correlation/
+    'CORR-B_correlation_CN': 'association/correlation/',
+    'CORR-B_correlation_EN': 'association/correlation/',
+    # explaining_away_effect/
+    'EAE-B_exp-away_CN': 'association/explaining_away_effect/',
+    'EAE-B_exp-away_EN': 'association/explaining_away_effect/',
+    # causal_discovery/
+    # abstract_reasoning/
+    'AR-B_CaLM-AR_CN': 'causal_discovery/abstract_reasoning/',
+    'AR-B_CaLM-AR_EN': 'causal_discovery/abstract_reasoning/',
+    # causal_attribution/
+    'CA-B_FA_CN': 'causal_discovery/causal_attribution/',
+    'CA-B_FA_EN': 'causal_discovery/causal_attribution/',
+    'CA-B_FP_CN': 'causal_discovery/causal_attribution/',
+    'CA-B_FP_EN': 'causal_discovery/causal_attribution/',
+    # event_causality_identification/
+    'ECI-B_CTB_CN': 'causal_discovery/event_causality_identification/',
+    'ECI-B_CTB_EN': 'causal_discovery/event_causality_identification/',
+    'ECI-B_ESC_CN': 'causal_discovery/event_causality_identification/',
+    'ECI-B_ESC_EN': 'causal_discovery/event_causality_identification/',
+    'ECI-B_MAVEN-ERE_CN': 'causal_discovery/event_causality_identification/',
+    'ECI-B_MAVEN-ERE_EN': 'causal_discovery/event_causality_identification/',
+    # pairwise_causal_discovery/
+    'PCD-B_COPA_CN': 'causal_discovery/pairwise_causal_discovery/',
+    'PCD-B_COPA_EN': 'causal_discovery/pairwise_causal_discovery/',
+    'PCD-B_E-CARE_CN': 'causal_discovery/pairwise_causal_discovery/',
+    'PCD-B_E-CARE_EN': 'causal_discovery/pairwise_causal_discovery/',
+    'PCD-C_COPA_CN': 'causal_discovery/pairwise_causal_discovery/',
+    'PCD-C_COPA_EN': 'causal_discovery/pairwise_causal_discovery/',
+    'PCD-C_E-CARE_CN': 'causal_discovery/pairwise_causal_discovery/',
+    'PCD-C_E-CARE_EN': 'causal_discovery/pairwise_causal_discovery/',
+    # counterfactual/
+    # actual_causality/
+    'AC-B_causal_judgement_CN': 'counterfactual/actual_causality/',
+    'AC-B_causal_judgement_EN': 'counterfactual/actual_causality/',
+    # causal_explanation_generation/
+    'CEG-O_E-CARE_CN': 'counterfactual/causal_explanation_generation/',
+    'CEG-O_E-CARE_EN': 'counterfactual/causal_explanation_generation/',
+    # counterfactual_reasoning/
+    'CR-B_det-counterfactual_CN': 'counterfactual/counterfactual_reasoning/',
+    'CR-B_det-counterfactual_EN': 'counterfactual/counterfactual_reasoning/',
+    'CR-C_CRASS_CN': 'counterfactual/counterfactual_reasoning/',
+    'CR-C_CRASS_EN': 'counterfactual/counterfactual_reasoning/',
+    # effect_of_the_treatment_on_the_treated/
+    'ETT-B_ETT-natural_CN':
+    'counterfactual/effect_of_the_treatment_on_the_treated/',
+    'ETT-B_ETT-natural_EN':
+    'counterfactual/effect_of_the_treatment_on_the_treated/',
+    'ETT-P_ETT-basic_CN':
+    'counterfactual/effect_of_the_treatment_on_the_treated/',
+    'ETT-P_ETT-basic_EN':
+    'counterfactual/effect_of_the_treatment_on_the_treated/',
+    'ETT-P_ETT-hard_CN':
+    'counterfactual/effect_of_the_treatment_on_the_treated/',
+    'ETT-P_ETT-hard_EN':
+    'counterfactual/effect_of_the_treatment_on_the_treated/',
+    # natural_direct_effect/
+    'NDE-B_NDE-natural_CN': 'counterfactual/natural_direct_effect/',
+    'NDE-B_NDE-natural_EN': 'counterfactual/natural_direct_effect/',
+    'NDE-P_NDE-basic_CN': 'counterfactual/natural_direct_effect/',
+    'NDE-P_NDE-basic_EN': 'counterfactual/natural_direct_effect/',
+    'NDE-P_NDE-hard_CN': 'counterfactual/natural_direct_effect/',
+    'NDE-P_NDE-hard_EN': 'counterfactual/natural_direct_effect/',
+    # natural_indirect_effect/
+    'NIE-B_NIE-natural_CN': 'counterfactual/natural_indirect_effect/',
+    'NIE-B_NIE-natural_EN': 'counterfactual/natural_indirect_effect/',
+    'NIE-P_NIE-basic_CN': 'counterfactual/natural_indirect_effect/',
+    'NIE-P_NIE-basic_EN': 'counterfactual/natural_indirect_effect/',
+    'NIE-P_NIE-hard_CN': 'counterfactual/natural_indirect_effect/',
+    'NIE-P_NIE-hard_EN': 'counterfactual/natural_indirect_effect/',
+    # probability_of_necessity/
+    'PN-P_PN-basic_CN': 'counterfactual/probability_of_necessity/',
+    'PN-P_PN-basic_EN': 'counterfactual/probability_of_necessity/',
+    'PN-P_PN-hard_CN': 'counterfactual/probability_of_necessity/',
+    'PN-P_PN-hard_EN': 'counterfactual/probability_of_necessity/',
+    # probability_of_sufficiency/
+    'PS-P_PS-basic_CN': 'counterfactual/probability_of_sufficiency/',
+    'PS-P_PS-basic_EN': 'counterfactual/probability_of_sufficiency/',
+    'PS-P_PS-hard_CN': 'counterfactual/probability_of_sufficiency/',
+    'PS-P_PS-hard_EN': 'counterfactual/probability_of_sufficiency/',
+    # intervention/
+    # average_treatment_effect/
+    'ATE-B_ATE-natural_CN': 'intervention/average_treatment_effect/',
+    'ATE-B_ATE-natural_EN': 'intervention/average_treatment_effect/',
+    'ATE-P_ATE-basic_CN': 'intervention/average_treatment_effect/',
+    'ATE-P_ATE-basic_EN': 'intervention/average_treatment_effect/',
+    'ATE-P_ATE-hard_CN': 'intervention/average_treatment_effect/',
+    'ATE-P_ATE-hard_EN': 'intervention/average_treatment_effect/',
+    # backdoor_adjustment_set/
+    'BAS-B_backadj_CN': 'intervention/backdoor_adjustment_set/',
+    'BAS-B_backadj_EN': 'intervention/backdoor_adjustment_set/',
+    'BAS-C_max-BAS_CN': 'intervention/backdoor_adjustment_set/',
+    'BAS-C_max-BAS_EN': 'intervention/backdoor_adjustment_set/',
+    'BAS-C_min-BAS_CN': 'intervention/backdoor_adjustment_set/',
+    'BAS-C_min-BAS_EN': 'intervention/backdoor_adjustment_set/',
+    'BAS-C_mix-BAS_CN': 'intervention/backdoor_adjustment_set/',
+    'BAS-C_mix-BAS_EN': 'intervention/backdoor_adjustment_set/',
+    # causal_effect_identification/
+    'CEI-B_0.2-UC_CN': 'intervention/causal_effect_identification/',
+    'CEI-B_0.2-UC_EN': 'intervention/causal_effect_identification/',
+    'CEI-B_0.4-UC_CN': 'intervention/causal_effect_identification/',
+    'CEI-B_0.4-UC_EN': 'intervention/causal_effect_identification/',
+    'CEI-B_0.6-UC_CN': 'intervention/causal_effect_identification/',
+    'CEI-B_0.6-UC_EN': 'intervention/causal_effect_identification/',
+    'CEI-B_0.8-UC_CN': 'intervention/causal_effect_identification/',
+    'CEI-B_0.8-UC_EN': 'intervention/causal_effect_identification/',
+    # collider_bias/
+    'CB-B_collider-bias_CN': 'intervention/collider_bias/',
+    'CB-B_collider-bias_EN': 'intervention/collider_bias/',
+    # controlled_direct_effect/
+    'CDE-B_CDE-natural_CN': 'intervention/controlled_direct_effect/',
+    'CDE-B_CDE-natural_EN': 'intervention/controlled_direct_effect/',
+    'CDE-P_CDE-basic_CN': 'intervention/controlled_direct_effect/',
+    'CDE-P_CDE-basic_EN': 'intervention/controlled_direct_effect/',
+    'CDE-P_CDE-hard_CN': 'intervention/controlled_direct_effect/',
+    'CDE-P_CDE-hard_EN': 'intervention/controlled_direct_effect/',
+    # frontdoor_adjustment_set/
+    'FAS-C_FAS_CN': 'intervention/frontdoor_adjustment_set/',
+    'FAS-C_FAS_EN': 'intervention/frontdoor_adjustment_set/',
+    # instrumental_variable/
+    'IV-C_CaLM-IV_CN': 'intervention/instrumental_variable/',
+    'IV-C_CaLM-IV_EN': 'intervention/instrumental_variable/',
+}
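Each value in `task_hiearchy_dict` is a `causal_target/scenario/` path, so the mapping can be sliced by its first or second path component. The sketch below works on a small hand-copied excerpt (the name `TASK_HIERARCHY_EXCERPT` and both helpers are hypothetical) to show how tasks could be counted per causal target or listed per scenario.

```python
from collections import Counter

# A few entries copied from task_hiearchy_dict, for illustration only.
TASK_HIERARCHY_EXCERPT = {
    'CORR-B_correlation_EN': 'association/correlation/',
    'EAE-B_exp-away_EN': 'association/explaining_away_effect/',
    'PCD-B_COPA_EN': 'causal_discovery/pairwise_causal_discovery/',
    'PS-P_PS-basic_EN': 'counterfactual/probability_of_sufficiency/',
    'PS-P_PS-hard_EN': 'counterfactual/probability_of_sufficiency/',
    'ATE-P_ATE-basic_EN': 'intervention/average_treatment_effect/',
}


def count_by_target(hierarchy):
    """Count tasks per top-level causal target (first path component)."""
    return Counter(path.strip('/').split('/')[0] for path in hierarchy.values())


def tasks_in_scenario(hierarchy, scenario):
    """List task names whose second-level directory equals `scenario`."""
    return sorted(
        task for task, path in hierarchy.items()
        if path.strip('/').split('/')[1] == scenario)
```

Applied to the full dictionary, the same counting gives 4 association, 20 causal_discovery, 34 counterfactual and 34 intervention tasks, 92 in total, which agrees with the task count stated in the README.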
--- a/opencompass/datasets/calm/evaluation/accuracy/choice.py
+++ b/opencompass/datasets/calm/evaluation/accuracy/choice.py
@ -0,0 +1,4 @@
+def compute_acc(gt_list, pred_list):
+    correct_num = sum(pred == gt for gt, pred in zip(gt_list, pred_list))
+    acc = correct_num / len(gt_list)
+    return acc
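The choice metric above is plain exact-match accuracy. Because `zip` stops at the shorter list, a short `pred_list` is silently truncated while the denominator stays `len(gt_list)`, and an empty `gt_list` raises `ZeroDivisionError`. A guarded variant could look like this; `compute_acc_safe` is a hypothetical name, and returning 0.0 for an empty input is a design choice, not the repository's behavior.

```python
def compute_acc_safe(gt_list, pred_list):
    """Exact-match accuracy with explicit length and empty-input checks."""
    if len(gt_list) != len(pred_list):
        raise ValueError(
            f'length mismatch: {len(gt_list)} labels vs {len(pred_list)} predictions')
    if not gt_list:
        return 0.0  # assumption: an empty split scores 0 instead of raising
    # Unparseable predictions (e.g. None) simply never equal a gold label.
    correct_num = sum(pred == gt for gt, pred in zip(gt_list, pred_list))
    return correct_num / len(gt_list)
```

For example, `compute_acc_safe([1, 0, 1, 1], [1, 1, 1, None])` returns 0.5: two of the four predictions match.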
--- a/opencompass/datasets/calm/evaluation/accuracy/open-ended.py
+++ b/opencompass/datasets/calm/evaluation/accuracy/open-ended.py
@ -0,0 +1,31 @@
+import jieba
+from rouge import Rouge
+
+
+def is_chinese(text):
+    for char in text:
+        if '\u4e00' <= char <= '\u9fff':
+            return True
+    return False
+
+
+def compute_acc(gt_list, pred_list):
+    rouge_l = 0
+    rouge = Rouge()
+
+    for pred, gold in zip(pred_list, gt_list):
+        if is_chinese(pred):
+            # Segment Chinese text with jieba so ROUGE can match on words.
+            prediction = ' '.join(jieba.cut(pred))
+            gold = ' '.join(jieba.cut(gold))
+        else:
+            prediction = pred
+
+        try:
+            scores = rouge.get_scores(prediction, gold)
+            # Accumulate ROUGE-L recall ('r'), not precision or F1.
+            rouge_l += scores[0]['rouge-l']['r']
+        except Exception as e:
+            print(f'compute rouge_l error occurred: {e}')
+            continue
+    avg_rougel = rouge_l / len(gt_list)
+    return avg_rougel
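The open-ended metric accumulates the `'r'` field of ROUGE-L, i.e. recall: the length of the longest common subsequence (LCS) between prediction and reference tokens, divided by the number of reference tokens. The dependency-free sketch below computes that quantity on whitespace-tokenized text, which is what the jieba step produces for Chinese. The helper names are hypothetical, and the sketch does no case or punctuation normalization, which the `rouge` package may handle differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_recall(prediction, reference):
    """ROUGE-L recall: LCS length over reference length (whitespace tokens)."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not ref_tokens:
        return 0.0
    return lcs_length(pred_tokens, ref_tokens) / len(ref_tokens)
```

For instance, `"the cat on mat"` against `"the cat sat on the mat"` shares the subsequence *the cat on mat*, giving 4/6. Since only recall is used, a long prediction that happens to contain the reference words scores well, so this metric rewards coverage rather than conciseness.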
--- a/opencompass/datasets/calm/evaluation/accuracy/prob.py
+++ b/opencompass/datasets/calm/evaluation/accuracy/prob.py
@ -0,0 +1,9 @@
+def compute_acc(gt_list, pred_list):
+    correct_num = 0
+    for pred, gold in zip(pred_list, gt_list):
+        kept_pred = round(pred, 4) if (pred is not None) else pred
+        kept_gold = round(gold, 4)
+        if kept_pred == kept_gold:
+            correct_num += 1
+    acc = correct_num / len(gt_list)
+    return acc
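The probability metric compares values rounded to four decimals, so a numeric prediction must first be recovered from a response such as `{"PROB": "0.5187"}`. The labeling module that does this is not shown here; the sketch below is a hypothetical stand-in (`extract_prob` and `prob_matches` are illustrative names) that pulls the last `PROB` value out of a JSON-like answer and applies the same rounding comparison as `prob.py`.

```python
import re

# Matches {"PROB": "0.1234"}, {"PROB":"0.1234"} and
# {"ANSWER": "Yes", "PROB": "0.1234"}; allows a leading minus sign.
PROB_PATTERN = re.compile(r'\{[^{}]*"PROB"\s*:\s*"(-?[0-9]*\.?[0-9]+)"[^{}]*\}')


def extract_prob(response):
    """Return the last PROB value in the response as a float, or None."""
    matches = PROB_PATTERN.findall(response)
    if not matches:
        return None
    return float(matches[-1])


def prob_matches(pred, gold):
    """Same rule as prob.py: equal after rounding both to 4 decimals."""
    return pred is not None and round(pred, 4) == round(gold, 4)
```

Taking the last match is an assumption aimed at chain-of-thought answers that may mention intermediate JSON before the final one. Note that exact equality after rounding gives no partial credit: 0.5186 against a gold value of 0.5187 counts as wrong.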
--- a/opencompass/datasets/calm/evaluation/core_metrics.py
+++ b/opencompass/datasets/calm/evaluation/core_metrics.py
@ -0,0 +1,320 @@
+# flake8: noqa: E501
+import importlib
+import json
+from pathlib import Path
+
+task_to_accuracy_module_map = {
+    # association/
+    # correlation/
+    'CORR-B_correlation_CN': 'choice',
+    'CORR-B_correlation_EN': 'choice',
+    # explaining_away_effect/
+    'EAE-B_exp-away_CN': 'choice',
+    'EAE-B_exp-away_EN': 'choice',
+    # causal_discovery/
+    # abstract_reasoning/
+    'AR-B_CaLM-AR_CN': 'choice',
+    'AR-B_CaLM-AR_EN': 'choice',
+    # causal_attribution/
+    'CA-B_FA_CN': 'choice',
+    'CA-B_FA_EN': 'choice',
+    'CA-B_FP_CN': 'choice',
+    'CA-B_FP_EN': 'choice',
+    # event_causality_identification/
+    'ECI-B_CTB_CN': 'choice',
+    'ECI-B_CTB_EN': 'choice',
+    'ECI-B_ESC_CN': 'choice',
+    'ECI-B_ESC_EN': 'choice',
+    'ECI-B_MAVEN-ERE_CN': 'choice',
+    'ECI-B_MAVEN-ERE_EN': 'choice',
+    # pairwise_causal_discovery/
+    'PCD-B_COPA_CN': 'choice',
+    'PCD-B_COPA_EN': 'choice',
+    'PCD-B_E-CARE_CN': 'choice',
+    'PCD-B_E-CARE_EN': 'choice',
+    'PCD-C_COPA_CN': 'choice',
+    'PCD-C_COPA_EN': 'choice',
+    'PCD-C_E-CARE_CN': 'choice',
+    'PCD-C_E-CARE_EN': 'choice',
+    # counterfactual/
+    # actual_causality/
+    'AC-B_causal_judgement_CN': 'choice',
+    'AC-B_causal_judgement_EN': 'choice',
+    # causal_explanation_generation/
+    'CEG-O_E-CARE_CN': 'open-ended',
+    'CEG-O_E-CARE_EN': 'open-ended',
+    # counterfactual_reasoning/
+    'CR-B_det-counterfactual_CN': 'choice',
+    'CR-B_det-counterfactual_EN': 'choice',
+    'CR-C_CRASS_CN': 'choice',
+    'CR-C_CRASS_EN': 'choice',
+    # effect_of_the_treatment_on_the_treated/
+    'ETT-B_ETT-natural_CN': 'choice',
+    'ETT-B_ETT-natural_EN': 'choice',
+    'ETT-P_ETT-basic_CN': 'prob',
+    'ETT-P_ETT-basic_EN': 'prob',
+    'ETT-P_ETT-hard_CN': 'prob',
+    'ETT-P_ETT-hard_EN': 'prob',
+    # natural_direct_effect/
+    'NDE-B_NDE-natural_CN': 'choice',
+    'NDE-B_NDE-natural_EN': 'choice',
+    'NDE-P_NDE-basic_CN': 'prob',
+    'NDE-P_NDE-basic_EN': 'prob',
+    'NDE-P_NDE-hard_CN': 'prob',
+    'NDE-P_NDE-hard_EN': 'prob',
+    # natural_indirect_effect/
+    'NIE-B_NIE-natural_CN': 'choice',
+    'NIE-B_NIE-natural_EN': 'choice',
+    'NIE-P_NIE-basic_CN': 'prob',
+    'NIE-P_NIE-basic_EN': 'prob',
+    'NIE-P_NIE-hard_CN': 'prob',
+    'NIE-P_NIE-hard_EN': 'prob',
+    # probability_of_necessity/
+    'PN-P_PN-basic_CN': 'prob',
+    'PN-P_PN-basic_EN': 'prob',
+    'PN-P_PN-hard_CN': 'prob',
+    'PN-P_PN-hard_EN': 'prob',
+    # probability_of_sufficiency/
+    'PS-P_PS-basic_CN': 'prob',
+    'PS-P_PS-basic_EN': 'prob',
+    'PS-P_PS-hard_CN': 'prob',
+    'PS-P_PS-hard_EN': 'prob',
+    # intervention/
+    # average_treatment_effect/
+    'ATE-B_ATE-natural_CN': 'choice',
+    'ATE-B_ATE-natural_EN': 'choice',
+    'ATE-P_ATE-basic_CN': 'prob',
+    'ATE-P_ATE-basic_EN': 'prob',
+    'ATE-P_ATE-hard_CN': 'prob',
+    'ATE-P_ATE-hard_EN': 'prob',
+    # backdoor_adjustment_set/
+    'BAS-B_backadj_CN': 'choice',
+    'BAS-B_backadj_EN': 'choice',
+    'BAS-C_max-BAS_CN': 'choice',
+    'BAS-C_max-BAS_EN': 'choice',
+    'BAS-C_min-BAS_CN': 'choice',
+    'BAS-C_min-BAS_EN': 'choice',
+    'BAS-C_mix-BAS_CN': 'choice',
+    'BAS-C_mix-BAS_EN': 'choice',
+    # causal_effect_identification/
+    'CEI-B_0.2-UC_CN': 'choice',
+    'CEI-B_0.2-UC_EN': 'choice',
+    'CEI-B_0.4-UC_CN': 'choice',
+    'CEI-B_0.4-UC_EN': 'choice',
+    'CEI-B_0.6-UC_CN': 'choice',
+    'CEI-B_0.6-UC_EN': 'choice',
+    'CEI-B_0.8-UC_CN': 'choice',
+    'CEI-B_0.8-UC_EN': 'choice',
+    # collider_bias/
+    'CB-B_collider-bias_CN': 'choice',
+    'CB-B_collider-bias_EN': 'choice',
+    # controlled_direct_effect/
+    'CDE-B_CDE-natural_CN': 'choice',
+    'CDE-B_CDE-natural_EN': 'choice',
+    'CDE-P_CDE-basic_CN': 'prob',
+    'CDE-P_CDE-basic_EN': 'prob',
+    'CDE-P_CDE-hard_CN': 'prob',
+    'CDE-P_CDE-hard_EN': 'prob',
+    # frontdoor_adjustment_set/
+    'FAS-C_FAS_CN': 'choice',
+    'FAS-C_FAS_EN': 'choice',
+    # instrumental_variable/
+    'IV-C_CaLM-IV_CN': 'choice',
+    'IV-C_CaLM-IV_EN': 'choice',
+}
+
+
+def initialize_core_metric_evaluation_components(task):
+    """Loads the labeling and accuracy functions dynamically based on the
+    specified task for core metric computation.
+
+    Parameters:
+    - task: The specific task to load functions for.
+
+    Returns:
+    - Tuple containing the ground truth labeling function, prediction labeling function,
+      and the accuracy function.
+
+    Raises:
+    - NotImplementedError: If no functions are found for the specified task.
+    """
+    task_to_labeling_module_map = {
+        # association/
+        # correlation/
+        'CORR-B_correlation_CN': 'CLADDER',
+        'CORR-B_correlation_EN': 'CLADDER',
+        # explaining_away_effect/
+        'EAE-B_exp-away_CN': 'CLADDER',
+        'EAE-B_exp-away_EN': 'CLADDER',
+        # causal_discovery/
+        # abstract_reasoning/
+        'AR-B_CaLM-AR_CN': 'AR-B_CaLM-AR',
+        'AR-B_CaLM-AR_EN': 'AR-B_CaLM-AR',
+        # causal_attribution/
+        'CA-B_FA_CN': 'CA-B_FA',
+        'CA-B_FA_EN': 'CA-B_FA',
+        'CA-B_FP_CN': 'CA-B_FP',
+        'CA-B_FP_EN': 'CA-B_FP',
+        # event_causality_identification/
+        'ECI-B_CTB_CN': 'ECI',
+        'ECI-B_CTB_EN': 'ECI',
+        'ECI-B_ESC_CN': 'ECI',
+        'ECI-B_ESC_EN': 'ECI',
+        'ECI-B_MAVEN-ERE_CN': 'ECI',
+        'ECI-B_MAVEN-ERE_EN': 'ECI',
+        # pairwise_causal_discovery/
+        'PCD-B_COPA_CN': 'PCD-B',
+        'PCD-B_COPA_EN': 'PCD-B',
+        'PCD-B_E-CARE_CN': 'PCD-B',
+        'PCD-B_E-CARE_EN': 'PCD-B',
+        'PCD-C_COPA_CN': 'PCD-C',
+        'PCD-C_COPA_EN': 'PCD-C',
+        'PCD-C_E-CARE_CN': 'PCD-C',
+        'PCD-C_E-CARE_EN': 'PCD-C',
+        # counterfactual/
+        # actual_causality/
+        'AC-B_causal_judgement_CN': 'AC-B_causal_judgement',
+        'AC-B_causal_judgement_EN': 'AC-B_causal_judgement',
+        # causal_explanation_generation/
+        'CEG-O_E-CARE_CN': 'CEG-O_E-CARE',
+        'CEG-O_E-CARE_EN': 'CEG-O_E-CARE',
+        # counterfactual_reasoning/
+        'CR-B_det-counterfactual_CN': 'CLADDER',
+        'CR-B_det-counterfactual_EN': 'CLADDER',
+        'CR-C_CRASS_CN': 'CR-C_CRASS',
+        'CR-C_CRASS_EN': 'CR-C_CRASS',
+        # effect_of_the_treatment_on_the_treated/
+        'ETT-B_ETT-natural_CN': 'Natural',
+        'ETT-B_ETT-natural_EN': 'Natural',
+        'ETT-P_ETT-basic_CN': 'Probability',
+        'ETT-P_ETT-basic_EN': 'Probability',
+        'ETT-P_ETT-hard_CN': 'Probability',
+        'ETT-P_ETT-hard_EN': 'Probability',
+        # natural_direct_effect/
+        'NDE-B_NDE-natural_CN': 'Natural',
+        'NDE-B_NDE-natural_EN': 'Natural',
+        'NDE-P_NDE-basic_CN': 'Probability',
+        'NDE-P_NDE-basic_EN': 'Probability',
+        'NDE-P_NDE-hard_CN': 'Probability',
+        'NDE-P_NDE-hard_EN': 'Probability',
+        # natural_indirect_effect/
+        'NIE-B_NIE-natural_CN': 'Natural',
+        'NIE-B_NIE-natural_EN': 'Natural',
+        'NIE-P_NIE-basic_CN': 'Probability',
+        'NIE-P_NIE-basic_EN': 'Probability',
+        'NIE-P_NIE-hard_CN': 'Probability',
+        'NIE-P_NIE-hard_EN': 'Probability',
+        # probability_of_necessity/
+        'PN-P_PN-basic_CN': 'Probability',
+        'PN-P_PN-basic_EN': 'Probability',
+        'PN-P_PN-hard_CN': 'Probability',
+        'PN-P_PN-hard_EN': 'Probability',
+        # probability_of_sufficiency/
+        'PS-P_PS-basic_CN': 'Probability',
+        'PS-P_PS-basic_EN': 'Probability',
+        'PS-P_PS-hard_CN': 'Probability',
+        'PS-P_PS-hard_EN': 'Probability',
+        # intervention/
+        # average_treatment_effect/
+        'ATE-B_ATE-natural_CN': 'Natural',
+        'ATE-B_ATE-natural_EN': 'Natural',
+        'ATE-P_ATE-basic_CN': 'Probability',
+        'ATE-P_ATE-basic_EN': 'Probability',
+        'ATE-P_ATE-hard_CN': 'Probability',
+        'ATE-P_ATE-hard_EN': 'Probability',
+        # backdoor_adjustment_set/
+        'BAS-B_backadj_CN': 'CLADDER',
+        'BAS-B_backadj_EN': 'CLADDER',
+        'BAS-C_max-BAS_CN': 'AS',
+        'BAS-C_max-BAS_EN': 'AS',
+        'BAS-C_min-BAS_CN': 'AS',
+        'BAS-C_min-BAS_EN': 'AS',
+        'BAS-C_mix-BAS_CN': 'AS',
+        'BAS-C_mix-BAS_EN': 'AS',
+        # causal_effect_identification/
+        'CEI-B_0.2-UC_CN': 'CEI-B',
+        'CEI-B_0.2-UC_EN': 'CEI-B',
+        'CEI-B_0.4-UC_CN': 'CEI-B',
+        'CEI-B_0.4-UC_EN': 'CEI-B',
+        'CEI-B_0.6-UC_CN': 'CEI-B',
+        'CEI-B_0.6-UC_EN': 'CEI-B',
+        'CEI-B_0.8-UC_CN': 'CEI-B',
+        'CEI-B_0.8-UC_EN': 'CEI-B',
+        # collider_bias/
+        'CB-B_collider-bias_CN': 'CLADDER',
+        'CB-B_collider-bias_EN': 'CLADDER',
+        # controlled_direct_effect/
+        'CDE-B_CDE-natural_CN': 'Natural',
+        'CDE-B_CDE-natural_EN': 'Natural',
+        'CDE-P_CDE-basic_CN': 'Probability',
+        'CDE-P_CDE-basic_EN': 'Probability',
+        'CDE-P_CDE-hard_CN': 'Probability',
+        'CDE-P_CDE-hard_EN': 'Probability',
+        # frontdoor_adjustment_set/
+        'FAS-C_FAS_CN': 'AS',
+        'FAS-C_FAS_EN': 'AS',
+        # instrumental_variable/
+        'IV-C_CaLM-IV_CN': 'AS',
+        'IV-C_CaLM-IV_EN': 'AS',
+    }
+
+    labeling_module_name = task_to_labeling_module_map.get(task)
+    if labeling_module_name:
+        labeling_module = importlib.import_module(
+            f'opencompass.datasets.calm.evaluation.labeling.{labeling_module_name}'
+        )
+        get_ground_truth_label = labeling_module.get_gt_label
+        get_predicted_label = labeling_module.get_pred_label
+    else:
+        raise NotImplementedError(
+            f'No labeling functions found for task {task}.')
+
+    accuracy_module_name = task_to_accuracy_module_map.get(task)
+    if accuracy_module_name:
+        accuracy_module = importlib.import_module(
+            f'opencompass.datasets.calm.evaluation.accuracy.{accuracy_module_name}'
+        )
+        get_accuracy = accuracy_module.compute_acc
+    else:
+        raise NotImplementedError(
+            f'No accuracy functions found for task {task}.')
+
+    return get_ground_truth_label, get_predicted_label, get_accuracy
+
+
+def compute_core_metrics(items, task, prompt_style, gt_items):
+    """Computes core metrics for a given set of items based on the ground truth
+    items.
+
+    Args:
+        items (list): The list of items generated by the model.
+        task (str): The task type.
+        prompt_style (str): The prompt style.
+        gt_items (list): The list of ground truth items.
+
+    Returns:
+        tuple: A tuple containing the computed core metrics dictionary and the list of predicted labels.
+
+    Raises:
+        AssertionError: If the number of items differs from the number of ground truth items.
+    """
+    core_metrics_dict = {}
+    get_gt_label, get_pred_label, compute_acc = initialize_core_metric_evaluation_components(
+        task)
+    gt_list, pred_list = [], []
+
+    # get labels
+    assert len(items) == len(
+        gt_items), 'Length mismatch between items and ground truth items.'
+    # Task-type prefix, e.g. 'PS' for 'PS-P_PS-basic_EN'.
+    task_type = task.split('-')[0]
+    for item, gt_item in zip(items, gt_items):
+        gt_label = get_gt_label(gt_item)
+        pred_label = get_pred_label(item, gt_item, prompt_style, task_type)
+        gt_list.append(gt_label)
+        pred_list.append(pred_label)
+
+    # compute metrics
+    core_metrics_dict['Accuracy'] = compute_acc(gt_list, pred_list)
+
+    return core_metrics_dict, pred_list
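`compute_core_metrics` resolves each task name to a labeling module and an accuracy module, then passes the task-type prefix (the text before the first `-`) to the prediction labeler. The in-process sketch below shows the same lookup-then-dispatch shape with plain function references instead of `importlib`; the registry names and the two inline metrics are hypothetical simplifications of `choice.py` and `prob.py`.

```python
from typing import Callable, Dict, List


def exact_match_acc(gt_list: List, pred_list: List) -> float:
    return sum(p == g for g, p in zip(gt_list, pred_list)) / len(gt_list)


def rounded_prob_acc(gt_list: List, pred_list: List) -> float:
    hits = sum(p is not None and round(p, 4) == round(g, 4)
               for g, p in zip(gt_list, pred_list))
    return hits / len(gt_list)


# Hypothetical registries mirroring task_to_accuracy_module_map.
ACCURACY_REGISTRY: Dict[str, Callable[[List, List], float]] = {
    'choice': exact_match_acc,
    'prob': rounded_prob_acc,
}
TASK_TO_ACCURACY = {
    'CORR-B_correlation_EN': 'choice',
    'PS-P_PS-basic_EN': 'prob',
}


def task_type(task: str) -> str:
    """Prefix before the first '-', e.g. 'CEI' for 'CEI-B_0.2-UC_EN'."""
    return task.split('-')[0]


def compute_accuracy(task: str, gt_list: List, pred_list: List) -> dict:
    name = TASK_TO_ACCURACY.get(task)
    if name is None:
        raise NotImplementedError(f'No accuracy functions found for task {task}.')
    return {'Accuracy': ACCURACY_REGISTRY[name](gt_list, pred_list)}
```

One reason to keep dynamic imports in the real code: `open-ended.py` imports `jieba` and `rouge` at module level, so loading accuracy modules only on demand avoids requiring those packages for tasks that never use them.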
--- a/opencompass/datasets/calm/evaluation/error/basic_adversarial/AC-B_causal_judgement.py
+++ b/opencompass/datasets/calm/evaluation/error/basic_adversarial/AC-B_causal_judgement.py
@ -0,0 +1,47 @@
+# flake8: noqa: E501
+import re
+
+
+def check_standalization(model_response, prompt_style, type):
+    if model_response.startswith(
+        ('no', '否', 'yes', '是', '- yes', '- 是', '- no', '- 否')):
+        return 0
+    else:
+        return 1
+
+
+def check_empty(model_response):
+    if model_response == '':
+        return 1
+    else:
+        return 0
+
+
+def check_repetition(model_response):
+    if any(response in model_response for response in [
+            'answer (yes or no ?)',
+            'question: how would a typical person answer each of the following questions about causation?',
+            '答案（是或否？）', '问题：对于以下关于因果关系的问题，一个普通人会怎么回答？'
+    ]):
+        return 1
+    else:
+        return 0
+
+
+def contains_chinese(text):
+    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
+    result = 1 if chinese_pattern.search(text) is not None else 0
+    return result
+
+
+def contains_english(text):
+    english_pattern = re.compile(r'[A-Za-z]{2,}')
+    result = 1 if english_pattern.search(text) is not None else 0
+
+    return result
+
+
+def check_abnormality(preds):
+    abnormalities = 'All Yes' if all(pred == 1 for pred in preds) else \
+                    'All No' if all(pred == 0 for pred in preds) else 0
+    return abnormalities
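The per-response checks above (`check_empty`, `check_standalization`, `check_repetition`) compare against lowercase patterns, which suggests responses are lowercased before being checked; that normalization, and the aggregation below, are assumptions rather than code shown in this PR. The hypothetical `summarize_errors` sketch tallies the three error kinds for a yes/no task.

```python
YES_NO_PREFIXES = ('no', '否', 'yes', '是')


def normalize(response):
    """Assumed preprocessing: trim whitespace and lowercase."""
    return response.strip().lower()


def summarize_errors(responses, prompt_echoes):
    """Count empty, non-standard and prompt-repeating responses."""
    summary = {'empty': 0, 'non_standard': 0, 'repetition': 0}
    for raw in responses:
        response = normalize(raw)
        if response == '':
            # Design choice: an empty answer is only counted as empty.
            summary['empty'] += 1
            continue
        if not response.startswith(YES_NO_PREFIXES):
            summary['non_standard'] += 1
        if any(echo in response for echo in prompt_echoes):
            summary['repetition'] += 1
    return summary
```

Unlike the original `check_standalization`, which also flags `''` as non-standard, this sketch counts an empty answer once. Prefix matching has a blind spot either way: `'not sure'` starts with `'no'` and `'yesterday'` with `'yes'`, so both pass as standard answers.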
--- a/opencompass/datasets/calm/evaluation/error/basic_adversarial/AR-B_CaLM-AR.py
+++ b/opencompass/datasets/calm/evaluation/error/basic_adversarial/AR-B_CaLM-AR.py
@ -0,0 +1,42 @@
+import re
+
+
+def check_standalization(model_response, prompt_style, type):
+    if model_response.startswith(('no', '否', 'yes', '是')):
+        return 0
+    else:
+        return 1
+
+
+def check_empty(model_response):
+    if model_response == '':
+        return 1
+    else:
+        return 0
+
+
+def check_repetition(model_response):
+    if any(response in model_response for response in
+           ['answer (yes or no ?)', 'input event: if', '输入信息：如果', '答案（是或否？）']):
+        return 1
+    else:
+        return 0
+
+
+def contains_chinese(text):
+    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
+    result = 1 if chinese_pattern.search(text) is not None else 0
+    return result
+
+
+def contains_english(text):
+    english_pattern = re.compile(r'[A-Za-z]{2,}')
+    result = 1 if english_pattern.search(text) is not None else 0
+
+    return result
+
+
+def check_abnormality(preds):
+    abnormalities = 'All Yes' if all(pred == 1 for pred in preds) else \
+                    'All No' if all(pred == 0 for pred in preds) else 0
+    return abnormalities
--- a/opencompass/datasets/calm/evaluation/error/basic_adversarial/AS.py
+++ b/opencompass/datasets/calm/evaluation/error/basic_adversarial/AS.py
@ -0,0 +1,49 @@
+# flake8: noqa: E501
+import re
+
+
+def check_standalization(model_response, prompt_style, type):
+    if model_response.startswith(
+        ('option 1', 'option 2', 'option 3', '选项一', '选项二', '选项三')):
+        return 0
+    else:
+        return 1
+
+
+def check_empty(model_response):
+    if model_response == '':
+        return 1
+    else:
+        return 0
+
+
+def check_repetition(model_response):
+    if any(response in model_response for response in [
+            'answer (option 1 or option 2 or option 3 ?)',
+            'you will be presented with a causal graph in the following form:',
+            '答案（选项一或选项二或选项三？）', '给定如下因果图'
+    ]):
+        return 1
+    else:
+        return 0
+
+
+def contains_chinese(text):
+    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
+    result = 1 if chinese_pattern.search(text) is not None else 0
+
+    return result
+
+
+def contains_english(text):
+    english_pattern = re.compile(r'[A-Za-z]{2,}')
+    result = 1 if english_pattern.search(text) is not None else 0
+
+    return result
+
+
+def check_abnormality(preds):
+    abnormalities = 'All option1' if all(pred == 1 for pred in preds) else \
+                    'All option2' if all(pred == 2 for pred in preds) else \
+                    'All option3' if all(pred == 3 for pred in preds) else 0
+    return abnormalities
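The `check_abnormality` variants differ only in their label set (Yes/No, or option 1 to 3 or 4), so the chained conditionals could be replaced by a single table-driven check. In the hypothetical `check_abnormality_generic` below, `label_names` maps each prediction value to the name used in the `'All ...'` message. One behavioral difference is deliberate: on an empty list the originals return their first message (because `all()` of nothing is `True`), while this sketch returns 0.

```python
def check_abnormality_generic(preds, label_names):
    """Return 'All <name>' if every prediction is the same known label, else 0."""
    if not preds:
        return 0
    first = preds[0]
    if first in label_names and all(pred == first for pred in preds):
        return f'All {label_names[first]}'
    return 0


YES_NO = {1: 'Yes', 0: 'No'}
THREE_OPTIONS = {1: 'option1', 2: 'option2', 3: 'option3'}
```

With `YES_NO`, `[1, 1, 1]` yields `'All Yes'`; with `THREE_OPTIONS`, `[2, 2]` yields `'All option2'`. A list made entirely of unparseable predictions (e.g. all `None`) returns 0, matching the originals.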
--- a/opencompass/datasets/calm/evaluation/error/basic_adversarial/CA-B.py
+++ b/opencompass/datasets/calm/evaluation/error/basic_adversarial/CA-B.py
@ -0,0 +1,45 @@
+import re
+
+
+def check_standalization(model_response, prompt_style, type):
+    if model_response.startswith(('no', '否', 'yes', '是')):
+        return 0
+    else:
+        return 1
+
+
+def check_empty(model_response):
+    if model_response == '':
+        return 1
+    else:
+        return 0
+
+
+def check_repetition(model_response):
+    if any(response in model_response for response in [
+            'answer (yes or no ?)',
+            'you will be presented with a causal graph in the following form:',
+            '答案（是或否？）', '给定如下因果图'
+    ]):
+        return 1
+    else:
+        return 0
+
+
+def contains_chinese(text):
+    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
+    result = 1 if chinese_pattern.search(text) is not None else 0
+    return result
+
+
+def contains_english(text):
+    english_pattern = re.compile(r'[A-Za-z]{2,}')
+    result = 1 if english_pattern.search(text) is not None else 0
+
+    return result
+
+
+def check_abnormality(preds):
+    abnormalities = 'All Yes' if all(pred == 1 for pred in preds) else \
+                    'All No' if all(pred == 0 for pred in preds) else 0
+    return abnormalities
--- a/opencompass/datasets/calm/evaluation/error/basic_adversarial/CEI-B.py
+++ b/opencompass/datasets/calm/evaluation/error/basic_adversarial/CEI-B.py
@ -0,0 +1,46 @@
+import re
+
+
+def check_standalization(model_response, prompt_style, type):
+    if model_response.startswith(('no', '否', 'yes', '是')):
+        return 0
+    else:
+        return 1
+
+
+def check_empty(model_response):
+    if model_response == '':
+        return 1
+    else:
+        return 0
+
+
+def check_repetition(model_response):
+    if any(response in model_response for response in [
+            'answer (yes or no ?)',
+            'you will be presented with a causal graph in the following form:',
+            '答案（是或否？）', '给定如下因果图'
+    ]):
+        return 1
+    else:
+        return 0
+
+
+def contains_chinese(text):
+    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
+    result = 1 if chinese_pattern.search(text) is not None else 0
+
+    return result
+
+
+def contains_english(text):
+    english_pattern = re.compile(r'[A-Za-z]{2,}')
+    result = 1 if english_pattern.search(text) is not None else 0
+
+    return result
+
+
+def check_abnormality(preds):
+    abnormalities = 'All Yes' if all(pred == 1 for pred in preds) else \
+                    'All No' if all(pred == 0 for pred in preds) else 0
+    return abnormalities
--- a/opencompass/datasets/calm/evaluation/error/basic_adversarial/CLADDER.py
+++ b/opencompass/datasets/calm/evaluation/error/basic_adversarial/CLADDER.py
@ -0,0 +1,43 @@
+import re
+
+
+def check_standalization(model_response, prompt_style, type):
+    if model_response.startswith(('no', '否', 'yes', '是')):
+        return 0
+    else:
+        return 1
+
+
+def check_empty(model_response):
+    if model_response == '':
+        return 1
+    else:
+        return 0
+
+
+def check_repetition(model_response):
+    if any(response in model_response for response in
+           ['answer (yes or no ?)', 'input info', '输入信息：', '答案（是或否？）']):
+        return 1
+    else:
+        return 0
+
+
+def contains_chinese(text):
+    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
+    result = 1 if chinese_pattern.search(text) is not None else 0
+
+    return result
+
+
+def contains_english(text):
+    english_pattern = re.compile(r'[A-Za-z]{2,}')
+    result = 1 if english_pattern.search(text) is not None else 0
+
+    return result
+
+
+def check_abnormality(preds):
+    abnormalities = 'All Yes' if all(pred == 1 for pred in preds) else \
+                    'All No' if all(pred == 0 for pred in preds) else 0
+    return abnormalities
--- a/opencompass/datasets/calm/evaluation/error/basic_adversarial/CR-C_CRASS.py
+++ b/opencompass/datasets/calm/evaluation/error/basic_adversarial/CR-C_CRASS.py
@ -0,0 +1,49 @@
+import re
+
+
+def check_standalization(model_response, prompt_style, type):
+    if model_response.startswith(
+        ('1', '2', '3', '4', 'option 1', 'option 2', 'option 3', 'option 4',
+         '选项一', '选项二', '选项三', '选项四')):
+        return 0
+    else:
+        return 1
+
+
+def check_empty(model_response):
+    if model_response == '':
+        return 1
+    else:
+        return 0
+
+
+def check_repetition(model_response):
+    if any(response in model_response for response in [
+            'answer (option 1 or 2 or 3 or 4?)', 'input event:',
+            '答案（选项一或选项二或选项三或选项四？）', '输入事件：'
+    ]):
+        return 1
+    else:
+        return 0
+
+
+def contains_chinese(text):
+    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
+    result = 1 if chinese_pattern.search(text) is not None else 0
+
+    return result
+
+
+def contains_english(text):
+    english_pattern = re.compile(r'[A-Za-z]{2,}')
+    result = 1 if english_pattern.search(text) is not None else 0
+
+    return result
+
+
+def check_abnormality(preds):
+    abnormalities = 'All option1' if all(pred == 1 for pred in preds) else \
+                    'All option2' if all(pred == 2 for pred in preds) else \
+                    'All option3' if all(pred == 3 for pred in preds) else \
+                    'All option4' if all(pred == 4 for pred in preds) else 0
+    return abnormalities
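The chained conditional in `check_abnormality` returns the option that every prediction agrees on, or 0 if the predictions differ. The sketch below is a more compact formulation that works for any label set. It relies on a set of distinct predictions. `same_response_label` and its `names` mapping are hypothetical names, not part of the module.

```python
def same_response_label(preds, names):
    """Return names[label] if every prediction equals one label, else 0.

    `names` maps a predicted label (e.g. 1..4 for CR-C_CRASS) to the string
    reported by check_abnormality, such as 'All option1'.
    """
    distinct = set(preds)
    if len(distinct) == 1:
        (label,) = distinct
        # Labels outside the mapping (e.g. -1 for unparseable answers) are
        # not reported, matching the chained conditional.
        return names.get(label, 0)
    return 0


crass_names = {k: f'All option{k}' for k in range(1, 5)}
print(same_response_label([2, 2, 2], crass_names))  # All option2
print(same_response_label([1, 3, 2], crass_names))  # 0
```

The two formulations differ on one input. For an empty `preds` list, the original returns 'All option1', because `all()` over an empty iterable is `True`. The sketch returns 0 instead.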
--- a/opencompass/datasets/calm/evaluation/error/basic_adversarial/ECI.py
+++ b/opencompass/datasets/calm/evaluation/error/basic_adversarial/ECI.py
@ -0,0 +1,46 @@
+import re
+
+
+def check_standalization(model_response, prompt_style, type):
+    if model_response.startswith(('no', '否', 'yes', '是')):
+        return 0
+    else:
+        return 1
+
+
+def check_empty(model_response):
+    if model_response == '':
+        return 1
+    else:
+        return 0
+
+
+def check_repetition(model_response):
+    if any(response in model_response for response in [
+            'answer (yes or no ?)',
+            'you will be presented with a causal graph in the following form:',
+            '答案（是或否？）', '给定如下因果图'
+    ]):
+        return 1
+    else:
+        return 0
+
+
+def contains_chinese(text):
+    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
+    result = 1 if chinese_pattern.search(text) is not None else 0
+
+    return result
+
+
+def contains_english(text):
+    english_pattern = re.compile(r'[A-Za-z]{2,}')
+    result = 1 if english_pattern.search(text) is not None else 0
+
+    return result
+
+
+def check_abnormality(preds):
+    abnormalities = 'All Yes' if all(pred == 1 for pred in preds) else \
+                    'All No' if all(pred == 0 for pred in preds) else 0
+    return abnormalities
--- a/opencompass/datasets/calm/evaluation/error/basic_adversarial/Natural.py
+++ b/opencompass/datasets/calm/evaluation/error/basic_adversarial/Natural.py
@ -0,0 +1,49 @@
+# flake8: noqa: E501
+import re
+
+
+def check_standalization(model_response, prompt_style, type):
+    # A standard response is a JSON object that starts with the "answer" key.
+    if model_response.startswith('{"answer":') and model_response.endswith('}'):
+        return 0
+    else:
+        return 1
+
+
+def check_empty(model_response):
+    if model_response == '':
+        return 1
+    else:
+        return 0
+
+
+def check_repetition(model_response):
+    if any(response in model_response for response in [
+            'input info: imagine a self-contained',
+            "provide the calculation result to four decimal places and a final \"yes\" or \"no\" answer in json format",
+            '输入信息：设想一个', '请根据上述信息，给出计算结果（答案保留四位小数）'
+    ]):
+        return 1
+    else:
+        return 0
+
+
+def contains_chinese(text):
+    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
+    result = 1 if chinese_pattern.search(text) is not None else 0
+    return result
+
+
+def contains_english(text):
+    # Taking the 'fake' and 'random' modes into account, the shortest run of English
+    # characters in an 'answer' is 6 letters long, so only runs of 7+ letters count.
+    english_pattern = re.compile(r'[A-Za-z]{7,}')
+    result = 1 if english_pattern.search(text) is not None else 0
+
+    return result
+
+
+def check_abnormality(preds):
+    abnormalities = 'All Yes' if all(pred == 'yes' or pred == '是' for pred in preds) else \
+                    'All No' if all(pred == 'no' or pred == '否' for pred in preds) else 0
+    return abnormalities
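The `check_standalization` function here only tests whether the response starts with `{"answer":` and ends with `}`. A truncated object such as `{"answer": }` therefore still counts as standard. A stricter alternative would parse the response with the standard-library `json` module. This is an assumption for illustration, not what CaLM does, and `is_standard_json_answer` is a hypothetical helper.

```python
import json


def is_standard_json_answer(response, key='answer'):
    """Hypothetical stricter check: the whole response must parse as a JSON
    object that contains `key`."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and key in parsed


print(is_standard_json_answer('{"answer": "yes"}'))          # True
print(is_standard_json_answer('{"answer": }'))               # False: invalid JSON
print(is_standard_json_answer('{"prob": 0.5}', key='prob'))  # True
```

`get_item_error` lowercases responses before these checks run, so any key passed to such a helper should also be lowercase.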
--- a/opencompass/datasets/calm/evaluation/error/basic_adversarial/PCD-B.py
+++ b/opencompass/datasets/calm/evaluation/error/basic_adversarial/PCD-B.py
@ -0,0 +1,42 @@
+import re
+
+
+def check_standalization(model_response, prompt_style, type):
+    if model_response.startswith(('no', '否', 'yes', '是')):
+        return 0
+    else:
+        return 1
+
+
+def check_empty(model_response):
+    if model_response == '':
+        return 1
+    else:
+        return 0
+
+
+def check_repetition(model_response):
+    if any(response in model_response for response in
+           ['answer (yes or no ?)', 'event a:', '答案（是或否？）', '事件一：']):
+        return 1
+    else:
+        return 0
+
+
+def contains_chinese(text):
+    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
+    result = 1 if chinese_pattern.search(text) is not None else 0
+    return result
+
+
+def contains_english(text):
+    english_pattern = re.compile(r'[A-Za-z]{2,}')
+    result = 1 if english_pattern.search(text) is not None else 0
+
+    return result
+
+
+def check_abnormality(preds):
+    abnormalities = 'All Yes' if all(pred == 1 for pred in preds) else \
+                    'All No' if all(pred == 0 for pred in preds) else 0
+    return abnormalities
--- a/opencompass/datasets/calm/evaluation/error/basic_adversarial/PCD-C.py
+++ b/opencompass/datasets/calm/evaluation/error/basic_adversarial/PCD-C.py
@ -0,0 +1,44 @@
+import re
+
+
+def check_standalization(model_response, prompt_style, type):
+    if model_response.startswith(('option 1', 'option 2', '选项一', '选项二')):
+        return 0
+    else:
+        return 1
+
+
+def check_empty(model_response):
+    if model_response == '':
+        return 1
+    else:
+        return 0
+
+
+def check_repetition(model_response):
+    if any(response in model_response for response in [
+            'answer (option 1 or option 2 ?)', 'input event:', '答案（选项一或选项二？）',
+            '输入事件：'
+    ]):
+        return 1
+    else:
+        return 0
+
+
+def contains_chinese(text):
+    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
+    result = 1 if chinese_pattern.search(text) is not None else 0
+    return result
+
+
+def contains_english(text):
+    english_pattern = re.compile(r'[A-Za-z]{2,}')
+    result = 1 if english_pattern.search(text) is not None else 0
+
+    return result
+
+
+def check_abnormality(preds):
+    abnormalities = 'All option1' if all(pred == 0 for pred in preds) else \
+                    'All option2' if all(pred == 1 for pred in preds) else 0
+    return abnormalities
--- a/opencompass/datasets/calm/evaluation/error/basic_adversarial/Probability.py
+++ b/opencompass/datasets/calm/evaluation/error/basic_adversarial/Probability.py
@ -0,0 +1,63 @@
+# flake8: noqa: E501
+import re
+
+
+def check_standalization(model_response, prompt_style, type):
+    # Effect-estimation tasks (NIE/NDE/ETT/CDE/ATE) expect a JSON object
+    # keyed by "answer"; PN/PS tasks expect one keyed by "prob".
+    if any(match in type for match in ['NIE', 'NDE', 'ETT', 'CDE', 'ATE']):
+        expected_prefix = '{"answer":'
+    elif any(match in type for match in ['PN', 'PS']):
+        expected_prefix = '{"prob":'
+    else:
+        # Unknown task type: flag the response as non-standard.
+        return 1
+    if model_response.startswith(expected_prefix) and model_response.endswith('}'):
+        return 0
+    return 1
+
+
+def check_empty(model_response):
+    if model_response == '':
+        return 1
+    else:
+        return 0
+
+
+def check_repetition(model_response):
+    if any(response in model_response for response in [
+            'input info: imagine a self-contained',
+            'provide the calculation result to four decimal places',
+            '输入信息：设想一个', '请根据上述信息，给出计算结果（答案保留四位小数）'
+    ]):
+        return 1
+    else:
+        return 0
+
+
+def contains_chinese(text):
+    chinese_pattern = re.compile(r'[\u4e00-\u9fff]+')
+    result = 1 if chinese_pattern.search(text) is not None else 0
+
+    return result
+
+
+def contains_english(text):
+    english_pattern = re.compile(r'[A-Za-z]{7,}')
+    # Taking the 'fake' and 'random' modes into account, the shortest run of
+    # English characters in an 'answer' is 6 letters long, so only runs of 7
+    # or more letters are counted as English (i.e. as a language
+    # inconsistency in Chinese tasks).
+    result = 1 if english_pattern.search(text) is not None else 0
+
+    return result
+
+
+def check_abnormality(preds):
+    if not preds:
+        return 0
+    # 0.1234 is the example value given in the prompt for probability
+    # computation; echoing it for every question counts as an abnormality.
+    abnormalities = 'All Yes' if all(pred == 0.1234 for pred in preds) else \
+                    'All No' if all(pred == 0 for pred in preds) else 0
+    return abnormalities
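You can check the effect of the `{7,}` threshold in `contains_english` directly. With this threshold, a 6-letter English word is never flagged. An example is `answer`, the JSON key that `check_standalization` expects. Longer English words are flagged. The standalone illustration below uses only `re`.

```python
import re

# Same pattern as contains_english above: runs of 7 or more ASCII letters.
english_pattern = re.compile(r'[A-Za-z]{7,}')

samples = ['{"answer": "是"}', '{"prob": 0.1234}', '{"answer": "是"} because']
for text in samples:
    # Prints 0 for the first two samples and 1 for the third ("because").
    print(text, '->', int(english_pattern.search(text) is not None))
```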
--- a/opencompass/datasets/calm/evaluation/errors.py
+++ b/opencompass/datasets/calm/evaluation/errors.py
@ -0,0 +1,253 @@
+# flake8: noqa: E501
+import importlib
+import json
+import os
+from pathlib import Path
+
+from ..evaluation.core_metrics import \
+    initialize_core_metric_evaluation_components
+
+
+def initialize_error_identification_components(task, prompt_style):
+    """Initialize error identification components.
+
+    Args:
+        task (str): The task for which error identification components are being initialized.
+        prompt_style (str): The style of prompt for error identification.
+
+    Returns:
+        Module: The error identification module corresponding to the provided task and prompt style.
+    """
+    prompt_style_to_error_module_map = {
+        'basic': 'basic_adversarial',
+        'basic-CN': 'basic_adversarial',
+        'adversarial-ignore': 'basic_adversarial',
+        'adversarial-ignore-CN': 'basic_adversarial',
+        'adversarial-doubt': 'basic_adversarial',
+        'adversarial-doubt-CN': 'basic_adversarial',
+        'zero-shot-IcL': 'icl',
+        'zero-shot-IcL-CN': 'icl',
+        'one-shot-IcL': 'icl',
+        'one-shot-IcL-CN': 'icl',
+        'three-shot-IcL': 'icl',
+        'three-shot-IcL-CN': 'icl',
+        'zero-shot-CoT': 'cot',
+        'zero-shot-CoT-CN': 'cot',
+        'manual-CoT': 'cot',
+        'manual-CoT-CN': 'cot'
+    }
+    task_to_error_module_map = {
+        # association/
+        # correlation/
+        'CORR-B_correlation_CN': 'CLADDER',
+        'CORR-B_correlation_EN': 'CLADDER',
+        # explaining_away_effect/
+        'EAE-B_exp-away_CN': 'CLADDER',
+        'EAE-B_exp-away_EN': 'CLADDER',
+        # causal_discovery/
+        # abstract_reasoning/
+        'AR-B_CaLM-AR_CN': 'AR-B_CaLM-AR',
+        'AR-B_CaLM-AR_EN': 'AR-B_CaLM-AR',
+        # causal_attribution/
+        'CA-B_FA_CN': 'CA-B',
+        'CA-B_FA_EN': 'CA-B',
+        'CA-B_FP_CN': 'CA-B',
+        'CA-B_FP_EN': 'CA-B',
+        # event_causality_identification/
+        'ECI-B_CTB_CN': 'ECI',
+        'ECI-B_CTB_EN': 'ECI',
+        'ECI-B_ESC_CN': 'ECI',
+        'ECI-B_ESC_EN': 'ECI',
+        'ECI-B_MAVEN-ERE_CN': 'ECI',
+        'ECI-B_MAVEN-ERE_EN': 'ECI',
+        # pairwise_causal_discovery/
+        'PCD-B_COPA_CN': 'PCD-B',
+        'PCD-B_COPA_EN': 'PCD-B',
+        'PCD-B_E-CARE_CN': 'PCD-B',
+        'PCD-B_E-CARE_EN': 'PCD-B',
+        'PCD-C_COPA_CN': 'PCD-C',
+        'PCD-C_COPA_EN': 'PCD-C',
+        'PCD-C_E-CARE_CN': 'PCD-C',
+        'PCD-C_E-CARE_EN': 'PCD-C',
+        # counterfactual/
+        # actual_causality/
+        'AC-B_causal_judgement_CN': 'AC-B_causal_judgement',
+        'AC-B_causal_judgement_EN': 'AC-B_causal_judgement',
+        # counterfactual_reasoning/
+        'CR-B_det-counterfactual_CN': 'CLADDER',
+        'CR-B_det-counterfactual_EN': 'CLADDER',
+        'CR-C_CRASS_CN': 'CR-C_CRASS',
+        'CR-C_CRASS_EN': 'CR-C_CRASS',
+        # effect_of_the_treatment_on_the_treated/
+        'ETT-B_ETT-natural_CN': 'Natural',
+        'ETT-B_ETT-natural_EN': 'Natural',
+        'ETT-P_ETT-basic_CN': 'Probability',
+        'ETT-P_ETT-basic_EN': 'Probability',
+        'ETT-P_ETT-hard_CN': 'Probability',
+        'ETT-P_ETT-hard_EN': 'Probability',
+        # natural_direct_effect/
+        'NDE-B_NDE-natural_CN': 'Natural',
+        'NDE-B_NDE-natural_EN': 'Natural',
+        'NDE-P_NDE-basic_CN': 'Probability',
+        'NDE-P_NDE-basic_EN': 'Probability',
+        'NDE-P_NDE-hard_CN': 'Probability',
+        'NDE-P_NDE-hard_EN': 'Probability',
+        # natural_indirect_effect/
+        'NIE-B_NIE-natural_CN': 'Natural',
+        'NIE-B_NIE-natural_EN': 'Natural',
+        'NIE-P_NIE-basic_CN': 'Probability',
+        'NIE-P_NIE-basic_EN': 'Probability',
+        'NIE-P_NIE-hard_CN': 'Probability',
+        'NIE-P_NIE-hard_EN': 'Probability',
+        # probability_of_necessity/
+        'PN-P_PN-basic_CN': 'Probability',
+        'PN-P_PN-basic_EN': 'Probability',
+        'PN-P_PN-hard_CN': 'Probability',
+        'PN-P_PN-hard_EN': 'Probability',
+        # probability_of_sufficiency/
+        'PS-P_PS-basic_CN': 'Probability',
+        'PS-P_PS-basic_EN': 'Probability',
+        'PS-P_PS-hard_CN': 'Probability',
+        'PS-P_PS-hard_EN': 'Probability',
+        # intervention/
+        # average_treatment_effect/
+        'ATE-B_ATE-natural_CN': 'Natural',
+        'ATE-B_ATE-natural_EN': 'Natural',
+        'ATE-P_ATE-basic_CN': 'Probability',
+        'ATE-P_ATE-basic_EN': 'Probability',
+        'ATE-P_ATE-hard_CN': 'Probability',
+        'ATE-P_ATE-hard_EN': 'Probability',
+        # backdoor_adjustment_set/
+        'BAS-B_backadj_CN': 'CLADDER',
+        'BAS-B_backadj_EN': 'CLADDER',
+        'BAS-C_max-BAS_CN': 'AS',
+        'BAS-C_max-BAS_EN': 'AS',
+        'BAS-C_min-BAS_CN': 'AS',
+        'BAS-C_min-BAS_EN': 'AS',
+        'BAS-C_mix-BAS_CN': 'AS',
+        'BAS-C_mix-BAS_EN': 'AS',
+        # causal_effect_identification/
+        'CEI-B_0.2-UC_CN': 'CEI-B',
+        'CEI-B_0.2-UC_EN': 'CEI-B',
+        'CEI-B_0.4-UC_CN': 'CEI-B',
+        'CEI-B_0.4-UC_EN': 'CEI-B',
+        'CEI-B_0.6-UC_CN': 'CEI-B',
+        'CEI-B_0.6-UC_EN': 'CEI-B',
+        'CEI-B_0.8-UC_CN': 'CEI-B',
+        'CEI-B_0.8-UC_EN': 'CEI-B',
+        # collider_bias/
+        'CB-B_collider-bias_CN': 'CLADDER',
+        'CB-B_collider-bias_EN': 'CLADDER',
+        # controlled_direct_effect/
+        'CDE-B_CDE-natural_CN': 'Natural',
+        'CDE-B_CDE-natural_EN': 'Natural',
+        'CDE-P_CDE-basic_CN': 'Probability',
+        'CDE-P_CDE-basic_EN': 'Probability',
+        'CDE-P_CDE-hard_CN': 'Probability',
+        'CDE-P_CDE-hard_EN': 'Probability',
+        # frontdoor_adjustment_set/
+        'FAS-C_FAS_CN': 'AS',
+        'FAS-C_FAS_EN': 'AS',
+        # instrumental_variable/
+        'IV-C_CaLM-IV_CN': 'AS',
+        'IV-C_CaLM-IV_EN': 'AS',
+    }
+
+    error_task_module_name = task_to_error_module_map.get(task)
+    error_prompt_module_name = prompt_style_to_error_module_map.get(
+        prompt_style)
+
+    if error_task_module_name and error_prompt_module_name:
+        error_module = importlib.import_module(
+            f'opencompass.datasets.calm.evaluation.error.{error_prompt_module_name}.{error_task_module_name}'
+        )
+        return error_module
+    else:
+        raise NotImplementedError(
+            f'No error identification module found for task {task} and prompt style {prompt_style}.'
+        )
+
+
+def identify_model_errors(items, task, prompt_style, gt_items):
+    """Identify errors in model responses based on provided items, task, and
+    prompt style.
+
+    Args:
+        items (list): A list of model responses (strings).
+        task (str): The task name. CEG-O_E-CARE is not supported for error analysis.
+        prompt_style (str): The prompt style. explicit-function prompts are not supported for error analysis.
+        gt_items (list): A list of ground-truth items.
+
+    Returns:
+        dict: Error metrics over all responses: a 0/1 flag for the same response to all questions, plus the fractions of responses showing language inconsistency, limited instruction-following, repetition, and empty output. Returns None for unsupported tasks or prompt styles.
+    """
+    if task == 'CEG-O_E-CARE' or prompt_style in [
+            'explicit-function', 'explicit-function-CN'
+    ]:
+        print(
+            'CEG-O_E-CARE and explicit-function prompts are not supported for error identification.'
+        )
+        return
+
+    language_error, nonstandrad, repetition, empty = 0., 0., 0., 0.
+    error_module = initialize_error_identification_components(
+        task, prompt_style)
+    get_gt_label, get_pred_label, compute_acc = initialize_core_metric_evaluation_components(
+        task)
+    pred_list = []
+
+    for item, gt_item in zip(items, gt_items):
+        pred_label = get_pred_label(item, gt_item, prompt_style,
+                                    task.split('-')[0])
+        pred_error = get_item_error(item, task, error_module, prompt_style)
+
+        pred_list.append(pred_label)
+        language_error += pred_error['language_error']
+        nonstandrad += pred_error['nonstandrad']
+        repetition += pred_error['repetition']
+        empty += pred_error['empty']
+
+    abnormalities = error_module.check_abnormality(pred_list)
+
+    return {
+        'Same response to all questions': 1 if abnormalities != 0 else 0,
+        'Language inconsistency': language_error / len(pred_list),
+        'Limitation of instruction-following': nonstandrad / len(pred_list),
+        'Repetition': repetition / len(pred_list),
+        'Empty response': empty / len(pred_list),
+    }
+
+
+def get_item_error(model_response, task, error_module, prompt_style):
+    """Analyze errors in a single model response for a given task and prompt
+    style.
+
+    Args:
+        model_response (str): The model's response to analyze.
+        task (str): The task type.
+        error_module: The error module containing error identification methods.
+        prompt_style (str): The style of prompt used.
+
+    Returns:
+        dict: A dictionary containing error metrics for the model response. (Language inconsistency, nonstandardization, repetition, empty response.)
+    """
+    model_response = model_response.strip().lower()
+    if 'CN' in task:
+        language_error = error_module.contains_english(model_response)
+    else:
+        language_error = error_module.contains_chinese(model_response)
+
+    nonstandrad = error_module.check_standalization(model_response,
+                                                    prompt_style,
+                                                    type=task.split('-')[0])
+
+    repetition = error_module.check_repetition(model_response)
+
+    empty = error_module.check_empty(model_response)
+
+    return {
+        'language_error': language_error,
+        'nonstandrad': nonstandrad,
+        'repetition': repetition,
+        'empty': empty,
+    }
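`identify_model_errors` turns the per-item 0/1 flags from `get_item_error` into rates by dividing each sum by the number of responses. It also reports a single 0/1 flag for "same response to all questions". The standalone sketch below shows that aggregation, assuming the per-item dicts are already computed. `aggregate_error_rates` is a hypothetical name. Unlike the code above, the sketch raises a clear error on an empty list instead of dividing by zero.

```python
def aggregate_error_rates(item_errors, abnormality):
    """Hypothetical sketch of the aggregation in identify_model_errors.

    item_errors: list of dicts with 0/1 values for the keys 'language_error',
        'nonstandrad' (spelled as in get_item_error), 'repetition', 'empty'.
    abnormality: the value returned by check_abnormality (0 means none).
    """
    n = len(item_errors)
    if n == 0:
        raise ValueError('no model responses to analyse')

    def rate(key):
        # Fraction of responses whose flag for `key` is set.
        return sum(e[key] for e in item_errors) / n

    return {
        'Same response to all questions': 1 if abnormality != 0 else 0,
        'Language inconsistency': rate('language_error'),
        'Limitation of instruction-following': rate('nonstandrad'),
        'Repetition': rate('repetition'),
        'Empty response': rate('empty'),
    }


flags = [
    {'language_error': 0, 'nonstandrad': 1, 'repetition': 0, 'empty': 0},
    {'language_error': 1, 'nonstandrad': 1, 'repetition': 0, 'empty': 0},
    {'language_error': 0, 'nonstandrad': 0, 'repetition': 0, 'empty': 0},
    {'language_error': 0, 'nonstandrad': 0, 'repetition': 1, 'empty': 0},
]
print(aggregate_error_rates(flags, abnormality=0))
# language 0.25, instruction-following 0.5, repetition 0.25, empty 0.0
```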
--- a/opencompass/datasets/calm/evaluation/labeling/AC-B_causal_judgement.py
+++ b/opencompass/datasets/calm/evaluation/labeling/AC-B_causal_judgement.py
@ -0,0 +1,57 @@
+# flake8: noqa: E501
+from .common_answers import (common_false_list, common_start_false_dict,
+                             common_start_true_dict, common_true_list)
+
+
+def get_gt_label(item):
+    if item['gt_answer'] == 'Yes':
+        return 1
+    elif item['gt_answer'] == 'No':
+        return 0
+    raise ValueError(f"Unexpected gt_answer: {item['gt_answer']!r}")
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+    low_index = len(model_response)
+    start_str1_dict = common_start_true_dict
+    start_str2_dict = common_start_false_dict
+
+    start_option1_list, start_option2_list = [], []
+    # Some models echo part of the question in their response. Preprocessing
+    # usually strips it, but depending on the response format a tail of the
+    # question can survive. To tolerate this, every suffix of each known
+    # answer prefix that is at least as long as its dict key is accepted as
+    # a valid start of the response.
+    for key in start_str1_dict.keys():
+        for str1 in start_str1_dict[key]:
+            for i in range(key, len(str1) + 1):
+                start_option1_list.append(str1[-i:])
+    for key in start_str2_dict.keys():
+        for str2 in start_str2_dict[key]:
+            for i in range(key, len(str2) + 1):
+                start_option2_list.append(str2[-i:])
+
+    inner_option1_list = common_true_list
+    inner_option2_list = common_false_list
+    if ('- yes' in model_response and '- no' in model_response) \
+            or ('- 是' in model_response and '- 否' in model_response):
+        label = -1
+    elif model_response.startswith(tuple(start_option1_list)):
+        label = 1
+    elif model_response.startswith(tuple(start_option2_list)):
+        label = 0
+    elif any(
+            model_response.find(option) > -1 and
+        (low_index := min(low_index, model_response.find(option))) > -1
+            for option in inner_option1_list):
+        label = 1
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option2_list):
+            label = 0
+    elif any(response in model_response for response in inner_option2_list):
+        label = 0
+    else:
+        return -1
+    return label
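The nested loops that build `start_option1_list` and `start_option2_list` accept every suffix of each known answer prefix whose length is at least the dict key. As a result, a response that still carries the tail of the question is recognized. The standalone sketch below reproduces that expansion. `expand_suffixes` and `prefix_dict` are hypothetical, and the data is illustrative, not the contents of `common_answers`.

```python
def expand_suffixes(prefix_dict):
    """For each (min_len, prefixes) entry, collect every suffix of every
    prefix whose length lies between min_len and the full prefix length."""
    options = []
    for min_len, prefixes in prefix_dict.items():
        for prefix in prefixes:
            for i in range(min_len, len(prefix) + 1):
                options.append(prefix[-i:])
    return options


# Hypothetical entry: accept 'yes' plus any longer tail of 'answer: yes'.
prefix_dict = {3: ['answer: yes']}
options = expand_suffixes(prefix_dict)
print(options)
# ['yes', ' yes', ': yes', 'r: yes', 'er: yes', 'wer: yes', 'swer: yes',
#  'nswer: yes', 'answer: yes']

response = 'wer: yes, the event was caused by x'
print(response.startswith(tuple(options)))  # True
```

Passing a tuple to `str.startswith` tests all candidates in one call, which is also how the labeling code uses the generated list.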
--- a/opencompass/datasets/calm/evaluation/labeling/AR-B_CaLM-AR.py
+++ b/opencompass/datasets/calm/evaluation/labeling/AR-B_CaLM-AR.py
@ -0,0 +1,47 @@
+# flake8: noqa: E501
+from .common_answers import (common_false_list, common_start_false_dict,
+                             common_start_true_dict, common_true_list)
+
+
+def get_gt_label(item):
+    return item['gt_answer']
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+    low_index = len(model_response)
+    start_str1_dict = common_start_true_dict
+    start_str2_dict = common_start_false_dict
+    start_option1_list, start_option2_list = [], []
+    # Some models echo part of the question in their response. Preprocessing
+    # usually strips it, but depending on the response format a tail of the
+    # question can survive. To tolerate this, every suffix of each known
+    # answer prefix that is at least as long as its dict key is accepted as
+    # a valid start of the response.
+    for key1, key2 in zip(start_str1_dict.keys(), start_str2_dict.keys()):
+        for str1, str2 in zip(start_str1_dict[key1], start_str2_dict[key2]):
+            for i in range(key1, len(str1) + 1):
+                start_option1_list.append(str1[-i:])
+            for i in range(key2, len(str2) + 1):
+                start_option2_list.append(str2[-i:])
+
+    inner_option1_list = common_true_list
+    inner_option2_list = common_false_list
+    if model_response.startswith(tuple(start_option1_list)):
+        label = 1
+    elif model_response.startswith(tuple(start_option2_list)):
+        label = 0
+    elif any(model_response.find(option)>-1 and (low_index := min(low_index, model_response.find(option))) > -1 for option in inner_option1_list) \
+            or 'yes' in model_response and ('causes' in model_response or 'does cause' in model_response) \
+            or '是' in model_response and '会导致' in model_response:
+        label = 1
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option2_list):
+            label = 0
+    elif any(response in model_response for response in inner_option2_list) \
+            or '否' in model_response and '不会导致' in model_response:
+        label = 0
+    else:
+        return -1
+    return label
--- a/opencompass/datasets/calm/evaluation/labeling/AS.py
+++ b/opencompass/datasets/calm/evaluation/labeling/AS.py
@ -0,0 +1,115 @@
+# flake8: noqa: E501
+from .common_answers import (common_option_1_list, common_option_2_list,
+                             common_option_3_list, common_start_op1_dict,
+                             common_start_op2_dict, common_start_op3_dict)
+
+
+def get_gt_label(item):
+    return int(item['gt_answer'])
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+    low_index = len(model_response)
+    Answer1 = item['option1'].strip().lower()
+    Answer2 = item['option2'].strip().lower()
+    Answer3 = item['option3'].strip().lower()
+    start_str1_dict = {
+        **common_start_op1_dict,
+        len(Answer1) - 1: [
+            f'答案（选项一或选项二或选项三？）：{Answer1[:-1]}',
+            f'答案（选项一或选项二或选项三？）： {Answer1[:-1]}',
+            f'answer (option 1 or 2 or 3?):{Answer1[:-1]}',
+            f'answer (option 1 or 2 or 3?): {Answer1[:-1]}'
+        ]
+    }
+    start_str2_dict = {
+        **common_start_op2_dict,
+        len(Answer2) - 1: [
+            f'答案（选项一或选项二或选项三？）：{Answer2[:-1]}',
+            f'答案（选项一或选项二或选项三？）： {Answer2[:-1]}',
+            f'answer (option 1 or 2 or 3?):{Answer2[:-1]}',
+            f'answer (option 1 or 2 or 3?): {Answer2[:-1]}'
+        ]
+    }
+    start_str3_dict = {
+        **common_start_op3_dict,
+        len(Answer3) - 1: [
+            f'答案（选项一或选项二或选项三？）：{Answer3[:-1]}',
+            f'答案（选项一或选项二或选项三？）： {Answer3[:-1]}',
+            f'answer (option 1 or 2 or 3?):{Answer3[:-1]}',
+            f'answer (option 1 or 2 or 3?): {Answer3[:-1]}'
+        ]
+    }
+
+    start_option1_list, start_option2_list, start_option3_list = [], [], []
+    # Some models echo part of the question in their response. Preprocessing
+    # usually strips it, but a tail of the question can survive, so every
+    # suffix of each known answer prefix that is at least as long as its dict
+    # key is accepted as a valid start of the response.
+    for key1, key2, key3 in zip(start_str1_dict.keys(), start_str2_dict.keys(),
+                                start_str3_dict.keys()):
+        for str1, str2, str3 in zip(start_str1_dict[key1],
+                                    start_str2_dict[key2],
+                                    start_str3_dict[key3]):
+            for i in range(key1, len(str1) + 1):
+                start_option1_list.append(str1[-i:])
+            for i in range(key2, len(str2) + 1):
+                start_option2_list.append(str2[-i:])
+            for i in range(key3, len(str3) + 1):
+                start_option3_list.append(str3[-i:])
+
+    inner_option1_list = [
+        'answer (option 1 or 2 or 3 ?): {}'.format(Answer1[:-1]),
+        '(option 1 or 2 or 3?): {}'.format(Answer1[:-1])
+    ] + common_option_1_list
+    inner_option2_list = [
+        'answer (option 1 or 2 or 3 ?): {}'.format(Answer2[:-1]),
+        '(option 1 or 2 or 3?): {}'.format(Answer2[:-1])
+    ] + common_option_2_list
+    inner_option3_list = [
+        'answer (option 1 or 2 or 3 ?): {}'.format(Answer3[:-1]),
+        '(option 1 or 2 or 3?): {}'.format(Answer3[:-1])
+    ] + common_option_3_list
+
+    if any(option in model_response for option in ['选项一或选项二','选项二或选项三','option 1 or option 2', 'option 2 or option 3']) \
+        or 'option 1' in model_response and 'option 2' in model_response and 'option 3' in model_response \
+        or '选项一' in model_response and '选项二' in model_response and '选项三' in model_response \
+        or len(model_response) == 0:
+        return -1
+    elif model_response.startswith(tuple(start_option1_list)) \
+        or model_response == Answer1 \
+        or len(Answer1) > 1 and len(model_response) > 0 and (model_response in Answer1):
+        label = 1
+    elif model_response.startswith(tuple(start_option2_list)) \
+        or model_response == Answer2 \
+        or len(Answer2) > 1 and len(model_response) > 0 and (model_response in Answer2):
+        label = 2
+    elif model_response.startswith(tuple(start_option3_list)) \
+        or model_response == Answer3 \
+        or len(Answer3) > 1 and len(model_response) > 0 and (model_response in Answer3):
+        label = 3
+    elif any(model_response.find(option)>-1 and (low_index:=min(low_index, model_response.find(option)))>-1 for option in inner_option1_list)\
+        or '正确答案' in model_response and ('选项一' in model_response):
+        label = 1
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option2_list):
+            label = 2
+            if any(option in model_response
+                   and model_response.find(option) < low_index
+                   for option in inner_option3_list):
+                label = 3
+    elif any(model_response.find(option)>-1 and (low_index:=min(low_index, model_response.find(option)))>-1 for option in inner_option2_list)\
+        or '正确答案' in model_response and ('选项二' in model_response):
+        label = 2
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option3_list):
+            label = 3
+    elif any(model_response.find(option) > -1 for option in inner_option3_list)\
+        or '正确答案' in model_response and ('选项三' in model_response):
+        label = 3
+    else:
+        return -1
+    return label
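In the fallback branches, the labeler searches for option markers anywhere in the response and prefers the one that appears first. Because `any()` stops at the first marker in list order that is present, `low_index` records that marker's position rather than the minimum position over all markers. The sketch below implements a strict earliest-occurrence rule instead. The marker lists and the `earliest_option` helper are hypothetical.

```python
def earliest_option(response, markers_by_label):
    """Return the label whose marker occurs earliest in `response`, or -1.

    markers_by_label: dict mapping a label (e.g. 1, 2, 3) to the marker
    strings that indicate it.
    """
    best_label, best_pos = -1, len(response) + 1
    for label, markers in markers_by_label.items():
        for marker in markers:
            pos = response.find(marker)
            # Keep the marker with the smallest position seen so far.
            if pos != -1 and pos < best_pos:
                best_label, best_pos = label, pos
    return best_label


# Hypothetical markers, in the spirit of common_option_*_list.
markers = {1: ['option 1', '选项一'], 2: ['option 2', '选项二'],
           3: ['option 3', '选项三']}
print(earliest_option('the answer is option 2, not option 3', markers))  # 2
print(earliest_option('no option is given', markers))                    # -1
```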
--- a/opencompass/datasets/calm/evaluation/labeling/CA-B_FA.py
+++ b/opencompass/datasets/calm/evaluation/labeling/CA-B_FA.py
@ -0,0 +1,46 @@
+# flake8: noqa: E501
+from .common_answers import (common_false_list, common_start_false_dict,
+                             common_start_true_dict, common_true_list)
+
+
+def get_gt_label(item):
+    return item['gt_answer']
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+    low_index = len(model_response)
+    start_str1_dict = common_start_true_dict
+    start_str2_dict = common_start_false_dict
+
+    start_option1_list, start_option2_list = [], []
+    # Some models echo part of the question in their response; preprocessing usually strips it, but a tail of the question can survive, so every suffix of each known answer prefix that is at least as long as its dict key is accepted as a valid start of the response.
+    for key1, key2 in zip(start_str1_dict.keys(), start_str2_dict.keys()):
+        for str1, str2 in zip(start_str1_dict[key1], start_str2_dict[key2]):
+            for i in range(key1, len(str1) + 1):
+                start_option1_list.append(str1[-i:])
+            for i in range(key2, len(str2) + 1):
+                start_option2_list.append(str2[-i:])
+
+    inner_option1_list = [
+        'serves as the parent node of', 'serves as a parent node of'
+    ] + common_true_list
+    inner_option2_list = common_false_list
+    if model_response.startswith(tuple(start_option1_list)):
+        label = 1
+    elif model_response.startswith(tuple(start_option2_list)):
+        label = 0
+    elif any(model_response.find(option) > -1 and (low_index := min(low_index, model_response.find(option))) > -1 for option in inner_option1_list) \
+        or ('yes' in model_response and any(phrase in model_response for phrase in ('is the ancestor of', 'is an ancestor of', 'serves as the ancestor node of', 'serves as an ancestor node of'))) \
+        or ('是' in model_response and '祖先节点' in model_response):
+        label = 1
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option2_list):
+            label = 0
+    elif any(response in model_response for response in inner_option2_list) \
+        or ('不是' in model_response and '祖先节点' in model_response):
+        label = 0
+    else:
+        return -1
+    return label
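Every yes/no labeler here begins by expanding the start dictionaries into a flat list of accepted prefixes. A minimal sketch of that expansion follows. It assumes, from the `range(key, len(str1) + 1)` loop, that each key of `common_start_true_dict` is a minimum suffix length and each value a list of full answer prefixes; `SAMPLE_START_TRUE` and `expand_suffixes` are hypothetical names, since `common_answers.py` is not part of this excerpt.

```python
# Sketch of the suffix expansion performed at the top of get_pred_label.
# SAMPLE_START_TRUE is hypothetical: the real common_start_true_dict lives in
# common_answers.py (not shown). The loop over range(key, len(s) + 1) implies
# that each key is a minimum suffix length and each value a list of prefixes.
SAMPLE_START_TRUE = {
    3: ['answer (yes or no?): yes'],
}


def expand_suffixes(start_dict):
    """Return every suffix of each prefix that is at least `key` chars long."""
    options = []
    for min_len, prefixes in start_dict.items():
        for prefix in prefixes:
            for i in range(min_len, len(prefix) + 1):
                options.append(prefix[-i:])
    return options


start_options = tuple(expand_suffixes(SAMPLE_START_TRUE))

# str.startswith accepts a tuple and succeeds if any element matches.
print('yes, because a causes b'.startswith(start_options))  # True: 'yes'
print('no?): yes, it does'.startswith(start_options))       # True: echoed question tail
print('nope'.startswith(start_options))                     # False
```

Because `str.startswith` takes a tuple, the whole list is tested in one call. One side effect to keep in mind: with a minimum length of 3, the bare suffix `yes` also accepts responses such as `yesterday ...`, so the real minimum lengths in `common_answers.py` decide how permissive this step is.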
--- a/opencompass/datasets/calm/evaluation/labeling/CA-B_FP.py
+++ b/opencompass/datasets/calm/evaluation/labeling/CA-B_FP.py
@ -0,0 +1,49 @@
+# flake8: noqa: E501
+from .common_answers import (common_false_list, common_start_false_dict,
+                             common_start_true_dict, common_true_list)
+
+
+def get_gt_label(item):
+    return item['gt_answer']
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+    low_index = len(model_response)
+    start_str1_dict = common_start_true_dict
+    start_str2_dict = common_start_false_dict
+
+    start_option1_list, start_option2_list = [], []
+    # Some models echo part of the question in their response. We usually
+    # strip the question during preprocessing, but depending on the response
+    # format some of it can survive, so we also match every suffix (at least
+    # `key` characters long) of the known answer prefixes.
+    for key1, key2 in zip(start_str1_dict.keys(), start_str2_dict.keys()):
+        for str1, str2 in zip(start_str1_dict[key1], start_str2_dict[key2]):
+            for i in range(key1, len(str1) + 1):
+                start_option1_list.append(str1[-i:])
+            for i in range(key2, len(str2) + 1):
+                start_option2_list.append(str2[-i:])
+
+    inner_option1_list = [
+        'serves as the parent node of', 'serves as a parent node of'
+    ] + common_true_list
+    inner_option2_list = common_false_list
+    if model_response.startswith(tuple(start_option1_list)):
+        label = 1
+    elif model_response.startswith(tuple(start_option2_list)):
+        label = 0
+    elif any(model_response.find(option) > -1 and (low_index := min(low_index, model_response.find(option))) > -1 for option in inner_option1_list) \
+        or ('yes' in model_response and 'is the parent of' in model_response) \
+        or ('是' in model_response and '父节点' in model_response):
+        label = 1
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option2_list):
+            label = 0
+    elif any(response in model_response for response in inner_option2_list) \
+        or ('不是' in model_response and '父节点' in model_response):
+        label = 0
+    else:
+        return -1
+    return label
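The unparenthesized conditions in `CA-B_FA.py` and `CA-B_FP.py` rely on `and` binding tighter than `or`. The sketch below, which reduces the `inner_option1_list` clause to a single phrase, checks that the implicit grouping matches the explicit one. It also shows a consequence that is easy to miss: `'不是'` ("is not") contains `'是'` ("is"), so a negative Chinese answer such as `a不是b的父节点` ("a is not b's parent node") satisfies the positive `'是' ... '父节点'` clause, and the later `'不是'` branch is never reached for it. Whether such a response ends up labeled 0 then depends on the override check against `common_false_list`, which this excerpt does not show. `implicit` and `explicit` are illustrative names only.

```python
# Sketch: how Python groups the mixed `or`/`and` chain in the CA-B_FP
# condition (the inner_option1_list clause is reduced to one phrase here).
def implicit(resp):
    return ('serves as the parent node of' in resp
            or 'yes' in resp and 'is the parent of' in resp
            or '是' in resp and '父节点' in resp)


def explicit(resp):
    # Same expression with the grouping Python applies spelled out.
    return (('serves as the parent node of' in resp)
            or ('yes' in resp and 'is the parent of' in resp)
            or ('是' in resp and '父节点' in resp))


samples = {
    'yes': False,                              # 'yes' alone is not enough
    'yes, a is the parent of b': True,         # clause 2
    'a is the parent of b': False,             # clause 2 also needs 'yes'
    'a serves as the parent node of b': True,  # clause 1
    'a是b的父节点': True,                        # clause 3 ("a is b's parent node")
    'a不是b的父节点': True,                      # '不是' contains '是', so clause 3 fires
}
for text, expected in samples.items():
    assert implicit(text) == explicit(text) == expected
```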
--- a/opencompass/datasets/calm/evaluation/labeling/CEG-O_E-CARE.py
+++ b/opencompass/datasets/calm/evaluation/labeling/CEG-O_E-CARE.py
@ -0,0 +1,6 @@
+def get_gt_label(item):
+    return item['gt_answer']
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    return model_response
--- a/opencompass/datasets/calm/evaluation/labeling/CEI-B.py
+++ b/opencompass/datasets/calm/evaluation/labeling/CEI-B.py
@ -0,0 +1,93 @@
+# flake8: noqa: E501
+from .common_answers import (common_false_list, common_start_false_dict,
+                             common_start_true_dict, common_true_list)
+
+
+def get_gt_label(item):
+    return item['gt_answer']
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+
+    low_index = len(model_response)
+    start_str1_dict = common_start_true_dict
+    start_str2_dict = common_start_false_dict
+
+    start_option1_list, start_option2_list = [], []
+    # Some models echo part of the question in their response.
+    # We usually strip the question during preprocessing, but
+    # depending on the response format some of it can survive,
+    # so we also match every suffix (at least `key` characters
+    # long) of the known answer prefixes.
+    for key in start_str1_dict.keys():
+        for str1 in start_str1_dict[key]:
+            for i in range(key, len(str1) + 1):
+                start_option1_list.append(str1[-i:])
+    for key in start_str2_dict.keys():
+        for str2 in start_str2_dict[key]:
+            for i in range(key, len(str2) + 1):
+                start_option2_list.append(str2[-i:])
+
+    inner_option1_list = [
+        'can be identified', '可以被识别', '能被识别', 'answer (yes or no?): yes',
+        'answer is yes', "\"yes\"", 'answer: yes', 'answer is: yes',
+        'answer is:\n\nyes', 'answer is:\nyes', 'is identified.',
+        'can be identified', '可以被识别', '能被识别', '答案是:是', '答案是:\n\n是', '答案是:\n是',
+        '答案:是', '答案是是', "\"是\"", '是的', '答案为“是”', '答案是“是”', '可以识别', '答案：是',
+        '答案：可以', '答案：“是”', 'thus answering yes', 'henceforth; answering yes',
+        'by answering yes', 'answeristheyes', 'answer would be yes',
+        'answer (yes)', 'hence answering yes', 'hence my answer yes',
+        'answer would definitely become yes', 'answer remains yes',
+        "my answer was 'yes'", 'thus concludes our answer yes',
+        'must answer yes', "answer should be 'yes'", "answer remains 'yes'",
+        'henceforth answering yes', 'answer should be marked yes',
+        'answer comes out yes', "should answer 'yes",
+        'our answer should be yes', 'you should answer yes',
+        'concluding answer - yes', 'answer should indeed say yes',
+        'answer : yes', 'answer should also be yes', 'hence answering yes',
+        'the answer is trivially yes', 'answer:  yes', 'the answer is (yes)',
+        '答案应为“是”'
+    ] + common_true_list
+    inner_option2_list = [
+        'not identified', '不能被识别', '无法被识别', 'answer (yes or no?): no',
+        'answer is no', "\"no\"", 'answer: no', 'answer is: no',
+        'answer is:\n\nno', 'answer is:\nno', 'not identified', '不能被识别',
+        '无法被识别', '答案是:否', '答案是:\n\n否', '答案是:\n否', '答案:否', '答案是否', "\"否\"",
+        '回答是:否', '答案为“否”', '答案是“否”', '因果效应不可被识别', '答案：否', '答案：无法识别',
+        '不存在可识别的因果效应', "doesn't have a causal relationship",
+        'the correct answer should be no', 'answer would be no',
+        'hence answering no', "answering your query 'no'",
+        'therefore answering no', 'answer would be “no”', 'thus answering no',
+        'this answers no', 'thus, answering no', 'answer should also be no',
+        'answer would also turn out to be no', 'answer would have to be no',
+        'answer would be – no', 'thus answering “no”', 'answer = no',
+        'answer should be no', 'answer would definitely be no',
+        'answer would need to be no', 'answer would need to be marked no',
+        'hence why i answered “no', "hence answering 'no'",
+        'answer must necessarily remain no', 'answer should marked no',
+        'answer would most likely be no', 'answer would also be no',
+        'answer for now might have to be `no`', 'henceforth - answer no',
+        'answer could only be no', 'answer would also be no',
+        'henceforth answering “no', 'answer would be no', 'hence answering no',
+        'cannot be identified', 'answer (yes or no ?): no', '答案为“不”',
+        'henceforth answering no', '答案为:否', '答案应该是“否', '因果效应不可被'
+    ] + common_false_list
+    if model_response.startswith(tuple(start_option1_list)):
+        label = 1
+    elif model_response.startswith(tuple(start_option2_list)):
+        label = 0
+    elif any(
+            model_response.find(option) > -1 and
+        (low_index := min(low_index, model_response.find(option))) > -1
+            for option in inner_option1_list):
+        label = 1
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option2_list):
+            label = 0
+    elif any(response in model_response for response in inner_option2_list):
+        label = 0
+    else:
+        return -1
+    return label
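`CEI-B.py`, `CLADDER.py`, `ECI.py` and the CA-B labelers share the same tie-break: if a positive phrase is found, the label is 1 unless a negative phrase occurs earlier in the response. A minimal sketch follows, with hypothetical phrase lists and the start-of-response check left out. It reproduces the walrus-inside-`any()` pattern; as far as the code shows, `any()` stops at the first *listed* phrase it finds, so `low_index` holds that phrase's position rather than the position of the earliest positive phrase in the text. `earliest_index` is an illustrative alternative, not part of the original.

```python
# Sketch of the positive/negative tie-break shared by the yes/no labelers,
# ignoring the start-of-response check. Phrase lists are hypothetical.
YES_OPTIONS = ['can be identified', 'answer is yes']
NO_OPTIONS = ['not identified', 'answer is no']


def first_hit_index(response, options):
    """Mimic the walrus pattern: (matched, index of first *listed* hit)."""
    low_index = len(response)
    matched = any(
        response.find(option) > -1
        and (low_index := min(low_index, response.find(option))) > -1
        for option in options)
    return matched, low_index


def earliest_index(response, options):
    """Alternative: position of the earliest phrase in the text, or -1."""
    positions = [response.find(o) for o in options if o in response]
    return min(positions) if positions else -1


def label(response, yes_options=YES_OPTIONS, no_options=NO_OPTIONS):
    hit, low_index = first_hit_index(response, yes_options)
    if hit:
        # A negative phrase appearing before the positive one flips the label.
        if any(o in response and response.find(o) < low_index
               for o in no_options):
            return 0
        return 1
    if any(o in response for o in no_options):
        return 0
    return -1  # undecidable


print(label('the answer is yes'))                                              # 1
print(label('it is not identified, although some say it can be identified'))  # 0
print(first_hit_index('answer is yes, it can be identified', YES_OPTIONS))    # (True, 18)
print(earliest_index('answer is yes, it can be identified', YES_OPTIONS))     # 0
```

In the last two calls, `first_hit_index` reports position 18 (`can be identified`, listed first) although `answer is yes` starts at 0. A larger `low_index` makes the negative override more permissive, which may or may not be intended.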
--- a/opencompass/datasets/calm/evaluation/labeling/CLADDER.py
+++ b/opencompass/datasets/calm/evaluation/labeling/CLADDER.py
@ -0,0 +1,58 @@
+# flake8: noqa: E501
+from .common_answers import (common_false_list, common_start_false_dict,
+                             common_start_true_dict, common_true_list)
+
+
+def get_gt_label(item):
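+    # Map the textual ground truth to the binary label used by get_pred_label.
+    gt_labels = {'yes': 1, 'no': 0}
+    if item['gt_answer'] not in gt_labels:
+        raise ValueError(f"unexpected gt_answer: {item['gt_answer']!r}")
+    return gt_labels[item['gt_answer']]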
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+    low_index = len(model_response)
+    start_str1_dict = common_start_true_dict
+    start_str2_dict = common_start_false_dict
+
+    start_option1_list, start_option2_list = [], []
+    # Some models echo part of the question in their response.
+    # We usually strip the question during preprocessing, but
+    # depending on the response format some of it can survive,
+    # so we also match every suffix (at least `key` characters
+    # long) of the known answer prefixes.
+    for key in start_str1_dict.keys():
+        for str1 in start_str1_dict[key]:
+            for i in range(key, len(str1) + 1):
+                start_option1_list.append(str1[-i:])
+    for key in start_str2_dict.keys():
+        for str2 in start_str2_dict[key]:
+            for i in range(key, len(str2) + 1):
+                start_option2_list.append(str2[-i:])
+
+    inner_option1_list = ['method 1 is more correct', '使用方法1更准确'
+                          ] + common_true_list
+    inner_option2_list = [
+        'method 2 is more correct', 'method 2 is correct',
+        'correct to use method 2', '方法2比方法1更准确', '方法2'
+    ] + common_false_list
+    if model_response.startswith(tuple(start_option1_list)):
+        label = 1
+    elif model_response.startswith(tuple(start_option2_list)):
+        label = 0
+    elif any(
+            model_response.find(option) > -1 and
+        (low_index := min(low_index, model_response.find(option))) > -1
+            for option in inner_option1_list):
+        label = 1
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option2_list):
+            label = 0
+    elif any(response in model_response for response in inner_option2_list):
+        label = 0
+    else:
+        return -1
+    return label
--- a/opencompass/datasets/calm/evaluation/labeling/CR-C_CRASS.py
+++ b/opencompass/datasets/calm/evaluation/labeling/CR-C_CRASS.py
@ -0,0 +1,152 @@
+# flake8: noqa: E501
+from .common_answers import (common_option_1_list, common_option_2_list,
+                             common_option_3_list, common_option_4_list,
+                             common_start_op1_dict, common_start_op2_dict,
+                             common_start_op3_dict, common_start_op4_dict)
+
+
+def get_gt_label(item):
+    return int(item['gt_answer'])
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+    low_index = len(model_response)
+    Answer1 = item['Answer1'].strip().lower()
+    Answer2 = item['Answer2'].strip().lower()
+    Answer3 = item['Answer3'].strip().lower()
+    Answer4 = item['Answer4'].strip().lower()
+
+    start_str1_dict = {
+        **common_start_op1_dict,
+        len(Answer1) - 1: [
+            f'答案（选项一或选项二或选项三或选项四？）：{Answer1[:-1]}',
+            f'答案（选项一或选项二或选项三或选项四？）： {Answer1[:-1]}',
+            f'answer (option 1 or 2 or 3 or 4?):{Answer1[:-1]}',
+            f'answer (option 1 or 2 or 3 or 4?): {Answer1[:-1]}'
+        ]
+    }
+    start_str2_dict = {
+        **common_start_op2_dict,
+        len(Answer2) - 1: [
+            f'答案（选项一或选项二或选项三或选项四？）：{Answer2[:-1]}',
+            f'答案（选项一或选项二或选项三或选项四？）： {Answer2[:-1]}',
+            f'answer (option 1 or 2 or 3 or 4?):{Answer2[:-1]}',
+            f'answer (option 1 or 2 or 3 or 4?): {Answer2[:-1]}'
+        ]
+    }
+    start_str3_dict = {
+        **common_start_op3_dict,
+        len(Answer3) - 1: [
+            f'答案（选项一或选项二或选项三或选项四？）：{Answer3[:-1]}',
+            f'答案（选项一或选项二或选项三或选项四？）： {Answer3[:-1]}',
+            f'answer (option 1 or 2 or 3 or 4?):{Answer3[:-1]}',
+            f'answer (option 1 or 2 or 3 or 4?): {Answer3[:-1]}'
+        ]
+    }
+    start_str4_dict = {
+        **common_start_op4_dict,
+        len(Answer4) - 1: [
+            f'答案（选项一或选项二或选项三或选项四？）：{Answer4[:-1]}',
+            f'答案（选项一或选项二或选项三或选项四？）： {Answer4[:-1]}',
+            f'answer (option 1 or 2 or 3 or 4?):{Answer4[:-1]}',
+            f'answer (option 1 or 2 or 3 or 4?): {Answer4[:-1]}'
+        ]
+    }
+
+    start_option1_list, start_option2_list, start_option3_list, start_option4_list = [], [], [], []
+    # Some models echo part of the question in their response. We usually strip the question during preprocessing, but depending on the response format some of it can survive, so we also match every suffix (at least `key` characters long) of the known answer prefixes.
+    for key1, key2, key3, key4 in zip(start_str1_dict.keys(),
+                                      start_str2_dict.keys(),
+                                      start_str3_dict.keys(),
+                                      start_str4_dict.keys()):
+        for str1, str2, str3, str4 in zip(start_str1_dict[key1],
+                                          start_str2_dict[key2],
+                                          start_str3_dict[key3],
+                                          start_str4_dict[key4]):
+            for i in range(key1, len(str1) + 1):
+                start_option1_list.append(str1[-i:])
+            for i in range(key2, len(str2) + 1):
+                start_option2_list.append(str2[-i:])
+            for i in range(key3, len(str3) + 1):
+                start_option3_list.append(str3[-i:])
+            for i in range(key4, len(str4) + 1):
+                start_option4_list.append(str4[-i:])
+
+    inner_option1_list = [
+        'answer (option 1 or 2 or 3 or 4 ?): {}'.format(Answer1[:-1]),
+        '(option 1 or 2 or 3 or 4?): {}'.format(Answer1[:-1])
+    ] + common_option_1_list
+    inner_option2_list = [
+        'answer (option 1 or 2 or 3 or 4 ?): {}'.format(Answer2[:-1]),
+        '(option 1 or 2 or 3 or 4?): {}'.format(Answer2[:-1])
+    ] + common_option_2_list
+    inner_option3_list = [
+        'answer (option 1 or 2 or 3 or 4 ?): {}'.format(Answer3[:-1]),
+        '(option 1 or 2 or 3 or 4?): {}'.format(Answer3[:-1])
+    ] + common_option_3_list
+    inner_option4_list = [
+        'answer (option 1 or 2 or 3 or 4 ?): {}'.format(Answer4[:-1]),
+        '(option 1 or 2 or 3 or 4?): {}'.format(Answer4[:-1])
+    ] + common_option_4_list
+
+    if any(option in model_response for option in ['选项一或选项二', '选项三或选项四']) \
+        or all(option in model_response for option in ['选项一', '选项二', '选项三', '选项四']):
+        return -1
+    elif model_response.startswith(tuple(start_option1_list)) \
+        or model_response == Answer1 \
+        or (len(Answer1) > 1 and len(model_response) > 0 and (model_response in Answer1 or Answer1 in model_response)):
+        label = 1
+    elif model_response.startswith(tuple(start_option2_list)) \
+        or model_response == Answer2 \
+        or (len(Answer2) > 1 and len(model_response) > 0 and (model_response in Answer2 or Answer2 in model_response)):
+        label = 2
+    elif model_response.startswith(tuple(start_option3_list)) \
+        or model_response == Answer3 \
+        or (len(Answer3) > 1 and len(model_response) > 0 and (model_response in Answer3 or Answer3 in model_response)):
+        label = 3
+    elif model_response.startswith(tuple(start_option4_list)) \
+        or model_response == Answer4 \
+        or (len(Answer4) > 1 and len(model_response) > 0 and (model_response in Answer4 or Answer4 in model_response)):
+        label = 4
+    elif any(
+            model_response.find(option) > -1 and
+        (low_index := min(low_index, model_response.find(option))) > -1
+            for option in inner_option1_list):
+        label = 1
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option2_list):
+            label = 2
+            if any(option in model_response
+                   and model_response.find(option) < low_index
+                   for option in inner_option3_list):
+                label = 3
+                if any(option in model_response
+                       and model_response.find(option) < low_index
+                       for option in inner_option4_list):
+                    label = 4
+    elif any(
+            model_response.find(option) > -1 for option in inner_option2_list):
+        label = 2
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option3_list):
+            label = 3
+            if any(option in model_response
+                   and model_response.find(option) < low_index
+                   for option in inner_option4_list):
+                label = 4
+    elif any(
+            model_response.find(option) > -1 for option in inner_option3_list):
+        label = 3
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option4_list):
+            label = 4
+    elif any(
+            model_response.find(option) > -1 for option in inner_option4_list):
+        label = 4
+    else:
+        return -1
+    return label
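After the start-string checks, `CR-C_CRASS.py` also accepts a response that equals an option's text, or that contains or is contained in it. The sketch below generalizes that step to any number of options; `match_option_text` and the sample `ANSWERS` are hypothetical, and the start-string matching is omitted.

```python
# Sketch of the direct text-match step in CR-C_CRASS get_pred_label,
# generalized to any number of options and ignoring the start-string check.
def match_option_text(response, answers):
    """Return the 1-based index of the first option whose text equals,
    contains, or is contained in the response; -1 if none does."""
    response = response.strip().lower()
    for idx, answer in enumerate(answers, start=1):
        answer = answer.strip().lower()
        if response == answer:
            return idx
        # Mirrors `len(AnswerN) > 1 and len(model_response) > 0 and (...)`.
        if len(answer) > 1 and response and (response in answer
                                             or answer in response):
            return idx
    return -1


# Hypothetical CRASS-style options.
ANSWERS = ['The plant would grow.', 'The plant would die.',
           'That is not possible.', "I don't know."]

print(match_option_text('I think the plant would die.', ANSWERS))  # 2
print(match_option_text('the plant would', ANSWERS))               # 1
print(match_option_text('maybe', ANSWERS))                         # -1
```

Options are tried in order, so a truncated response such as `the plant would` resolves to option 1 simply because it is a substring of the first option. The original has the same order sensitivity, since it checks all of option 1's conditions before option 2's.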
--- a/opencompass/datasets/calm/evaluation/labeling/ECI.py
+++ b/opencompass/datasets/calm/evaluation/labeling/ECI.py
@ -0,0 +1,200 @@
+# flake8: noqa: E501
+from .common_answers import (common_false_list, common_start_false_dict,
+                             common_start_true_dict, common_true_list)
+
+
+def get_gt_label(item):
+    return item['gt_answer']
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+
+    low_index = len(model_response)
+    start_str1_dict = common_start_true_dict
+    start_str2_dict = common_start_false_dict
+
+    start_option1_list, start_option2_list = [], []
+    # Some models echo part of the question in their response. We usually strip the question during preprocessing, but depending on the response format some of it can survive, so we also match every suffix (at least `key` characters long) of the known answer prefixes.
+    for key in start_str1_dict.keys():
+        for str1 in start_str1_dict[key]:
+            for i in range(key, len(str1) + 1):
+                start_option1_list.append(str1[-i:])
+    for key in start_str2_dict.keys():
+        for str2 in start_str2_dict[key]:
+            for i in range(key, len(str2) + 1):
+                start_option2_list.append(str2[-i:])
+
+    inner_option1_list = [
+        'is a causal relationship', '答案为“会', '答案是「是',
+        'is an independent causal factor', '构成因果关系', '之间存在因果', '是一个因果关系',
+        '体现了因果关系', "因果关系是\"是\"", '因果关系为“是”', '存在一个因果关系', '是因果链关系', '有着因果关系',
+        '具有因果', '是明显的因果关系', '具有“因果”关系', '存在了因果关系', "存在\"因果关系", '存在因果关系',
+        '答案是真', '因此是“是', '答案为“有', '答案是是', '答案为“(是', "答案为\"\"是", '答案是:“是',
+        "答案应为:\"是", "答案应为\"是", '答案为:是', "答案是:\"是", "答案应该是\"是", '答案为“yes',
+        '具有因果关系', '答案是 “是', '答案“是', '答案必须是“yes', '答案处为“是', '答案应是“是', '答案為“是',
+        '答案可以是“是', '答案的是“是', '答案为「是', "案为“\"是", "答案为 \"是", '答案是有', '答案是： 是',
+        '答案为：是', '答案是对', '答案是：是', '答案是：\n是', '答案应为“是', '答案：是', '答案应该是“是',
+        '答案(是或否？)：是的', "答案\"是\"", "答案都是\"是", '答案为是', '答案为 “是', "答案为\"是",
+        '答案为“是', "答案是\"是\"", '答案是“是', '答案是“是”', 'answer (yes or no?): yes',
+        'answer is yes', "\"yes\"", 'answer: yes', 'answer is: yes',
+        'answer is:\n\nyes', 'answer is:\nyes', 'is a causal relationship',
+        '答案为“是”', 'answer (yes )', 'answering your query, yes',
+        'there is a direct causal relationship', 'directly resulted in',
+        'there does appear to be a causal relationship',
+        'there is no explicit statement of a direct causal relationship between',
+        'there is no clear causal relationship', 'leads to', 'are caused by',
+        'yes, there is a suggested causal relationship',
+        'there could be a causal relationship',
+        'there is a potential causal relationship between', 'the result of',
+        'thus yes', '答案是：\nye', 'so yes', 'henceforth answering yes',
+        'hence yes', 'answer (yes)', 'in short, yes', 'hence - yes',
+        'correct response should be yes', 'thus, yes', 'in short - yes',
+        '答案是：\n\nyes', 'leading us to believe yes!', 'hence answering yes',
+        'therefore should read - yes', 'hence y es', 'therefore, yes',
+        'therefore yes', 'the correct response should be marked as “yes.',
+        'thus - yes', '因果关系是明确的', '答案是肯定的',
+        'there is a direct cause-and-effect relationship between',
+        'there is a cause-and-effect relationship between',
+        'there is a clear causal relationship between',
+        'there is a direct relationship with',
+        'there is a direct cause and effect relationship between',
+        'which implies a causal relationship between',
+        'has a causal relationship with', "the answer is \"yes.\"",
+        "therefore, the answer is \"yes,\"", 'answer (yes or no ? ) : yes',
+        'answe: yes', 'is a result of',
+        'this is a direct causal relationship between',
+        'there is a significant causal relationship',
+        'there is a clear cause and effect relationship',
+        'this would be a direct causal relationship',
+        'could be seen as a direct causal relationship between the two events',
+        'this causal relationship is direct',
+        'the causal relationship between the two events is therefore direct',
+        'this indicates a causal relationship',
+        'it is therefore a causal relationship',
+        'this is a direct causal relationship',
+        'this could be a causal relationship',
+        'the answer to this question is yes',
+        'refers to a cause-and-effect relationship',
+        "there's a direct causal relationship",
+        'the causal relationship is implied within',
+        "there's a causal relationship between",
+        'explicitly states a causal relationship',
+        'a causal relationship can be inferred',
+        'be a causal relationship between', 'the answer will be yes',
+        'answer should be yes', 'the answer could be yes',
+        'the answer for you is yes', 'this is the causal relationship between',
+        'indicates a direct causal relationship',
+        'should be a relationship between',
+        'definitely got a causal relationship', "thus answering 'yes'",
+        'thus answering yes', 'thereby answering yes',
+        'answer would thus be yes', "so answering 'yes'",
+        "hence answering 'yes'", "therefore answering 'yes",
+        'confirming our answer yes',
+        'an answer for this question would be yes', 'answer would be: yes',
+        'implying a yes answer', 'making the answer yes',
+        'incident does have a causal relationship',
+        'the cause and effect relationship exists',
+        'there is a direct cause relationship',
+        'must have a causal relationship', 'answer would be yes',
+        'a causal relationship exists between', 'answer(yes',
+        'answer for this question is yes', 'answer (yes',
+        'answer here is `yes`', 'answer might be yes', 'answer is a yes',
+        'the answer yes', 'henceforth – yes', 'thus indicating yes',
+        'hence indicating yes', "it's safe to say yes", "hence it's 'yes'",
+        "thus answering 'yes’", 'so it’s yes', 'thus it can be said yes',
+        'the correct response is yes', 'answering the question with a yes',
+        "the correct answer would be \"yes", "the answer is \"yes”",
+        "answer \"yes", 'the answer as yes', 'the answer to the question yes',
+        'the answer is causality', 'the answer is yes', "the answer is \"yes",
+        '答案是:是', '是因果关系', '有因果关系', '存在因果关', '因果关系存在'
+    ] + common_true_list
+    inner_option2_list = [
+        'there is no causal relationship', 'answer (yes or no?): no',
+        'answer is no', "\"no\"", 'answer: no', 'answer is: no',
+        'answer is:\n\nno', 'answer is:\nno',
+        'there is no causal relationship', '答案为“否”', 'answer (no )',
+        'answering your query - no', 'there is no direct causal',
+        'did not directly cause', 'not the sole or direct cause',
+        'not directly causally', 'is not definitively said to be caused by',
+        'the direction of causation is unclear',
+        'there is not necessarily a causal relationship between', '答案是：no',
+        'so no', 'henceforth answering no', 'hence no', 'answer (no)',
+        'in short, no', 'making our assessment no',
+        'correct response should be no', 'the answ er is no',
+        'thus answering no', 'therefore answering no', 'thus no',
+        'there is no direct cause and effect relationship between',
+        'not directly related',
+        'does not contain any direct cause-and-effect relationship between',
+        'no clear causal connection between',
+        'there is no direct cause-and-effect relationship between',
+        'is not a cause-and-effect relationship',
+        'is not a direct cause-and-effect relationship',
+        'there is not a direct causal relationship between',
+        "the answer is \"no.\"", 'the answer is therefore no',
+        'was not a direct result of',
+        'it is not a cause and effect relationship',
+        'there is no clear relationship between',
+        'there is no scientific evidence to support any specific causal relationship',
+        'there is no evidence to suggest a causal relationship',
+        'is not a causal relationship',
+        'no scientific evidence to support any causal relationship',
+        'does not mention a causal relationship between',
+        'the answer to this question is no',
+        "there isn't a cause-effect relationship between",
+        "there's no causaltiy relationship",
+        "this isn't a causal relationship",
+        'no causal relationship is observed',
+        "there isn't a causal relationship between",
+        "doesn't indicate a cause and effect relationship",
+        "doesn't indicate a causal relationship between", 'answer=no',
+        "don't suggest a causal relationship",
+        'does not indicate a causal relationship',
+        "doesn't provide any causal relationship", 'hence answering no',
+        "hence answering 'no'", 'answer should read : no',
+        'therefore answer would be no', "answers should read 'no",
+        'answer would need to be no', 'answering your above query : no',
+        'answer would be no', "therefore, answering 'no", 'answer:no',
+        'answer should remain no', 'the answer to this question would be no',
+        'answer is:no', "answer is therefore \"no.\"", 'making the answer no',
+        'the cause-and-effect relationship between these two words is not clear',
+        'answer must be no', 'answer is therefore no',
+        'there is no causality between', 'answer(no)', 'answer is, no',
+        "answer might be \"no.\"", 'answer it as no',
+        'should be the answer no', 'answering no', "thus answering 'no'",
+        'thus, no', "therefore 'no'", 'the answer can be no', 'answer is “no',
+        'the answer is mostly no', 'answer is probably not', "answer is \"no",
+        '答案是“否”', '答案（是或否？）：否', "答案是\"否\"", "答案为\"否\"", '答案是否', '答案为“否',
+        '答案为“不', '答案为“没有', '答案“否', '答案为“非”', '答案为“无”', '答案为”否', "答案为 \"否",
+        '答案为否', '答案是\\”否', '答案应该是“否', '答案是：\nno', '答案是：\n否', '答案是：\n不', '答案：否',
+        '答案应为“非', "答案\"否", '答案为**否', '答案在“否', '答案可能为“否', '答案返回“否', "答案为\"否",
+        '答案是“不', '答案应该为“否', "答案为'否", '答案为不存  在', '答案应为“否', '答案为《否', '答案是“无',
+        '答案为\\“否', '答案将是“否', '答案还是“否', '答案：“不', '答案 为“否', '答案应该是否',
+        'the answer is no', '不存在“因果”关系', "答案应为\"否", "答案应该是\"否", '答案是:否',
+        '答案为:否', "答案选择\"否", "答案是:\"否", "答案应该为\"否", "答案应为\"否", '答案选择为:否',
+        '答案为 “否', '答案为“非', '答案为“没', '不存在因果关系', '没有直接因果关系', '没有因果关系',
+        '不是一种因果关系', '不一定具有因果', '因果关系并不明确', '不包含因果关系', '并非因果关系', '因果关系“不存在',
+        '没有推理上的因果关系', '与因果关系无关', '没有明显的因果关系', '没有什么因果关系', '不是一个因果关系',
+        '不属于因果关系', '不能形成因果关系', '没有因果', '无因果关系', '因果关系是不存在', '不存在直接的因果',
+        '没有直接的因果', '因果关系不存在', '没有明确的因果', '不存在因果', '无直接因果',
+        'there is no implication of a causal relationship',
+        'is not a causal story', 'is not a causal factor', '答案是无'
+    ] + common_false_list
+    if model_response.startswith(tuple(start_option1_list)):
+        label = 1
+    elif model_response.startswith(tuple(start_option2_list)):
+        label = 0
+    elif any(
+            model_response.find(option) > -1 and
+        (low_index := min(low_index, model_response.find(option))) > -1
+            for option in inner_option1_list):
+        label = 1
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option2_list):
+            label = 0
+    elif any(response in model_response for response in inner_option2_list):
+        label = 0
+    else:
+        return -1
+    return label
--- a/opencompass/datasets/calm/evaluation/labeling/Natural.py
+++ b/opencompass/datasets/calm/evaluation/labeling/Natural.py
@ -0,0 +1,60 @@
+# flake8: noqa: E501
+import json
+import re
+
+from .common_answers import (add_quotes_to_unquoted, change_quotation,
+                             is_numeric)
+
+
+def get_gt_label(item):
+    return item['gt_answer'].strip().lower()
+
+
+def extract_answer(model_response, item, prompt_style, type):
+    model_response += '}'
+    if 'CoT' in prompt_style and any(
+            match in type for match in ['NIE', 'NDE', 'ETT', 'CDE', 'ATE']):
+        matches = re.findall(r'\{\"answer\":.*?\}', model_response, re.DOTALL)
+    else:
+        matches = re.findall(r'\{+.*?\}+', model_response,
+                             re.DOTALL | re.IGNORECASE)
+    matched_str = None
+    for match in matches:
+        if match:
+            matched_str = match.lower()
+            if matched_str.startswith('{{') and matched_str.endswith('}}}'):
+                matched_str = matched_str[1:-2]
+            elif matched_str.startswith('{{') and matched_str.endswith('}}'):
+                matched_str = matched_str[1:-1]
+            elif matched_str.startswith('{{') and matched_str.endswith('}'):
+                matched_str = matched_str[1:]
+            elif matched_str.startswith('{') and matched_str.endswith('}}'):
+                matched_str = matched_str[:-1]
+        else:
+            matched_str = None
+
+        if matched_str:
+            try:
+                inner_json_obj = json.loads(matched_str)
+            except json.JSONDecodeError:
+                # If parsing fails, try adding quotes to unquoted words and parse again
+                fixed_json_str = add_quotes_to_unquoted(matched_str)
+                fixed_json_str = change_quotation(fixed_json_str)
+                try:
+                    inner_json_obj = json.loads(fixed_json_str)
+                except (ValueError, TypeError):
+                    inner_json_obj = {}
+
+            prob_str_value = inner_json_obj.get('answer', None)
+            if prob_str_value is not None:
+                break
+    if matched_str is None:
+        prob_str_value = None
+
+    return prob_str_value
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+    pred = extract_answer(model_response, item, prompt_style, type)
+    return pred
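`extract_answer` works in four steps. It appends a `}` in case the output was cut off, collects every `{...}` span, trims the extra braces that `\{+` / `\}+` can capture, and returns the first span that parses to an object containing the wanted key. The sketch below reproduces that flow in isolation. `trim_braces` and `extract_field` are hypothetical names, and the sketch leaves out the quote-repair fallback and the CoT-specific `{"answer": ...}` pattern.

```python
import json
import re


def trim_braces(span):
    # Same four brace-trimming cases as extract_answer().
    if span.startswith('{{') and span.endswith('}}}'):
        return span[1:-2]
    if span.startswith('{{') and span.endswith('}}'):
        return span[1:-1]
    if span.startswith('{{') and span.endswith('}'):
        return span[1:]
    if span.startswith('{') and span.endswith('}}'):
        return span[:-1]
    return span


def extract_field(response, field='answer'):
    """Return the value of `field` from the first {...} span that has it."""
    # Append '}' in case the model stopped before closing the object.
    text = response.strip().lower() + '}'
    for span in re.findall(r'\{+.*?\}+', text, re.DOTALL):
        try:
            obj = json.loads(trim_braces(span))
        except json.JSONDecodeError:
            continue  # the original retries here after repairing quotes
        if isinstance(obj, dict) and obj.get(field) is not None:
            return obj[field]
    return None


print(extract_field('Reasoning first. {"answer": "Yes"}'))  # yes
print(extract_field('{"prob": 0.25}', field='prob'))  # 0.25
print(extract_field('Truncated: {"answer": "no"'))  # no
```

The appended `}` is what rescues the truncated third response. Without it, the regex would find no closing brace and the answer would be lost.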
--- a/opencompass/datasets/calm/evaluation/labeling/PCD-B.py
+++ b/opencompass/datasets/calm/evaluation/labeling/PCD-B.py
@ -0,0 +1,66 @@
+# flake8: noqa: E501
+from .common_answers import (common_false_list, common_start_false_dict,
+                             common_start_true_dict, common_true_list)
+
+
+def get_gt_label(item):
+    return item['gt_answer']
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+    low_index = len(model_response)
+
+    start_str1_dict = common_start_true_dict
+    start_str2_dict = common_start_false_dict
+
+    start_option1_list, start_option2_list = [], []
+    # Some models echo part of the question in their response. The
+    # question is usually removed during preprocessing, but depending on
+    # the response format part of it can remain, so the response is also
+    # checked against every suffix (of at least `key` characters) of the
+    # question-prefixed answer strings.
+    for key in start_str1_dict.keys():
+        for str1 in start_str1_dict[key]:
+            for i in range(key, len(str1) + 1):
+                start_option1_list.append(str1[-i:])
+    for key in start_str2_dict.keys():
+        for str2 in start_str2_dict[key]:
+            for i in range(key, len(str2) + 1):
+                start_option2_list.append(str2[-i:])
+
+    inner_option1_list = [
+        'there is a causal relationship', '存在因果关系', '有因果关系',
+        'answer (yes or no?): yes', 'answer is yes', "\"yes\"", 'answer: yes',
+        'answer is: yes', 'answer is:\n\nyes', 'answer is:\nyes',
+        'there is a causal relationship', '存在因果关系', '存在', '有因果关系', '答案是:是',
+        '答案是:\n\n是', '答案是:\n是', '答案:是', '答案是是', '答案为是', "\"是\"", '是的',
+        '存在明确的因果关系'
+    ] + common_true_list
+    inner_option2_list = [
+        'there is no causal relationship', '不存在因果关系', '没有因果关系', '没有明显的因果关系',
+        '不存在', 'answer (yes or no?): no', 'answer is no', "\"no\"",
+        'answer: no', 'answer is: no', 'answer is:\n\nno', 'answer is:\nno',
+        'there is no causal relationship', '不存在因果关系', '没有因果关系', '没有明显的因果关系',
+        '不存在', '答案是:否', '答案是:\n\n否', '答案是:\n否', '答案:否', '答案是否', '答案为否',
+        "\"否\"", '回答是:否', '没有直接的因果关系'
+    ] + common_false_list
+
+    if model_response.startswith(tuple(start_option1_list)):
+        return 1
+    elif model_response.startswith(tuple(start_option2_list)):
+        return 0
+    elif any(
+            model_response.find(option) > -1 and
+        (low_index := min(low_index, model_response.find(option))) > -1
+            for option in inner_option1_list):
+        label = 1
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option2_list):
+            label = 0
+        return label
+    elif any(response in model_response for response in inner_option2_list):
+        return 0
+    else:
+        return -1
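The nested loops over `start_str1_dict` and `start_str2_dict` treat each dict key as a minimum suffix length. Every suffix of a phrase that is at least `key` characters long becomes an accepted prefix for `startswith()`. Below is a minimal standalone sketch of that expansion, using the hypothetical name `expand_suffixes`:

```python
def expand_suffixes(start_dict):
    """Collect every suffix of each phrase that is at least `key` chars long.

    Mirrors the nested loops in PCD-B.py: the dict key is the minimum suffix
    length, so a response that starts partway through an echoed prompt is
    still recognised by str.startswith().
    """
    suffixes = []
    for min_len, phrases in start_dict.items():
        for phrase in phrases:
            for i in range(min_len, len(phrase) + 1):
                suffixes.append(phrase[-i:])
    return suffixes


yes_prefixes = tuple(expand_suffixes({3: ['answer (yes or no?): yes']}))
print('no?): yes, because ...'.startswith(yes_prefixes))  # True
print('yes'.startswith(yes_prefixes))  # True
print('es'.startswith(yes_prefixes))  # False
```

With key 3, a response that begins with the tail of the echoed prompt, or simply with `yes`, is accepted. `es` is shorter than the minimum suffix length, so it does not match.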
--- a/opencompass/datasets/calm/evaluation/labeling/PCD-C.py
+++ b/opencompass/datasets/calm/evaluation/labeling/PCD-C.py
@ -0,0 +1,89 @@
+# flake8: noqa: E501
+from .common_answers import (common_option_1_list, common_option_2_list,
+                             common_start_op1_dict, common_start_op2_dict)
+
+
+def get_gt_label(item):
+    return item['gt_answer']
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+    hypothesis1 = item['hypothesis1'].strip().lower()
+    hypothesis2 = item['hypothesis2'].strip().lower()
+    len1 = len(hypothesis1)
+    len2 = len(hypothesis2)
+    low_index = len(model_response)
+    ask_for = item['ask-for']
+
+    start_str1_dict = {
+        **common_start_op1_dict,
+        len(hypothesis1) - 1: [
+            f'答案（选项一或选项二？）：{hypothesis1[:-1]}',
+            f'answer (option 1 or option 2) : {hypothesis1[:-1]}'
+        ]
+    }
+    start_str2_dict = {
+        **common_start_op2_dict,
+        len(hypothesis2) - 1: [
+            f'答案（选项一或选项二？）：{hypothesis2[:-1]}',
+            f'answer (option 1 or option 2) : {hypothesis2[:-1]}'
+        ]
+    }
+    start_option1_list, start_option2_list = [], []
+    # Some models echo part of the question in their response. The question is usually removed during preprocessing, but depending on the response format part of it can remain, so the response is also checked against suffixes of the question-prefixed answer strings.
+    for key in start_str1_dict.keys():
+        for str1 in start_str1_dict[key]:
+            for i in range(key, len(str1) + 1):
+                start_option1_list.append(str1[-i:])
+    for key in start_str2_dict.keys():
+        for str2 in start_str2_dict[key]:
+            for i in range(key, len(str2) + 1):
+                start_option2_list.append(str2[-i:])
+
+    inner_option1_list = [
+        'answer (option 1 or option 2 ?): {}'.format(hypothesis1[:len1 - 1]),
+        'answer (option 1 or option 2?): {}'.format(hypothesis1[:len1 - 1]),
+        'the {} of the input event is that {}'.format(ask_for,
+                                                      hypothesis1[:len1 - 1]),
+        'the {} of the input event is option 1'.format(ask_for),
+        'because {}'.format(hypothesis1[:len1 - 1]), 'answer is option 1',
+        'answer is: option 1', 'answer: option 1', hypothesis1,
+        hypothesis1[:len1 - 1], 'should be 1', 'i believe option 1', 'is 1',
+        'select option 1', '正确答案是选项一', '答案为选项一', '应该选择选项一', '答案：选项一', '答案是选项一'
+    ] + common_option_1_list
+    inner_option2_list = [
+        'answer (option 1 or option 2 ?): {}'.format(hypothesis2[:len2 - 1]),
+        'answer (option 1 or option 2?): {}'.format(hypothesis2[:len2 - 1]),
+        'the {} of the input event is that {}'.format(ask_for,
+                                                      hypothesis2[:len2 - 1]),
+        'the {} of the input event is option 2'.format(ask_for),
+        'because {}'.format(hypothesis2[:len2 - 1]), 'answer is option 2',
+        'answer is: option 2', 'answer: option 2', hypothesis2,
+        hypothesis2[:len2 - 1], 'should be 2', 'i believe option 2', 'is 2',
+        'select option 2', '正确答案是选项二', '答案为选项二', '应该选择选项二', '答案是选项二'
+    ] + common_option_2_list
+
+    if model_response.startswith(tuple(start_option1_list)) \
+        or any(hypothesis1 == option for option in [model_response, model_response[:len1], model_response + '.']) \
+        or model_response in hypothesis1 and len(model_response) > 1:
+        label = 0
+    elif model_response.startswith(tuple(start_option2_list)) \
+        or any(hypothesis2 == option for option in [model_response, model_response[:len2], model_response + '.']) \
+        or model_response in hypothesis2 and len(model_response) > 1:
+        label = 1
+    elif any(
+            model_response.find(option) > -1 and
+        (low_index := min(low_index, model_response.find(option))) > -1
+            for option in inner_option1_list):
+        label = 0
+        if any(option in model_response
+               and model_response.find(option) < low_index
+               for option in inner_option2_list):
+            label = 1
+    elif any(
+            model_response.find(option) > -1 for option in inner_option2_list):
+        label = 1
+    else:
+        return -1
+    return label
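PCD-B and PCD-C both choose between their two phrase lists by position. A match from the first list wins unless a phrase from the second list starts earlier. `any()` stops at the first truthy item, so `low_index` in that code holds the position of the first phrase, in list order, that occurs in the response. That is not necessarily the earliest occurrence. The sketch below computes the earliest occurrence explicitly. The helper names are hypothetical, the labels follow PCD-B's yes/no convention, and the sketch illustrates the idea rather than serving as a drop-in replacement.

```python
def earliest_index(text, phrases):
    """Smallest index at which any of `phrases` occurs in `text`, or -1."""
    positions = [text.find(p) for p in phrases if p in text]
    return min(positions) if positions else -1


def label_by_earliest(text, yes_phrases, no_phrases):
    """Return 1 (yes first), 0 (no first) or -1 (neither list matches).

    Ties go to the yes-list, as in the original, where a no-phrase must start
    strictly before low_index to flip the label.
    """
    yes_at = earliest_index(text, yes_phrases)
    no_at = earliest_index(text, no_phrases)
    if yes_at == -1 and no_at == -1:
        return -1
    if no_at == -1 or (yes_at != -1 and yes_at <= no_at):
        return 1
    return 0


# '存在' occurs inside '不存在', but the longer negative phrase starts earlier.
print(label_by_earliest('不存在因果关系', ['存在'], ['不存在']))  # 0
print(label_by_earliest('no. on reflection, yes', ['yes'], ['no']))  # 0
print(label_by_earliest('unclear', ['yes'], ['no']))  # -1
```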
--- a/opencompass/datasets/calm/evaluation/labeling/Probability.py
+++ b/opencompass/datasets/calm/evaluation/labeling/Probability.py
@ -0,0 +1,64 @@
+# flake8: noqa: E501
+import json
+import re
+
+from .common_answers import (add_quotes_to_unquoted, change_quotation,
+                             is_numeric)
+
+
+def get_gt_label(item):
+    return item['gt_answer']
+
+
+# Shared extraction routine for the math (probability) tasks
+def extract_prob(model_response, prompt_style, type):
+    model_response += '}'
+    if 'CoT' in prompt_style and any(
+            match in type for match in ['NIE', 'NDE', 'ETT', 'CDE', 'ATE']):
+        matches = re.findall(r'\{\"answer\":.*?\}', model_response, re.DOTALL)
+    else:
+        matches = re.findall(r'\{+.*?\}+', model_response,
+                             re.DOTALL | re.IGNORECASE)
+    matched_str = None
+    for match in matches:
+        if match:
+            matched_str = match.lower()
+            if matched_str.startswith('{{') and matched_str.endswith('}}}'):
+                matched_str = matched_str[1:-2]
+            elif matched_str.startswith('{{') and matched_str.endswith('}}'):
+                matched_str = matched_str[1:-1]
+            elif matched_str.startswith('{{') and matched_str.endswith('}'):
+                matched_str = matched_str[1:]
+            elif matched_str.startswith('{') and matched_str.endswith('}}'):
+                matched_str = matched_str[:-1]
+        else:
+            matched_str = None
+
+        if matched_str:
+            try:
+                inner_json_obj = json.loads(matched_str)
+            except json.JSONDecodeError:
+                # If parsing fails, try adding quotes to unquoted words and parse again
+                fixed_json_str = add_quotes_to_unquoted(matched_str)
+                fixed_json_str = change_quotation(fixed_json_str)
+                try:
+                    inner_json_obj = json.loads(fixed_json_str)
+                except json.JSONDecodeError:
+                    inner_json_obj = {}
+
+            prob_str_value = inner_json_obj.get('prob', None)
+            if prob_str_value is not None:
+                break
+    if matched_str is None:
+        prob_str_value = None
+
+    pred_value = float(prob_str_value) if prob_str_value is not None and is_numeric(
+        prob_str_value) else None
+
+    return pred_value
+
+
+def get_pred_label(model_response, item, prompt_style, type):
+    model_response = model_response.strip().lower()
+    pred = extract_prob(model_response, prompt_style, type)
+    return pred
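`extract_prob` ends by turning the parsed `prob` value into a float. The sketch below isolates that conversion under the hypothetical name `to_probability`. It tests `value is not None` instead of relying on truthiness, so a model answer of `0` is kept rather than dropped. `json.loads` returns that answer as the integer `0`, which is falsy.

```python
def is_numeric(value):
    # Same idea as common_answers.is_numeric: anything float() accepts.
    try:
        float(value)
        return True
    except (TypeError, ValueError, OverflowError):
        return False


def to_probability(value):
    """Convert a parsed 'prob' value to float, or return None if not numeric.

    Checking `value is not None` rather than truthiness keeps an answer of
    0 or 0.0, which a plain `if value` test would discard.
    """
    if value is not None and is_numeric(value):
        return float(value)
    return None


print(to_probability(0))  # 0.0
print(to_probability('0.35'))  # 0.35
print(to_probability('n/a'))  # None
```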
--- a/opencompass/datasets/calm/evaluation/labeling/common_answers.py
+++ b/opencompass/datasets/calm/evaluation/labeling/common_answers.py
@ -0,0 +1,318 @@
+import re
+
+common_true_list = [
+    'answer (yes or no?): yes', 'answer (yes or no? ): yes',
+    'answer (yes or no ?): yes', 'answer (yes or no ? ): yes', 'answer is yes',
+    "\"yes\"", 'say yes', 'as follows:\n\nyes', 'answer: yes',
+    'answer is: yes', 'answer is:\n\nyes', 'answer is:\nyes',
+    "answer is \"yes\"", 'should be yes', 'is yes', 'chose yes', '\n\nyes',
+    'the correct answer is yes', 'is identified', 'is identifiable',
+    'does cause', '答案是:是', '答案是:\n\n是', '“是”', '答案：是', '答案是:\n是', '答案:是',
+    '答案是是', "\"是\"", '是的', '方法1比方法2更准确', '答：\n\n是', '答案为：是', '```\nyes\n\n```',
+    'answer (yes or no?):yes', 'answer (yes or no? ):yes',
+    'answer (yes or no ?):yes', 'answer (yes or no ? ):yes', 'output: yes',
+    'answer (yes or no?): yes', 'answer is yes', "\"yes\"", 'say yes',
+    'as follows:\n\nyes', 'answer: yes', 'answer is: yes', 'answer is:\n\nyes',
+    'answer is:\nyes', '“是”', '答案：是', 'answer (yes or no?): yes',
+    'answer is yes', "\"yes\"", 'answer: yes', 'answer is: yes',
+    'answer is:\n\nyes', 'answer is:\nyes', '答案是:是', '答案是:\n\n是', '答案是:\n是',
+    '答案:是', '答案是是', "\"是\"", '是的', '答案: 是', '回答是:是',
+    'answer (yes or no?): yes', 'answer is yes', "\"yes\"", 'answer: yes',
+    'answer is: yes', 'answer is:\n\nyes', 'answer is:\nyes', '答案是:是',
+    '答案是:\n\n是', '答案是:\n是', '答案:是', '答案是是', "\"是\"", '是的', '答案: 是',
+    'answer (yes or no?): yes', 'answer (yes or no ?): yes', 'answer is yes',
+    "\"yes\"", 'answer: yes', 'answer is: yes', 'answer is:\n\nyes', 'so yes',
+    'therefore, yes', 'answer is:\nyes', 'method 1 is correct',
+    'correct to use method 1', 'chose yes', 'yes, it would', '答案是:是',
+    '答案是:\n\n是', '答案是:\n是', '答案:是', '答案是是', "\"是\"", '是的', '方法1比方法2更准确',
+    '答案: 是', '回答是:是', '答案为“是', '答案是“是', '答案应该是“是', '答案都是“是', '答案(是)', '答案是肯定',
+    '答案是：是', '答案：是', '答案是:“是', '答案为:是', '答案是「是', '答案为 “是', '存在因果关系', '答案是真',
+    '因此是“是', '答案为“有', '答案是是', '答案为“(是', "答案为\"\"是", '答案是:“是', "答案应为:\"是",
+    "答案应为\"是", '答案为:是', "答案是:\"是", "答案应该是\"是", '答案为“yes', '具有因果关系', '答案是 “是',
+    '答案“是', '答案必须是“yes', '答案处为“是', '答案应是“是', '答案為“是', '答案可以是“是', '答案的是“是',
+    '答案为「是', "案为“\"是", "答案为 \"是", '答案是有', '答案是： 是', '答案为：是', '答案是对', '答案是：是',
+    '答案是：\n是', '答案应为“是', '答案：是', '答案应该是“是', '答案(是或否？)：是的', "答案\"是\"",
+    "答案都是\"是", '答案为是', '答案为 “是', "答案为\"是", '答案为“是', "答案是\"是\"", '答案是“是',
+    '答案是“是”', 'answer (yes )', 'answering your query, yes', 'so yes',
+    'henceforth answering yes', 'hence yes', 'answer (yes)', 'in short, yes',
+    'hence - yes', 'correct response should be yes', 'thus, yes',
+    'in short - yes', '答案是：\n\nyes', 'leading us to believe yes!',
+    'hence answering yes', 'therefore should read - yes', 'hence y es',
+    'therefore, yes', 'therefore yes',
+    'the correct response should be marked as “yes.', 'thus - yes',
+    "the answer is \"yes.\"", "therefore, the answer is \"yes,\"",
+    'answer (yes or no ? ) : yes', 'answe: yes', "thus answering 'yes'",
+    'thus answering yes', 'thereby answering yes', 'answer would thus be yes',
+    "so answering 'yes'", "hence answering 'yes'", "therefore answering 'yes",
+    'confirming our answer yes', 'an answer for this question would be yes',
+    'answer would be: yes', 'implying a yes answer', 'making the answer yes',
+    'incident does have a causal relationship',
+    'the cause and effect relationship exists',
+    'there is a direct cause relationship', 'must have a causal relationship',
+    'answer would be yes', 'a causal relationship exists between',
+    'answer(yes', 'answer for this question is yes', 'answer (yes',
+    'answer here is `yes`', 'answer might be yes', 'answer is a yes',
+    'the answer yes', 'henceforth – yes', 'thus indicating yes',
+    'hence indicating yes', "it's safe to say yes", "hence it's 'yes'",
+    "thus answering 'yes’", 'so it’s yes', 'thus it can be said yes',
+    'the correct response is yes', 'answering the question with a yes',
+    "the correct answer would be \"yes", "the answer is \"yes”",
+    "answer \"yes", 'the answer as yes', 'the answer to the question yes',
+    'the answer is causality', 'the answer is yes', "the answer is \"yes",
+    '答案是:是'
+]
+common_false_list = [
+    'answer (yes or no?): no', 'answer (yes or no? ): no',
+    'answer (yes or no ?): no', 'answer (yes or no ? ): no', 'answer is no',
+    "\"no\"", 'say no', 'as follows:\n\nno', 'answer: no', 'answer is: no',
+    'answer is:\n\nno', 'answer is:\nno', 'should be no', "answer is \"no\"",
+    'is no', 'chose no', '\n\nno', 'the correct answer is no.',
+    'is not identified', '答案是:否', '答案是：否', '答：否', '答案是:\n\n否', '“否”', '答案：否',
+    '答案是:\n否', '答案:否', '答案是否', "\"否\"", '答案：不是', '回答是:否', '答：\n\n否', '不会导致',
+    '```\nno\n\n```', 'answer (yes or no?):no', 'answer (yes or no? ):no',
+    'answer (yes or no ?):no', 'answer (yes or no ? ):no', 'output: no',
+    'answer (yes or no?): no', 'answer is no', "\"no\"", 'say no',
+    'as follows:\n\nno', 'answer: no', 'answer is: no', 'answer is:\n\nno',
+    'answer is:\nno', '“否”', '答案：否', '答案：不是', 'answer (yes or no?): no',
+    'answer is no', "\"no\"", 'answer: no', 'answer is: no',
+    'answer is:\n\nno', 'answer is:\nno', 'does not cause', '答案是:否',
+    '答案是:\n\n否', '答案是:\n否', '答案:否', '答案是否', "\"否\"", '并没有导致', '回答是:否', '答案: 否',
+    '不会导致', '回答是:\n\n否', '回答是:\n否', 'answer (yes or no?): no', 'answer is no',
+    "\"no\"", 'answer: no', 'answer is: no', 'answer is:\n\nno',
+    'answer is:\nno', '答案是:否', '答案是:\n\n否', '答案是:\n否', '答案:否', '答案是否', "\"否\"",
+    '回答是:否', '答案: 否', 'answer (yes or no?): no', 'answer is no', "\"no\"",
+    'answer: no', 'answer is: no', 'answer is:\n\nno', 'answer is:\nno',
+    'method 2 is correct', 'correct to use method 2', 'chose no', '答案是:否',
+    '答案是:\n\n否', '答案是:\n否', '答案:否', '答案是否', "\"否\"", '回答是:否', '方法2比方法1更准确',
+    '方法2', '答案是:否', '答案是:\n\n否', '答案是:\n否', '答案:否', '答案是否', "\"否\"", '并没有导致',
+    '回答是:否', '答案: 否', '不会导致', '回答是:\n\n否', '回答是:\n否', '答案为“否', '答案是“否',
+    '答案是：\n\n- 否', '答案是：否', '答案为：否', '答案：否', '答案(是或否？)：否', '答案是不', '答案应该是“否',
+    '答案应为否', "答案应该是\"否", '答案应为“否', '答案为否', 'answering your query - no',
+    "the answer is \"no.\"", 'the answer is therefore no',
+    'hence answering no', "hence answering 'no'", 'answer should read : no',
+    'therefore answer would be no', "answers should read 'no",
+    'answer would need to be no', 'answering your above query : no',
+    'answer would be no', "therefore, answering 'no", 'answer:no',
+    'answer should remain no', 'the answer to this question would be no',
+    'answer is:no', "answer is therefore \"no.\"", 'making the answer no',
+    'answer(no)', 'answer is, no', "answer might be \"no.\"",
+    'answer it as no', 'should be the answer no', 'answering no',
+    "thus answering 'no'", 'thus, no', "therefore 'no'",
+    'the answer can be no', 'answer is “no', 'the answer is mostly no',
+    'answer is probably not', "answer is \"no", '答案是“否”', '答案（是或否？）：否',
+    "答案是\"否\"", "答案为\"否\"", '答案是否', '答案为“否', '答案为“不', '答案为“没有', '答案“否',
+    '答案为“非”', '答案为“无”', '答案为”否', "答案为 \"否", '答案为否', '答案是\\”否', '答案应该是“否',
+    '答案是：\nno', '答案是：\n否', '答案是：\n不', '答案：否', '答案应为“非', "答案\"否", '答案为**否',
+    '答案在“否', '答案可能为“否', '答案返回“否', "答案为\"否", '答案是“不', '答案应该为“否', "答案为'否",
+    '答案为不存  在', '答案应为“否', '答案为《否', '答案是“无', '答案为\\“否', '答案将是“否', '答案还是“否',
+    '答案：“不', '答案 为“否', '答案应该是否', 'the answer is no', '不存在“因果”关系', "答案应为\"否",
+    "答案应该是\"否", '答案是:否', '答案为:否', "答案选择\"否", "答案是:\"否", "答案应该为\"否", "答案应为\"否",
+    '答案选择为:否', '答案为 “否', '答案为“非', '答案为“没'
+]
+common_option_1_list = [
+    'answer (option 1 or 2 or 3 or 4 ?): option 1',
+    '(option 1 or 2 or 3 or 4?): option 1',
+    'answer (option 1 or 2 or 3 ?): option 1',
+    '(option 1 or 2 or 3?): option 1',
+    'answer (option 1 or option 2 ?): option 1',
+    'answer (option 1 or option 2?): option 1', 'should be 1', 'is 1',
+    'option 1', 'answer is 1', 'option one',
+    "the correct answer is \"option 1", '正确答案是选项一', '答案为选项一', '应该选择选项一',
+    '答案：选项一', '答案: 选项一', '答案:选项一', '答案是选项一', '选项一是正确', '选择选项一', '我认为选项一',
+    '我的回答是选项一', '我的回答是:选项一', 'option 1', 'answer is 1', 'option one', '答案：选项一',
+    'option  1', 'option#1', '选项1是最符合', '答案是选项1', '答案为选项1', '选项1是正确',
+    '答案应该是选项1', '答案是选项 1', 'answer is therefore option 1', '答案为选项一', '答案（选项一）',
+    '选项一正确', '选选项一', '答案选项一', '即选项一', '答案是：\n选项一', '答案为选项一', '是选项一', '选项一是正确',
+    '选项一为正确', '选项一的答案是正确', '答案为选项一', '答案:选项一', '是:选项一', '答案: 选项一'
+]
+common_option_2_list = [
+    'answer (option 1 or option 2 ?): option 2',
+    'answer (option 1 or option 2?): option 2',
+    'answer (option 1 or 2 or 3 or 4 ?): option 2',
+    '(option 1 or 2 or 3 or 4?): option 2',
+    'answer (option 1 or 2 or 3 ?): option 2',
+    '(option 1 or 2 or 3?): option 2', 'should be 2', 'is 2', 'option 2',
+    'answer is 2', 'option two', "the correct answer is \"option 2",
+    '正确答案是选项二', '答案为选项二', '应该选择选项二', '答案：选项二', '答案: 选项二', '答案:选项二', '答案是选项二',
+    '选项二是正确', '选择选项二', '我认为选项二', '我的回答是选项二', '我的回答是:选项二', 'option 2',
+    'answer is 2', 'option two', '答案：选项二', 'option ##two##', '选项2是满足',
+    '答案是选项2', '答案是选项 2', '答案为选项二', '答案是选项二', '是选项二', '答案（选项二）', '选项二正确',
+    '选选项二', '答案选项二', '即选项二', '答案是：\n选项二', '答案为选项二', '选项二是正确', '选项二为正确',
+    '答案为选项二', '答案:选项二', '是:选项二', '答案: 选项二'
+]
+common_option_3_list = [
+    'answer (option 1 or 2 or 3 or 4 ?): option 3',
+    '(option 1 or 2 or 3 or 4?): option 3',
+    'answer (option 1 or 2 or 3 ?): option 3',
+    '(option 1 or 2 or 3?): option 3', 'should be 3', 'is 3', 'option 3',
+    'answer is 3', 'option three', '正确答案是选项三', '答案为选项三', '应该选择选项三', '答案：选项三',
+    '答案: 选项三', '答案:选项三', '答案是选项三', '选项三是正确', '选择选项三', '我认为选项三', '我的回答是选项三',
+    '我的回答是:选项三', 'option 3', 'answer is 3', 'option three', '答案：选项三', '答案是选项3',
+    '选项 3 是正确', '选项3是正确', '答案是选项 3', '答案为选项三', '答案是选项三', '是选项三', '选项三正确',
+    '选选项三', '答案选项三', '即选项三', '答案是：\n选项三', '答案为选项三', '选项三是正确', '选项三为正确',
+    '答案为选项三', '答案:选项三', '是:选项三', '答案: 选项三'
+]
+common_option_4_list = [
+    'answer (option 1 or 2 or 3 or 4 ?): option 4',
+    '(option 1 or 2 or 3 or 4?): option 4', 'should be 4', 'is 4', 'option 4',
+    'answer is 4', 'option four', '正确答案是选项四', '答案为选项四', '应该选择选项四', '答案：选项四',
+    '答案: 选项四', '答案:选项四', '答案是选项四', '选项四是正确', '选择选项四', '我认为选项四', '我的回答是选项四',
+    '我的回答是:选项四', 'option 4', 'answer is 4', 'option four', '答案：选项四', '答案是选项4',
+    '选项 4 是正确', '选项4是正确', '答案是选项 4', '答案为选项四', '答案是选项四', '是选项四', '选项四正确',
+    '选选项四', '答案选项四', '即选项四', '答案是：\n选项四', '答案为选项四', '选项四是正确', '选项四为正确',
+    '答案为选项四', '答案:选项四', '是:选项四', '答案: 选项四'
+]
+
+common_start_true_dict = {
+    1: ['答案（是或否？）：是', '答案（是或否？）：- 是', '答案（是或否？）：\n\n是', '有'],
+    3: [
+        'answer (yes or no?): yes', 'answer (yes or no? ): yes',
+        'answer (yes or no ?): yes', 'answer (yes or no ? ): yes',
+        'answer (yes or no?): \n\nyes', 'answer (yes or no? ): \n\nyes',
+        'answer (yes or no ?): \n\nyes', 'answer (yes or no ? ): \n\nyes',
+        'answer (yes or no?):yes', 'answer (yes or no? ):yes',
+        'answer (yes or no ?):yes', 'answer (yes or no ? ):yes',
+        'answer (yes or no?): - yes', 'answer (yes or no? ): - yes',
+        'answer (yes or no ?): - yes', 'answer (yes or no ? ): - yes', '答案为“是”'
+    ],
+    4: [
+        'answer (yes or no?): true', 'answer (yes or no? ): true',
+        'answer(yes;', 'answer(yes)'
+    ],
+    5: ['answer (yes )']
+}
+common_start_false_dict = {
+    1: [
+        '答案（是或否？）：否', '答案（是或否？）：- 否', '答案（是或否？）：\n\n否', '答案（是或否？）：不',
+        '答案（是或否？）：- 不', '无'
+    ],
+    2: [
+        '答案（是或否？）：不是',
+        '答案（是或否？）：- 不是',
+        'answer (yes or no?): no',
+        'answer (yes or no? ): no',
+        'answer (yes or no ?): no',
+        'answer (yes or no ? ): no',
+        'answer (yes or no?):no',
+        'answer (yes or no? ):no',
+        'answer (yes or no ?):no',
+        'answer (yes or no ? ):no',
+        'answer (yes or no?): \n\nno',
+        'answer (yes or no? ): \n\nno',
+        'answer (yes or no ?): \n\nno',
+        'answer (yes or no ? ): \n\nno',
+        'answer (yes or no?): - no',
+        'answer (yes or no? ): - no',
+        'answer (yes or no ?): - no',
+        'answer (yes or no ? ): - no',
+    ],
+    3: ['答案为“否”', 'answer (no)', 'answer(no)'],
+    4: ['answer (no )', 'answe r(no )'],
+    5: [
+        'answer (yes or no?): false',
+        'answer (yes or no? ): false',
+    ],
+}
+
+common_start_op1_dict = {
+    1: [
+        '答案（选项一或选项二或选项三？）：选项一', '答案（选项一或选项二或选项三？）： 选项一', '答案（选项一或选项二或选项三？）：一',
+        '答案（选项一或选项二或选项三？）： 一', 'answer (option 1 or 2 or 3?) : option 1',
+        'answer (option 1 or 2 or 3?) : option1',
+        'answer (option 1 or 2 or 3?):1', 'answer (option 1 or 2 or 3?): 1',
+        '答案（选项一或选项二或选项三或选项四？）：选项一', '答案（选项一或选项二或选项三或选项四？）： 选项一',
+        '答案（选项一或选项二或选项三或选项四？）：一', '答案（选项一或选项二或选项三或选项四？）： 一',
+        'answer (option 1 or 2 or 3 or 4?) : option 1',
+        'answer (option 1 or 2 or 3 or 4?) : option1',
+        'answer (option 1 or 2 or 3 or 4?):1',
+        'answer (option 1 or 2 or 3 or 4?): 1', '答案（选项一或选项二？）：选项一',
+        'answer (option 1 or option 2) : option 1',
+        'answer (option 1 or option 2) : option1', 'answer: option 1',
+        'the correct answer is option 1', 'the answer is option 1'
+    ],
+    3: [
+        'answer (option 1 or 2 or 3?) : option one',
+        'answer (option 1 or 2 or 3 or 4?) : option one',
+        'answer (option 1 or option 2) : option one'
+    ],
+}
+common_start_op2_dict = {
+    1: [
+        '答案（选项一或选项二或选项三？）：选项二', '答案（选项一或选项二或选项三？）： 选项二', '答案（选项一或选项二或选项三？）：二',
+        '答案（选项一或选项二或选项三？）： 二', 'answer (option 1 or 2 or 3?) : option 2',
+        'answer (option 1 or 2 or 3?) : option2',
+        'answer (option 1 or 2 or 3?):2', 'answer (option 1 or 2 or 3?): 2',
+        '答案（选项一或选项二或选项三或选项四？）：选项二', '答案（选项一或选项二或选项三或选项四？）： 选项二',
+        '答案（选项一或选项二或选项三或选项四？）：二', '答案（选项一或选项二或选项三或选项四？）： 二',
+        'answer (option 1 or 2 or 3 or 4?) : option 2',
+        'answer (option 1 or 2 or 3 or 4?) : option2',
+        'answer (option 1 or 2 or 3 or 4?):2',
+        'answer (option 1 or 2 or 3 or 4?): 2', '答案（选项一或选项二？）：选项二',
+        'answer (option 1 or option 2) : option 2',
+        'answer (option 1 or option 2) : option2', 'answer: option 2',
+        'the correct answer is option 2', 'the answer is option 2'
+    ],
+    3: [
+        'answer (option 1 or 2 or 3?) : option two',
+        'answer (option 1 or 2 or 3 or 4?) : option two',
+        'answer (option 1 or option 2) : option two'
+    ],
+}
+common_start_op3_dict = {
+    1: [
+        '答案（选项一或选项二或选项三？）：选项三',
+        '答案（选项一或选项二或选项三？）： 选项三',
+        '答案（选项一或选项二或选项三？）：三',
+        '答案（选项一或选项二或选项三？）： 三',
+        'answer (option 1 or 2 or 3?) : option 3',
+        'answer (option 1 or 2 or 3?) : option3',
+        'answer (option 1 or 2 or 3?):3',
+        'answer (option 1 or 2 or 3?): 3',
+        '答案（选项一或选项二或选项三或选项四？）：选项三',
+        '答案（选项一或选项二或选项三或选项四？）： 选项三',
+        '答案（选项一或选项二或选项三或选项四？）：三',
+        '答案（选项一或选项二或选项三或选项四？）： 三',
+        'answer (option 1 or 2 or 3 or 4?) : option 3',
+        'answer (option 1 or 2 or 3 or 4?) : option3',
+        'answer (option 1 or 2 or 3 or 4?):3',
+        'answer (option 1 or 2 or 3 or 4?): 3',
+    ],
+    5: [
+        'answer (option 1 or 2 or 3?) : option three',
+        'answer (option 1 or 2 or 3 or 4?) : option three'
+    ],
+}
+common_start_op4_dict = {
+    1: [
+        '答案（选项一或选项二或选项三或选项四？）：选项四',
+        '答案（选项一或选项二或选项三或选项四？）： 选项四',
+        '答案（选项一或选项二或选项三或选项四？）：四',
+        '答案（选项一或选项二或选项三或选项四？）： 四',
+        'answer (option 1 or 2 or 3 or 4?) : option 4',
+        'answer (option 1 or 2 or 3 or 4?) : option4',
+        'answer (option 1 or 2 or 3 or 4?):4',
+        'answer (option 1 or 2 or 3 or 4?): 4',
+    ],
+    4: ['answer (option 1 or 2 or 3 or 4?) : option four']
+}
+
+
+# Shared answer-processing helpers for the math-related tasks
+def is_numeric(value):
+    try:
+        float(value)
+        return True
+    except Exception:
+        return False
+
+
+def add_quotes_to_unquoted(json_str):
+    # Quote bare words between ':'/',' and one of ',', ':', ']', '}', ')'.
+    return re.sub(r'(?<=[:,])\s*([\w_]+)\s*(?=[,:\]})])', r' "\1" ', json_str)
+
+
+def change_quotation(json_str):
+    # Normalize curly double quotes and single quotes to ASCII double quotes.
+    json_str = json_str.replace('“', '"').replace('”', '"')
+    json_str = json_str.replace("'", '"')
+    return json_str
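When `json.loads` fails, `extract_answer` and `extract_prob` apply `add_quotes_to_unquoted` and then `change_quotation` before retrying. The sketch below bundles that retry into a hypothetical `repair_json` helper. The two repair functions are restated so the block runs on its own, with the same regex as above, and it shows which malformed outputs the repairs recover.

```python
import json
import re


def add_quotes_to_unquoted(json_str):
    # Same pattern as common_answers.py: quote a bare word that follows
    # ':' or ',' and precedes ',', ':', ']', '}' or ')'.
    return re.sub(r'(?<=[:,])\s*([\w_]+)\s*(?=[,:\]})])', r' "\1" ', json_str)


def change_quotation(json_str):
    # Normalize curly double quotes and single quotes to ASCII double quotes.
    return json_str.replace('“', '"').replace('”', '"').replace("'", '"')


def repair_json(text):
    """Hypothetical helper: parse text, retrying once after both repairs."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        fixed = change_quotation(add_quotes_to_unquoted(text))
        try:
            return json.loads(fixed)
        except json.JSONDecodeError:
            return {}


print(repair_json('{"answer": yes}'))  # {'answer': 'yes'}
print(repair_json("{'answer': 'no'}"))  # {'answer': 'no'}
print(repair_json('{“answer”: “yes”}'))  # {'answer': 'yes'}
```

The repairs only handle values. A bare key such as `{answer: yes}` still fails, because the key is preceded by `{`, which the lookbehind does not accept.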
--- a/opencompass/datasets/calm/utils/load_items.py
+++ b/opencompass/datasets/calm/utils/load_items.py
@ -0,0 +1,18 @@
+import json
+from pathlib import Path
+
+
+def load_query_instances(path):
+    """Loads query instances from a JSON Lines file.
+
+    Args:
+        path (str or Path): Path to a file with one JSON object per line.
+
+    Returns:
+        list: The query instances, one per line of the file.
+    """
+    if isinstance(path, str):
+        path = Path(path)
+    with path.open('r', encoding='utf-8') as f:
+        item_list = [json.loads(line) for line in f if line.strip()]
+    return item_list
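`load_query_instances` reads its file as JSON Lines. Each line is parsed on its own with `json.loads`, so the file as a whole does not need to be a valid JSON document. The sketch below is a self-contained version of the same pattern under the hypothetical name `load_jsonl`. It also skips blank lines, and it demonstrates a round trip through a temporary file.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-blank line (JSON Lines)."""
    # Path() accepts both str and Path, like load_query_instances.
    with Path(path).open('r', encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / 'sample.json'
        sample.write_text('{"gt_answer": "Yes"}\n\n{"gt_answer": "No"}\n',
                          encoding='utf-8')
        print(load_jsonl(sample))  # [{'gt_answer': 'Yes'}, {'gt_answer': 'No'}]
```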