diff --git a/docs/en/advanced_guides/general_math.md b/docs/en/advanced_guides/general_math.md
new file mode 100644
index 00000000..da9cfd2f
--- /dev/null
+++ b/docs/en/advanced_guides/general_math.md
@@ -0,0 +1,190 @@
+# General Math Evaluation Guidance
+
+## Introduction
+
+Mathematical reasoning is a crucial capability for large language models (LLMs). To evaluate a model's mathematical abilities, we need to test its ability to solve mathematical problems step by step and provide accurate final answers. OpenCompass provides a convenient way to evaluate mathematical reasoning through the CustomDataset and MATHEvaluator components.
+
+## Dataset Format
+
+The math evaluation dataset should be in either JSON Lines (.jsonl) or CSV format. Each problem should contain at least:
+
+- A problem statement
+- A solution/answer (typically in LaTeX format with the final answer in \\boxed{})
+
+Example JSONL format:
+
+```json
+{"problem": "Find the value of x if 2x + 3 = 7", "solution": "Let's solve step by step:\n2x + 3 = 7\n2x = 7 - 3\n2x = 4\nx = 2\nTherefore, \\boxed{2}"}
+```
+
+Example CSV format:
+
+```csv
+problem,solution
+"Find the value of x if 2x + 3 = 7","Let's solve step by step:\n2x + 3 = 7\n2x = 7 - 3\n2x = 4\nx = 2\nTherefore, \\boxed{2}"
+```
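+
+Before launching an evaluation, it can help to sanity-check that your file actually matches this schema. The snippet below is a minimal, standalone sketch in plain Python (it is not part of OpenCompass, and `path/to/your/dataset.jsonl` is a placeholder); it flags records that are missing the expected fields or a `\boxed{}` final answer:
+
+```python
+import json
+
+def check_math_jsonl(path: str) -> None:
+    """Warn about records that do not match the expected schema."""
+    with open(path, encoding='utf-8') as f:
+        for i, line in enumerate(f, start=1):
+            record = json.loads(line)
+            if 'problem' not in record or 'solution' not in record:
+                print(f'line {i}: missing "problem" or "solution" key')
+            elif '\\boxed{' not in record['solution']:
+                print(f'line {i}: solution has no \\boxed{{}} final answer')
+
+check_math_jsonl('path/to/your/dataset.jsonl')  # placeholder path
+```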
+
+## Configuration
+
+To evaluate mathematical reasoning, you'll need to set up three main components:
+
+1. Dataset Reader Configuration
+
+```python
+math_reader_cfg = dict(
+    input_columns=['problem'],  # Column name for the question
+    output_column='solution'  # Column name for the answer
+)
+```
+
+2. Inference Configuration
+
+```python
+math_infer_cfg = dict(
+    prompt_template=dict(
+        type=PromptTemplate,
+        template=dict(
+            round=[
+                dict(
+                    role='HUMAN',
+                    prompt='{problem}\nPlease reason step by step, and put your final answer within \\boxed{}.',
+                ),
+            ]
+        ),
+    ),
+    retriever=dict(type=ZeroRetriever),
+    inferencer=dict(type=GenInferencer),
+)
+```
+
+3. Evaluation Configuration
+
+```python
+math_eval_cfg = dict(
+    evaluator=dict(type=MATHEvaluator),
+)
+```
+
+## Using CustomDataset
+
+Here's how to configure the dataset for math evaluation, reusing the reader, inference, and evaluation configs defined above:
+
+```python
+from opencompass.datasets import CustomDataset
+
+math_datasets = [
+    dict(
+        type=CustomDataset,
+        abbr='my-math-dataset',  # Dataset abbreviation
+        path='path/to/your/dataset',  # Path to your dataset file
+        reader_cfg=math_reader_cfg,
+        infer_cfg=math_infer_cfg,
+        eval_cfg=math_eval_cfg,
+    )
+]
+```
+
+## MATHEvaluator
+
+The MATHEvaluator is specifically designed to evaluate mathematical answers. It is built on the math_verify library, which provides mathematical expression parsing and verification capabilities, supporting extraction and equivalence verification for both LaTeX and general expressions.
+
+The MATHEvaluator:
+
+1. Extracts answers from both predictions and references using LaTeX extraction
+2. Handles various LaTeX formats and environments
+3. Verifies mathematical equivalence between predicted and reference answers
+4. Provides detailed evaluation results, including:
+   - Accuracy score
+   - Detailed comparison between predictions and references
+   - Parse results of both predicted and reference answers
+
+The evaluator supports:
+
+- Basic arithmetic operations
+- Fractions and decimals
+- Algebraic expressions
+- Trigonometric functions
+- Roots and exponents
+- Mathematical symbols and operators
+
+Example evaluation output:
+
+```python
+{
+    'accuracy': 85.0,  # Percentage of correct answers
+    'details': [
+        {
+            'predictions': 'x = 2',  # Parsed prediction
+            'references': 'x = 2',  # Parsed reference
+            'correct': True  # Whether they match
+        },
+        # ... more results
+    ]
+}
+```
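+
+Under the hood, answer extraction and equivalence checking are delegated to math_verify. The following standalone sketch illustrates that flow with math_verify's `parse`/`verify` helpers, assuming the package is installed and exposes them as in its documented examples (check the library's documentation for the exact API and supported formats):
+
+```python
+from math_verify import parse, verify  # assumed top-level API of math_verify
+
+# A reference answer and a model prediction, each ending in a \boxed{} answer.
+gold = parse('The answer is $\\boxed{\\frac{1}{2}}$')
+pred = parse('... so the final result is $\\boxed{0.5}$')
+
+# verify() should return True when the two expressions are mathematically
+# equivalent (here, 1/2 vs. 0.5); exact behavior depends on the library version.
+print(verify(gold, pred))
+```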
+
+## Complete Example
+
+Here's a complete example of how to set up math evaluation:
+
+```python
+from mmengine.config import read_base
+from opencompass.models import TurboMindModelwithChatTemplate
+from opencompass.datasets import CustomDataset
+from opencompass.openicl.icl_evaluator.math_evaluator import MATHEvaluator
+from opencompass.openicl.icl_prompt_template import PromptTemplate
+from opencompass.openicl.icl_retriever import ZeroRetriever
+from opencompass.openicl.icl_inferencer import GenInferencer
+
+# Dataset reader configuration
+math_reader_cfg = dict(input_columns=['problem'], output_column='solution')
+
+# Inference configuration
+math_infer_cfg = dict(
+    prompt_template=dict(
+        type=PromptTemplate,
+        template=dict(
+            round=[
+                dict(
+                    role='HUMAN',
+                    prompt='{problem}\nPlease reason step by step, and put your final answer within \\boxed{}.',
+                ),
+            ]
+        ),
+    ),
+    retriever=dict(type=ZeroRetriever),
+    inferencer=dict(type=GenInferencer),
+)
+
+# Evaluation configuration
+math_eval_cfg = dict(
+    evaluator=dict(type=MATHEvaluator),
+)
+
+# Dataset configuration
+math_datasets = [
+    dict(
+        type=CustomDataset,
+        abbr='my-math-dataset',
+        path='path/to/your/dataset.jsonl',  # or .csv
+        reader_cfg=math_reader_cfg,
+        infer_cfg=math_infer_cfg,
+        eval_cfg=math_eval_cfg,
+    )
+]
+
+# Model configuration
+models = [
+    dict(
+        type=TurboMindModelwithChatTemplate,
+        abbr='your-model-name',
+        path='your/model/path',
+        # ... other model configurations
+    )
+]
+
+# Output directory
+work_dir = './outputs/math_eval'
+```
diff --git a/docs/en/index.rst b/docs/en/index.rst
index 7181c459..0b15a2b8 100644
--- a/docs/en/index.rst
+++ b/docs/en/index.rst
@@ -40,7 +40,6 @@ We always welcome *PRs* and *Issues* for the betterment of OpenCompass.
    user_guides/experimentation.md
    user_guides/metrics.md
    user_guides/summarizer.md
-   user_guides/corebench.md
 
 .. _Prompt:
 .. toctree::
@@ -61,16 +60,12 @@ We always welcome *PRs* and *Issues* for the betterment of OpenCompass.
    advanced_guides/new_dataset.md
    advanced_guides/custom_dataset.md
    advanced_guides/new_model.md
-   advanced_guides/evaluation_lmdeploy.md
-   advanced_guides/evaluation_lightllm.md
    advanced_guides/accelerator_intro.md
+   advanced_guides/general_math.md
    advanced_guides/code_eval.md
    advanced_guides/code_eval_service.md
-   advanced_guides/prompt_attack.md
-   advanced_guides/longeval.md
    advanced_guides/subjective_evaluation.md
    advanced_guides/circular_eval.md
-   advanced_guides/contamination_eval.md
    advanced_guides/needleinahaystack_eval.md
 
 .. _Tools:
diff --git a/docs/zh_cn/advanced_guides/general_math.md b/docs/zh_cn/advanced_guides/general_math.md
new file mode 100644
index 00000000..8e8d2fa6
--- /dev/null
+++ b/docs/zh_cn/advanced_guides/general_math.md
@@ -0,0 +1,190 @@
+# 数学能力评测
+
+## 简介
+
+数学推理能力是大语言模型(LLMs)的一项关键能力。为了评估模型的数学能力,我们需要测试其逐步解决数学问题并提供准确最终答案的能力。OpenCompass 通过 CustomDataset 和 MATHEvaluator 组件提供了一种便捷的数学推理评测方式。
+
+## 数据集格式
+
+数学评测数据集应该是 JSON Lines (.jsonl) 或 CSV 格式。每个问题至少应包含:
+
+- 问题陈述
+- 解答/答案(通常使用 LaTeX 格式,最终答案需要用 \\boxed{} 括起来)
+
+JSONL 格式示例:
+
+```json
+{"problem": "求解方程 2x + 3 = 7", "solution": "让我们逐步解决:\n2x + 3 = 7\n2x = 7 - 3\n2x = 4\nx = 2\n因此,\\boxed{2}"}
+```
+
+CSV 格式示例:
+
+```csv
+problem,solution
+"求解方程 2x + 3 = 7","让我们逐步解决:\n2x + 3 = 7\n2x = 7 - 3\n2x = 4\nx = 2\n因此,\\boxed{2}"
+```
+
+## 配置说明
+
+要进行数学推理评测,你需要设置三个主要组件:
+
+1. 数据集读取配置
+
+```python
+math_reader_cfg = dict(
+    input_columns=['problem'],  # 问题列的名称
+    output_column='solution'  # 答案列的名称
+)
+```
+
+2. 推理配置
+
+```python
+math_infer_cfg = dict(
+    prompt_template=dict(
+        type=PromptTemplate,
+        template=dict(
+            round=[
+                dict(
+                    role='HUMAN',
+                    prompt='{problem}\n请逐步推理,并将最终答案放在 \\boxed{} 中。',
+                ),
+            ]
+        ),
+    ),
+    retriever=dict(type=ZeroRetriever),
+    inferencer=dict(type=GenInferencer),
+)
+```
+
+3. 评测配置
+
+```python
+math_eval_cfg = dict(
+    evaluator=dict(type=MATHEvaluator),
+)
+```
+
+## 使用 CustomDataset
+
+以下是如何为数学评测配置数据集(复用上文定义的读取、推理和评测配置):
+
+```python
+from opencompass.datasets import CustomDataset
+
+math_datasets = [
+    dict(
+        type=CustomDataset,
+        abbr='my-math-dataset',  # 数据集简称
+        path='path/to/your/dataset',  # 数据集文件路径
+        reader_cfg=math_reader_cfg,
+        infer_cfg=math_infer_cfg,
+        eval_cfg=math_eval_cfg,
+    )
+]
+```
+
+## MATHEvaluator
+
+MATHEvaluator 是专门设计用于评估数学答案的评测器。它基于 math_verify 库进行开发,该库提供了数学表达式解析和验证功能,支持 LaTeX 和一般表达式的提取与等价性验证。
+
+MATHEvaluator 具有以下功能:
+
+1. 使用 LaTeX 提取器从预测和参考答案中提取答案
+2. 处理各种 LaTeX 格式和环境
+3. 验证预测答案和参考答案之间的数学等价性
+4. 提供详细的评测结果,包括:
+   - 准确率分数
+   - 预测和参考答案的详细比较
+   - 预测和参考答案的解析结果
+
+评测器支持:
+
+- 基本算术运算
+- 分数和小数
+- 代数表达式
+- 三角函数
+- 根式和指数
+- 数学符号和运算符
+
+评测输出示例:
+
+```python
+{
+    'accuracy': 85.0,  # 正确答案的百分比
+    'details': [
+        {
+            'predictions': 'x = 2',  # 解析后的预测答案
+            'references': 'x = 2',  # 解析后的参考答案
+            'correct': True  # 是否匹配
+        },
+        # ... 更多结果
+    ]
+}
+```
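+
+在底层,答案提取与等价性校验由 math_verify 完成。下面是一个独立的简化示意,演示 math_verify 的 `parse`/`verify` 用法(假设已安装该库,且其顶层 API 与官方示例一致;具体接口和支持的格式请以该库文档为准):
+
+```python
+from math_verify import parse, verify  # 假设的 math_verify 顶层 API
+
+# 参考答案和模型预测,均以 \boxed{} 给出最终答案
+gold = parse('The answer is $\\boxed{\\frac{1}{2}}$')
+pred = parse('... so the final result is $\\boxed{0.5}$')
+
+# 当两个表达式在数学上等价时(此处 1/2 与 0.5),verify() 应返回 True;
+# 具体行为取决于库的版本
+print(verify(gold, pred))
+```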
+
+## 完整示例
+
+以下是设置数学评测的完整示例:
+
+```python
+from mmengine.config import read_base
+from opencompass.models import TurboMindModelwithChatTemplate
+from opencompass.datasets import CustomDataset
+from opencompass.openicl.icl_evaluator.math_evaluator import MATHEvaluator
+from opencompass.openicl.icl_prompt_template import PromptTemplate
+from opencompass.openicl.icl_retriever import ZeroRetriever
+from opencompass.openicl.icl_inferencer import GenInferencer
+
+# 数据集读取配置
+math_reader_cfg = dict(input_columns=['problem'], output_column='solution')
+
+# 推理配置
+math_infer_cfg = dict(
+    prompt_template=dict(
+        type=PromptTemplate,
+        template=dict(
+            round=[
+                dict(
+                    role='HUMAN',
+                    prompt='{problem}\n请逐步推理,并将最终答案放在 \\boxed{} 中。',
+                ),
+            ]
+        ),
+    ),
+    retriever=dict(type=ZeroRetriever),
+    inferencer=dict(type=GenInferencer),
+)
+
+# 评测配置
+math_eval_cfg = dict(
+    evaluator=dict(type=MATHEvaluator),
+)
+
+# 数据集配置
+math_datasets = [
+    dict(
+        type=CustomDataset,
+        abbr='my-math-dataset',
+        path='path/to/your/dataset.jsonl',  # 或 .csv
+        reader_cfg=math_reader_cfg,
+        infer_cfg=math_infer_cfg,
+        eval_cfg=math_eval_cfg,
+    )
+]
+
+# 模型配置
+models = [
+    dict(
+        type=TurboMindModelwithChatTemplate,
+        abbr='your-model-name',
+        path='your/model/path',
+        # ... 其他模型配置
+    )
+]
+
+# 输出目录
+work_dir = './outputs/math_eval'
+```
diff --git a/docs/zh_cn/index.rst b/docs/zh_cn/index.rst
index 827c7d91..8c6620ca 100644
--- a/docs/zh_cn/index.rst
+++ b/docs/zh_cn/index.rst
@@ -41,7 +41,6 @@ OpenCompass 上手路线
    user_guides/experimentation.md
    user_guides/metrics.md
    user_guides/summarizer.md
-   user_guides/corebench.md
 
 .. _提示词:
 .. toctree::
@@ -61,17 +60,12 @@ OpenCompass 上手路线
    advanced_guides/new_dataset.md
    advanced_guides/custom_dataset.md
    advanced_guides/new_model.md
-   advanced_guides/evaluation_lmdeploy.md
-   advanced_guides/evaluation_lightllm.md
    advanced_guides/accelerator_intro.md
+   advanced_guides/general_math.md
    advanced_guides/code_eval.md
    advanced_guides/code_eval_service.md
-   advanced_guides/prompt_attack.md
-   advanced_guides/longeval.md
    advanced_guides/subjective_evaluation.md
    advanced_guides/circular_eval.md
-   advanced_guides/contamination_eval.md
-   advanced_guides/compassbench_intro.md
    advanced_guides/needleinahaystack_eval.md
 
 .. _工具: