first commit

2026-05-10 13:52:46 +08:00
commit ccc63d1e70
4583 changed files with 584341 additions and 0 deletions
--- a/sn-search-academic/SKILL.md
+++ b/sn-search-academic/SKILL.md
@@ -0,0 +1,287 @@
+---
+name: sn-search-academic
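+description: "Multi-source academic search: ArXiv, Semantic Scholar (with citation counts), PubMed, and Wikipedia. Reads ArXiv HTML full text and PMC full text section by section. Triggers: academic papers, literature surveys, citation data, biomedical literature, encyclopedia lookups. A one-stop multi-source tool."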
+---
+
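+# sn-search-academic - Academic Search
+
+Searches four academic platforms (ArXiv, Semantic Scholar, PubMed, and Wikipedia) and provides **section-level full-text reading** for ArXiv and PMC. Everything is free; some scripts accept an optional API key that raises the rate limit.
+
+## Dependencies
+
+Install this skill's Python dependencies before running the scripts: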
+
+```bash
+python3 -m pip install -r skills/sn-search-academic/requirements.txt
+```
+
+If the project uses a `uv` environment:
+
+```bash
+uv pip install -r skills/sn-search-academic/requirements.txt
+```
+
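+`arxiv_paper.py` needs `beautifulsoup4` to parse ArXiv HTML; the other scripts mainly rely on `httpx` for HTTP requests.
+
+## Available scripts
+
+| Script | Platform | Purpose | API key |
+|--------|----------|---------|---------|
+| `arxiv_search.py` | ArXiv | Preprint search by keyword, author, title, or ID | Not required |
+| `arxiv_paper.py` | ArXiv HTML | Read an ArXiv paper's full text section by section | Not required |
+| `semantic_scholar_search.py` | Semantic Scholar | Cross-discipline search with citation counts and TLDR | Not required (a key raises the rate limit) |
+| `semantic_scholar_refs.py` | Semantic Scholar | Citation tracing: a paper's references (backward) or citing papers (forward) | Not required (a key raises the rate limit) |
+| `pubmed_search.py` | PubMed | Biomedical literature search with structured abstracts and PMC IDs | Not required (a key raises the rate limit) |
+| `pmc_paper.py` | PMC | Read a PMC open-access paper's full text section by section | Not required (a key raises the rate limit) |
+| `wikipedia_search.py` | Wikipedia | Encyclopedia article search in multiple languages | Not required |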
+
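+## Parameters
+
+### arxiv_search.py
+
+```bash
+python3 scripts/arxiv_search.py <query> [options]
+```
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `query` | Search keywords (may be omitted with `--id-list`) | - |
+| `--limit`, `-n` | Number of results to return | 10 |
+| `--category`, `-c` | ArXiv category filter (see the ArXiv category table below) | - |
+| `--sort` | Sort order: `relevance`, `date` (last updated), `submitted` (submission date) | relevance |
+| `--author`, `-a` | Filter by author; separate multiple authors with commas | - |
+| `--title-only` | Search titles only | - |
+| `--id-list` | Fetch metadata directly by arXiv ID, comma-separated | - |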
+```bash
+python3 scripts/arxiv_search.py "transformer attention mechanism" --limit 5
+python3 scripts/arxiv_search.py "diffusion model" --author "ho jonathan" --category cs.CV
+python3 scripts/arxiv_search.py --id-list "2409.05591,2301.07041"
+```
+
+**Output fields**: `title`, `url`, `snippet` (abstract), `arxiv_id`, `authors`, `published`, `updated`, `pdf_url`, `html_url`, `categories`, `primary_category`, `comment`, `journal_ref`, `doi`
+
+### arxiv_paper.py
+
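+Reads the body of an ArXiv paper section by section. The paper must have an HTML version, which arXiv generally provides for LaTeX submissions from late 2023 onward; older papers often lack one.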
+
+```bash
+python3 scripts/arxiv_paper.py <arxiv_id> [--section SECTION_NAME]
+```
+
+| Parameter | Description |
+|-----------|-------------|
+| `arxiv_id` | arXiv ID (e.g. `2409.05591` or `2409.05591v2`) |
+| `--section`, `-s` | Section name (case-insensitive, partial match supported). If omitted, all sections are listed. |
+
+```bash
+python3 scripts/arxiv_paper.py 2409.05591                      # list sections
+python3 scripts/arxiv_paper.py 2409.05591 --section introduction
+python3 scripts/arxiv_paper.py 2409.05591 --section method
+```
+
+**Output fields when listing sections**: `arxiv_id`, `abs_url`, `html_url`, `pdf_url`, `section_count`, `sections[]` (name, level)
+
+**Output fields when reading a section**: `arxiv_id`, `abs_url`, `section`, `level`, `content`, `char_count`
+
+### semantic_scholar_search.py
+
+```bash
+python3 scripts/semantic_scholar_search.py <query> [options]
+```
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `query` | Search keywords (required) | - |
+| `--limit`, `-n` | Number of results to return | 10 |
+| `--api-key` | Semantic Scholar API key (can also be set via the `S2_API_KEY` environment variable) | - |
+
+```bash
+python3 scripts/semantic_scholar_search.py "transformer architecture" --limit 5
+python3 scripts/semantic_scholar_search.py "RLHF language model" --limit 10
+```
+
+**Output fields**: `title`, `url`, `snippet` (abstract; falls back to the TLDR when missing), `tldr`, `authors`, `year`, `venue`, `publication_date`, `citation_count`, `influential_citation_count`, `reference_count`, `is_open_access`, `open_access_pdf`, `fields_of_study`, `publication_types`, `doi`, `arxiv_id`, `paper_id`
+
+### semantic_scholar_refs.py
+
+Citation tracing: given a paper, look up its references (backward) or the papers that cite it (forward).
+
+```bash
+python3 scripts/semantic_scholar_refs.py <paper_id> <direction> [options]
+```
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `paper_id` | Paper identifier: S2 ID, DOI (`10.xxxx/...`), ArXiv ID (`2301.07041`), or PMID (`PMID:12345678`) | - |
+| `direction` | `references` = the paper's references (backward); `citations` = papers citing it (forward) | - |
+| `--limit`, `-n` | Number of results to return | 20 |
+| `--min-citations` | Minimum citation count filter | 0 |
+| `--year-min` | Earliest publication year | - |
+| `--year-max` | Latest publication year | - |
+| `--api-key` | Semantic Scholar API key (optional) | - |
+
+```bash
+# Which papers does this paper cite? (backward: find foundational work)
+python3 scripts/semantic_scholar_refs.py 2301.07041 references --limit 10
+
+# Which papers cite this paper? (forward: find follow-up work)
+python3 scripts/semantic_scholar_refs.py 2301.07041 citations --limit 10 --min-citations 50
+
+# Look up citations by DOI, limited to 2023 onward
+python3 scripts/semantic_scholar_refs.py "10.1038/s41586-024-07487-w" citations --year-min 2023
+
+# Find highly cited references
+python3 scripts/semantic_scholar_refs.py ARXIV:2005.14165 references --min-citations 100 --limit 5
+```
+
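+**Output fields**: `title`, `url`, `snippet` (abstract/TLDR), `authors`, `year`, `venue`, `citation_count`, `influential_citation_count`, `is_open_access`, `open_access_pdf`, `doi`, `arxiv_id`, `paper_id`, `citation_contexts` (sentences in which the citation appears, up to 3), `citation_intents` (citation intents)
+
+**Additional top-level fields**: `source_paper` (title, year, and citation count of the queried paper), `total_available` (total number of papers in that direction), `returned` (number returned after filtering)
+
+### pubmed_search.py
+
+Supports PubMed query syntax, such as field tags (`cancer[Title]`) and date ranges (`2024[pdat]`).
+
+```bash
+python3 scripts/pubmed_search.py <query> [options]
+```
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `query` | Search keywords; PubMed query syntax is supported | - |
+| `--limit`, `-n` | Number of results to return | 10 |
+| `--api-key` | NCBI API key (optional; raises the limit from 3 req/s to 10 req/s) | - |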
+
+```bash
+python3 scripts/pubmed_search.py "CRISPR gene editing" --limit 5
+python3 scripts/pubmed_search.py "Alzheimer[Title] AND treatment[Title]" --limit 5
+```
+
+**Output fields**: `title`, `url`, `snippet` (structured abstract), `authors`, `pmid`, `pmc_id` (when present, can be passed to `pmc_paper.py`), `pmc_url`, `journal`, `pub_date`, `volume`, `issue`, `pages`, `keywords`, `pub_types`, `doi`
+
+### pmc_paper.py
+
+Reads open-access full text from PubMed Central (roughly 7 million biomedical papers, about 35% of PubMed). Papers whose `pmc_id` is `null` in `pubmed_search.py` results cannot be read with this tool.
+
+```bash
+python3 scripts/pmc_paper.py <pmc_id> [--section SECTION_NAME]
+python3 scripts/pmc_paper.py --pmid <pmid> [--section SECTION_NAME]
+```
+
+| Parameter | Description |
+|-----------|-------------|
+| `pmc_id` | PMC ID (e.g. `PMC11119143` or `11119143`) |
+| `--pmid` | PubMed ID, converted to a PMC ID automatically (use either this or `pmc_id`) |
+| `--section`, `-s` | Section name (case-insensitive, partial match supported). If omitted, all sections are listed. |
+| `--api-key` | NCBI API key (optional) |
+
+```bash
+python3 scripts/pmc_paper.py PMC11119143                       # list sections
+python3 scripts/pmc_paper.py PMC11119143 --section introduction
+python3 scripts/pmc_paper.py --pmid 38786024 --section conclusion
+```
+
+**Output fields when listing sections**: `pmc_id`, `pmid`, `title`, `pmc_url`, `section_count`, `sections[]` (name, level; includes subsection levels)
+
+**Output fields when reading a section**: `pmc_id`, `section`, `level`, `content` (includes subsection text), `char_count`
+
+### wikipedia_search.py
+
+```bash
+python3 scripts/wikipedia_search.py <query> [options]
+```
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `query` | Search keywords (required) | - |
+| `--limit`, `-n` | Number of results to return | 10 |
+| `--lang`, `-l` | Language edition (`en`, `zh`, `ja`, `de`, `fr`, etc.) | en |
+
+```bash
+python3 scripts/wikipedia_search.py "machine learning" --limit 5
+python3 scripts/wikipedia_search.py "深度学习" --lang zh --limit 5
+```
+
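+## Full-text reading workflow
+
+Search scripts return abstracts; reader scripts return body text. Using them together lets you read closely only where needed and saves tokens.
+
+**ArXiv papers**:
+1. Search with `arxiv_search.py` → get the `arxiv_id`
+2. List sections with `arxiv_paper.py <id>` → run `arxiv_paper.py <id> --section introduction` to decide quickly whether to go deeper
+3. Read `method` / `experiment` / `conclusion` as needed
+
+**PMC biomedical papers**:
+1. Search with `pubmed_search.py` → take the `pmc_id` from the results (full text exists only when it is not null)
+2. List sections with `pmc_paper.py <pmc_id>` → read the key sections as needed
+
+## Citation tracing workflow
+
+Follow a paper's citation links to find related work that keyword search misses.
+
+**Backward (find foundational work)**:
+1. Find a highly relevant paper by keyword search → take its `paper_id` or `arxiv_id`
+2. `semantic_scholar_refs.py <id> references --min-citations 50` → find highly cited references
+3. Filter for entries relevant to the research question → read them in depth with `arxiv_paper.py` or `pmc_paper.py`
+
+**Forward (find follow-up work)**:
+1. Find a foundational or key paper in the field → take its ID
+2. `semantic_scholar_refs.py <id> citations --year-min 2024 --min-citations 10` → find recent, highly cited follow-up work
+3. Filter for entries relevant to the research question → read them in depth
+
+**Citation chain (trace how an idea evolved)**:
+1. Start from seed paper A → go backward to find B, a key reference of A
+2. Start from B → go forward to find later work citing B (this may surface a related paper C that A does not cite)
+3. This yields the lineages B → A → ... and B → C → ...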
+
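+## ArXiv category quick reference
+
+A top-level field can be used directly (e.g. `--category cs`); a subcategory is more precise (e.g. `--category cs.AI`).
+
+| Field | Category code | Description |
+|-------|---------------|-------------|
+| **Computer science** | `cs.AI` | Artificial intelligence |
+| | `cs.LG` | Machine learning |
+| | `cs.CL` | Computational linguistics / NLP |
+| | `cs.CV` | Computer vision |
+| | `cs.IR` | Information retrieval |
+| | `cs.RO` | Robotics |
+| | `cs.SE` | Software engineering |
+| | `cs.DC` | Distributed/parallel computing |
+| | `cs.NI` | Networking and Internet |
+| | `cs.CR` | Cryptography and security |
+| | `cs.DB` | Databases |
+| | `cs.HC` | Human-computer interaction |
+| **Statistics** | `stat.ML` | Statistical machine learning |
+| | `stat.AP` | Applied statistics |
+| | `stat.ME` | Statistical methodology |
+| **Mathematics** | `math.OC` | Optimization and control |
+| | `math.ST` | Statistics theory |
+| | `math.CO` | Combinatorics |
+| **Physics** | `physics` | Physics (whole archive) |
+| | `cond-mat` | Condensed matter physics |
+| | `quant-ph` | Quantum physics |
+| | `hep-th` | High-energy theoretical physics |
+| **Economics/finance** | `econ.GN` | General economics |
+| | `q-fin.CP` | Computational finance |
+| | `q-fin.ST` | Statistical finance |
+| **Biology/medicine** | `q-bio.NC` | Neuroscience |
+| | `q-bio.GN` | Genomics |
+| | `q-bio.QM` | Quantitative methods |
+
+## Output format
+
+All scripts output standard JSON: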
+
+```json
+{
+  "success": true,
+  "query": "...",
+  "provider": "arxiv|semantic_scholar|pubmed|wikipedia",
+  "items": [{"title": "...", "url": "...", "snippet": "...", ...}],
+  "error": null
+}
+```
+
+`arxiv_paper.py` and `pmc_paper.py` do not use the `items` format; they return a structured object directly (see the output fields listed for each script).
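
The JSON envelope documented in SKILL.md can be consumed by a caller along these lines. This is a minimal sketch, assuming a search script's stdout has already been captured as a string; `unwrap_items` is a hypothetical helper that is not part of this commit, and the sample payload is hand-written in the documented shape rather than real API output.

```python
import json
from typing import Any


def unwrap_items(raw: str) -> list[dict[str, Any]]:
    """Parse a search script's JSON output and return its items.

    Hypothetical helper: raises RuntimeError when the envelope reports failure.
    """
    payload = json.loads(raw)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error") or "unknown error")
    return payload.get("items", [])


# Hand-written sample in the documented envelope shape (not real output).
sample = json.dumps({
    "success": True,
    "query": "diffusion model",
    "provider": "arxiv",
    "items": [{"title": "Example", "url": "https://arxiv.org/abs/0000.00000", "snippet": "..."}],
    "error": None,
})

items = unwrap_items(sample)
print(len(items), items[0]["title"])
```

Checking `success` before touching `items` keeps error handling in one place, since the search scripts report failures inside the envelope as well as through the exit code.
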
--- a/sn-search-academic/requirements.txt
+++ b/sn-search-academic/requirements.txt
@@ -0,0 +1,2 @@
+httpx>=0.25.0
+beautifulsoup4>=4.12.0
--- a/sn-search-academic/scripts/pycache/search_utils.cpython-311.pyc
+++ b/sn-search-academic/scripts/pycache/search_utils.cpython-311.pyc
--- a/sn-search-academic/scripts/arxiv_paper.py
+++ b/sn-search-academic/scripts/arxiv_paper.py
@@ -0,0 +1,304 @@
+#!/usr/bin/env python3
+"""
+ArXiv paper section reader.
+
+Parses the arXiv HTML version of a paper (converted by LaTeXML) to:
+  - list the paper's section structure
+  - extract body text by section name (case-insensitive, partial match supported)
+
+Usage:
+  python3 arxiv_paper.py 2409.05591                        # list sections
+  python3 arxiv_paper.py 2409.05591 --section introduction  # read one section
+  python3 arxiv_paper.py 2409.05591 --section method
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import re
+import sys
+from typing import Any
+
+from search_utils import get_client, print_json
+
+BeautifulSoup: Any = None
+NavigableString: Any = None
+Tag: Any = None
+
+
+def ensure_bs4() -> None:
+    """Load BeautifulSoup only when the script needs to parse paper HTML."""
+    global BeautifulSoup, NavigableString, Tag
+    if BeautifulSoup is not None:
+        return
+
+    try:
+        from bs4 import BeautifulSoup as Bs4BeautifulSoup
+        from bs4 import NavigableString as Bs4NavigableString
+        from bs4 import Tag as Bs4Tag
+    except ImportError:
+        print_json({
+            "success": False,
+            "error": "beautifulsoup4 is missing; run: python3 -m pip install -r skills/sn-search-academic/requirements.txt",
+        })
+        sys.exit(1)
+
+    BeautifulSoup = Bs4BeautifulSoup
+    NavigableString = Bs4NavigableString
+    Tag = Bs4Tag
+
+HTML_BASE = "https://arxiv.org/html"
+ABS_BASE = "https://arxiv.org/abs"
+PDF_BASE = "https://arxiv.org/pdf"
+
+# ── HTML fetching ─────────────────────────────────────────────────────────────
+
+def fetch_html(arxiv_id: str) -> str:
+    """Fetch the arXiv HTML version; raise a descriptive error if it does not exist."""
+    url = f"{HTML_BASE}/{arxiv_id}"
+    with get_client(timeout=45, headers={"Accept": "text/html,application/xhtml+xml"}) as client:
+        resp = client.get(url)
+
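+    if resp.status_code == 404:
+        raise ValueError(
+            f"Paper {arxiv_id} has no HTML version yet. "
+            "Possible causes: the paper predates arXiv's HTML conversion (late 2023), is not from a LaTeX source, or has not been converted yet. "
+            f"Read the PDF instead: {PDF_BASE}/{arxiv_id}"
+        )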
+    resp.raise_for_status()
+    return resp.text
+
+
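+# ── Text cleanup ──────────────────────────────────────────────────────────────
+
+def _elem_to_text(elem: Tag) -> str:
+    """
+    Convert an HTML element to readable text.
+    - math elements: emit the LaTeX annotation as $...$; other MathML text is dropped
+    - figure and table captions: kept
+    - noise nodes such as footnote marks, reference tags, and numbering tags are skipped
+    """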
+    parts: list[str] = []
+
+    for node in elem.descendants:
+        if not isinstance(node, NavigableString):
+            continue
+
+        parent = node.parent
+        if parent is None:
+            continue
+
+        tag = parent.name
+
+        # Skip noise such as footnote marks and reference superscripts
+        parent_classes = parent.get("class") or []
+        if any(c in parent_classes for c in ("ltx_note_mark", "ltx_ref_tag", "ltx_tag")):
+            continue
+
+        # math element: take the LaTeX annotation
+        if tag == "annotation":
+            encoding = parent.get("encoding", "")
+            if "tex" in encoding.lower() or "latex" in encoding.lower():
+                latex = node.strip()
+                if latex:
+                    parts.append(f"${latex}$")
+            continue
+
+        # Skip non-annotation text inside math (MathML structural text is noisy)
+        in_math = False
+        for ancestor in parent.parents:
+            if ancestor.name == "math":
+                in_math = True
+                break
+        if in_math:
+            continue
+
+        text = str(node)
+        if text.strip():
+            parts.append(text)
+
+    raw = "".join(parts)
+    # Collapse extra whitespace while keeping paragraph breaks
+    raw = re.sub(r"[ \t]+", " ", raw)
+    raw = re.sub(r"\n{3,}", "\n\n", raw)
+    return raw.strip()
+
+
+# ── Section extraction ────────────────────────────────────────────────────────
+
+def extract_sections(html: str) -> list[dict[str, Any]]:
+    """
+    Extract all sections (including the abstract) from arXiv HTML.
+
+    Returns a list; each item has:
+      name   - section title (with numbering, e.g. "1 Introduction")
+      level  - depth (0=abstract, 1=h2, 2=h3, 3=h4)
+      text   - body text
+    """
+    ensure_bs4()
+    soup = BeautifulSoup(html, "html.parser")
+    sections: list[dict[str, Any]] = []
+
+    # ── Abstract ──
+    abstract_elem = soup.find(class_=re.compile(r"\bltx_abstract\b"))
+    if abstract_elem:
+        # Drop the "Abstract" title line
+        for h in abstract_elem.find_all(["h2", "h6"], class_=re.compile(r"ltx_title")):
+            h.decompose()
+        abstract_text = _elem_to_text(abstract_elem)
+        if abstract_text:
+            sections.append({"name": "Abstract", "level": 0, "text": abstract_text})
+
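+    # ── Body sections ──
+    # LaTeXML marks subsections as ltx_subsection / ltx_subsubsection, so match those too;
+    # otherwise their text is lost when child sections are removed below.
+    for sec in soup.find_all("section", class_=re.compile(r"\bltx_(?:sub)*section\b|\bltx_appendix\b")):
+        # Find this level's heading (not a child section's heading)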
+        heading: Tag | None = None
+        for h_tag in ["h2", "h3", "h4"]:
+            candidate = sec.find(h_tag, class_=re.compile(r"\bltx_title\b"), recursive=False)
+            if candidate:
+                heading = candidate
+                break
+
+        if heading is None:
+            # Some sections keep the heading inside their first div
+            for h_tag in ["h2", "h3", "h4"]:
+                candidate = sec.find(h_tag, class_=re.compile(r"\bltx_title\b"))
+                if candidate:
+                    heading = candidate
+                    break
+
+        if heading is None:
+            continue
+
+        # Clean the heading (strip the trailing ¶ permalink and extra whitespace)
+        heading_text = heading.get_text(" ", strip=True).rstrip("¶").strip()
+        heading_text = re.sub(r"\s+", " ", heading_text)
+        level = {"h2": 1, "h3": 2, "h4": 3}.get(heading.name, 1)
+
+        # Extract this section's own text (child sections excluded to avoid duplication)
+        sec_copy = BeautifulSoup(str(sec), "html.parser").find("section")
+        # Remove child sections
+        for child_sec in sec_copy.find_all("section", recursive=False):
+            child_sec.decompose()
+        # Remove the heading itself
+        for h in sec_copy.find_all(["h2", "h3", "h4"], class_=re.compile(r"\bltx_title\b"), recursive=False):
+            h.decompose()
+
+        text = _elem_to_text(sec_copy)
+
+        if not text.strip():
+            continue
+
+        sections.append({"name": heading_text, "level": level, "text": text})
+
+    return sections
+
+
+# ── Section name matching ─────────────────────────────────────────────────────
+
+def _match_section(sections: list[dict], query: str) -> dict | None:
+    """Case-insensitive fuzzy match that ignores leading section numbers."""
+    q = query.lower().strip()
+
+    def clean(name: str) -> str:
+        """Strip a leading number such as '1 ', '1. ', or '3.1 '."""
+        return re.sub(r"^\d+(?:\.\d+)*[\.\s]+", "", name).lower().strip()
+
+    # Exact match
+    for s in sections:
+        if s["name"].lower() == q or clean(s["name"]) == q:
+            return s
+
+    # Prefix / substring match
+    for s in sections:
+        if clean(s["name"]).startswith(q) or q in clean(s["name"]):
+            return s
+
+    return None
+
+
+# ── Public interface ──────────────────────────────────────────────────────────
+
+def cmd_list_sections(arxiv_id: str) -> dict[str, Any]:
+    """List all sections of a paper (without body text)."""
+    html = fetch_html(arxiv_id)
+    sections = extract_sections(html)
+    return {
+        "success": True,
+        "arxiv_id": arxiv_id,
+        "abs_url": f"{ABS_BASE}/{arxiv_id}",
+        "html_url": f"{HTML_BASE}/{arxiv_id}",
+        "pdf_url": f"{PDF_BASE}/{arxiv_id}",
+        "section_count": len(sections),
+        "sections": [{"name": s["name"], "level": s["level"]} for s in sections],
+        "error": None,
+    }
+
+
+def cmd_read_section(arxiv_id: str, section_name: str) -> dict[str, Any]:
+    """Read the body text of the given section."""
+    html = fetch_html(arxiv_id)
+    sections = extract_sections(html)
+    matched = _match_section(sections, section_name)
+
+    if matched is None:
+        available = [s["name"] for s in sections]
+        return {
+            "success": False,
+            "arxiv_id": arxiv_id,
+            "section": section_name,
+            "content": None,
+            "error": f"Section '{section_name}' not found; available sections: {available}",
+        }
+
+    return {
+        "success": True,
+        "arxiv_id": arxiv_id,
+        "abs_url": f"{ABS_BASE}/{arxiv_id}",
+        "section": matched["name"],
+        "level": matched["level"],
+        "content": matched["text"],
+        "char_count": len(matched["text"]),
+        "error": None,
+    }
+
+
+# ── CLI ───────────────────────────────────────────────────────────────────────
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="ArXiv paper section reader",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  python3 arxiv_paper.py 2409.05591                          list all sections
+  python3 arxiv_paper.py 2409.05591 --section introduction   read Introduction
+  python3 arxiv_paper.py 2409.05591 --section method         read Method/Methods
+  python3 arxiv_paper.py 2409.05591 --section conclusion     read Conclusion
+""",
+    )
+    parser.add_argument("arxiv_id", help="arXiv paper ID (e.g. 2409.05591 or 2409.05591v2)")
+    parser.add_argument(
+        "--section", "-s",
+        metavar="SECTION_NAME",
+        help="Section to read (case-insensitive, partial match supported). If omitted, all sections are listed.",
+    )
+    args = parser.parse_args()
+
+    try:
+        if args.section:
+            result = cmd_read_section(args.arxiv_id.strip(), args.section.strip())
+        else:
+            result = cmd_list_sections(args.arxiv_id.strip())
+        print_json(result)
+    except Exception as e:
+        print_json({
+            "success": False,
+            "arxiv_id": args.arxiv_id,
+            "error": str(e),
+        })
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
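
The two-pass lookup used by `_match_section` (exact match first, then substring match on the title with its number removed) can be tried in isolation. The following is a minimal sketch under stated assumptions: `match_section` and the `names` list are illustrative only, it works on plain title strings instead of section dicts, and its number-stripping pattern also handles multi-level numbers such as `3.1`, which is a choice made for this sketch.

```python
from __future__ import annotations

import re


def clean(name: str) -> str:
    """Lowercase a section title and drop a leading number such as '1 ' or '3.1 '."""
    return re.sub(r"^\d+(?:\.\d+)*[\.\s]+", "", name).lower().strip()


def match_section(names: list[str], query: str) -> str | None:
    """Return the first title matching the query: exact match first, then substring."""
    q = query.lower().strip()
    # Pass 1: exact match against the raw or the cleaned title.
    for name in names:
        if name.lower() == q or clean(name) == q:
            return name
    # Pass 2: substring match against the cleaned title.
    for name in names:
        if q in clean(name):
            return name
    return None


# Illustrative section titles (not taken from a real paper).
names = ["Abstract", "1 Introduction", "2 Related Work", "3 Method", "3.1 Training", "4 Conclusion"]

print(match_section(names, "method"))
print(match_section(names, "intro"))
print(match_section(names, "results"))
```

Running the exact pass before the substring pass means a query like `method` picks a section titled exactly "Method" even when an earlier title merely contains the word.
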
--- a/sn-search-academic/scripts/arxiv_search.py
+++ b/sn-search-academic/scripts/arxiv_search.py
@@ -0,0 +1,239 @@
+#!/usr/bin/env python3
+"""
+ArXiv paper search via the ArXiv API (which returns Atom XML).
+
+Supports:
+  - searching all fields, or the title / abstract / author fields
+  - category filtering and sorting
+  - fetching paper metadata directly by ID list
+  - boolean query combinations (AND / OR / ANDNOT)
+
+Examples:
+  python3 arxiv_search.py "attention mechanism"
+  python3 arxiv_search.py "transformer" --category cs.CL --sort date
+  python3 arxiv_search.py "diffusion model" --author "ho jonathan"
+  python3 arxiv_search.py "ViT" --title-only
+  python3 arxiv_search.py --id-list 2409.05591,2301.00001
+"""
+from __future__ import annotations
+
+import sys
+import xml.etree.ElementTree as ET
+
+from search_utils import build_parser, get_client, make_item, make_result, print_json
+
+API_URL = "https://export.arxiv.org/api/query"
+
+# Atom XML namespaces
+NS = {
+    "atom": "http://www.w3.org/2005/Atom",
+    "arxiv": "http://arxiv.org/schemas/atom",
+}
+
+
+def build_search_query(
+    query: str,
+    category: str | None = None,
+    author: str | None = None,
+    title_only: bool = False,
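+) -> str:
+    """
+    Build an arXiv query string.
+
+    Field prefixes:
+      all:  all fields (default)
+      ti:   title only
+      au:   author (wildcards supported, e.g. au:smi*)
+      abs:  abstract
+      cat:  category
+    Boolean operators must be uppercase: AND / OR / ANDNOT
+    """
+    # Main query field
+    field = "ti" if title_only else "all"
+    parts = [f"{field}:{query}"]
+
+    if author:
+        # Join multiple authors with OR; the "lastname firstname" format is supported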
+        author_terms = [f"au:{a.strip()}" for a in author.split(",") if a.strip()]
+        if author_terms:
+            parts.append(f"({' OR '.join(author_terms)})")
+
+    if category:
+        parts.append(f"cat:{category}")
+
+    return " AND ".join(parts)
+
+
+def fetch_by_ids(id_list: list[str], limit: int) -> list[dict]:
+    """Fetch paper metadata directly by ID list (no text search)."""
+    params = {
+        "id_list": ",".join(id_list[:limit]),
+        "max_results": min(len(id_list), limit, 100),
+    }
+    with get_client(timeout=30, headers={"Accept": "application/xml"}) as client:
+        resp = client.get(API_URL, params=params)
+        resp.raise_for_status()
+    return _parse_entries(ET.fromstring(resp.text), limit)
+
+
+def search(
+    query: str,
+    limit: int,
+    category: str | None = None,
+    sort_by: str = "relevance",
+    author: str | None = None,
+    title_only: bool = False,
+) -> list[dict]:
+    """Run an ArXiv keyword search."""
+    search_query = build_search_query(query, category, author, title_only)
+
+    sort_map = {
+        "relevance": "relevance",
+        "date": "lastUpdatedDate",
+        "submitted": "submittedDate",
+    }
+
+    params = {
+        "search_query": search_query,
+        "start": 0,
+        "max_results": min(limit, 100),
+        "sortBy": sort_map.get(sort_by, "relevance"),
+        "sortOrder": "descending",
+    }
+
+    with get_client(timeout=30, headers={"Accept": "application/xml"}) as client:
+        resp = client.get(API_URL, params=params)
+        resp.raise_for_status()
+
+    return _parse_entries(ET.fromstring(resp.text), limit)
+
+
+def _parse_entries(root: ET.Element, limit: int) -> list[dict]:
+    """Parse paper entries from Atom XML."""
+    items = []
+
+    for entry in root.findall("atom:entry", NS)[:limit]:
+        title = _text(entry, "atom:title").replace("\n", " ").strip()
+        summary = _text(entry, "atom:summary").replace("\n", " ").strip()
+        published = _text(entry, "atom:published")
+        updated = _text(entry, "atom:updated")
+
+        # Get the paper link (prefer the abs page)
+        url = ""
+        pdf_url = ""
+        for link in entry.findall("atom:link", NS):
+            href = link.get("href", "")
+            if link.get("title") == "pdf":
+                pdf_url = href
+            elif link.get("type") == "text/html" or "/abs/" in href:
+                url = href
+        if not url:
+            url = _text(entry, "atom:id")
+
+        # Extract arxiv_id from the abs URL or the id
+        arxiv_id = ""
+        raw_id = _text(entry, "atom:id")
+        if "/abs/" in raw_id:
+            arxiv_id = raw_id.split("/abs/")[-1]
+        elif raw_id.startswith("http"):
+            arxiv_id = raw_id.split("/")[-1]
+
+        # Authors
+        authors = [_text(a, "atom:name") for a in entry.findall("atom:author", NS)]
+
+        # Categories
+        categories = [c.get("term", "") for c in entry.findall("atom:category", NS)]
+
+        comment = _text(entry, "arxiv:comment")
+        journal_ref = _text(entry, "arxiv:journal_ref")
+        doi = _text(entry, "arxiv:doi")
+        primary_category = entry.find("arxiv:primary_category", NS)
+        primary_cat = primary_category.get("term", "") if primary_category is not None else ""
+
+        # HTML version link (exists for newer papers)
+        html_url = f"https://arxiv.org/html/{arxiv_id}" if arxiv_id else None
+
+        items.append(make_item(
+            title=title,
+            url=url,
+            snippet=summary,
+            arxiv_id=arxiv_id if arxiv_id else None,
+            authors=authors,
+            published=published,
+            updated=updated,
+            pdf_url=pdf_url,
+            html_url=html_url,
+            categories=categories,
+            primary_category=primary_cat if primary_cat else None,
+            comment=comment if comment else None,
+            journal_ref=journal_ref if journal_ref else None,
+            doi=doi if doi else None,
+        ))
+
+    return items
+
+
+def _text(elem: ET.Element, tag: str) -> str:
+    """Safely get a child element's text."""
+    child = elem.find(tag, NS)
+    return child.text.strip() if child is not None and child.text else ""
+
+
+def main():
+    parser = build_parser("Search ArXiv academic papers")
+    parser.add_argument("--category", "-c", help="ArXiv category filter (e.g. cs.AI, cs.CL, math.CO)")
+    parser.add_argument(
+        "--sort", default="relevance",
+        choices=["relevance", "date", "submitted"],
+        help="Sort order (default: relevance)",
+    )
+    parser.add_argument(
+        "--author", "-a",
+        help="Filter by author (e.g. 'hinton'; separate multiple authors with commas)",
+    )
+    parser.add_argument(
+        "--title-only", action="store_true",
+        help="Search titles only (default: all fields)",
+    )
+    parser.add_argument(
+        "--id-list",
+        help="Fetch metadata directly by arXiv ID, comma-separated (e.g. 2409.05591,2301.00001). The query argument may be left empty when this is set.",
+    )
+    # query is optional when --id-list is used
+    parser.prog = "arxiv_search.py"
+
+    # To let query be omitted with --id-list, make the positional argument optional
+    for action in parser._positionals._group_actions:
+        if action.dest == "query":
+            action.nargs = "?"
+            action.default = ""
+            break
+
+    args = parser.parse_args()
+
+    try:
+        if args.id_list:
+            id_list = [i.strip() for i in args.id_list.split(",") if i.strip()]
+            items = fetch_by_ids(id_list, args.limit)
+            query_str = f"id_list:{args.id_list}"
+        else:
+            if not args.query:
+                parser.error("Provide search keywords, or use --id-list to look up by ID")
+            items = search(
+                args.query,
+                args.limit,
+                category=args.category,
+                sort_by=args.sort,
+                author=args.author,
+                title_only=args.title_only,
+            )
+            query_str = args.query
+
+        print_json(make_result(True, query_str, "arxiv", items))
+    except Exception as e:
+        print_json(make_result(False, getattr(args, "query", "") or "", "arxiv", [], str(e)))
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
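
How the query string and the `--sort` choice end up in the request can be checked without touching the network. This is a minimal sketch using only the standard library: `build_url` is a hypothetical helper (the script itself hands a params dict to its HTTP client), while the endpoint, parameter names, and sort mapping are copied from `arxiv_search.py` above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_URL = "https://export.arxiv.org/api/query"

# CLI sort choice -> arXiv API sortBy value, as in arxiv_search.py.
SORT_MAP = {
    "relevance": "relevance",
    "date": "lastUpdatedDate",
    "submitted": "submittedDate",
}


def build_url(search_query: str, limit: int = 10, sort_by: str = "relevance") -> str:
    """Assemble the full request URL for a query (hypothetical helper)."""
    params = {
        "search_query": search_query,
        "start": 0,
        "max_results": min(limit, 100),  # same cap as the script
        "sortBy": SORT_MAP.get(sort_by, "relevance"),
        "sortOrder": "descending",
    }
    return f"{API_URL}?{urlencode(params)}"


# The string build_search_query() produces for
# "diffusion model" with --author "ho jonathan" and --category cs.CV.
query = "all:diffusion model AND (au:ho jonathan) AND cat:cs.CV"
url = build_url(query, limit=5, sort_by="date")

# Decode the URL again to confirm the parameters survive encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded["search_query"][0])
print(decoded["sortBy"][0], decoded["max_results"][0])
```

Decoding the URL back with `parse_qs` is a cheap way to confirm that spaces, colons, and parentheses in the query round-trip unchanged.
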
--- a/sn-search-academic/scripts/pmc_paper.py
+++ b/sn-search-academic/scripts/pmc_paper.py
@@ -0,0 +1,454 @@
+#!/usr/bin/env python3
+"""
+PMC full-text section reader.
+
+Fetches PubMed Central full-text XML (JATS format) through NCBI E-utilities to:
+  - list the paper's section structure (including subsection levels)
+  - extract body text by section name (case-insensitive, partial match supported)
+  - resolve a PMID to a PMC ID automatically
+
+Usage:
+  python3 pmc_paper.py PMC11119143                          # list sections
+  python3 pmc_paper.py 11119143                             # same (the PMC prefix is optional)
+  python3 pmc_paper.py PMC11119143 --section introduction   # read one section
+  python3 pmc_paper.py --pmid 38786024 --section method     # start from a PMID
+"""
+from __future__ import annotations
+
+import argparse
+import re
+import sys
+import xml.etree.ElementTree as ET
+from typing import Any
+
+from search_utils import get_client, print_json
+
+EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
+ELINK_URL  = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"
+
+# ── ID handling ───────────────────────────────────────────────────────────────
+
+def normalize_pmc_id(raw: str) -> str:
+    """Normalize a PMC ID: strip the 'PMC' prefix and keep only the numeric part."""
+    return re.sub(r"^[Pp][Mm][Cc]", "", raw.strip())
+
+
+def pmid_to_pmc(pmid: str, api_key: str | None = None) -> str | None:
+    """Convert a PMID to a PMC ID (numeric form) via elink."""
+    params: dict[str, Any] = {
+        "dbfrom": "pubmed",
+        "db": "pmc",
+        "id": pmid,
+        "retmode": "json",
+    }
+    if api_key:
+        params["api_key"] = api_key
+
+    with get_client(timeout=20) as client:
+        resp = client.get(ELINK_URL, params=params)
+        resp.raise_for_status()
+
+    data = resp.json()
+    for linkset in data.get("linksets", []):
+        for db in linkset.get("linksetdbs", []):
+            if db.get("dbto") == "pmc" and db.get("linkname") == "pubmed_pmc":
+                links = db.get("links", [])
+                if links:
+                    return str(links[0])
+    return None
+
+
+# ── XML fetching ──────────────────────────────────────────────────────────────
+
+def fetch_pmc_xml(pmc_num: str, api_key: str | None = None) -> ET.Element:
+    """Fetch the PMC full-text XML and return the root element."""
+    params: dict[str, Any] = {
+        "db": "pmc",
+        "id": pmc_num,
+        "rettype": "xml",
+        "retmode": "xml",
+    }
+    if api_key:
+        params["api_key"] = api_key
+
+    with get_client(timeout=45) as client:
+        resp = client.get(EFETCH_URL, params=params)
+        resp.raise_for_status()
+
+    root = ET.fromstring(resp.text)
+
+    # Check that an article was found
+    article = root.find(".//article")
+    if article is None:
+        raise ValueError(
+            f"No full text found for PMC{pmc_num}. "
+            "Possible causes: the paper is not in the PMC open-access collection, or the ID is wrong."
+        )
+    return root
+
+
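+# ── JATS XML text extraction ──────────────────────────────────────────────────
+
+# Skip the entire content of these tags (noise nodes)
+_SKIP_TAGS = {"ref", "ref-list", "fn", "fn-group", "permissions", "author-notes",
+              "glossary", "ack"}  # ack = Acknowledgements; keep it if needed
+
+# Tags replaced with a placeholder. Tags are compared after the namespace is stripped,
+# so MathML's <mml:math> shows up here as "math".
+_FORMULA_TAGS = {"disp-formula", "inline-formula", "math", "mml:math", "tex-math"}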
+
+
+def _elem_to_text(elem: ET.Element, depth: int = 0) -> str:
+    """
+    Recursively convert a JATS XML element to readable text.
+
+    Rules:
+    - <p>: paragraph, followed by a blank line
+    - <title>: skipped (section titles are handled by the caller)
+    - <sec>: subsection, handled recursively (indentation marks depth)
+    - <list>/<list-item>: rendered as a bullet list
+    - <disp-formula>/<inline-formula>: replaced with [FORMULA]
+    - <fig>: image content skipped, caption kept
+    - <table-wrap>: label and caption kept
+    - <xref>/<ext-link>: text content only
+    - <bold>/<italic>/<underline>: text content only
+    """
+    tag = elem.tag.split("}")[-1] if "}" in elem.tag else elem.tag  # strip namespace
+
+    if tag in _SKIP_TAGS:
+        return ""
+
+    if tag in _FORMULA_TAGS:
+        return " [FORMULA] "
+
+    if tag == "title":
+        return ""  # handled by the caller
+
+    if tag == "p":
+        text = _collect_text(elem)
+        return text.strip() + "\n\n" if text.strip() else ""
+
+    if tag in ("bold", "italic", "underline", "named-content", "styled-content",
+               "ext-link", "uri", "xref", "sup", "sub", "monospace"):
+        return _collect_text(elem)
+
+    if tag == "list":
+        parts = []
+        for li in elem.findall("list-item"):
+            item_text = "".join(_elem_to_text(c) for c in li).strip()
+            if item_text:
+                parts.append(f"• {item_text}")
+        return "\n".join(parts) + "\n\n" if parts else ""
+
+    if tag == "disp-quote":
+        text = "".join(_elem_to_text(c) for c in elem).strip()
+        return f"> {text}\n\n" if text else ""
+
+    if tag == "fig":
+        # Keep only the caption
+        caption = elem.find(".//caption")
+        if caption is not None:
+            cap_text = "".join(_elem_to_text(c) for c in caption).strip()
+            label = elem.findtext("label", "Figure")
+            return f"[{label}: {cap_text}]\n\n" if cap_text else ""
+        return ""
+
+    if tag == "table-wrap":
+        label = elem.findtext("label", "Table")
+        caption = elem.find(".//caption")
+        cap_text = ""
+        if caption is not None:
+            cap_text = "".join(_elem_to_text(c) for c in caption).strip()
+        return f"[{label}: {cap_text}]\n\n" if cap_text else f"[{label}]\n\n"
+
+    if tag == "sec":
+        # Subsection: recurse, with the title indented
+        sub_title_elem = elem.find("title")
+        sub_title = ""
+        if sub_title_elem is not None:
+            sub_title = _collect_text(sub_title_elem).strip()
+
+        parts = []
+        if sub_title:
+            indent = "  " * depth
+            parts.append(f"\n{indent}### {sub_title}\n\n")
+        for child in elem:
+            child_tag = child.tag.split("}")[-1] if "}" in child.tag else child.tag
+            if child_tag == "title":
+                continue
+            parts.append(_elem_to_text(child, depth + 1))
+        return "".join(parts)
+
+    # Default: recurse into child elements
+    return "".join(_elem_to_text(c, depth) for c in elem)
+
+
+def _collect_text(elem: ET.Element) -> str:
+    """Collect all text of an element, descendants included; formulas become [FORMULA]."""
+    parts = []
+    if elem.text:
+        parts.append(elem.text)
+    for child in elem:
+        child_tag = child.tag.split("}")[-1] if "}" in child.tag else child.tag
+        if child_tag in _FORMULA_TAGS:
+            parts.append("[FORMULA]")
+        elif child_tag in _SKIP_TAGS:
+            pass
+        else:
+            parts.append(_collect_text(child))
+        if child.tail:
+            parts.append(child.tail)
+    return "".join(parts)
+
+
+# ── Section extraction ────────────────────────────────────────────────────────
+
+def _extract_sections_from(container: ET.Element, level: int = 1) -> list[dict[str, Any]]:
+    """Recursively extract sec nodes; each returned entry nests its own subsections."""
+    sections: list[dict[str, Any]] = []
+    for sec in container.findall("sec"):
+        title_elem = sec.find("title")
+        title = _collect_text(title_elem).strip() if title_elem is not None else f"Section {len(sections)+1}"
+
+        # Body text: direct children of this sec, excluding nested sec and title
+        text_parts = []
+        for child in sec:
+            child_tag = child.tag.split("}")[-1] if "}" in child.tag else child.tag
+            if child_tag in ("title", "sec"):
+                continue
+            text_parts.append(_elem_to_text(child))
+
+        text = "".join(text_parts).strip()
+
+        # Recurse into subsections
+        subsections = _extract_sections_from(sec, level + 1)
+
+        sections.append({
+            "name": title,
+            "level": level,
+            "text": text,
+            "subsections": subsections,
+        })
+    return sections
+
+
+def extract_all_sections(root: ET.Element) -> list[dict[str, Any]]:
+    """
+    Extract all sections from PMC JATS XML.
+    Order: Abstract → body sections (including subsections).
+    """
+    sections: list[dict[str, Any]] = []
+
+    article = root.find(".//article")
+    if article is None:
+        return sections
+
+    # ── Abstract ──
+    abstract = article.find(".//abstract")
+    if abstract is not None:
+        # Structured abstract (contains sec elements)
+        if abstract.findall("sec"):
+            abs_parts = []
+            for sec in abstract.findall("sec"):
+                sec_title = sec.findtext("title", "")
+                sec_text_parts = []
+                for child in sec:
+                    if child.tag != "title":
+                        sec_text_parts.append(_elem_to_text(child))
+                part = "".join(sec_text_parts).strip()
+                if sec_title:
+                    abs_parts.append(f"{sec_title}: {part}")
+                else:
+                    abs_parts.append(part)
+            abs_text = "\n\n".join(abs_parts)
+        else:
+            abs_text = "".join(_elem_to_text(c) for c in abstract).strip()
+
+        if abs_text:
+            sections.append({"name": "Abstract", "level": 0, "text": abs_text, "subsections": []})
+
+    # ── Body ──
+    body = article.find(".//body")
+    if body is not None:
+        sections.extend(_extract_sections_from(body, level=1))
+
+    return sections
+
+
+# ── Section matching ──────────────────────────────────────────────────────────
+
+def _flatten_sections(sections: list[dict], result: list | None = None) -> list[dict]:
+    """Flatten nested sections (pre-order) to make searching easier."""
+    if result is None:
+        result = []
+    for s in sections:
+        result.append(s)
+        _flatten_sections(s.get("subsections", []), result)
+    return result
+
+
+def match_section(sections: list[dict], query: str) -> dict | None:
+    """Case-insensitive fuzzy match that ignores numeric prefixes (searches all levels)."""
+    q = query.lower().strip()
+    flat = _flatten_sections(sections)
+
+    def clean(name: str) -> str:
+        return re.sub(r"^\d+(?:\.\d+)*[\.\s]+", "", name).lower().strip()
+
+    # Exact match
+    for s in flat:
+        if s["name"].lower() == q or clean(s["name"]) == q:
+            return s
+
+    # Substring match (this also covers prefix matches); an empty query matches nothing
+    for s in flat:
+        c = clean(s["name"])
+        if q and q in c:
+            return s
+
+    return None
+
+
+# ── Public interface ──────────────────────────────────────────────────────────
+
+def _section_outline(sections: list[dict], depth: int = 0) -> list[dict]:
+    """Build the section outline recursively (name and level only)."""
+    outline = []
+    for s in sections:
+        outline.append({"name": s["name"], "level": s["level"]})
+        if s.get("subsections"):
+            outline.extend(_section_outline(s["subsections"], depth + 1))
+    return outline
+
+
+def cmd_list_sections(pmc_num: str, api_key: str | None = None) -> dict[str, Any]:
+    """List the full section outline of a PMC paper."""
+    root = fetch_pmc_xml(pmc_num, api_key)
+    sections = extract_all_sections(root)
+
+    # Read the title and PMID from the XML
+    title = root.findtext(".//article-title", "")
+    pmid = root.findtext(".//article-id[@pub-id-type='pmid']", "")
+
+    return {
+        "success": True,
+        "pmc_id": f"PMC{pmc_num}",
+        "pmid": pmid or None,
+        "title": title,
+        "pmc_url": f"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC{pmc_num}/",
+        "section_count": len(_flatten_sections(sections)),
+        "sections": _section_outline(sections),
+        "error": None,
+    }
+
+
+def cmd_read_section(pmc_num: str, section_name: str, api_key: str | None = None) -> dict[str, Any]:
+    """Read the body of the given section, including subsection text."""
+    root = fetch_pmc_xml(pmc_num, api_key)
+    sections = extract_all_sections(root)
+    matched = match_section(sections, section_name)
+
+    if matched is None:
+        flat = _flatten_sections(sections)
+        available = [s["name"] for s in flat]
+        return {
+            "success": False,
+            "pmc_id": f"PMC{pmc_num}",
+            "section": section_name,
+            "content": None,
+            "error": f"Section '{section_name}' not found; available sections: {available}",
+        }
+
+    # Merge this section's text with its subsections' text
+    def collect_text(s: dict) -> str:
+        parts = [s["text"]]
+        for sub in s.get("subsections", []):
+            sub_text = collect_text(sub)
+            if sub_text.strip():
+                parts.append(f"\n### {sub['name']}\n\n{sub_text}")
+        return "\n\n".join(p for p in parts if p.strip())
+
+    content = collect_text(matched)
+
+    return {
+        "success": True,
+        "pmc_id": f"PMC{pmc_num}",
+        "pmc_url": f"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC{pmc_num}/",
+        "section": matched["name"],
+        "level": matched["level"],
+        "content": content,
+        "char_count": len(content),
+        "error": None,
+    }
+
+
+# ── CLI ───────────────────────────────────────────────────────────────────────
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Section-level full-text reader for PMC papers",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  python3 pmc_paper.py PMC11119143                           list all sections
+  python3 pmc_paper.py 11119143                              same (prefix added automatically)
+  python3 pmc_paper.py PMC11119143 --section introduction    read Introduction
+  python3 pmc_paper.py PMC11119143 --section method          read Methods
+  python3 pmc_paper.py --pmid 38786024                       list sections from a PMID
+  python3 pmc_paper.py --pmid 38786024 --section conclusion  read a section from a PMID
+""",
+    )
+    parser.add_argument(
+        "pmc_id", nargs="?",
+        help="PMC ID (e.g. PMC11119143 or 11119143). Provide either this or --pmid.",
+    )
+    parser.add_argument(
+        "--pmid",
+        help="PubMed ID, converted to a PMC ID automatically (the paper must be in the PMC open-access set)",
+    )
+    parser.add_argument(
+        "--section", "-s",
+        metavar="SECTION_NAME",
+        help="Section to read (case-insensitive, partial match supported). Lists all sections if omitted.",
+    )
+    parser.add_argument(
+        "--api-key",
+        help="NCBI API key (optional; raises the rate limit from 3 req/s to 10 req/s)",
+    )
+    args = parser.parse_args()
+
+    api_key = getattr(args, "api_key", None)
+
+    try:
+        # Resolve the numeric PMC ID
+        if args.pmid:
+            pmc_num = pmid_to_pmc(args.pmid, api_key)
+            if not pmc_num:
+                print_json({
+                    "success": False,
+                    "pmid": args.pmid,
+                    "error": f"PMID {args.pmid} has no full text in PMC. The paper may not be open access.",
+                })
+                sys.exit(1)
+        elif args.pmc_id:
+            pmc_num = normalize_pmc_id(args.pmc_id)
+        else:
+            parser.error("Provide a PMC ID, or use --pmid to specify a PubMed ID")
+
+        if args.section:
+            result = cmd_read_section(pmc_num, args.section.strip(), api_key)
+        else:
+            result = cmd_list_sections(pmc_num, api_key)
+
+        print_json(result)
+
+    except Exception as e:
+        print_json({
+            "success": False,
+            "pmc_id": f"PMC{pmc_num}" if "pmc_num" in locals() else None,
+            "error": str(e),
+        })
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
--- a/sn-search-academic/scripts/pubmed_search.py
+++ b/sn-search-academic/scripts/pubmed_search.py
@@ -0,0 +1,165 @@
+#!/usr/bin/env python3
+"""PubMed biomedical literature search via the NCBI E-utilities API."""
+from __future__ import annotations
+
+import sys
+import xml.etree.ElementTree as ET
+
+from search_utils import build_parser, get_client, make_item, make_result, print_json
+
+ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
+EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
+
+
+def search(query: str, limit: int, api_key: str | None = None) -> list[dict]:
+    """Run a PubMed search in two steps: esearch for PMIDs, then efetch for full records with abstracts."""
+    base_params: dict = {"api_key": api_key} if api_key else {}
+
+    # Step 1: esearch returns the list of PMIDs
+    with get_client(timeout=30) as client:
+        resp = client.get(ESEARCH_URL, params={
+            **base_params,
+            "db": "pubmed",
+            "term": query,
+            "retmax": min(limit, 100),
+            "retmode": "json",
+            "sort": "relevance",
+        })
+        resp.raise_for_status()
+        pmids = resp.json().get("esearchresult", {}).get("idlist", [])
+
+    if not pmids:
+        return []
+
+    # Step 2: efetch returns the full XML records (including abstracts)
+    with get_client(timeout=30) as client:
+        resp = client.get(EFETCH_URL, params={
+            **base_params,
+            "db": "pubmed",
+            "id": ",".join(pmids[:limit]),
+            "rettype": "xml",
+            "retmode": "xml",
+        })
+        resp.raise_for_status()
+
+    root = ET.fromstring(resp.text)
+    items = []
+
+    for article in root.findall(".//PubmedArticle"):
+        medline = article.find("MedlineCitation")
+        if medline is None:
+            continue
+
+        pmid_elem = medline.find("PMID")
+        pmid = pmid_elem.text if pmid_elem is not None else ""
+
+        article_data = medline.find("Article")
+        if article_data is None:
+            continue
+
+        # Title
+        title_elem = article_data.find("ArticleTitle")
+        title = "".join(title_elem.itertext()) if title_elem is not None else ""
+
+        # Abstract (supports structured abstracts such as BACKGROUND/METHODS/RESULTS/CONCLUSIONS)
+        abstract_parts = []
+        abstract_elem = article_data.find("Abstract")
+        if abstract_elem is not None:
+            for ab in abstract_elem.findall("AbstractText"):
+                label = ab.get("Label")
+                text = "".join(ab.itertext()).strip()
+                if label:
+                    abstract_parts.append(f"{label}: {text}")
+                else:
+                    abstract_parts.append(text)
+        abstract = " ".join(abstract_parts)
+
+        # Authors
+        authors = []
+        author_list = article_data.find("AuthorList")
+        if author_list is not None:
+            for author in author_list.findall("Author"):
+                last = author.findtext("LastName", "")
+                fore = author.findtext("ForeName", "")
+                name = f"{fore} {last}".strip() if fore else last
+                if name:
+                    authors.append(name)
+
+        # Journal info
+        journal = article_data.find("Journal")
+        journal_name = ""
+        pub_date = ""
+        volume = ""
+        issue = ""
+        if journal is not None:
+            journal_name = journal.findtext("Title", "") or journal.findtext("ISOAbbreviation", "")
+            ji = journal.find("JournalIssue")
+            if ji is not None:
+                volume = ji.findtext("Volume", "")
+                issue = ji.findtext("Issue", "")
+                pd = ji.find("PubDate")
+                if pd is not None:
+                    year = pd.findtext("Year", "")
+                    month = pd.findtext("Month", "")
+                    day = pd.findtext("Day", "")
+                    pub_date = " ".join(filter(None, [year, month, day]))
+
+        # Pages
+        pages = article_data.findtext(".//MedlinePgn", "")
+
+        # DOI and PMC ID: read only the article's own ArticleIdList, not the IDs of cited references
+        doi = None
+        pmc_id = None
+        for id_elem in article.findall("PubmedData/ArticleIdList/ArticleId"):
+            id_type = id_elem.get("IdType", "")
+            if id_type == "doi":
+                doi = id_elem.text
+            elif id_type == "pmc" and id_elem.text:
+                # Normalize: strip the "PMC" prefix, keeping only the digits
+                pmc_id = id_elem.text.lstrip("PMCpmc").strip() or id_elem.text
+
+        # Keywords from KeywordList (typically author-supplied; MeSH headings are not parsed here)
+        keywords = [kw.text for kw in medline.findall(".//Keyword") if kw.text]
+
+        # Publication types
+        pub_types = [pt.text for pt in article_data.findall(".//PublicationType") if pt.text]
+
+        url = f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"
+        pmc_url = f"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC{pmc_id}/" if pmc_id else None
+
+        items.append(make_item(
+            title=title,
+            url=url,
+            snippet=abstract,
+            authors=authors,
+            pmid=pmid,
+            pmc_id=f"PMC{pmc_id}" if pmc_id else None,
+            pmc_url=pmc_url,
+            journal=journal_name if journal_name else None,
+            pub_date=pub_date if pub_date else None,
+            volume=volume if volume else None,
+            issue=issue if issue else None,
+            pages=pages if pages else None,
+            keywords=keywords if keywords else None,
+            pub_types=pub_types if pub_types else None,
+            doi=doi,
+        ))
+
+    return items
+
+
+def main():
+    parser = build_parser("Search PubMed biomedical literature")
+    parser.add_argument("--api-key", help="NCBI API key (optional; raises the rate limit from 3 req/s to 10 req/s)")
+    args = parser.parse_args()
+
+    try:
+        items = search(args.query, args.limit, getattr(args, "api_key", None))
+        print_json(make_result(True, args.query, "pubmed", items))
+    except Exception as e:
+        print_json(make_result(False, args.query, "pubmed", [], str(e)))
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
--- a/sn-search-academic/scripts/search_utils.py
+++ b/sn-search-academic/scripts/search_utils.py
@@ -0,0 +1,150 @@
+"""
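+Shared utility library for the search skill.
+
+Provides standard JSON output, CLI scaffolding, an httpx helper, and config reading.
+All search scripts import this module via sys.path.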
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import sys
+from typing import Any
+
+try:
+    import httpx
+except ImportError:
+    json.dump(
+        {
+            "success": False,
+            "error": "httpx is missing; run: python3 -m pip install -r skills/sn-search-academic/requirements.txt",
+        },
+        sys.stdout,
+        ensure_ascii=False,
+    )
+    sys.stdout.write("\n")
+    sys.exit(1)
+
+# ---------------------------------------------------------------------------
+# Standard output
+# ---------------------------------------------------------------------------
+
+def make_result(
+    success: bool,
+    query: str,
+    provider: str,
+    items: list[dict[str, Any]],
+    error: str | None = None,
+) -> dict[str, Any]:
+    """Build a standardized search result."""
+    return {
+        "success": success,
+        "query": query,
+        "provider": provider,
+        "items": items,
+        "error": error,
+    }
+
+
+def make_item(
+    title: str,
+    url: str,
+    snippet: str = "",
+    **extra: Any,
+) -> dict[str, Any]:
+    """Build a standardized search result item; empty extra fields are omitted."""
+    item: dict[str, Any] = {"title": title, "url": url, "snippet": snippet}
+    for k, v in extra.items():
+        if v not in (None, "", [], {}):
+            item[k] = v
+    return item
+
+
+def print_json(data: dict[str, Any]) -> None:
+    """Write the result as JSON to stdout."""
+    json.dump(data, sys.stdout, ensure_ascii=False, indent=2)
+    sys.stdout.write("\n")
+    sys.stdout.flush()
+
+
+# ---------------------------------------------------------------------------
+# CLI scaffolding
+# ---------------------------------------------------------------------------
+
+def build_parser(description: str) -> argparse.ArgumentParser:
+    """Create an ArgumentParser with the common arguments."""
+    parser = argparse.ArgumentParser(description=description)
+    parser.add_argument("query", help="Search keywords")
+    parser.add_argument("--limit", "-n", type=int, default=10, help="Number of results to return (default 10)")
+    return parser
+
+
+# ---------------------------------------------------------------------------
+# httpx helper
+# ---------------------------------------------------------------------------
+
+_DEFAULT_TIMEOUT = 15
+_DEFAULT_UA = (
+    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
+    "AppleWebKit/537.36 (KHTML, like Gecko) "
+    "Chrome/125.0.0.0 Safari/537.36"
+)
+
+
+def get_client(
+    timeout: int = _DEFAULT_TIMEOUT,
+    headers: dict[str, str] | None = None,
+    **kwargs: Any,
+) -> httpx.Client:
+    """Return a preconfigured httpx.Client."""
+    default_headers = {
+        "User-Agent": _DEFAULT_UA,
+        "Accept": "application/json",
+    }
+    if headers:
+        default_headers.update(headers)
+    return httpx.Client(
+        timeout=timeout,
+        headers=default_headers,
+        follow_redirects=True,
+        **kwargs,
+    )
+
+
+# ---------------------------------------------------------------------------
+# Config reading
+# ---------------------------------------------------------------------------
+
+def get_key(env_var: str, cli_arg: str | None = None) -> str | None:
+    """Read an API key; the CLI argument takes precedence over the environment variable."""
+    if cli_arg:
+        return cli_arg
+    return os.environ.get(env_var)
+
+
+# ---------------------------------------------------------------------------
+# Script entry-point helper
+# ---------------------------------------------------------------------------
+
+def run_search(
+    provider: str,
+    search_fn,  # Callable[[str, int, ...], list[dict]]
+    parser: argparse.ArgumentParser | None = None,
+    extra_kwargs_fn=None,  # Callable[[Namespace], dict]; extracts extra kwargs from args
+) -> None:
+    """Generic script entry point: parse arguments → run the search → print JSON."""
+    if parser is None:
+        parser = build_parser(f"Search {provider}")
+    args = parser.parse_args()
+
+    extra = {}
+    if extra_kwargs_fn:
+        extra = extra_kwargs_fn(args)
+
+    try:
+        items = search_fn(args.query, args.limit, **extra)
+        print_json(make_result(True, args.query, provider, items))
+    except Exception as e:
+        print_json(make_result(False, args.query, provider, [], str(e)))
+        sys.exit(1)
--- a/sn-search-academic/scripts/semantic_scholar_refs.py
+++ b/sn-search-academic/scripts/semantic_scholar_refs.py
@@ -0,0 +1,238 @@
+#!/usr/bin/env python3
+"""Semantic Scholar citation tracing: look up a paper's references (backward) and citing papers (forward)."""
+from __future__ import annotations
+
+import argparse
+import sys
+
+from search_utils import get_client, make_item, print_json
+
+API_BASE = "https://api.semanticscholar.org/graph/v1/paper"
+
+# paper-level fields (nested under citedPaper/citingPaper)
+# Note: tldr is not requested because it tends to trigger rate limits in nested requests
+PAPER_FIELDS = [
+    "title", "abstract", "year", "venue", "publicationDate",
+    "authors", "citationCount", "influentialCitationCount",
+    "isOpenAccess", "openAccessPdf", "externalIds", "fieldsOfStudy",
+]
+
+# edge-level fields (attributes of the citation relationship itself)
+EDGE_FIELDS = ["contexts", "intents"]
+
+
+def resolve_paper_id(identifier: str) -> str:
+    """Convert various paper identifiers into a format Semantic Scholar accepts.
+
+    Supported:
+      - Semantic Scholar paper ID (40-char hex)
+      - DOI: 10.xxxx/... → DOI:10.xxxx/...
+      - ArXiv ID: 2301.07041 → ARXIV:2301.07041
+      - PubMed ID: PMID:12345678
+      - URL: https://www.semanticscholar.org/paper/... → extract the ID
+    """
+    identifier = identifier.strip()
+
+    # S2 URL
+    if "semanticscholar.org/paper/" in identifier:
+        # The 40-char hex ID is the last path segment
+        parts = identifier.rstrip("/").split("/")
+        return parts[-1]
+
+    # DOI
+    if identifier.startswith("10."):
+        return f"DOI:{identifier}"
+    if identifier.lower().startswith("doi:"):
+        return f"DOI:{identifier[4:]}"  # normalize the prefix to upper case
+
+    # ArXiv
+    if identifier.lower().startswith("arxiv:"):
+        return f"ARXIV:{identifier[6:]}"  # upper-case only the prefix; keep a suffix such as v2 as written
+    # e.g. 2301.07041 or 2301.07041v2
+    if "." in identifier and identifier.replace(".", "").replace("v", "").isdigit():
+        return f"ARXIV:{identifier}"
+
+    # PMID
+    if identifier.lower().startswith("pmid:"):
+        return identifier.upper()
+
+    # Otherwise assume it is an S2 paper ID
+    return identifier
+
+
+def fetch_refs(
+    paper_id: str,
+    direction: str,
+    limit: int,
+    min_citations: int,
+    year_min: int | None,
+    year_max: int | None,
+    api_key: str | None = None,
+) -> dict:
+    """Fetch a paper's references or citations."""
+    resolved = resolve_paper_id(paper_id)
+    endpoint = f"{API_BASE}/{resolved}/{direction}"
+
+    headers: dict[str, str] = {}
+    if api_key:
+        headers["x-api-key"] = api_key
+
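+    # The S2 API returns at most 1000 entries per call; only this first page is fetched (no offset paging)
+    # S2 references/citations endpoints: paper fields take a nested prefix, edge fields are listed directly
+    # Format: fields=contexts,intents,citedPaper.title,citedPaper.year,...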
+    paper_key_prefix = "citedPaper" if direction == "references" else "citingPaper"
+    prefixed_fields = [f"{paper_key_prefix}.{f}" for f in PAPER_FIELDS]
+    all_fields = ",".join(EDGE_FIELDS + prefixed_fields)
+
+    params = {
+        "fields": all_fields,
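+        # The citations endpoint returns newest first, so fetch many entries to find highly cited papers
+        # References are usually few (a few dozen), so over-fetching is harmless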
+        "limit": 1000,
+    }
+
+    with get_client(timeout=30, headers=headers) as client:
+        resp = client.get(endpoint, params=params)
+        resp.raise_for_status()
+        data = resp.json()
+
+    # Fetch the source paper's own metadata (for context in the output)
+    paper_resp = None
+    with get_client(timeout=15, headers=headers) as client:
+        try:
+            r = client.get(f"{API_BASE}/{resolved}", params={"fields": "title,year,citationCount"})
+            r.raise_for_status()
+            paper_resp = r.json()
+        except Exception:
+            pass
+
+    # With direction=references the payload is {"data": [{"citedPaper": {...}, "contexts": [...], "intents": [...]}]}
+    # With direction=citations the payload is {"data": [{"citingPaper": {...}, "contexts": [...], "intents": [...]}]}
+    paper_key = "citedPaper" if direction == "references" else "citingPaper"
+
+    items = []
+    for entry in data.get("data", []):
+        paper = entry.get(paper_key, {})
+        if not paper or not paper.get("title"):
+            continue
+
+        year = paper.get("year")
+        citation_count = paper.get("citationCount") or 0
+
+        # Filters
+        if citation_count < min_citations:
+            continue
+        if year_min and year and year < year_min:
+            continue
+        if year_max and year and year > year_max:
+            continue
+
+        authors = [a.get("name", "") for a in paper.get("authors", [])]
+        external_ids = paper.get("externalIds") or {}
+        doi = external_ids.get("DOI")
+        arxiv_id = external_ids.get("ArXiv")
+        s2_id = paper.get("paperId", "")
+
+        url = f"https://www.semanticscholar.org/paper/{s2_id}" if s2_id else ""
+
+        abstract = paper.get("abstract") or ""
+        snippet = abstract
+
+        open_access_pdf = None
+        if paper.get("openAccessPdf"):
+            open_access_pdf = paper["openAccessPdf"].get("url")
+
+        # contexts: sentences in which the paper is cited (only meaningful for the citations direction)
+        contexts = entry.get("contexts") or []
+        intents = entry.get("intents") or []
+
+        item = make_item(
+            title=paper.get("title", ""),
+            url=url,
+            snippet=snippet,
+            authors=authors,
+            year=year,
+            venue=paper.get("venue") or None,
+            publication_date=paper.get("publicationDate"),
+            citation_count=citation_count,
+            influential_citation_count=paper.get("influentialCitationCount"),
+            is_open_access=paper.get("isOpenAccess"),
+            open_access_pdf=open_access_pdf,
+            fields_of_study=paper.get("fieldsOfStudy") or None,
+            doi=doi,
+            arxiv_id=arxiv_id,
+            paper_id=s2_id,
+            citation_contexts=contexts[:3] if contexts else None,  # at most 3 contexts
+            citation_intents=intents if intents else None,
+        )
+        items.append(item)
+
+    # Sort by citation count and keep the top N
+    items.sort(key=lambda x: x.get("citation_count", 0), reverse=True)
+    items = items[:limit]
+
+    result = {
+        "success": True,
+        "paper_id": resolved,
+        "direction": direction,
+        "provider": "semantic_scholar",
+        "items": items,
+        "total_available": len(data.get("data", [])),  # entries fetched before filtering (at most 1000)
+        "returned": len(items),
+        "error": None,
+    }
+    if paper_resp:
+        result["source_paper"] = {
+            "title": paper_resp.get("title"),
+            "year": paper_resp.get("year"),
+            "citation_count": paper_resp.get("citationCount"),
+        }
+
+    return result
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Look up a paper's references (backward) or citing papers (forward)"
+    )
+    parser.add_argument(
+        "paper_id",
+        help="Paper identifier: S2 ID, DOI (e.g. 10.1234/...), ArXiv ID (e.g. 2301.07041), PMID (e.g. PMID:12345678)",
+    )
+    parser.add_argument(
+        "direction",
+        choices=["references", "citations"],
+        help="references = cited works (backward), citations = citing papers (forward)",
+    )
+    parser.add_argument("--limit", "-n", type=int, default=20, help="Number of results to return (default 20)")
+    parser.add_argument("--min-citations", type=int, default=0, help="Minimum citation count filter (default 0)")
+    parser.add_argument("--year-min", type=int, default=None, help="Earliest year filter")
+    parser.add_argument("--year-max", type=int, default=None, help="Latest year filter")
+    parser.add_argument("--api-key", help="Semantic Scholar API key (optional)")
+    args = parser.parse_args()
+
+    try:
+        result = fetch_refs(
+            args.paper_id,
+            args.direction,
+            args.limit,
+            args.min_citations,
+            args.year_min,
+            args.year_max,
+            getattr(args, "api_key", None),
+        )
+        print_json(result)
+    except Exception as e:
+        print_json({
+            "success": False,
+            "paper_id": args.paper_id,
+            "direction": args.direction,
+            "provider": "semantic_scholar",
+            "items": [],
+            "error": str(e),
+        })
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
--- a/sn-search-academic/scripts/semantic_scholar_search.py
+++ b/sn-search-academic/scripts/semantic_scholar_search.py
@@ -0,0 +1,104 @@
+#!/usr/bin/env python3
+"""Semantic Scholar paper search via the Semantic Scholar Graph API."""
+from __future__ import annotations
+
+import sys
+
+from search_utils import build_parser, get_client, make_item, make_result, print_json
+
+API_URL = "https://api.semanticscholar.org/graph/v1/paper/search"
+
+FIELDS = ",".join([
+    "title", "abstract", "tldr", "year", "venue", "publicationVenue", "publicationDate",
+    "authors", "citationCount", "influentialCitationCount",
+    "referenceCount", "isOpenAccess", "openAccessPdf",
+    "externalIds", "fieldsOfStudy", "publicationTypes", "journal",
+])
+
+
+def search(query: str, limit: int, api_key: str | None = None) -> list[dict]:
+    """Run a Semantic Scholar search."""
+    headers: dict[str, str] = {}
+    if api_key:
+        headers["x-api-key"] = api_key
+
+    params = {
+        "query": query,
+        "limit": min(limit, 100),
+        "fields": FIELDS,
+    }
+
+    with get_client(timeout=30, headers=headers) as client:
+        resp = client.get(API_URL, params=params)
+        resp.raise_for_status()
+        data = resp.json()
+
+    items = []
+    for paper in data.get("data", [])[:limit]:
+        authors = [a.get("name", "") for a in paper.get("authors", [])]
+
+        open_access_pdf = None
+        if paper.get("openAccessPdf"):
+            open_access_pdf = paper["openAccessPdf"].get("url")
+
+        external_ids = paper.get("externalIds") or {}
+        doi = external_ids.get("DOI")
+        arxiv_id = external_ids.get("ArXiv")
+
+        paper_id = paper.get("paperId", "")
+        url = f"https://www.semanticscholar.org/paper/{paper_id}"
+
+        # Snippet: prefer abstract, falling back to tldr when it is missing
+        abstract = paper.get("abstract") or ""
+        tldr = (paper.get("tldr") or {}).get("text")
+        snippet = abstract or tldr or ""
+
+        # Journal/conference: venue (unnormalized string) + publicationVenue (structured)
+        venue = paper.get("venue") or (paper.get("journal") or {}).get("name")
+        pub_venue = paper.get("publicationVenue") or {}
+        publication_venue = {
+            k: pub_venue[k]
+            for k in ("id", "name", "type", "url")
+            if pub_venue.get(k)
+        } or None
+
+        items.append(make_item(
+            title=paper.get("title") or "",
+            url=url,
+            snippet=snippet,
+            tldr=tldr,
+            authors=authors,
+            year=paper.get("year"),
+            venue=venue if venue else None,
+            publication_venue=publication_venue,
+            publication_date=paper.get("publicationDate"),
+            citation_count=paper.get("citationCount"),
+            influential_citation_count=paper.get("influentialCitationCount"),
+            reference_count=paper.get("referenceCount"),
+            is_open_access=paper.get("isOpenAccess"),
+            open_access_pdf=open_access_pdf,
+            fields_of_study=paper.get("fieldsOfStudy") or None,
+            publication_types=paper.get("publicationTypes") or None,
+            doi=doi,
+            arxiv_id=arxiv_id,
+            paper_id=paper_id,
+        ))
+
+    return items
+
+
+def main():
+    parser = build_parser("Search Semantic Scholar academic papers")
+    parser.add_argument("--api-key", help="Semantic Scholar API key (optional; raises the rate limit)")
+    args = parser.parse_args()
+
+    try:
+        items = search(args.query, args.limit, getattr(args, "api_key", None))
+        print_json(make_result(True, args.query, "semantic_scholar", items))
+    except Exception as e:
+        print_json(make_result(False, args.query, "semantic_scholar", [], str(e)))
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
--- a/sn-search-academic/scripts/wikipedia_search.py
+++ b/sn-search-academic/scripts/wikipedia_search.py
@@ -0,0 +1,79 @@
+#!/usr/bin/env python3
+"""Wikipedia search via the MediaWiki API."""
+from __future__ import annotations
+
+import sys
+
+from search_utils import build_parser, get_client, make_item, make_result, print_json
+
+
+def _api_url(lang: str) -> str:
+    return f"https://{lang}.wikipedia.org/w/api.php"
+
+
+def search(query: str, limit: int, lang: str = "en") -> list[dict]:
+    """Run a Wikipedia search."""
+    params = {
+        "action": "query",
+        "list": "search",
+        "srsearch": query,
+        "srlimit": min(limit, 50),
+        "srprop": "snippet|timestamp|wordcount|size|sectiontitle|sectionsnippet",
+        "format": "json",
+        "utf8": 1,
+    }
+
+    with get_client() as client:
+        resp = client.get(_api_url(lang), params=params)
+        resp.raise_for_status()
+        data = resp.json()
+
+    items = []
+    for result in data.get("query", {}).get("search", [])[:limit]:
+        title = result.get("title", "")
+        # snippet is an HTML fragment; strip the tags naively
+        snippet = _strip_html(result.get("snippet", ""))
+        page_id = result.get("pageid", "")
+        url = f"https://{lang}.wikipedia.org/wiki/{title.replace(' ', '_')}"
+
+        section_title = result.get("sectiontitle", "")
+        section_snippet = _strip_html(result.get("sectionsnippet", ""))
+
+        items.append(make_item(
+            title=title,
+            url=url,
+            snippet=snippet,
+            word_count=result.get("wordcount"),
+            size=result.get("size"),
+            timestamp=result.get("timestamp"),
+            page_id=page_id,
+            section_title=section_title if section_title else None,
+            section_snippet=section_snippet if section_snippet else None,
+        ))
+
+    return items
+
+
+def _strip_html(html: str) -> str:
+    import re
+    text = re.sub(r"<[^>]+>", "", html)
+    text = re.sub(r"\s+", " ", text).strip()
+    return text
+
+
+def main():
+    parser = build_parser("Search Wikipedia articles")
+    parser.add_argument("--lang", "-l", default="en",
+                        help="Language edition (default en; e.g. zh, ja, de)")
+    args = parser.parse_args()
+
+    try:
+        items = search(args.query, args.limit, args.lang)
+        print_json(make_result(True, args.query, "wikipedia", items))
+    except Exception as e:
+        print_json(make_result(False, args.query, "wikipedia", [], str(e)))
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()