15 Commits

Author SHA1 Message Date
Elaina
97e1ddf138 complete: full ablation + Phase4 quality evaluation + honest blog post
Phase2 complete ablation (added missing variants):
- Coverage-only: 20% contamination rate (confirms Gate is critical)
- Gate-only: +5.2 tokens vs Full (coverage optimization marginal on clean data)
- -Recency: 0 effect on clean data
- -IDF: 0 effect on clean data

Phase4 end-to-end quality evaluation:
- CGK vs Last-5 across 5 queries:
  * CGK: 42.2 tok, purity=1.000, anchor_recall=0.638, term_cov=0.380, contamination=0
  * Last-5: 67.6 tok, purity=0.280, anchor_recall=0.066, term_cov=0.080, contamination=5
- All quality metrics CGK >> Last-5 on synthetic clean data

Known honest limitations:
- Still no real dialogue data (synthetic 4-topic only)
- No real LLM calls (quality is rule-estimated)
- Parameter sensitivity only on clean data, not noisy real data
2026-04-22 22:48:25 +08:00
Elaina
9e44748f91 fix: anchor stopwords - remove generic question patterns causing cross-topic contamination
- Add ANCHOR_STOPWORDS set in anchor.py (truly generic question patterns)
- Filter Chinese n-grams against stopwords in extract()
- Update sparse.py content_words extraction to use stopword-filtered query
- Diagnosis: 'Git rebase vs merge' query now correctly excludes Redis/asyncio blocks
- Phase1 results: Full CGK 42.6 tokens avg, 0% contamination (vs Last-5 67.6 tokens, 100%)
- Phase2 ablation: Gate-only accounts for most of the benefit
- Phase3 sensitivity: OVERLAP/NEW_RATIO thresholds insensitive on clean data;
  RECENT_WINDOW is the primary token budget control
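
The stopword filter described above could look roughly like the sketch below. It is an illustration, not the code in anchor.py: the stopword entries, the `chinese_ngrams` helper, and the `extract` signature are all assumptions, since the log names only `ANCHOR_STOPWORDS` and `extract()`.

```python
import re

# Hypothetical entries; the real contents of ANCHOR_STOPWORDS are not shown in the log.
ANCHOR_STOPWORDS = {"什么", "怎么", "区别", "有什么", "是什么"}

def chinese_ngrams(text, sizes=(2, 3)):
    """Yield every 2- and 3-character window over each run of CJK characters."""
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        for n in sizes:
            for i in range(len(run) - n + 1):
                yield run[i:i + n]

def extract(text):
    """Return anchors: Chinese n-grams minus stopwords, plus lowercased English words."""
    anchors = {g for g in chinese_ngrams(text) if g not in ANCHOR_STOPWORDS}
    anchors.update(w.lower() for w in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))
    return anchors

anchors = extract("Git rebase 和 merge 有什么区别")
```

In this sketch only exact stopword matches are dropped, so n-grams that straddle a stopword boundary still survive. Whether the real filter handles those is not stated in the log.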

Known honest limitations:
- Test set is clean 4-topic synthetic data (no real dirty dialogue)
- No strong baselines (BM25 ablation incomplete)
- No answer-level evaluation (only retrieval blocks measured)
- No parameter sensitivity on noisy real-world data
- Zero contamination on 5 queries is not generalizable
2026-04-22 22:30:18 +08:00
Elaina
2064eb7bdf docs: add DESIGN.md following Google Stitch spec 2026-04-22 19:33:01 +08:00
Elaina
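d18a521f9c fix: fix 4 high-priority issues found in review
1. sparse.py: topic-switch filtering now uses continue instead of assigning a score of 0, so old-topic candidates are actually excluded
2. gatekeeper.py: reset() clears the IDF cache to avoid state contamination in new sessions
3. gatekeeper.py: re-estimate the token count after sentence-level trimming
4. sparse.py: content_words extraction now includes all English words (including very short ones such as 'pg') and 2-character Chinese words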
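
Item 1 is easiest to see in code. The sketch below is hypothetical: the `recall` function, the dict-based blocks, and the `topic` field are assumptions made for illustration, not the actual sparse.py interface. It shows why `continue` excludes a block while a score of 0 only demotes it.

```python
def recall(candidates, query_terms, active_topic, topic_switched):
    """Score candidate blocks by term overlap, skipping old-topic blocks after a switch."""
    scored = []
    for block in candidates:
        if topic_switched and block["topic"] != active_topic:
            # Skip outright. Assigning a score of 0 here would still
            # leave the block in the candidate list for later stages.
            continue
        score = len(query_terms & block["terms"])
        scored.append((score, block["id"]))
    return sorted(scored, reverse=True)

blocks = [
    {"id": "b1", "topic": "redis", "terms": {"redis", "cache"}},
    {"id": "b2", "topic": "git", "terms": {"git", "rebase"}},
    {"id": "b3", "topic": "git", "terms": {"git", "merge", "rebase"}},
]
ranked = recall(blocks, {"git", "rebase", "merge"}, "git", topic_switched=True)
```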
2026-04-22 12:21:52 +08:00
Elaina
c828fceae9 chore: update README with complete algorithm and 100-round 4-topic results 2026-04-22 12:12:04 +08:00
Elaina
07b66d3b58 chore: update README with full algorithm, remove concrete hardware specs 2026-04-22 11:14:19 +08:00
Elaina
9a2b1e3b6a chore: remove paper, add summary, update README 2026-04-22 10:49:11 +08:00
Elaina
8852f1b1fb chore: remove paper (unfinished) 2026-04-22 10:43:50 +08:00
Elaina
a8204a50b5 docs: update README.md with algorithm details, limitations, and applicable scenarios 2026-04-22 09:49:17 +08:00
Elaina
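93156cf736 docs: fix inconsistencies between the paper and the docs
- recency: 'time decay' → 'freshness bonus (the newer, the larger)'
- Remove section 3.6 on sentence-level trimming (not implemented)
- Add the middle-band fallback rule (0.20 ≤ overlap ≤ 0.45 defaults to continue)
- Correct the MS MARCO author: Liu → Nguyen
- Mark the 10 ms latency as a theoretical estimate and remove unsupported figures
- Update the limitations description to match the implementation status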
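
The middle-band fallback rule can be sketched as a three-way decision. This is an assumption-laden illustration: the log gives only the 0.20 and 0.45 bounds and the default for the band between them. The overlap definition, the behaviour outside the band, and the `decide` function are guesses, and the new_ratio signal used by topic_gate.py is left out.

```python
SWITCH_BELOW = 0.20    # assumed: overlap under this means a topic switch
CONTINUE_ABOVE = 0.45  # assumed: overlap over this means the topic continues

def overlap(query_anchors, topic_anchors):
    """Fraction of query anchors already present in the active topic (assumed definition)."""
    if not query_anchors:
        return 0.0
    return len(query_anchors & topic_anchors) / len(query_anchors)

def decide(query_anchors, topic_anchors):
    """Return (decision, rule), where rule names the branch that fired."""
    ov = overlap(query_anchors, topic_anchors)
    if ov < SWITCH_BELOW:
        return "switch", "low-overlap"
    if ov > CONTINUE_ABOVE:
        return "continue", "high-overlap"
    # Middle band, 0.20 <= overlap <= 0.45: default to continuing.
    return "continue", "fallback"

topic = {"redis", "cache", "ttl"}
print(decide({"redis", "lock", "lua", "script"}, topic))  # overlap is 1/4
```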
2026-04-22 09:46:47 +08:00
Elaina
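224295ccaf fix: selector gain function now uses IDF weighting, consistent with the docs
- selector.select() accepts an idf_cache parameter
- gain = (Σ IDF(t) for t ∈ new_anchors) / cost^α (matches the documented formula)
- gatekeeper.select() passes anchor_extractor._idf_cache to the selector
- sparse.py: recency comment clarified as 'freshness bonus' rather than 'time decay'
- All tests pass (9/9)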
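
The gain formula above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the code in selector.py: the dict-based blocks, the `cost` field (token count), the `alpha` and `default_idf` defaults, and the stopping rule are hypothetical. Only the formula and the `idf_cache` parameter come from the commit message.

```python
def select(blocks, query_anchors, idf_cache, alpha=0.5, default_idf=1.0):
    """Greedily pick blocks by IDF-weighted new-anchor gain per cost**alpha."""
    covered = set()
    chosen = []
    remaining = list(blocks)

    def gain(block):
        # Only query anchors not yet covered by earlier picks count as new.
        new_anchors = (block["anchors"] & query_anchors) - covered
        weight = sum(idf_cache.get(t, default_idf) for t in new_anchors)
        return weight / (block["cost"] ** alpha)

    while remaining:
        best = max(remaining, key=gain)
        if gain(best) <= 0:
            break  # no remaining block adds a new anchor
        chosen.append(best["id"])
        covered |= best["anchors"] & query_anchors
        remaining.remove(best)
    return chosen

blocks = [
    {"id": "a", "anchors": {"redis", "cache"}, "cost": 4},
    {"id": "b", "anchors": {"redis"}, "cost": 1},
    {"id": "c", "anchors": {"asyncio"}, "cost": 9},
]
picked = select(blocks, {"redis", "cache"}, {"redis": 1.0, "cache": 3.0})
```

With these inputs block a has gain 4.0 / 2 and block b has gain 1.0 / 1, so a is picked first. After that b adds no new anchor and the loop stops.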
2026-04-22 09:45:30 +08:00
Elaina
7ced5d9a10 docs: add the paper "Context Gatekeeper" 2026-04-22 01:22:13 +08:00
Elaina
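64ca67c051 fix: _active_topic was not updated after a topic switch
Problem: _active_topic was updated only on continue; after a switch it kept the old value, which broke the overlap calculation.

Fix:
- select() now updates _active_topic on every call (whether or not a switch occurred)
- Also call topic_gate.update_active_topic() to keep the two copies of the state consistent

This also updates the active-topic state on the TopicGate instance, which resolves the problem of two independently maintained states.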
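
The fix can be sketched as follows. The class layout is a guess for illustration: representing a topic as a set of anchors, the 0.20 switch threshold, and the `_is_switch` helper are assumptions. Only `select()`, `_active_topic`, and `topic_gate.update_active_topic()` are named in the commit message.

```python
class TopicGate:
    def __init__(self):
        self.active_topic = set()

    def update_active_topic(self, anchors):
        self.active_topic = set(anchors)

class Gatekeeper:
    def __init__(self):
        self.topic_gate = TopicGate()
        self._active_topic = set()

    def _is_switch(self, query_anchors):
        # Assumed rule: low overlap with the active topic means a switch.
        if not query_anchors:
            return False
        shared = len(query_anchors & self._active_topic)
        return shared / len(query_anchors) < 0.20

    def select(self, query_anchors):
        switched = self._is_switch(query_anchors)
        # Update on every call. Before the fix only the continue branch
        # did this, so the old topic survived a switch.
        if switched:
            self._active_topic = set(query_anchors)
        else:
            self._active_topic = self._active_topic | query_anchors
        # Keep the TopicGate copy of the state in sync.
        self.topic_gate.update_active_topic(self._active_topic)
        return "switch" if switched else "continue"

gk = Gatekeeper()
gk.select({"redis", "cache"})
gk.select({"redis", "ttl"})
gk.select({"git", "rebase"})
```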
2026-04-22 01:14:13 +08:00
Elaina
bbaab47de4 docs: add SPEC.md specification document 2026-04-22 01:12:03 +08:00
Elaina
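071f9ef418 feat: initial implementation of the Context Gatekeeper
- anchor.py: anchor extraction (Chinese 2/3-grams, English words, code identifiers)
- block.py: dialogue block data structure
- topic_gate.py: topic gate (detects switches from overlap/new_ratio)
- sparse.py: sparse recall (BM25/IDF-overlap + exact-match bonus)
- selector.py: greedy minimum-cover selection
- gatekeeper.py: wrapper for the full pipeline
- tests/: unit tests + end-to-end tests (including MiniMax API verification)

Features:
- Pure Python, no additional model dependencies
- Runs in a 2-core, 2 GB environment
- Topic gate + sparse recall + minimum-cover selection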
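
One possible reading of the IDF-overlap recall with an exact-match bonus is sketched below. Everything here is an assumption made for illustration: the smoothed IDF variant, the `exact_bonus` weight, the case-sensitive substring test, and the function names. The actual scoring in sparse.py is not shown in the log.

```python
import math

def build_idf(blocks):
    """Smoothed IDF over block term sets (one common variant, assumed here)."""
    n = len(blocks)
    df = {}
    for block in blocks:
        for term in block["terms"]:
            df[term] = df.get(term, 0) + 1
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}

def sparse_recall(query, blocks, exact_bonus=0.5, top_k=3):
    """Rank blocks by summed IDF of shared terms, plus a bonus per verbatim token hit."""
    idf = build_idf(blocks)
    tokens = query.split()
    query_terms = {t.lower() for t in tokens}
    scored = []
    for block in blocks:
        shared = query_terms & block["terms"]
        if not shared:
            continue  # no lexical overlap: not a candidate
        score = sum(idf[t] for t in shared)
        # Hypothetical exact-match bonus: query tokens found verbatim in the text.
        score += exact_bonus * sum(1 for tok in tokens if tok in block["text"])
        scored.append((score, block["id"]))
    scored.sort(reverse=True)
    return [bid for _, bid in scored[:top_k]]

blocks = [
    {"id": "b1", "text": "Set a TTL on the Redis key", "terms": {"set", "ttl", "redis", "key"}},
    {"id": "b2", "text": "redis is an in-memory store", "terms": {"redis", "memory", "store"}},
    {"id": "b3", "text": "git rebase rewrites history", "terms": {"git", "rebase", "history"}},
]
ranked = sparse_recall("Redis TTL", blocks)
```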
2026-04-22 01:09:35 +08:00