Phase2 complete ablation (added missing variants):
- Coverage-only: 20% contamination rate (confirms Gate is critical)
- Gate-only: +5.2 tokens vs Full (coverage optimization marginal on clean data)
- -Recency: 0 effect on clean data
- -IDF: 0 effect on clean data
Phase4 end-to-end quality evaluation:
- CGK vs Last-5 across 5 queries:
* CGK: 42.2 tok, purity=1.000, anchor_recall=0.638, term_cov=0.380, contamination=0
* Last-5: 67.6 tok, purity=0.280, anchor_recall=0.066, term_cov=0.080, contamination=5
- All quality metrics CGK >> Last-5 on synthetic clean data
Known honest limitations:
- Still no real dialogue data (synthetic 4-topic only)
- No real LLM calls (quality is rule-estimated)
- Parameter sensitivity only on clean data, not noisy real data