Phase 2: complete ablation (added the missing variants)
- Coverage-only: 20% contamination rate (confirms the Gate is critical)
- Gate-only: +5.2 tokens vs. Full (coverage optimization is marginal on clean data)
- -Recency (Recency removed): no effect on clean data
- -IDF (IDF removed): no effect on clean data

Phase 4: end-to-end quality evaluation
- CGK vs. Last-5 across 5 queries:
  * CGK: 42.2 tok, purity=1.000, anchor_recall=0.638, term_cov=0.380, contamination=0
  * Last-5: 67.6 tok, purity=0.280, anchor_recall=0.066, term_cov=0.080, contamination=5
- CGK >> Last-5 on every quality metric (synthetic clean data)

Known limitations:
- Still no real dialogue data (synthetic 4-topic set only)
- No real LLM calls (quality is estimated by rules, not measured)
- Parameter sensitivity tested only on clean data, not on noisy real data
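To make "CGK >> Last-5" concrete, the sketch below derives relative deltas from the Phase 4 numbers exactly as reported in this message. The `results` dict and the `compare` helper are hypothetical illustrations, not part of the evaluation code, and contamination is left out because the two values are raw numbers rather than rates.

```python
# Phase 4 numbers copied from the report above (synthetic clean data, 5 queries).
results = {
    "CGK":    {"tokens": 42.2, "purity": 1.000, "anchor_recall": 0.638, "term_cov": 0.380, "contamination": 0},
    "Last-5": {"tokens": 67.6, "purity": 0.280, "anchor_recall": 0.066, "term_cov": 0.080, "contamination": 5},
}


def compare(system, baseline):
    """Hypothetical helper: relative deltas of `system` against `baseline`."""
    s, b = results[system], results[baseline]
    return {
        # Fraction of baseline tokens saved by the system.
        "token_reduction": (b["tokens"] - s["tokens"]) / b["tokens"],
        # Purity is already on a 0..1 scale, so report the absolute difference.
        "purity_gain": s["purity"] - b["purity"],
        # Recall and coverage are small for the baseline, so a ratio is more telling.
        "anchor_recall_ratio": s["anchor_recall"] / b["anchor_recall"],
        "term_cov_ratio": s["term_cov"] / b["term_cov"],
    }


if __name__ == "__main__":
    for name, value in compare("CGK", "Last-5").items():
        print(f"{name}: {value:.3f}")
```

Run as-is, it prints token_reduction 0.376, purity_gain 0.720, anchor_recall_ratio 9.667 and term_cov_ratio 4.750. In other words, CGK uses about 38% fewer tokens with roughly 9.7x the anchor recall. These figures inherit every limitation listed in this message.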