Add Stage 2.8 recall, quality gate, retries, and publish idempotency
130
docs/plans/2026-06-10-ai-daily-full-chain-optimization.md
Normal file
@@ -0,0 +1,130 @@
# AI Daily Full Chain Optimization Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Add the first quality safety layer for the AI daily report pipeline: semantic candidate recall, quality gate reporting, stage snapshots, and effective pipeline configuration.

**Architecture:** Keep the existing stage functions and add a rule-based Stage 2.8 between cross-day URL dedupe and LLM semantic dedupe. The quality gate stays deterministic and report-only for dry-run visibility, while publish blocking can consume its `blocking_errors` through the existing Stage 7/8 guard path. The runner persists stage artifacts from the pipeline result without changing generated content.

**Tech Stack:** Python standard library, `unittest`, existing dataclass models and pipeline modules.

---
### Task 1: Make Pipeline Config Effective

**Files:**
- Modify: `ai_daily_report/pipeline.py`
- Modify: `ai_daily_report/runner.py`
- Test: `tests/test_stage0_to_4_pipeline.py`
- Test: `tests/test_runner.py`

**Step 1: Write failing tests**

Use existing tests that call `run_stage0_to_stage4(..., semantic_dedup_max_deletion_ratio=0.1, rewrite_batch_size=1)` and expect Stage 4 `batch_count == 3`.

**Step 2: Run tests to verify failure**

Run: `python -m pytest tests/test_stage0_to_4_pipeline.py tests/test_runner.py -q`

Expected: failure from unexpected keyword arguments or ignored config.

**Step 3: Implement minimal code**

Thread `semantic_dedup_max_deletion_ratio` into `semantic_dedup_items()` and `rewrite_batch_size` into `rewrite_items()`. Read both from `pipeline.json` in `runner.py`.

**Step 4: Verify**

Run the same tests and expect pass.

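Step 3 of Task 1 could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: `PipelineConfig`, `parse_pipeline_config`, `count_rewrite_batches`, the flat key layout of `pipeline.json`, the validation rules, and the default values are all hypothetical. Only the two parameter names come from the plan.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical holder for the two tunables named in the plan."""

    # Default values are placeholders, not the project's real defaults.
    semantic_dedup_max_deletion_ratio: float = 0.3
    rewrite_batch_size: int = 10


def parse_pipeline_config(raw_json: str) -> PipelineConfig:
    """Parse pipeline.json text, falling back to defaults for missing keys."""
    raw = json.loads(raw_json) if raw_json.strip() else {}
    defaults = PipelineConfig()
    ratio = float(
        raw.get(
            "semantic_dedup_max_deletion_ratio",
            defaults.semantic_dedup_max_deletion_ratio,
        )
    )
    batch_size = int(raw.get("rewrite_batch_size", defaults.rewrite_batch_size))
    # Assumed sanity checks; the plan does not specify validation.
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("semantic_dedup_max_deletion_ratio must be in [0, 1]")
    if batch_size < 1:
        raise ValueError("rewrite_batch_size must be at least 1")
    return PipelineConfig(ratio, batch_size)


def count_rewrite_batches(item_count: int, batch_size: int) -> int:
    """Ceiling division: number of batches needed for item_count items."""
    return -(-item_count // batch_size)


config = parse_pipeline_config(
    '{"semantic_dedup_max_deletion_ratio": 0.1, "rewrite_batch_size": 1}'
)
# The runner would then forward both values as keyword arguments to
# run_stage0_to_stage4(...) instead of dropping them.
print(config, count_rewrite_batches(3, config.rewrite_batch_size))
```

With `rewrite_batch_size=1`, three items need three batches, which is consistent with the `batch_count == 3` expectation in Step 1 if the test fixture holds three items (an inference; the plan does not state the fixture size).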
### Task 2: Add Stage 2.8 Candidate Recall

**Files:**
- Create: `ai_daily_report/candidate_recall.py`
- Modify: `ai_daily_report/pipeline.py`
- Test: `tests/test_candidate_recall.py`
- Test: `tests/test_stage0_to_4_pipeline.py`

**Step 1: Write failing tests**

Add tests proving related Claude Fable/Mythos items are recalled even when Stage 2 title candidates are empty, while unrelated Gemini/Gemma items are not grouped by company name alone.

**Step 2: Run tests to verify failure**

Run: `python -m pytest tests/test_candidate_recall.py tests/test_stage0_to_4_pipeline.py -q`

Expected: import failure for the new module or zero recalled candidates.

**Step 3: Implement minimal code**

Use deterministic title similarity, token Jaccard, summary Jaccard, and strong entity overlap to produce candidate groups with `item_ids`, `reason`, `score`, and evidence fields.

**Step 4: Verify**

Run targeted tests and expect pass.

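The rule-based recall described in Step 3 of Task 2 might be sketched as below. Everything beyond the four signal names and the output fields (`item_ids`, `reason`, `score`, evidence) is an assumption: the thresholds, the `GENERIC_NAMES` list, the capitalized-word heuristic for entities, the pairwise (rather than clustered) grouping, and the `NewsItem` and `CandidateGroup` types are hypothetical. Title similarity here uses `difflib.SequenceMatcher` as a stand-in for whatever measure the project chooses.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class NewsItem:
    item_id: str
    title: str
    summary: str = ""


@dataclass
class CandidateGroup:
    item_ids: tuple
    reason: str
    score: float
    evidence: dict = field(default_factory=dict)


# Assumed: company names alone are too generic to link two items.
GENERIC_NAMES = frozenset({"google", "openai", "anthropic", "meta", "microsoft"})
TITLE_SIMILARITY_MIN = 0.85  # assumed thresholds, to be tuned on real data
TITLE_JACCARD_MIN = 0.5
SUMMARY_JACCARD_MIN = 0.6
SHARED_ENTITIES_MIN = 2


def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _strong_entities(item):
    # Crude heuristic: capitalized words of 3+ characters, minus generic names.
    # Sentence-initial words slip through, so a real version needs a stoplist.
    words = re.findall(r"\b[A-Z][A-Za-z0-9]{2,}\b", f"{item.title} {item.summary}")
    return {word.lower() for word in words} - GENERIC_NAMES


def recall_candidates(items):
    """Return pairwise candidate groups for every pair passing any signal."""
    groups = []
    for a, b in combinations(items, 2):
        ents_a, ents_b = _strong_entities(a), _strong_entities(b)
        shared = ents_a & ents_b
        title_sim = SequenceMatcher(None, a.title.lower(), b.title.lower()).ratio()
        signals = [
            ("title_similarity", title_sim, TITLE_SIMILARITY_MIN),
            ("title_jaccard", _jaccard(_tokens(a.title), _tokens(b.title)), TITLE_JACCARD_MIN),
            ("summary_jaccard", _jaccard(_tokens(a.summary), _tokens(b.summary)), SUMMARY_JACCARD_MIN),
            ("strong_entity_overlap", float(len(shared)), float(SHARED_ENTITIES_MIN)),
        ]
        passed = [name for name, value, minimum in signals if value >= minimum]
        if not passed:
            continue
        evidence = {name: round(value, 3) for name, value, _ in signals}
        evidence["shared_entities"] = sorted(shared)
        # Score: strongest ratio signal, with entity Jaccard as the entity ratio.
        ratios = [value for _, value, _ in signals[:3]] + [_jaccard(ents_a, ents_b)]
        groups.append(CandidateGroup((a.item_id, b.item_id), passed[0], round(max(ratios), 3), evidence))
    return groups


items = [
    NewsItem("a", "Anthropic previews Claude Fable model", "Claude Fable targets long-form writing."),
    NewsItem("b", "Hands-on with Claude Fable and Mythos", "Early testers compare Fable with Mythos."),
    NewsItem("c", "Google ships Gemini update for Workspace", "Gemini gains new Workspace features."),
    NewsItem("d", "Google releases Gemma open weights", "Gemma weights are available for download."),
]
groups = recall_candidates(items)
```

In this toy input, items `a` and `b` share two non-generic entities and are recalled even though their titles overlap little, while `c` and `d` share only the company name and stay ungrouped. Because the output is only a candidate list, false positives are cheap: the later LLM semantic dedupe stage still makes the final call.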
### Task 3: Add Quality Gate Reporting

**Files:**
- Create: `ai_daily_report/quality_gate.py`
- Modify: `ai_daily_report/pipeline.py`
- Test: `tests/test_quality_gate.py`

**Step 1: Write failing tests**

Add tests that expect warnings in three cases: Stage 3 yields zero candidates for a large item set, an enabled source fails, and a required source fails.

**Step 2: Run tests to verify failure**

Run: `python -m pytest tests/test_quality_gate.py -q`

Expected: import failure for the new module.

**Step 3: Implement minimal code**

Return a report with `warnings`, `blocking_errors`, `source_failures`, and `quality_gate_failed`. Add it after Stage 7 and propagate blocking errors into Stage 7 before publish.

**Step 4: Verify**

Run the quality gate and publish-path tests.

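A deterministic report of the shape named in Step 3 of Task 3 could be sketched as follows. The four report fields come from the plan; the rest is assumed for illustration: the `SourceStatus` type, the `LARGE_ITEM_SET_MIN` threshold, the message wording, and in particular the rule that only required-source failures become `blocking_errors` (the plan does not say which conditions block).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceStatus:
    """Hypothetical per-source fetch result."""

    name: str
    enabled: bool = True
    required: bool = False
    ok: bool = True


@dataclass
class QualityGateReport:
    warnings: list = field(default_factory=list)
    blocking_errors: list = field(default_factory=list)
    source_failures: list = field(default_factory=list)
    quality_gate_failed: bool = False


LARGE_ITEM_SET_MIN = 20  # assumed cutoff for "large item set"


def evaluate_quality_gate(sources, item_count, stage3_candidate_count):
    """Build a report from plain counts; no I/O and no LLM calls."""
    report = QualityGateReport()
    for source in sources:
        if not source.enabled or source.ok:
            continue  # disabled sources are ignored, healthy ones pass
        report.source_failures.append(source.name)
        report.warnings.append(f"source failed: {source.name}")
        if source.required:
            # Assumption: only required sources block publishing.
            report.blocking_errors.append(f"required source failed: {source.name}")
    if item_count >= LARGE_ITEM_SET_MIN and stage3_candidate_count == 0:
        report.warnings.append(
            f"stage 3 produced no dedupe candidates for {item_count} items"
        )
    report.quality_gate_failed = bool(report.blocking_errors)
    return report


report = evaluate_quality_gate(
    [
        SourceStatus("feed_a", ok=False),
        SourceStatus("feed_b", required=True, ok=False),
        SourceStatus("feed_c", enabled=False, ok=False),
    ],
    item_count=50,
    stage3_candidate_count=0,
)
print(report)
```

Keeping `quality_gate_failed` as a stored field rather than a property means it survives `dataclasses.asdict`, which matters if the report is later written out as `quality_gate.json`.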
### Task 4: Persist Stage Snapshots

**Files:**
- Modify: `ai_daily_report/pipeline.py`
- Modify: `ai_daily_report/runner.py`
- Test: `tests/test_runner.py`

**Step 1: Write failing tests**

Assert that a mock run writes `stage0_sources.json`, `stage1_items.json`, `stage2_items.json`, `stage2_5_items.json`, `stage2_8_candidates.json`, `stage3_items.json`, `stage4_items.json`, and `quality_gate.json`.

**Step 2: Run tests to verify failure**

Run: `python -m pytest tests/test_runner.py -q`

Expected: snapshot files are missing.

**Step 3: Implement minimal code**

Have pipeline results carry an `artifacts` dict and have runner serialize the requested JSON files using the existing dataclass serializer.

**Step 4: Verify**

Run runner tests and inspect generated files through assertions.

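The snapshot persistence in Step 3 of Task 4 might be sketched like this. The `artifacts` dict and the snapshot file names come from the plan; `to_jsonable`, `write_stage_snapshots`, and `SnapshotItem` are hypothetical, and `dataclasses.asdict` stands in for the project's existing dataclass serializer, whose actual interface is not shown in the plan.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, is_dataclass
from pathlib import Path


@dataclass
class SnapshotItem:
    """Hypothetical stand-in for the project's item model."""

    item_id: str
    title: str


def to_jsonable(value):
    """Convert dataclass instances (possibly nested in lists/dicts) to plain data."""
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)
    if isinstance(value, (list, tuple)):
        return [to_jsonable(entry) for entry in value]
    if isinstance(value, dict):
        return {str(key): to_jsonable(entry) for key, entry in value.items()}
    return value


def write_stage_snapshots(artifacts, output_dir):
    """Write one `<name>.json` file per artifact and return the written paths."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, payload in artifacts.items():
        path = output_dir / f"{name}.json"
        text = json.dumps(to_jsonable(payload), ensure_ascii=False, indent=2)
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # Demo in a throwaway directory; a real runner would use its output dir.
    with tempfile.TemporaryDirectory() as tmp:
        demo = {
            "stage1_items": [SnapshotItem("a", "Example title")],
            "quality_gate": {"warnings": [], "quality_gate_failed": False},
        }
        for path in write_stage_snapshots(demo, Path(tmp)):
            print(path.name)
```

Writing only from the `artifacts` dict keeps the runner read-only with respect to pipeline output, in line with the plan's requirement that persisting snapshots must not change generated content.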
### Task 5: Full Regression

**Files:**
- All touched files

**Step 1: Run targeted tests**

Run: `python -m pytest tests/test_candidate_recall.py tests/test_quality_gate.py tests/test_stage0_to_4_pipeline.py tests/test_runner.py -q`

**Step 2: Run full test suite**

Run: `python -m pytest -q`

**Step 3: Fix regressions**

Fix only issues caused by this change set.