Mimikko-zeus b46cef2c7b Add Stage 2.8 recall, quality gate, retries, and publish idempotency

2026-06-10 21:31:13 +08:00

AI Daily Full Chain Optimization Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Add the first quality safety layer for the AI daily report pipeline: semantic candidate recall, quality gate reporting, stage snapshots, and effective pipeline configuration.

Architecture: Keep the existing stage functions and add a rule-based Stage 2.8 between cross-day URL dedupe and LLM semantic dedupe. The quality gate stays deterministic and report-only, so its findings are visible in dry runs, while publish blocking can consume its blocking_errors through the existing Stage 7/8 guard path. The runner persists stage artifacts from the pipeline result without changing generated content.

Tech Stack: Python standard library, unittest, existing dataclass models and pipeline modules.

Task 1: Make Pipeline Config Effective

Files:

Modify: ai_daily_report/pipeline.py
Modify: ai_daily_report/runner.py
Test: tests/test_stage0_to_4_pipeline.py
Test: tests/test_runner.py

Step 1: Write failing tests

Use existing tests that call run_stage0_to_stage4(..., semantic_dedup_max_deletion_ratio=0.1, rewrite_batch_size=1) and expect Stage 4 batch_count == 3.

Step 2: Run tests to verify failure

Run: python -m pytest tests/test_stage0_to_4_pipeline.py tests/test_runner.py -q

Expected: failure from unexpected keyword arguments or ignored config.

Step 3: Implement minimal code

Thread semantic_dedup_max_deletion_ratio into semantic_dedup_items() and rewrite_batch_size into rewrite_items(). Read both from pipeline.json in runner.py.
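As a rough illustration of this step, the sketch below reads the two keys from pipeline.json and shows why rewrite_batch_size=1 leads to a batch count equal to the item count. The helper names (config_from_mapping, load_pipeline_config, expected_batch_count) and the default values are assumptions made for this example and are not taken from the repository. The real code would pass these values on to semantic_dedup_items() and rewrite_items().

```python
import json
import math
from pathlib import Path

# Hypothetical defaults; the real values live in the project's code.
DEFAULTS = {
    "semantic_dedup_max_deletion_ratio": 0.3,
    "rewrite_batch_size": 5,
}


def config_from_mapping(loaded):
    """Overlay known keys from a parsed pipeline.json onto the defaults."""
    config = dict(DEFAULTS)
    config.update({key: loaded[key] for key in DEFAULTS if key in loaded})
    return config


def load_pipeline_config(path):
    """Read pipeline.json if it exists; otherwise fall back to the defaults."""
    config_path = Path(path)
    if not config_path.exists():
        return dict(DEFAULTS)
    return config_from_mapping(json.loads(config_path.read_text(encoding="utf-8")))


def expected_batch_count(item_count, rewrite_batch_size):
    """Number of rewrite batches needed for item_count items."""
    if rewrite_batch_size < 1:
        raise ValueError("rewrite_batch_size must be >= 1")
    return math.ceil(item_count / rewrite_batch_size)
```

Under these assumptions, a fixture with three surviving items and rewrite_batch_size=1 gives expected_batch_count(3, 1) == 3, which matches the batch_count the tests in Step 1 expect.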

Step 4: Verify

Run the same tests and expect pass.

Task 2: Add Stage 2.8 Candidate Recall

Files:

Create: ai_daily_report/candidate_recall.py
Modify: ai_daily_report/pipeline.py
Test: tests/test_candidate_recall.py
Test: tests/test_stage0_to_4_pipeline.py

Step 1: Write failing tests

Add tests proving related Claude Fable/Mythos items are recalled even when Stage 2 title candidates are empty, while unrelated Gemini/Gemma items are not grouped by company name alone.

Step 2: Run tests to verify failure

Run: python -m pytest tests/test_candidate_recall.py tests/test_stage0_to_4_pipeline.py -q

Expected: import failure for the new module or zero recalled candidates.

Step 3: Implement minimal code

Use deterministic title similarity, token Jaccard, summary Jaccard, and strong entity overlap to produce candidate groups with item_ids, reason, score, and evidence fields.
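The signals named above could be combined roughly as follows. This is a minimal sketch, not the module's actual implementation: the item shape (dicts with id, title, summary, entities), the thresholds, the WEAK_ENTITIES set, and the sample data are all assumptions for illustration. It compares items pairwise only and does not merge pairs into larger groups.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from itertools import combinations

# Assumption: company names are too broad to count as strong entities.
WEAK_ENTITIES = {"anthropic", "google", "openai"}


@dataclass
class CandidateGroup:
    item_ids: list
    reason: str
    score: float
    evidence: dict = field(default_factory=dict)


def tokens(text):
    """Lowercase alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(left, right):
    """Jaccard index of two sets; 0.0 when either set is empty."""
    if not left or not right:
        return 0.0
    return len(left & right) / len(left | right)


def recall_candidates(items, title_threshold=0.8, jaccard_threshold=0.5):
    """Return one CandidateGroup per item pair that clears any signal."""
    groups = []
    for a, b in combinations(items, 2):
        strong = sorted((set(a["entities"]) & set(b["entities"])) - WEAK_ENTITIES)
        evidence = {
            "title_similarity": SequenceMatcher(
                None, a["title"].lower(), b["title"].lower()
            ).ratio(),
            "title_jaccard": jaccard(tokens(a["title"]), tokens(b["title"])),
            "summary_jaccard": jaccard(tokens(a["summary"]), tokens(b["summary"])),
            "strong_entities": strong,
        }
        if strong:
            reason = "strong_entity_overlap"
        elif evidence["title_similarity"] >= title_threshold:
            reason = "title_similarity"
        elif evidence["title_jaccard"] >= jaccard_threshold:
            reason = "title_jaccard"
        elif evidence["summary_jaccard"] >= jaccard_threshold:
            reason = "summary_jaccard"
        else:
            continue
        score = max(
            1.0 if strong else 0.0,
            evidence["title_similarity"],
            evidence["title_jaccard"],
            evidence["summary_jaccard"],
        )
        groups.append(CandidateGroup([a["id"], b["id"]], reason, round(score, 3), evidence))
    return groups


# Hypothetical fixture data mirroring the test description in Step 1.
sample_items = [
    {"id": "a", "title": "Anthropic unveils Claude Mythos",
     "summary": "New frontier model announced",
     "entities": ["anthropic", "claude mythos"]},
    {"id": "b", "title": "What the Mythos preview means for developers",
     "summary": "Analysis of the latest release",
     "entities": ["claude mythos", "anthropic"]},
    {"id": "c", "title": "Google ships Gemini update",
     "summary": "Faster responses in the app",
     "entities": ["google", "gemini"]},
    {"id": "d", "title": "Gemma open weights refreshed",
     "summary": "Smaller checkpoints for edge devices",
     "entities": ["google", "gemma"]},
]
sample_groups = recall_candidates(sample_items)
```

With this sample data, items a and b are paired through the shared product entity even though their titles barely overlap, while c and d share only the company name and are left ungrouped.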

Step 4: Verify

Run targeted tests and expect pass.

Task 3: Add Quality Gate Reporting

Files:

Create: ai_daily_report/quality_gate.py
Modify: ai_daily_report/pipeline.py
Test: tests/test_quality_gate.py

Step 1: Write failing tests

Add tests that expect warnings in three cases: Stage 3 yields zero candidates for a large item set, an enabled source fails, and a required source fails.

Step 2: Run tests to verify failure

Run: python -m pytest tests/test_quality_gate.py -q

Expected: import failure for the new module.

Step 3: Implement minimal code

Return a report with warnings, blocking_errors, source_failures, and quality_gate_failed. Run the gate after Stage 7 and merge its blocking_errors into the Stage 7 result before publish, so the existing Stage 7/8 guard path can block on them.
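One possible shape for the report is sketched here. The four field names come from the plan; everything else is an assumption for illustration, including the function name run_quality_gate, the source_results shape, the large_item_threshold value, and the choice to treat a failed required source as a blocking error in addition to a warning.

```python
from dataclasses import dataclass, field


@dataclass
class QualityGateReport:
    warnings: list = field(default_factory=list)
    blocking_errors: list = field(default_factory=list)
    source_failures: list = field(default_factory=list)
    quality_gate_failed: bool = False


def run_quality_gate(item_count, stage3_candidate_count, source_results,
                     large_item_threshold=20):
    """Build a deterministic report from counts and per-source results.

    source_results is a list of dicts with name, ok, and required keys
    (a hypothetical shape chosen for this sketch).
    """
    report = QualityGateReport()
    # Zero dedupe candidates on a large item set is suspicious, not fatal.
    if item_count >= large_item_threshold and stage3_candidate_count == 0:
        report.warnings.append(
            f"stage 3 produced zero candidates for {item_count} items"
        )
    for source in source_results:
        if source["ok"]:
            continue
        report.source_failures.append(source["name"])
        report.warnings.append(f"source failed: {source['name']}")
        # Assumption: only required sources escalate to blocking errors.
        if source.get("required"):
            report.blocking_errors.append(
                f"required source failed: {source['name']}"
            )
    report.quality_gate_failed = bool(report.blocking_errors)
    return report
```

Keeping quality_gate_failed as a plain field rather than a computed property means it survives dataclass serialization, which matters if the report is written out as quality_gate.json.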

Step 4: Verify

Run quality gate and publish-path tests.

Task 4: Persist Stage Snapshots

Files:

Modify: ai_daily_report/pipeline.py
Modify: ai_daily_report/runner.py
Test: tests/test_runner.py

Step 1: Write failing tests

Assert that a mock run writes stage0_sources.json, stage1_items.json, stage2_items.json, stage2_5_items.json, stage2_8_candidates.json, stage3_items.json, stage4_items.json, and quality_gate.json.

Step 2: Run tests to verify failure

Run: python -m pytest tests/test_runner.py -q

Expected: snapshot files are missing.

Step 3: Implement minimal code

Have pipeline results carry an artifacts dict and have runner serialize the requested JSON files using the existing dataclass serializer.
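A minimal sketch of the snapshot writer follows, assuming the artifacts dict maps a snapshot name such as stage1_items to its payload. The to_jsonable helper stands in for the project's existing dataclass serializer, and both it and write_stage_snapshots are hypothetical names used only for this example, as is the Item dataclass.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, is_dataclass
from pathlib import Path


def to_jsonable(value):
    """Stand-in for the existing dataclass serializer (assumption)."""
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)
    if isinstance(value, dict):
        return {key: to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    return value


def write_stage_snapshots(artifacts, output_dir):
    """Write each artifact to <output_dir>/<name>.json and return the paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, payload in sorted(artifacts.items()):
        target = out / f"{name}.json"
        target.write_text(
            json.dumps(to_jsonable(payload), ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
        written.append(target)
    return written


@dataclass
class Item:  # hypothetical stand-in for an existing model
    id: str
    title: str


# Usage with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_stage_snapshots(
        {"stage1_items": [Item("a", "Example")], "quality_gate": {"warnings": []}},
        tmp,
    )
    names = [path.name for path in paths]
    stage1 = json.loads((Path(tmp) / "stage1_items.json").read_text(encoding="utf-8"))
```

Because the writer only reads from the artifacts dict, it leaves the generated content untouched, in line with the architecture note that snapshots must not change pipeline output.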

Step 4: Verify

Run runner tests and inspect generated files through assertions.

Task 5: Full Regression

Files:

All touched files

Step 1: Run targeted tests

Run: python -m pytest tests/test_candidate_recall.py tests/test_quality_gate.py tests/test_stage0_to_4_pipeline.py tests/test_runner.py -q

Step 2: Run full test suite

Run: python -m pytest -q

Step 3: Fix regressions

Fix only issues caused by this change set.
