feat: refactor API key configuration and enhance application initialization

- Renamed `check_environment` to `check_api_key_configured` for clarity, simplifying the API key validation logic.
- Removed the blocking behavior of the API key check during application startup, allowing the app to run while providing a prompt for configuration.
- Updated `LocalAgentApp` to accept an `api_configured` parameter, enabling conditional messaging for API key setup.
- Enhanced the `SandboxRunner` to support backup management and improved execution result handling with detailed metrics.
- Integrated data governance strategies into the `HistoryManager`, ensuring compliance and improved data management.
- Added privacy settings and metrics tracking across various components to enhance user experience and application safety.
2026-02-27 14:32:30 +08:00
parent ab5bbff6f7
commit 8a538bb950
58 changed files with 13457 additions and 350 deletions
--- a/docs/测试覆盖率矩阵.md
+++ b/docs/测试覆盖率矩阵.md
@@ -0,0 +1,405 @@
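+# Test Coverage Matrix
+
+## Overview
+
+This document describes the test coverage strategy for the LocalAgent project, focusing on the critical main flows and security regression tests.
+
+## Layered Test Architecture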
+
+```
+┌─────────────────────────────────────────────────────────┐
+│           End-to-End Integration Tests (E2E)            │
+│  test_e2e_integration.py + test_security_regression.py  │
+└─────────────────────────────────────────────────────────┘
+                            ▲
+                            │
+┌─────────────────────────────────────────────────────────┐
+│                Feature Integration Tests                │
+│  test_config_refresh.py, test_retry_fix.py, etc.        │
+└─────────────────────────────────────────────────────────┘
+                            ▲
+                            │
+┌─────────────────────────────────────────────────────────┐
+│                       Unit Tests                        │
+│  test_intent_classifier.py, test_rule_checker.py, etc.  │
+└─────────────────────────────────────────────────────────┘
+```
+
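+## Critical Main-Flow Test Coverage
+
+### 1. Reuse Security Bypass Tests
+
+**Test file**: `test_e2e_integration.py::TestCodeReuseSecurityRegression`
+
+**Scenarios covered**:
+- ✅ Reused code must trigger a security recheck
+- ✅ Reused code is blocked by the security check
+- ✅ Metrics tracking for the reuse flow
+- ✅ Reuse cannot be used to bypass security checks
+- ✅ Detection of reused code later modified into dangerous code
+- ✅ Multi-layer security checks during reuse
+
+**Key assertions**:
+```python
+# 1. Reuse must trigger a recheck
+self.assertTrue(len(recheck_result.warnings) > 0, "The security recheck of reused code must report warnings")
+
+# 2. Dangerous code must be blocked
+self.assertFalse(recheck_result.passed, "Reused code containing socket must be blocked")
+
+# 3. Metrics are tracked correctly
+self.assertEqual(stats['total_offered'], 1)
+self.assertEqual(stats['total_accepted'], 1)
+```
+
+**Metrics**:
+- Reuse recheck trigger rate: 100%
+- Dangerous-code interception rate: target 100%
+- Metrics tracking accuracy: target 100%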
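The reuse counters asserted above can be pictured with a minimal sketch. Only the `total_offered` and `total_accepted` keys come from the assertions; the `ReuseMetrics` class, its `record_offer` method, and the `total_blocked` counter are hypothetical names used for illustration, not LocalAgent's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ReuseMetrics:
    """Hypothetical counter for code-reuse offers (illustrative only)."""

    total_offered: int = 0
    total_accepted: int = 0
    total_blocked: int = 0  # assumed extra counter, not named in the source

    def record_offer(self, accepted: bool, recheck_passed: bool) -> bool:
        """Record one reuse offer and return True only if the code may run."""
        self.total_offered += 1
        if not accepted:
            return False
        self.total_accepted += 1
        if not recheck_passed:
            # Accepted by the user but rejected by the security recheck.
            self.total_blocked += 1
            return False
        return True

    def get_stats(self) -> dict:
        """Return the counters as a plain dict, as the assertions expect."""
        return {
            "total_offered": self.total_offered,
            "total_accepted": self.total_accepted,
            "total_blocked": self.total_blocked,
        }


metrics = ReuseMetrics()
may_run = metrics.record_offer(accepted=True, recheck_passed=False)
stats = metrics.get_stats()
```

In this sketch an offer that the user accepts still counts as accepted when the recheck rejects it, so the acceptance statistics and the interception statistics stay independent.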
+
+---
+
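+### 2. Config Hot Reload Tests
+
+**Test file**: `test_e2e_integration.py::TestConfigHotReloadRegression`
+
+**Scenarios covered**:
+- ✅ A config change triggers first-call tracking
+- ✅ Handling of a failed first call after a config change
+- ✅ Intent classification call after a config change
+
+**Key assertions**:
+```python
+# 1. The first call is flagged after a config change
+self.assertTrue(
+    self.config_metrics.is_first_call_after_change(),
+    "The call should be flagged as the first one after a config change"
+)
+
+# 2. The flag is cleared after the first call
+self.assertFalse(
+    self.config_metrics.is_first_call_after_change(),
+    "The flag should be cleared after the first call"
+)
+
+# 3. Statistics are correct
+self.assertEqual(stats['first_call_success'], 1)
+```
+
+**Metrics**:
+- Config change detection rate: 100%
+- First-call tracking rate: 100%
+- Failure recovery success rate: target > 95%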
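The first-call flag that these assertions exercise could behave roughly as follows. Only `is_first_call_after_change` and the `first_call_success` key appear in the source; the `ConfigMetrics` class name, `record_config_change`, `record_call`, and the other stat keys are assumptions made for this sketch.

```python
class ConfigMetrics:
    """Hypothetical first-call tracker (illustrative, not the real class)."""

    def __init__(self) -> None:
        self._first_call_pending = False
        self._stats = {
            "config_changes": 0,       # assumed key
            "first_call_success": 0,   # key taken from the assertions
            "first_call_failure": 0,   # assumed key
        }

    def record_config_change(self) -> None:
        """Mark that the next call is the first one after a config change."""
        self._stats["config_changes"] += 1
        self._first_call_pending = True

    def is_first_call_after_change(self) -> bool:
        return self._first_call_pending

    def record_call(self, success: bool) -> None:
        """Count the outcome of the first call only, then clear the flag."""
        if not self._first_call_pending:
            return
        key = "first_call_success" if success else "first_call_failure"
        self._stats[key] += 1
        self._first_call_pending = False

    def get_stats(self) -> dict:
        return dict(self._stats)


config_metrics = ConfigMetrics()
config_metrics.record_config_change()
flag_before = config_metrics.is_first_call_after_change()
config_metrics.record_call(success=True)
flag_after = config_metrics.is_first_call_after_change()
stats = config_metrics.get_stats()
```

Here the flag is set by the change and cleared by the first recorded call, which is why the same query returns `True` before the call and `False` after it.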
+
+---
+
+### 3. Three-State Execution Result Tests
+
+**Test file**: `test_e2e_integration.py::TestExecutionResultThreeStateRegression`
+
+**Scenarios covered**:
+- ✅ All succeeded (success)
+- ✅ Partially succeeded (partial)
+- ✅ All failed (failed)
+- ✅ Status display text
+
+**Key assertions**:
+```python
+# 1. All succeeded
+self.assertEqual(result.status, 'success')
+self.assertEqual(result.success_count, result.total_count)
+self.assertTrue(result.success)
+
+# 2. Partially succeeded
+self.assertEqual(result.status, 'partial')
+self.assertGreater(result.success_count, 0)
+self.assertGreater(result.failed_count, 0)
+self.assertFalse(result.success)  # partial does not count as full success
+
+# 3. All failed
+self.assertEqual(result.status, 'failed')
+self.assertEqual(result.success_count, 0)
+self.assertFalse(result.success)
+```
+
+**Metrics**:
+- Status recognition accuracy: 100%
+- Statistics calculation accuracy: 100%
+- User prompt accuracy: target 100%
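One way to derive the three states from the counts is sketched below. The attribute names `status`, `success`, `success_count`, `failed_count`, and `total_count` come from the assertions; the dataclass itself and the choice to treat an empty run as `failed` are assumptions, not the project's confirmed behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionResult:
    """Illustrative three-state result derived purely from the two counts."""

    success_count: int
    failed_count: int

    @property
    def total_count(self) -> int:
        return self.success_count + self.failed_count

    @property
    def status(self) -> str:
        if self.success_count > 0 and self.failed_count == 0:
            return "success"
        if self.success_count > 0:
            return "partial"
        # Assumption: a run with no successes (including an empty run) is 'failed'.
        return "failed"

    @property
    def success(self) -> bool:
        # Only a fully successful run counts as success; 'partial' does not.
        return self.status == "success"


results = [ExecutionResult(3, 0), ExecutionResult(2, 1), ExecutionResult(0, 3)]
statuses = [r.status for r in results]
```

Deriving `status` from the counts rather than storing it separately means the status and the statistics cannot drift apart.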
+
+---
+
+## Security Regression Test Matrix
+
+### Test file: `test_security_regression.py`
+
+### 1. Hard-Block Regression Tests
+
+**Test class**: `TestSecurityRegressionMatrix`
+
+| Dangerous operation | Test method | Expected result |
+|---------|---------|---------|
+| socket network operations | `test_regression_network_operations` | ❌ Blocked |
+| subprocess command execution | `test_regression_command_execution` | ❌ Blocked |
+| eval/exec dynamic execution | `test_regression_command_execution` | ❌ Blocked |
+| os.system/popen | `test_regression_command_execution` | ❌ Blocked |
+| os.remove file deletion | `test_regression_file_system_warnings` | ⚠️ Warning |
+| shutil.rmtree directory deletion | `test_regression_file_system_warnings` | ⚠️ Warning |
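The block/warn split in the table can be illustrated with a small AST-based check. This is a sketch under stated assumptions: the function `check_code`, the rule sets, and the result shape are hypothetical, and the project's real rule checker may work quite differently. It does not handle aliased imports such as `from os import system`.

```python
import ast

# Rule sets mirror the table above; the names are illustrative assumptions.
BLOCKED_MODULES = {"socket", "subprocess"}
BLOCKED_CALLS = {"eval", "exec", "os.system", "os.popen"}
WARNING_CALLS = {"os.remove", "shutil.rmtree"}


def _dotted_name(node: ast.AST) -> str:
    """Return 'os.remove' for an attribute chain, 'eval' for a bare name, else ''."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def check_code(source: str) -> dict:
    """Classify findings as blocked (fails the check) or warnings (passes)."""
    blocked, warnings = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            names = []
        blocked += [n for n in names if n in BLOCKED_MODULES]
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name in BLOCKED_CALLS:
                blocked.append(name)
            elif name in WARNING_CALLS:
                warnings.append(name)
    return {"passed": not blocked, "blocked": blocked, "warnings": warnings}


net = check_code("import socket\ns = socket.socket()")
delete = check_code("import os\nos.remove('old.txt')")
safe = check_code("import shutil\nshutil.copy('a.txt', 'b.txt')")
```

In this sketch `net` fails the check, `delete` passes with one warning, and `safe` passes cleanly, matching the three outcomes the matrix distinguishes.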
+
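+### 2. Safe-Operation Allowlist Tests
+
+**Test method**: `test_regression_safe_operations`
+
+| Safe operation | Expected result |
+|---------|---------|
+| shutil.copy file copy | ✅ Pass |
+| PIL image processing | ✅ Pass |
+| openpyxl Excel processing | ✅ Pass |
+| json data processing | ✅ Pass |
+
+### 3. LLM Reviewer Regression Tests
+
+**Test class**: `TestLLMReviewerRegression`
+
+- ✅ Robust response parsing
+- ✅ Graceful degradation when the LLM call fails
+- ✅ LLM review with warnings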
+
+---
+
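+## End-to-End Workflow Tests
+
+### Test class: `TestEndToEndWorkflow`
+
+**Full execution flow**:
+```
+User input → Intent classification → Code generation → Security check → Execution → History record
+```
+
+**Test method**: `test_complete_execution_workflow`
+
+**Steps covered**:
+1. ✅ Intent classification
+2. ✅ Code generation (mocked)
+3. ✅ Hard-rule security check
+4. ✅ Prepare input files
+5. ✅ Execute code
+6. ✅ Verify execution result
+7. ✅ Save history record
+8. ✅ Verify history record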
+
+---
+
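+## Critical Path Coverage Tests
+
+### Test class: `TestCriticalPathCoverage`
+
+### Path 1: New Code Generation
+```
+Generate code → Hard-rule check → LLM review → Execute
+```
+**Test method**: `test_critical_path_new_code_generation`
+
+### Path 2: Code Reuse
+```
+Look up history → Security recheck → Execute
+```
+**Test method**: `test_critical_path_code_reuse`
+
+### Path 3: Retry After Failure
+```
+Failure record → Code fix → Security check → Execute
+```
+**Test method**: `test_critical_path_code_fix_retry`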
+
+---
+
+## Running the Tests
+
+### Run all tests
+```bash
+python tests/test_runner.py --mode all
+```
+
+### Run only critical path tests
+```bash
+python tests/test_runner.py --mode critical
+```
+
+### Run only unit tests
+```bash
+python tests/test_runner.py --mode unit
+```
+
+### Run specific test files
+```bash
+python -m unittest tests.test_e2e_integration
+python -m unittest tests.test_security_regression
+```
+
+### Run a specific test class
+```bash
+python -m unittest tests.test_e2e_integration.TestCodeReuseSecurityRegression
+```
+
+### Run a specific test method
+```bash
+python -m unittest tests.test_e2e_integration.TestCodeReuseSecurityRegression.test_reuse_must_trigger_security_recheck
+```
+
+---
+
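+## Test Reports
+
+After a test run, the following reports are generated in the `workspace/test_reports/` directory:
+
+1. **JSON report**: `test_report_YYYYMMDD_HHMMSS.json`
+   - Contains detailed test metrics
+   - Full stack traces for failures and errors
+
+2. **Markdown report**: `test_report_YYYYMMDD_HHMMSS.md`
+   - Human-readable test summary
+   - Coverage matrix grouped by test class
+   - Failure details and improvement suggestions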
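The timestamped file names above could be produced along these lines. The `report_paths` helper is a hypothetical name for this sketch; only the directory and the `test_report_YYYYMMDD_HHMMSS` pattern come from the document.

```python
from datetime import datetime
from pathlib import Path


def report_paths(report_dir: str, now: datetime) -> tuple:
    """Build the JSON and Markdown report paths for one test run."""
    stamp = now.strftime("%Y%m%d_%H%M%S")  # e.g. 20260227_143230
    base = Path(report_dir)
    return (
        base / f"test_report_{stamp}.json",
        base / f"test_report_{stamp}.md",
    )


json_path, md_path = report_paths(
    "workspace/test_reports", datetime(2026, 2, 27, 14, 32, 30)
)
```

Passing the timestamp in, rather than calling `datetime.now()` inside the helper, keeps both files on the same stamp and makes the naming easy to test.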
+
+---
+
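+## Metrics
+
+### Automated Coverage of Critical Paths
+
+| Critical path | Test cases | Coverage |
+|---------|-----------|--------|
+| Reuse security bypass | 6 | 100% |
+| Config hot reload | 3 | 100% |
+| Three-state execution | 4 | 100% |
+| New code generation | 1 | 100% |
+| Code reuse | 1 | 100% |
+| Retry after failure | 1 | 100% |
+
+### Security Regression Coverage
+
+| Security scenario | Test cases | Coverage |
+|---------|-----------|--------|
+| Hard-blocked operations | 8 | 100% |
+| Warning operations | 4 | 100% |
+| Safe-operation allowlist | 4 | 100% |
+| LLM reviewer | 3 | 100% |
+| History reuse security | 3 | 100% |
+
+### Post-Change Regression Defect Rate
+
+**Target**: < 5%
+
+**How it is monitored**:
+- Run the full test suite after every code change
+- Record the number of newly introduced regression defects
+- Compute regression defect rate = regression defects / total changes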
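The defect-rate formula and the 5% target can be checked with a few lines. The function names and the sample figures (2 and 3 defects over 50 changes) are made up for illustration and are not project data.

```python
def regression_defect_rate(regression_defects: int, total_changes: int) -> float:
    """Regression defect rate = regression defects / total changes."""
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    return regression_defects / total_changes


def meets_target(rate: float, target: float = 0.05) -> bool:
    """The documented target is a rate strictly below 5%."""
    return rate < target


# Hypothetical figures, chosen only to show both sides of the threshold.
rate_ok = regression_defect_rate(2, 50)   # 0.04
rate_bad = regression_defect_rate(3, 50)  # 0.06
```

With these sample figures, 2 defects in 50 changes meets the target while 3 in 50 does not.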
+
+---
+
+## Continuous Integration Recommendations
+
+### CI/CD Pipeline
+
+```yaml
+# Example GitHub Actions configuration
+name: Test Suite
+
+on: [push, pull_request]
+
+jobs:
+  test:
+    runs-on: windows-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.9'
+      - name: Install dependencies
+        run: pip install -r requirements.txt
+      - name: Run unit tests
+        run: python tests/test_runner.py --mode unit
+      - name: Run critical path tests
+        run: python tests/test_runner.py --mode critical
+      - name: Upload test reports
+        uses: actions/upload-artifact@v4
+        with:
+          name: test-reports
+          path: workspace/test_reports/
+```
+
+---
+
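+## Suggested Improvements
+
+### Short term (1-2 weeks)
+- [ ] Add performance benchmark tests
+- [ ] Add tests for concurrent execution scenarios
+- [ ] Add boundary condition tests
+
+### Medium term (1-2 months)
+- [ ] Integrate a code coverage tool (coverage.py)
+- [ ] Add stress and load tests
+- [ ] Establish a test data management mechanism
+
+### Long term (3-6 months)
+- [ ] Implement automated regression testing
+- [ ] Establish a test quality metrics system
+- [ ] Introduce mutation testing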
+
+---
+
+## Appendix: Testing Best Practices
+
+### 1. Test Naming Convention
+```python
+def test_<scenario>_<expected_behavior>(self):
+    """Test: <short description>"""
+    pass
+```
+
+### 2. Test Structure (AAA Pattern)
+```python
+def test_example(self):
+    # Arrange: prepare the test data
+    data = prepare_test_data()
+
+    # Act: perform the operation under test
+    result = perform_operation(data)
+
+    # Assert: verify the result
+    self.assertEqual(result, expected_value)
+```
+
+### 3. Use Subtests for Multiple Scenarios
+```python
+def test_multiple_scenarios(self):
+    test_cases = [
+        (input1, expected1),
+        (input2, expected2),
+    ]
+    
+    for input_data, expected in test_cases:
+        with self.subTest(input=input_data):
+            result = function(input_data)
+            self.assertEqual(result, expected)
+```
+
+### 4. Clean Up the Test Environment
+```python
+def setUp(self):
+    """Runs before each test"""
+    self.temp_dir = Path(tempfile.mkdtemp())
+
+def tearDown(self):
+    """Runs after each test"""
+    shutil.rmtree(self.temp_dir, ignore_errors=True)
+```
+
+---
+
+**Document version**: 1.0  
+**Last updated**: 2026-02-27  
+**Maintainer**: LocalAgent development team
+