feat: refactor API key configuration and enhance application initialization
- Renamed `check_environment` to `check_api_key_configured` for clarity, simplifying the API key validation logic.
- Removed the blocking behavior of the API key check during application startup, so the app can run while prompting for configuration.
- Updated `LocalAgentApp` to accept an `api_configured` parameter, enabling conditional messaging for API key setup.
- Enhanced `SandboxRunner` to support backup management and improved execution result handling with detailed metrics.
- Integrated data governance strategies into `HistoryManager` to improve compliance and data management.
- Added privacy settings and metrics tracking across various components to improve user experience and application safety.

docs/P1-07_数据治理方案.md (new file, 235 lines)
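
# P1-07 Data Governance Optimization Plan

## Problem Overview

**Issue title**: History records persist the full input, code, and output in plaintext, with no governance policy

**Issue type**: Security / data consistency

**Location**: history/manager.py:16, history/manager.py:69, ui/history_view.py:652

**Impact**: Enlarges the local data-exposure surface; debug logs may contain sensitive paths or content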

## Solution

### 1. Data Sanitization Module (`history/data_sanitizer.py`)

**Features**:

- Detects 10+ types of sensitive information (file paths, email addresses, phone numbers, API keys, passwords, etc.)
- Smart masking that keeps part of each value to aid debugging
- Sensitivity scoring algorithm (score from 0 to 1)
- Dedicated validation checks to avoid false positives

**Core capabilities**:

```python
# Detect sensitive information
matches = sanitizer.find_sensitive_data(text)

# Sanitize text: returns the masked text and the matches found
sanitized_text, matches = sanitizer.sanitize(text)

# Sensitivity score
score = sanitizer.get_sensitivity_score(text)  # 0.0 - 1.0
```
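
To make the three calls above concrete, here is a minimal, self-contained sketch of a regex-based sanitizer. Several details are illustrative assumptions and do not come from `history/data_sanitizer.py`: the class name `DataSanitizer`, the `SensitiveMatch` record, the five patterns (a small subset of the 10+ types), the masking format, and the scoring rule (the sum of per-type weights, capped at 1.0). The false-positive checks are omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class SensitiveMatch:
    kind: str
    start: int
    end: int
    value: str


class DataSanitizer:
    """Hypothetical regex-based sanitizer; patterns, masks and weights are illustrative."""

    # (kind, pattern, weight used by get_sensitivity_score)
    PATTERNS = [
        ("api_key", re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), 1.0),
        ("password", re.compile(r"(?i)\bpassword\s*[=:]\s*\S+"), 1.0),
        ("email", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), 0.5),
        ("phone", re.compile(r"\b1[3-9]\d{9}\b"), 0.5),  # assumed: mainland China mobile format
        ("file_path", re.compile(r"(?:/[\w.-]+){2,}"), 0.3),
    ]

    def find_sensitive_data(self, text):
        matches = [
            SensitiveMatch(kind, m.start(), m.end(), m.group())
            for kind, pattern, _ in self.PATTERNS
            for m in pattern.finditer(text)
        ]
        # Earliest match first; on ties, the longest match wins
        return sorted(matches, key=lambda m: (m.start, -m.end))

    @staticmethod
    def _mask(match):
        if match.kind == "file_path":
            # Keep only the file name so logs stay debuggable
            return "<path>/" + match.value.rsplit("/", 1)[-1]
        # Keep the first two characters as a debugging hint
        return f"<{match.kind}:{match.value[:2]}***>"

    def sanitize(self, text):
        """Return (masked_text, matches_that_were_masked)."""
        pieces, kept, cursor = [], [], 0
        for match in self.find_sensitive_data(text):
            if match.start < cursor:  # overlaps a span that is already masked
                continue
            pieces.append(text[cursor:match.start])
            pieces.append(self._mask(match))
            kept.append(match)
            cursor = match.end
        pieces.append(text[cursor:])
        return "".join(pieces), kept

    def get_sensitivity_score(self, text):
        """Sum the weights of the distinct kinds found, capped at 1.0."""
        weights = {kind: weight for kind, _, weight in self.PATTERNS}
        kinds = {m.kind for m in self.find_sensitive_data(text)}
        return min(1.0, sum(weights[k] for k in kinds))
```

For example, `sanitize("read /etc/config.json")` yields `"read <path>/config.json"`, and the same text scores 0.3, which the classification table places at the SANITIZED level. When matches overlap, the sketch keeps the one that starts first. A path inside a `password=...` value is therefore masked only once.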

### 2. Data Governance Policy Module (`history/data_governance.py`)

**Three-tier storage classification**:

| Data level | Sensitivity score | Retention | Handling |
|------------|-------------------|-----------|----------|
| FULL | < 0.3 | 90 days | Stored in full, no masking |
| SANITIZED | ≥ 0.3 and < 0.7 | 30 days | Sensitive fields masked |
| MINIMAL | ≥ 0.7 | 7 days | Metadata only |
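
The table's boundaries can be expressed as a small lookup in which each level is keyed by the lower bound of its score range. This matches `LEVEL_THRESHOLDS` and `RETENTION_CONFIG` as shown in the configuration section. The `DataLevel` enum values and the `classify` helper are assumptions for illustration.

```python
from enum import Enum


class DataLevel(Enum):
    FULL = "full"
    SANITIZED = "sanitized"
    MINIMAL = "minimal"


# Lower bound of each level's score range (values from the table above)
LEVEL_THRESHOLDS = {
    DataLevel.FULL: 0.0,
    DataLevel.SANITIZED: 0.3,
    DataLevel.MINIMAL: 0.7,
}

# Retention period in days per level (values from the table above)
RETENTION_CONFIG = {
    DataLevel.FULL: 90,
    DataLevel.SANITIZED: 30,
    DataLevel.MINIMAL: 7,
}


def classify(score: float) -> DataLevel:
    """Hypothetical helper: pick the highest level whose lower bound `score` reaches."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    level = DataLevel.FULL
    for candidate, lower_bound in sorted(LEVEL_THRESHOLDS.items(), key=lambda kv: kv[1]):
        if score >= lower_bound:
            level = candidate
    return level
```

Sorting by lower bound keeps `classify` correct even if the dictionary is reordered or a level is added.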

**Lifecycle management**:

- Automatic expiration checks
- Step-by-step demotion (full → sanitized → archived → deleted)
- Archives stored in a separate directory
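
Below is a minimal sketch of the step-by-step demotion. It assumes each record stores its current stage and the time it entered that stage, and that a record moves down one stage once it outlives that stage's retention. The `Stage` enum, the record keys, and restarting the clock on each demotion are all assumptions. The plan does not say how long archives are kept, so the caller supplies that retention.

```python
from datetime import datetime, timedelta
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical lifecycle stages, in demotion order."""
    FULL = 0
    SANITIZED = 1
    ARCHIVED = 2
    DELETED = 3


def demote(stage: Stage) -> Stage:
    """Move one step along FULL -> SANITIZED -> ARCHIVED -> DELETED."""
    return stage if stage is Stage.DELETED else Stage(stage + 1)


def run_lifecycle(records: list[dict], retention_days: dict, now: datetime) -> list[dict]:
    """Demote each expired record by one stage and return the records not deleted.

    Assumed record keys: "stage" (a Stage) and "stage_since" (when it entered that stage).
    `retention_days` maps FULL, SANITIZED and ARCHIVED to the days allowed in each.
    """
    for record in records:
        stage = record["stage"]
        if stage is Stage.DELETED:
            continue
        if now - record["stage_since"] > timedelta(days=retention_days[stage]):
            record["stage"] = demote(stage)
            record["stage_since"] = now  # the next stage's retention starts now
    return [r for r in records if r["stage"] is not Stage.DELETED]
```

Each call demotes a record by at most one stage, so a record only walks the whole chain across several cleanup runs.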

**Metrics collection**:

- Record counts per level
- Sensitive-field hit rate
- Storage usage
- Number of expired records
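
These four metrics could be gathered in a single pass over the stored records. The field names `total_records`, `full_records`, `sanitized_records`, and `total_size_bytes` follow the metrics example later in this document. Everything else is an assumption: `minimal_records`, `expired_records`, `field_hits`, the record keys, and approximating storage as the UTF-8 size of each record's JSON.

```python
import json
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class GovernanceMetrics:
    total_records: int = 0
    full_records: int = 0
    sanitized_records: int = 0
    minimal_records: int = 0
    expired_records: int = 0
    total_size_bytes: int = 0
    # field name -> number of records in which that field had a sensitive hit
    field_hits: Counter = field(default_factory=Counter)

    def hit_rate(self, field_name: str) -> float:
        """Share of records whose `field_name` contained sensitive data."""
        if not self.total_records:
            return 0.0
        return self.field_hits[field_name] / self.total_records


def collect_metrics(records: list[dict], now: datetime) -> GovernanceMetrics:
    """Hypothetical single pass; assumed keys: "level", "expires_at", "sensitive_fields"."""
    metrics = GovernanceMetrics()
    for record in records:
        metrics.total_records += 1
        if record["level"] == "full":
            metrics.full_records += 1
        elif record["level"] == "sanitized":
            metrics.sanitized_records += 1
        elif record["level"] == "minimal":
            metrics.minimal_records += 1
        if record["expires_at"] <= now:
            metrics.expired_records += 1
        # Approximate storage as the UTF-8 size of the record serialized to JSON
        encoded = json.dumps(record, default=str, ensure_ascii=False).encode("utf-8")
        metrics.total_size_bytes += len(encoded)
        # Count each field at most once per record
        metrics.field_hits.update(set(record.get("sensitive_fields", [])))
    return metrics
```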

### 3. History Manager Enhancements (`history/manager.py`)

**Integrated governance features**:

- Applies the governance policy automatically on save
- Cleans up expired data automatically at startup
- Supports manually triggered cleanup
- Exports sanitized data

**New methods**:

```python
# Manual cleanup
stats = manager.manual_cleanup()
# Returns, for example: {'archived': 5, 'deleted': 3, 'remaining': 92}

# Get governance metrics
metrics = manager.get_governance_metrics()

# Export sanitized data; returns the number of exported records
count = manager.export_sanitized(output_path)
```
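
The plan does not spell out what `manual_cleanup()` does internally. The sketch below is one hypothetical version that returns the same three keys:

- Records still within their level's retention stay in place.
- Expired FULL and SANITIZED records are appended to a JSON Lines file in the archive directory.
- Expired MINIMAL records are deleted outright, because they only hold metadata.

The archive file name, the record keys, and the archive-versus-delete rule are assumptions.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path

# Retention in days per level, from the classification table
RETENTION_DAYS = {"full": 90, "sanitized": 30, "minimal": 7}


def manual_cleanup(records: list[dict], archive_dir: Path, now: datetime) -> dict:
    """Hypothetical cleanup pass returning the same keys as manager.manual_cleanup().

    Assumed record keys: "task_id", "level" and "created_at" (a datetime).
    `records` is updated in place to hold only the records that remain.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive_file = archive_dir / f"archive_{now:%Y%m%d}.jsonl"
    kept, archived, deleted = [], 0, 0
    with archive_file.open("a", encoding="utf-8") as fh:
        for record in records:
            if now - record["created_at"] <= timedelta(days=RETENTION_DAYS[record["level"]]):
                kept.append(record)
            elif record["level"] == "minimal":
                deleted += 1  # assumption: metadata-only records are not archived
            else:
                # One JSON object per line, so the archive can grow incrementally
                fh.write(json.dumps(record, default=str, ensure_ascii=False) + "\n")
                archived += 1
    records[:] = kept
    return {"archived": archived, "deleted": deleted, "remaining": len(kept)}
```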

### 4. Governance Monitoring Panel (`ui/governance_panel.py`)

**UI features**:

- Live display of governance metrics
- One-click data cleanup
- Export of sanitized data
- Opening the archive directory
- Explanation of the active policies

### 5. Test Suite (`tests/test_data_governance.py`)

**Test coverage**:

- Data sanitizer tests (10+ test cases)
- Governance policy tests (classification, expiration, cleanup)
- History manager integration tests
- Export tests

## Metrics

### Recommended Monitoring Metrics

1. **Data volume**
   - Total number of records
   - Share of records at each level
   - Storage usage (MB)

2. **Sensitive-field hit rate**
   - Number of sensitive-information detections per field
   - Distribution of sensitivity scores

3. **Expiration cleanup completion**
   - Number of records pending cleanup
   - Archive success rate
   - Deletion completion rate
   - Time of the last cleanup

4. **Governance effectiveness**
   - Sanitization coverage
   - Number of demotions
   - Number of archive files
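
Several of these metrics are ratios. The sketch below shows one possible set of definitions. The `CleanupRun` fields and the choice of denominators are assumptions rather than definitions from the plan, as is treating an empty denominator as 100% complete.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CleanupRun:
    """Hypothetical record of one cleanup run."""
    pending: int            # records due for cleanup when the run started
    archive_attempted: int
    archive_succeeded: int
    delete_due: int
    deleted: int
    finished_at: datetime


def ratio(part: int, whole: int) -> float:
    """part / whole, treating an empty denominator as fully complete (1.0)."""
    return part / whole if whole else 1.0


def cleanup_report(run: CleanupRun, sensitive_records: int, sanitized_records: int) -> dict:
    """Combine one run's counters with current store counts into dashboard values."""
    return {
        "pending": run.pending,
        "archive_success_rate": ratio(run.archive_succeeded, run.archive_attempted),
        "delete_completion_rate": ratio(run.deleted, run.delete_due),
        # Share of records with sensitive hits that are stored masked or minimized
        "sanitization_coverage": ratio(sanitized_records, sensitive_records),
        "last_cleanup": run.finished_at.isoformat(),
    }
```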

## Usage Examples

### Basic Usage (Automatic Governance)

```python
from history.manager import get_history_manager

# Get the manager (governance is enabled automatically)
manager = get_history_manager()

# Records are classified and sanitized automatically when added
record = manager.add_record(
    task_id='task-001',
    user_input='Read the config file /etc/config.json',
    code='with open("/etc/config.json") as f: ...',
    # ... other fields
)

# Each record is automatically:
# 1. Analyzed for sensitivity
# 2. Processed under the governance policy for its level
# 3. Tagged with governance metadata
# 4. Counted in the metrics when saved
```
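
The first three automatic steps can be sketched as a pure function over a record dict. To stay self-contained, it takes the scoring and masking functions as parameters instead of importing the project's sanitizer. The field names in `CONTENT_FIELDS` and the `_governance` metadata key are assumptions. So is using the most sensitive field as the record's score.

```python
from datetime import datetime, timedelta
from typing import Callable

CONTENT_FIELDS = ("user_input", "code", "output")  # assumed record fields
RETENTION_DAYS = {"full": 90, "sanitized": 30, "minimal": 7}


def apply_governance(record: dict, score_fn: Callable[[str], float],
                     sanitize_fn: Callable[[str], str], now: datetime) -> dict:
    """Hypothetical sketch: return a governed copy of `record`, leaving it untouched."""
    # 1. Record sensitivity = score of its most sensitive content field
    score = max((score_fn(record.get(f, "")) for f in CONTENT_FIELDS), default=0.0)
    # 2. Choose the level with the thresholds from the classification table
    level = "full" if score < 0.3 else "sanitized" if score < 0.7 else "minimal"
    governed = dict(record)
    for f in CONTENT_FIELDS:
        if f not in governed:
            continue
        if level == "sanitized":
            governed[f] = sanitize_fn(governed[f])
        elif level == "minimal":
            governed.pop(f)  # keep metadata only
    # 3. Attach governance metadata used later for expiry and metrics
    governed["_governance"] = {
        "level": level,
        "score": round(score, 3),
        "expires_at": (now + timedelta(days=RETENTION_DAYS[level])).isoformat(),
    }
    return governed
```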

### Manual Cleanup

```python
# Trigger a cleanup manually
stats = manager.manual_cleanup()
print(f"Archived: {stats['archived']}, deleted: {stats['deleted']}")
```

### Exporting Sanitized Data

```python
from pathlib import Path

# Export for sharing or backup
count = manager.export_sanitized(Path("history_sanitized.json"))
print(f"Exported {count} sanitized records")
```
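
Here is a hypothetical `export_sanitized` written as a standalone function. It re-masks every string value, including records stored at the FULL level, so an exported file never depends on how a record was classified. The JSON-array output format and the `sanitize_fn` parameter are assumptions.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def export_sanitized(records: Iterable[dict], output_path: Path,
                     sanitize_fn: Callable[[str], str]) -> int:
    """Hypothetical sketch: write masked copies of `records` as a JSON array, return the count."""
    exported = []
    for record in records:
        # Mask every string value; non-string values are copied unchanged
        exported.append({k: sanitize_fn(v) if isinstance(v, str) else v
                         for k, v in record.items()})
    output_path.write_text(json.dumps(exported, ensure_ascii=False, indent=2),
                           encoding="utf-8")
    return len(exported)
```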

### Viewing Governance Metrics

```python
metrics = manager.get_governance_metrics()
print(f"Total records: {metrics.total_records}")
print(f"Stored in full: {metrics.full_records}")
print(f"Stored sanitized: {metrics.sanitized_records}")
print(f"Storage used: {metrics.total_size_bytes / 1024 / 1024:.2f} MB")
```
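
## Security Improvements

### Before

- ❌ All sensitive information stored in plaintext
- ❌ No data classification policy
- ❌ No expiration or cleanup mechanism
- ❌ No sensitive-information detection
- ❌ No metrics

### After

- ✅ Automatic detection and masking of 10+ types of sensitive information
- ✅ Three-tier storage (full / sanitized / minimal)
- ✅ Automatic expiration cleanup and archiving
- ✅ Sensitivity scoring and classification
- ✅ A complete metrics system
- ✅ Visual monitoring panel
- ✅ Export of sanitized data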

## Configuration Options

The following can be adjusted in `history/manager.py`:

```python
class HistoryManager:
    # ... (other members omitted)
    MAX_HISTORY_SIZE = 100        # Maximum number of records
    AUTO_CLEANUP_ENABLED = True   # Enable automatic cleanup
```

The following can be adjusted in `history/data_governance.py`:

```python
# Level thresholds (lower bound of each level's sensitivity score)
LEVEL_THRESHOLDS = {
    DataLevel.FULL: 0.0,
    DataLevel.SANITIZED: 0.3,
    DataLevel.MINIMAL: 0.7,
}

# Retention periods
RETENTION_CONFIG = {
    DataLevel.FULL: 90,  # days
    DataLevel.SANITIZED: 30,
    DataLevel.MINIMAL: 7,
}
```

## Running the Tests

```bash
python tests/test_data_governance.py
```

Expected output:

- Data sanitizer tests: 6+ passed
- Governance policy tests: 5+ passed
- History manager tests: 5+ passed
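
## Summary

Through four core modules, this plan establishes a complete data governance system:

1. **Automation**: records are classified and sanitized automatically on save, and expired data is cleaned up automatically at startup
2. **Tiered management**: three storage levels based on sensitivity, each with its own retention period
3. **Observability**: a complete set of metrics plus a visual monitoring panel
4. **Controllability**: manual cleanup, export, and archive management

Together, these measures reduce the risk of local data leaks while preserving the ability to debug and trace past tasks.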