
# Multi-Service Site QA Patterns

## Architecture Recognition

When a site has multiple subdomains or services, first map the architecture:

| Indicator | What it means |
|-----------|--------------|
| Multiple `main.py` files in subdirectories | Separate service entry points |
| `shared/` directory with auth/cookie modules | Shared authentication across services |
| Different port numbers in config | Local dev runs separate processes |
| Subdomain routing (auth.ephron.ren, blog.ephron.ren) | Production reverse proxy setup |

## Common Multi-Service Patterns (FastAPI)

```
project/
├── auth/src/main.py        # Auth service (login, register, RBAC)
├── blog/src/main.py        # Blog service (posts, comments, likes)
├── canvas/src/main.py      # Canvas service (AI-generated pages)
├── prompt/src/main.py      # Prompt service (prompt CRUD)
├── home/src/main.py        # Homepage service
├── shared/                  # Shared modules (auth, CSRF, audit, templating)
│   ├── auth_users.py
│   ├── cookie_utils.py
│   ├── csrf.py
│   ├── templating.py
│   └── ports.py            # Service URL configuration
└── main.py                  # Unified launcher (starts all services)
```

## Cross-Service Cookie Auth Testing

1. Log in on the auth service → receive the `ephron_auth` cookie
2. Verify the cookie domain is `.example.com` (the shared parent domain, not a single service's host)
3. Test cookie propagation: visit each service and check that it shows the logged-in state
4. Test logout: log out on one service, then verify that every service shows the logged-out state
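Step 2 can also be checked offline from the jar that `curl -c` writes. The sketch below assumes curl's Netscape cookie-file format: seven tab-separated fields (domain, include-subdomains flag, path, secure, expiry, name, value), with `#HttpOnly_` prefixed to HttpOnly cookies. `parse_curl_jar` and `check_shared_cookie` are hypothetical helper names, not part of any tool used here.

```python
# Assumes curl's Netscape cookie-jar format (written by `curl -c`).
HTTPONLY_PREFIX = "#HttpOnly_"


def parse_curl_jar(text):
    """Parse a curl cookie jar into a list of cookie dicts."""
    cookies = []
    for line in text.splitlines():
        httponly = line.startswith(HTTPONLY_PREFIX)
        if httponly:
            line = line[len(HTTPONLY_PREFIX):]
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # not a cookie row
        domain, subdomains, path, secure, expires, name, value = fields
        cookies.append({
            "domain": domain,
            "include_subdomains": subdomains == "TRUE",
            "path": path,
            "secure": secure == "TRUE",
            "httponly": httponly,
            "name": name,
            "value": value,
        })
    return cookies


def check_shared_cookie(text, name, parent):
    """True if cookie `name` is scoped to `parent` and all of its subdomains."""
    for cookie in parse_curl_jar(text):
        if cookie["name"] == name:
            return cookie["include_subdomains"] and cookie["domain"].lstrip(".") == parent
    return False
```

Usage: `check_shared_cookie(open("/tmp/c.txt").read(), "ephron_auth", "example.com")` returns `False` for a host-only cookie such as one set on `auth.example.com` alone.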

## Route File Reading Strategy

For each service, read these files in order:
1. `src/routes/pages.py` — public page routes
2. `src/routes/admin.py` — admin/management routes
3. `src/routes/api.py` — API endpoints
4. `src/routes/service_api.py` — inter-service APIs
5. `src/services/auth.py` — auth helpers (what permissions are checked)

Extract from each route:
- `@router.get("/path")` or `@router.post("/path")` → HTTP method + path
- `_require_auth(ephron_auth, request, permission="X.Y.Z")` → required permission
- `@limiter.limit("N/minute")` → rate limit
- `Form(...)` parameters → required form fields
- `Cookie(default=None)` → cookie dependencies
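A rough regex pass can pull these fields out of a route module without importing the app. This is a heuristic sketch, not a Python parser. It assumes the decorator and helper shapes listed above, that `@limiter.limit` sits below `@router.get/post` (the usual slowapi order), and that `Form`/`Cookie` parameters are written as `name: type = Form(...)`. `extract_routes` is a hypothetical name.

```python
import re

# Patterns for the shapes listed above (assumed, not exhaustive)
ROUTE_RE = re.compile(r'@router\.(get|post|put|patch|delete)\(\s*"([^"]+)"')
PERM_RE = re.compile(r'permission\s*=\s*"([^"]+)"')
LIMIT_RE = re.compile(r'@limiter\.limit\(\s*"([^"]+)"')
FORM_RE = re.compile(r'(\w+)\s*:\s*\w+\s*=\s*Form\(')
COOKIE_RE = re.compile(r'(\w+)\s*:\s*[\w\[\]|. ]+?=\s*Cookie\(')


def extract_routes(source):
    """Split a route module at each @router decorator and summarize each chunk."""
    matches = list(ROUTE_RE.finditer(source))
    routes = []
    for i, m in enumerate(matches):
        # A route's chunk runs from its decorator to the next route decorator
        end = matches[i + 1].start() if i + 1 < len(matches) else len(source)
        chunk = source[m.start():end]
        perm = PERM_RE.search(chunk)
        limit = LIMIT_RE.search(chunk)
        routes.append({
            "method": m.group(1).upper(),
            "path": m.group(2),
            "permission": perm.group(1) if perm else None,
            "rate_limit": limit.group(1) if limit else None,
            "form_fields": FORM_RE.findall(chunk),
            "cookies": COOKIE_RE.findall(chunk),
        })
    return routes
```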

## Test Matrix Generation

For each discovered route, create test cases:
- **Happy path**: valid inputs, correct auth → expected success
- **Auth failure**: no cookie / wrong role → expected redirect or 403
- **Validation failure**: missing fields, invalid data → expected error
- **Rate limit**: exceed the limit → expected 429
- **CSRF**: missing/invalid CSRF token → expected rejection
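Turning routes into cases is mechanical enough to script. A minimal sketch, assuming each route is a dict with `method`, `path`, and optional `permission` and `rate_limit` keys. The skip rules are assumptions: no auth case without a detected permission, no rate-limit case without a limit, and no CSRF case for GET. A route with no detected permission may still require login.

```python
CATEGORIES = [
    # (category, setup, expected outcome)
    ("happy", "valid inputs, correct auth", "success"),
    ("auth", "no cookie / wrong role", "redirect or 403"),
    ("validation", "missing fields, invalid data", "error response"),
    ("rate_limit", "exceed the limit", "429"),
    ("csrf", "missing/invalid CSRF token", "rejection"),
]


def build_matrix(routes):
    """Expand each route into one test case per applicable category."""
    cases = []
    for route in routes:
        for category, setup, expected in CATEGORIES:
            # Heuristic skips (assumptions; adjust per site)
            if category == "auth" and not route.get("permission"):
                continue
            if category == "rate_limit" and not route.get("rate_limit"):
                continue
            if category == "csrf" and route["method"] == "GET":
                continue
            cases.append({
                "id": f"T-{len(cases) + 1:03d}",
                "route": f"{route['method']} {route['path']}",
                "category": category,
                "setup": setup,
                "expected": expected,
            })
    return cases
```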

## Consistency Checks Across Services

Build a comparison table:

| Feature | Service A | Service B | Service C |
|---------|-----------|-----------|-----------|
| mobile.css loaded? | ✅ | ❌ | ❌ |
| loader.js loaded? | ❌ | ✅ | ✅ |
| Site navigation? | ✅ | ✅ | ❌ |
| user-scalable? | yes | no | no |

Inconsistencies are bugs — all services sharing a design system should be consistent.
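The comparison table can be generated from each service's HTML. This sketch rests on loose assumptions. The checks are plain substring and regex tests on the page source, so a commented-out `<link>` still counts as loaded. The "Site navigation?" row is omitted because detecting it depends on the site's nav markup.

```python
import re

# Heuristic checks on raw page source (assumptions, not a DOM parse)
CHECKS = {
    "mobile.css loaded?": lambda html: "mobile.css" in html,
    "loader.js loaded?": lambda html: "loader.js" in html,
    "user-scalable=no?": lambda html: re.search(r"user-scalable\s*=\s*(no|0)", html, re.I) is not None,
}


def consistency_table(pages):
    """pages maps service name -> page HTML; returns (markdown table, inconsistent features)."""
    services = list(pages)
    rows = [
        "| Feature | " + " | ".join(services) + " |",
        "|---------|" + "|".join("---" for _ in services) + "|",
    ]
    inconsistent = []
    for feature, check in CHECKS.items():
        values = [check(pages[s]) for s in services]
        rows.append(f"| {feature} | " + " | ".join("✅" if v else "❌" for v in values) + " |")
        if len(set(values)) > 1:  # services disagree, so this is a likely bug
            inconsistent.append(feature)
    return "\n".join(rows), inconsistent
```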

## Curl-Based QA Techniques (Session-Proven)

When browser automation is unavailable, these curl patterns reliably test multi-service sites:

### Cookie Management
```bash
# Use ONE cookie jar per request chain (one per user session):
# -c saves cookies to the jar, -b sends them back on the next request
curl -s -c /tmp/c.txt https://auth.example.com/login > /tmp/login.html
CSRF=$(grep -oP 'name="csrf_token"[^>]*value="\K[^"]+' /tmp/login.html | head -1)
curl -s -b /tmp/c.txt -c /tmp/c.txt -X POST https://auth.example.com/api/login \
  -d "username=user&password=pass&csrf_token=$CSRF" > /dev/null
# Verify the auth cookie was saved:
grep ephron_auth /tmp/c.txt
```

### CSRF Token Extraction (FastAPI/Tortoise patterns)
```bash
# Most reliable (GNU grep -P; assumes name= comes before value= in the tag):
grep -oP 'name="csrf_token"[^>]*value="\K[^"]+' /tmp/page.html | head -1

# Fallback variants:
grep -oP 'csrf_token.*?value="\K[^"]+' /tmp/page.html | head -1
grep -i 'csrf' /tmp/page.html | grep -oP 'value="\K[^"]+' | head -1
```

### API Login: JSON vs Form-Encoded
```bash
# Modern FastAPI services use /api/login with JSON:
curl -s -b /tmp/c.txt -c /tmp/c.txt -X POST https://auth.example.com/api/login \
  -H "Content-Type: application/json" \
  -d '{"username":"user","password":"pass","csrf_token":"TOKEN"}'

# Legacy form-encoded (action="/login"):
curl -s -b /tmp/c.txt -c /tmp/c.txt -X POST https://auth.example.com/login \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "username=user&password=pass&csrf_token=$CSRF"
```

### Post-Login Redirect Chain
```bash
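# Follow the 303 redirect chain automatically. -d already makes this a POST; adding -X POST
# would force POST on every redirect hop too, instead of the GET that a 303 requires.
curl -sL -b /tmp/c.txt -c /tmp/c.txt https://auth.example.com/api/login \
  -d "username=u&password=p&csrf_token=$CSRF" -w "\nHTTP:%{http_code}"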
# Get final status: curl -sL ... -o /dev/null -w "%{http_code}"
```

### Health Checks (All Services at Once)
```bash
for svc in www auth blog canvas prompt; do
  result=$(curl -s "https://$svc.example.com/health")
  echo "$svc: $result"
done
```
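This is a parallel variant of the loop above. It adds a timeout so one hung service does not stall the rest. It uses only the standard library and reports status codes rather than bodies. It assumes the `/health` path and service names shown above. `fetch` is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx: the service answered, but unhealthily
    except OSError:
        return None  # DNS failure, refused connection, timeout, TLS error


def health_check(services, base="example.com", fetch=fetch_status):
    """Request https://<svc>.<base>/health for every service concurrently."""
    urls = {svc: f"https://{svc}.{base}/health" for svc in services}
    with ThreadPoolExecutor(max_workers=len(urls) or 1) as pool:
        return dict(zip(urls, pool.map(fetch, urls.values())))
```

Usage: `health_check(["www", "auth", "blog", "canvas", "prompt"])` returns one status (or `None`) per service.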

### Security Headers (All Services)
```bash
for svc in www auth blog canvas prompt; do
  echo "=== $svc ==="
  curl -sI "https://$svc.example.com/" | grep -iE \
    'x-content-type|x-frame|referrer-policy|content-security|set-cookie'
done
```
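The grep output can be turned into pass/fail results by checking the header text from `curl -sI` against expected values. The accepted values here are assumptions: `nosniff`, `DENY`/`SAMEORIGIN`, and any non-empty `Referrer-Policy` and `Content-Security-Policy`. Tighten them to the site's actual policy.

```python
# Expected security headers and an acceptance rule for each (assumed policy)
EXPECTED = {
    "x-content-type-options": lambda v: v.lower() == "nosniff",
    "x-frame-options": lambda v: v.upper() in ("DENY", "SAMEORIGIN"),
    "referrer-policy": lambda v: bool(v),
    "content-security-policy": lambda v: bool(v),
}


def audit_headers(raw_headers):
    """raw_headers: text printed by `curl -sI` (status line plus 'Name: value' lines)."""
    found = {}
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # the status line has no colon and is skipped
            found[name.strip().lower()] = value.strip()
    problems = []
    for name, ok in EXPECTED.items():
        if name not in found:
            problems.append(f"missing {name}")
        elif not ok(found[name]):
            problems.append(f"weak {name}: {found[name]}")
    return problems
```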

### CSP Deep Analysis — script-src-elem Override Trap
```bash
# Extract full CSP header
curl -sI https://www.example.com/admin | grep -i content-security-policy

# Look for script-src-elem which OVERRIDES script-src for <script> elements:
# BAD:  script-src 'self' 'unsafe-inline'; script-src-elem 'self' https://cdn.example.com;
# GOOD: script-src 'self' 'unsafe-inline'; script-src-elem 'self' 'unsafe-inline' https://cdn.example.com;
#
# If script-src-elem exists without 'unsafe-inline', ALL inline <script> tags are blocked.
# Symptoms: onclick handlers call undefined functions, buttons do nothing, no JS errors in console
# (CSP violations appear as pageerror events, not console.error)
```
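The same trap can be detected programmatically. This sketch follows the CSP fallback order for `<script>` elements (`script-src-elem`, then `script-src`, then `default-src`). It also treats a nonce or hash as disabling `'unsafe-inline'`, as CSP Level 2+ browsers do. It inspects one header value only. If a response carries several CSP headers, each is enforced and all must allow the script.

```python
def parse_csp(header_value):
    """Split a CSP header value into {directive: [sources]}; repeated directives are ignored, as browsers do."""
    directives = {}
    for part in header_value.split(";"):
        tokens = part.split()
        if tokens:
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def inline_script_elements_blocked(header_value):
    """True if inline <script> elements would be blocked by this policy."""
    directives = parse_csp(header_value)
    # Fallback order for <script> elements: script-src-elem -> script-src -> default-src
    for name in ("script-src-elem", "script-src", "default-src"):
        if name in directives:
            sources = [s.lower() for s in directives[name]]
            break
    else:
        return False  # no applicable directive, so inline scripts are allowed
    nonce_or_hash = any(s.startswith(("'nonce-", "'sha256-", "'sha384-", "'sha512-")) for s in sources)
    # A nonce or hash makes browsers ignore 'unsafe-inline' (CSP Level 2+)
    return "'unsafe-inline'" not in sources or nonce_or_hash
```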

### Cookie Security Verification
```bash
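# Capture Set-Cookie on the login response. -I would send HEAD (curl refuses -I with -d),
# so dump the POST's response headers with -D - and discard the body:
curl -s -D - -o /dev/null -b /tmp/c.txt -c /tmp/c.txt https://auth.example.com/api/login \
  -d "username=u&password=p&csrf_token=$CSRF" | grep -i set-cookie
# Expected: HttpOnly; Secure; SameSite=lax; Max-Age=604800 (7 days); Domain=.example.com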
```
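The expected attributes can be asserted with the standard library's `http.cookies.SimpleCookie`; the `samesite` key needs Python 3.8+. This sketch assumes one `Set-Cookie` line per call and the `ephron_auth` name and `.example.com` domain from above. Treating a `SameSite` value other than `lax`/`strict` as a problem is this sketch's policy, not a browser rule.

```python
from http.cookies import SimpleCookie


def check_set_cookie(header_value, name="ephron_auth", domain=".example.com"):
    """Return a list of problems with the named cookie's security attributes."""
    if header_value.lower().startswith("set-cookie:"):
        header_value = header_value.split(":", 1)[1]  # SimpleCookie wants only the value
    jar = SimpleCookie()
    jar.load(header_value)
    if name not in jar:
        return [f"{name} not set"]
    morsel = jar[name]  # unset attributes read as ''
    problems = []
    if not morsel["httponly"]:
        problems.append("missing HttpOnly")
    if not morsel["secure"]:
        problems.append("missing Secure")
    if morsel["samesite"].lower() not in ("lax", "strict"):
        problems.append(f"SameSite={morsel['samesite'] or 'unset'}")
    if morsel["domain"].lstrip(".") != domain.lstrip("."):
        problems.append(f"Domain={morsel['domain'] or 'unset (host-only)'}")
    return problems
```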

### Session Fixation Check
```bash
# Before login: record cookie
curl -sI -c /tmp/before.txt https://auth.example.com/login | grep -i set-cookie
# (GET requests rarely set auth cookies)

# After login: cookie must change
curl -s -b /tmp/before.txt -c /tmp/after.txt -X POST .../api/login ...
grep ephron_auth /tmp/after.txt
# Session ID must be different from before
```
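The jar comparison can be scripted. This minimal sketch assumes curl's Netscape-format jars and that the session lives in `ephron_auth`. If the site issues a separate pre-login session cookie, pass that name instead. It flags fixation only when a pre-login value survives login unchanged.

```python
def cookie_value(jar_text, name):
    """Value of cookie `name` in a curl (Netscape-format) jar, or None."""
    for line in jar_text.splitlines():
        if line.startswith("#HttpOnly_"):
            line = line[len("#HttpOnly_"):]
        elif line.startswith("#"):
            continue  # comment line
        fields = line.split("\t")
        if len(fields) == 7 and fields[5] == name:
            return fields[6]
    return None


def session_fixation_suspected(before_jar, after_jar, name="ephron_auth"):
    """True if a pre-login session value is still in use after login."""
    before = cookie_value(before_jar, name)
    after = cookie_value(after_jar, name)
    if after is None:
        raise ValueError(f"{name} missing after login; the login probably failed")
    return before is not None and before == after
```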

### Known Rate Limits (ephron.ren observed)
```bash
# Auth login failures: 5/min → 429
# Auth registration: 6/hour → 429 (use existing test accounts)
# Blog comments: 6/min
# Blog likes toggle: 11/min
# Save/publish ops: 21/min
```

### Delegate Task Sizing for Large Test Suites

When testing 100+ cases across multiple modules, delegate_task has a 600s timeout. Size tasks carefully:

| Task Type | Max Cases per Delegate | Reason |
|-----------|----------------------|--------|
| Curl-only HTTP tests | 15-20 | Each curl = 1-3s + overhead |
| Browser interactions | 5-8 | Each interaction = 10-30s |
| Mixed curl + Playwright | 8-12 | Browser calls dominate time |
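Splitting a case list into delegate-sized batches is simple to automate. This sketch uses the conservative (lower) end of each range in the table above. The `curl`/`browser`/`mixed` keys and `plan_batches` are hypothetical names.

```python
# Conservative batch sizes: lower end of each range in the sizing table
MAX_PER_DELEGATE = {"curl": 15, "browser": 5, "mixed": 8}


def plan_batches(case_ids, kind):
    """Split case IDs into consecutive batches small enough for one delegate."""
    size = MAX_PER_DELEGATE[kind]
    return [case_ids[i:i + size] for i in range(0, len(case_ids), size)]
```

For example, 37 curl-only cases become three delegates of 15, 15, and 7 cases.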

**Faster alternative**: Use `execute_code` with `from hermes_tools import terminal` for in-process execution. No delegation overhead, same capabilities.

```python
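from hermes_tools import terminal  # in-process helper available inside execute_code

results = {}
# -o /dev/null discards the body, so the output is just the status code
r = terminal("curl -s -o /dev/null -w '%{http_code}' https://example.com/")
code = r["output"].strip()
results["T-001"] = {"status": "PASS" if code == "200" else "FAIL", "detail": f"HTTP {code}"}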
```

### CSRF Token Synchronization Pitfall (curl)

When testing forms that require CSRF tokens, remember that the token in the cookie changes on every GET request. If you GET a page, extract the CSRF token, and then POST with a **different** cookie jar, the tokens won't match and the server rejects the request with `CSRF token 验证失败` ("CSRF token verification failed").

```bash
# WRONG: separate cookie jars for GET and POST
curl -s -b /tmp/jar1.txt https://example.com/admin > /tmp/page.html  # sets new CSRF cookie
curl -s -b /tmp/jar2.txt -X POST ... -d "csrf_token=$CSRF"           # different jar = mismatch!

# RIGHT: same cookie jar for GET and POST in sequence
curl -s -b /tmp/jar.txt -c /tmp/jar.txt https://example.com/admin > /tmp/page.html
CSRF=$(grep -oP 'name="csrf_token"[^>]*value="\K[^"]+' /tmp/page.html | head -1)
curl -s -b /tmp/jar.txt -c /tmp/jar.txt -X POST ... -d "csrf_token=$CSRF"
```

**Why this happens**: The site's CSRF middleware (application code on top of FastAPI/Starlette, neither of which ships a CSRF middleware) generates a new token on each GET and stores it in the `ephron_csrf` cookie. The POST handler compares the form token against the cookie token, so both must come from the same request chain.

**Multiple forms on one page**: If a page has N forms, there will be N CSRF tokens in the HTML but only ONE in the cookie. Each form's token is unique. Extract the token from the specific form you need (use context-aware parsing, not just `head -1`).
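One context-aware option is Python's `html.parser`, which reads attributes regardless of their order. The sketch assumes each form is identified by its `action` attribute and carries its token in a hidden `<input name="csrf_token">`. `FormTokenParser` and `csrf_for_form` are hypothetical helpers.

```python
from html.parser import HTMLParser


class FormTokenParser(HTMLParser):
    """Collect the csrf_token input value inside each <form>, keyed by the form's action."""

    def __init__(self):
        super().__init__()
        self.tokens = {}     # form action -> token
        self._action = None  # action of the <form> currently open, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._action = a.get("action", "")
        elif tag == "input" and a.get("name") == "csrf_token" and self._action is not None:
            self.tokens.setdefault(self._action, a.get("value", ""))

    def handle_endtag(self, tag):
        if tag == "form":
            self._action = None


def csrf_for_form(html, action):
    """Token from the form whose action matches, or None if there is no such form."""
    parser = FormTokenParser()
    parser.feed(html)
    return parser.tokens.get(action)
```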

### Owner vs Admin Permission Testing Pattern

When a site has RBAC (user < admin < owner), test with all roles:

```bash
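# Log in as each role, one cookie jar per role (GET first so the CSRF cookie and token match)
for role in owner admin user; do
  curl -s -c /tmp/$role.txt https://auth.example.com/login > /tmp/login_$role.html
  CSRF=$(grep -oP 'name="csrf_token"[^>]*value="\K[^"]+' /tmp/login_$role.html | head -1)
  curl -s -b /tmp/$role.txt -c /tmp/$role.txt -X POST https://auth.example.com/api/login \
    -d "username=Elaina_$role" -d 'password=Pass123!' -d "csrf_token=$CSRF" -o /dev/null
done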

# Test each protected endpoint with each role
for role in owner admin user; do
  status=$(curl -s -b /tmp/$role.txt -o /dev/null -w '%{http_code}' https://example.com/admin/roles)
  echo "$role -> /admin/roles: $status"
done
```

**Key insight**: If the admin role can't access a page but the nav bar still shows the link, it is either a UX bug (the nav should hide links the role cannot use) or a permission misconfiguration (the role should have access).
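Once the status codes are collected, comparing them with intended access exposes both kinds of bug. The expectation table below is a hypothetical example, not the site's real policy. Any non-200 response (including a redirect to login) counts as denied, and paths missing from the table are treated as allowed for no role.

```python
# Hypothetical expectations: roles that should get HTTP 200 on each path
EXPECTED_ACCESS = {
    "/admin/roles": {"owner"},
    "/admin": {"owner", "admin"},
}


def rbac_mismatches(observed, expected=EXPECTED_ACCESS):
    """observed: {(role, path): status_code}. Returns human-readable mismatches."""
    problems = []
    for (role, path), status in sorted(observed.items()):
        should_allow = role in expected.get(path, set())
        allowed = status == 200
        if should_allow and not allowed:
            problems.append(f"{role} denied {path} (HTTP {status})")
        elif allowed and not should_allow:
            problems.append(f"{role} unexpectedly allowed {path}")
    return problems
```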

### Content Restoration for Destructive Tests

When tests modify content (create invite codes, publish posts, change settings):

1. **Before testing**: Save current state
   ```bash
   # Save homepage content
   curl -s -b /tmp/admin.txt https://www.example.com/admin | grep -oP 'initialContent = JSON\.parse\("\K(?:[^"\\]|\\.)*' > /tmp/homepage_backup.json

   # Save blog post slugs
   curl -s https://blog.example.com/ | grep -oP '/posts/\K[a-z0-9-]+' | sort -u > /tmp/blog_slugs.txt
   ```

2. **During testing**: Create test data with identifiable markers (e.g., `QA_TEST_TEMP` in notes/titles)

3. **After testing**: Clean up test data
   ```bash
   # Delete test invite codes
   curl -s -b /tmp/owner.txt -X POST https://auth.example.com/admin/invites/delete \
     -d "csrf_token=$CSRF&code=$TEST_CODE"
   ```

4. **Verify restoration**: Check that original content is unchanged
   ```bash
   for slug in $(cat /tmp/blog_slugs.txt); do
     status=$(curl -s -o /dev/null -w '%{http_code}' "https://blog.example.com/posts/$slug")
     echo "$slug: $status"
   done
   ```
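Step 4 is easier to read as a diff between snapshots. This sketch assumes that `before` is the slug list saved in step 1, that `after` is the same command rerun after cleanup, and that `statuses` holds the codes from the step 4 loop. The helper name is hypothetical.

```python
def diff_snapshots(before, after, statuses):
    """Report content that changed between the pre-test and post-cleanup snapshots."""
    before, after = set(before), set(after)
    return {
        "missing": sorted(before - after),   # original posts that disappeared
        "leftover": sorted(after - before),  # test posts that were not cleaned up
        "broken": sorted(s for s in before if statuses.get(s) != 200),  # originals no longer served
    }
```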

### Module-by-Module Testing with Incremental Commits

For large QA tasks (100+ test cases across many modules), the user may want results committed after each module:

1. Create `test-results.md` with placeholder sections for all modules
2. Test module N → update the module section in test-results.md
3. `git add test-results.md && git commit -m "Module N done: X passed / Y failed" && git push`
4. Report progress to user
5. Repeat for next module

**Document structure per module**:
```markdown
## Module N: Name

**Status**: ✅ Done
**Run time**: YYYY-MM-DD HH:MM - HH:MM
**Results**: X passed / Y failed / Z blocked (N total)

| ID | Result | Notes |
|------|------|------|
| X-001 | ✅ Pass | detail |
| X-002 | ❌ Fail | 🔴 description |

### Module N Summary
- Summary bullets

### 💡 Module N Recommendations
1. **🔴 [Critical]**: description
2. **🟡 [High]**: description
```

**Why per-module commits**: Gives the user incremental visibility, prevents data loss if the session breaks, and creates a clean git history.
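The header and results table of each module section can be generated from raw results so the counts always match the rows. This sketch uses English labels and hypothetical outcome keys (`pass`, `fail`, `blocked`); the summary and recommendation subsections are still written by hand.

```python
def render_module(n, name, start, end, results):
    """results: list of (case_id, outcome, note) with outcome in {'pass', 'fail', 'blocked'}."""
    labels = {"pass": "✅ Pass", "fail": "❌ Fail", "blocked": "⛔ Blocked"}
    counts = {k: sum(1 for _, outcome, _ in results if outcome == k) for k in labels}
    lines = [
        f"## Module {n}: {name}",
        "",
        "**Status**: ✅ Done",
        f"**Run time**: {start} - {end}",
        f"**Results**: {counts['pass']} passed / {counts['fail']} failed / "
        f"{counts['blocked']} blocked ({len(results)} total)",
        "",
        "| ID | Result | Notes |",
        "|------|------|------|",
    ]
    lines += [f"| {cid} | {labels[outcome]} | {note} |" for cid, outcome, note in results]
    return "\n".join(lines)
```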

### Registration Rate Limiting Pitfall

Registration endpoints typically have strict rate limits (e.g., 6/hour). When testing multiple registration scenarios (password validation, username checks, invite codes), the rate limit kicks in and blocks subsequent tests with 429, masking the real behavior.

**Workaround**:
- Test rate-limited endpoints LAST in each module
- Use existing test accounts for non-registration tests
- Note which tests were blocked by rate limiting in results
- Space out registration tests or use different IPs if possible

### Common API Field Names (FastAPI/Pydantic patterns)
```bash
# Blog likes toggle: field is `post_slug` (NOT `slug`)
curl -X POST https://blog.example.com/api/likes/toggle \
  -H "Content-Type: application/json" \
  -d '{"post_slug":"article-slug"}'

# Blog comments: post_slug + content + parent_id (nullable)
curl -X POST https://blog.example.com/api/comments/ \
  -H "Content-Type: application/json" \
  -d '{"post_slug":"article-slug","content":"text","parent_id":null}'
```

### Template Encoding Checks (BOM / Leading Whitespace)
```bash
# BOM marker: UTF-8 EF BB BF appears before DOCTYPE
xxd /tmp/page.html | head -3

# Leading newline before DOCTYPE: 0a 3c 21 44 4f ...
head -c 20 /tmp/page.html | xxd

# Python source BOM check:
xxd app.py | head -1
```
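Both checks can run on raw bytes in Python, which avoids eyeballing `xxd` output. This minimal sketch flags a UTF-8 BOM and counts ASCII whitespace bytes before a `<!DOCTYPE`; for `.py` files only the BOM check applies.

```python
import codecs


def encoding_issues(data):
    """Flag a UTF-8 BOM and whitespace before <!DOCTYPE in page or template bytes."""
    issues = []
    if data.startswith(codecs.BOM_UTF8):
        issues.append("UTF-8 BOM (ef bb bf)")
        data = data[len(codecs.BOM_UTF8):]
    stripped = data.lstrip()  # bytes.lstrip() removes ASCII whitespace only
    if stripped[:9].lower() == b"<!doctype" and len(stripped) != len(data):
        issues.append(f"{len(data) - len(stripped)} whitespace byte(s) before DOCTYPE")
    return issues
```

Usage: `encoding_issues(open("/tmp/page.html", "rb").read())`.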

## Static Analysis Checks (no browser needed)

```bash
# Check for BOM markers
xxd file.html | head -3
# Look for: ef bb bf (UTF-8 BOM)

# Check for leading whitespace before DOCTYPE
head -c 20 file.html | xxd

# Check CSS variable definitions
grep -nE -e '--warning-bg|--error-bg|--success-bg' file.html

# Check for accessibility issues
grep -n 'user-scalable=no' *.html
grep -n 'alt=""' *.html
grep -n 'aria-hidden' *.html

# Check security headers
curl -sI https://example.com | grep -iE 'x-content-type|x-frame|referrer-policy|content-security'
```