first commit

2026-05-10 13:52:46 +08:00
commit ccc63d1e70
4583 changed files with 584341 additions and 0 deletions
--- a/dogfood/SKILL.md
+++ b/dogfood/SKILL.md
@@ -0,0 +1,419 @@
+---
+name: dogfood
+description: "Exploratory QA of web apps: find bugs, evidence, reports."
+version: 1.0.0
+metadata:
+  hermes:
+    tags: [qa, testing, browser, web, dogfood]
+    related_skills: []
+---
+
+# Dogfood: Systematic Web Application QA Testing
+
+## Overview
+
+This skill guides you through systematic exploratory QA testing of web applications. It supports **two execution modes** depending on context:
+
+1. **Browser-first** (default): Use browser toolset to navigate, interact, and capture evidence live.
+2. **Source-code-first** (fallback): When browser automation is unavailable, slow, or times out, analyze the source code to enumerate all routes/endpoints/pages, then build a comprehensive test plan document. Execute browser tests selectively afterward.
+
+For **multi-service sites** (e.g., separate auth/blog/canvas services), prefer the source-code-first approach — it produces more complete coverage faster than crawling.
+
+## Prerequisites
+
+- Browser toolset: either the built-in tools (`browser_navigate`, `browser_snapshot`, etc.) **or** Playwright Python (see `references/playwright-qa.md`) — optional if using source-code-first mode
+- A target URL and testing scope from the user
+- Source code access (repo clone or codebase) — strongly recommended for multi-service sites
+
+## Inputs
+
+The user provides:
+1. **Target URL** — the entry point for testing
+2. **Scope** — what areas/features to focus on (or "full site" for comprehensive testing)
+3. **Output directory** (optional) — where to save screenshots and the report (default: `./dogfood-output`)
+
+## Workflow
+
+Follow this 5-phase systematic workflow:
+
+### Phase 1: Plan
+
+1. Create the output directory structure:
+   ```
+   {output_dir}/
+   ├── screenshots/       # Evidence screenshots
+   └── report.md          # Final report (generated in Phase 5)
+   ```
+2. Identify the testing scope based on user input.
+3. Build a rough sitemap by planning which pages and features to test:
+   - Landing/home page
+   - Navigation links (header, footer, sidebar)
+   - Key user flows (sign up, login, search, checkout, etc.)
+   - Forms and interactive elements
+   - Edge cases (empty states, error pages, 404s)
+
+### Phase 2: Explore
+
+For each page or feature in your plan:
+
+1. **Navigate** to the page:
+   ```
+   browser_navigate(url="https://example.com/page")
+   ```
+
+2. **Take a snapshot** to understand the DOM structure:
+   ```
+   browser_snapshot()
+   ```
+
+3. **Check the console** for JavaScript errors:
+   ```
+   browser_console(clear=true)
+   ```
+   Do this after every navigation and after every significant interaction. Silent JS errors are high-value findings.
+
+4. **Take an annotated screenshot** to visually assess the page and identify interactive elements:
+   ```
+   browser_vision(question="Describe the page layout, identify any visual issues, broken elements, or accessibility concerns", annotate=true)
+   ```
+   The `annotate=true` flag overlays numbered `[N]` labels on interactive elements. Each `[N]` maps to ref `@eN` for subsequent browser commands.
+
+5. **Test interactive elements** systematically:
+   - Click buttons and links: `browser_click(ref="@eN")`
+   - Fill forms: `browser_type(ref="@eN", text="test input")`
+   - Test keyboard navigation: `browser_press(key="Tab")`, `browser_press(key="Enter")`
+   - Scroll through content: `browser_scroll(direction="down")`
+   - Test form validation with invalid inputs
+   - Test empty submissions
+
+6. **After each interaction**, check for:
+   - Console errors: `browser_console()`
+   - Visual changes: `browser_vision(question="What changed after the interaction?")`
+   - Expected vs actual behavior
+
+### Phase 3: Collect Evidence
+
+For every issue found:
+
+1. **Take a screenshot** showing the issue:
+   ```
+   browser_vision(question="Capture and describe the issue visible on this page", annotate=false)
+   ```
+   Save the `screenshot_path` from the response — you will reference it in the report.
+
+2. **Record the details**:
+   - URL where the issue occurs
+   - Steps to reproduce
+   - Expected behavior
+   - Actual behavior
+   - Console errors (if any)
+   - Screenshot path
+
+3. **Classify the issue** using the issue taxonomy (see `references/issue-taxonomy.md`):
+   - Severity: Critical / High / Medium / Low
+   - Category: Functional / Visual / Accessibility / Console / UX / Content
+
+### Phase 4: Categorize
+
+1. Review all collected issues.
+2. De-duplicate — merge issues that are the same bug manifesting in different places.
+3. Assign final severity and category to each issue.
+4. Sort by severity (Critical first, then High, Medium, Low).
+5. Count issues by severity and category for the executive summary.
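+
+The categorization steps can be sketched in Python as follows. This is a minimal illustration, not part of the skill's tooling. The `Issue` fields mirror the details recorded in Phase 3, and the de-duplication key (normalized title plus category) is an assumption. Deciding whether two findings really are the same bug still needs a human look.
+
+```python
+from __future__ import annotations
+
+from collections import Counter
+from dataclasses import dataclass, field
+
+# Severity order from references/issue-taxonomy.md (most severe first).
+SEVERITY_ORDER = ["Critical", "High", "Medium", "Low"]
+
+
+@dataclass
+class Issue:
+    title: str
+    severity: str  # one of SEVERITY_ORDER
+    category: str  # Functional / Visual / Accessibility / Console / UX / Content
+    urls: list[str] = field(default_factory=list)
+
+
+def categorize(issues):
+    """De-duplicate, sort by severity, and count issues for the executive summary."""
+    merged = {}
+    for issue in issues:
+        # Assumed de-dup key: same normalized title and category means same bug.
+        key = (issue.title.strip().lower(), issue.category)
+        kept = merged.get(key)
+        if kept is None:
+            merged[key] = Issue(issue.title, issue.severity, issue.category, list(issue.urls))
+            continue
+        # Merge the places where the bug shows up and keep the worst severity.
+        for url in issue.urls:
+            if url not in kept.urls:
+                kept.urls.append(url)
+        if SEVERITY_ORDER.index(issue.severity) < SEVERITY_ORDER.index(kept.severity):
+            kept.severity = issue.severity
+    ordered = sorted(merged.values(), key=lambda i: SEVERITY_ORDER.index(i.severity))
+    return ordered, Counter(i.severity for i in ordered), Counter(i.category for i in ordered)
+```
+
+The two `Counter` results feed the severity and category breakdown in the executive summary.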
+
+### Phase 5: Report
+
+Generate the final report using the template at `templates/dogfood-report-template.md`.
+
+The report must include:
+1. **Executive summary** with total issue count, breakdown by severity, and testing scope
+2. **Per-issue sections** with:
+   - Issue number and title
+   - Severity and category badges
+   - URL where observed
+   - Description of the issue
+   - Steps to reproduce
+   - Expected vs actual behavior
+   - Screenshot references (use `MEDIA:<screenshot_path>` for inline images)
+   - Console errors if relevant
+3. **Summary table** of all issues
+4. **Testing notes** — what was tested, what was not, any blockers
+
+Save the report to `{output_dir}/report.md`.
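+
+A minimal sketch of rendering the executive-summary counts and the summary table as Markdown. The issue dictionary shape and the column names are illustrative assumptions; the authoritative layout is whatever `templates/dogfood-report-template.md` defines.
+
+```python
+SEVERITIES = ["Critical", "High", "Medium", "Low"]
+
+
+def render_summary(issues, scope):
+    """Return Markdown for the executive summary and the summary table.
+
+    Each issue is assumed to be a dict with 'title', 'severity',
+    'category', and 'url' keys.
+    """
+    counts = {s: sum(1 for i in issues if i["severity"] == s) for s in SEVERITIES}
+    lines = [
+        "## Executive Summary",
+        "",
+        f"- Scope: {scope}",
+        f"- Total issues: {len(issues)}",
+        "- By severity: " + ", ".join(f"{s} {counts[s]}" for s in SEVERITIES),
+        "",
+        "## Summary Table",
+        "",
+        "| # | Title | Severity | Category | URL |",
+        "|---|-------|----------|----------|-----|",
+    ]
+    for n, issue in enumerate(issues, start=1):
+        # Escape pipes so a title cannot break the table layout.
+        title = issue["title"].replace("|", "\\|")
+        lines.append(f"| {n} | {title} | {issue['severity']} | {issue['category']} | {issue['url']} |")
+    return "\n".join(lines) + "\n"
+```
+
+Pass the issues already sorted by severity so the table numbering matches the per-issue sections.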
+
+## Alternative Workflow: Source-Code-First (for multi-service / slow-browser sites)
+
+When the target site has source code available, or browser automation is too slow/times out:
+
+### Step 1: Clone and Map the Codebase
+```bash
+git clone <repo_url> /tmp/qa-target
+cd /tmp/qa-target
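+# Route modules (routes*.py or anything under a routes/ dir, e.g. api.py) plus entry points
+find . \( -name "routes*.py" -o -path "*/routes/*.py" -o -name "main.py" -o -name "pages.py" -o -name "admin.py" \) -not -path "*/.venv/*" | sort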
+```
+
+### Step 2: Enumerate All Routes
+Read each route file and extract:
+- HTTP method + path (e.g., `GET /posts/{slug}`)
+- Required auth/permissions
+- Rate limits
+- Form fields and validation rules
+
+Build a **complete URL inventory** — this is your test matrix.
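+
+Assuming FastAPI-style decorators like those described in `references/multi-service-qa.md` (`@router.get("/path")`, `@limiter.limit("5/minute")`), a regex pass can seed the inventory. This is a heuristic sketch: it only sees literal, single-line decorator arguments, ignores router prefixes set elsewhere (for example `APIRouter(prefix=...)`), and still needs a manual read for auth and form fields.
+
+```python
+import re
+from pathlib import Path
+
+# Single-line decorators such as @router.get("/posts/{slug}") or @app.post('/api/login').
+ROUTE_RE = re.compile(r"""@\w+\.(get|post|put|patch|delete)\(\s*["']([^"']+)["']""")
+LIMIT_RE = re.compile(r"""@limiter\.limit\(\s*["']([^"']+)["']""")
+
+
+def routes_in_source(text, filename="<string>"):
+    """Return (METHOD, path, rate_limit or None, filename) for each decorated route."""
+    inventory = []
+    routes, limit = [], None
+    for line in text.splitlines():
+        stripped = line.strip()
+        if match := ROUTE_RE.match(stripped):
+            routes.append((match.group(1).upper(), match.group(2)))
+        elif match := LIMIT_RE.match(stripped):
+            limit = match.group(1)
+        elif stripped.startswith(("def ", "async def ")):
+            # The decorator stack (in either order) ends at the function definition.
+            for method, path in routes:
+                inventory.append((method, path, limit, filename))
+            routes, limit = [], None
+    return inventory
+
+
+def build_inventory(root):
+    """Scan every .py file under root; sort the inventory by path, then method."""
+    rows = []
+    for path in Path(root).rglob("*.py"):
+        source = path.read_text(encoding="utf-8", errors="replace")
+        rows.extend(routes_in_source(source, str(path)))
+    return sorted(rows, key=lambda row: (row[1], row[0]))
+```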
+
+### Step 3: Analyze Static Assets and Templates
+Check template files for:
+- CSS variable definitions (look for `:root` blocks)
+- JS includes (what scripts are loaded vs missing)
+- Encoding issues (BOM markers, leading newlines before DOCTYPE)
+- Accessibility: `alt` attributes, `user-scalable`, skip links
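+
+The encoding checks (and the `xxd | head` tip in the pitfalls) can be automated over a template directory or saved pages. A minimal sketch; the `*.html` pattern and the wording of the findings are assumptions to adapt.
+
+```python
+from pathlib import Path
+
+UTF8_BOM = b"\xef\xbb\xbf"
+
+
+def prefix_problems(data, expect_doctype=True):
+    """Return findings about bytes that precede the document proper."""
+    problems = []
+    if data.startswith(UTF8_BOM):
+        problems.append("UTF-8 BOM (ef bb bf) at start of file")
+        data = data[len(UTF8_BOM):]
+    body = data.lstrip(b" \t\r\n")
+    if len(body) != len(data):
+        problems.append(f"{len(data) - len(body)} whitespace byte(s) before the first tag")
+    if expect_doctype and body[:9].lower() != b"<!doctype":
+        problems.append("document does not start with <!DOCTYPE")
+    return problems
+
+
+def scan_templates(root, pattern="*.html"):
+    """Map each template under root that has prefix problems to its findings.
+
+    Child templates usually extend a base instead of carrying their own
+    DOCTYPE, so only the BOM and leading-whitespace checks apply here.
+    """
+    results = {}
+    for path in sorted(Path(root).rglob(pattern)):
+        found = prefix_problems(path.read_bytes(), expect_doctype=False)
+        if found:
+            results[str(path)] = found
+    return results
+```
+
+For rendered pages, run `prefix_problems` on the raw response bytes with the DOCTYPE check enabled.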
+
+### Step 4: HTTP-Level Testing (no browser needed)
+Use `curl` to test:
+- Page loads (HTTP status codes)
+- Static asset availability
+- Response headers (security headers, CSP)
+- Redirect chains (login flows)
+- API endpoints (with/without auth cookies)
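+
+For the header checks, a small standard-library helper can fetch a page and list which of the security headers named in the consistency checklist are absent. A sketch under assumptions: `urllib` follows redirects by default, so the headers reported are those of the final response (use `curl -sI` without `-L` when a specific hop matters), and a missing header is something to review rather than automatically a bug.
+
+```python
+import urllib.error
+import urllib.request
+
+# Headers named in the "Security Headers" consistency item of this skill.
+EXPECTED_HEADERS = [
+    "X-Content-Type-Options",
+    "X-Frame-Options",
+    "Referrer-Policy",
+    "Content-Security-Policy",
+]
+
+
+def missing_security_headers(headers):
+    """Given a mapping of response headers, return the expected ones that are absent."""
+    present = {name.lower() for name in headers}
+    return [h for h in EXPECTED_HEADERS if h.lower() not in present]
+
+
+def fetch_headers(url, timeout=10):
+    """Return (status, headers) for url; 4xx/5xx statuses are returned, not raised."""
+    request = urllib.request.Request(url, headers={"User-Agent": "dogfood-qa"})
+    try:
+        with urllib.request.urlopen(request, timeout=timeout) as response:
+            return response.status, dict(response.headers.items())
+    except urllib.error.HTTPError as err:
+        return err.code, dict(err.headers.items())
+```
+
+Usage: `status, headers = fetch_headers(url)` followed by `missing_security_headers(headers)` for each page in the URL inventory.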
+
+### Step 5: Generate Structured Test Plan
+Output a markdown document with:
+- Service architecture table
+- Test accounts and auth mechanism
+- Per-module test case tables: `编号 | 测试内容 | 测试步骤 | 预期结果 | 优先级` (ID | test item | steps | expected result | priority)
+- Known issues found during source analysis
+- Cross-cutting concerns (consistency, accessibility, security)
+
+### Step 6: Selective Browser Execution
+Only use browser automation for:
+- Login/register interactive flows
+- Visual verification of known issues
+- Console error capture
+- Screenshot evidence
+
+## Alternative Workflow: Test Plan Gap Analysis (improving existing plans)
+
+When the user already has a test plan and wants to **improve/complete it** against source code:
+
+### Step 1: Load Both Inputs in Parallel
+```
+1. Read the existing test-plan.md
+2. Clone or pull the target repo
+3. Use delegate_task to analyze ALL route files + security mechanisms in one pass
+   - Extract: endpoints, form fields, rate limits, cookie params, CSRF mechanism,
+     validation rules, ownership models, public APIs
+   - Focus on FEATURES not COVERED by the existing plan
+```
+
+### Step 2: Systematic Comparison
+For each service, compare source code findings against existing test cases:
+- **Endpoints**: Are all HTTP methods + paths covered?
+- **Validation rules**: Password complexity, username blacklists, email uniqueness, slug format
+- **Rate limits**: Are all limiters documented with correct values?
+- **Security mechanisms**: CSRF token format/expiry, cookie attributes, redirect validation
+- **APIs**: Public JSON APIs, service-to-service APIs, ownership isolation
+- **Edge cases**: CRLF normalization, content size limits, cascading deletes
+
+### Step 3: Patch, Don't Rewrite
+Use targeted `patch` edits to add missing test cases within existing sections:
+- Insert after the related existing case (e.g., `A-015a` after `A-015`)
+- Use the sub-numbering convention `X-NNNa` for an insertion between `X-NNN` and the next case (e.g., `A-015a` sits between `A-015` and `A-016`)
+- Preserve existing case numbers — never renumber
+- Add new subsections only when the entire category is missing (e.g., Service API)
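+
+A sketch of the sub-numbering rule: given the IDs already in the plan, pick the first free letter suffix after an existing case. The ID format (`PREFIX-NNN` with an optional lowercase suffix) is inferred from the examples above.
+
+```python
+import re
+
+ID_RE = re.compile(r"^([A-Z]+)-(\d+)([a-z]?)$")
+
+
+def next_insert_id(existing_ids, after):
+    """Return the first free sub-numbered ID for a case inserted after `after`.
+
+    Existing numbers are never changed; only a letter suffix is added.
+    """
+    match = ID_RE.match(after)
+    if not match:
+        raise ValueError(f"unrecognized case ID: {after}")
+    base = f"{match.group(1)}-{match.group(2)}"
+    for suffix in "abcdefghijklmnopqrstuvwxyz":
+        candidate = base + suffix
+        if candidate not in existing_ids:
+            return candidate
+    raise ValueError(f"no free sub-number left after {base}")
+```
+
+For example, with `A-015`, `A-015a`, and `A-016` already present, inserting after `A-015` yields `A-015b`.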
+
+### Step 4: Update Statistics
+After all patches:
+```bash
+# Count per-module
+grep -c '^| H-[0-9]' test-plan.md  # Home
+grep -c '^| A-[0-9]' test-plan.md  # Auth
+# ... etc for each prefix
+
+# Count total (catches sub-numbered too)
+grep -c '^| [A-Z]*-[0-9]' test-plan.md
+```
+Update both the header stats table and the footer summary table.
+Bump the version number.
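+
+The same counts can be computed in Python, which also makes it easy to report how many rows are sub-numbered when updating the stats tables. This sketch assumes, as the grep patterns above do, that every case row starts with `| ID`.
+
+```python
+import re
+from collections import Counter
+
+# A table row whose first cell is a case ID such as "A-015" or "A-015a".
+ROW_RE = re.compile(r"^\| ([A-Z]+)-(\d+)([a-z]?)\b")
+
+
+def count_cases(markdown_text):
+    """Return (per-prefix Counter, total, number of sub-numbered rows)."""
+    per_prefix = Counter()
+    sub_numbered = 0
+    for line in markdown_text.splitlines():
+        match = ROW_RE.match(line)
+        if match:
+            per_prefix[match.group(1)] += 1
+            if match.group(3):
+                sub_numbered += 1
+    return per_prefix, sum(per_prefix.values()), sub_numbered
+```
+
+Usage: `per, total, sub = count_cases(Path("test-plan.md").read_text(encoding="utf-8"))`.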
+
+### Step 5: Commit
+```bash
+# Commit message: "vN.M: refine the test plan, add X test cases (old→new)"
+git add test-plan.md && git commit -m "vN.M: 完善测试计划，新增 X 个测试用例 (old→new)" && git push
+```
+
+### Pitfalls for Gap Analysis
+- **Don't renumber existing cases**: Use `X-NNNa` sub-numbering to insert between existing cases. Renumbering breaks any existing references (issue tracker, test automation).
+- **Count carefully**: Both patterns match sub-numbered rows such as `A-015a`, because they only require a digit after the hyphen. Per-module counts miss any prefix you forget to list (including multi-letter prefixes), so use `'^| [A-Z]*-[0-9]'` for the total and check that the per-module counts add up to it.
+- **Don't duplicate**: Check whether a concept is already covered under a different name before adding it. "草稿可见" (draft visible) and "草稿预览" (draft preview) might be the same test.
+- **delegate_task for source analysis**: Don't read 40+ route files manually. A single delegate_task with a well-structured prompt produces a complete analysis in one pass.
+
+## Alternative Workflow: Module-by-Module Testing with Incremental Commits
+
+When the user has an existing test plan (e.g., `test-plan.md` in a repo) and wants to execute it module by module, committing results after each:
+
+### Step 1: Initialize Results Document
+Create `test-results.md` with a summary table and placeholder sections for every module. Include: module name, status (⏳), execution time, and empty test result tables.
+
+### Step 2: Test Module → Update → Commit Loop
+For each module:
+1. Execute tests (curl for HTTP-level, Playwright for browser-level)
+2. Update the module's section in `test-results.md` with results
+3. Update the summary table (pass/fail/blocked counts)
+4. Add a "模块 N 小结" (Module N summary) section with key findings
+5. Add a "💡 模块 N 优化建议" (Module N optimization suggestions) section with prioritized recommendations (the user explicitly wants these persisted in the document, not just in chat)
+6. **⚠️ If any test modified content, restore it BEFORE committing**
+7. `git add test-results.md && git commit -m "模块N: 通过X/失败Y" && git push` (commit message: "Module N: X passed / Y failed")
+8. Report progress to the user before starting the next module
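+
+The summary-table update and the commit message can share one helper that tallies a module's results. A sketch: the status values (`pass`, `fail`, `blocked`), the four-column summary row, and the English message wording are assumptions; keep whatever format the existing `test-results.md` already uses.
+
+```python
+from collections import Counter
+
+STATUSES = ("pass", "fail", "blocked")
+
+
+def module_summary(module_no, results):
+    """results: list of (case_id, status) tuples for one module.
+
+    Returns a Markdown summary-table row and a commit message.
+    """
+    counts = Counter(status for _, status in results)
+    unknown = set(counts) - set(STATUSES)
+    if unknown:
+        raise ValueError(f"unexpected status values: {sorted(unknown)}")
+    row = f"| Module {module_no} | {counts['pass']} | {counts['fail']} | {counts['blocked']} |"
+    message = f"Module {module_no}: {counts['pass']} passed / {counts['fail']} failed"
+    if counts["blocked"]:
+        message += f" / {counts['blocked']} blocked"
+    return row, message
+```
+
+The returned message can be passed straight to `git commit -m`.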
+
+### Step 3: Parallel Delegation
+Use `delegate_task` with 3 parallel tasks for curl-based modules. Each task tests a group of modules and returns JSON results. Browser-based modules must run sequentially.
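+
+One way to size the delegated batches, given the 600s delegate timeout noted in the pitfalls: pack modules into tasks by their curl case count, then run at most three tasks per wave. A sketch using first-fit-decreasing bin packing; the default budget of 15 cases per task follows the ~15-20 guidance in the pitfalls, and the module names and counts in the test are made up.
+
+```python
+def plan_batches(module_sizes, max_cases=15, max_tasks=3):
+    """Pack modules (name -> number of curl cases) into delegate batches.
+
+    First-fit decreasing: largest modules are placed first. A module larger
+    than max_cases gets its own batch. Returns a list of waves; each wave
+    holds up to max_tasks batches that can run in parallel.
+    """
+    batches = []  # each entry: [total_cases, [module names]]
+    for name, size in sorted(module_sizes.items(), key=lambda kv: kv[1], reverse=True):
+        for batch in batches:
+            if batch[0] + size <= max_cases:
+                batch[0] += size
+                batch[1].append(name)
+                break
+        else:
+            batches.append([size, [name]])
+    names = [modules for _, modules in batches]
+    return [names[i:i + max_tasks] for i in range(0, len(names), max_tasks)]
+```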
+
+### Pitfalls for Module-by-Module
+- **Don't wait until the end to commit**: Session may break, losing all work
+- **Restore content after destructive tests**: Save state before, verify after
+- **Rate limiting blocks repeated tests**: Test rate-limited endpoints last
+- **CSRF token sync**: Use same cookie jar for GET+POST (see `references/multi-service-qa.md`)
+- **Optimization suggestions go in the document, not just the chat**: User wants them persisted
+
+## Deliverables: Split Test Plan + Issue List
+
+**Always split QA output into two separate documents:**
+
+1. **`test-plan.md`** — Structured test cases with execution steps and expected results
+2. **`issue-list.md`** — Known issues found during analysis, with severity and fix suggestions
+
+Do NOT merge them into one report. Users need the test plan for execution assignment and the issue list for bug tracking. Each document should be self-contained.
+
+### Test Plan Structure
+- Service architecture table (services, URLs, ports)
+- Test accounts and auth mechanism explanation
+- Per-module test case tables: `编号 | 测试内容 | 测试步骤 | 预期结果 | 账号 | 优先级` (ID | test item | steps | expected result | account | priority)
+- Cover both public pages AND admin/management pages
+- Include a cross-cutting section for security, consistency, accessibility
+
+### Issue List Structure
+- Summary table (count by severity level)
+- Per-issue entries: module, page, phenomenon, impact, root cause, source code location, fix suggestion, priority
+- Consistency matrix (which pages have which features/assets)
+- Fix priority recommendation (immediate / soon / backlog)
+
+## Comprehensive QA Dimensions Checklist
+
+When the user asks for "full" or "complete" testing, cover ALL of these dimensions. If you only covered page navigation and login flows, the plan is incomplete.
+
+### Core (always cover)
+1. **Functional** — Page loads, navigation, CRUD operations, form submissions
+2. **Auth & Permissions** — Login/logout, RBAC, cross-service cookie propagation, admin vs user access
+3. **Input Validation** — Form validation, empty submissions, boundary values, special characters
+
+### Security (cover for any site with user input)
+4. **Cookie Security** — HttpOnly, Secure, SameSite attributes; Max-Age; domain scope
+5. **CSRF Protection** — Token presence, double-submit pattern, token expiry, replay resistance
+6. **Redirect Safety** — Open redirect via `redirect` parameter; validate against allowed domains
+7. **Rate Limiting** — Per-endpoint limits; account lockout; IP-based limits
+8. **File Upload Safety** — Allowed extensions, size limits, filename sanitization, path traversal prevention
+9. **Input Injection** — XSS in user-generated content, SQL injection attempts, path traversal in slugs
+
+### Session & State
+10. **Token Lifecycle** — Expiration behavior, role changes mid-session (DB role vs token role), token format validation
+11. **Concurrent Access** — Race conditions on shared resources, optimistic locking
+
+### Content & Rendering
+12. **Edge Case Content** — Empty states, very long text, special characters (CJK, emoji), Markdown/LaTeX rendering
+13. **Encoding** — BOM markers, UTF-8 consistency, DOCTYPE prefix cleanliness
+
+### SEO & Metadata
+14. **Meta Tags** — `<title>`, `<meta description>`, canonical URLs
+15. **Open Graph** — `og:title`, `og:description`, `og:image`, `og:url` per page
+16. **Structured Data** — robots.txt, sitemap.xml, RSS feed validity
+
+### Accessibility
+17. **WCAG Basics** — `alt` attributes, `user-scalable`, color contrast, skip-to-content links
+18. **Keyboard Navigation** — Tab order, focus management, ARIA labels
+
+### Performance & Compatibility
+19. **Page Load** — Static asset availability, CDN reliability (especially in China), resource count
+20. **Responsive Design** — Breakpoints, mobile layout, touch targets
+21. **Cross-Browser** — Chrome/Firefox/Safari/Edge rendering differences
+
+### Operations
+22. **Health Checks** — `/health` endpoint availability per service
+23. **Error Handling** — 404 pages, 500 error responses, graceful degradation
+24. **Logging & Audit** — Audit trail for admin actions, login attempts
+
+### Consistency (cross-service)
+25. **Asset Inclusion** — Which pages include mobile.css, loader.js, etc.
+26. **Navigation** — Which pages have site-wide nav bar
+27. **Security Headers** — X-Content-Type-Options, X-Frame-Options, Referrer-Policy, CSP
+
+## Pitfalls
+
+- **🔴 CRITICAL: Always backup content before write operations**: When testing CRUD endpoints (save, publish, create, update), the test payload (including XSS test strings, dummy data, empty fields) CAN overwrite real production content. Before any write test:
+  1. `curl -s -b cookies SITE/admin` → extract current content_json / initialContent → save to `/tmp/backup_<service>.json`
+  2. Perform test
+  3. Restore original content via Playwright (set form fields + `collectFormData()` + submit)
+  - **This is not optional.** A session that deletes user content without restoring it is a failed session.
+- **🔴 CRITICAL: Restore content IMMEDIATELY after destructive tests**: Don't wait until end of session. If a test modifies content, restore it in the same turn. Session interruptions, timeouts, or context limits can prevent later restoration.
+- **🔴 CRITICAL: XSS payloads in form fields persist**: When you fill a form field with `<script>alert(1)</script>` for XSS testing, that value is saved to the database as soon as the form is submitted or auto-saved. To avoid triggering auto-save, set values directly on form elements with Playwright's `page.evaluate()` rather than `page.fill()`, which fires input events that may activate auto-save. Any payload that does get submitted must still be cleaned up with the backup/restore steps above.
+- **⚠️ Do NOT parallelize browser delegate_tasks for QA**: Each browser interaction is slow (navigate + snapshot + screenshot = 10-30s), so three parallel browser tasks are likely to hit the 600s timeout. Run browser tests sequentially or use source-code-first mode.
+- **⚠️ Curl-only delegate tasks also time out with large batches**: A delegate_task with 30+ curl test cases can hit the 600s limit (each curl call = 1-3s + overhead). Split large test batches into smaller tasks (~15-20 cases each) or use `execute_code` with `from hermes_tools import terminal` for direct in-process execution (faster, no delegation overhead).
+- **⚠️ Client-side-only validation is a security finding**: When CSP blocks inline JS (see `script-src-elem` pitfall), any validation that only exists in client JS (password strength, field format, confirmation matching) becomes bypassable. Always test registration/submission with curl to verify server-side validation exists independently.
+- **⚠️ API authentication order matters**: Some endpoints validate request body BEFORE checking authentication, returning 422 (validation error) instead of 401 (unauthenticated). Test: `curl -X POST /api/endpoint -d 'invalid'` without auth — should get 401, not 422. This is a security issue (leaks endpoint existence and field requirements).
+- **⚠️ Fulltext search can silently fail**: Search endpoints with `mode=fulltext` may return 0 results while `mode=simple` works fine. Always test both modes with the same query. Common causes: search index not built, tokenizer (jieba) not installed, BM25 ranking misconfigured.
+- **⚠️ Rate limiting blocks subsequent tests**: Registration endpoints with strict limits (e.g., 6/hour) will block all remaining registration-related tests with 429. Strategy: test non-registration endpoints first, registration tests last, and note which tests were blocked.
+- **⚠️ Present the test plan BEFORE executing**: Show the user the complete test plan first. If they say "is this really all of it?", the plan is missing dimensions. Refer to the Comprehensive QA Dimensions Checklist above.
+- **⚠️ "全部加上" ("add everything") means ALL dimensions**: When the user asks to add everything, do not skip any dimension. Write all 27 dimensions from the checklist into the test plan, even if some have only 1-2 test cases.
+- **Multi-service auth**: Sites with shared cookies (e.g., the `.ephron.ren` domain) need a login on ONE service first, followed by a check that the cookie propagates to the others. Don't try to log in on each service independently.
+- **Encoding bugs**: Always hex-dump HTML source to check for BOM markers (`ef bb bf`) or leading newlines before DOCTYPE. Use: `xxd file.html | head -5`. For Python source files, also check: `xxd file.py | head -1`.
+- **CSRF tokens**: Many form submissions require CSRF tokens. Extract from the page first, then include in POST requests. Don't forget the CSRF cookie (`ephron_csrf`). Note: CSRF cookies are HttpOnly=false (by design, so JS can read them).
+- **Rate limits**: Note rate limit values from source code (e.g., `@limiter.limit("5/minute")`). When testing auth failures, stay under the limit or you'll get 429s that mask the real bug.
+- **Template vs runtime issues**: Some issues (empty content, missing sections) may be data issues, not code bugs. Verify by checking if the data source (database/content files) actually has content.
+- **File delivery fallback**: When sending files via QQ/WeChat fails, push to a Gitea repo as a fallback delivery mechanism.
+- **Source code security analysis**: Always check these files when available: `cookie_utils.py` (cookie params), `csrf.py` (CSRF mechanism), `redirect.py` (open redirect validation), `security_headers.py` (CSP/headers), `auth.py` (token format, lockout), `validators.py` (slug/path validation), `limiter.py` (rate limit config).
+- **⚠️ CSP `script-src-elem` silently kills inline JS**: When a page has inline `<script>` but buttons call functions defined there (e.g., `onclick="saveDraft()"`), always verify the CSP header. The `script-src-elem` directive **overrides** `script-src` for script elements — so `script-src 'unsafe-inline'` combined with `script-src-elem 'self' https://cdn.example.com` blocks ALL inline scripts. Symptoms: functions report "not defined", buttons do nothing, no network requests on click. Detection: check `typeof fnName` in browser console, or look for CSP error in console: `Executing inline script violates the following Content Security Policy directive 'script-src-elem'`. Fix: add `'unsafe-inline'` to `script-src-elem`, use nonce/hash, or extract inline scripts to external `.js` files.
+- **⚠️ CSP `form-action 'self'` blocks cross-origin redirects after form submission**: When a form POSTs to a same-origin endpoint (allowed by `form-action 'self'`), but the server responds with 303 redirect to a **different origin** (e.g., `auth.example.com` → `www.example.com`), the browser blocks the redirect. CSP `form-action` applies to the **entire redirect chain** resulting from form submission, not just the form's action URL. Symptoms: form appears to submit (POST in network tab), cookie gets set server-side, but page stays on the form URL — no navigation. Console error: `Sending form data to '...' violates the following Content Security Policy directive: "form-action 'self'"`. Detection: (1) test same-origin redirect (should work) vs cross-origin redirect (should fail); (2) `curl -sI` the 303 response — if it carries CSP with `form-action 'self'`, that's the blocker. Fix options: (a) skip CSP header on 303 redirect responses (empty body, CSP adds no protection); (b) use JS-based redirect instead of server-side 303; (c) add allowed origins to `form-action`. Key insight: this breaks any auth flow where login service is on a different subdomain than target pages. See `references/session-learnings-ephron-qa.md` for full reproduction steps.
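+
+The two CSP pitfalls above can be pre-screened from a raw `Content-Security-Policy` header before opening a browser. This is a simplified sketch, not a full CSP evaluator: it follows the fallback order for script elements (`script-src-elem`, then `script-src`, then `default-src`), treats `'unsafe-inline'` as ignored when a nonce or hash is present, and checks `form-action` only for `*`, `'self'`, and exact-origin sources. Confirm any suspected block in the browser console.
+
+```python
+from urllib.parse import urlsplit
+
+HASH_OR_NONCE = ("'nonce-", "'sha256-", "'sha384-", "'sha512-")
+
+
+def parse_csp(header):
+    """Split one CSP header into {directive: [lowercased sources]}; first occurrence wins."""
+    policy = {}
+    for part in header.split(";"):
+        tokens = part.split()
+        if tokens and tokens[0].lower() not in policy:
+            policy[tokens[0].lower()] = [t.lower() for t in tokens[1:]]
+    return policy
+
+
+def inline_script_elements_allowed(header):
+    """True if inline <script> elements without a nonce or hash may run."""
+    policy = parse_csp(header)
+    for directive in ("script-src-elem", "script-src", "default-src"):
+        if directive in policy:
+            sources = policy[directive]
+            # When a nonce or hash is present, browsers ignore 'unsafe-inline'.
+            if any(s.startswith(HASH_OR_NONCE) for s in sources):
+                return False
+            return "'unsafe-inline'" in sources
+    return True  # no applicable directive: inline scripts are not restricted
+
+
+def _origin(url):
+    parts = urlsplit(url)
+    return f"{parts.scheme}://{parts.netloc}".lower()
+
+
+def form_action_allows(header, page_url, target_url):
+    """Rough check of one navigation target (form action or redirect Location)."""
+    policy = parse_csp(header)
+    if "form-action" not in policy:
+        return True  # form-action has no default-src fallback
+    sources = policy["form-action"]
+    if "*" in sources:
+        return True
+    if "'self'" in sources and _origin(target_url) == _origin(page_url):
+        return True
+    # Only exact scheme://host[:port] sources are handled in this sketch.
+    return _origin(target_url) in {s.rstrip("/") for s in sources}
+```
+
+To mirror the redirect-chain behavior, call `form_action_allows` for the action URL and again for every `Location` in the chain.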
+
+## Scope Ambiguity Pitfall
+
+When the user asks to "inspect a server" ("巡检服务器") **without providing a URL**:
+- Clarify whether they mean the **local machine Hermes runs on** (system resources, running processes, disk/memory) or a **remote web service** (HTTP endpoints, app health).
+- **Default assumption**: If the user mentions a domain name (e.g., "巡检 ephron.ren" ("inspect ephron.ren") or "check blog.ephron.ren"), they mean the remote web service. If they say "your server" or "the machine you're on", they mean the local machine.
+- When in doubt, ask: "是巡检本机还是远程服务？" ("Should I inspect this machine or the remote service?")
+
+## Tools Reference
+
+| Tool | Purpose |
+|------|---------|
+| `browser_navigate` | Go to a URL |
+| `browser_snapshot` | Get DOM text snapshot (accessibility tree) |
+| `browser_click` | Click an element by ref (`@eN`) or text |
+| `browser_type` | Type into an input field |
+| `browser_scroll` | Scroll up/down on the page |
+| `browser_back` | Go back in browser history |
+| `browser_press` | Press a keyboard key |
+| `browser_vision` | Screenshot + AI analysis; use `annotate=true` for element labels |
+| `browser_console` | Get JS console output and errors |
+| **Playwright Python** | Full browser automation via script — use when built-in tools unavailable or need programmatic control (see `references/playwright-qa.md`) |
+
+## Related References
+
+- `references/issue-taxonomy.md` — severity and category classification for issues
+- `references/server-inspection.md` — local server inspection checklist: system resources, listening ports, processes, Docker, security services; also covers scope ambiguity (local vs. remote), route file reading strategy, cross-service cookie auth testing, static analysis checks
+- `references/qa-dimensions-checklist.md` — comprehensive 25-dimension QA checklist for "full site" testing requests
+- `references/playwright-qa.md` — Playwright Python setup, patterns, event monitoring, CSP bug detection
+- `references/session-learnings-ephron-qa.md` — concrete findings from ephron.ren QA: CSP override, password validation gaps, fulltext search failure, delegate sizing
+
+## Templates
+
+- `templates/dogfood-report-template.md` — issue list template (the output with bugs found)
+- `templates/test-plan-template.md` — test plan template (structured test cases with steps)
+
+## Tips
+
+- **Always check `browser_console()` after navigating and after significant interactions.** Silent JS errors are among the most valuable findings.
+- **Use `annotate=true` with `browser_vision`** when you need to reason about interactive element positions or when the snapshot refs are unclear.
+- **Test with both valid and invalid inputs** — form validation bugs are common.
+- **Scroll through long pages** — content below the fold may have rendering issues.
+- **Test navigation flows** — click through multi-step processes end-to-end.
+- **Check responsive behavior** by noting any layout issues visible in screenshots.
+- **Don't forget edge cases**: empty states, very long text, special characters, rapid clicking.
+- When reporting screenshots to the user, include `MEDIA:<screenshot_path>` so they can see the evidence inline.
--- a/dogfood/references/issue-taxonomy.md
+++ b/dogfood/references/issue-taxonomy.md
@@ -0,0 +1,109 @@
+# Issue Taxonomy
+
+Use this taxonomy to classify issues found during dogfood QA testing.
+
+## Severity Levels
+
+### Critical
+The issue makes a core feature completely unusable or causes data loss.
+
+**Examples:**
+- Application crashes or shows a blank white page
+- Form submission silently loses user data
+- Authentication is completely broken (can't log in at all)
+- Payment flow fails and charges the user without completing the order
+- Security vulnerability (e.g., XSS, exposed credentials in console)
+
+### High
+The issue significantly impairs functionality but a workaround may exist.
+
+**Examples:**
+- A key button does nothing when clicked (but refreshing fixes it)
+- Search returns no results for valid queries
+- Form validation rejects valid input
+- Page loads but critical content is missing or garbled
+- Navigation link leads to a 404 or wrong page
+- Uncaught JavaScript exceptions in the console on core pages
+
+### Medium
+The issue is noticeable and affects user experience but doesn't block core functionality.
+
+**Examples:**
+- Layout is misaligned or overlapping on certain screen sections
+- Images fail to load (broken image icons)
+- Slow performance (visible loading delays > 3 seconds)
+- Form field lacks proper validation feedback (no error message on bad input)
+- Console warnings that suggest deprecated or misconfigured features
+- Inconsistent styling between similar pages
+
+### Low
+Minor polish issues that don't affect functionality.
+
+**Examples:**
+- Typos or grammatical errors in text content
+- Minor spacing or alignment inconsistencies
+- Placeholder text left in production ("Lorem ipsum")
+- Favicon missing
+- Console info/debug messages that shouldn't be in production
+- Subtle color contrast issues that don't fail WCAG requirements
+
+## Categories
+
+### Functional
+Issues where features don't work as expected.
+
+- Buttons/links that don't respond
+- Forms that don't submit or submit incorrectly
+- Broken user flows (can't complete a multi-step process)
+- Incorrect data displayed
+- Features that work partially
+
+### Visual
+Issues with the visual presentation of the page.
+
+- Layout problems (overlapping elements, broken grids)
+- Broken images or missing media
+- Styling inconsistencies
+- Responsive design failures
+- Z-index issues (elements hidden behind others)
+- Text overflow or truncation
+
+### Accessibility
+Issues that prevent or hinder access for users with disabilities.
+
+- Missing alt text on meaningful images
+- Poor color contrast (fails WCAG AA)
+- Elements not reachable via keyboard navigation
+- Missing form labels or ARIA attributes
+- Focus indicators missing or unclear
+- Screen reader incompatible content
+
+### Console
+Issues detected through JavaScript console output.
+
+- Uncaught exceptions and unhandled promise rejections
+- Failed network requests (4xx, 5xx errors in console)
+- Deprecation warnings
+- CORS errors
+- Mixed content warnings (HTTP resources on HTTPS page)
+- Excessive console.log output left from development
+
+### UX (User Experience)
+Issues where functionality works but the experience is poor.
+
+- Confusing navigation or information architecture
+- Missing loading indicators (user doesn't know something is happening)
+- No feedback after user actions (e.g., button click with no visible result)
+- Inconsistent interaction patterns
+- Missing confirmation dialogs for destructive actions
+- Poor error messages that don't help the user recover
+
+### Content
+Issues with the text, media, or information on the page.
+
+- Typos and grammatical errors
+- Placeholder/dummy content in production
+- Outdated information
+- Missing content (empty sections)
+- Broken or dead links to external resources
+- Incorrect or misleading labels
--- a/dogfood/references/multi-service-qa.md
+++ b/dogfood/references/multi-service-qa.md
@@ -0,0 +1,356 @@
+# Multi-Service Site QA Patterns
+
+## Architecture Recognition
+
+When a site has multiple subdomains or services, first map the architecture:
+
+| Indicator | What it means |
+|-----------|--------------|
+| Multiple `main.py` files in subdirectories | Separate service entry points |
+| `shared/` directory with auth/cookie modules | Shared authentication across services |
+| Different port numbers in config | Local dev runs separate processes |
+| Subdomain routing (auth.ephron.ren, blog.ephron.ren) | Production reverse proxy setup |
+
+## Common Multi-Service Patterns (FastAPI)
+
+```
+project/
+├── auth/src/main.py        # Auth service (login, register, RBAC)
+├── blog/src/main.py        # Blog service (posts, comments, likes)
+├── canvas/src/main.py      # Canvas service (AI-generated pages)
+├── prompt/src/main.py      # Prompt service (prompt CRUD)
+├── home/src/main.py        # Homepage service
+├── shared/                  # Shared modules (auth, CSRF, audit, templating)
+│   ├── auth_users.py
+│   ├── cookie_utils.py
+│   ├── csrf.py
+│   ├── templating.py
+│   └── ports.py            # Service URL configuration
+└── main.py                  # Unified launcher (starts all services)
+```
+
+## Cross-Service Cookie Auth Testing
+
+1. Login on auth service → get `ephron_auth` cookie
+2. Verify cookie domain is `.example.com` (not service-specific)
+3. Test cookie propagation: visit each service, check logged-in state
+4. Test logout: logout on one service, verify all services see logged-out state
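+
+Steps 1 and 2 (and the Cookie Security checklist item) come down to reading the `Set-Cookie` attributes of the login response. A sketch using the standard library's `http.cookies.SimpleCookie`, which parses `SameSite` on Python 3.8+. The cookie name and expected parent domain are parameters; `ephron_auth` and `.example.com` in the test are examples only. It is meant for the auth cookie: a double-submit CSRF cookie is expected to lack HttpOnly.
+
+```python
+from http.cookies import SimpleCookie
+
+
+def audit_set_cookie(set_cookie_header, name, parent_domain):
+    """Return findings for cookie `name` in one Set-Cookie header value."""
+    jar = SimpleCookie()
+    jar.load(set_cookie_header)
+    if name not in jar:
+        return [f"cookie {name!r} not set"]
+    morsel = jar[name]
+    findings = []
+    # A leading dot in Domain is ignored by browsers, so compare without it.
+    domain = morsel["domain"].lstrip(".").lower()
+    if domain != parent_domain.lstrip(".").lower():
+        findings.append(f"Domain is {morsel['domain'] or '(host-only)'}, expected {parent_domain}")
+    if not morsel["httponly"]:
+        findings.append("missing HttpOnly")
+    if not morsel["secure"]:
+        findings.append("missing Secure")
+    if not morsel["samesite"]:
+        findings.append("missing SameSite")
+    return findings
+```
+
+Feed it one `Set-Cookie` value at a time, for example from `curl -si` output of the login POST.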
+
+## Route File Reading Strategy
+
+For each service, read these files in order:
+1. `src/routes/pages.py` — public page routes
+2. `src/routes/admin.py` — admin/management routes  
+3. `src/routes/api.py` — API endpoints
+4. `src/routes/service_api.py` — inter-service APIs
+5. `src/services/auth.py` — auth helpers (what permissions are checked)
+
+Extract from each route:
+- `@router.get("/path")` or `@router.post("/path")` → HTTP method + path
+- `_require_auth(ephron_auth, request, permission="X.Y.Z")` → required permission
+- `@limiter.limit("N/minute")` → rate limit
+- `Form(...)` parameters → required form fields
+- `Cookie(default=None)` → cookie dependencies
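+
+Python's `ast` module can automate the extraction list above, so you don't have to read every file by eye. This is a minimal sketch with several assumptions:
+
+- Routes are declared as `@router.<method>("/path")`, optionally with an `@limiter.limit("...")` decorator.
+- Permissions appear as a `permission="..."` keyword on some call in the handler body, such as `_require_auth(...)`.
+- Form fields are positional parameters defaulting to `Form(...)`.
+
+Other styles, such as `app.get` or auth via `Depends(...)`, are not handled.
+
+```python
+import ast
+from dataclasses import dataclass, field
+from typing import Optional
+
+HTTP_METHODS = {"get", "post", "put", "patch", "delete"}
+
+
+@dataclass
+class Route:
+    method: str
+    path: str
+    handler: str
+    rate_limit: Optional[str] = None
+    permissions: list[str] = field(default_factory=list)
+    form_fields: list[str] = field(default_factory=list)
+
+
+def _str_arg(call: ast.Call) -> Optional[str]:
+    """First positional argument of a call, if it is a string literal."""
+    if call.args and isinstance(call.args[0], ast.Constant) and isinstance(call.args[0].value, str):
+        return call.args[0].value
+    return None
+
+
+def extract_routes(source: str) -> list[Route]:
+    routes = []
+    for node in ast.walk(ast.parse(source)):
+        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
+            continue
+        route, limit = None, None
+        for dec in node.decorator_list:
+            if not (isinstance(dec, ast.Call) and isinstance(dec.func, ast.Attribute)):
+                continue
+            owner = dec.func.value.id if isinstance(dec.func.value, ast.Name) else ""
+            # Assumption: the router object is named `router` and the limiter `limiter`.
+            if owner == "router" and dec.func.attr in HTTP_METHODS:
+                route = Route(dec.func.attr.upper(), _str_arg(dec) or "?", node.name)
+            elif owner == "limiter" and dec.func.attr == "limit":
+                limit = _str_arg(dec)
+        if route is None:
+            continue  # plain helper function, not an endpoint
+        route.rate_limit = limit
+        # Defaults line up with the last len(defaults) positional parameters.
+        with_defaults = node.args.args[len(node.args.args) - len(node.args.defaults):]
+        for arg, default in zip(with_defaults, node.args.defaults):
+            if isinstance(default, ast.Call) and getattr(default.func, "id", "") == "Form":
+                route.form_fields.append(arg.arg)
+        # Any permission="X.Y.Z" keyword on a call inside the handler.
+        for sub in ast.walk(node):
+            if isinstance(sub, ast.Call):
+                for kw in sub.keywords:
+                    if kw.arg == "permission" and isinstance(kw.value, ast.Constant):
+                        route.permissions.append(kw.value.value)
+        routes.append(route)
+    return routes
+```
+
+Run it over each file in the reading order above, for example `extract_routes(open("src/routes/admin.py").read())`, and feed the results into the test matrix.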
+
+## Test Matrix Generation
+
+For each discovered route, create test cases:
+- **Happy path**: valid inputs, correct auth → expected success
+- **Auth failure**: no cookie / wrong role → expected redirect or 403
+- **Validation failure**: missing fields, invalid data → expected error
+- **Rate limit**: exceed the limit → expected 429
+- **CSRF**: missing/invalid CSRF token → expected rejection
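+
+Crossing each route with the categories above gives the skeleton of the test plan. This is a sketch: which categories apply to which route is this sketch's own heuristic. For example, it assumes only non-GET requests need a CSRF case. The `T-NNN` IDs mirror the `X-001` style used in the plan template.
+
+```python
+from typing import NamedTuple, Optional
+
+
+class RouteInfo(NamedTuple):
+    method: str
+    path: str
+    needs_auth: bool = False
+    has_form: bool = False
+    rate_limit: Optional[str] = None
+
+
+def build_test_matrix(routes: list[RouteInfo], prefix: str = "T") -> list[tuple[str, str, str]]:
+    """Return (id, route, category) rows, emitting only categories that apply."""
+    rows = []
+    for route in routes:
+        label = f"{route.method} {route.path}"
+        categories = ["happy path"]
+        if route.needs_auth:
+            categories.append("auth failure")
+        if route.has_form:
+            categories.append("validation failure")
+        if route.rate_limit:
+            categories.append(f"rate limit ({route.rate_limit})")
+        if route.method != "GET":
+            categories.append("CSRF")  # heuristic: state-changing requests need a token
+        for category in categories:
+            rows.append((f"{prefix}-{len(rows) + 1:03d}", label, category))
+    return rows
+```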
+
+## Consistency Checks Across Services
+
+Build a comparison table:
+| Feature | Service A | Service B | Service C |
+|---------|-----------|-----------|-----------|
+| mobile.css loaded? | ✅ | ❌ | ❌ |
+| loader.js loaded? | ❌ | ✅ | ✅ |
+| Site navigation? | ✅ | ✅ | ❌ |
+| user-scalable? | yes | no | no |
+
+Inconsistencies are bugs — all services sharing a design system should be consistent.
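+
+You can generate the comparison table from saved pages instead of filling it in by hand. This is a sketch. It assumes each service's page HTML was saved to a string, for example with `curl -s`. The feature probes are plain substring checks chosen for illustration, not a full parser.
+
+```python
+# Illustrative probes; extend with whatever the shared design system requires.
+FEATURES = {
+    "mobile.css loaded?": lambda html: "mobile.css" in html,
+    "loader.js loaded?": lambda html: "loader.js" in html,
+    "user-scalable=no?": lambda html: "user-scalable=no" in html.replace(" ", ""),
+}
+
+
+def consistency_table(pages: dict[str, str]) -> str:
+    """Render a markdown table: one row per feature, one column per service."""
+    services = list(pages)
+    lines = ["| Feature | " + " | ".join(services) + " |",
+             "|---------|" + "|".join("---" for _ in services) + "|"]
+    for feature, probe in FEATURES.items():
+        cells = ["✅" if probe(pages[s]) else "❌" for s in services]
+        lines.append(f"| {feature} | " + " | ".join(cells) + " |")
+    return "\n".join(lines)
+
+
+def inconsistent_features(pages: dict[str, str]) -> list[str]:
+    """Features present on some services but not others (the bugs to report)."""
+    return [f for f, probe in FEATURES.items()
+            if len({probe(html) for html in pages.values()}) > 1]
+```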
+
+## Curl-Based QA Techniques (Session-Proven)
+
+When browser automation is unavailable, these curl patterns reliably test multi-service sites:
+
+### Cookie Management
+```bash
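+# -c saves cookies to a jar file; -b sends cookies from one.
+# Use one jar per session (request chain) and never mix jars between sessions.
+curl -s -c /tmp/c1.txt https://auth.example.com/login > /tmp/login.html
+CSRF=$(grep -oP 'name="csrf_token"[^>]*value="\K[^"]+' /tmp/login.html | head -1)
+curl -s -b /tmp/c1.txt -c /tmp/c1.txt -X POST https://auth.example.com/api/login \
+  -d "username=user&password=pass&csrf_token=$CSRF" > /dev/null
+# Verify that the auth cookie was saved:
+grep ephron_auth /tmp/c1.txt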
+```
+
+### CSRF Token Extraction (FastAPI/Tortoise patterns)
+```bash
+# Most reliable — matches name= then grabs value:
+grep -oP 'name="csrf_token"[^>]*value="\K[^"]+' /tmp/page.html | head -1
+
+# Fallback variants:
+grep -oP 'csrf_token.*?value="\K[^"]+' /tmp/page.html | head -1
+grep -i 'csrf' /tmp/page.html | grep -oP 'value="\K[^"]+' | head -1
+```
+
+### API Login: JSON vs Form-Encoded
+```bash
+# JSON body (when the endpoint takes a Pydantic model parameter):
+curl -s -b /tmp/c.txt -c /tmp/c.txt -X POST https://auth.example.com/api/login \
+  -H "Content-Type: application/json" \
+  -d "{\"username\":\"user\",\"password\":\"pass\",\"csrf_token\":\"$CSRF\"}"
+
+# Form-encoded body (when the endpoint takes Form(...) parameters, e.g. a classic <form action="/login">):
+curl -s -b /tmp/c.txt -c /tmp/c.txt -X POST https://auth.example.com/login \
+  -H "Content-Type: application/x-www-form-urlencoded" \
+  -d "username=user&password=pass&csrf_token=$CSRF"
+```
+
+### Post-Login Redirect Chain
+```bash
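+# Follow the 303 redirect chain automatically.
+# Leave out -X POST: -d already implies POST, and forcing the method makes curl
+# re-POST to every redirect target instead of switching to GET.
+curl -sL -b /tmp/c.txt -c /tmp/c.txt https://auth.example.com/api/login \
+  -d "username=u&password=p&csrf_token=$CSRF" -w "\nHTTP:%{http_code} URL:%{url_effective}"
+# Final status only: curl -sL ... -o /dev/null -w "%{http_code}"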
+```
+
+### Health Checks (All Services at Once)
+```bash
+for svc in www auth blog canvas prompt; do
+  result=$(curl -s "https://$svc.example.com/health")
+  echo "$svc: $result"
+done
+```
+
+### Security Headers (All Services)
+```bash
+for svc in www auth blog canvas prompt; do
+  echo "=== $svc ==="
+  curl -sI "https://$svc.example.com/" | grep -iE \
+    'x-content-type|x-frame|referrer-policy|content-security|set-cookie'
+done
+```
+
+### CSP Deep Analysis — script-src-elem Override Trap
+```bash
+# Extract full CSP header
+curl -sI https://www.example.com/admin | grep -i content-security-policy
+
+# Look for script-src-elem which OVERRIDES script-src for <script> elements:
+# BAD:  script-src 'self' 'unsafe-inline'; script-src-elem 'self' https://cdn.example.com;
+# GOOD: script-src 'self' 'unsafe-inline'; script-src-elem 'self' 'unsafe-inline' https://cdn.example.com;
+#
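+# If script-src-elem exists without 'unsafe-inline', ALL inline <script> blocks are blocked,
+# because script-src-elem takes precedence over script-src for <script> elements.
+# Symptoms: onclick handlers call functions that were never defined, so buttons do nothing.
+# The visible error is a ReferenceError on click. Playwright reports it as a pageerror event,
+# not as a console.error message, so listen to both.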
+```
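+
+You can script the same check against the raw header value. This sketch splits a CSP into directives and flags the trap described above: `script-src` allows `'unsafe-inline'`, but `script-src-elem`, which takes precedence for `<script>` elements, does not. It deliberately ignores nonces and hashes, which can also enable specific inline scripts.
+
+```python
+def parse_csp(header: str) -> dict[str, list[str]]:
+    """Split a Content-Security-Policy value into {directive: [sources]}."""
+    policy: dict[str, list[str]] = {}
+    for part in header.split(";"):
+        tokens = part.split()
+        if tokens:
+            # Only the first occurrence of a directive counts.
+            policy.setdefault(tokens[0].lower(), tokens[1:])
+    return policy
+
+
+def inline_script_blocked(header: str) -> bool:
+    """True if inline <script> blocks are blocked (fallback: elem -> script-src -> default-src)."""
+    policy = parse_csp(header)
+    for name in ("script-src-elem", "script-src", "default-src"):
+        if name in policy:
+            return "'unsafe-inline'" not in policy[name]
+    return False  # no script policy at all
+
+
+def has_elem_override_trap(header: str) -> bool:
+    """script-src allows inline code, but script-src-elem silently takes it away."""
+    policy = parse_csp(header)
+    return ("'unsafe-inline'" in policy.get("script-src", [])
+            and "script-src-elem" in policy
+            and "'unsafe-inline'" not in policy["script-src-elem"])
+```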
+
+### Cookie Security Verification
+```bash
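+# Capture Set-Cookie on the login response. -I (HEAD) cannot be combined with -d (POST),
+# so dump the POST response headers with -D - and discard the body:
+curl -s -D - -o /dev/null -b /tmp/c.txt -c /tmp/c.txt https://auth.example.com/api/login \
+  -d "username=u&password=p&csrf_token=$CSRF" | grep -i set-cookie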
+# Expected: HttpOnly; Secure; SameSite=lax; Max-Age=604800; Domain=.example.com
+```
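+
+To turn the expected attributes into assertions, Python's `http.cookies.SimpleCookie` can parse a captured `Set-Cookie` value; the `SameSite` attribute needs Python 3.8+. This is a sketch. The expected values mirror the line above and reflect this site's observed settings, not universal requirements.
+
+```python
+from http.cookies import SimpleCookie
+
+
+def cookie_problems(set_cookie: str, name: str) -> list[str]:
+    """Check one Set-Cookie header value against the expected security attributes."""
+    jar = SimpleCookie()
+    jar.load(set_cookie)  # silently parses nothing if the header is malformed
+    if name not in jar:
+        return [f"{name} not set (or the header could not be parsed)"]
+    morsel = jar[name]  # absent attributes read as "", flags present read as True
+    problems = []
+    if not morsel["httponly"]:
+        problems.append("missing HttpOnly")
+    if not morsel["secure"]:
+        problems.append("missing Secure")
+    if morsel["samesite"].lower() not in ("lax", "strict"):
+        problems.append(f"SameSite is {morsel['samesite'] or 'unset'}")
+    if not morsel["max-age"]:
+        problems.append("no Max-Age (session cookie or Expires only)")
+    # Assumption: a shared cookie must carry an explicit parent Domain such as .example.com.
+    if not morsel["domain"].startswith("."):
+        problems.append(f"Domain is {morsel['domain'] or 'unset (host-only)'}")
+    return problems
+```
+
+Pass it the text after `set-cookie:` from the captured response headers, once per cookie.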
+
+### Session Fixation Check
+```bash
+# Before login: record cookie
+curl -sI -c /tmp/before.txt https://auth.example.com/login | grep -i set-cookie
+# (GET requests rarely set auth cookies)
+
+# After login: cookie must change
+curl -s -b /tmp/before.txt -c /tmp/after.txt -X POST .../api/login ...
+grep ephron_auth /tmp/after.txt
+# Session ID must be different from before
+```
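+
+Reading both curl jars makes the before/after comparison mechanical. This is a sketch. It assumes both files are curl (Netscape-format) jars and that the session cookie is `ephron_auth`. If the cookie was absent before login, any new value counts as rotated.
+
+```python
+from typing import Optional
+
+
+def jar_values(text: str) -> dict[str, str]:
+    """Map cookie name -> value from a curl (Netscape-format) cookie jar."""
+    values = {}
+    for line in text.splitlines():
+        if line.startswith("#HttpOnly_"):
+            line = line[len("#HttpOnly_"):]  # curl's marker for HttpOnly cookies
+        elif line.startswith("#") or not line.strip():
+            continue
+        fields = line.split("\t")
+        if len(fields) == 7:
+            values[fields[5]] = fields[6]
+    return values
+
+
+def session_rotated(before: str, after: str, name: str = "ephron_auth") -> bool:
+    """True if login issued a session cookie that differs from any pre-login value."""
+    new: Optional[str] = jar_values(after).get(name)
+    return new is not None and new != jar_values(before).get(name)
+```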
+
+### Known Rate Limits (ephron.ren observed)
+```bash
+# Auth login failures: 5/min → 429
+# Auth registration: 6/hour → 429 (use existing test accounts)
+# Blog comments: 6/min
+# Blog likes toggle: 11/min
+# Save/publish ops: 21/min
+```
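+
+To confirm a limit yourself instead of relying on these observed numbers, send requests until the first 429. This is a sketch. `send` is any zero-argument callable returning a status code. `curl_sender` is a hypothetical helper that shells out to curl, so the probe itself stays network-agnostic. Use a throwaway account, because login probes can also trigger account lockout.
+
+```python
+import subprocess
+from typing import Callable, Optional
+
+
+def first_429(send: Callable[[], int], max_requests: int = 30) -> Optional[int]:
+    """Return the 1-based index of the first 429 response, or None if none occurred."""
+    for i in range(1, max_requests + 1):
+        if send() == 429:
+            return i
+    return None
+
+
+def curl_sender(url: str, *curl_args: str) -> Callable[[], int]:
+    """Hypothetical helper: build a send() that runs curl once and returns the status code."""
+    cmd = ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}", *curl_args, url]
+    return lambda: int(subprocess.run(cmd, capture_output=True, text=True).stdout or 0)
+```
+
+In a fresh window, a limit of N per window should first answer 429 on request N + 1. For example, `first_429(curl_sender(url, "-X", "POST", "-d", body))` would return 6 for a 5/min limit.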
+
+### Delegate Task Sizing for Large Test Suites
+
+When testing 100+ cases across multiple modules, delegate_task has a 600s timeout. Size tasks carefully:
+
+| Task Type | Max Cases per Delegate | Reason |
+|-----------|----------------------|--------|
+| Curl-only HTTP tests | 15-20 | Each curl = 1-3s + overhead |
+| Browser interactions | 5-8 | Each interaction = 10-30s |
+| Mixed curl + Playwright | 8-12 | Browser calls dominate time |
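+
+Splitting a case list to fit these limits is a simple chunking step. This is a sketch. It assumes each case is tagged with one of the three task types in the table. The caps use the upper end of each range, so lower them if delegates still time out.
+
+```python
+from itertools import islice
+
+# Assumption: upper end of each range in the sizing table above.
+MAX_PER_DELEGATE = {"curl": 20, "browser": 8, "mixed": 12}
+
+
+def batch_cases(cases: list[tuple[str, str]]) -> list[list[str]]:
+    """Group (case_id, task_type) pairs into delegate-sized batches, one type per batch."""
+    by_type: dict[str, list[str]] = {}
+    for case_id, task_type in cases:
+        by_type.setdefault(task_type, []).append(case_id)
+    batches = []
+    for task_type, ids in by_type.items():
+        it = iter(ids)
+        while chunk := list(islice(it, MAX_PER_DELEGATE[task_type])):
+            batches.append(chunk)
+    return batches
+```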
+
+**Faster alternative**: Use `execute_code` with `from hermes_tools import terminal` for in-process execution. No delegation overhead, same capabilities.
+
+```python
+from hermes_tools import terminal
+results = {}
+r = terminal("curl -s -o /dev/null -w '%{http_code}' https://example.com/")
+code = r["output"].strip()  # curl -w prints only the status code
+results["T-001"] = {"status": "PASS" if code == "200" else "FAIL", "detail": f"HTTP {code}"}
+```
+
+### CSRF Token Synchronization Pitfall (curl)
+
+When testing forms that require CSRF tokens, note that the token in the cookie changes on every GET request. If you GET a page, extract the CSRF token, and then POST with a **different** cookie jar, the tokens won't match. The server then rejects the request with "CSRF token 验证失败" ("CSRF token validation failed").
+
+```bash
+# WRONG: separate cookie jars for GET and POST
+curl -s -b /tmp/jar1.txt https://example.com/admin > /tmp/page.html  # sets new CSRF cookie
+curl -s -b /tmp/jar2.txt -X POST ... -d "csrf_token=$CSRF"           # different jar = mismatch!
+
+# RIGHT: same cookie jar for GET and POST in sequence
+curl -s -b /tmp/jar.txt -c /tmp/jar.txt https://example.com/admin > /tmp/page.html
+CSRF=$(grep -oP 'name="csrf_token"[^>]*value="\K[^"]+' /tmp/page.html | head -1)
+curl -s -b /tmp/jar.txt -c /tmp/jar.txt -X POST ... -d "csrf_token=$CSRF"
+```
+
+**Why this happens**: The site's own CSRF middleware generates a new token on each GET and stores it in the `ephron_csrf` cookie. FastAPI and Starlette ship no CSRF middleware; in the layout above it presumably lives in `shared/csrf.py`. The POST handler compares the form token against the cookie token, so both must come from the same request chain.
+
+**Multiple forms on one page**: If a page has N forms, there will be N CSRF tokens in the HTML but only ONE in the cookie. Each form's token is unique. Extract the token from the specific form you need (use context-aware parsing, not just `head -1`).
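+
+For the multiple-forms case, a parser that tracks which `<form>` each hidden input belongs to avoids grabbing the wrong token with `head -1`. This is a sketch using the standard library's `html.parser`, with tokens keyed by each form's `action` attribute. It assumes forms are not nested and that the token field is named `csrf_token`, as on this site. It also works regardless of attribute order.
+
+```python
+from html.parser import HTMLParser
+from typing import Optional
+
+
+class CsrfByForm(HTMLParser):
+    """Collect {form action: csrf token} for every form on a page."""
+
+    def __init__(self, field: str = "csrf_token"):  # assumption: field name used by this site
+        super().__init__()
+        self.field = field
+        self.tokens: dict[str, str] = {}
+        self._action: Optional[str] = None  # action of the currently open form
+
+    def handle_starttag(self, tag, attrs):
+        attrs = dict(attrs)
+        if tag == "form":
+            self._action = attrs.get("action") or ""
+        elif tag == "input" and self._action is not None and attrs.get("name") == self.field:
+            self.tokens[self._action] = attrs.get("value") or ""
+
+    def handle_endtag(self, tag):
+        if tag == "form":
+            self._action = None
+
+
+def csrf_for(html: str, action: str) -> Optional[str]:
+    """Token of the form whose action attribute equals `action`, if any."""
+    parser = CsrfByForm()
+    parser.feed(html)
+    parser.close()
+    return parser.tokens.get(action)
+```
+
+Feed it the page saved by the GET in the same cookie-jar chain, and pass the action of the form you are about to submit.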
+
+### Owner vs Admin Permission Testing Pattern
+
+When a site has RBAC (user < admin < owner), test with all roles:
+
+```bash
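+# Log in as each role (one jar per role; each login needs that jar's CSRF token)
+for role in owner admin user; do
+  jar=/tmp/$role.txt
+  csrf=$(curl -s -c "$jar" https://auth.example.com/login \
+    | grep -oP 'name="csrf_token"[^>]*value="\K[^"]+' | head -1)
+  curl -s -b "$jar" -c "$jar" -X POST https://auth.example.com/api/login \
+    -d "username=Elaina_$role" --data-urlencode 'password=Pass123!' \
+    -d "csrf_token=$csrf" -o /dev/null
+done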
+
+# Test each protected endpoint with each role
+for role in owner admin user; do
+  status=$(curl -s -b /tmp/$role.txt -o /dev/null -w '%{http_code}' https://example.com/admin/roles)
+  echo "$role -> /admin/roles: $status"
+done
+```
+
+**Key insight**: If a role can't access a page but the nav bar still shows its link, one of two things is wrong. Either the UI is wrong (the link should be hidden from roles that can't use it), or the permission is misconfigured (the role should have access).
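+
+Writing down the expected access per role before running the loop makes mismatches stand out in its output. This is a sketch. The `EXPECTED` policy below is a hypothetical example. A 200 counts as "allowed", while any redirect or 403 counts as "denied".
+
+```python
+# Hypothetical access policy: path -> roles that should get in.
+EXPECTED = {
+    "/admin/roles": {"owner"},
+    "/admin": {"owner", "admin"},
+}
+
+
+def rbac_mismatches(observed: dict[tuple[str, str], int]) -> list[str]:
+    """Compare observed {(role, path): status} with EXPECTED and describe each mismatch."""
+    problems = []
+    for (role, path), status in sorted(observed.items()):
+        if path not in EXPECTED:
+            continue  # no expectation recorded for this path
+        should_allow = role in EXPECTED[path]
+        allowed = status == 200
+        if allowed != should_allow:
+            verdict = "should be denied" if allowed else "should be allowed"
+            problems.append(f"{role} -> {path}: HTTP {status}, {verdict}")
+    return problems
+```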
+
+### Content Restoration for Destructive Tests
+
+When tests modify content (create invite codes, publish posts, change settings):
+
+1. **Before testing**: Save current state
+   ```bash
+   # Save homepage content (still JS-escaped; the pattern allows \" inside the string literal)
+   curl -s -b /tmp/admin.txt https://www.example.com/admin | grep -oP 'initialContent = JSON\.parse\("\K(?:[^"\\]|\\.)*' > /tmp/homepage_backup.json
+   
+   # Save blog post slugs
+   curl -s https://blog.example.com/ | grep -oP '/posts/[a-z0-9-]+' | sort -u > /tmp/blog_slugs.txt
+   ```
+
+2. **During testing**: Create test data with identifiable markers (e.g., `QA_TEST_TEMP` in notes/titles)
+
+3. **After testing**: Clean up test data
+   ```bash
+   # Delete test invite codes
+   curl -s -b /tmp/owner.txt -X POST https://auth.example.com/admin/invites/delete \
+     -d "csrf_token=$CSRF&code=$TEST_CODE"
+   ```
+
+4. **Verify restoration**: Check that original content is unchanged
+   ```bash
+   for slug in $(cat /tmp/blog_slugs.txt); do
+     status=$(curl -s -o /dev/null -w '%{http_code}' "https://blog.example.com/posts/$slug")
+     echo "$slug: $status"
+   done
+   ```
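+
+The homepage backup from step 1 is the body of a JavaScript string literal, so it is still escaped (`\"` for quotes). Decode it twice: first as a string literal, then as JSON. This is a sketch. It assumes the template emits the literal with JSON-compatible escapes, as `json.dumps` or Jinja's `tojson` would. A JavaScript-only escape such as `\x41` would make the first step fail.
+
+```python
+import json
+
+
+def decode_backup(captured: str) -> object:
+    """Turn the captured JS string-literal body back into the original JSON value."""
+    # Assumption: the literal only uses escapes that JSON also understands.
+    json_text = json.loads(f'"{captured}"')  # undo the string-literal escaping
+    return json.loads(json_text)             # parse the JSON document itself
+```
+
+For example, `decode_backup(open("/tmp/homepage_backup.json").read().strip())` yields the content structure to compare against, or to restore, after testing.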
+
+### Module-by-Module Testing with Incremental Commits
+
+For large QA tasks (100+ test cases across many modules), the user may want results committed after each module:
+
+1. Create `test-results.md` with placeholder sections for all modules
+2. Test module N → update the module section in test-results.md
+3. `git add test-results.md && git commit -m "Module N done: passed X / failed Y" && git push`
+4. Report progress to user
+5. Repeat for next module
+
+**Document structure per module**:
+```markdown
+## Module N: Name
+
+**Status**: ✅ Completed
+**Execution time**: YYYY-MM-DD HH:MM - HH:MM
+**Results**: Passed X / Failed Y / Blocked Z (N total)
+
+| ID | Result | Notes |
+|------|------|------|
+| X-001 | ✅ Pass | detail |
+| X-002 | ❌ Fail | 🔴 description |
+
+### Module N Summary
+- Summary bullets
+
+### 💡 Module N Recommendations
+1. **🔴 [Critical]**: description
+2. **🟠 [High]**: description
+```
+
+**Why per-module commits**: Gives the user incremental visibility, prevents data loss if the session breaks, and creates a clean git history.
+
+### Registration Rate Limiting Pitfall
+
+Registration endpoints typically have strict rate limits (e.g., 6/hour). When testing multiple registration scenarios (password validation, username checks, invite codes), the rate limit kicks in and blocks subsequent tests with 429, masking the real behavior.
+
+**Workaround**:
+- Test rate-limited endpoints LAST in each module
+- Use existing test accounts for non-registration tests
+- Note which tests were blocked by rate limiting in results
+- Space out registration tests or use different IPs if possible
+
+### Common API Field Names (FastAPI/Pydantic patterns)
+```bash
+# Blog likes toggle: field is `post_slug` (NOT `slug`)
+curl -X POST https://blog.example.com/api/likes/toggle \
+  -H "Content-Type: application/json" \
+  -d '{"post_slug":"article-slug"}'
+
+# Blog comments: post_slug + content + parent_id (nullable)
+curl -X POST https://blog.example.com/api/comments/ \
+  -H "Content-Type: application/json" \
+  -d '{"post_slug":"article-slug","content":"text","parent_id":null}'
+```
+
+### Template Encoding Checks (BOM / Leading Whitespace)
+```bash
+# BOM marker: UTF-8 EF BB BF appears before DOCTYPE
+xxd /tmp/page.html | head -3
+
+# Leading newline before DOCTYPE: 0a 3c 21 44 4f ...
+head -c 20 /tmp/page.html | xxd
+
+# Python source BOM check:
+xxd app.py | head -1
+```
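+
+This script runs the same checks over many saved pages or templates at once. This is a sketch. It reports a UTF-8 BOM, any whitespace bytes before `<!DOCTYPE`, and, for HTML, whether the document starts with a doctype at all. The whitespace check only counts ASCII whitespace.
+
+```python
+BOM = b"\xef\xbb\xbf"
+
+
+def encoding_issues(data: bytes, is_html: bool = True) -> list[str]:
+    """Report a UTF-8 BOM and anything (e.g. a newline) before <!DOCTYPE>."""
+    issues = []
+    if data.startswith(BOM):
+        issues.append("UTF-8 BOM (ef bb bf) at start of file")
+        data = data[len(BOM):]
+    if is_html:
+        stripped = data.lstrip()  # strips ASCII whitespace only
+        if len(stripped) != len(data):
+            issues.append(f"{len(data) - len(stripped)} whitespace byte(s) before <!DOCTYPE>")
+        if stripped[:9].lower() != b"<!doctype":
+            issues.append("document does not start with <!DOCTYPE>")
+    return issues
+```
+
+Read files in binary mode, for example `encoding_issues(open(path, "rb").read())`. Use `is_html=False` for Python sources.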
+
+## Static Analysis Checks (no browser needed)
+
+```bash
+# Check for BOM markers
+xxd file.html | head -3
+# Look for: ef bb bf (UTF-8 BOM)
+
+# Check for leading whitespace before DOCTYPE
+head -c 20 file.html | xxd
+
+# Check CSS variable definitions
+grep -nE -e '--warning-bg|--error-bg|--success-bg' file.html  # -E enables |; -e because the pattern starts with --
+
+# Check for accessibility issues
+grep -n 'user-scalable=no' *.html
+grep -n 'alt=""' *.html
+grep -n 'aria-hidden' *.html
+
+# Check security headers
+curl -sI https://example.com | grep -iE 'x-content-type|x-frame|referrer-policy|content-security'
+```
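+
+`grep` cannot find an `<img>` that lacks an `alt` attribute entirely, which is the more serious case. This sketch uses `html.parser` to flag images with no `alt` attribute (and no `aria-hidden="true"`), plus a viewport meta tag that disables zoom. The rule set is a small illustrative subset of the accessibility checklist.
+
+```python
+from html.parser import HTMLParser
+
+
+class A11yScan(HTMLParser):
+    def __init__(self):
+        super().__init__()
+        self.findings: list[str] = []
+
+    def handle_starttag(self, tag, attrs):
+        a = dict(attrs)
+        # alt="" is valid for decorative images, so only a missing attribute is flagged.
+        if tag == "img" and "alt" not in a and a.get("aria-hidden") != "true":
+            self.findings.append(f"<img src={a.get('src')!r}> has no alt attribute")
+        if tag == "meta" and (a.get("name") or "").lower() == "viewport":
+            content = (a.get("content") or "").replace(" ", "").lower()
+            if "user-scalable=no" in content or "user-scalable=0" in content:
+                self.findings.append("viewport disables zoom (user-scalable=no)")
+
+
+def a11y_findings(html: str) -> list[str]:
+    scanner = A11yScan()
+    scanner.feed(html)
+    scanner.close()
+    return scanner.findings
+```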
--- a/dogfood/references/playwright-qa.md
+++ b/dogfood/references/playwright-qa.md
@@ -0,0 +1,156 @@
+# Playwright Python for QA Testing
+
+## Environment Setup
+
+Playwright Python is available on this system:
+- **Package**: `/home/ubuntu/.hermes/hermes-agent/venv/lib/python3.11/site-packages/playwright/`
+- **Chromium**: `~/.cache/ms-playwright/chromium-1217/`
+- **Import**: `from playwright.sync_api import sync_playwright`
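+- **Run**: with that venv's interpreter, e.g. `/home/ubuntu/.hermes/hermes-agent/venv/bin/python script.py`, since a bare `python3` may not see the package. Don't use `node`: the Playwright Node module may not be installed.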
+
+## Basic Pattern
+
+```python
+from playwright.sync_api import sync_playwright
+
+with sync_playwright() as p:
+    browser = p.chromium.launch(headless=True)
+    context = browser.new_context()
+    page = context.new_page()
+
+    # Login
+    page.goto('https://auth.example.com/login')
+    page.wait_for_load_state('networkidle')
+    page.fill('#username', 'admin_user')
+    page.fill('#password', 'password')
+    page.click('button[type="submit"]')
+    page.wait_for_url('**/login-success**', timeout=10000)
+
+    # Navigate to target
+    page.goto('https://www.example.com/admin')
+    page.wait_for_load_state('networkidle')
+
+    # Interact and test...
+
+    browser.close()
+```
+
+## Event Monitoring (Critical for QA)
+
+### Console Messages and JS Errors
+```python
+# Register listeners before page.goto() so errors raised during page load are captured
+console_msgs = []
+page_errors = []
+page.on("console", lambda m: console_msgs.append(f"[{m.type}] {m.text}"))
+page.on("pageerror", lambda e: page_errors.append(str(e)))
+
+# After interactions:
+for m in console_msgs:
+    print(f"  {m}")
+for e in page_errors:
+    print(f"  ERROR: {e}")
+```
+
+### Network Requests and Responses
+```python
+api_responses = []
+def on_response(r):
+    if '/api/' in r.url or '/admin/' in r.url:
+        api_responses.append({"url": r.url, "status": r.status})
+page.on("response", on_response)
+
+# After interactions:
+for r in api_responses:
+    print(f"  {r['url']} -> {r['status']}")
+```
+
+### Failed Resource Loads
+```python
+failed_resources = []
+page.on("requestfailed", lambda r: failed_resources.append({"url": r.url, "error": r.failure}))
+```
+
+## Element Query Patterns
+
+```python
+# By text content
+btn = page.query_selector('button:has-text("Save")')
+links = page.query_selector_all('a:has-text("Login")')
+
+# By CSS selector with attribute
+input_el = page.query_selector('input[name="csrf_token"]')
+form = page.query_selector('#contentForm')
+
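+# By attribute (a CSS selector, not an ARIA role query)
+submit = page.query_selector('button[type="submit"]')
+
+# By ARIA role and accessible name (Locator API)
+save_btn = page.get_by_role("button", name="Save")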
+
+# Get all buttons (for debugging)
+all_btns = page.query_selector_all('button')
+btn_texts = [b.inner_text().strip() for b in all_btns]
+```
+
+## JavaScript Evaluation
+
+```python
+# Check if function is defined
+is_defined = page.evaluate("typeof saveDraft === 'function'")
+
+# Get element properties
+val = page.evaluate("document.getElementById('contentJson').value")
+
+# Get page content
+page_html = page.content()
+body_text = page.inner_text('body')
+
+# Execute arbitrary JS
+result = page.evaluate("() => { return document.title; }")
+```
+
+## Screenshots
+
+```python
+# Full page
+page.screenshot(path='/tmp/screenshot.png', full_page=True)
+
+# Then analyze with vision tool
+```
+
+## Cookie Inspection
+
+```python
+cookies = context.cookies()
+for c in cookies:
+    print(f"{c['name']}: domain={c['domain']}, httpOnly={c['httpOnly']}, "
+          f"secure={c['secure']}, sameSite={c.get('sameSite','N/A')}")
+```
+
+## Testing with Custom Headers (e.g., Bearer Token)
+
+```python
+# Create separate context with extra headers
+context2 = browser.new_context(extra_http_headers={"Authorization": "Bearer fake_token"})
+page2 = context2.new_page()
+page2.goto('https://www.example.com/admin')
+# Check if redirected to login
+print(f"URL: {page2.url}")
+page2.close()
+context2.close()
+```
+
+## CSP Bug Detection Pattern
+
+When buttons with `onclick="fnName()"` do nothing:
+
+1. Check console for CSP violation: `"Executing inline script violates the following Content Security Policy directive 'script-src-elem'"`
+2. Verify function availability: `page.evaluate("typeof fnName")` returns `"undefined"`
+3. Confirm script tag exists but is blocked by CSP
+4. Check CSP header: `curl -sI URL | grep -i content-security-policy`
+5. Look for `script-src-elem` directive that overrides `script-src`
+
+## Pitfalls
+
+- **Use Python, not Node.js**: The `playwright` npm module may not be installed. Python works.
+- **`expect_response` timeout**: Don't use broad URL patterns. Use specific path matches or handle timeout gracefully.
+- **`expect_navigation` for SPA**: Single-page apps may not trigger navigation events. Wait for the state change instead, e.g. with `page.wait_for_url(...)` or `page.wait_for_function(...)`, and use `wait_for_timeout` only as a last resort.
+- **Rate limit testing**: Don't try to trigger rate limits via Playwright — too slow. Use curl for rate limit tests.
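+- **Listen to `pageerror` as well as `console`**: When CSP blocks an inline script, Playwright may not deliver the violation as a console message. However, clicking a button that calls the never-defined function raises a `ReferenceError`, which arrives as a `pageerror` event.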
--- a/dogfood/references/qa-dimensions-checklist.md
+++ b/dogfood/references/qa-dimensions-checklist.md
@@ -0,0 +1,152 @@
+# Comprehensive QA Dimensions Checklist
+
+Use this checklist when the user asks for "full", "complete", or "comprehensive" QA testing.
+Each dimension should appear as a section in the test plan with at least 1 test case.
+
+## Core Functional (always cover)
+- [ ] Page loads (HTTP 200) for all public pages
+- [ ] Navigation links work (header, footer, sidebar)
+- [ ] CRUD operations (create, read, update, delete)
+- [ ] Form submissions (valid data, empty data, invalid data)
+- [ ] Search/filter functionality
+- [ ] Pagination
+- [ ] Error pages (404, 500)
+
+## Auth & Permissions
+- [ ] Login page loads and form works
+- [ ] Valid credentials → success + cookie set
+- [ ] Invalid credentials → error message
+- [ ] Logout clears cookie
+- [ ] Cross-service cookie propagation (shared domain cookies)
+- [ ] Admin pages: admin user can access
+- [ ] Admin pages: regular user gets denied
+- [ ] Admin pages: unauthenticated user redirects to login
+- [ ] RBAC: different roles see different features
+- [ ] Permission checks on API endpoints
+
+## Input Validation
+- [ ] Empty form submissions (browser validation or server error)
+- [ ] Boundary values (min/max length, special chars)
+- [ ] Password strength requirements
+- [ ] Username format validation
+- [ ] Email format validation (if applicable)
+- [ ] Invite code validation (if invite-based registration)
+
+## Security — Cookie
+- [ ] Auth cookie: HttpOnly=true
+- [ ] Auth cookie: Secure=true (production)
+- [ ] Auth cookie: SameSite=Lax or Strict
+- [ ] Auth cookie: Max-Age is reasonable (not infinite)
+- [ ] Auth cookie: Domain scope correct (e.g., `.example.com` for subdomains)
+- [ ] CSRF cookie: HttpOnly=false only if client JS must read it (double-submit pattern); if the server renders the token into forms, HttpOnly=true is fine
+
+## Security — CSRF
+- [ ] All state-changing POST endpoints require CSRF token
+- [ ] CSRF token matches between form field and cookie
+- [ ] CSRF token expires (check timestamp-based expiry)
+- [ ] Missing/invalid CSRF token returns 403 or error
+
+## Security — Redirect
+- [ ] `redirect` parameter accepts valid same-domain URLs
+- [ ] `redirect` parameter rejects external domains (open redirect prevention)
+- [ ] `redirect` parameter rejects protocol-relative URLs (`//evil.com`)
+- [ ] Default redirect when parameter is empty/invalid
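+
+A reference rule for the redirect checks gives you an expected result to compare each server response against. This is a sketch of one such rule. It assumes allowed targets are absolute paths, or `http(s)` URLs on the site's domain or its subdomains (`example.com` here). Decode the parameter first if it is base64-encoded, as on ephron.ren. The payload table doubles as test input.
+
+```python
+from urllib.parse import urlsplit
+
+
+def redirect_allowed(target: str, site: str = "example.com") -> bool:
+    """Reference rule: same-site http(s) URLs or absolute paths only."""
+    if not target or "\\" in target or any(ord(ch) < 0x20 for ch in target):
+        return False  # browsers parse backslashes and control chars inconsistently
+    try:
+        parts = urlsplit(target)
+    except ValueError:
+        return False  # e.g. malformed IPv6 brackets
+    if not parts.scheme and not parts.netloc:
+        return target.startswith("/") and not target.startswith("//")
+    if parts.scheme not in ("http", "https") or not parts.hostname:
+        return False  # rejects //evil.com, javascript:, data:, ...
+    host = parts.hostname  # lower-cased, with userinfo and port removed
+    return host == site or host.endswith("." + site)
+
+
+# Assumed payloads for the checklist items above: target -> should be accepted?
+PAYLOADS = {
+    "/admin": True,
+    "https://www.example.com/": True,
+    "https://evil.com/": False,
+    "//evil.com": False,
+    "/\\evil.com": False,
+    "https://example.com.evil.com/": False,
+    "https://example.com@evil.com/": False,
+    "javascript:alert(1)": False,
+}
+```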
+
+## Security — Rate Limiting
+- [ ] Login rate limit (e.g., 5/minute per IP)
+- [ ] Registration rate limit (e.g., 5/hour per IP)
+- [ ] API rate limits (comments, likes, uploads)
+- [ ] Account lockout after N failed attempts
+- [ ] IP-based lockout after N failed attempts
+- [ ] Rate limit returns 429 status
+
+## Security — File Upload
+- [ ] Allowed file types enforced (extension check)
+- [ ] File size limit enforced
+- [ ] Filename sanitized (no path traversal)
+- [ ] Uploaded files stored safely (UUID names, outside web root or in controlled dir)
+- [ ] Image processing (resize, format conversion) doesn't crash on malformed files
+
+## Security — Input Injection
+- [ ] XSS: user input rendered as text, not HTML (test `<script>alert(1)</script>`)
+- [ ] Path traversal: slug validation prevents `../` sequences
+- [ ] SQL injection: parameterized queries (verify from source code)
+
+## Session & Token
+- [ ] Token expiration: expired token redirects to login
+- [ ] Token format validation (reject malformed tokens)
+- [ ] Role changes: DB role takes precedence over token role
+- [ ] Token max-age from configuration
+
+## Content & Rendering
+- [ ] Empty state (no content) shows appropriate message
+- [ ] Long content doesn't break layout
+- [ ] Special characters (CJK, emoji, HTML entities) render correctly
+- [ ] Markdown rendering (code blocks, tables, lists)
+- [ ] LaTeX/MathJax rendering (if applicable)
+- [ ] Code syntax highlighting (if applicable)
+
+## Encoding
+- [ ] No BOM markers in HTML templates (`ef bb bf`)
+- [ ] No leading whitespace before `<!DOCTYPE>`
+- [ ] UTF-8 charset declared in meta tag
+- [ ] Python source files: no BOM
+
+## SEO & Metadata
+- [ ] `<title>` tag present and descriptive on each page
+- [ ] `<meta name="description">` present
+- [ ] Open Graph tags (`og:title`, `og:description`, `og:url`, `og:image`)
+- [ ] Twitter Card tags
+- [ ] Canonical URL (`<link rel="canonical">`)
+- [ ] `robots.txt` exists
+- [ ] `sitemap.xml` exists and is valid
+- [ ] RSS feed (if blog) exists and is valid XML
+
+## Accessibility
+- [ ] All `<img>` have `alt` text (or `aria-hidden` for decorative)
+- [ ] No `user-scalable=no` in viewport meta
+- [ ] Sufficient color contrast (text vs background)
+- [ ] Skip-to-content link (visually hidden)
+- [ ] Keyboard navigation: Tab order logical
+- [ ] ARIA labels on interactive elements without visible text
+- [ ] Form labels associated with inputs
+
+## Performance
+- [ ] All static assets return 200 (CSS, JS, images)
+- [ ] No broken links (404s in static resources)
+- [ ] CDN reliability (especially for users in China, where jsDelivr may time out)
+- [ ] Page load doesn't hang on slow external resources
+- [ ] Resource count reasonable (no excessive requests)
+
+## Responsive Design
+- [ ] Layout at 375px (mobile) — no horizontal overflow
+- [ ] Layout at 768px (tablet) — breakpoint works
+- [ ] Layout at 1440px (desktop) — content centered
+- [ ] Touch targets large enough (44x44px minimum)
+
+## Cross-Browser
+- [ ] Chrome/Chromium rendering
+- [ ] Firefox rendering
+- [ ] Safari rendering (WebKit differences)
+- [ ] Edge rendering
+
+## Operations
+- [ ] `/health` endpoint returns `{"status":"ok"}` per service
+- [ ] 404 page is custom (not default framework error)
+- [ ] 500 errors don't leak stack traces to users
+- [ ] Audit log captures admin actions (verify from source)
+- [ ] Audit log captures login attempts (success/failure)
+
+## Consistency (cross-service)
+- [ ] All pages include same CSS files (mobile.css, etc.)
+- [ ] All pages include same JS files (loader.js, etc.)
+- [ ] All pages have site-wide navigation bar
+- [ ] All pages have same security headers
+- [ ] All pages have same viewport meta
+
+## Security Headers
+- [ ] `X-Content-Type-Options: nosniff`
+- [ ] `X-Frame-Options: DENY`
+- [ ] `Referrer-Policy: strict-origin-when-cross-origin`
+- [ ] `Content-Security-Policy` present and reasonable
+- [ ] No `unsafe-eval` in CSP (check for `'unsafe-eval'`)
--- a/dogfood/references/server-inspection.md
+++ b/dogfood/references/server-inspection.md
@@ -0,0 +1,69 @@
+# Server Inspection Reference
+
+When asked to inspect a server without a URL, assume the **local machine Hermes runs on**.
+
+## Quick Checklist
+
+### System Resources
+```bash
+# CPU, load, uptime
+uptime && top -bn1 | head -3 && nproc
+
+# Memory
+free -h
+
+# Disk
+df -h
+```
+
+### Running Services & Processes
+```bash
+# All listening ports
+ss -tlnp | grep LISTEN
+
+# Top processes by CPU
+ps aux --sort=-%cpu | head -10
+
+# Docker containers
+docker ps -a
+```
+
+### Service Manager
+```bash
+systemctl list-units --type=service --state=running
+# or
+service --status-all
+```
+
+### Network
+```bash
+# All LISTEN ports (not just common ones)
+ss -tlnp
+
+# DNS resolution test
+nslookup example.com
+```
+
+### Security
+```bash
+# fail2ban status (needs root)
+sudo fail2ban-client status
+
+# UFW firewall (needs root; prints "Status: inactive" if disabled)
+sudo ufw status
+```
+
+## Scope Signals
+
+| User says | Means |
+|-----------|-------|
+| "服务器巡检" / "server inspection" | Local machine (no URL given) |
+| "巡检 ephron.ren" ("inspect ephron.ren") | Remote web service at that domain |
+| "check the service on port 8000" | Likely remote host:port |
+| "你的服务器" ("your server") / "this machine" | Local machine explicitly |
+
+## Anti-Patterns
+
+- **Don't** default to checking remote web services when no URL is provided
+- **Don't** assume the remote service is on the same machine as Hermes
+- **Do** ask for clarification if "server" could mean local or remote
--- a/dogfood/references/session-learnings-ephron-qa.md
+++ b/dogfood/references/session-learnings-ephron-qa.md
@@ -0,0 +1,97 @@
+# Session Learnings: ephron.ren QA (2026-05-03)
+
+## Environment Facts
+- 5 services: Home(8000), Auth(8001), Blog(8002), Canvas(8003), Prompt(8004)
+- Auth: FastAPI + Tortoise ORM, `.ephron.ren` domain cookie
+- RBAC: user(10) < admin(20) < owner(30)
+- CSRF: `{unix_timestamp}:{sha256_hex}` format, 75 chars, per-GET refresh
+- Rate limits: login 5/min, register 6/hour, comments 6/min, likes 11/min, save 21/min
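+
+The recorded token format can be checked offline, which helps with the CSRF expiry test. This is a sketch that validates only the shape (`{unix_timestamp}:{sha256_hex}`, 75 characters) and computes the token's age. It cannot verify the hash, because the server's secret and exactly what it hashes are unknown. Lowercase hex is assumed.
+
+```python
+import re
+import time
+from typing import Optional
+
+# Assumption: 10-digit Unix timestamp, colon, 64 lowercase hex chars (75 chars total).
+TOKEN_RE = re.compile(r"(\d{10}):([0-9a-f]{64})")
+
+
+def token_age(token: str, now: Optional[float] = None) -> Optional[float]:
+    """Seconds since the token's timestamp, or None if it does not match the format."""
+    m = TOKEN_RE.fullmatch(token)
+    if not m:
+        return None
+    return (time.time() if now is None else now) - int(m.group(1))
+```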
+
+## High-Value Findings (Reproducible Patterns)
+
+### CSP script-src-elem Override (Critical)
+- **Symptom**: Buttons with `onclick="fnName()"` do nothing, `typeof fnName` returns `undefined`
+- **Root cause**: `script-src-elem 'self' https://cdn.example.com` overrides `script-src 'unsafe-inline'`
+- **Detection**: `curl -sI URL | grep -i content-security-policy`, look for `script-src-elem` without `'unsafe-inline'`
+- **Impact**: All inline JS blocked → save/publish/discard buttons broken, client-only validation bypassed
+
+### CSP form-action Blocks Cross-Origin Redirects (Critical)
+- **Date**: 2026-05-05
+- **Symptom**: Login form submits (POST appears in network tab), server sets cookie, but browser stays on login page — no redirect
+- **Root cause**: CSP `form-action 'self'` on the login page. Chrome enforces `form-action` on every redirect that follows a form submission (browser behaviour here has varied), so the 303 to a cross-origin target is blocked
+- **Reproduction**:
+  1. Visit `https://auth.ephron.ren/login?redirect=aHR0cHM6Ly93d3cuZXBocm9uLnJlbi8=` (redirect=base64 of `https://www.ephron.ren/`)
+  2. Fill username/password, click submit
+  3. Browser sends POST to `/api/login` (same origin ✅ allowed)
+  4. Server returns 303 to `https://www.ephron.ren/`
+  5. Browser checks the redirect target against the login page's `form-action 'self'` and blocks it: `https://www.ephron.ren/` ≠ `self` (`https://auth.ephron.ren`)
+- **Controlled test**: Same-origin redirect (`auth.ephron.ren/admin`) works fine; cross-origin fails
+- **Console error**: `Sending form data to 'https://auth.ephron.ren/api/login' violates the following Content Security Policy directive: "form-action 'self'"`
+- **Fix**: Add the sibling origins to `form-action` on the login page (e.g. `form-action 'self' https://*.ephron.ren`), or submit via `fetch()` and navigate with JS. Removing the CSP header from the 303 response alone is unlikely to help, because `form-action` is evaluated against the submitting page's policy
+- **Affected pages**: ALL pages that redirect to login with a cross-origin redirect target (www/blog/canvas/prompt subdomains)
+- **Key source files**: `shared/security_headers.py` (CSP middleware), `auth/src/routes/api.py` (login endpoint), `auth/src/utils/redirect.py` (redirect validation)
+
+### Server-Side Password Validation Missing
+- **Test**: `curl -X POST /api/register -d 'username=test&password=123&password_confirm=456&invite_code=CODE'`
+- **Expected**: 400/422 with validation error
+- **Actual**: 303 redirect (registration succeeds with weak/mismatched passwords)
+- **Root cause**: Validation only in client JS (blocked by CSP)
+- **Lesson**: Always test form validation with curl, not just browser
+
+### Fulltext Search Silent Failure
+- **Test**: `GET /posts?q=openclaw&mode=fulltext` returns 0 results, `mode=simple` returns 6
+- **Root cause**: BM25 index not built or jieba tokenizer not installed
+- **Detection**: Compare simple vs fulltext results for same query
+
+### API Auth Order Bug
+- **Test**: `POST /api/service/posts` without token, with invalid body
+- **Expected**: 401 (unauthenticated)
+- **Actual**: 422 (body validation error — leaks endpoint info)
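+- **Likely root cause**: FastAPI validates the request body before the endpoint function runs, so an auth check made inside the handler (e.g. `_require_auth(...)`) never executes for an invalid body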
+
+## Delegate Task Sizing
+- Curl-only tasks: max ~15-20 test cases per delegate (30+ cases timeout at 600s)
+- Browser tasks: max ~5-8 interactions per delegate (each = 10-30s)
+- Use `execute_code` with `from hermes_tools import terminal` for fastest execution
+- Parallel delegates: 3 max, but each should be independently scoped
+
+## Cookie Jar Synchronization
+- CSRF token changes on every GET request
+- Must use SAME cookie jar for GET (extract token) and POST (submit form)
+- Multiple CSRF tokens on one page (one per form) — extract from specific form context
+- Cross-service cookies: Domain=.ephron.ren should work for all subdomains
+- If a cross-service test fails, inspect the cookie jar file first (wrong jar, or one written by another session) before blaming the cookie's `Domain` attribute
+
+## Content Restoration Pattern (Playwright)
+When homepage/admin content is accidentally overwritten, restore via Playwright:
+1. Prepare JSON with original content (experience/projects/skills/contact/footer)
+2. Login → navigate to admin page
+3. Use `page.evaluate()` to set form fields by `id=` (NOT `name=` — admin forms use id):
+   ```js
+   document.getElementById('contact_email').value = '...';
+   document.getElementById('footer_copyright').value = '...';
+   ```
+4. Set structured data: `initialContent.experience = [...]; renderExperience();`
+5. Set `is_draft: false` for items that should be published
+6. Collect and publish:
+   ```js
+   const content = collectFormData();
+   document.querySelector('input[name="content_json"]').value = JSON.stringify(content);
+   // Find form with content_json input, set action=/admin/publish, submit
+   ```
+7. Verify with `curl -s https://site/` checking for restored content strings
+
+## Form Field Discovery
+- Admin page fields may use `id=` instead of `name=` — check both:
+  ```bash
+  curl -s -b cookies /admin | grep -oP 'id="[^"]*"' | sort -u
+  curl -s -b cookies /admin | grep -oP 'name="[^"]*"' | sort -u
+  ```
+- `collectFormData()` reads from visible form elements, not hidden `content_json`
+- Setting `content_json` directly is overwritten by `collectFormData()` on submit
+
+## Playwright vs curl for Form Submission
+- **curl**: CSRF token sync is fragile (token changes per-GET, cookie jar must match)
+- **Playwright**: Handles cookies/CSRF automatically, but CSP may block inline JS
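+- **Best approach**: Use Playwright + `page.evaluate()` to work around CSP-blocked functions, or create the context with `browser.new_context(bypass_csp=True)` so the blocked inline scripts load
+- **Pattern**: Set form fields via JS → call `collectFormData()` → set `content_json` → submit with `form.submit()`, which does not fire `submit` handlers, so the value you set is not overwritten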
--- a/dogfood/templates/dogfood-report-template.md
+++ b/dogfood/templates/dogfood-report-template.md
@@ -0,0 +1,89 @@
+# QA Issue List
+
+> **Note:** This is the ISSUE LIST. The TEST PLAN is a separate document (`test-plan.md`).
+> Always deliver both documents together.
+
+**Target:** {target_url}
+**Date:** {date}
+**Scope:** {scope_description}
+**Tester:** Hermes Agent (automated exploratory QA)
+
+---
+
+## Executive Summary
+
+| Severity | Count |
+|----------|-------|
+| 🔴 Critical | {critical_count} |
+| 🟠 High | {high_count} |
+| 🟡 Medium | {medium_count} |
+| 🔵 Low | {low_count} |
+| **Total** | **{total_count}** |
+
+**Overall Assessment:** {one_sentence_assessment}
+
+---
+
+## Issues
+
+<!-- Repeat this section for each issue found, sorted by severity (Critical first) -->
+
+### Issue #{issue_number}: {issue_title}
+
+| Field | Value |
+|-------|-------|
+| **Severity** | {severity} |
+| **Category** | {category} |
+| **URL** | {url_where_found} |
+
+**Description:**
+{detailed_description_of_the_issue}
+
+**Steps to Reproduce:**
+1. {step_1}
+2. {step_2}
+3. {step_3}
+
+**Expected Behavior:**
+{what_should_happen}
+
+**Actual Behavior:**
+{what_actually_happens}
+
+**Screenshot:**
+MEDIA:{screenshot_path}
+
+**Console Errors** (if applicable):
+```
+{console_error_output}
+```
+
+---
+
+<!-- End of per-issue section -->
+
+## Issues Summary Table
+
+| # | Title | Severity | Category | URL |
+|---|-------|----------|----------|-----|
+| {n} | {title} | {severity} | {category} | {url} |
+
+## Testing Coverage
+
+### Pages Tested
+- {list_of_pages_visited}
+
+### Features Tested
+- {list_of_features_exercised}
+
+### Not Tested / Out of Scope
+- {areas_not_covered_and_why}
+
+### Blockers
+- {any_issues_that_prevented_testing_certain_areas}
+
+---
+
+## Notes
+
+{any_additional_observations_or_recommendations}
--- a/dogfood/templates/test-plan-template.md
+++ b/dogfood/templates/test-plan-template.md
@@ -0,0 +1,69 @@
+# QA Test Plan
+
+**Site:** {site_name}
+**URL:** {target_url}
+**Date:** {date}
+**Source:** {repo_url}
+
+---
+
+## 1. Test Overview
+
+### 1.1 Service Architecture
+
+| Service | URL | Port | Description |
+|------|------|------|------|
+| {name} | {url} | {port} | {description} |
+
+### 1.2 Test Accounts
+
+| Role | Username | Password | Purpose |
+|------|--------|------|------|
+| Admin | {admin_user} | {admin_pass} | Admin console tests |
+| Regular user | {normal_user} | {normal_pass} | Front-end tests + permission denial checks |
+
+### 1.3 Authentication
+
+- Cookie name: `{cookie_name}`
+- Cookie domain: `{cookie_domain}`
+- Token issuance: {mechanism}
+- Permission model: {rbac_description}
+
+### 1.4 Priority Definitions
+
+| Level | Meaning |
+|------|------|
+| P0 | Core functionality; blocks usage |
+| P1 | Important functionality; affects experience |
+| P2 | Minor functionality; can be deferred |
+
+---
+
+## 2. Test Cases
+
+### Module N: {module_name} ({service_name})
+
+#### N.1 {submodule_name}
+
+| ID | Test | Steps | Expected | Account | Priority |
+|------|----------|------|------|------|--------|
+| X-001 | {test_name} | {steps} | {expected} | {account} | {priority} |
+
+---
+
+## 3. Test Execution Flow
+
+```
+Step 1  →  {first_step}
+Step 2  →  {second_step}
+...
+```
+
+---
+
+## 4. Statistics
+
+| Module | Public pages | Admin console | Total |
+|------|:--------:|:--------:|:----:|
+| {module} | {n} | {n} | {n} |
+| **Total** | **{n}** | **{n}** | **{n}** |