first commit

Hermes Agent
2026-05-10 13:52:46 +08:00
commit ccc63d1e70
4583 changed files with 584341 additions and 0 deletions

dogfood/SKILL.md

@@ -0,0 +1,419 @@
---
name: dogfood
description: "Exploratory QA of web apps: find bugs, evidence, reports."
version: 1.0.0
metadata:
  hermes:
    tags: [qa, testing, browser, web, dogfood]
    related_skills: []
---
# Dogfood: Systematic Web Application QA Testing
## Overview
This skill guides you through systematic exploratory QA testing of web applications. It supports **two execution modes** depending on context:
1. **Browser-first** (default): Use the browser toolset to navigate, interact, and capture evidence live.
2. **Source-code-first** (fallback): When browser automation is unavailable, slow, or times out, analyze the source code to enumerate all routes/endpoints/pages, then build a comprehensive test plan document. Execute browser tests selectively afterward.
For **multi-service sites** (e.g., separate auth/blog/canvas services), prefer the source-code-first approach — it produces more complete coverage faster than crawling.
## Prerequisites
- Browser toolset: either the built-in tools (`browser_navigate`, `browser_snapshot`, etc.) **or** Playwright Python (see `references/playwright-qa.md`) — optional if using source-code-first mode
- A target URL and testing scope from the user
- Source code access (repo clone or codebase) — strongly recommended for multi-service sites
## Inputs
The user provides:
1. **Target URL** — the entry point for testing
2. **Scope** — what areas/features to focus on (or "full site" for comprehensive testing)
3. **Output directory** (optional) — where to save screenshots and the report (default: `./dogfood-output`)
## Workflow
Follow this 5-phase systematic workflow:
### Phase 1: Plan
1. Create the output directory structure:
```
{output_dir}/
├── screenshots/ # Evidence screenshots
└── report.md # Final report (generated in Phase 5)
```
2. Identify the testing scope based on user input.
3. Build a rough sitemap by planning which pages and features to test:
   - Landing/home page
   - Navigation links (header, footer, sidebar)
   - Key user flows (sign up, login, search, checkout, etc.)
   - Forms and interactive elements
   - Edge cases (empty states, error pages, 404s)
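The directory layout from step 1 can be created with a few lines of Python. This is a minimal sketch; the function name `init_output_dir` is hypothetical, and only the default path `./dogfood-output` and the `screenshots/` and `report.md` names come from this skill.

```python
from pathlib import Path


def init_output_dir(output_dir: str = "./dogfood-output") -> Path:
    """Create {output_dir}/screenshots/ and return the path for report.md."""
    root = Path(output_dir)
    # parents=True also creates the root; exist_ok=True makes reruns safe.
    (root / "screenshots").mkdir(parents=True, exist_ok=True)
    # The report itself is only written in Phase 5, so it is not created here.
    return root / "report.md"
```

Calling `init_output_dir()` at the start of a session returns the report path to write to later.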
### Phase 2: Explore
For each page or feature in your plan:
1. **Navigate** to the page:
```
browser_navigate(url="https://example.com/page")
```
2. **Take a snapshot** to understand the DOM structure:
```
browser_snapshot()
```
3. **Check the console** for JavaScript errors:
```
browser_console(clear=true)
```
Do this after every navigation and after every significant interaction. Silent JS errors are high-value findings.
4. **Take an annotated screenshot** to visually assess the page and identify interactive elements:
```
browser_vision(question="Describe the page layout, identify any visual issues, broken elements, or accessibility concerns", annotate=true)
```
The `annotate=true` flag overlays numbered `[N]` labels on interactive elements. Each `[N]` maps to ref `@eN` for subsequent browser commands.
5. **Test interactive elements** systematically:
   - Click buttons and links: `browser_click(ref="@eN")`
   - Fill forms: `browser_type(ref="@eN", text="test input")`
   - Test keyboard navigation: `browser_press(key="Tab")`, `browser_press(key="Enter")`
   - Scroll through content: `browser_scroll(direction="down")`
   - Test form validation with invalid inputs
   - Test empty submissions
6. **After each interaction**, check for:
   - Console errors: `browser_console()`
   - Visual changes: `browser_vision(question="What changed after the interaction?")`
   - Expected vs actual behavior
### Phase 3: Collect Evidence
For every issue found:
1. **Take a screenshot** showing the issue:
```
browser_vision(question="Capture and describe the issue visible on this page", annotate=false)
```
Save the `screenshot_path` from the response — you will reference it in the report.
2. **Record the details**:
   - URL where the issue occurs
   - Steps to reproduce
   - Expected behavior
   - Actual behavior
   - Console errors (if any)
   - Screenshot path
3. **Classify the issue** using the issue taxonomy (see `references/issue-taxonomy.md`):
   - Severity: Critical / High / Medium / Low
   - Category: Functional / Visual / Accessibility / Console / UX / Content
### Phase 4: Categorize
1. Review all collected issues.
2. De-duplicate — merge issues that are the same bug manifesting in different places.
3. Assign final severity and category to each issue.
4. Sort by severity (Critical first, then High, Medium, Low).
5. Count issues by severity and category for the executive summary.
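The categorize steps above can be sketched in Python. The `Issue` fields and the de-duplication key are assumptions made for illustration (the skill does not prescribe a data model); the severity and category names come from the issue taxonomy.

```python
from collections import Counter
from dataclasses import dataclass

SEVERITY_ORDER = ["Critical", "High", "Medium", "Low"]


@dataclass(frozen=True)
class Issue:
    title: str
    url: str
    severity: str  # one of SEVERITY_ORDER
    category: str  # Functional / Visual / Accessibility / Console / UX / Content


def categorize(issues):
    """De-duplicate, sort Critical-first, and count by severity and category."""
    unique = {}
    for issue in issues:
        # Assumption: same title + classification means the same bug, even on
        # different URLs; the first occurrence is the one that is kept.
        unique.setdefault((issue.title, issue.severity, issue.category), issue)
    ordered = sorted(unique.values(), key=lambda i: SEVERITY_ORDER.index(i.severity))
    by_severity = Counter(i.severity for i in ordered)
    by_category = Counter(i.category for i in ordered)
    return ordered, by_severity, by_category
```

The two counters feed directly into the executive summary in Phase 5.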
### Phase 5: Report
Generate the final report using the template at `templates/dogfood-report-template.md`.
The report must include:
1. **Executive summary** with total issue count, breakdown by severity, and testing scope
2. **Per-issue sections** with:
   - Issue number and title
   - Severity and category badges
   - URL where observed
   - Description of the issue
   - Steps to reproduce
   - Expected vs actual behavior
   - Screenshot references (use `MEDIA:<screenshot_path>` for inline images)
   - Console errors if relevant
3. **Summary table** of all issues
4. **Testing notes** — what was tested, what was not, any blockers
Save the report to `{output_dir}/report.md`.
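As an illustration of the summary table, the sketch below renders issues as a Markdown table. The column set and the `render_summary_table` name are assumptions; the real layout is whatever `templates/dogfood-report-template.md` defines.

```python
def render_summary_table(issues):
    """Render issue dicts (number, title, severity, category) as a Markdown table."""
    lines = [
        "| # | Title | Severity | Category |",
        "|---|-------|----------|----------|",
    ]
    for issue in issues:
        # Escape pipes so a title cannot break the table layout.
        title = issue["title"].replace("|", "\\|")
        lines.append(
            f"| {issue['number']} | {title} | {issue['severity']} | {issue['category']} |"
        )
    return "\n".join(lines)
```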
## Alternative Workflow: Source-Code-First (for multi-service / slow-browser sites)
When the target site has source code available, or browser automation is too slow/times out:
### Step 1: Clone and Map the Codebase
```bash
git clone <repo_url> /tmp/qa-target
cd /tmp/qa-target
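# Route files may be named routes*.py or live under a routes/ directory
find . -type f \( -name "routes*.py" -o -path "*/routes/*.py" -o -name "main.py" -o -name "pages.py" -o -name "admin.py" \) | sort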
```
### Step 2: Enumerate All Routes
Read each route file and extract:
- HTTP method + path (e.g., `GET /posts/{slug}`)
- Required auth/permissions
- Rate limits
- Form fields and validation rules
Build a **complete URL inventory** — this is your test matrix.
### Step 3: Analyze Static Assets and Templates
Check template files for:
- CSS variable definitions (look for `:root` blocks)
- JS includes (what scripts are loaded vs missing)
- Encoding issues (BOM markers, leading newlines before DOCTYPE)
- Accessibility: `alt` attributes, `user-scalable`, skip links
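The encoding check in the list above can be automated on the raw bytes of a template or HTTP response body. A minimal sketch, assuming UTF-8 documents; the function name `check_html_prefix` is hypothetical.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def check_html_prefix(raw: bytes) -> list:
    """Return the encoding problems found at the very start of an HTML document."""
    problems = []
    if raw.startswith(UTF8_BOM):
        problems.append("UTF-8 BOM before DOCTYPE")
        raw = raw[len(UTF8_BOM):]
    stripped = raw.lstrip(b" \t\r\n")
    if len(stripped) != len(raw):
        problems.append("whitespace or newlines before DOCTYPE")
    # DOCTYPE is case-insensitive, so compare in lowercase.
    if not stripped[:9].lower().startswith(b"<!doctype"):
        problems.append("document does not start with DOCTYPE")
    return problems
```

An empty list means the document starts cleanly with its DOCTYPE.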
### Step 4: HTTP-Level Testing (no browser needed)
Use `curl` to test:
- Page loads (HTTP status codes)
- Static asset availability
- Response headers (security headers, CSP)
- Redirect chains (login flows)
- API endpoints (with/without auth cookies)
### Step 5: Generate Structured Test Plan
Output a markdown document with:
- Service architecture table
- Test accounts and auth mechanism
- Per-module test case tables: `ID | Test Item | Test Steps | Expected Result | Priority`
- Known issues found during source analysis
- Cross-cutting concerns (consistency, accessibility, security)
### Step 6: Selective Browser Execution
Only use browser automation for:
- Login/register interactive flows
- Visual verification of known issues
- Console error capture
- Screenshot evidence
## Alternative Workflow: Test Plan Gap Analysis (improving existing plans)
When the user already has a test plan and wants to **improve/complete it** against source code:
### Step 1: Load Both Inputs in Parallel
```
1. Read the existing test-plan.md
2. Clone or pull the target repo
3. Use delegate_task to analyze ALL route files + security mechanisms in one pass
- Extract: endpoints, form fields, rate limits, cookie params, CSRF mechanism,
validation rules, ownership models, public APIs
- Focus on FEATURES not COVERED by the existing plan
```
### Step 2: Systematic Comparison
For each service, compare source code findings against existing test cases:
- **Endpoints**: Are all HTTP methods + paths covered?
- **Validation rules**: Password complexity, username blacklists, email uniqueness, slug format
- **Rate limits**: Are all limiters documented with correct values?
- **Security mechanisms**: CSRF token format/expiry, cookie attributes, redirect validation
- **APIs**: Public JSON APIs, service-to-service APIs, ownership isolation
- **Edge cases**: CRLF normalization, content size limits, cascading deletes
### Step 3: Patch, Don't Rewrite
Use targeted `patch` edits to add missing test cases within existing sections:
- Insert after the related existing case (e.g., `A-015a` after `A-015`)
- Use sub-numbering convention: `X-NNNa` for insertions between `X-NNN` and the next case `X-(NNN+1)`
- Preserve existing case numbers — never renumber
- Add new subsections only when the entire category is missing (e.g., Service API)
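The sub-numbering convention can be expressed as a small helper that picks the first free suffix after an existing case ID. This is an illustrative sketch; the name `next_sub_id` and the single-letter suffix limit are assumptions.

```python
import string


def next_sub_id(anchor: str, existing_ids) -> str:
    """Return the first free sub-numbered ID after `anchor`, e.g. A-015 -> A-015a."""
    taken = set(existing_ids)
    for suffix in string.ascii_lowercase:
        candidate = f"{anchor}{suffix}"
        if candidate not in taken:
            return candidate
    # Assumption: more than 26 insertions after one case is a sign that a new
    # subsection is needed instead.
    raise ValueError(f"no free sub-number left after {anchor}")
```

Because existing IDs are never changed, references from issue trackers or test automation stay valid.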
### Step 4: Update Statistics
After all patches:
```bash
# Count per-module
grep -c '^| H-[0-9]' test-plan.md # Home
grep -c '^| A-[0-9]' test-plan.md # Auth
# ... etc for each prefix
# Count total (catches sub-numbered too)
grep -c '^| [A-Z]*-[0-9]' test-plan.md
```
Update both the header stats table and the footer summary table.
Bump the version number.
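If the grep counts need to be cross-checked, the same statistics can be computed in Python. A sketch under the assumption that every case row starts with `| <PREFIX>-<digits>[letter] |`; the `count_cases` name is hypothetical.

```python
import re
from collections import Counter

# Matches table rows that start with a case ID such as "| A-015 |" or "| A-015a |".
CASE_ROW = re.compile(r"^\|\s*([A-Z]+)-\d+[a-z]?\s*\|")


def count_cases(markdown: str) -> Counter:
    """Count test-case rows per module prefix, including sub-numbered IDs."""
    counts = Counter()
    for line in markdown.splitlines():
        match = CASE_ROW.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

`sum(count_cases(text).values())` gives the total for the header and footer tables.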
### Step 5: Commit
```bash
git add test-plan.md && git commit -m "vN.M: improve test plan, add X test cases (old→new)" && git push
```
### Pitfalls for Gap Analysis
- **Don't renumber existing cases**: Use `X-NNNa` sub-numbering to insert between existing cases. Renumbering breaks any existing references (issue tracker, test automation).
- **Count carefully**: Both `grep -c '^| X-[0-9]'` (per module) and `'^| [A-Z]*-[0-9]'` (total) count sub-numbered entries like `A-015a`, because they match only the prefix and the first digit. A stricter pattern that requires the ID to end right after the digits would miss them. Check that the per-module counts sum to the total.
- **Don't duplicate**: Check if a concept is already covered under a different name before adding. "Draft visible" (草稿可见) and "draft preview" (草稿预览) might be the same test.
- **delegate_task for source analysis**: Don't read 40+ route files manually. A single delegate_task with a well-structured prompt produces a complete analysis in one pass.
## Alternative Workflow: Module-by-Module Testing with Incremental Commits
When the user has an existing test plan (e.g., `test-plan.md` in a repo) and wants to execute it module by module, committing results after each:
### Step 1: Initialize Results Document
Create `test-results.md` with a summary table and placeholder sections for every module. Include: module name, status (⏳), execution time, and empty test result tables.
### Step 2: Test Module → Update → Commit Loop
For each module:
1. Execute tests (curl for HTTP-level, Playwright for browser-level)
2. Update the module's section in `test-results.md` with results
3. Update the summary table (pass/fail/blocked counts)
4. Add a "Module N Summary" section with key findings
5. Add a "💡 Module N Optimization Suggestions" section with prioritized recommendations (the user explicitly wants these persisted in the document, not just in chat)
6. `git add test-results.md && git commit -m "Module N: X passed / Y failed" && git push`
7. Report progress to user before starting next module
8. **⚠️ If any test modified content, restore it BEFORE committing**
### Step 3: Parallel Delegation
Use `delegate_task` with 3 parallel tasks for curl-based modules. Each task tests a group of modules and returns JSON results. Browser-based modules must run sequentially.
### Pitfalls for Module-by-Module
- **Don't wait until the end to commit**: Session may break, losing all work
- **Restore content after destructive tests**: Save state before, verify after
- **Rate limiting blocks repeated tests**: Test rate-limited endpoints last
- **CSRF token sync**: Use same cookie jar for GET+POST (see `references/multi-service-qa.md`)
- **Optimization suggestions go in the document, not just the chat**: User wants them persisted
## Deliverables: Split Test Plan + Issue List
**Always split QA output into two separate documents:**
1. **`test-plan.md`** — Structured test cases with execution steps and expected results
2. **`issue-list.md`** — Known issues found during analysis, with severity and fix suggestions
Do NOT merge them into one report. Users need the test plan for execution assignment and the issue list for bug tracking. Each document should be self-contained.
### Test Plan Structure
- Service architecture table (services, URLs, ports)
- Test accounts and auth mechanism explanation
- Per-module test case tables: `ID | Test Item | Test Steps | Expected Result | Account | Priority`
- Cover both public pages AND admin/management pages
- Include a cross-cutting section for security, consistency, accessibility
### Issue List Structure
- Summary table (count by severity level)
- Per-issue entries: module, page, phenomenon, impact, root cause, source code location, fix suggestion, priority
- Consistency matrix (which pages have which features/assets)
- Fix priority recommendation (immediate / soon / backlog)
## Comprehensive QA Dimensions Checklist
When the user asks for "full" or "complete" testing, cover ALL of these dimensions. If you only covered page navigation and login flows, the plan is incomplete.
### Core (always cover)
1. **Functional** — Page loads, navigation, CRUD operations, form submissions
2. **Auth & Permissions** — Login/logout, RBAC, cross-service cookie propagation, admin vs user access
3. **Input Validation** — Form validation, empty submissions, boundary values, special characters
### Security (cover for any site with user input)
4. **Cookie Security** — HttpOnly, Secure, SameSite attributes; Max-Age; domain scope
5. **CSRF Protection** — Token presence, double-submit pattern, token expiry, replay resistance
6. **Redirect Safety** — Open redirect via `redirect` parameter; validate against allowed domains
7. **Rate Limiting** — Per-endpoint limits; account lockout; IP-based limits
8. **File Upload Safety** — Allowed extensions, size limits, filename sanitization, path traversal prevention
9. **Input Injection** — XSS in user-generated content, SQL injection attempts, path traversal in slugs
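To make the redirect-safety dimension concrete, the sketch below shows the kind of server-side check a `redirect` parameter should pass. It is an illustration, not the target site's `redirect.py`; the function name and the allow-list are assumptions.

```python
from urllib.parse import urlsplit


def is_safe_redirect(target: str, allowed_hosts) -> bool:
    """Return True if `target` is a local path or points at an allowed host."""
    # Browsers treat backslashes like slashes, and control characters can hide
    # a host from naive checks, so reject both outright.
    if "\\" in target or any(ord(ch) < 0x20 for ch in target):
        return False
    try:
        parts = urlsplit(target)
    except ValueError:
        return False
    if not parts.scheme and not parts.netloc:
        # Relative reference: accept only a single leading slash ("/path").
        return target.startswith("/") and not target.startswith("//")
    if parts.scheme not in ("http", "https"):
        return False
    return parts.hostname in set(allowed_hosts)
```

Useful test payloads for the real endpoint include `//evil.com`, `https://allowed.host@evil.com/`, and `javascript:` URLs, all of which this sketch rejects.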
### Session & State
10. **Token Lifecycle** — Expiration behavior, role changes mid-session (DB role vs token role), token format validation
11. **Concurrent Access** — Race conditions on shared resources, optimistic locking
### Content & Rendering
12. **Edge Case Content** — Empty states, very long text, special characters (CJK, emoji), Markdown/LaTeX rendering
13. **Encoding** — BOM markers, UTF-8 consistency, DOCTYPE prefix cleanliness
### SEO & Metadata
14. **Meta Tags** — `<title>`, `<meta description>`, canonical URLs
15. **Open Graph** — `og:title`, `og:description`, `og:image`, `og:url` per page
16. **Structured Data** — robots.txt, sitemap.xml, RSS feed validity
### Accessibility
17. **WCAG Basics** — `alt` attributes, `user-scalable`, color contrast, skip-to-content links
18. **Keyboard Navigation** — Tab order, focus management, ARIA labels
### Performance & Compatibility
19. **Page Load** — Static asset availability, CDN reliability (especially in China), resource count
20. **Responsive Design** — Breakpoints, mobile layout, touch targets
21. **Cross-Browser** — Chrome/Firefox/Safari/Edge rendering differences
### Operations
22. **Health Checks** — `/health` endpoint availability per service
23. **Error Handling** — 404 pages, 500 error responses, graceful degradation
24. **Logging & Audit** — Audit trail for admin actions, login attempts
### Consistency (cross-service)
25. **Asset Inclusion** — Which pages include mobile.css, loader.js, etc.
26. **Navigation** — Which pages have site-wide nav bar
27. **Security Headers** — X-Content-Type-Options, X-Frame-Options, Referrer-Policy, CSP
## Pitfalls
- **🔴 CRITICAL: Always backup content before write operations**: When testing CRUD endpoints (save, publish, create, update), the test payload (including XSS test strings, dummy data, empty fields) CAN overwrite real production content. Before any write test:
  1. `curl -s -b cookies SITE/admin` → extract current content_json / initialContent → save to `/tmp/backup_<service>.json`
  2. Perform test
  3. Restore original content via Playwright (set form fields + `collectFormData()` + submit)
  - **This is not optional.** A session that deletes user content without restoring it is a failed session.
- **🔴 CRITICAL: Restore content IMMEDIATELY after destructive tests**: Don't wait until end of session. If a test modifies content, restore it in the same turn. Session interruptions, timeouts, or context limits can prevent later restoration.
- **🔴 CRITICAL: XSS payloads in form fields persist**: When you fill a form field with `<script>alert(1)</script>` for XSS testing, that value gets saved to the database if the form is submitted. Always use Playwright's `page.evaluate()` to set values directly on form elements, NOT `page.fill()` which triggers input events that may activate auto-save.
- **⚠️ Do NOT parallelize browser delegate_tasks for QA**: Each browser interaction is slow (navigate + snapshot + screenshot = 10-30s). 3 parallel browser tasks will all time out at 600s. Run browser tests sequentially or use source-code-first mode.
- **⚠️ Curl-only delegate tasks also time out with large batches**: A delegate_task with 30+ curl test cases can hit the 600s limit (each curl call = 1-3s + overhead). Split large test batches into smaller tasks (~15-20 cases each) or use `execute_code` with `from hermes_tools import terminal` for direct in-process execution (faster, no delegation overhead).
- **⚠️ Client-side-only validation is a security finding**: When CSP blocks inline JS (see `script-src-elem` pitfall), any validation that only exists in client JS (password strength, field format, confirmation matching) becomes bypassable. Always test registration/submission with curl to verify server-side validation exists independently.
- **⚠️ API authentication order matters**: Some endpoints validate request body BEFORE checking authentication, returning 422 (validation error) instead of 401 (unauthenticated). Test: `curl -X POST /api/endpoint -d 'invalid'` without auth — should get 401, not 422. This is a security issue (leaks endpoint existence and field requirements).
- **⚠️ Fulltext search can silently fail**: Search endpoints with `mode=fulltext` may return 0 results while `mode=simple` works fine. Always test both modes with the same query. Common causes: search index not built, tokenizer (jieba) not installed, BM25 ranking misconfigured.
- **⚠️ Rate limiting blocks subsequent tests**: Registration endpoints with strict limits (e.g., 6/hour) will block all remaining registration-related tests with 429. Strategy: test non-registration endpoints first, registration tests last, and note which tests were blocked.
- **⚠️ Present the test plan BEFORE executing**: Show the user the complete test plan first. If they say "is this really all of it?", the plan is missing dimensions. Refer to the Comprehensive QA Dimensions Checklist above.
- **⚠️ "Add them all" (全部加上) means ALL dimensions**: When the user says to add everything, do not skip any dimension. Write every dimension from the checklist above into the test plan even if some have only 1-2 test cases.
- **Multi-service auth**: Sites with shared cookies (e.g., `.ephron.ren` domain) need login on ONE service first, then verify cookie propagation to others. Don't try to login on each service independently.
- **Encoding bugs**: Always hex-dump HTML source to check for BOM markers (`ef bb bf`) or leading newlines before DOCTYPE. Use: `xxd file.html | head -5`. For Python source files, also check: `xxd file.py | head -1`.
- **CSRF tokens**: Many form submissions require CSRF tokens. Extract from the page first, then include in POST requests. Don't forget the CSRF cookie (`ephron_csrf`). Note: CSRF cookies are HttpOnly=false (by design, so JS can read them).
- **Rate limits**: Note rate limit values from source code (e.g., `@limiter.limit("5/minute")`). When testing auth failures, stay under the limit or you'll get 429s that mask the real bug.
- **Template vs runtime issues**: Some issues (empty content, missing sections) may be data issues, not code bugs. Verify by checking if the data source (database/content files) actually has content.
- **File delivery fallback**: When sending files via QQ/WeChat fails, push to a Gitea repo as a fallback delivery mechanism.
- **Source code security analysis**: Always check these files when available: `cookie_utils.py` (cookie params), `csrf.py` (CSRF mechanism), `redirect.py` (open redirect validation), `security_headers.py` (CSP/headers), `auth.py` (token format, lockout), `validators.py` (slug/path validation), `limiter.py` (rate limit config).
- **⚠️ CSP `script-src-elem` silently kills inline JS**: When a page has inline `<script>` but buttons call functions defined there (e.g., `onclick="saveDraft()"`), always verify the CSP header. The `script-src-elem` directive **overrides** `script-src` for script elements, so `script-src 'unsafe-inline'` combined with `script-src-elem 'self' https://cdn.example.com` blocks ALL inline scripts.
  - Symptoms: functions report "not defined", buttons do nothing, no network requests on click.
  - Detection: check `typeof fnName` in browser console, or look for CSP error in console: `Executing inline script violates the following Content Security Policy directive 'script-src-elem'`.
  - Fix: add `'unsafe-inline'` to `script-src-elem`, use nonce/hash, or extract inline scripts to external `.js` files.
- **⚠️ CSP `form-action 'self'` blocks cross-origin redirects after form submission**: When a form POSTs to a same-origin endpoint (allowed by `form-action 'self'`) but the server responds with a 303 redirect to a **different origin** (e.g., `auth.example.com` → `www.example.com`), the browser blocks the redirect, because `form-action` is applied to the **entire redirect chain** resulting from the form submission, not just the form's action URL. Browser implementations are inconsistent on this point: Chrome enforces `form-action` on redirects, while Firefox historically has not, so reproduce in each browser you support. This breaks any auth flow where the login service is on a different subdomain than the target pages. See `references/session-learnings-ephron-qa.md` for full reproduction steps.
  - Symptoms: form appears to submit (POST in network tab), cookie gets set server-side, but page stays on the form URL with no navigation.
  - Console error: `Sending form data to '...' violates the following Content Security Policy directive: "form-action 'self'"`.
  - Detection: (1) test same-origin redirect (should work) vs cross-origin redirect (should fail); (2) `curl -sI` the 303 response; if it carries CSP with `form-action 'self'`, that's the blocker.
  - Fix options: (a) skip CSP header on 303 redirect responses (empty body, CSP adds no protection); (b) use JS-based redirect instead of server-side 303; (c) add allowed origins to `form-action`.
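The `script-src-elem` override described in the pitfalls above can be detected from the header alone. The sketch below is a simplified model of the CSP fallback chain for `<script>` elements; it assumes a single policy string (not several comma-joined headers), and the function names are hypothetical.

```python
def parse_csp(header: str) -> dict:
    """Parse a Content-Security-Policy value into {directive: [sources]}."""
    policy = {}
    for part in header.split(";"):
        tokens = part.split()
        if tokens:
            # A repeated directive is ignored by browsers; keep the first one.
            policy.setdefault(tokens[0].lower(), tokens[1:])
    return policy


def inline_scripts_allowed(header: str) -> bool:
    """Check whether plain inline <script> elements may run under this policy."""
    policy = parse_csp(header)
    # script-src-elem wins for <script> elements; script-src and default-src
    # are only fallbacks when the more specific directive is absent.
    for name in ("script-src-elem", "script-src", "default-src"):
        if name in policy:
            sources = policy[name]
            break
    else:
        return True  # no governing directive, so nothing restricts inline scripts
    # Simplification: nonces, hashes and 'strict-dynamic' make browsers ignore
    # 'unsafe-inline', so a script without a nonce or hash is treated as blocked.
    markers = ("'nonce-", "'sha256-", "'sha384-", "'sha512-")
    if any(s.startswith(markers) or s == "'strict-dynamic'" for s in sources):
        return False
    return "'unsafe-inline'" in sources
```

Feed it the value captured with `curl -sI`; a `False` result on a page that relies on inline scripts is worth confirming in the browser console.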
## Scope Ambiguity Pitfall
When the user asks to "inspect a server" or "巡检服务器" **without providing a URL**:
- Clarify whether they mean the **local machine Hermes runs on** (system resources, running processes, disk/memory) or a **remote web service** (HTTP endpoints, app health).
- **Default assumption**: If the user mentions a domain name (e.g., "inspect ephron.ren" (巡检 ephron.ren) or "check blog.ephron.ren"), they mean the remote web service. If they say "your server" or "the machine you're on", they mean the local machine.
- When in doubt, ask: "Should I inspect the local machine or the remote service?" (是巡检本机还是远程服务?)
## Tools Reference
| Tool | Purpose |
|------|---------|
| `browser_navigate` | Go to a URL |
| `browser_snapshot` | Get DOM text snapshot (accessibility tree) |
| `browser_click` | Click an element by ref (`@eN`) or text |
| `browser_type` | Type into an input field |
| `browser_scroll` | Scroll up/down on the page |
| `browser_back` | Go back in browser history |
| `browser_press` | Press a keyboard key |
| `browser_vision` | Screenshot + AI analysis; use `annotate=true` for element labels |
| `browser_console` | Get JS console output and errors |
| **Playwright Python** | Full browser automation via script — use when built-in tools unavailable or need programmatic control (see `references/playwright-qa.md`) |
## Related References
- `references/issue-taxonomy.md` — severity and category classification for issues
- `references/server-inspection.md` — local server inspection checklist: system resources, listening ports, processes, Docker, security services; also covers scope ambiguity (local vs. remote), route file reading strategy, cross-service cookie auth testing, static analysis checks
- `references/qa-dimensions-checklist.md` — comprehensive 25-dimension QA checklist for "full site" testing requests
- `references/playwright-qa.md` — Playwright Python setup, patterns, event monitoring, CSP bug detection
- `references/session-learnings-ephron-qa.md` — concrete findings from ephron.ren QA: CSP override, password validation gaps, fulltext search failure, delegate sizing
## Templates
- `templates/dogfood-report-template.md` — issue list template (the output with bugs found)
- `templates/test-plan-template.md` — test plan template (structured test cases with steps)
## Tips
- **Always check `browser_console()` after navigating and after significant interactions.** Silent JS errors are among the most valuable findings.
- **Use `annotate=true` with `browser_vision`** when you need to reason about interactive element positions or when the snapshot refs are unclear.
- **Test with both valid and invalid inputs** — form validation bugs are common.
- **Scroll through long pages** — content below the fold may have rendering issues.
- **Test navigation flows** — click through multi-step processes end-to-end.
- **Check responsive behavior** by noting any layout issues visible in screenshots.
- **Don't forget edge cases**: empty states, very long text, special characters, rapid clicking.
- When reporting screenshots to the user, include `MEDIA:<screenshot_path>` so they can see the evidence inline.


@@ -0,0 +1,109 @@
# Issue Taxonomy
Use this taxonomy to classify issues found during dogfood QA testing.
## Severity Levels
### Critical
The issue makes a core feature completely unusable or causes data loss.
**Examples:**
- Application crashes or shows a blank white page
- Form submission silently loses user data
- Authentication is completely broken (can't log in at all)
- Payment flow fails and charges the user without completing the order
- Security vulnerability (e.g., XSS, exposed credentials in console)
### High
The issue significantly impairs functionality but a workaround may exist.
**Examples:**
- A key button does nothing when clicked (but refreshing fixes it)
- Search returns no results for valid queries
- Form validation rejects valid input
- Page loads but critical content is missing or garbled
- Navigation link leads to a 404 or wrong page
- Uncaught JavaScript exceptions in the console on core pages
### Medium
The issue is noticeable and affects user experience but doesn't block core functionality.
**Examples:**
- Layout is misaligned or overlapping on certain screen sections
- Images fail to load (broken image icons)
- Slow performance (visible loading delays > 3 seconds)
- Form field lacks proper validation feedback (no error message on bad input)
- Console warnings that suggest deprecated or misconfigured features
- Inconsistent styling between similar pages
### Low
Minor polish issues that don't affect functionality.
**Examples:**
- Typos or grammatical errors in text content
- Minor spacing or alignment inconsistencies
- Placeholder text left in production ("Lorem ipsum")
- Favicon missing
- Console info/debug messages that shouldn't be in production
- Subtle color contrast issues that don't fail WCAG requirements
## Categories
### Functional
Issues where features don't work as expected.
- Buttons/links that don't respond
- Forms that don't submit or submit incorrectly
- Broken user flows (can't complete a multi-step process)
- Incorrect data displayed
- Features that work partially
### Visual
Issues with the visual presentation of the page.
- Layout problems (overlapping elements, broken grids)
- Broken images or missing media
- Styling inconsistencies
- Responsive design failures
- Z-index issues (elements hidden behind others)
- Text overflow or truncation
### Accessibility
Issues that prevent or hinder access for users with disabilities.
- Missing alt text on meaningful images
- Poor color contrast (fails WCAG AA)
- Elements not reachable via keyboard navigation
- Missing form labels or ARIA attributes
- Focus indicators missing or unclear
- Screen reader incompatible content
### Console
Issues detected through JavaScript console output.
- Uncaught exceptions and unhandled promise rejections
- Failed network requests (4xx, 5xx errors in console)
- Deprecation warnings
- CORS errors
- Mixed content warnings (HTTP resources on HTTPS page)
- Excessive console.log output left from development
### UX (User Experience)
Issues where functionality works but the experience is poor.
- Confusing navigation or information architecture
- Missing loading indicators (user doesn't know something is happening)
- No feedback after user actions (e.g., button click with no visible result)
- Inconsistent interaction patterns
- Missing confirmation dialogs for destructive actions
- Poor error messages that don't help the user recover
### Content
Issues with the text, media, or information on the page.
- Typos and grammatical errors
- Placeholder/dummy content in production
- Outdated information
- Missing content (empty sections)
- Broken or dead links to external resources
- Incorrect or misleading labels


@@ -0,0 +1,356 @@
# Multi-Service Site QA Patterns
## Architecture Recognition
When a site has multiple subdomains or services, first map the architecture:
| Indicator | What it means |
|-----------|--------------|
| Multiple `main.py` files in subdirectories | Separate service entry points |
| `shared/` directory with auth/cookie modules | Shared authentication across services |
| Different port numbers in config | Local dev runs separate processes |
| Subdomain routing (auth.ephron.ren, blog.ephron.ren) | Production reverse proxy setup |
## Common Multi-Service Patterns (FastAPI)
```
project/
├── auth/src/main.py # Auth service (login, register, RBAC)
├── blog/src/main.py # Blog service (posts, comments, likes)
├── canvas/src/main.py # Canvas service (AI-generated pages)
├── prompt/src/main.py # Prompt service (prompt CRUD)
├── home/src/main.py # Homepage service
├── shared/ # Shared modules (auth, CSRF, audit, templating)
│ ├── auth_users.py
│ ├── cookie_utils.py
│ ├── csrf.py
│ ├── templating.py
│ └── ports.py # Service URL configuration
└── main.py # Unified launcher (starts all services)
```
## Cross-Service Cookie Auth Testing
1. Login on auth service → get `ephron_auth` cookie
2. Verify cookie domain is `.example.com` (not service-specific)
3. Test cookie propagation: visit each service, check logged-in state
4. Test logout: logout on one service, verify all services see logged-out state
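Step 2 can be checked mechanically from a captured `Set-Cookie` header using the standard library. A minimal sketch; `audit_set_cookie` is a hypothetical helper, and the flag checks apply to the auth cookie only (CSRF cookies are intentionally readable by JS).

```python
from http.cookies import SimpleCookie


def audit_set_cookie(header_value: str, name: str, expected_domain: str) -> list:
    """Return a list of problems found in one Set-Cookie header value."""
    jar = SimpleCookie()
    jar.load(header_value)
    if name not in jar:
        return [f"cookie {name!r} not set"]
    morsel = jar[name]
    problems = []
    # A missing Domain attribute means a host-only cookie that will not
    # propagate to sibling subdomains; a leading dot is not significant.
    if morsel["domain"].lstrip(".") != expected_domain.lstrip("."):
        problems.append(f"domain is {morsel['domain']!r}, expected {expected_domain!r}")
    if not morsel["httponly"]:
        problems.append("missing HttpOnly")
    if not morsel["secure"]:
        problems.append("missing Secure")
    if not morsel["samesite"]:
        problems.append("missing SameSite")
    return problems
```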
## Route File Reading Strategy
For each service, read these files in order:
1. `src/routes/pages.py` — public page routes
2. `src/routes/admin.py` — admin/management routes
3. `src/routes/api.py` — API endpoints
4. `src/routes/service_api.py` — inter-service APIs
5. `src/services/auth.py` — auth helpers (what permissions are checked)
Extract from each route:
- `@router.get("/path")` or `@router.post("/path")` → HTTP method + path
- `_require_auth(ephron_auth, request, permission="X.Y.Z")` → required permission
- `@limiter.limit("N/minute")` → rate limit
- `Form(...)` parameters → required form fields
- `Cookie(default=None)` → cookie dependencies
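The extraction rules above lend themselves to a quick regex pass before reading files in detail. The sketch assumes the decorator and `permission=` patterns appear exactly as listed, one per line, with the limiter decorator placed below the router decorator; a real codebase may need an AST-based approach instead.

```python
import re

ROUTE = re.compile(r'@router\.(get|post|put|patch|delete)\(\s*["\']([^"\']+)["\']')
LIMIT = re.compile(r'@limiter\.limit\(\s*["\']([^"\']+)["\']')
PERMISSION = re.compile(r'permission\s*=\s*["\']([^"\']+)["\']')


def extract_routes(source: str) -> list:
    """Scan route-file source text and return one dict per route decorator."""
    routes = []
    for line in source.splitlines():
        route = ROUTE.search(line)
        if route:
            routes.append({"method": route.group(1).upper(), "path": route.group(2),
                           "rate_limit": None, "permission": None})
            continue
        if not routes:
            continue
        # Everything up to the next route decorator belongs to the current route.
        current = routes[-1]
        limit = LIMIT.search(line)
        if limit and current["rate_limit"] is None:
            current["rate_limit"] = limit.group(1)
        perm = PERMISSION.search(line)
        if perm and current["permission"] is None:
            current["permission"] = perm.group(1)
    return routes
```

The resulting list is the raw material for the URL inventory and the test matrix.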
## Test Matrix Generation
For each discovered route, create test cases:
- **Happy path**: valid inputs, correct auth → expected success
- **Auth failure**: no cookie / wrong role → expected redirect or 403
- **Validation failure**: missing fields, invalid data → expected error
- **Rate limit**: exceed the limit → expected 429
- **CSRF**: missing/invalid CSRF token → expected rejection
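One way to expand a route into the cases listed above is sketched below. The route dict keys (`permission`, `form_fields`, `rate_limit`) are illustrative, and limiting CSRF cases to state-changing methods is an assumption rather than a rule from this document.

```python
def build_test_matrix(route: dict) -> list:
    """Expand one route into (case name, expected outcome) pairs."""
    cases = [("happy path", "success")]
    if route.get("permission"):
        cases.append(("auth failure", "redirect or 403"))
    if route.get("form_fields"):
        cases.append(("validation failure", "error"))
    if route.get("rate_limit"):
        cases.append(("rate limit", "429"))
    # Assumption: only state-changing methods carry CSRF protection.
    if route["method"] in {"POST", "PUT", "PATCH", "DELETE"}:
        cases.append(("csrf", "rejection"))
    return cases
```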
## Consistency Checks Across Services
Build a comparison table:
| Feature | Service A | Service B | Service C |
|---------|-----------|-----------|-----------|
| mobile.css loaded? | ✅ | ❌ | ❌ |
| loader.js loaded? | ❌ | ✅ | ✅ |
| Site navigation? | ✅ | ✅ | ❌ |
| user-scalable? | yes | no | no |
Inconsistencies are bugs — all services sharing a design system should be consistent.
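Once the comparison table is collected as data, the inconsistent rows can be picked out automatically. A small sketch with a hypothetical `find_inconsistencies` helper; the matrix shape `{feature: {service: value}}` is an assumption.

```python
def find_inconsistencies(matrix: dict) -> dict:
    """Given {feature: {service: value}}, return the features whose values differ."""
    return {
        feature: per_service
        for feature, per_service in matrix.items()
        if len(set(per_service.values())) > 1
    }
```

Each returned feature is a candidate entry for the consistency matrix in the issue list.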
## Curl-Based QA Techniques (Session-Proven)
When browser automation is unavailable, these curl patterns reliably test multi-service sites:
### Cookie Management
```bash
# Use a SEPARATE cookie file per request chain (-c saves cookies, -b sends them)
curl -s -c /tmp/c1.txt https://auth.example.com/login > /tmp/login.html
# Extract the CSRF token from the login page before posting
CSRF=$(grep -oP 'name="csrf_token"[^>]*value="\K[^"]+' /tmp/login.html | head -1)
curl -s -b /tmp/c1.txt -c /tmp/c2.txt -X POST https://auth.example.com/api/login \
  -d "username=user&password=pass&csrf_token=$CSRF" > /dev/null
# Verify: grep ephron /tmp/c2.txt
```
### CSRF Token Extraction (FastAPI/Tortoise patterns)
```bash
# Most reliable — matches name= then grabs value:
grep -oP 'name="csrf_token"[^>]*value="\K[^"]+' /tmp/page.html | head -1
# Fallback variants:
grep -oP 'csrf_token.*?value="\K[^"]+' /tmp/page.html | head -1
grep -i 'csrf' /tmp/page.html | grep -oP 'value="\K[^"]+' | head -1
```
### API Login: JSON vs Form-Encoded
```bash
# Modern FastAPI services use /api/login with JSON:
curl -s -b /tmp/c.txt -c /tmp/c.txt -X POST https://auth.example.com/api/login \
-H "Content-Type: application/json" \
-d '{"username":"user","password":"pass","csrf_token":"TOKEN"}'
# Legacy form-encoded (action="/login"):
curl -s -b /tmp/c.txt -c /tmp/c.txt -X POST https://auth.example.com/login \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "username=user&password=pass&csrf_token=$CSRF"
```
### Post-Login Redirect Chain
```bash
# Follow 303 redirect chain automatically:
curl -sL -b /tmp/c.txt -c /tmp/c.txt -X POST https://auth.example.com/api/login \
-d "username=u&password=p&csrf_token=$CSRF" -w "\nHTTP:%{http_code}"
# Get final status: curl -sL ... -o /dev/null -w "%{http_code}"
```
### Health Checks (All Services at Once)
```bash
for svc in www auth blog canvas prompt; do
result=$(curl -s "https://$svc.example.com/health")
echo "$svc: $result"
done
```
### Security Headers (All Services)
```bash
for svc in www auth blog canvas prompt; do
echo "=== $svc ==="
curl -sI "https://$svc.example.com/" | grep -iE \
'x-content-type|x-frame|referrer-policy|content-security|set-cookie'
done
```
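The header dump from the loop above can be checked mechanically. This is a sketch under stated assumptions: `audit_security_headers` is a hypothetical helper, and the expected values in `EXPECTED` are illustrative defaults that should be adjusted to the site's actual policy.

```python
# Illustrative expectations; adjust to the policy the site is supposed to enforce.
EXPECTED = {
    "x-content-type-options": "nosniff",
    "x-frame-options": "DENY",
    "referrer-policy": "strict-origin-when-cross-origin",
}


def parse_headers(raw: str) -> dict[str, str]:
    """Parse `curl -sI` output into a lowercase-name -> value dict."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # the status line has no colon and is skipped
            headers[name.strip().lower()] = value.strip()
    return headers


def audit_security_headers(raw: str) -> list[str]:
    """Return a list of human-readable problems; empty means all checks passed."""
    headers = parse_headers(raw)
    problems = []
    for name, expected in EXPECTED.items():
        actual = headers.get(name)
        if actual is None:
            problems.append(f"{name}: missing")
        elif actual.lower() != expected.lower():
            problems.append(f"{name}: got {actual!r}, expected {expected!r}")
    if "content-security-policy" not in headers:
        problems.append("content-security-policy: missing")
    return problems
```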
### CSP Deep Analysis — script-src-elem Override Trap
```bash
# Extract full CSP header
curl -sI https://www.example.com/admin | grep -i content-security-policy
# Look for script-src-elem which OVERRIDES script-src for <script> elements:
# BAD: script-src 'self' 'unsafe-inline'; script-src-elem 'self' https://cdn.example.com;
# GOOD: script-src 'self' 'unsafe-inline'; script-src-elem 'self' 'unsafe-inline' https://cdn.example.com;
#
# If script-src-elem exists without 'unsafe-inline', ALL inline <script> tags are blocked.
# Symptoms: onclick handlers call undefined functions, buttons do nothing, no JS errors in console
# (CSP violations appear as pageerror events, not console.error)
```
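The override check can be sketched in a few lines. This is a simplified model, not a full CSP evaluator: `inline_scripts_allowed` is a hypothetical helper that only looks for `'unsafe-inline'` and ignores nonces, hashes, and `'strict-dynamic'`.

```python
def parse_csp(header_value: str) -> dict[str, list[str]]:
    """Split a CSP header value into {directive: [sources]} with lowercase tokens."""
    directives = {}
    for part in header_value.split(";"):
        tokens = part.split()
        if tokens:
            # A repeated directive is ignored after its first occurrence.
            directives.setdefault(tokens[0].lower(), [t.lower() for t in tokens[1:]])
    return directives


def inline_scripts_allowed(header_value: str) -> bool:
    """Report whether inline <script> elements pass the governing directive."""
    directives = parse_csp(header_value)
    # <script> elements are governed by the first directive present in this chain.
    for name in ("script-src-elem", "script-src", "default-src"):
        if name in directives:
            return "'unsafe-inline'" in directives[name]
    return True  # no governing directive, so nothing restricts inline scripts
```

Run against the BAD and GOOD policies above, the first is reported as blocking inline scripts and the second as allowing them.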
### Cookie Security Verification
```bash
# Capture Set-Cookie on login response.
# -D - dumps the response headers; -I (HEAD) cannot be combined with -d (POST data).
curl -s -D - -o /dev/null -c /tmp/c.txt -X POST https://auth.example.com/api/login \
  -d "username=u&password=p&csrf_token=t" | grep -i set-cookie
# Expected: HttpOnly; Secure; SameSite=lax; Max-Age=604800; Domain=.example.com
```
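To turn the expected attribute list into a check, a captured `Set-Cookie` value can be parsed by hand. A minimal sketch: `audit_auth_cookie` is a hypothetical helper, and it takes the header value without the `Set-Cookie:` prefix.

```python
def parse_set_cookie(header_value: str) -> dict:
    """Split a Set-Cookie value into name, value, and lowercase attribute dict."""
    parts = [p.strip() for p in header_value.split(";")]
    name, _, value = parts[0].partition("=")
    attrs = {}
    for part in parts[1:]:
        key, _, val = part.partition("=")
        attrs[key.strip().lower()] = val.strip()
    return {"name": name, "value": value, "attrs": attrs}


def audit_auth_cookie(header_value: str, expected_domain: str) -> list[str]:
    """Return the list of attribute problems; empty means the cookie looks right."""
    attrs = parse_set_cookie(header_value)["attrs"]
    problems = []
    if "httponly" not in attrs:
        problems.append("missing HttpOnly")
    if "secure" not in attrs:
        problems.append("missing Secure")
    if attrs.get("samesite", "").lower() not in ("lax", "strict"):
        problems.append("SameSite is not Lax or Strict")
    if not attrs.get("max-age", "").isdigit():
        problems.append("missing or non-numeric Max-Age")
    if attrs.get("domain", "").lstrip(".") != expected_domain.lstrip("."):
        problems.append("unexpected Domain")
    return problems
```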
### Session Fixation Check
```bash
# Before login: record cookie
curl -sI -c /tmp/before.txt https://auth.example.com/login | grep -i set-cookie
# (GET requests rarely set auth cookies)
# After login: cookie must change
curl -s -b /tmp/before.txt -c /tmp/after.txt -X POST .../api/login ...
grep ephron_auth /tmp/after.txt
# Session ID must be different from before
```
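Comparing the two jar files by eye is error-prone, because curl writes HttpOnly cookies with a `#HttpOnly_` prefix that looks like a comment. The sketch below parses the jar text and compares the auth cookie; `read_cookie_jar` and `session_was_rotated` are hypothetical helper names.

```python
def read_cookie_jar(text: str) -> dict[str, str]:
    """Parse curl's Netscape-format cookie jar text into {name: value}."""
    cookies = {}
    for line in text.splitlines():
        if line.startswith("#HttpOnly_"):
            # HttpOnly cookies are stored with this prefix; they are not comments.
            line = line[len("#HttpOnly_"):]
        elif line.startswith("#") or not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) == 7:  # domain, subdomains, path, secure, expiry, name, value
            cookies[fields[5]] = fields[6]
    return cookies


def session_was_rotated(before: str, after: str, name: str = "ephron_auth") -> bool:
    """True when the cookie exists after login and differs from its pre-login value."""
    old = read_cookie_jar(before).get(name)
    new = read_cookie_jar(after).get(name)
    return new is not None and new != old
```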
### Known Rate Limits (ephron.ren observed)
```bash
# Auth login failures: 5/min → 429
# Auth registration: 6/hour → 429 (use existing test accounts)
# Blog comments: 6/min
# Blog likes toggle: 11/min
# Save/publish ops: 21/min
```
### Delegate Task Sizing for Large Test Suites
When testing 100+ cases across multiple modules, delegate_task has a 600s timeout. Size tasks carefully:
| Task Type | Max Cases per Delegate | Reason |
|-----------|----------------------|--------|
| Curl-only HTTP tests | 15-20 | Each curl = 1-3s + overhead |
| Browser interactions | 5-8 | Each interaction = 10-30s |
| Mixed curl + Playwright | 8-12 | Browser calls dominate time |
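The sizing table can be applied with a simple chunking helper. This sketch uses the lower bound of each range as a conservative batch size; `split_into_batches` and the `MAX_PER_DELEGATE` keys are hypothetical names.

```python
# Conservative batch sizes: the lower bound of each range in the table above.
MAX_PER_DELEGATE = {"curl": 15, "browser": 5, "mixed": 8}


def split_into_batches(case_ids: list[str], task_type: str) -> list[list[str]]:
    """Chunk test case IDs so each delegate stays under the timeout budget."""
    size = MAX_PER_DELEGATE[task_type]
    return [case_ids[i:i + size] for i in range(0, len(case_ids), size)]
```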
**Faster alternative**: Use `execute_code` with `from hermes_tools import terminal` for in-process execution. No delegation overhead, same capabilities.
```python
from hermes_tools import terminal
results = {}
r = terminal("curl -s -o /dev/null -w '%{http_code}' https://example.com/")
results["T-001"] = {"status": "PASS" if "200" in r["output"] else "FAIL", "detail": f"HTTP {r['output']}"}
```
### CSRF Token Synchronization Pitfall (curl)
When testing forms that require CSRF tokens, the token in the cookie changes on every GET request. If you GET a page, extract the CSRF token, then POST with a **different** cookie jar, the tokens won't match and the server responds with "CSRF token 验证失败" ("CSRF token validation failed").
```bash
# WRONG: separate cookie jars for GET and POST
curl -s -c /tmp/jar1.txt https://example.com/admin > /tmp/page.html # new CSRF cookie saved to jar1
curl -s -b /tmp/jar2.txt -X POST ... -d "csrf_token=$CSRF" # different jar = mismatch!
# RIGHT: same cookie jar for GET and POST in sequence
curl -s -b /tmp/jar.txt -c /tmp/jar.txt https://example.com/admin > /tmp/page.html
CSRF=$(grep -oP 'name="csrf_token"[^>]*value="\K[^"]+' /tmp/page.html | head -1)
curl -s -b /tmp/jar.txt -c /tmp/jar.txt -X POST ... -d "csrf_token=$CSRF"
```
**Why this happens**: The site's CSRF middleware (built on FastAPI/Starlette) generates a new token on each GET and stores it in the `ephron_csrf` cookie. The POST handler compares the form token against the cookie token, so both must come from the same request chain.
**Multiple forms on one page**: If a page has N forms, there will be N CSRF tokens in the HTML but only ONE in the cookie. Each form's token is unique. Extract the token from the specific form you need (use context-aware parsing, not just `head -1`).
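Context-aware parsing can be sketched with the standard library's `html.parser`. This is one possible approach, assuming each form is identified by its `action` attribute; `FormTokenParser` and `csrf_token_for` are hypothetical names.

```python
from html.parser import HTMLParser


class FormTokenParser(HTMLParser):
    """Collect the csrf_token value of each <form>, keyed by the form's action."""

    def __init__(self):
        super().__init__()
        self.tokens = {}
        self._current_action = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current_action = attrs.get("action") or ""
        elif (tag == "input" and attrs.get("name") == "csrf_token"
              and self._current_action is not None):
            # Only inputs nested inside a form are recorded.
            self.tokens[self._current_action] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current_action = None


def csrf_token_for(html: str, action: str):
    """Return the token of the form posting to `action`, or None if absent."""
    parser = FormTokenParser()
    parser.feed(html)
    return parser.tokens.get(action)
```

Unlike `grep ... | head -1`, this returns the token that belongs to the form being submitted rather than whichever token appears first in the page.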
### Owner vs Admin Permission Testing Pattern
When a site has RBAC (user < admin < owner), test with all roles:
```bash
# Login as each role
for role in owner admin user; do
curl -s -c /tmp/$role.txt -X POST https://auth.example.com/api/login \
-d "username=Elaina_$role&password=Pass123!" -o /dev/null
done
# Test each protected endpoint with each role
for role in owner admin user; do
status=$(curl -s -b /tmp/$role.txt -o /dev/null -w '%{http_code}' https://example.com/admin/roles)
echo "$role -> /admin/roles: $status"
done
```
**Key insight**: If the admin role can't access a page but the nav bar still shows the link, it's either a UX bug (nav items should be hidden for unauthorized roles) or a permission misconfiguration.
### Content Restoration for Destructive Tests
When tests modify content (create invite codes, publish posts, change settings):
1. **Before testing**: Save current state
```bash
# Save homepage content
curl -s -b /tmp/admin.txt https://www.example.com/admin | grep -oP 'initialContent = JSON\.parse\("\K[^"]*' > /tmp/homepage_backup.json
# Save blog post slugs
curl -s https://blog.example.com/ | grep -oP '/posts/[a-z0-9-]+' | sort -u > /tmp/blog_slugs.txt
```
2. **During testing**: Create test data with identifiable markers (e.g., `QA_TEST_TEMP` in notes/titles)
3. **After testing**: Clean up test data
```bash
# Delete test invite codes
curl -s -b /tmp/owner.txt -X POST https://auth.example.com/admin/invites/delete \
-d "csrf_token=$CSRF&code=$TEST_CODE"
```
4. **Verify restoration**: Check that original content is unchanged
```bash
for slug in $(cat /tmp/blog_slugs.txt); do
status=$(curl -s -o /dev/null -w '%{http_code}' "https://blog.example.com/posts/$slug")
echo "$slug: $status"
done
```
### Module-by-Module Testing with Incremental Commits
For large QA tasks (100+ test cases across many modules), the user may want results committed after each module:
1. Create `test-results.md` with placeholder sections for all modules
2. Test module N → update the module section in test-results.md
3. `git add test-results.md && git commit -m "Module N done: X passed / Y failed" && git push`
4. Report progress to user
5. Repeat for next module
**Document structure per module**:
```markdown
## Module N: Name
**Status**: ✅ Completed
**Execution time**: YYYY-MM-DD HH:MM - HH:MM
**Results**: X passed / Y failed / Z blocked (N total)
| ID | Result | Notes |
|------|------|------|
| X-001 | ✅ Pass | detail |
| X-002 | ❌ Fail | 🔴 description |
### Module N Summary
- Summary bullets
### 💡 Module N Recommendations
1. **🔴 [Critical]**: description
2. **🟡 [High]**: description
```
**Why per-module commits**: Gives the user incremental visibility, prevents data loss if the session breaks, and creates a clean git history.
### Registration Rate Limiting Pitfall
Registration endpoints typically have strict rate limits (e.g., 6/hour). When testing multiple registration scenarios (password validation, username checks, invite codes), the rate limit kicks in and blocks subsequent tests with 429, masking the real behavior.
**Workaround**:
- Test rate-limited endpoints LAST in each module
- Use existing test accounts for non-registration tests
- Note which tests were blocked by rate limiting in results
- Space out registration tests or use different IPs if possible
### Common API Field Names (FastAPI/Pydantic patterns)
```bash
# Blog likes toggle: field is `post_slug` (NOT `slug`)
curl -X POST https://blog.example.com/api/likes/toggle \
-H "Content-Type: application/json" \
-d '{"post_slug":"article-slug"}'
# Blog comments: post_slug + content + parent_id (nullable)
curl -X POST https://blog.example.com/api/comments/ \
-H "Content-Type: application/json" \
-d '{"post_slug":"article-slug","content":"text","parent_id":null}'
```
### Template Encoding Checks (BOM / Leading Whitespace)
```bash
# BOM marker: UTF-8 EF BB BF appears before DOCTYPE
xxd /tmp/page.html | head -3
# Leading newline before DOCTYPE: 0a 3c 21 44 4f ...
head -c 20 /tmp/page.html | xxd
# Python source BOM check:
xxd app.py | head -1
```
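The same checks can be done on raw bytes without reading hex dumps. A minimal sketch; `encoding_problems` is a hypothetical helper that expects the file's bytes (for example from `open(path, "rb").read()`).

```python
UTF8_BOM = b"\xef\xbb\xbf"


def encoding_problems(data: bytes) -> list[str]:
    """Flag a UTF-8 BOM and any whitespace before the first real byte."""
    problems = []
    if data.startswith(UTF8_BOM):
        problems.append("UTF-8 BOM")
        data = data[len(UTF8_BOM):]
    if data[:1].isspace():
        problems.append("leading whitespace")
    return problems
```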
## Static Analysis Checks (no browser needed)
```bash
# Check for BOM markers
xxd file.html | head -3
# Look for: ef bb bf (UTF-8 BOM)
# Check for leading whitespace before DOCTYPE
head -c 20 file.html | xxd
# Check CSS variable definitions
grep -nE -e '--warning-bg|--error-bg|--success-bg' file.html
# Check for accessibility issues
grep -n 'user-scalable=no' *.html
grep -n 'alt=""' *.html
grep -n 'aria-hidden' *.html
# Check security headers
curl -sI https://example.com | grep -iE "x-content-type|x-frame|referrer-policy|content-security"
```

@@ -0,0 +1,156 @@
# Playwright Python for QA Testing
## Environment Setup
Playwright Python is available on this system:
- **Package**: `/home/ubuntu/.hermes/hermes-agent/venv/lib/python3.11/site-packages/playwright/`
- **Chromium**: `~/.cache/ms-playwright/chromium-1217/`
- **Import**: `from playwright.sync_api import sync_playwright`
- **Run**: `python3 script.py` (not `node` — Playwright Node module may not be installed)
## Basic Pattern
```python
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    page = context.new_page()
    # Login
    page.goto('https://auth.example.com/login')
    page.wait_for_load_state('networkidle')
    page.fill('#username', 'admin_user')
    page.fill('#password', 'password')
    page.click('button[type="submit"]')
    page.wait_for_url('**/login-success**', timeout=10000)
    # Navigate to target
    page.goto('https://www.example.com/admin')
    page.wait_for_load_state('networkidle')
    # Interact and test...
    browser.close()
```
## Event Monitoring (Critical for QA)
### Console Messages and JS Errors
```python
console_msgs = []
page_errors = []
page.on("console", lambda m: console_msgs.append(f"[{m.type}] {m.text}"))
page.on("pageerror", lambda e: page_errors.append(str(e)))
# After interactions:
for m in console_msgs:
    print(f" {m}")
for e in page_errors:
    print(f" ERROR: {e}")
```
### Network Requests and Responses
```python
api_responses = []
def on_response(r):
    if '/api/' in r.url or '/admin/' in r.url:
        api_responses.append({"url": r.url, "status": r.status})
page.on("response", on_response)
# After interactions:
for r in api_responses:
    print(f" {r['url']} -> {r['status']}")
```
### Failed Resource Loads
```python
failed_resources = []
page.on("requestfailed", lambda r: failed_resources.append({"url": r.url, "error": r.failure}))
```
## Element Query Patterns
```python
# By text content
btn = page.query_selector('button:has-text("Save")')
links = page.query_selector_all('a:has-text("Login")')
# By CSS selector with attribute
input_el = page.query_selector('input[name="csrf_token"]')
form = page.query_selector('#contentForm')
# Submit button (matched by its type attribute, not an ARIA role)
submit = page.query_selector('button[type="submit"]')
# Get all buttons (for debugging)
all_btns = page.query_selector_all('button')
btn_texts = [b.inner_text().strip() for b in all_btns]
```
## JavaScript Evaluation
```python
# Check if function is defined
is_defined = page.evaluate("typeof saveDraft === 'function'")
# Get element properties
val = page.evaluate("document.getElementById('contentJson').value")
# Get page content
page_html = page.content()
body_text = page.inner_text('body')
# Execute arbitrary JS
result = page.evaluate("() => { return document.title; }")
```
## Screenshots
```python
# Full page
page.screenshot(path='/tmp/screenshot.png', full_page=True)
# Then analyze with vision tool
```
## Cookie Inspection
```python
cookies = context.cookies()
for c in cookies:
    print(f"{c['name']}: domain={c['domain']}, httpOnly={c['httpOnly']}, "
          f"secure={c['secure']}, sameSite={c.get('sameSite','N/A')}")
```
## Testing with Custom Headers (e.g., Bearer Token)
```python
# Create separate context with extra headers
context2 = browser.new_context(extra_http_headers={"Authorization": "Bearer fake_token"})
page2 = context2.new_page()
page2.goto('https://www.example.com/admin')
# Check if redirected to login
print(f"URL: {page2.url}")
page2.close()
context2.close()
```
## CSP Bug Detection Pattern
When buttons with `onclick="fnName()"` do nothing:
1. Check console for CSP violation: `"Executing inline script violates the following Content Security Policy directive 'script-src-elem'"`
2. Verify function availability: `page.evaluate("typeof fnName")` returns `"undefined"`
3. Confirm script tag exists but is blocked by CSP
4. Check CSP header: `curl -sI URL | grep -i content-security-policy`
5. Look for `script-src-elem` directive that overrides `script-src`
## Pitfalls
- **Use Python, not Node.js**: The `playwright` npm module may not be installed. Python works.
- **`expect_response` timeout**: Don't use broad URL patterns. Use specific path matches or handle timeout gracefully.
- **`expect_navigation` for SPA**: Single-page apps may not trigger navigation events. Use `wait_for_timeout` or check state changes instead.
- **Rate limit testing**: Don't try to trigger rate limits via Playwright — too slow. Use curl for rate limit tests.
- **`page.on("console")` alone can miss CSP breakage**: In the observed session the failures surfaced as `pageerror` events rather than console messages. Listen to both.

@@ -0,0 +1,152 @@
# Comprehensive QA Dimensions Checklist
Use this checklist when the user asks for "full", "complete", or "comprehensive" QA testing.
Each dimension should appear as a section in the test plan with at least 1 test case.
## Core Functional (always cover)
- [ ] Page loads (HTTP 200) for all public pages
- [ ] Navigation links work (header, footer, sidebar)
- [ ] CRUD operations (create, read, update, delete)
- [ ] Form submissions (valid data, empty data, invalid data)
- [ ] Search/filter functionality
- [ ] Pagination
- [ ] Error pages (404, 500)
## Auth & Permissions
- [ ] Login page loads and form works
- [ ] Valid credentials → success + cookie set
- [ ] Invalid credentials → error message
- [ ] Logout clears cookie
- [ ] Cross-service cookie propagation (shared domain cookies)
- [ ] Admin pages: admin user can access
- [ ] Admin pages: regular user gets denied
- [ ] Admin pages: unauthenticated user redirects to login
- [ ] RBAC: different roles see different features
- [ ] Permission checks on API endpoints
## Input Validation
- [ ] Empty form submissions (browser validation or server error)
- [ ] Boundary values (min/max length, special chars)
- [ ] Password strength requirements
- [ ] Username format validation
- [ ] Email format validation (if applicable)
- [ ] Invite code validation (if invite-based registration)
## Security — Cookie
- [ ] Auth cookie: HttpOnly=true
- [ ] Auth cookie: Secure=true (production)
- [ ] Auth cookie: SameSite=Lax or Strict
- [ ] Auth cookie: Max-Age is reasonable (not infinite)
- [ ] Auth cookie: Domain scope correct (e.g., `.example.com` for subdomains)
- [ ] CSRF cookie: HttpOnly=false (by design, JS needs to read it)
## Security — CSRF
- [ ] All state-changing POST endpoints require CSRF token
- [ ] CSRF token matches between form field and cookie
- [ ] CSRF token expires (check timestamp-based expiry)
- [ ] Missing/invalid CSRF token returns 403 or error
## Security — Redirect
- [ ] `redirect` parameter accepts valid same-domain URLs
- [ ] `redirect` parameter rejects external domains (open redirect prevention)
- [ ] `redirect` parameter rejects protocol-relative URLs (`//evil.com`)
- [ ] Default redirect when parameter is empty/invalid
## Security — Rate Limiting
- [ ] Login rate limit (e.g., 5/minute per IP)
- [ ] Registration rate limit (e.g., 5/hour per IP)
- [ ] API rate limits (comments, likes, uploads)
- [ ] Account lockout after N failed attempts
- [ ] IP-based lockout after N failed attempts
- [ ] Rate limit returns 429 status
## Security — File Upload
- [ ] Allowed file types enforced (extension check)
- [ ] File size limit enforced
- [ ] Filename sanitized (no path traversal)
- [ ] Uploaded files stored safely (UUID names, outside web root or in controlled dir)
- [ ] Image processing (resize, format conversion) doesn't crash on malformed files
## Security — Input Injection
- [ ] XSS: user input rendered as text, not HTML (test `<script>alert(1)</script>`)
- [ ] Path traversal: slug validation prevents `../` sequences
- [ ] SQL injection: parameterized queries (verify from source code)
## Session & Token
- [ ] Token expiration: expired token redirects to login
- [ ] Token format validation (reject malformed tokens)
- [ ] Role changes: DB role takes precedence over token role
- [ ] Token max-age from configuration
## Content & Rendering
- [ ] Empty state (no content) shows appropriate message
- [ ] Long content doesn't break layout
- [ ] Special characters (CJK, emoji, HTML entities) render correctly
- [ ] Markdown rendering (code blocks, tables, lists)
- [ ] LaTeX/MathJax rendering (if applicable)
- [ ] Code syntax highlighting (if applicable)
## Encoding
- [ ] No BOM markers in HTML templates (`ef bb bf`)
- [ ] No leading whitespace before `<!DOCTYPE>`
- [ ] UTF-8 charset declared in meta tag
- [ ] Python source files: no BOM
## SEO & Metadata
- [ ] `<title>` tag present and descriptive on each page
- [ ] `<meta name="description">` present
- [ ] Open Graph tags (`og:title`, `og:description`, `og:url`, `og:image`)
- [ ] Twitter Card tags
- [ ] Canonical URL (`<link rel="canonical">`)
- [ ] `robots.txt` exists
- [ ] `sitemap.xml` exists and is valid
- [ ] RSS feed (if blog) exists and is valid XML
## Accessibility
- [ ] All `<img>` have `alt` text (or `aria-hidden` for decorative)
- [ ] No `user-scalable=no` in viewport meta
- [ ] Sufficient color contrast (text vs background)
- [ ] Skip-to-content link (visually hidden)
- [ ] Keyboard navigation: Tab order logical
- [ ] ARIA labels on interactive elements without visible text
- [ ] Form labels associated with inputs
## Performance
- [ ] All static assets return 200 (CSS, JS, images)
- [ ] No broken links (404s in static resources)
- [ ] CDN reliability (especially for users in China, where jsDelivr may time out)
- [ ] Page load doesn't hang on slow external resources
- [ ] Resource count reasonable (no excessive requests)
## Responsive Design
- [ ] Layout at 375px (mobile) — no horizontal overflow
- [ ] Layout at 768px (tablet) — breakpoint works
- [ ] Layout at 1440px (desktop) — content centered
- [ ] Touch targets large enough (44x44px minimum)
## Cross-Browser
- [ ] Chrome/Chromium rendering
- [ ] Firefox rendering
- [ ] Safari rendering (WebKit differences)
- [ ] Edge rendering
## Operations
- [ ] `/health` endpoint returns `{"status":"ok"}` per service
- [ ] 404 page is custom (not default framework error)
- [ ] 500 errors don't leak stack traces to users
- [ ] Audit log captures admin actions (verify from source)
- [ ] Audit log captures login attempts (success/failure)
## Consistency (cross-service)
- [ ] All pages include same CSS files (mobile.css, etc.)
- [ ] All pages include same JS files (loader.js, etc.)
- [ ] All pages have site-wide navigation bar
- [ ] All pages have same security headers
- [ ] All pages have same viewport meta
## Security Headers
- [ ] `X-Content-Type-Options: nosniff`
- [ ] `X-Frame-Options: DENY`
- [ ] `Referrer-Policy: strict-origin-when-cross-origin`
- [ ] `Content-Security-Policy` present and reasonable
- [ ] No `unsafe-eval` in CSP (check for `'unsafe-eval'`)

@@ -0,0 +1,69 @@
# Server Inspection Reference
When asked to inspect a server without a URL, assume the **local machine Hermes runs on**.
## Quick Checklist
### System Resources
```bash
# CPU, load, uptime
uptime && top -bn1 | head -3 && nproc
# Memory
free -h
# Disk
df -h
```
### Running Services & Processes
```bash
# All listening ports
ss -tlnp | grep LISTEN
# Top processes by CPU
ps aux --sort=-%cpu | head -10
# Docker containers
docker ps -a
```
### Service Manager
```bash
systemctl list-units --type=service | grep running
# or
service --status-all
```
### Network
```bash
# All LISTEN ports (not just common ones)
ss -tlnp
# DNS resolution test
nslookup example.com
```
### Security
```bash
# fail2ban status
fail2ban-client status
# UFW firewall (if enabled)
ufw status
```
## Scope Signals
| User says | Means |
|-----------|-------|
| "服务器巡检" / "server inspection" | Local machine (no URL given) |
| "巡检 ephron.ren" ("inspect ephron.ren") | Remote web service at that domain |
| "check the service on port 8000" | Likely remote host:port |
| "你的服务器" ("your server") / "this machine" | Local machine explicitly |
## Anti-Patterns
- **Don't** default to checking remote web services when no URL is provided
- **Don't** assume the remote service is on the same machine as Hermes
- **Do** ask for clarification if "server" could mean local or remote

@@ -0,0 +1,97 @@
# Session Learnings: ephron.ren QA (2026-05-03)
## Environment Facts
- 5 services: Home(8000), Auth(8001), Blog(8002), Canvas(8003), Prompt(8004)
- Auth: FastAPI + Tortoise ORM, `.ephron.ren` domain cookie
- RBAC: user(10) < admin(20) < owner(30)
- CSRF: `{unix_timestamp}:{sha256_hex}` format, 75 chars, per-GET refresh
- Rate limits: login 5/min, register 6/hour, comments 6/min, likes 11/min, save 21/min
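The CSRF token format noted above (10-digit timestamp, colon, 64 hex characters, 75 characters in total) is easy to validate offline. A minimal sketch, assuming lowercase hex; `parse_csrf_token` and `token_age_seconds` are hypothetical helpers.

```python
import re
import time

# 10-digit Unix timestamp + ":" + 64 hex chars = 75 characters.
TOKEN_RE = re.compile(r"(\d{10}):([0-9a-f]{64})")


def parse_csrf_token(token: str):
    """Return (timestamp, digest) for a well-formed token, else None."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def token_age_seconds(token: str, now=None):
    """Age of the token relative to `now` (defaults to the current time)."""
    parsed = parse_csrf_token(token)
    if parsed is None:
        raise ValueError("malformed CSRF token")
    current = time.time() if now is None else now
    return current - parsed[0]
```

The age is useful for the "CSRF token expires" checklist item: replay a token after its expected lifetime and confirm the server rejects it.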
## High-Value Findings (Reproducible Patterns)
### CSP script-src-elem Override (Critical)
- **Symptom**: Buttons with `onclick="fnName()"` do nothing, `typeof fnName` returns `undefined`
- **Root cause**: `script-src-elem 'self' https://cdn.example.com` overrides `script-src 'unsafe-inline'`
- **Detection**: `curl -sI URL | grep -i content-security-policy`, look for `script-src-elem` without `'unsafe-inline'`
- **Impact**: All inline JS blocked → save/publish/discard buttons broken, client-only validation bypassed
### CSP form-action Blocks Cross-Origin Redirects (Critical)
- **Date**: 2026-05-05
- **Symptom**: Login form submits (POST appears in network tab), server sets cookie, but browser stays on login page — no redirect
- **Root cause**: CSP `form-action 'self'` on the 303 redirect response blocks navigation to cross-origin targets
- **Reproduction**:
1. Visit `https://auth.ephron.ren/login?redirect=aHR0cHM6Ly93d3cuZXBocm9uLnJlbi8=` (redirect=base64 of `https://www.ephron.ren/`)
2. Fill username/password, click submit
3. Browser sends POST to `/api/login` (same origin ✅ allowed)
4. Server returns 303 to `https://www.ephron.ren/` with CSP header containing `form-action 'self'`
5. Browser blocks redirect: `https://www.ephron.ren/` ≠ `'self'` (`https://auth.ephron.ren`)
- **Controlled test**: Same-origin redirect (`auth.ephron.ren/admin`) works fine; cross-origin fails
- **Console error**: `Sending form data to 'https://auth.ephron.ren/api/login' violates the following Content Security Policy directive: "form-action 'self'"`
- **Fix**: Skip CSP header on 303 responses (empty body, no protection value), or use JS redirect
- **Affected pages**: ALL pages that redirect to login with a cross-origin redirect target (www/blog/canvas/prompt subdomains)
- **Key source files**: `shared/security_headers.py` (CSP middleware), `auth/src/routes/api.py` (login endpoint), `auth/src/utils/redirect.py` (redirect validation)
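To enumerate which redirect targets would hit this bug, the reproduction URL and the origin comparison can be scripted. This is a sketch under stated assumptions: the helper names are hypothetical, and the `redirect` value is standard base64 left unescaped, as in the reproduction URL above.

```python
import base64
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return (scheme, host, port), the triple that 'self' is compared against."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def login_url_with_redirect(login_url: str, target: str) -> str:
    """Build the login URL with a base64-encoded redirect target."""
    encoded = base64.b64encode(target.encode()).decode()
    return f"{login_url}?redirect={encoded}"


def redirect_is_cross_origin(login_url: str, target: str) -> bool:
    """True when the post-login redirect leaves the login page's origin."""
    return origin(login_url) != origin(target)
```

Targets for which `redirect_is_cross_origin` returns true are the ones expected to be blocked by `form-action 'self'`.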
### Server-Side Password Validation Missing
- **Test**: `curl -X POST /api/register -d 'username=test&password=123&password_confirm=456&invite_code=CODE'`
- **Expected**: 400/422 with validation error
- **Actual**: 303 redirect (registration succeeds with weak/mismatched passwords)
- **Root cause**: Validation only in client JS (blocked by CSP)
- **Lesson**: Always test form validation with curl, not just browser
### Fulltext Search Silent Failure
- **Test**: `GET /posts?q=openclaw&mode=fulltext` returns 0 results, `mode=simple` returns 6
- **Root cause**: BM25 index not built or jieba tokenizer not installed
- **Detection**: Compare simple vs fulltext results for same query
### API Auth Order Bug
- **Test**: `POST /api/service/posts` without token, with invalid body
- **Expected**: 401 (unauthenticated)
- **Actual**: 422 (body validation error — leaks endpoint info)
- **Root cause**: Pydantic validation middleware runs before auth middleware
## Delegate Task Sizing
- Curl-only tasks: max ~15-20 test cases per delegate (30+ cases time out at 600s)
- Browser tasks: max ~5-8 interactions per delegate (each = 10-30s)
- Use `execute_code` with `from hermes_tools import terminal` for fastest execution
- Parallel delegates: 3 max, but each should be independently scoped
## Cookie Jar Synchronization
- CSRF token changes on every GET request
- Must use SAME cookie jar for GET (extract token) and POST (submit form)
- Multiple CSRF tokens on one page (one per form) — extract from specific form context
- Cross-service cookies: Domain=.ephron.ren should work for all subdomains
- If cross-service test fails, check cookie jar file, not the cookie itself
## Content Restoration Pattern (Playwright)
When homepage/admin content is accidentally overwritten, restore via Playwright:
1. Prepare JSON with original content (experience/projects/skills/contact/footer)
2. Login → navigate to admin page
3. Use `page.evaluate()` to set form fields by `id=` (NOT `name=` — admin forms use id):
```js
document.getElementById('contact_email').value = '...';
document.getElementById('footer_copyright').value = '...';
```
4. Set structured data: `initialContent.experience = [...]; renderExperience();`
5. Set `is_draft: false` for items that should be published
6. Collect and publish:
```js
const content = collectFormData();
document.querySelector('input[name="content_json"]').value = JSON.stringify(content);
// Find form with content_json input, set action=/admin/publish, submit
```
7. Verify with `curl -s https://site/` checking for restored content strings
## Form Field Discovery
- Admin page fields may use `id=` instead of `name=` — check both:
```bash
curl -s -b cookies /admin | grep -oP 'id="[^"]*"' | sort -u
curl -s -b cookies /admin | grep -oP 'name="[^"]*"' | sort -u
```
- `collectFormData()` reads from visible form elements, not hidden `content_json`
- Setting `content_json` directly is overwritten by `collectFormData()` on submit
## Playwright vs curl for Form Submission
- **curl**: CSRF token sync is fragile (token changes per-GET, cookie jar must match)
- **Playwright**: Handles cookies/CSRF automatically, but CSP may block inline JS
- **Best approach**: Use Playwright + `page.evaluate()` to bypass CSP-blocked functions
- **Pattern**: Set form fields via JS → call `collectFormData()` → set `content_json` → submit form directly

@@ -0,0 +1,89 @@
# QA Issue List
> **Note:** This is the ISSUE LIST. The TEST PLAN is a separate document (`test-plan.md`).
> Always deliver both documents together.
**Target:** {target_url}
**Date:** {date}
**Scope:** {scope_description}
**Tester:** Hermes Agent (automated exploratory QA)
---
## Executive Summary
| Severity | Count |
|----------|-------|
| 🔴 Critical | {critical_count} |
| 🟠 High | {high_count} |
| 🟡 Medium | {medium_count} |
| 🔵 Low | {low_count} |
| **Total** | **{total_count}** |
**Overall Assessment:** {one_sentence_assessment}
---
## Issues
<!-- Repeat this section for each issue found, sorted by severity (Critical first) -->
### Issue #{issue_number}: {issue_title}
| Field | Value |
|-------|-------|
| **Severity** | {severity} |
| **Category** | {category} |
| **URL** | {url_where_found} |
**Description:**
{detailed_description_of_the_issue}
**Steps to Reproduce:**
1. {step_1}
2. {step_2}
3. {step_3}
**Expected Behavior:**
{what_should_happen}
**Actual Behavior:**
{what_actually_happens}
**Screenshot:**
MEDIA:{screenshot_path}
**Console Errors** (if applicable):
```
{console_error_output}
```
---
<!-- End of per-issue section -->
## Issues Summary Table
| # | Title | Severity | Category | URL |
|---|-------|----------|----------|-----|
| {n} | {title} | {severity} | {category} | {url} |
## Testing Coverage
### Pages Tested
- {list_of_pages_visited}
### Features Tested
- {list_of_features_exercised}
### Not Tested / Out of Scope
- {areas_not_covered_and_why}
### Blockers
- {any_issues_that_prevented_testing_certain_areas}
---
## Notes
{any_additional_observations_or_recommendations}

@@ -0,0 +1,69 @@
# QA Test Plan
**Site:** {site_name}
**URL:** {target_url}
**Date:** {date}
**Source:** {repo_url}
---
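## 1. Test Overview
### 1.1 Service Architecture
| Service | URL | Port | Description |
|------|------|------|------|
| {name} | {url} | {port} | {description} |
### 1.2 Test Accounts
| Role | Username | Password | Purpose |
|------|--------|------|------|
| Admin | {admin_user} | {admin_pass} | Test the admin backend |
| Regular user | {normal_user} | {normal_pass} | Test the public site + permission blocking |
### 1.3 Authentication Mechanism
- Cookie name: `{cookie_name}`
- Cookie domain: `{cookie_domain}`
- Token issuance: {mechanism}
- Permission model: {rbac_description}
### 1.4 Priority Definitions
| Level | Meaning |
|------|------|
| P0 | Core functionality; blocks use |
| P1 | Important functionality; affects user experience |
| P2 | Minor functionality; can be deferred |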
---
## 2. Test Cases
### Module N: {module_name} ({service_name})
#### N.1 {submodule_name}
| ID | Test | Steps | Expected | Account | Priority |
|------|----------|------|------|------|--------|
| X-001 | {test_name} | {steps} | {expected} | {account} | {priority} |
---
## 3. Test Execution Flow
```
Step 1 → {first_step}
Step 2 → {second_step}
...
```
---
## 4. Statistics
| Module | Public pages | Admin backend | Total |
|------|:--------:|:--------:|:----:|
| {module} | {n} | {n} | {n} |
| **Total** | **{n}** | **{n}** | **{n}** |