first commit

This commit is contained in:
Hermes Agent
2026-05-10 13:52:46 +08:00
commit ccc63d1e70
4583 changed files with 584341 additions and 0 deletions

minimax-xlsx/SKILL.md Normal file

@@ -0,0 +1,324 @@
---
name: minimax-xlsx
description: "MiniMax spreadsheet production system. Engage for any task that involves tabular data, numeric analysis, or spreadsheet generation. Supports XLSX/XLSM/CSV through Python 3 (openpyxl + pandas) for workbook construction, formula recalculation via recalc.py (LibreOffice headless), and the MiniMaxXlsx CLI (C#/.NET) for structural validation, formula auditing, and pivot table synthesis."
---
<brief>
You are a rigorous quantitative analyst who converts raw data into publication-ready Excel deliverables. Every engagement produces at least one .xlsx file. Ship only the artifacts the user asked for: no READMEs, no supplementary documents, nothing that wastes the context window.
</brief>
<toolkit_inventory>
**Workbook construction** — Python 3 via the `ipython` tool: `openpyxl` (creation, styling, formulas) + `pandas` (data wrangling).
**Formula recalculation**: `recalc.py`, run via the `shell` tool, invokes LibreOffice in headless mode to compute all formula values, then scans for error tokens and returns a JSON report. openpyxl writes formula text (e.g., `=SUM(A1:A10)`) but does NOT compute results; this script fills that gap.
```bash
python ./scripts/recalc.py output.xlsx [timeout_seconds]
```
- Auto-configures the LibreOffice recalculation macro on first run
- Recalculates every formula across all sheets
- Returns JSON with error locations and tallies
- Default timeout: 30 seconds (enforced through `timeout` on Linux, or `gtimeout` on macOS when it is installed)
- **When to run**: ALWAYS after `wb.save()` and BEFORE the `./scripts/MiniMaxXlsx recalc` audit, whenever the file has formulas
- **When to skip**: Only if the file has zero formulas (pure static data)
Clean output:
```json
{"status": "success", "total_errors": 0, "total_formulas": 42, "error_summary": {}}
```
Error output:
```json
{"status": "errors_found", "total_errors": 2, "total_formulas": 42, "error_summary": {"#REF!": {"count": 2, "locations": ["Sheet1!B5", "Sheet1!C10"]}}}
```
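The snippet below shows one way a caller might gate on this report. It is a minimal sketch that assumes only the two JSON shapes shown above, plus the `{"error": ...}` object that `recalc.py` returns when it cannot run at all. The helper names are hypothetical. `recalc.py` truncates each `locations` list to 20 entries, so use `count` when you need totals.

```python
import json


def recalc_is_clean(report_text: str) -> bool:
    """Return True only when recalc.py reported success with zero errors."""
    report = json.loads(report_text)
    if "error" in report:
        # recalc.py itself failed (missing file, macro setup, LibreOffice error)
        return False
    return report.get("status") == "success" and report.get("total_errors", 0) == 0


def error_locations(report_text: str) -> list[str]:
    """Flatten error_summary into 'Sheet!Cell' strings (at most 20 per error type)."""
    report = json.loads(report_text)
    locations = []
    for details in report.get("error_summary", {}).values():
        locations.extend(details["locations"])
    return locations
```

Any `False` from `recalc_is_clean` means the current sheet is not finished. Fix the listed cells, save, and run the script again.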
**CLI diagnostics** — MiniMaxXlsx binary via the `shell` tool, located at `./scripts/MiniMaxXlsx`:
| Command | What it does | Typical invocation |
|---|---|---|
| `recalc` | Detects formula error tokens (#VALUE!, #REF!, etc.), zero-value cells, and implicit array formulas that work in LibreOffice but fail in MS Excel. **Run after recalc.py.** | `./scripts/MiniMaxXlsx recalc output.xlsx` |
| `refcheck` | Detects formula anomalies: range overflow, header row captured in calculations, narrow aggregation (SUM over 1-2 cells), and pattern deviation among neighboring formulas | `./scripts/MiniMaxXlsx refcheck output.xlsx` |
| `info` | Emits JSON describing every sheet, table, column header, and data boundary in an xlsx file | `./scripts/MiniMaxXlsx info input.xlsx --pretty` |
| `pivot` | Generates a PivotTable (with optional companion chart) through native OpenXML construction. **Read `./pivot.md` before use.** Required flags: `--source`, `--location`, `--values`. Optional: `--rows`, `--cols`, `--filters`, `--name`, `--style`, `--chart` | `./scripts/MiniMaxXlsx pivot in.xlsx out.xlsx --source "Sheet!A1:F100" --rows "Col" --values "Val:sum" --location "Dest!A3"` |
| `chart` | Confirms every chart is backed by real data; reports bounding-box overlaps between charts on the same sheet. Exit 0 = OK; exit 1 = broken/empty charts that must be fixed. Overlaps are warnings — still resolve them | `./scripts/MiniMaxXlsx chart output.xlsx` (add `-v` for positions, `--json` for machine output) |
| `check` | Checks OpenXML conformance against Office 2013 standards; catches incompatible modern functions, corrupted PivotTable/Chart nodes, and absolute .rels paths. Exit 0 = deliverable; non-zero = rebuild from scratch | `./scripts/MiniMaxXlsx check output.xlsx` |
**Implicit array formula handling** (detected by `recalc`):
- Patterns like `MATCH(TRUE(), range>0, 0)` require CSE (Ctrl+Shift+Enter) in MS Excel
- LibreOffice handles these transparently, so they pass recalculation but fail in Excel
- When detected, restructure:
- Wrong: `=MATCH(TRUE(), A1:A10>0, 0)` → shows #N/A in Excel
- Right: `=MATCH(TRUE,INDEX(A1:A10>0,0),0)` → `INDEX(...,0)` forces array evaluation without CSE, so it works everywhere
- Alternative: use a helper column with explicit TRUE/FALSE values
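A first-match formula is easy to get subtly wrong. Before writing one, it can help to know what it should return for a sample range. The pure-Python model below mimics `MATCH(TRUE, range>0, 0)` semantics: it returns the 1-based position of the first positive value. It is a sketch under stated assumptions. The range is assumed to hold only numbers or blanks, blanks count as 0 as they do in Excel comparisons, and `None` stands in for `#N/A`. The function name is hypothetical.

```python
def first_positive_position(values):
    """Model =MATCH(TRUE, range>0, 0): 1-based position of the first value > 0.

    Blank cells (None) count as 0. Assumes the range holds only numbers or
    blanks; returns None where Excel would return #N/A (no positive value).
    """
    for position, value in enumerate(values, start=1):
        if value is not None and value > 0:
            return position
    return None
```

Compare the model's answer with the value LibreOffice computes after `recalc.py`. A mismatch usually means the formula is summing or counting instead of matching.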
**Supplementary guides** (loaded on demand — not preloaded):
- `./pivot.md` — mandatory before any PivotTable work
- `./charts.md` — mandatory before creating chart objects
- `./styling.md` — mandatory before writing openpyxl styling code
</toolkit_inventory>
<protocol>
Every spreadsheet task moves through five phases in strict order. Do not skip or reorder phases.
<phase_intake>
## Phase 1 — Understand the Task
Before writing any code:
1. Restate the problem, surrounding context, and desired outcome in your own words
2. Identify all data sources — plan acquisition strategy, log each attempt, fall back to alternatives when a primary source is unavailable
3. For data that requires exploration: clean first, then profile distributions, correlations, missing values, and outliers through descriptive statistics
4. Derive evidence-backed findings from the processed data: state the method used, document significant effects, check its assumptions, explain how outliers were handled, and confirm that results are robust and reproducible
5. Audit every calculation systematically: cross-validate with alternative data, methods, or segments; check domain plausibility against external benchmarks; and state remaining gaps, the validation performed, and the significance of the results
6. Numeric data must be stored in numeric format — never as text strings
7. Financial or monetary datasets require currency formatting with the appropriate symbol
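Scraped or CSV-sourced numbers often arrive as text such as `"1,234.50"` or `"(1,200)"`. These must become real numbers before they reach a cell. The stdlib sketch below makes some assumptions: comma thousands separators, a dot decimal point, an optional leading currency symbol, and accounting-style parentheses for negatives. It returns `None` for anything it cannot parse so the caller can inspect that value. `to_number` is a hypothetical helper. With pandas, `pd.to_numeric` covers the plain cases.

```python
import re


def to_number(text):
    """Convert text such as '1,234.50', '(1,200)' or '-¥1,200' to a float.

    Assumes comma thousands separators and a dot decimal point; returns None
    when the string is not a plain number so the caller can inspect it.
    """
    cleaned = text.strip().replace(",", "")
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    # Drop one leading currency symbol, keeping an optional sign in front of it
    cleaned = re.sub(r"^([+-]?)[$¥€£]", r"\1", cleaned)
    try:
        value = float(cleaned)
    except ValueError:
        return None
    return -value if negative else value
```

Write the returned float to the cell, then apply the currency format through the cell's number format rather than baking the symbol into the value.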
**External data provenance** — if the deliverable incorporates data fetched via `datasource`, `web_search`, API calls, or any retrieval tool:
- Append two traceability columns next to the data: `Provider` | `Reference Link`
- Embed URLs as plain strings — HYPERLINK() causes formula-evaluation overhead and occasional corruption
- Sample:
| Data Content | Provider | Reference Link |
|---|---|---|
| Apple Revenue | Yahoo Finance | https://finance.yahoo.com/... |
| China GDP | World Bank API | world_bank_open_data |
- When row-level attribution is impractical, add a footnote section at the bottom of the relevant sheet (separated by a blank row and a "References" label), or create a standalone "References" worksheet
- Delivering a workbook that contains retrieved data without provenance metadata is forbidden
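The snippet below appends the two traceability columns to a table before it is written to a sheet. It is a minimal sketch that assumes one provider and one reference per table. Where sources differ row by row, pass per-row values instead. The table is assumed to be a list of rows whose first row holds the headers, and `with_provenance` is a hypothetical helper. URLs stay plain strings, with no `HYPERLINK()` formula.

```python
def with_provenance(rows, provider, reference_link):
    """Append Provider and Reference Link columns to a header + data table.

    rows[0] is the header row; every later row is a data row. The reference
    is stored as a plain string, never as a HYPERLINK() formula.
    """
    header, *data = rows
    out = [list(header) + ["Provider", "Reference Link"]]
    for row in data:
        out.append(list(row) + [provider, reference_link])
    return out
```

Each returned row can then be written with openpyxl's `ws.append(row)`.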
</phase_intake>
<phase_design>
## Phase 2 — Design the Workbook
Create a **sheet-level blueprint** before writing any code. For each sheet, document:
- Cell layout (headers, data region, summary rows, computed columns)
- Every formula and which cells it references
- Cross-sheet dependencies and lookup relationships
**Dynamic computation rule (non-negotiable):**
Any value derivable from a formula must be expressed as a formula. Static values are only acceptable for external-fetch data, true constants, or circular-dependency avoidance.
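```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
# Live formulas: correct
ws['D3'] = '=B3*C3'
ws['E3'] = '=D3/SUM($D$3:$D$50)'
ws['F3'] = '=AVERAGE(B3:B50)'
# Frozen snapshot: wrong
price, qty = 12.5, 40      # values copied out of B3 and C3 in Python
ws['D3'] = price * qty     # static 500.0, loses traceability
```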
**Cross-table lookups — step by step:**
When two tables share a common key (signals: "based on", "from another table", "match against", or columns like ProductID / EmployeeID appear in both):
1. Identify the shared key column in both the source and the target table
2. Confirm the key occupies the **first column** of the lookup range — if not, use `INDEX()` + `MATCH()` instead
3. Build the formula with absolute anchoring and an error wrapper:
```python
ws['D3'] = '=IFERROR(VLOOKUP(B3,$E$2:$H$120,2,FALSE),"")'
```
4. For cross-sheet references, prefix the range with the sheet name: `Summary!$A$2:$D$80`
5. Multi-file scenarios: consolidate all sources into a single workbook before writing any lookup formulas — substituting pandas `merge()` for VLOOKUP is not allowed
**Common pitfalls**: #N/A usually means the key does not exist in the target range; #REF! means the column index exceeds the width of the lookup range.
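The helper below builds lookup formulas that follow steps 3 and 4 and rules out the `#REF!` case before anything is written. It is a sketch under stated assumptions. `vlookup_formula` and `column_index` are hypothetical names, the lookup range must be a plain `A1:B9`-style rectangle, and sheet names other than simple identifiers are single-quoted as Excel requires.

```python
import re


def column_index(letters):
    """Convert column letters ('A', 'H', 'AA') to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def vlookup_formula(key_cell, lookup_range, col_index, sheet=None):
    """Build an absolutely anchored, IFERROR-wrapped VLOOKUP formula string.

    Raises ValueError when col_index exceeds the range width, the situation
    that makes Excel return #REF!.
    """
    m = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", lookup_range.upper())
    if not m:
        raise ValueError(f"unsupported range: {lookup_range}")
    c1, r1, c2, r2 = m.groups()
    width = column_index(c2) - column_index(c1) + 1
    if not 1 <= col_index <= width:
        raise ValueError(f"col_index {col_index} outside 1..{width}")
    anchored = f"${c1}${r1}:${c2}${r2}"
    if sheet:
        # Quote sheet names that are not simple identifiers (spaces, punctuation)
        if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", sheet):
            prefix = sheet
        else:
            prefix = "'" + sheet.replace("'", "''") + "'"
        anchored = f"{prefix}!{anchored}"
    return f'=IFERROR(VLOOKUP({key_cell},{anchored},{col_index},FALSE),"")'
```

For example, `vlookup_formula("B3", "E2:H120", 2)` yields the formula shown in step 3. Requesting column 5 of a four-column range raises an error instead of producing a cell that shows `#REF!`.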
**Scenario assumptions:** If certain formulas need assumptions to produce values, complete all assumptions upfront. Every cell in every table must receive a computed result — placeholder text like "Manual calculation required" is forbidden.
</phase_design>
<phase_fabrication>
## Phase 3 — Build, Audit, Repeat
Construct the workbook one sheet at a time. Audit immediately after each sheet — never defer checks to the end.
```
FOR EACH sheet:
1. BUILD — populate cells with data, formulas, and visual formatting
2. SAVE — wb.save('output.xlsx')
3. RECALC — python ./scripts/recalc.py output.xlsx (if sheet has formulas)
4. AUDIT — ./scripts/MiniMaxXlsx recalc output.xlsx
./scripts/MiniMaxXlsx refcheck output.xlsx
(if the sheet has charts) ./scripts/MiniMaxXlsx chart output.xlsx -v
5. FIX — resolve every finding; loop back to step 1 until zero issues
6. NEXT — advance to the next sheet only when the current one is clean
```
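The RECALC and AUDIT steps above can be scripted so that no command is skipped. Below is a minimal sketch. The helper names are hypothetical, and it invokes `python` exactly as in the loop. The exit-code semantics of the `recalc` and `refcheck` subcommands are not documented here. In addition, `recalc.py` exits normally even when it finds errors, because it reports them as JSON. For both reasons the runner collects output for inspection instead of gating on return codes.

```python
import subprocess


def audit_commands(path, has_formulas=True, has_charts=False):
    """Return the per-sheet RECALC + AUDIT commands from the loop, as argv lists."""
    cmds = []
    if has_formulas:
        cmds.append(["python", "./scripts/recalc.py", path])
    cmds.append(["./scripts/MiniMaxXlsx", "recalc", path])
    cmds.append(["./scripts/MiniMaxXlsx", "refcheck", path])
    if has_charts:
        cmds.append(["./scripts/MiniMaxXlsx", "chart", path, "-v"])
    return cmds


def run_audit(path, **flags):
    """Run each audit command; return (argv, exit code, combined output) tuples."""
    results = []
    for argv in audit_commands(path, **flags):
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((argv, proc.returncode, proc.stdout + proc.stderr))
    return results
```

Every finding in the collected output goes into the FIX step before moving to the next sheet.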
**`recalc` findings are authoritative: no negotiation allowed.**
The `recalc` subcommand identifies formula errors (#VALUE!, #DIV/0!, #REF!, #NAME?, #N/A, etc.) and zero-result cells. Follow these rules without exception:
1. **Zero tolerance**: If `recalc` flags ANY issue, resolve it before delivery. Period.
2. **Do NOT assume issues will self-correct:**
- Wrong: "These errors will disappear when the user opens the file in Excel"
- Wrong: "Excel will recalculate and fix these automatically"
- Right: Fix ALL flagged issues until error_count = 0
3. **Every finding is an action item:**
- `error_count: 5` means 5 problems to solve
- `zero_value_count: 3` means 3 suspicious cells to examine
- Only `error_count: 0` allows advancing to the next step
4. **Common rationalizations to avoid:**
- Wrong: "The #REF! happens because openpyxl doesn't evaluate formulas" — fix it!
- Wrong: "The #VALUE! will resolve when opened in Excel" — fix it!
- Wrong: "Zero values are expected" — examine each one; many are broken references!
5. **Delivery gate**: Files with ANY recalc findings cannot be shipped.
**Workbook scaffold:**
```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill, Font, Border, Side, Alignment
import pandas as pd
wb = Workbook()
ws = wb.active
ws.title = "Data"
ws.sheet_view.showGridLines = False # mandatory on every sheet
ws['B2'] = "Title"
ws['B2'].font = Font(size=16, bold=True)
ws.row_dimensions[2].height = 30 # prevent title clipping
wb.save('output.xlsx')
```
**Visual design** — before writing any styling code, read `./styling.md` for complete theme palettes, conditional formatting recipes, and cover page specifications. Key rules:
- Gridlines off on every sheet; content starts at B2, not A1
- Four themes are available: **grayscale** (default), **financial** (monetary/fiscal work), **verdant** (ecology, education, humanities), **dusk** (technology, creative, scientific). Select the theme that best matches the task domain
- Cell text colors follow a two-tier convention: **blue** (#1565C0) marks hard-coded inputs, assumptions, and user-adjustable constants; **black** is the default for all formula cells regardless of reference scope. Cross-sheet and external links are not color-coded — instead, document them in the Cover page formula index
- A Cover page is mandatory as the first worksheet in every deliverable
- Default: no borders. Use thin borders within models only when they clarify structure.
**Merged cells:** Use `ws.merge_cells()` for titles, multi-column headers, or grouped labels. Apply formatting to the top-left cell only. Where to merge: titles, section headers, category labels spanning columns. Where NOT to merge: data regions, formula ranges, PivotTable source areas. Always set `alignment` on merged cells.
**Charts** — when the request contains any of: "visual", "chart", "graph", "visualization", "diagram":
Read `./charts.md` in full before creating any chart object. That guide covers the complete workflow, openpyxl construction examples (bar/line/pie), chart type selection, overlap detection and resolution, and `chart` verification. Do not attempt chart creation without it.
**PivotTables** — activate when you detect any of these signals:
- Explicit: "pivot table", "data pivot", "数据透视表" (Chinese for "pivot table")
- Implicit: roll up, grouped summary, category totals, segment analysis, distribution view, frequency split, total per category
- The dataset exceeds 50 rows with natural grouping dimensions
- Multi-dimensional cross-tabulation is needed
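The activation signals above can be turned into a rough heuristic. This is an illustrative sketch, not a rule from the skill. The keyword lists are copied from the bullets. Whether the data has "natural grouping dimensions" or needs cross-tabulation is left to the caller's judgment and passed in. `should_use_pivot` is a hypothetical name.

```python
EXPLICIT = ("pivot table", "data pivot", "数据透视表")
IMPLICIT = ("roll up", "grouped summary", "category totals", "segment analysis",
            "distribution view", "frequency split", "total per category")


def should_use_pivot(request, n_rows, grouping_columns, needs_crosstab=False):
    """Heuristic mirror of the PivotTable activation signals listed above."""
    text = request.lower()
    if any(keyword in text for keyword in EXPLICIT + IMPLICIT):
        return True
    # More than 50 rows plus at least one natural grouping dimension
    if n_rows > 50 and grouping_columns:
        return True
    return needs_crosstab
```

A `True` result is only a prompt to read `./pivot.md`. The decision gate there still applies.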
When a PivotTable is warranted:
1. Read `./pivot.md` cover-to-cover before doing anything
2. Follow the execution sequence documented there
3. Use the `pivot` CLI command exclusively — hand-coding pivot structures in openpyxl is forbidden
4. The pivot output is **read-only from this point forward**: loading it with openpyxl `load_workbook()` and saving it again will silently break internal XML references, producing a file Excel refuses to open
**Execution order is strict:** Complete all openpyxl-authored sheets (Cover, Summary, data tabs) first, then run `pivot` as the final write step. After `pivot` emits the file, do not modify that file again.
</phase_fabrication>
<phase_verification>
## Phase 4 — Certify the File
After every sheet has passed its individual audit, run the structural gate:
```bash
./scripts/MiniMaxXlsx check output.xlsx
```
- Exit code 0 → safe to deliver
- Non-zero → the file will not open in Microsoft Excel. Do NOT attempt incremental patches — regenerate the workbook from corrected code.
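A scripted delivery step can enforce this gate directly. Below is a minimal sketch that relies only on the documented exit-code contract. The `cli` parameter is a hypothetical convenience for a binary stored elsewhere. If the binary is missing, `subprocess.run` raises `FileNotFoundError`, which should also block delivery.

```python
import subprocess


def passes_check(path, cli="./scripts/MiniMaxXlsx"):
    """Run the structural gate and return True only on exit code 0."""
    result = subprocess.run([cli, "check", path], capture_output=True, text=True)
    if result.returncode != 0:
        # Non-zero: do not patch the file; regenerate it from corrected code
        print(result.stdout, result.stderr, sep="\n")
        return False
    return True
```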
</phase_verification>
<phase_release>
## Phase 5 — Delivery Checklist
Before handing the file to the user, confirm every item:
- [ ] At least one .xlsx file in the delivery
- [ ] Every sheet with headers also contains data rows — no empty tables
- [ ] No formula cell evaluates to null (if any do, verify the referenced cells hold values)
- [ ] Row and column dimensions are proportional — no extremely narrow columns paired with tall rows
- [ ] All computations use real data unless the user explicitly requested synthetic data
- [ ] Measurement units appear in column headers, not inline with cell values
- [ ] Theme matches the task domain: financial for fiscal work, verdant for ecology/education/humanities, dusk for technology/creative/scientific, grayscale for everything else
- [ ] External data includes provenance metadata (Provider + Reference Link) in the workbook
- [ ] Charts are real embedded objects, not "chart data" sheets with manual instructions
- [ ] PivotTables were built via the `pivot` CLI, not hand-coded in openpyxl
- [ ] Cross-table lookups use VLOOKUP/INDEX-MATCH formulas, not pandas `merge()`
- [ ] `check` returned exit code 0
- [ ] Chart overlaps have been resolved (if charts exist) — no overlapping bounding boxes
</phase_release>
</protocol>
<guardrails>
## Hard Constraints
**Zero-tolerance error tokens** — none of these may exist in the delivered file:
`#VALUE!`, `#DIV/0!`, `#REF!`, `#NAME?`, `#NULL!`, `#NUM!`, `#N/A`
**Additional banned outcomes:**
- Off-by-one cell references (wrong row, wrong column, or both)
- Text starting with `=` misinterpreted as a formula
- Hardcoded numbers where a formula should exist
- Filler strings — "TODO", "Not computed", "Needs manual input", "Awaiting data" or any similar stub text in a delivered cell
- Column headers missing units; mixed units within a calculation chain
- Monetary figures without currency symbols (¥/$)
- Any cell computing to 0 must be investigated — often a broken reference
**Off-by-one prevention:** Before each save, trace every formula's references back to the intended cells. Then run `refcheck`. Common errors: referencing header rows, wrong row/column offset. If a result is 0 or unexpected, verify references first.
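The "referencing header rows" mistake can be screened for mechanically before running `refcheck`. The sketch below is a hypothetical pre-save check and does not replace `refcheck`. It lists every A1-style range in a formula string whose rows include the header row. It ignores sheet prefixes, so pass only formulas that refer to the sheet whose header row you give.

```python
import re

# A1-style range with optional $ anchors, e.g. D3:D20 or $D$4:$D$20
RANGE = re.compile(r"\$?([A-Z]{1,3})\$?(\d+):\$?([A-Z]{1,3})\$?(\d+)")


def ranges_touching_header(formula, header_row):
    """Return every range in the formula whose row span includes header_row."""
    hits = []
    for m in RANGE.finditer(formula.upper()):
        first, last = int(m.group(2)), int(m.group(4))
        if min(first, last) <= header_row <= max(first, last):
            hits.append(m.group(0))
    return hits
```

With headers in row 3 and data from row 4, `=SUM(D3:D20)` is flagged and `=SUM($D$4:$D$20)` is not.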
**Monetary values:** Store at full precision (15000000, not 1.5M). Format for display via `"¥#,##0"`. Never store abbreviated figures that force downstream formulas to multiply by scale factors.
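The cell keeps `15000000` and the number format changes only what is displayed. In openpyxl, that format is set with `cell.number_format = "¥#,##0"`. To check expected on-screen values in tests, the stdlib sketch below approximates how Excel renders a number under `"¥#,##0"`. It is a hypothetical helper. It assumes Excel's round-half-away-from-zero display rounding and the default minus sign for a single-section format.

```python
from decimal import Decimal, ROUND_HALF_UP


def display_cny(value):
    """Approximate Excel's rendering of value under the number format "¥#,##0"."""
    rounded = Decimal(str(value)).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    sign = "-" if rounded < 0 else ""
    return f"{sign}¥{abs(rounded):,}"
```

The stored value is never abbreviated. Only the rendered string gains the symbol and separators.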
---
**Compatibility blocklist — the `check` command rejects these automatically:**
The following functions require Excel 365/2021+ or are Google Sheets exclusives. Excel 2019/2016 cannot evaluate them (affected cells typically show `#NAME?`), so workbooks that use them are not deliverable. Grouped by migration effort:
**Drop-in replacements available** (swap the function, keep the same cell structure):
| Blocked | Substitute |
|---------|-----------|
| `XLOOKUP()` | `INDEX()` + `MATCH()` |
| `XMATCH()` | `MATCH()` |
| `SORT()`, `SORTBY()` | Sort via Data ribbon or VBA |
| `SEQUENCE()` | `ROW()` arithmetic or manual fill |
| `RANDARRAY()` | `RAND()` with fill-down |
| `LET()` | Break into helper cells |
| `LAMBDA()` | Named ranges or VBA |
**Structural redesign required** (no drop-in replacement — rethink the approach):
| Blocked | Migration strategy |
|---------|-------------------|
| `FILTER()` | AutoFilter, or SUMIF/COUNTIF criteria ranges |
| `UNIQUE()` | Remove Duplicates, or COUNTIF-based dedup helper column |
| `TEXTSPLIT()` | `MID()` + `FIND()` chain |
| `VSTACK()`, `HSTACK()` | Manual range layout or helper columns |
| `TAKE()`, `DROP()` | `INDEX()` + `ROW()` offset slicing |
| `ARRAYFORMULA()` *(Google only)* | CSE arrays via Ctrl+Shift+Enter |
| `QUERY()` *(Google only)* | PivotTables or SUMIF/COUNTIF |
| `IMPORTRANGE()` *(Google only)* | Copy data into the workbook manually |
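Because openpyxl stores formulas as plain strings, blocked functions can be caught before the file is even saved. The sketch below is a hypothetical pre-flight scan and makes no claim about how `check` works internally. It matches the names from both tables as whole function calls, including the `_xlfn.`/`_xlws.` prefixes Excel uses for newer functions. It does not skip string literals inside formulas.

```python
import re

BLOCKED = ["XLOOKUP", "XMATCH", "SORT", "SORTBY", "SEQUENCE", "RANDARRAY", "LET",
           "LAMBDA", "FILTER", "UNIQUE", "TEXTSPLIT", "VSTACK", "HSTACK", "TAKE",
           "DROP", "ARRAYFORMULA", "QUERY", "IMPORTRANGE"]

# Whole-name call such as FILTER( or _xlfn.XLOOKUP(, not part of a longer name
PATTERN = re.compile(
    r"(?<![A-Za-z0-9_.])(?:_xlfn\.|_xlws\.)?(" + "|".join(BLOCKED) + r")\s*\(",
    re.IGNORECASE,
)


def blocked_functions(formula):
    """Return the sorted, de-duplicated blocked function names used in a formula."""
    return sorted({m.group(1).upper() for m in PATTERN.finditer(formula)})
```

Running it over every `cell.value` that starts with `=` gives a list of cells to migrate using the substitutes above.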
---
**Banned workflow patterns:**
- Building all sheets first, then running checks once at the end
- Ignoring `recalc` / `refcheck` findings and moving to the next sheet
- Delivering any file that failed `check`
- Creating "chart data" sheets with manual-insert instructions instead of real embedded charts
- Delivering files with overlapping charts without resolving the overlaps
</guardrails>

minimax-xlsx/_meta.json Normal file

@@ -0,0 +1,6 @@
{
"ownerId": "kn796gme8ra5magcj2xm9pk4gs82a06m",
"slug": "minimax-xlsx",
"version": "1.0.0",
"publishedAt": 1772859367560
}

minimax-xlsx/charts.md Normal file

@@ -0,0 +1,187 @@
---
name: charts
description: "Chart creation and verification guide for the minimax-xlsx skill. Read this document when the task requires embedded Excel charts or data visualizations."
---
**Path note**: Relative paths in this document (e.g., `./scripts/`) are anchored to the skill directory that contains this file.
<embedded_objects>
## Charts Must Be Real Embedded Objects
**Proactive stance on visualization:**
- If the user asks for charts or visuals, generate them immediately — don't wait for per-dataset instructions
- When a workbook has multiple data tables, each table should have at least one chart unless the user says otherwise
- If any dataset lacks a chart, explain why and confirm before shipping
**What you must NOT do:**
- Output a helper-only "chart dataset" tab and ask the user to insert charts manually
- Mark chart work complete while expecting end users to finish chart insertion
- Mark "Add visual charts" as completed without embedding actual chart objects
**What you must do:**
- Build embedded charts inside the .xlsx via openpyxl by default
- Standalone image exports (PNG/JPG) only when explicitly requested
</embedded_objects>
<creation_sequence>
**Mandatory sequence:**
```
1. Construct the workbook with openpyxl (data, styling)
2. Insert charts using openpyxl.chart classes
3. Save the file
4. Run ./scripts/MiniMaxXlsx chart -v to confirm charts have data and detect overlaps
5. If exit code is 1 → fix empty/malformed charts
6. If overlaps reported → reposition charts (see overlap fixing below)
```
</creation_sequence>
<code_samples>
**Imports:**
```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, LineChart, PieChart, Reference
from openpyxl.chart.label import DataLabelList
```
**Bar chart walkthrough:**
```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference
wb = Workbook()
ws = wb.active
rows = [
['Region', 'Revenue'],
['East', 480],
['West', 320],
['North', 560],
['South', 410],
]
for r in rows:
    ws.append(r)
ch = BarChart()
ch.type = "col"
ch.style = 10
ch.title = "Revenue by Region"
ch.y_axis.title = 'Revenue'
ch.x_axis.title = 'Region'
vals = Reference(ws, min_col=2, min_row=1, max_row=5)
cats = Reference(ws, min_col=1, min_row=2, max_row=5)
ch.add_data(vals, titles_from_data=True)
ch.set_categories(cats)
ch.shape = 4
ws.add_chart(ch, "E2")
wb.save('output.xlsx')
```
### Chart Type Selection
| Data Pattern | Chart Class | Key Config |
|---|---|---|
| Category comparison | `BarChart()` | `type="col"` (vertical) or `type="bar"` (horizontal) |
| Temporal trend | `LineChart()` | `style=10`, optional markers |
| Proportional split | `PieChart()` | No axes needed |
| Cumulative spread | `AreaChart()` | `grouping="standard"` |
### Line Chart Sample
```python
from openpyxl.chart import LineChart, Reference
# Assumes `ws` holds month labels in A2:A13 and two series (headers in B1:C1, values in B2:C13)
ch = LineChart()
ch.title = "Trend Analysis"
ch.style = 13
ch.y_axis.title = 'Value'
ch.x_axis.title = 'Month'
vals = Reference(ws, min_col=2, min_row=1, max_row=13, max_col=3)
ch.add_data(vals, titles_from_data=True)
cats = Reference(ws, min_col=1, min_row=2, max_row=13)
ch.set_categories(cats)
ws.add_chart(ch, "E2")
```
### Pie Chart Sample
```python
from openpyxl.chart import PieChart, Reference
# Assumes `ws` holds labels in A2:A5 and values in B2:B5, with the series header in B1
pie = PieChart()
pie.title = "Market Share"
vals = Reference(ws, min_col=2, min_row=1, max_row=5)
labels = Reference(ws, min_col=1, min_row=2, max_row=5)
pie.add_data(vals, titles_from_data=True)
pie.set_categories(labels)
ws.add_chart(pie, "E2")
```
</code_samples>
<post_check>
**Post-generation check (non-negotiable):**
```bash
./scripts/MiniMaxXlsx chart output.xlsx -v
```
Exit code 1 means broken charts, and they must be fixed. No rationalizations: if `chart` exits with code 1, the chart IS defective regardless of how its data was embedded.
</post_check>
<collision_handling>
### Overlap Detection and Resolution
`chart` automatically detects chart collisions on each sheet. When overlaps are reported, reposition charts before delivery.
**Overlap report fields**: `ChartA`, `ChartB`, `SheetName`, `RangeA`, `RangeB`, `OverlapRegion`, `OverlapPercentage`
**Repositioning guidelines:**
- **Vertical stacking** (preferred): Place charts below each other with **2 empty rows** between
- **Side-by-side**: When sheet width allows, place horizontally with **1 empty column** gap
- **Consistent sizing**: Keep charts on the same sheet at uniform dimensions (default: 10 columns wide x 15 rows tall)
- Use position data from `-v` output to calculate non-overlapping anchors
**Overlap fix example:**
```python
# chart reported: chart1 at E2:N17, chart2 at E15:N30 (overlap at E15:N17)
# Fix: stack vertically with 2-row gap
from openpyxl import load_workbook
wb = load_workbook('output.xlsx')
ws = wb['SheetName']
for i, chart in enumerate(ws._charts):  # _charts is openpyxl's internal chart list
    chart.anchor = f'E{2 + i * 17}'  # 15 rows height + 2 rows gap
wb.save('output.xlsx')
```
After repositioning, re-run `chart -v` to confirm zero overlaps.
**Theme-appropriate chart colors:**
- Grayscale: `2C2C2C`, `6B6B6B`, `1565C0`, `5B8DB8`
- Financial: `1B3A5C`, `2A6496`, `5B9BD5`, `8FBCD8`
**Chart type decision guide:**
| Data Scenario | Chart | Use Case |
|---|---|---|
| Temporal progression | Line | Time series |
| Category comparison | Column/Bar | Side-by-side metrics |
| Part-of-whole | Pie/Doughnut | Percentages (6 items max) |
| Data spread | Histogram | Distribution shape |
| Variable relationships | Scatter | Correlation analysis |
</collision_handling>

minimax-xlsx/pivot.md Normal file

@@ -0,0 +1,164 @@
---
name: pivot
description: "Operational playbook for building PivotTables with the MiniMaxXlsx CLI. Treat this as the source of truth before invoking the pivot subcommand."
---
# Pivot Operations Manual
Use this guide when a workbook needs grouped aggregation, cross-axis summaries, or interactive drilldown.
## 1) Decision Gate
Choose PivotTable mode when one or more conditions are true:
- The request explicitly asks for a pivot table
- The dataset is large enough that formula-only summaries become hard to maintain
- The user needs category-by-category totals, count splits, or two-dimensional breakdowns
- The output must support manual filtering and regrouping inside Excel
Do not force PivotTable mode for trivial one-line totals. Use formulas for simple, static math.
## 2) Input Readiness Contract
Before running any pivot command, confirm:
- Header row exists and every header is unique
- Source block has no merged cells
- No blank row breaks inside the data block
- Aggregation fields are numeric where required
- Workbook formulas already passed structural checks
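Most of this contract can be checked on the raw values before they are written to the source sheet. The sketch below is hypothetical and makes these assumptions: the block is a list of rows with headers first, row numbers are 1-based within the block (header = 1), and a trailing empty row is also reported. Merged cells are not visible in plain values, so that item still needs `info` or a manual check.

```python
def readiness_problems(rows, numeric_fields=()):
    """List violations of the pivot input readiness contract for a header + data block."""
    problems = []
    header, *data = rows
    names = [str(h).strip() if h is not None else "" for h in header]
    if "" in names:
        problems.append("blank header cell")
    duplicates = sorted({n for n in names if n and names.count(n) > 1})
    if duplicates:
        problems.append("duplicate headers: " + ", ".join(duplicates))
    for offset, row in enumerate(data, start=2):
        if all(v in (None, "") for v in row):
            problems.append(f"blank row at source row {offset}")
    for field in numeric_fields:
        if field not in names:
            problems.append(f"missing field: {field}")
            continue
        col = names.index(field)
        for offset, row in enumerate(data, start=2):
            value = row[col] if col < len(row) else None
            if value is not None and not isinstance(value, (int, float)):
                problems.append(f"non-numeric {field} at source row {offset}")
    return problems
```

An empty list means the block is ready for the preflight commands below.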
Recommended preflight sequence:
```bash
./scripts/MiniMaxXlsx refcheck working.xlsx
./scripts/MiniMaxXlsx info working.xlsx --pretty
```
`info` output is authoritative. Never guess sheet names or ranges manually.
## 3) Seven-Checkpoint Flow
Follow this exact flow to avoid broken files:
1. **Assemble base workbook** with openpyxl (cover, raw data, helper sheets)
2. **Save once** and run `refcheck`
3. **Inspect metadata** using `info --pretty`
4. **Draft pivot command** from inspected headers and ranges
5. **Run pivot as final write operation**
6. **Run structural validation** with `check`
7. **Deliver without reopening output in openpyxl**
Why checkpoint 7 matters: a second openpyxl save can repackage XML relationships and invalidate pivot internals.
## 4) Command Surface
### Required arguments
| Argument | Meaning | Example |
|---|---|---|
| `input.xlsx` | Source workbook to read | `working.xlsx` |
| `output.xlsx` | New workbook to generate | `deliverable.xlsx` |
| `--source` | Full source range with sheet prefix | `"RevenueLog!B3:H920"` |
| `--location` | Pivot anchor cell | `"PivotBoard!C4"` |
| `--values` | Metric + reducer list | `"NetAmount:sum,OrderNo:count"` |
### Optional arguments
| Argument | Meaning | Example |
|---|---|---|
| `--rows` | Row grouping fields | `"Region,Channel"` |
| `--cols` | Column grouping fields | `"Quarter"` |
| `--filters` | Page filters | `"Year,Owner"` |
| `--name` | Pivot object name | `"QuarterlyMix"` |
| `--style` | Theme (`monochrome` / `finance`) | `"monochrome"` |
| `--chart` | Companion chart (`bar` / `line` / `pie`) | `"line"` |
Supported reducers: `sum`, `count`, `avg`, `average`, `min`, `max`.
## 5) Parameter Assembly Pattern
Build parameters in this order to reduce mistakes:
1. `--location` (destination first)
2. `--values` (what to aggregate)
3. `--source` (where data comes from)
4. `--rows` / `--cols` / `--filters` (how to slice)
5. `--name` / `--style` / `--chart` (presentation)
This ordering is intentional: start from reporting target, then metric intent, then data origin.
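The assembly order can be encoded once so every invocation follows it. Below is a minimal sketch. It uses only the flags and reducers documented in section 4, and `pivot_argv` is a hypothetical name. Passing the list straight to `subprocess.run` avoids shell quoting problems with the `!` in sheet-prefixed ranges.

```python
REDUCERS = {"sum", "count", "avg", "average", "min", "max"}


def pivot_argv(src, dst, location, values, source, rows=None, cols=None,
               filters=None, name=None, style=None, chart=None,
               cli="./scripts/MiniMaxXlsx"):
    """Build a pivot command: destination, metrics, source, slicing, presentation."""
    for item in values.split(","):
        field, _, reducer = item.strip().rpartition(":")
        if not field or reducer not in REDUCERS:
            raise ValueError(f"bad --values item: {item!r}")
    argv = [cli, "pivot", src, dst,
            "--location", location, "--values", values, "--source", source]
    for flag, value in (("--rows", rows), ("--cols", cols), ("--filters", filters),
                        ("--name", name), ("--style", style), ("--chart", chart)):
        if value:
            argv += [flag, value]
    return argv
```

Usage: `subprocess.run(pivot_argv(...), check=True)` as the final write step, then `check` on the output.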
## 6) Example Set
### Scenario A: Operations latency rollup
```bash
./scripts/MiniMaxXlsx pivot \
ops_raw.xlsx ops_pivot.xlsx \
--location "OpsPivot!B5" \
--values "LatencyMs:avg,RequestId:count" \
--source "ApiEvents!A1:G1800" \
--rows "Service,Cluster" \
--filters "ReleaseTag" \
--name "LatencyOverview" \
--style "monochrome" \
--chart "line"
```
### Scenario B: Clinic visit mix by month
```bash
./scripts/MiniMaxXlsx pivot \
clinic_daily.xlsx clinic_report.xlsx \
--location "VisitSummary!A4" \
--values "VisitFee:sum,VisitId:count" \
--source "VisitLog!A1:F2400" \
--rows "Department" \
--cols "VisitMonth" \
--name "DeptVisitMix" \
--style "finance" \
--chart "bar"
```
### Scenario C: Warehouse damage composition
```bash
./scripts/MiniMaxXlsx pivot \
warehouse_events.xlsx warehouse_dashboard.xlsx \
--location "LossShare!D3" \
--values "LossCost:sum" \
--source "DamageRecords!A1:E460" \
--rows "LossType" \
--filters "Warehouse" \
--name "LossStructure" \
--chart "pie"
```
## 7) Validation and Release Rule
Run:
```bash
./scripts/MiniMaxXlsx check deliverable.xlsx
```
- Exit code `0`: release candidate
- Non-zero: do not patch the xlsx in place; regenerate from corrected source flow
## 8) Failure Playbook
| Symptom | Likely Cause | Action |
|---|---|---|
| Pivot shows no records | Source range clipped | Re-run `info`, expand `--source` to full block |
| "Field not found" | Header mismatch or typo | Copy header text directly from `info` output |
| Validation fails on pivot nodes | Damaged pivot relationships | Rebuild from base workbook, run pivot once as final step |
| CLI execution fails unexpectedly | Workbook locked by another app | Close Excel/WPS process and retry |
## 9) Hard Prohibitions
- Do not manually construct pivot XML
- Do not run pivot before all openpyxl sheet edits are complete
- Do not open and save pivot output with openpyxl
- Do not deliver files that fail `check`
If any prohibition is violated, regenerate the workbook end-to-end.


@@ -0,0 +1,171 @@
#!/usr/bin/env python3
"""
Excel Formula Recalculation Script
Recalculates all formulas in an Excel file using LibreOffice
"""
import json
import sys
import subprocess
import os
import platform
from pathlib import Path
from openpyxl import load_workbook


def setup_libreoffice_macro():
    """Setup LibreOffice macro for recalculation if not already configured"""
    if platform.system() == "Darwin":
        macro_dir = os.path.expanduser("~/Library/Application Support/LibreOffice/4/user/basic/Standard")
    else:
        macro_dir = os.path.expanduser("~/.config/libreoffice/4/user/basic/Standard")
    macro_file = os.path.join(macro_dir, "Module1.xba")
    # Reuse an existing macro module if it already defines RecalculateAndSave
    if os.path.exists(macro_file):
        with open(macro_file, "r") as f:
            if "RecalculateAndSave" in f.read():
                return True
    if not os.path.exists(macro_dir):
        # Start LibreOffice once so it creates its user profile directories
        subprocess.run(["soffice", "--headless", "--terminate_after_init"], capture_output=True, timeout=10)
        os.makedirs(macro_dir, exist_ok=True)
    macro_content = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE script:module PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "module.dtd">
<script:module xmlns:script="http://openoffice.org/2000/script" script:name="Module1" script:language="StarBasic">
Sub RecalculateAndSave()
ThisComponent.calculateAll()
ThisComponent.store()
ThisComponent.close(True)
End Sub
</script:module>"""
    try:
        with open(macro_file, "w") as f:
            f.write(macro_content)
        return True
    except Exception:
        return False


def recalc(filename, timeout=30):
    """
    Recalculate formulas in Excel file and report any errors
    Args:
        filename: Path to Excel file
        timeout: Maximum time to wait for recalculation (seconds)
    Returns:
        dict with error locations and counts
    """
    if not Path(filename).exists():
        return {"error": f"File {filename} does not exist"}
    abs_path = str(Path(filename).absolute())
    if not setup_libreoffice_macro():
        return {"error": "Failed to setup LibreOffice macro"}
    cmd = [
        "soffice",
        "--headless",
        "--norestore",
        "vnd.sun.star.script:Standard.Module1.RecalculateAndSave?language=Basic&location=application",
        abs_path,
    ]
    # Handle timeout command differences between Linux and macOS
    if platform.system() != "Windows":
        timeout_cmd = "timeout" if platform.system() == "Linux" else None
        if platform.system() == "Darwin":
            # Check if gtimeout is available on macOS
            try:
                subprocess.run(["gtimeout", "--version"], capture_output=True, timeout=1, check=False)
                timeout_cmd = "gtimeout"
            except (FileNotFoundError, subprocess.TimeoutExpired):
                pass
        if timeout_cmd:
            cmd = [timeout_cmd, str(timeout)] + cmd
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0 and result.returncode != 124:  # 124 is timeout exit code
        error_msg = result.stderr or "Unknown error during recalculation"
        # Report a macro problem only when the error actually mentions the macro
        if "Module1" in error_msg or "RecalculateAndSave" in error_msg:
            return {"error": "LibreOffice macro not configured properly"}
        else:
            return {"error": error_msg}
    # Check for Excel errors in the recalculated file - scan ALL cells
    try:
        wb = load_workbook(filename, data_only=True)
        excel_errors = ["#VALUE!", "#DIV/0!", "#REF!", "#NAME?", "#NULL!", "#NUM!", "#N/A"]
        error_details = {err: [] for err in excel_errors}
        total_errors = 0
        for sheet_name in wb.sheetnames:
            ws = wb[sheet_name]
            # Check ALL rows and columns - no limits
            for row in ws.iter_rows():
                for cell in row:
                    if cell.value is not None and isinstance(cell.value, str):
                        for err in excel_errors:
                            if err in cell.value:
                                location = f"{sheet_name}!{cell.coordinate}"
                                error_details[err].append(location)
                                total_errors += 1
                                break
        wb.close()
        # Build result summary
        result = {"status": "success" if total_errors == 0 else "errors_found", "total_errors": total_errors, "error_summary": {}}
        # Add non-empty error categories
        for err_type, locations in error_details.items():
            if locations:
                result["error_summary"][err_type] = {
                    "count": len(locations),
                    "locations": locations[:20],  # Show up to 20 locations
                }
        # Add formula count for context - also check ALL cells
        wb_formulas = load_workbook(filename, data_only=False)
        formula_count = 0
        for sheet_name in wb_formulas.sheetnames:
            ws = wb_formulas[sheet_name]
            for row in ws.iter_rows():
                for cell in row:
                    if cell.value and isinstance(cell.value, str) and cell.value.startswith("="):
                        formula_count += 1
        wb_formulas.close()
        result["total_formulas"] = formula_count
        return result
    except Exception as e:
        return {"error": str(e)}


def main():
    if len(sys.argv) < 2:
        print("Usage: python recalc.py <excel_file> [timeout_seconds]")
        print("\nRecalculates all formulas in an Excel file using LibreOffice")
        print("\nReturns JSON with error details:")
        print(" - status: 'success' or 'errors_found'")
        print(" - total_errors: Total number of Excel errors found")
        print(" - total_formulas: Number of formulas in the file")
        print(" - error_summary: Breakdown by error type with locations")
        print(" - #VALUE!, #DIV/0!, #REF!, #NAME?, #NULL!, #NUM!, #N/A")
        sys.exit(1)
    filename = sys.argv[1]
    timeout = int(sys.argv[2]) if len(sys.argv) > 2 else 30
    result = recalc(filename, timeout)
    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()

minimax-xlsx/styling.md

@@ -0,0 +1,270 @@
---
name: styling
description: "Visual styling reference for the minimax-xlsx skill. Contains theme palettes (grayscale/financial/verdant/dusk), conditional formatting recipes, and cover page layout specifications. Read this before writing openpyxl styling code."
---
<neutral_palette>
## Grayscale Theme (Standard Default)
### Color Discipline (Strictly Enforced)
**Foundation tones (only these three):**
- **White (#FEFEFE)** — backgrounds, data regions
- **Black (#1A1A1A)** — body text, primary headers
- **Grey (multiple shades)** — structural elements, borders, secondary labels
**Sole accent: Blue**
- For any emphasis, differentiation, or callout, use **blue** at varying intensity
- No green, red, orange, purple, or other hues (exception: region-specific financial indicators)
### Absolute Restrictions
- Avoid extra hue families (green/red/orange/purple/yellow/pink) unless a market-specific finance convention explicitly requires them
- No rainbow or multi-hue schemes
- No saturated/vibrant tones except blue accents
- No gradients crossing multiple color families
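The color discipline above can also be checked mechanically before a workbook ships. The sketch below is a minimal illustration, assuming a color counts as neutral when its HSV saturation is at most 0.12 and as blue when its hue falls between 190 and 250 degrees. Both thresholds and the helper names are hypothetical, not part of this skill, and the check does not model the regional finance exception.

```python
import colorsys

# Hypothetical thresholds; this skill does not define numeric limits.
MAX_NEUTRAL_SATURATION = 0.12    # at or below this, a color reads as white/grey/black
BLUE_HUE_RANGE = (190.0, 250.0)  # hue band (degrees) treated as the blue accent


def hex_to_hsv(hex_color: str) -> tuple[float, float, float]:
    """Convert 'RRGGBB' (optionally '#'-prefixed) to (hue in degrees, saturation, value)."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360.0, s, v


def is_allowed_grayscale_color(hex_color: str) -> bool:
    """True if the color is a neutral tone or sits in the blue accent band."""
    hue, sat, _ = hex_to_hsv(hex_color)
    if sat <= MAX_NEUTRAL_SATURATION:
        return True
    return BLUE_HUE_RANGE[0] <= hue <= BLUE_HUE_RANGE[1]
```

For example, `is_allowed_grayscale_color("5B8DB8")` accepts the mid blue accent, while a saturated red such as `E53935` is rejected.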
### Implementation Palette
```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill, Font, Border, Side, Alignment
# Foundation tones
tone_bg = "FEFEFE"
tone_subtle = "F2F3F4"
tone_stripe = "F6F7F8"
tone_primary = "1A1A1A"
tone_header = "2C2C2C"
tone_text = "1A1A1A"
tone_rule = "CBCBCB"
# Blue accent spectrum
accent_deep = "1565C0"
accent_mid = "5B8DB8"
accent_wash = "E3EDF7"
# ws is the worksheet being styled; a blank one is created here for illustration
wb = Workbook()
ws = wb.active
ws.sheet_view.showGridLines = False
hdr_fill = PatternFill(start_color=tone_header, end_color=tone_header, fill_type="solid")
hdr_font = Font(color="FEFEFE", bold=True)
# Style the header row B2:F2
for cell in ws['B2:F2'][0]:
    cell.fill = hdr_fill
    cell.font = hdr_font
```
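The palette defines `tone_stripe` and `tone_rule`, but the snippet above only styles the header row. A minimal sketch of applying them to the data body, assuming a data block at B3:F12 and a hypothetical helper `style_body_rows`: alternate rows get the stripe tone, and every row gets a thin bottom rule in the border grey.

```python
from openpyxl import Workbook
from openpyxl.styles import Border, Font, PatternFill, Side

tone_bg = "FEFEFE"
tone_stripe = "F6F7F8"
tone_text = "1A1A1A"
tone_rule = "CBCBCB"


def style_body_rows(ws, first_row, last_row, first_col, last_col):
    """Hypothetical helper: zebra-stripe a data block and add light row rules."""
    bg_fill = PatternFill(start_color=tone_bg, end_color=tone_bg, fill_type="solid")
    stripe_fill = PatternFill(start_color=tone_stripe, end_color=tone_stripe, fill_type="solid")
    body_font = Font(color=tone_text)
    rule = Border(bottom=Side(style="thin", color=tone_rule))
    rows = ws.iter_rows(min_row=first_row, max_row=last_row, min_col=first_col, max_col=last_col)
    for offset, row in enumerate(rows):
        # Rows at odd offsets (the 2nd, 4th, ... data rows) get the stripe tone.
        fill = stripe_fill if offset % 2 else bg_fill
        for cell in row:
            cell.fill = fill
            cell.font = body_font
            cell.border = rule


wb = Workbook()
ws = wb.active
style_body_rows(ws, first_row=3, last_row=12, first_col=2, last_col=6)  # B3:F12
```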
</neutral_palette>
<fiscal_palette>
## Financial Theme (Monetary/Fiscal Tasks Only)
Activate this palette when the task involves: equities, GDP, compensation, revenue, margins, budgeting, ROI, government finance, or similar fiscal domains.
### Regional Price-Movement Colors (non-negotiable)
In mainland China markets, rising prices are conventionally shown in **red** and falling prices in **green**. For all other markets this convention is reversed: **green** for gains, **red** for losses.
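One way to follow this convention is to color price changes with conditional formatting whose rise and fall colors depend on the market. A minimal sketch, assuming the range holds price changes where a positive value means a rise. `movement_colors` and `add_price_movement_rules` are hypothetical names; the red is `fin_loss` from the implementation palette, and the green is the Financial "Positive" color from the conditional-formatting palette list.

```python
from openpyxl import Workbook
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import Font

RED = "E53935"    # fin_loss in the Financial implementation palette
GREEN = "81C784"  # Financial "Positive" color from the conditional-formatting list


def movement_colors(mainland_china: bool) -> tuple[str, str]:
    """Return (rise_color, fall_color) for the target market's convention."""
    return (RED, GREEN) if mainland_china else (GREEN, RED)


def add_price_movement_rules(ws, cell_range: str, mainland_china: bool) -> None:
    """Color positive values as rises and negative values as falls."""
    rise, fall = movement_colors(mainland_china)
    ws.conditional_formatting.add(
        cell_range, CellIsRule(operator="greaterThan", formula=["0"], font=Font(color=rise))
    )
    ws.conditional_formatting.add(
        cell_range, CellIsRule(operator="lessThan", formula=["0"], font=Font(color=fall))
    )


wb = Workbook()
ws = wb.active
add_price_movement_rules(ws, "E3:E200", mainland_china=True)
```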
### Implementation Palette
```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill, Font, Border, Side, Alignment
fin_bg = "E8EEF2"
fin_text = "1A1A1A"
fin_accent = "FFF8E1"
fin_header = "1B3A5C"
fin_loss = "E53935"
# ws is the worksheet being styled; a blank one is created here for illustration
wb = Workbook()
ws = wb.active
ws.sheet_view.showGridLines = False
fh_fill = PatternFill(start_color=fin_header, end_color=fin_header, fill_type="solid")
fh_font = Font(color="FEFEFE", bold=True)
fh_mark = PatternFill(start_color=fin_accent, end_color=fin_accent, fill_type="solid")
# Style the header row B2:F2
for cell in ws['B2:F2'][0]:
    cell.fill = fh_fill
    cell.font = fh_font
```
</fiscal_palette>
<verdant_palette>
## Verdant Theme (Ecology / Education / Humanities)
Activate this palette when the task involves: environmental analysis, education metrics, agriculture, healthcare, sustainability reporting, life sciences, or general research that benefits from a warm organic tone.
### Color Discipline
**Foundation tones:**
- **Mist white (#F0F5F1)** — backgrounds, data regions
- **Forest dark (#1A2E22)** — body text, primary headers
- **Sage grey (multiple shades)** — structural elements, borders, secondary labels
**Sole accent: Gold**
- For emphasis, differentiation, or callouts, use **warm gold** at varying intensity
- No blue, red, purple, or other hues
### Implementation Palette
```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill, Font, Border, Side, Alignment
# Foundation tones
vrd_bg = "F0F5F1"
vrd_subtle = "E8F0EA"
vrd_stripe = "EDF2EE"
vrd_primary = "1A2E22"
vrd_header = "1B4332"
vrd_text = "1A2E22"
vrd_rule = "B5C7B9"
# Gold accent spectrum
vrd_accent_deep = "9E7C20"
vrd_accent_mid = "C9A84C"
vrd_accent_wash = "F5F0DC"
# ws is the worksheet being styled; a blank one is created here for illustration
wb = Workbook()
ws = wb.active
ws.sheet_view.showGridLines = False
vh_fill = PatternFill(start_color=vrd_header, end_color=vrd_header, fill_type="solid")
vh_font = Font(color="F0F5F1", bold=True)
vh_mark = PatternFill(start_color=vrd_accent_wash, end_color=vrd_accent_wash, fill_type="solid")
# Style the header row B2:F2
for cell in ws['B2:F2'][0]:
    cell.fill = vh_fill
    cell.font = vh_font
```
</verdant_palette>
<dusk_palette>
## Dusk Theme (Technology / Creative / Scientific)
Activate this palette when the task involves: technology metrics, product analytics, engineering reports, creative industry analysis, scientific data, or presentation-grade deliverables that need a modern aesthetic.
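The activation rules for the three specialized palettes can be approximated with keyword matching, falling back to the Grayscale default. A minimal sketch: the keyword stems paraphrase each palette's activation sentence, and both the stems and the precedence order (financial, then verdant, then dusk) are assumptions rather than rules stated by this skill. Judgment about the task should still override a keyword hit.

```python
import re

# Hypothetical keyword stems paraphrasing each palette's activation sentence.
# Order matters: the first theme with a matching stem wins.
THEME_KEYWORDS = [
    ("financial", ["equit", "gdp", "compensation", "revenue", "margin", "budget", "roi",
                   "government finance", "fiscal"]),
    ("verdant", ["environment", "education", "agricultur", "healthcare", "sustainab",
                 "life science"]),
    ("dusk", ["technology", "product analytics", "engineering", "creative", "scientific"]),
]


def pick_theme(task_description: str) -> str:
    """Return the first theme whose stems appear in the description, else 'grayscale'."""
    text = task_description.lower()
    for theme, stems in THEME_KEYWORDS:
        for stem in stems:
            # \b anchors the stem at a word start, so "budget" also matches "budgeting".
            if re.search(r"\b" + re.escape(stem), text):
                return theme
    return "grayscale"
```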
### Color Discipline
**Foundation tones:**
- **Soft lavender (#F7F3FA)** — backgrounds, data regions
- **Dark grape (#221429)** — body text, primary headers
- **Iris grey (multiple shades)** — structural elements, borders, secondary labels
**Sole accent: Copper**
- For emphasis, differentiation, or callouts, use **warm copper** at varying intensity
- No blue, green, or other hues
### Implementation Palette
```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill, Font, Border, Side, Alignment
# Foundation tones
dsk_bg = "F7F3FA"
dsk_subtle = "F0ECF5"
dsk_stripe = "F3F0F7"
dsk_primary = "221429"
dsk_header = "3C1742"
dsk_text = "221429"
dsk_rule = "C4B8CE"
# Copper accent spectrum
dsk_accent_deep = "A0522D"
dsk_accent_mid = "C4724A"
dsk_accent_wash = "FAF0EB"
# ws is the worksheet being styled; a blank one is created here for illustration
wb = Workbook()
ws = wb.active
ws.sheet_view.showGridLines = False
dh_fill = PatternFill(start_color=dsk_header, end_color=dsk_header, fill_type="solid")
dh_font = Font(color="F7F3FA", bold=True)
dh_mark = PatternFill(start_color=dsk_accent_wash, end_color=dsk_accent_wash, fill_type="solid")
# Style the header row B2:F2
for cell in ws['B2:F2'][0]:
    cell.fill = dh_fill
    cell.font = dh_font
```
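The four palettes differ only in their hex values, so the header styling repeated in each snippet can be driven from a single lookup table. A minimal sketch; `THEME_HEADERS` and `style_header` are hypothetical names, and the colors are copied from the four implementation palettes in this file.

```python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill

# Header fill and header font colors copied from the four implementation palettes.
THEME_HEADERS = {
    "grayscale": {"fill": "2C2C2C", "font": "FEFEFE"},
    "financial": {"fill": "1B3A5C", "font": "FEFEFE"},
    "verdant": {"fill": "1B4332", "font": "F0F5F1"},
    "dusk": {"fill": "3C1742", "font": "F7F3FA"},
}


def style_header(ws, cell_range: str, theme: str = "grayscale") -> None:
    """Hypothetical helper: apply a theme's header fill and bold font to a range."""
    colors = THEME_HEADERS[theme]
    fill = PatternFill(start_color=colors["fill"], end_color=colors["fill"], fill_type="solid")
    font = Font(color=colors["font"], bold=True)
    ws.sheet_view.showGridLines = False
    for row in ws[cell_range]:  # a range lookup yields a tuple of row tuples
        for cell in row:
            cell.fill = fill
            cell.font = font


wb = Workbook()
ws = wb.active
style_header(ws, "B2:F2", theme="dusk")
```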
</dusk_palette>
<conditional_rules>
## Conditional Formatting — Apply Proactively
Apply conditional formatting deliberately to improve scanability and analytical readability.
| Content Type | Technique | Sample Code |
|---|---|---|
| Raw numbers | **Data Bars** | `DataBarRule(start_type='min', end_type='max', color='5B8DB8', showValue=True)` |
| Spread/range | **Color Scales** | `ColorScaleRule(start_type='min', start_color='FEFEFE', end_type='max', end_color='5B8DB8')` |
| Status indicators | **Icon Sets** | `IconSetRule(icon_style='3Arrows', type='percent', values=[0,25,75])` |
| Boundary triggers | **Cell Highlights** | `CellIsRule(operator='greaterThan', formula=['50000'], fill=accent_fill)` |
| Top performers | **Rank-based** | `FormulaRule(formula=['RANK(A2,$A$2:$A$100)<=10'], fill=gold_fill)` |
**Available icon styles**: `3Arrows` (directional arrows), `3TrafficLights1` (circle indicators), `3Symbols` (circled check/exclamation/cross), `5Rating` (signal-strength bars)
**Theme-specific palettes:**
- Grayscale: Data bars `5B8DB8`, Scale `F2F3F4->ABABAB->2C2C2C`
- Financial: Positive `81C784`, Negative `E57373`, Neutral `FFD54F`
- Verdant: Data bars `C9A84C`, Scale `F0F5F1->8BAF7E->1B4332`
- Dusk: Data bars `C4724A`, Scale `F7F3FA->9E7CAD->3C1742`
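The three-stop scales listed above translate directly into `ColorScaleRule` objects. A minimal sketch; `THEME_SCALES` and `theme_color_scale` are hypothetical names. The Financial entry is left out because it lists semantic colors rather than a low-mid-high scale, and the 50th-percentile midpoint mirrors the tri-color example that follows.

```python
from openpyxl import Workbook
from openpyxl.formatting.rule import ColorScaleRule

# Low -> mid -> high stops copied from the theme-specific palette list.
THEME_SCALES = {
    "grayscale": ("F2F3F4", "ABABAB", "2C2C2C"),
    "verdant": ("F0F5F1", "8BAF7E", "1B4332"),
    "dusk": ("F7F3FA", "9E7CAD", "3C1742"),
}


def theme_color_scale(theme: str) -> ColorScaleRule:
    """Build a min / 50th-percentile / max color scale for the given theme."""
    low, mid, high = THEME_SCALES[theme]
    return ColorScaleRule(
        start_type="min", start_color=low,
        mid_type="percentile", mid_value=50, mid_color=mid,
        end_type="max", end_color=high,
    )


wb = Workbook()
ws = wb.active
ws.conditional_formatting.add("E3:E200", theme_color_scale("verdant"))
```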
```python
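from openpyxl import Workbook
from openpyxl.formatting.rule import DataBarRule, ColorScaleRule, IconSetRule, CellIsRule
# ws is the worksheet being formatted; a blank one is created here for illustration
wb = Workbook()
ws = wb.active
# Horizontal bars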
ws.conditional_formatting.add('D3:D200', DataBarRule(start_type='min', end_type='max', color='5B8DB8', showValue=True))
# Tri-color gradient
ws.conditional_formatting.add('E3:E200', ColorScaleRule(start_type='min', start_color='E57373', mid_type='percentile', mid_value=50, mid_color='FFD54F', end_type='max', end_color='81C784'))
# Directional arrows
ws.conditional_formatting.add('F3:F200', IconSetRule(icon_style='3Arrows', type='percent', values=[0, 25, 75], showValue=True))
```
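The threshold and rank-based rows of the table are not covered by the snippet above. A minimal sketch using the Grayscale theme's blue accents in place of the table's undefined `accent_fill` and `gold_fill` (that substitution is an assumption). In a `FormulaRule`, a relative reference such as `D3` is evaluated against the top-left cell of the target range and shifts row by row, while the absolute ranking range `$D$3:$D$200` stays fixed.

```python
from openpyxl import Workbook
from openpyxl.formatting.rule import CellIsRule, FormulaRule
from openpyxl.styles import Font, PatternFill

accent_wash = "E3EDF7"  # Grayscale theme: light blue accent
accent_deep = "1565C0"  # Grayscale theme: deep blue accent
accent_fill = PatternFill(start_color=accent_wash, end_color=accent_wash, fill_type="solid")

wb = Workbook()
ws = wb.active

# Boundary trigger: shade any value above 50,000.
ws.conditional_formatting.add(
    "D3:D200",
    CellIsRule(operator="greaterThan", formula=["50000"], fill=accent_fill),
)

# Top performers: D3 is relative (re-evaluated per row), the ranking range is absolute.
ws.conditional_formatting.add(
    "D3:D200",
    FormulaRule(formula=["RANK(D3,$D$3:$D$200)<=10"], font=Font(color=accent_deep, bold=True)),
)
```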
**Usage tips**: Apply to 2-4 key columns per sheet; maintain consistent color semantics; layer Data Bars + Icons for maximum impact.
</conditional_rules>
<cover_layout>
**A cover sheet is mandatory as the very first worksheet in every deliverable.**
## Layout Specification
| Rows | Purpose | Formatting |
|------|---------|------------|
| 3-4 | **Document title** | 18-20pt, bold, center-aligned |
| 6 | Tagline or scope description | 12pt, grey text |
| 8-16 | **Headline metrics** | Tabular layout with key figures highlighted |
| 18-21 | **Worksheet directory** | Sheet names mapped to brief descriptions |
| 23+ | Disclaimers, usage notes | Small font, grey |
## Required Elements
**1. Document title** — clear, descriptive name for the workbook
**2. Headline metrics** — 3-6 most significant numbers or findings
**3. Worksheet directory** — navigation aid:
```
| Sheet Name | Description |
|------------|-------------|
| Raw Data | Original dataset (100 rows) |
| Analysis | Sales breakdown by region |
| Pivot Summary | Interactive pivot analysis |
```
**4. PivotTable notice** (required when the workbook includes PivotTables):
```
After opening, update the PivotTable cache:
* On Windows: select any cell inside the PivotTable, press Alt+F5
* On macOS: go to the PivotTable Analyze ribbon, click Refresh All
* Refresh every PivotTable at once on Windows: Ctrl+Alt+F5
```
## Cover Page Visual Standards
- **Background**: White or light grey (#F2F3F4)
- **Title row height**: 30-40pt for prominence
- **No gridlines**: Suppress gridlines on cover for a clean presentation
- **Column span**: Merge cells A-G for the title block
- **Color scheme**: Match the workbook's chosen theme (grayscale, financial, verdant, or dusk)
## Gridline Note
Always keep gridlines hidden on the cover sheet.
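A minimal sketch of the layout above with openpyxl. `build_cover`, its arguments (lists of `(label, value)` and `(sheet, description)` tuples), the grey hex `7F7F7F`, and the choice to push notes below row 23 when the directory outgrows rows 19-21 are all assumptions; the 20pt title and 36pt row height are picked from the stated ranges.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment, Font

GREY = "7F7F7F"  # hypothetical grey for secondary text; the spec only says "grey"


def build_cover(wb, title, tagline, metrics, directory, notes=()):
    """Hypothetical helper: insert a 'Cover' sheet as the first worksheet."""
    ws = wb.create_sheet("Cover", 0)
    ws.sheet_view.showGridLines = False

    # Rows 3-4: document title merged across columns A-G.
    ws.merge_cells("A3:G4")
    ws["A3"] = title
    ws["A3"].font = Font(size=20, bold=True)
    ws["A3"].alignment = Alignment(horizontal="center", vertical="center")
    ws.row_dimensions[3].height = 36

    # Row 6: tagline or scope description.
    ws["A6"] = tagline
    ws["A6"].font = Font(size=12, color=GREY)

    # Rows 8-16: headline metrics, label in column B and bold value in column C.
    for row, (label, value) in enumerate(metrics[:9], start=8):
        ws.cell(row=row, column=2, value=label)
        ws.cell(row=row, column=3, value=value).font = Font(bold=True)

    # Row 18 header, then one directory row per worksheet.
    ws.cell(row=18, column=2, value="Sheet Name").font = Font(bold=True)
    ws.cell(row=18, column=3, value="Description").font = Font(bold=True)
    last_row = 18
    for last_row, (name, description) in enumerate(directory, start=19):
        ws.cell(row=last_row, column=2, value=name)
        ws.cell(row=last_row, column=3, value=description)

    # Rows 23+: disclaimers and notes, moved down if the directory runs past row 21.
    first_note_row = max(23, last_row + 2)
    for offset, note in enumerate(notes):
        ws.cell(row=first_note_row + offset, column=2, value=note).font = Font(size=9, color=GREY)
    return ws


wb = Workbook()
wb.active.title = "Raw Data"
wb.create_sheet("Analysis")
build_cover(
    wb,
    title="Regional Sales Review",
    tagline="Sales breakdown by region",
    metrics=[("Total revenue", 0), ("Regions covered", 0)],  # placeholder values
    directory=[("Raw Data", "Original dataset"), ("Analysis", "Sales breakdown by region")],
    notes=["Figures are placeholders."],
)
```

The metric values in the usage lines are placeholders; in a real deliverable they would reference the analysis sheets.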
</cover_layout>