Pi for Excel

Architecture

A codebase snapshot at v0.9.0-pre — covering tools and extensions, interface, model pipeline, session lifecycle, safety, and testing.

Generated by Pi using nicobailon/visual-explainer · Feb 2026 · a089f86

Overview

Pi for Excel is an AI agent that lives inside Excel's taskpane. It reads, writes, formats, and analyzes spreadsheets through natural conversation — backed by any major LLM provider. 24 built-in tools, a sandboxed extension runtime, and bridges to Python, tmux, and MCP servers make it a general-purpose automation surface for spreadsheets.

Agent
LLM Providers (5): OAuth · stream proxy · prefix caching
Excel: Office.js · WorkbookCoordinator
Extensions: Sandboxed · 18 capabilities
Bridges: Python · tmux · MCP · web
User: Sidebar taskpane · 350px
Capable
24 tools in 5 categories — from cell reads to Python scripts, terminal sessions, and MCP servers. Plus 28+ slash commands.
Extensible
18-capability extension runtime with trust tiers. Self-authoring loop — Pi can build and install extensions from chat.
Safe
Every mutation snapshotted. 6 defense boundaries — SSRF, markdown, bridge, sandbox, HTML, secret redaction. Undo the undo.

This page covers what Pi can do (Capabilities, Interface), then how it works under the hood (Model Layer, Lifecycle, Safety & Recovery, Testing).

Capabilities

Default Tools

24 tools in 5 categories — from cell reads to terminal sessions and MCP servers.

Read & Inspect (5 tools)
get_workbook_overview: Blueprint (sheets, tables, named ranges, objects) + per-sheet detail mode
read_range: 3 modes (compact / csv / detailed). Comments in detailed, format humanization
search_workbook: Text / formula / regex search, pagination, context rows
trace_dependencies: Precedent / dependent tracing, depth 5, 50K cell scan
explain_formula: Plain-language explanation, load direct references
Write & Transform (3 tools)
write_cells: Overwrite protection, formula validation, auto-verify, recovery snapshot
fill_formula: AutoFill single formula across range, error detection
modify_structure: 10 actions, including insert / delete rows / cols / sheets, rename, move
Format & Annotate (4 tools)
format_cells: Named styles, conventions, multi-range, borders, merge
conditional_format: 8 rule types (cell_value, formula, text, top_bottom, preset, data_bar, color_scale, icon_set)
view_settings: 15 actions, including gridlines, freeze, tab color, hide / show sheets
comments: 7 actions (read / add / update / reply / delete / resolve / reopen)
Session & Config (4 tools)
workbook_history: List / restore / delete recovery snapshots
instructions: User / workbook scope rules management
skills: List / read skills, session-scoped read cache
conventions: 6 built-in formatting presets, custom presets
Bridges & Integrations (8 tools)
execute_office_js: Direct Office.js eval via blob URL module; Excel.run() banned (context auto-provided). 20K code, 8K result. Requires user approval.
python_run: Execute Python scripts. Native bridge preferred → Pyodide fallback.
python_transform_range: Read range → Python transform → write back. Same bridge stack.
libreoffice_convert: File format conversion. Bridge-only (no Pyodide fallback).
tmux: Full terminal sessions via local bridge (port 3341). 6 actions with dynamic timeout: run shell commands, interactive REPLs, background processes.
web_search: 5 providers (Jina zero-config fallback, Serper, Tavily, Brave, Firecrawl). Domain rate limiting.
fetch_page: DOMParser → markdown. 12K default / 50K max result.
mcp: Full JSON-RPC 2.0 gateway (initialize → tools/list → tools/call). Multi-server discovery, tool caching, proxy-routed.
Tool Wrapping Pipeline

Every tool passes through 3 wrapping layers before reaching the agent. Order matters: wrappers are applied innermost first, so the outermost wrapper is the first to receive each call and the last to handle its result.

Layer 3: Output Truncation (outermost)
2K lines / 50KB. Head strategy (data) or tail strategy (logs). Full output saved to workspace files.
Layer 2: Connection Preflight
Pre-check connection status. Post-catch auth failures (401/403). Secret redaction in errors. Fuzzy connection matching.
Layer 1: Workbook Coordinator (innermost)
read → runRead, mutate → runWrite (FIFO queue). Execution mode gating (yolo/safe). Mutation observer dispatch.
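
The layering order can be illustrated with a small sketch. It is a simplification under stated assumptions: the tool is synchronous, and the wrapper names (withCoordinator, withPreflight, withTruncation) are hypothetical stand-ins that only record when they are entered and left, not the project's real wrappers.

```typescript
// Simplified, synchronous stand-in for a tool; real tools are presumably async.
type Tool = { name: string; execute: (input: string) => string };
type Wrapper = (tool: Tool) => Tool;

const trace: string[] = [];

// Builds a wrapper that records when it is entered and when it is left.
const layer = (label: string): Wrapper => (tool) => ({
  name: tool.name,
  execute: (input) => {
    trace.push(`enter ${label}`);
    const result = tool.execute(input);
    trace.push(`leave ${label}`);
    return result;
  },
});

// Hypothetical names mirroring the three layers described above.
const withCoordinator = layer("coordinator");
const withPreflight = layer("preflight");
const withTruncation = layer("truncation");

const base: Tool = {
  name: "read_range",
  execute: (input) => {
    trace.push("tool");
    return `read ${input}`;
  },
};

// Applied innermost first, so the last wrapper applied ends up outermost.
const wrapped = [withCoordinator, withPreflight, withTruncation].reduce(
  (tool, wrap) => wrap(tool),
  base,
);

const output = wrapped.execute("A1:B2");
```

After the call, trace shows truncation entered first and left last, which is why a result-shaping layer such as truncation sits on the outside: it sees the final result of everything beneath it.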

Full Pipeline

createAllTools() → applyExperimentalToolGates() → createToolsForIntegrations() → extensionManager.getRegisteredTools() → normalizeRuntimeTools() → withWorkbookCoordinator() → withConnectionPreflight() → applyToolOutputTruncation()
  • Fingerprint comparison (FNV-1a hash of tool schemas) decides whether to call agent.setTools()
  • Extension tool revision counter (monotonic) also triggers refresh when extensions add/remove tools
  • Static tool ordering preserved for prompt cache stability
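
A minimal sketch of the fingerprint check, assuming a hypothetical ToolMeta shape and a hypothetical createToolRefresher helper (neither name comes from the codebase). The hash is standard 32-bit FNV-1a, computed here over UTF-16 code units, which matches byte-wise FNV-1a for ASCII input only.

```typescript
// Hypothetical shape of a tool as seen by the fingerprinting step.
interface ToolMeta {
  name: string;
  description: string;
  parameters: unknown; // JSON-schema-like object
}

// 32-bit FNV-1a over UTF-16 code units (equals byte-wise FNV-1a for ASCII).
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime with 32-bit wraparound
  }
  return hash >>> 0; // convert to unsigned
}

// Fingerprints the list in its given order, so reordering counts as a change.
function fingerprintTools(tools: readonly ToolMeta[]): string {
  const serialized = tools
    .map((t) => JSON.stringify([t.name, t.description, t.parameters]))
    .join("\n");
  return fnv1a32(serialized).toString(16);
}

// Returns a refresh function that calls setTools only when the fingerprint differs.
function createToolRefresher(setTools: (tools: readonly ToolMeta[]) => void) {
  let lastFingerprint: string | null = null;
  return (tools: readonly ToolMeta[]): boolean => {
    const next = fingerprintTools(tools);
    if (next === lastFingerprint) return false; // schema-stable refresh is a no-op
    lastFingerprint = next;
    setTools(tools);
    return true;
  };
}
```

Rebuilding tool objects with identical metadata leaves the fingerprint unchanged, so the agent's tool list (and with it the cached prompt prefix) is not touched.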

Extensions

Three extension surfaces — Connections, Plugins, and Skills — let users and the agent expand Pi's capabilities at runtime. Each has a dedicated tab in /extensions.

Skills (tool: skills)

Markdown documents (SKILL.md) that inject task-specific workflows into the conversation on demand. The agent calls skills → read when a task matches; the full document becomes part of the context.

4 bundled · workspace-discovered · toggle per-skill
Bundled
python-bridge · tmux-bridge · web-search · mcp-gateway — loaded via Vite import.meta.glob
External
skills/external/<name>/SKILL.md — managed installs via the skills tool or workspace discovery
Prompt
Catalog listed in system prompt under ## Available Agent Skills; body injected on read. Session-scoped read cache avoids duplicates.
Activation
Per-skill enable/disable in /extensions → Skills. Stored in SettingsStore (skills.activation.v1).
Connections

Credential requirements declared by plugins or built-in integrations. Each connection stores secrets, surfaces auth state, and gates tool access — if a tool's connection isn't configured, the tool is withheld from the model. Managed in /extensions → Connections.

External tools
Master toggle for web search, fetch, and other network-dependent tools
Web search
Jina (default, no key) · Serper · Tavily · Brave Search — provider-specific API key fields
Plugin connections
Owner-scoped ({ownerId}.{connId}). Auto-rendered setup UI from plugin declarations.
Plugins (tool: extensions_manager)

Runtime code modules that register tools, commands, sidebar widgets, overlays, and connections. Users install them from /extensions; the agent can also create and install them from chat.

Pi can extend itself — design, generate, install, reload, and iterate on plugins without leaving the conversation. The extensions_manager tool handles the full lifecycle: list → install → enable → reload → uninstall.

What Plugins Register

Tools
Agent-callable tools with TypeBox / JSON schema params. Name-conflict guard against core tools.
Commands
Slash commands with busyAllowed control. Appear in command menu.
Widgets
Sidebar panels via Widget API v2: upsert / remove / clear. Placement, ordering, collapsible, size bounds.
Connections
Declare credential requirements, store secrets, surface auth state. Auto-rendered in /tools overlay.
Overlays
Full-screen modal via overlay.show(el). Single overlay per plugin at a time.

18 Capability Gates

Every API call passes through assertCapability(). Sandbox iframe bridges all 18 surfaces via postMessage with CSP + TypeBox schema reconstruction.

Agent
agent.read agent.events.read agent.context.write agent.steer agent.followup
UI & Output
ui.overlay ui.widget ui.toast clipboard.write download.file
Integration
tools.register commands.register llm.complete http.fetch storage.readwrite connections.readwrite skills.read skills.write

Trust Tiers & Runtime

Host runtime
builtin · local module
Full access, minus steer / follow-up / context-write / skills-write
Sandbox iframe
inline code · remote URL
Restricted: no tools / agent / llm / http / connections by default. Capability toggles in /extensions.
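
A capability gate of this kind can be sketched as a set-membership check. The 18 capability names and the host-runtime exclusions are taken from the text above; the signature of assertCapability and the hostDefaults helper are assumptions made for illustration, and sandbox defaults are left out because the text does not enumerate them exactly.

```typescript
const ALL_CAPABILITIES = [
  "agent.read", "agent.events.read", "agent.context.write", "agent.steer", "agent.followup",
  "ui.overlay", "ui.widget", "ui.toast", "clipboard.write", "download.file",
  "tools.register", "commands.register", "llm.complete", "http.fetch",
  "storage.readwrite", "connections.readwrite", "skills.read", "skills.write",
] as const;

type Capability = (typeof ALL_CAPABILITIES)[number];

// Host runtime default per the text: everything except these four.
const HOST_EXCLUDED: readonly Capability[] = [
  "agent.steer", "agent.followup", "agent.context.write", "skills.write",
];

// Hypothetical helper: the grant set a host-runtime plugin would start with.
function hostDefaults(): Set<Capability> {
  return new Set(ALL_CAPABILITIES.filter((c) => HOST_EXCLUDED.indexOf(c) === -1));
}

// Hypothetical signature: throws when the plugin's grant set lacks the capability.
function assertCapability(granted: ReadonlySet<Capability>, needed: Capability): void {
  if (!granted.has(needed)) {
    throw new Error(`Extension lacks capability: ${needed}`);
  }
}
```

With this shape, every bridge method starts with one assertCapability call, and toggling a capability in the UI only changes the grant set.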
Lifecycle & Activation Bridge
Load
v2 doc from SettingsStore (v1 auto-migrated). Source: module specifier or inline code blob URL.
Activate
Resolve runtime (host / sandbox). Build activation bridge. Register tools & commands with conflict check.
Deactivate
Reverse order: handle → widgets → events → commands → tools → connections → blob URLs.

LLM bridge
Active agent's model + API key. Per-plugin side session ID for isolated cache telemetry.
HTTP bridge
URL validation + blocked hostnames (loopback/private) + proxy routing + 256 KB limit.
Storage bridge
Per-plugin key-value in SettingsStore, 1 MB limit.
Connections bridge
Owner-scoped ({ownerId}.{connId}). Auto-rendered setup UI in /tools overlay.
Interface

User Interface

PiSidebar (1,221 lines · LitElement · 350px) — purpose-built for Excel's narrow taskpane. Replaces pi-web-ui's ChatPanel + AgentInterface.

Input & Auto-scroll
Auto-grow textarea, send/abort, file drop. Scroll hysteresis: disengage at 32px, re-engage at 20px.
11 Overlay Types
Rules, settings, recovery, extensions hub, files, shortcuts… Single-instance, Escape close, focus restore.
Status Bar
Context token %, thinking level flash, execution mode badge
28+ Slash Commands
model, settings, compact, export, session, help, extensions, tools, skills, files, experimental, debug…
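
The scroll hysteresis can be read as a two-threshold state machine. The sketch below assumes both thresholds measure distance from the bottom of the message list and that the boundaries are inclusive; the function name nextAutoScroll is hypothetical.

```typescript
const DISENGAGE_PX = 32; // leave auto-scroll once the user is farther than this from the bottom
const REENGAGE_PX = 20; // resume auto-scroll once the user is back within this distance

// In a browser the distance would be el.scrollHeight - el.scrollTop - el.clientHeight.
function nextAutoScroll(engaged: boolean, distanceFromBottom: number): boolean {
  if (engaged) {
    // Stay engaged until the user scrolls clearly away from the bottom.
    return distanceFromBottom <= DISENGAGE_PX;
  }
  // Once disengaged, require the user to come closer than the disengage point.
  return distanceFromBottom <= REENGAGE_PX;
}
```

The gap between 20px and 32px is what prevents flicker: at 25px from the bottom the state simply stays whatever it already was.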

Keyboard Shortcuts

F2 Focus input
Esc Blur editor / abort stream
⌘/Ctrl+T New tab
⌘/Ctrl+W Close tab
⌘/Ctrl+Shift+T Reopen closed tab
⌘/Ctrl+Z Undo close tab
Enter Send / steer (while streaming)
Alt+Enter Queue follow-up (while streaming)
Alt+↑ Restore queued messages to editor
Shift+Tab Cycle thinking level
⌘/Ctrl+O Toggle details visibility
/ Command menu

Agent Interface

What the model sees — Pi's awareness is layered. Some context is always present, some is injected fresh each turn, and some is fetched on demand via tools.

Always in prompt
Identity & persona · 24 tool schemas · Workflow rules · Conventions & styles · Execution mode · User instructions · Workbook instructions · Bridge status · Skill catalog
Static prefix, cached by provider. Changes invalidate the entire cache.
Injected each turn
Workbook blueprint · Selection ±5 rows · Recent cell changes · Workspace file summary
Auto-context, spliced before the user message. Blueprint only re-sent on structure change or workbook switch.
On demand via tools
Read any range · Search cells · Trace formulas · Read workspace files · Read skill docs · Web search & fetch · Run Python · Terminal sessions · MCP servers
Agent decides when to call tools. Results enter conversation history and become part of the cached prefix on subsequent turns.
Persists across sessions
Workspace files · User & workbook rules · Convention presets · Recovery snapshots · Session history
Survives compaction and restarts. notes/index.md is the memory entry point for new sessions.

Rules & Conventions

Two persistence layers that shape the agent's behavior. Rules are free-text guidance (what to do); conventions are structured formatting defaults (how to format). Both survive across sessions.

Rules (tool: instructions)
Injected into system prompt. Works like AGENTS.md: append or replace, ask on conflict.
User scope (2K)
"All my files": private to this machine. Auto-updated on preferences.
Workbook scope (4K)
"This file": keyed by workbook identity. Explicit confirmation required.
Conventions (tool: conventions)
Overrides surfaced in system prompt; applied by tools at execution time.
Number presets (6) · Named styles (11) · Visual defaults · Color coding · Custom presets
Model Layer

LLM Pipeline

From browser to model endpoint — authentication, proxy routing, and stream normalization.

5 Browser OAuth Providers

Provider | Flow | Routing
Anthropic | PKCE | Proxy for OAuth tokens (sk-ant-oat-*); API keys direct
OpenAI Codex | PKCE + JWT | Always proxy-routed; JWT decode for ChatGPT account ID
Google Gemini CLI | Code Assist | Tiered provisioning (free/legacy/standard), LRO polling, VPC SC handling
Google Antigravity | API key | 2 endpoints (prod + sandbox), default fallback project. JSON {token, projectId}
GitHub Copilot | Device code | Token refresh via GitHub API
Stream Proxy (createOfficeStreamFn)
Intercept
Every LLM call — routing, model normalization, tool bundle selection
Proxy routing
Anthropic OAuth, OpenAI Codex, Google Code Assist, Z-AI, custom gateways
Normalization
Google preview models → stable fallback, Antigravity → Code Assist base URL
Payload stats
24-entry ring buffer + 24-session LRU context cache
Churn tracking
FNV-1a fingerprint of model + systemPrompt + tools hashes per session
CORS Proxy (fetch interceptor)
Dev
Vite reverse proxies — 11 rewrite rules for Anthropic / OpenAI / Google / GitHub endpoints
Production
User-configured CORS proxy (localhost:3003 default). Conservative endpoint matching
Hygiene
Strips anthropic-dangerous-direct-browser-access header when proxied
Cache
3s settings cache for performance

Prompt Caching

LLM prompt caching is prefix-based — providers cache the longest matching token prefix and reuse it on subsequent calls. Pi keeps the prompt structured so the prefix stays stable and the cache extends as far as possible each turn.

Prefix anchor (FNV-1a fingerprinted per call)
  • model identity hash
  • system prompt hash: identity · tool docs · workflow policy · conventions
  • tools schema hash: full bundle in fixed order, never sub-setted
If any hash changes → entire cache invalidated (all history recomputed). PrefixChangeReason recorded & counter incremented.

Previous turns (incrementally cached)
Per turn: Auto-context injection · user message · assistant response (thinking + text) · tool calls · tool results. Grows each turn; the provider extends the cached prefix automatically.
If history is rewritten (e.g. compaction), cache breaks from the first changed token onward.

Current turn (cached for next turn)
Blueprint: buildOverview() cached per workbook, monotonic revision. Re-injected on structural changes.
Workspace files: Summary of files in OPFS / native / memory (data, docs, artifacts).
Selection: Auto-read ±5 rows around active cell. Formulas highlighted, errors flagged.
Changes: onChanged events → dedup by cell → truncate at 50 → flush on send.

User message: Current prompt, spliced after auto-context injection.
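
The change-tracking step (dedup by cell, truncate at 50, flush on send) can be sketched with an insertion-ordered Map. The ChangeBuffer class and CellChange shape are hypothetical, and keeping the 50 most recent changes rather than the first 50 is an assumption; the text only says the list is truncated at 50.

```typescript
interface CellChange {
  address: string;
  value: string;
}

const MAX_CHANGES = 50;

class ChangeBuffer {
  // Map preserves insertion order, which doubles as recency order here.
  private readonly latest = new Map<string, CellChange>();

  // Called for each change event; later changes to a cell replace earlier ones.
  record(change: CellChange): void {
    // Delete first so a re-changed cell moves to the most recent position.
    this.latest.delete(change.address);
    this.latest.set(change.address, change);
  }

  // Called when the user sends a message; returns at most MAX_CHANGES entries and resets.
  flush(): CellChange[] {
    const all = Array.from(this.latest.values());
    this.latest.clear();
    return all.slice(-MAX_CHANGES); // assumption: keep the most recent entries
  }
}
```

Because the buffer is emptied on flush, each turn's auto-context only describes edits made since the previous message.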

Stability invariants

Static system prompt: Built from fixed sections (identity, tool docs, policy). Structure only changes on explicit user actions, not per-turn.
Deterministic tool ordering: selectToolBundle() returns full list in fixed order, with no intent-based sub-setting. Extension revision tracking: hot-reloads skip setTools() when schema unchanged. (src/context/tool-disclosure.ts)
Volatile state in message tail only: Auto-context (selection, changes, blueprint) injected as a user message after the frozen prefix, never by mutating the system prompt.
Runtime tool fingerprinting: Refresh passes rebuild tool objects but only call agent.setTools() when the metadata fingerprint actually differs. Schema-stable handler swaps are silent no-ops. (src/taskpane/runtime-utils.ts)
Compaction Strategy
Quality caps
88% of context for ≥128K models, 85% for ≥200K
Auto-compact
Before each user prompt if projected tokens exceed hard threshold (requires ≥4 messages)
Soft warning
At hard threshold − 5% (min 2K tokens), floors at 70%
Memory nudge
Regex-detects "remember this" → extracts up to 3 snippets (180 chars) → focus instruction in compaction summary
Result shaping
6 most recent tool results intact, older >1200 chars → 500-char preview
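
The result-shaping rule can be expressed as a small pure function. This is a sketch under assumptions: tool results are modeled as plain strings, the function name shapeToolResults is hypothetical, and the preview is a bare 500-character prefix with no truncation marker.

```typescript
const KEEP_RECENT = 6; // most recent tool results left untouched
const MAX_OLD_CHARS = 1200; // older results longer than this are shortened
const PREVIEW_CHARS = 500; // length of the preview that replaces them

// results is ordered oldest first, newest last.
function shapeToolResults(results: readonly string[]): string[] {
  const firstRecent = Math.max(0, results.length - KEEP_RECENT);
  return results.map((text, index) => {
    const isRecent = index >= firstRecent;
    if (isRecent || text.length <= MAX_OLD_CHARS) return text;
    return text.slice(0, PREVIEW_CHARS);
  });
}
```

Short old results pass through unchanged, so only bulky historical output is sacrificed when the context is compacted.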
Known prefix change triggers
Repeated turns
No churn (cache hit)
/model switch
["model"]
Rules / exec mode
["systemPrompt"]
Skill toggle
["systemPrompt"]
Integration toggle
["systemPrompt", "tools"]
Extension add/remove
includes "tools"
Extension hot-reload
No churn (same schema)
Extension side call
Isolated session key — no main-session churn
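
The reason arrays in this list fall out of comparing the three prefix hashes between calls. A minimal sketch, assuming a hypothetical PrefixFingerprint record of precomputed hash strings (PrefixChangeReason is named in the text; its exact shape here is an assumption):

```typescript
type PrefixChangeReason = "model" | "systemPrompt" | "tools";

// Hypothetical record of the three hashes that anchor the cached prefix.
interface PrefixFingerprint {
  model: string;
  systemPrompt: string;
  tools: string;
}

// Returns which parts of the prefix changed, in a fixed order; empty means no churn.
function diffPrefix(prev: PrefixFingerprint, next: PrefixFingerprint): PrefixChangeReason[] {
  const order: PrefixChangeReason[] = ["model", "systemPrompt", "tools"];
  return order.filter((key) => prev[key] !== next[key]);
}
```

For example, an integration toggle changes both the prompt text and the tool bundle, so comparing the hashes before and after yields ["systemPrompt", "tools"], while a repeated turn yields an empty array.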

Baseline matrix documented in docs/cache-observability-baselines.md. PRs that change context shape must include a cache observability check.

Lifecycle

Session Runtime

Each tab = one SessionRuntime with its own Agent, ActionQueue, QueueDisplay, and SessionPersistenceController. Multi-tab layout persisted per workbook.

Tabs
Create, close, rename, reorder, duplicate, restore recently closed (stack of 10)
Lock state
idle → waiting_for_lock → holding_lock (prevents concurrent writes)
Association
Session ↔ workbook is write-once (no accidental move on resume)
Restore
Partitions sessions: matching / unlinked / foreign
Queue
FIFO for prompts + commands. Guards against /compact race (agent.streamFn() outside Agent loop). Auto-compaction before each prompt.

Boot Sequence

bootstrap.ts → initTaskpane(): 7 phases with timeouts and fallbacks for non-Excel environments (dev mode).

1. Global Patches
  • Render loading UI
  • process.env shim, fetch interceptor (CORS proxy), model-selector patch
  • Office.onReady() (3 s fallback for dev without Excel)
  • Call initTaskpane() (60 s hard timeout)
2. Storage & Migrations
  • SettingsStore init + proxy default seed
  • Legacy migrations: web-search API keys → ConnectionStore, MCP tokens → ConnectionStore
  • Remote proxy security warning
3. Auth & Credentials
  • Provider discovery (5 built-in + custom gateways)
  • Credential restore: pi auth.json (dev) or IndexedDB OAuth (prod), 6 s timeout
  • Auto-refresh expired tokens
  • Show welcome login overlay if no providers
4. Core Infrastructure
  • ChangeTracker.start(): cell change monitoring
  • createOfficeStreamFn(): LLM call interceptor
  • createWorkbookCoordinator(): FIFO write queue
5. UI Mount & Managers
  • PiSidebar mount + execution mode controller
  • ConnectionManager + ExtensionRuntimeManager (reserved tool names from core + integrations)
6. Runtime Factory
  • Bridge health probe (async; first turn waits for result)
  • Runtime factory wiring
  • Tab layout restore from SettingsStore (or create first runtime)
7. Extensions & UI Polish
  • Extension init (5 s timeout, non-blocking)
  • Keyboard shortcuts, status bar, command menu
  • Proxy polling (30 s interval)
  • Disclosure bar + proxy banner
Safety & Recovery

6 defense boundaries — each enforced independently, no single-point-of-failure.

SSRF Protection
Proxy target policy: hostname check + DNS-resolved IP check. Blocks loopback (127.0.0.0/8, ::1), RFC1918, link-local. IPv4-mapped-IPv6 handling.
Markdown Safety
Global marked patch: block javascript: / data: / file: links. No <img> from markdown (exfiltration risk → clickable link). Disable $...$ KaTeX (currency collision).
Bridge Security
CORS origin allowlist, Bearer token auth (timing-safe compare), loopback-only binding, 512KB body limit, 256KB output limit, process timeout with SIGKILL.
Extension Sandbox
Inline/remote code runs in sandboxed iframe with CSP. postMessage RPC for all 18 API surfaces. Allowlisted UI tag set. URL validation for HTTP requests.
HTML Safety
No innerHTML for user/tool/session content — DOM APIs or escapeHtml() / escapeAttr(). Queue display explicitly avoids innerHTML.
Secret Redaction
Connection secrets never exposed to UI (presence flags only). Error messages auto-redacted: stored values → ••••. OAuth tokens in IndexedDB.
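
The IP-range part of the SSRF policy can be sketched as a pure check on an already-resolved address. This is an illustration under assumptions, not the project's proxy policy code: the function names are hypothetical, DNS resolution is out of scope, and only the dotted form of IPv4-mapped IPv6 (::ffff:a.b.c.d) and compressed IPv6 notation are handled.

```typescript
type IPv4 = readonly [number, number, number, number];

// Parses dotted-quad IPv4 text; returns null for anything else.
function parseIPv4(text: string): IPv4 | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(text);
  if (m === null) return null;
  const octets: IPv4 = [Number(m[1]), Number(m[2]), Number(m[3]), Number(m[4])];
  return octets.every((o) => o <= 255) ? octets : null;
}

function isBlockedIPv4([a, b]: IPv4): boolean {
  return (
    a === 127 || // loopback 127.0.0.0/8
    a === 10 || // RFC1918 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // RFC1918 172.16.0.0/12
    (a === 192 && b === 168) || // RFC1918 192.168.0.0/16
    (a === 169 && b === 254) // link-local 169.254.0.0/16
  );
}

// Returns true when a resolved IP address must not be used as a proxy target.
function isBlockedAddress(ip: string): boolean {
  const text = ip.trim().toLowerCase();
  const v4 = parseIPv4(text);
  if (v4 !== null) return isBlockedIPv4(v4);
  if (text === "::1") return true; // IPv6 loopback
  // IPv6 link-local fe80::/10 covers first hextets fe80 through febf.
  if (/^fe[89ab][0-9a-f]:/.test(text)) return true;
  // IPv4-mapped IPv6 in dotted form, e.g. ::ffff:127.0.0.1
  const mapped = /^::ffff:(.+)$/.exec(text);
  if (mapped !== null) {
    const inner = parseIPv4(String(mapped[1]));
    return inner !== null && isBlockedIPv4(inner);
  }
  return false;
}
```

Checking the resolved IP as well as the hostname matters because a public-looking hostname can resolve to a private address; a production check would also need to normalize expanded and hex-encoded IPv6 forms before matching.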

Mutation Finalization

Every mutation tool calls finalizeMutationOperation():

1. Append audit entry
WorkbookChangeAuditLog: persistent, 500-entry rotating, tagged with execution mode + workbook identity
2. Recovery snapshot (optional)
Deep-clone state → stamp result.details.recovery → dispatch created event or append unavailable note
3. Change explanation
Deterministic from audit metadata (no LLM call), bounded 420 chars, up to 8 citations
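
Steps 1 and 3 can be sketched as a capped log plus a bounded, deterministic formatter. The AuditEntry fields, the RotatingAuditLog class, and explainChange are hypothetical names for illustration; persistence and citations are omitted.

```typescript
// Hypothetical audit entry; the real log presumably carries more metadata.
interface AuditEntry {
  tool: string;
  executionMode: "yolo" | "safe";
  workbookId: string;
  summary: string;
}

const MAX_ENTRIES = 500;
const MAX_EXPLANATION_CHARS = 420;

class RotatingAuditLog {
  private entries: AuditEntry[] = [];

  // Appends an entry and drops the oldest ones once the cap is exceeded.
  append(entry: AuditEntry): void {
    this.entries.push(entry);
    if (this.entries.length > MAX_ENTRIES) {
      this.entries = this.entries.slice(-MAX_ENTRIES);
    }
  }

  list(): readonly AuditEntry[] {
    return this.entries;
  }
}

// Builds the explanation from audit metadata only, so the same entry always yields the same text.
function explainChange(entry: AuditEntry): string {
  const text = `${entry.tool} (${entry.executionMode} mode): ${entry.summary}`;
  return text.length <= MAX_EXPLANATION_CHARS ? text : text.slice(0, MAX_EXPLANATION_CHARS);
}
```

Deriving the explanation from stored metadata keeps it free of model calls and reproducible after a restart.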

5 Snapshot Kinds

Range
Values + formulas grid capture
Format
20 properties via boolean mask
Structure
Rows, columns, sheets + data
Cond. Format
8 rule types, 20 icon styles
Comment
Full thread + replies

Restore creates an inverse snapshot before applying — enables "undo the undo". save-boundary-monitor polls Workbook.isDirty every 4s, clears checkpoints on user save.
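
The "undo the undo" behavior follows from capturing the current state just before a restore is applied. A minimal sketch for the Range kind only, with a Map standing in for a worksheet; capture, restore, and RangeSnapshot are hypothetical names.

```typescript
type Cells = Map<string, string>; // address -> value, a stand-in for a worksheet

interface RangeSnapshot {
  id: number;
  cells: Record<string, string | undefined>; // undefined marks a cell that was empty
}

let nextSnapshotId = 1;

// Captures the current values of the given addresses.
function capture(sheet: Cells, addresses: readonly string[]): RangeSnapshot {
  const cells: Record<string, string | undefined> = {};
  for (const address of addresses) cells[address] = sheet.get(address);
  return { id: nextSnapshotId++, cells };
}

// Applies a snapshot and returns the inverse snapshot taken just before applying it.
function restore(sheet: Cells, snapshot: RangeSnapshot): RangeSnapshot {
  const inverse = capture(sheet, Object.keys(snapshot.cells));
  for (const address of Object.keys(snapshot.cells)) {
    const value = snapshot.cells[address];
    if (value === undefined) sheet.delete(address);
    else sheet.set(address, value);
  }
  return inverse;
}
```

Restoring the returned inverse snapshot puts the sheet back to its state before the first restore, which is the second "undo" in the chain.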

Manual Full Backup

Office.getFileAsync("compressed") → 1MB chunks → base64 → workspace file. Stored under manual-backups/full-workbook/v1/.

Testing Strategy
Test files: 100
Suites: 3
Runner: node:test
Mock strategy: DI

Suite | Files | Coverage
test:models | Fast | Provider priority, family priority, parseMajorMinor
test:context | ~80 files | Tools, context injection, compaction, change tracker, session persistence, blueprint, recovery
test:security | 9 files | SSRF proxy, CORS server, tmux/python bridges, extension source policy, marked safety, OAuth
Build & Config
Vite
HTTPS dev server, pi-auth plugin, stub plugins for heavy Node deps, 11 proxy entries
TypeScript
ES2022 target, strict, bundler moduleResolution, useDefineForClassFields: false for Lit
ESLint
typescript-eslint recommendedTypeChecked, ban ts-ignore, error on floating/misused promises
Manifest
Office TaskPaneApp, ReadWriteDocument permission, Home ribbon button. Dev = localhost:3000, prod = Vercel
Pre-commit
npm run lint + npm run typecheck
CI checks
5 custom scripts — inline style hygiene, dead CSS vars, landing page copy, pi dep lockstep, theme utility overrides

Credits

Pi
by Mario Zechner — the agent framework powering this project. Pi for Excel uses pi-agent-core, pi-ai, and pi-web-ui for the agent loop, LLM abstraction, and session storage.
visual-explainer
by Nico Bailon — the Pi extension used to generate this architecture page.
whimsical.ts
by Armin Ronacher — the rotating "Working…" messages are adapted from his Pi extension, rewritten for a spreadsheet audience.