Architecture — Pi for Excel

Overview

Pi for Excel is an AI agent that lives inside Excel's taskpane. It reads, writes, formats, and analyzes spreadsheets through natural conversation — backed by any major LLM provider. 24 built-in tools, a sandboxed extension runtime, and bridges to Python, tmux, and MCP servers make it a general-purpose automation surface for spreadsheets.

Agent

5 LLM Providers: OAuth · stream proxy · prefix caching

Excel: Office.js · WorkbookCoordinator

Extensions: Sandboxed · 18 capabilities

Bridges: Python · tmux · MCP · web

User: Sidebar taskpane · 350px

Capable

24 tools in 5 categories — from cell reads to Python scripts, terminal sessions, and MCP servers. Plus 28+ slash commands.

Extensible

18-capability extension runtime with trust tiers. Self-authoring loop — Pi can build and install extensions from chat.

Safe

Every mutation snapshotted. 6 defense boundaries — SSRF, markdown, bridge, sandbox, HTML, secret redaction. Undo the undo.

This page covers what Pi can do (Capabilities, Interface), then how it works under the hood (Model Layer, Lifecycle, Safety & Recovery, Testing).

Capabilities

Default Tools

24 tools in 5 categories — from cell reads to terminal sessions and MCP servers.

Read & Inspect 5 tools

get_workbook_overview: Blueprint of sheets, tables, named ranges, and objects, plus per-sheet detail mode

read_range: 3 modes (compact / csv / detailed). Comments in detailed mode, format humanization

search_workbook: Text / formula / regex search, pagination, context rows

trace_dependencies: Precedent / dependent tracing, depth 5, 50K cell scan

explain_formula: Plain-language explanation, loads direct references

Write & Transform 3 tools

write_cells: Overwrite protection, formula validation, auto-verify, recovery snapshot

fill_formula: AutoFill a single formula across a range, error detection

modify_structure: 10 actions (insert / delete rows / cols / sheets, rename, move)

Format & Annotate 4 tools

format_cells: Named styles, conventions, multi-range, borders, merge

conditional_format: 8 rule types (cell_value, formula, text, top_bottom, preset, data_bar, color_scale, icon_set)

view_settings: 15 actions, including gridlines, freeze, tab color, hide / show sheets

comments: 7 actions (read / add / update / reply / delete / resolve / reopen)

Session & Config 4 tools

workbook_history: List / restore / delete recovery snapshots

instructions: User / workbook scope rules management

skills: List / read skills, session-scoped read cache

conventions: 6 built-in formatting presets, custom presets

Bridges & Integrations 8 tools

execute_office_js: Direct Office.js eval via blob URL module. Excel.run() is banned (context is auto-provided). 20K code, 8K result. Requires user approval.

python_run: Execute Python scripts. Native bridge preferred → Pyodide fallback.

python_transform_range: Read range → Python transform → write back. Same bridge stack.

libreoffice_convert: File format conversion. Bridge-only (no Pyodide fallback).

tmux: Full terminal sessions via local bridge (port 3341). 6 actions with dynamic timeout, covering shell commands, interactive REPLs, and background processes.

web_search: 5 providers (Jina zero-config fallback, Serper, Tavily, Brave, Firecrawl). Domain rate limiting.

fetch_page: DOMParser → markdown. 12K default / 50K max result.

mcp: Full JSON-RPC 2.0 gateway (initialize → tools/list → tools/call). Multi-server discovery, tool caching, proxy-routed.

Tool Wrapping Pipeline

Every tool passes through 3 wrapping layers before reaching the agent. Order matters — outermost wrapper executes first.

Layer 3: Output Truncation (outermost)

2K lines / 50KB. Head strategy (data) or tail strategy (logs). Full output saved to workspace files.

Layer 2: Connection Preflight

Pre-check connection status. Post-catch auth failures (401/403). Secret redaction in errors. Fuzzy connection matching.

Layer 1: Workbook Coordinator (innermost)

read → runRead, mutate → runWrite (FIFO queue). Execution mode gating (yolo/safe). Mutation observer dispatch.
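The nesting order of the three layers can be illustrated with a small sketch. It assumes a simplified synchronous `Tool` shape, and the helpers `layer` and `applyWrappers` are hypothetical names used only for illustration; the real wrappers are asynchronous and do actual work rather than logging.

```typescript
// Hypothetical, simplified tool shape: real tools are async and carry schemas.
type Tool = { name: string; execute: (input: string) => string };
type Wrapper = (tool: Tool) => Tool;

// Builds a wrapper that only records when it is entered and exited.
function layer(label: string, log: string[]): Wrapper {
  return (tool) => ({
    name: tool.name,
    execute: (input) => {
      log.push(`enter ${label}`);
      const result = tool.execute(input);
      log.push(`exit ${label}`);
      return result;
    },
  });
}

// Applies wrappers in list order, so the last wrapper ends up outermost.
function applyWrappers(tool: Tool, wrappers: Wrapper[]): Tool {
  return wrappers.reduce((wrapped, wrapper) => wrapper(wrapped), tool);
}

const wrapLog: string[] = [];
const baseTool: Tool = { name: "read_range", execute: (input) => `values for ${input}` };
const wrappedTool = applyWrappers(baseTool, [
  layer("coordinator", wrapLog), // Layer 1, innermost
  layer("preflight", wrapLog),   // Layer 2
  layer("truncation", wrapLog),  // Layer 3, outermost
]);
const wrappedOutput = wrappedTool.execute("A1:B2");
```

Because the truncation wrapper is applied last, it is entered first and exited last, which is what lets it see (and cap) the final output of everything beneath it.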

Full Pipeline

createAllTools() → applyExperimentalToolGates() → createToolsForIntegrations() → extensionManager.getRegisteredTools() → normalizeRuntimeTools() → withWorkbookCoordinator() → withConnectionPreflight() → applyToolOutputTruncation()

Fingerprint comparison (FNV-1a hash of tool schemas) decides whether to assign agent.state.tools = ...
Extension tool revision counter (monotonic) also triggers refresh when extensions add/remove tools
Static tool ordering preserved for prompt cache stability
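A rough sketch of the fingerprint gate, assuming a 32-bit FNV-1a over a JSON serialization of the tool schemas. The `ToolSchema` shape and the helpers `fingerprintTools` and `refreshTools` are hypothetical; the source only states that an FNV-1a hash of tool schemas decides whether `agent.state.tools` is reassigned.

```typescript
// 32-bit FNV-1a over UTF-16 code units (a common JS shortcut; hashing UTF-8
// bytes would give different values for non-ASCII input).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash;
}

// Hypothetical schema shape for illustration.
type ToolSchema = { name: string; description: string; parameters: unknown };

// Order is preserved on purpose: reordering tools would change the prompt prefix.
// Note: JSON.stringify is sensitive to object key order inside `parameters`.
function fingerprintTools(tools: ToolSchema[]): number {
  return fnv1a(JSON.stringify(tools.map((t) => [t.name, t.description, t.parameters])));
}

type ToolState = { tools: ToolSchema[]; fingerprint: number };

// Assigns only when the fingerprint differs; returns whether it assigned.
function refreshTools(state: ToolState, next: ToolSchema[]): boolean {
  const nextFingerprint = fingerprintTools(next);
  if (nextFingerprint === state.fingerprint) return false;
  state.tools = next;
  state.fingerprint = nextFingerprint;
  return true;
}
```

Rebuilding tool objects with identical schemas then leaves the state untouched, so a schema-stable refresh does not disturb the cached prompt prefix.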

Extensions

Three extension surfaces — Connections, Plugins, and Skills — let users and the agent expand Pi's capabilities at runtime. Each has a dedicated tab in /extensions.

Skills (skills tool)

Markdown documents (SKILL.md) that inject task-specific workflows into the conversation on demand. The agent calls skills → read when a task matches; the full document becomes part of the context.

4 bundled · workspace-discovered · toggle per-skill

Bundled

python-bridge · tmux-bridge · web-search · mcp-gateway — loaded via Vite import.meta.glob

External

skills/external/<name>/SKILL.md — managed installs via the skills tool or workspace discovery

Prompt

Catalog listed in system prompt under ## Available Agent Skills; body injected on read. Session-scoped read cache avoids duplicates.

Activation

Per-skill enable/disable in /extensions → Skills. Stored in SettingsStore (skills.activation.v1).

Connections

Credential requirements declared by plugins or built-in integrations. Each connection stores secrets, surfaces auth state, and gates tool access — if a tool's connection isn't configured, the tool is withheld from the model. Managed in /extensions → Connections.

External tools

Master toggle for web search, fetch, and other network-dependent tools

Web search

Jina (default, no key) · Serper · Tavily · Brave Search — provider-specific API key fields

Plugin connections

Owner-scoped ({ownerId}.{connId}). Auto-rendered setup UI from plugin declarations.

Plugins (extensions_manager tool)

Runtime code modules that register tools, commands, sidebar widgets, overlays, and connections. Users install them from /extensions; the agent can also create and install them from chat.

Pi can extend itself — design, generate, install, reload, and iterate on plugins without leaving the conversation. The extensions_manager tool handles the full lifecycle: list → install → enable → reload → uninstall.

What Plugins Register

Tools

Agent-callable tools with TypeBox / JSON schema params. Name-conflict guard against core tools.

Commands

Slash commands with busyAllowed control. Appear in command menu.

Widgets

Sidebar panels via Widget API v2: upsert / remove / clear. Placement, ordering, collapsible, size bounds.

Connections

Declare credential requirements, store secrets, surface auth state. Auto-rendered in /tools overlay.

Overlays

Full-screen modal via overlay.show(el). Single overlay per plugin at a time.

18 Capability Gates

Every API call passes through assertCapability(). Sandbox iframe bridges all 18 surfaces via postMessage with CSP + TypeBox schema reconstruction.

Agent

agent.read · agent.events.read · agent.context.write · agent.steer · agent.followup

UI & Output

ui.overlay · ui.widget · ui.toast · clipboard.write · download.file

Integration

tools.register · commands.register · llm.complete · http.fetch · storage.readwrite · connections.readwrite · skills.read · skills.write
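A minimal sketch of how a capability gate could sit in front of each extension API call. The source names `assertCapability()` but not its signature, so the signature, the `CapabilityError` class, and `createExtensionApi` are all assumptions; only a few of the 18 capability names are used.

```typescript
// A subset of the capability names listed above.
type Capability = "tools.register" | "http.fetch" | "ui.toast" | "storage.readwrite";

// Hypothetical error type for denied calls.
class CapabilityError extends Error {}

// Assumed signature: throws unless the capability was granted to the extension.
function assertCapability(granted: ReadonlySet<Capability>, needed: Capability): void {
  if (!granted.has(needed)) {
    throw new CapabilityError(`Capability not granted: ${needed}`);
  }
}

// Hypothetical API surface: every method checks its capability before acting.
function createExtensionApi(granted: ReadonlySet<Capability>) {
  const toasts: string[] = [];
  return {
    toasts,
    toast(message: string): void {
      assertCapability(granted, "ui.toast");
      toasts.push(message);
    },
    fetch(url: string): string {
      assertCapability(granted, "http.fetch");
      return `would fetch ${url}`; // placeholder, no network call in this sketch
    },
  };
}
```

An extension granted only `ui.toast` can show toasts, while a call to `fetch` fails before any request is built.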

Trust Tiers & Runtime

Host runtime (builtin · local module)

Full access, minus steer / follow-up / context-write / skills-write

Sandbox iframe (inline code · remote URL)

Restricted: no tools / agent / llm / http / connections by default. Capability toggles in /extensions.

Lifecycle & Activation Bridge

Load

v2 doc from SettingsStore (v1 auto-migrated). Source: module specifier or inline code blob URL.

Activate

Resolve runtime (host / sandbox). Build activation bridge. Register tools & commands with conflict check.

Deactivate

Reverse order: handle → widgets → events → commands → tools → connections → blob URLs.

LLM bridge

Active agent's model + API key. Per-plugin side session ID for isolated cache telemetry.

HTTP bridge

URL validation + blocked hostnames (loopback/private) + proxy routing + 256 KB limit.

Storage bridge

Per-plugin key-value in SettingsStore, 1 MB limit.

Connections bridge

Owner-scoped ({ownerId}.{connId}). Auto-rendered setup UI in /tools overlay.

Interface

User Interface

PiSidebar (1,221 lines · LitElement · 350px) — purpose-built for Excel's narrow taskpane. Replaces pi-web-ui's ChatPanel + AgentInterface.

Input & Auto-scroll

Auto-grow textarea, send/abort, file drop. Scroll hysteresis: disengage at 32px, re-engage at 20px.

11 Overlay Types

Rules, settings, recovery, extensions hub, files, shortcuts… Single-instance, Escape close, focus restore.

Status Bar

Context token %, thinking level flash, execution mode badge

28+ Slash Commands

model, settings, compact, export, session, help, extensions, tools, skills, files, experimental, debug…
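The scroll hysteresis listed under Input & Auto-scroll can be sketched as a two-threshold state update. The function name `nextAutoScroll` and the exact boundary handling (strictly greater than 32px, at most 20px) are assumptions; the source gives only the two pixel values.

```typescript
const DISENGAGE_PX = 32; // stop following once the user scrolls this far up
const REENGAGE_PX = 20;  // resume following once back within this distance

// Returns whether auto-scroll should follow new content after a scroll event.
function nextAutoScroll(following: boolean, distanceFromBottomPx: number): boolean {
  if (following) {
    return distanceFromBottomPx <= DISENGAGE_PX;
  }
  return distanceFromBottomPx <= REENGAGE_PX;
}
```

The gap between the two thresholds means a distance of 25px keeps whatever state was already active, which avoids flickering between modes when the scroll position hovers near a single cutoff.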

Keyboard Shortcuts

F2 Focus input

Esc Blur editor / abort stream

⌘/Ctrl+T New tab

⌘/Ctrl+W Close tab

⌘/Ctrl+Shift+T Reopen closed tab

⌘/Ctrl+Z Undo close tab

Enter Send / steer (while streaming)

Alt+Enter Queue follow-up (while streaming)

Alt+↑ Restore queued messages to editor

Shift+Tab Cycle thinking level

⌘/Ctrl+O Toggle details visibility

/ Command menu

Agent Interface

What the model sees — Pi's awareness is layered. Some context is always present, some is injected fresh each turn, and some is fetched on demand via tools.

Always in prompt

Identity & persona · 24 tool schemas · Workflow rules · Conventions & styles · Execution mode · User instructions · Workbook instructions · Bridge status · Skill catalog

Static prefix — cached by provider. Changes invalidate the entire cache.

Injected each turn

Workbook blueprint · Selection ±5 rows · Recent cell changes · Workspace file summary

Auto-context — spliced before the user message. Blueprint only re-sent on structure change or workbook switch.

On demand via tools

Read any range · Search cells · Trace formulas · Read workspace files · Read skill docs · Web search & fetch · Run Python · Terminal sessions · MCP servers

Agent decides when to call tools. Results enter conversation history and become part of the cached prefix on subsequent turns.

Persists across sessions

Workspace files · User & workbook rules · Convention presets · Recovery snapshots · Session history

Survives compaction and restarts. notes/index.md is the memory entry point for new sessions.

Rules & Conventions

Two persistence layers that shape the agent's behavior. Rules are free-text guidance (what to do); conventions are structured formatting defaults (how to format). Both survive across sessions.

Rules (instructions tool)

Injected into system prompt. Works like AGENTS.md — append or replace, ask on conflict.

User scope (2K)

"All my files" — private to this machine. Auto-updated on preferences.

Workbook scope (4K)

"This file" — keyed by workbook identity. Explicit confirmation required.

Conventions (conventions tool)

Overrides surfaced in system prompt; applied by tools at execution time.

Number presets (6) · Named styles (11) · Visual defaults · Color coding · Custom presets

Model Layer

LLM Pipeline

From browser to model endpoint — authentication, proxy routing, and stream normalization.

5 Browser OAuth Providers

Provider	Flow	Routing
Anthropic	PKCE	Proxy for OAuth tokens (`sk-ant-oat-*`); API keys direct
OpenAI Codex	PKCE + JWT	Always proxy-routed; JWT decode for ChatGPT account ID
Google Gemini CLI	Code Assist	Tiered provisioning (free/legacy/standard), LRO polling, VPC SC handling
Google Antigravity	API key	2 endpoints (prod + sandbox), default fallback project. JSON `{token, projectId}`
GitHub Copilot	Device code	Token refresh via GitHub API

Stream Proxy (createOfficeStreamFn)

Intercept

Every LLM call — routing, model normalization, tool bundle selection

Proxy routing

Anthropic OAuth, OpenAI Codex, Google Code Assist, Z-AI, custom gateways

Normalization

Google preview models → stable fallback, Antigravity → Code Assist base URL

Payload stats

24-entry ring buffer + 24-session LRU context cache

Churn tracking

FNV-1a fingerprint of model + systemPrompt + tools hashes per session
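The 24-session LRU context cache mentioned under Payload stats could be built on insertion-ordered `Map` semantics, as in the sketch below. The class name `LruCache` and what is stored per session are assumptions; the source states only the capacity and the eviction policy.

```typescript
// Least-recently-used cache relying on Map's insertion order:
// the first key in iteration order is always the least recently used.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key moves to the most-recent position.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Capacity taken from the text; the stored value type is illustrative.
const sessionContextCache = new LruCache<string, { systemPromptHash: number }>(24);
```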

CORS Proxy (fetch interceptor)

Dev

Vite reverse proxies — 11 rewrite rules for Anthropic / OpenAI / Google / GitHub endpoints

Production

User-configured CORS proxy (localhost:3003 default). Conservative endpoint matching

Hygiene

Strips anthropic-dangerous-direct-browser-access header when proxied

Cache

3s settings cache for performance

Prompt Caching

LLM prompt caching is prefix-based — providers cache the longest matching token prefix and reuse it on subsequent calls. Pi keeps the prompt structured so the prefix stays stable and the cache extends as far as possible each turn.

Prefix anchor (FNV-1a fingerprinted per call)

model identity hash

system prompt hash — identity · tool docs · workflow policy · conventions

tools schema hash — full bundle in fixed order, never sub-setted

If any hash changes → entire cache invalidated (all history recomputed). PrefixChangeReason recorded & counter incremented.

Previous turns (incrementally cached)

Per turn: auto-context injection · user message · assistant response (thinking + text) · tool calls · tool results. Grows each turn; the provider extends the cached prefix automatically.

If history is rewritten (e.g. compaction), cache breaks from the first changed token onward.

Current turn (cached for next turn)

Blueprint: buildOverview() cached per workbook, monotonic revision. Re-injected on structural changes.

Workspace files: Summary of files in OPFS / native / memory (data, docs, artifacts).

Selection: Auto-read ±5 rows around active cell. Formulas highlighted, errors flagged.

Changes: onChanged events → dedup by cell → truncate at 50 → flush on send.

User message: Current prompt, spliced after auto-context injection.
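The change pipeline (dedup by cell → truncate at 50 → flush on send) might look roughly like the following. `ChangeBuffer` and `CellChange` are hypothetical names, and keeping the most recent entries when truncating is an assumption; the source does not say which side is dropped.

```typescript
type CellChange = { address: string; value: string };

class ChangeBuffer {
  private readonly latest = new Map<string, CellChange>();
  private readonly limit: number;

  constructor(limit = 50) {
    this.limit = limit;
  }

  // Dedup by cell: a later edit to the same address replaces the earlier one.
  record(change: CellChange): void {
    this.latest.delete(change.address); // move re-edited cells to the end
    this.latest.set(change.address, change);
  }

  // Flush on send: return at most `limit` changes and reset the buffer.
  flush(): { changes: CellChange[]; omitted: number } {
    const all = Array.from(this.latest.values());
    this.latest.clear();
    // Assumption: the most recent changes are the ones worth keeping.
    const changes = all.slice(Math.max(0, all.length - this.limit));
    return { changes, omitted: all.length - changes.length };
  }
}
```

Reporting the `omitted` count alongside the kept changes would let the injected context say that more cells changed than are listed.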

Stability invariants

Static system prompt: Built from fixed sections (identity, tool docs, policy). Structure changes only on explicit user actions, not per turn.

Deterministic tool ordering: selectToolBundle() returns the full list in fixed order, with no intent-based sub-setting. Extension revision tracking: hot-reloads skip setTools() when the schema is unchanged. Source: src/context/tool-disclosure.ts

Volatile state in message tail only: Auto-context (selection, changes, blueprint) is injected as a user message after the frozen prefix, never by mutating the system prompt.

Runtime tool fingerprinting: Refresh passes rebuild tool objects but only assign agent.state.tools = ... when the metadata fingerprint actually differs. Schema-stable handler swaps are silent no-ops. Source: src/taskpane/runtime-utils.ts

Compaction Strategy

Quality caps

88% of context for ≥128K models, 85% for ≥200K

Auto-compact

Before each user prompt if projected tokens exceed hard threshold (requires ≥4 messages)

Soft warning

At hard threshold − 5% (min 2K tokens), floors at 70%

Memory nudge

Regex-detects "remember this" → extracts up to 3 snippets (180 chars) → focus instruction in compaction summary

Result shaping

6 most recent tool results intact, older >1200 chars → 500-char preview
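The result-shaping rule can be sketched as a pure function over the tool-result history. The `ToolResult` shape, the function name `shapeToolResults`, and the ` [truncated]` marker are assumptions; the three numeric limits come from the text above.

```typescript
type ToolResult = { toolName: string; text: string };

const KEEP_INTACT = 6;          // most recent results left untouched
const LONG_RESULT_CHARS = 1200; // older results above this length are shortened
const PREVIEW_CHARS = 500;      // length of the preview that replaces them

function shapeToolResults(results: ToolResult[]): ToolResult[] {
  const firstIntact = Math.max(0, results.length - KEEP_INTACT);
  return results.map((result, index) => {
    const isRecent = index >= firstIntact;
    if (isRecent || result.text.length <= LONG_RESULT_CHARS) return result;
    // Hypothetical marker so the model can tell the text was cut.
    return { toolName: result.toolName, text: result.text.slice(0, PREVIEW_CHARS) + " [truncated]" };
  });
}
```

Short older results pass through unchanged, so only bulky outputs that have scrolled out of the recent window lose detail.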

Known prefix change triggers

Repeated turns

No churn (cache hit)

/model switch

["model"]

Rules / exec mode

["systemPrompt"]

Skill toggle

["systemPrompt"]

Integration toggle

["systemPrompt", "tools"]

Extension add/remove

includes "tools"

Extension hot-reload

No churn (same schema)

Extension side call

Isolated session key — no main-session churn

Baseline matrix documented in docs/cache-observability-baselines.md. PRs that change context shape must include a cache observability check.
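The trigger table above amounts to comparing three hashes between consecutive calls. A minimal sketch, assuming the hashes are plain numbers and that the first call of a session reports no churn; `PrefixHashes` and `diffPrefix` are hypothetical names, while the part labels match the reasons shown in the table.

```typescript
type PrefixPart = "model" | "systemPrompt" | "tools";
type PrefixHashes = Record<PrefixPart, number>;

// Returns which parts of the prefix anchor changed since the previous call.
function diffPrefix(previous: PrefixHashes | undefined, current: PrefixHashes): PrefixPart[] {
  if (!previous) return []; // assumption: nothing to compare on the first call
  const parts: PrefixPart[] = ["model", "systemPrompt", "tools"];
  return parts.filter((part) => previous[part] !== current[part]);
}
```

An empty result corresponds to the "No churn" rows; an integration toggle would yield `["systemPrompt", "tools"]` because both the prompt text and the tool bundle change.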

Lifecycle

Session Runtime

Each tab = one SessionRuntime with its own Agent, ActionQueue, QueueDisplay, and SessionPersistenceController. Multi-tab layout persisted per workbook.

Tabs

Create, close, rename, reorder, duplicate, restore recently closed (stack of 10)

Lock state

idle → waiting_for_lock → holding_lock — prevents concurrent writes

Association

Session ↔ workbook is write-once (no accidental move on resume)

Restore

Partitions sessions: matching / unlinked / foreign

Queue

FIFO for prompts + commands. Guards against /compact race (agent.streamFn() outside Agent loop). Auto-compaction before each prompt.
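A FIFO queue of this kind is often implemented as a promise chain, where each task starts only after the previous one settles. The sketch below shows that general technique; `FifoQueue` is a hypothetical name and this is not the project's ActionQueue or WorkbookCoordinator implementation.

```typescript
class FifoQueue {
  private tail: Promise<void> = Promise.resolve();

  // Runs `task` after every previously queued task has settled.
  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow errors on the chain so one failed task does not block later ones;
    // the caller still sees the failure through `result`.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

const fifoOrder: string[] = [];
const fifoQueue = new FifoQueue();
const fifoSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// The slow task is queued first, so it finishes first despite taking longer.
const fifoDone = Promise.all([
  fifoQueue.run(async () => {
    await fifoSleep(20);
    fifoOrder.push("slow write");
  }),
  fifoQueue.run(async () => {
    fifoOrder.push("fast write");
  }),
]);
```

Serializing through a single chain is what prevents two writes from interleaving, at the cost of a slow task delaying everything queued behind it.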

Boot Sequence

bootstrap.ts → initTaskpane() — 7 phases with timeouts and fallbacks for non-Excel environments (dev mode).

Global Patches

Render loading UI
process.env shim, fetch interceptor (CORS proxy), model-selector patch
Office.onReady() (3 s fallback for dev without Excel)
Call initTaskpane() (60 s hard timeout)

Storage & Migrations

SettingsStore init + proxy default seed
Legacy migrations: web-search API keys → ConnectionStore, MCP tokens → ConnectionStore
Remote proxy security warning

Auth & Credentials

Provider discovery (5 built-in + custom gateways)
Credential restore: pi auth.json (dev) or IndexedDB OAuth (prod), 6 s timeout
Auto-refresh expired tokens
Show welcome login overlay if no providers

Core Infrastructure

ChangeTracker.start() — cell change monitoring
createOfficeStreamFn() — LLM call interceptor
createWorkbookCoordinator() — FIFO write queue

UI Mount & Managers

PiSidebar mount + execution mode controller
ConnectionManager + ExtensionRuntimeManager (reserved tool names from core + integrations)

Runtime Factory

Bridge health probe (async — first turn waits for result)
Runtime factory wiring
Tab layout restore from SettingsStore (or create first runtime)

Extensions & UI Polish

Extension init (5 s timeout, non-blocking)
Keyboard shortcuts, status bar, command menu
Proxy polling (30 s interval)
Disclosure bar + proxy banner
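Several boot phases pair an asynchronous step with a timeout and a fallback (for example the 3 s Office.onReady() fallback). One way to express that pattern is a small helper like the one below; `withFallback` is a hypothetical name and the sketch does not reproduce how bootstrap.ts actually wires its timeouts.

```typescript
// Resolves with the work's value, or with `fallback` if `ms` elapses first.
// Rejections from `work` are passed through unchanged.
function withFallback<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Illustrative use: a host that never reports ready falls back to dev mode.
const neverReady = new Promise<string>(() => {});
const bootHost = withFallback(neverReady, 10, "dev-mode");
```

Clearing the timer on completion keeps a finished phase from leaving a pending timeout behind.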

Safety & Recovery

6 defense boundaries, each enforced independently, with no single point of failure.

SSRF Protection

Proxy target policy: hostname check + DNS-resolved IP check. Blocks loopback (127.0.0.0/8, ::1), RFC1918, link-local. IPv4-mapped-IPv6 handling.

Markdown Safety

Global marked patch: block javascript: / data: / file: links. No <img> from markdown (exfiltration risk → clickable link). Disable $...$ KaTeX (currency collision).

Bridge Security

CORS origin allowlist, Bearer token auth (timing-safe compare), loopback-only binding, 512KB body limit, 256KB output limit, process timeout with SIGKILL.

Extension Sandbox

Inline/remote code runs in sandboxed iframe with CSP. postMessage RPC for all 18 API surfaces. Allowlisted UI tag set. URL validation for HTTP requests.

HTML Safety

No innerHTML for user/tool/session content — DOM APIs or escapeHtml() / escapeAttr(). Queue display explicitly avoids innerHTML.

Secret Redaction

Connection secrets never exposed to UI (presence flags only). Error messages auto-redacted: stored values → ••••. OAuth tokens in IndexedDB.
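The IP portion of the SSRF policy can be sketched as a range check on a resolved address. This is a simplified illustration under stated assumptions: `parseIpv4` and `isBlockedAddress` are hypothetical names, only the dotted form of IPv4-mapped IPv6 is handled, and DNS resolution and hostname checks are out of scope.

```typescript
type Ipv4 = [number, number, number, number];

// Parses a dotted-quad string; returns null for anything else.
function parseIpv4(host: string): Ipv4 | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((part) => (/^\d{1,3}$/.test(part) ? Number(part) : NaN));
  return octets.every((octet) => octet <= 255) ? (octets as Ipv4) : null;
}

// True when the address falls in a range the policy blocks.
function isBlockedAddress(address: string): boolean {
  const lower = address.toLowerCase();
  if (lower === "::1") return true; // IPv6 loopback
  // IPv4-mapped IPv6 in dotted form, e.g. ::ffff:127.0.0.1
  const candidate = lower.startsWith("::ffff:") ? lower.slice("::ffff:".length) : lower;
  const octets = parseIpv4(candidate);
  // Other address forms are not classified by this sketch; a real guard
  // would need to handle them, ideally failing closed.
  if (!octets) return false;
  const [a, b] = octets;
  return (
    a === 127 ||                         // loopback 127.0.0.0/8
    a === 10 ||                          // RFC 1918 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // RFC 1918 172.16.0.0/12
    (a === 192 && b === 168) ||          // RFC 1918 192.168.0.0/16
    (a === 169 && b === 254)             // link-local 169.254.0.0/16
  );
}
```

Checking the resolved IP in addition to the hostname matters because a public-looking hostname can resolve to a private address.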

Mutation Finalization

Every mutation tool calls finalizeMutationOperation():

Append audit entry

WorkbookChangeAuditLog: persistent, 500-entry rotating, tagged with execution mode + workbook identity

Recovery snapshot (optional)

Deep-clone state → stamp result.details.recovery → dispatch created event or append unavailable note

Change explanation

Deterministic from audit metadata (no LLM call), bounded 420 chars, up to 8 citations

5 Snapshot Kinds

Range

Values + formulas grid capture

Format

20 properties via boolean mask

Structure

Rows, columns, sheets + data

Cond. Format

8 rule types, 20 icon styles

Comment

Full thread + replies

Restore creates an inverse snapshot before applying — enables "undo the undo". save-boundary-monitor polls Workbook.isDirty every 4s, clears checkpoints on user save.
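The "undo the undo" behavior follows from capturing the current state before every restore. A minimal sketch over an in-memory grid, assuming only range-style snapshots; `RecoveryLog`, `Grid`, and `Snapshot` are hypothetical names and the real snapshot kinds carry more than values.

```typescript
type Grid = string[][];
type Snapshot = { id: number; address: string; values: Grid };

class RecoveryLog {
  readonly snapshots: Snapshot[] = [];
  private nextId = 1;
  private readonly sheet: Map<string, Grid>;

  constructor(sheet: Map<string, Grid>) {
    this.sheet = sheet;
  }

  // Deep-copies the current values at `address` into a new snapshot.
  capture(address: string): Snapshot {
    const current = this.sheet.get(address) ?? [];
    const snapshot: Snapshot = {
      id: this.nextId++,
      address,
      values: current.map((row) => row.slice()),
    };
    this.snapshots.push(snapshot);
    return snapshot;
  }

  // Captures the pre-restore state first, then applies the snapshot.
  // Restoring the returned inverse snapshot reverses this restore.
  restore(snapshot: Snapshot): Snapshot {
    const inverse = this.capture(snapshot.address);
    this.sheet.set(snapshot.address, snapshot.values.map((row) => row.slice()));
    return inverse;
  }
}
```

Because the inverse is an ordinary snapshot, no separate redo mechanism is needed; a second restore handles it.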

Manual Full Backup

Office.getFileAsync("compressed") → 1MB chunks → base64 → workspace file. Stored under manual-backups/full-workbook/v1/.

Testing Strategy

Test files: 100

Runner: node:test

Suites

Suite	Files	Coverage
test:models	Fast	Provider priority, family priority, `parseMajorMinor`
test:context	~80 files	Tools, context injection, compaction, change tracker, session persistence, blueprint, recovery
test:security	9 files	SSRF proxy, CORS server, tmux/python bridges, extension source policy, marked safety, OAuth

Build & Config

Vite

HTTPS dev server, pi-auth plugin, stub plugins for heavy Node deps, 11 proxy entries

TypeScript

ES2022 target, strict, bundler moduleResolution, useDefineForClassFields: false for Lit

ESLint

typescript-eslint recommendedTypeChecked, ban ts-ignore, error on floating/misused promises

Manifest

Office TaskPaneApp, ReadWriteDocument permission, Home ribbon button. Dev = localhost:3000, prod = Vercel

Pre-commit

npm run lint + npm run typecheck

CI checks

5 custom scripts — inline style hygiene, dead CSS vars, landing page copy, pi dep lockstep, theme utility overrides

Credits

Pi: by Mario Zechner — the agent framework powering this project. Pi for Excel uses pi-agent-core, pi-ai, and pi-web-ui for the agent loop, LLM abstraction, and session storage.
visual-explainer: by Nico Bailon — the Pi extension used to generate this architecture page.
whimsical.ts: by Armin Ronacher — the rotating "Working…" messages are adapted from his Pi extension, rewritten for a spreadsheet audience.