A-06 Certified 

The Living Ledger

A legacy Flutter application for tracking real estate wealth. Firebase-coupled, untyped, with a silent data model bug that had been corrupting cash flow calculations since the first version shipped. The founding team knew the numbers felt wrong. They could not prove it. Three Phoenix Runtime agents ran a full archaeological survey, rebuilt the stack from the ground up in React 18 and Dexie.js, produced 51 passing tests, and issued A-06 certification. The app is live. The numbers are right.

Agents run: 3
Tests passing: 51
Firebase deps: 0
Certification: A-06
Stack: React 18
Live: w2t.semanticintent.ai

Pipeline via Phoenix Runtime — 7-Agent Modernization™

What the original app was carrying

The application was a Flutter-based real estate wealth tracker built by a three-person founding team. It tracked investment properties, mortgages, rent income, and a portfolio overview across CAD and USD. It worked well enough to use. It did not work well enough to trust.

The core problem was structural. The original stack had no TypeScript, no unit tests, and no separation between data access and display logic. Firebase provided authentication and storage — a coupling that meant no offline access, no data portability, and no ability to run the application without an active cloud connection. The founding team had chosen Firebase for speed. The debt compounded quietly.

Data

The frequencyFactor bug

Additional property expenses carried a frequencyFactor field intended to normalise recurring payments to monthly equivalents. One-time expenses stored frequencyFactor: 0, and the normalisation step used that value as a divisor. Floating-point division by zero does not throw. It returned NaN, which propagated silently into every cash flow calculation downstream and corrupted the cash flow total shown in the portfolio overview. The team knew the numbers felt off. The bug produced no error and no warning, and there was no test to catch it.

Impact: portfolio cash flow total corrupted — silent, no stack trace.
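The failure mode is easy to reproduce outside the app. The sketch below is TypeScript rather than the original Dart. Both languages use IEEE 754 doubles, so dividing by zero yields Infinity (or NaN for 0 / 0) instead of throwing. The legacyMonthly helper, the amount / frequencyFactor formula, and the sample expenses are hypothetical reconstructions from the description above, not code recovered from the original app.

```typescript
// Hypothetical reconstruction of the legacy failure mode (not the original Dart).
interface LegacyExpense {
  label: string;
  amount: number;
  frequencyFactor: number; // assumed: 12 = annual, 0 = one-time
}

// Assumed legacy normalisation: divide by the stored factor, with no guard.
function legacyMonthly(expense: LegacyExpense): number {
  return expense.amount / expense.frequencyFactor;
}

const expenses: LegacyExpense[] = [
  { label: 'insurance (annual)', amount: 1200, frequencyFactor: 12 },   // 100
  { label: 'roof repair (one-time)', amount: 800, frequencyFactor: 0 }, // Infinity
  { label: 'placeholder (one-time)', amount: 0, frequencyFactor: 0 },   // NaN
];

// Nothing throws: 100 + Infinity is Infinity, Infinity + NaN is NaN,
// and NaN then absorbs every later operation.
const monthlyExpenses = expenses.reduce((sum, e) => sum + legacyMonthly(e), 0);
const monthlyRent = 2500;
const netCashFlow = monthlyRent - monthlyExpenses;

console.log(monthlyExpenses, netCashFlow); // NaN NaN
```

A nonzero one-time amount on its own would push the expense total to Infinity and the net figure to -Infinity. A zero amount produces NaN. Neither case throws, so the bug could only surface as numbers that looked wrong.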
Types

No type surface to reason from

The original codebase carried no TypeScript types. Property data, borrowing records, and expense entries were passed as untyped maps. The archaeological survey found that the original cascade delete addressed the additional expenses store by the string 'additionalProperties', while the records actually lived in the store the new system types as db.additionalExpenses. It was a category error that stays invisible without types. The absence of a type surface meant every refactoring decision had to be made against runtime behaviour rather than schema guarantees.

Impact: cascade delete silently targeting wrong store — data integrity risk.
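A typed store name turns this class of mistake into a compile error. The sketch uses an in-memory stand-in rather than Dexie so it runs anywhere. The StoreName union, the record shape, and cascadeDeleteProperty are hypothetical names chosen for illustration.

```typescript
// Illustrative only: store names and record shapes are hypothetical.
type StoreName = 'properties' | 'borrowings' | 'additionalExpenses';

interface StoredRecord {
  id: string;
  propertyId?: string; // set on child records, absent on properties
}

// In-memory stand-in for the real database tables.
type Db = Record<StoreName, Map<string, StoredRecord>>;

function createDb(): Db {
  return { properties: new Map(), borrowings: new Map(), additionalExpenses: new Map() };
}

// Child stores to clean up when a property is deleted. Because the array is
// typed as StoreName[], a typo such as 'additionalProperties' fails to compile.
const CHILD_STORES: readonly StoreName[] = ['borrowings', 'additionalExpenses'];

function cascadeDeleteProperty(db: Db, propertyId: string): void {
  for (const store of CHILD_STORES) {
    const table = db[store];
    // Deleting entries from a Map during forEach is safe in JavaScript.
    table.forEach((record, id) => {
      if (record.propertyId === propertyId) table.delete(id);
    });
  }
  db.properties.delete(propertyId);
}

// const wrong: StoreName = 'additionalProperties';
// ^ compile error: not assignable to type 'StoreName'

const db = createDb();
db.properties.set('p1', { id: 'p1' });
db.borrowings.set('b1', { id: 'b1', propertyId: 'p1' });
db.additionalExpenses.set('e1', { id: 'e1', propertyId: 'p1' });
db.additionalExpenses.set('e2', { id: 'e2', propertyId: 'p2' });
cascadeDeleteProperty(db, 'p1'); // removes p1, b1, e1; leaves e2
```

Dexie offers the same protection when each table is declared as a typed property on the database class. The compiler then checks db.additionalExpenses, whereas a string passed to db.table() is not checked.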
Arch

Firebase coupling with no migration path

The original application required Firebase authentication to function. All user data lived in Firestore. There was no export mechanism, no local fallback, and no way to use the application without an active cloud session. For a personal finance tool handling investment property data, this was a structural misfit: the data is sensitive, the use is personal, and the dependency on an external service introduced both a privacy surface and a single point of failure the user could not control.

Impact: no offline access, no data portability, cloud dependency on sensitive personal data.

“The numbers feel wrong but I can’t prove it.”

— Founding team, pre-modernisation

What the original app actually did

It is worth being precise here. The original Flutter application was not a sketch. It was a fully conceived real estate investment platform with a domain model that a competent team had clearly thought through. Fourteen screens. Thirteen entities. Multi-currency support across CAD and USD with a user-controlled exchange rate. Cross-property aggregation. Three separate computed view models that never touched the database. The founding team understood their domain.

The data model was structurally sophisticated. Property was the central entity, with three embedded sub-documents — PropertyDetails (purchase price, market value, property type), PropertyRenting (rent amount, term, renewal tracking, increase limit), and PropertyExpenses (taxes, maintenance, management fee, insurance) — plus two sub-collections: Borrowing and AdditionalExpense. The Money value object carried amount, currency, and a frequency factor for payment normalisation. OverviewModel, ExpenseModel, and MarketValueModel were pure computed views, never persisted. The design was clean. The implementation had gaps.

Property: central entity with address, building type, 3 sub-documents, and 2 sub-collections
Borrowing: per-property loan or mortgage with institution, rate, term, renewal date, and payment frequency
AdditionalExpense: free-form recurring or one-time expenses with frequency normalisation
Money: value object holding amount, currency, and frequency factor; CAD/USD conversion inline
OverviewModel: computed market value, equity, borrowings, funds, and cash flow; never persisted
MarketValueModel: computed per-property LTV% against total borrowings; never persisted

The portfolio overview aggregated across all properties in real time: total market value, total equity (value minus borrowings), total outstanding debt, total funds, and a monthly net cash flow figure. The LTV calculation divided total borrowings by market value for each property. The currency switcher converted every figure from CAD to USD using the stored exchange rate. The borrowings portfolio screen grouped loans across properties, with a toggle. These are not trivial features. The domain model was correct. The infrastructure beneath it was the problem.
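Because OverviewModel was a pure computed view, its core arithmetic fits in a few functions. The sketch rests on several assumptions. All per-property figures are already in CAD and already normalised to monthly. The field names are hypothetical. usdPerCad (how many US dollars one Canadian dollar buys) is an assumed direction for the stored exchange rate, not the app's documented convention.

```typescript
// Sketch of the overview as a pure computed view; field names are hypothetical.
interface PropertySummary {
  marketValue: number;      // CAD
  totalBorrowed: number;    // CAD, sum of this property's borrowings
  monthlyIncome: number;    // CAD, normalised to monthly
  monthlyOutgoings: number; // CAD, normalised to monthly
}

interface Overview {
  marketValue: number;
  equity: number;
  borrowings: number;
  funds: number;
  monthlyCashFlow: number;
}

function computeOverview(properties: readonly PropertySummary[], funds: number): Overview {
  let marketValue = 0;
  let borrowings = 0;
  let monthlyCashFlow = 0;
  for (const p of properties) {
    marketValue += p.marketValue;
    borrowings += p.totalBorrowed;
    monthlyCashFlow += p.monthlyIncome - p.monthlyOutgoings;
  }
  // Equity is market value minus borrowings.
  return { marketValue, equity: marketValue - borrowings, borrowings, funds, monthlyCashFlow };
}

// Assumed convention: usdPerCad = US dollars per one Canadian dollar.
function cadToUsd(amountCad: number, usdPerCad: number): number {
  return amountCad * usdPerCad;
}

// The currency switcher converts every figure of an already-computed overview.
function overviewInUsd(o: Overview, usdPerCad: number): Overview {
  return {
    marketValue: cadToUsd(o.marketValue, usdPerCad),
    equity: cadToUsd(o.equity, usdPerCad),
    borrowings: cadToUsd(o.borrowings, usdPerCad),
    funds: cadToUsd(o.funds, usdPerCad),
    monthlyCashFlow: cadToUsd(o.monthlyCashFlow, usdPerCad),
  };
}
```

Keeping the view pure means the currency switch is a single mapping over a computed result rather than a second pass over stored data.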

What makes this more remarkable: the original application was not a weekend experiment. It was a full-fledged prototype, conceived and shipped in six weeks, with production-grade deployment infrastructure from day one. Codemagic handled dual CI/CD — every commit to master triggered parallel iOS and Android builds, pushing to TestFlight for beta testers and distributing a universal APK for Android. Version 1.1.19 reached Apple review. The team built something real, shipped it properly, and kept iterating. The decision to modernise was not a rescue — it was the next chapter of a project that had always been serious.

The design was right. The stack was wrong for the use case. Phoenix Runtime’s job was to preserve the domain model and replace the infrastructure — not to redesign the product.

— A-04 Archaeological Survey finding

The artifact trail — how Phoenix speaks

Every Phoenix Runtime engagement produces a structured directory of .sil files — plain text, human-readable, agent-parseable, Git-diffable. EMBER (the Semantic Intent Language) is the grammar. The files are the permanent record of what each agent found, decided, and produced. Not documentation written after the fact. Artifacts written as the work happens, in a format every downstream agent can read without a schema.

For wealth2track, the full artifact trail looked like this:

wealth2track/
├── _mission.sil                              ← A-00: heads-up brief every agent reads first
│
├── episodes/
│   └── ep-001.sil                            ← stack pivot declaration: Flutter → React 18;
│                                               compresses A-00–A-03 into A-04, affects A-05/A-06
│
├── signals/                                    ← A-04: archaeological findings, one per workflow
│   ├── property.manage-properties.sil
│   ├── property.manage-borrowings.sil
│   ├── property.track-rent.sil
│   ├── property.track-expenses.sil          ← frequencyFactor bug located here
│   ├── portfolio.view-portfolio-overview.sil
│   ├── portfolio.view-market-values.sil
│   ├── portfolio.view-borrowings-portfolio.sil
│   ├── funds.manage-funds.sil
│   ├── background.calculate-metrics.sil
│   ├── auth.authenticate.sil
│   ├── auth.onboard.sil
│   ├── auth.manage-profile.sil
│   ├── settings.configure-settings.sil
│   └── data.delete-data.sil               ← 14 signal files
│
├── workflows/                                  ← A-01: server-side traces, ASCII flow per workflow
│   └── ... 14 workflow traces
│
├── screens/                                    ← A-02: UI archaeologist, ASCII wireframes per screen
│   └── ... 15 screen wireframes
│
├── specs/                                      ← A-03: semantic intent specs — what each workflow
│   └── ... 14 intent specs                   was actually trying to accomplish
│
├── architecture/                               ← A-04: stack recommendation + layer blueprints
│   ├── system.overview.sil
│   ├── blueprint.frontend.sil
│   ├── blueprint.data.sil
│   └── blueprint.electron.sil
│
├── build/                                      ← A-05: six build passes, each a named .sil gate
│   ├── pass.1.sil                               UI shell — all screens wired, server calls stubbed
│   ├── pass.2.sil                               API layer — real endpoints, real business rules
│   ├── pass.3.sil                               Data layer — real queries, real schema
│   ├── pass.4.sil                               normaliseToMonthly() + calculateLtv() extracted
│   └── pass.5.sil                               51 tests passing
│
├── certification/                              ← A-06: per-workflow cert + summary
│   ├── cert.property.manage-properties.sil
│   ├── cert.property.track-expenses.sil
│   ├── ... 13 workflow certs
│   └── cert.summary.sil                      ← the certification document
│
└── .phoenix/
    └── state.json                            ← machine state: agent IDs, status, confidence

Each .sil file is a named artifact with a type, version, and structured body. The mission brief tells every agent what system it is working on. The episode file records what changed mid-engagement and why. The signal files carry the archaeological findings. The certification files are the per-workflow verdicts. The whole tree is a complete audit trail of how the modernised system came to be — not generated after the fact, written as the pipeline ran.

This is what separates a Phoenix engagement from a code rewrite. The rewrite produces a new codebase. The pipeline produces a new codebase and a permanent, legible record of every decision that shaped it.

The pipeline

Phoenix Runtime deploys a sequential seven-agent pipeline. Each agent has a defined scope and produces structured outputs that downstream agents consume. All seven stages were accounted for. A-00 through A-03 (intent setting, workflow mapping, screen archaeology, and semantic spec) were compressed into a single A-04 pass. An episode declaration recorded this stack pivot decision as first-class pipeline state. The archaeological work was still done; it ran as one survey rather than four discrete stages. A-05 ran twice (initial build, then A-06 findings incorporated). A-06 issued certification.

Pipeline: A-00 Intent → A-01–03 Compressed → A-04 Survey → A-05 Build ×2 → A-06 Certified
A-04

Archaeological Survey — what was there

A-04 catalogued the full original codebase: data model, Firebase schema, screen inventory, business logic distribution, and known bugs. The survey located the frequencyFactor root cause, identified the typed store name mismatch, documented the LTV zero-guard gap, and confirmed that no unit test surface existed anywhere in the project. The survey output became the A-05 work brief.

Output: full bug inventory, data model map, zero test surface confirmed.
A-05

Engineering — two passes, two scopes

A-05 ran twice. Pass 4 extracted normaliseToMonthly(amount, frequency) as the single source of truth for payment frequency normalisation — resolving the frequencyFactor bug with an explicit 0 return for one-time expenses. It extracted calculateLtv(totalBorrowed, marketValue) as a pure function with a zero-division guard. Both functions were placed in dedicated modules, testable in isolation. Pass 5 built the test suite: 51 tests across 4 files using Vitest and fake-indexeddb, covering normalisation, LTV, export/import round-trips, and overview calculations. All 51 passed before A-06 ran.

Output: normaliseToMonthly(), calculateLtv(), 51 tests passing, 0 failures.
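The two extracted functions can be sketched as follows. The six recurring frequency names come from the A-05 build notes. The exact string literals, the 'OneTime' label, and the choice to return LTV as a percentage are assumptions, not the shipped source.

```typescript
// Minimal sketch; literal values and the 'OneTime' label are assumptions.
type Frequency =
  | 'Monthly' | 'Bi-weekly' | 'Weekly' | 'Annual' | 'Quarterly' | 'Semi-monthly'
  | 'OneTime';

// Payments per year for each recurring frequency.
const PAYMENTS_PER_YEAR: Record<Exclude<Frequency, 'OneTime'>, number> = {
  Monthly: 12,
  'Bi-weekly': 26,
  Weekly: 52,
  Annual: 1,
  Quarterly: 4,
  'Semi-monthly': 24,
};

function normaliseToMonthly(amount: number, frequency: Frequency): number {
  // One-time expenses are not recurring: contribute 0 instead of dividing by 0.
  if (frequency === 'OneTime') return 0;
  // Annualise, then spread over 12 months.
  return (amount * PAYMENTS_PER_YEAR[frequency]) / 12;
}

function calculateLtv(totalBorrowed: number, marketValue: number): number {
  // Guard: with no positive market value there is no meaningful ratio.
  if (marketValue <= 0) return 0;
  return (totalBorrowed / marketValue) * 100; // assumed: percentage, not ratio
}

// normaliseToMonthly(1200, 'Annual')  → 100
// normaliseToMonthly(800, 'OneTime')  → 0
// calculateLtv(300000, 400000)        → 75
```

Multiplying by payments per year and dividing by 12 removes the stored divisor altogether. The one-time case becomes an explicit early return instead of a zero that can reach a division.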
A-06

Validator — independent certification

A-06 ran as an independent agent with no prior context from A-04 or A-05. This is a deliberate constraint that prevents the certifier from inheriting the builder’s assumptions. It reviewed the full codebase, ran the test suite, and issued five findings. First, the exchange rate direction convention needed documentation. Second, stub data used frequencyFactor: 1 for annual expenses (should be 12). Third, the useProperties() hook returned properties oldest-first. Fourth, vite.config.ts imported defineConfig from 'vite' rather than 'vitest/config'. Fifth, the profile screen did not validate email addresses. All five were non-blocking. All five were fixed before deployment. Certification issued.

Output: A-06 CERTIFIED — 5 non-blocking findings, all resolved.
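Finding 4 is small but concrete. Vitest documents importing defineConfig from 'vitest/config' so that the test block in vite.config.ts is typed; the defineConfig exported by 'vite' does not know about test. A minimal corrected config might look like this. The jsdom environment and the setup file path are assumptions; only the import line reflects the finding itself.

```typescript
// vite.config.ts: sketch of the corrected import (finding 4).
import { defineConfig } from 'vitest/config';
import react from '@vitejs/plugin-react';

export default defineConfig({
  plugins: [react()],
  test: {
    environment: 'jsdom',                // assumed; needed only for DOM/hook tests
    setupFiles: ['./src/test/setup.ts'], // hypothetical path
  },
});

// The hypothetical src/test/setup.ts would register an in-memory IndexedDB:
// import 'fake-indexeddb/auto';
```

The 'fake-indexeddb/auto' entry point installs IndexedDB globals, which lets Dexie-backed code run under Node during tests.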

What the new stack is

The rebuilt application runs on React 18, TypeScript, Vite 5, Tailwind CSS, Zustand, and Dexie.js. All data lives in IndexedDB on the user’s device. There is no authentication layer, no cloud dependency, and no data leaving the browser. The founding team’s original instinct — that a personal finance tool should be personal — is now architectural fact.

The local-first decision shaped every subsequent choice. Dexie.js provides a typed IndexedDB wrapper with reactive hooks. Data is exported as a single JSON file and imported with schema validation. The exchange rate between CAD and USD is user-controlled and stored locally. Address lookup uses Nominatim/OpenStreetMap with a 1000ms debounce and no API key. After the first load, the application works offline, on a plane, without reaching the Cloudflare edge, without a Firebase project, and without an account. Address lookup calls an online service, so it still needs a connection.
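Export and import with schema validation is what makes the local-first data portable. Below is a minimal round-trip sketch, assuming a versioned JSON snapshot. The Snapshot shape, the version field, and both helper names are hypothetical, and a real export would cover every table.

```typescript
// Hypothetical snapshot format, not the app's actual export schema.
interface Snapshot {
  version: 1;
  exportedAt: string;
  properties: { id: string; address: string }[];
  borrowings: { id: string; propertyId: string; principal: number }[];
}

function exportSnapshot(data: Pick<Snapshot, 'properties' | 'borrowings'>): string {
  const snapshot: Snapshot = { version: 1, exportedAt: new Date().toISOString(), ...data };
  return JSON.stringify(snapshot, null, 2);
}

function isObjectArray(value: unknown): value is Record<string, unknown>[] {
  return Array.isArray(value) && value.every((v) => typeof v === 'object' && v !== null);
}

// Validate before anything touches the local database: reject, don't repair.
function importSnapshot(json: string): Snapshot {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== 'object' || parsed === null) throw new Error('Not a JSON object');
  const { version, exportedAt, properties, borrowings } = parsed as Record<string, unknown>;
  if (version !== 1) throw new Error(`Unsupported version: ${String(version)}`);
  if (typeof exportedAt !== 'string') throw new Error('Missing exportedAt');
  if (!isObjectArray(properties) || !isObjectArray(borrowings)) {
    throw new Error('properties and borrowings must be arrays of objects');
  }
  for (const p of properties) {
    if (typeof p.id !== 'string' || typeof p.address !== 'string') {
      throw new Error('Invalid property record');
    }
  }
  const propertyIds = new Set(properties.map((p) => String(p.id)));
  for (const b of borrowings) {
    if (typeof b.id !== 'string' || typeof b.propertyId !== 'string' || typeof b.principal !== 'number') {
      throw new Error('Invalid borrowing record');
    }
    // Referential check: every borrowing must point at an imported property.
    if (!propertyIds.has(String(b.propertyId))) {
      throw new Error(`Borrowing ${String(b.id)} references an unknown property`);
    }
  }
  return parsed as Snapshot;
}
```

Rejecting the whole file on the first bad record, rather than importing whatever parses, keeps a damaged export from silently reintroducing orphaned rows.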

A-06 Certified
Certification verdict, April 2026: The modernisation is structurally sound. The frequencyFactor bug is resolved. LTV calculation is guarded. The test suite provides meaningful regression coverage. The local-first architecture is the correct fit for the use case. Five non-blocking findings were identified and resolved before this certification was issued. The application is production-ready.
Phoenix Trace: pipeline state, the machine-readable record of the run
-- ep-001.sil — Stack Pivot Episode Declaration
EPISODE "stack-pivot"
DATE "2026-04"
AFFECTS A-04, A-05, A-06
COMPRESSES A-00, A-01, A-02, A-03 INTO A-04

-- Original: Flutter + Firebase + Dart
-- Rebuilt:  React 18 + Dexie.js + TypeScript + Vite 5
-- Reason:   Local-first architecture. No auth. No cloud coupling.
--           The data is personal. The stack should reflect that.

NOTE "A-04 reads the original Flutter codebase as archaeological source."
NOTE "A-05 builds against the new stack from scratch."
NOTE "A-06 certifies the new build independently."

-- .phoenix/state.json (abridged)
{
  "project": "wealth2track",
  "agents": [
    { "id": "a-04", "name": "Archaeologist",
      "status": "complete", "confidence": "high",
      "outputCount": 3 },
    { "id": "a-05", "name": "Engineer",
      "status": "complete", "confidence": "high",
      "outputCount": 5 },
    { "id": "a-06", "name": "Validator",
      "status": "certified", "confidence": "high",
      "findings": 5, "blocking": 0 }
  ]
}
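Because the state file is plain JSON, other tooling can read it directly. The sketch below infers its types only from the abridged fields shown above. isCertified and its rule (the a-06 agent reports certified with zero blocking findings) are hypothetical and not part of Phoenix Runtime.

```typescript
// Types inferred from the abridged state.json; optional fields cover keys
// that appear on some agents but not others.
interface AgentState {
  id: string;
  name: string;
  status: string;
  confidence: string;
  outputCount?: number;
  findings?: number;
  blocking?: number;
}

interface PhoenixState {
  project: string;
  agents: AgentState[];
}

// Hypothetical rule: certified when a-06 says so with zero blocking findings.
function isCertified(state: PhoenixState): boolean {
  const validator = state.agents.find((a) => a.id === 'a-06');
  return validator !== undefined && validator.status === 'certified' && validator.blocking === 0;
}

const state: PhoenixState = JSON.parse(`{
  "project": "wealth2track",
  "agents": [
    { "id": "a-04", "name": "Archaeologist", "status": "complete", "confidence": "high", "outputCount": 3 },
    { "id": "a-05", "name": "Engineer", "status": "complete", "confidence": "high", "outputCount": 5 },
    { "id": "a-06", "name": "Validator", "status": "certified", "confidence": "high", "findings": 5, "blocking": 0 }
  ]
}`);

console.log(isCertified(state)); // true
```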
SURVEY A-04 confirmed the frequencyFactor bug as the root cause of the cash flow corruption the founding team had sensed but could not locate (documented in full in Section 01). The survey also found the typed store name bug ('additionalProperties' vs db.additionalExpenses) and confirmed there was no unit test surface anywhere in the project. No tests meant the bugs had no mechanism for detection short of the founding team noticing that the numbers felt wrong — which they did, and which was correct.
BUILD A-05 Pass 4 extracted normaliseToMonthly(amount, frequency) as the single source of truth for payment normalisation. The function handles six frequency types — Monthly, Bi-weekly, Weekly, Annual, Quarterly, Semi-monthly — and returns 0 explicitly for one-time expenses, eliminating the division-by-zero permanently. calculateLtv(totalBorrowed, marketValue) was extracted as a pure testable function with a guard returning 0 when market value is zero or less. Pass 5 built the test suite: 51 tests, 4 files, Vitest, fake-indexeddb. Tests covered normalisation across all six frequencies, LTV edge cases, export/import round-trip fidelity, and overview calculation correctness. All 51 passed before A-06 ran a single check.
CERTIFY A-06 ran without context from A-04 or A-05 — the independence constraint is the point. A certifier that inherits the builder’s mental model cannot find what the builder normalised away. Five findings: (1) exchange rate direction convention undocumented; (2) stub annual expenses used frequencyFactor: 1 instead of 12, producing a stub cash flow total of −$15,180 instead of the correct −$4,455; (3) useProperties() returned oldest-first; (4) vite.config.ts imported defineConfig from 'vite' instead of 'vitest/config'; (5) profile screen accepted any string as email with no validation. Zero blocking. All five resolved. Certification issued.
DEPLOY The application was deployed to Cloudflare Pages at w2t.semanticintent.ai following certification. The build produces a 432KB JS bundle (125KB gzip), a 19KB CSS bundle (4.67KB gzip), and a single _redirects rule for client-side routing. Total upload: 5 files. Deployment time: under 3 seconds. The app loads, runs, and stores data entirely in the browser. No server, no session, no account. The founding team can hand the URL to anyone and the app will work — including offline, after the first load.
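The two stub totals in the CERTIFY entry fit a simple model. Assume, as an inference the engagement does not record, that the whole gap comes from the stub annual expenses, normalised as amount divided by frequencyFactor $f$. Let $A$ be the stub annual total and $N$ the rest of the monthly stub cash flow:

```latex
\begin{aligned}
f = 1:&\quad N - \frac{A}{1} = -15{,}180 \\
f = 12:&\quad N - \frac{A}{12} = -4{,}455 \\
\text{difference:}&\quad A\left(1 - \frac{1}{12}\right) = \frac{11A}{12} = 15{,}180 - 4{,}455 = 10{,}725 \\
&\quad A = 11{,}700, \qquad \frac{A}{12} = 975, \qquad N = -4{,}455 + 975 = -3{,}480
\end{aligned}
```

Under that assumption, a factor of 1 counted $11,700 of annual expenses as a monthly outflow where $975 belonged, overstating them twelvefold.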

What this engagement reveals

Silent bugs are the most expensive kind

Silent math corruption is the hardest class of bug to surface. No error. No warning. No stack trace. Just numbers that feel slightly wrong to anyone who knows the domain well enough to notice. A-04’s archaeological mandate — read everything, document everything, trust nothing — is specifically designed for this. You cannot fix what you cannot locate. You cannot locate what you cannot name.

The test suite is the most important A-05 output

A-05 produced pure normalisation and LTV functions, a corrected data model, a full React rebuild, and 51 tests. The tests are the most durable output. The functions will be refactored. The data model will evolve. The tests will catch both. A codebase with zero tests is a codebase where every change is a bet. A-05’s second pass exists specifically to convert the bet into a guarantee. The 51 tests are not a metric. They are the mechanism by which future changes can be made without fear.

Local-first is the correct architecture for personal finance

Firebase was chosen for speed. The debt was privacy, portability, and offline access. A personal finance application — tracking investment properties, mortgage balances, and net worth — handles data that users should own. Local-first with Dexie.js and IndexedDB means the data lives on the user’s device, exports to a JSON file they control, and requires no account to access. The architecture is not a technical preference. It is the correct answer to the question: whose data is this?

Independent certification finds what builders normalise away

A-06’s five findings were not failures of A-05 — they were findings that A-05 could not structurally produce. The stub annual expense error required looking at the data model with fresh eyes. The email validation gap required asking “what is missing” rather than “what is broken.” The certifier’s independence constraint is not a formality. It is the mechanism that makes certification mean something. A builder who certifies their own work is auditing their own assumptions. That is not an audit.

The numbers were wrong. Now they’re right. The stack is local-first. The tests are there. The app is live.

Phoenix Runtime runs the same pipeline on any legacy codebase. Three agents. One engagement. A-06 certification before deployment.