AI cost approval before merge

Class1

Know what an AI change will cost — the quote, before the invoice.

Most AI cost tools hand you a chart of money already spent. Class1 reads the diff the moment it is proposed and answers the only question that matters before merge: what does this do to next month's bill? P50/P90/P95, model fit, tail drivers, a 0/1/2-year forecast, and a carbon line. Set a budget and the P90 monthly dollar delta can fail the CI check — advisory by default.

See the worked example · See the PR gate

pull request #184 budget gate: fail

120,000 paired Monte Carlo draws

Risk band: real engine output, built from the Python cost_engine demo scenario. The estimate class is illustrative: a new install ships at Class 5 and rises as post-merge actuals accrue.

$0 to $111.8k tail cap

+$3.6k P50 / mo

+$22.4k P90 / mo

+$37.9k P95 / mo

Class 5 screening · −50%/+100%

Retry tail: driver to attack

- model="gpt-4o-mini"
+ model="gpt-4.1"
+ max_tokens=8192
+ retries=5

6,924 priced model rows · effective 2026-06-06

7,389 model metadata rows · drop-nothing spec sheet

34 coding grades · SWE-Bench Verified, deduped

862 test functions · 98 test files

103 Python modules · engine, takeoff, ledger, organism

The category

Not another spend dashboard. A pre-merge cost approval layer.

Dashboards tell you what happened after the money is gone. Class1 answers the decision while the change is still a pull request.

The method is cost engineering: quantity takeoff, rate basis, contingency, escalation, an estimate class declared on the AACE International 18R-97 classification system, and actuals calibration.

Class1 is named for AACE Class 1 — the definitive end of that five-to-one ladder. Every estimate is born Class 5, a screening estimate on assumed risk factors, and earns its way toward Class 1 as measured actuals calibrate it.

Buyer moment

The cost decision belongs in code review because the architecture is still negotiable.

Most AI cost tools start after production telemetry exists. By then the model choice, retry policy, context shape, fallback path, and tool schema architecture have already become habits. Class1 moves the decision upstream, where the team can still cap output, narrow context, lazy-load tools, choose a fit-for-purpose model, or require a budget owner before merge.

The buyer is not buying a prettier dashboard. The buyer is buying a governance moment: a repeatable way to ask whether a software change creates recurring AI spend, whether that spend is justified, and which control lowers the tail without blocking useful engineering work.

That is why the homepage leads with P90 after scale. Expected cost is useful for discussion, but P90 is the number a finance team can approve against. Class1 keeps both numbers visible and separates the modeled risk band from the estimate class, so a report can be useful without pretending to be definitive.

CTO

Will this PR create an unstable AI workload?

See callsites, model swaps, max-token changes, retry/fallback exposure, MCP schema overhead, and the exact controls that reduce the tail.

CFO

What recurring spend should we approve at P90?

Review expected, P50, P90, P95, the P99.5 deep tail, estimate class, budget gate status, and the 0/1/2-year escalation curve.

CEO

Is this feature worth the cost after scale?

Approve, defer, or require controls with one report that engineering and finance can both defend.

Why now

AI spend is leaving the infrastructure budget and entering the product design loop.

Agents multiply hidden work

A single feature can add retries, fallbacks, tool definitions, longer outputs, and larger context windows. The invoice shows the aggregate later; Class1 shows the architectural source before merge.

Cheap per token is not the same as cheap per task

A weak model can look inexpensive until retries, failures, and human rework are counted. Class1 prices the completed task, not only the token.

Governance needs a narrow lever

The policy gate is intentionally simple: positive P90 monthly delta versus a declared budget. That makes the enforcement explainable to engineering and finance.

Calibration becomes the moat

Every post-merge actual becomes an estimate-actual pair. As pairs accrue, the estimate class climbs the AACE ladder and the product learns each team's own retry tails, demand spikes, and model-fit economics. The mechanism is built and tested today; it begins moving the class once a team logs its first real pairs, roughly a month after merge.
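To make the "cheap per token is not cheap per task" point above concrete, here is a minimal Python sketch. It assumes a capped retry loop with independent attempts and a fixed fallback cost when every attempt fails (for example, human rework). The function name, prices, and success rates are hypothetical illustrations, not Class1's engine or its rate basis.

```python
def cost_per_completed_task(cost_per_attempt, success_prob, max_attempts, fallback_cost):
    """Expected USD to finish one task with capped retries plus a fallback.

    Assumes attempts are independent and each succeeds with `success_prob`.
    `fallback_cost` is paid only when all attempts fail (e.g. human rework).
    """
    expected = 0.0
    p_reach = 1.0  # probability that the current attempt is reached at all
    for _ in range(max_attempts):
        expected += p_reach * cost_per_attempt
        p_reach *= 1.0 - success_prob  # next attempt runs only after a failure
    # p_reach is now the probability that every attempt failed
    expected += p_reach * fallback_cost
    return expected


# Hypothetical numbers: a cheap model that fails often vs. a pricier, reliable one.
weak = cost_per_completed_task(0.002, success_prob=0.60, max_attempts=3, fallback_cost=0.50)
strong = cost_per_completed_task(0.010, success_prob=0.97, max_attempts=3, fallback_cost=0.50)

if __name__ == "__main__":
    print(f"weak model:   ${weak:.5f} per completed task")
    print(f"strong model: ${strong:.5f} per completed task")
```

Under these made-up inputs the model that is five times cheaper per attempt ends up more expensive per completed task, because the fallback cost dominates once failures are counted.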

Free to estimate. Paid to enforce.

The wedge is simple: comments are education, blocking checks are governance.

Open core proves the forecast. The Team Gate pilot adds private repo installation and blocking P90 policy gates; the Business Pilot then adds actuals ingestion, a private Blue Book basis, and monthly variance reports.

Apply for the Business Pilot

01 Scan diff: Python, TypeScript, JavaScript callsites

02 Price the delta: Monte Carlo plus structured rates

03 Fail if needed: P90 over budget returns non-zero CI

04 Learn from actuals: Pilot: estimate -> actual -> variance -> calibration (begins once real actuals land)
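Steps 02 and 03 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Class1 cost_engine: the config values, distributions, and function names (`monthly_cost`, `paired_deltas`, `percentiles`, `gate`) are all hypothetical. "Paired" here means each draw reuses the same sampled demand and noise for both the baseline and the proposed config, so the delta reflects the change rather than sampling noise.

```python
import math
import random
import statistics

# Hypothetical configs: placeholder numbers, not real prices or Class1 rates.
BASELINE = {"usd_per_mtok": 0.60, "mean_tokens": 400, "retry_prob": 0.05, "max_retries": 1}
PROPOSED = {"usd_per_mtok": 8.00, "mean_tokens": 900, "retry_prob": 0.15, "max_retries": 5}


def monthly_cost(cfg, calls, token_scale, retry_stress):
    """USD for one simulated month under one config."""
    p = min(1.0, cfg["retry_prob"] * retry_stress)
    # Expected attempts per call when each failure (prob. p) triggers a retry,
    # up to max_retries retries: 1 + p + p^2 + ... + p^max_retries.
    attempts = sum(p ** k for k in range(cfg["max_retries"] + 1))
    tokens = calls * attempts * cfg["mean_tokens"] * token_scale
    return tokens / 1_000_000 * cfg["usd_per_mtok"]


def paired_deltas(n_draws, seed=0):
    """Monthly cost delta (proposed minus baseline) for each paired draw."""
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_draws):
        # Shared random inputs: both configs see the same simulated month.
        calls = rng.lognormvariate(math.log(200_000), 0.5)
        token_scale = rng.lognormvariate(0.0, 0.3)
        retry_stress = rng.lognormvariate(0.0, 0.6)
        base = monthly_cost(BASELINE, calls, token_scale, retry_stress)
        prop = monthly_cost(PROPOSED, calls, token_scale, retry_stress)
        deltas.append(prop - base)
    return deltas


def percentiles(deltas):
    """Return (P50, P90, P95) of the delta distribution."""
    cuts = statistics.quantiles(deltas, n=100)  # 99 cut points; cuts[i] is P(i+1)
    return cuts[49], cuts[89], cuts[94]


def gate(p90_delta, budget, enforce=False):
    """Exit code for CI: non-zero only when enforcing and P90 exceeds budget."""
    over_budget = p90_delta > 0 and p90_delta > budget
    return 1 if (over_budget and enforce) else 0


if __name__ == "__main__":
    p50, p90, p95 = percentiles(paired_deltas(5_000))
    print(f"P50 {p50:,.0f}  P90 {p90:,.0f}  P95 {p95:,.0f} USD/mo")
    # In CI one would pass this code to sys.exit(); here it is only printed.
    print("gate exit code:", gate(p90, budget=1_000.0, enforce=True))
```

With `enforce=False` the gate always returns 0, which mirrors the advisory-by-default behavior described on this page; a team would opt in to blocking by enforcing against its declared budget.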

Explore the system

Every page is grounded in a real module, dataset, or test path.

Product: The PR comment, policy gate, config, and CI workflow.

Platform: The cost engine, takeoff, Blue Book, footprint, and autobuild layers.

Ledger: Frozen pricing, specs, capability, actuals, and cloud basis.

Footprint: Carbon, water, and materials as a second currency.

Trust: Tests, assumptions, honest gaps, and reproducibility discipline.

Pilot: How the product becomes revenue without pretending the open items are done.