[&] Ampersand Box — box-and-box, the kernel for AI operating systems

The deployment problem

Guardrails are prompts. Agents have syscalls.

An autonomous agent can spend money, send mail, delete records, and call other agents. The instruction "be careful and don't do anything harmful" lives in a prompt: advisory, in-band, and skippable the moment the context gets long or an injection lands.

The AIOS line of research (Rutgers, COLM 2025) reached the structural conclusion first: unrestricted access to tools and resources leads to harmful allocation, so an agent runtime should be shaped like a kernel — a scheduler, managers, and an access manager that checks rights before any operation. box-and-box is the part that check should have grown into.

01 / EXCESS

Excessive agency

The capability to act outruns the controls on acting. A bad call isn't a wrong sentence — it's a wire transfer, a dropped table, a leaked record.

02 / INVERSION

The hierarchy is inverted

Most stacks rank options by usefulness, then bolt on safety. That asks "best?" before "permitted?", so a useful-but-forbidden action can win.

03 / SILENCE

Verdicts without proof

A confidence score is not a reason. When an agent acts, you deserve an audit trail — what was forbidden, what was infeasible, why this option survived.

Why a kernel

A policy engine is one subsystem.

The tools you already reach for each do one of the kernel's jobs — and only one. A content filter screens words but can't see a tool call. A workflow engine sequences steps but never asks whether a step is permitted. A policy engine answers permit-or-deny in isolation. A planner maximizes a score — and a score with no floor will happily rank a forbidden action first.

None of them compose. None can express the one rule that makes governance safe: a vetoed option is 0̲, and 0̲ outranks nothing. That rule only exists when permission, feasibility, cost, and knowledge live in one algebra.

Approach	Genuinely good at	The rung it is	Can't do alone
Prompt guardrails	steering tone & behavior	none (advice, not a gate)	enforce anything once the context shifts
Content filters (moderation APIs)	screening text in and out	none (the output layer)	govern a tool call (it sees only the sentence describing it)
Workflow engines (orchestration)	sequencing, retries, state	none (orchestration)	judge whether a step is permitted or feasible
Policy engines (OPA · Cedar)	permit / deny authorization	deontic	rank, cost, gate on knowledge, or compose
Planning agents (ReAct · RL · MCTS)	maximizing an objective	axiological	refuse a high-scoring forbidden action
box-and-box	one certified verdict	all eight, composed	(nothing)

Already running OPA or Cedar? Good — that's your deontic evaluator. box-and-box is the other seven questions, plus the algebra that composes them — so a forbidden option can't be out-scored, an infeasible one can't be scheduled, and every verdict ships its reasons. We don't replace your policy engine. We contain it.

The kernel

Ring 0 for agents: eight subsystems, one bridge.

A kernel is not a wrapper around the model — it sits beneath it. box-and-box decomposes governance the way an OS decomposes the machine: each modal subsystem owns one question, and the bridge composes them in a fixed precedence that can never be reordered at runtime.

It's pure arithmetic — monoids, lattices, semirings — so the subsystems compose associatively and a vetoed option is 0̲: it annihilates. No score downstream can bring it back.
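As a rough illustration of what "a vetoed option is 0̲: it annihilates" can mean, the sketch below models scores as a small max-plus style semiring with a tagged zero. The names `Score`, `combine`, and `choose` are hypothetical and are not taken from the box-and-box package; the real algebra may differ.

```typescript
// Hypothetical sketch, not the box-and-box API: a tiny "veto semiring".
// A score is either a finite utility or "VETO", which plays the role of the zero.
type Score = number | "VETO";

// Composition ("times"): VETO absorbs whatever it is combined with.
function combine(a: Score, b: Score): Score {
  if (a === "VETO" || b === "VETO") return "VETO";
  return a + b; // utilities accumulate; 0 is the neutral element
}

// Choice ("plus"): VETO is the identity, so it never beats a survivor.
function choose(a: Score, b: Score): Score {
  if (a === "VETO") return b;
  if (b === "VETO") return a;
  return Math.max(a, b);
}

// A forbidden option with enormous utility still composes to VETO...
const tempting: Score = combine(1e9, "VETO");
// ...so the modest surviving option wins the choice.
const winner: Score = choose(tempting, 6);
```

A tagged value is used instead of the number 0 on purpose: in floating-point arithmetic `0 * Infinity` is `NaN`, so a numeric zero would not annihilate an unbounded utility.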

MMU · feasibility

alethic · can it happen?

Gates on capability and confidence. Below the floor, the action is 0̲ — infeasible, like a write to protected memory.

permissions

deontic · is it allowed?

Obligations, prohibitions, and contrary-to-duty repair. A forbidden action is vetoed; an obligation in force overrides higher utility.

scheduler · priority

axiological · which is best?

Ranks only what survived the floor. Lives in a semiring, so preferences compose without ever resurrecting a vetoed option.

watchdog

temporal · safe over time?

Safety invariants as a runtime shield over the whole trajectory; liveness as a horizon obligation, with escalation when missed.
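The shield-plus-horizon idea can be sketched as a small monitor that sees each proposed next state before it is applied. This is an assumption-laden illustration: `makeWatchdog`, `Outcome`, and the step-counting rule are hypothetical and not part of the package.

```typescript
// Hypothetical sketch of a runtime shield with a liveness horizon.
type Outcome = "ok" | "blocked" | "escalate";

function makeWatchdog<S>(
  invariant: (state: S) => boolean, // safety: must hold at every accepted step
  goal: (state: S) => boolean,      // liveness: must hold at some step before the horizon
  horizon: number                   // number of accepted steps allowed to reach the goal
): (next: S) => Outcome {
  let steps = 0;
  let reached = false;
  // Call with the state an action WOULD produce, before committing it.
  return function check(next: S): Outcome {
    if (!invariant(next)) return "blocked"; // the shield rejects the step outright
    steps += 1;
    if (goal(next)) reached = true;
    if (!reached && steps >= horizon) return "escalate"; // liveness obligation missed
    return "ok";
  };
}
```

Blocked steps are not counted toward the horizon here; that is a design choice of this sketch, not a documented behavior.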

scheduler · quota

resource · can we afford it?

A closed, double-entry economy of tokens and compute — and it prices the kernel's own deliberation: stop and think only when it's worth it.
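A minimal sketch of a closed, double-entry token ledger, assuming a fixed initial allocation and whole-number amounts. `Ledger`, `shouldDeliberate`, and the "expected gain" comparison are hypothetical names and rules chosen for illustration only.

```typescript
// Hypothetical sketch: a closed double-entry ledger for tokens or compute.
interface Entry { debit: string; credit: string; amount: number; }

class Ledger {
  private balances: { [account: string]: number } = {};
  readonly journal: Entry[] = []; // every movement is recorded once, with both sides

  // Funds exist only from the initial allocation; afterwards the total is constant.
  constructor(initial: { [account: string]: number }) {
    for (const account of Object.keys(initial)) {
      this.balances[account] = initial[account];
    }
  }

  balance(account: string): number {
    const v = this.balances[account];
    return typeof v === "number" ? v : 0;
  }

  // Each debit has a matching credit; an overdraft is refused, not clamped.
  transfer(from: string, to: string, amount: number): boolean {
    if (!(amount > 0) || this.balance(from) < amount) return false;
    this.balances[from] = this.balance(from) - amount;
    this.balances[to] = this.balance(to) + amount;
    this.journal.push({ debit: from, credit: to, amount });
    return true;
  }

  total(): number {
    return Object.keys(this.balances).reduce((sum, k) => sum + this.balances[k], 0);
  }
}

// Deliberation is priced like any other spend: think only if the expected
// gain exceeds the cost and the account can cover it (assumed rule).
function shouldDeliberate(ledger: Ledger, agent: string, cost: number, expectedGain: number): boolean {
  return expectedGain > cost && ledger.balance(agent) >= cost;
}
```

Because transfers only move value between accounts, `total()` stays constant after construction, which is the "closed economy" property in miniature.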

knowledge base

epistemic · do we know enough?

Possible-worlds knowledge vs. belief. A known-unknown routes to deliberate instead of a confident guess.
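One common reading of possible-worlds knowledge is "true in every world the agent still considers possible", while belief only needs the most plausible world. The sketch below follows that reading; `World`, `knows`, `believes`, and `route` are hypothetical names, and the package's own model may be richer.

```typescript
// Hypothetical sketch: possible-worlds knowledge vs. belief.
type World = { [fact: string]: boolean };
type Epistemic = "known_true" | "known_false" | "unknown";

// Knowledge: the fact has the same value in every world still considered possible.
function knows(worlds: World[], fact: string): Epistemic {
  if (worlds.length === 0) return "unknown";
  if (worlds.every(w => w[fact] === true)) return "known_true";
  if (worlds.every(w => w[fact] === false)) return "known_false";
  return "unknown";
}

// Belief: the fact holds in the single world judged most plausible.
function believes(mostPlausible: World, fact: string): boolean {
  return mostPlausible[fact] === true;
}

// A known-unknown routes to deliberation rather than acting on belief alone.
function route(worlds: World[], fact: string): "act" | "deliberate" {
  return knows(worlds, fact) === "known_true" ? "act" : "deliberate";
}
```

For example, if the agent believes an invoice is paid but one possible world says otherwise, `believes` returns true while `route` still returns "deliberate".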

IPC · coordination

strategic · who can ensure it?

Coalition ability. An obligation no agent can discharge alone escalates — ought implies can, enforced.
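A small sketch of the "who can ensure it?" question, assuming abilities are plain string labels and that a greedy pass is good enough to find a covering coalition. `AgentProfile`, `whoCanEnsure`, and `shouldEscalate` are hypothetical names.

```typescript
// Hypothetical sketch: can one agent discharge the obligation, or is a coalition needed?
interface AgentProfile { name: string; abilities: string[]; }

type Ensure =
  | { kind: "solo"; agent: string }
  | { kind: "coalition"; agents: string[] }
  | { kind: "nobody" };

function whoCanEnsure(required: string[], agents: AgentProfile[]): Ensure {
  const covers = (abilities: string[]) => required.every(r => abilities.indexOf(r) >= 0);
  for (const a of agents) {
    if (covers(a.abilities)) return { kind: "solo", agent: a.name };
  }
  // Greedy pass: keep each agent that contributes at least one still-missing ability.
  const missing = required.slice();
  const members: string[] = [];
  for (const a of agents) {
    const helps = missing.filter(r => a.abilities.indexOf(r) >= 0);
    if (helps.length > 0) {
      members.push(a.name);
      helps.forEach(r => missing.splice(missing.indexOf(r), 1));
    }
  }
  return missing.length === 0 ? { kind: "coalition", agents: members } : { kind: "nobody" };
}

// "Ought implies can": anything a single agent cannot discharge is escalated.
function shouldEscalate(result: Ensure): boolean {
  return result.kind !== "solo";
}
```

The greedy pass does not guarantee the smallest coalition; it only shows where the solo, coalition, and nobody cases separate.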

ring 0 · protected

reflexive · may the rules change?

The policy can amend itself — tighten, add duties — but the entrenched core is un-writable. Self-modification can never relax the floor.
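The entrenchment rule can be sketched as an amendment function that accepts additions freely but refuses to remove anything in the protected core. `AmendablePolicy`, `Amendment`, and `amend` are hypothetical, and real rules are presumably richer than string labels.

```typescript
// Hypothetical sketch: amendments may tighten the policy, never relax the entrenched core.
interface AmendablePolicy {
  entrenched: string[];   // un-writable prohibitions (the floor)
  prohibitions: string[]; // ordinary prohibitions, open to amendment
}

type Amendment =
  | { op: "add"; rule: string }
  | { op: "remove"; rule: string };

interface AmendResult { ok: boolean; policy: AmendablePolicy; reason: string; }

function amend(policy: AmendablePolicy, change: Amendment): AmendResult {
  // Removing an entrenched rule would weaken the floor, so it is rejected.
  if (change.op === "remove" && policy.entrenched.indexOf(change.rule) >= 0) {
    return { ok: false, policy: policy, reason: '"' + change.rule + '" is entrenched' };
  }
  const rest = policy.prohibitions.filter(r => r !== change.rule);
  const prohibitions = change.op === "add" ? rest.concat([change.rule]) : rest;
  // The entrenched list is carried over untouched by every accepted amendment.
  return {
    ok: true,
    policy: { entrenched: policy.entrenched, prohibitions: prohibitions },
    reason: "accepted",
  };
}
```

A rejected amendment returns the original policy unchanged, so callers can log the reason and continue with the floor intact.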

syscall interface

[&] composition · npm i -g box-and-box

The capability-composition surface you already install: validate → compose → compile agent capabilities to MCP and A2A. It is the same package, with the governance core switched on.

eight subsystems · one bridge · 116 property-tested laws · 2000 trials each

The one idea

Can, then ought, then best.

Almost every agent stack computes desirability first and treats safety as a filter afterward. box-and-box runs the order the way a kernel must:

feasible ▸ permitted ▸ best

The floor doesn't penalize an unsafe option; it annihilates it. A vetoed action becomes 0̲, and 0̲ times anything is 0̲. Drag the utility to the ceiling and a forbidden action stays dead.

box-and-box · govern( safe_reply, tempting_action ) · interactive demo, state as captured

the tempting syscall: agent → tempting_action · utility 14

modal profile: feasible ✓ yes · permitted ✗ forbidden · confident ✓ yes

bridge order: feasible ▸ permitted ▸ confident

safe_reply: 6 · allow

tempting_action: no score, no verdict shown
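The difference between penalizing and annihilating can be sketched in a few lines. The utilities 6 and 14 mirror the demo values; the penalty of 5, the `Candidate` type, and both function names are assumptions made for illustration and are not the package's `govern` API.

```typescript
// Hypothetical sketch of "can, then ought, then best" versus a soft penalty.
interface Candidate {
  name: string;
  utility: number;
  feasible: boolean;
  permitted: boolean;
}

// Soft approach: safety is a penalty subtracted from utility, so enough
// utility can still out-score it. (Feasibility is ignored here for brevity.)
function pickByPenalty(options: Candidate[], penalty: number): string | null {
  if (options.length === 0) return null;
  const score = (o: Candidate) => o.utility - (o.permitted ? 0 : penalty);
  return options.reduce((a, b) => (score(b) > score(a) ? b : a)).name;
}

// Floor approach: infeasible or forbidden options are removed first;
// only the survivors are ranked.
function pickByFloor(options: Candidate[]): string | null {
  const survivors = options.filter(o => o.feasible && o.permitted);
  if (survivors.length === 0) return null;
  return survivors.reduce((a, b) => (b.utility > a.utility ? b : a)).name;
}

const demo: Candidate[] = [
  { name: "safe_reply", utility: 6, feasible: true, permitted: true },
  { name: "tempting_action", utility: 14, feasible: true, permitted: false },
];
const softChoice = pickByPenalty(demo, 5); // 14 - 5 = 9 beats 6
const floorChoice = pickByFloor(demo);     // the forbidden option never reaches ranking
```

With these inputs the penalty version picks `tempting_action` and the floor version picks `safe_reply`; raising the forbidden option's utility changes only the first result.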

The flagship

An AI operating system, on the kernel.

Here is the whole thing: a kernel GenServer that governs every syscall; a DSL, use BoxAndBox.Agent, in which a model declares the actions it may propose; and a demo that denies a forbidden call, stops when the budget is gone, refuses to weaken its own floor, and escalates a goal no agent can reach alone.

Agents are interchangeable — Claude, GPT, Gemini, a local model. To the kernel they're processes making syscalls.

box_and_box_aios.ex Elixir / OTP

The kernel is pure

judge/3 is the bridge as one function — alethic ▸ deontic ▸ resource ▸ epistemic ▸ axiological — returning a certificate, not a boolean.
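The Elixir source of judge/3 is not reproduced in this text, so the following is not a port of it. It is a TypeScript sketch of what a bridge in the stated order might look like; the field names, thresholds, and certificate shape are all assumptions.

```typescript
// Hypothetical sketch of a bridge in the stated order:
// alethic ▸ deontic ▸ resource ▸ epistemic ▸ axiological.
interface Proposal {
  name: string;
  confidence: number; // alethic: how sure we are the action can be done
  forbidden: boolean; // deontic
  cost: number;       // resource
  known: boolean;     // epistemic: do we know enough to act?
  utility: number;    // axiological
}

interface Context { confidenceFloor: number; budget: number; }

type Verdict = "allow" | "deny" | "deliberate";

interface Certificate {
  chosen: string | null;
  verdict: Verdict;
  reasons: string[]; // why each option was eliminated, and why the winner survived
}

function judgeSketch(proposals: Proposal[], ctx: Context): Certificate {
  const reasons: string[] = [];
  const survivors: Proposal[] = [];
  let sawUnknown = false;
  for (const p of proposals) {
    // The first failing check decides; later checks are never consulted.
    if (p.confidence < ctx.confidenceFloor) reasons.push(p.name + ": infeasible (alethic)");
    else if (p.forbidden) reasons.push(p.name + ": forbidden (deontic)");
    else if (p.cost > ctx.budget) reasons.push(p.name + ": unaffordable (resource)");
    else if (!p.known) { reasons.push(p.name + ": not enough known (epistemic)"); sawUnknown = true; }
    else survivors.push(p);
  }
  if (survivors.length === 0) {
    return { chosen: null, verdict: sawUnknown ? "deliberate" : "deny", reasons: reasons };
  }
  // Ranking happens last, and only over what survived every floor.
  const best = survivors.reduce((a, b) => (b.utility > a.utility ? b : a));
  reasons.push(best.name + ": best surviving option (axiological)");
  return { chosen: best.name, verdict: "allow", reasons: reasons };
}
```

Returning the reasons alongside the verdict is what separates a certificate from a boolean: the caller can see which rung eliminated each option.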

Ring 0 holds

An amendment that would weaken an entrenched rule is rejected by the kernel itself. The safety floor isn't policy — it's the machine.

Duty over utility

The cite-obligation forces answer_cited over the higher-scoring answer_raw; the PII call annihilates to 0̲.
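A minimal sketch of how an obligation can override utility, using the action names from the text. The utilities, the tag labels, the `lookup_pii` action name, and the `DutyPolicy` shape are invented for illustration and are not taken from the demo.

```typescript
// Hypothetical sketch of "duty over utility".
interface Action {
  name: string;
  utility: number;
  tags: string[]; // illustrative labels such as "cites_sources" or "touches_pii"
}

interface DutyPolicy {
  prohibited: string[]; // an action carrying any of these tags is vetoed
  obligated: string[];  // a surviving action must carry all of these tags
}

function decide(actions: Action[], policy: DutyPolicy): string | null {
  // Prohibitions first: vetoed actions never reach the ranking.
  const allowed = actions.filter(a => !a.tags.some(t => policy.prohibited.indexOf(t) >= 0));
  // Then obligations in force: only actions that discharge them remain.
  const dutiful = allowed.filter(a => policy.obligated.every(t => a.tags.indexOf(t) >= 0));
  if (dutiful.length === 0) return null;
  // Utility breaks ties only among the dutiful survivors.
  return dutiful.reduce((a, b) => (b.utility > a.utility ? b : a)).name;
}

const choice = decide(
  [
    { name: "answer_raw", utility: 9, tags: [] },
    { name: "answer_cited", utility: 7, tags: ["cites_sources"] },
    { name: "lookup_pii", utility: 20, tags: ["touches_pii"] },
  ],
  { prohibited: ["touches_pii"], obligated: ["cites_sources"] }
);
```

Here `choice` is `answer_cited`: the PII action is vetoed despite the highest utility, and the citation duty rules out `answer_raw`.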

Honest note · this is the reference host. The conformance-tested verdict engine is box-and-box (the npm / edge package, 116 property-tested laws); this Elixir kernel speaks the same arithmetic, so a verdict here and a verdict there are identical.

Portable governance

One substrate. Any model, any language.

Because the kernel is arithmetic, its laws are the specification. Two runtimes in two languages aren't "compatible" — they are provably identical, the way two calculators agree on 2 + 2. The monoids and lattices give you CRDT-like convergence for free.

So you can decide at the edge in TypeScript, re-decide in the Elixir host, and audit the result in CI, and get the same verdict and the same certificate every time. That is what makes agent decisions portable, un-relaxable, and reviewable.

Claude → syscall · GPT-4o → syscall · Gemini → syscall · local 7B → syscall

Conformance

The verdict is the contract.

8 modal subsystems · 1 composing bridge · 116 property-tested laws · 2000 trials per law

Every subsystem is a small algebra with stated laws — associativity, annihilation, the precedence of the bridge, the un-weakenability of the entrenched core — checked against thousands of random cases. Two conformant hosts, in any language, must agree on whether a composition is feasible, permitted, and ensurable. Pass the laws in your language; you're conformant.
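To show what checking a law against random cases can look like, here is a tiny harness over a stand-in veto algebra. It is not the project's test/laws.mjs: the algebra, the law list, the generator, and the seed are assumptions, and only the figure of 2000 trials comes from the text.

```typescript
// Hypothetical sketch of a property-test harness for algebraic laws.
type Score = number | "VETO";

function times(a: Score, b: Score): Score {
  return a === "VETO" || b === "VETO" ? "VETO" : a + b;
}
function plus(a: Score, b: Score): Score {
  if (a === "VETO") return b;
  if (b === "VETO") return a;
  return Math.max(a, b);
}

// Linear congruential generator: deterministic, so a failure can be replayed.
function lcg(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Integers keep the arithmetic exact, so equality checks are meaningful.
function randomScore(rand: () => number): Score {
  return rand() < 0.2 ? "VETO" : Math.floor(rand() * 21) - 10;
}

interface Law { name: string; holds: (a: Score, b: Score, c: Score) => boolean; }

const laws: Law[] = [
  { name: "times is associative", holds: (a, b, c) => times(times(a, b), c) === times(a, times(b, c)) },
  { name: "plus is associative", holds: (a, b, c) => plus(plus(a, b), c) === plus(a, plus(b, c)) },
  { name: "VETO annihilates times", holds: (a) => times("VETO", a) === "VETO" && times(a, "VETO") === "VETO" },
  { name: "VETO is the identity of plus", holds: (a) => plus("VETO", a) === a && plus(a, "VETO") === a },
  { name: "times distributes over plus", holds: (a, b, c) => times(a, plus(b, c)) === plus(times(a, b), times(a, c)) },
];

// Returns the names of laws that failed at least one trial.
function checkLaws(trials: number, seed: number): string[] {
  const rand = lcg(seed);
  const failures: string[] = [];
  for (const law of laws) {
    for (let i = 0; i < trials; i++) {
      if (!law.holds(randomScore(rand), randomScore(rand), randomScore(rand))) {
        failures.push(law.name);
        break;
      }
    }
  }
  return failures;
}
```

Calling `checkLaws(2000, 42)` runs each law against 2000 seeded random triples; a port in another language would restate the same laws over its own implementation.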

Browse all 116 laws →
Run node test/laws.mjs and watch them pass.

Boot it

A CLI, a library, an Elixir host — same verdict.

box-and-box ships as a zero-dependency CLI and an ES-module library: deterministic, no LLM, no network — safe to drop into CI, a pre-commit hook, or a pipeline. Pipe a decision in, get a certified verdict out. Conformance is the 116-law suite, so a port in any language agrees verdict-for-verdict.

box-and-box

# install the kernel — zero-dependency, Node ≥ 18
npm i -g box-and-box

# run the 116-law conformance harness (2000 trials/law)
box-and-box laws

# a real verdict: feasible ▸ permitted ▸ best → certificate JSON
box-and-box govern decision.json     # exit 0 decision · 3 escalation · 1 none
cat decision.json | box-and-box govern --quiet

# …or import the library directly, in any JS runtime
import { govern } from 'box-and-box';

Requirements: Node.js ≥ 18 · macOS, Linux, or Windows · no DB, no cloud, no network. Embed it as a CLI in CI, an imported ES module, or via the Elixir reference host above (which delegates to the verified engine). An MCP adapter for live agent calls is a planned optional surface; the kernel itself is library-first.
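When the CLI runs inside a pipeline, the exit codes in the listing above (0 decision, 3 escalation, 1 none) are what a wrapper script would branch on. The mapping below uses only those three codes; the type name, the function names, and the pass/fail rule are hypothetical.

```typescript
// Hypothetical sketch: interpreting the documented exit codes of `box-and-box govern`.
type GovernOutcome = "decision" | "escalation" | "none" | "unexpected";

function outcomeFromExit(code: number): GovernOutcome {
  switch (code) {
    case 0: return "decision";
    case 3: return "escalation";
    case 1: return "none";
    default: return "unexpected"; // any other code is not described in the text
  }
}

// Assumed CI rule: pass only on a clean decision; everything else needs a human.
function ciShouldPass(code: number): boolean {
  return outcomeFromExit(code) === "decision";
}
```

A CI job could run the CLI, pass its exit status to `outcomeFromExit`, and route an "escalation" to a reviewer instead of simply failing the build.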

Read the reference implementation → · The 116 laws →

The kernel for AI operating systems. Agents can call anything. box-and-box decides what they may.

The kernel runs the [&] portfolio.

Graphonomous

Deliberatic

TickTickClock

GeoFleetic

Delegatic + AgenTroMatic

OpenSentience
