box-and-box · the [&] kernel

The kernel for AI operating systems.

Agents can call anything. box-and-box decides what they may.

One modal core sits under every agent action and answers, in order, the only questions that matter: is it feasible, is it permitted, is it the best option left — is it safe over time, affordable, known enough to attempt, and jointly ensurable. Under any model, in any language, with a proof for every verdict.

u ⊗ 0 = 0  ·  a forbidden action is zero. No utility outranks it.

npm i -g box-and-box  ·  See an AIOS in 190 lines →
Elixir / OTP · MCP-native · runs fully local
[Demo panel: box-and-box · ring 0 · governing one syscall]
process: agent(gpt-4o) wants spend($840)
the bridge: feasible? ✓ · permitted? ✗ forbidden · best?
certificate: DENY (deontic veto) · util 18 ⊗ 0 = 0

Models generate answers.
Kernels govern them.

The deployment problem

Guardrails are prompts. Agents have syscalls.

An autonomous agent can spend money, send mail, delete records, and call other agents. The instruction "be careful and don't do anything harmful" lives in a prompt: advisory, in-band, and skippable the moment the context gets long or an injection lands.

The AIOS line of research (Rutgers, COLM 2025) reached the structural conclusion first: unrestricted access to tools and resources leads to harmful allocation, so an agent runtime should be shaped like a kernel — a scheduler, managers, and an access manager that checks rights before any operation. box-and-box is the part that check should have grown into.

01 / EXCESS

Excessive agency

The capability to act outruns the controls on acting. A bad call isn't a wrong sentence — it's a wire transfer, a dropped table, a leaked record.

02 / INVERSION

The hierarchy is inverted

Most stacks rank options by usefulness, then bolt on safety. That asks best? before permitted? — so a useful-but-forbidden action can win.

03 / SILENCE

Verdicts without proof

A confidence score is not a reason. When an agent acts, you deserve an audit trail — what was forbidden, what was infeasible, why this option survived.

Why a kernel

A policy engine is one subsystem.

The tools you already reach for each do one of the kernel's jobs — and only one. A content filter screens words but can't see a tool call. A workflow engine sequences steps but never asks whether a step is permitted. A policy engine answers permit-or-deny in isolation. A planner maximizes a score — and a score with no floor will happily rank a forbidden action first.

None of them compose. None can express the one rule that makes governance safe: a vetoed option is 0, and outranks nothing. That rule only exists when permission, feasibility, cost, and knowledge live in one algebra.

| Approach | Genuinely good at | The rung it is | Can't do alone |
| --- | --- | --- | --- |
| Prompt guardrails | steering tone & behavior | none (advice, not a gate) | enforce anything once the context shifts |
| Content filters (moderation APIs) | screening text in and out | none (the output layer) | govern a tool call (it sees only the sentence describing it) |
| Workflow engines (orchestration) | sequencing, retries, state | none (orchestration) | judge whether a step is permitted or feasible |
| Policy engines (OPA · Cedar) | permit / deny authorization | deontic | rank, cost, gate on knowledge, or compose |
| Planning agents (ReAct · RL · MCTS) | maximizing an objective | axiological | refuse a high-scoring forbidden action |
| box-and-box | one certified verdict | all eight, composed | |

Already running OPA or Cedar? Good — that's your deontic evaluator. box-and-box is the other seven questions, plus the algebra that composes them — so a forbidden option can't be out-scored, an infeasible one can't be scheduled, and every verdict ships its reasons. We don't replace your policy engine. We contain it.

The kernel

Ring 0 for agents: eight subsystems, one bridge.

A kernel is not a wrapper around the model — it sits beneath it. box-and-box decomposes governance the way an OS decomposes the machine: each modal subsystem owns one question, and the bridge composes them in a fixed precedence that can never be reordered at runtime.

It's pure arithmetic (monoids, lattices, semirings), so the subsystems compose associatively and a vetoed option is 0: it annihilates. No score downstream can bring it back.

MMU · feasibility
alethic · can it happen?
Gates on capability and confidence. Below the floor, the action is 0: infeasible, like a write to protected memory.

permissions
deontic · is it allowed?
Obligations, prohibitions, and contrary-to-duty repair. A forbidden action is vetoed; an obligation in force overrides higher utility.

scheduler · priority
axiological · which is best?
Ranks only what survived the floor. Lives in a semiring, so preferences compose without ever resurrecting a vetoed option.

watchdog
temporal · safe over time?
Safety invariants as a runtime shield over the whole trajectory; liveness as a horizon obligation, with escalation when missed.

scheduler · quota
resource · can we afford it?
A closed, double-entry economy of tokens and compute, and it prices the kernel's own deliberation: stop and think only when it's worth it.

knowledge base
epistemic · do we know enough?
Possible-worlds knowledge vs. belief. A known-unknown routes to deliberate instead of a confident guess.

IPC · coordination
strategic · who can ensure it?
Coalition ability. An obligation no agent can discharge alone escalates: ought implies can, enforced.

ring 0 · protected
reflexive · may the rules change?
The policy can amend itself (tighten, add duties), but the entrenched core is un-writable. Self-modification can never relax the floor.

syscall interface
[&] composition · npm i -g box-and-box
The capability-composition surface you already install: validate → compose → compile agent capabilities to MCP and A2A. The same package, the governance core switched on.

eight subsystems  ·  one bridge  ·  97 property-tested laws  ·  2000 trials each
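
The resource subsystem is described as a closed, double-entry economy. One way to picture that property is a ledger where every transfer is a debit matched by an equal credit, so the total never changes. The sketch below is an illustration under that assumption only; `Ledger`, `transfer`, `total`, and the account names are hypothetical and are not part of the box-and-box package.

```typescript
// Hypothetical ledger shape for illustration; not the box-and-box API.
type Ledger = { [account: string]: number };

// Sum of all balances. In a closed economy this value never changes.
function total(ledger: Ledger): number {
  let sum = 0;
  for (const account of Object.keys(ledger)) sum += ledger[account] || 0;
  return sum;
}

// Double-entry transfer: one debit and one matching credit, or no change at all.
// Returns null when the transfer is refused; the input ledger is never mutated.
function transfer(ledger: Ledger, from: string, to: string, amount: number): Ledger | null {
  const available = ledger[from] || 0;
  if (from === to || !(amount > 0) || amount > available) return null;
  const next: Ledger = { ...ledger };
  next[from] = available - amount;
  next[to] = (ledger[to] || 0) + amount;
  return next;
}

// Illustrative accounts and amounts (assumptions, not values from the page).
const opening: Ledger = { treasury: 1000, agent: 0 };
const funded = transfer(opening, "treasury", "agent", 300);   // accepted
const overdraft = transfer(opening, "agent", "treasury", 50); // refused: agent holds 0
```

Because a refused transfer returns null instead of a partially updated ledger, a caller can treat "unaffordable" as a verdict of its own and never has to repair state.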

The one idea

Can, then ought, then best.

Almost every agent stack computes desirability first and treats safety as a filter afterward. box-and-box runs the order the way a kernel must:

feasible ▸ permitted ▸ best

The floor doesn't penalize an unsafe option; it annihilates it. A vetoed action becomes 0, and 0 times anything is 0. Drag the utility to the ceiling; a forbidden action stays dead. Try it.

[Interactive demo: box-and-box · govern( safe_reply, tempting_action ) · live, in your browser]
tempting_action: utility 14 · feasible ✓ yes · permitted ✗ forbidden · confident ✓ yes
safe_reply: feasible ▸ permitted ▸ confident · 6 · allow
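
To make the difference between penalizing and annihilating concrete, here is a minimal sketch that ranks the same two options both ways. It reuses the utilities shown in the demo (6 for safe_reply, 14 for tempting_action); the `Candidate` type, both function names, and the penalty value are assumptions for illustration, not the box-and-box API.

```typescript
// Hypothetical shape for illustration; not the box-and-box API.
type Candidate = { action: string; utility: number; feasible: boolean; permitted: boolean };

// Score-first stack: a forbidden option is only penalized,
// so a large enough utility can still win.
function pickWithPenalty(candidates: Candidate[], penalty: number): Candidate | null {
  const score = (c: Candidate): number => c.utility - (c.permitted ? 0 : penalty);
  const ranked = candidates.slice().sort((a, b) => score(b) - score(a));
  return ranked[0] || null;
}

// Floor-first: infeasible or forbidden options are removed before ranking,
// so no utility value can bring them back.
function pickWithFloor(candidates: Candidate[]): Candidate | null {
  const survivors = candidates.filter((c) => c.feasible && c.permitted);
  const ranked = survivors.slice().sort((a, b) => b.utility - a.utility);
  return ranked[0] || null;
}

const candidates: Candidate[] = [
  { action: "safe_reply", utility: 6, feasible: true, permitted: true },
  { action: "tempting_action", utility: 14, feasible: true, permitted: false },
];

// Raise the forbidden option's utility as far as you like.
const raised: Candidate[] = candidates.map((c) =>
  c.action === "tempting_action" ? { ...c, utility: 1e9 } : c
);
```

With an assumed penalty of 5, `pickWithPenalty(candidates, 5)` selects tempting_action (14 − 5 beats 6), while `pickWithFloor` selects safe_reply for both `candidates` and `raised`.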

The flagship

An AI operating system, on the kernel.

Here is the whole thing: a kernel GenServer that governs every syscall, a "use BoxAndBox.Agent" DSL where a model declares the actions it may propose, and a demo that denies a forbidden call, stops when the budget is gone, refuses to weaken its own floor, and escalates a goal no agent can reach alone.

Agents are interchangeable — Claude, GPT, Gemini, a local model. To the kernel they're processes making syscalls.

 box_and_box_aios.ex Elixir / OTP

The kernel is pure

judge/3 is the bridge as one function — alethic ▸ deontic ▸ resource ▸ epistemic ▸ axiological — returning a certificate, not a boolean.
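
The Elixir listing is not reproduced here, so the following TypeScript sketch only illustrates the shape the paragraph describes: a pure function that applies the checks in the stated order and returns a certificate with reasons, not a boolean. The argument list, the `Proposal`, `Policy`, and `Certificate` fields, and the reason strings are all assumptions; the real judge/3 may differ.

```typescript
// Hypothetical shapes; the real judge/3 arguments and certificate fields are not shown in the source.
type Proposal = {
  action: string;
  utility: number;
  confidence: number; // alethic: capability/confidence estimate in [0, 1]
  cost: number;       // resource: what the action would spend
  known: boolean;     // epistemic: false marks a known-unknown
};
type Policy = { forbidden: string[]; confidenceFloor: number };
type Certificate = {
  verdict: "allow" | "deny" | "deliberate";
  chosen: string | null;
  reasons: string[]; // one entry per rejected proposal
};

// Pure: same inputs, same certificate. Order: alethic, deontic, resource, epistemic, axiological.
function judge(policy: Policy, budget: number, proposals: Proposal[]): Certificate {
  const reasons: string[] = [];
  const survivors: Proposal[] = [];
  let sawUnknown = false;
  for (const p of proposals) {
    if (p.confidence < policy.confidenceFloor) {
      reasons.push(`${p.action}: infeasible (alethic)`);
    } else if (policy.forbidden.indexOf(p.action) !== -1) {
      reasons.push(`${p.action}: forbidden (deontic)`);
    } else if (p.cost > budget) {
      reasons.push(`${p.action}: unaffordable (resource)`);
    } else if (!p.known) {
      sawUnknown = true;
      reasons.push(`${p.action}: known-unknown (epistemic)`);
    } else {
      survivors.push(p);
    }
  }
  // Axiological step: rank only what survived every earlier gate.
  const best = survivors.slice().sort((a, b) => b.utility - a.utility)[0];
  if (best) return { verdict: "allow", chosen: best.action, reasons };
  return { verdict: sawUnknown ? "deliberate" : "deny", chosen: null, reasons };
}

// Illustrative call: a forbidden high-utility spend next to a permitted reply.
const certificate = judge({ forbidden: ["spend"], confidenceFloor: 0.5 }, 100, [
  { action: "spend", utility: 18, confidence: 0.9, cost: 10, known: true },
  { action: "reply", utility: 6, confidence: 0.9, cost: 1, known: true },
]);
```

In this sketch the certificate carries the reason each rejected proposal was dropped, which is the audit trail the page argues a bare confidence score cannot provide.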

Ring 0 holds

An amendment that would weaken an entrenched rule is rejected by the kernel itself. The safety floor isn't policy — it's the machine.
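
A rough sketch of that rule, assuming the policy is a set of forbidden actions plus duties and that the entrenched core is a list of actions that can never be un-forbidden. `Rules`, `Amendment`, `amend`, and the action names are hypothetical; whether the real kernel allows relaxing non-entrenched rules at all is also an assumption made here.

```typescript
// Hypothetical policy shape for illustration; not the box-and-box API.
type Rules = { entrenched: string[]; forbidden: string[]; duties: string[] };
type Amendment =
  | { kind: "forbid"; action: string } // tightening
  | { kind: "oblige"; duty: string }   // adding a duty
  | { kind: "permit"; action: string }; // relaxation
type Outcome = { accepted: boolean; rules: Rules; reason: string };

// Returns new rules when accepted; returns the rules unchanged when rejected.
function amend(rules: Rules, change: Amendment): Outcome {
  if (change.kind === "forbid") {
    const forbidden = rules.forbidden.concat([change.action]);
    return { accepted: true, rules: { ...rules, forbidden }, reason: "tightened" };
  }
  if (change.kind === "oblige") {
    const duties = rules.duties.concat([change.duty]);
    return { accepted: true, rules: { ...rules, duties }, reason: "duty added" };
  }
  // Remaining case: a relaxation. The entrenched core cannot be written.
  const action = change.action;
  if (rules.entrenched.indexOf(action) !== -1) {
    return { accepted: false, rules, reason: `${action} is entrenched and cannot be relaxed` };
  }
  const forbidden = rules.forbidden.filter((a) => a !== action);
  return { accepted: true, rules: { ...rules, forbidden }, reason: "relaxed a non-entrenched rule" };
}

// Illustrative rules (assumed names).
const baseRules: Rules = {
  entrenched: ["leak_pii"],
  forbidden: ["leak_pii", "spend_over_limit"],
  duties: [],
};
const refused = amend(baseRules, { kind: "permit", action: "leak_pii" });
const tightened = amend(baseRules, { kind: "forbid", action: "send_mail" });
```

Here `refused.accepted` is false and the rules come back unchanged, while `tightened` carries a longer forbidden list; both results leave `baseRules` itself untouched.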

Duty over utility

The cite-obligation forces answer_cited over the higher-scoring answer_raw; the PII call annihilates to 0.
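
As a sketch of how an obligation can outrank utility, the function below first drops forbidden actions, then keeps only options that discharge every duty in force, and only then ranks by utility. The action names answer_cited and answer_raw come from the text above; `send_pii`, the utilities, the `discharges` field, and `choose` itself are assumptions for illustration.

```typescript
// Hypothetical shape for illustration; not the box-and-box API.
type Act = { action: string; utility: number; discharges: string[] };

// Returns the best permitted act that discharges every duty in force,
// or null when no act can (a case the kernel is described as escalating).
function choose(acts: Act[], forbidden: string[], duties: string[]): Act | null {
  const permitted = acts.filter((a) => forbidden.indexOf(a.action) === -1);
  const dutiful = permitted.filter((a) =>
    duties.every((d) => a.discharges.indexOf(d) !== -1)
  );
  const ranked = dutiful.slice().sort((x, y) => y.utility - x.utility);
  return ranked[0] || null;
}

// Illustrative utilities (assumed): the PII call scores highest, answer_raw next.
const acts: Act[] = [
  { action: "send_pii", utility: 20, discharges: [] },
  { action: "answer_raw", utility: 9, discharges: [] },
  { action: "answer_cited", utility: 7, discharges: ["cite"] },
];
const chosen = choose(acts, ["send_pii"], ["cite"]);
```

With the cite duty in force the result is answer_cited even though it has the lowest utility of the three; with no duties in force the same call returns answer_raw, and send_pii is never returned while it is forbidden.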

Honest note · this is the reference host. The conformance-tested verdict engine is box-and-box (the npm / edge package, 97 property-tested laws); this Elixir kernel speaks the same arithmetic, so a verdict here and a verdict there are identical.

Portable governance

One substrate. Any model, any language.

Because the kernel is arithmetic, its laws are the specification. Two runtimes in two languages aren't "compatible" — they are provably identical, the way two calculators agree on 2 + 2. The monoids and lattices give you CRDT-like convergence for free.

So you can decide at the edge in TypeScript and re-decide in the Elixir host and audit it in CI, and get the same verdict and the same certificate every time. That is what makes agent decisions portable, un-relaxable, and reviewable.

Claude → syscall  ·  GPT-4o → syscall  ·  Gemini → syscall  ·  local 7B → syscall

Conformance

The verdict is the contract.

8 modal subsystems  ·  1 composing bridge  ·  97 property-tested laws  ·  2000 trials per law

Every subsystem is a small algebra with stated laws — associativity, annihilation, the precedence of the bridge, the un-weakenability of the entrenched core — checked against thousands of random cases. Two conformant hosts, in any language, must agree on whether a composition is feasible, permitted, and ensurable. Pass the laws in your language; you're conformant.
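
To show what checking a law against random cases can look like, here is a small self-contained property test. It is not the project's harness and does not reproduce any of the 97 laws; the three-valued `Verdict` ordering, the `meet` operation, and the seeded generator are assumptions chosen to illustrate associativity and annihilation with 2000 trials each.

```typescript
// Hypothetical verdict algebra for illustration: "deny" plays the role of zero.
type Verdict = "deny" | "deliberate" | "allow";
const ORDER: Verdict[] = ["deny", "deliberate", "allow"];

// Composition keeps the more restrictive verdict (a lattice meet).
function meet(a: Verdict, b: Verdict): Verdict {
  return ORDER.indexOf(a) <= ORDER.indexOf(b) ? a : b;
}

// Small seeded linear congruential generator so every run sees the same cases.
function makeRng(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Runs one law for a number of trials; returns null on success or a failure message.
function checkLaw(name: string, trials: number, law: (pick: () => Verdict) => boolean): string | null {
  const rng = makeRng(42);
  const pick = (): Verdict => ORDER[Math.floor(rng() * ORDER.length)] || "deny";
  for (let i = 0; i < trials; i++) {
    if (!law(pick)) return `${name} failed on trial ${i}`;
  }
  return null;
}

const failures = [
  checkLaw("associativity", 2000, (pick) => {
    const a = pick(), b = pick(), c = pick();
    return meet(meet(a, b), c) === meet(a, meet(b, c));
  }),
  checkLaw("annihilation", 2000, (pick) => {
    const a = pick();
    return meet(a, "deny") === "deny" && meet("deny", a) === "deny";
  }),
].filter((f) => f !== null);
```

An empty `failures` list means both laws held on every generated case. A port in another language would pass by reproducing the same laws over its own implementation, which is the sense in which the page treats the laws as the specification.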

Browse all 97 laws → run node test/laws.mjs and watch them pass

Userland

The kernel runs the [&] portfolio.

box-and-box is ring 0; the rest of the stack is userland that runs on it. PULSE declares the scheduler loops, PRISM is the profiler, and each cognitive primitive is a long-running service the kernel governs.

Boot it

A CLI, a library, an Elixir host — same verdict.

box-and-box ships as a zero-dependency CLI and an ES-module library: deterministic, no LLM, no network — safe to drop into CI, a pre-commit hook, or a pipeline. Pipe a decision in, get a certified verdict out. Conformance is the 97-law suite, so a port in any language agrees verdict-for-verdict.

  box-and-box
# install the kernel — zero-dependency, Node ≥ 18
npm i -g box-and-box

# run the 97-law conformance harness (2000 trials/law)
box-and-box laws

# a real verdict: feasible ▸ permitted ▸ best → certificate JSON
box-and-box govern decision.json     # exit 0 decision · 3 escalation · 1 none
cat decision.json | box-and-box govern --quiet

# …or import the library directly, in any JS runtime
import { govern } from 'box-and-box';

Requirements Node.js ≥ 18 · macOS, Linux, or Windows · no DB, no cloud, no network.  ·  Embed it as a CLI in CI, an imported ES module, or via the Elixir reference host above (which delegates to the verified engine).  ·  An MCP adapter for live agent calls is a planned optional surface — the kernel itself is library-first.