box-and-box · the [&] kernel
One modal core sits under every agent action and answers, in order, the only questions that matter: is it feasible, is it permitted, is it the best option left — is it safe over time, affordable, known enough to attempt, and jointly ensurable. Under any model, in any language, with a proof for every verdict.
u ⊗ 0̲ = 0̲ · a forbidden action is zero. No utility outranks it.
Models generate answers.
Kernels govern them.
The deployment problem
An autonomous agent can spend money, send mail, delete records, and call other agents. The instruction "be careful and don't do anything harmful" lives in a prompt: advisory, in-band, and skippable the moment the context gets long or an injection lands.
The AIOS line of research (Rutgers, COLM 2025) reached the structural conclusion first: unrestricted access to tools and resources leads to harmful allocation, so an agent runtime should be shaped like a kernel — a scheduler, managers, and an access manager that checks rights before any operation. box-and-box is the part that check should have grown into.
The capability to act outruns the controls on acting. A bad call isn't a wrong sentence — it's a wire transfer, a dropped table, a leaked record.
Most stacks rank options by usefulness, then bolt on safety. That asks "best?" before "permitted?", so a useful-but-forbidden action can win.
A confidence score is not a reason. When an agent acts, you deserve an audit trail — what was forbidden, what was infeasible, why this option survived.
Why a kernel
The tools you already reach for each do one of the kernel's jobs — and only one. A content filter screens words but can't see a tool call. A workflow engine sequences steps but never asks whether a step is permitted. A policy engine answers permit-or-deny in isolation. A planner maximizes a score — and a score with no floor will happily rank a forbidden action first.
None of them compose. None can express the one rule that makes governance safe: a vetoed option is 0̲, and 0̲ outranks nothing. That rule only exists when permission, feasibility, cost, and knowledge live in one algebra.
| Approach | Genuinely good at | The rung it is | Can't do alone |
|---|---|---|---|
| Prompt guardrails | steering tone & behavior | none (advice, not a gate) | enforce anything once the context shifts |
| Content filters (moderation APIs) | screening text in and out | none (the output layer) | govern a tool call; it sees only the sentence describing it |
| Workflow engines (orchestration) | sequencing, retries, state | none (orchestration) | judge whether a step is permitted or feasible |
| Policy engines (OPA · Cedar) | permit / deny authorization | deontic | rank, cost, gate on knowledge, or compose |
| Planning agents (ReAct · RL · MCTS) | maximizing an objective | axiological | refuse a high-scoring forbidden action |
| box-and-box | one certified verdict | all eight, composed | n/a |
The kernel
A kernel is not a wrapper around the model — it sits beneath it. box-and-box decomposes governance the way an OS decomposes the machine: each modal subsystem owns one question, and the bridge composes them in a fixed precedence that can never be reordered at runtime.
It's pure arithmetic — monoids, lattices, semirings — so the subsystems compose associatively and a vetoed option is 0̲: it annihilates. No score downstream can bring it back.
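A minimal sketch of the annihilation rule, assuming utilities are plain numbers combined by multiplication and the vetoed value 0̲ is modelled as a distinct sentinel. The names `ZERO` and `otimes` are illustrative only and are not taken from the box-and-box package.

```javascript
// Hypothetical model: a value is either a number (a utility) or the
// sentinel ZERO, which stands in for the vetoed value.
const ZERO = Symbol('ZERO');

// otimes combines two values; ZERO absorbs whatever it meets.
function otimes(a, b) {
  if (a === ZERO || b === ZERO) return ZERO;
  return a * b;
}

// However large the utility, combining it with a veto yields the veto.
console.log(otimes(1e12, ZERO) === ZERO); // true

// Grouping does not matter for these small integers.
console.log(otimes(otimes(2, 3), 4) === otimes(2, otimes(3, 4))); // true
```

Using a sentinel rather than the number 0 keeps "vetoed" distinct from "worth nothing", which is the distinction the text draws.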
The one idea
Almost every agent stack computes desirability first and treats safety as a filter afterward. box-and-box runs the order the way a kernel must:
feasible ▸ permitted ▸ best
The floor doesn't penalize an unsafe option; it annihilates it. A vetoed action becomes 0̲, and 0̲ times anything is 0̲. Push the utility to the ceiling and a forbidden action stays dead.
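To make the difference between penalizing and annihilating concrete, here is a small sketch. The option shape (`name`, `feasible`, `permitted`, `utility`) and both function names are assumptions made for illustration; they are not the package's API.

```javascript
// Hypothetical options an agent might propose.
const options = [
  { name: 'wire_transfer', feasible: true,  permitted: false, utility: 14 },
  { name: 'draft_invoice', feasible: true,  permitted: true,  utility: 6 },
  { name: 'teleport_cash', feasible: false, permitted: true,  utility: 9 },
];

// Penalty style: a forbidden option loses points but stays in the race.
function pickWithPenalty(opts, penalty) {
  const score = (o) => o.utility - (o.permitted ? 0 : penalty);
  return opts
    .filter((o) => o.feasible)
    .reduce((best, o) => (score(o) > score(best) ? o : best)).name;
}

// Floor style: feasible, then permitted, then best among the survivors.
function pickWithFloor(opts) {
  const survivors = opts.filter((o) => o.feasible).filter((o) => o.permitted);
  if (survivors.length === 0) return null; // nothing left to choose
  return survivors.reduce((best, o) => (o.utility > best.utility ? o : best)).name;
}

console.log(pickWithPenalty(options, 5)); // wire_transfer
console.log(pickWithFloor(options));      // draft_invoice
```

With a penalty of 5 the forbidden transfer still scores 9 and beats the permitted option at 6. Under the floor it never reaches the ranking step, whatever its utility.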
The flagship
Here is the whole thing: a kernel GenServer that governs every syscall, a `use BoxAndBox.Agent` DSL where a model declares the actions it may propose, and a demo that denies a forbidden call, stops when the budget is gone, refuses to weaken its own floor, and escalates a goal no agent can reach alone.
Agents are interchangeable — Claude, GPT, Gemini, a local model. To the kernel they're processes making syscalls.
judge/3 is the bridge as one function — alethic ▸ deontic ▸ resource ▸ epistemic ▸ axiological — returning a certificate, not a boolean.
An amendment that would weaken an entrenched rule is rejected by the kernel itself. The safety floor isn't policy — it's the machine.
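One way to picture that rejection, as a sketch only: assume each rule forbids a single action and carries an `entrenched` flag, and that an amendment either adds or repeals a prohibition. The rule shape and the `amend` function are hypothetical, not the Elixir host's actual interface.

```javascript
// Hypothetical rule set: each rule forbids one action; some are entrenched.
const rules = [
  { forbids: 'export_pii', entrenched: true },
  { forbids: 'send_bulk_mail', entrenched: false },
];

// Adding a prohibition only tightens the floor, so it is accepted.
// Repealing an entrenched rule would weaken the floor, so it is rejected.
function amend(current, amendment) {
  if (amendment.op === 'add') {
    const added = { forbids: amendment.action, entrenched: false };
    return { accepted: true, rules: [...current, added] };
  }
  if (amendment.op !== 'repeal') {
    return { accepted: false, rules: current, reason: 'unknown operation' };
  }
  const target = current.find((r) => r.forbids === amendment.action);
  if (target && target.entrenched) {
    return { accepted: false, rules: current, reason: 'entrenched rule cannot be weakened' };
  }
  return { accepted: true, rules: current.filter((r) => r.forbids !== amendment.action) };
}

console.log(amend(rules, { op: 'repeal', action: 'export_pii' }).accepted);     // false
console.log(amend(rules, { op: 'repeal', action: 'send_bulk_mail' }).accepted); // true
```

A rejected amendment returns the rule set unchanged, so a caller cannot end up with a weaker floor by ignoring the `accepted` flag.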
The cite-obligation forces answer_cited over the higher-scoring answer_raw; the PII call annihilates to 0̲.
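The cite-obligation and PII behaviour can be sketched as a staged judgment that returns a certificate rather than a boolean. The stage order follows the bridge as listed in the text; the gate functions, option fields, and the name `judgeSketch` are all assumptions made for this illustration and do not reproduce `judge/3`.

```javascript
// Stage order taken from the text; everything else here is assumed.
const STAGES = ['alethic', 'deontic', 'resource', 'epistemic', 'axiological'];

// Each gate returns null to pass an option or a reason string to reject it.
const gates = {
  alethic: (o) => (o.feasible ? null : 'not feasible'),
  deontic: (o) => {
    if (o.touchesPII) return 'forbidden: touches PII';
    return o.cites ? null : 'violates cite-obligation';
  },
  resource: (o, ctx) => (o.cost <= ctx.budget ? null : 'over budget'),
  epistemic: (o) => (o.known ? null : 'not known enough to attempt'),
};

function judgeSketch(options, ctx) {
  const rejected = [];
  let alive = options;
  for (const stage of STAGES) {
    if (stage === 'axiological') break; // ranking happens after the gates
    const next = [];
    for (const o of alive) {
      const reason = gates[stage](o, ctx);
      if (reason) rejected.push({ option: o.name, stage, reason });
      else next.push(o);
    }
    alive = next;
  }
  // Axiological stage: highest utility among the survivors.
  let chosen = null;
  for (const o of alive) {
    if (chosen === null || o.utility > chosen.utility) chosen = o;
  }
  return { chosen: chosen ? chosen.name : null, rejected };
}

const base = { feasible: true, touchesPII: false, cites: true, cost: 1, known: true };
const certificate = judgeSketch([
  { ...base, name: 'answer_raw', cites: false, utility: 9 },
  { ...base, name: 'answer_cited', utility: 7 },
  { ...base, name: 'lookup_pii', touchesPII: true, utility: 14 },
], { budget: 5 });

console.log(certificate.chosen); // answer_cited
console.log(certificate.rejected.map((r) => r.option + ' @ ' + r.stage));
```

The certificate records that `answer_raw` and `lookup_pii` were both stopped at the deontic stage, and why, which is the kind of audit trail the text asks for.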
Honest note · this is the reference host. The conformance-tested verdict engine is box-and-box (the npm / edge package, 97 property-tested laws); this Elixir kernel speaks the same arithmetic, so a verdict here and a verdict there are identical.
Portable governance
Because the kernel is arithmetic, its laws are the specification. Two runtimes in two languages aren't "compatible" — they are provably identical, the way two calculators agree on 2 + 2. The monoids and lattices give you CRDT-like convergence for free.
So you can decide at the edge in TypeScript, re-decide in the Elixir host, and audit in CI, and get the same verdict and the same certificate every time. That is what makes agent decisions portable, un-relaxable, and reviewable.
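The convergence claim can be illustrated with a toy lattice. This sketch assumes a three-level permission status in which the more restrictive value always wins a merge; the status names, `join`, and `merge` are hypothetical and say nothing about how box-and-box represents state internally.

```javascript
// Hypothetical permission lattice: a higher rank is more restrictive.
const RANK = { permitted: 0, escalate: 1, forbidden: 2 };

// join returns the more restrictive of two statuses. It is commutative,
// associative, and idempotent, which is what makes merge order irrelevant.
function join(a, b) {
  return RANK[a] >= RANK[b] ? a : b;
}

// Merge two per-action status maps, action by action.
function merge(x, y) {
  const out = { ...x };
  for (const [action, status] of Object.entries(y)) {
    const seen = Object.prototype.hasOwnProperty.call(out, action);
    out[action] = seen ? join(out[action], status) : status;
  }
  return out;
}

// Compare two states key by key, ignoring key order.
function sameState(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const k of keys) if (a[k] !== b[k]) return false;
  return true;
}

const edge = { send_mail: 'permitted', drop_table: 'forbidden' };
const host = { send_mail: 'escalate' };
const ci = { drop_table: 'permitted', send_mail: 'permitted' };

const oneOrder = merge(merge(edge, host), ci);
const otherOrder = merge(ci, merge(host, edge));
console.log(sameState(oneOrder, otherOrder)); // true
console.log(oneOrder.drop_table);             // forbidden
```

Because a merge can only move a status toward the restrictive end, no replica can relax a prohibition that another replica has already recorded.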
Conformance
Every subsystem is a small algebra with stated laws — associativity, annihilation, the precedence of the bridge, the un-weakenability of the entrenched core — checked against thousands of random cases. Two conformant hosts, in any language, must agree on whether a composition is feasible, permitted, and ensurable. Pass the laws in your language; you're conformant.
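A conformance check of this kind might look like the following sketch, which tests two of the named laws (associativity and annihilation) on random inputs. It assumes a toy algebra of small integers plus a veto sentinel and a simple seeded generator; it is not the 97-law suite, and every name in it is illustrative.

```javascript
// Toy algebra: small integers under multiplication, plus a veto sentinel.
const ZERO = Symbol('ZERO');
function otimes(a, b) {
  if (a === ZERO || b === ZERO) return ZERO;
  return a * b;
}

// Seeded linear congruential generator so a failing trial can be replayed.
function makeRng(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Integers keep multiplication exact, so associativity can be checked with ===.
function randomValue(rng) {
  return rng() < 0.25 ? ZERO : Math.floor(rng() * 10);
}

function checkLaws(trials, seed) {
  const rng = makeRng(seed);
  for (let i = 0; i < trials; i++) {
    const a = randomValue(rng);
    const b = randomValue(rng);
    const c = randomValue(rng);
    if (otimes(otimes(a, b), c) !== otimes(a, otimes(b, c))) {
      return { ok: false, law: 'associativity', trial: i };
    }
    if (otimes(a, ZERO) !== ZERO || otimes(ZERO, a) !== ZERO) {
      return { ok: false, law: 'annihilation', trial: i };
    }
  }
  return { ok: true, trials };
}

console.log(checkLaws(2000, 42).ok); // true
```

A port in another language would run the same laws against its own `otimes`. Agreement on the laws, rather than on any shared code, is what the text means by conformance.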
Userland
box-and-box is ring 0; the rest of the stack is userland that runs on it. PULSE declares the scheduler loops, PRISM is the profiler, and each cognitive primitive is a long-running service the kernel governs.
graphonomous.com · The memory loop: continual-learning knowledge graphs. Retrieve, route, act, learn, consolidate. No retraining, no forgetting.
deliberatic.com · The deliberation loop the kernel routes to on a known-unknown. Argue, vote, credit, seal; every decision ships a Merkle-chained proof.
ticktickclock.com · Temporal intelligence: anomaly detection and prediction via state-space models. Feeds the watchdog's invariants.
geofleetic.com · Spatial intelligence: fleet tracking and geo-routing with delta-CRDT state sync across edge locations.
delegatic.com · Multi-agent coordination: agents bid, argue fitness, and self-organize. The strategic subsystem decides who can ensure what.
opensentience.org · Agent lifecycle, permissions, and tool routing, plus the open research arm where the modality ladder is specified, rung by rung.
Boot it
box-and-box ships as a zero-dependency CLI and an ES-module library: deterministic, no LLM, no network — safe to drop into CI, a pre-commit hook, or a pipeline. Pipe a decision in, get a certified verdict out. Conformance is the 97-law suite, so a port in any language agrees verdict-for-verdict.
```sh
# install the kernel (zero-dependency, Node ≥ 18)
npm i -g box-and-box

# run the 97-law conformance harness (2000 trials/law)
box-and-box laws

# a real verdict: feasible ▸ permitted ▸ best → certificate JSON
box-and-box govern decision.json

# exit 0 decision · 3 escalation · 1 none
cat decision.json | box-and-box govern --quiet
```

```js
// …or import the library directly, in any JS runtime
import { govern } from 'box-and-box';
```
Requirements
Node.js ≥ 18 · macOS, Linux, or Windows · no DB, no cloud, no network.
Embed it as a CLI in CI, an imported ES module, or via the Elixir reference host above (which delegates to the verified engine).
An MCP adapter for live agent calls is a planned optional surface; the kernel itself is library-first.