No More Slacking — Technical Appendix

appendix to No More Slacking

This appendix carries the mathematical apparatus behind No More Slacking. The body of the essay describes the shape of the bet; the formalism lives here, set out cleanly.

A. Why this formalism, and the literature it draws on

Why cellular sheaf cohomology rather than another framework? The architecture-level slogan, if you want one, is lossless fusion. Semantic-web stacks (RDF, OWL, named graphs, PROV-O) can preserve provenance and source context just fine; what they do not make first-class is local compatibility between source-local views and obstruction-detection across overlap cycles. Sheaves invert that emphasis. Source-local stalks, restriction maps between them, approximate gluing, and the obstructions to gluing become the central computational object rather than something you bolt on top of triples. Ontology in this work means lossless-fusion-shaped, not formal-taxonomy-shaped; the word is being used in the older, more precise sense.

The lineage I have been reading:

Justin Curry's 2014 thesis, Sheaves, Cosheaves, and Applications, for the cellular formalism. Still the cleanest umbrella treatment.
Hansen and Ghrist's 2019 Toward a Spectral Theory of Cellular Sheaves, for the sheaf Laplacian: the sparse positive-semidefinite operator whose kernel encodes the global sections, and the right computational object to actually solve with.
Hansen and Gebhart's 2020 Sheaf Neural Networks, as one ML application of that machinery.
Robert Ghrist's Elementary Applied Topology, chapter 9, for the exact-cohomology baseline.
Michael Robinson's Topological Signal Processing (2014), which starts the signal-processing program. The approximate-section / consistency-radius formalism that handles real-world data is in his later pseudometric papers, especially Assignments to Sheaves of Pseudometric Spaces (arXiv 2018; Compositionality 2020).
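To make the Hansen and Ghrist object concrete, here is a minimal sketch. It assumes a graph whose vertex and edge stalks are all one-dimensional, so each restriction map is a single scalar. The helper `coboundary` and the metres/centimetres/millimetres subject are hypothetical illustrations and are not taken from any of the works above. The sketch shows only that the sheaf Laplacian L = δᵀδ is positive semidefinite and that its kernel coincides with ker δ, the global sections.

```python
import numpy as np

def coboundary(n_vertices, edges):
    """Coboundary of a cellular sheaf on a graph with 1-dimensional stalks.

    Each edge is (u, v, f_u, f_v), where f_u and f_v are the scalar
    restriction maps from the two vertex stalks into the edge stalk.
    The edge's row of delta computes f_u * x_u - f_v * x_v.
    """
    delta = np.zeros((len(edges), n_vertices))
    for row, (u, v, f_u, f_v) in enumerate(edges):
        delta[row, u] = f_u
        delta[row, v] = -f_v
    return delta

# Hypothetical subject: three sources report one length in metres (0),
# centimetres (1) and millimetres (2). Each edge stalk uses the unit of
# the edge's first vertex.
edges = [
    (0, 1, 1.0, 0.01),   # metres vs centimetres, compared in metres
    (1, 2, 1.0, 0.1),    # centimetres vs millimetres, compared in centimetres
    (0, 2, 1.0, 0.001),  # metres vs millimetres, compared in metres
]
delta = coboundary(3, edges)
laplacian = delta.T @ delta  # sheaf Laplacian: symmetric positive semidefinite

eigvals, eigvecs = np.linalg.eigh(laplacian)
tol = 1e-10 * eigvals.max()
kernel = eigvecs[:, eigvals < tol]  # basis of ker L, which equals ker delta

print("dim H0:", kernel.shape[1])
print("global section (scaled so x_0 = 1):", kernel[:, 0] / kernel[0, 0])
print("delta applied to it:", delta @ kernel[:, 0])
```

If you change any one authored factor so the triangle no longer closes (say 0.002 instead of 0.001 on the last edge), the kernel becomes empty. H⁰ collapses to zero even though each individual map looks plausible.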

The engineering side is much thinner than the math. Catlab.jl and the broader AlgebraicJulia ecosystem give you primitive cellular-sheaf operators usable in research. The Topos Institute and the ACT conference series have been turning category-theoretic abstractions into computational tools for the better part of a decade. Beyond that, there is no production-grade substrate that handles per-source trust, schema evolution, structural-vs-value disagreement, and refusal as first-class operations at the scale agent organisations will need. The published spectral-theory papers are not that substrate; they are necessary background for someone trying to build it.

None of this is exotic mathematics any more. The gap that matters is between "the math is real and the small-scale demos run" and "the substrate hundreds of heterogeneous agents call into without thinking about it." Most of that gap is engineering rather than mathematics: sparse numerics (SuiteSparse and the Rust ecosystem around it only recently got fast enough to make this tractable at production scale), restriction-map authoring tools, audit and observability, multi-tenant isolation, and an API surface most callers never have to look behind. The math being real is necessary; on its own it is not close to sufficient.

B. The sparse weighted least squares solve, concretely

Compile each subject (a thing being reconciled) into a graph whose vertices are what each source says about each shared attribute and whose edges are the known restriction maps between them. Stack the restriction-map constraints into a matrix δ_map (constraint rows, target zero), and stack the observation anchors (what each source actually reported) into a matrix A_obs. The practical computation is a sparse weighted least squares with two trust blocks:

minimize ‖W_map δ_map x‖² + ‖W_obs(A_obs x − y)‖²

W_map weights map closure (how much you trust the authored restriction maps to be correct). W_obs weights observation fit (how much you trust each source on each attribute). The two are conceptually different hyperparameters and worth keeping separate; collapsing them into one diagonal block is fine for exposition but obscures the distinction. y is the vector of observations, and x is the latent reconciled state.
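Here is a minimal sketch of the two-block solve. It assumes a three-vertex subject with scalar stalks, A_obs = I (each source observes only its own vertex), and diagonal trust blocks. The tax-inclusive price subject, the 20% rate, and every weight are hypothetical. SciPy's sparse `lsqr` stands in for whatever solver a real substrate would use. Stacking the two weighted blocks into one overdetermined system is the standard way to turn a sum of weighted squared norms into a single least-squares problem.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

# Hypothetical subject: sources 0 and 2 report a net price; source 1 reports
# a gross price including an assumed 20% tax. Each edge stalk holds a net price.
TAX = 1.2
delta_map = sp.csr_matrix(np.array([
    [1.0, -1.0 / TAX, 0.0],  # source 0 (net) vs source 1 (gross -> net)
    [0.0, 1.0 / TAX, -1.0],  # source 1 (gross -> net) vs source 2 (net)
    [1.0, 0.0, -1.0],        # source 0 (net) vs source 2 (net)
]))
A_obs = sp.identity(3, format="csr")  # each source observes its own vertex
y = np.array([100.0, 121.0, 98.0])    # what each source actually reported

W_map = sp.diags([100.0, 100.0, 100.0])  # strong trust in the authored maps
W_obs = sp.diags([1.0, 1.0, 0.5])        # source 2 trusted less

# minimise ||W_map delta x||^2 + ||W_obs (A_obs x - y)||^2 by stacking the
# two blocks: [W_map delta; W_obs A_obs] x ~ [0; W_obs y].
M = sp.vstack([W_map @ delta_map, W_obs @ A_obs]).tocsr()
rhs = np.concatenate([np.zeros(delta_map.shape[0]), W_obs @ y])
x_hat = lsqr(M, rhs, atol=1e-12, btol=1e-12)[0]

print("reconciled state:", x_hat)
print("map residual:", delta_map @ x_hat)
print("observation residual:", A_obs @ x_hat - y)
```

Pushing W_map toward infinity drives x̂ into ker δ_map. Lowering a source's entry in W_obs lets that source's residual grow before it moves the answer.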

Two distinct objects, worth keeping separate. Exact consensus (every source agrees, no perturbation needed) lives in the intersection ker δ_map ∩ A_obs⁻¹(y): a vector that closes the authored maps and matches the observations. ker δ_map alone is the space of global sections (H⁰), the states on which the authored maps close. The intersection can be empty even when H⁰ is nontrivial, because y need not lie in A_obs(ker δ_map), the set of observation vectors that some global section reproduces exactly. The runtime, with finite, strictly positive trust weights, returns a weighted least-squares representative x̂ balancing map closure against observation fit. Both terms are nonnegative, so the objective reaches zero exactly when the intersection is nonempty, and then x̂ lies in it. Otherwise x̂ may or may not lie in ker δ_map. If δ_map x̂ = 0, x̂ is a global section (the authored maps close on it). If A_obs x̂ = y also holds, x̂ is exactly consistent with the observations as well. When either condition fails, x̂ is the closest weighted compromise the constraints permit.
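The distinction is easy to see on the smallest possible subject. This minimal sketch assumes two sources, one shared attribute, and an identity map, so H⁰ = span{(1, 1)} is nontrivial. The helper `reconcile` and its scalar weights are hypothetical. When y is reproducible by a global section, both residuals vanish. When it is not, the same solver returns a compromise on which neither condition holds.

```python
import numpy as np

def reconcile(delta, A_obs, y, w_map, w_obs):
    """Weighted least squares with scalar trust weights (illustrative only)."""
    M = np.vstack([w_map * delta, w_obs * A_obs])
    rhs = np.concatenate([np.zeros(delta.shape[0]), w_obs * y])
    x_hat = np.linalg.lstsq(M, rhs, rcond=None)[0]
    return x_hat, delta @ x_hat, A_obs @ x_hat - y

# Hypothetical subject: two sources, one attribute, authored map "they agree".
delta = np.array([[1.0, -1.0]])
A_obs = np.eye(2)

# Case 1: y = (5, 5) lies in A_obs(ker delta), so exact consensus exists and
# the minimiser attains objective zero: both residuals vanish.
x1, map_res1, obs_res1 = reconcile(delta, A_obs, np.array([5.0, 5.0]), 10.0, 1.0)

# Case 2: y = (5, 7) is not reproduced by any global section, so the
# intersection is empty and x_hat is a weighted compromise.
x2, map_res2, obs_res2 = reconcile(delta, A_obs, np.array([5.0, 7.0]), 10.0, 1.0)

print("consensus:", x1, map_res1, obs_res1)
print("compromise:", x2, map_res2, obs_res2)
```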

Two distinct inconsistency signals, also worth keeping separate.

The structural signal, the H⁰ deficit, is read directly off the maps before any observation arrives: how much does rank δ_map exceed the spanning-tree baseline that would obtain if all overlap cycles closed cleanly? Edge-level rank-defects pin the violation to specific edges; cycle-level attribution depends on a chosen cycle basis.
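Here is a minimal sketch of the structural signal. It assumes scalar stalks and invertible restriction maps, so any spanning tree contributes rank |V| − 1 and every extra unit of rank in δ_map is a unit of H⁰ deficit. The four-source subject, its authored factors, and the `coboundary` helper are hypothetical. Attribution here uses the fundamental cycles of one chosen spanning tree, which illustrates the cycle-basis dependence: a different tree could assign the same failure to a different cycle.

```python
import numpy as np

def coboundary(n_vertices, edges):
    """Rows f_u * x_u - f_v * x_v for scalar-stalk edges (u, v, f_u, f_v)."""
    delta = np.zeros((len(edges), n_vertices))
    for row, (u, v, f_u, f_v) in enumerate(edges):
        delta[row, u], delta[row, v] = f_u, -f_v
    return delta

n = 4
# A chosen spanning tree (a path 0-1-2-3) and two extra edges closing cycles.
tree = [(0, 1, 1.0, 0.5), (1, 2, 0.5, 0.25), (2, 3, 0.25, 1.0)]
non_tree = [
    (0, 2, 1.0, 0.25),  # consistent with what the tree implies
    (0, 3, 1.0, 2.0),   # mis-authored: the tree implies 1.0 for f_v here
]

rank = np.linalg.matrix_rank
baseline = rank(coboundary(n, tree))  # spanning-tree baseline, |V| - 1 here
full = rank(coboundary(n, tree + non_tree))
print("H0 deficit:", full - baseline)
print("dim H0:", n - full)

# Cycle-level attribution relative to this tree: each non-tree edge closes
# one fundamental cycle, which fails to close iff adding the edge raises rank.
for e in non_tree:
    raised = rank(coboundary(n, tree + [e])) > baseline
    print(e[:2], "fundamental cycle fails to close" if raised else "closes")
```

No observation enters this computation, which is why the structural signal is available before any data arrives.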

The value signal comes from the two residual pieces: the map-side residual δ_map x̂ (zero when authored maps close on the runtime solution) and the observation-side residual A_obs x̂ − y, each with per-source and per-edge attribution.
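Here is a minimal sketch of the value signal's attribution. It assumes identity restriction maps among three hypothetical sources and scalar trust weights. Each entry of δ_map x̂ belongs to one edge and each entry of A_obs x̂ − y to one source. Attribution therefore amounts to keeping the row labels when the blocks are stacked. The source names and the data are made up for illustration.

```python
import numpy as np

sources = ["a", "b", "c"]
edges = [(0, 1), (1, 2), (0, 2)]  # identity restriction maps on every edge
delta = np.zeros((len(edges), len(sources)))
for row, (u, v) in enumerate(edges):
    delta[row, u], delta[row, v] = 1.0, -1.0
A_obs = np.eye(len(sources))
y = np.array([10.0, 10.2, 14.0])  # hypothetical: source "c" is the odd one out
w_map, w_obs = 3.0, np.ones(len(sources))

M = np.vstack([w_map * delta, w_obs[:, None] * A_obs])
rhs = np.concatenate([np.zeros(len(edges)), w_obs * y])
x_hat = np.linalg.lstsq(M, rhs, rcond=None)[0]

# The two residual pieces, each keyed by the cell it belongs to.
map_residual = {f"{sources[u]}-{sources[v]}": r
                for (u, v), r in zip(edges, delta @ x_hat)}
obs_residual = dict(zip(sources, A_obs @ x_hat - y))

print("map-side residual per edge:", map_residual)
print("observation-side residual per source:", obs_residual)
print("largest observation misfit:",
      max(obs_residual, key=lambda s: abs(obs_residual[s])))
```

The outlier's pull also spreads some misfit onto sources a and b. The per-source residual ranks sources, but on its own it is not a verdict on which source is wrong.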

The two signals detect different failure modes. The substrate exposes both and never collapses them.

Solver choice depends on size and conditioning. Sparse rank-revealing QR (SPQR) is the safe default for most subjects. Iterative least-squares methods like LSQR or LSMR are the right tool for very large sparse problems where QR fill-in is prohibitive. Sparse Cholesky on the normal equations is the choice you reach for last, not first; it squares the condition number and is only appropriate when the system is known to be well-conditioned. The choice is an engineering question, not a foundational one.
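Here is a rough illustration of the conditioning argument. It assumes a hypothetical 200-source path-graph subject with identity maps and a map weight 1000 times the observation weight. SciPy's `lsmr` works on the stacked matrix directly. The normal-equations route uses a dense NumPy Cholesky purely so the condition numbers can be measured; a real substrate would use a sparse Cholesky. SPQR is not shown because SciPy does not ship it. On this well-posed toy both routes agree. The point is that cond(MᵀM) comes out as cond(M)², which makes the normal equations fragile once the weights or the graph get worse.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsmr

rng = np.random.default_rng(0)

# Hypothetical path-graph subject: n sources, identity maps between
# neighbours, every source observed, map trust far above observation trust.
n = 200
delta = sp.diags([np.ones(n - 1), -np.ones(n - 1)], [0, 1], shape=(n - 1, n))
w_map, w_obs = 1e3, 1.0
M = sp.vstack([w_map * delta, w_obs * sp.identity(n)]).tocsr()
y = 5.0 + rng.normal(scale=0.1, size=n)
rhs = np.concatenate([np.zeros(n - 1), w_obs * y])

# Iterative least squares works on M directly and never forms M^T M.
x_lsmr = lsmr(M, rhs, atol=1e-12, btol=1e-12, maxiter=20 * n)[0]

# Normal equations: M^T M is cheap to form and factor, but its condition
# number is the square of M's (dense here only so NumPy can measure it).
normal = (M.T @ M).toarray()
cond_M = np.linalg.cond(M.toarray())
cond_normal = np.linalg.cond(normal)
L = np.linalg.cholesky(normal)
x_chol = np.linalg.solve(L.T, np.linalg.solve(L, M.T @ rhs))

print("cond(M):", cond_M, "cond(M^T M):", cond_normal)
print("max |x_lsmr - x_chol|:", np.abs(x_lsmr - x_chol).max())
```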

None of the people calling the API will ever see any of this. They get a reconciled state, a structured list of disagreements split into structural and value classes, and a refusal when applicable, in the same way that nobody calling Stripe sees the double-entry ledger underneath. The internals are deep and the surface is simple. That is the bet at the engineering level: the math is real, but the math being real is the substrate's problem, not the user's.
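The caller-facing surface can be sketched as a response type. Everything here is hypothetical: `ReconcileResponse`, `Disagreement`, the field names, and the location labels. The sketch illustrates the shape the paragraph promises, a reconciled state, disagreements kept in separate structural and value classes, and an optional refusal. It is not an actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class DisagreementKind(Enum):
    STRUCTURAL = "structural"  # authored maps fail to close (H0 deficit)
    VALUE = "value"            # residuals against the runtime solution


@dataclass
class Disagreement:
    kind: DisagreementKind
    location: str              # hypothetical label for the edge, cycle or source
    magnitude: float


@dataclass
class ReconcileResponse:
    state: Optional[dict[str, float]]  # None when the call is refused
    disagreements: list[Disagreement] = field(default_factory=list)
    refusal: Optional[str] = None      # reason, when the substrate declines

    def structural(self) -> list[Disagreement]:
        return [d for d in self.disagreements
                if d.kind is DisagreementKind.STRUCTURAL]

    def value(self) -> list[Disagreement]:
        return [d for d in self.disagreements
                if d.kind is DisagreementKind.VALUE]


# Illustrative usage with placeholder values.
answered = ReconcileResponse(
    state={"price_net": 100.0},
    disagreements=[
        Disagreement(DisagreementKind.VALUE, "source:2", 2.0),
        Disagreement(DisagreementKind.STRUCTURAL, "cycle:0-1-2-3", 1.0),
    ],
)
refused = ReconcileResponse(state=None, refusal="placeholder reason")
print(len(answered.structural()), len(answered.value()), refused.refusal)
```

Keeping the two classes as separate accessors, rather than one merged score, mirrors the rule that the substrate never collapses the structural and value signals.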
