Independent research

Meaning compression turns fixed data into structured supervision.

Derived Data Abundance (DDA) projects real data through frozen embedder panels. Teleological Constellation Training (TCT) reuses those panels as labellers, training targets, and runtime guards.


Identity: signals <= n * (N + choose(N, 2))

Context Graph: N=13, up to 91 signals/input as a constructive upper bound
ClipCannon: N=7, up to 28 signals/input as a constructive upper bound
Voice case: 0.961 mean WavLM SECS, encoder-matched, one speaker, held-out English sentences

Figure 1. DDA holds real data fixed and decomposes it into structured supervision through frozen instruments. (Image: fixed source data projected through a frozen embedder panel into a structured signal lattice.)
Figure description
The visual shows translucent source-data slabs feeding into a frozen multi-embedder panel. Output beams from the panel form a structured geometric lattice on the right, representing typed signals and relationships derived from fixed real observations.

The fixed-data move

The data wall is not only a raw-token shortage. It is also a structured-signal shortage: a single token stream exposes only one view of the structure inside real observations.

DDA keeps the raw corpus fixed and projects it through frozen embedders. This keeps the source observations real while adding typed labels and pairwise features that can be inspected.
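The projection step can be sketched in a few lines. This is a toy illustration, not the framework's implementation: the three embedders (`embed_length`, `embed_vowels`, `embed_case`) are hypothetical stand-ins for frozen models, and using cosine similarity as the pairwise feature is an assumption that only works here because every toy embedder emits a vector of the same length.

```python
import math
from itertools import combinations

# Hypothetical stand-ins for frozen embedders: fixed functions with no trainable state.
def embed_length(text):
    return [float(len(text)), float(len(text.split()))]

def embed_vowels(text):
    lowered = text.lower()
    vowels = sum(lowered.count(v) for v in "aeiou")
    return [float(vowels), float(len(lowered) - vowels)]

def embed_case(text):
    upper = sum(ch.isupper() for ch in text)
    return [float(upper), float(len(text) - upper)]

PANEL = {"length": embed_length, "vowels": embed_vowels, "case": embed_case}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def decompose(text, panel=PANEL):
    """Return per-embedder projections and one feature per embedder pair."""
    # The input text is never modified; only derived views are produced.
    projections = {name: fn(text) for name, fn in panel.items()}
    # Assumed pairwise feature: cosine between two embedders' views of the same input.
    pairwise = {
        (a, b): cosine(projections[a], projections[b])
        for a, b in combinations(sorted(projections), 2)
    }
    return projections, pairwise

projections, pairwise = decompose("Frozen panels keep the Source fixed.")
for name, vector in projections.items():
    print(name, vector)
for pair, value in pairwise.items():
    print(pair, round(value, 3))
```

Because each derived value is keyed by the embedder or embedder pair that produced it, every label and pairwise feature can be traced back to a named instrument and inspected.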

The counting identity

For n inputs and N frozen embedders, the derived signal count is bounded by the per-embedder projections plus the embedder-pair interactions.

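The pairwise term is a constructive upper bound, not a count of independent signals. The effective signal count depends on how redundant the signals are, which has not yet been measured; the mutual-information audit that would measure it remains explicit future work.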
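As a quick arithmetic check, the bound can be evaluated directly. The helper names `signal_upper_bound` and `signals_per_input` are illustrative, not part of the framework; the sketch only reproduces the identity signals <= n * (N + choose(N, 2)) and the two panel sizes quoted on this page.

```python
from math import comb

def signal_upper_bound(n_inputs, n_embedders):
    """Upper bound on derived signals: n * (N + choose(N, 2))."""
    per_input = n_embedders + comb(n_embedders, 2)  # projections + embedder pairs
    return n_inputs * per_input

def signals_per_input(n_embedders):
    """The same bound for a single input."""
    return signal_upper_bound(1, n_embedders)

print(signals_per_input(13))         # 91  (13 projections + 78 pairs)
print(signals_per_input(7))          # 28  (7 projections + 21 pairs)
print(signal_upper_bound(1000, 13))  # 91000
```

The result is a ceiling on how many signals can be constructed; it says nothing about how many of them carry independent information.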

Figure 2. The DDA identity counts per-embedder projections plus pairwise interaction features. (Image: diagram showing raw inputs projected through frozen embedders and pairwise interactions.)
Figure description
The diagram starts with n raw inputs. Each input is passed through N frozen embedders, producing N direct projections. Pairwise interaction features are then computed for every embedder pair, producing choose(N, 2) additional structured signals per input. The displayed identity is signals less than or equal to n * (N + choose(N, 2)); the pairwise term is a constructive upper bound pending mutual-information audit.

From signal density to runtime guards

Meaning compression is the ratio view of DDA: structured signals per raw-data unit.

TCT then uses the same frozen panel to construct centroids, train against those centroids, and accept or reject candidate outputs with a panel-relative geometric guard.
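A minimal sketch of the first and last steps, centroid construction and the geometric guard, might look like the following. It rests on several assumptions that are not taken from the preprint: the guard uses cosine similarity, it requires every embedder in the panel to pass, and the threshold of 0.9 is arbitrary. The embedder names and vectors are made up, and the training step is omitted entirely.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def build_constellation(reference):
    """One centroid per embedder, from that embedder's reference embeddings."""
    return {name: centroid(vectors) for name, vectors in reference.items()}

def guard(candidate, constellation, threshold=0.9):
    """Accept only if the candidate is close to every centroid (assumed rule)."""
    scores = {
        name: cosine(candidate[name], center)
        for name, center in constellation.items()
    }
    return all(score >= threshold for score in scores.values()), scores

# Hypothetical reference embeddings from two frozen embedders.
reference = {
    "voice": [[1.0, 0.0], [0.8, 0.2]],
    "text": [[0.0, 1.0], [0.1, 0.9]],
}
constellation = build_constellation(reference)

accepted_good, _ = guard({"voice": [0.9, 0.1], "text": [0.0, 1.0]}, constellation)
accepted_bad, _ = guard({"voice": [0.0, 1.0], "text": [0.0, 1.0]}, constellation)
print(accepted_good, accepted_bad)  # True False
```

The guard is deterministic and panel-relative: it reports proximity to the chosen centroids and nothing more, so a different panel or threshold can give a different verdict on the same output.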

Figure 3. TCT reuses the frozen panel as target construction, training target, and runtime guard. (Image: three-phase protocol showing constellation construction, training, and runtime guard checks.)
Figure description
The figure shows TCT as three phases. Phase 1 constructs frozen multi-modal centroids from reference data. Phase 2 trains generated outputs against those centroid targets. Phase 3 evaluates candidate outputs with a deterministic geometric guard. The guard only verifies proximity to the chosen centroid panel.

Production witnesses

Context Graph, ClipCannon, OCR Provenance, and Dynamic / ME-JEPA are four systems offered as evidence for the same approach: real source data, durable records, and independent verification.

text-dda-witness

Context Graph

Text-side DDA witness using 13 frozen embedders, RocksDB storage, and MCP retrieval tools.

Embedder panel: N=13
Structured signals: up to 91 per input


video-dda-witness

ClipCannon

Video-side DDA/TCT witness using a 23-stage DAG, seven modalities, and per-project provenance records.

Pipeline: 23-stage DAG
Modality panel: N=7


document-provenance-witness

OCR Provenance

Document-side provenance witness for local-first OCR, chunking, embeddings, and hash-linked records.

Architecture: local-first MCP server
Provenance: hash-linked transformation records
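One common way to build hash-linked transformation records is a simple hash chain, sketched below. This is a generic illustration rather than OCR Provenance's actual record format: the field names (`step`, `output_sha256`, `parent`, `hash`), the step labels, and the choice of SHA-256 over canonical JSON are all assumptions.

```python
import hashlib
import json

def record_hash(body):
    """SHA-256 over a canonical JSON encoding of the record body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_record(chain, step, output_text):
    """Append a record that commits to its output and to the previous record."""
    body = {
        "step": step,
        "output_sha256": hashlib.sha256(output_text.encode("utf-8")).hexdigest(),
        "parent": chain[-1]["hash"] if chain else None,
    }
    chain.append({**body, "hash": record_hash(body)})
    return chain

def verify(chain):
    """Recompute every hash and check that each record points at its predecessor."""
    parent = None
    for rec in chain:
        body = {key: rec[key] for key in ("step", "output_sha256", "parent")}
        if rec["parent"] != parent or rec["hash"] != record_hash(body):
            return False
        parent = rec["hash"]
    return True

chain = []
append_record(chain, "ocr", "raw page text")
append_record(chain, "chunk", "raw page")
append_record(chain, "embed", "[0.1, 0.2]")
print(verify(chain))  # True
```

Because each record includes its parent's hash, altering any earlier step changes every hash after it, so tampering is detectable by re-running `verify`.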


jepa-runtime-witness

Dynamic / ME-JEPA

Private-preview JEPA-style runtime witness for audited bundles, deterministic artifacts, and Full State Verification.

Preview artifact: private PDF
Verification posture: source-of-truth readback


Research library

The paper library separates framework preprints, system whitepapers, measured case studies, and background theory so that each claim stays near its source and limitation.

Meaning compression and Derived Data Abundance

00:11:23 / 6umU6kuXR3s

Plain-language talk track for the fixed-data DDA move and the meaning-compression ratio.

The formal site claim is narrower than the talk title: DDA is fixed-corpus decomposition and meaning compression is a proposed structured-signal-density measure.


preprint / April 2026

Teleological Constellation Training: Multi-Modal Embedding Decomposition as Meaning Compression for Identity-Locked Generation and a Third Path Around the Data Wall

Primary framework preprint for DDA, meaning compression, TCT, and panel-relative generation guards.

DOI 10.13140/RG.2.2.24370.57288

Preprint status; not peer reviewed.
Pairwise mutual-information audit remains pending.


preprint / April 2026

Derived Data Abundance: How Multi-Modal Embedding Decomposition Solves the AI Training Data Crisis

DDA explainer preprint for fixed-corpus decomposition through frozen embedder panels.

DOI 10.13140/RG.2.2.22624.03845

Preprint status; not peer reviewed.
The title is broader than the formal site claim; the site frames DDA as a constructive fixed-data method.


preprint / March 2026

ClipCannon: 0.961 Mean Cross-Encoder Speaker Similarity via Pipeline Engineering for Personalized Voice Cloning

Measured voice case: 0.961 mean WavLM SECS under an encoder-matched one-speaker protocol with held-out English sentences.

DOI 10.13140/RG.2.2.33842.16324

Preprint status; not peer reviewed.
One speaker and held-out English sentences.


Sources

docs2/PAPER.md#1
docs2/PAPER.md#3.1
docs2/PAPER.md#5
docs/refactorwebsite/02_positioning_and_message.md
