Derived Data Abundance

Fixed data, more measurements

A Derived Data Abundance (DDA) pipeline holds the raw corpus fixed and projects it through a panel of frozen embedders. The derived dataset contains the per-embedder projections and the pairwise interaction features between them, each attached to a real observation.
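The projection step can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the three toy embedders, the `derive_record` helper, and the use of cosine similarity as the interaction feature are all assumptions made for the example. Real embedders usually differ in dimension, so a real interaction feature would need to handle that.

```python
from itertools import combinations
from math import sqrt
from typing import Callable, Dict, List

Vector = List[float]
Embedder = Callable[[str], Vector]


# Hypothetical stand-ins for frozen embedders: deterministic, never trained here.
def char_stats(text: str) -> Vector:
    return [float(len(text)), float(text.count(" ")), 1.0]


def vowel_stats(text: str) -> Vector:
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(vowels), float(len(text.split())), 1.0]


def case_stats(text: str) -> Vector:
    uppers = sum(ch.isupper() for ch in text)
    digits = sum(ch.isdigit() for ch in text)
    return [float(uppers), float(digits), 1.0]


TOY_PANEL: Dict[str, Embedder] = {
    "char": char_stats,
    "vowel": vowel_stats,
    "case": case_stats,
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes both vectors share one dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def derive_record(text: str, panel: Dict[str, Embedder]) -> dict:
    """Project one fixed input through every embedder, then pair the projections."""
    projections = {name: embed(text) for name, embed in panel.items()}
    interactions = {
        (a, b): cosine(projections[a], projections[b])
        for a, b in combinations(sorted(projections), 2)
    }
    # The raw input is stored unchanged, so every derived value stays
    # attached to a real observation.
    return {"input": text, "projections": projections, "interactions": interactions}


record = derive_record("Context Graph at N=13", TOY_PANEL)
print(len(record["projections"]), len(record["interactions"]))
```

The input text is never modified. Only the number of measurements taken of it grows with the size of the panel.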

Context Graph example

Context Graph at N=13 yields up to 91 structured signals per input under the DDA counting identity; the effective pairwise count remains pending a mutual-information audit.
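The quoted figures are consistent with counting one projection per embedder plus one interaction feature per unordered pair of embedders, which gives N + C(N, 2) = N(N + 1)/2. That reading is inferred from the numbers on this page rather than from a stated definition, and the helper name `structured_signal_bound` is hypothetical.

```python
from math import comb


def structured_signal_bound(n: int) -> int:
    """Upper bound on structured signals for a panel of n embedders.

    Assumed reading of the counting identity: n single projections
    plus comb(n, 2) unordered pairwise interactions.
    """
    if n < 0:
        raise ValueError("panel size must be non-negative")
    return n + comb(n, 2)


print(structured_signal_bound(13))  # 13 + 78 = 91
print(structured_signal_bound(7))   # 7 + 21 = 28
```

This is a structural ceiling only. It says nothing about how many of those signals carry independent information.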

text-dda-witness

Context Graph

Text-side DDA witness using 13 frozen embedders, RocksDB storage, and MCP retrieval tools.

Embedder panel: N=13
Structured signals: up to 91 per input

ClipCannon example

ClipCannon at N=7 yields up to 28 structured signals per source clip as a constructive upper bound, and persists those records through a video-side provenance chain.

video-dda-witness

ClipCannon

Video-side DDA/TCT witness using a 23-stage DAG, seven modalities, and per-project provenance records.

Pipeline: 23-stage DAG
Modality panel: N=7

What remains open

The count is structural. The important empirical question is how much independent signal remains after redundancy, overlap, and task-specific relevance are measured.
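One rough way to see the gap between the structural count and the independent signal is the sketch below. It uses a participation ratio over pairwise Pearson correlations as a linear proxy for redundancy. This is an assumption chosen for illustration: it is not the mutual-information audit mentioned above, it ignores nonlinear dependence and task relevance, and the function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant signal has no defined correlation")
    return sxy / sqrt(sxx * syy)


def effective_signal_count(signals: Sequence[Sequence[float]]) -> float:
    """Participation ratio of the correlation matrix of k signals.

    Equals k**2 divided by the sum of squared correlations, which is
    (sum of eigenvalues)**2 / (sum of squared eigenvalues). It ranges from
    1 (all signals perfectly correlated) to k (all uncorrelated).
    """
    k = len(signals)
    squared_total = sum(pearson(a, b) ** 2 for a in signals for b in signals)
    return k * k / squared_total


redundant = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
independent = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
print(effective_signal_count(redundant))
print(effective_signal_count(independent))
```

In this toy case two perfectly correlated signals count as roughly one, while two uncorrelated signals count as two. A real audit would need a dependence measure suited to the data and the task.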

Derived Data Abundance keeps the corpus fixed.

Related videos

Meaning compression and Derived Data Abundance

Shakespeare LoRA as a signal-density case study

Sources