Teleox.ai


ClipCannon is the video-side DDA/TCT witness.

A 23-stage DAG decomposes source video into multi-modal records, centroids, and provenance with consent-aware release posture.

ClipCannon pipeline diagram showing staged analysis from source video to multimodal records.

Figure 1. ClipCannon materializes video into multi-modal records through a 23-stage analysis DAG.

Figure description

The figure shows source video moving through a staged ClipCannon DAG. Analysis stages extract visual, semantic, emotion, speaker, prosody, sentiment, and voice-identity records. The outputs are persisted as multi-modal training and provenance records rather than one unstructured video blob.

Video decomposition diagram showing a source interview becoming curated multi-modal clips.

Figure 2. The identity-locked video case remains protocol-level; subject-specific artifacts are withheld.

Figure description

The figure represents a protocol-level video case. A single-subject interview is analyzed into curated clips and multi-modal labels. The research site treats the identity-locked talking-head result as protocol-level, with subject-specific artifacts withheld and end-to-end measurement pending.

23-stage video decomposition

ClipCannon transforms source video into structured multi-modal records through a 23-stage DAG. The relevant research point is source-grounded decomposition, not generic video editing.
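The staged-DAG shape can be sketched as a topologically ordered set of analysis stages writing into a shared record. The stage names and toy logic below are illustrative placeholders, not ClipCannon's actual 23 stages:

```python
from dataclasses import dataclass

@dataclass
class Stage:
    """One analysis stage: a name, the stages it depends on, and a run fn."""
    name: str
    deps: tuple
    run: callable

def topological_order(stages):
    """Order stages so every stage runs after all of its dependencies."""
    done, order = set(), []
    pending = {s.name: s for s in stages}
    while pending:
        ready = [s for s in pending.values() if set(s.deps) <= done]
        if not ready:
            raise ValueError("cycle or missing dependency in stage DAG")
        for s in ready:
            order.append(s)
            done.add(s.name)
            del pending[s.name]
    return order

def run_dag(stages, source):
    """Run all stages over a shared record keyed by stage name."""
    record = {"source": source}
    for stage in topological_order(stages):
        record[stage.name] = stage.run(record)
    return record

# Placeholder stages; a real pipeline would extract frames, transcripts, etc.
stages = [
    Stage("transcript", ("frames",), lambda r: f"text:{r['frames']}"),
    Stage("frames", (), lambda r: f"frames:{r['source']}"),
    Stage("sentiment", ("transcript",), lambda r: "neutral"),
]
result = run_dag(stages, "interview.mp4")
```

The point of the sketch is the structure: each stage declares dependencies, and the output is a keyed multi-modal record rather than one unstructured blob.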

Seven modalities to a constellation

The video-side panel provides visual, semantic, emotion, speaker, prosody, sentiment, and voice-identity views that can become a TCT-style target constellation.
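One way such a target constellation could be formed is by collapsing each modality's per-clip embeddings into a frozen, L2-normalized centroid. The modality names and tiny vectors below are placeholders, and the averaging convention is an assumption:

```python
import math

def centroid(vectors):
    """Mean vector, then L2-normalized (an assumed centroid convention)."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]

# Placeholder per-clip embeddings for two of the seven modality views.
clip_embeddings = {
    "emotion": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],
    "prosody": [[0.1, 0.9, 0.1], [0.0, 1.0, 0.2]],
}
constellation = {m: centroid(vs) for m, vs in clip_embeddings.items()}
```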

Constellation guard diagram showing generated candidates checked against frozen modality centroids.

Figure 3. ClipCannon supplies the modality centroids used by TCT-style runtime guard checks.

Figure description

The diagram shows modality centroids from ClipCannon as a target constellation. Candidate outputs are projected through the same frozen instruments and accepted only when their modality projections remain within configured thresholds. The check is a panel-relative geometric predicate.
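The panel-relative predicate described above can be sketched as an all-modalities cosine check against frozen centroids. The thresholds, centroids, and candidate vectors here are illustrative assumptions, not ClipCannon's configured values:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def constellation_guard(candidate, constellation, thresholds):
    """Accept a candidate only if every modality projection stays within
    its configured threshold of the frozen centroid."""
    return all(
        cosine(candidate[m], center) >= thresholds[m]
        for m, center in constellation.items()
    )

# Placeholder frozen centroids and per-modality thresholds.
constellation = {"emotion": [1.0, 0.0], "prosody": [0.0, 1.0]}
thresholds = {"emotion": 0.9, "prosody": 0.9}

ok = constellation_guard(
    {"emotion": [0.95, 0.05], "prosody": [0.1, 0.99]},
    constellation, thresholds)
rejected = constellation_guard(
    {"emotion": [0.0, 1.0], "prosody": [0.1, 0.99]},
    constellation, thresholds)
```

The check is purely geometric and relative to the panel: no generative model internals are consulted, only projections against the frozen instruments.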

Release posture

Voice and identity demonstrations are dual-use. This site names the method and scoped metrics but does not publish subject-specific clone artifacts.

measured

Voice cloning WavLM SECS case

The published case reports 0.961 mean WavLM SECS on 10 held-out English sentences for one speaker under an encoder-matched protocol.

Mean WavLM SECS
0.961
One English-speaking subject; 10 held-out English sentences; encoder-matched protocol.
Max WavLM SECS
0.975
Reported as a within-protocol maximum, not a cross-encoder identity proof.
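SECS (speaker embedding cosine similarity) is conventionally the cosine similarity between speaker embeddings from a fixed encoder, here WavLM. The mean/max reporting shape can be sketched with placeholder embeddings; these are not real WavLM outputs or the published case's data:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Placeholder speaker embeddings; a real protocol would extract these
# from the reference speaker and each held-out synthesized sentence
# with the same (encoder-matched) WavLM-based model.
reference = [0.6, 0.8, 0.0]
synthesized = [[0.58, 0.81, 0.02], [0.62, 0.78, 0.01]]

scores = [cosine(reference, s) for s in synthesized]
mean_secs = sum(scores) / len(scores)
max_secs = max(scores)
```

As the site notes, a within-protocol maximum like this is an encoder-specific number, not a cross-encoder identity proof.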

protocol-level

Identity-locked video decomposition

The manuscript treats the identity-locked talking-head case as architecturally complete but measurement-pending.

Training clips
2,362 curated clips
Protocol-level figure from the current manuscript; subject-specific artifacts are withheld and end-to-end multi-modal measurement is pending.

Related videos

00:10:32 / zhgD_iL-4Bc

ClipCannon TCT demo

Video-side DDA and TCT demonstration around multi-modal decomposition and identity centroids.


00:14:09 / oO9huQ2gSoU

ClipCannon open-source video editor walkthrough

Operational walkthrough for ClipCannon as a local video understanding and editing system.


Sources

  • docs2/PAPER.md#4.2
  • docs2/PAPER.md#case-2
  • docs2/PAPER.md#case-3