Teleox.ai

Video

Held-out voice-cloning sentence test

Additional voice case-study context; the formal site copy keeps the metric scoped to the published WavLM protocol.

Runtime 00:07:57, video ID 9re6jYR6GZg

Phrasing in the video is informal talk-track language; publication cards and case-study pages cite only the scoped WavLM result.

Figure 1. The voice case reports 0.961 mean WavLM SECS for one speaker on held-out English sentences under an encoder-matched protocol.

Figure description

The figure shows a one-speaker voice protocol. A centroid computed from 50 enrollment clips represents the target speaker. For each held-out English sentence, the pipeline generates 12 candidates and keeps the one that scores best under WavLM. The reported result is 0.961 mean WavLM SECS under this encoder-matched protocol; cross-encoder triangulation is still pending.
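The protocol in Figure 1 can be sketched as follows. This is a minimal illustration, not the ClipCannon implementation. It assumes that every enrollment clip and every generated candidate has already been turned into a speaker embedding (the WavLM encoder and the speech generator are outside the sketch). It also assumes that SECS is the cosine similarity between a candidate embedding and the enrollment centroid, and that each enrollment embedding is L2-normalized before averaging. The function names are hypothetical. Because the same embeddings drive both best-of-N selection and the final score, the resulting mean is encoder-matched in the sense used above. A cross-encoder check would re-score the selected clips with a different speaker encoder.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> list[float]:
    """Scale a vector to unit length (assumed preprocessing step)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings; used here as the SECS score."""
    return sum(x * y for x, y in zip(_normalize(a), _normalize(b)))


def centroid(enrollment: Sequence[Vector]) -> list[float]:
    """Element-wise mean of unit-normalized enrollment embeddings
    (the page describes a 50-clip centroid)."""
    unit = [_normalize(e) for e in enrollment]
    return [sum(col) / len(unit) for col in zip(*unit)]


def best_of_n(candidates: Sequence[Vector], target: Vector) -> float:
    """Score every candidate for one sentence and keep the highest
    (best-of-12 on the page; N is whatever is passed in)."""
    return max(cosine(c, target) for c in candidates)


def mean_secs(per_sentence: Sequence[Sequence[Vector]],
              enrollment: Sequence[Vector]) -> float:
    """Mean of the per-sentence best scores against the enrollment centroid."""
    target = centroid(enrollment)
    scores = [best_of_n(cands, target) for cands in per_sentence]
    return sum(scores) / len(scores)
```

Taking the maximum over candidates before averaging is what makes this metric sensitive to the choice of encoder. The selection step optimizes the same quantity that is later reported.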

Related publications

preprint / March 2026

ClipCannon: 0.961 Mean Cross-Encoder Speaker Similarity via Pipeline Engineering for Personalized Voice Cloning

Measured voice case: 0.961 mean WavLM SECS under an encoder-matched one-speaker protocol with held-out English sentences.

DOI 10.13140/RG.2.2.33842.16324

  • Preprint status; not peer reviewed.
  • One speaker and held-out English sentences.

Related systems

video-dda-witness

ClipCannon

Video-side DDA/TCT witness using a 23-stage DAG, seven modalities, and per-project provenance records.

Pipeline: 23-stage DAG
Modality panel: N=7

Sources