Figure 1. The voice case reports 0.961 mean WavLM SECS for one speaker on held-out English sentences under an encoder-matched protocol.
Figure description
The figure shows a one-speaker voice protocol. The target speaker is enrolled as a centroid over 50 clips. For each held-out English sentence, the pipeline generates 12 candidates and selects one by WavLM scoring. The reported result is 0.961 mean WavLM SECS under an encoder-matched protocol; cross-encoder triangulation remains pending.
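The enroll-then-select step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code. It assumes that SECS is the cosine similarity between speaker embeddings, that the centroid is the mean of unit-normalized clip embeddings, and that embeddings have already been extracted. The function names (`enroll_centroid`, `secs`, `select_best`), the embedding dimension, and the synthetic vectors standing in for WavLM embeddings are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def enroll_centroid(clip_embeddings):
    """Average unit-normalized enrollment embeddings into one unit-length centroid.

    Assumption: the source does not say how the centroid is computed.
    """
    unit = l2_normalize(np.asarray(clip_embeddings, dtype=np.float64))
    return l2_normalize(unit.mean(axis=0))


def secs(centroid, candidate_embeddings):
    """Cosine similarity between the centroid and each candidate embedding."""
    unit = l2_normalize(np.asarray(candidate_embeddings, dtype=np.float64))
    return unit @ centroid


def select_best(centroid, candidate_embeddings):
    """Return the index and score of the candidate closest to the centroid."""
    scores = secs(centroid, candidate_embeddings)
    best = int(np.argmax(scores))
    return best, float(scores[best])


# Synthetic stand-ins for speaker embeddings (hypothetical data, not real audio).
rng = np.random.default_rng(0)
dim = 256  # hypothetical embedding size
speaker = rng.normal(size=dim)

# 50 enrollment clips and 12 candidates for one held-out sentence.
enrollment = speaker + 0.3 * rng.normal(size=(50, dim))
candidates = speaker + 0.5 * rng.normal(size=(12, dim))

centroid = enroll_centroid(enrollment)
best_index, best_score = select_best(centroid, candidates)
print(f"selected candidate {best_index} with similarity {best_score:.3f}")
```

In this sketch the same scorer both selects the candidate and produces the reported similarity, so the selected score is by construction the maximum over the 12 candidates. That is what makes the setup encoder-matched, and it is why a score from a different encoder would be an independent check.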
Related publications
preprint / March 2026
ClipCannon: 0.961 Mean Cross-Encoder Speaker Similarity via Pipeline Engineering for Personalized Voice Cloning
Measured voice case: 0.961 mean WavLM SECS under an encoder-matched one-speaker protocol with held-out English sentences.