
A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units

Speaker Transfer Samples

Source Speech | Target Speech | AutoVC | SRDVC | Ours (VCTK, predicted P,Q) | Ours (VCTK, original P,Q) | Ours (LibriTTS, original P,Q)

Prosody (pitch-energy + rhythm) Transfer Samples

Source Speech | Target Speech | SRDVC | Ours (VCTK) | Ours (LibriTTS+VCTK+ESD)

Observations