Speech-02 HD is MiniMax’s flagship text-to-speech (TTS) model, optimized for premium audio use cases such as voiceovers, audiobooks, and narration. It supports zero-shot voice cloning (i.e., cloning a speaker from a short reference audio clip), emotional expression, multiple languages, and fine-grained control over speech attributes. The model uses a novel Flow-VAE together with a learnable speaker encoder that extracts timbre features from the reference audio without requiring transcripts.