🎤 F5-TTS Vietnamese

Text-to-Speech Synthesis • Trained on ~1000 hours of data

❗ Model Limitations

  • May not perform well with numbers, dates, and special characters
  • Rhythm may be inconsistent with some texts
  • Works best with clear, well-pronounced reference audio
  • Maximum 1000 words per request
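
One way to work within these limits on the client side is to split long input before sending it and to flag text that may need manual normalization. The sketch below is only an illustration: `split_for_tts` and `needs_normalization` are hypothetical helper names, not part of this service, and the set of "special characters" it checks is an assumption, since the list above does not define one.

```python
import re

MAX_WORDS = 1000  # per-request limit stated in the list above


def split_for_tts(text, max_words=MAX_WORDS):
    """Greedily pack whole sentences into chunks of at most max_words words."""
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # A single sentence longer than the limit is cut at word boundaries.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk when this sentence would overflow the current one.
        if len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def needs_normalization(text):
    """Return True if the text contains digits or a few common symbols.

    The symbol set is an assumption made for this sketch.
    """
    return bool(re.search(r"[\d%$€@#&/]", text))
```

Each chunk can then be sent as its own request. Text flagged by `needs_normalization` could be rewritten by hand, for example by spelling numbers out in words, before synthesis.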

📡 API Documentation

Use the following endpoints to integrate with your application:

POST /api/synthesize

curl -X POST http://localhost:5000/api/synthesize \
  -F "ref_audio=@sample.wav" \
  -F "gen_text=Xin chào, đây là giọng nói tổng hợp" \
  -F "speed=1.0"

Response:

{
  "success": true,
  "audio": "base64_encoded_audio_data",
  "spectrogram": "base64_encoded_image_data",
  "sample_rate": 24000,
  "message": "Speech synthesized successfully"
}
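
The response fields shown above can be unpacked with a few lines of Python. This is a minimal sketch, not the service's own client: `decode_synthesis_response` is a hypothetical helper, and it assumes that a failed request still returns JSON with `"success": false` and a `"message"` field, which the example above does not confirm.

```python
import base64
import json


def decode_synthesis_response(body):
    """Parse the JSON body of POST /api/synthesize and decode its base64 fields."""
    payload = json.loads(body)
    # Assumption: errors are reported as success=false plus a message.
    if not payload.get("success"):
        raise RuntimeError(payload.get("message", "synthesis failed"))
    return {
        "audio": base64.b64decode(payload["audio"]),              # raw audio bytes
        "spectrogram": base64.b64decode(payload["spectrogram"]),  # raw image bytes
        "sample_rate": payload["sample_rate"],
    }
```

The decoded bytes can be written to disk with `open(path, "wb")`. The audio container and image format are not stated here, so any file extension you choose is a guess until you check it against the service.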

GET /api/health

Check if the service is running:

curl http://localhost:5000/api/health

GET /api/info

Get model information:

curl http://localhost:5000/api/info
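
The two GET endpoints can also be called from Python with the standard library alone. The sketch below makes two assumptions that this page does not document: that a healthy service answers `/api/health` with HTTP 200, and that `/api/info` returns a JSON body. `service_is_up` and `fetch_info` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:5000"  # host and port taken from the curl examples


def service_is_up(base_url=BASE_URL, timeout=2.0):
    """Return True if GET /api/health answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def fetch_info(base_url=BASE_URL, timeout=2.0):
    """Return the parsed body of GET /api/info, assuming it is JSON."""
    with urllib.request.urlopen(f"{base_url}/api/info", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `service_is_up()` before submitting synthesis requests lets a client fail fast when the server is not running.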