feat: KokoroTTS text-to-speech-engine option (decoupled from OpenAI TTS) #16036
Please, please, PLEASE review this PR before considering merging it. If it is merged, I encourage refactors where necessary to bring the code up to speed! The reason I could not test this PR thoroughly: Ubuntu kernel-panics when I test TTS models from KokoroTTS unless I keep the test under 2 seconds; I'm serious.
Browser console errors when the KokoroTTS endpoint is not reachable/not running. I am not sure if this is a problem to solve or not.
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://host.docker.internal:8880/v1/audio/voices. (Reason: CORS request did not succeed). Status code: (null).
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://host.docker.internal:8880/v1/models. (Reason: CORS request did not succeed). Status code: (null).
…actual available voice names fetched from the KokoroTTS API
I think the voices server connection should be managed the same way as for MCP Tools, both server side (https://github.com/open-webui/open-webui/blob/main/src/lib/components/admin/Settings/Connections.svelte) and client side (https://github.com/open-webui/open-webui/blob/main/src/lib/components/chat/Settings/Connections.svelte).
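As a rough illustration of the server-side approach suggested above, where the backend queries KokoroTTS itself so the browser never makes the cross-origin request, here is a minimal Python sketch. The helper names `kokoro_url`, `fetch_json_or_none`, and `discover` are hypothetical and not taken from this PR, and the response bodies are passed through untouched because their schema is not described here.

```python
import http.client
import json
import urllib.request


def kokoro_url(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def fetch_json_or_none(url: str, timeout: float = 3.0):
    """Return the decoded JSON body, or None if the request fails.

    Connection failures, HTTP errors, timeouts, malformed URLs, and
    non-JSON bodies all map to None so callers can show an empty
    drop-down instead of raising.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses;
        # JSONDecodeError is a ValueError subclass.
        return None


def discover(base_url: str) -> dict:
    """Query both discovery endpoints named in this PR (hypothetical helper)."""
    models = fetch_json_or_none(kokoro_url(base_url, "/v1/models"))
    voices = fetch_json_or_none(kokoro_url(base_url, "/v1/audio/voices"))
    return {
        "reachable": models is not None or voices is not None,
        "models": models,
        "voices": voices,
    }
```

With something like this behind a backend route, an unreachable KokoroTTS server would surface as `reachable: False` rather than as CORS errors in the browser console.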
Pull Request Checklist
Before submitting, make sure you've checked the following:
- … `dev` branch.
Changelog Entry
Description
- … `AUDIO_TTS_KOKORO_API_BASE_URL` environment variable (or UI setting) and integrates KokoroTTS model/voice discovery and speech generation throughout the audio pipeline.
Added
- … (`kokoro`) selectable in the UI alongside `openai`, `azure`, and `elevenlabs`.
- … `/v1/models` and `/v1/audio/voices` endpoints, respectively.
- … `TTS_KOKORO_API_BASE_URL/v1/audio/speech`.
- … `AUDIO_TTS_KOKORO_API_BASE_URL` and corresponding persistent config entry.
- … `TTS_KOKORO_API_BASE_URL` to prevent cache collisions when switching engines.
- … `TTS_KOKORO_API_BASE_URL` and an optional `TTS_API_KEY` for KokoroTTS.
- … (`af_bella+af_sky` or `af_bella(2)+af_sky(1)`).
- … `KOKORO_NORMALIZATION_OPTIONS.normalize` in the config.
Changed
- … `backend/open_webui/config.py` to register `AUDIO_TTS_KOKORO_API_BASE_URL`.
- … `backend/open_webui/main.py` to expose `TTS_KOKORO_API_BASE_URL` to the application state.
- … `backend/open_webui/routers/audio.py`:
  - … `TTSConfigForm`, `get_audio_config`, and `update_audio_config`, including `KOKORO_NORMALIZATION_OPTIONS`.
  - … `/speech` endpoint for KokoroTTS requests.
  - … `get_available_models` and `get_voices` for dynamic discovery.
- … `Audio.svelte` (frontend UI):
  - … `TTS_ENGINE` selection.
  - … `KOKORO_NORMALIZATION_OPTIONS.normalize` setting.
Fixed
Hopefully there aren't any breaking changes. All existing TTS engines remain unchanged and KokoroTTS is an optional text-to-speech engine.
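To make the cache-collision item in the Added list concrete: if cached audio is keyed only on the request body, the same text and voice name sent to two different engines would resolve to the same cached file. A minimal sketch of a key that also folds in the engine and base URL might look like the following. The function name `speech_cache_key` and the exact key layout are assumptions for illustration, not the code in this PR.

```python
import hashlib
import json


def speech_cache_key(payload: dict, engine: str, base_url: str = "") -> str:
    """Derive a cache key from the request payload plus engine identity.

    Hypothetical helper: the engine name and base URL are part of the
    hashed material, so identical text sent to different engines (or to
    different KokoroTTS servers) never shares a cache entry.
    """
    material = json.dumps(
        {
            "engine": engine,
            "base_url": base_url.rstrip("/"),
            "payload": payload,
        },
        sort_keys=True,  # stable key regardless of dict insertion order
        separators=(",", ":"),
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Sorting the keys before hashing keeps the result stable when the same payload is built in a different field order.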
Before vs After — KokoroTTS in Open WebUI
- … (`/v1/models` and `/v1/audio/voices`) endpoints and lists every available model and voice in drop-down menus, no typing required!
- … "Settings" > "Audio" > "Text-to-Speech Engine". Selecting it instantly tells the UI to use the correct endpoints and parameters (as long as a valid URL is entered in the TTS engine base URL input field), removing all guesswork.
- … (`af_alloy+af_heart+af_sky+af_bella`) into a single text field.
- … (`af_bella(2)+af_sky(1)`).
- … (`kokoro`) into a text field.
Additional Information
- … "Voice Combination" section of https://github.com/remsky/Kokoro-FastAPI?tab=readme-ov-file#features.
- … "Missing words & Missing some timestamps" section of https://github.com/remsky/Kokoro-FastAPI?tab=readme-ov-file#known-issues--troubleshooting. I am not 100% sure if this is working or not. It gets toggled off when I toggle it on, save the settings, leave the audio settings page, and come back to it. PLEASE TEST/FIX THIS FOR ME!
- … (`Optional`) logic is working properly for KokoroTTS - UNTESTED!
- … "Enable Text Normalization" option is colored gray and still needs styling to be green.
- … `AUDIO_TTS_KOKORO_API_KEY` likely should be an environment variable added to this PR. Thoughts?
Testing is definitely desired with this PR and any feedback is certainly appreciated. This PR was made entirely possible with the companionship of the Gemini 2.5 Flash model! DO NOT JUST BLINDLY MERGE THIS!
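For reviewers who want to sanity-check voice strings such as `af_bella+af_sky` or `af_bella(2)+af_sky(1)` before they reach the server, a small parser could be sketched as below. This is only an illustration: the name `parse_voice_combo` is hypothetical, and it assumes that a missing weight means 1 and that weights are relative shares; the Kokoro-FastAPI README linked above is the authority on the actual semantics.

```python
import re

# One component: a voice name, optionally followed by "(weight)".
_VOICE_PART = re.compile(
    r"^\s*([A-Za-z0-9_]+)\s*(?:\(\s*([0-9]*\.?[0-9]+)\s*\))?\s*$"
)


def parse_voice_combo(spec: str) -> list[tuple[str, float]]:
    """Split a '+'-joined voice string into (name, normalized weight) pairs.

    Assumptions (not confirmed by this PR): an unweighted voice counts
    as weight 1, and weights are normalized so they sum to 1.
    """
    parts = []
    for chunk in spec.split("+"):
        match = _VOICE_PART.match(chunk)
        if match is None:
            raise ValueError(f"invalid voice component: {chunk!r}")
        name, weight = match.group(1), match.group(2)
        parts.append((name, float(weight) if weight is not None else 1.0))

    total = sum(weight for _, weight in parts)
    if total <= 0:
        raise ValueError("voice weights must sum to a positive number")
    return [(name, weight / total) for name, weight in parts]
```

A UI could run this on the free-text voice field and reject malformed input (for example a stray `++`) before calling the speech endpoint.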
BEFORE THIS PR
AFTER MODIFICATIONS MADE IN THIS PR
Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.