feat: KokoroTTS text-to-speech-engine option (decoupled from OpenAI TTS) #16036

silentoplayz · 2025-07-26T02:59:15Z

Pull Request Checklist

Before submitting, make sure you've checked the following:

Target branch: Please verify that the pull request targets the dev branch.
Description: Provide a concise description of the changes made in this pull request.
Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
Documentation: Have you updated the relevant documentation (Open WebUI Docs or other documentation sources)?
Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
Testing: Have you written and run sufficient tests to validate the changes?
Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
- BREAKING CHANGE: Significant changes that may affect compatibility
- build: Changes that affect the build system or external dependencies
- ci: Changes to our continuous integration processes or workflows
- chore: Refactor, cleanup, or other non-functional code changes
- docs: Documentation update or addition
- feat: Introduces a new feature or enhancement to the codebase
- fix: Bug fix or error correction
- i18n: Internationalization or localization changes
- perf: Performance improvement
- refactor: Code restructuring for better maintainability, readability, or scalability
- style: Changes that do not affect the meaning of the code (white space, formatting, missing semi-colons, etc.)
- test: Adding missing tests or correcting existing tests
- WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

Adds a new, decoupled KokoroTTS text-to-speech engine option to Open WebUI, so users can point Open WebUI's text-to-speech at a KokoroTTS endpoint independently of the OpenAI TTS engine. Configuration is driven by the new AUDIO_TTS_KOKORO_API_BASE_URL environment variable (or the equivalent UI setting), and KokoroTTS model/voice discovery and speech generation are integrated throughout the audio pipeline.

Added

New KokoroTTS TTS engine (kokoro) selectable in the UI alongside openai, azure, and elevenlabs.
Support for fetching KokoroTTS models and voices via /v1/models and /v1/audio/voices endpoints, respectively.
Dedicated KokoroTTS speech generation path that POSTs to TTS_KOKORO_API_BASE_URL/v1/audio/speech.
Environment variable AUDIO_TTS_KOKORO_API_BASE_URL and corresponding persistent config entry.
Cache keys now include TTS_KOKORO_API_BASE_URL to prevent cache collisions when switching engines.
UI fields for TTS_KOKORO_API_BASE_URL and an optional TTS_API_KEY for KokoroTTS.
Option in the UI to input custom KokoroTTS voice combinations (e.g., af_bella+af_sky or af_bella(2)+af_sky(1)).
UI toggle to enable/disable text normalization for KokoroTTS, driven by KOKORO_NORMALIZATION_OPTIONS.normalize in the config.
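The cache-key item above can be illustrated with a minimal sketch. It assumes cached audio is keyed by a hash of the request; the helper name `tts_cache_key` and the exact fields hashed are hypothetical and are not taken from this PR's code.

```python
import hashlib
import json


def tts_cache_key(engine: str, base_url: str, payload: dict) -> str:
    """Return a stable hex digest for one TTS request (hypothetical helper)."""
    # sort_keys makes the serialization independent of dict insertion order.
    body = json.dumps(payload, sort_keys=True)
    # Mixing the engine name and base URL into the hashed material keeps
    # identical payloads sent to different servers from sharing a cache entry.
    material = "\n".join([engine, base_url.rstrip("/"), body])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

With this shape, the same text and voice requested from two different KokoroTTS servers, or from `kokoro` versus `openai`, hash to different keys.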

Changed

Updated backend/open_webui/config.py to register AUDIO_TTS_KOKORO_API_BASE_URL.
Updated backend/open_webui/main.py to expose TTS_KOKORO_API_BASE_URL to the application state.
Extended backend/open_webui/routers/audio.py:
- Added Kokoro-related fields to TTSConfigForm, get_audio_config, and update_audio_config, including KOKORO_NORMALIZATION_OPTIONS.
- Added Kokoro branch in /speech endpoint for KokoroTTS requests.
- Added Kokoro branches in get_available_models and get_voices for dynamic discovery.
Audio.svelte (frontend UI):
- Implemented dynamic fetching and display of KokoroTTS models and voices based on the TTS_ENGINE selection.
- Introduced a "Custom Combination..." option for KokoroTTS voices, allowing users to input complex voice strings, with client-side validation for empty custom inputs.
- Added a toggle for "Enable Text Normalization" for KokoroTTS, reflecting the KOKORO_NORMALIZATION_OPTIONS.normalize setting.
- Modified the voice and model selection logic to clear previously selected values when switching TTS engines, and to set default OpenAI values when selecting the OpenAI engine.
- Updated voice and model type definitions for better clarity and consistency across different TTS engines.
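As a rough sketch of how the three KokoroTTS routes named in this description could be resolved against the configured base URL: the helper names below are hypothetical, and sending the optional key as a Bearer token is an assumption, not something this PR confirms.

```python
# Paths as listed in this PR description.
KOKORO_PATHS = {
    "models": "/v1/models",
    "voices": "/v1/audio/voices",
    "speech": "/v1/audio/speech",
}


def kokoro_url(base_url: str, path: str) -> str:
    """Join base URL and path, tolerating a trailing slash on the base."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def kokoro_headers(api_key: str = "") -> dict:
    """Build request headers; the Authorization header is only added
    when a key is set (Bearer scheme is an assumption)."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

For example, `kokoro_url("http://host.docker.internal:8880/", KOKORO_PATHS["voices"])` yields the voices URL that appears in the console output quoted later in this thread.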

Fixed

Made the API key genuinely optional for KokoroTTS. Previously, when KokoroTTS was reached through the OpenAI TTS engine, the key was not optional.
Ensured that the "TTS Model" and "TTS Voice" fields are now marked as required for all TTS engines that utilize them, including KokoroTTS, to prevent users from leaving these essential settings blank.

Hopefully there aren't any breaking changes. All existing TTS engines remain unchanged and KokoroTTS is an optional text-to-speech engine.

Before vs After — KokoroTTS in Open WebUI

| BEFORE | AFTER |
| --- | --- |
| Manual Setup: Users had to know the exact KokoroTTS model name and voice string(s) and type or paste them into the “TTS Voice” and “TTS Model” fields. | Automatic Discovery: Open WebUI queries your KokoroTTS server's `/v1/models` and `/v1/audio/voices` endpoints and lists every available model and voice in drop-down menus, so no typing is required. |
| Hidden Support: KokoroTTS only worked if you tricked the “OpenAI” engine into pointing at a KokoroTTS endpoint; the UI never mentioned KokoroTTS, so many users assumed it simply wasn’t supported or couldn't figure out how to set it up. | First-class Option: “KokoroTTS” now appears as its own TTS engine in `Settings` > `Audio` > `Text-to-Speech Engine`. Selecting it tells the UI to use the correct endpoints and parameters (as long as a valid URL is entered in the TTS engine base URL input field), removing the guesswork. |
| Confusion: Users often asked, “Does Open WebUI support Kokoro?” and struggled to configure voices manually. | Clarity: The dedicated Kokoro option and auto-populated lists make the answer obvious: yes, and it’s one click away. |
| Text-to-Speech Engine: Only "OpenAI" is explicitly available in the dropdown, requiring users to configure KokoroTTS by pointing the OpenAI engine at a KokoroTTS endpoint. | Text-to-Speech Engine: "KokoroTTS" is now a dedicated option in the dropdown, making its support explicit and direct. |
| TTS Voice: Requires manual entry of voice combinations (e.g., `af_alloy+af_heart+af_sky+af_bella`) into a single text field. | TTS Voice: A dropdown lists the available voices plus a "Custom Combination..." option. When "Custom Combination..." is selected, a text input field appears below for manual entry of combinations, along with guidance on how to format them (e.g., `af_bella(2)+af_sky(1)`). |
| TTS Model: Requires manual entry of the model name (e.g., `kokoro`) into a text field. | TTS Model: Provides a dropdown menu populated with the models discovered from the KokoroTTS server. |
| Missing Feature: "Enable Text Normalization" toggle is not present. | New Feature: "Enable Text Normalization" toggle is introduced, with the description: "Disable text normalization if words are missing or timestamps are incorrect in the generated audio." |

Additional Information

The 'Voice Combination/Weighted Combinations' support for KokoroTTS voices comes from the Voice Combination section of https://github.com/remsky/Kokoro-FastAPI?tab=readme-ov-file#features:

"Weighted voice combinations using ratios (e.g., "af_bella(2)+af_heart(1)" for 67%/33% mix)

Ratios are automatically normalized to sum to 100%

Available through any endpoint by adding weights in parentheses"
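To make the ratio arithmetic in the quoted README text concrete, here is a minimal sketch of parsing a combination string and normalizing the weights to percentages. The function name `voice_mix`, the default weight of 1 for unweighted voices, and the summing of repeated voices are assumptions for illustration; the KokoroTTS server's own parser may behave differently.

```python
import re

# One segment: a voice name, optionally followed by a weight in parentheses.
_PART = re.compile(r"^([A-Za-z0-9_]+)(?:\((\d+(?:\.\d+)?)\))?$")


def voice_mix(combo: str, available=None) -> dict:
    """Map each voice in e.g. 'af_bella(2)+af_heart(1)' to its percentage."""
    weights = {}
    for raw in combo.split("+"):
        match = _PART.match(raw.strip())
        if match is None:
            raise ValueError(f"malformed voice segment: {raw!r}")
        name = match.group(1)
        # Assumption: a voice without an explicit weight counts as 1.
        weight = float(match.group(2) or 1)
        if weight <= 0:
            raise ValueError(f"weight must be positive: {raw!r}")
        # Optional check against the names returned by /v1/audio/voices.
        if available is not None and name not in available:
            raise ValueError(f"unknown voice: {name}")
        weights[name] = weights.get(name, 0.0) + weight
    total = sum(weights.values())
    # Normalize so the percentages sum to 100.
    return {name: 100 * w / total for name, w in weights.items()}
```

Calling `voice_mix("af_bella(2)+af_heart(1)")` gives roughly 66.7% for `af_bella` and 33.3% for `af_heart`, matching the 67%/33% mix in the README excerpt.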

The 'Enable Text Normalization' toggle comes from the "Missing words & Missing some timestamps" section of https://github.com/remsky/Kokoro-FastAPI?tab=readme-ov-file#known-issues--troubleshooting. I am not 100% sure whether it works: after I toggle it on, save the settings, leave the audio settings page, and come back to it, the toggle is off again. PLEASE TEST/FIX THIS FOR ME!

"The api will automaticly do text normalization on input text which may incorrectly remove or change some phrases. This can be disabled by adding "normalization_options":{"normalize": false} to your request json"
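A minimal sketch of a request body carrying that option follows. The `model`, `input`, and `voice` field names assume the OpenAI-style speech request shape, and the helper `kokoro_speech_payload` is hypothetical; only the `normalization_options` fragment is taken from the quoted README text.

```python
import json


def kokoro_speech_payload(text: str, model: str, voice: str,
                          normalize: bool = True) -> dict:
    """Build the JSON body for a speech request (hypothetical helper)."""
    payload = {"model": model, "input": text, "voice": voice}
    if not normalize:
        # Only sent when the toggle is off; the server normalizes by
        # default, according to the README excerpt above.
        payload["normalization_options"] = {"normalize": False}
    return payload


body = json.dumps(kokoro_speech_payload("Hello", "kokoro", "af_bella",
                                        normalize=False))
```

Leaving the key out when normalization is enabled keeps the request identical to a plain OpenAI-style call, which may make a persistence bug in the toggle easier to isolate.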

I'm unsure if the API key (Optional) logic is working properly for KokoroTTS - UNTESTED!
The toggle for the added Enable Text Normalization option is colored gray and still needs styling to be green.
AUDIO_TTS_KOKORO_API_KEY likely should be an environment variable added to this PR. Thoughts?

Testing is definitely desired with this PR and any feedback is certainly appreciated. This PR was made entirely possible with the companionship of Gemini 2.5 Flash model! DO NOT JUST BLINDLY MERGE THIS!

BEFORE THIS PR

AFTER MODIFICATIONS MADE IN THIS PR

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

silentoplayz · 2025-07-26T03:47:08Z

Please, please, PLEASE review this PR before considering merging it. If it is merged, I encourage refactors wherever necessary to bring the code up to speed!

The reason I can't test this PR thoroughly: Ubuntu kernel-panics when I test TTS models from KokoroTTS unless I keep the test under 2 seconds; I'm serious. ☹️

silentoplayz · 2025-07-26T05:13:25Z

The browser console shows the following errors when the KokoroTTS endpoint is not reachable or not running. I am not sure whether this is a problem that needs solving.

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://host.docker.internal:8880/v1/audio/voices. (Reason: CORS request did not succeed). Status code: (null).

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://host.docker.internal:8880/v1/models. (Reason: CORS request did not succeed). Status code: (null).

rgaricano · 2025-07-26T09:53:28Z

I think the voices server connection should be managed the same way as for MCP Tools: server side (https://github.com/open-webui/open-webui/blob/main/src/lib/components/admin/Settings/Connections.svelte) and client side (https://github.com/open-webui/open-webui/blob/main/src/lib/components/chat/Settings/Connections.svelte).
(The client-side connection has to be publicly accessible from the browser.)
(I'll try to take a closer look if time allows.)

silentoplayz added 2 commits July 25, 2025 22:58

feat: KokoroTTS text-to-speech-engine option (decoupled from OpenAI T…

3a25512

…TS option)

remove comments

60f83f3

silentoplayz marked this pull request as ready for review July 26, 2025 03:47

silentoplayz added 5 commits July 26, 2025 02:13

fix: add required for TTS fields to save settings

f170ea3

refac: forward slash (/) support for KokoroTTS Base URL

d007819

feat: display voice distribution percentages for KokoroTTS voice comb…

4bb995c

…inations input

refac: validate KokoroTTS custom voice combination input against the …

4c88936

…actual available voice names fetched from the KokoroTTS API

.

0536da1
