Overview

Text-to-Speech (TTS) is a dedicated section in the VMSC sidebar that lets you turn text into spoken audio during your stream. Whether you want viewers to hear their chat messages read aloud, trigger custom voice alerts from gifts and follows, or build comedic moments with character voices, TTS gives you full control over how your stream sounds.

VMSC ships with five TTS engines, each with its own strengths. You can set a global default engine and voice, then override them on a per-action basis — so one rule can use a deep narrator voice from ElevenLabs while another uses a funny TikTok character voice, all in the same stream.

TTS Engines

Each engine connects to a different voice synthesis service. You can switch engines at any time from the TTS settings page or override the engine per action in your rules.

Engine	Voices	Cost	Quality	Latency	Offline	Best For
Edge TTS	400+ neural voices	Free	High	Low	No	General-purpose default engine with broad language coverage
TikTok TTS	33 character & singing voices	Free	Medium	Low	No	Fun character voices, singing, comedic moments
Windows SAPI	System-installed voices	Free	Low–Medium	Instant	Yes	Offline fallback when internet is unavailable
ElevenLabs	Unlimited (voice cloning)	Paid API	Very High	Medium	No	Premium voice cloning, ultra-realistic narration
StreamElements	1,000+ voices	Free	Medium–High	Low	No	Massive voice library with familiar streaming voices

Default Engine

Edge TTS is selected as the default engine on first launch. It offers the best balance of quality, speed, and language support at no cost.

Edge TTS

Edge TTS uses Microsoft's neural text-to-speech service, the same engine that powers the Read Aloud feature in Microsoft Edge. It provides over 400 neural voices across dozens of languages and regional accents.

Voice selection — Browse and search voices by language, gender, or name in the Voice Browser modal.
SSML support — Edge TTS accepts SSML markup for fine-tuned pronunciation, emphasis, and pauses.
Speed and pitch — Adjustable speed (-50% to +100%) and pitch (-50% to +50%) sliders.
No API key required — Works out of the box with no account or billing setup.
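As a rough illustration of the SSML and speed/pitch options above, the sketch below builds a standard SSML string with the Python standard library. The helper name build_ssml is hypothetical, and whether VMSC expects a full <speak> document or only the inner markup in its text field is an assumption to verify before relying on it.

```python
from xml.sax.saxutils import escape


def build_ssml(text: str, rate_pct: int = 0, pitch_pct: int = 0,
               pause_ms: int = 0, lang: str = "en-US") -> str:
    """Wrap text in SSML prosody markup (hypothetical helper, not a VMSC API)."""
    # Clamp to the slider ranges documented for Edge TTS.
    rate_pct = max(-50, min(100, rate_pct))
    pitch_pct = max(-50, min(50, pitch_pct))
    # Optional trailing pause after the spoken text.
    pause = f'<break time="{pause_ms}ms"/>' if pause_ms > 0 else ""
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
        f'<prosody rate="{rate_pct:+d}%" pitch="{pitch_pct:+d}%">'
        f"{escape(text)}</prosody>{pause}</speak>"
    )


print(build_ssml("Thanks for the follow!", rate_pct=20, pause_ms=300))
```

Escaping the text matters when the input comes from chat, since characters such as & and < would otherwise break the markup.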

TikTok TTS

TikTok TTS provides 33 unique character and singing voices, including the iconic TikTok narrator voice. These voices add personality and humor to your stream alerts.

Character voices — Includes deep narrator, cheerful female, ghostface, rocket, and many more.
Singing voices — Several voices can sing short melodies from text input.
No API key required — Uses the public TikTok TTS endpoint.

Windows SAPI

Windows SAPI (Speech API) uses the voices installed on your Windows system. It is the only engine that works completely offline, making it a reliable fallback if your internet drops mid-stream.

Offline operation — No network connection needed.
System voices — Uses whatever voices are installed in Windows Settings > Time & Language > Speech.
Instant playback — No network round-trip means near-zero latency.
Limited quality — Older SAPI voices sound robotic compared to neural engines.
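The fallback idea can be sketched as a small decision function. This is only an illustration: the source does not say whether VMSC switches engines automatically, and pick_engine is a hypothetical name. The engine list mirrors the Offline column of the comparison table.

```python
# Engines that need a network connection, per the comparison table.
ONLINE_ENGINES = {"Edge TTS", "TikTok TTS", "ElevenLabs", "StreamElements"}
OFFLINE_ENGINE = "Windows SAPI"


def pick_engine(preferred: str, online: bool) -> str:
    """Return the preferred engine, or the offline engine when the network is down."""
    if preferred in ONLINE_ENGINES and not online:
        return OFFLINE_ENGINE
    return preferred


print(pick_engine("Edge TTS", online=False))
```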

ElevenLabs

ElevenLabs offers the highest-quality voice synthesis of the five engines, including voice cloning. You can clone your own voice or use any voice from the ElevenLabs library for ultra-realistic narration.

Voice cloning — Upload audio samples to create a clone of any voice.
API key required — Enter your ElevenLabs API key in TTS settings. Usage is billed through your ElevenLabs account.
Model selection — Choose between Multilingual v2, Turbo v2.5, and other available models.
Higher latency — The superior quality comes with slightly longer processing time.

Usage Costs

ElevenLabs charges per character synthesized. Monitor your usage on the ElevenLabs dashboard to avoid unexpected bills, especially if TTS is triggered by chat messages from a large audience.

StreamElements

StreamElements TTS provides access to over 1,000 voices from multiple providers (Google, Amazon Polly, IBM Watson, and more). Many of these are the familiar voices used across popular streaming platforms.

Massive library — Over 1,000 voices spanning dozens of languages.
No API key required — Uses the public StreamElements TTS endpoint.
Familiar voices — Includes the classic "Brian" and other voices popular in the streaming community.

Voice Browser

The Voice Browser is a modal dialog that opens when you click the voice selection field in TTS settings or in an action editor. It lets you search, preview, and select voices from the active engine.

Search — Type to filter voices by name, language, or locale code.
Preview — Click the play button next to any voice to hear a sample.
Engine filter — The browser shows only voices from the currently selected engine.
Favorites — Star voices to keep them at the top of the list for quick access.
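The search, engine filter, and favorites behavior described above can be approximated as follows. This is a minimal sketch, not VMSC's implementation: the Voice fields and the browse function are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str
    language: str
    locale: str
    engine: str


def browse(voices, engine, query="", favorites=frozenset()):
    """Filter voices to one engine and a search term, favorites first."""
    q = query.strip().lower()
    matches = [
        v for v in voices
        if v.engine == engine
        and (not q or q in v.name.lower()
             or q in v.language.lower() or q in v.locale.lower())
    ]
    # False sorts before True, so starred voices come first, then by name.
    return sorted(matches, key=lambda v: (v.name not in favorites, v.name.lower()))


voices = [
    Voice("en-US-AriaNeural", "English (United States)", "en-US", "Edge TTS"),
    Voice("en-GB-RyanNeural", "English (United Kingdom)", "en-GB", "Edge TTS"),
    Voice("de-DE-KatjaNeural", "German (Germany)", "de-DE", "Edge TTS"),
    Voice("Brian", "English (United Kingdom)", "en-GB", "StreamElements"),
]
for v in browse(voices, "Edge TTS", "english", favorites={"en-GB-RyanNeural"}):
    print(v.name)
```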

Per-Action Engine and Voice Override

Every TTS action in your rules can override the global default engine and voice, so a single stream can mix multiple engines and voices.

When editing a TTS action in the rule editor:

The Engine dropdown lets you pick a specific engine (or leave it on "Use Default" to inherit the global setting).
The Voice selector opens the Voice Browser filtered to the chosen engine.
The Speed, Pitch, and Volume sliders can also be overridden per action.
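One way to picture the override behavior is a field-by-field merge in which an unset value means "Use Default". The TtsSettings class and resolve function below are hypothetical names for this sketch, not part of VMSC.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TtsSettings:
    engine: Optional[str] = None
    voice: Optional[str] = None
    speed: Optional[float] = None
    pitch: Optional[int] = None
    volume: Optional[int] = None


def resolve(global_defaults: TtsSettings, action: TtsSettings) -> TtsSettings:
    """Take each field from the action if set, otherwise from the global default."""
    merged = {}
    for f in fields(TtsSettings):
        override = getattr(action, f.name)
        # Compare against None so that legitimate values such as pitch 0 still count.
        merged[f.name] = getattr(global_defaults, f.name) if override is None else override
    return TtsSettings(**merged)


defaults = TtsSettings("Edge TTS", "en-US-AriaNeural", 1.0, 0, 80)
gift_alert = TtsSettings(engine="ElevenLabs", voice="Narrator", speed=0.9)
print(resolve(defaults, gift_alert))
```

A real implementation would also need to handle an action that overrides the engine but not the voice, since the global voice may not exist in the other engine; this sketch ignores that case.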

Mix and Match

You could have a gift alert use a deep ElevenLabs narrator, a follow alert use a funny TikTok character voice, and chat TTS use the default Edge voice, all in the same stream with no conflicts.

Per-Platform Settings

VMSC supports multiple streaming platforms simultaneously. The TTS settings page provides per-platform overrides so you can fine-tune behavior for each connected platform:

Volume sliders — Set different TTS volume levels for each platform (e.g., louder for TikTok, quieter for YouTube).
Voice/engine overrides — Assign a different default voice or engine per platform.
Enable/disable — Turn TTS on or off for specific platforms without affecting others.

Speed, Pitch, and Volume Controls

Global audio controls are available on the TTS settings page and apply to all TTS playback unless overridden per action:

Control	Range	Default	Description
Speed	0.5x – 2.0x	1.0x	Playback speed multiplier. Higher values make speech faster.
Pitch	-50% – +50%	0%	Shifts the vocal pitch up or down.
Volume	0% – 100%	80%	Master TTS volume. Per-platform sliders further scale this value.
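The arithmetic behind these controls can be sketched in a few lines. Two assumptions are made here: that per-platform sliders scale the master volume multiplicatively, and that a speed multiplier maps linearly onto a percentage rate (so 0.5x is -50% and 2.0x is +100%). Both function names are hypothetical.

```python
def effective_volume(master_pct: float, platform_pct: float) -> float:
    """Scale the master volume by a platform slider (assumed multiplicative)."""
    master = max(0.0, min(100.0, master_pct))
    platform = max(0.0, min(100.0, platform_pct))
    return master * platform / 100.0


def speed_to_percent(multiplier: float) -> int:
    """Convert a 0.5x to 2.0x speed multiplier into a relative percentage."""
    multiplier = max(0.5, min(2.0, multiplier))
    return round((multiplier - 1.0) * 100)


print(effective_volume(80, 50))   # master 80%, platform slider 50%
print(speed_to_percent(1.25))
```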

Random Voice Mode

Enable Random Voice Mode to have VMSC pick a random voice from the current engine's library for each TTS message. This creates a chaotic, entertaining experience where every message sounds different.

Random mode respects any voice filters you set (e.g., only English voices, only female voices).
You can also exclude specific voices from the random pool, or pin a handful of voices to build a curated random set.
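A filtered random pick along these lines might look like the sketch below. The Voice fields, the excluded set, and pick_random_voice are assumptions for illustration; the voice names are placeholders.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str
    language: str
    gender: str


def pick_random_voice(voices, rng, language=None, gender=None, excluded=frozenset()):
    """Pick a random voice that passes the filters, or None if the pool is empty."""
    pool = [
        v for v in voices
        if (language is None or v.language == language)
        and (gender is None or v.gender == gender)
        and v.name not in excluded
    ]
    return rng.choice(pool) if pool else None


voices = [
    Voice("Voice A", "English", "female"),
    Voice("Voice B", "English", "male"),
    Voice("Voice C", "German", "female"),
]
rng = random.Random()  # pass random.Random(seed) for reproducible picks
print(pick_random_voice(voices, rng, language="English"))
```

Returning None for an empty pool leaves the caller free to fall back to the default voice rather than failing.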

User Cooldowns and Spam Protection

To prevent individual viewers from flooding TTS, VMSC provides several spam protection tools:

Per-user cooldown — Set a minimum time between TTS messages from the same viewer (e.g., one message every 30 seconds).
Global cooldown — Set a minimum time between any TTS messages, regardless of who sent them.
Maximum message length — Limit how many characters a single TTS message can contain.
Word filter — Block specific words or patterns from being spoken. Messages containing blocked words are silently dropped.
Subscriber-only mode — Restrict TTS to subscribers or viewers with specific roles.
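The first four checks can be combined into a single gate that runs before a message enters the queue. This is a minimal sketch under stated assumptions: SpamGuard is a hypothetical class, over-length messages are rejected rather than truncated, and blocked words are matched as whole lowercase words.

```python
from dataclasses import dataclass, field


@dataclass
class SpamGuard:
    per_user_cooldown: float = 30.0   # seconds between messages from one viewer
    global_cooldown: float = 2.0      # seconds between any two messages
    max_length: int = 200             # characters
    blocked_words: frozenset = frozenset()
    _last_by_user: dict = field(default_factory=dict)
    _last_any: float = float("-inf")

    def allow(self, user: str, text: str, now: float) -> bool:
        """Return True if the message may be spoken; record it if so."""
        if len(text) > self.max_length:
            return False
        words = {w.strip(".,!?").lower() for w in text.split()}
        if words & self.blocked_words:
            return False  # silently dropped
        if now - self._last_any < self.global_cooldown:
            return False
        if now - self._last_by_user.get(user, float("-inf")) < self.per_user_cooldown:
            return False
        # Only accepted messages reset the cooldown timers.
        self._last_any = now
        self._last_by_user[user] = now
        return True


guard = SpamGuard(blocked_words=frozenset({"spam"}))
print(guard.allow("ann", "hello chat", now=0.0))
print(guard.allow("ann", "hello again", now=5.0))
```

Passing the timestamp in as an argument keeps the logic easy to test; in a live setting it would typically come from time.monotonic().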

Queue Management

All TTS messages enter a queue and play sequentially. The TTS panel displays a live queue showing pending messages, the currently playing message, and recently played messages.

Skip current — Click the skip button to immediately stop the current message and advance to the next.
Clear queue — Remove all pending messages from the queue.
Pause/resume — Temporarily pause the queue without clearing it. Messages continue to accumulate but will not play until resumed.
Priority messages — TTS actions triggered by higher-value events (e.g., large gifts) can be configured to jump to the front of the queue.
Live display — The queue panel updates in real time, showing the viewer name, message text, and selected voice for each entry.
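The queue behavior described above maps naturally onto a priority queue. The sketch below uses Python's heapq; TtsQueue and QueuedMessage are hypothetical names, and it assumes that priority messages keep first-in, first-out order among themselves.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedMessage:
    sort_key: tuple                       # (priority rank, arrival order)
    viewer: str = field(compare=False)
    text: str = field(compare=False)
    voice: str = field(compare=False)


class TtsQueue:
    def __init__(self) -> None:
        self._heap: list = []
        self._counter = itertools.count()  # preserves arrival order within a rank
        self.paused = False
        self.current = None

    def enqueue(self, viewer, text, voice, priority=False) -> None:
        key = (0 if priority else 1, next(self._counter))
        heapq.heappush(self._heap, QueuedMessage(key, viewer, text, voice))

    def pending(self) -> list:
        """Pending messages in play order, for a live display."""
        return sorted(self._heap)

    def next_message(self):
        """Advance to the next message; does nothing while paused."""
        if self.paused:
            return None
        self.current = heapq.heappop(self._heap) if self._heap else None
        return self.current

    def skip(self):
        """Drop the current message and move on."""
        self.current = None
        return self.next_message()

    def clear(self) -> None:
        self._heap.clear()


q = TtsQueue()
q.enqueue("ann", "hello", "voice-1")
q.enqueue("cat", "huge gift!", "voice-2", priority=True)
print(q.next_message().viewer)
```

While paused, enqueue still works, so messages accumulate and play in order once the queue resumes.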

TTS as an Action Target in Rules

TTS is available as an action target in the Rules system. When creating or editing a rule, select Text-to-Speech from the action type dropdown to add a TTS action. You can configure:

Text template — The text to speak, supporting template variables like {{user.nickname}}, {{event.giftName}}, and custom strings.
Engine override — Select a specific engine or use the global default.
Voice override — Select a specific voice or use the global default.
Speed, pitch, volume overrides — Fine-tune audio parameters for this specific action.
Priority — Set whether this message should jump the queue.

Example TTS text template:
"{{user.nickname}} just sent {{event.giftCount}} {{event.giftName}}! Thank you so much!"
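To show how such a template could expand, here is a minimal sketch of placeholder substitution using a regular expression. The render function and the nested context layout are assumptions; in particular, leaving unknown variables untouched is a choice made for this example, not documented VMSC behavior.

```python
import re

# Matches {{ name.with.dots }} with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace {{a.b}} placeholders with values looked up in a nested dict."""
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                return match.group(0)  # leave unknown placeholders as written
            value = value[part]
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


event_context = {
    "user": {"nickname": "Ann"},
    "event": {"giftCount": 5, "giftName": "Rose"},
}
template = "{{user.nickname}} just sent {{event.giftCount}} {{event.giftName}}! Thank you so much!"
print(render(template, event_context))
```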