Prerequisites
- A working microphone connected to your PC (headset, USB mic, or webcam mic).
- VMSC version 1.2.0 or later.
- Roughly 75 MB to 3 GB of free disk space, depending on the model you choose, plus about 15 MB for the engine binary.
All speech recognition runs locally on your machine using Whisper.cpp. Audio is never sent to a remote server.
Installing the Whisper.cpp Engine
VMSC bundles a one-click installer for the Whisper.cpp inference engine.
- Open Settings > Voice Commands.
- Click Install Whisper Engine. VMSC downloads the pre-compiled binary (~15 MB) and places it in the application data folder.
- A green checkmark appears next to Engine Status when installation is complete.
Some antivirus programs flag the Whisper binary as unknown software. If the installation fails, add the VMSC data folder (%APPDATA%\vmsc) to your antivirus exclusion list and retry.
Choosing and Downloading a Model
Whisper.cpp uses GGML model files. Larger models are more accurate but require more RAM and processing time.
| Model | Size | RAM Usage | Speed | Accuracy | Recommended For |
|---|---|---|---|---|---|
| tiny | 75 MB | ~400 MB | Very fast | Low | Testing & low-end hardware |
| base | 142 MB | ~500 MB | Fast | Moderate | Simple, short trigger phrases |
| small | 466 MB | ~1 GB | Moderate | Good | General use (recommended) |
| medium | 1.5 GB | ~2.5 GB | Slow | High | Noisy environments or accented speech |
| large | 2.9 GB | ~4 GB | Very slow | Highest | Maximum accuracy, powerful hardware only |
- In Settings > Voice Commands, select a model from the dropdown.
- Click Download Model. Progress is shown in the status bar.
- Once downloaded, the model loads automatically when voice commands are enabled.
You can download multiple models and switch between them at any time. Only one model is loaded into memory at a time.
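If you are unsure which model your machine can handle, the table above can be read as a simple lookup. The following is a minimal sketch, not part of VMSC: the function name `largest_model_that_fits` is hypothetical, and the RAM figures are the approximate values from the table, rounded to megabytes.

```python
from typing import Optional

# (model name, approximate RAM usage in MB), copied from the table above,
# ordered from smallest to largest. The figures are approximate.
MODELS = [
    ("tiny", 400),
    ("base", 500),
    ("small", 1000),
    ("medium", 2500),
    ("large", 4000),
]


def largest_model_that_fits(free_ram_mb: int) -> Optional[str]:
    """Return the largest model whose approximate RAM usage fits the budget.

    Returns None when even the smallest model does not fit.
    """
    best = None
    for name, ram_mb in MODELS:
        if ram_mb <= free_ram_mb:
            best = name  # later entries are larger, so keep overwriting
    return best


print(largest_model_that_fits(2048))  # small
```

Leave some headroom when you pick a budget: your streaming software, game, and browser compete for the same memory, and larger models are also slower.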
Selecting the Audio Input Device
In Settings > Voice Commands > Audio Input, select the microphone you want VMSC to listen to. The dropdown lists all audio input devices detected by the system.
- Choose a device dedicated to your voice (e.g., a headset mic) rather than a room mic to reduce background noise.
- Click Test to see a live waveform and verify the correct device is selected.
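- Adjust the Gain slider if the signal is too quiet (the waveform stays nearly flat while you speak) or clipping (the peaks are cut off at the top and bottom of the display).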
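To make "too quiet" and "clipping" concrete, here is a minimal sketch of how an input level could be classified from raw samples. It is an illustration only: VMSC's own meter is not documented here, and the function name `level_report` and both threshold values are assumptions chosen for the example.

```python
import math
from typing import Sequence


def level_report(
    samples: Sequence[float],
    quiet_rms: float = 0.05,   # assumed threshold, not a VMSC setting
    clip_peak: float = 0.99,   # assumed threshold, not a VMSC setting
) -> str:
    """Classify a block of audio samples normalized to the range [-1.0, 1.0]."""
    if not samples:
        return "no signal"
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if peak >= clip_peak:
        return "clipping"      # peaks hit full scale: lower the gain
    if rms < quiet_rms:
        return "too quiet"     # average level is very low: raise the gain
    return "ok"


# One cycle of a half-amplitude sine wave, standing in for normal speech.
speech_like = [0.5 * math.sin(2 * math.pi * i / 100) for i in range(100)]
print(level_report(speech_like))
```

The peak check runs first because a clipped signal distorts the audio regardless of its average level.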
Creating Command Presets
A command preset maps a spoken trigger phrase to one or more VMSC actions.
- Go to Settings > Voice Commands > Command Presets.
- Click Add Preset.
- Enter a Trigger Phrase — the word or short sentence you will speak (e.g., "start giveaway").
- Choose a Match Mode (see Fuzzy vs. Exact Matching).
- Under Actions, add one or more actions to execute when the phrase is detected (e.g., trigger a rule, send a chat message, play a sound).
- Click Save.
Fuzzy vs. Exact Matching
VMSC supports two matching modes for comparing the transcript against your trigger phrases.
| Mode | Behavior | Best For |
|---|---|---|
| Exact | The transcript must contain the trigger phrase verbatim (case-insensitive). | Short, distinct phrases with low risk of false positives. |
| Fuzzy | Uses Levenshtein distance to allow minor transcription errors. A confidence threshold (0–100%) controls how close the match must be. | Longer phrases, accented speech, or noisy environments. |
For fuzzy matching, start with a confidence threshold of 75%. Lower it if commands are not being recognized; raise it if you get false triggers.
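The difference between the two modes is easier to see in code. The sketch below implements the standard Levenshtein edit distance and turns it into a 0 to 100 score. How VMSC actually normalizes the distance and which part of the transcript it compares are not described here, so the `similarity` formula and the sliding word window in `fuzzy_match` are assumptions made for illustration, as are all function names.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) ca
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed scoring: 100 for identical strings, 0 for nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def exact_match(transcript: str, trigger: str) -> bool:
    """Case-insensitive check that the trigger appears verbatim."""
    return trigger.lower() in transcript.lower()


def fuzzy_match(transcript: str, trigger: str, threshold: float = 75.0) -> bool:
    """Compare the trigger against every run of words of the same length."""
    words = transcript.lower().split()
    target = trigger.lower()
    n = len(target.split())
    if n == 0:
        return False
    windows = [" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))]
    return any(similarity(w, target) >= threshold for w in windows)


heard = "please start givaway"  # transcription dropped one letter
print(exact_match(heard, "start giveaway"))  # False
print(fuzzy_match(heard, "start giveaway"))  # True
```

In this example a single missing letter defeats exact matching, while the fuzzy score for "start givaway" against "start giveaway" is about 93, comfortably above a 75% threshold.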
Linking Actions to Voice Commands
Each preset can execute any combination of VMSC actions when triggered:
- Trigger Rule — fire a specific rule by name or ID, as if a matching stream event arrived.
- Send Chat — post a message to one or more connected chat platforms.
- Play Sound — play a local audio file through the system default output.
- HTTP Request — call an external webhook or API endpoint.
- Run Program — execute a local executable or script.
- OSC Message — send a VRChat OSC message (e.g., toggle an avatar parameter).
Actions execute in order and can include a configurable delay between each step.
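Conceptually, a preset is a trigger phrase plus an ordered list of actions. The following sketch models that idea under stated assumptions: the `Action` and `Preset` classes, their field names, and the `execute` function are hypothetical and do not reflect VMSC's internal format. The actions here only append to a list so that the example runs without any external services.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    name: str
    run: Callable[[], None]
    delay_after_s: float = 0.0  # pause before the next action starts


@dataclass
class Preset:
    trigger_phrase: str
    match_mode: str = "fuzzy"
    actions: List[Action] = field(default_factory=list)


def execute(preset: Preset, sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Run the preset's actions in list order, pausing between steps.

    The delay configured on the final action is skipped because
    nothing follows it. Returns the names of the executed actions.
    """
    executed = []
    last_index = len(preset.actions) - 1
    for index, action in enumerate(preset.actions):
        action.run()
        executed.append(action.name)
        if action.delay_after_s > 0 and index != last_index:
            sleep(action.delay_after_s)
    return executed


log: List[str] = []
giveaway = Preset(
    trigger_phrase="start giveaway",
    actions=[
        Action("send_chat", lambda: log.append("chat: Giveaway starting!"), delay_after_s=0.1),
        Action("play_sound", lambda: log.append("sound: fanfare")),
    ],
)
print(execute(giveaway))
print(log)
```

Passing `sleep` as a parameter is a design choice for the sketch: it lets you replace the real delay with a stub when testing a sequence.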
Testing with the Live Transcript
The Voice Commands settings panel includes a live transcript window.
- Toggle Enable Voice Commands to start listening.
- Speak a trigger phrase into your microphone.
- The transcript window shows the recognized text in real time, with matched trigger phrases highlighted in green.
- If a match fires, the corresponding actions execute and the action log updates.
Use this view to fine-tune your trigger phrases and confidence thresholds before going live.
Tips for Accuracy
- Quiet environment: Background music, game audio, or audience noise reduces accuracy significantly. Use a directional microphone or noise-gated input.
- Short, distinct phrases: Two- or three-word commands like "start poll" or "confetti burst" are more reliably recognized than long sentences.
- Adjust confidence threshold: If a fuzzy-matched command fires too often on random speech, raise the confidence threshold. If it never fires, lower the threshold or switch to a larger model.
- Avoid similar phrases: Trigger phrases that sound alike (e.g., "start" and "stop") may confuse the model. Use more distinctive words when possible.
- Consistent microphone position: Keep the mic at the same distance from your mouth for consistent audio levels.
- Test before streaming: Always run a quick test session before going live to confirm recognition works with your current audio setup.