Prerequisites

A working microphone connected to your PC (headset, USB mic, or webcam mic).
VMSC version 1.2.0 or later.
Roughly 100 MB to 3 GB of free disk space, depending on the model you choose.

Privacy first

All speech recognition runs locally on your machine using Whisper.cpp. Audio is never sent to a remote server.

Installing the Whisper.cpp Engine

VMSC bundles a one-click installer for the Whisper.cpp inference engine.

Open Settings > Voice Commands.
Click Install Whisper Engine. VMSC downloads the pre-compiled binary (~15 MB) and places it in the application data folder.
A green checkmark appears next to Engine Status when installation is complete.

Antivirus note

Some antivirus programs flag the Whisper binary as unknown software. If the installation fails, add the VMSC data folder (%APPDATA%\vmsc) to your antivirus exclusion list and retry.

Choosing and Downloading a Model

Whisper.cpp uses GGML model files. Larger models are more accurate but require more RAM and processing time.

Model	Size	RAM Usage	Speed	Accuracy	Recommended For
tiny	75 MB	~400 MB	Very fast	Low	Testing & low-end hardware
base	142 MB	~500 MB	Fast	Moderate	Simple, short trigger phrases
small	466 MB	~1 GB	Moderate	Good	General use (recommended)
medium	1.5 GB	~2.5 GB	Slow	High	Noisy environments or accented speech
large	2.9 GB	~4 GB	Very slow	Highest	Maximum accuracy, powerful hardware only

In Settings > Voice Commands, select a model from the dropdown.
Click Download Model. Progress is shown in the status bar.
Once downloaded, the model loads automatically when voice commands are enabled.
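
The size and RAM trade-off in the table above can be turned into a simple selection rule. The following is a minimal sketch, not part of VMSC: the `ModelInfo` type and `largest_model_that_fits` helper are hypothetical names, and the figures are copied from the table with GB values rounded to megabytes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str
    disk_mb: int  # download size from the model table
    ram_mb: int   # approximate RAM usage from the model table


# Values taken from the table; ordered from smallest to largest.
MODELS = [
    ModelInfo("tiny", 75, 400),
    ModelInfo("base", 142, 500),
    ModelInfo("small", 466, 1000),
    ModelInfo("medium", 1500, 2500),
    ModelInfo("large", 2900, 4000),
]


def largest_model_that_fits(free_ram_mb: int, free_disk_mb: int) -> Optional[ModelInfo]:
    """Return the largest model within both budgets, or None if none fits."""
    fitting = [
        m for m in MODELS
        if m.ram_mb <= free_ram_mb and m.disk_mb <= free_disk_mb
    ]
    return fitting[-1] if fitting else None
```

For example, with about 1.2 GB of RAM to spare this rule picks `small`, which matches the general-use recommendation in the table.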

Switching models

You can download multiple models and switch between them at any time. Only one model is loaded into memory at a time.

Selecting the Audio Input Device

In Settings > Voice Commands > Audio Input, select the microphone you want VMSC to listen on. The dropdown lists all audio input devices detected by the system.

Choose a device dedicated to your voice (e.g., a headset mic) rather than a room mic to reduce background noise.
Click Test to see a live waveform and verify the correct device is selected.
Adjust the Gain slider if the waveform is too quiet or clipping.
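
To make "too quiet" and "clipping" concrete, here is a rough sketch of how an input level could be classified from raw samples. It is illustrative only: VMSC's Test view may work differently, the `level_report` function is a hypothetical helper, and both thresholds are assumed values chosen for the example.

```python
import math
from typing import Sequence


def level_report(
    samples: Sequence[float],
    quiet_rms: float = 0.02,   # assumed threshold, not a VMSC setting
    clip_peak: float = 0.99,   # assumed threshold, not a VMSC setting
) -> str:
    """Classify normalized samples (-1.0 to 1.0) as 'quiet', 'clipping', or 'ok'."""
    if not samples:
        return "quiet"
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if peak >= clip_peak:
        return "clipping"   # waveform reaches full scale: lower the gain
    if rms < quiet_rms:
        return "quiet"      # very low average level: raise the gain
    return "ok"
```

A signal that reports `clipping` calls for a lower Gain setting, and one that reports `quiet` calls for a higher one.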

Creating Command Presets

A command preset maps a spoken trigger phrase to one or more VMSC actions.

Go to Settings > Voice Commands > Command Presets.
Click Add Preset.
Enter a Trigger Phrase — the word or short sentence you will speak (e.g., "start giveaway").
Choose a Match Mode (see below).
Under Actions, add one or more actions to execute when the phrase is detected (e.g., trigger a rule, send a chat message, play a sound).
Click Save.
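
Conceptually, a preset is a small record that ties a phrase to a match mode and a list of actions. The sketch below models that idea with hypothetical names (`CommandPreset`, `Action`, and the `send_chat` identifier are assumptions for illustration and do not describe VMSC's actual storage format).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    kind: str                                   # hypothetical identifier, e.g. "send_chat"
    params: dict = field(default_factory=dict)


@dataclass
class CommandPreset:
    trigger_phrase: str
    match_mode: str = "exact"                   # "exact" or "fuzzy"
    fuzzy_threshold: int = 75                   # percent; only meaningful in fuzzy mode
    actions: List[Action] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty means the preset looks usable."""
        problems = []
        if not self.trigger_phrase.strip():
            problems.append("trigger phrase is empty")
        if self.match_mode not in ("exact", "fuzzy"):
            problems.append(f"unknown match mode: {self.match_mode!r}")
        if not 0 <= self.fuzzy_threshold <= 100:
            problems.append("fuzzy threshold must be between 0 and 100")
        if not self.actions:
            problems.append("preset has no actions")
        return problems


preset = CommandPreset(
    trigger_phrase="start giveaway",
    match_mode="fuzzy",
    actions=[Action("send_chat", {"message": "Giveaway starting!"})],
)
```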

Fuzzy vs. Exact Matching

VMSC supports two matching modes for comparing the transcript against your trigger phrases.

Mode	Behavior	Best For
Exact	The transcript must contain the trigger phrase verbatim (case-insensitive).	Short, distinct phrases with low risk of false positives.
Fuzzy	Uses Levenshtein distance to tolerate minor transcription errors. A confidence threshold (0–100%) sets how close the match must be; higher values require a closer match.	Longer phrases, accented speech, or noisy environments.

Recommended threshold

Start with a fuzzy threshold of 75%. Lower it if commands are not being recognized; raise it if you get false triggers.
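
The following sketch shows one way a Levenshtein-based threshold can work. It is an illustration under stated assumptions, not VMSC's implementation: the conversion from edit distance to a percentage (one minus distance divided by the longer string's length) and the word-window scan over the transcript are both assumed, and all function names are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so 'Start giveaway.' compares cleanly."""
    return re.sub(r"[^\w\s]", "", text.lower()).strip()


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> float:
    """Assumed mapping from edit distance to a 0-100 score."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def fuzzy_match(transcript: str, trigger: str, threshold: float = 75.0) -> bool:
    """Compare the trigger against every window of the same word count."""
    words = normalize(transcript).split()
    n = len(normalize(trigger).split())
    if n == 0:
        return False
    last_start = max(1, len(words) - n + 1)
    windows = (" ".join(words[i:i + n]) for i in range(last_start))
    return any(similarity_percent(w, trigger) >= threshold for w in windows)
```

Under this scoring, a transcript containing "start giveway" still matches the trigger "start giveaway" at the 75% threshold, because a single missing letter costs only about 7 percentage points.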

Linking Actions to Voice Commands

Each preset can execute any combination of VMSC actions when triggered:

Trigger Rule — fire a specific rule by name or ID, as if a matching stream event arrived.
Send Chat — post a message to one or more connected chat platforms.
Play Sound — play a local audio file through the system default output.
HTTP Request — call an external webhook or API endpoint.
Run Program — execute a local executable or script.
OSC Message — send a VRChat OSC message (e.g., toggle an avatar parameter).

Actions execute in order and can include a configurable delay between each step.
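
Ordered execution with a delay between steps can be sketched as follows. This is an assumption-laden illustration: the `Step` type and `run_actions` function are hypothetical, the delay is modeled as a pause after each step except the last, and error handling is left out because the behavior on failure is not specified here.

```python
import time
from typing import Callable, List, Tuple

# (action to run, seconds to wait before the next step); hypothetical shape
Step = Tuple[Callable[[], None], float]


def run_actions(
    steps: List[Step],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run each action in list order, pausing between steps but not after the last."""
    for index, (action, delay_after) in enumerate(steps):
        action()
        is_last = index == len(steps) - 1
        if delay_after > 0 and not is_last:
            sleep(delay_after)
```

Passing `sleep` as a parameter keeps the sketch easy to test: a fake sleep function can record the requested delays instead of actually waiting.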

Testing with the Live Transcript

The Voice Commands settings panel includes a live transcript window.

Toggle Enable Voice Commands to start listening.
Speak a trigger phrase into your microphone.
The transcript window shows the recognized text in real time, with matched trigger phrases highlighted in green.
If a match fires, the corresponding actions execute and the action log updates.

Use this view to fine-tune your trigger phrases and confidence thresholds before going live.
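
As a rough picture of what the highlighting does in Exact mode, the sketch below finds case-insensitive occurrences of trigger phrases and wraps them in markers. The function names and the bracket markers are hypothetical; the real transcript window uses green highlighting and may locate matches differently.

```python
import re
from typing import List, Tuple


def find_exact_matches(transcript: str, triggers: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, trigger) for each case-insensitive occurrence."""
    spans = []
    for trigger in triggers:
        if not trigger:
            continue  # an empty phrase would match everywhere
        pattern = re.compile(re.escape(trigger), re.IGNORECASE)
        for match in pattern.finditer(transcript):
            spans.append((match.start(), match.end(), trigger))
    return sorted(spans)


def highlight(transcript: str, triggers: List[str],
              open_mark: str = "[", close_mark: str = "]") -> str:
    """Wrap matched phrases in markers, skipping spans that overlap an earlier one."""
    parts, cursor = [], 0
    for start, end, _ in find_exact_matches(transcript, triggers):
        if start < cursor:
            continue
        parts.append(transcript[cursor:start])
        parts.append(open_mark + transcript[start:end] + close_mark)
        cursor = end
    parts.append(transcript[cursor:])
    return "".join(parts)
```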

Tips for Accuracy

Quiet environment: Background music, game audio, or audience noise reduces accuracy significantly. Use a directional microphone or noise-gated input.
Short, distinct phrases: Two- or three-word commands like "start poll" or "confetti burst" are more reliably recognized than long sentences.
Adjust confidence threshold: If a command fires too often on random speech, raise the fuzzy threshold. If it never fires, lower it or switch to a larger model.
Avoid similar phrases: Trigger phrases that sound alike (e.g., "start" and "stop") may confuse the model. Use more distinctive words when possible.
Consistent microphone position: Keep the mic at the same distance from your mouth for consistent audio levels.
Test before streaming: Always run a quick test session before going live to confirm recognition works with your current audio setup.
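
The advice to avoid similar phrases can be checked before going live with a quick script. This sketch uses Python's standard `difflib.SequenceMatcher` to flag phrase pairs with similar spelling. Treat it as a coarse proxy only: spelling similarity does not capture how words sound, the `confusable_pairs` helper is hypothetical, and the 0.6 cutoff is an assumed value.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple


def confusable_pairs(phrases: List[str], min_ratio: float = 0.6) -> List[Tuple[str, str, float]]:
    """Return phrase pairs whose spelling similarity is at or above min_ratio."""
    pairs = []
    for a, b in combinations(phrases, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= min_ratio:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


if __name__ == "__main__":
    for a, b, ratio in confusable_pairs(["start poll", "stop poll", "confetti burst"]):
        print(f"{a!r} and {b!r} look similar (ratio {ratio})")
```

Any pair this flags is a candidate for rewording with more distinctive words, then confirming the result in the live transcript.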
