Rhasspy is a loosely connected set of tools for creating a personal voice assistant. Each of the tools, including speech recognition and speech generation, can run on your own computer without sending your private data off to a cloud-based service (although you can still choose to do so).
The components in Rhasspy's pipeline can talk to each other in several ways, such as posting data to a web URL or launching a program in response to a command.
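To illustrate the "launch a program" style, here is a minimal sketch of a handler that a launcher could start for each recognized intent. It assumes that the intent arrives as JSON on standard input and that a JSON response is expected on standard output. The field names (`intent.name`, `slots`, `speech.text`) and the `ChangeLightState` intent are hypothetical and may not match the schema your Rhasspy version uses.

```python
import json


def handle_intent(intent: dict) -> dict:
    """Produce a response for a recognized intent (hypothetical JSON schema)."""
    # Assumed input shape: {"intent": {"name": ...}, "slots": {...}}
    name = intent.get("intent", {}).get("name", "")
    slots = intent.get("slots", {})
    if name == "ChangeLightState":
        text = f"Turning the {slots.get('name', 'light')} {slots.get('state', 'on')}."
    else:
        text = "Sorry, I did not understand that."
    # Assumed output shape: the original intent plus a "speech" field.
    return {**intent, "speech": {"text": text}}


def main(stdin_text: str) -> str:
    """Parse intent JSON as read from stdin and return JSON to print on stdout."""
    return json.dumps(handle_intent(json.loads(stdin_text)))
```

As a standalone program, it would be wired up with `sys.stdout.write(main(sys.stdin.read()))`. Keeping the logic in `handle_intent`, separate from the I/O, means it can be tested without launching a process.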
One of the most useful methods that Rhasspy supports is a protocol called MQTT, together with an API called Hermes. MQTT "topics" are similar to web URLs: programs publish messages to topics and subscribe to them in order to communicate with other programs and devices that support MQTT. The Hermes API is a collection of MQTT topics that make sense in the context of digital assistants and smart home automation.
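Subscriptions can use wildcards, which lets one program listen to a whole family of Hermes topics at once. The sketch below implements the standard MQTT topic-filter rules (`+` matches one level, `#` matches the remaining levels) in plain Python. The example topics follow Hermes naming as Rhasspy uses it, but check them against Rhasspy's own reference. A real deployment would use an MQTT client library and broker, which do this matching for you.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter.

    Simplified sketch: assumes a well-formed filter ('#' only as the last
    level) and ignores the special rule for topics starting with '$'.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches the parent level and everything below.
            return True
        if i >= len(topic_levels):
            return False
        # '+' matches exactly one level; anything else must match literally.
        if level != "+" and level != topic_levels[i]:
            return False
    # Every filter level matched; the topic must not have extra levels.
    return len(filter_levels) == len(topic_levels)


# Example: one subscription to "hermes/intent/#" receives every intent.
for t in ["hermes/intent/GetTime", "hermes/tts/say"]:
    print(t, topic_matches("hermes/intent/#", t))
```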
At the time of this writing, the pipeline looks something like this:
- Audio is captured by a process monitoring the microphone.
- Captured audio is sent to a small wake word detection tool.
- If a wake word is detected, a dialogue session is started within a dialogue manager process.
- The dialogue manager uses a speech recognition tool to convert the spoken command to text.
- The text is analyzed by a natural language tool to determine which command (called an "intent") was spoken.
- The intent is sent off to some other program of your choosing, which should handle the intent in some way.
- The intent handling program may produce some kind of response.
- The response text is converted to audio data using a text-to-speech tool.
- The audio data is sent to yet another program to play it through the computer's speakers.
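The hand-offs in the list above can be sketched as a chain of publish/subscribe stages running in one process. This is a minimal sketch under several assumptions: the `Bus` class is a hypothetical stand-in for an MQTT broker, every stage is a stub, and the dialogue manager's session handling and the audio capture and playback steps are left out. Topic names are modeled on Hermes topics, but each payload is trimmed to a single field.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

Handler = Callable[[str, dict], None]


class Bus:
    """Tiny in-process stand-in for an MQTT broker (exact topic matches only)."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Handler]] = defaultdict(list)
        self.log: list[str] = []  # every topic published, in order

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        self.log.append(topic)
        for handler in self.subscribers[topic]:
            handler(topic, payload)


def build_pipeline(bus: Bus, spoken: list[str]) -> None:
    """Wire hypothetical stub stages together; real services do actual audio work."""

    def recognize_speech(topic: str, payload: dict) -> None:
        # Wake word heard: pretend speech recognition transcribed a command.
        bus.publish("hermes/asr/textCaptured", {"text": "turn on the lamp"})

    def recognize_intent(topic: str, payload: dict) -> None:
        # Stub natural language step: pick an intent name from the text.
        name = "ChangeLightState" if "lamp" in payload["text"] else "Unknown"
        bus.publish(f"hermes/intent/{name}", {"input": payload["text"]})

    def handle_intent(topic: str, payload: dict) -> None:
        # The intent handler's response is sent on to text-to-speech.
        bus.publish("hermes/tts/say", {"text": "OK, turning on the lamp."})

    def speak(topic: str, payload: dict) -> None:
        # A real TTS service would synthesize audio for playback; we just record it.
        spoken.append(payload["text"])

    bus.subscribe("hermes/hotword/default/detected", recognize_speech)
    bus.subscribe("hermes/asr/textCaptured", recognize_intent)
    bus.subscribe("hermes/intent/ChangeLightState", handle_intent)
    bus.subscribe("hermes/tts/say", speak)
```

Publishing one wake word event, `bus.publish("hermes/hotword/default/detected", {})`, drives the whole chain and leaves `"OK, turning on the lamp."` in `spoken`. Because each stage only knows the topics it reads and writes, any stub can be replaced by a different tool without touching the others.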
That's a lot of different programs, each dedicated to doing one small task and doing it well. The beauty of Rhasspy is that your custom assistant is not locked into any one of them. Rhasspy is responsible for linking together the tools you choose into what feels like one cohesive unit.