Voice control revisited: the Web Speech API

After some initial exploration last April, things went quiet around voice control. I also played with the Web Speech API for a while but never finished anything. But last weekend (while still waiting for my ESP8266s (ESP-12) to arrive) I decided to give it another try, even though I backed the Homey project on Kickstarter and probably won’t even need to spend time on this – this is just for fun.

[Images: Home screen and Voice control page]

The Web Speech API documentation is not hard to understand, and there are dozens of good examples to be found – just search for “Web Speech API demo” or something similar and you’ll find plenty.

I had already made a small ‘voice’ button on the Home page of our web app, so all I had to do was finish the page behind that button: a large start/stop button to control the voice recognition and an area in which the results of the speech recognition are displayed. Speech recognition works very well – impressive stuff!

The code is actually very short and simple:

<div data-role="page" id="pgstt" data-theme="a" data-content-theme="a">
  <div data-role="header"><h2>Spraak commando</h2></div><!-- /header -->
  <div data-role="header"><h2 id="sttstatus"></h2></div><!-- /header -->
  <div data-role="content">
  <button id="sttbutton" onclick="toggleStartStop()"><img id="sttbuttonimg" src="icons/micbut.png" /></button>
  <div style="border:dotted;padding:10px">
    <span id="interim_span" style="color:grey"></span>
  </div>

  <script type="text/javascript">
    var recognizing;
    // webkitSpeechRecognition is the (prefixed) Chrome implementation of the Web Speech API
    var recognition = new webkitSpeechRecognition();
    recognition.lang = "nl-NL";
    recognition.continuous = true;      // keep listening after a result has been delivered
    recognition.interimResults = true;  // note: the property is interimResults, not interim
    reset();
    recognition.onend = reset;

    recognition.onresult = function (event) {
      // Take the transcript of the most recent result, show it and publish it
      var last = event.results[event.results.length - 1][0].transcript;
      interim_span.innerHTML += last;
      cmdPublish('speech', last);
    }

    function setLS(t) {
      sttstatus.innerHTML = t;
    }

    function reset() {
      console.log('Stopped');
      recognizing = false;
      $("#sttbuttonimg").attr("src","icons/micbut.png");
      setLS("Gestopt");
    }

    function toggleStartStop() {
      if (recognizing) {
        $("#sttbuttonimg").attr("src","icons/micbut.png");
        setLS("Gestopt");
        recognition.abort();
        reset();
      } else {
        recognition.start();
        recognizing = true;
        $("#sttbuttonimg").attr("src","icons/micbutl.png");
        setLS("Luisteren ...");
        interim_span.innerHTML = "";
      }
    }
  </script>
  </div><!-- /content -->
  <div data-role="footer" data-position="fixed">
  </div><!-- /footer -->
</div><!-- /page -->
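The cmdPublish helper used above is not shown here; a minimal sketch of what it could look like on top of Primus follows – the server URL and the message shape are assumptions, not the actual implementation:

var primus = Primus.connect("http://192.168.1.10:8080"); // hypothetical server URL

function cmdPublish(topic, payload) {
  // Hand the recognized text to the server-side NodeJS script as a simple object
  primus.write({ t: topic, payload: payload });
}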

That’s it. Primus takes care of delivering the text to the server-side NodeJS script, which passes it on to the Nools rules engine that I use to automate things. I can now make rules like this:

//---------------------------------------------------------
rule hobbytestopen {
    when {
      or(
        m1: Message m1.t == 'sensor/value' && m1.changedTo('open'),
        m1: Message m1.t == 'speech' && m1.contains('test licht aan')
        );
    }
    then {
        unchange(m1);
        log('Execute rule Office test open');
        publish('command/plcbus','{"address":"B02", "command":"ON"}');
    }
}

Now this rule can be triggered either by a sensor or by a speech command that contains the words ‘test’, ‘licht’ and ‘aan’ (for the non-Dutch: “test light on”). The only restriction so far is that those words have to occur in the order specified in the condition.

That’s not good enough of course, because not only would saying “test licht aan” trigger this rule, but saying “blaastest wijst uit dat ik lichtelijk ben aangeschoten” (“a breathalyzer test shows that I’m slightly tipsy”) would trigger it as well … not really intelligent 😉 But those are just small issues that can easily be handled – a smarter match would look for whole words, in any order, as in the sketch below.
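Something along these lines (a rough idea, not the actual rules-engine code):

function containsWords(sentence, words) {
  // Match whole words, in any order, instead of a literal substring
  var spoken = sentence.toLowerCase().split(/\s+/);
  return words.every(function (w) {
    return spoken.indexOf(w) !== -1;
  });
}

containsWords("zet het test licht aan", ["test", "licht", "aan"]);  // true
containsWords("blaastest wijst uit dat ik lichtelijk ben aangeschoten",
              ["test", "licht", "aan"]);                            // false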


Home Automation and Voice Control

HAL 9000 (2001: A Space Odyssey), Mother (Alien), The Matrix, Jarvis (Iron Man), KITT – who doesn’t know them? And since a few days ago there’s Jasper, voice control for the Raspberry Pi.
An RPi, a microphone, a speaker and a network connection are all you need (and the Jasper software package of course).

Interacting with computers by voice has always been a very appealing feature to have in my Home Automation System. There’s a button on the touchscreen in the living room which controls a light bulb – when you press that button, you hear Darth Vader saying “Yes, Master“. My son and I liked it; it was funny. But there had to be more…

So when two Princeton students released ‘Jasper’ a few days ago, I was prompted to revisit the subject of voice control once again.

My first thought was to give Jasper a try as soon as I had the time – but after reading parts of the API documentation I became a bit hesitant. Defining in code which words the user is allowed to speak (or rather: which words will be recognized and processed further by Jasper) is not how I’d like to do things. Another thing I didn’t like is that it would become a more or less isolated sub-system next to my HA system – answering questions, controlling Spotify and such. Create a module for every type of hardware here in our house? Neh. No chance.

Maybe it’s better to revisit Voicecommand, a tool developed by Steven Hickson as part of his PiAUISuite, which I read about a year or so ago. Judging by the demo videos, Voicecommand seems to be made primarily to initiate actions (playing video or music, starting the browser) on the local computer/Raspberry Pi. But why not try to extend it: remove some of the local action-initiation parts of the code and replace them with an MQTT client?

That would make it a perfect fit for my HA system – this way my rules engine receives the voice commands, and the rules define both what is accepted as a valid command and which actions should be executed.
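To give an idea of that setup, the server side could feed incoming MQTT messages straight into a Nools session. This is a sketch only – the topic name, the message shape and the Message constructor are assumptions:

var mqtt  = require('mqtt');
var nools = require('nools');

var flow    = nools.compile(__dirname + '/rules.nools');
var Message = flow.getDefined('message');  // assumes a Message type is defined in the .nools file
var session = flow.getSession();

var client = mqtt.connect('mqtt://localhost');
client.subscribe('voice');

client.on('message', function (topic, payload) {
  // Assert the spoken text as a fact and let the rules decide what to do with it
  session.assert(new Message({ t: topic, payload: payload.toString() }));
  session.match();
});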

So I ‘freed’ a Raspberry Pi and downloaded the PiAUISuite. The first problem was that I didn’t have a USB microphone – ahh, but our kids do, for things like Skype, online gaming and other things I never do. I found an old speaker set in the garage and I was good to go.

After some tinkering with the Voicecommand tool as-is, its configuration, trying different keywords and the like, it was time to change some things.

The first thing I wanted to change was the language. Voicecommand uses the Google Speech API, so using Dutch should not be a problem; all I had to do was change lang = “en” to lang = “nl”. Done! It improved the voice recognition quite a bit too! 😉

I also wanted to change the response (“Yes, Sir?”) into a simple short beep, which significantly shortens the whole exchange – it took a bit too long for my taste. I searched the internet for a ‘beep’ MP3 that was short and loud enough to be noticed, searched the Voicecommand code for Speak(response) and replaced that call with Play(beep), a new function that I added to the code.

Another thing I changed was the matching of the spoken command against a list of predefined commands (and their associated actions) in ~/.commands.conf. Right now, I just send every word to my HA system and let that system decide whether the spoken command contains something useful.

The last thing I did to get the communication between Voicecommand and my HA system going was building the Mosquitto MQTT client on the Raspberry Pi and calling that client (mosquitto_pub) with the right parameters from Voicecommand through a system() call. It’s a bit of a quick & dirty trick to get things going though; it would be much better to incorporate the MQTT protocol into the Voicecommand code, but that’s too much work for now – first I want to see how this works out in practice with a better microphone and some useful commands & rules…
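The invocation boils down to something like this (broker address and topic name are examples, not the actual values):

mosquitto_pub -h 192.168.1.10 -t voice -m "licht aan"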

The only rule I have right now is this one, for controlling a small night lamp in the office:

rule office_test_light {
  when {
    // Any voice message that mentions 'licht' ("light")
    m1: Message m1.t == 'voice' && m1.contains('licht');
  }
  then {
    if (m1.contains('aan')) {        // 'aan' = on
      publish("command",'{"address":"B02", "command":"ON"}');
    } else
    if (m1.contains('uit')) {        // 'uit' = off
      publish("command",'{"address":"B02", "command":"OFF"}');
    } else {
      log('Snap het niet');          // "I don't understand"
    }
  }
}

Voicecommand has, as far as I can see now, one drawback: no Internet connection means no voice control, since the speech recognition relies on the Google Speech API. The (very!) big plus is that the TTS voice is far superior to what I’ve heard with Jasper.

Future plans:

  • sending textual (MQTT) messages to Voicecommand and let it speak them;
  • returning an error message when the rules engine was not able to process the command;
  • adding the RPi hostname to the message that goes to my HA system, which can be useful when having multiple Voicecommand RPi’s throughout the house – because a “light off” command in the garage implies a different action than “light off” in the kitchen (see the sketch below) 😉
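A rule could then use that hostname to scope a command to a room – a hypothetical example, in which the host property, its value and the address are assumptions:

rule garage_light_off {
  when {
    // Only react to 'licht uit' ("light off") spoken to the Pi in the garage
    m1: Message m1.t == 'voice' && m1.host == 'garage' && m1.contains('licht uit');
  }
  then {
    publish("command",'{"address":"G01", "command":"OFF"}');
  }
}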

Right now, after a few hours of tinkering, I think I’ve got something that’s worth spending more time on. We’ll see! Here’s a video of what I’ve accomplished so far: