Voice control revisited: the Web Speech API

After some experiments last April, things went quiet on the voice control front. I also played with the Web Speech API for a while but never finished anything. Last weekend, while still waiting for my ESP-8266s (ESP-12) to arrive, I decided to give it another try. I backed the Homey project on Kickstarter, so I probably won’t even need to spend time on this; it’s just for fun.

[Screenshots: Home screen, Voice control page]

The Web Speech API documentation is not hard to understand, and there are dozens of good examples around: just search for “Web Speech API demo” or something similar.

I had already made a small ‘voice’ button on the Home page of our Web-app, so all I had to do was finish the page behind that button. It needed a large start/stop button to control the voice recognition and an area in which the results of the speech recognition could be displayed. Speech recognition works very well, impressive stuff!

The code is very short and simple actually:

<div data-role="page" id="pgstt" data-theme="a" data-content-theme="a">
  <div data-role="header"><h2>Spraak commando</h2></div><!-- /header -->
  <div data-role="header"><h2 id="sttstatus"></h2></div><!-- /header -->
  <div data-role="content">
  <button id="sttbutton" onclick="toggleStartStop()"><img id="sttbuttonimg" src="icons/micbut.png" /></button>
  <div style="border:dotted;padding:10px">
    <span id="interim_span" style="color:grey"></span>
  </div>

  <script type="text/javascript">
    var recognizing;
    var recognition = new webkitSpeechRecognition();
    recognition.lang = "nl-NL";
    recognition.continuous = true;
    // Deliver final results only, so every transcript published below is a finished phrase.
    recognition.interimResults = false;
    reset();
    recognition.onend = reset;

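    recognition.onresult = function (event) {
      // Take the best alternative of the most recent result.
      var last = event.results[event.results.length-1][0].transcript;
      // Show it on the page and send it to the server as a 'speech' message.
      interim_span.innerHTML += last;
      cmdPublish('speech', last);
    };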

    function setLS(t) {
      sttstatus.innerHTML = t;
    }

    function reset() {
      console.log('Stopped');
      recognizing = false;
      $("#sttbuttonimg").attr("src","icons/micbut.png");
      setLS("Gestopt");
    }

    function toggleStartStop() {
      if (recognizing) {
        $("#sttbuttonimg").attr("src","icons/micbut.png");
        setLS("Gestopt");
        recognition.abort();
        reset();
      } else {
        recognition.start();
        recognizing = true;
        $("#sttbuttonimg").attr("src","icons/micbutl.png");
        setLS("Luisteren ...");
        interim_span.innerHTML = "";
      }
    }
  </script>
  </div><!-- /content -->
  <div data-role="footer" data-position="fixed">
  </div><!-- /footer -->
</div><!-- /page -->
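
The handler above publishes whatever the recogniser hands it. If interim results are enabled (the standard property is `interimResults`), each result carries an `isFinal` flag, and it probably makes sense to show the interim guesses on screen but only publish the final text. A minimal sketch of that idea, assuming nothing beyond the standard event fields; `splitTranscript` and `wireRecognition` are hypothetical helper names, not part of the page above:

```javascript
// Hypothetical helper: splits a SpeechRecognitionEvent-like object into
// the text that is final and the text that may still change.
function splitTranscript(event) {
  var finalText = "";
  var interimText = "";
  // event.resultIndex is the first result that changed in this event.
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    var best = result[0].transcript; // highest-ranked alternative
    if (result.isFinal) {
      finalText += best;
    } else {
      interimText += best;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Hypothetical helper: enables interim results on a recognition object
// and routes interim and final text to two separate callbacks.
function wireRecognition(recognition, onInterim, onFinal) {
  recognition.interimResults = true;
  recognition.onresult = function (event) {
    var parts = splitTranscript(event);
    if (parts.interimText) {
      onInterim(parts.interimText);
    }
    if (parts.finalText) {
      onFinal(parts.finalText);
    }
  };
}
```

On the page above this could be wired up as `wireRecognition(recognition, function (t) { interim_span.innerHTML = t; }, function (t) { cmdPublish('speech', t); });`, so that only finished phrases reach the server.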

That’s it. Primus takes care of delivering the text to the server-side Node.js script, which passes it on to the Nools rules engine that I use to automate things. I can now make rules like this:

//---------------------------------------------------------
rule hobbytestopen {
    when {
      or(
        m1: Message m1.t == 'sensor/value' && m1.changedTo('open'),
        m1: Message m1.t == 'speech' && m1.contains('test licht aan')
        );
    }
    then {
        unchange(m1);
        log('Execute rule Office test open');
        publish('command/plcbus','{"address":"B02", "command":"ON"}');
    }
}

Now this rule can be triggered by either a sensor or a speech command which contains the words ‘test’, ‘licht’ and ‘aan’ (for the non-Dutch: “test light on”). The only restriction so far is that those words need to appear in the order specified in the condition.

That’s not good enough of course, because not only saying “test licht aan” would trigger this rule; saying “blaastest wijst uit dat ik lichtelijk ben aangeschoten” (“breathalyser test shows that I am slightly tipsy”) would too, since ‘test’, ‘licht’ and ‘aan’ also match inside longer words … not really intelligent 😉 But those are just small issues that can easily be handled.
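
One way the substring problem might be handled is to compare whole words instead of fragments. The sketch below is only an illustration: `containsWords` is a hypothetical helper, not part of Nools or of the `contains` method used in the rule, and it assumes the transcript is plain text with words separated by whitespace.

```javascript
// Hypothetical helper, not part of Nools or the original rule set.
// Returns true when every word of `phrase` occurs in `transcript`
// as a whole word, in the same order (other words may sit in between).
function containsWords(transcript, phrase) {
  // Lower-case, drop basic punctuation, and split on whitespace.
  function toWords(text) {
    return text.toLowerCase().replace(/[.,!?]/g, "").split(/\s+/).filter(Boolean);
  }
  var spoken = toWords(transcript);
  var wanted = toWords(phrase);
  var next = 0; // index of the next wanted word still to be found
  for (var i = 0; i < spoken.length && next < wanted.length; i++) {
    if (spoken[i] === wanted[next]) {
      next++;
    }
  }
  // An empty phrase never matches.
  return wanted.length > 0 && next === wanted.length;
}
```

With this, “test licht aan” still matches, while the “blaastest … lichtelijk … aangeschoten” sentence does not, because none of its words equals ‘test’, ‘licht’ or ‘aan’ exactly. Extra words in between are tolerated, so something like “zet het test licht nu aan” would still trigger the rule.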

 


3 Responses to Voice control revisited: the Web Speech API

  1. Ron Weasley says:

    Thanks for sharing this information about the API, very helpful and useful info.

  2. Gilles says:

    Nice post, I’m also looking into voice. Not for controlling but more for notifying, e.g. “Movement detected in the backyard”. I wonder which voice Homey uses, because the Google TTS engine produces crappier sound than what they show in the demo video.

    • Thanks. For serious STT this way you’ll need a secure HTTP connection, otherwise you’ll have to permit mic access every time you want to give a voice command. I now have a StartSSL.com certificate just to give it a try without the annoying popup. I used TTS for some time too, but forgot to limit the number of notifications, so it drove the rest of the family mad in a matter of 2 or 3 days. TURN IT OFF! Hey, I was just testing 🙂 Regarding Homey, well, it’s a demo video, so it just has to be perfect. But soon enough we’ll know whether we were “played” [a video made by the sales department] or not 😉
