Web Technology Demo

Who:

I'm very interested in speech recognition. So I did a search for "browser speech recognition" and found a link to an MDN article about the Web Speech API.


Where:

The article can be found here. In the article there is a link to a working example of the concept located here.


When:

The article was last updated July 7th, 2023.


What:

The Web Speech API is made up of two parts. The speech recognition part, which is what I'm going to demonstrate, listens to the user's voice and responds accordingly. The speech synthesis part attempts to render text from the browser as audible speech.
According to caniuse, the speech synthesis part has a global reach of 94.75% and has had full support in Chrome since 2014 and in Edge since 2016. The speech recognition part, by contrast, has 0% full support globally and only 87.58% partial support.
What I am going to explore specifically is using the Web Speech API to have the browser style a page based on verbal commands issued by the user.


Why:

The concept is interesting to me because I have disabilities that make keyboarding difficult and painful, so speech recognition is very relevant to me. Putting my personal issues aside, however, development of speech recognition for browsers could also be useful on mobile devices. For instance, I saw an example of an email being written by voice in a Chrome browser, which could be useful for visually impaired people.

Speech Color Changer Demo

Note: browser must have access to microphone.

Microphone Tap or click the microphone and say a color to change the background color of the body.


…diagnostic messages


The MDN demo can be found here.

If you're done playing now, we can look at how it's done.


As you can see below, the HTML for this is pretty simple. There's a paragraph for instructions and a div containing a paragraph for diagnostic messages, each paragraph with the appropriate class assigned. In my code below there is also an image tag. I didn't like how the click event in MDN's demonstration version was tied to clicking anywhere in the body, so I changed it to fire from clicking the microphone image, which I downloaded from Flaticon.

                        
                            <p class="hints">                        
                            <img id="mic" src="images/microphone.png" style="cursor:pointer" height="50" width="50" alt="Microphone"> Tap or click the microphone and say a color to change the background color of the body.
                            </p>
                            <br>
                            <div>
                            <p class="output"><em>…diagnostic messages</em></p>
                            </div>
                        
                     

Next we'll look at the JavaScript that makes this happen.

First, code is added to support both the prefixed properties (which is what browsers currently support for speech recognition) and the unprefixed versions that may be supported in the future.

                        
                            const SpeechRecognition =
                                window.SpeechRecognition || window.webkitSpeechRecognition;
                            const SpeechGrammarList =
                                window.SpeechGrammarList || window.webkitSpeechGrammarList;
                            const SpeechRecognitionEvent =
                                window.SpeechRecognitionEvent || window.webkitSpeechRecognitionEvent;
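
Because recognition currently has only partial, prefixed support, it can help to check that a constructor exists before using it. The following is a minimal sketch and not part of the MDN demo. The helper name `getSpeechRecognition` is hypothetical, and it takes the global object as a parameter so the check can also be run outside a browser.

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor if the
// given global object exposes one (prefixed or not), otherwise null.
function getSpeechRecognition(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

// In a browser, pass window. Elsewhere (for example Node.js) there is no
// window, so fall back to an empty object, which reports "unsupported".
const Recognition = getSpeechRecognition(
  typeof window !== "undefined" ? window : {},
);

if (Recognition === null) {
  console.log("Speech recognition is not available in this environment.");
}
```

With a check like this, the page could show a message in the diagnostic paragraph instead of throwing an error when the constructor is missing.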
                        
                    

The next section of code defines the grammar that we want the app to recognize. The grammar format used is the JSpeech Grammar Format (JSGF).

                        
                            const colors = [
                            "aqua",
                            "azure",
                            "beige",
                            "bisque",
                            "black",
                            "blue",
                            "brown",
                            "chocolate",
                            "coral",
                            "gray" /* … */,
                            ];
                            // JSGF rule names are written in angle brackets.
                            const grammar = `#JSGF V1.0; grammar colors; public <color> = ${colors.join(
                            " | ",
                            )};`;
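
To make the shape of the grammar string easier to see, here is a small sketch that builds the same kind of JSGF string from a list of words. The function name `buildGrammar` and its parameters are hypothetical. JSGF writes rule names in angle brackets and separates alternatives with `|`.

```javascript
// Hypothetical helper: builds a JSGF grammar with one public rule whose
// alternatives are the given words.
function buildGrammar(grammarName, ruleName, words) {
  const alternatives = words.join(" | ");
  return `#JSGF V1.0; grammar ${grammarName}; public <${ruleName}> = ${alternatives};`;
}

const sampleGrammar = buildGrammar("colors", "color", ["aqua", "azure", "beige"]);
console.log(sampleGrammar);
// prints: #JSGF V1.0; grammar colors; public <color> = aqua | azure | beige;
```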
                        
                    

Next, a speech recognition instance is created using the SpeechRecognition() constructor, a grammar list is created using the SpeechGrammarList() constructor, and the grammar is added to the list using the SpeechGrammarList.addFromString() method.

                        
                            const recognition = new SpeechRecognition();
                            const speechRecognitionList = new SpeechGrammarList();
                            speechRecognitionList.addFromString(grammar, 1);
                        
                    

Next, the speech grammar list is added to the recognition instance, and then four lines of code set some properties of the recognition instance:

  1. Whether or not continuous results are captured is set.
  2. The language of the recognition is set.
  3. Whether or not interim results are returned is set.
  4. The number of alternative potential matches is set.
                        
                            recognition.grammars = speechRecognitionList;
                            recognition.continuous = false;
                            recognition.lang = "en-US";
                            recognition.interimResults = false;
                            recognition.maxAlternatives = 10;
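
If you want to experiment with different settings, the four properties can be grouped into one place. This is only a sketch: `configureRecognition` is a hypothetical helper, and a plain object stands in for the recognition instance so the example can run anywhere. The defaults mirror the values used in this demo.

```javascript
// Hypothetical helper: copies the chosen settings onto a recognition-like
// object and returns it. Defaults mirror the values used in this demo.
function configureRecognition(recognition, options = {}) {
  const settings = {
    continuous: false, // stop after a single result
    lang: "en-US", // language of the recognition
    interimResults: false, // only final results are returned
    maxAlternatives: 10, // number of alternative matches per result
    ...options, // caller-supplied values override the defaults
  };
  return Object.assign(recognition, settings);
}

// A plain object stands in for a SpeechRecognition instance here.
const fakeRecognition = configureRecognition({}, { lang: "en-GB" });
console.log(fakeRecognition.lang); // prints: en-GB
console.log(fakeRecognition.maxAlternatives); // prints: 10
```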
                        
                    

Now references to the output paragraph, the HTML element, and the hints paragraph are made so diagnostic messages can be displayed and the background color can be updated later on in the code. The onclick handler is also implemented here so that when the microphone image is clicked or tapped, the speech recognition service will start.

                        
                            const diagnostic = document.querySelector(".output");
                            const bg = document.querySelector("html");
                            const hints = document.querySelector(".hints");
                            let colorHTML = "";
                            colors.forEach((color, i) => {
                                console.log(color, i);
                                colorHTML += ` ${color} `;
                            });
                            // Assigning innerHTML replaces the paragraph's contents, so the
                            // microphone image is included again to keep the #mic element.
                            hints.innerHTML = `<img id="mic" src="images/microphone.png" style="cursor:pointer" height="50" width="50" alt="Microphone"> Tap or click the microphone and name a color to change the background color of the body. You can try ${colorHTML}. Or you can choose to go rogue and try your own. The question you have to ask yourself first is: Do you feel lucky punk? Well do ya?`;
                            // Start listening when the microphone image is clicked or tapped.
                            document.getElementById("mic").onclick = () => {
                                recognition.start();
                                console.log("Ready to receive a color command.");
                            };
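
One thing to watch for with a click handler like this: calling start() on a recognition instance that is already listening throws an InvalidStateError, which can happen if the microphone is clicked twice in a row. Below is a minimal sketch of one way to guard against that. The helper name `createSafeStart` is hypothetical, and a plain object stands in for the recognition instance so the example can run outside a browser.

```javascript
// Hypothetical helper: returns a function that starts recognition only when
// a session is not already in progress. It uses the onstart and onend
// handlers of the recognition object to track state.
function createSafeStart(recognition) {
  let listening = false;
  recognition.onstart = () => {
    listening = true;
  };
  recognition.onend = () => {
    listening = false;
  };
  return () => {
    if (listening) {
      return false; // Ignore clicks while a session is in progress.
    }
    listening = true; // Set eagerly, because onstart fires asynchronously.
    try {
      recognition.start();
    } catch (error) {
      listening = false; // Starting failed, so allow another attempt.
      throw error;
    }
    return true;
  };
}

// Stand-in object for demonstration only.
const fake = {
  startCalls: 0,
  start() {
    this.startCalls += 1;
  },
};
const safeStart = createSafeStart(fake);
safeStart(); // starts a session
safeStart(); // ignored, still listening
fake.onend(); // the service reports that the session ended
safeStart(); // starts again
console.log(fake.startCalls); // prints: 2
```

In the page, the returned function could be called from the microphone's onclick handler in place of calling recognition.start() directly. Note that this sketch assigns onstart and onend, so it would replace any handlers already set on those two properties.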

After speech recognition is started, event handlers are used to retrieve results and monitor the input. The event handlers are named so their functions are evident. For example, recognition.onspeechend does exactly what it sounds like: when the API recognizes that speech has ended, it stops recognition.

                        
                            recognition.onresult = (event) => {
                                const color = event.results[0][0].transcript;
                                diagnostic.textContent = `Result received: ${color}.`;
                                bg.style.backgroundColor = color;
                                console.log(`Confidence: ${event.results[0][0].confidence}`);
                            };
                            
                            recognition.onspeechend = () => {
                                recognition.stop();
                            };
                            
                            recognition.onnomatch = (event) => {
                                diagnostic.textContent = "I didn't recognize that color.";
                            };
                            
                            recognition.onerror = (event) => {
                                diagnostic.textContent = `Error occurred in recognition: ${event.error}`;
                            };                            
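
The onresult handler above uses only the first alternative, event.results[0][0], even though maxAlternatives is set to 10. As a sketch of how the other alternatives might be used, the hypothetical helper `pickKnownColor` below scans one result and returns the first transcript that matches a color from the list. An array of plain objects stands in for the array-like SpeechRecognitionResult.

```javascript
// Hypothetical helper: scans the alternatives of one recognition result and
// returns the first transcript that matches a known color, or null.
function pickKnownColor(result, knownColors) {
  for (let i = 0; i < result.length; i += 1) {
    // Trim, lowercase, and remove inner spaces so that a transcript such as
    // " Beige" or "dark blue" is compared in CSS keyword form.
    const candidate = result[i].transcript
      .trim()
      .toLowerCase()
      .replace(/\s+/g, "");
    if (knownColors.includes(candidate)) {
      return candidate;
    }
  }
  return null;
}

// An array of plain objects stands in for a SpeechRecognitionResult here.
const fakeResult = [
  { transcript: "Beach", confidence: 0.6 },
  { transcript: " Beige", confidence: 0.3 },
];
console.log(pickKnownColor(fakeResult, ["aqua", "beige"])); // prints: beige
console.log(pickKnownColor(fakeResult, ["aqua"])); // prints: null
```

Inside onresult, this could be called as pickKnownColor(event.results[0], colors), falling back to the "I didn't recognize that color." message when it returns null. That would rule out going rogue with colors outside the list, so whether to use it is a design choice.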
                        
                    

The complete JavaScript file is below. Just copy the HTML from above, and the JavaScript from below. Add them to the appropriate places, and you too can have ugly background colors appear on demand using nothing but a click and your melodic voice.