voiceless
Computer interfaces are changing. As devices become smaller, more wearable and more integrated into daily life, the traditional interfaces of keyboards, mice and even touch screens are increasingly unsuitable. Remarkable breakthroughs have been made in voice recognition, with the error rate recently falling below that of humans. It’s also estimated that by 2020 half of all internet searches will be made by voice!
At Octopus, we recently invested in Audiotelligence, a real-time audio processing business that is set to enhance the voice experience for consumers and the enterprise, but lately I’ve been thinking more about alternatives to voice control. Due to social conventions and privacy concerns, speaking out loud to control a device poses problems in certain environments, e.g. in public or in offices.
Some stats:
74% of consumers currently use voice assistants solely at home; focus groups highlight privacy and social embarrassment as the reasons for this;
6% of consumers have used voice commands in public;
20% have never used voice commands at all, citing social discomfort;
38% of consumers are concerned about being overheard when using a voice assistant.
Is it just me, or is it a specific type of awkward seeing a middle-aged guy yell into Siri on the subway? Anywho, I think the next frontier is sub-vocal speech recognition, or SVR.
First off, sub-vocal speech recognition is a technique that detects electrical signals from the vocal cords and other muscles involved in speaking, such as the tongue. The electrical signals are collected by electromyography (EMG) devices, then sent to a computer for processing and reconstruction into words. Imagine asking yourself a question without actually saying the words out loud, and having that captured by a computer or device. Pretty rad.
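To make that pipeline concrete, here’s a minimal toy sketch of the “EMG signal in, word out” idea: an RMS envelope as the feature, and a nearest-template matcher standing in for a real recognition model. Everything here — the simulated signals, the “yes”/“no” vocabulary, the feature choice — is illustrative, not how AlterEgo or any actual SVR system works.

```python
import numpy as np

def emg_features(signal, fs=1000, win_ms=50):
    """Slice a raw EMG channel into windows and compute RMS per window,
    giving a muscle-activity envelope over time."""
    win = int(fs * win_ms / 1000)
    n = len(signal) // win
    frames = signal[:n * win].reshape(n, win)
    return np.sqrt((frames ** 2).mean(axis=1))

def classify(features, templates):
    """Nearest-template match by Euclidean distance (a toy stand-in
    for a trained recognition model)."""
    return min(templates, key=lambda word: np.linalg.norm(features - templates[word]))

# Toy demo: two hypothetical words with distinct activity timing.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)
burst = lambda c: np.exp(-((t - c) ** 2) / 0.005)   # activity burst centred at time c
yes_sig = burst(0.3) * rng.standard_normal(1000)     # "yes": early muscle activation
no_sig = burst(0.7) * rng.standard_normal(1000)      # "no": late muscle activation

templates = {"yes": emg_features(yes_sig), "no": emg_features(no_sig)}
probe = burst(0.31) * rng.standard_normal(1000)      # unseen signal, timed like "yes"
print(classify(emg_features(probe), templates))
```

The real problem is of course far harder — continuous speech, electrode placement, per-user variation — but the shape of the pipeline (signal acquisition → feature extraction → recognition model) is the same.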
I first learned about sub-vocal speech recognition toward the end of 2018, when I mistook an image of Arnav Kapur showcasing AlterEgo on his face for a face accessory shot from NYFW. I haven’t seen much innovation around this technology since then, but I’d love to explore the challenges that lie ahead for SVR to actually become a thing.
SVR is an under-explored area of speech recognition, and of AI in general. I think it’s mainly held back by the lack of suitable data: unlike audible speech data, which is widely available, SVR data has to be collected with specialist, medical-grade equipment. I think we need to see novel SVR algorithms inspired by the ones proven effective for audible voice recognition. How many of those algorithms can be easily re-purposed? I’d also love to know which university labs are working on SVR. I’ll be exploring this area in the coming months and will follow up with a longer post. Stay tuned!
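One concrete sense in which re-purposing might work: most audible-ASR models consume log-magnitude short-time Fourier frames, and that same front-end can be applied unchanged to an EMG channel, leaving only the downstream model to be retrained on EMG data. A minimal sketch, with an assumed 1 kHz sampling rate and illustrative window sizes:

```python
import numpy as np

def stft_features(signal, fs=1000, win=64, hop=32):
    """Log-magnitude STFT frames: the same front-end audible ASR models
    typically consume, here applied to an EMG channel instead of audio."""
    n = (len(signal) - win) // hop + 1
    window = np.hanning(win)
    frames = np.stack([signal[i * hop : i * hop + win] * window for i in range(n)])
    spec = np.abs(np.fft.rfft(frames, axis=1))   # magnitude spectrum per frame
    return np.log1p(spec)                        # shape: (frames, win // 2 + 1)

emg = np.random.default_rng(1).standard_normal(1000)  # stand-in for a real recording
feats = stft_features(emg)
print(feats.shape)  # → (30, 33): 30 time frames x 33 frequency bins
```

Whether EMG signals carry enough spectral structure for this front-end to be the right one is exactly the kind of open question that makes the area interesting.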