Fifty years after we first met the sentient, soft-spoken computer Hal 9000 in 2001: A Space Odyssey, we’re finally able to command our computers with our voices. Now when we talk to our smartphones, we expect them not only to hear us but also to understand and, unlike Hal 9000, to obey us. But to truly improve our lives, our mobile devices must process not only what we say and what we mean but also our context. When you’re driving, your phone shouldn’t expect you to touch it. While you’re sleeping, it should know to send your calls straight to voicemail… unless there’s an emergency, something it should also be able to ascertain. If you want to take a selfie, you should be able to simply say “Take a selfie!” In short, your smartphone should live up to its name. That’s the goal of the Moto Voice and Moto Assist software integrated into the second-generation Moto X smartphone.
“It needs to be adaptive and responsive,” says Mark Rose, senior director of product management at Motorola Mobility.
To do that, the Moto X is always listening, both for verbal commands from the user and for ambient cues about the context. That emergent behavior is spawned by complex interactions between the software and hardware. Myriad processes, running simultaneously, analyze inputs from the handset’s sensors and software, triggering immediate responses when called upon but fading into the background when not needed. The magic happens behind the interface’s curtain.
“All of the processes are divided and connected in a way that makes it seamless, so from the user’s point of view you just talk to it and it does things for you,” Rose says.
Enabling your smartphone to hear you was the first major challenge. Signals from the microphone must be filtered for background noise like nearby conversations, music, and TV shows. Just think of the brainpower you use, subconsciously, at a loud party to tune out all the chit-chat except the conversation you’re participating in. After minimal training in a quiet space, the Moto X can recognize your voice amidst a cacophony of others. That involves computerized consideration of gender, accent, and dialect across a variety of languages.