Category Archives: Computer Science
Siri, the voice assistant found on the new iPhone 4S, is a wonderful way to interact with a smartphone. While a typical desktop has a handy keyboard and mouse, road warriors such as myself are often found with a phone and a need to do something more expansive than send text messages with missing vowels.
Voice transcription has been around for a long time, and voice control is certainly not new. Why, then, is Siri getting such praise and press? It’s the implementation.
A keyboard and mouse still make sense for most technology interactions, but when you don’t have them, efficient voice control trumps both the on-screen keyboards and the hardware keyboards found on smartphones. I find myself using dictation far more than typing on my phone, and the recent explosion of Siri has given rise to a number of other voice control products on various platforms. I’ve been playing with Vlingo on my phone for a week or so, and it’s pretty good. Far safer than using my thumbs while driving!
We’ve been using the same input controls on devices for a very long time. Sure, touch controls exist, and they’re great, but they still mimic the paradigm of tapping keys or using a mouse. Voice control gives us a far more intuitive way of interacting with our devices. At least, when it works.
Now that we’ve seen people caring about this technology, I predict many more entries into the space. This competition will improve voice control and dictation more over the next couple of years than we’ve seen in the previous 20. FINALLY, the technology that we’ve all wanted, even if we didn’t know it, is within our reach. Next step – getting rid of all the damned wires!
For as long as I can remember, speech recognition technology has been “almost there.” Some have tried it, a few have even loved it, but I’ve never felt the need to jump in with both feet. Sure, the ability to talk rather than type is exciting. Or, it is until you realize that you have to speak as the computer wants you to. Directing my computer with my voice is an interesting idea, except that it doesn’t work as well as I want it to. After all, telling my computer what to click on is much more complicated than, well, clicking on it.
For the first time, though, I feel these cries of “we’re almost there” may be correct. My Android-based phone has speech input, and it works pretty well. In fact, it surprises me how good it is. Windows 7 comes with speech recognition built in, and it’s not perfect, but it’s the best I’ve used.
In the scientists’ and engineers’ defense, speech is extremely complicated. With all of the voice patterns, accents, dialects and varying word choice, it takes years for us as humans to learn to understand people. Computers don’t even have the luxury of that time to be trained – we expect them to know our voice right away. The science behind even the simple voice recognition in my phone is so complicated that the recorded audio has to be uploaded to Google’s servers, which do the recognition and send the result back to my phone as text.
If you haven’t played with any speech-to-text software, I’d encourage you to try it. In Windows 7 it’s free, and other products have demos and trials if you’re not ready to commit. You might hate it, you might love it, but you never know until you try.
If you had a chance to watch the much-publicized episodes of Jeopardy last week, during which a computer competed against two human opponents, you saw what I consider to be one of the greatest technical achievements of our time. Watson, the name IBM gave to their expensive research project, was held to the same rules as the live people and did very well. How well is not the point of this piece, so I won’t go into it. What I do want to bring up is how advanced this technology is.
Anyone who used the first generation of speech-to-text computer software, like Dragon, knows how difficult it was to use. You had to adapt your voice and train the software. Even then, it wasn’t very good. These products have gotten much better today, but the new, similar challenge is the ability to ask a question of a computer and get an answer. Try typing a question into your favorite search engine – you’ll get results. Lots of results, and sometimes contradictory ones. It’s rare to get the answer you want without having to dig through at least one linked article. Watson wasn’t allowed to supply a bunch of text for Alex Trebek to read through – he had to give a definitive answer. The technology behind that is staggering and took many years to develop.
What does this all mean for humans and technology? For one, I believe that turning Watson into a search engine would be very interesting, if not progress. At the very least, that level of natural-language processing – the ability to take speech and convert it to data – is a major advancement in computer science. Maybe, just maybe, we’ll someday be able to ask a computer a question and get a single, correct answer.