True, I'm sure there was hyperbole in that number, and probably none of the people on the Watson team are experts on the state of voice recognition. But I could see the problem being not so much accurate transcription as the processing time of the transcription. One second of lag for visual voicemail isn't noticeable, but one second of lag for Watson would have flipped the buzz-in advantage, since unlike the human players Watson wouldn't be able to read faster than Trebek talks.
OCR would be fine, and once in place, dealing with a completely standardized font, it probably wouldn't be a significantly slower interface than just getting the clue as a text file. But that part really isn't a big deal. As Ken Jennings has said, the best players almost always know the answer (or have comprehended the question well enough to know they will know the answer and want to buzz in) before the buzzers are active, so it all comes down to buzzer timing.
To eliminate the buzz-in advantage, I think what I might have done (though I haven't thought this through very much) is run response-time tests with the two human players to find their average buzz-in time after activation, along with the standard deviation, on questions they knew the answer to, and then program a random delay into sending the signal to Watson that matched that statistical distribution.
Then we'd have a true test of Jeopardy skills instead of the already known fact that a person who knows a lot of answers but always wins the buzz-in will usually beat the person who knows all the answers but loses the buzz-in whenever anybody else goes for it too. I could be Ken Jennings at Jeopardy if I always had first option to answer.
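For what it's worth, here is a rough sketch of the random-delay idea in Python. Everything specific in it is my own assumption: the sample timings are made up, the function names are hypothetical, and I'm assuming the human buzz-in times are close enough to a normal distribution that a mean and standard deviation describe them (real reaction times are usually skewed, so a different distribution might fit better).

```python
import random
import statistics

def fit_delay_model(human_times_ms):
    """Return (mean, sample standard deviation) of measured buzz-in times."""
    return statistics.mean(human_times_ms), statistics.stdev(human_times_ms)

def sample_delay_ms(mean, stdev, rng=random, floor_ms=0.0):
    """Draw one delay from a normal distribution, never below floor_ms."""
    return max(floor_ms, rng.gauss(mean, stdev))

# Hypothetical measurements (milliseconds after buzzer activation) taken
# on questions the human players knew the answer to. Not real data.
human_times_ms = [180, 210, 195, 240, 170, 225, 205, 190]

mean, stdev = fit_delay_model(human_times_ms)

# Seeded generator so the sketch is repeatable; a real setup would not seed.
rng = random.Random(42)
delays = [sample_delay_ms(mean, stdev, rng) for _ in range(5)]
```

Each value in `delays` would be how long to hold back the "buzzers are active" signal before passing it to Watson on a given clue, so that on average it buzzes about as fast as the humans do, with similar spread.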