Kyle Polich talks with Andreas Stolcke about a paper that compares human and machine transcription. The following are the highlights of the paper:

  • The dataset used was Switchboard, a corpus of voice recordings of individuals speaking on carefully chosen topics. The recordings were then transcribed into sentences, and this served as a labeled dataset for machine learning algorithms
  • The researchers found that the human error rate was about 5%, and the neural network achieved a comparable error rate
  • The errors made by computers and humans were similar for short function words such as "and", "him", "her", etc.
  • For a given speaker, the error rates of humans and machines were correlated
  • Computers had a tough time transcribing filler words such as "ahem", "ah", etc.
  • Transcription is still a difficult problem when several people talk simultaneously, as in a conference call
  • Real-time transcription is still an active area of research
  • Skype does not do real-time but near-real-time transcription: A talks, Skype transcribes, then B reads; B talks, Skype transcribes, and A reads
  • Chinese speech to English text translation is another active research area
  • The accents that humans found difficult to transcribe were the same ones that computers found difficult
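
The error rates discussed above are typically measured as word error rate (WER): the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and a hypothesis, divided by the number of reference words. As a minimal sketch (not the paper's actual scoring pipeline, which uses NIST-style alignment tooling), WER can be computed like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,              # substitution or match
                           dp[i - 1][j] + 1, # deletion
                           dp[i][j - 1] + 1) # insertion
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("a b c d", "a x c")` is 0.5: one substitution (b→x) and one deletion (d) against four reference words. A "5% error rate" means roughly one word in twenty is substituted, inserted, or dropped.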