Kyle Polich talks with Andreas Stolcke about a paper that compares human and machine transcription. The following are the highlights of the paper:

  • The dataset used was Switchboard, a corpus of voice recordings of individuals speaking on carefully chosen topics. The recordings were then transcribed into sentences, and this served as a labeled dataset for machine learning algorithms
  • The researchers found that the human error rate was about 5%, and the neural network achieved a comparable error rate
  • The errors made by computers and humans were similar for short function words such as "and", "him", "her", etc.
  • For a given speaker, the error rates of humans and machines were correlated
  • Computers had a tough time transcribing filler words such as "ahem", "ah", etc.
  • Transcription is still a difficult problem when several people talk simultaneously, as in a conference call
  • Real-time transcription is still an active area of research
  • Skype does not do real-time but near-real-time transcription: A talks, Skype transcribes, then B reads; B talks, Skype transcribes, and A reads
  • Chinese speech to English text translation is another active research area
  • The accents that humans found difficult to transcribe were the same ones that computers found difficult
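
The error rates discussed above are typically measured as word error rate (WER): the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and a hypothesis, divided by the number of reference words. As a minimal sketch (not the paper's actual scoring pipeline, which uses NIST-style alignment tooling), WER can be computed like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,              # substitution or match
                           dp[i - 1][j] + 1, # deletion
                           dp[i][j - 1] + 1) # insertion
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("a b c d", "a x c")` is 0.5: one substitution (b→x) and one deletion (d) against four reference words. A "5% error rate" means roughly one word in twenty is substituted, inserted, or dropped.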