Simultaneous Translation

Kyle Polich talks with Liang Huang about his work at Baidu on simultaneous translation. The following are the points covered in the podcast:

- Most advertised cross-language translation products, such as Skype, do not do simultaneous translation. They wait for the speaker to finish and then translate; that is consecutive translation, not simultaneous translation.
- Simultaneous translation trades off accuracy against latency: you cannot wait too long before producing the translation.
- Huang's approach translates prefix-to-prefix, emitting target words from a partial source sentence instead of waiting for the full sentence.
- What's the dataset used?
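The prefix-to-prefix idea is often illustrated with a "wait-k" policy: read the first k source words, then alternate between writing one target word and reading one more source word. The sketch below shows only the read/write scheduling, not the neural model that would actually pick the words (the alternation and the equal source/target length are simplifying assumptions for illustration).

```python
def wait_k_schedule(source_len, k):
    """Return a list of ("READ", i) / ("WRITE", j) actions for a wait-k policy.

    Toy assumptions: target length equals source length, and after the
    initial k reads the policy strictly alternates write/read until the
    source is exhausted, then flushes the remaining writes.
    """
    reads = writes = 0
    actions = []
    while writes < source_len:
        if reads < min(writes + k, source_len):
            actions.append(("READ", reads))    # consume next source token
            reads += 1
        else:
            actions.append(("WRITE", writes))  # emit next target token
            writes += 1
    return actions
```

For k = 2 and a five-word source this yields READ, READ, WRITE, READ, WRITE, ... — the translator always stays exactly k words behind the speaker, which is the latency/accuracy trade-off the episode describes.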

Human vs Machine Transcription

Kyle Polich talks with Andreas Stolcke about a paper comparing human and machine transcription. The following are the highlights of the paper:

- The dataset used was Switchboard, which contains voice recordings of individuals conversing on carefully chosen topics; these recordings were then transcribed into sentences, serving as a labeled dataset for machine learning algorithms.
- The researchers found that the human error rate was about 5%, and the neural network achieved a comparable error rate.
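The error rates in such comparisons are word error rates (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the system's transcript into the reference, divided by the reference length. A minimal sketch of the standard edit-distance computation:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed via Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)
```

So "the cat sat" transcribed as "the cat sit" has one substitution over three reference words, i.e. a WER of about 33%.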

Data Skeptic - Word Embeddings Lower Bound

The following are the takeaways from a Data Skeptic podcast interview with Kevin Patel:

- A word embedding dimension of 300 is mostly chosen based on intuition.
- When a telecom company wanted to analyze the sentiment of SMS messages, they were challenged by the large 300-dimensional representation of words. They wanted a lower-dimensional representation, more like a 10-dimensional space, but most of the available embeddings used at least 100 to 300 dimensions.
- To date there has been little scientific investigation into this hyperparameter choice.
- Kevin Patel and his team investigated this hyperparameter on the Brown corpus and found that a dimension of 19 was enough to efficiently represent its word vectors.
- The team borrowed concepts from algebraic topology.
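One simple way to get a feel for "how many dimensions does a corpus actually need" (this is a toy illustration, not Patel's algebraic-topology method) is to build count-based embeddings from a co-occurrence matrix and look at how quickly its singular values decay:

```python
import numpy as np

# Toy corpus; the vocabulary and window size are arbitrary choices
# made for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog played",
    "the mat and the rug are red",
]
vocab = sorted({w for line in corpus for w in line.split()})
index = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-2 word window.
C = np.zeros((len(vocab), len(vocab)))
for line in corpus:
    words = line.split()
    for i in range(len(words)):
        for j in range(max(0, i - 2), min(len(words), i + 3)):
            if i != j:
                C[index[words[i]], index[words[j]]] += 1

# Singular values measure how much structure each extra dimension captures.
singular_values = np.linalg.svd(C, compute_uv=False)
energy = np.cumsum(singular_values**2) / np.sum(singular_values**2)
d = int(np.searchsorted(energy, 0.95)) + 1  # dims for 95% of the structure
print(f"{d} of {len(vocab)} dimensions capture 95% of co-occurrence structure")
```

On real corpora the same spectral picture shows most of the variance concentrated in far fewer dimensions than 300, which is the spirit of the "19 dimensions suffice for Brown" finding.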

Index Funds and ETFs

The first three chapters of the book are targeted at readers who want a basic understanding of index funds. The first chapter talks about the massive growth of indexing, and hence of index funds and ETFs. The second chapter walks the reader through the history of the various fund structures that preceded index funds and ETFs. The third chapter gives a laundry list of entities that have benefited from the rise of index funds.

Machine Learning With Boosting - Summary

Gradient boosting is one of the most powerful algorithms out there for solving classification and regression problems. This book gives a gentle introduction to its various aspects without overwhelming the reader with the detailed math behind the algorithm. The many visuals in the book make the learning stick. Well worth a read for anyone who wants to understand the intuition behind the algorithm.
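The core intuition the book builds is that each new weak learner is fit to the residuals (the negative gradient of the squared-error loss) of the ensemble so far, and added with a shrinkage factor. A minimal regression sketch with decision stumps (a toy implementation under those assumptions, not production code):

```python
import numpy as np

def fit_stump(x, residual):
    """Find the 1-D threshold split minimizing squared error on the residual."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, lval, rval = best
    return lambda z: np.where(z <= t, lval, rval)

def gradient_boost(x, y, n_rounds=100, lr=0.5):
    pred = np.full_like(y, y.mean(), dtype=float)  # start from a constant model
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred              # negative gradient of squared error
        stump = fit_stump(x, residual)   # weak learner fit to the residuals
        pred += lr * stump(x)            # shrunken additive update
        stumps.append(stump)
    base = y.mean()
    return lambda z: base + lr * sum(s(z) for s in stumps)

# Usage: boost stumps to approximate a smooth curve.
x = np.linspace(0, 10, 100)
y = np.sin(x)
model = gradient_boost(x, y)
mse = np.mean((model(x) - y)**2)
```

Each individual stump is a crude step function, yet the shrunken sum of a hundred of them tracks the sine curve closely; that "weak learners stacked on residuals" picture is exactly the intuition the book's visuals convey.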