Data Skeptic - Word Embeddings Lower Bound
The following are the learnings from a Data Skeptic podcast interview with Kevin Patel:
- The common word embedding dimension of 300 is mostly chosen based on intuition rather than principled analysis.
- When a telecom company wanted to analyze the sentiment of SMS messages, they were challenged by the large 300-dimensional representation of words. They wanted a lower-dimensional representation, closer to a 10-dimensional space, but most available embeddings were at least 100 to 300 dimensions.
- To date, there has been no rigorous scientific investigation into this hyperparameter choice.
- Kevin Patel and his team investigated this hyperparameter on the Brown corpus and found that a dimension of 19 was enough to represent the word vectors in that corpus efficiently (see the sketch below for how such a comparison might be set up).
- The team borrowed concepts from algebraic topology to arrive at this lower bound.
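As an illustration only (not from the interview), the following is a minimal sketch of how one might train embeddings on the Brown corpus at both the conventional 300 dimensions and a much smaller dimension such as 19, then eyeball nearest neighbors as a rough quality check. The use of NLTK and Gensim, and the probe word "money", are assumptions made for this example.

```python
# Sketch: compare a conventional 300-d embedding with a 19-d embedding
# trained on the Brown corpus. Assumes nltk and gensim are installed.
import nltk
from nltk.corpus import brown
from gensim.models import Word2Vec

nltk.download("brown", quiet=True)

sentences = brown.sents()  # tokenized sentences from the Brown corpus

for dim in (300, 19):
    model = Word2Vec(
        sentences,
        vector_size=dim,   # embedding dimensionality under test
        window=5,
        min_count=5,
        workers=4,
    )
    # Rough sanity check: nearest neighbors of an arbitrary probe word.
    print(dim, model.wv.most_similar("money", topn=5))
```

This does not reproduce the team's algebraic-topology analysis; it only shows how a much smaller dimension can be trained and inspected on the same corpus they studied.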