Embeddings in NLP and beyond
The following are my takeaways from the talk "Embedding work in NLP".
- Pre-trained GloVe vectors can be used directly in the gensim library (see the first sketch after the list below)
- Do PCA on word vectors to find interesting relationships
- The two most powerful word2vec models are
  - CBOW (continuous bag of words)
  - Skip-gram
- Negative sampling is a technique used in Skip-gram training to reduce training time (both show up as parameters in the second sketch below)
- Use cases of word2vec
  - Airbnb looks at the listing click sequences of its users and creates an embedding space for its listings, treating each click sequence as a sentence (see the session-as-sentence sketch below)
    - Sequences of listings that lead to the user clicking something are used as positive examples
    - Sequences of listings that lead to the user not clicking anything are used as negative examples
  - Alibaba
    - Builds a huge graph that traces click-throughs between products, then generates random walks across it
    - Each random walk is a proxy for a sentence, with products as its words
    - By training word2vec on these random walks, Alibaba generates product recommendations for its users (see the random-walk sketch below)
  - ASOS
    - Predicts customer lifetime value using a word2vec-style model
  - Anghami
    - Uses word2vec for music recommendations
  - Spotify
    - Uses embeddings for music and artist recommendations
  - Fact-checking also uses word embeddings - https://www.youtube.com/watch?v=ddf0lgPCoSo
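Below is a minimal sketch of the GloVe-plus-PCA idea from the first two takeaways. It assumes the "glove-wiki-gigaword-50" model name from gensim's public downloader catalogue and a hand-picked word list; any other pre-trained GloVe set would work the same way.

```python
# Load pre-trained GloVe vectors through gensim and project a few word
# vectors to 2-D with PCA so related pairs can be eyeballed.
import gensim.downloader as api
from sklearn.decomposition import PCA

vectors = api.load("glove-wiki-gigaword-50")  # KeyedVectors, 50-d GloVe

words = ["king", "queen", "man", "woman", "paris", "france", "rome", "italy"]
matrix = [vectors[w] for w in words]  # one 50-d vector per word

# Reduce to 2 dimensions; pairs like king/queen and paris/france should
# land near each other.
coords = PCA(n_components=2).fit_transform(matrix)
for word, (x, y) in zip(words, coords):
    print(f"{word:>8}: ({x:+.2f}, {y:+.2f})")
```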
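The CBOW/Skip-gram choice and negative sampling are simply parameters on gensim's Word2Vec class. A sketch on a made-up toy corpus:

```python
# Train word2vec with gensim, showing where the CBOW/Skip-gram choice
# and negative sampling appear as parameters. The corpus is invented
# purely for illustration.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the embeddings
    window=3,         # context window size
    sg=1,             # 1 = Skip-gram, 0 = CBOW
    negative=5,       # negative sampling: 5 noise words per positive pair
    min_count=1,      # keep every word in this toy corpus
    epochs=50,
)

print(model.wv.most_similar("cat", topn=3))
```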
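The Airbnb-style trick translates almost directly into gensim once sessions are treated as sentences. The session data and listing IDs below are invented for illustration; the production system reportedly also wires booked listings in as global positive context, which plain gensim does not model.

```python
# Treat each user's click session as a "sentence" whose "words" are
# listing IDs, then train word2vec on those sessions.
from gensim.models import Word2Vec

# Each inner list is one user's click session, ordered by time.
sessions = [
    ["listing_12", "listing_87", "listing_87", "listing_45"],
    ["listing_45", "listing_12", "listing_33"],
    ["listing_87", "listing_45", "listing_33", "listing_12"],
]

model = Word2Vec(sessions, vector_size=32, window=2, sg=1,
                 negative=5, min_count=1, epochs=100)

# Listings that co-occur in sessions end up close in the embedding space,
# so nearest neighbours double as "similar listings" recommendations.
print(model.wv.most_similar("listing_12", topn=2))
```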
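And a sketch of the Alibaba-style recipe: build an item graph from click-through transitions, take random walks over it, and hand the walks to word2vec as sentences. The graph and product names here are invented; the real system operates on a vastly larger behaviour graph.

```python
# Random walks over a click graph become the "sentences" for word2vec.
import random
from gensim.models import Word2Vec

# Adjacency list: product -> products users clicked next.
graph = {
    "shoes":    ["socks", "laces"],
    "socks":    ["shoes", "slippers"],
    "laces":    ["shoes"],
    "slippers": ["socks"],
}

def random_walk(start, length=5):
    """One random walk over the click graph; each walk is a 'sentence'."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph.get(walk[-1])
        if not neighbours:
            break
        walk.append(random.choice(neighbours))
    return walk

walks = [random_walk(node) for node in graph for _ in range(20)]

model = Word2Vec(walks, vector_size=16, window=2, sg=1,
                 negative=5, min_count=1, epochs=50)

# Products that co-occur on walks become neighbours in the embedding
# space, which is the basis for the recommendations.
print(model.wv.most_similar("shoes", topn=2))
```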
- Challenges
  - 1 billion hours of YouTube video are watched every day
  - 700 million of those hours are driven by recommendation algorithms
  - That is scope for massive influence on us
  - YouTube has come out with a statement saying that it will not recommend certain kinds of content
  - Facebook is tweaking its algorithm so that content that has the potential to become illegal is removed from its recommendation algorithms