Embeddings in NLP and beyond
The following are my takeaways from the talk "Embedding work in NLP".
- Pre-trained GloVe vectors can be used directly in the gensim library (see the first sketch after the list below)
- Do PCA on word vectors to find interesting relationships
- The two most powerful word2vec models are
  - CBOW (continuous bag of words)
  - Skip-gram
- Negative sampling is a technique used in Skip-gram training to reduce training time (both show up as parameters in the second sketch below)
- Use cases of word2vec
  - Airbnb looks at the listing click sequences of its users and creates an embedding space for its listings, treating each click sequence as a sentence (see the session-as-sentence sketch below)
    - Sequences of listings that lead to the user clicking something are used as positive examples
    - Sequences of listings that lead to the user not clicking anything are used as negative examples
  - Alibaba
    - Builds a huge graph that traces click-throughs between products, then generates random walks across it
    - Each random walk is a proxy for a sentence, with products as its words
    - By training word2vec on these random walks, Alibaba generates product recommendations for its users (see the random-walk sketch below)
  - ASOS
    - Predicts customer lifetime value using a word2vec-style model
  - Anghami
    - Uses word2vec for music recommendations
  - Spotify
    - Uses embeddings for music and artist recommendations
  - Fact-checking also uses word embeddings - https://www.youtube.com/watch?v=ddf0lgPCoSo
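Below is a minimal sketch of the GloVe-plus-PCA idea from the first two takeaways. It assumes the "glove-wiki-gigaword-50" model name from gensim's public downloader catalogue and a hand-picked word list; any other pre-trained GloVe set would work the same way.

```python
# Load pre-trained GloVe vectors through gensim and project a few word
# vectors to 2-D with PCA so related pairs can be eyeballed.
import gensim.downloader as api
from sklearn.decomposition import PCA

vectors = api.load("glove-wiki-gigaword-50")  # KeyedVectors, 50-d GloVe

words = ["king", "queen", "man", "woman", "paris", "france", "rome", "italy"]
matrix = [vectors[w] for w in words]  # one 50-d vector per word

# Reduce to 2 dimensions; pairs like king/queen and paris/france should
# land near each other.
coords = PCA(n_components=2).fit_transform(matrix)
for word, (x, y) in zip(words, coords):
    print(f"{word:>8}: ({x:+.2f}, {y:+.2f})")
```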
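The CBOW/Skip-gram choice and negative sampling are simply parameters on gensim's Word2Vec class. A sketch on a made-up toy corpus:

```python
# Train word2vec with gensim, showing where the CBOW/Skip-gram choice
# and negative sampling appear as parameters. The corpus is invented
# purely for illustration.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the embeddings
    window=3,         # context window size
    sg=1,             # 1 = Skip-gram, 0 = CBOW
    negative=5,       # negative sampling: 5 noise words per positive pair
    min_count=1,      # keep every word in this toy corpus
    epochs=50,
)

print(model.wv.most_similar("cat", topn=3))
```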
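The Airbnb-style trick translates almost directly into gensim once sessions are treated as sentences. The session data and listing IDs below are invented for illustration; the production system reportedly also wires booked listings in as global positive context, which plain gensim does not model.

```python
# Treat each user's click session as a "sentence" whose "words" are
# listing IDs, then train word2vec on those sessions.
from gensim.models import Word2Vec

# Each inner list is one user's click session, ordered by time.
sessions = [
    ["listing_12", "listing_87", "listing_87", "listing_45"],
    ["listing_45", "listing_12", "listing_33"],
    ["listing_87", "listing_45", "listing_33", "listing_12"],
]

model = Word2Vec(sessions, vector_size=32, window=2, sg=1,
                 negative=5, min_count=1, epochs=100)

# Listings that co-occur in sessions end up close in the embedding space,
# so nearest neighbours double as "similar listings" recommendations.
print(model.wv.most_similar("listing_12", topn=2))
```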
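And a sketch of the Alibaba-style recipe: build an item graph from click-through transitions, take random walks over it, and hand the walks to word2vec as sentences. The graph and product names here are invented; the real system operates on a vastly larger behaviour graph.

```python
# Random walks over a click graph become the "sentences" for word2vec.
import random
from gensim.models import Word2Vec

# Adjacency list: product -> products users clicked next.
graph = {
    "shoes":    ["socks", "laces"],
    "socks":    ["shoes", "slippers"],
    "laces":    ["shoes"],
    "slippers": ["socks"],
}

def random_walk(start, length=5):
    """One random walk over the click graph; each walk is a 'sentence'."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph.get(walk[-1])
        if not neighbours:
            break
        walk.append(random.choice(neighbours))
    return walk

walks = [random_walk(node) for node in graph for _ in range(20)]

model = Word2Vec(walks, vector_size=16, window=2, sg=1,
                 negative=5, min_count=1, epochs=50)

# Products that co-occur on walks become neighbours in the embedding
# space, which is the basis for the recommendations.
print(model.wv.most_similar("shoes", topn=2))
```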
- Challenges
  - 1 billion hours of YouTube video are watched every day
  - 700 million of those hours are driven by recommendation algorithms
  - That is scope for massive influence on us
  - YouTube has come out with a statement saying that it will not recommend certain kinds of content
  - Facebook is tweaking its algorithm so that content that has the potential to become illegal is removed from its recommendation algorithms