Mapping Dialects with Twitter Data
Key takeaways from the podcast:
- Bruno Gonçalves, who now works at JPMorgan Chase, holds a PhD from Emory University.
- He has done interesting work that scans the full body of Twitter data for geography-based patterns.
- Can one draw a map based on language patterns?
- The dataset: 10 TB of Twitter data
- Build a huge matrix over latitude and longitude
- Look for patterns in the resulting word-by-geolocation matrix
- Apply PCA followed by k-means clustering to the patterns in the high-dimensional matrix that combines word embeddings and geolocation
- Mobile phones have made marrying the two datasets possible
- The evolution of language over time can also be studied
- A ton of people are working on emojis in Twitter feeds
- A ton can be done with Reuters news data and NLP
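
The matrix-and-clustering idea in the notes above can be illustrated with a toy sketch. This is not the method from the podcast; it is a minimal illustration under several assumptions: the four tweets are made up, the grid uses one-degree cells, plain relative word frequencies stand in for word embeddings, and the PCA step is left out to keep the example dependency-free. The helper names (`cell_of`, `build_matrix`, `kmeans`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def cell_of(lat, lon, size=1.0):
    """Map a coordinate to a grid cell id, using cells of `size` degrees."""
    return (math.floor(lat / size), math.floor(lon / size))

def build_matrix(tweets, vocab, size=1.0):
    """Return (cells, rows): one row of relative word frequencies per cell."""
    counts = defaultdict(Counter)
    for lat, lon, text in tweets:
        for word in text.lower().split():
            if word in vocab:
                counts[cell_of(lat, lon, size)][word] += 1
    cells = sorted(counts)
    rows = []
    for cell in cells:
        total = sum(counts[cell].values())
        rows.append([counts[cell][w] / total for w in vocab])
    return cells, rows

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center for every row.
        labels = [min(range(k), key=lambda j: sq_dist(row, centers[j]))
                  for row in rows]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Made-up geotagged tweets: (latitude, longitude, text).
tweets = [
    (40.7, -74.0, "grab a soda"),
    (41.2, -73.5, "soda please"),
    (41.9, -87.6, "want a pop"),
    (42.3, -88.1, "pop time"),
]
vocab = ["soda", "pop"]  # hypothetical dialect-marker words

cells, rows = build_matrix(tweets, vocab)
labels = kmeans(rows, k=2)
for cell, label in zip(cells, labels):
    print(cell, "-> cluster", label)
```

Cells whose tweets favour the same word end up in the same cluster, which is the small-scale analogue of drawing dialect regions on a map. On real data the rows would be far wider, which is where a dimensionality reduction step such as PCA would come in before clustering.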