Mapping Dialects with Twitter Data
Key takeaways from the podcast:
Bruno Gonçalves, who now works at JPMorgan Chase, holds a PhD from Emory University. He has done some interesting work on mining Twitter data for geographic patterns in language. The question he asks: can one draw a map based on language patterns alone?

- The dataset is about 10 TB of Twitter data.
- Create a huge matrix indexed by latitude and longitude.
- Relate words to geolocation in that matrix and look for matching patterns.
- Cluster with PCA followed by k-means, based on the patterns in the high-dimensional matrix that combines word embeddings and geolocation.
- Mobile phones have made marrying the two datasets (text and location) possible.
- The evolution of language over time can be studied the same way.
- A lot of people are working on emojis in Twitter feeds.
- A lot can also be done with Reuters news data and NLP.
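The matrix, PCA and k-means steps can be made concrete with a small sketch. This is not the pipeline described in the podcast, only an illustration under several assumptions: the tweets are invented toy data, locations are snapped to one-degree grid cells, each cell is described by relative word frequencies rather than word embeddings, and k-means uses a simple farthest-point initialisation so the run is deterministic. All function names (`cell_of`, `build_matrix`, `pca`, `kmeans`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

import numpy as np

# Invented toy data: (latitude, longitude, tweet text).
TWEETS = [
    (40.71, -74.00, "soda and a sub"),
    (40.73, -73.99, "soda today"),
    (42.36, -71.06, "soda and a sub"),
    (42.35, -71.05, "soda please"),
    (41.88, -87.63, "pop and a hoagie"),
    (41.90, -87.65, "pop today"),
    (42.33, -83.05, "pop and a hoagie"),
    (42.34, -83.04, "pop please"),
]


def cell_of(lat, lon, size=1.0):
    """Snap a coordinate to a grid cell that is `size` degrees wide."""
    return (math.floor(lat / size), math.floor(lon / size))


def build_matrix(tweets, size=1.0):
    """Return (cells, vocab, X) where X[i, j] is the relative
    frequency of word j among all words tweeted from cell i."""
    counts = defaultdict(Counter)
    for lat, lon, text in tweets:
        counts[cell_of(lat, lon, size)].update(text.lower().split())
    cells = sorted(counts)
    vocab = sorted({w for c in counts.values() for w in c})
    X = np.array([[counts[c][w] for w in vocab] for c in cells], dtype=float)
    X /= X.sum(axis=1, keepdims=True)  # counts -> relative frequencies
    return cells, vocab, X


def pca(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def kmeans(Z, k, n_iter=20):
    """Plain Lloyd's algorithm with farthest-point initialisation."""
    centers = [Z[0]]
    while len(centers) < k:
        # Squared distance from every point to its nearest chosen center.
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a center alone if its cluster is empty
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


cells, vocab, X = build_matrix(TWEETS)
Z = pca(X, n_components=2)
labels = kmeans(Z, k=2)
cluster_of = dict(zip(cells, labels.tolist()))

for cell in cells:
    print(cell, "-> cluster", cluster_of[cell])
```

In this toy set the two "soda" cells end up in one cluster and the two "pop" cells in the other, which is the kind of regional split a dialect map would colour differently. At real scale one would likely reach for a sparse matrix and a library implementation of PCA and k-means instead of these hand-rolled versions.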