Conversations with Hugging Face CTO

The following are my learnings from the Hugging Face interview in Oct 2019:

- GPT-2 from OpenAI is impressive, and it has been packaged into a demo application
- Conversational AI + an open-source package (Transformers)
- Half a million monthly active users
- It is hard to build good deep conversational AI
- Self-starter: was working on ML in 2008, then moved on to do some software jobs
- I was curious to see the number of downloads for the various pre-trained models (see the sketch below)
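Out of that curiosity about download counts, here is a minimal sketch using today's `huggingface_hub` client, a library that postdates this interview; the sort and limit parameters are its current documented options:

```python
# A sketch, not Hugging Face's own tooling: list the most-downloaded
# pretrained models using the modern huggingface_hub client.
# pip install huggingface_hub
from huggingface_hub import list_models

# Sort models by download count, descending, and keep the top five.
for model in list_models(sort="downloads", direction=-1, limit=5):
    print(model.id, model.downloads)
```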

Super Mario Effect for Learning

Watched a fantastic TED Talk that highlighted the importance of gamifying learning:

- What if you looked at your learning as similar to playing Super Mario?
- You focus on the princess, and all the rest of the steps are your learnings along the way
- Life as a straight path is never a story worth telling
- Turn any learning process into a game and things become super interesting
- Research showed that removing penalties leads to more attempts and better scores
- Nobody gets disappointed when the Italian plumber falls into a ditch; they just learn that they need to be careful at that level the next time they play
- A 3-year effort: a dartboard that moves based on how one throws the dart
- Redesign boring tasks as games

Attention is all you need

The following are my learnings from the paper titled "Attention Is All You Need":

- Using RNNs for language modeling has been particularly painful, as they take a long time to train and have problems learning representational encodings all at once
- In the Transformer architecture, the number of operations required to relate signals from two arbitrary input or output positions is constant
- Self-attention is an attention mechanism that relates different positions of a single sequence in order to compute a representation of that sequence (see the sketch below)
- The Transformer is the first transduction model that relies entirely on self-attention to compute representations of its input and output, without using sequence-aligned RNNs or convolutions
- Learnt about the relationship between induction, deduction and transduction: induction derives the function from the given data (i.e., it infers a general rule from specific examples), deduction derives values of that function for points of interest, and transduction derives values for the points of interest directly from the given data
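To make the self-attention bullet concrete, here is a minimal NumPy sketch of the paper's scaled dot-product attention; the tiny shapes and random weights are illustrative assumptions, not values from the paper:

```python
# A minimal sketch of scaled dot-product self-attention,
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, from the paper.
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_k) projections."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv               # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])        # every position scored against every other
    scores -= scores.max(axis=-1, keepdims=True)   # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over key positions
    return weights @ v                             # each output is a weighted mix of all positions

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                        # toy sequence: 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)         # (4, 8)
```

Note how relating any two positions is a single entry in the `scores` matrix, which is why the number of operations between arbitrary positions stays constant regardless of their distance in the sequence.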