
Kyle Polich discusses BERT. The following are my takeaways.

  • BERT feels like magic: a single pretrained model can be adapted to many different tasks
  • It is memory intensive, so it is often easiest to run it as a service
  • BERT improves on ELMo, an earlier model that also produces context-dependent word embeddings
  • You can spin up a service that serves BERT-based word vectors to other applications
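  • It is based on bidirectional encoding (BERT stands for Bidirectional Encoder Representations from Transformers): each word's representation depends on the words both before and after it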
  • It is built for transfer learning: the pretrained model can be fine-tuned for a specific task
  • I need to experiment with BERT myself
  • You can use BERT for a wide range of tasks, such as sentiment classification, named entity recognition (NER), and topic modeling
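To make the bidirectional point concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not BERT itself: tokens are plain whitespace-split words rather than BERT's WordPiece subwords, and `visible_context` and `mask_tokens` are hypothetical helpers, not part of any BERT library. The first function contrasts what a left-to-right language model can see at a position with what a bidirectional encoder sees. The second shows a simplified version of the masked-language-model corruption BERT is pretrained with: about 15% of tokens are hidden and the model must predict them from context on both sides. Real BERT goes one step further (of the selected tokens, 80% become [MASK], 10% become a random token, 10% stay unchanged); the sketch skips that.

```python
import random

MASK = "[MASK]"


def visible_context(tokens, i, bidirectional=True):
    """Return the tokens a model may condition on when representing tokens[i]."""
    left = tokens[:i]
    if not bidirectional:
        # A left-to-right language model only sees the tokens before position i.
        return left
    # A bidirectional encoder sees both sides of position i (but not the token itself).
    return left + tokens[i + 1:]


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Simplified masked-LM corruption: hide about mask_prob of the tokens.

    Returns (corrupted, targets), where targets maps each masked position
    to the original token the model would be trained to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token so every sequence yields a training signal.
    n_to_mask = max(1, round(len(tokens) * mask_prob))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))
    corrupted = list(tokens)
    targets = {}
    for p in positions:
        targets[p] = corrupted[p]
        corrupted[p] = MASK
    return corrupted, targets


tokens = "the bank raised interest rates this quarter".split()
print(visible_context(tokens, 1, bidirectional=False))  # ['the']
print(visible_context(tokens, 1))  # ['the', 'raised', 'interest', 'rates', 'this', 'quarter']
print(mask_tokens(tokens, seed=0))
```

For "the bank raised interest rates this quarter", a left-to-right model representing "bank" sees only "the", while the bidirectional view also includes "interest rates", which are the words that actually disambiguate "bank".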
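The notes about BERT being memory intensive and serving word vectors suggest one pattern: load the model once in a long-lived process and let clients request vectors over HTTP. Below is a minimal stdlib-only Python sketch of that pattern, under loud assumptions: `StandInEncoder` is a hypothetical placeholder that hashes text into a small pseudo-vector (it is NOT BERT and its vectors carry no meaning), and the `/encode` endpoint, its JSON shape, and the helper names are all made up for illustration rather than taken from any existing BERT serving tool.

```python
import hashlib
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DIM = 8  # toy size; BERT-base produces 768-dimensional vectors


class StandInEncoder:
    """HYPOTHETICAL placeholder for a loaded BERT model (NOT BERT).

    It maps each text to a deterministic pseudo-vector derived from SHA-256,
    so it is neither contextual nor meaningful. It only stands in for the
    large, memory-hungry object a real service would load once at startup.
    """

    def encode(self, texts):
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            vectors.append([b / 255.0 for b in digest[:DIM]])
        return vectors


def make_handler(encoder):
    """Build a request handler class that shares one already-loaded encoder."""

    class EncodeHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/encode":
                self.send_error(404)
                return
            try:
                length = int(self.headers.get("Content-Length", 0))
                payload = json.loads(self.rfile.read(length))
                texts = payload["texts"]
                if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
                    raise TypeError("texts must be a list of strings")
            except (ValueError, KeyError, TypeError):
                self.send_error(400, "Expected a JSON body with a 'texts' list of strings")
                return
            body = json.dumps({"vectors": encoder.encode(texts)}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # silence the default per-request logging

    return EncodeHandler


def build_server(host="127.0.0.1", port=8000):
    """Load the model once, then return an HTTP server that reuses it per request."""
    encoder = StandInEncoder()  # a real service would load BERT here, exactly once
    return ThreadingHTTPServer((host, port), make_handler(encoder))


def start_background(server):
    """Run the server on a daemon thread; stop it later with server.shutdown()."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def encode_remote(host, port, texts):
    """Client side: POST texts to /encode and return the list of vectors."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        body = json.dumps({"texts": texts}).encode("utf-8")
        conn.request("POST", "/encode", body=body,
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"service returned HTTP {resp.status}")
        return json.loads(resp.read())["vectors"]
    finally:
        conn.close()
```

To try it, call `server = build_server(port=0)` (port 0 lets the OS pick a free port), then `start_background(server)`, then `encode_remote("127.0.0.1", server.server_address[1], ["hello world"])`. Because the encoder is created once in `build_server` and captured by the handler, every request reuses the same object, which is the point of the service pattern for a model too large to load per request. Using real BERT vectors would mean replacing `StandInEncoder.encode` with a call into whichever BERT library you choose.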