NLP - Natural Language Processing with Python - Jose Portilla
The following are my learnings from Jose Portilla's course on NLP.
- Use of f-strings for string formatting
- `seek(0)` moves the file cursor back to the start of the file
- `PyPDF2` is a library used to extract text from PDF files
- spaCy uses one best-of-breed algorithm for each specific task
- NLTK - Released in 2001
- spaCy - Released in 2015
- CoreNLP - Package in Java
- spaCy common tasks
  - Loading the language library
  - Building the pipeline object
  - Using Tokens
  - POS tagging
  - Understanding Token attributes
- spaCy takes the text and creates a document (`Doc`) object
- Tagger, Parser and NER are the main components of the pipeline
- Span is another data type: a slice of a document
- Sentences can be extracted from a document via `doc.sents`
- Tokens are the basic building blocks of a sentence
- The tokenizer splits text using prefix, suffix, infix and exception rules
- `displacy` is used for showing dependency trees
- NLTK has Porter Stemmer, Snowball Stemmer
- Lemmatization looks beyond word reduction and considers a language's full vocabulary. It also looks at the surrounding context words
- Phrase matching can be done via spaCy
- Fine-grained and coarse-grained POS tags can be obtained from spaCy
- One can set custom boundaries for sentence boundary detection
- CountVectorizer
- TfidfVectorizer
- Word vectors via `token.vector`
- Sentiment analysis using VADER
- `polarity_scores` gives the sentiment values of a sentence
- A function can be applied to each sentence to return its compound sentiment score
- Topic Modeling via LDA - You can do it via sklearn
- Topic Modeling via NMF - You can do it via sklearn
- Use LSTM for text generation
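
The notes on f-strings and `seek(0)` can be illustrated with a small sketch. It assumes an in-memory `io.StringIO` object as a stand-in for a real file opened with `open()`; the variable names and sample text are made up for the example.

```python
import io

# In-memory stand-in for a text file (assumption: a file from open() behaves the same way).
handle = io.StringIO("first line\nsecond line\n")

first_read = handle.read()    # reads everything; the cursor is now at the end
second_read = handle.read()   # nothing left to read, so this is an empty string

handle.seek(0)                # move the cursor back to the start of the file
third_read = handle.read()    # the full content is available again

# f-string: expressions inside the braces are evaluated and formatted inline.
course = "NLP"
summary = f"Notes on {course}: read {len(third_read)} characters after seek(0)"
print(summary)
```

Without the `seek(0)` call, every read after the first would keep returning an empty string, which is an easy mistake to make when re-reading a file in a notebook.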
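
To make the `Doc`, token and `Span` notes concrete, here is a minimal sketch. It assumes spaCy is installed and uses `spacy.blank("en")`, a tokenizer-only pipeline, so that no trained model has to be downloaded; the sample sentence is made up. POS tags, the dependency parse and named entities would need a trained pipeline loaded with `spacy.load`, which this sketch does not cover.

```python
import spacy
from spacy.tokens import Span

# Blank English pipeline: tokenizer only, no tagger, parser or NER (assumption for this sketch).
nlp = spacy.blank("en")

# Calling the pipeline on text returns a Doc object.
doc = nlp("Tesla is looking at buying a startup")

# A Doc is a sequence of Token objects.
tokens = [token.text for token in doc]

# Slicing a Doc gives a Span, which keeps a reference to its parent Doc.
span = doc[4:7]

print(tokens)
print(span.text, isinstance(span, Span))
```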
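
The idea behind `CountVectorizer` and `TfidfVectorizer` can be sketched in plain Python. This is a hand-rolled illustration, not scikit-learn's implementation: the tokenizer is a simple lowercase whitespace split, the three sample documents are made up, and the final normalisation step is left out. The idf formula used, `ln((1 + n) / (1 + df)) + 1`, follows scikit-learn's documented smoothed default.

```python
import math
from collections import Counter

docs = [
    "the cat sat",
    "the dog sat",
    "the dog barked",
]

# Simplified tokenizer (assumption): lowercase and split on whitespace.
tokenized = [doc.lower().split() for doc in docs]
term_counts = [Counter(tokens) for tokens in tokenized]

# Vocabulary: every distinct term, sorted so the column order is stable.
vocab = sorted({term for tokens in tokenized for term in tokens})

# Bag-of-words counts: one row per document, one column per vocabulary term.
counts = [[tc[term] for term in vocab] for tc in term_counts]

# Document frequency: number of documents that contain each term.
n_docs = len(docs)
doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]

# Smoothed inverse document frequency: rarer terms get a larger weight.
idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]

# Unnormalised TF-IDF weights: term count multiplied by idf.
tfidf = [[tf * w for tf, w in zip(row, idf)] for row in counts]

for term, weight in zip(vocab, idf):
    print(f"{term}: idf={weight:.3f}")
```

A word such as "the", which appears in every document, ends up with the lowest idf, while a word that appears in only one document gets the highest. That down-weighting of common words is what TF-IDF adds on top of raw counts.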
Here is the completion certificate