spaCy Deliberate Practice
This post collects some lessons learned from deliberate practice with spaCy.
What can spaCy do?
- spaCy can do shallow parsing (chunking). This entails grouping adjacent tokens into phrases based on their POS tags, such as noun phrases, verb phrases, and prepositional phrases.
- Named Entity Recognition (NER): this entails locating named entities and classifying them into pre-defined categories.
  - Available packages to do NER
    - Stanford NER: provides sequence models. Train your own NER models with labeled data.
    - spaCy: comes with out-of-the-box NER tagging.
    - NLTK: this involves going through three stages
      - Word tokenization
      - POS tagging: download corpora to do POS tagging and NER
      - Chunking: shallow parsing that uses the POS tags to add more structure to the sentence
- Verb-phrase detection can be done via `textacy`.
- The dependency parse is exposed per token: `token.dep_` gives the relation label and `token.head` gives the parent token.
- Regular expressions can be matched against the text of a spaCy `Doc`.
- One can quickly remove stop words, remove punctuation, and lemmatize tokens via spaCy.
- `tag_` gives the fine-grained POS tag.
- `pos_` gives the coarse-grained POS tag.
- Word frequencies can be obtained by passing the tokens through a `Counter` object.
- Lemmatization can be done via `token.lemma_`.
- `spacy.lang.en.stop_words.STOP_WORDS` gives the set of English stop words.
- `nlp.vocab` gives the vocabulary (the store of lexemes) associated with a language pipeline.
- Every token has a set of attributes and functions that are useful in NLP tasks.
- Sentence detection is automatic. One can also customize it by adding custom sentence-boundary rules.
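The token attributes and the out-of-the-box NER mentioned in the list above can be seen together in a small sketch. It assumes spaCy v3 and the `en_core_web_sm` model, neither of which the post names. `describe` is a hypothetical helper written for illustration, and it returns `None` instead of failing when the model is not installed.

```python
import spacy


def describe(text, model_name="en_core_web_sm"):
    """Return (token rows, entities), or None if the model is not installed.

    `describe` and `model_name` are illustrative assumptions, not part of spaCy.
    """
    try:
        nlp = spacy.load(model_name)
    except OSError:
        # spacy.load raises OSError when the model package cannot be found.
        return None

    doc = nlp(text)
    # One row per token: text, coarse POS, fine POS, dependency label,
    # head token, and lemma.
    tokens = [
        (t.text, t.pos_, t.tag_, t.dep_, t.head.text, t.lemma_)
        for t in doc
    ]
    # Named entities with their predicted labels.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities


result = describe("Apple is looking at buying a U.K. startup for $1 billion.")
if result is None:
    print("Model missing; try: python -m spacy download en_core_web_sm")
else:
    tokens, entities = result
    for row in tokens:
        print(row)
    print(entities)
```

The exact tags and entities depend on the model version, so no expected output is shown here.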
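For the point about matching regular expressions against a doc, one possible approach is to run the regex over `doc.text` and map each character range back to a token span with `Doc.char_span`. The sample sentence and pattern below are made up for illustration, and a blank English pipeline is assumed so that no trained model is needed.

```python
import re

import spacy

# A blank pipeline only tokenizes, which is all this example needs.
nlp = spacy.blank("en")
doc = nlp("The United States and the united states both match here.")

# Illustrative pattern: either capitalization of "United States".
pattern = re.compile(r"[Uu]nited [Ss]tates")

spans = []
for match in pattern.finditer(doc.text):
    start, end = match.span()
    # char_span returns None when the character offsets do not line up
    # with token boundaries, so those matches are skipped.
    span = doc.char_span(start, end)
    if span is not None:
        spans.append(span)

for span in spans:
    # span.start and span.end are token indices into the doc.
    print(span.text, span.start, span.end)
```

Working with `Span` objects rather than raw strings keeps the token indices, so the matches can be combined with other token-level annotations.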
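The stop-word, punctuation, and word-frequency points above can be combined into one short sketch. It assumes a blank English pipeline and an invented sample text. Lemmatization is left out on purpose, because `token.lemma_` is only filled in when a pipeline with a lemmatizer is loaded.

```python
from collections import Counter

import spacy

# Blank English pipeline: tokenizer plus lexical flags such as is_stop.
nlp = spacy.blank("en")

text = "The cat sat on the mat. The cat slept, and the dog barked at the cat!"
doc = nlp(text)

# Keep lowercased alphabetic tokens that are neither stop words nor punctuation.
words = [
    token.lower_
    for token in doc
    if token.is_alpha and not token.is_stop and not token.is_punct
]

# Counter maps each remaining word to the number of times it occurs.
word_freq = Counter(words)
print(word_freq.most_common(3))
```

With a trained pipeline, replacing `token.lower_` with `token.lemma_` would count lemmas instead of surface forms.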
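As a rough illustration of customizing sentence detection, the sketch below swaps in spaCy's rule-based `sentencizer` component rather than the parser-based detection a trained pipeline provides. It assumes spaCy v3, and the choice to treat `;` as a boundary is purely an example.

```python
import spacy

text = "Hello world. This is spaCy; it splits text! Done?"

# Rule-based splitter with its default punctuation set.
nlp_default = spacy.blank("en")
nlp_default.add_pipe("sentencizer")
default_sents = [sent.text for sent in nlp_default(text).sents]

# Custom variant: additionally treat ";" as a sentence boundary.
nlp_custom = spacy.blank("en")
nlp_custom.add_pipe(
    "sentencizer",
    config={"punct_chars": [".", "!", "?", ";"]},
)
custom_sents = [sent.text for sent in nlp_custom(text).sents]

print(default_sents)
print(custom_sents)
```

For rules that go beyond punctuation, a custom pipeline component that sets `token.is_sent_start` before the parser runs is another option.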