Clarification on the use of Vocab file in NER

Question

I am learning Named Entity Recognition, and I see that the training script uses a variable called vocab, which looks like this:

vocab = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\'-/\t \n\r\x0b\x0c:"

My guess is that the model is supposed to learn all the characters present in the text, such as abcd... What I don't understand is the purpose of characters like \n and \t. What are they for, and what is this variable for in general?

Thanks in advance.

dimid · Accepted Answer · 2019-08-13 07:22:56Z

1

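This string is the vocabulary. In NLP, a vocabulary is the list of all tokens (words or characters) the model knows, typically those that appear in the training set. In your example the vocabulary is a list of characters. Specifically, \n is a newline and \t is a tab; the string also contains the other whitespace characters \r (carriage return), \x0b (vertical tab) and \x0c (form feed).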

For NER and other NLP tasks, we usually use a vocabulary to produce an embedding for each token (word or character), and these embeddings are fed to the machine learning model (nowadays, neural network architectures such as LSTMs are used to get the best results). Character-based embeddings have an advantage over word-based embeddings for OOV (out-of-vocabulary) words, i.e. words that do not appear in the training set but are encountered during inference: an unseen word is still made up of characters that are in the vocabulary.
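As a rough illustration of how such a string is typically used, here is a minimal sketch that turns the vocab into a character-to-id lookup and then into embedding vectors. The names `UNK_ID`, `EMBED_DIM`, `char_to_id` and `encode` are made up for this example, and reserving id 0 for unknown characters is an assumption; the actual training script may handle this differently.

```python
import random

# The vocab string from the question; each character gets one integer id.
vocab = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\'-/\t \n\r\x0b\x0c:"

# Assumption for this sketch: id 0 is reserved for characters not in the vocab.
UNK_ID = 0
char_to_id = {ch: i + 1 for i, ch in enumerate(vocab)}

def encode(text):
    """Map each character of text to its id, falling back to UNK_ID."""
    return [char_to_id.get(ch, UNK_ID) for ch in text]

# Toy embedding table: one random vector per id (a real model learns these).
EMBED_DIM = 4
random.seed(0)
embedding_table = [
    [random.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
    for _ in range(len(vocab) + 1)
]

ids = encode("New York\n")
vectors = [embedding_table[i] for i in ids]  # one vector per character
```

Here the newline at the end of the input gets its own id and its own vector because \n is part of the vocab, so the model can in principle learn something from line breaks.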


What happens if I do not use the \n in the vocab?
– Ryan
Commented Aug 13, 2019 at 7:25
@Ryan It depends on your model, but basically it means you consider newlines as non-significant for the task (NER, in your case).
– dimid
Commented Aug 13, 2019 at 7:30
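
To make the effect of leaving out \n concrete, here is a small sketch with two toy vocabularies. It assumes that characters missing from the vocab are mapped to a shared "unknown" id (0 here); some pipelines drop such characters instead, and the training script in the question is not shown, so treat this as one possible behaviour rather than what that script does.

```python
# Toy vocabularies for illustration only (not the vocab from the question).
vocab_with_newline = "abc \n"
vocab_without_newline = "abc "

def encode(text, vocab, unk_id=0):
    """Map characters to ids; anything outside vocab becomes unk_id."""
    char_to_id = {ch: i + 1 for i, ch in enumerate(vocab)}
    return [char_to_id.get(ch, unk_id) for ch in text]

text = "ab\ncb"
with_nl = encode(text, vocab_with_newline)        # newline has its own id
without_nl = encode(text, vocab_without_newline)  # newline becomes unk_id

print(with_nl)     # [1, 2, 5, 3, 2]
print(without_nl)  # [1, 2, 0, 3, 2]
```

Without \n in the vocab, the model can no longer tell a newline apart from any other unknown character, which is fine only if line breaks carry no useful signal for the entities being tagged.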
