Natural Language Processing
Theory:
Expert systems:
Expert systems are computer applications developed to solve complex problems in a particular
domain, at the level of extraordinary human intelligence and expertise.
Lexical semantics
o Determine the meaning of individual words in context and the semantic relationships
between them.
Machine translation
o Automatically translate text from one human language to another. This is one of the most
difficult problems, and is a member of a class of problems colloquially termed
"AI-complete", i.e. requiring all of the different types of knowledge that humans possess
(grammar, semantics, facts about the real world, etc.) in order to solve properly.
Named entity recognition
o Given a stream of text, determine which items in the text map to proper names, such as
people or places, and what the type of each such name is (e.g. person, location,
organization). Note that, although capitalization can aid in recognizing named entities in
languages such as English, this information cannot aid in determining the type of named
entity, and in any case is often inaccurate or insufficient. For example, the first word of a
sentence is also capitalized, and named entities often span several words, only some of
which are capitalized. Furthermore, many other languages in non-Western
scripts (e.g. Chinese or Arabic) do not have any capitalization at all, and even languages
with capitalization may not consistently use it to distinguish names. For
example, German capitalizes all nouns, regardless of whether they are names,
and French and Spanish do not capitalize names that serve as adjectives.
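As an illustration, here is a hedged sketch using the spaCy library, which tags each entity with its type directly (the en_core_web_sm model must be downloaded separately, and the example sentence is made up):

    import spacy

    # Requires: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Angela Merkel visited Apple headquarters in Cupertino on Monday.")

    for ent in doc.ents:
        # ent.label_ is the entity type: PERSON, ORG, GPE (location), DATE, ...
        print(ent.text, "->", ent.label_)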
Natural language generation
o Convert information from computer databases or semantic intents into readable human
language.
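A minimal template-based sketch of this idea, rendering a database record as a sentence (the record fields are hypothetical):

    # A database row (hypothetical schema) rendered as readable language.
    record = {"city": "Pune", "temp_c": 31, "condition": "sunny"}

    sentence = (f"In {record['city']} it is currently {record['condition']} "
                f"with a temperature of {record['temp_c']} degrees Celsius.")
    print(sentence)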
Image Captioning
Problem Definition:
The task of image captioning aims at the automatic generation of a natural language description of an
image. It connects two major fields of artificial intelligence: computer vision and natural language processing.
Given an image, break it down to extract the different objects, actions, and attributes, and finally generate
a meaningful sentence (caption/description) for the image. A description must capture not only the objects
contained in an image, but must also express how these objects relate to each other, as well as their
attributes and the activities they are involved in.
Thus the problem boils down to two things - image analysis to get features, and then a language
model to generate meaningful captions.
Model:
The model is divided into two parts:
1. CNN-based Image Feature Extractor
2. LSTM (Long Short-Term Memory)-based Sentence Generator
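A minimal PyTorch sketch of this two-part design (the ResNet backbone, layer sizes, and all module names are illustrative assumptions, not the exact architecture described here):

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class EncoderCNN(nn.Module):
        """CNN image feature extractor: a CNN backbone with its
        classification head replaced by a linear embedding layer."""
        def __init__(self, embed_size):
            super().__init__()
            resnet = models.resnet152(weights=None)  # use pre-trained weights in practice
            self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop final FC
            self.embed = nn.Linear(resnet.fc.in_features, embed_size)

        def forward(self, images):
            with torch.no_grad():                    # keep the CNN frozen here
                features = self.backbone(images)
            return self.embed(features.flatten(1))

    class DecoderLSTM(nn.Module):
        """LSTM sentence generator: predicts each word from the image
        feature (first step only) and the preceding words."""
        def __init__(self, embed_size, hidden_size, vocab_size):
            super().__init__()
            self.word_embed = nn.Embedding(vocab_size, embed_size)
            self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
            self.fc = nn.Linear(hidden_size, vocab_size)

        def forward(self, image_features, captions):
            # Prepend the image feature as the first "word" of the sequence.
            inputs = torch.cat([image_features.unsqueeze(1),
                                self.word_embed(captions)], dim=1)
            hidden, _ = self.lstm(inputs)
            return self.fc(hidden)                   # per-step vocabulary scores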
CNN-based Image Feature Extractor
Step 2
Fine-tune the model:
1. Change the number of output categories from 1000 to 20.
2. Remove the last fully-connected layer.
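A minimal sketch of these two changes on a torchvision AlexNet (choosing AlexNet and PyTorch here is an assumption for illustration):

    import torch.nn as nn
    import torchvision.models as models

    model = models.alexnet(weights=None)  # start from pre-trained weights in practice

    # 1. Change the number of output categories from 1000 to 20.
    model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 20)

    # 2. For feature extraction, remove the last fully-connected layer.
    feature_extractor = nn.Sequential(model.features, model.avgpool, nn.Flatten(),
                                      *list(model.classifier.children())[:-1])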
Step 3
Feature extraction:
1. Extract all candidate boxes from the image (selective search).
2. For each region: resize the region to fit the CNN input, run a forward pass, and save the
output of the fifth pooling layer (i.e., the features extracted for the candidate box) to disk.
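A hedged sketch of this step (the 227x227 input size, the AlexNet backbone, and the function name are assumptions):

    import numpy as np
    import torch
    import torchvision.models as models
    import torchvision.transforms.functional as TF

    cnn = models.alexnet(weights=None)  # use the fine-tuned weights in practice
    pool5 = cnn.features.eval()         # conv stack ending at the fifth pooling layer

    def cache_region_features(image, boxes, out_path):
        """image: (C, H, W) float tensor; boxes: selective-search candidates."""
        feats = []
        for (x1, y1, x2, y2) in boxes:
            region = image[:, y1:y2, x1:x2]          # crop the candidate box
            region = TF.resize(region, [227, 227])   # warp to the CNN's fixed input size
            with torch.no_grad():
                feats.append(pool5(region.unsqueeze(0)).flatten(1))
        np.save(out_path, torch.cat(feats).numpy())  # write the features to disk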
Step 4
1. Train an SVM classifier (2 categories) per class to determine whether the object in a candidate
box belongs to that class.
2. For each category, the corresponding SVM decides whether the candidate box belongs to it: if so,
the result is positive, and negative if not. For example, one such SVM distinguishes "dog" from
"not dog".
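A sketch of one such binary SVM with scikit-learn, on randomly generated stand-in features (the feature dimension and all data here are made up):

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 9216))        # stand-ins for the cached pool5 features
    y_dog = rng.integers(0, 2, size=200)    # 1 = box contains a dog, 0 = it does not

    # One binary (2-category) SVM per object class; shown here for "dog".
    dog_svm = LinearSVC(dual=True).fit(X, y_dog)
    print(dog_svm.decision_function(X[:1]))  # positive score -> classified as "dog"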
Step 5
Use regression to fine-tune the positions of the candidate boxes. For each class, train a linear regression
model to refine the position of each candidate box.
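A sketch of the per-class box regressor, again on made-up data (using ridge regression as the linear model is an assumption):

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 9216))   # pool5 features of candidate boxes (stand-ins)
    T = rng.normal(size=(200, 4))      # targets (dx, dy, dw, dh) toward the ground truth

    box_regressor = Ridge(alpha=1000.0).fit(X, T)  # one linear regressor per class
    print(box_regressor.predict(X[:1]))            # predicted (dx, dy, dw, dh) for one box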
LSTM-based Sentence Generator
LSTM (Long Short-Term Memory) is a type of RNN used for remembering information over long periods
of time. The central role in an LSTM is played by the cell state, which is guarded by gates that control
how information flows through it. Each gate uses a sigmoid layer that outputs numbers between zero and
one, describing how much of each component should be let through.
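A didactic NumPy sketch of one LSTM time step, showing the sigmoid gates acting on the cell state (the weight shapes and stacked-gate layout are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, b):
        """One LSTM step; W stacks the weights of all four gates."""
        z = W @ np.concatenate([h_prev, x]) + b
        i, f, o, g = np.split(z, 4)                   # input/forget/output gates + candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # each gate outputs values in (0, 1)
        c = f * c_prev + i * np.tanh(g)               # gated update of the cell state
        h = o * np.tanh(c)                            # gated output from the cell state
        return h, c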
An LSTM decoder combined with a CNN image embedder to generate image captions
The LSTM model is trained to predict each word in the sentence. Each word is predicted based
on the preceding words and the image features. The image feature vector is given as input only
at the first time step. At each step, the memory cell outputs a vector that is used to predict the
word with the highest probability; that word is then fed back ("forwarded") as input to the
memory cell at the next step.
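A sketch of this decoding loop, reusing the hypothetical EncoderCNN/DecoderLSTM modules from the model sketch above (vocab.itos, an index-to-word list, and the "<end>" token are assumptions):

    import torch

    def generate_caption(encoder, decoder, image, vocab, max_len=20):
        """Greedy decoding: image feature first, then feed each word back in."""
        inputs = encoder(image.unsqueeze(0)).unsqueeze(1)  # image feature, first step only
        states, words = None, []
        for _ in range(max_len):
            hidden, states = decoder.lstm(inputs, states)
            word_id = decoder.fc(hidden.squeeze(1)).argmax(dim=1)  # most probable word
            word = vocab.itos[word_id.item()]
            if word == "<end>":
                break
            words.append(word)
            inputs = decoder.word_embed(word_id).unsqueeze(1)      # feed the word back in
        return " ".join(words)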
Results: