
UNIT – 3 Recurrent Neural Networks

Recurrent Neural Network:


What is a Recurrent Neural Network (RNN)? State and explain the types of RNN in brief.

• A Recurrent Neural Network (RNN) is a type of artificial neural network that is designed to
process sequential data such as time-series data and text data.
• RNNs can use their internal state (memory) to process sequences of inputs, which makes
them extremely useful for tasks where the context or sequence of data points is important,
such as natural language processing, speech recognition, and even image captioning.
• In an RNN, connections between nodes form a directed graph along a temporal sequence,
allowing it to exhibit temporal dynamic behavior.
• The output from the previous step is fed as input to the current step, and the hidden state
remembers some information about a sequence.
• The main feature of RNN is its hidden state, which is also referred to as Memory State since
it remembers the previous input to the network.
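As a minimal illustration of this recurrence, here is a sketch of a single RNN step in NumPy. All weight names, sizes, and values here are assumptions made up for the example, not from any particular library:

    import numpy as np

    # Toy sizes, assumed purely for illustration.
    input_size, hidden_size = 4, 3
    rng = np.random.default_rng(0)
    W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input-to-hidden weights
    W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # recurrent (hidden-to-hidden) weights
    b_h = np.zeros(hidden_size)                                   # hidden bias

    def rnn_step(x_t, h_prev):
        """One time step: the new hidden state mixes the current input
        with the previous hidden state (the network's memory)."""
        return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

    h = np.zeros(hidden_size)                         # initial hidden state
    for x_t in rng.standard_normal((5, input_size)):  # a toy sequence of 5 inputs
        h = rnn_step(x_t, h)                          # context is carried forward step by step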

• Advantages:
- An RNN can receive a sequence of inputs and produce a sequence of outputs, so it handles
sequential data of varying length naturally.
- Recurrent layers are even used together with convolutional layers to extend the effective
pixel neighborhood.

• Disadvantages:
- Vanishing and exploding gradient problems.
- Training an RNN is a complicated task.
- It cannot process very long sequences when tanh or ReLU is used as the activation
function.
Types of RNN:
1) One-to-One: It is also known as a Plain Neural Network or Vanilla Neural Network. It works
with a fixed-size input and a fixed-size output, where the output does not depend on any
earlier inputs or outputs.
Example: Image Classification

2) One-to-Many: It takes a fixed-size input and produces a sequence of data as output.
Example: Image Captioning takes the images as input and outputs a sentence of words.

3) Many-to-One: It accepts a sequence of data as input and produces a fixed-size output.
Example: Sentiment analysis, where a sentence is classified as expressing positive or
negative sentiment.

4) Many-to-Many: It receives a sequence of data as input and produces a sequence of data as
output.
Example: Machine translation, where the RNN reads a sentence in English and outputs the
corresponding sentence in Marathi.
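To make these patterns concrete, the following hedged NumPy sketch runs the recurrence over a whole sequence; the many-to-one and many-to-many cases then differ only in which hidden states are read out. All names and shapes are illustrative assumptions:

    import numpy as np

    hidden_size, input_size, seq_len = 3, 4, 6
    rng = np.random.default_rng(1)
    W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1
    W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1

    def run_rnn(xs):
        """Run the recurrence over a whole sequence and keep every hidden state."""
        h, states = np.zeros(hidden_size), []
        for x_t in xs:
            h = np.tanh(W_xh @ x_t + W_hh @ h)
            states.append(h)
        return np.stack(states)      # shape: (seq_len, hidden_size)

    xs = rng.standard_normal((seq_len, input_size))
    states = run_rnn(xs)

    many_to_one = states[-1]   # e.g. sentiment analysis: only the final state feeds a classifier
    many_to_many = states      # e.g. translation-style tasks: one output per time step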
Feed-forward Neural Networks vs. Recurrent Neural Networks
Differentiate between feed-forward neural networks and recurrent neural networks.
Feed-forward neural networks and recurrent neural networks are two types of artificial neural
networks that have different architectures and applications.

• Feed-forward neural networks have no cycles in their connections, while recurrent neural
networks have feedback loops that allow them to store information from previous inputs.
• Feed-forward neural networks map input vectors to output vectors through a series of hidden
layers, while recurrent neural networks map input sequences to output sequences through a
series of hidden layers and feedback loops.
• Feed-forward neural networks are used to learn datasets where the input and output are
fixed-length vectors, while recurrent neural networks are used to learn datasets where the
input and output are variable-length sequences.
• Feed-forward neural networks are well-suited for tasks that involve predicting a single output
from a fixed-size input, such as image classification. Recurrent neural networks are
well-suited for tasks that involve sequential data, such as speech recognition or machine
translation.

Comparison Attribute | Feed-forward Neural Networks | Recurrent Neural Networks
Signal flow direction | Forward only | Bidirectional
Delay introduced | No | Yes
Complexity | Low | High
Neuron independence in the same layer | Yes | No
Speed | High | Low
Commonly used for | Pattern recognition, speech recognition, and character recognition | Language translation, speech-to-text conversion, and robotic control
Encoder-Decoder
Explain how a sequence-to-sequence model works.

• The Encoder-Decoder architecture is a neural network model that is used for sequence-to-
sequence learning tasks such as machine translation, summarization, and image captioning.
• It consists of two main components: an encoder and a decoder. The encoder takes an input
sequence and maps it to a fixed-length vector representation, while the decoder takes this
vector and generates an output sequence.
• In the context of Recurrent Neural Networks (RNNs), the encoder and decoder are typically
implemented using RNN cells such as LSTM (Long Short-term Memory) or GRU (Gated
Recurrent Unit) cells.
• Encoder:
- It uses deep neural network layers and converts the input words to corresponding hidden
vectors. Each vector represents the current word and the context of the word.
- The encoder takes the input sequence, one token at a time, and uses an RNN or
transformer to update its hidden state, which summarizes the information in the input
sequence.
- The final hidden state of the encoder is then passed as the context vector to the decoder.

• Decoder:
- It is similar to the encoder. It takes as input the hidden vector generated by the encoder,
its own hidden states, and the current word to produce the next hidden vector and finally
predict the next word.
- The decoder uses the context vector and an initial hidden state to generate the output
sequence, one token at a time.
- At each time step, the decoder uses the current hidden state, the context vector, and the
previous output token to generate a probability distribution over the possible next tokens.
- The token with the highest probability is then chosen as the output, and the process
continues until the end of the output sequence is reached.
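The following is a compressed, illustrative NumPy sketch of this encode-then-decode loop with greedy token selection. Every weight, shape, and token id here is an assumption for the example, and the EOS id doubles as the start token for brevity:

    import numpy as np

    rng = np.random.default_rng(2)
    hidden, vocab, emb = 8, 10, 8
    EOS = 0                                             # assumed end-of-sequence token id

    E = rng.standard_normal((vocab, emb)) * 0.1         # toy embedding table
    W_enc = rng.standard_normal((hidden, emb + hidden)) * 0.1  # encoder cell weights
    W_dec = rng.standard_normal((hidden, emb + hidden)) * 0.1  # decoder cell weights
    W_out = rng.standard_normal((vocab, hidden)) * 0.1         # hidden -> vocabulary logits

    def cell(W, x, h):
        return np.tanh(W @ np.concatenate([x, h]))

    # Encoder: fold the input tokens into a single fixed-length context vector.
    h = np.zeros(hidden)
    for tok in [3, 7, 2]:                               # toy input token ids
        h = cell(W_enc, E[tok], h)
    context = h                                         # final encoder state = context vector

    # Decoder: start from the context and emit tokens greedily, one at a time.
    h, tok, out = context, EOS, []
    for _ in range(10):                                 # cap on output length
        h = cell(W_dec, E[tok], h)
        logits = W_out @ h
        tok = int(np.argmax(logits))                    # pick the highest-scoring next token
        if tok == EOS:                                  # stop at end-of-sequence
            break
        out.append(tok)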
Long Short-Term Memory Network (LSTM)

• A Long Short-Term Memory (LSTM) network is a type of recurrent neural network (RNN) that
is capable of handling long-term dependencies in sequential data. LSTM is widely used in
natural language processing, speech recognition, and other applications.
• The basic architecture of an LSTM network consists of a sequence input layer, one or more
LSTM layers, and a sequence output layer.
• The LSTM layer is the core of the network and is responsible for learning the long-term
dependencies in the input sequence. It has a unique structure that allows it to selectively
remember or forget information from previous time steps.
• The basic unit of an LSTM network is the LSTM cell. An LSTM cell contains four main
components:
1) Forget gate: The forget gate decides what information to discard from the previous state
of the cell. This is done by assigning a value between 0 and 1 to each element of the
previous state. A value of 1 means to keep the information, and a value of 0 means to
discard it.
2) Input gate: The input gate decides what information to add to the cell state. This is done
by assigning a value between 0 and 1 to each element of the input data. A value of 1
means to add the information to the cell state, and a value of 0 means to ignore it.
3) Cell state: The cell state is the memory of the LSTM cell. It is a vector of values that
represents the information that the cell has learned over time.
4) Output gate: The output gate decides what information to output from the cell state. This
is done by assigning a value between 0 and 1 to each element of the cell state. A value of
1 means to output the information, and a value of 0 means to suppress it.
• The input data is first passed through the forget gate, which decides what information to
discard from the previous state of the cell. The input data is then passed through the input
gate, which decides what information to add to the cell state. The cell state is then updated,
and finally, the output gate decides what information to output from the cell state.
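A hedged NumPy sketch of one LSTM step may help tie the four components together; the sigmoid activations produce exactly the 0-to-1 gate values described above. Weight names and shapes are assumptions for the example:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    hidden, inp = 3, 4
    rng = np.random.default_rng(3)
    # One weight matrix per gate, each acting on the concatenation [x_t, h_prev].
    W_f, W_i, W_g, W_o = (rng.standard_normal((hidden, inp + hidden)) * 0.1 for _ in range(4))

    def lstm_step(x_t, h_prev, c_prev):
        z = np.concatenate([x_t, h_prev])
        f = sigmoid(W_f @ z)      # forget gate: 0 = discard, 1 = keep old cell content
        i = sigmoid(W_i @ z)      # input gate: how much new information to write
        g = np.tanh(W_g @ z)      # candidate values for the cell state
        c = f * c_prev + i * g    # updated cell state (the long-term memory)
        o = sigmoid(W_o @ z)      # output gate: which parts of the cell state to expose
        h = o * np.tanh(c)        # new hidden state
        return h, c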
Recursive Neural Network and types of Recursive Neural Network
Describe Recursive Neural Networks and the types of Recursive Neural Networks. Explain their
advantages.

• Recursive Neural Networks (RvNNs) are a class of deep neural networks that can learn
detailed and structured information. With RvNN, we can get a structured prediction by
recursively applying the same set of weights on structured inputs.
• The word recursive indicates that the network is applied recursively to its own output.
• RvNNs are used when there is a need to parse an entire sentence. To calculate the parent
node's representation, we add the products of the weight matrices (W_i) and the children's
representations (C_i) and apply the transformation f:

    h = f( Σ_{i=1}^{c} W_i C_i )

where c is the number of children.
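A small illustrative NumPy sketch of this composition rule, with f = tanh and one weight matrix per child position (both choices, and the tuple-based tree encoding, are assumptions for the example):

    import numpy as np

    dim = 3
    rng = np.random.default_rng(4)
    W = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(2)]  # one matrix per child position

    def compose(children):
        """Parent representation: sum of W_i @ C_i over the children, then tanh."""
        return np.tanh(sum(W[i] @ c for i, c in enumerate(children)))

    def encode(node):
        """Recursively encode a tree given as nested (left, right) tuples with vector leaves."""
        if isinstance(node, np.ndarray):   # leaf: already a vector representation
            return node
        return compose([encode(child) for child in node])

    leaf = lambda: rng.standard_normal(dim)
    root = encode((leaf(), (leaf(), leaf())))   # a tiny binary parse tree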


• Working Mechanism of Recursive Neural Networks:
o Input Processing: The input data, such as a sentence or a syntactic tree, is transformed
into a vector representation.
o Recursive Application: The RvNN recursively applies its parameters to the input
vector, traversing the hierarchical structure of the data.
o Information Aggregation: At each level of the hierarchy, the RvNN combines
information from its child nodes, capturing the context and relationships between
them.
o Output Generation: The final representation of the input structure is generated at the
root node of the hierarchy.
Advantages of Recursive Neural Networks:
- RvNNs effectively capture hierarchical relationships within structured data, making them
suitable for NLP tasks.
- Sharing parameters across substructures reduces the number of parameters required and
improves generalization.
- RvNNs can handle inputs with varying lengths and structures, making them versatile for
various NLP applications.
Applications of Recursive Neural Networks:
- Sentiment Analysis: Determining the sentiment expressed in a text, such as positive, negative,
or neutral.
- Semantic Composition: Combining the meanings of individual words to derive the meaning
of a phrase or sentence.
- Sentence Parsing: Identifying the grammatical structure of a sentence, breaking it down into
its constituent parts and their relationships.
- Opinion Mining: Extracting opinions and sentiments from text data, such as customer reviews
or social media posts.

Types of Recursive Neural Networks


1) Tree-Structured Recursive Neural Networks (TreeRNNs):
• Child-Sum TreeRNN: In this model, the representation of a parent node is the sum of the
representations of its children. This is achieved by applying a shared weight matrix to the
concatenation of the child representations.
• N-ary TreeRNN: This extension of the Child-Sum TreeRNN allows for more than two
children per node. It considers the representations of all children and combines them
using a weight matrix.
2) Graph-Structured Recursive Neural Networks (GraphRNNs):
• Graph Recursive Neural Network (GraphRNN): This model is designed to operate on
general graph structures. It recursively applies a neural network function to combine the
representations of neighboring nodes in the graph.
• Message Passing Neural Network (MPNN): This is a general framework for graph-
structured data, where nodes exchange messages with their neighbors in an iterative
manner. It involves message passing and updating node representations based on
aggregated information from neighboring nodes.
These types of Recursive Neural Networks are particularly useful for tasks involving structured
data, such as parsing sentences, analyzing parse trees, processing molecules in chemistry, or any
other scenario where the input has a hierarchical or graph-like structure.
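As an illustration of the message-passing idea behind GraphRNNs and MPNNs, here is a minimal NumPy sketch; the adjacency list, weights, and mean aggregation are all assumptions for the example:

    import numpy as np

    dim = 4
    rng = np.random.default_rng(5)
    W_msg, W_upd = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(2))

    # Toy graph: adjacency list over 3 nodes, plus an initial feature vector per node.
    neighbors = {0: [1, 2], 1: [0], 2: [0]}
    h = {n: rng.standard_normal(dim) for n in neighbors}

    for _ in range(2):                      # two rounds of message passing
        messages = {
            n: np.mean([W_msg @ h[m] for m in neighbors[n]], axis=0)
            for n in neighbors
        }
        # Update each node from its own state plus the aggregated neighbor message.
        h = {n: np.tanh(W_upd @ h[n] + messages[n]) for n in neighbors}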
Differentiate between Recurrent Neural Network and Recursive Neural Network with an
appropriate diagram.

Recurrent Neural Network (RNN):

• Architecture: RNNs are designed to handle sequential data and have connections that
form cycles, allowing them to maintain a hidden state or memory of previous inputs.
• Usage: RNNs are commonly used for tasks involving sequences, such as natural language
processing (NLP), time series analysis, and speech recognition.
• Key Feature: The hidden state of an RNN allows it to capture information from previous
time steps and use it in the processing of the current input.

Recursive Neural Network (RvNN):

• Architecture: RvNNs, on the other hand, have a hierarchical structure where a neural
network is applied recursively to a nested structure (e.g., a tree).
• Usage: RvNNs are often used in tasks where the input data has a recursive or hierarchical
structure, such as parsing sentences or analyzing hierarchical relationships in data.
• Key Feature: The recursive structure of RvNNs allows them to capture hierarchical
representations of data, making them suitable for problems where the relationships
between elements have a nested or tree-like structure.
