Steps For Training A Recurrent Neural Network: Advantages
Given below are a few steps for training a recurrent neural network.
1. The initial input is sent to the input layer, with all units having the same weights and activation
function.
2. Using the current input and the previous state output, the current state is calculated.
3. The current state h_t then becomes h_{t-1} for the next time step.
4. This recurrence repeats for as many time steps as the problem requires, combining the information
from all the previous steps.
5. The final output is then calculated from the state at the last time step, which carries information
from all the previous steps.
6. An error is then generated by calculating the difference between the actual output and the output
produced by the RNN model.
7. Finally, backpropagation occurs, wherein the error is propagated back through time to update the
weights; both the forward computation and this backward pass are sketched in the examples below.
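As a concrete illustration of steps 1 to 6, the following is a minimal sketch of the forward computation in Python/NumPy. The layer sizes, random weights, and squared-error loss are assumptions made only for the example, not part of any particular model:

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size, T = 3, 5, 2, 4

W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input-to-hidden weights
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights, shared across steps
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1  # hidden-to-output weights

xs = rng.standard_normal((T, input_size))    # the input sequence (step 1)
target = rng.standard_normal(output_size)    # the actual output used in step 6

h = np.zeros(hidden_size)                    # h_{t-1} starts at zero
for x_t in xs:
    # steps 2-4: current state from the current input and the previous state
    h = np.tanh(W_xh @ x_t + W_hh @ h)

y = W_hy @ h                                 # step 5: output computed from the final state
error = 0.5 * np.sum((y - target) ** 2)      # step 6: error between actual and predicted output
print("loss:", error)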
The forward pass of an RNN is the same as that of a standard feedforward network with a single hidden
layer, except that activations arrive at the hidden layer from both the current external input and the
hidden layer activations one step back in time.
1. For the backward pass (backpropagation through time), the complete sequence of delta terms can be
calculated by starting at t = T and working backwards through the recursion, decrementing t at each step.
2. Note that δ_j^(T+1) = 0 for all j, since no error is received from beyond the end of the sequence.
3. Finally, bearing in mind that the weights to and from each unit in the hidden layer are the same at
every time step, we sum over the whole sequence to get the derivatives with respect to each of the
network weights; a code sketch of this backward recursion follows.
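As referenced in point 3 above, here is a minimal sketch of the backward recursion in Python/NumPy for the simple network from the earlier example (tanh hidden units, output only at the final step); the sizes and initialisation are again illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
I, H, K, T = 3, 5, 2, 4                          # input, hidden, output sizes and sequence length
W_xh = rng.standard_normal((H, I)) * 0.1
W_hh = rng.standard_normal((H, H)) * 0.1
W_hy = rng.standard_normal((K, H)) * 0.1
xs = rng.standard_normal((T, I))
target = rng.standard_normal(K)

# forward pass, storing every hidden state for the backward recursion
hs = [np.zeros(H)]
for x_t in xs:
    hs.append(np.tanh(W_xh @ x_t + W_hh @ hs[-1]))
dy = W_hy @ hs[-1] - target                      # derivative of the squared error w.r.t. the output

# backward pass: start at t = T with delta^(T+1) = 0 and decrement t
dW_xh, dW_hh = np.zeros_like(W_xh), np.zeros_like(W_hh)
dW_hy = np.outer(dy, hs[-1])                     # the output weights see only the final state
delta_next = np.zeros(H)                         # delta_j^(T+1) = 0 for all j (point 2)
for t in reversed(range(T)):
    h_t, h_prev = hs[t + 1], hs[t]
    from_output = W_hy.T @ dy if t == T - 1 else np.zeros(H)   # output error enters at the last step
    # delta_t combines error from the output layer and delta_(t+1) through the
    # recurrent weights, scaled by the derivative of tanh (point 1)
    delta = (from_output + W_hh.T @ delta_next) * (1.0 - h_t ** 2)
    dW_xh += np.outer(delta, xs[t])              # sum contributions over the whole sequence (point 3)
    dW_hh += np.outer(delta, h_prev)
    delta_next = delta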
For many sequence labeling tasks, we would also like to have access to future as well as past context.
A. Prediction problems
RNNs are generally useful for sequence prediction problems. Sequence prediction problems come in many
forms and are best described by the types of inputs and outputs they involve.
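As an illustration of how input and output types characterise these problems, here is a minimal sketch (assuming PyTorch and made-up dimensions) contrasting a many-to-one setup, with one prediction per sequence, and a many-to-many setup, with one prediction per time step:

import torch
import torch.nn as nn

batch, T, features, hidden, classes = 8, 20, 16, 32, 4
x = torch.randn(batch, T, features)              # a batch of input sequences

rnn = nn.RNN(features, hidden, batch_first=True)
outputs, h_last = rnn(x)                         # outputs: (batch, T, hidden); h_last: (1, batch, hidden)

head = nn.Linear(hidden, classes)
per_sequence = head(h_last[-1])                  # many-to-one: one prediction per sequence
per_step = head(outputs)                         # many-to-many: one prediction per time step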
The problem with recurrent neural networks was that they were traditionally difficult to train. The
Long Short-Term Memory, or LSTM, network is one of the most successful RNN architectures because it
solves the problems of training a recurrent network and in turn has been used on a wide range of
applications. RNNs and LSTMs have had the most success when working with sequences of words and
paragraphs, generally in the field of natural language processing (NLP). They are also used as
generative models that produce a sequence as output, not only with text, but in applications such as
generating handwriting.
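As a minimal sketch of how an LSTM is typically wired for word or character sequences, the following uses PyTorch and a next-token objective; the vocabulary size, dimensions, and random tokens are illustrative assumptions, not any particular published model:

import torch
import torch.nn as nn

vocab, embed_dim, hidden, batch, T = 1000, 64, 128, 8, 30

class TinyLSTMLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed_dim)        # token ids -> vectors
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)                # scores over the vocabulary

    def forward(self, tokens):
        states, _ = self.lstm(self.embed(tokens))
        return self.out(states)                            # one prediction per time step

model = TinyLSTMLanguageModel()
tokens = torch.randint(0, vocab, (batch, T))
logits = model(tokens)
# train to predict token t+1 from the prefix up to t (a generative objective)
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, vocab),
                                   tokens[:, 1:].reshape(-1))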
C. Machine Translation
RNNs in one form or another can be used for translating text from one language to another. Almost all
of the translation systems in use today use some advanced version of an RNN. The input is text in the
source language and the output is text in the target language that the user wants.
Currently, one of the most popular and prominent machine translation applications is Google Translate.
There are also numerous custom recurrent neural network applications used by various platforms to
refine and confine content. E-commerce platforms like Flipkart, Amazon, and eBay make use of machine
translation in many areas, and it also helps with the efficiency of their search results.
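The encoder-decoder pattern described above (source-language text in, target-language text out) can be sketched as follows; this is a minimal illustration assuming PyTorch, GRU layers, and made-up vocabulary sizes, not how any production system such as Google Translate is built:

import torch
import torch.nn as nn

src_vocab, tgt_vocab, embed_dim, hidden = 1000, 1200, 64, 128

class TinySeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_tokens, tgt_tokens):
        _, state = self.encoder(self.src_embed(src_tokens))      # summarise the source sentence
        dec_states, _ = self.decoder(self.tgt_embed(tgt_tokens), state)
        return self.out(dec_states)                              # per-step scores over the target vocabulary

model = TinySeq2Seq()
src = torch.randint(0, src_vocab, (4, 12))   # token ids in the source language
tgt = torch.randint(0, tgt_vocab, (4, 10))   # token ids in the target language
scores = model(src, tgt)                     # shape (4, 10, tgt_vocab)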
D. Speech Recognition
RNNs can be used for predicting phonetic segments, taking sound waves from a medium as the input
source. The set of inputs consists of phonemes or acoustic signals from an audio clip, processed in a
suitable manner and taken as inputs. The RNN computes the phonemes and then produces a phonetic
segment along with the likelihood of each output. The steps used in speech recognition are as follows:
• The input data is first processed and recognized through a neural network. The result consists of a
varied collection of input sound waves.
• The information contained in the sound waves is further classified by intent and through keywords
related to the query.
• The input sound waves are then classified into phonetic segments and pieced together into cohesive
words using an RNN. The output consists of a pattern of phonetic segments put together into a single
whole in a logical manner (a minimal code sketch follows).
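A minimal sketch of the phoneme-prediction step, assuming PyTorch, made-up acoustic feature dimensions, and a simple per-frame softmax in place of the full pipeline above:

import torch
import torch.nn as nn

n_features, hidden, n_phonemes = 40, 128, 45       # e.g. filterbank features per audio frame
batch, frames = 4, 200

acoustic = torch.randn(batch, frames, n_features)              # processed sound-wave frames
rnn = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
classifier = nn.Linear(2 * hidden, n_phonemes)

frame_states, _ = rnn(acoustic)                                # (batch, frames, 2 * hidden)
phoneme_scores = classifier(frame_states)                      # per-frame scores over the phoneme set
likelihoods = phoneme_scores.softmax(dim=-1)                   # likelihood of each phonetic segment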
Image recognition is one of the major applications of computer vision. It is also one of the most
accessible forms of RNN application to explain. At its core, the algorithm is designed to take one
image as input and produce a description of that image in the form of multiple groups of output.
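A minimal sketch of this image-to-description pattern, assuming PyTorch, a precomputed image feature vector (in practice produced by a convolutional network), a hypothetical start-of-sentence token id of 0, and made-up sizes:

import torch
import torch.nn as nn

feat_dim, hidden, vocab, embed_dim = 512, 256, 1000, 64

image_feature = torch.randn(1, feat_dim)             # one image, already encoded as a feature vector
to_state = nn.Linear(feat_dim, hidden)                # map image features to the initial RNN state
embed = nn.Embedding(vocab, embed_dim)
decoder = nn.GRU(embed_dim, hidden, batch_first=True)
word_scores = nn.Linear(hidden, vocab)

state = to_state(image_feature).unsqueeze(0)          # shape (1, batch=1, hidden)
token = torch.tensor([[0]])                           # assumed start-of-sentence id
caption = []
for _ in range(10):                                   # greedily generate up to 10 description words
    out, state = decoder(embed(token), state)
    token = word_scores(out[:, -1]).argmax(dim=-1, keepdim=True)
    caption.append(int(token))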
Disadvantages
• Training time: Training an RNN is known to be very slow. For example, the RNNs used in the
experiments described in Mikolov (2010) took several weeks of training, although the authors
considered only about 17% of the NYT section of English Gigaword for training. Usually it takes about
10–50 training epochs to achieve convergence, although cases have been reported where even thousands
of epochs were needed. In addition, the size of the vocabulary |V|, which for many language and speech
applications is usually very large, plays a crucial role in the real complexity of the training.
• Fixed number of hidden neurons: The number of hidden neurons n_H has to be fixed in advance.
However, in practice the user has little guidance on how to choose an appropriate number of hidden
neurons, since there is no generally accepted method that determines this number. Many rules of thumb
are available, but these rules can give very different values for n_H.
• Small context size in practice: Although in theory the context size that can be taken into account
is unlimited (in principle the history of words taken into account includes all previous words
relative to the current word), the range of context that is actually accessed is quite limited. This
observation is often referred to as the vanishing gradient problem, illustrated numerically below.
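A small numerical illustration of this effect, as a sketch assuming a scalar linear recurrence with a recurrent weight below one; real networks behave analogously through repeated products of Jacobians:

# contribution of an input t steps back shrinks roughly like w**t
w = 0.9
grad = 1.0
for t in range(1, 51):
    grad *= w
    if t in (5, 20, 50):
        print(f"influence of a word {t} steps back: {grad:.2e}")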
References
[1]. Richard Socher, "Recurrent Neural Networks", CS224d: Deep NLP, Stanford University.
[2]. Alex Graves, "Supervised Sequence Labelling with Recurrent Neural Networks".