DL Notes
1. What is data mining and data warehousing? Explain the primary methodologies of data
mining.
2. Describe prescriptive analytics and its steps in the business analytics process.
3. Discuss Qualitative and Judgmental Forecasting and statistical forecasting models.
4. Explain different forecasting models for stationary time series and regression forecasting
with causal variables.
5. Write about the Monte Carlo simulation model and the cash budget model.
6. Explain decision strategies, decision trees, and decision theory model elements.
7. Write about the value of information, utility, and decision making.
8. Discuss collaborative business intelligence, data storytelling, and data journalism.
1. List and explain applications that can be modeled using RNNs
Recurrent neural networks (RNNs) are the state-of-the-art algorithm for sequential data and are used
by Apple's Siri and Google's voice search. The RNN is the first algorithm that remembers its input, thanks to an
internal memory, which makes it perfectly suited for machine learning problems that involve
sequential data. It is one of the algorithms behind the scenes of the amazing achievements seen
in deep learning over the past few years. These notes cover the basic concepts of how
recurrent neural networks work, what the biggest issues are and how to solve them.
RNNs are a powerful and robust type of neural network, and they belong to the most promising
algorithms in use because they are among the few with an internal memory.
Like many other deep learning algorithms, recurrent neural networks are relatively old. They were
initially created in the 1980s, but only in recent years have we seen their true potential. An
increase in computational power, the massive amounts of data that we now have to
work with, and the invention of long short-term memory (LSTM) in the 1990s have really brought
RNNs to the foreground.
Because of their internal memory, RNNs can remember important things about the input they
received, which allows them to be very precise in predicting what’s coming next. This is why
they're the preferred algorithm for sequential data like time series, speech, text, financial data,
audio, video, weather and much more. Recurrent neural networks can form a much deeper
understanding of a sequence and its context compared to other algorithms.
WHAT IS A RECURRENT NEURAL NETWORK (RNN)?
Recurrent neural networks (RNN) are a class of neural networks that are helpful in modeling
sequence data. Derived from feedforward networks, RNNs exhibit similar behavior to how human
brains function. Simply put: recurrent neural networks produce predictive results in sequential data
that other algorithms can’t.
But when do you need to use a RNN?
“Whenever there is a sequence of data and that temporal dynamics that connects the data is more
important than the spatial content of each individual frame.” – Lex Fridman (MIT)
Since RNNs are being used in the software behind Siri and Google Translate, recurrent neural
networks show up a lot in everyday life.
How Recurrent Neural Networks Work
To understand RNNs properly, you'll need a working knowledge of "normal" feed-forward neural
networks and sequential data.
Sequential data is basically just ordered data in which related things follow each other. Examples
are financial data or the DNA sequence. The most popular type of sequential data is perhaps time
series data, which is just a series of data points that are listed in time order.
RNN VS. FEED-FORWARD NEURAL NETWORKS
RNN’s and feed-forward neural networks get their names from the way they channel information.
In a feed-forward neural network, the information only moves in one direction — from the input
layer, through the hidden layers, to the output layer. The information moves straight through the
network and never touches a node twice.
Feed-forward neural networks have no memory of the input they receive and are bad at predicting
what’s coming next. Because a feed-forward network only considers the current input, it has no
notion of order in time. It simply can’t remember anything about what happened in the past except
its training.
In a RNN the information cycles through a loop. When it makes a decision, it considers the current
input and also what it has learned from the inputs it received previously.
The two images below illustrate the difference in information flow between a RNN and a feed-
forward neural network.
A usual RNN has a short-term memory. In combination with a LSTM they also have a long-term
memory (more on that later).
Another good way to illustrate the concept of a recurrent neural network's memory is to explain it
with an example:
Imagine you have a normal feed-forward neural network and give it the word "neuron" as an input
and it processes the word character by character. By the time it reaches the character "r," it has
already forgotten about "n," "e" and "u," which makes it almost impossible for this type of neural
network to predict which character would come next.
A recurrent neural network, however, is able to remember those characters because of its internal
memory. It produces output, copies that output and loops it back into the network.
Simply put: recurrent neural networks add the immediate past to the present.
Therefore, a RNN has two inputs: the present and the recent past. This is important because the
sequence of data contains crucial information about what is coming next, which is why a RNN can
do things other algorithms can’t.
A feed-forward neural network assigns, like all other deep learning algorithms, a weight matrix to
its inputs and then produces the output. Note that RNNs apply weights to the current and also to
the previous input. Furthermore, a recurrent neural network will also tweak the weights for both
through gradient descent and backpropagation through time (BPTT).
TYPES OF RNNS
One to One
One to Many
Many to One
Many to Many
Also note that while feed-forward neural networks map one input to one output, RNNs can map
one to many, many to many (translation) and many to one (classifying a voice).
BPTT is basically just a fancy buzzword for doing backpropagation on an unrolled RNN. Unrolling is
a visualization and conceptual tool, which helps you understand what's going on within the network.
Most of the time when implementing a recurrent neural network in the common programming
frameworks, backpropagation is automatically taken care of, but you need to understand how it
works to troubleshoot problems that may arise during the development process.
You can view a RNN as a sequence of neural networks that you train one after another with
backpropagation.
The image below illustrates an unrolled RNN. On the left, the RNN is unrolled after the equal sign.
Note there is no cycle after the equal sign since the different time steps are visualized and
information is passed from one time step to the next. This illustration also shows why a RNN can be
seen as a sequence of neural networks.
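As a concrete sketch of these ideas, the following minimal NumPy example (purely illustrative; the names W_x, W_h, rnn_step and all the sizes are made-up assumptions, not from any library) applies one weight matrix to the current input and another to the previous hidden state, and unrolls the recurrence over a short sequence. Backpropagating through this unrolled loop is BPTT, and a many-to-one model (such as a sequence classifier) would simply read the final hidden state.

import numpy as np

# Toy dimensions, chosen only for illustration
input_size, hidden_size, seq_len = 3, 5, 4

rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))   # weights applied to the current input
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # weights applied to the previous hidden state
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # One RNN cell: combine the present input with the recent past
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Unrolling: the same cell (same weights) is applied at every time step
xs = rng.normal(size=(seq_len, input_size))   # a toy input sequence
h = np.zeros(hidden_size)                     # initial (empty) memory
for t in range(seq_len):
    h = rnn_step(xs[t], h)

print(h)  # final hidden state; a many-to-one model would classify from this vector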
Step 6 - Apply activation over the total input to calculate the output as per the equation given
below:
To determine if the network will converge to a stable configuration, we see if the energy function
reaches its minimum by:
The network is bound to converge if the activity of each neuron with respect to time is given by the following
differential equation:
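Assuming the standard discrete Hopfield network these steps describe, the referenced equations take their usual textbook form (treat the exact symbols and signs as an assumption):
yi = 1 if yin,i > θi ; yi (unchanged) if yin,i = θi ; 0 if yin,i < θi, where yin,i = xi + ∑j yj wji
Energy function: Ef = -(1/2) ∑i ∑j yi yj wij - ∑i xi yi + ∑i θi yi
Continuous-time dynamics (continuous Hopfield model): Ci dui/dt = -ui/Ri + ∑j wij g(uj) + Ii, with yi = g(ui)
Here wij are the weights, θi the thresholds, xi (or Ii) the external inputs, and g a sigmoid activation.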
4. List the different types of auto-encoders.
An autoencoder encodes the input values x using a function f and then decodes the encoded
values f(x) using a function g to create output values identical to the input values.
The autoencoder's objective is to minimize the reconstruction error between the input and output. This helps
autoencoders learn the important features present in the data. When a representation allows a good
reconstruction of its input, it has retained much of the information present in the input.
What are different types of Autoencoders?
Undercomplete Autoencoders
Undercomplete Autoencoder- Hidden layer has smaller dimension than input layer
Goal of the Autoencoder is to capture the most important features present in the data.
Undercomplete autoencoders have a smaller dimension for hidden layer compared to the
input layer. This helps to obtain important features from the data.
The objective is to minimize the loss function by penalizing g(f(x)) for being different from the
input x.
When the decoder is linear and we use a mean squared error loss function, the undercomplete
autoencoder generates a reduced feature space similar to PCA.
We get a powerful nonlinear generalization of PCA when the encoder function f and decoder
function g are nonlinear.
Undercomplete autoencoders do not need any regularization as they maximize the
probability of the data rather than copying the input to the output.
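A minimal sketch of an undercomplete autoencoder in PyTorch (purely illustrative; the layer sizes, variable names and toy data are arbitrary assumptions, not taken from the notes):

import torch
from torch import nn

# Hidden layer (8 units) is smaller than the input (32 features): undercomplete
encoder = nn.Sequential(nn.Linear(32, 8), nn.ReLU())   # f(x)
decoder = nn.Linear(8, 32)                             # g(f(x))
model = nn.Sequential(encoder, decoder)

x = torch.randn(64, 32)                 # a toy batch standing in for real data
loss = nn.MSELoss()(model(x), x)        # penalize g(f(x)) for differing from x
loss.backward()                         # gradients for minimizing reconstruction error
print(loss.item())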
Sparse Autoencoders
Sparse autoencoders take the highest activation values in the hidden layer and zero out the
rest of the hidden nodes. This prevents the autoencoder from using all of the hidden nodes at a
time and forces only a reduced number of hidden nodes to be used.
As we activate and deactivate hidden nodes for each row in the dataset, each hidden node
extracts a feature from the data.
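One common way to impose sparsity in code is an L1 penalty on the hidden activations rather than the explicit zeroing-out described above; a short PyTorch sketch of that idea (the penalty weight 1e-3 and all sizes are arbitrary assumptions):

import torch
from torch import nn

enc = nn.Linear(32, 64)          # an over-complete hidden layer is fine here
dec = nn.Linear(64, 32)

x = torch.randn(16, 32)
h = torch.relu(enc(x))           # hidden activations
recon = dec(h)

# Reconstruction error plus a sparsity penalty that pushes most activations towards zero
loss = nn.functional.mse_loss(recon, x) + 1e-3 * h.abs().mean()
loss.backward()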
Denoising Autoencoders (DAE) :
Denoising helps the autoencoders to learn the latent representation present in the data.
Denoising autoencoders ensure that a good representation is one that can be derived robustly
from a corrupted input and that will be useful for recovering the corresponding clean input.
A denoising autoencoder is a stochastic autoencoder, as we use a stochastic corruption process to set some
of the inputs to zero.
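A sketch of that stochastic corruption step in PyTorch (the corruption probability 0.3, the network and the toy data are arbitrary assumptions): randomly set some inputs to zero and train the network to reconstruct the clean input.

import torch
from torch import nn

net = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 32))

x_clean = torch.randn(16, 32)
mask = (torch.rand_like(x_clean) > 0.3).float()   # stochastic corruption process
x_noisy = x_clean * mask                          # some inputs set to zero

loss = nn.functional.mse_loss(net(x_noisy), x_clean)  # reconstruct the clean input
loss.backward()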
Contractive Autoencoders (CAE)
The contractive autoencoder (CAE) objective is to have a robust learned representation that is
less sensitive to small variations in the data.
Robustness of the representation is achieved by applying a penalty term to the loss
function. The penalty term is the Frobenius norm of the Jacobian matrix. The Frobenius norm of
the Jacobian matrix for the hidden layer is calculated with respect to the input. The Frobenius norm
of the Jacobian matrix is the sum of the squares of all its elements.
Loss function with penalty term — Frobenius norm of the Jacobian matrix
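In its standard form (assumed here), this is the reconstruction loss plus λ times the squared Frobenius norm of the Jacobian of the hidden representation h = f(x) with respect to the input:
L(x, g(f(x))) + λ ∑i ∑j (∂hj/∂xi)²
where the second term is exactly the sum of squares of all elements of the Jacobian described above.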
Contractive autoencoder is another regularization technique like sparse autoencoders and
denoising autoencoders.
CAE surpasses results obtained by regularizing the autoencoder using weight decay or by
denoising. CAE is a better choice than the denoising autoencoder for learning useful feature
extraction.
The penalty term generates a mapping that strongly contracts the data, hence the
name contractive autoencoder.
Stacked Denoising Autoencoders
Boltzmann Machine
Energy-Based Models:
Boltzmann Distribution is used in the sampling distribution of the Boltzmann Machine. The
Boltzmann distribution is governed by the equation –
Pi = e(-∈i/kT)/ ∑e(-∈j/kT)
Pi - probability of system being in state i
∈i - Energy of system in state i
T - Temperature of the system
k - Boltzmann constant
∑e(-∈j/kT) - Sum of values for all possible states of the system
Boltzmann Distribution describes different states of the system and thus Boltzmann machines
create different states of the machine using this distribution. From the above equation, as the
energy of system increases, the probability for the system to be in state ‘i’ decreases. Thus, the
system is the most stable in its lowest energy state (a gas is most stable when it spreads). Here, in
Boltzmann machines, the energy of the system is defined in terms of the weights of synapses.
Once the system is trained and the weights are set, the system always tries to find the lowest
energy state for itself by adjusting the weights.
Types of Boltzmann Machines:
Restricted Boltzmann Machines (RBMs)
Deep Belief Networks (DBNs)
Deep Boltzmann Machines (DBMs)
Restricted Boltzmann Machines (RBMs):
In a full Boltzmann machine, each node is connected to every other node and hence the number of
connections grows quadratically with the number of nodes. This is the reason we use RBMs. The restrictions on the node
connections in RBMs are as follows –
Hidden nodes cannot be connected to one another.
Visible nodes cannot be connected to one another.
Energy function example for a Restricted Boltzmann Machine –
E(v, h) = -∑i ai vi - ∑j bj hj - ∑i ∑j vi wi,j hj
ai, bj - biases of the visible and hidden nodes - constants
vi, hj - visible node, hidden node
P(v, h) - probability of being in a certain state
P(v, h) = e(-E(v, h))/Z
Z - sum of e(-E(v, h)) over all possible states (the partition function)
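A tiny NumPy sketch of this energy function (all sizes and values are arbitrary assumptions; a and b are the visible and hidden biases, W the weight matrix):

import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3            # e.g. six movies, three hidden features
a = rng.normal(size=n_visible)        # visible biases
b = rng.normal(size=n_hidden)         # hidden biases
W = rng.normal(size=(n_visible, n_hidden))

v = rng.integers(0, 2, n_visible)     # a binary visible state
h = rng.integers(0, 2, n_hidden)      # a binary hidden state

# E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_i sum_j v_i w_ij h_j
E = -a @ v - b @ h - v @ W @ h
print(E)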
Suppose that we are using our RBM to build a recommender system that works on six (6)
movies. The RBM learns how to allocate its hidden nodes to certain features. By the process
of Contrastive Divergence, we fit the RBM to our set of movies, that is, to our case or
scenario. The RBM identifies which features are important through the training process. The training data is
either 0 or 1 or missing, based on whether a user liked that movie (1), disliked that movie (0)
or did not watch the movie (missing data). The RBM automatically identifies the important features.
Contrastive Divergence:
The RBM adjusts its weights by this method. Using some randomly assigned initial weights, the RBM
calculates the hidden nodes, which in turn use the same weights to reconstruct the input nodes.
Each hidden node is constructed from all the visible nodes and each visible node is reconstructed
from all the hidden nodes; hence, the input is different from the reconstructed input, though
the weights are the same. The process continues until the reconstructed input matches the
previous input. The process is said to have converged at this stage. This entire procedure is known
as Gibbs Sampling.
Gibbs Sampling
The Gradient Formula gives the gradient of the log probability of a certain state of the system
with respect to the weights of the system. It is given as follows –
d/dwij(log(P(v0))) = <vi0 * hj0> - <vi∞ * hj∞>
v - visible state, h - hidden state
<vi0 * hj0> - initial state of the system
<vi∞ * hj∞> - final state of the system
P(v0) - probability that the system is in state v0
wij - weights of the system
The above equations tell us how a change in the weights of the system will change the log
probability of the system being in a particular state. The system tries to end up in the lowest
possible energy state (most stable). Instead of continuing the weight-adjustment process until
the current input matches the previous one, we can also consider only the first few passes. It is
sufficient to understand how to adjust our curve so as to get the lowest energy state. Therefore,
we adjust the weights and redesign the system and energy curve such that we get the lowest energy
for the current position. This is known as Hinton's shortcut.
Hinton’s Shortcut
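A minimal sketch of one contrastive-divergence (CD-1) update in NumPy, illustrating the single Gibbs reconstruction of Hinton's shortcut (the sizes, learning rate, and the use of probabilities instead of binary samples are simplifying assumptions):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
a = np.zeros(n_visible)               # visible biases
b = np.zeros(n_hidden)                # hidden biases

v0 = rng.integers(0, 2, n_visible).astype(float)   # an observed visible vector

# Positive phase: hidden activations computed from the data
h0 = sigmoid(v0 @ W + b)

# One Gibbs step: reconstruct the visible units, then the hidden units again
v1 = sigmoid(h0 @ W.T + a)
h1 = sigmoid(v1 @ W + b)

# CD-1 update: <v0 h0> - <v1 h1>
W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
a += lr * (v0 - v1)
b += lr * (h0 - h1)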
Working of RBM – Illustrative Example –
Consider – Mary watches four movies out of the six available movies and rates four of them. Say,
she watched m1, m3, m4 and m5 and likes m3, m5 (rated 1) and dislikes the other two, that is m1,
m4 (rated 0), whereas the other two movies – m2, m6 – are unrated. Now, using our RBM, we will
recommend one of these movies for her to watch next. Say –
m3, m5 are of the ‘Drama’ genre.
m1, m4 are of the ‘Action’ genre.
‘Dicaprio’ played a role in m5.
m3, m5 have won an ‘Oscar.’
‘Tarantino’ directed m4.
m2 is of the ‘Action’ genre.
m6 is of both the genres ‘Action’ and ‘Drama’, ‘Dicaprio’ acted in it and it has won an
‘Oscar’.
We have the following observations –
Mary likes m3, m5 and they are of genre ‘Drama,’ she probably likes ‘Drama’ movies.
Mary dislikes m1, m4 and they are of action genre, she probably dislikes ‘Action’ movies.
Mary likes m3, m5 and they have won an ‘Oscar’, she probably likes an ‘Oscar’ movie.
Since ‘Dicaprio’ acted in m5 and Mary likes it, she will probably like a movie in
which ‘Dicaprio’ acted.
Mary does not like m4, which is directed by Tarantino, so she probably dislikes any movie
directed by ‘Tarantino’.
Therefore, based on the observations and the details of m2, m6, our RBM recommends m6 to
Mary (‘Drama’, ‘Dicaprio’ and ‘Oscar’ match both Mary’s interests and m6). This is how an RBM
works and hence it is used in recommender systems.
Working of RBM
Thus, RBMs are used to build Recommender Systems.
Deep Belief Networks (DBNs):
Suppose we stack several RBMs on top of each other so that the first RBM outputs are the input
to the second RBM and so on. Such networks are known as Deep Belief Networks. The
connections within each layer are undirected (since each layer is an RBM), while those
in between the layers are directed (except the top two layers – the connection between the top
two layers is undirected). There are two ways to train DBNs –
1. Greedy Layer-wise Training Algorithm – The RBMs are trained layer by layer. Once the
individual RBMs are trained (that is, the parameters – weights, biases are set), the
direction is set up between the DBN layers.
2. Wake-Sleep Algorithm – The DBN is trained all the way up (connections going up – wake)
and then down the network (connections going down — sleep).
Therefore, we stack the RBMs, train them, and once we have the parameters trained, we make
sure that the connections between the layers only work downwards (except for the top two
layers).
Deep Boltzmann Machines (DBMs):
DBMs are similar to DBNs except that apart from the connections within layers, the connections
between the layers are also undirected (unlike DBN in which the connections between layers are
directed). DBMs can extract more complex or sophisticated features and hence can be used for
more complex tasks.
6. Discuss sparse coding and computer vision.
Sparse Coding: Sparse coding is a class of unsupervised methods for learning sets of over-complete
bases to represent data efficiently. Sparse coding aims to find a set of basis vectors ϕi such that we
can represent an input vector x as a linear combination of these basis vectors:
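In the standard formulation (assumed here), this means finding coefficients ai such that
x ≈ ∑i=1..k ai ϕi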
While techniques such as Principal Component Analysis (PCA) allow us to learn a complete set of
basis vectors efficiently, we wish to learn an over-complete set of basis vectors to represent input
vectors x∈Rn (i.e. such that k>n). The advantage of having an over-complete basis is that our basis
vectors are better able to capture structures and patterns inherent in the input data. However, with
an over-complete basis, the coefficients ai are no longer uniquely determined by the input vector x.
Therefore, in sparse coding, we introduce the additional criterion of sparsity to resolve the
degeneracy introduced by over-completeness. Here, we define sparsity as having few non-zero
components or having few components not close to zero. The requirement that our coefficients ai be
sparse means that, given an input vector, we would like as few of our coefficients to be far from zero
as possible. The choice of sparsity as a desired characteristic of our representation of the input data
can be motivated by the observation that most sensory data such as natural images may be
described as the superposition of a small number of atomic elements such as surfaces or edges.
Other justifications such as comparisons to the properties of the primary visual cortex have also
been advanced.
We define the sparse coding cost function on a set of m input vectors as
where S(.) is a sparsity cost function that penalizes ai for being far from zero. We can interpret the
first term of the sparse coding objective as a reconstruction term that tries to force the algorithm to
provide a good representation of x and the second term as a sparsity penalty which forces our
representation of x to be sparse. The constant λ is a scaling constant to determine the relative
importance of these two contributions.
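Written out in its standard form (assumed here), the cost over the m input vectors x(1), ..., x(m) is
minimize over a, ϕ: ∑j=1..m [ || x(j) − ∑i=1..k ai(j) ϕi ||² + λ ∑i=1..k S(ai(j)) ]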
Although the most direct measure of sparsity is the "L0" norm, S(ai) = 1(|ai| > 0), it is non-
differentiable and difficult to optimize in general. In practice, common choices for the sparsity cost
S(.) are the L1 penalty S(ai) = |ai| and the log penalty S(ai) = log(1 + ai²).
Also, it is possible to make the sparsity penalty arbitrarily small by scaling down ai and scaling ϕi up
by some large constant. To prevent this from happening, we will constrain ||ϕi||² to be less than
some constant C. The full sparse coding cost function including our constraint on ϕ is:
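In the standard formulation (assumed here), this adds the norm constraint to the objective above:
minimize over a, ϕ: ∑j=1..m [ || x(j) − ∑i=1..k ai(j) ϕi ||² + λ ∑i=1..k S(ai(j)) ] subject to ||ϕi||² ≤ C for all i = 1, ..., k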
Computer vision has traditionally been one of the most active research areas for deep learning
applications, because vision is a task that is effortless for humans and many animals but challenging
for computers (Ballard et al., 1983). Many of the most popular standard benchmark tasks for deep
learning algorithms are forms of object recognition or optical character recognition. Computer
vision is a very broad field encompassing a wide variety of ways of processing images, and an
amazing diversity of applications. Applications of computer vision range from reproducing human
visual abilities, such as recognizing faces, to creating entirely new categories of visual abilities, such
as computer vision tasks involving repairing defects in images or removing objects from images.
Preprocessing
Many application areas require sophisticated preprocessing because the original input comes in a
form that is difficult for many deep learning architectures to represent. Computer vision usually
requires relatively little of this kind of preprocessing. The images should be standardized so that
their pixels all lie in the same, reasonable range, like [0,1] or [-1, 1]. Mixing images that lie in [0,1]
with images that lie in [0, 255] will usually result in failure. Formatting images to have the same
scale is the only kind of preprocessing that is strictly necessary. Many computer vision architectures
require images of a standard size, so images must be cropped or scaled to fit that size. However,
even this rescaling is not always strictly necessary. Some convolutional models accept variably-sized
inputs and dynamically adjust the size of their pooling regions to keep the output size constant.
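A short illustrative sketch of this kind of preprocessing using Pillow and NumPy (the target size 224x224, the [-1, 1] range, and the function name are arbitrary assumptions):

import numpy as np
from PIL import Image

def preprocess(path, size=(224, 224)):
    # Resize to a standard size, then scale pixels from [0, 255] to [-1, 1]
    img = Image.open(path).convert("RGB").resize(size)
    arr = np.asarray(img, dtype=np.float32) / 255.0   # now in [0, 1]
    return arr * 2.0 - 1.0                            # now in [-1, 1]

# x = preprocess("some_image.jpg")  # hypothetical file name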
Contrast Normalization
One of the most obvious sources of variation that can be safely removed for many tasks is the
amount of contrast in the image. Contrast simply refers to the magnitude of the difference
between the bright and the dark pixels in an image. There are many ways of quantifying the
contrast of an image. In the context of deep learning, contrast usually refers to the standard
deviation of the pixels in an image or region of an image.
Global contrast normalization will often fail to highlight image features we would like to stand out,
such as edges and corners. If we have a scene with a large dark area and a large bright area (such as
a city square with half the image in the shadow of a building) then global contrast normalization will
ensure there is a large difference between the brightness of the dark area and the brightness of the
light area. It will not, however, ensure that edges within the dark region stand out. This motivates
local contrast normalization. Local contrast normalization ensures that the contrast is normalized
across each small window, rather than over the image as a whole.
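A sketch of global contrast normalization in NumPy (subtract the mean, then divide by the standard deviation of the pixels; the scale s and epsilon are assumed constants). Local contrast normalization applies the same idea per small window instead of per image.

import numpy as np

def global_contrast_normalize(X, s=1.0, eps=1e-8):
    # X: one image as an array; contrast here = standard deviation of its pixels
    X = X - X.mean()
    return s * X / max(np.sqrt((X ** 2).mean()), eps)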
Dataset Augmentation
It is easy to improve the generalization of a classifier by increasing the size of the training set by
adding extra copies of the training examples that have been modified with transformations that do
not change the class. Object recognition is a classification task that is
especially amenable to this form of dataset augmentation because the class is invariant to so many
transformations and the input can be easily transformed with many geometric operations. As
described before, classifiers can benefit from random translations, rotations, and in some cases,
flips of the input to augment the dataset. In specialized computer vision applications, more
advanced transformations are commonly used for dataset augmentation.
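A sketch using torchvision's transform utilities (the specific transforms and their parameters are illustrative choices, not requirements):

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),        # flips, where the class is invariant to them
    transforms.RandomRotation(degrees=10),    # small random rotations
    transforms.RandomCrop(28, padding=2),     # random translations via padded cropping
    transforms.ToTensor(),
])
# augmented = augment(img)  # img would be a PIL image from the training set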
7. Discuss the features of the TensorFlow, Caffe, Theano, and Torch tools.
Theano: Theano is a Python library that lets you define, optimize, and evaluate mathematical
expressions, especially ones with multi-dimensional arrays (NumPy.ndarray). Using Theano it is
possible to attain speeds rivaling handcrafted C implementations for problems involving large
amounts of data. It can also surpass C on a CPU by many orders of magnitude by taking advantage of
recent GPUs.
Theano combines aspects of a computer algebra system (CAS) with aspects of an optimizing
compiler. It can also generate customized C code for many mathematical operations. This
combination of CAS with optimizing compilation is particularly useful for tasks in which complicated
mathematical expressions are evaluated repeatedly and evaluation speed is critical. For situations
where many different expressions are each evaluated once Theano can minimize the amount of
compilation/analysis overhead, but still provide symbolic features such as automatic differentiation.
Theano’s compiler applies many optimizations of varying complexity to these symbolic expressions.
These optimizations include, but are not limited to:
- Use of GPU for computations
- Constant folding
- Merging of similar subgraphs, to avoid redundant calculations
- Arithmetic simplification (e.g. x*y/x -> y, --x -> x)
- Inserting efficient BLAS operations (e.g. GEMM) in a variety of contexts
- Using memory aliasing to avoid calculation
- Using in-place operations wherever it does not interfere with aliasing
- Loop fusion for elementwise sub-expressions
- Improvements to numerical stability (e.g. log(1+exp(x)) and log(sum_i exp(x[i])))
Theano was written at the LISA lab to support the rapid development of efficient machine learning
algorithms. Theano is named after the Greek mathematician, who may have been Pythagoras’ wife.
Theano is released under a BSD license.
Here is an example of how to use Theano. It doesn’t show off many of Theano’s features, but it
illustrates concretely what Theano is.
import theano
from theano import tensor
# declare two symbolic floating-point scalars
a = tensor.dscalar()
b = tensor.dscalar()
# create a simple expression
c = a + b
# convert the expression into a callable object that takes (a, b)
# values as input and computes a value for c
f = theano.function([a, b], c)
# bind 1.5 to 'a', 2.5 to 'b', and evaluate 'c'
assert 4.0 == f(1.5, 2.5)
Theano is not a programming language in the normal sense because you write a program in Python
that builds expressions for Theano. Still, it is like a programming language in the sense that you have
to
- Declare variables (a,b) and give their types
- Build expressions for how to put those variables together
- Compile expression graphs to functions to use them for computation
It is good to think of theano.function as the interface to a compiler which builds a callable object
from a purely symbolic graph. One of Theano’s most important features is that theano.function can
optimize a graph and even compile some or all of it into native machine instructions.
- execution speed optimizations: Theano can use g++ or nvcc to compile parts of your expression
graph into CPU or GPU instructions, which run much faster than pure Python.
- symbolic differentiation: Theano can automatically build symbolic graphs for computing gradients.
- stability optimizations: Theano can recognize [some] numerically unstable expressions and
compute them with more stable algorithms.
The closest Python package to Theano is SymPy. Theano focuses more on tensor expressions than
SymPy and has more machinery for compilation. Sympy has more sophisticated algebra rules and
can handle a wider variety of mathematical operations (such as series, limits, and integrals).
Procedure:
-Import necessary libraries
- First load the required dataset
- Do the necessary preprocessing as our dataset is in CSV form, so convert the dataset into matrix
form
- Now divide the data into X and Y part
- First, pass the images through a sparse coder and convert the images into a sparse representation
- Now build a 2-layer neural network with 10 neurons in the output layer, as we are doing digit
recognition
- Now pass the sparse-encoded images through the network and note down the cost at each
iteration
- Now draw a graph between the number of iterations and the cost function
- If the cost function decreases at each iteration, then our model is doing well
· Since the cost decreases at each iteration, our model is correct.
· We need more iterations for the cost to approach zero.
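A compact, hedged sketch of the training loop described above, in PyTorch; random tensors stand in for the sparse-coded images and digit labels, and the sizes, learning rate, and iteration count are arbitrary assumptions:

import torch
from torch import nn

X = torch.randn(256, 100)                 # stand-in for sparse-coded images
Y = torch.randint(0, 10, (256,))          # stand-in for digit labels 0-9

# 2-layer network with 10 output neurons for digit recognition
model = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

costs = []
for it in range(50):                      # note the cost at each iteration
    opt.zero_grad()
    loss = loss_fn(model(X), Y)
    loss.backward()
    opt.step()
    costs.append(loss.item())

# If the costs list decreases from iteration to iteration, the model is learning
print(costs[0], costs[-1])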
TensorFlow Example
Google wants to use machine learning to take advantage of their massive datasets to give users the
best experience. Three different groups use machine learning:
Researchers
Data Scientists
Programmers
They can all use the same toolset to collaborate with each other and improve their efficiency.
Google does not just have any data; they have the world's most massive computer, so TensorFlow
was built to scale. TensorFlow is a library developed by the Google Brain Team to accelerate
machine learning and deep neural network research.
It was built to run on multiple CPUs or GPUs and even mobile operating systems, and it has several
wrappers in several languages like Python, C++ or Java.
History of TensorFlow
A couple of years ago, deep learning started to outperform all other machine learning algorithms
when given a massive amount of data. Google saw it could use these deep neural networks to
improve its services:
Gmail
Photo
Google search engine
They built a framework called TensorFlow to let researchers and developers work together on an AI
model. Once developed and scaled, it allows lots of people to use it.
It was first made public in late 2015, while the first stable version appeared in 2017. It is open
source under the Apache license. You can use it, modify it and redistribute the modified
version for a fee without paying anything to Google.
Next in these notes, we will learn about the TensorFlow architecture and
how TensorFlow works.
How TensorFlow Works
TensorFlow enables you to build dataflow graphs and structures to define how data moves through
a graph by taking inputs as a multi-dimensional array called a Tensor. It allows you to construct a
flowchart of operations that can be performed on these inputs, where the input goes in at one end
and comes out at the other end as output.
TensorFlow Architecture
Tensorflow architecture works in three parts:
Preprocessing the data
Build the model
Train and estimate the model
It is called TensorFlow because it takes input as a multi-dimensional array, also known as a tensor.
You can construct a sort of flowchart of operations (called a Graph) that you want to perform on
that input. The input goes in at one end, flows through this system of multiple
operations, and comes out the other end as output.
This is why it is called TensorFlow: the tensor goes in, flows through a list of operations,
and then comes out the other side.
Where can Tensorflow run?
TensorFlow hardware and software requirements can be classified into:
Development Phase: This is when you train the model. Training is usually done on your desktop or
laptop.
Run Phase or Inference Phase: Once training is done, TensorFlow can be run on many different
platforms. You can run it on
Desktop running Windows, macOS or Linux
Cloud as a web service
Mobile devices like iOS and Android
You can train it on multiple machines and then run it on a different machine once you have the
trained model.
TensorFlow Components
Tensor
Tensorflow’s name is directly derived from its core framework: Tensor. In Tensorflow, all the
computations involve tensors. A tensor is a vector or matrix of n-dimensions that represents all
types of data. All values in a tensor hold identical data type with a known (or partially
known) shape. The shape of the data is the dimensionality of the matrix or array.
A tensor can originate from the input data or the result of a computation. In TensorFlow, all the
operations are conducted inside a graph. The graph is a set of computations that take place
successively. Each operation is called an op node, and the nodes are connected to each other.
The graph outlines the ops and the connections between the nodes. However, it does not display the
values. The edge of the nodes is the tensor, i.e., a way to populate the operation with data.
Graphs
TensorFlow makes use of a graph framework. The graph gathers and describes all the series of
computations done during the training. The graph has lots of advantages:
It was designed to run on multiple CPUs or GPUs and even on mobile operating systems.
The portability of the graph allows you to preserve the computations for immediate or later use.
The graph can be saved to be executed in the future.
All the computations in the graph are done by connecting tensors together.
o A tensor has a node and an edge. The node carries the mathematical operation and
produces endpoint outputs. The edges explain the input/output
relationships between nodes.
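A minimal TensorFlow 2.x sketch of tensors and ops (the values are arbitrary; in TF 2.x a graph is traced for you when a Python function is decorated with tf.function):

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # a 2-D tensor
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

@tf.function                 # traces the Python function into a TensorFlow graph
def multiply_add(x, y):
    return tf.matmul(x, y) + 1.0   # two op nodes connected by tensor edges

print(multiply_add(a, b))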
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is
developed by Berkeley AI Research (BAIR) and by community contributors. Yangqing Jia created the
project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license.
Why Caffe?
Expressive architecture encourages application and innovation. Models and optimization are
defined by configuration without hard-coding. Switch between CPU and GPU by setting a single flag
to train on a GPU machine then deploy to commodity clusters or mobile devices.
Extensible code fosters active development. In Caffe’s first year, it has been forked by over 1,000
developers and had many significant changes contributed back. Thanks to these contributors the
framework tracks the state-of-the-art in both code and models.
Speed makes Caffe perfect for research experiments and industry deployment. Caffe can
process over 60M images per day with a single NVIDIA K40 GPU. That's 1 ms/image for inference
and 4 ms/image for learning, and more recent library versions and hardware are faster still. We
believe that Caffe is among the fastest convnet implementations available.
Community: Caffe already powers academic research projects, startup prototypes, and even large-
scale industrial applications in vision, speech, and multimedia. Join our community of brewers on
the caffe-users group and Github.
What Are Torch and PyTorch?
PyTorch is an open-source Python library for deep learning developed and maintained by Facebook.
The project started in 2016 and quickly became a popular framework among developers and
researchers.
Torch (Torch7) is an open-source project for deep learning written in C and generally used via the
Lua interface. It was a precursor project to PyTorch and is no longer actively developed. PyTorch
includes “Torch” in the name, acknowledging the prior torch library with the “Py” prefix indicating
the Python focus of the new project.
The PyTorch API is simple and flexible, making it a favorite for academics and researchers in the
development of new deep learning models and applications. The extensive use has led to many
extensions for specific applications (such as text, computer vision, and audio data), and many pre-
trained models that can be used directly. As such, it may be the most popular library used by
academics.
The flexibility of PyTorch comes at the cost of ease of use, especially for beginners, as compared to
simpler interfaces like Keras. Choosing PyTorch instead of Keras means giving up some ease of use
and accepting a slightly steeper learning curve and more code, in exchange for more flexibility and
perhaps a more vibrant academic community.
Before installing PyTorch, ensure that you have Python installed, such as Python 3.6 or higher.
If you don’t have Python installed, you can install it using Anaconda. This tutorial will show you
how:
How to Setup Your Python Environment for Machine Learning With Anaconda
There are many ways to install the PyTorch open-source deep learning library.
The most common, and perhaps simplest, way to install PyTorch on your workstation is by using
pip.
Perhaps the most popular application of deep learning is for computer vision, and the PyTorch
computer vision package is called “torchvision.”
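For example, installation with pip typically looks like the following (the exact command can vary by platform, Python version, and CUDA support, so treat this as a sketch and check the official instructions):
pip install torch torchvision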
If you prefer to use an installation method more specific to your platform or package manager, you
can see a complete list of installation instructions here:
All examples in this tutorial will work just fine on a modern CPU. If you want to configure PyTorch
for your GPU, you can do that after completing this tutorial. Don’t get distracted!
Once PyTorch is installed, it is important to confirm that the library was installed successfully and
that you can start using it.
If PyTorch is not installed correctly or raises an error on this step, you won’t be able to run the
examples later.
Create a new file called versions.py and copy and paste the following code into the file.
import torch
print(torch.__version__)
Save the file, then open your command line and change directory to where you saved the file.
Then type:
python versions.py
You should then see output like the following:
1.3.1
This confirms that PyTorch is installed correctly and that we are all using the same version.
This also shows you how to run a Python script from the command line. I recommend running all
code from the command line in this manner, and not from a notebook or an IDE.
8. List and explain NLP packages and tools with examples.
Natural language processing (NLP) is the use of human languages, such as English or French, by a
computer. Computer programs typically read and emit specialized languages designed to allow
efficient and unambiguous parsing by simple programs. More naturally occurring languages are
often ambiguous and defy formal description. Natural language processing includes applications
such as machine translation, in which the learner must read a sentence in one human language and
emit an equivalent sentence in another human language. Many NLP applications are based on
language models that define a probability distribution over sequences of words, characters or bytes
in a natural language.
5 Best NLP tools and libraries
1. NLTK - entry-level open-source NLP Tool
The Natural Language Toolkit (NLTK) is open-source NLP software written in Python. The NLTK
library is a standard NLP tool developed for research and education.
NLTK provides users with a basic set of tools for text-related operations. It is a good starting point
for beginners in Natural Language Processing.
Natural Language Toolkit features include:
Text classification
Part-of-speech tagging
Entity extraction
Tokenization
Parsing
Stemming
Semantic reasoning
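A small illustrative NLTK example covering tokenization and part-of-speech tagging (the required resource names such as 'punkt' can vary between NLTK versions):

import nltk

# One-time downloads of the models these functions need (names may vary by version)
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

text = "Recurrent neural networks remember their inputs."
tokens = nltk.word_tokenize(text)     # tokenization
print(nltk.pos_tag(tokens))           # part-of-speech tagging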
2. Stanford Core NLP - Data Analysis, Sentiment Analysis, Conversational UI
We can say that the Stanford NLP library is a multi-purpose tool for text analysis. Like NLTK,
Stanford CoreNLP provides many different natural language processing tools. But if you need
more, you can use custom modules.
The main advantage of Stanford NLP tools is scalability. Unlike NLTK, Stanford Core NLP is a perfect
choice for processing large amounts of data and performing complex operations.
With its high scalability, Stanford CoreNLP is an excellent choice for:
information scraping from open sources (social media, user-generated reviews)
sentiment analysis (social media, customer support)
conversational interfaces (chatbots)
text processing and generation (customer support, e-commerce)
3. Apache OpenNLP - Data Analysis and Sentiment Analysis
Accessibility is essential when you need a tool for long-term use, which is challenging in the realm
of open-source Natural Language Processing tools, because even a tool with the right
features can be too complex to use.
Apache OpenNLP is an open-source library for those who prefer practicality and accessibility. Like
Stanford CoreNLP, it uses Java NLP libraries with Python decorators.
While NLTK and Stanford CoreNLP are state-of-the-art libraries with tons of additions, OpenNLP is a
simple yet useful tool. Besides, you can configure OpenNLP in the way you need and get rid of
unnecessary features.
Apache OpenNLP is the right choice for:
Named Entity Recognition
Sentence Detection
POS tagging
Tokenization
4. GenSim - Document Analysis, Semantic Search, Data Exploration
Sometimes you need to extract particular information to discover business insights. GenSim is the
perfect tool for such things. It is an open-source NLP library designed for document exploration and
topic modeling. It would help you to navigate the various databases and documents.
The key GenSim feature is word vectors. It represents the content of documents as sequences of
vectors and clusters, which GenSim then classifies.
GenSim is also resource-saving when it comes to dealing with a large amount of data.
The main GenSim use cases are:
Data analysis
Semantic search applications
Text generation applications (chatbot, service customization, text summarization, etc.)
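A small illustrative Gensim example training word vectors on a toy corpus (the tiny corpus and the parameters vector_size/min_count are placeholders; older Gensim versions use size instead of vector_size):

from gensim.models import Word2Vec

# A toy corpus: each document is a list of tokens
corpus = [["deep", "learning", "models", "sequences"],
          ["gensim", "builds", "word", "vectors"],
          ["word", "vectors", "support", "semantic", "search"]]

model = Word2Vec(corpus, vector_size=16, min_count=1)  # learn 16-dim word vectors
print(model.wv.most_similar("word"))                   # nearest words by vector similarity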
5. Intel NLP Architect - Data Exploration, Conversational UI
Intel NLP Architect is the newest application in this list. Intel NLP Architect is a Python library for
deep learning that uses recurrent neural networks. You can use it for:
text generation and summarization
aspect-based sentiment analysis
and conversational interfaces such as chatbots