
WHITE PAPER

H3 TRENDS IN AI ALGORITHMS:
THE INFOSYS WAY

Abstract
Artificial Intelligence algorithms are the wheels of AI. Realizing the art of the possible in AI applications requires a deep understanding of these algorithms. This paper offers a perspective on the landscape of AI algorithms that will shape key advancements across industries.
Today, technology adoption is influenced by business and technology uncertainties. These uncertainties drive organisations to evaluate technology adoptions based on risks and returns. Broadly, technology led disruptions can be classified into Horizon 1, Horizon 2 and Horizon 3. Horizon 1 or H1 technologies are those that are in mainstream client adoptions and have steady business transactions, while H2 and H3 are those that are yet to become mainstream but have started to spring interesting possibilities and potential returns in the future.

At Infosys Center for Emerging Technology solutions (iCETS), we continuously look at H2 and H3 technologies and their impact on client landscapes. These H2 and H3 technologies are very important to be monitored as they have the potential to transform or disrupt existing well-oiled business models, hence fetching large returns. However, there are also associated risks from adoptions that need to be monitored as some of those can have higher negative impact on compliance, safety and so on.

With the emergence and availability of several open datasets, computational thrust with GPU availability and maturity of Artificial Intelligence (AI) algorithms, AI is making strong inroads into current and future of IT ecosystems. Today, AI plays an integral role in IT strategy by driving new experiences and creating new art of possibilities. In this paper, we try to look at important AI algorithms that are shaping various H3 of AI possibilities. While we do that, here is a chart representing the broader AI algorithm landscape in the context of this paper.

[Chart: algorithms and use cases grouped into core offerings (mainstream), new offerings (incubated to new offerings) and emerging offerings (emerging investment opportunities). Axes: Business Uncertainty and Technology Uncertainty. The algorithms and use cases for each horizon are listed in Table 1.0.]

Figure 1.0: Horizon 3 AI Algorithms- Infosys Research

External Document © 2019 Infosys Limited


H1 of AI “core offerings” are typically defined as algorithm powered use cases that have become mainstream and will remain major investment areas for the current wave. In that respect, use cases such as product or customer recommendations, churn and sentiment analysis, built on algorithms such as Random Forest, Support Vector Machines (SVM), Naïve Bayes, and n-grams based approaches, have been mainstream for some time and will continue to be woven into varied AI experiences.

H2 of AI “new offerings” use cases are the ones that are currently in an experimental, evolutionary mode and will have a major impact on Artificial Intelligence systems that will be mainstream in the second wave. Convolutional Neural Networks (CNN) have laid the foundation for several art-of-the-possible Computer Vision use cases ranging from object detection, image captioning and segmentation to facial recognition. Long Short-Term Memory (LSTM) and Recurrent Neural Networks (RNN) are helping to significantly improve what is possible in use cases such as language translation, sentence formulation, text summarization, topic extraction and so on. Word vector based models such as GloVe and Word2Vec help in dealing with large multi-dimensional text corpora and in finding hidden, complex, interwoven relationships and similarities between topics, entities and keywords. These H2 AI algorithms promise interesting new possibilities in various business functions; however, they are still in a nascent stage of adoption and user testing.

H3 of AI “emerging offerings” use cases are the ones that are potential game changers and can unearth new possibilities from AI that are unexplored and unimagined today. As these technologies are relatively new, more time is required to establish their weaknesses, strengths and nuances. In this paper, we look at key H3 AI algorithmic trends, and how we leverage these in various use cases built as part of our IP, the Infosys Enterprise Cognitive Platform (iECP).

Horizon 1 (Mainstream)

Algorithms
• Logistic Regression
• Naive Bayes
• Random Forest
• Support Vector Machines (SVM)
• Collaborative Filtering
• n-grams

Use Cases
• Recommendations
• Prediction
• Document, Image Classification
• Document, Image Clustering
• Sentiment Analysis
• Named Entity Recognition (NER)
• Keyword Extractions

Horizon 2 (Adopt, Scale)

Algorithms
• Convolution Neural Networks (CNN)
• Long Short-Term Memory (LSTM)
• Recurrent Neural Networks (RNN)
• Word2Vec
• GloVe
• Transfer Learning (Vision)

Use Cases
• Object Detection
• Face Recognition
• Product Brand Recognition, Classification
• Speech Recognition
• Sentence Completion
• Speech Transcriptions
• Topic Classification
• Topic Extraction
• Intent Mining
• Question Extraction

Horizon 3 (Envision, Invent, Disrupt)

Algorithms
• Explainable AI
• Generative Networks
• Fine Grained Classification
• Capsule Networks
• Meta Learning
• Transfer Learning (Text)
• Single Shot Learning
• Reinforcement Learning
• Auto ML
• Neural Architecture Search (NAS)

Use Cases
• Scene Captioning
• Scene Detection
• Store Footfall Counts
• Specific Object Class Detection
• Sentence Completion
• Video Scene Prediction
• Auto Learning
• Fake Images, Art Generation
• Music Generation
• Data Augmentation

Table 1.0: H3 Algorithms and Usecases- Infosys Research



Explainable AI (XAI)
Neural Network algorithms are considered to derive hidden patterns from data that many other conventional best of breed Machine Learning algorithms such as Support Vector Machines, Random Forest, Naïve Bayes, etc. are unable to establish. However, there is an increasing rate of incorrect and unexplainable decisions and results produced by Neural Network algorithms in activities such as credit lending, skilled job hiring and facial recognition. Given this scenario, AI results should be justified, explained and reproduced for consistency and correctness, as some of these results can have a profound impact on livelihood.

Geoffrey Hinton (University of Toronto), often called the godfather of deep learning, explains: “A deep-learning system doesn’t have any explanatory power. The more powerful the deep-learning system becomes, the more opaque it can become.”

It is to address the issues of transparency in AI that Explainable AI was developed. Explainable AI (XAI) as a framework increases the transparency of black-box algorithms by providing explanations for the predictions made and can accurately explain a prediction at the individual level.

Here are a few approaches provided through certain frameworks that can help understand the traceability of results.

Feature Visualization, as depicted in the figure below, helps in visualizing various layers in a neural network. It helps establish that lower layers are useful in learning features such as edges and textures, whereas higher layers provide more of the higher order abstract concepts such as objects.

Edges Textures Patterns Parts Objects

Figure 2.0: Feature Visualisation. Source: Olah, et al. 2017 (CC-BY 4.0)

Network dissection helps in associating these established units to concepts. They learn from labeled concepts during supervised training stages, and how and in what magnitude these are influenced by channel activations.

Several frameworks are currently evolving to improve the explainability of the models. Two known frameworks in this space are LIME and SHAP.

LIME (Local Interpretable Model-agnostic Explanations): It treats the model as a black box and tries to create another, simpler surrogate model in which explainability is supported or feasible, such as a simple linear model, Logistic Regression or a small decision tree. The surrogate model is then used to evaluate different components of the image by perturbing the inputs and evaluating the impact on the result, thereby deciding which parts of the image are most important in arriving at results. Since only the predictions of the original model are used and not its internals, the approach is model independent. The challenge with this approach is that even when the surrogate model based explanations are relevant to the model it is used on, they may not generalize precisely or map one to one to the original model all the time.



Steps
1. Create a set of noisy (perturbed) example images by disabling certain features (marking certain portions gray).
2. For each example, get the probability that a tree frog is in the image as per the original model.
3. Using these created data points, train a simple linear model (Logistic Regression, etc.) and get the results.
4. Superpixels with the highest positive weights become the explanation.

SHAP (SHapley Additive exPlanations): It uses a game theory based approach to explain the outcome by using various permutations and combinations of features and their effect on the delta of the result (the prediction relative to a baseline prediction), and then computing the average of the score for that feature to explain the results. For image use cases, it marks the dominating feature areas by coloring the pixels in the image.

SHAP produces relatively accurate results and is more widely used in Explainable AI as against LIME.
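The averaging idea behind SHAP can be illustrated with a small, exact Shapley value computation. This is only a sketch under stated assumptions: `toy_model`, the feature names and the all-zero `baseline` are hypothetical, and real SHAP implementations approximate this average rather than enumerating every ordering.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values for a single prediction.

    predict  : callable taking a list of feature values and returning a number
    instance : feature values of the example being explained
    baseline : reference values used while a feature is "absent" (assumption)
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)            # start with every feature absent
        previous = predict(current)
        for i in order:
            current[i] = instance[i]        # switch feature i on
            value = predict(current)
            totals[i] += value - previous   # marginal contribution of feature i
            previous = value
    # Average each feature's contribution over all orderings.
    return [t / len(orderings) for t in totals]

# Hypothetical scoring model with an interaction between income and years.
def toy_model(x):
    income, debt, years = x
    return 2.0 * income - 1.5 * debt + 0.5 * income * years

instance = [3.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
```

The contributions in `phi` add up to the difference between the prediction for `instance` and the prediction for `baseline`, which is the property that makes the per-feature scores usable as an explanation.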

Figure 3.0: Explaining a Prediction with LIME. Source: Pol Ferrando, Understanding how LIME explains predictions
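The four LIME steps can be sketched in plain Python. This is a simplified illustration and not the LIME library itself: `black_box` is a hypothetical stand-in for the original model, an image is reduced to a mask of five superpixels, and the proximity kernel and its width are assumptions.

```python
import math
import random

def black_box(mask):
    """Hypothetical original model: probability of 'tree frog' given which
    superpixels are visible (1) or grayed out (0)."""
    return 0.05 + 0.50 * mask[0] + 0.30 * mask[2] + 0.10 * mask[3]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def lime_explain(model, n_superpixels, n_samples=200, kernel_width=0.75, seed=0):
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # Step 1: perturb the image by graying out random superpixels.
        mask = [rng.randint(0, 1) for _ in range(n_superpixels)]
        # Step 2: ask the original model for its prediction.
        targets.append(model(mask))
        # Weight samples by closeness to the unperturbed image (assumed kernel).
        off = 1.0 - sum(mask) / n_superpixels
        weights.append(math.exp(-(off ** 2) / kernel_width ** 2))
        rows.append([1.0] + [float(v) for v in mask])   # intercept + mask
    # Step 3: fit a weighted linear surrogate via the normal equations.
    d = n_superpixels + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(d)] for i in range(d)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
            for i in range(d)]
    return solve(xtwx, xtwy)[1:]                        # drop the intercept

superpixel_weights = lime_explain(black_box, n_superpixels=5)
# Step 4: superpixels with the highest positive weights form the explanation.
ranking = sorted(range(5), key=lambda i: superpixel_weights[i], reverse=True)
```

Because the stand-in model here is additive, the surrogate recovers its coefficients almost exactly; with a real image classifier the surrogate is only a local approximation, which is the limitation noted in the text.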

Generative AI
Generative AI will potentially have a strong role in creative work, be it writing articles, creating completely new images from the existing set of trained models, improving image or video quality, merging images for artistic creations, creating music or improving datasets through data generation. As it matures, Generative AI will, in the near term, augment many jobs and will potentially replace many in the future.

Generative Networks consist of two deep neural networks, a generative network and a discriminative network. They work together to provide high-level simulation of conceptual tasks.

To train a Generative model, we first collect a large amount of data in some domain (e.g., think millions of images, sentences, or sounds, etc.) and then train the model to generate similar data. The Generative Network generates data to fool the Discriminative Network, while the Discriminative Network learns by identifying real vs fake data received from the Generative Network.

The generator trains with an objective function based on whether it can fool the discriminator network, whereas the discriminator trains on its ability to not be fooled and to correctly identify real vs fake. Both networks learn through backpropagation. The generator is typically a deconvolutional neural network, and the discriminator is a convolutional neural network.

Generative Networks can be of multiple types depending on the objective they are designed for, for example:

Neural Style Transfer (NST)
Neural Style Transfer (NST) is one of the Generative AI techniques in deep learning. As seen below, it merges two images, namely a "content" image (C) and a "style" image (S), to create a "generated" image (G). The generated image G combines the "content" of the image C with the "style" of image S.
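The opposing objectives of the generator and the discriminator described in this section can be written as two small loss functions. This is a sketch of the commonly used cross-entropy form, not the paper's own formulation; the function names and the sample scores are illustrative assumptions, and the scores stand for the discriminator's estimated probability that an input is real.

```python
import math

EPS = 1e-12  # guards against log(0)

def discriminator_loss(scores_real, scores_fake):
    """Low when real data is scored near 1 and generated data near 0."""
    real_term = -sum(math.log(p + EPS) for p in scores_real) / len(scores_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in scores_fake) / len(scores_fake)
    return real_term + fake_term

def generator_loss(scores_fake):
    """Low when the discriminator is fooled into scoring generated data near 1."""
    return -sum(math.log(p + EPS) for p in scores_fake) / len(scores_fake)

# Hypothetical discriminator outputs on a batch of real and generated samples.
real_scores = [0.90, 0.95]
fake_scores_detected = [0.10, 0.05]   # discriminator spots the fakes
fake_scores_fooled = [0.90, 0.80]     # discriminator is fooled

d_sharp = discriminator_loss(real_scores, fake_scores_detected)
d_fooled = discriminator_loss(real_scores, fake_scores_fooled)
g_detected = generator_loss(fake_scores_detected)
g_fooled = generator_loss(fake_scores_fooled)
```

In training, each network's weights would be updated by backpropagation to lower its own loss, so an improvement for one network shows up as a higher loss for the other.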



Content image                 Style image                     Generated image
Colorful circle               Blue painting                   Colorful circle with blue painting style
Louvre museum                 Impressionist style painting    Louvre painting with impressionist style
Ancient city of Persepolis    The Starry Night (Van Gogh)     Persepolis in Van Gogh style

Figure 4.0 : Novel Artistic Images through Neural Style Transfer. Source: Fisseha Berhane, Deep Learning & Art: Neural Style Transfer

Some of the other GAN variations that are popular are
· Super Resolution GAN (SRGAN) that helps improve quality of images.
· Stack-GAN that generates realistic looking photographs from textual descriptions of simple objects like birds and flowers.
· Sketch-GAN, a Generative model for vector drawings, which is a Recurrent Neural Network (RNN) and is able to construct stroke-based drawings of common objects. The model is trained on a dataset of human-drawn images representing many different classes.
· eGANs (Evolutionary Generative Adversarial Networks) that generate photographs of faces with different ages, from young to old.
· IcGAN, to reconstruct photographs of faces with specific features, such as changes in hair color, style, facial expression, and even gender.



Fine Grained Classification
Classification of an object into specific categories such as car, table, flower and such is common in Computer Vision. However, establishing the objects’ finer class, based on specific characteristics, is where AI is making rapid progress. This is because granular features of objects are being trained and used for differentiation of objects.

Examples of Fine Grained Classification are
· Fine grained clothing style finder, type of a shoe, etc.
· Recognizing a car type
· Recognizing breed of a dog, plant species, insect, bird species, etc.

However, fine-grained classification is challenging due to the difficulty of finding discriminative features. Finding those subtle traits that fully characterize the object is not straightforward.

Fine Grained Classification Approaches
· Feature representations that better preserve fine-grained information
· Segmentation-based approaches that facilitate extraction of purer features, and part/pose normalized feature spaces
· Pose Normalization Schemes

In Fine Grained Classification, the progression through the 8-layer CNN network can be thought of as a progression from low to mid to high-level features. The later layers aggregate more complex structural information across larger scales: sequences of convolutional layers interleaved with max-pooling can capture deformable parts, and fully connected layers can capture complex co-occurrence statistics.

Bird recognition is one of the major examples in fine grained classification. In the image below, given a test image, groups of detected key points are used to compute multiple warped image regions that are aligned with prototypical models. Each region is fed through a deep convolutional network, and features are extracted from multiple layers, after which they are concatenated and fed to a classifier.

Figure 5.0: Bird Recognition Pipeline Overview. Source: Branson, Van Hoen et al.: Bird Species Categorization



Car Detection System using Fine Grained Classification
The pictures and steps below depict the fine grained classification approach for a car detection system.
a. Detects parts using a collection of unsupervised part detectors.
b. Outputs a grid of discriminative features (the CNN is learned with class labels and then truncated, retaining the first two convolutional layers that retain spatial information). The appearance of each part detected using the learned CNN features is described by pooling in the detected region of each part.
c. Appearance of any undetected part is set to zero. This results in the Ensemble of Localized Learned Features (ELLF) representation, which is then used to predict fine-grained object categories.
d. A standard CNN passes the output of the convolutional layers through several fully connected layers in order to make a prediction.

Figure 6.0: Car Detection System. Source: Learning Features and Parts for Fine-Grained Recognition

Capsule Network
Convolutional Networks are so far the de facto and well accepted algorithms to work with image based datasets. They work on the pixels of images using filters (channels) of various sizes by convolving, and use pooling techniques to bubble up the stronger features to derive colors, textures, edges and shapes and establish structures from the lowest to the highest layers.

Given the face of a person, a CNN identifies the face by establishing the components of the face: eyes, ears, eyebrows, lips, chin, etc. However, if the facial image is provided with incorrect position and alignment of eyes and eyebrows, or say the eyebrows are swapped with the lips and the ears are placed on the forehead, the same trained CNN would still go on and detect this as a human face. This is the huge drawback of the CNN algorithm and happens due to its inability to store the information on relative position of various objects.

Capsule Network, invented by Geoffrey Hinton, addresses exactly this problem of CNN by storing the spatial relationships of various parts.

Capsule Networks, like CNNs, are multi layered neural networks, consisting of several capsules; each capsule consists of several neurons. Capsules in lower layers are called primary capsules and are trained to detect an object (e.g. triangle, circle) within a given region of the image. A capsule outputs a vector that has two properties: Length and Orientation. Length represents the probability of the presence of the object, and Orientation represents the pose parameters of the object such as coordinates, rotation angle, etc.

Capsules in higher layers, called routing capsules, detect larger and more complex objects, such as eyes, ears, etc.
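The idea that a capsule's output length acts as a probability while its orientation carries the pose can be illustrated with the "squash" non-linearity from Dynamic Routing Between Capsules (Sabour, Frosst and Hinton). The sketch below assumes two-dimensional capsule vectors purely for readability; the input values are made up.

```python
import math

def squash(s):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]

def length(v):
    """Vector length, read as the probability that the object is present."""
    return math.sqrt(sum(x * x for x in v))

weak = squash([0.1, 0.0])     # little evidence: length stays close to 0
strong = squash([3.0, 4.0])   # strong evidence: length approaches 1
```

Here `length(strong)` is 25/26, and the ratio between the components of `strong` is unchanged from the input, so the orientation (pose) is preserved while the length is normalised.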



Figure 7.0: A simple CapsNet with 3 layers. This model gives comparable results to deep convolutional networks. Source: Dynamic
Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

Figure 8.0: Capsule Network for House or Boat classification. Source: Beginners’ Guide to Capsule Networks

Routing by Agreement
Unlike CNN, which primarily bubbles up higher order features using max. or avg. pooling, Capsule Network bubbles up features using routing by agreement, where every capsule participates in choosing the shape by voting (democratic election way).

In the figure given above,
· Lower level corresponds to rectangles, triangles and circles.
· High level corresponds to houses, boats, and cars.

If there is an image of a house, the capsules corresponding to rectangles and triangles will have large activation vectors. Their relative positions (coded in their instantiation parameters) will bet on the presence of high-level objects. Since they will agree on the presence of a house, the output vector of the house capsule will become large. This, in turn, will make the predictions by the rectangle and the triangle capsules larger. This cycle will repeat 4-5 times, after which the bets on the presence of a house will be considerably larger than the bets on the presence of a boat or a car.

Advantage over CNN
· Less data for training - Capsule Networks need far less data for training (almost 10%) as compared to CNN
· Fewer parameters - The connections between layers require fewer parameters as a capsule groups neurons, resulting in relatively less computation bandwidth
· Preserve pose and position - They preserve pose and position information as against CNN
· High accuracy - Capsule Networks have higher accuracy as compared to CNNs
· Reconstruction vs mere classification - CNN helps you to classify the images, but not reconstruct the same image, whereas Capsule Networks help you to reconstruct the exact image.
· Information retention vs loss - With CNN, during edge detection, a kernel for edge detection works only on a specific angle and each angle requires a corresponding kernel. When dealing with edges CNN works well, because there are very few ways to describe an edge. Once we get up to the level of shapes, we do not want to have a kernel for every angle of rectangles, ovals, triangles, and so on. It would get unwieldy, and would become even worse when dealing with more complicated shapes that have 3 dimensional rotations and features like lighting, which is the reason why traditional neural nets do not handle unseen rotations effectively.

Capsule Networks are best suited for object detection and image segmentation, as they help better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research, relatively new, and mostly tested and benchmarked on the MNIST dataset, but they will be the future in working with massive use cases emerging from Vision datasets.
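The voting cycle described under Routing by Agreement can be sketched as a short loop. This follows the general shape of dynamic routing from Sabour, Frosst and Hinton, under simplifying assumptions: the votes are hand-written two-dimensional vectors rather than learned predictions, and the capsule names (rectangle, triangle, house, boat) are only labels for the example.

```python
import math

def squash(s):
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]

def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def length(v):
    return math.sqrt(sum(x * x for x in v))

def route(votes, iterations=4):
    """votes[i][j] is the vector lower capsule i predicts for higher capsule j."""
    n_low, n_high, dim = len(votes), len(votes[0]), len(votes[0][0])
    logits = [[0.0] * n_high for _ in range(n_low)]
    for _ in range(iterations):
        # Each lower capsule splits its vote across the higher capsules.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_high):
            s = [sum(coupling[i][j] * votes[i][j][d] for i in range(n_low))
                 for d in range(dim)]
            outputs.append(squash(s))
        # Votes that agree with a higher capsule's output get more weight next round.
        for i in range(n_low):
            for j in range(n_high):
                logits[i][j] += sum(p * v for p, v in zip(votes[i][j], outputs[j]))
    return outputs, coupling

votes = [
    [[1.0, 0.9], [1.0, -0.8]],    # rectangle: vote for house, vote for boat
    [[0.9, 1.0], [-0.9, 1.0]],    # triangle:  vote for house, vote for boat
]
outputs, coupling = route(votes)
house, boat = outputs
```

Because the rectangle and triangle votes for the house point in nearly the same direction, the house capsule ends up with the longer output vector and both lower capsules route most of their weight to it, while their conflicting boat votes largely cancel.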



Meta Learning
Traditional methods of learning in Machine Learning focus on taking a huge labeled dataset and then learning to detect y (the dependent variable, say classifying an image as cat or dog) given a set of x (the independent variables, images of cats and dogs). This process involves selection of an algorithm such as a Convolutional Neural Net, and arriving at various hyperparameters such as the number of layers in the network, number of neurons in each layer, learning rate, weights, bias, dropouts, and the activation function used to activate the neuron such as sigmoid, tanh and ReLU. The learning happens through several iterations of forward and backward passes (propagation) by readjusting (also called learning) the weights based on the difference in the loss (actual vs computed). At the minimal loss, the weights and other network parameters are frozen and are considered the final model for future prediction tasks. This is obviously a long and tedious process, and repeating this for every use case or task is engineering, data, and compute intensive.

Meta Learning focuses on how to learn to learn. It is one of the fascinating disciplines of artificial intelligence. Human beings have varying styles of learning. Some people learn and memorize with one instance of visual or auditory scan. Some people need multiple perspectives to strengthen the neural connections for permanent memory. Some remember by writing while some remember through actual experiences. Meta Learning tries to leverage these to build its learning characteristics.

Types of Meta-Learning Models
Like the variety in human learning techniques, Meta Learning also uses various learning methods based on patterns of problems, such as those based on boundary space, amount of data, optimizing the size of the neural network or using a recurrent network approach. Each of these is briefly discussed below.

Few Shots Meta-Learning
This learning technique focuses on learning from a few instances of data. Typically, Neural Nets need millions of data points to learn; however, Few Shots Meta-Learning uses only a few instances of data to build models. An example is facial recognition systems using Single Shot Learning, which is explained in detail in the Single Shot Learning section.

Optimizer Meta-Learning
In this method, the emphasis is on optimizing the neural network and its hyperparameters. A great example of optimizer meta-learning is models that are focused on improving gradient descent techniques.

Metric Meta-Learning
In this learning method, the metric space is narrowed down to improve the focus of learning. Then the learning is carried out only in this metric space by leveraging various optimization parameters that are established for the given metric space.

Recurrent Model Meta-Learning
This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM). In this architecture, the meta-learner algorithm will train an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the set of (image, label) pairs of a dataset sequentially, followed by new examples which must be classified. Meta-Reinforcement Learning is an example of this approach.

Transfer Learning (TL)


Humans can learn from their own existing experiences or experiences they have heard, seen or observed. The Transfer Learning discipline of AI is based on similar traits of human learning, where new models can learn and benefit from an existing trained model.

For example, if a Computer Vision based detection model, with no Transfer Learning, that already detects various types of vehicles such as cars, trucks and bicycles needs to be trained to detect an airplane, then you may have to retrain the full model with images of all the previous objects. However, with Transfer Learning, you can introduce an additional layer on top of the existing pre-trained layer to start detecting airplanes.

Typically, in a no Transfer Learning scenario, the model needs to be trained, and during training the right weights are arrived at by doing many iterations (epochs) of forward and back propagation, which takes a significant amount of computation power and time. In addition, Vision models need a significant amount of image data, such as, in this example, images of airplanes, to be trained.

With the Transfer Learning approach, you can reuse the existing pre-trained weights of an existing trained model, with a significantly smaller number of images (5 to 10 percent of the images needed for training a model from the ground up) for the model to start detecting. As the pre-trained model has already learnt the basics of identifying edges, curves and shapes in the earlier layers, it needs to learn only higher order features specific to airplanes with the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch.
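The airplane example can be sketched as a frozen pre-trained layer with a small new layer trained on top. Everything in this sketch is a hypothetical stand-in: `PRETRAINED` is a made-up set of weights rather than a real vision model, the two-number inputs replace images, and a single logistic layer plays the role of the additional layer.

```python
import math

# Frozen "pre-trained" layer: hypothetical weights standing in for features
# (edges, curves, shapes) learned earlier on other vehicle classes.
PRETRAINED = [[1.0, 0.0], [0.0, 1.0]]

def extract_features(x):
    """Apply the frozen layer with a ReLU activation; it is never updated."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in PRETRAINED]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only the new logistic layer placed on top of the frozen features."""
    feats = [extract_features(x) for x in samples]
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        w = [wi - lr * g / len(feats) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(feats)
    return w, b

def predict(x, w, b):
    """Probability that input x is an airplane."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, extract_features(x))) + b)

# Tiny made-up dataset: label 1 = airplane, label 0 = not an airplane.
samples = [[2.0, 0.2], [1.8, 0.1], [2.2, 0.3], [0.2, 1.9], [0.1, 2.1], [0.3, 1.8]]
labels = [1, 1, 1, 0, 0, 0]
head_w, head_b = train_head(samples, labels)
```

Only `head_w` and `head_b` change during training, which is why far fewer examples and far less computation are needed than when every layer is retrained.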



Transfer Learning helps in saving a significant amount of data, computational power and time in training new models, as they leverage pre-trained weights from the existing trained models and architectures. However, it is important to understand that the Transfer Learning approach, today, is only matured enough to be applied to similar use cases; that is, you cannot use the above discussed model to train a facial recognition model.

Another key thing during Transfer Learning is that it is important to understand the details of the data on which new use cases are being trained, as it can implicitly push the built-in biases from the underlying data into newer systems. It is recommended that the datasheets of underlying models and data be studied thoroughly unless the usage is for experimentative purpose.

Earlier, having used the human brain rationale, it is important to note that human brains have gone through centuries of experiences and gene evolution and have the ability to learn faster, whereas transfer learning is just a few decades old and is becoming ground for new vision and text use cases.

Figure 9.0: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning.



Single Shot Learning
Humans have the impressive skill to reason about new concepts and experiences with just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning; otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming and infeasible. However, a Single Shot Learning based system using an existing pre-trained FaceNet model and a facial encoding based approach on top of it can be very effective to establish face similarity by computing the distance between the faces.

In this approach, a 128-dimensional encoding of each face image is generated and compared with another image’s encoding to determine if the person is the same or different. Various distance based algorithms such as Euclidean distance can be used to determine if they are within a specified threshold. The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative) and training the model in a way where the (Anchor, Positive) distance is smaller and the (Anchor, Negative) distance is larger.

“Anchor” is the image of a person for whom the recognition model needs to be trained.
“Positive” is another image of the same person.
“Negative” is an image of a different person.
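The encoding comparison and the (Anchor, Positive, Negative) training signal can be sketched as follows. The triplet loss form is the one popularised by FaceNet; the four-number encodings (used instead of real 128-dimensional ones), the `threshold` of 0.7 and the `margin` of 0.2 are illustrative assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two face encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(enc_a, enc_b, threshold=0.7):
    """Treat two encodings as the same person if they are close enough."""
    return euclidean(enc_a, enc_b) < threshold

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is closer to the anchor than the negative by the margin."""
    pos = euclidean(anchor, positive) ** 2
    neg = euclidean(anchor, negative) ** 2
    return max(pos - neg + margin, 0.0)

# Made-up encodings: two images of one person and one image of someone else.
anchor = [0.10, 0.90, 0.30, 0.50]
positive = [0.12, 0.88, 0.31, 0.52]
negative = [0.90, 0.10, 0.70, 0.20]

well_separated = triplet_loss(anchor, positive, negative)
badly_separated = triplet_loss(anchor, negative, positive)   # roles swapped
```

During training, the encoder's weights would be adjusted to drive this loss towards zero; at recognition time only `same_person` is needed, which is why a single reference image per person can be sufficient.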

Figure 10.0: Encoding approach inspired from ML Course from Coursera



Deep Reinforcement Learning (RL)
This is a specialized Machine Learning discipline where an agent learns to behave in an environment by getting a reward or punishment for the actions performed. The agent can have an objective to maximize short term or long-term rewards. This discipline uses deep learning techniques to bring in human level performance on the given task.

Deep Reinforcement Learning has found significant relevance and application in various game design systems such as creating video games, chess, AlphaGo and Atari, as well as in industrial applications of robots, driverless cars, etc.

In reinforcement learning, the policy, π, controls what action we should take. The value function, v, measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.

Q-learning: State, Action -> Q-table -> Q value
Deep Q-learning: State -> Deep Q Neural Network -> Q value action 1, Q value action 2, Q value action 3
Figure 11.0: Schema inspired by the Q learning notebook by Udacity
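The Q-table half of the schema can be illustrated with tabular Q-learning on a toy problem. The environment is a hypothetical four-state corridor invented for this sketch, and the learning rate, discount factor and episode count are arbitrary choices; in deep Q-learning a neural network would replace the table `q`.

```python
import random

N_STATES, GOAL = 4, 3          # states 0..3 on a line; state 3 is terminal
ACTIONS = (0, 1)               # 0 = move left, 1 = move right

def step(state, action):
    """Move one cell; reaching the goal pays a reward of 1 and ends the episode."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=300, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]      # Q-table: q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(200):                       # cap on steps per episode
            action = rng.choice(ACTIONS)           # purely exploratory behaviour
            nxt, reward, done = step(state, action)
            # Target: immediate reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q = q_learning()
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
```

After training, `greedy` picks "move right" in every non-terminal state, and the learned values shrink by the discount factor with each extra step away from the goal.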

Three Approaches to Reinforcement Learning


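Value Based
In value-based RL, the goal is to optimize the value function V(s). Qtable uses any mathematical function to arrive at a state based on action. The value of each state is the total amount of the reward an agent can expect to accumulate over the future, starting at that state.

$$V(s) = \mathbb{E}\left[ R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \dots \mid S_t = s \right]$$

That is, the expected discounted reward, given that state, where γ is the discount factor applied to future rewards.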

The agent will use this value function to select which state to choose at each step.

Policy Based
In policy-based RL, we want to directly optimize the policy function π(s) without using a value function.

action = policy(state)

The policy is what defines the agent behavior at a given time.

There are two types of policies:
1. Deterministic: A policy which at a given state will always return the same action.
2. Stochastic: A policy that outputs a probability distribution over actions.

Value based and Policy based are more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.
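The two policy types can be contrasted in a few lines. Both functions are hypothetical examples for a corridor-style task: the action names and the fixed probabilities are assumptions, and a learned policy would derive them from training rather than hard-code them.

```python
import random

def deterministic_policy(state):
    """Always returns the same action for a given state."""
    return "right" if state < 3 else "stay"

def stochastic_policy(state, rng):
    """Samples an action from a probability distribution over actions."""
    probs = {"left": 0.1, "right": 0.8, "stay": 0.1}
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]

rng = random.Random(0)
fixed = [deterministic_policy(1) for _ in range(5)]
sampled = [stochastic_policy(1, rng) for _ in range(1000)]
```

Every entry of `fixed` is the same action, while `sampled` contains a mix of actions in roughly the proportions given by the distribution, which lets a stochastic policy keep exploring.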



Model Based

In model-based RL, we model the environment. This means we create a model of the behavior of the environment; this model is then used to arrive at results that maximise short-term or long-term rewards. The model equation can be any equation that is defined based on the environment's behavior, and it must be sufficiently generalized to handle new situations.

When the model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.

AlphaGo was trained using data from several games to beat human players in the game of Go. The training accuracy was just 57%, yet this was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network that tells which moves are promising, and a value network that tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate the expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not use any data from previous games to train the deep network. The deep network was trained on samples generated through self-play: AlphaGo Zero played games against itself, the best moves were selected to train the network, and the improved network was then applied in further games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and perform very large computations that are difficult for a human brain.
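
As a minimal sketch of how a model of the environment can be used to maximise long-term reward, the example below runs value iteration over a small hand-written model and then derives a plan from it. The four-state chain in `MODEL`, the action names and the discount `GAMMA` are hypothetical; a real system would learn the model from experience.

```python
# Hypothetical model of a 4-state chain: MODEL[state][action] -> (next_state, reward)
MODEL = {
    0: {"stay": (0, 0.0), "go": (1, 0.0)},
    1: {"stay": (1, 0.0), "go": (2, 0.0)},
    2: {"stay": (2, 0.0), "go": (3, 1.0)},
    3: {},  # terminal state, no actions
}
GAMMA = 0.9  # assumed discount factor


def value_iteration(model, gamma, tol=1e-9):
    """Compute state values by repeatedly backing up through the model."""
    values = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s, actions in model.items():
            if not actions:
                continue  # terminal states keep a value of 0
            best = max(r + gamma * values[nxt] for nxt, r in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def plan(model, values, gamma):
    """Pick, for each state, the action with the best one-step lookahead."""
    return {
        s: max(acts, key=lambda a: acts[a][1] + gamma * values[acts[a][0]])
        for s, acts in model.items()
        if acts
    }


values = value_iteration(MODEL, GAMMA)
policy = plan(MODEL, values, GAMMA)
```

No interaction with a real environment is needed here: all planning happens against the model, which is why the quality of the model bounds the quality of the result.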

Auto ML (AML)
Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model’s hyper-parameters; evaluating the model’s performance; deploying and monitoring the machine learning system in an online system; and so on. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automated Machine Learning).

Google CEO Sundar Pichai wrote, “Designing neural nets is extremely time intensive, and requires an expertise that limits its use to a smaller community of scientists and engineers. That’s why we’ve created an approach called AutoML, showing that it’s possible for neural nets to design neural nets”, while Google’s Head of AI, Jeff Dean, suggested that 100x computational power could replace the need for machine learning expertise.

AutoML Vision relies on two core techniques: transfer learning and neural architecture search.

[Figure 12.0: An example of Auto sklearn pipeline. Elements shown: an AutoML system with input {Xtrain, Ytrain, Xtest, budget}, Meta Learning, a hand-crafted portfolio, Bayesian Optimization, an ML pipeline (Data Processor, Feature Preprocessor, Classifier), Build Ensemble, and output Ytest. Source: André Biedenkapp, We did it Again: World Champions in AutoML]



Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN

Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as handling missing column values, encoding categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with its hyper-parameters. The pipeline supports 15 Classification and 14 Feature processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage

Auto-sklearn is written in Python and can be considered a replacement for scikit-learn classifiers. Here is a sample set of commands.

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)

SMAC (Sequential Model-Based Algorithm Configuration)

SMAC is a tool for automating certain AutoML steps. SMAC is useful for selection of key features, hyper-parameter optimization and speeding up algorithmic outputs.

BOHB (Bayesian Optimization and Hyperband)

BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.

Google and H2O also have their respective AutoML tools, which are not covered here but can be explored in specific cases.

AutoML needs significant memory and computational power to execute alternate algorithms and compute results. At present, GPU resources are extremely costly even for simple Machine Learning workloads such as a CNN algorithm to classify objects. If multiple such alternate algorithms have to be executed, the compute cost would multiply rapidly. This is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will depend on two things: first, the maturity of the AutoML pipeline, and second, and more important, how quickly GPU clusters become cheap. Selling cloud GPU capacity could be one of the motivations for several cloud infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist’s work but can augment and speed up certain tasks such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
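
To give a feel for the bandit methods that BOHB builds on, here is a minimal sketch of successive halving, the budget-allocation routine at the core of Hyperband. It does not include BOHB's Bayesian model and does not use the HpBandSter or SMAC APIs; `validation_loss` is a hypothetical stand-in for actually training a model, and the learning-rate range is an assumption.

```python
import random


def validation_loss(learning_rate, budget):
    """Hypothetical stand-in for training a model for `budget` epochs.

    The loss is lowest near learning_rate = 0.1 and shrinks with more budget.
    """
    return (learning_rate - 0.1) ** 2 + 1.0 / budget


def successive_halving(n_configs=27, min_budget=1, eta=3, seed=0):
    """Evaluate many configurations cheaply, keep the best 1/eta, repeat."""
    rng = random.Random(seed)
    # Sample learning rates log-uniformly between 1e-4 and 1.
    configs = [10 ** rng.uniform(-4, 0) for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        ranked = sorted(configs, key=lambda lr: validation_loss(lr, budget))
        configs = ranked[: max(1, len(configs) // eta)]  # survivors
        budget *= eta  # survivors get a larger budget in the next round
    return configs[0], budget


best_lr, final_budget = successive_halving()
```

Most of the compute is spent on the few configurations that survive the early rounds, which is how this family of methods reduces the cost of trying many alternatives.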



Neural Architecture Search (NAS)
Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels, and filter sizes, selecting other optimum hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet and NASNet have emerged. However, selecting the right architecture for the right problem is also a skill, due to the presence of various factors such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, which govern the overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.

[Figure 13.0: Relationship between AutoML, Hyperparameter Optimization and NAS. Source: Liam Li, Ameet Talwalkar, What is neural architecture search]



Key Components of NAS

Components of NAS

Search Space       | Optimization Method         | Evaluation Method
-------------------|-----------------------------|---------------------------------
DAG Representation | Reinforcement Learning      | Full Training
Cell Block         | Evolutionary Search         | Partial Training
Meta-Architecture  | Gradient-Based Optimization | Weight-Sharing (NAS specific)
                   | Bayesian Optimization       | Network Morphism (NAS specific)
                   |                             | Hypernetworks (NAS specific)

Figure 14.0: Components of NAS. Source: Liam Li, Ameet Talwalkar, What is neural architecture search.

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision based use cases (captioning the scene, or product identification) would need a different neural network architecture style than Speech (speech transcription, or speaker classification) or unstructured Text (topic extraction, intent mining) based use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domain data and performance. These are also usually hand-crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. The search could be done randomly or by using a statistical or Machine Learning approach such as Bayesian methods or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architecture considered by the optimization method. It could be done using a full training approach, or by partial training combined with specialized methods such as early stopping, weight sharing, network morphism, etc.

For selected problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production use, as several architectures need to be established and evaluated depending on the problem space.
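
The interplay of the three components can be sketched with a toy example: a search space of architecture choices, random search as the optimization method, and a scoring function as the evaluation method. Everything here is an assumption for illustration; in particular `evaluate` is a made-up proxy score and does not train any network.

```python
import random

# Search space: hypothetical choices for a small convolutional network.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "filters": [16, 32, 64],
    "kernel_size": [3, 5],
}


def sample_architecture(rng):
    """Optimization method: plain random search over the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def evaluate(arch):
    """Evaluation method: a made-up proxy score, NOT a real training run.

    It rewards capacity but penalises parameter count, standing in for
    validation accuracy measured after partial training.
    """
    capacity = arch["num_layers"] * arch["filters"]
    params = capacity * arch["kernel_size"] ** 2
    return capacity / (capacity + 64.0) - 1e-4 * params


def random_search(trials=50, seed=0):
    """Sample `trials` architectures and return the best-scoring one."""
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(trials)]
    return max(candidates, key=evaluate)


best_arch = random_search()
```

Replacing `sample_architecture` with a reinforcement learning or Bayesian optimizer, and `evaluate` with real (partial) training, gives the more expensive variants described above.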



Addressing H3 AI Trends at Infosys
In this paper, we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases, and integrating them into our product stack, Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.

#  | Trend | Use cases
---|-------|----------
1  | Explainable AI (XAI) | Applicable across use cases where results need to be traced, e.g. Tumor Detection, Mortgage Rejection, Candidate Selection, etc.
2  | Generative AI, Neural Style Transfer (NST) | Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3  | Fine Grained Classification | Vehicle Classification, Type of Tumor Detection
4  | Capsule Networks | Image Re-construction, Image Comparison/Matching
5  | Meta Learning | Intelligent Agents, Continuous Learning scenarios for document review and corrections
6  | Transfer Learning | Identifying person not wearing helmet, Logo/brand detection in the image, Speech Model training for various accents, vocabularies
7  | Single Shot Learning | Face Recognition, Face Verification
8  | Deep Reinforcement Learning (RL) | Intelligent Agents, Robots, Driverless cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9  | Auto ML | Invoice Attribute Extraction, Document Classification, Document Clustering
10 | Neural Architecture Search (NAS) | CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.

Table 2.0: AI Use cases, Infosys Research



Reference:
1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/

• https://simmachines.com/explainable-ai/

• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html

• https://medium.com/@QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c

• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification


• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf

• https://keras.io/examples/cifar10_cnn_capsule/

• https://www.youtube.com/watch?v=pPN8d0E3900

• https://www.youtube.com/watch?v=rTawFwUvnLE

• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0

• http://proceedings.mlr.press/v48/santoro16.pdf

• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning


• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)


• https://deepmind.com/blog/article/deep-reinforcement-learning

• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419

• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5

• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/

• https://www.fast.ai/2018/07/23/auto-ml-3/

• https://www.fast.ai/2018/07/16/auto-ml2/#auto-ml

• https://competitions.codalab.org/competitions/17767

• https://www.automl.org/automl/auto-sklearn/

• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html



9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform


• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

About the author


Sudhanshu Hate is inventor and architect of Infosys Enterprise Cognitive Platform (iECP),
a microservices API based Artificial Intelligence platform. He has over 21 years of experience
in creating products, solutions and working with clients on industry problems. His current
areas of interests are Computer Vision, Speech and Unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to [email protected].

For more information, contact [email protected]

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys
acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this
documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the
prior permission of Infosys Limited and/ or any named intellectual property rights holders under this document.

Infosys.com | NYSE: INFY
