International Journal of Computer Applications (0975 – 8887)

Volume 115 – No. 9, April 2015

Applications of Artificial Intelligence in Machine Learning: Review and Prospect
Sumit Das, Department of IT, JIS College of Engineering, Kalyani, India
Aritra Dey, Department of EE, JIS College of Engineering, Kalyani, India
Akash Pal, Department of CSE, JIS College of Engineering, Kalyani, India
Nabamita Roy, Department of CSE, JIS College of Engineering, Kalyani, India

ABSTRACT
Machine learning is one of the most exciting recent technologies in Artificial Intelligence. Learning algorithms are at work in many applications that we make use of daily. Every time a web search engine like Google or Bing is used to search the internet, one of the reasons it works so well is that a learning algorithm, one implemented by Google or Microsoft, has learned how to rank web pages. Every time Facebook is used and it recognizes friends' photos, that is also machine learning. Spam filters in email save the user from having to wade through tons of spam email; that is also a learning algorithm. In this paper, a brief review and future prospect of the vast applications of machine learning has been made.

Keywords
Artificial intelligence, Machine learning, Supervised learning, Unsupervised learning, Reinforcement learning, Applications.

1. INTRODUCTION
An Artificial Intelligence (AI) program is called an Intelligent Agent. An intelligent agent interacts with the environment. The agent can identify the state of the environment through its sensors, and it can then affect the state through its actuators.

The important aspect of AI is the control policy of the agent, which determines how the inputs obtained from the sensors are translated to the actuators, in other words how the sensors are mapped to the actuators. This is made possible by a function within the agent.

The ultimate goal of AI is to develop human-like intelligence in machines. However, such a dream can be accomplished through learning algorithms which try to mimic how the human brain learns.

Machine learning, a field that grew out of the field of artificial intelligence, is of utmost importance as it enables machines to gain human-like intelligence without explicit programming.

However, AI programs do the more interesting things, such as web search, photo tagging and email anti-spam. So machine learning was developed as a new capability for computers, and today it touches many segments of industry and basic science, for example autonomous robotics and computational biology. Around 90% of the data in the world was generated in the last two years alone, and the inclusion of the machine learning library known as Mahout in the Hadoop ecosystem has made it possible to meet the challenges of Big Data, especially unstructured data.

In the area of machine learning research, the emphasis is placed more on choosing or developing an algorithm and conducting experiments on the basis of that algorithm. Such a highly biased view reduces the impact on real-world applications.

In this paper the various applications under the appropriate category of machine learning have been highlighted. This paper makes an effort to bring all the major areas of application under one umbrella and to present a more general and realistic view of the real-world applications. Apart from this, two application suggestions have been put forward. The field of machine learning is so vast and ever growing that it proves to be useful in automating every facet of life.

2. MACHINE LEARNING
According to Arthur Samuel, machine learning is defined as the field of study that gives computers the ability to learn without being explicitly programmed. Arthur Samuel was famous for his checkers-playing program. Initially, when he developed the checkers-playing program, Arthur was better than the program. But over time the program learned which board positions were good and which were bad by playing many games against itself.

A more formal definition was given by Tom Mitchell: a computer program is said to learn from experience (E) with respect to some task (T) and some performance measure (P) if its performance on T, as measured by P, improves with experience E. Such a program is called a machine learning program.

In the checkers-playing example, the experience E was the experience of having the program play games against itself. The task T was the task of playing checkers. And the performance measure P was the probability that it won the next game of checkers against some new opponent.

In all fields of engineering, there are larger and larger data sets that are being understood using learning algorithms.

3. TYPES OF MACHINE LEARNING ALGORITHMS
3.1 Supervised Learning
This learning process is based on the comparison of computed output and expected output; that is, learning refers to computing the error and adjusting the error to achieve
the expected output. For example, given a data set of houses of particular sizes with their actual prices, the supervised algorithm is to produce more of these right answers, such as what the price would be for a new house.

3.2 Unsupervised Learning
Unsupervised learning is described as learning on its own by discovering and adapting, based on the input pattern. In this learning the data are divided into different clusters, and hence the learning is called a clustering algorithm. One example where clustering is used is Google News (URL news.google.com). Google News groups new stories on the web and puts them into collective news stories.

3.3 Reinforcement Learning
Reinforcement learning is concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. A reward is given for correct output and a penalty for wrong output. Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected.

3.4 Recommender Systems
Recommender systems can be defined as learning techniques by virtue of which online users can customize their sites to meet customers' tastes. For example, an online user can get a rating of a product and/or related items when searching for an item, because of the existing recommender system. That is why it has changed the way people find products, information, and even other people. There are mainly two approaches, content-based recommendation and collaborative recommendation, which help the user in obtaining and mining data, making intelligent and novel recommendations, and ethics. Most e-commerce sites use this system. [58]

Figure-2: Types of Machine Learning

4. APPLICATIONS OF MACHINE LEARNING AND LITERATURE SURVEY
This section elaborates applications of machine learning, classified according to the different machine learning algorithms under supervised learning, unsupervised learning, reinforcement learning and recommender learning.

4.1 Unsupervised Learning
In machine learning, the problem of unsupervised learning is that of trying to find hidden structure in unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution.

4.1.1 DNA classification: Understanding genomics
Figure 3 shows DNA microarray data; the colors, red, green, gray and so on, show the degree to which different individuals do or do not have a specific gene. The idea is to form a group of different individuals such that each of them has a certain gene. So a clustering algorithm can be run to group individuals into different categories or into different types of people. This is Unsupervised Learning because the algorithm is not given any information in advance about whether there are type 1 people, type 2 people, type 3 people and so on. Instead, a bunch of data is given and the algorithm automatically finds structure in the data, grouping it into these types of individuals. [23]

Figure 3: DNA microarray data

4.1.2 Organizing large computer clusters:
At large data centers, that is, large computer clusters, unsupervised learning helps to figure out which machines tend to work together, so that if those machines are put together, or if there is some crisis, the data centers can work more efficiently. [16]

4.1.3 Social network analysis:
Unsupervised machine learning algorithms can automatically identify the friends within a user's circle on Facebook or Google, or can identify the maximum number of mails sent to a particular person, and categorize them into collective groups. They also identify which groups of people all know each other. [17]

4.1.4 Market segmentation:
Many companies have huge databases of customer information. Unsupervised machine learning algorithms can look at this customer data set, automatically discover market segments, and automatically group customers into different market segments, so that the company can automatically and more efficiently sell or market to the different market segments together. Again, this is Unsupervised Learning because it is not known in advance what the market segments are, or which customer belongs to which segment. [18]

4.1.5 Astronomical data:
4.1.5.1 Astronomical data analysis:
These clustering algorithms give surprisingly interesting and useful theories of how galaxies are born.

4.1.5.2 Anomaly/Novelty detection in astronomical data:
Modern astronomical observatories are very advanced and can produce massive amounts of data which the researchers don't even have time to look at. Sometimes the researchers even lack the adequate knowledge, experience and training to
deduce the exact significance or meaning of these data sets. It is not unusual for these large-scale astronomical data sets to contain anomalies/novelties. Thus the need becomes evident for machines which can be trained to go through the data generated and, in the process, detect any anomalies that may be present in the data set (at a much faster rate and in most cases with better accuracy). Anomaly/Novelty Detection is the process of finding unusual things or characteristics which are different from our prevalent knowledge about the data. Anomaly detection problems are primarily of two types: 1) point anomalies: anomalies of this kind are individual celestial objects that present unusual characteristics; 2) group anomalies: this is an unusual collection of points. A group of points can be considered abnormal either because it is a collection of anomalous points, or because the way its member points aggregate is unusual, even if the points themselves are perfectly normal. [30]

Figure 4: Summary of the Sloan Digital Sky Survey (SDSS) data set. (a) The coverage map of SDSS. (b) A sample imaging data. (c) A sample spectroscopic data i.e. the spectrum.

4.1.6 The cocktail party problem:
Consider a cocktail party with two people talking at the same time. Two microphones are put in the room at two different distances from the speakers; each microphone records a different combination of the two speakers' voices. Maybe speaker one is a little louder in microphone one and speaker two is a little louder in microphone two, because the two microphones are at different positions relative to the two speakers, but each microphone records an overlapping combination of both speakers' voices. These two microphone recordings are given to an Unsupervised Learning algorithm called the cocktail party algorithm. The cocktail party algorithm separates out the two audio sources that were being added or summed together. [19]

4.1.7 Medical records:
With the advent of automation, electronic medical records have become prevalent, so if medical records are turned into medical knowledge, then disease could be understood in a better way. [21] [35]

4.1.8 Computational biology:
Computational biology, also known as bioinformatics, is the use of biological data to develop algorithms and establish relations among various biological systems. With automation again, biologists are collecting lots of data about gene sequences, DNA sequences, gene expression array analysis, combinatorial chemistry and so on, and machines running algorithms are providing a much better understanding of the human genome, and what it means to be human. [22] [31] [32] [45] [50]

4.1.9 Analysis of gene expression data: cancer diagnosis:
Cancer can be defined as a class of diseases characterized by out-of-control cell growth. There are about 100 different types of cancer claiming the lives of innumerable people across the world. Thus identifying the type of cancer is a crucial step in its treatment. It is done through classification of patient samples. The classification process and results may be improved by analysing the gene expression of the patient, which may provide additional information to the doctors. The merger of medical science and technology has already led to a lot of life-saving breakthroughs in the field of medicine. Thus the involvement of technology in fighting cancer is no surprise. Machine learning techniques, such as Bayesian networks, neural trees, and radial basis function (RBF) networks, are used for the analysis of the datasets and for classifying cancer types. These techniques have their own properties, including the ability to find important genes for cancer classification, reveal relationships among genes, and classify cancer. [33] [34] [40] [48]

4.1.10 Speech Activity Detection (SAD):
Speech is a primary way for humans to express themselves. Often the audio contains silent pauses, where speech is absent; this is where speech activity detection (SAD) finds its application. SAD is a technique used to detect the presence of human speech; it can help reduce the load on human listeners by removing long and noisy non-speech intervals. SAD is language independent and can be of two types, namely Supervised and Unsupervised. Supervised SAD depends largely on the training data, so its use is limited by the availability of training data and the consistency of the test environment, whereas Unsupervised SAD is a feature-based technique whose performance degrades with increasing noise. SAD has applications in a variety of contexts such as speech coding, automatic speech recognition (ASR), speaker and language identification, and speech enhancement. [53]

4.1.11 Acoustic Factor Analysis for Robust Speaker Verification:
Identification or recognition of the speaker by analysing the voice data for authentication is Speaker Recognition or Verification. Mismatch between training and test conditions represents one of the most challenging problems facing researchers in this field today. Some of the sources of these mismatches are: transmission channel differences, handset variability, background noise, and session variability due to physical stress, vocal effort such as whisper, the Lombard effect, non-stationary environments, and spontaneity of speech. In order to enable machines to produce reliable and authentic data, researchers need to train them to eliminate or overcome these mismatches. One of the ways in which this can be achieved is the analysis of the acoustic factors, which are supposed to represent the listener's efficiency in processing directional cues, while suppressing some unwanted channel components. [55]

4.2 Supervised Learning
Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred
function, which can be used for mapping new examples.

4.2.1 E-mail data:
4.2.1.1 Automatic answering of incoming messages:
Instead of typing out the same reply every time someone emails with common queries and problems, machine learning algorithms now analyse those mails and automatically generate a reply. This proves useful in the case of large companies. [1]

4.2.1.2 Automatic mail organization into folders:
With the bulk of messages pouring in daily, it proves highly inconvenient for users to segregate the messages manually. Therefore machine learning proves to be most beneficial by categorizing the mail automatically into various user-defined inbox tabs such as primary, social, promotions, update, forums etc. If a particular message from a particular sender is moved from the update tab to the primary tab, then all future messages from that sender will end up in the primary tab. [1]

4.2.1.3 Email and thread summarization:
The incoming messages are analyzed, and the important sentences are extracted from the email thread and composed into a summary. This summary is generated based on special characteristics of email. [1]

4.2.1.4 Spam filtering:
It is mainly used to filter unsolicited bulk email (UBE), junk mail, or unsolicited commercial email (UCE) from genuine e-mails. The spam filter saves the user from having to wade through tons of spam email; that is also a learning algorithm. The spam filter can also learn by watching which emails the user does or does not flag as spam. So in an email client, if the spam button is clicked to report some emails as spam but not others, then based on which emails are marked as spam, the e-mail program learns better how to filter spam e-mail. [1] [29]

4.2.1.5 Email Batch Detection:
The problem of detecting batches of emails that have been created according to the same template needs to be addressed. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. Senders of spam, phishing, and virus emails avoid mailing multiple identical copies of their messages, because once a message is known to be malicious, all subsequent identical copies of the message could be blocked easily, and without any risk of erroneously blocking regular emails. [27]

4.2.2 Handwriting recognition:
One of the reasons it is so inexpensive today to route a piece of mail across countries is that when an address is written on an envelope, a learning algorithm that has learned how to read handwriting can automatically route this envelope on its way, and so it costs less. [4]

4.2.3 Face recognition:
The human face is not a unique, rigid object, and numerous factors cause the appearance of the face to vary. There are numerous application areas where face recognition can be exploited, such as security measures at an ATM, areas of surveillance, closed circuit cameras, image database investigation, the criminal justice system, and image tagging in social networking sites like Facebook. [5]

4.2.4 Speech recognition:
All speech recognition software utilizes machine learning. Speech recognition systems involve two distinct learning phases: one before the software is shipped (training the general system in a speaker-independent fashion), and a second phase after the user purchases the software (to achieve greater accuracy by training in a speaker-dependent fashion). [3]

4.2.5 Information retrieval:
Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). The user provides an outline of their requirements, perhaps a list of keywords relating to the topic in question, or even an example document. The system searches its database for documents that are related to the user's query, and presents those that are most relevant. The information retrieval process can be divided into four distinct phases: indexing, querying, comparison, and feedback. All phases of information retrieval can be performed manually, but automation has many benefits: larger document collections can be processed more quickly and consistently, and new techniques can be easily implemented and tested. The instant availability of enormous amounts of textual information on the Internet and in digital libraries has provoked a new interest in software agents that act on behalf of users, sifting through what is there to identify documents that may be relevant to users' individual needs. [10]

4.2.6 Operating system:
One of the main purposes of using computers is to get the job done as fast as possible. In such a scenario it is important that applications start and respond quickly, thus reducing the waiting time for the user. Different computer users have different usage preferences, which mostly concern the applications that are used most frequently by the user. This fact can be used by the underlying operating system for predicting the user's application choices and pre-fetching them into local memory for speedy start-up. This is achieved with the help of inbuilt software which trains itself by observing the actions of the user over time and learning from them. The Super-Fetch subsystem present in the kernel of Microsoft's Windows Vista operating system is an example of such a system. [2]
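
As an illustration of software that trains itself by observing the user's actions, the following is a minimal Python sketch. It is not the Super-Fetch algorithm, whose internals are not described here. It assumes a hypothetical `LaunchPredictor` class that simply counts which application tends to be launched after which, and proposes the most frequent followers of the current application as pre-fetch candidates.

```python
from collections import Counter, defaultdict


class LaunchPredictor:
    """Toy frequency model (hypothetical): learns which app tends to follow which."""

    def __init__(self):
        # transitions[prev_app][next_app] = how often next_app was launched right after prev_app
        self.transitions = defaultdict(Counter)
        self.last_app = None

    def observe(self, app):
        """Record one application launch made by the user."""
        if self.last_app is not None:
            self.transitions[self.last_app][app] += 1
        self.last_app = app

    def predict(self, k=2):
        """Return up to k apps most often launched after the current one (pre-fetch candidates)."""
        if self.last_app is None:
            return []
        followers = self.transitions[self.last_app]
        return [app for app, _ in followers.most_common(k)]


predictor = LaunchPredictor()
for app in ["mail", "browser", "editor", "mail", "browser", "mail"]:
    predictor.observe(app)

# The user is currently in "mail"; so far "browser" has always followed it.
print(predictor.predict(k=2))  # ['browser']
```

A real system would presumably also weigh time of day, memory pressure and the cost of a wrong guess; the counting model above only shows how predictions can improve as more user actions are observed.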

Figure 5: Classification accuracy with batch information

4.2.7 Natural language processing or computer vision:
These are the fields of AI pertaining to understanding
language or understanding images. Most of natural language processing and most of computer vision today is applied machine learning.

4.2.8 Intrusion detection:
Intrusion detection is the process of monitoring the events occurring in systems or networks and analyzing them for signs of possible incidents, which are violations or threats to computer security policies, acceptable use policies, or standard security practices. It is mainly of two types based on the intrusions: the first is misuse or signature-based detection and the other is anomaly detection. [11]

4.2.9 Anomaly detection or recognizing anomalies:
This covers detection of unusual sequences of credit card transactions, detection of unusual patterns of sensor readings in a nuclear power plant, or unusual sounds in a car engine. For such purposes a dynamic machine learning method is used where, instead of looking at individual operations, a sequence of operations is analyzed as a whole so that it is more robust to minor shifts in legitimate behavior. [12]

4.2.10 Signature based detection:
This technique of detection looks for evidence which indicates misuse. In a network, predetermined attack patterns form a signature, and these signatures are used to determine further network attacks. Machine learning enables examination of the network traffic against predefined signatures, and each time the database is updated. An example of a signature-based Intrusion Detection System is SNORT. [13]

4.2.11 Epileptic Seizure Detection:
Epilepsy is a central nervous system disorder in which the patient suffers from recurrent seizures that occur at unpredictable times and usually without warning. Seizures can result in a lapse of attention or a whole-body convulsion. Frequent seizures increase an individual's risk of sustaining physical injuries and may even result in death. With the help of supervised learning we can construct patient-specific detectors capable of detecting seizure onsets quickly and with high accuracy. These classifiers detect the onset of an epileptic seizure through analysis of the scalp electroencephalogram (EEG), a non-invasive measure of the brain's electrical activity. [25]

Figure 6: A seizure within the scalp EEG of a patient

4.2.12 Automated Text Categorization:
In document categorization, texts are catalogued; it is a problem in library science, information science and computer science. The task is to assign a document to one or more classes. This may be done algorithmically. The documents to be classified may be texts, images, music, etc. Each kind of document possesses its special classification problems. When not otherwise specified, text classification is implied. Text categorization (TC, a.k.a. text classification or topic spotting), the activity of labelling natural language texts with thematic categories from a predefined set, is one such task. TC is now being applied in many contexts, ranging from document indexing based on a controlled vocabulary, to document filtering, automated metadata generation, word sense disambiguation, etc. According to the machine learning (ML) paradigm, a general inductive process automatically builds an automatic text classifier by learning, from a set of preclassified documents, the characteristics of the categories of interest. [28] [29] [31]

4.2.12.1 Automatic Indexing for Boolean Information Retrieval Systems:
Here each document is assigned one or more key words or key phrases describing its content, where these key words and key phrases belong to a finite set called a controlled dictionary. [28]

4.2.12.2 Document Organization:
Indexing with a controlled vocabulary is an instance of the general problem of document base organization. [28]

4.2.12.3 Text Filtering:
Text filtering is the activity of classifying a stream of incoming documents dispatched in an asynchronous way by an information producer to an information consumer. [28]

4.2.12.4 Word Sense Disambiguation:
Word sense disambiguation (WSD) is the activity of finding, given the occurrence in a text of an ambiguous word, the sense of this particular word occurrence. [28]

4.2.12.5 Hierarchical Categorization of Web Pages:
Automatic classification of Web pages, or sites, under hierarchical catalogues. [28]

4.2.13 Data Center Optimization:
The modern data center (DC) is a complex interaction of multiple mechanical, electrical and controls systems. The sheer number of possible operating configurations and nonlinear interdependencies make it difficult to understand and optimize energy efficiency. One of the most complex challenges is power management. Growing energy costs and environmental responsibility have placed the DC industry under increasing pressure to improve its operational efficiency. The application of machine learning algorithms to existing monitoring data provides an opportunity to significantly improve DC operating efficiency. A typical large-scale DC generates millions of data points across thousands of sensors every day, yet this data is rarely used for applications other than monitoring purposes. Advances in processing power and monitoring capabilities create a large opportunity for machine learning to guide best practice and improve DC efficiency. The objective is to provide a data-driven approach for optimizing DC performance. A neural network is selected as the mathematical framework for training DC energy efficiency models. Neural networks are a class of machine learning algorithms that mimic cognitive behaviour via interactions between artificial neurons. They are advantageous for modelling intricate systems as they search for patterns and interactions between features to automatically generate a best-fit model. As with most learning systems, the model accuracy improves over time as new training data is acquired. [26]
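
The idea of training a neural network on monitoring data can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the model of [26]: the two input features (a scaled IT load and a scaled outside temperature), the synthetic efficiency target, and the hypothetical `TinyNet` class with a single hidden layer are all invented for illustration.

```python
import math
import random


class TinyNet:
    """One-hidden-layer network (tanh units, linear output); illustrative only."""

    def __init__(self, n_in, hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        # Hidden activations, then a linear combination as the prediction.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, out

    def predict(self, x):
        return self._forward(x)[1]

    def mse(self, xs, ys):
        return sum((self.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def train(self, xs, ys, lr=0.05, epochs=200):
        """Stochastic gradient descent on the squared error, one sample at a time."""
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                h, out = self._forward(x)
                err = out - y  # gradient of 0.5 * err**2 with respect to the output
                for j, hj in enumerate(h):
                    # Back-propagate through the tanh unit before touching w2[j].
                    grad_pre = err * self.w2[j] * (1.0 - hj ** 2)
                    self.w2[j] -= lr * err * hj
                    self.b1[j] -= lr * grad_pre
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * grad_pre * xi
                self.b2 -= lr * err


# Synthetic "sensor" data (assumption): features scaled to [-1, 1], made-up nonlinear target.
rng = random.Random(1)
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
ys = [1.2 + 0.3 * temp + 0.2 * load * temp for load, temp in xs]

net = TinyNet(n_in=2)
before = net.mse(xs, ys)
net.train(xs, ys)
after = net.mse(xs, ys)
print(f"MSE before training: {before:.4f}, after: {after:.4f}")
```

The product term in the synthetic target stands in for the nonlinear interdependencies mentioned above; a linear model could not represent it, whereas the hidden layer can approximate it from the data alone.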


4.2.14 SVM and Dimensionality Reduction in Cognitive Radio:
With the passing of time, machine learning is gaining popularity and finding usage in various fields. Cognitive radio is one such field where machine learning finds its application. Cognitive radio is a radio that allows dynamic programming options and is designed to provide more wireless communication by detecting and using the best wireless channels available in a given area. The control of degrees of freedom (DOF), i.e. dimensionality reduction, is considered to be the initial phase for radar and sensing signal processing. SVMs, or Support Vector Machines, are models that use a learning algorithm for developing pattern recognition and classification capabilities. These two approaches, i.e. dimensionality reduction and SVM, can be applied jointly to Cognitive Radio, thus obtaining performance enhancements while classifying wireless signal data. The reason behind this gain in performance is that applying dimensionality reduction helps in doing away with redundant signal data, thus improving classification by the SVM method. [32] [41] [42] [43]

4.2.15 Classification of Software Engineering Artifacts Using Machine Learning:
A large amount of data is produced during the course of development of software projects. The data generated in the process is not only vast in quantity but also varying in the nature of its contents; it can include a range of different types of information such as the deployment details of the software system, component analysis, object and class models etc. Moreover, the interrelations among these information documents provide further insights into the project. It is natural that each of these artifacts has some distinguishing attributes which can be used to categorize the data and hence make them more manageable and put them to constructive use. The problem with this approach is that classifying such a huge and ever-increasing amount of data is no task for humans; this is where machine learning comes in. Machine learning can be used to develop a network which uses the defining properties of the existing artifacts to train itself in the task of classification and then carries on with the task of categorizing the artifacts by itself. [36]

4.2.16 Computational Finance:
In today's world the financial market is one of the most unstable and unpredictable. One has to be on one's feet constantly in order to survive and be successful in this market. In such an environment, where market crashes and sustained periods of loss are common phenomena, techniques of machine learning have emerged as the leading performance measures used in the industry. For example, systems have been developed where future stock price moves can be predicted by training an automated intelligent agent that discovers patterns in the stock price dynamics right before a major market move. During the exploitation stage, the agent observes the current state of the market. If a pattern is recognized that was seen before, the agent gives a buy/sell signal. Examples of commonly considered features include market volatility, total volume and amount of open interest. [38] [44] [46] [47]

Figure 7: Accuracy of long-term prediction

4.2.17 Semantic Scene Classification:
In pattern recognition, scene classification is a very common task in which the system scans a picture, analyzes the various elements in it and uses them to categorize the image into a certain class or group. In this process a situation may often occur in which the classes are not mutually exclusive by definition. For example, semantic scene classification categorizes images into semantic classes such as beaches, sunsets or parties, in which many images may belong to multiple semantic classes. Figure 8 shows an image that falls under both the beach and urban scene categories. This multiple classification, however, is not ambiguous as one would think; the image is fully a member of each class (due to multiplicity). The same can occur in other types of categorisation like text or music categorization, medical diagnosis, etc. Multi-label machine learning provides methods that treat such cases differently and offer a solution to the challenges they pose to the classic pattern recognition paradigm. [39]

Figure 8: Image that falls under both beach and an urban scene categories

4.2.18 Applications to music:
Music is a vast sphere. The amount of data and material available here is huge, and almost every individual has a different taste from others when it comes to music. Naturally the need for classification arises. One can classify music in a number of ways, as there exists an ocean of options to select from when choosing a feature on whose basis the classification is to be done. Musical data is complex and often highly dimensional (when represented as audio), and this is where machines come to our assistance, as machine learning is very well suited for working with such data. Classification is just one of the innumerable tasks that can be executed on a data set such as music using machine learning. Among other tasks we have music genre classification, music transcription, instrument classification, beat detection, blind instrument separation, and capturing musical features such as melody, harmony and rhythm, to name a few. With the digitalization of music, a new and rapidly growing research
area has emerged, called Music Information Retrieval (MIR), which
focuses on the extraction of information from music audio and
musical scores. Music production is another relatively new field of
research in this context. With machine learning we are able to
reduce some of the distance between musician and machine. Neural
networks can be applied both to music audio signals and to MIDI
(Musical Instrument Digital Interface) data. For example, a neural
network can be trained with the user’s music rating history and
time stamps and then used to select songs more suitable to the
user’s activities during the day. [51]

4.2.19 Evolving Signal Processing for Brain-Computer Interfaces (BCI):
A BCI, or Brain-Computer Interface, is a collaboration between a
brain and a device or machine that reads the electrical signals
from the brain and uses them to guide some external activity, like
moving a cursor or a prosthetic limb. The device or machine acts as
an interface between the brain and the object to be controlled by
the brain. Figure 9 shows a conceptual schematic overview of
evolving BCI design principles. Data obtained from sensors and
devices within, on, and around a human subject are transformed into
informative representations via domain-specific signal
pre-processing. The resulting signals are combined to produce
psychomotor state representations. These estimates may be made
available to the systems the subject is interacting with. [52]

Figure 9: Conceptual schematic overview of evolving BCI design
principles

4.2.20 Acoustic Environment Identification (AEI) and Audio Forensics:
An audio recording is prone to a number of possible distortions and
artifacts like acoustic reverberation, background noise, etc. These
disturbances, which are otherwise considered useless and blamed for
degrading the quality of the audio, can be put to advantage and
used to extract important information via various techniques like
Acoustic Environment Identification (AEI), Audio Forensics, and
ballistic settings. Of these, acoustic environment identification
(AEI) has a wide range of applications, from audio recording
integrity authentication to real-time crime
localization/identification, whereas forensic audio enhancement can
be used to reveal subtle or idiosyncratic background sounds that
could provide important investigative clues. [54]

4.3 Recommender system
Recommender systems are a subclass of information-retrieval systems
that seek to predict the 'rating' or 'preference' that a user would
give to an item, which allows the online customer to choose the
best item.

4.3.1 Mobile Learning Environments:
Mobile learning (m-learning) means "learning on the move", which
differs from typical e-learning where there is wastage of bandwidth
[6]. Information can be easily accessed as and when desired thanks
to mobile or portable devices. Machine learning caters to the
learning process of different users by providing information which
is customized to the preferences of the user. [7]

4.3.2 Computational advertisement:
Computational advertisement is a new scientific sub-discipline at
the intersection of large-scale search and text analysis,
information retrieval, statistical modeling, machine learning,
classification, optimization, microeconomics, and recommender
systems. Computational advertisement is almost the exact opposite
of classical advertisement: it has billions of opportunities and
billions of creatives, is totally personalizable, has a tiny cost
per opportunity, and is much more quantifiable. Computational
advertisement intends to find the "best match" between a given user
in a given context and a suitable advertisement. The context could
be a user entering a query in a search engine ("sponsored search"),
a user reading a web page ("content match" and "display ads"), a
user watching a movie on a portable device, and so on. [9]

4.3.3 Sentiment analysis/ opinion analysis:
When we hear a person speak, we hear the words as well as the
emotions in the person’s voice, and if the conversation is face to
face we see their expressions as well. Textual data captures the
facts and information, but it mostly fails to capture the
sentiments of the speaker, leading to misinterpretation of the true
essence of the words. This can be seen as a loss of valuable
information. Hence, sentiment analysis aims to determine the
sentiment or overall opinion expressed towards the subject matter,
for example whether a product review is positive or negative.
However, sentiment analysis can be challenging, and it must keep up
with the ever more complex ways in which statements are used to
express opinions, so learning algorithms prove significantly
beneficial to that effect. Sentiment classification proves helpful
in business intelligence applications, movie reviews and
recommender systems. [15]

4.3.4 Database mining (DM):
With the growth of the web and automation came much larger data
sets than ever before. In such a scenario an important task is to
maintain these data in a way that can prove useful. Effective
algorithms need to be developed that can use this data to learn and
serve the users more efficiently. For example, many Silicon Valley
companies are today collecting web click data, also called
click stream data, and are trying to use machine learning
algorithms to mine this data to understand and serve the users
better. [20] [37] [49]

Figure 10: Comparative results on ML techniques applied to DM tasks
over the period 1995-2004.

4.3.5 Self-customizing programs:
Today we resort to the internet to meet a host of our needs, like
listening to music and watching videos online, downloading songs,
movies, apps, etc., shopping, banking, making reservations, making
travel arrangements and so on. It is a common experience for users
to get recommendations from the sites that they visit based on
their activity history on that site. The sites achieve this by
means of a learning algorithm which learns by observing the user’s
activities and choices over time and customizes itself to the
user’s preferences. Learning algorithms are also being used today
to understand human learning and to understand the brain.

4.4 Reinforcement learning
Reinforcement learning is an area of machine learning inspired by
behaviorist psychology, concerned with how software agents ought to
take actions in an environment so as to maximize some notion of
cumulative reward.

4.4.1 Traffic forecasting service:
With the ever-increasing number of vehicles plying the roads,
traffic management seems to be a huge problem these days. Machines
can be trained and used to solve this problem. For example, there
are systems that overlay predictions about future traffic
conditions on a digital traffic flow map. These systems can also be
used to know the current and future traffic conditions of a region
and to provide users with routing options based on that
information. [57]

4.4.2 Computer games:
The gaming industry has grown tremendously in recent years.
AI-driven agents are used widely to create an interactive gaming
experience for the players. These agents can take a variety of
roles such as the player’s opponents, teammates or other non-player
characters. Apart from interacting with the human players, a game
needs to satisfy a host of other requirements like the audio and
visual effects, the gaming environment, etc. The different fields
of machine learning cater to all these needs and help programmers
develop games that are well suited to present market demands. [8]

4.4.3 Machinery applications:
There are applications that cannot be programmed by hand. For
example, autonomous helicopters, in which the computer can learn by
itself how to fly the helicopter.

4.4.4 Stock market analysis:
The stock market and its trends keep changing day in and day out,
and in order to make a profit and survive in this financial market,
a proper understanding of it and prediction skills are must-haves.
Many lack that insight, and the task is tedious and keeps getting
more difficult with the evolution of the business world; the
obvious solution is computers. Machine learning has been
extensively used for prediction of financial markets. Popular
algorithms, such as support vector machine (SVM) and reinforcement
learning, have been quite effective in tracing the stock market and
maximizing the profit of stock option purchases while keeping the
risk low. Such prediction can also incorporate sentiment analysis,
which considers the opinions of general investors; in addition,
global stock data is included to predict the next-day stock trend.
[14]

4.4.5 Semantic Annotation of Ubiquitous Learning Environments:
In today’s world practical knowledge is fast gaining importance in
almost every field. It not only helps in acquiring practical
skills, which are more helpful in the field, but also gives a
better understanding of the subject to the person studying it.
Moreover, evaluation of skills-based learning systems helps
researchers to better understand how students are learning. The use
of semantic annotations as part of a skills-based learning
environment is very useful in this case. Simulations of real-life
situations help in the promotion of practical skills like decision
making, team working, communication, and problem solving. They can
be incorporated in the process of assessing students’ performance.
The University of Southampton, for example, has such a clinical
skills laboratory (Figure 11). The ward contains computerized and
interactive simulated mannequins, non-computerized mannequins, and
a range of equipment which provides a clinical set-up and
activities for the students. The students are given a number of
tasks to perform, and the computerized mannequins are programmed to
alter their parameters to a point of significant deterioration in
health such that emergency responses would be required from the
students. These activities provoke the students to perform as they
would in a real situation, such as moving around the ward,
interacting with each other and the supervising staff members, etc.
As the students and mentors are "immersed" in the simulation and
behaving "as in real practice," the captured video data can be used
to provide important information about their performance.
Skills-based learning helps ensure that practitioners are "fit for
practice". [56]

Figure 11: Clinical skills laboratory

5. IMPRESSION AND VIEWS
With data sets growing larger with every passing day, the analysis
of these immense amounts of data is beyond the capacity of the
human eye. So artificial agents take up the responsibility of
interacting with the environment and in turn influencing it. The
advent of "Big Data" has also resulted in improvement of machine
learning algorithms, as they have larger data sets from which to
gain experience. The concern is not how big "Big Data" is; it is
more about finding patterns within it.

In machine learning the artificial agent learns from training data
or by interacting with the environment, and influences it to
facilitate the best possible result. So machine learning is
definitely a subfield of Artificial Intelligence. This notion has
made present-day applications autonomous.

In the field of medicine and diagnosis, AI has created virtual
doctors, as shown in Figure-12. Providing the early symptoms to a
machine learning algorithm helps in early detection and diagnosis
of the disease. The ultimate desire is to create a diagnostic dream
machine for this purpose. [35] [50]

Figure-12: Virtual Doctor

In the context of search engines, machine learning not only
provides results on the basis of the search content but also gives
preference to the users’ choices and online activity, which has
resulted in a complete revolution of search engines.

Machine learning can prove immensely helpful in the process of
building an information time machine, as shown in Figure-13. An
information time machine requires a large database of the present
and the past. One of the ways to extrapolate the database of the
past is to digitize the historical archives, in which case machine
learning can prove useful.

Figure-13: Information Time Machine

The best result so far has been the invention of autonomous driving
vehicles making use of machine learning; making the routers in a
network more intelligent and applications in cloud computing are
also big prospects.

In machine learning, supervised and unsupervised learning are the
two major types. AI agents are general problem solvers and can be
applied in various fields.

So, AI is not about perfectly replicating humans; it is about
figuring out the principles that allow agents to act intelligently
and improving upon us. The bottom line is that intelligence is no
longer exclusive to humans.

6. CONCLUSION
Humans have always sought to build a comfortable life; the proof of
this lies in the fact that we have always depended on machines to
get our work done more easily, in a faster and more efficient
manner. In the past machines were used to reduce the manual labor
required to get a job done, but at present, with the advent of
machine learning, humans seek to build machines which are not only
strong but also intelligent, and hence machine learning has emerged
as an area of study that is ever in bloom. Machine learning has not
just made machines autonomous, bringing forward the concept of
autonomous computing, but has also reduced the constant vigilance
users are required to keep over applications. This paper discusses
the four categories of machine learning, i.e. supervised learning,
unsupervised learning, reinforcement learning and recommender
systems, and also presents the numerous applications under them.
Apart from that, two proposed applications, namely the information
time machine and the virtual doctor, have been put forward. The
main purpose of machine learning is to develop algorithms that
assist in the creation of intelligent machines, thus reducing the
work of programmers, as the machine learns over time to improve its
performance. Although a lot of advancements have been made in this
field, there still exist glaring limitations in the data sets from
which machines learn. This can be rectified by constantly keeping
the data sets up to date, as learning is a continuous process.
Apart from this issue, a great number of publications on machine
learning evaluate new algorithms on a handful of isolated benchmark
data sets. In spite of all these shortcomings, machine learning has
solved varying problems of global impact. Machine learning has
proven to be vastly useful in a variety of fields such as data
mining, artificial intelligence, OCR, statistics, computer vision,
mathematical optimization, etc., and its importance continues to
increase. Machine learning theories and algorithms are inspired by
biological learning systems, where performance depends on factors
like the amount of available data, the learning history and
experience, etc., and thus help explain human learning. The
applications of machine learning are therefore never-ending, and it
remains an active field of research with immense development
options and a promising future.

A future challenge is to develop automated emergency prescription
for critical conditions using machine learning concepts, which can
minimize errors in diagnosis.

7. ACKNOWLEDGMENTS
Our thanks to the expert Dr. Susanta Biswas, Kalyani University,
who has advised and encouraged us in this kind of development. Our
special thanks also go to Dr. Somsubhra Gupta, JIS College of
Engineering, for providing all kinds of required resources.

8. REFERENCES
[1] Tzanis, George, et al. "Modern Applications of Machine Learning." Proceedings of the 1st Annual SEERC Doctoral Student Conference–DSC. 2006.
[2] Horvitz, Eric. "Machine learning, reasoning, and intelligence in daily life: Directions and challenges." Proceedings of. Vol. 360. 2006.
[3] Mitchell, Tom Michael. The discipline of machine learning. Carnegie Mellon University, School of Computer Science, Machine Learning Department, 2006.
[4] Ball, Gregory R., and Sargur N. Srihari. "Semi-supervised learning for handwriting recognition." Document Analysis and Recognition, 2009. ICDAR'09. 10th International Conference on. IEEE, 2009.
[5] Valenti, Roberto, et al. "Machine learning techniques for face analysis." Machine Learning Techniques for
Multimedia. Springer Berlin Heidelberg, 2008. 159-187.
[6] Al-Hmouz, Ahmed. "An adaptive framework to provide personalisation for mobile learners." (2012).
[7] Al-Hmouz, Ahmed, Jun Shen, and Jun Yan. "A machine learning based framework for adaptive mobile learning." Advances in Web Based Learning–ICWL 2009. Springer Berlin Heidelberg, 2009. 34-43.
[8] Thore Graepel, "Playing Machines: Machine Learning Applications in Computer Games", ICML 2008 Tutorial - 5 July 2008, Helsinki, Finland.
[9] Broder, Andrei, and Vanja Josifovski. "Introduction to computational advertising." (2010).
[10] Cunningham, Sally Jo, James Littin, and Ian H. Witten. "Applications of machine learning in information retrieval." (1997).
[11] Kaur, Harjinder, Gurpreet Singh, and Jaspreet Minhas. "A Review of Machine Learning based Anomaly Detection Techniques." arXiv preprint arXiv:1307.7286 (2013).
[12] Wiese, Bénard, and Christian Omlin. Credit card transactions, fraud detection, and machine learning: Modelling time with LSTM recurrent neural networks. Springer Berlin Heidelberg, 2009.
[13] Kumar, Vinod, and Dr Om Prakash Sangwan. "Signature Based Intrusion Detection System Using SNORT." International Journal of Computer Applications & Information Technology 1 (2012).
[14] Shen, Shunrong, Haomiao Jiang, and Tongda Zhang. "Stock market forecasting using machine learning algorithms." (2012).
[15] Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up?: sentiment classification using machine learning techniques." Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, 2002.
[16] Liao, Shih-wei, et al. "Machine learning-based prefetch optimization for data center applications." Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. ACM, 2009.
[17] Haider, Peter, Luca Chiarandini, and Ulf Brefeld. "Discriminative clustering for market segmentation." Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2012.
[18] Haider, Peter, Luca Chiarandini, and Ulf Brefeld. "Discriminative clustering for market segmentation." Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2012.
[19] Haykin, Simon, and Zhe Chen. "The cocktail party problem." Neural computation 17.9 (2005): 1875-1902.
[20] Clarke, Bertrand, Ernest Fokoue, and Hao Helen Zhang. Principles and theory for data mining and machine learning. Springer Science & Business Media, 2009.
[21] Kononenko, Igor. "Machine learning for medical diagnosis: history, state of the art and perspective." Artificial Intelligence in medicine 23.1 (2001): 89-109.
[22] Caragea, Cornelia, and Vasant Honavar. "Machine Learning in Computational Biology." Encyclopedia of Database Systems (2009): 1663-1667.
[23] Cho, Sung-Bae, and Hong-Hee Won. "Machine learning in DNA microarray analysis for cancer classification." Proceedings of the First Asia-Pacific bioinformatics conference on Bioinformatics 2003-Volume 19. Australian Computer Society, Inc., 2003.
[24] Wagstaff, Kiri. "Machine learning that matters." arXiv preprint arXiv:1206.4656 (2012).
[25] Shoeb, Ali H., and John V. Guttag. "Application of machine learning to epileptic seizure detection." Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.
[26] Gao, Jim, and Ratnesh Jamidar. "Machine Learning Applications for Data Center Optimization." Google White Paper (2014).
[27] Haider, Peter, Ulf Brefeld, and Tobias Scheffer. "Supervised clustering of streaming data for email batch detection." Proceedings of the 24th international conference on Machine learning. ACM, 2007.
[28] Sebastiani, Fabrizio. "Machine learning in automated text categorization." ACM computing surveys (CSUR) 34.1 (2002): 1-47.
[29] Bratko, Andrej, et al. "Spam filtering using statistical data compression models." The Journal of Machine Learning Research 7 (2006): 2673-2698.
[30] Xiong, Liang, et al. "Anomaly detection for astronomical data." (2010).
[31] Guyon, Isabelle, and André Elisseeff. "An introduction to variable and feature selection." The Journal of Machine Learning Research 3 (2003): 1157-1182.
[32] Hou, Shujie, et al. "SVM and Dimensionality Reduction in Cognitive Radio with Experimental Validation." arXiv preprint arXiv:1106.2325 (2011).
[33] Hwang, Kyu-Baek, et al. "Applying machine learning techniques to analysis of gene expression data: cancer diagnosis." Methods of Microarray Data Analysis. Springer US, 2002. 167-182.
[34] Luca Silvestrin, "Machine Learning in Biology", Universita degli studi di Padova.
[35] Magoulas, George D., and Andriana Prentza. "Machine learning in medical applications." Machine Learning and its applications. Springer Berlin Heidelberg, 2001. 300-307.
[36] Bruegge, Bernd, et al. "Classification of Software Engineering Artifacts Using Machine Learning."
[37] Shhab, Areej, Gongde Guo, and Daniel Neagu. "A Study on Applications of Machine Learning Techniques in Data Mining." Proc. of the 22nd BNCOD workshop on Data Mining and Knowledge Discovery in Databases, Sunderland, UK. 2005.
[38] Boyarshinov, Victor. Machine learning in computational finance. Diss. Rensselaer Polytechnic Institute, 2005.
[39] Shen, Xipeng, et al. "Multilabel machine learning and its application to semantic scene classification." Electronic Imaging 2004. International Society for
Optics and Photonics, 2003.
[40] Zararsiz, Gokmen, Ferhan Elmali, and Ahmet Ozturk. "Bagging Support Vector Machines for Leukemia Classification." development 1 (2012): 2.
[41] Tsagkaris, Kostas, Apostolos Katidiotis, and Panagiotis Demestichas. "Neural network-based learning schemes for cognitive radio systems." Computer Communications 31.14 (2008): 3394-3404.
[42] Tabaković, Željko. "A Survey of Cognitive Radio Systems." Post and Electronic Communications Agency, Jurišićeva 13.
[43] Hosey, Neil, et al. "Q-Learning for Cognitive Radios." Proceedings of the China-Ireland Information and Communications Technology Conference (CIICT 2009). ISBN 9780901519672. National University of Ireland Maynooth, 2009.
[44] Pawar, Prashant. Machine Learning applications in financial markets. Diss. Indian Institute of Technology, Bombay Mumbai.
[45] Tarca, Adi L., et al. "Machine learning and its applications to biology." PLoS computational biology 3.6 (2007): e116.
[46] Prof. Stéphan Clémençon, "A Machine-Learning View of Quantitative Finance", Institut Mines Telecom LTCI UMR Telecom Paris Tech.
[47] Shen, Shunrong, Haomiao Jiang, and Tongda Zhang. "Stock market forecasting using machine learning algorithms." (2012).
[48] Wang, Yu, et al. "Gene selection from microarray data for cancer classification - a machine learning approach." Computational biology and chemistry 29.1 (2005): 37-46.
[49] Prof. Pier Luca Lanzi, Laurea in Ingegneria Informatica, Politecnico di Milano, Polo di Milano Leonardo, "Machine Learning, Data Mining, and Knowledge Discovery: An Introduction"
[50] Sajda, Paul. "Machine learning for detection and diagnosis of disease." Annu. Rev. Biomed. Eng. 8 (2006): 537-565.
[51] Øland, Anders. "Machine Learning and its Applications to Music." (2011).
[52] Makeig, S.; Kothe, C.; Mullen, T.; Shamlo, N. B.; Zhang, Z. & Kreutz-Delgado, K. (2012), 'Evolving Signal Processing for Brain-Computer Interfaces.', Proceedings of the IEEE 100 (Centennial-Issue), 1567-1584.
[53] Seyed Omid Sadjadi and John H. L. Hansen, "Unsupervised Speech Activity Detection Using Voicing Measures and Perceptual Spectral Flux", IEEE signal processing letters, March 2013.
[54] Hafiz Malik, "Acoustic Environment Identification and Its Applications to Audio Forensics". IEEE Transactions on Information Forensics and Security, Vol. 8, No. 11, November 2013.
[55] "Acoustic Factor Analysis For Robust Speaker Verification", IEEE Transactions On Audio, Speech, And Language Processing, Vol. 21, No. 4, April 2013.
[56] Mark J. Weal, Danius T. Michaelides, Kevin Page, David C. De Roure, Eloise Monger, and Mary Gobbi, "Semantic Annotation of Ubiquitous Learning Environments", IEEE Transactions On Learning Technologies, Vol. 5, No. 2, April-June 2012.
[57] EJ Horvitz, J Apacible, R Sarin, L Liao - arXiv preprint arXiv:1207.1352, 2012 - arxiv.org
[58] https://www.coursera.org/learn/recommender-systems

9. AUTHOR’S PROFILE
Sumit Das is presently working as an Asst. Professor in the
Department of Information Technology, JIS College of Engineering,
West Bengal, India. He completed his M.Tech degree in Computer
Science and Engineering from University of Kalyani in the year
2008.

Akash Pal is a 4th-year student in the Computer Science &
Engineering Department at JIS College of Engineering, West Bengal,
India.

Aritra Dey is a 3rd-year student in the Electrical Engineering
Department at JIS College of Engineering, West Bengal, India.

Nabamita Roy is a 4th-year student in the Computer Science &
Engineering Department at JIS College of Engineering, West Bengal,
India.
