TensorFlow is Google's open-source library for machine learning. In simple words, it is a library for numerical computation that uses dataflow graphs: the nodes of the graph are operations, while the edges are tensors. As a reminder, tensors are multidimensional arrays that "flow" through the TensorFlow graph. After the computational graph is created, it is executed inside a session, which can run across multiple CPUs or GPUs, distributed or not.
Simple example
As a "hello world", let's build a graph that simply multiplies two numbers. Notice the sections of the code:
1. Import tensorflow library
2. Build the graph
3. Create a session
4. Run the session
Also notice that in this example we feed the graph fixed values, so by itself it is not very useful in real life.
Exchanging data
TensorFlow allows exchanging data with the graph through "placeholders". Placeholders are assigned values when we ask the session to run; think of them as a way to send data into the graph at the moment you call "session.run".
# Import tensorflow
import tensorflow as tf
# In TensorFlow 2.x, graph mode requires disabling eager execution first
tf.compat.v1.disable_eager_execution()

# Build the graph: two placeholders multiplied together
a = tf.compat.v1.placeholder(tf.float32)
b = tf.compat.v1.placeholder(tf.float32)
y = tf.multiply(a, b)

# Create a session to execute the graph
session = tf.compat.v1.Session()

# Feed the values 3 and 4 into the placeholders a and b
print(session.run(y, feed_dict={a: 3, b: 4}))
TensorFlow can use data to learn patterns and behaviour from large datasets and power various analysis models. The following are some example applications of TensorFlow:
Health care
The health care industry can use TensorFlow and AI imaging technologies to increase the speed and accuracy of interpretation of medical images. DermAssist, for example, is a free mobile application that lets users take pictures of their skin and identify potential health complications. Automated billing and cost-estimation tools for hospitals are another area where TensorFlow can be helpful.
Education
Virtual learning platforms can use TensorFlow to filter out inappropriate chat
messages in classrooms. An online learning platform can use TensorFlow to create a
customised curriculum for each student. It can also help evaluate assessments and
grade students at scale.
Social media
Social media platforms can use TensorFlow to rank posts by relevance to a user and order their feed accordingly. Sentiment analysis with TensorFlow can help organisations monitor conversations about their products and services and optimise their social media strategy to manage the brand's public image. Photo-storage apps can use computer vision to recognise the objects and people in photographs, automatically group similar photos, and enable advanced searches.
Search engine
Search engines use natural language processing (NLP) to understand the content of a webpage and decide its relevance to a search term. TensorFlow can also help analyse enormous amounts of user-behaviour data for use as ranking signals. Search engines can use TensorFlow's machine learning capabilities for pattern detection, which helps identify spam and duplicate content.
Retail
Using AI and TensorFlow machine learning can help a retail business forecast how many goods it will need on a particular day in response to consumer demand. E-commerce platforms can use TensorFlow to understand customer preferences and provide personalised recommendations. A company selling spectacles can use TensorFlow to create an augmented-reality experience that lets customers try various spectacles on their faces.
TensorFlow Pipeline
A pipeline takes in raw data, transforms it through a series of steps, and yields outputs. The lifeblood of any model is data. However, when ML is used in real-world applications, the raw information we get from the real world is often not ready to be fed into the ML algorithm. The biggest problem is that this source data can come in vastly different formats, so we often have to write far more lines of code to fetch and slice the data than to build the model itself.
General Problem: A machine learning algorithm usually takes clean (and often tabular) data and learns some pattern in it to make predictions on new data. So a lot of data preprocessing needs to be done to create input data for the ML algorithm or to load data onto a GPU for training. Similarly, the output of the ML algorithm by itself is just some number in software, which needs to be processed before it can drive some action in the real world.
Data pipelines work on the principle of ETL, which stands for extract, transform, load.
1. Extract: With this principle, tfds and tf.data can not only extract data from simple sources, such as in-memory arrays, but can also create datasets from large collections that are composed of multiple fragments and distributed across cloud storage systems. We can also use TensorFlow's built-in datasets via tfds.load.
2. Transform: tf.data can then transform the extracted data, for example by shuffling, batching, mapping over different features and performing augmentation on our data. It can also convert data to tensors when necessary to get it ready for training. (When writing a TensorFlow program, the main object we manipulate and pass around is the tf.Tensor.)
3. Load: After transforming our data, we can load the dataset onto the appropriate device, such as a GPU or TPU, with a set of tools in TensorFlow for training. How we load the data can have a significant impact on the performance of the training pipeline.
So rather than writing tons of lines of code to manage comma-separated files, space-separated files and binary file formats, tf.data helps put the data into a standardized format, which makes it much easier to manage and to share with others.
Following the basic steps of data pipelines, here is the code:
import tensorflow as tf

# Extract: read files in TFRecord format ...
dataset = tf.data.TFRecordDataset(file_name)
# ... or create a dataset from in-memory arrays:
# dataset = tf.data.Dataset.from_tensor_slices(arrays)

# Transform: shuffle and batch the data
dataset = dataset.shuffle(buffer_size = X)
dataset = dataset.batch(batch_size = Y)
So in the previous point we loaded our data from a storage system. To train a model on that data, we want acquiring, formatting and pipelining the data to be as simple as possible. With TensorFlow's built-in datasets, we can do away with most of these problems.
TensorFlow Datasets (tfds) provides over 50 built-in datasets behind a consistent API.
Code :
import tensorflow as tf
import tensorflow_datasets as tfds

# List the available dataset builders
print(tfds.list_builders())

# Extract: load a built-in dataset (e.g. MNIST)
dataset = tfds.load('mnist', split='train')

# Transform
dataset = dataset.shuffle(100)

# Load: iterate over the dataset during training
So, here is the basic idea behind the data input pipeline in TensorFlow 2.0.
UNIT I
INTRODUCTION TO MACHINE LEARNING (ML):
Machine Learning is a subset of artificial intelligence that focuses on developing
algorithms and models that enable computers to learn from data without being explicitly
programmed. The goal of machine learning is to allow computers to identify patterns, make
predictions, and solve complex problems by adapting their behavior based on the data they
receive.
Prediction/Inference: After successful training and evaluation, the model can be used to
make predictions or infer outcomes for new, unseen data.
There are various ways to classify machine learning problems. Here, we discuss the
most obvious ones.
1. On the basis of the nature of the learning "signal" or "feedback" available to a learning system
Supervised learning: The model or algorithm is presented with example inputs and their
desired outputs and then finds patterns and connections between the input and the output.
The goal is to learn a general rule that maps inputs to outputs. The training process
continues until the model achieves the desired level of accuracy on the training data. Some
real-life examples are:
Image Classification: You train with images/labels. Then in the future, you
give a new image expecting that the computer will recognize the new object.
Market Prediction/Regression: You train the computer with historical market
data and ask the computer to predict the new price in the future.
Unsupervised learning: No labels are given to the learning algorithm, leaving it on its
own to find structure in its input. It is used for clustering populations in different groups.
Unsupervised learning can be a goal in itself (discovering hidden patterns in data).
Clustering: You ask the computer to separate similar data into clusters; this is essential in research and science.
High-Dimension Visualization: Use the computer to help us visualize high-
dimension data.
Generative Models: After a model captures the probability distribution of your
input data, it will be able to generate more data. This can be very useful to make
your classifier more robust.
A simple diagram that clears the concept of supervised and unsupervised learning is shown
below:
As you can see clearly, the data in supervised learning is labelled, whereas data in
unsupervised learning is unlabelled.
Semi-supervised learning: Problems where you have a large amount of input data and
only some of the data is labelled, are called semi-supervised learning problems. These
problems sit in between both supervised and unsupervised learning. For example, a photo
archive where only some of the images are labelled, (e.g. dog, cat, person) and the majority
are unlabeled.
Reinforcement learning: A computer program interacts with a dynamic environment in
which it must perform a certain goal (such as driving a vehicle or playing a game against
an opponent). The program is provided feedback in terms of rewards and punishments as it
navigates its problem space.
2. On the basis of the output desired from a machine-learned system
Classification: Inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one of these classes, i.e. predicts whether or not something belongs to a particular class. This is typically
tackled in a supervised way. Classification models can be categorized in two groups:
Binary classification and Multiclass Classification. Spam filtering is an example of binary
classification, where the inputs are email (or other) messages and the classes are “spam”
and “not spam”.
Regression: It is also a supervised learning problem, that predicts a numeric value and
outputs are continuous rather than discrete. For example, predicting stock prices using
historical data.
Commonly used machine learning algorithms include Naive Bayes, KNN (K nearest neighbours), K-Means, Random Forest, etc. Note: all these algorithms will be covered in upcoming sections.
• 1945: Vannevar Bush, in "As We May Think" (published in The Atlantic), proposed a system which amplifies people's own knowledge and understanding. Bush's memex was based on what was thought, at the time, to be the advanced technology of the future: ultra-high-resolution microfilm reels, coupled to multiple screen viewers and cameras by electromechanical controls. Through this machine, Bush hoped to transform an information explosion into a knowledge explosion.
• 1948: John von Neumann suggested that machines can do anything that people are able to do.
• 1950: Alan Turing (1912-1954) asked "Can machines think?" in "Computing Machinery and Intelligence" and proposed the famous Turing test. The test is carried out as an imitation game: on one side of a computer screen sits a human judge, whose job is to chat to an unknown interlocutor on the other side. Most of those interlocutors will be humans; one will be a chatbot whose purpose is to trick the judge into thinking that it is the real human.
• 1956 John McCarthy coined the term artificial intelligence.
• 1959: Arthur Samuel, the American pioneer in the field of computer gaming and artificial intelligence, defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". The Samuel Checkers-playing Program appears to be the world's first self-learning program, and as such a very early demonstration of the fundamental concept of artificial intelligence (AI).
However, an increasing emphasis on the logical, knowledge-based approach caused a rift between AI and machine learning. Probabilistic systems were plagued by theoretical and practical problems of data acquisition and representation, which were unsolvable given the small hardware memory and slow computers of the time. By 1980, expert systems had come to dominate AI, and statistics was out of favour. Expert systems use the idea that "intelligent systems derive their power from the knowledge they possess rather than from the specific formalisms and inference schemes". Work on symbolic learning did continue within AI, leading to inductive logic programming. Neural network research had been abandoned by AI and computer science around the same time. Its main success came in the mid-1980s with the reinvention of the backpropagation algorithm, made practical by the increasing speed and memory of computers.
Classically, artificial intelligence comprises several domains:
- knowledge representation;
- automated reasoning, to use the stored information to answer questions and to draw new conclusions;
- machine learning, to adapt to new circumstances and to detect and extrapolate patterns;
- computer vision, to perceive objects;
- robotics.
All the domains listed above, except knowledge representation and robotics, are now considered domains of machine learning. Pattern detection and recognition were and still are considered the domain of data mining, but they are becoming more and more part of machine learning. Thus AI = knowledge representation + ML + robotics.
• Representation learning, a new term for knowledge representation but with a different flavour, is a part of ML.
• Robotics = ML + hardware.
Why did such a move from artificial intelligence to machine learning happen? The answer is that we are now able to formalize most concepts and problems of artificial intelligence in mathematical language, and to represent and unify them in such a way that we can apply mathematical methods to solve many of them as algorithms that machines are able to perform.
Machine Learning (ML) is a subfield of artificial intelligence (AI) that focuses on
developing algorithms and statistical models that enable computers to learn from data and
make predictions or decisions without being explicitly programmed to do so.
The history of ML can be traced back to the mid-20th century, and it has since grown
into a prominent area of research and practical applications.
Dartmouth Workshop (1956): The term "Artificial Intelligence" was first coined during the
Dartmouth Workshop, where researchers discussed the idea of building machines that could
simulate human intelligence, including the concept of machine learning.
The Reemergence of Neural Networks (1980s-1990s): Neural networks, which had been
relatively dormant since the 1950s, saw a resurgence during this period. Researchers like Geoff
Hinton, Yann LeCun, and others contributed to the development of backpropagation
algorithms, making it possible to train deeper neural networks. However, their full potential
was limited due to hardware constraints.
Support Vector Machines (SVMs) and Kernel Methods (1990s-2000s): SVMs and kernel
methods gained popularity during this time for their ability to perform classification tasks and
deal with non-linear data. They provided an alternative to neural networks and were widely
used in various applications.
Big Data and the Rise of Deep Learning (2000s-Present): The advent of big data and the
increase in computational power allowed for more extensive training of deep neural networks.
Deep learning, which involves training neural networks with multiple hidden layers, has
revolutionized ML in areas like computer vision, natural language processing, and speech
recognition.
Democratization and Application Growth (2010s-Present): ML has become more
accessible, thanks to the availability of open-source ML libraries like TensorFlow and
PyTorch. This accessibility has led to its widespread adoption across various industries,
including healthcare, finance, marketing, and more.
Advancements in Reinforcement Learning and Transfer Learning: In recent years, there
have been significant advancements in reinforcement learning, where agents learn to take
actions based on feedback from their environment. Transfer learning has also become popular,
allowing models to leverage knowledge learned from one task to improve performance on
another.
Machine learning continues to evolve rapidly, driven by ongoing research,
advancements in hardware, and the increasing volume of data available for training. As a
result, it is expected to remain at the forefront of technology and innovation in the coming
years.
Machine Learning (ML) faces various challenges and encompasses a wide range of
techniques to address those challenges. Some common problems and corresponding techniques
in ML include:
1. Overfitting and Underfitting: Overfitting occurs when a model performs well on the training data but poorly on unseen data, while underfitting happens when a model is too simplistic to capture the underlying patterns in the data. Both problems are common when building machine learning models and stem from the model's inability to generalize well to unseen data, but they manifest in different ways.
Overfitting: Overfitting occurs when a model learns to fit the training data too closely,
capturing noise and random fluctuations instead of learning the underlying patterns. As a
result, the model performs very well on the training data but fails to generalize to new, unseen
data.
Signs of overfitting:
The model has high accuracy or low error on the training data.
The model's performance significantly drops on the test/validation data compared to the
training data.
The model exhibits high variance in predictions, causing it to be sensitive to small
changes in the input data.
Causes of overfitting:
Using an overly complex model that can memorize the training data.
Inadequate or noisy data that contains outliers or errors.
Insufficient regularization, allowing the model to become too flexible.
Techniques to mitigate overfitting (regularization and early stopping are sketched in code after this list):
Regularization: Adding a penalty term to the model's objective function that
discourages complex solutions.
Cross-validation: Evaluating the model on different subsets of the data to assess
generalization performance.
Early stopping: Stopping the training process when the model's performance on the
validation data starts to degrade.
Feature selection/reduction: Selecting or transforming relevant features to reduce
model complexity.
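As a concrete illustration of the first and third techniques above, here is a minimal Keras sketch. The data is synthetic, and the layer sizes and penalty strength are illustrative assumptions, not recommendations:
Python code
import numpy as np
import tensorflow as tf

# Synthetic data for illustration only
x_train = np.random.rand(800, 20); y_train = np.random.randint(0, 2, 800)
x_val = np.random.rand(200, 20); y_val = np.random.randint(0, 2, 200)

model = tf.keras.Sequential([
    # L2 regularization penalizes large weights, discouraging overly complex fits
    tf.keras.layers.Dense(64, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Early stopping halts training when validation loss stops improving
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5,
                                              restore_best_weights=True)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=100, callbacks=[early_stop])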
Underfitting: Underfitting occurs when a model is too simple to capture the underlying
patterns in the data. It fails to learn from the training data and performs poorly both on the
training data and new, unseen data.
Signs of underfitting:
The model has low accuracy or high error on both the training data and the
test/validation data.
The model's performance is significantly worse than expected given the complexity of
the problem.
Causes of underfitting:
Using an overly simplistic model that cannot capture the data's complexity.
Insufficient training or convergence issues that prevent the model from learning
effectively.
Insufficient or irrelevant features, leading to an inadequate representation of the
data.
Techniques to mitigate underfitting:
Use a more complex model: Consider using a more sophisticated algorithm or
increasing the model's capacity.
Feature engineering: Introduce additional relevant features or transformations to
improve the model's representation of the data.
Adjust hyperparameters: Increase the number of training iterations or adjust learning
rates to improve convergence.
Balancing between overfitting and underfitting is crucial to building a well-performing ML model. Regularization techniques and cross-validation (see the sketch below) are often employed to strike the right balance and ensure the model generalizes well to new, unseen data.
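To make the cross-validation idea concrete, here is a small sketch using scikit-learn's KFold; the data and model are placeholders chosen purely for illustration:
Python code
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

# Synthetic data for illustration only
X = np.random.rand(200, 5)
y = np.random.randint(0, 2, 200)

# 5-fold cross-validation: train on four folds, evaluate on the held-out fold
scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

print("Mean cross-validation accuracy:", np.mean(scores))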
2. Bias and Fairness: ML models can inherit biases present in the training data, leading to unfair or discriminatory decisions.
Bias and fairness are critical considerations in machine learning that address the
potential discriminatory or unfair impacts of ML models on certain groups or individuals. ML
models can inadvertently learn and perpetuate biases present in the training data, leading to
unfair decisions or outcomes. Understanding and mitigating bias is essential to ensure the
ethical and equitable use of AI systems.
Bias in ML: Bias in ML refers to the systematic errors or inaccuracies that arise from
the algorithms' assumptions or the data used for training. These biases can lead to unfair or
discriminatory predictions or decisions, affecting certain groups more than others.
Types of Bias:
Data Bias: Bias present in the training data, which can arise due to historical
discrimination or imbalances in data collection.
Algorithmic Bias: Bias introduced by the design and structure of the ML algorithm,
which may favor certain groups over others.
Societal Bias: Bias that reflects societal norms, stereotypes, or prejudices present in the
training data.
Fairness in ML: Fairness in ML refers to the objective of ensuring that ML models do
not disproportionately favor or disfavor particular individuals or groups based on sensitive
attributes (e.g., race, gender, age) or protected characteristics. Fairness aims to prevent
discrimination and ensure equitable treatment for all individuals in the decision-making
process.
Types of Fairness:
Individual Fairness: Treating similar individuals similarly, ensuring that similar cases
receive similar predictions or outcomes.
Group Fairness: Ensuring that different groups receive fair treatment and are not
unfairly favored or disadvantaged.
Statistical Parity: Ensuring that prediction outcomes have similar statistical
distributions across different groups.
Equalized Odds: The true positive rates and false positive rates are equal across
different groups.
Addressing Bias and Promoting Fairness:
Data Preprocessing: Carefully curating and cleaning training data to reduce bias. This
may involve removing or anonymizing sensitive attributes, balancing data representation
across groups, and addressing data collection issues.
Algorithmic Fairness: Developing ML algorithms that are designed to be fair and
unbiased. This could include modifying the objective function to explicitly account for fairness
constraints or using techniques like adversarial learning.
Post hoc Analysis: Evaluating the model for fairness after training to identify any
biased behavior. Various fairness metrics can be used to assess the model's performance across
different groups.
Fairness-aware Learning: Incorporating fairness constraints directly into the learning
process to promote fairness during model training.
Techniques: Fairness-aware learning, bias detection and mitigation, and using diverse
and representative datasets.
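To illustrate post hoc fairness analysis, the sketch below computes two of the metrics defined above on hypothetical binary predictions; the arrays are randomly generated stand-ins for real model outputs and a real sensitive attribute:
Python code
import numpy as np

# Hypothetical labels, predictions and a binary sensitive attribute
y_true = np.random.randint(0, 2, 1000)
y_pred = np.random.randint(0, 2, 1000)
group = np.random.randint(0, 2, 1000)

# Statistical parity: compare positive-prediction rates across groups
parity_gap = y_pred[group == 1].mean() - y_pred[group == 0].mean()
print("Statistical parity difference:", parity_gap)

# Equalized odds: compare true positive rates (and, analogously, FPRs) across groups
for g in (0, 1):
    mask = group == g
    tpr = y_pred[mask][y_true[mask] == 1].mean()
    print(f"Group {g} true positive rate:", tpr)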
3. Curse of Dimensionality: As the number of features or dimensions in the data increases, the data becomes sparse, making it challenging for algorithms to learn patterns effectively. The Curse of Dimensionality describes the phenomenon where the performance of certain algorithms deteriorates significantly as the number of dimensions grows: the data becomes more sparse, and algorithms struggle to learn and make accurate predictions. It can lead to various issues in ML problems, including increased computational complexity, reduced generalization, and overfitting.
Here are some key aspects of the Curse of Dimensionality in ML:
Sparsity of Data: As the number of dimensions increases, the volume of the feature
space grows exponentially. In high-dimensional spaces, data points become sparse, meaning
there are fewer data points per unit volume. Consequently, algorithms have fewer samples to
learn from, making it harder to find meaningful patterns.
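This concentration effect is easy to demonstrate numerically. The short sketch below (an illustration, not part of the original text) draws random points and shows that, as dimensionality grows, the nearest point is barely closer than the farthest:
Python code
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    points = rng.random((500, d))
    # Distances from the first point to all the others
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    # The ratio approaches 1 as d grows: distances concentrate
    print(f"d={d}: min/max distance ratio = {dists.min() / dists.max():.3f}")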
Feature Engineering: Creating new features that capture important patterns in the data
can improve model performance, even with a reduced number of dimensions.
Data Augmentation: In some cases, generating additional synthetic data points can
help alleviate sparsity and improve model performance.
By carefully managing dimensionality and using appropriate techniques, practitioners
can overcome the Curse of Dimensionality and build effective machine learning models that
generalize well to new data and avoid overfitting.
4. Imbalanced Datasets: When the number of instances for different classes is heavily skewed, models may struggle to accurately predict the minority class. Imbalanced datasets are a common challenge in machine learning: one class has a much larger number of samples than the other(s), which can lead to biased model training and inaccurate predictions, especially for the minority class. This issue is prevalent in real-world scenarios such as fraud detection, medical diagnosis, and anomaly detection, where positive cases are rare compared to negative cases.
Problems Caused by Imbalanced Datasets:
Bias Towards the Majority Class: ML algorithms can be biased towards the majority class since they are trained to optimize overall accuracy. As a result, the model may underrepresent the minority class, leading to poor performance on positive instances.
Techniques to handle imbalanced datasets:
Class Weighting: Assigning higher weights to the minority class during model training can help the algorithm pay more attention to the underrepresented class (see the sketch after this list).
Ensemble Methods: Using ensemble techniques like Random Forest and AdaBoost,
which can handle imbalanced data more effectively than single models.
Anomaly Detection: For highly imbalanced datasets where the minority class
represents anomalies or rare events, specialized anomaly detection algorithms may be more
appropriate.
Data Augmentation: Introducing small variations to existing data points can help
increase the size of the minority class.
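As promised above, here is a minimal class-weighting sketch in Keras; the data is synthetic, and the inverse-frequency weighting scheme is one common choice, not the only one:
Python code
import numpy as np
import tensorflow as tf

# Synthetic imbalanced data: roughly 5% positive class
x = np.random.rand(2000, 10).astype('float32')
y = (np.random.rand(2000) < 0.05).astype('float32')

# Weight each class inversely to its frequency
n_pos = y.sum(); n_neg = len(y) - n_pos
class_weight = {0: len(y) / (2 * n_neg), 1: len(y) / (2 * n_pos)}

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# class_weight makes errors on the rare class cost more in the loss
model.fit(x, y, epochs=5, class_weight=class_weight)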
5. Interpretability: Complex ML models can behave as black boxes, making their decisions hard to explain.
Simpler Models: Opting for simpler, more interpretable models, such as linear regression or decision trees, can provide transparency in decision-making.
Why R?
R is a programming and statistical language.
R is used for data analysis and visualization.
R is simple and easy to learn, read and write.
R is an example of a FLOSS (Free Libre and Open Source Software) where one can freely
distribute copies of this software, read its source code, modify it, etc.
Who uses R?
• The Consumer Financial Protection Bureau uses R for data analysis
• Statisticians at John Deere use R for time series modeling and geospatial analysis in a
reliable and reproducible way.
LISP (LISt Processing) is one of the oldest programming languages used in artificial intelligence. Its programs and data share the same structure, representing data in a list format. LISP's primary focus is on symbolic computing and manipulation of symbolic expressions. Some key features of LISP include:
Functional Programming: LISP is a functional programming language, meaning it treats
computation as the evaluation of mathematical functions and avoids changing state or mutable
data.
Recursion: LISP encourages the use of recursion, allowing elegant solutions to problems that
involve repetitive tasks.
Dynamic Typing: LISP is dynamically typed, meaning data types are determined at runtime,
providing flexibility but requiring careful error handling.
Garbage Collection: LISP employs automatic memory management through garbage
collection, which helps manage memory resources.
LISP has been widely used in various areas, including artificial intelligence (AI) and natural
language processing (NLP). It played a pivotal role in the early development of AI, and many
AI applications and research projects were written in LISP. Even today, LISP variants such as
Common Lisp and Clojure continue to be used in specific domains.
Python in AI:
Machine Learning and Deep Learning: Python is the go-to language for machine learning
and deep learning tasks. Libraries like TensorFlow, Keras, PyTorch, and Scikit-learn offer
extensive support for building and training AI models.
Natural Language Processing (NLP): Python's rich ecosystem of NLP libraries, such as
NLTK (Natural Language Toolkit) and spaCy, make it ideal for processing and analyzing text
data.
Data Manipulation and Analysis: Python's data manipulation libraries like Pandas, NumPy,
and SciPy are widely used for data cleaning, preprocessing, and analysis in AI projects.
Computer Vision: Python, along with libraries like OpenCV, is commonly used for computer
vision tasks such as image and video processing, object detection, and facial recognition.
Reinforcement Learning: Python has gained popularity in the field of reinforcement learning,
with libraries like OpenAI Gym providing environments for training RL agents.
Java in AI:
Big Data Processing: Java is commonly used for big data processing and analytics in AI
applications. It is well-suited for distributed systems like Apache Hadoop and Apache Spark.
Rule-Based Systems: Java can be used for building rule-based expert systems where decision-
making is based on a set of logical rules.
Web Applications and Backend Services: Java's robustness and scalability make it an
excellent choice for developing AI-powered web applications and backend services.
Internet of Things (IoT): Java is used in IoT applications for device management, data
processing, and edge computing.
Natural Language Processing: While Python is more dominant in NLP, Java also has some
NLP libraries like Stanford NLP and Apache OpenNLP.
C++ is a powerful and versatile programming language that builds upon the features of the C programming language. It supports both procedural and object-oriented programming paradigms, making it suitable for a wide range of applications.
Here are some basic concepts of C++:
Syntax and Structure: C++ syntax is similar to that of C, but it adds more features and
constructs. A C++ program is structured with functions, and a typical program includes a main
function where the program execution starts.
Data Types: C++ provides built-in data types like int, float, double, char, and more. It
also supports user-defined data types through classes and structures.
Variables: Variables are used to store data. They must be declared before use,
specifying the data type. Variables can be assigned values and manipulated in various ways.
Operators: C++ supports various operators for arithmetic, logical, comparison,
assignment, and more. Examples include +, -, *, /, %, ==, !=, &&, ||, etc.
Control Structures: C++ offers control structures like if, else, while, for, switch, and
do-while to manage program flow based on conditions.
Functions: Functions are blocks of code that perform a specific task. C++ allows you
to define your own functions, pass parameters, and return values.
Classes and Objects: C++ is object-oriented, allowing you to define classes that
encapsulate data and behavior. Objects are instances of classes, representing real-world
entities.
Inheritance: Inheritance enables creating new classes (derived classes) based on
existing classes (base classes). Derived classes inherit properties and behaviors from their base
classes.
Polymorphism: Polymorphism allows objects of different classes to be treated as
objects of a common base class. It's achieved through function overloading and virtual
functions.
Encapsulation: Encapsulation is the concept of bundling data and methods that operate
on the data within a single unit (class), hiding the internal details from the outside world.
Abstraction: Abstraction focuses on providing essential features while hiding complex
implementation details. Classes serve as abstractions by exposing only relevant information.
Pointers and References: C++ supports pointers, which hold memory addresses, and
references, which provide a way to work with existing variables without copying their data.
Memory Management: C++ allows manual memory management using new and
delete operators for dynamic memory allocation. It's important to manage memory to avoid
memory leaks and undefined behavior.
Templates: Templates allow creating generic functions and classes that can work with
different data types without duplicating code.
Standard Template Library (STL): The STL provides a collection of pre-built classes
and functions for common data structures (like vectors, maps, and queues) and algorithms.
These are just some of the fundamental concepts in C++. As you delve deeper, you'll
encounter more advanced topics like exception handling, file I/O, smart pointers, and more.
C++ is a vast language, offering both low-level control and high-level abstractions, making it
suitable for various types of software development.
BASIC CONCEPTS OF R
Julia is a high-level, high-performance programming language designed for numerical and scientific computing. Here are some of its basic concepts:
Plotting: Julia has several packages for creating high-quality plots and visualizations,
such as Plots.jl and Gadfly.jl.
Exception Handling: You can use try, catch, and finally blocks to handle exceptions
and errors in your code.
Package Ecosystem: Julia's package ecosystem is a vital part of the language's appeal.
It offers a wide range of packages for various tasks, including numerical computing, data
manipulation, machine learning, and more.
Interoperability: Julia can interact with other programming languages, allowing you to
call C, Fortran, and Python functions directly from Julia code.
These are just some of the basic concepts of Julia. The language's performance,
combined with its modern and expressive syntax, makes it a compelling choice for scientists,
engineers, and data analysts who require both numerical power and ease of use.
Haskell is a functional programming language known for its strong type system, referential transparency, and focus on immutability. It's designed to encourage a more declarative and mathematical approach to programming. Here are some basic concepts of Haskell:
Functional Paradigm: Haskell is a pure functional programming language, which
means that functions are the primary building blocks of programs. In Haskell, functions are
first-class citizens, meaning they can be passed as arguments to other functions, returned as
results, and stored in data structures.
Immutable Data: In Haskell, data is immutable by default. Once a value is assigned to
a variable, it cannot be changed. Instead of modifying data, you create new data structures with
modified values.
Lazy Evaluation: Haskell uses lazy evaluation, which means that expressions are only
evaluated when their values are actually needed. This can lead to more efficient and concise
code, as only the necessary computations are performed.
Type System: Haskell has a strong and static type system that helps catch errors at
compile time. Type inference allows the compiler to deduce most types, but explicit type
annotations can also be provided.
Type Classes: Haskell's type classes are similar to interfaces in other languages. They
allow you to define common behavior for different types. The most famous type class is Num,
which includes numeric types like integers and floating-point numbers.
Pattern Matching: Pattern matching is a powerful feature in Haskell. It allows you to
destructure data and define functions based on different patterns.
Lists and Tuples: Lists are a fundamental data structure in Haskell, and they can hold
elements of the same type. Tuples, on the other hand, can hold elements of different types and
have a fixed length.
Higher-Order Functions: Haskell encourages the use of higher-order functions, which
are functions that take other functions as arguments or return them as results. This leads to
more modular and reusable code.
Monads: Monads are a concept used in Haskell for handling side effects in a pure
functional way. They provide a way to encapsulate computations that involve impurity, like
I/O, in a controlled manner.
The field of Natural Language Processing (NLP) has its roots in linguistics, computer
science, and artificial intelligence. The origins of NLP can be traced back to the 1950s and
1960s when researchers began exploring ways to enable computers to understand and generate
human language.
One of the earliest milestones in NLP was the development of the "Shannon-Weaver
model" for machine translation in the late 1940s by Claude Shannon and Warren Weaver.
However, early efforts at machine translation were not very successful due to the complexity
and nuances of language.
The term "Natural Language Processing" itself began to gain prominence in the 1950s.
The Georgetown-IBM experiment in 1954 marked a significant event in machine translation,
where an IBM computer translated 60 sentences from Russian to English. While the results
were far from perfect, this experiment sparked interest in the potential of computers
understanding and processing human language.
Over the decades, research in NLP expanded to cover a wide range of tasks such as
speech recognition, text generation, sentiment analysis, language understanding, and more. The
advent of machine learning, especially neural networks and deep learning, has propelled NLP
to new heights, enabling computers to achieve impressive results in tasks like language
translation, question answering, and language modeling.
The origin of Natural Language Processing (NLP) can be traced back to the mid-
20th century when researchers began exploring ways to enable computers to understand and
generate human language. The field emerged at the intersection of linguistics, computer
science, and artificial intelligence. Here are some key milestones and contributors in the early
development of NLP:
1950s - Early Efforts: The origins of NLP can be attributed to the development of machine translation systems. In 1950, computer scientist Alan Turing proposed the "Turing Test," which aimed to determine a machine's ability to exhibit intelligent behaviour indistinguishable from that of a human.
NLP is a complex and challenging field due to the inherent complexities of human
language. Some of the major challenges include:
Phrasal Ambiguity:
Natural language is often ambiguous, with words and phrases having multiple meanings depending on context. NLP requires precise representation of content, so resolving this ambiguity is a key challenge; for example, an NLP system may fail to recognise that "Can you do me a favor?" and "Can you help me?" express the same request.
Contextual Understanding:
Language often relies on context to convey meaning. Understanding context and
correctly interpreting it is difficult for computers.
Syntax and Semantics:
Parsing sentences for their grammatical structure and understanding the underlying
semantics is a challenge, especially in languages with complex syntax.
Variability:
Language usage can vary greatly between different individuals, cultures, and regions.
NLP models must handle this variability effectively.
Named Entity Recognition:
Identifying and categorizing named entities like names of people, places,
organizations, etc., is challenging due to variations and potential ambiguity.
Sentiment Analysis:
Determining the sentiment or emotion behind text is complex, as it often involves
understanding sarcasm, irony, and subtle nuances.
Lack of Data:
Training accurate NLP models requires massive amounts of labeled data, which can be
scarce or expensive to obtain for certain languages and domains.
Common Sense Reasoning:
Understanding and applying common sense knowledge, which humans often take for
granted, is a significant challenge in NLP.
Long-range Dependencies:
Understanding the relationships between words or phrases that are separated by many
other words (long-range dependencies) is challenging for traditional models.
Machine Translation:
Accurate translation between languages involves dealing with idiomatic expressions,
cultural differences, and syntactic variations.
Ethical and Bias Concerns:
NLP models can reflect biases present in training data, leading to unfair or
discriminatory results. Addressing ethical concerns is crucial.
Multilingual NLP:
Developing models that work across multiple languages and exhibit similar
performance is a complex challenge.
Despite these challenges, advances in machine learning, deep learning, and the
availability of large datasets have led to significant progress in NLP, making it a rapidly
evolving and exciting field with applications in various industries, including communication,
healthcare, finance, and more.
TENSORFLOW: INTRODUCTION
TensorFlow is an open-source machine learning framework developed by the Google Brain team, widely used for building and training machine learning models, particularly deep learning models. TensorFlow provides a comprehensive set of tools and libraries for building and training various types of neural networks and other machine learning algorithms. Here's an introduction to TensorFlow:
Key Features:
Flexibility: TensorFlow offers a flexible platform for building and training machine
learning models across a wide range of domains.
Scalability: It supports training models on multiple devices, including CPUs, GPUs,
and TPUs (Tensor Processing Units), allowing for scalable and efficient computation.
Abstraction Levels: TensorFlow provides different levels of abstraction, from low-
level APIs for experienced developers to high-level APIs like Keras for quicker model
development.
Visualization: TensorFlow includes tools for visualizing model architectures, training
progress, and other useful metrics.
Deployment: TensorFlow supports deployment to various platforms, including mobile
devices, browsers, and cloud services.
Tensors: The name "TensorFlow" is derived from "tensor," which is a mathematical
term for a multi-dimensional array. In TensorFlow, tensors are the fundamental building blocks
that represent data. They can be scalar values, vectors, matrices, or higher-dimensional arrays.
Computational Graph: TensorFlow uses a computational graph to represent the
sequence of mathematical operations that make up a machine learning model. Nodes in the
graph represent operations, and edges represent the data (tensors) flowing between these
operations. This graph-based approach allows TensorFlow to optimize and parallelize
computations.
Eager Execution: TensorFlow originally used a "define-and-run" approach, where you
first define a computational graph and then execute it. However, TensorFlow also introduced
eager execution, which enables immediate execution of operations like a traditional imperative
programming language. This makes debugging and experimentation easier.
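A small sketch of eager execution (the TensorFlow 2.x default), where operations return concrete values immediately:
Python code
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])

# Runs immediately; no graph construction or session needed
c = tf.matmul(a, b)
print(c.numpy())  # [[1. 3.] [3. 7.]]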
High-Level APIs: TensorFlow provides high-level APIs like Keras, which simplifies
the process of building and training neural networks. Keras offers a user-friendly interface
while leveraging the capabilities of TensorFlow under the hood.
Low-Level APIs: For more control and customization, TensorFlow offers lower-level
APIs that allow you to define and manipulate tensors and operations directly. This level of
control is useful for research and specialized use cases.
Model Training: TensorFlow provides optimization algorithms, loss functions, and
various utilities to train machine learning models. You can define the model architecture,
specify the loss function, and use gradient-based optimization techniques to update the model's
parameters.
TensorBoard: TensorBoard is a visualization tool that comes with TensorFlow. It
allows you to monitor training progress, visualize the computational graph, and analyze
various metrics associated with your model.
Community and Ecosystem: TensorFlow has a large and active community that
contributes to its development. It also has a rich ecosystem of pre-trained models, libraries, and
tools for various machine learning tasks.
TENSORFLOW: APPLICATIONS
Let's go over some of the basics of TensorFlow, including how to install it, create tensors, define operations, and build a simple neural network.
1. Installation:
You can install TensorFlow using pip, the Python package manager:
bash
pip install tensorflow
2. Importing TensorFlow:
python
import tensorflow as tf
3. Tensors:
Tensors are the basic building blocks in TensorFlow. They represent multi-dimensional arrays with a specified data type.
python
# Creating a scalar tensor
scalar = tf.constant(5)
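Tensors of higher rank are created the same way; for example, continuing the snippet above:
# Creating a vector (rank-1) and a matrix (rank-2) tensor
vector = tf.constant([1.0, 2.0, 3.0])
matrix = tf.constant([[1, 2], [3, 4]])

# Every tensor has a shape and a data type
print(vector.shape, vector.dtype)  # (3,) <dtype: 'float32'>
print(matrix.shape, matrix.dtype)  # (2, 2) <dtype: 'int32'>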
4. Operations:
TensorFlow supports various operations that you can perform on tensors.
Python
a = tf.constant(3)
b = tf.constant(4)
addition = tf.add(a, b)
subtraction = tf.subtract(a, b)
multiplication = tf.multiply(a, b)
division = tf.divide(a, b)
5. TensorFlow Sessions:
In TensorFlow 1.x (and in TensorFlow 2.x compatibility mode), computations are not executed immediately. Instead, you create a computation graph and then execute it within a session. (In TensorFlow 2.x, call tf.compat.v1.disable_eager_execution() before building the graph.)
python
with tf.compat.v1.Session() as sess:
    result = sess.run(addition)
    print(result)  # prints the result of the addition operation

# Placeholders receive their values at run time via feed_dict
input_data = tf.compat.v1.placeholder(tf.float32)
output_data = input_data * 2

with tf.compat.v1.Session() as sess:
    print(sess.run(output_data, feed_dict={input_data: 3.0}))  # prints 6.0
These are some of the basic concepts and operations you'll encounter when working
with TensorFlow. As you become more familiar with the framework, you can delve into more
advanced topics such as custom models, training loops, saving and loading models, and using
TensorFlow for more complex machine learning tasks.
TENSORFLOW: COMPONENTS
TensorFlow is a comprehensive machine learning framework with several components
that work together to enable the creation, training, and deployment of machine learning
models. Here are the key components of TensorFlow:
TensorFlow Core: This is the foundational library that provides the basic data
structures (tensors), operations, and computation graph infrastructure for building machine
learning models.
TensorBoard: TensorBoard is a web-based visualization tool that allows you to monitor and visualize various aspects of your model's training and performance. It helps you understand the model's behavior and assists in debugging and optimizing it.
TensorFlow Lite: TensorFlow Lite is a lightweight version of TensorFlow designed
for mobile and embedded devices. It allows you to deploy machine learning models on
resource-constrained platforms, making it suitable for applications like mobile apps and IoT
devices.
TensorFlow.js: TensorFlow.js is a JavaScript library that enables training and
deploying machine learning models in web browsers and Node.js environments. It allows for
client-side inference without the need for server communication.
Keras: Keras is a high-level API that is now tightly integrated with TensorFlow. It
provides an intuitive and user-friendly interface for defining and training neural networks.
Keras greatly simplifies the process of building and experimenting with different model
architectures.
Estimators: Estimators provide a high-level API for creating complex machine
learning models. They abstract away much of the model-building process, making it easier to
create production-ready models. Estimators are particularly useful for tasks like distributed
training.
Datasets: TensorFlow provides APIs for managing and processing datasets, including
reading data from various sources, transforming data, and creating data pipelines for efficient
training.
TFX (TensorFlow Extended): TFX is a platform for deploying production machine
learning pipelines. It includes tools for data validation, preprocessing, model training,
evaluation, and deployment. TFX supports scalable and reliable machine learning workflows.
Distributed TensorFlow: TensorFlow supports distributed computing across multiple
devices, machines, and GPUs. This allows you to train models faster by distributing
computation and data storage.
AutoML: TensorFlow's AutoML tools offer automated machine learning solutions,
such as AutoML Tables for structured data and AutoML Vision for image classification tasks.
These tools help non-experts create high-quality machine learning models.
Community and Ecosystem: TensorFlow has a vibrant community that contributes to
the development of libraries, tools, and extensions. There are numerous third-party libraries
and pre-trained models available that can enhance your machine learning workflows.
TENSORFLOW: PIPELINE
In the context of machine learning, a pipeline is a sequence of data processing steps that
are chained together to transform raw data into a final model or result. A TensorFlow pipeline
refers to the process of building and managing the flow of data through various stages,
including data preprocessing, model training, evaluation, and deployment. TensorFlow
provides tools and libraries to help you create efficient and organized pipelines for your
machine learning projects.
Here's an overview of creating a TensorFlow pipeline:
Data Collection and Preparation: The first step in building a TensorFlow pipeline is
to collect and preprocess your data. This may involve reading data from various sources,
performing data cleaning, feature extraction, normalization, and splitting the data into training,
validation, and test sets.
Data Loading and Input Pipelines: TensorFlow's tf.data API allows you to create
efficient data input pipelines. It provides a way to load, preprocess, and batch your data in a
performant manner. Input pipelines help optimize memory usage and keep the CPU or GPU
busy during training.
Feature Engineering and Transformation: You can use TensorFlow's preprocessing
layers to perform feature engineering and transformations directly within the pipeline. These
layers can handle tasks like normalization, one-hot encoding, and more complex
transformations.
Model Building: Build your machine learning model using TensorFlow's high-level
API, such as Keras. Define the model architecture, specify the loss function, optimizer, and
metrics. Your model can include layers for processing data, such as convolutional layers,
recurrent layers, and dense layers.
Training Loop: Create a training loop where you iterate over your training dataset,
feed batches of data to your model, and apply backpropagation to update the model's weights.
During training, you can monitor the loss and metrics to track progress.
Model Evaluation: After training, use your validation dataset to evaluate the
performance of your model. Calculate metrics such as accuracy, precision, recall, and F1-score
to assess how well your model generalizes to new data.
Hyperparameter Tuning: Tune hyperparameters like learning rate, batch size, and
regularization strength to optimize the model's performance. Techniques like grid search or
random search can be employed for hyperparameter tuning.
Model Deployment and Inference: Once your model is trained and evaluated, deploy
it for inference. You can use TensorFlow Serving, TensorFlow Lite, or other deployment
options based on your use case. Ensure that the deployed model performs well on new, unseen
data.
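Putting these stages together, here is a compact end-to-end sketch of such a pipeline. The data is synthetic and the architecture is an illustrative assumption, not a prescription:
Python code
import numpy as np
import tensorflow as tf

# Synthetic data standing in for a real, preprocessed dataset
features = np.random.rand(1000, 8).astype('float32')
labels = np.random.randint(0, 2, 1000).astype('float32')

# Input pipeline: split, shuffle, batch and prefetch with tf.data
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
train_ds = dataset.take(800).shuffle(800).batch(32).prefetch(tf.data.AUTOTUNE)
val_ds = dataset.skip(800).batch(32)

# Model building
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Training loop and evaluation
model.fit(train_ds, validation_data=val_ds, epochs=5)
loss, acc = model.evaluate(val_ds)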
TENSORFLOW: EXAMPLES
Here are some TensorFlow examples that cover different aspects of machine learning and deep learning using TensorFlow's Python API:
1. Linear Regression:
Simple linear regression example using TensorFlow to fit a line to a set of data
points.
Python code
import tensorflow as tf

# Synthetic data and parameters (assumed; the original omits this setup)
x_data = tf.random.normal([100]); y_data = 3.0 * x_data + 2.0
w = tf.Variable(0.0); b = tf.Variable(0.0)
model = lambda x: w * x + b
loss = lambda y_pred, y_true: tf.reduce_mean(tf.square(y_pred - y_true))

# Training loop
for epoch in range(100):
    with tf.GradientTape() as tape:
        current_loss = loss(model(x_data), y_data)
    dw, db = tape.gradient(current_loss, [w, b])  # gradient-descent update
    w.assign_sub(0.1 * dw); b.assign_sub(0.1 * db)

print("Slope:", w.numpy())
print("Bias:", b.numpy())
2. Neural Network for MNIST Classification:
Building a simple neural network using Keras to classify handwritten digits from the MNIST dataset.
Python code
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense
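The snippet above stops at the imports; a minimal completion of the classifier, assuming the usual Keras workflow, might look like this:
Python code
# Load and normalize the MNIST images
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A simple fully connected classifier
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)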
3. Convolutional Neural Network for CIFAR-10:
Building a CNN using Keras for image classification on the CIFAR-10 dataset.
python code
import tensorflow as tf
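Again the original stops at the import; a minimal CNN for CIFAR-10, with layer sizes chosen purely for illustration, could be completed as:
Python code
from tensorflow.keras import layers, models

# Load and normalize CIFAR-10 (32x32 RGB images, 10 classes)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small convolutional network
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))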
These examples cover linear regression, neural network classification for MNIST,
and convolutional neural network classification for CIFAR-10. They demonstrate various
aspects of using TensorFlow to build and train machine learning models.