Da CH2 Slqa


1. Define machine learning.

= Machine learning is a subfield of artificial intelligence in which systems learn patterns
from data and improve with experience instead of following explicitly programmed rules.
Artificial intelligence is broadly defined as the capability of a machine to imitate
intelligent human behavior; AI systems are used to perform complex tasks in a way that is
similar to how humans solve problems.
2. Define deep learning.
= Deep learning is a type of machine learning that uses neural networks with multiple
processing layers. It loosely imitates the way humans gain certain types of knowledge.
3. List types of machine learning.
= 1) supervised learning, 2) unsupervised learning, and 3) reinforcement learning.
4. Enlist three parameters for machine learning.
= 1) Model parameters: These are the parameters in the model that must be determined
using the training data set. ...
2) Hyperparameters: These are adjustable parameters that must be tuned in order to obtain
a model with optimal performance.
5. Define classification and regression.
= Classification is the task of predicting a discrete class label. Regression is the task of
predicting a continuous quantity.
6. Define reinforcement machine learning.
= Reinforcement learning is a machine learning training method based on rewarding
desired behaviors and/or punishing undesired ones. In general, a reinforcement learning
agent is able to perceive and interpret its environment, take actions and learn through trial
and error.
7. State any two uses of machine learning.
= Machine learning is used in internet search engines, email filters to sort out spam,
websites to make personalised recommendations, banking software to detect unusual
transactions, and lots of apps on our phones such as voice recognition.
8. Define Neural Networks (NNs).
= A neural network is a computer architecture in which a number of processors are
interconnected in a manner suggestive of the connections between neurons in a human brain,
and which is able to learn by a process of trial and error. It is also called a neural net.
9. Define Artificial intelligence (AI).
= Artificial intelligence is the simulation of human intelligence processes by machines,
especially computer systems. Specific applications of AI include expert systems, natural
language processing, speech recognition and machine vision.
10. List AI applications. Any two.
= Specific applications of AI include expert systems, natural language processing, speech
recognition and machine vision.
11. Define model. What is it used for?
= Data modeling is the process of creating a visual representation of either a whole
information system or parts of it to communicate connections between data points and
structures. In machine learning, a model is the trained artifact that recognizes patterns in
data. Models are used for tasks such as image recognition and predicting demographics such
as population growth or health metrics.
12. Define supervised machine learning.
= Supervised machine learning is a subcategory of machine learning and artificial
intelligence. It is defined by its use of labeled datasets to train algorithms to classify
data or predict outcomes accurately.
13. Give purpose of k-NN algorithm.
= KNN stands for “K-Nearest Neighbour”. It is a supervised machine learning algorithm that
can be used to solve both classification and regression problems. The symbol 'K' denotes
the number of nearest neighbours considered when predicting or classifying a new, unseen
data point.
14. Define decision tree.
= A decision tree is a graph that uses a branching method to illustrate every possible output
for a specific input.
15. What is the purpose of SVM?
= The purpose of the support vector machine algorithm is to find a hyperplane in an
N-dimensional space (where N is the number of features) that distinctly separates the
classes of data points. It can handle both classification and regression on linear and
non-linear data.
16. Give use of Naive Bayes.
= Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
theorem and used for solving classification problems. It is mainly used in text classification
that includes a high-dimensional training dataset.
17. Define unsupervised machine learning.
= Unsupervised machine learning uses machine learning algorithms to analyze and cluster
unlabeled datasets. These algorithms discover hidden patterns or data groupings without
the need for human intervention.
18. Define clustering
= In machine learning, we often group examples as a first step to understand a data set.
Grouping unlabeled examples is called clustering. As the examples are unlabeled, clustering
relies on unsupervised machine learning.
19. Define association rule mining.
= Association rule mining is a procedure which is meant to find frequent patterns,
correlations, associations, or causal structures from data sets found in various kinds of
databases such as relational databases, transactional databases, and other forms of data
repositories.
20. What is the purpose of Apriori algorithm?
= The Apriori algorithm is used for mining frequent itemsets and devising association rules
from a transactional database. The parameters “support” and “confidence” are used.
Support refers to items' frequency of occurrence; confidence is a conditional probability.
Items in a transaction form an item set.
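The two measures named above can be computed directly from a list of transactions. The following is a minimal sketch, not the full Apriori algorithm (it does not generate or prune candidate itemsets); the `baskets` data and the function names are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Conditional probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical transactional database: each set is one transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

print(support(baskets, {"diapers", "beer"}))
print(confidence(baskets, {"diapers"}, {"beer"}))
```

Apriori would keep only the itemsets whose support exceeds a chosen minimum, and then keep only the rules whose confidence exceeds a second threshold.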
21. Define anomaly detection.
= Anomaly detection is the process of finding outliers in a given dataset. Outliers are the
data objects that stand out amongst other objects in the dataset and do not conform to the
normal behavior in a dataset.
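One simple way to flag outliers, shown here only as an illustrative sketch, is the z-score: a value is treated as an outlier when it lies more than a chosen number of standard deviations from the mean. The threshold of 2.0 and the sample readings are assumptions, not values from the notes.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical readings: most are close to 10, one is far away.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_outliers(readings))
```

Other detectors (for example distance-based or density-based ones) use different definitions of "normal behavior"; the z-score is only the simplest.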
22. Differentiate between supervised and unsupervised machine learning.
= Supervised ML: 1) Uses off-line analysis. 2) Number of classes is known. 3) Accurate and
reliable results. 4) Very complex. 5) Uses known and labeled data as input.
Unsupervised ML: 1) Uses real-time analysis of data. 2) Number of classes is not known.
3) Moderately accurate and reliable results. 4) Less computational complexity. 5) Uses
unknown (unlabeled) data as input.
23. Define semi-supervised machine learning.
= Semi-supervised machine learning is a combination of supervised and unsupervised
machine learning methods. With the more common supervised methods, you train a machine
learning algorithm on a “labeled” dataset in which each record includes the outcome
information. Semi-supervised learning instead trains on a small amount of labeled data
together with a larger amount of unlabeled data.
24. Define regression analysis.
= Regression is a technique for investigating the relationship between independent
variables or features and a dependent variable or outcome. It's used as a method for
predictive modelling in machine learning, in which an algorithm is used to predict
continuous outcomes.
25. Define regression model.
= A regression model is a statistical model that estimates the relationship between one
dependent variable and one or more independent variables using a line (or a plane in the
case of two or more independent variables).
26. What is logistic regression?
= Logistic regression is a statistical analysis method to predict a binary outcome, such as
yes or no, based on prior observations of a data set.
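As a rough sketch of how a fitted logistic regression turns inputs into a yes/no answer: it computes a weighted sum of the features, passes it through the sigmoid function to get a probability, and compares that probability with a threshold. The weights, bias, and 0.5 threshold below are hypothetical; in practice they are learned from prior observations.

```python
import math


def predict_proba(features, weights, bias):
    """Probability of the positive class from a weighted sum and sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias, threshold=0.5):
    """Binary outcome: 'yes' when the probability reaches the threshold."""
    return "yes" if predict_proba(features, weights, bias) >= threshold else "no"


# Hypothetical model with one feature.
print(predict_label([3.0], weights=[2.0], bias=-1.0))
print(predict_label([0.0], weights=[2.0], bias=-1.0))
```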
27. Define linear regression.
= Linear regression analysis is used to predict the value of a variable based on the value of
another variable.
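For a single input variable, the best-fit line can be found with the ordinary least squares formulas. The sketch below assumes one feature and made-up sample data; the function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
print(slope * 5 + intercept)  # prediction for a new x = 5
```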
28. Define polynomial regression.
= Polynomial regression, like linear regression, uses the relationship between the variables
x and y to fit the data points, but it models that relationship as an nth-degree polynomial
in x, so the fitted curve can bend instead of being a straight line.
29. List ensemble techniques.
= bagging, stacking, boosting, blending
30. Define classification.
= classification refers to a predictive modeling problem where a class label is predicted for
a given example of input data.
31. Define cluster and clustering.
= Clustering, or cluster analysis, is a machine learning technique that groups an
unlabelled dataset. It can be defined as "a way of grouping the data points into different
clusters, consisting of similar data points." A cluster is one such group of similar data points.
32. Enlist types of clustering.
= 1) Centroid-based Clustering. 2) Density-based Clustering. 3) Distribution-based
Clustering. 4) Hierarchical Clustering.
33. Give long form of DBSCAN.
= DBSCAN stands for density-based spatial clustering of applications with noise. It is
able to find arbitrary shaped clusters and clusters with noise (i.e. outliers).
34. Define SOM.
= A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised
machine learning technique used to produce a low-dimensional (typically two-dimensional)
representation of a higher dimensional data set while preserving the topological structure of
the data.
1. What are the advantages and disadvantages of machine learning?
= Advantages:
1) It is automatic: the whole process of data interpretation and analysis is done by computer.
2) It is used in various fields: machine learning is used in fields such as education,
medicine, and engineering.
3) It can handle a variety of data: it works with multidimensional data and can perform
multiple tasks.
4) Scope of advancement: just as humans improve with experience, machine learning models
improve and become more accurate and efficient as they are trained further.
5) It can identify trends and patterns: the more data a machine gets, the better it learns
the patterns and trends in that data.
6) It suits education: education is dynamic, and smart classes, distance learning, and
e-learning for students have increased a lot.
Disadvantages:
1) Chance of error is high: although considered accurate, it is highly vulnerable to faults.
2) Data requirement is high: more data must be fed to the machine for better forecasting or
decision making, which may not always be possible.
3) Time-consuming and resource-intensive: effectiveness and efficiency come only through
experience, which requires time.
4) Inaccurate interpretation of data: a little manipulation or biased data can lead to a long
chain of errors, so interpretations may be inaccurate.
5) More space required: as more data is required for interpretation, more space is required
to store it.
2. What are the various applications of machine learning?
= 1) Traffic alerts 2) Transportation and commuting 3) Product recommendations
4) Virtual personal assistants 5) Self-driving cars 6) Dynamic pricing 7) Google Translate
8) Online video streaming 9) Fraud detection 10) Social media
2.1 How does deep learning work? Explain diagrammatically.
= Deep learning networks learn by discovering intricate structures in the data they
experience. By building computational models that are composed of multiple processing
layers, the networks can create multiple levels of abstraction to represent the data.
3. What is the purpose of AI?
= The goal of AI is to provide software that can reason on input and explain on output. AI will
provide human-like interactions with software and offer decision support for specific tasks.
3.1 Advantages and disadvantages of AI.
= Advantages: 1) It makes computers more powerful and more useful. 2) It introduces a new
and improved interface for human interaction. 3) It introduces new techniques to solve new
problems. 4) It handles information better than humans. 5) It is very helpful for
converting information into knowledge. 6) It improves work efficiency, reducing the time
needed to accomplish a task compared with humans.
Disadvantages: 1) The implementation cost of AI is very high. 2) Software development for
AI is slow and expensive. 3) Few efficient programmers are available to develop software
that implements artificial intelligence. 4) Robots, one implementation of artificial
intelligence, can replace jobs and lead to severe unemployment. 5) Machines can easily lead
to destruction: if they are put in the wrong hands, the results are hazardous for human
beings.
4. With the help of a diagram, describe the relationship between AI, ML and DL.
= ML refers to an AI system that can self-learn based on an algorithm: systems that get
smarter over time without human intervention are ML systems. Deep learning (DL) is a
subset of ML that uses multi-layer neural networks, typically applied to large data sets.
Most AI work involves ML because intelligent behaviour requires considerable knowledge.
AI is therefore the broadest field, ML is a subset of AI, and DL is a subset of ML.
5. Write a short note on: Learning models for algorithms.
= Algorithmic modeling, as the term is used in Audience Manager, refers to the use of data
science to either expand your existing audiences or classify them into personas. This is
done through two types of algorithms: Look-Alike Modeling and Predictive Audiences.
7. With the help of suitable diagram describe machine learning model.
= A machine learning model is a file that has been trained to recognize certain types of
patterns. You train a model over a set of data, providing it an algorithm that it can use to
reason over and learn from those data.
8. How to engineer features of a model?
= 1) Imputation: missing values are one of the most typical issues when preparing data for
machine learning; imputation fills them in. 2) Handling outliers: outlier handling is a
technique for removing outliers from a dataset. 3) Log transform. 4) One-hot encoding.
5) Scaling.
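Several of these feature engineering steps can be sketched in a few lines each. The helpers below are illustrative assumptions (mean imputation, `log1p` as the log transform, min-max as the scaling method); outlier handling is left out, and real projects usually rely on a library rather than hand-written helpers.

```python
import math


def impute_mean(values):
    """Imputation: replace missing values (None) with the mean of the rest."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def log_transform(values):
    """Log transform: log(1 + v) compresses large, skewed values."""
    return [math.log1p(v) for v in values]


def one_hot(categories):
    """One-hot encoding: one 0/1 column per distinct category (sorted)."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


def min_max_scale(values):
    """Scaling: map values to the range [0, 1]; assumes they are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


print(impute_mean([1.0, None, 3.0]))
print(one_hot(["red", "blue", "red"]))
print(min_max_scale([10, 20, 30]))
```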
9. How to train and validate a model? Describe in detail.
= Train:
Step 1: Begin with existing data. Machine learning requires us to have existing data: not the
data our application will use when we run it, but data to learn from. ...
Step 2: Analyze data to identify patterns. ...
Step 3: Make predictions.
Validate: 1) Create the development, validation and testing data sets. 2) Use the training data set to
develop your model. 3) Compute statistical values identifying the model development performance. 4)
Apply the model to the data points in the validation data set and calculate the results.
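The first validation step, creating separate data sets, might be sketched as follows. The 60/20/20 proportions, the fixed seed, and the use of accuracy as the performance value are assumptions chosen for illustration.

```python
import random


def split_dataset(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle rows and split them into training, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is repeatable
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * validation)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def accuracy(model, rows):
    """Share of (input, label) rows for which the model predicts the label."""
    return sum(1 for x, y in rows if model(x) == y) / len(rows)


train_set, val_set, test_set = split_dataset(range(10))
print(len(train_set), len(val_set), len(test_set))
```

The model is developed on `train_set` only; `accuracy` is then computed on `val_set` to compare candidate models, and on `test_set` once at the end.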
10. What are the types of machine learning? Compare them.
=
| Criteria | Supervised Learning | Unsupervised Learning | Reinforcement Learning |
|---|---|---|---|
| Input data | Input data is labelled. | Input data is not labelled. | Input data is not predefined. |
| Problem | Learn pattern of inputs and their labels. | Divide data into classes. | Find the best reward between a start and an end state. |
| Solution | Finds a mapping equation on input data and its labels. | Finds similar features in input data to classify it into classes. | Maximizes reward by assessing the results of state-action pairs. |
| Model building | Model is built and trained prior to testing. | Model is built and trained prior to testing. | The model is trained and tested simultaneously. |
| Applications | Deals with regression and classification problems. | Deals with clustering and association rule mining problems. | Deals with exploration and exploitation problems. |
| Algorithms used | Decision trees, linear regression, K-nearest neighbors | K-means clustering, k-medoids clustering, agglomerative clustering | Q-learning, SARSA, Deep Q Network |
| Examples | Image detection, population growth prediction | Customer segmentation, feature elicitation, targeted marketing, etc. | Driverless cars, self-navigating vacuum cleaners, etc. |


11. How does supervised learning work?
= Supervised learning uses a training set to teach models to yield the desired output. This
training dataset includes inputs and correct outputs, which allow the model to learn over
time. The algorithm measures its accuracy through the loss function, adjusting until the
error has been sufficiently minimized.
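The loop of measuring the loss and adjusting until the error is small enough can be sketched with gradient descent on a one-feature linear model. The mean squared error loss, the learning rate, and the tolerance are illustrative assumptions; the notes do not prescribe a particular loss or optimizer.

```python
def train_linear(xs, ys, learning_rate=0.05, tolerance=1e-6, max_steps=100_000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss = float("inf")
    for _ in range(max_steps):
        # Error of the current model on every training example.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        if loss < tolerance:
            break  # error has been sufficiently minimized
        # Gradients of the loss with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss


# Hypothetical training set: inputs with their correct outputs (y = 2x + 1).
w, b, loss = train_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(round(w, 2), round(b, 2))
```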
11.1 Supervised learning: advantages and disadvantages.
= Advantages: 1) Supervised learning allows you to collect data or produce a data output
from previous experience. 2) It helps you to optimize performance criteria using
experience. 3) It helps you to solve various types of real-world computation problems.
Disadvantages: 1) Computation time is vast for supervised learning. 2) Unwanted data
reduces efficiency. 3) Pre-processing of data is a big challenge. 4) Models are always in
need of updates. 5) Supervised algorithms can easily overfit.

12. How does k-NN work? Explain diagrammatically.
= KNN works by finding the distances between a query and all the examples in the data,
selecting the specified number of examples (K) closest to the query, and then voting for the
most frequent label (in the case of classification) or averaging the labels (in the case of
regression).
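The procedure can be written almost line for line in code. This is a minimal sketch assuming Euclidean distance and a small hypothetical training set; the function name `knn_predict` and its `task` parameter are illustrative.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3, task="classification"):
    """train is a list of (features, label) pairs; query is a feature tuple."""
    # Sort the examples by their Euclidean distance to the query.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Keep the labels of the K closest examples.
    nearest = [label for _, label in by_distance[:k]]
    if task == "classification":
        # Vote for the most frequent label.
        return Counter(nearest).most_common(1)[0][0]
    # Regression: average the labels.
    return sum(nearest) / len(nearest)


points = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((5.0, 5.0), "B"), ((6.0, 5.5), "B")]
print(knn_predict(points, (1.2, 1.4), k=3))
```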

12.1 k-NN advantages and disadvantages.
= Advantages of KNN: 1) No training period: KNN is called a lazy learner (instance-based
learning). It does not learn anything in the training period and does not derive any
discriminative function from the training data. It stores the training dataset and uses it
only at the time of making predictions. This makes KNN much faster to set up than
algorithms that require training, e.g. SVM or linear regression, although each prediction is
slower because all the work happens at query time.
2) Since the KNN algorithm requires no training before making predictions, new data can
be added seamlessly without retraining a model.
3) KNN is very easy to implement. Only two parameters are required: the value of K and the
distance function (e.g. Euclidean or Manhattan).
Disadvantages of KNN: 1) Does not work well with large datasets: in large datasets,
the cost of calculating the distance between the new point and each existing point is huge,
which degrades the performance of the algorithm.
2) Does not work well with high dimensions: with a large number of dimensions, distances
between points become less informative (the curse of dimensionality), so the nearest
neighbours are less meaningful.
3) Needs feature scaling: we need to do feature scaling (standardization or
normalization) before applying the KNN algorithm to any dataset. Otherwise features with
larger ranges dominate the distance and KNN may generate wrong predictions.
4) Sensitive to noisy data, missing values and outliers: KNN is sensitive to noise in the
dataset. We need to manually impute missing values and remove outliers.
13. What are the advantages and disadvantages of a decision tree?
= Advantages: 1) Compared to other algorithms, decision trees require less effort for data
preparation during pre-processing. 2) A decision tree does not require normalization of data.
3) A decision tree does not require scaling of data either. 4) Missing values in the data
do not affect the process of building a decision tree to any considerable extent. 5) A
decision tree model is very intuitive and easy to explain to technical teams as well as
stakeholders.
Disadvantages: 1) A small change in the data can cause a large change in the structure of
the decision tree, causing instability. 2) For a decision tree, calculation can sometimes
become far more complex than for other algorithms. 3) A decision tree often takes more time
to train. 4) Decision tree training is relatively expensive because of the added complexity
and time. 5) The basic decision tree algorithm is less suited to regression and predicting
continuous values than to classification.
16. Unsupervised learning: advantages and disadvantages.
= For unsupervised classification of image data:
Advantages: 1) No previous knowledge of the image area is required. 2) The opportunity for
human error is minimized. 3) It produces unique spectral classes. 4) It is relatively easy and
fast to carry out.
Disadvantages: 1) The spectral classes do not necessarily represent the features on the
ground. 2) It does not consider spatial relationships in the data. 3) It can take time to interpret the
spectral classes.
17. With the help of an example, explain the k-means clustering algorithm.
= K-means clustering is an unsupervised learning algorithm that groups an unlabeled
dataset into different clusters. Here K defines the number of pre-defined clusters that need
to be created in the process: if K=2, there will be two clusters, for K=3 there will be
three clusters, and so on.
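As a worked example, the sketch below runs k-means with K=2 on six hypothetical 2-D points. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. Starting from the first K points is a simplifying assumption; real implementations usually choose the starting centroids more carefully.

```python
import math


def k_means(points, k, iterations=100):
    """Group points into k clusters; returns (centroids, clusters)."""
    centroids = [points[i] for i in range(k)]  # naive start: first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(data, k=2)
print(clusters)
```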
18. Describe association rule mining with an example.
= A well-known example, which seems to be fictional, claims that men who go to a store to buy
diapers are also likely to buy beer. Data that would point to that might look like this: A
supermarket has 200,000 customer transactions.
20. What are the advantages and disadvantages of semi-supervised machine learning?
= Advantages: 1) It is easy to understand. 2) It reduces the amount of annotated data used. 3) It
is a stable algorithm. 4) It is simple. 5) It has high efficiency.
Disadvantages: 1) Iteration results are not stable. 2) It is not applicable to network-level data. 3) It
has low accuracy.
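22. Define logistic regression with assumptions.
= Logistic regression is a statistical analysis method to predict a binary outcome, such as
yes or no, based on prior observations of a data set.
Regression assumptions: 1) Linear relationship. 2) Multivariate normality. 3) No or little
multicollinearity. 4) No auto-correlation. 5) Homoscedasticity. These are the classical
assumptions of linear regression. Logistic regression does not require multivariate
normality or homoscedasticity; it assumes instead a binary outcome, independent
observations, little or no multicollinearity, and a linear relationship between the
predictors and the log-odds of the outcome.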
23. Write a short note on: Ensemble techniques.
= Ensemble methods are techniques that aim at improving the accuracy of results in
models by combining multiple models instead of using a single model. The combined
models increase the accuracy of the results significantly. This has boosted the popularity of
ensemble methods in machine learning.
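The simplest way to combine multiple models is majority voting, sketched below. The three threshold classifiers are hypothetical stand-ins for trained models, and voting is only one ensemble strategy; bagging, boosting, and stacking combine models in more elaborate ways.

```python
from collections import Counter


def majority_vote(models, x):
    """Ask every model for a prediction and return the most common answer."""
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical weak classifiers that disagree near their thresholds.
models = [lambda x: x > 3, lambda x: x > 5, lambda x: x > 7]

print(majority_vote(models, 6))  # two of three vote True
print(majority_vote(models, 4))  # two of three vote False
```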
24. With the help of an example, explain the concept of classification. Also list various
classification techniques.
= Explanation: Classifying is categorizing something or someone into a certain group or
system based on certain characteristics. One example of classifying is assigning plants or
animals into a kingdom and species. Another is designating some papers as "Secret" or
"Confidential."
Classification techniques: 1) Logistic Regression. 2) Naive Bayes. 3) K-Nearest Neighbors. 4) Decision
Tree. 5) Support Vector Machines.
25. What is random forest? Describe diagrammatically.
= Random Forest is a supervised machine learning algorithm made up of decision trees.
It is used for both classification and regression, for example classifying whether an
email is “spam” or “not spam”.
26. How does clustering work? Explain with example.
= A hierarchical clustering algorithm works by iteratively connecting the closest data
points to form clusters. Initially all data points are disconnected from each other; each
data point is treated as its own cluster. Then the two closest data points are connected,
forming a cluster, and the process repeats with the closest remaining clusters until the
desired number of clusters is left.
In machine learning, we often group examples as a first step to understand a data set.
Grouping unlabeled examples is called clustering. As the examples are unlabeled, clustering
relies on unsupervised machine learning.
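The merging process can be sketched for one-dimensional data as follows. Measuring the distance between two clusters by their closest pair of points (single linkage) and stopping at a target number of clusters are assumptions made for this example.

```python
def agglomerate(points, target_clusters):
    """Merge the two closest clusters until target_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # connect the closest pair
        del clusters[j]
    return clusters


print(agglomerate([1.0, 1.2, 5.0, 5.1, 9.0], target_clusters=3))
```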
27. Describe various clustering techniques. Describe two of them in short.
= The various types of clustering are: 1) Connectivity-based clustering (hierarchical
clustering). 2) Divisive approach. 3) Agglomerative approach.
Hierarchical clustering is typically used on hierarchical data, like you would get
from a company database or taxonomies. It builds a tree of clusters; the divisive approach
builds it from the top down, and the agglomerative approach from the bottom up.
Agglomerative clustering is best at finding small clusters. The end result looks like a
dendrogram so that you can easily visualize the clusters when the algorithm finishes.
28. What is DBSCAN clustering? Explain with example.
= DBSCAN is a density-based clustering algorithm that works on the assumption that
clusters are dense regions in space separated by regions of lower density. It groups
'densely grouped' data points into a single cluster.
It is able to find arbitrary shaped clusters and clusters with noise (i.e. outliers). The main
idea behind DBSCAN is that a point belongs to a cluster if it is close to many points from
that cluster.
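A compact sketch of the idea follows. The parameter names `eps` (neighbourhood radius) and `min_pts` (minimum number of neighbours, counting the point itself, for a dense region) are the conventional DBSCAN parameters, and the sample points are hypothetical. Points that belong to no dense region are labelled -1, meaning noise.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise (outliers)."""
    noise = -1
    labels = [None] * len(points)  # None means not visited yet

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster_id  # border point of this cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is dense too, keep expanding
        cluster_id += 1
    return labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

In this example the two tight groups become clusters 0 and 1, and the isolated point at (50, 50) is reported as noise.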
30. What are the advantages and disadvantages of reinforcement learning?
= Advantages: 1) Reinforcement Learning is used to solve complex problems that cannot be
solved by conventional techniques. 2) This technique is preferred to achieve long-term results
which are very difficult to achieve. 3) This learning model is very similar to the learning of human
beings. Hence, it is close to achieving perfection.
Disadvantages: 1) Too much reinforcement learning can lead to an overload of states which can
diminish the results. 2) This algorithm is not preferable for solving simple problems. 3) This
algorithm needs a lot of data and a lot of computation. 4) The curse of dimensionality limits
reinforcement learning for real physical systems.
31. Differentiate between supervised, unsupervised, semi-supervised and
reinforcement machine learning.
=
| Criteria | Supervised ML | Unsupervised ML | Reinforcement ML |
|---|---|---|---|
| Definition | Learns by using labelled data | Trained using unlabelled data without any guidance | Works on interacting with the environment |
| Type of data | Labelled data | Unlabelled data | No predefined data |
| Type of problems | Regression and classification | Association and clustering | Exploitation or exploration |
| Supervision | Extra supervision | No supervision | No supervision |
| Algorithms | Linear Regression, Logistic Regression, SVM, KNN etc. | K-Means, C-Means, Apriori | Q-Learning, SARSA |
| Aim | Calculate outcomes | Discover underlying patterns | Learn a series of actions |
| Application | Risk evaluation, sales forecasting | Recommendation systems, anomaly detection | Self-driving cars, gaming, healthcare |


32. What is meant by predicting new observations? Explain in detail.
= Structurally, predictions are identical with explanations. They have, like explanations,
covering laws and initial conditions with the difference that in explanations the conclusion
already occurs, and the explanans are sought, but in predictions the explanans are given
and the conclusion is sought.
