DWM Microproject Report GRP No.24
Thakur Polytechnic
TYCO-B
Semester-6
Academic year 2022-2023
GROUP-24(70-72)
Data Warehousing and Mining Techniques (70-72)
Seal of
institution
ACKNOWLEDGEMENT
THANK YOU.
PROPOSAL
3. Proposed Methodology:
1. Getting the overview of the project and understanding the concept thoroughly.
2. Making of the proposal.
3. Collecting information about the “KNN (K-Nearest Neighbours) algorithm of data
mining”.
4. Making of the Report.
4. Action Plan:
Sr. No.   Details of activity   Planned start date   Planned finish date   Name of responsible team member
5. Resources Required:
Sr. No.   Name of the resources   Specifications             Quantity   Remarks
1         Laptop                  Intel i5, 8GB RAM, 512GB   1          Available
2         Microsoft Word          Office 365                 1          Available
3         Internet                Minimum 32 Mbps            1          Available
_______________
Mr. Dhrupesh Savdiya
(SUBJECT TEACHER)
REPORT
The data mining process typically involves several steps, including data
cleaning and preprocessing, data exploration, model building, and model
evaluation. The first step involves cleaning and transforming the data to
remove any noise or inconsistencies and make it suitable for analysis. The
next step involves exploring the data to identify any patterns or relationships
that may exist.
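The cleaning and exploration steps described above can be sketched roughly as follows. This is only an illustrative sketch: the records, the field meanings (age, monthly spend) and the helper name `clean` are assumptions made for the example and are not part of the project data.

```python
from statistics import mean, median

# Hypothetical raw records: (age, monthly_spend). None marks a missing value.
raw = [(25, 120.0), (31, None), (25, 120.0), (47, 310.0), (-3, 90.0), (52, 305.0)]

def clean(records):
    """Drop rows with missing or impossible values, then drop exact duplicates."""
    seen, out = set(), []
    for age, spend in records:
        if age is None or spend is None:
            continue  # missing value
        if age < 0 or spend < 0:
            continue  # inconsistent value (negative age or spend)
        if (age, spend) in seen:
            continue  # duplicate row
        seen.add((age, spend))
        out.append((age, spend))
    return out

# Step 1: cleaning
cleaned = clean(raw)

# Step 2: a very small exploration step using summary statistics
ages = [a for a, _ in cleaned]
spends = [s for _, s in cleaned]
summary = {"rows": len(cleaned), "mean_age": mean(ages), "median_spend": median(spends)}

print(cleaned)
print(summary)
```

Model building and model evaluation would follow on the cleaned data.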
Table with a brief description of each algorithm related to data warehousing and mining:-
Natural Language Processing (NLP) algorithms   Text analysis   Analyzes and processes natural language data
Note: "DBSCAN" stands for Density-Based Spatial Clustering of Applications with Noise.
Cluster analysis is used in various fields such as biology, marketing, and social
sciences to identify patterns in data and gain insights into the relationships
between different data points. For example, in biology, cluster analysis can be
used to group genes with similar expression patterns, which can help in
identifying the function of unknown genes. In marketing, cluster analysis can
be used to group customers with similar preferences, which can help in
developing targeted marketing campaigns.
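As a rough illustration of the marketing example, the sketch below groups a few customers with k-means. The report does not name a specific clustering method here, so k-means is an assumed choice for the example. The customer data (visits per month, average basket value) and the naive "first k points" initialisation are also assumptions.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical customers: (visits per month, average basket value)
customers = [(2, 15.0), (20, 80.0), (3, 18.0), (22, 85.0), (1, 12.0), (19, 78.0)]
labels, centroids = kmeans(customers, k=2)
print(labels)     # [0, 1, 0, 1, 0, 1]
print(centroids)
```

Customers with the same label fall in the same group and could, for instance, receive the same campaign.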
1. Data: Clustering analysis requires a dataset that contains the data points to be
clustered. The dataset can be of any type, such as numeric, categorical, or
mixed.
Applications of clustering:-
Clustering analysis has a wide range of applications in various fields, including:
The basic idea behind the KNN algorithm is that similar data points
tend to be clustered together in the feature space. Therefore, if we
want to predict the label or value of a new data point, we can look at
its K nearest neighbors and use their labels or values to predict the
label or value of the new data point. The distance between two data
points is typically measured using a distance metric, such as
Euclidean distance, Manhattan distance, or cosine distance.
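The three distance metrics named above can be written out as a minimal sketch. The function names and the sample points are chosen only for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences along each feature."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity. Not defined if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))   # 5.0
print(manhattan(p, q))   # 7.0
print(cosine_distance((1.0, 0.0), (0.0, 1.0)))  # 1.0 (orthogonal vectors)
```

Euclidean and Manhattan distance depend on the scale of the features, while cosine distance depends only on the angle between the two vectors.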
In the case of classification, the KNN algorithm assigns the label that
appears most frequently among the K nearest neighbors to the query
point. For example, if K=3 and the nearest neighbors of a query point
are labeled as A, A, and B, the KNN algorithm would predict the
label of the query point as A. In the case of regression, the KNN
algorithm computes the average or weighted average of the values
of the K nearest neighbors and uses this as the predicted value of the
query point.
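The classification and regression rules described above can be sketched as follows. This is a minimal brute-force version using Euclidean distance. The function names and the small training sets are assumptions made for the example, chosen so that the K=3 case reproduces the A, A, B vote from the text.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k training items (features, target) closest to the query point."""
    return sorted(train, key=lambda item: math.dist(item[0], query))[:k]

def knn_classify(train, query, k):
    """Majority vote among the labels of the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k):
    """Plain (unweighted) average of the values of the k nearest neighbors."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)

# Classification: the 3 nearest neighbors of (2, 2) are labeled A, A and B
labeled = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((3.0, 3.0), "B"),
           ((6.0, 6.0), "B"), ((7.0, 8.0), "B")]
print(knn_classify(labeled, (2.0, 2.0), k=3))   # A

# Regression: the 3 nearest values to x = 2.2 are 20, 30 and 10
valued = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
print(knn_regress(valued, (2.2,), k=3))         # 20.0
```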
Some variations of the KNN method include weighting each neighbor's vote by its
proximity to the new data point (closer neighbors count more), using different distance
measures for different features, and tuning the value of k for the given dataset, for
example by cross-validation.
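Two of these variations can be sketched as follows: a vote weighted by inverse distance, and a simple leave-one-out check for comparing values of k. The weighting formula 1/(distance + eps), the function names and the sample data are assumptions for illustration, since other weighting schemes are also used in practice.

```python
import math
from collections import defaultdict

def weighted_knn_classify(train, query, k, eps=1e-9):
    """Each of the k nearest neighbors votes with weight 1 / (distance + eps)."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    scores = defaultdict(float)
    for features, label in neighbors:
        scores[label] += 1.0 / (math.dist(features, query) + eps)
    return max(scores, key=scores.get)

def loo_accuracy(train, k):
    """Leave-one-out accuracy: predict every point from all the other points."""
    hits = 0
    for i, (features, label) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        hits += weighted_knn_classify(rest, features, k) == label
    return hits / len(train)

# One very close B outweighs two distant A's, although A has the majority for k = 3
small = [((0.0, 0.0), "B"), ((2.0, 0.0), "A"), ((0.0, 2.0), "A")]
print(weighted_knn_classify(small, (0.1, 0.1), k=3))   # B

# Comparing candidate values of k on a small hypothetical training set
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.8, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]
for k in (1, 3, 5):
    print(k, loo_accuracy(train, k))
```

On real data the accuracies usually differ between values of k, and the k with the best held-out accuracy would be chosen.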
In summary, the KNN method is a simple and effective way to classify new data
points based on their proximity to the nearest neighbors in the training dataset. It is
widely used in various fields such as image recognition, text classification, and
bioinformatics.
• Disadvantages:
1. Computationally expensive: KNN requires calculating the distance between
each new data point and all training data points, which can be computationally
expensive and time-consuming for large data sets.
2. Sensitive to outliers: KNN is sensitive to outliers and mislabeled points in the
data set, especially for small values of k, because a single unusual neighbor
can change the prediction.
3. Sensitive to the choice of k: The choice of the number of neighbors k can have
a significant impact on the performance of KNN. Choosing the optimal k value
can be challenging and may require trial and error.
4. Curse of dimensionality: KNN can suffer from the curse of dimensionality
when dealing with high-dimensional data, where the distance between data
points becomes less meaningful in higher dimensions.
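The first and fourth points can be illustrated with a small experiment. The sketch below measures how much the farthest and nearest of a set of random points differ in distance from a random query, for increasing numbers of dimensions. The uniform random data, the point count and the name `relative_contrast` are assumptions for the example. Each call also performs one distance computation per stored point, which is the brute-force cost mentioned in point 1.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min over distances from a random query to random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    # One distance per stored point: cost grows with n_points and with dim
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

The printed contrast is expected to shrink as the dimension grows, meaning the nearest and farthest points become almost equally far away and the notion of "nearest neighbor" carries less information.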
___________
Mr. Dhrupesh Savdiya
(SUBJECT TEACHER)