Data Mining


General information
Data mining is one of the most useful techniques for helping entrepreneurs, researchers, and
individuals extract valuable information from huge sets of data. Data mining is also
called Knowledge Discovery in Databases (KDD).

The knowledge discovery process includes:

Data cleaning → Data integration → Data selection → Data transformation → Data mining
→ Pattern evaluation → Knowledge presentation.
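
The steps of the knowledge discovery process can be sketched as a chain of small functions. This is only a minimal illustration: the records, the field names, and the spend threshold are invented for the example, and a real pipeline would be far more involved at every step.

```python
# Hypothetical toy records; field names and values are illustrative assumptions.
raw_store_a = [
    {"customer": "ann", "amount": "120.50"},
    {"customer": "bob", "amount": None},      # missing value
    {"customer": "ann", "amount": "80.00"},
]
raw_store_b = [
    {"customer": "cara", "amount": "300.00"},
    {"customer": "bob", "amount": "45.25"},
]

def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the fields relevant to the analysis."""
    return [(r["customer"], r["amount"]) for r in records]

def transform(pairs):
    """Data transformation: convert amounts from text to numbers."""
    return [(name, float(amount)) for name, amount in pairs]

def mine(pairs):
    """Data mining: total spend per customer (a very simple pattern)."""
    totals = {}
    for name, amount in pairs:
        totals[name] = totals.get(name, 0.0) + amount
    return totals

def evaluate(totals, threshold=100.0):
    """Pattern evaluation: keep only customers at or above the threshold."""
    return {name: total for name, total in totals.items() if total >= threshold}

def present(patterns):
    """Knowledge presentation: format the results for a reader."""
    return [f"{name}: {total:.2f}" for name, total in sorted(patterns.items())]

integrated = integrate(clean(raw_store_a), clean(raw_store_b))
high_value = evaluate(mine(transform(select(integrated))))
report = present(high_value)
print(report)
```

Each function corresponds to one step of the process, so the output of one step is the input of the next.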


What is Data Mining?
Data mining is the process of extracting information from huge sets of data to identify patterns, trends, and
useful data that allow a business to make data-driven decisions.

In general terms, “mining” is the process of extracting some valuable material from the earth, e.g. coal
mining or diamond mining. In computer science, “data mining” is also referred to
as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data
dredging. It is the process of extracting useful information from a bulk of data or
from data warehouses. In coal or diamond mining, the result of the extraction process is coal or diamond.
In data mining, however, the result of the extraction process is not data. Instead, the results are
the patterns and knowledge that we gain at the end of the extraction process. In that sense, we can think of data
mining as a step in the process of Knowledge Discovery or Knowledge Extraction.
Example
Banks typically use data mining to find prospective customers who could be interested
in credit cards, personal loans, or insurance. Since banks have the transaction details and
detailed profiles of their customers, they analyze this data and look for patterns that help
them predict which customers could be interested in personal loans and similar products.
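
As a rough sketch of the bank example, the code below measures how often past customers in each segment took a loan and then flags prospects who fall into high-uptake segments. The segments, the records, and the 0.5 cut-off are all hypothetical; a real bank would use much richer profiles and a proper predictive model.

```python
from collections import defaultdict

# Hypothetical historical customers: (income_band, has_mortgage, took_loan)
history = [
    ("mid", True, True),
    ("mid", True, True),
    ("mid", False, False),
    ("high", False, False),
    ("high", True, True),
    ("low", False, False),
    ("mid", True, False),
]

def uptake_rates(history):
    """Share of past customers in each segment who took a loan."""
    taken = defaultdict(int)
    seen = defaultdict(int)
    for band, mortgage, took_loan in history:
        seen[(band, mortgage)] += 1
        taken[(band, mortgage)] += int(took_loan)
    return {segment: taken[segment] / seen[segment] for segment in seen}

rates = uptake_rates(history)

# Hypothetical prospects: (id, income_band, has_mortgage)
prospects = [("p1", "mid", True), ("p2", "low", False), ("p3", "high", True)]

# Flag prospects whose segment had a loan uptake of at least 50% (assumed cut-off).
likely = [pid for pid, band, mortgage in prospects
          if rates.get((band, mortgage), 0.0) >= 0.5]
print(likely)
```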
Data Mining is a process used by organizations to extract specific data from huge databases to
solve business problems. It primarily turns raw data into useful information.
Advantages of Data Mining
•The Data Mining technique enables organizations to obtain knowledge-based data.
•Data mining enables organizations to make lucrative modifications in operation and production.
•Compared with other statistical data applications, data mining is cost-efficient.
•Data Mining helps the decision-making process of an organization.
•It facilitates the automated discovery of hidden patterns as well as the prediction of trends and behaviors.
•It can be introduced into new systems as well as existing platforms.
•It is a quick process that makes it easy for new users to analyze enormous amounts of data in a short time.
Disadvantages of Data Mining
•There is a risk that organizations may sell customers' data to other organizations for
money. American Express has reportedly sold its customers' credit card purchase data to other
organizations.
•Many data mining analytics software packages are difficult to operate and require advanced training.
•Different data mining tools operate in distinct ways because of the different algorithms used in their
design. Selecting the right data mining tool is therefore a very challenging task.
•Data mining techniques are not perfectly precise, which may lead to severe consequences in certain
conditions.
Data Mining Applications
Data mining is primarily used by organizations with intense consumer demands, such as retail,
communication, financial, and marketing companies, to determine prices, consumer preferences, and
product positioning, and to assess the impact on sales, customer satisfaction, and corporate profits.
Data mining enables a retailer to use point-of-sale records of customer purchases to develop products and
promotions that help the organization attract customers.
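
To make the point-of-sale idea concrete, here is a small sketch that counts how often pairs of items are bought together, which is the basic idea behind market basket analysis. The baskets and the 0.4 support threshold are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets; the items are invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "chips"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical order, e.g. ("bread", "milk")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

support = pair_support(baskets)

# Keep pairs that appear in at least 40% of baskets (assumed threshold).
frequent = {pair: s for pair, s in support.items() if s >= 0.4}
print(frequent)
```

A retailer could use pairs like these to decide which products to promote together.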
Application areas include:
•Healthcare
•Market basket analysis
•Education
•CRM (Customer Relationship Management)
•Fraud detection
•Lie detection
•Financial banking


Challenges of Implementation in Data Mining
Although data mining is very powerful, it faces many challenges during its execution. These
challenges can relate to performance, data, methods, and techniques. The process of
data mining becomes effective when the challenges or problems are correctly recognized and
adequately resolved.
Incomplete and noisy data:
Data mining is the process of extracting useful data from large volumes of data. Real-world data
is heterogeneous, incomplete, and noisy, and data in huge quantities is often inaccurate or
unreliable. These problems may arise from faulty measuring instruments or from human error.
Suppose a retail chain collects the phone numbers of customers who spend more than $500, and
accounting employees enter the information into their system. An employee may mistype a digit
when entering a phone number, which results in incorrect data. Some customers may not be
willing to disclose their phone numbers, which results in incomplete data. The data could also be changed
by human or system error. All of these problems (noisy and incomplete data) make data mining
challenging.
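
The phone number scenario can be illustrated with a small check that separates usable, noisy, and missing values. The records are invented, and the rule that a valid number has exactly 10 digits is an assumption made only for this sketch.

```python
import re

# Hypothetical customer records; the 10-digit rule below is an assumption.
records = [
    {"name": "Ann", "phone": "555-010-1234"},
    {"name": "Bob", "phone": "555-010-123"},    # one digit missing (noisy)
    {"name": "Cara", "phone": None},            # not disclosed (incomplete)
    {"name": "Dev", "phone": "(555) 010 9876"},
]

def classify_phone(phone, expected_digits=10):
    """Label a phone value as 'ok', 'noisy', or 'missing'."""
    if phone is None or not phone.strip():
        return "missing"
    digits = re.sub(r"\D", "", phone)  # strip everything except digits
    return "ok" if len(digits) == expected_digits else "noisy"

quality = {r["name"]: classify_phone(r["phone"]) for r in records}
print(quality)
```

A check like this only catches entries with the wrong number of digits. A mistyped digit that still produces a 10-digit number would pass unnoticed, which is part of why noisy data is hard to deal with.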
Data Distribution:
Real-world data is usually stored on various platforms in a distributed computing environment.
It might be in a database, on individual systems, or even on the internet. In practice, it is quite
difficult to bring all the data into a centralized data repository, mainly because of organizational and
technical concerns. For example, various regional offices may have their own servers to store their
data, and it may not be feasible to store all the data from all the offices on a central server. Therefore,
data mining requires the development of tools and algorithms that allow the mining of
distributed data.
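
One common way to work with distributed data is to summarize it where it lives and combine only the summaries. The sketch below assumes three hypothetical regional offices that each count their own sales. Only the small per-office counts are merged centrally, never the raw logs.

```python
from collections import Counter

# Hypothetical sales logs held separately by three regional offices.
office_logs = {
    "north": ["tea", "coffee", "tea", "tea"],
    "south": ["coffee", "coffee", "juice"],
    "west": ["tea", "juice"],
}

def local_summary(log):
    """Runs at one office: count items without sending the raw log anywhere."""
    return Counter(log)

def merge_summaries(summaries):
    """Runs centrally: add up the small per-office summaries."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total

summaries = [local_summary(log) for log in office_logs.values()]
global_counts = merge_summaries(summaries)
top_item, top_count = global_counts.most_common(1)[0]
print(top_item, top_count)
```

This works because counts can be added together. Not every mining algorithm splits up this cleanly, which is why dedicated tools for distributed data are needed.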
Complex Data:
Real-world data is heterogeneous. It can include multimedia data such as audio, video, and
images, as well as spatial data, time series, and other complex data. Managing these various types of data
and extracting useful information from them is a tough task. New technologies, tools,
and methodologies often have to be developed or refined to obtain specific information.
Performance
A data mining system's performance relies primarily on the efficiency of the algorithms and
techniques used. If the algorithms and techniques are poorly designed, the
efficiency of the data mining process is adversely affected.
Data Privacy and Security
Data mining can lead to serious issues in terms of data security, governance, and privacy.
For example, if a retailer analyzes the details of purchased items, it reveals data about
the buying habits and preferences of customers without their permission.
Data Visualization
In data mining, data visualization is a very important process because it is the primary method
of showing the output to the user in a presentable way. The extracted data should convey the exact
meaning of what it intends to express. However, representing the information to the end user
in a precise and easy way is often difficult. Because the input data and the output information are
complicated, efficient and effective data visualization processes need to be implemented
to make data mining successful.
Data Mining tools
Data mining tools have the objective of discovering patterns, trends, and groupings among large sets
of data and transforming data into more refined information.

A data mining tool is a framework, such as RStudio or Tableau, that allows you to perform different
types of data mining analysis.

With such a tool, we can run various algorithms, such as clustering or classification, on a data set and
visualize the results. It gives us better insight into our data and the
phenomenon that the data represents.
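
To give a feel for what a clustering algorithm does inside such a tool, here is a minimal sketch of k-means on one-dimensional data. It is a simplified illustration, not the implementation used by any particular tool. The purchase amounts and the starting centers are invented.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then
    move each center to the mean of its assigned values."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical purchase amounts with two visible groups.
amounts = [10, 12, 11, 13, 95, 100, 105, 98]
centers, clusters = kmeans_1d(amounts, centers=[0, 50])
print(centers)
print(clusters)
```

The algorithm separates the small purchases from the large ones without being told which is which. A data mining tool would typically also plot the resulting groups.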
Rapid Miner

Rapid Miner is one of the most popular predictive analysis systems, created by the company of
the same name. It is written in the Java programming language. It offers an
integrated environment for text mining, deep learning, machine learning, and predictive analysis.
The tool can be used for a wide range of applications, including business and
commercial applications, research, education, training, application development, and machine
learning.
Rapid Miner provides its server on-site as well as in public or private cloud infrastructure. It is
based on a client/server model. Rapid Miner comes with template-based frameworks that
enable fast delivery with fewer errors (which are commonly expected when code is written
manually).
