Normalization: A Preprocessing Stage
All content following this page was uploaded by S GOPAL KRISHNA PATRO on 01 April 2015.
¹Research Scholar, Department of CSE & IT, VSSUT, Burla, Odisha, India
²Assistant Professor, Department of CSE & IT, VSSUT, Burla, Odisha, India
Abstract: Normalization is a pre-processing stage for almost any type of problem statement. It plays an especially important role in fields such as soft computing and cloud computing, where data must be scaled down or scaled up to a suitable range before it is used in later stages. Several normalization techniques exist, namely Min-Max normalization, Z-score normalization, and decimal scaling normalization. Building on these techniques, we propose a new normalization technique, Integer Scaling Normalization, and demonstrate it on various data sets.
Normalization is a scaling or mapping technique, or a pre-processing stage [1], in which a new range is derived from an existing one. It can be very helpful for prediction and forecasting [2]. There are many ways to predict or forecast, and their results can vary widely from one another; a normalization technique is needed to reduce this variation and bring the results closer together. The existing normalization techniques mentioned in the abstract are Min-Max, Z-score, and decimal scaling. In addition to these, we present a new technique called Integer Scaling. This technique is derived from AMZD (Advanced on Min-Max, Z-score, and Decimal scaling) [3-6].

II. RELATED STUDY

The existing normalization methods are described below.

Min-Max normalization applies a linear transformation to the original range of data and preserves the relationships among the original data values [3-6]. It is a simple technique that fits the data exactly into a pre-defined boundary. If A is the original data and the pre-defined boundary is [C, D], the Min-Max normalized data A' is

$$A' = \frac{A - \min(A)}{\max(A) - \min(A)} \cdot (D - C) + C$$

Z-score normalization derives the normalized values from the original unstructured data using the mean and standard deviation [3-6]. Unstructured data can be normalized with the z-score as follows:

$$v_i' = \frac{v_i - \bar{E}}{\operatorname{std}(E)}$$

where $v_i'$ is the z-score normalized value, $v_i$ is the value of row $E$ in the $i$-th column,

$$\operatorname{std}(E) = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(v_i - \bar{E}\right)^2}$$

and the mean value is

$$\bar{E} = \frac{1}{n}\sum_{i=1}^{n} v_i$$

For example, suppose we have five rows, X, Y, Z, U, and V, each with n variables or columns. The z-score technique above can be applied to each row to compute its normalized values. If all the values in a row are identical, the standard deviation of that row is zero, and all values for that row are set to zero. Unlike Min-Max normalization, the z-score does not confine values to the range 0 to 1: the normalized values of a row have mean 0 and standard deviation 1, and typically include negative values.

Decimal scaling maps values into the range -1 to 1 [3-6]. As per the decimal scaling technique,

$$v_i = \frac{v}{10^j}$$

where $v_i$ is the scaled value, $v$ is the original value, and $j$ is the smallest integer such that $\max(|v_i|) < 1$.

These techniques are well known. The proposed technique is discussed in detail in the next section, where it is compared with Min-Max normalization on different data sets.
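The three existing techniques can be sketched in Python as follows. This is a minimal, illustrative sketch rather than code from the paper or from [3-6]; the function names are hypothetical, the z-score uses the (n - 1) sample standard deviation, and the handling of constant or very short inputs is an assumption except where the text specifies it (a constant row becomes zeros). Applied to the ten original values of Table III, min_max reproduces that table's Min-Max column to three decimals.

```python
import math
from typing import List


def min_max(values: List[float], c: float = 0.0, d: float = 1.0) -> List[float]:
    """Min-Max normalization: map values linearly onto the boundary [C, D]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption (not covered in the text): a constant input maps to C.
        return [c for _ in values]
    return [(v - lo) / (hi - lo) * (d - c) + c for v in values]


def z_score(row: List[float]) -> List[float]:
    """Z-score normalization of one row E, using the (n - 1) standard deviation."""
    n = len(row)
    if n < 2:
        # Assumption: the (n - 1) standard deviation is undefined, so return zeros.
        return [0.0] * n
    mean = sum(row) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in row) / (n - 1))
    if std == 0:
        # All values in the row are identical, so the row is set to zero.
        return [0.0] * n
    return [(v - mean) / std for v in row]


def decimal_scaling(values: List[float]) -> List[float]:
    """Decimal scaling: divide by 10**j so that every |v_i| < 1."""
    largest = max(abs(v) for v in values)
    j = 0  # Assumption: j starts at 0, so values already in (-1, 1) are unchanged.
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


# Original data of Table III (college enrollment).
enrollment = [1645, 2300, 2472, 1105, 7946, 1657, 9742, 4112, 917, 7219]
print([round(v, 3) for v in min_max(enrollment)])
# -> [0.082, 0.157, 0.176, 0.021, 0.796, 0.084, 1.0, 0.362, 0.0, 0.714]
print([round(v, 3) for v in z_score(enrollment)])
print(decimal_scaling(enrollment))
```

Here min_max depends on the minimum and maximum of the whole input, and z_score on its mean and spread, whereas decimal_scaling only needs the largest magnitude to choose j.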
III. PROPOSED MODEL

Having studied many research articles, we observe that researchers and scholars working in soft computing and data mining, as well as in other areas such as image processing and cloud computing across different branches or disciplines, frequently work with datasets, and most of those datasets are not well structured or are unstructured.

To make such a dataset well structured, we propose a technique that produces a scaled (transformed, normalized) dataset within the range 0 to 1 for our research work.

Our proposed normalization technique (AMZD normalization) is a rescaling method in the same family as Min-Max, z-score, z-score standard deviation, and decimal scaling normalization, and, like Min-Max normalization, it gives values between 0 and 1.

Our proposed normalization technique has the following features:
- Scales or transforms each element individually.
- Is independent of the amount of data (large, medium, or small data set).
- Is independent of the size of the data (number of digits in each element).
- Scales values into the range 0 to 1.
- Is applicable to integers only.

The integer scaling normalization is

$$Y = \frac{X - d \cdot 10^{(|X| - 1)}}{10^{(|X| - 1)}}$$

where $X$ is the original integer value, $d$ is its most significant (leading) digit, and $|X|$ is the number of digits in $X$. For example, $X = 1229$ has $|X| = 4$ and $d = 1$, so $Y = (1229 - 1000)/1000 = 0.229$, which matches the first row of Table I.

Below, we compare our proposed technique with the Min-Max normalization technique, through tables as well as graphs, on different data sets: BSE Sensex, NNGC, and college enrollment.

TABLE I
BSE_SENSEX Data Set [7]

| Sl. No. | Original Data | Min-Max Normalization | Integer Scaling Normalization |
|---|---|---|---|
| 1 | 1229 | 0.0976 | 0.229 |
| 2 | 1264 | 0.129 | 0.264 |
| 3 | 1397 | 0.25 | 0.397 |
| 4 | 1455 | 0.303 | 0.455 |
| 5 | 1483 | 0.3284 | 0.483 |
| 6 | 1523 | 0.385 | 0.523 |
| 7 | 1548 | 0.388 | 0.548 |
| 8 | 1594 | 0.429 | 0.594 |
| 9 | 1670 | 0.498 | 0.670 |
| 10 | 1680 | 0.5076 | 0.680 |
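Integer scaling normalization can be sketched in Python as follows. This is a minimal sketch of our reading of the formula, not the authors' implementation: |X| is taken as the digit count and d as the leading digit, which reproduces the Integer Scaling columns of Tables I and III. The function names integer_scaling and integer_scaling_all are hypothetical, and rejecting negative inputs is an assumption, since the text only says the technique applies to integers.

```python
from typing import List


def integer_scaling(x: int) -> float:
    """Integer scaling of one non-negative integer X.

    Y = (X - d * 10**(|X| - 1)) / 10**(|X| - 1), where |X| is the number of
    digits of X and d is its leading digit (our reading of the formula).
    """
    if isinstance(x, bool) or not isinstance(x, int) or x < 0:
        # Assumption: this sketch only handles non-negative integers.
        raise ValueError("integer_scaling expects a non-negative integer")
    n_digits = len(str(x))        # |X|
    place = 10 ** (n_digits - 1)  # 10^(|X| - 1)
    d = x // place                # leading (most significant) digit
    return (x - d * place) / place


def integer_scaling_all(values: List[int]) -> List[float]:
    """Apply integer scaling element by element (each value is scaled on its own)."""
    return [integer_scaling(v) for v in values]


# Original data of Table I (BSE_SENSEX).
sensex = [1229, 1264, 1397, 1455, 1483, 1523, 1548, 1594, 1670, 1680]
print(integer_scaling_all(sensex))
# -> [0.229, 0.264, 0.397, 0.455, 0.483, 0.523, 0.548, 0.594, 0.67, 0.68]
```

Because each value is scaled on its own, the result does not depend on the rest of the data set, which matches the feature list above. It also means the mapping is not order-preserving across different leading digits: in Table III, 2300 maps to 0.300 while the smaller 1645 maps to 0.645.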
Fig. 2 Comparison Graph on Min-Max Vs Proposed Technique for NNGC Dataset

TABLE III
Colleges Enrollment Data Set [9]

| Sl. No. | Original Data | Min-Max Normalization | Integer Scaling Normalization |
|---|---|---|---|
| 1 | 1645 | 0.082 | 0.645 |
| 2 | 2300 | 0.157 | 0.300 |
| 3 | 2472 | 0.176 | 0.472 |
| 4 | 1105 | 0.021 | 0.105 |
| 5 | 7946 | 0.796 | 0.946 |
| 6 | 1657 | 0.084 | 0.657 |
| 7 | 9742 | 1 | 0.742 |
| 8 | 4112 | 0.362 | 0.112 |
| 9 | 917 | 0 | 0.17 |
| 10 | 7219 | 0.714 | 0.219 |

IV. CONCLUSION

Our study shows that our normalization technique works well in research fields such as soft computing (our own research area), image processing, and cloud computing. We therefore plan to propose other types of normalization technique, and to apply our technique in the fast-growing research area of time-series financial forecasting, as well as wherever the data set concept arises.

REFERENCES

[1] L. A. Shalabi, Z. Shaaban and B. Kasasbeh, "Data Mining: A Preprocessing Engine," J. Comput. Sci., 2: 735-739, 2006.
[2] S. Gopal Krishna Patro, Pragyan Parimita Sahoo, Ipsita Panda and Kishore Kumar Sahu, "Technical Analysis on Financial Forecasting," International Journal of Computer Sciences and Engineering, Volume-03, Issue-01, pp. 1-6, E-ISSN: 2347-2693, Jan. 2015.
[3] Sanjaya K. Panda, Subhrajit Nag and Prasanta K. Jana, "A Smoothing Based Task Scheduling Algorithm for Heterogeneous Multi-Cloud Environment," 3rd IEEE International Conference on Parallel, Distributed and Grid Computing (PDGC), IEEE, Waknaghat, 11-13 Dec. 2014.