PhD Interview
Viyasar Mouly
October 2023
Abstract
In healthcare and its related research fields, the dimensionality of high-dimensional data is a
massive challenge, because such data contain a huge number of variables that form complex data
matrices. The demand for dimension reduction of complex data is growing immensely in order to
improve data prediction, analysis and visualization. In general, dimension reduction techniques
are defined as the compression of a dataset from a higher-dimensional matrix to a lower-dimensional
matrix. Several computational techniques have been implemented for data dimension reduction, and
they fall into two categories: feature extraction and feature selection. Various feature
extraction and feature selection methods are reviewed here, together with a systematic comparison
of several dimension reduction techniques for analysing high-dimensional data and overcoming the
problem of data loss.
Keywords: Feature extraction, Feature selection, High-dimensional data, Principal component
analysis
1. Introduction
The present century is the century of data: we collect and process data of all kinds on scales
unimaginable earlier. High-dimensional data have come to be regarded as one of the most important
types of big data in practice, and they arise frequently in genetic, financial and geographical
studies. High-dimensional data analysis is the study of data sets in which the number of features
is comparable to or larger than the number of observations; this type of data is closely related
to machine learning and AI. Formally, let p denote the number of features (observed variables) and
n the number of observations (data points). The data are high-dimensional when p is close to or
larger than n, and the extreme case in which p is much larger than n is often written as p >> n.
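One concrete reason why p >> n is difficult is that the sample covariance matrix, which many classical methods need to invert, becomes singular. After centring, the n rows span at most n - 1 directions, so the p x p covariance matrix has rank at most n - 1. The minimal sketch below assumes purely random Gaussian data and hypothetical sizes (n = 20, p = 100) as a stand-in for a real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: far fewer observations (n) than features (p), i.e. p >> n
n, p = 20, 100
X = rng.standard_normal((n, p))  # random stand-in for a real n x p data matrix

# Sample covariance matrix of the features; rowvar=False treats columns as variables
S = np.cov(X, rowvar=False)

# Centring leaves at most n - 1 independent rows, so rank(S) <= n - 1 < p
rank = np.linalg.matrix_rank(S)
print(f"covariance matrix: {S.shape[0]} x {S.shape[1]}, rank {rank}")
print("invertible:", rank == p)
```

Because S has far fewer independent directions than features, it cannot be inverted, which is one motivation for the dimension reduction methods discussed in this work.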
2. Literature Review
Recently, the term “high-dimensional data” (HDD) has become a buzzword in medical science, data
science and the healthcare sector (Alexander and Wang 2017; Hossain and Muhammad 2016). HDD has had a
tremendous impact on data analysis, visualization, processing and classification. Large amounts of
patient data can now be recorded and exploited by machine learning for the benefit of the healthcare
sector (Archenaa and Mary Anita 2015). A dataset is a statistical data matrix with domains or
subjects in rows and variables in columns, and an individual column of the input dataset is termed a
feature. Technically, a feature is a measurable property of the sampled data. HDD is characterized
by three factors: data velocity (the rate at which data are generated), data variety (the types of
data) and data volume. It is useful in data interpretation, management, analysis and visualization
(Raghupathi and Raghupathi 2014). However, storing, processing and maintaining such a massive number
of features requires a great deal of memory and may result in data loss. Furthermore, data privacy,
global data transparency, data storage and security are unavoidable issues in the research fields of
data analysis (Deyan and Zhao 2012).
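The memory burden can be quantified with simple arithmetic: a dense n x p matrix of 64-bit floats needs 8np bytes, while any p x p feature-by-feature matrix (such as a covariance matrix) needs 8p^2 bytes. The sizes below (1,000 patients, 20,000 features) and the helper name `dense_matrix_bytes` are hypothetical, chosen only to illustrate the scaling.

```python
import numpy as np

def dense_matrix_bytes(n_rows: int, n_cols: int, dtype=np.float64) -> int:
    """Bytes needed to store a dense n_rows x n_cols matrix of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize

# Hypothetical dataset: 1,000 patients (rows) x 20,000 features (columns)
n_patients, n_features = 1_000, 20_000

data_bytes = dense_matrix_bytes(n_patients, n_features)  # 8 bytes per float64 value
cov_bytes = dense_matrix_bytes(n_features, n_features)   # p x p matrix grows as p**2

print(f"data matrix:       {data_bytes / 1e6:,.0f} MB")  # 160 MB
print(f"covariance matrix: {cov_bytes / 1e9:,.1f} GB")   # 3.2 GB
```

Storing values as float32 halves both figures, but the quadratic growth in the number of features remains.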
3. Objective
Dimension reduction, the main approach to this problem, refers to the transformation of
high-dimensional data into a low-dimensional representation. Feature extraction is the process of
transforming the raw data into a new set of features, in the time domain, the frequency domain or
the joint time–frequency domain. Feature selection, by contrast, chooses the best features from an
existing feature set, which aids data interpretation and analysis. A feature selection scheme is
therefore needed whenever we want to determine the “best” feature subset for accurate data
prediction.
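The difference between the two families can be sketched in a few lines of NumPy. Principal component analysis (computed here via the SVD of the centred data) serves as one example of feature extraction, since it builds new features as linear combinations of all original columns. Ranking columns by variance serves as a deliberately simple stand-in for feature selection, since it keeps a subset of the original columns. The synthetic data, the variance criterion and the helper names are assumptions for illustration, not the methods evaluated in this work.

```python
import numpy as np

def pca_extract(X: np.ndarray, k: int) -> np.ndarray:
    """Feature extraction: project centred data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # n x k matrix of NEW features (linear combinations of all columns)

def variance_select(X: np.ndarray, k: int) -> np.ndarray:
    """Feature selection (simple filter): indices of the k ORIGINAL columns with largest variance."""
    idx = np.argsort(X.var(axis=0))[::-1][:k]
    return np.sort(idx)  # original column indices, so selected features stay interpretable

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 10))
X[:, 3] *= 5.0  # hypothetical: make two columns much more variable than the rest
X[:, 7] *= 4.0

Z = pca_extract(X, k=2)        # 50 x 2 matrix of extracted features
cols = variance_select(X, k=2)
print(Z.shape)  # (50, 2)
print(cols)     # [3 7]
```

Both calls reduce ten columns to two, but only the selected columns keep their original meaning; the extracted components are uncorrelated by construction but mix every input variable.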
High-dimensional data refers to datasets with a large number of variables or features relative
to the number of observations. This scenario presents unique challenges, such as the curse of
dimensionality and sparsity. High-dimensional data analysis aims to uncover relevant
structures, relationships and patterns in these datasets. Techniques used in this context include
regularized regression methods (e.g., LASSO, which also performs variable selection, and ridge regression), dimensionality reduction
techniques (e.g., principal component analysis, sparse principal component analysis) and
clustering techniques tailored for high-dimensional data.
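How LASSO performs variable selection can be sketched with cyclic coordinate descent, one standard way of minimising the LASSO objective (1/(2n))*||y - X b||^2 + lam*||b||_1. Its soft-thresholding step sets a coefficient exactly to zero whenever that feature's correlation with the partial residual is at most lam. The simulated p >> n data (only columns 0 and 1 matter), the choice lam = 0.2*lam_max and the helper names are hypothetical; in practice an optimised library implementation would normally be used.

```python
import numpy as np

def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z towards zero by gamma; returns exactly 0 when |z| <= gamma."""
    return float(np.sign(z) * max(abs(z) - gamma, 0.0))

def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float,
             max_sweeps: int = 1000, tol: float = 1e-8) -> np.ndarray:
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n)*||x_j||^2 for every column j
    r = np.array(y, dtype=float)       # residual y - X beta (beta starts at zero)
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the residual excluding feature j's own contribution
            rho = X[:, j] @ r / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                r -= X[:, j] * (new - old)  # keep the residual consistent with beta
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:  # stop once a full sweep barely moves beta
            break
    return beta

rng = np.random.default_rng(2)
n, p = 50, 200  # hypothetical p >> n setting
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.standard_normal(n)  # only columns 0 and 1 matter

lam_max = np.max(np.abs(X.T @ y)) / n  # smallest lam at which every coefficient is zero
lam = 0.2 * lam_max
beta = lasso_cd(X, y, lam)
print("selected columns:", np.flatnonzero(beta))
print("non-zero coefficients:", np.count_nonzero(beta), "of", p)
```

Scaling lam_max is a convenient way to choose the penalty. Replacing the L1 penalty with an L2 (ridge) penalty would shrink the coefficients without setting any of them exactly to zero, so ridge alone does not select variables.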
High-dimensional data analysis is crucial for understanding complex systems, making
predictions and extracting meaningful insights from data. These techniques help researchers
and practitioners uncover hidden patterns, identify important variables, reduce data
dimensionality and develop models that capture the underlying structure of the data.
4. Methodology