Automated Personality Classification Using Data Mining Techniques
5 authors, including:
Prajakata Gogate
Pillai Institute of Information Technology, Engineering, Media Studies and Research
All content following this page was uploaded by Prajakata Gogate on 28 April 2017.
Abstract: Many application areas now have access to large amounts of data on people's behavior. This data can be used to classify persons with automated personality classification (APC). This project proposes an advanced APC system. The system uses learning algorithms such as Naive Bayes, SVM, and decision trees, along with advanced data mining, to mine user characteristics data and learn from the patterns. What it learns can then be used to classify or predict user personality based on past classifications. The system analyses a wide range of user characteristics and behaviors and, based on the patterns observed, stores its own user characteristics patterns in a database. It then predicts the personality of a new user from the personality data stored by classifying previous user data. This system is useful to social networks as well as to online ad-selling networks, which can classify user personality and sell more relevant ads. The system is also useful for government agencies that need to observe user personality and predict new user personality on a large scale.

I. INTRODUCTION
Identifying a person's personality from their nature is an old technique. Earlier, this was done manually, and predicting the nature of a person took a lot of time. Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. Methods used to analyze the data include surveys, interviews, questionnaires, classroom activities, shopping website data, and social network data about users' experiences and the problems they are facing. But these traditional methods are time consuming and very limited in scale. Manual analysis cannot make sense of user learning experiences, which are huge in volume and vary in Internet slang and in the timing of user posts on the web. Sentiment analysis of the collected user data does not capture much of the relevant experience, because determining what user problems a piece of data indicates is a more complicated task, even for a human judge, than determining just its sentiment.

Some non-relevant features are also used in human judgment, and humans are prone to biases and prejudices that may affect the accuracy of their judgments. Also, certain features of a Facebook profile or of other social network text data are difficult for humans to grasp. For example, while the number of Facebook friends is clearly displayed on the profile, it is more difficult for a human to determine features such as the network density.

II. LITERATURE SURVEY
1. Novel approaches to automated personality classification: Ideas and their potentials:
This paper [4] proposes several new research directions regarding the problem of Automated Personality Classification (APC). First, the authors investigate possible improvements of the existing solutions to the problem of APC, using different combinations of APC corpora, psychological trait measurements, and learning algorithms. Afterwards, they consider extensions of the APC problem and related tasks, such as dynamical APC and detecting personality inconsistency in a text. The entire research was performed in the context of social networks and the related data mining mechanisms.

2. Educational Game (Detecting personality of players in an educational game):
One of the goals of Educational Data Mining [1] is to develop methods for student modeling based on educational data, such as chat conversation and class discussion. Individual behavior and personality also play a major role in Intelligent Tutoring Systems (ITS) and Educational Data Mining (EDM). Thus, when developing a user-adaptable system, the student behaviors that occur during interaction have a huge impact on EDM and ITS. The authors introduce novel data mining techniques and natural language processing approaches for automated detection of students' personality and behaviors in an educational game (Land Science), where students act as interns in an urban planning firm and discuss their ideas in groups. To apply this framework, input excerpts must be classified into one of six possible personality classes. The authors applied this personality classification method using machine learning algorithms such as Naive Bayes, Support Vector Machine (SVM), and Decision Tree.

3. A System for Personality and Happiness Detection:
This work [3] proposes a platform for estimating personality and happiness. Starting from Eysenck's theory of human personality, the authors seek to provide a platform for collecting text messages from social media (WhatsApp) and classifying them into different personality categories. Although there is not a clear link between personality features and happiness, some correlations between them could be found in the future. The work describes the platform developed and, as a proof of concept, uses different sources of messages to see if common machine learning algorithms can be used for classifying different personality features and happiness.

4. Using Twitter Content to Predict Psychopathy:
An ever-growing number of users share their thoughts and experiences using the Twitter microblogging service. Although sometimes dismissed as containing too little content to convey significant information, these messages can be combined to build a larger picture of the user posting them. One particularly notable personality trait which can be discovered this way is psychopathy: the tendency to disregard others and the rules of society. The paper explores techniques to apply data mining towards the goal of identifying those who score in the top 1.4% of a well-known psychopathy metric using information available from their Twitter accounts. The authors apply a newly proposed form of ensemble learning, Select RUSBoost (which adds feature selection to their earlier imbalance-aware ensemble in order to resolve high dimensionality), employ four classification learners, and use four feature selection techniques. The results show that, with the optimal choices of techniques, an AUC value of 0.736 is achieved. Furthermore, these results were only achieved when using the Select RUSBoost technique, demonstrating the importance of feature selection, data sampling, and ensemble learning. Overall, the paper shows [2] that data mining can be a valuable tool for law enforcement and others interested in identifying abnormal psychiatric states from Twitter data.

IV. PROPOSED SYSTEM
Personality classification is one of the problems considered by personality psychology, a branch of psychology. The focus of this field is the study of personality and individual differences. According to that study [4], personality can be defined as a dynamic and organized set of characteristics of a person that have a unique influence on the cognition, motivation, and behavior of that person. In this paper, the problem of automated personality classification is considered based on the following content: textual content that the person wrote, and meta information about the person received on request, through social networks, or by other means. There are studies that also include speech, analysis of facial characteristics, gestures, and other aspects of behavior, but they are not the subject of our study. The standard approach to solving the APC problem based on the aforementioned content consists of the following steps: A. Gathering the corpus data, B. Determination of the personality characteristics of the participants, and C. Building the model.

The proposed system targets areas where there is access to large amounts of data on people's behavior. This data can help us classify persons using automated personality classification (APC). In this project, we propose an advanced APC system. The project uses learning algorithms along with advanced data mining to mine user characteristics data and learn from the patterns. What it learns can then be used to classify or predict user personality based on past classifications. The system analyses a wide range of user characteristics and behaviors and, based on the patterns observed, stores its own user characteristics patterns in a database. It then predicts the personality of a new user from the personality data stored by classifying previous user data. This system is useful to social networks as well as to online ad-selling networks, which can classify user personality and sell more relevant ads. The system is also useful for government agencies that need to observe user personality and predict new user personality on a large scale.
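
Steps A to C can be illustrated with a minimal sketch in Python. It assumes a toy, hand-labelled corpus and a multinomial Naive Bayes model with add-one smoothing. The class name NaiveBayesAPC, the two labels, and the example posts are hypothetical and are not taken from the project; the sketch only shows the general flow of learning word patterns from past classifications and reusing them for a new user.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesAPC:
    """Hypothetical multinomial Naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # number of documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()

    def fit(self, texts, labels):
        # Step C: build the model from the labelled corpus.
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
        return self

    def predict(self, text):
        # Score each label by log prior plus the sum of smoothed log likelihoods.
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            score = math.log(self.class_counts[label] / total_docs)
            label_total = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Step A: gather corpus data (toy, made-up posts).
corpus = [
    "love meeting new people at parties",
    "great party with friends last night",
    "prefer reading alone at home",
    "quiet evening alone with a book",
]
# Step B: personality label of each participant (made-up labels).
labels = ["extravert", "extravert", "introvert", "introvert"]

model = NaiveBayesAPC().fit(corpus, labels)
prediction = model.predict("party with new friends")
print(prediction)  # prints "extravert"
```

In this sketch the per-label word counts play the role of the stored user characteristics patterns. A real system would persist them in a database, use a far larger corpus, and would likely compare Naive Bayes against the SVM and decision tree learners named in the abstract.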
VI. REFERENCES
[1] Fazel Keshtkar, Candice Burkett, Haiying Li and Arthur C. Graesser, Using Data Mining Techniques to Detect the Personality of Players in an Educational Game