Mohammad Thesis
net/publication/376682505
All content following this page was uploaded by Mohammad Ataur Rahman on 20 December 2023.
MASTER’S THESIS
Submitted to the
School of International Business (SiB)
of Bremen University of Applied Sciences
In Partial Fulfillment of the Requirements
Abstract
In the contemporary landscape of Human Resources (HR) management, the
convergence of data analytics, big data technologies, and machine learning
algorithms has given rise to HR analytics, a transformative approach that
empowers organizations to make data-driven decisions in talent acquisition,
retention, and workforce optimization. This thesis presents an empirical
investigation into the integration of HR analytics, big data, and machine learning
algorithms within the HR domain, with a specific focus on three pivotal HR tasks:
resume screening, employee turnover prediction, and sentiment analysis.
The findings reveal that machine learning algorithms significantly enhance the
accuracy and efficiency of resume screening, leading to more precise candidate
selection. Moreover, predictive models effectively identify employees at risk of
turnover, and sentiment analysis uncovers valuable insights into employee
satisfaction and engagement, enabling organizations to address areas of concern
and enhance overall workplace conditions.
Acknowledgment
“Praise be to Allah who is the Most Gracious, the Most Merciful!”
I express my gratitude to the Almighty, who has endowed me with the strength
and ability to successfully complete my thesis. I am deeply thankful to Allah for
His countless blessings upon me.
Lastly, my profound and heartfelt gratitude goes out to my parents, family, and
friends for their enduring and unparalleled love, assistance, and encouragement.
Table of Contents
Abstract
Acknowledgment
List of Figures
List of Pictures
List of Abbreviations
Chapter 1: Introduction
1.1 Background and context of the research
1.2 Research problem statement
1.3 Research objectives
1.4 Research questions
1.5 Significance of the study
1.6 Scope of the study
1.7 Limitations of the study
1.8 Organization of the thesis
Chapter 2: Literature Review
2.1 HR Analytics: Concept and Evolution
2.1.1 Definition of HR Analytics
2.1.2 Role of HR Analytics in decision-making
2.1.3 Historical Overview of HR Analytics
2.1.3.1 Early Beginnings
2.1.3.2 Technological Advancements
2.1.3.3 Strategic HR Analytics
2.1.3.4 Employee Experience and Engagement
2.1.3.5 Predictive Analytics and Machine Learning
2.1.4 HR Analytics Tools and Technologies
2.1.5 Challenges and Limitations of HR Analytics
2.2 Big Data in HR: Applications and Challenges
2.2.1 Definition of Big Data in the context of HR
2.2.2 The Three V's of Big Data in Human Resources
2.2.2.1 Volume
2.2.2.2 Velocity
2.2.2.3 Variety
2.2.3 Applications of Big Data in HR
2.2.4 Challenges in Implementing Big Data in HR
2.2.4.1 Data Privacy and Security
2.2.4.2 Data Quality and Integration
2.2.4.3 Skills and Resources
2.2.5 Future Trends and Implications
2.2.5.1 Artificial Intelligence (AI) in HR
2.2.5.2 Machine Learning (ML) in HR
2.2.5.3 Impact of Emerging Technologies
2.3 Machine Learning Algorithms in HR Analytics
2.3.1 Introduction to HR Analytics and Machine Learning
2.3.2 Overview of machine learning and its applications in HR
2.3.3 Data Collection and Preprocessing
2.3.4 Feature Selection and Engineering
2.3.5 Classification Algorithms for HR Analytics
2.3.6 Clustering Algorithms for HR Analytics
2.3.7 Ethical and Privacy Considerations
2.3.8 Future Trends and Challenges
2.3.8.1 Emerging trends in machine learning for HR analytics
2.3.8.2 Integrating natural language processing and sentiment analysis
2.3.8.3 AI-powered talent acquisition and candidate screening
2.4 Resume Screening: Trends and Techniques
2.4.1 Evolution of Resume Screening
2.4.2 Digital Platforms and Resume Collection
2.4.3 The Role of PDF Processing in Resume Extraction
2.4.4 Natural Language Processing (NLP) in Resume Screening
2.4.5 The Algorithmic Approach
2.5 Predicting Employee Turnover: Past Studies and Frameworks
2.5.1 Understanding Employee Turnover
2.5.2 Data Collection in Turnover Prediction
2.5.3 Data Manipulation and Analysis with Pandas
2.5.4 RandomForestClassifier in Turnover Prediction
2.5.5 Model Evaluation Metrics
2.5.6 The Role of Visualization in Turnover Prediction
2.6 Sentiment Analysis in Employee Feedback
2.6.1 Significance of Employee Feedback
2.6.2 An Introduction to Sentiment Analysis
2.6.3 The Power of TextBlob in Sentiment Analysis
2.6.4 Support Vector Machine in Text Analysis
2.6.5 Importance of Vectorization in Text Data
2.6.6 Evaluating Sentiment Models
Chapter 3: Methodology
3.1 Resume Screening
3.1.1 Data Collection
3.1.2 Data Collection Process
3.1.3 Text Preprocessing
3.1.4 Cosine Similarity Calculation
3.1.6 Visualization
3.1.7 Code Implementation
3.2 Predicting Employee Turnover
3.2.1 Data Collection
3.2.2 Questionnaire Design
3.2.3 Sample Size
3.2.4 Data Preprocessing
3.2.4.1 Data Cleaning
3.2.4.2 Feature Engineering
3.2.5 Pilot Study
3.2.6 Machine Learning Model
3.2.6.1 Model Selection
3.2.6.2 Model Training
3.2.6.3 Model Validation
3.2.7 Visualization
3.3 Sentiment Analysis
3.3.1 Data Collection
3.3.2 Survey question
3.3.3 Pilot Survey (Question Validation)
3.3.4 Data Preprocessing
3.3.4.1 Data Cleaning
3.3.4.2 Feature Extraction
3.3.5 Sentiment Analysis Model
3.3.5.1 Model Selection
3.3.5.2 Model Training
3.3.5.3 Test Set
3.3.5.4 Performance Metrics
3.3.6 Sentiment Prediction on New Data
3.3.6.1 New Data Source
3.3.6.2 Data Preprocessing for New Data
3.3.6.3 Sentiment Prediction
Chapter 4: Results and Discussion
4.1 Resume Screening
4.1.1 Similarity Percentages for All Resumes
4.1.1.1 Interpretation of Results
4.1.1.2 Discussion
4.1.2 Top 10 Resumes by Similarity Percentage
4.1.2.1 Interpretation of Results
4.1.2.2 Discussion
4.1.2.3 Enhancing Efficiency and Effectiveness
4.1.2.4 Future Directions
4.1.3 Cosine Similarity Heatmap of Top 10 Resumes
4.1.3.1 Interpretation of Results
4.1.3.2 Discussion
4.1.3.3 Enhancing Recruitment Strategy
4.1.3.4 Future Directions
4.1.4 Overlapping Words of Top 10 resumes with the Job Requirement
4.1.4.1 Implications for Candidate Evaluation
4.1.4.2 Discussion
4.1.4.3 Future Directions
4.2 Employee Turnover Prediction
4.2.1 Performance Metrics for Testing Data
4.2.1.1 Implications and Significance
4.2.1.2 Model Robustness and Generalization
4.2.1.3 Ethical Considerations
4.2.1.4 Future Research and Development
4.2.2 Demographic Analysis
4.2.2.1 Age Distribution
4.2.2.2 Gender Distribution
4.2.2.3 Department Distribution
4.2.3 Distribution of Predicted Labels
4.2.3.1 Discussion
4.2.3.2 Implications and Significance
4.2.3.3 Alignment with Turnover Prevention
4.2.3.4 Ethical Considerations
4.2.3.5 Future Research and Development
4.2.4 Employee Turnover Prediction in Gender Distribution
4.2.4.1 Implications and Future Strategies
4.2.4.2 Discussion
4.2.5 Top 10 Important Features for Employees at Risk
4.3 Sentiment Analysis
4.3.1 Performance Metrics for Labeled Dataset
4.3.1.1 Interpretation of Results
4.3.1.2 Discussion
4.3.1.3 Limitations and Future Directions
4.3.2 Distribution of Predicted Sentiments
4.3.2.1 Implications for Organizational Strategy
4.3.3 Distribution of Sentiment Polarity and Subjectivity
4.3.3.1 Sentiment Polarity
4.3.3.2 Sentiment Subjectivity
4.3.3.3 Implications for Organizational Strategy
4.3.4 Word Cloud Analysis
4.3.4.1 Interpretation of Results
4.3.4.2 Implications for Organizational Strategy
Chapter 5: Conclusion and Recommendation
5.1 Conclusion
5.2 Recommendations for Practitioners
5.3 For Future Research
5.4 Final Thoughts
References
Appendix
Declaration of Honor
List of Figures
List of Pictures
List of Abbreviations
HR Human Resources
AI Artificial Intelligence
ML Machine Learning
Chapter 1: Introduction
1.1 Background and context of the research
The field of Human Resources (HR) has undergone a significant transformation
in recent years, driven by the rapid advancements in data analytics, big data
technologies, and machine learning algorithms. Traditional HR practices have
evolved into a data-driven discipline, commonly referred to as HR analytics. HR
analytics leverages data to make informed decisions about recruitment,
employee retention, performance evaluation, and overall workforce
management. It represents a paradigm shift in HR management, allowing
organizations to gain deeper insights into their human capital and make more
strategic and evidence-based decisions.
In the era of big data, organizations are inundated with vast amounts of
information, including resumes of job applicants, employee feedback,
performance metrics, and more. This influx of data presents both challenges and
opportunities for HR professionals. To harness the power of big data and make
meaningful predictions and decisions, HR departments are increasingly turning
to machine learning algorithms and advanced analytics techniques. This thesis
explores the intersection of HR, big data, and machine learning, aiming to
uncover insights and practical applications that can enhance HR practices.
Resume Screening:
Sentiment Analysis:
Resume Screening
RQ1: How does the integration of big data and machine learning
algorithms in HR analytics impact the resume screening process, and what
insights can be derived to enhance candidate selection practices?
RQ4: What are the most influential features and factors that contribute to
the prediction of employees at risk of turnover, and how can this knowledge
inform HR strategies for retention?
Sentiment Analysis
1.5 Significance of the study
This study has far-reaching implications for both academia and practice. From
an academic standpoint, it adds to the developing discipline of HR analytics by
providing empirical evidence of the usefulness of big data and machine learning
approaches in HR processes. It also contributes to the expanding body of
knowledge on the intersection of HR, data science, and technology.
can change. These limitations should be taken into account when interpreting and
applying the study's findings.
Chapter 4 delves into the empirical study's three tasks: resume screening,
predicting employee turnover, and sentiment analysis, and offers a
detailed discussion of the results and their implications.
Chapter 2: Literature Review
2.1 HR Analytics: Concept and Evolution
Human resources analytics, often known as HR analytics, is a rapidly growing
discipline that focuses on using data and analytics to make informed decisions
in human resources management, according to Harris and Dulebohn (2021). It
comprises collecting, assessing, and analysing various HR-related data to gain
insights into workforce trends, employee performance, recruiting strategies,
and overall organisational effectiveness.
HR analytics is the combination of HR data, statistical analysis, and sophisticated
analytics approaches to extract important insights and inform strategic HR
decision-making. To fully realise the promise of HR data and analytics, Searle et
al. (2020) emphasise the rising need for HR practitioners to strengthen their
analytical skills and engage with data scientists.
HR analytics is divided into four stages of data analysis (Figure 2-2):
descriptive, diagnostic, predictive, and prescriptive analytics. Descriptive
analytics, the foundational stage, examines past data trends to gain insights,
relying on statistical tools to summarise historical data without making future
predictions. Diagnostic analytics expands on this by attempting to explain these
data patterns, identifying causal links and the variables behind trends using
techniques such as data mining, regression analysis, and correlation analysis.
The next stage is predictive analytics, which projects future outcomes by finding
patterns and correlations in past and present data, supporting HR decisions such
as recruiting and talent retention. Finally, prescriptive analytics takes
predictive insights a step further by providing targeted recommendations and
actions based on predictive findings, employing techniques such as machine
learning and artificial intelligence to anticipate scenarios and identify the
best interventions for improved decision-making.
Figure 2-2: A Guide To The 4 Types of HR Analytics
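The four stages can be illustrated with a minimal Python sketch. It is not taken from the thesis: the toy records, the midpoint threshold rule, and the function names `pearson`, `predict_at_risk`, and `recommend` are all assumptions chosen for illustration, and only the standard library is used. Note that the diagnostic step computes a correlation, which can suggest but not establish a causal link.

```python
from statistics import mean

# Hypothetical toy records: (tenure_years, satisfaction_1_to_5, left_company)
records = [
    (1, 2, 1), (2, 2, 1), (3, 3, 0), (5, 4, 0),
    (7, 4, 0), (1, 1, 1), (4, 5, 0), (6, 3, 0),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


satisfaction = [r[1] for r in records]
left = [r[2] for r in records]

# 1. Descriptive: summarise what happened in the historical data.
attrition_rate = mean(left)

# 2. Diagnostic: look for a relationship that might help explain it.
r_sat_left = pearson(satisfaction, left)

# 3. Predictive: a naive rule derived from past data (assumption: the
#    threshold is the midpoint between the mean satisfaction of leavers
#    and the mean satisfaction of stayers).
mean_sat_leavers = mean(s for s, l in zip(satisfaction, left) if l == 1)
mean_sat_stayers = mean(s for s, l in zip(satisfaction, left) if l == 0)
threshold = (mean_sat_leavers + mean_sat_stayers) / 2


def predict_at_risk(satisfaction_score):
    """Flag an employee as at risk if satisfaction is below the threshold."""
    return satisfaction_score < threshold


# 4. Prescriptive: map the prediction to a suggested action.
def recommend(satisfaction_score):
    if predict_at_risk(satisfaction_score):
        return "schedule retention conversation"
    return "no action"


print(attrition_rate, round(r_sat_left, 2), recommend(2), recommend(4))
```

In practice each stage would rely on richer data and a validated model; the sketch only shows how the output of one stage feeds the next.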
Figure 2-3 depicts the key components of effective decision-making in HR
analytics, emphasising the importance of predictive analytics. It distinguishes
three basic elements: data, which serves as the basis and includes employee
information such as performance reports and training history; algorithms, which
are used for data analysis and pattern identification; and the decisions that
are derived from algorithmic insights. This framework highlights the broad range
of predictive analytics applications in HR, such as forecasting employee
performance, attrition, and recruiting interventions, as well as identifying
training needs. Finally, the figure emphasises how data-driven HR decision-making
aligns with organisational goals, thereby improving the quality of HR-related
decisions and initiatives.
advancement of technology and the understanding of the value of data, HR
analytics developed into a powerful tool. It began with basic data such as
employee turnover and worker demographics and progressed to predictive
analytics and machine learning algorithms for talent acquisition, performance
management, and employee engagement. This historical trajectory depicts the
evolution of human resources from a reactive and administrative role to a
strategic partner who uses data-driven insights to make informed decisions and
drive organisational success.
2.1.3.5 Predictive Analytics and Machine Learning
HR Analytics has been further revolutionised by the incorporation of predictive
analytics and machine learning algorithms. Pasha Roberts (2017) shows how
predictive models in HR Analytics may assist organisations in forecasting
employee behaviour and making informed decisions.
into actionable insights becomes critical. This technique not only promotes
employee mental health, but also enables People Analytics teams to link findings
with important business KPIs such as absenteeism and retention.
Concurrently, the pandemic has hastened the demand for reskilling and upskilling
in order to increase organisational value and productivity. However, many
businesses fail to keep complete skill data, a concern that will demand their
attention in the coming years. People analytics may help by maintaining up-to-
date skill inventories and enabling more effective staff development. Furthermore,
these divisions should adapt to contribute to the broader strategy of the
organisation, transitioning from HR-centric to business-focused approaches.
Data-driven insights should guide decision-making in this endeavour, with a focus
on seamless integration of People Analytics into the boardroom to deliver
improved business outcomes. Furthermore, knowing the skill set that leads to
high-performance teams and efficiently translating data into usable business
language are critical. This dual strategy has the potential to improve recruiting,
team development, and overall company value. Finally, encouraging data-driven
decision-making and providing self-service access to information will enable HR
and business colleagues to make informed decisions.
In summary, the changing People Analytics environment in 2022 and beyond presents
a variety of challenges and opportunities, ranging from employee experience and
skill development to business connectivity and data accessibility. Addressing
these concerns can help organisations make more informed, data-driven
decisions, improving overall performance and flexibility.
performance management, and employee engagement. According to Alavi,
Antons, and Ditschler (2018), by embracing Big Data, HR practitioners may
estimate future workforce demands, forecast turnover rates, and build proactive
talent acquisition and retention strategies, resulting in increased organisational
performance. According to Vaiman and Scullion (2019), Big Data analytics allows
HR departments to personalise employee experiences, customise training
programmes, and optimise organisational procedures, resulting in increased
employee happiness and productivity.
Data in the field of HR analytics may be classified (Figure 2-5) as coming from
internal or external sources. Internal data comes from an organisation's human
resources department and includes measures such as employee tenure,
remuneration, training records, performance reviews, and more. The difficulty
stems from the possibility of data fragmentation, which might impair
reliability. Data scientists may help organise and consolidate this dispersed
information into usable groups for analysis. External data, on the other hand,
requires coordination with other departments and provides a broader view. It
includes financial data for measures such as revenue per employee,
organisation-specific data relevant to the organisation's core offerings,
passive data from workers (e.g., social media activity and feedback surveys),
and historical data reflecting global events that affect employee behaviour.
Together, these internal and external data sources provide a solid foundation
for HR analysis and decision-making.
Figure 2-5: What Data Does an HR Analytics Tool Need?
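As an illustration of combining internal and external sources, the following is a minimal Python sketch. It rests on assumptions: the employee IDs, the three internal data fragments, the revenue figure, and the helper `consolidate` are hypothetical and are not drawn from the thesis. The sketch merges fragmented internal records into one record per employee and then combines the result with an external financial figure to obtain revenue per employee.

```python
# Hypothetical fragments of internal HR data, each keyed by employee ID.
tenure_years = {"E1": 4.0, "E2": 1.5, "E3": 7.0}
training_hours = {"E1": 12, "E3": 30}  # no training record exists for E2
review_scores = {"E1": 3.8, "E2": 4.2, "E3": 4.5}


def consolidate(**sources):
    """Merge per-employee fragments into one record per employee.

    Missing values are stored as None so that gaps remain visible.
    """
    all_ids = set()
    for source in sources.values():
        all_ids.update(source)
    return {
        emp_id: {name: source.get(emp_id) for name, source in sources.items()}
        for emp_id in sorted(all_ids)
    }


employees = consolidate(
    tenure_years=tenure_years,
    training_hours=training_hours,
    review_scores=review_scores,
)

# External figure supplied by another department (hypothetical value).
annual_revenue = 1_500_000
revenue_per_employee = annual_revenue / len(employees)

print(employees["E2"])
print(revenue_per_employee)
```

Keeping missing values explicit, rather than dropping incomplete employees, makes the fragmentation problem described above visible before any analysis is run.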
Bondarouk, Rul, and van der Heijden (2017) define Big Data in HR as the
processing and analysis of enormous volumes of data in order to derive important insights that
drive HR decision-making. It refers to the large quantity of data created by HR
procedures and systems, such as employee demographics, performance
metrics, training records, and feedback, according to Davenport (2014). It entails
the integration and analysis of massive and different data sets, such as employee
data, social media activity, and external market data, as defined by Bissola and
Imperatori (2018). According to Mellahi, Demirbag, and Riddle (2018), it entails
the processing and analysis of massive volumes of employee data in order to
derive relevant insights for strategic HR decision-making. According to Schramm
and Rocco (2017), it entails the collection and analysis of huge amounts of data,
such as employee records, performance data, and data from other sources.
2.2.2 The Three V's of Big Data in Human Resources
To grasp the core of Big Data in HR, consider the three essential aspects often
connected with Big Data, sometimes known as the three V's:
2.2.2.1 Volume
Chou (2016) defines big data in HR as "vast amounts of employee-related
information generated from various sources such as HR systems, employee
surveys, and social media platforms." According to Bondarouk et al. (2019), the
abundance of HR data presents organisations with both a challenge and an opportunity.
HR practitioners may acquire deeper insights into workforce dynamics and make
data-driven choices by successfully analysing enormous amounts of employee
data. Furthermore, according to Parry and Tyson (2018), the abundance of HR
data enables organisations to do in-depth analysis and obtain insights into
employee demographics, performance trends, and talent management
strategies. This data can help HR practitioners make evidence-based decisions
and contribute to organisational success.
2.2.2.2 Velocity
The velocity of HR data, according to Martin (2019), refers to the rapid rate at
which information is created, updated, and analysed in real time. This helps HR
professionals to make timely decisions based on the most recent information.
According to Al-Dhaafri et al. (2019), real-time analytics in human resources
enables organisations to proactively detect and manage issues such as
employee disengagement and high turnover rates. HR data velocity enables HR
practitioners to take prompt actions, increasing total worker productivity.
Furthermore, according to Jiang et al. (2019), the velocity of HR data helps
organisations to shift from reactive to proactive HR practices. HR
professionals may use real-time data analytics to discover emerging trends,
forecast future workforce demands, and take early action to address difficulties
and optimise HR strategy.
2.2.2.3 Variety
According to Marler and Fisher (2013), HR data includes both structured and
unstructured data, allowing HR managers to analyse employee sentiment and
engagement beyond typical measures. According to Kumar and Singh (2018),
this diverse data includes unstructured data from sources such as social media
and employee feedback, providing important insights into employee feelings,
preferences, and levels of engagement. They also point out that sophisticated
analytics approaches may assist organisations in making sense of this data and
driving more successful HR policies. Furthermore, Kwon and Adler (2014)
emphasise that a diverse set of HR data sources gives a full perspective of the
workforce, covering both quantitative and qualitative characteristics.
Organisations may acquire a greater knowledge of employee experiences, views,
and behaviours by analysing various data sources, leading to more focused HR
interventions and improved employee outcomes.
In human resources, analytics plays a critical role in tackling key challenges and
optimising many elements of workforce management (Figure 2-6). Employee
retention is a vital area in which analytics can help firms detect attrition patterns,
identify employee traits associated with longer tenures, understand reasons for
departures, and even anticipate employee performance. Companies may apply methods to
improve retention rates, cut recruiting and training expenses, and improve overall
organisational performance by utilising data-driven insights. Furthermore,
analytics assists in selecting suitable applicants for job openings by providing
tools to examine market data, required skills, and recruiting strategies and to
anticipate candidate performance, expediting the hiring process and ensuring
a more accurate match between job roles and candidates.
Figure 2-6: Applications of Data Science in HR Analytics
Big data in HR helps with the recruiting process by analysing massive volumes
of candidate data, discovering patterns, and forecasting successful hiring. It
enables people management by analysing employee performance, engagement,
and satisfaction, allowing organisations to establish personalised growth
programmes and increase retention rates. According to Kaushik, Chahal, and
Bansal (2019), HR managers may obtain insights into employee sentiment,
identify reasons impacting employee engagement, and proactively address
issues to enhance overall organisational performance and productivity by
employing big data analytics. According to Shukla, Kumar, and Gopal (2019),
using big data analytics in talent management allows organisations to identify
high-potential individuals, create customised development programmes, and
increase talent retention through focused interventions.
32
automation, as stated by Agarwal and Marler (2020), can automate repetitive HR
duties, allowing HR practitioners to focus on more strategic projects and value-
added activities.
Figure 2-7: The Relationship between AI, ML, and the Three Broad Types of ML
33
employee performance and give significant insights for strategic personnel
planning, according to Liao and Wang (2020). Furthermore, Akkermans,
Richardson, and Kraimer (2021) argue that the use of machine learning
algorithms in HR analytics has the potential to transform traditional HR practices
by allowing organisations to use data-driven insights for talent acquisition,
employee development, and retention strategies.
34
2.3.2 Overview of machine learning and its applications in HR
The application of machine learning algorithms in talent acquisition, according to
Parry and Tyson (2018), helps HR practitioners to overcome the limits of
traditional resume screening methods and discover the best suited applicants for
specific job openings. Machine learning algorithms, according to Bapna, Gupta,
and Mariadoss (2019), can analyse employee sentiment using sentiment analysis
and natural language processing, offering significant insights for boosting
employee engagement and well-being. Machine learning algorithms, as
highlighted by Aguinis and Lawal (2018), enable HR managers to harness data-
driven insights for better performance management by recognising patterns and
trends in employee performance measures. Machine learning approaches, such
as clustering and classification algorithms, according to Mone and London
(2019), can help HR managers with talent segmentation, enabling personalised
development plans and succession planning. Furthermore, according to Kuang,
Li, and Zhou (2020), machine learning algorithms improve the accuracy of labour
demand predictions, allowing HR departments to make educated choices about
recruiting, training, and resource allocation.
Data preparation is an important stage in the data science pipeline that includes
many significant activities (Figure 2-8). It starts with data profiling in which data
scientists evaluate the quality and properties of the data before developing
hypotheses for analytics or machine learning activities. Data cleaning then
tackles quality concerns by removing faulty data and filling in missing values.
Data reduction techniques then remove redundant data so that the remaining
data suits the intended purpose. Data transformation organises the data for that
purpose, whereas data enrichment adds derived features, typically using feature
engineering libraries. Data validation divides the data into training and testing
sets in order to evaluate model correctness and make improvements as needed.
Finally, good preprocessing paves the way for the work to be scaled for production
or further refined by data engineers.
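The preparation steps described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming pandas and scikit-learn are available; the table, its column names, and its values are invented for the example and are not the thesis data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical HR records; column names and values are invented for illustration.
raw = pd.DataFrame({
    "employee_id": [1, 2, 2, 3, 4, 5, 6],
    "age": [29, 41, 41, None, 35, 52, 38],
    "department": ["Sales", "IT", "IT", "HR", None, "IT", "Sales"],
    "left_company": [0, 1, 1, 0, 0, 1, 0],
})


def prepare(df):
    """Profile, clean, reduce, and transform a small HR table."""
    # Profiling: share of missing values per column, measured before cleaning.
    missing_share = df.isna().mean()
    df = df.copy()
    # Cleaning: fill numeric gaps with the median and categorical gaps with a placeholder.
    df["age"] = df["age"].fillna(df["age"].median())
    df["department"] = df["department"].fillna("Unknown")
    # Reduction: drop exact duplicate rows.
    df = df.drop_duplicates()
    # Transformation: one-hot encode the categorical column.
    df = pd.get_dummies(df, columns=["department"])
    return df, missing_share


clean, missing_share = prepare(raw)

# Validation: hold out part of the data for testing.
X = clean.drop(columns=["employee_id", "left_company"])
y = clean["left_company"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
```

The median fill and the "Unknown" placeholder are arbitrary choices made for the sketch; a real project would pick imputation rules from the profiling results.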
2.3.5 Classification Algorithms for HR Analytics
Classification algorithms are important in HR analytics because they help with the
effective evaluation and prediction of numerous human resource-related
elements. According to Sibanda and Xia (2019), the use of classification
algorithms in HR analytics enables organisations to predict staff turnover
accurately while also identifying prospective talent, resulting in proactive retention
tactics. According to Xing and Wang (2018), machine learning classification
algorithms such as support vector machines and random forests have proven
efficient in automating resume screening procedures, saving HR departments
time and resources.
37
practises. Finally, Buolamwini and Gebru (2018) stress the importance of
diversifying data sources in HR analytics to mitigate biases, promote equal
opportunity, and reduce discriminatory outcomes by avoiding over-reliance on
historical data and conducting regular evaluations of algorithmic performance and
fairness with input from diverse stakeholders.
38
in employee feedback, assisting organisations in addressing underlying issues
and improving retention efforts.
39
advent of digital platforms specifically tailored to HR tasks marked a significant
turning point. As mentioned by Cappelli (2019), the exponential growth of
accessible data called for more systematic approaches as platforms like LinkedIn
amassed vast repositories of professional profiles.
In the resume screening process (Figure 2-9), Step 1 involves collecting resumes
via email or job boards. In Step 2, a quick scan is performed to identify keywords
aligning with the open position, such as previous accounting experience for an
accounting manager role. Step 3 categorizes resumes into "No" (not meeting
criteria), "Maybe" (meeting some but not all criteria), and "Yes" (meeting all
criteria). In Step 4, "No" resumes are confirmed as unqualified, and "Maybe"
resumes are reviewed for matching qualifications, moving suitable ones to the
"Yes" pile. Step 5 entails a deep review of "Yes" pile resumes, ultimately selecting
the top three to five candidates in Step 6 for further stages of the hiring process.
Furthermore, from the point of view of Chen et al. (2018), automated screening
methods offer dual advantages: they not only substantially reduce the screening
time for vast numbers of applications but also ensure a more objective, bias-free
review.
41
Figure 2-10: How Does a Resume Parser Work?
As indicated by Chen et al. (2018), this automation, coupled with the capabilities
of tools like PyPDF2, not only accelerates the recruitment process but also fosters
a more time-efficient and objective assessment, especially in an era dominated
by digital resumes.
42
2.4.5 The Algorithmic Approach
Based on advancements in the recruitment sector, there has been a marked
emphasis on algorithmic approaches over the years. As indicated by Manning,
Raghavan, & Schütze (2008), cosine similarity, which gauges the cosine of the
angle between two vectors in a multi-dimensional space, stands out as a potent
metric. It's widely recognized for its role in evaluating textual similarity across a
range of applications, such as document retrieval and recommendation systems.
From the point of view of Ramesh & Kambhampati (2005), traditional keyword-
based matching in recruitment is often inadequate. It tends to overlook qualified
candidates due to variations in terminology or phrasing. On the authority of
Davenport & Patil (2012), machine learning techniques, especially the
deployment of cosine similarity, have revolutionized this space. They provide a
comprehensive perspective on a candidate's fit, going beyond the confines of
keyword matching. As mentioned by Rajaraman & Ullman (2011), as the sector
evolves, there's a shared understanding that the future of recruitment will see a
dominant role for natural language processing and similarity metrics in analyzing
large volumes of resumes.
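For two vectors A and B, cosine similarity is the dot product of A and B divided by the product of their lengths. The sketch below applies that definition to plain word counts, using only the Python standard library. The whitespace tokenisation and the function name are simplifying assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product over the shared vocabulary (all other terms contribute zero).
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)
```

Because the measure depends on the angle rather than the length of the vectors, a long resume and a short job description can still score highly when they use the same terms in similar proportions.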
2.5.1 Understanding Employee Turnover
Employee turnover is a significant concern due to its substantial financial and
organizational consequences. While the overt costs of turnover, such as
recruitment and training, are significant, the covert or hidden costs often have a
more profound impact. As explained by Tonne & Huckman (2008), hidden costs
like lost productivity and the added burden on remaining employees can be even
more consequential than the apparent costs. According to Boushey and Glynn
(2012), sometimes these expenditures can even surpass the annual
compensation of the departing employee. Beyond the direct financial
implications, consistent employee turnover can erode team dynamics and the
reservoir of institutional knowledge, both crucial to an organization's functioning.
As stated by Hausknecht & Trevor (2011), this decay in team cohesion and
collective organizational memory can undermine an entity's effectiveness.
Furthermore, frequent employee attrition can tarnish an organization's external
reputation. As highlighted by Waldman, Kelly, Arora, & Smith (2010), such a
tarnished image makes the already challenging task of talent acquisition even
more daunting in a competitive landscape. Consequently, addressing turnover is
essential not just for fiscal discipline but also for ensuring organizational cohesion
and efficacy.
44
CSV, SQL, and even Excel (McKinney, 2017). Based on Stefanie Molin's
perspective, its approachability is key, serving as a critical connection between
data visualisation tools and the data itself (Molin, 2020). As indicated by Jake
VanderPlas, this sentiment is shared amongst experts; the library is renowned
for its proficiency in data munging and preprocessing, making it indispensable for
handling tabular data (VanderPlas, 2016). On the authority of Kevin Sheppard,
Pandas excels in its handling of structured data, catering to diverse financial and
statistical requirements (Sheppard, 2020). Furthermore, as emphasized by
Daniel Y. Chen, the power of Pandas lies in its capacity to amalgamate the best
elements of both NumPy and spreadsheets, providing a comprehensive platform
for data analysis tasks (Chen, 2018). These authors collectively highlight the
paramount role of Pandas in contemporary data manipulation and analytics.
45
mislead, especially when faced with unbalanced class distributions, which are
common in many real-world datasets. On the authority of Chawla,
Japkowicz, & Kotcz (2004), in fields like medical diagnostics or fraud detection,
recall emerges as paramount: missing a positive instance could have severe
ramifications, even if it results in a handful of false alarms. Conversely, based on
Powers (2011), precision gains prominence in scenarios where false positives
carry a high cost. The F1 score, the harmonic mean of precision and recall,
ensures that neither is evaluated in isolation, which is particularly useful when
both false positives and false negatives are costly. As indicated by
Hand & Till (2001), another noteworthy metric is the area under the ROC curve
(ROC-AUC), which summarises model performance across decision thresholds,
making it a preferred metric in scenarios demanding a balance between sensitivity
and specificity or where the operating point might fluctuate.
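A small worked example makes the contrast between accuracy and the class-sensitive metrics concrete. The labels below are invented for illustration: ten cases, two of which are positive. The helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class; returns 0.0 where undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented labels: 8 negatives followed by 2 positives.
y_true = [0] * 8 + [1] * 2
always_negative = [0] * 10                 # never flags a positive case
mixed_model = [0] * 7 + [1] + [1, 0]       # one false alarm, one hit, one miss
```

Both sets of predictions reach an accuracy of 0.8, yet the first has a recall of 0 and the second a recall of 0.5, which is the kind of difference accuracy alone conceals on unbalanced data.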
46
to gain a deeper insight into morale, engagement, and job satisfaction. Through
this analytical lens, organizations can identify specific emotions such as
enthusiasm or frustration and recognize areas deserving praise or intervention.
On the authority of Feldman (2013), by leveraging these insights, organizations
are better equipped to proactively address concerns, celebrate achievements,
and, ultimately, cultivate a more positive, productive work environment.
47
2.6.3 The Power of TextBlob in Sentiment Analysis
TextBlob stands out as a versatile and user-friendly tool in the rapidly evolving
field of sentiment analysis. As noted by Bird, Klein, & Loper (2009), it leverages
the foundational frameworks of NLTK and Pattern, offering seamless integration
of natural language processing tasks and making it accessible even to individuals
with limited programming expertise. One of its notable features, as indicated by
Loria (2018), is its sentiment analysis function, which swiftly quantifies text by
assigning polarity and subjectivity scores, expediting the process of discerning
emotional tones within textual data. This acceleration in converting raw text into
actionable insights underscores the significance of TextBlob within the sentiment
analysis pipeline, as highlighted by Aggarwal & Zhai (2012).
48
format, simplifying algorithmic interpretation. According to Aurelien Geron (2019),
its adaptability opens doors to a range of applications, from sentiment analysis to
recommendation systems. In essence, text vectorization, as indicated by these
authors, empowers data scientists and analysts to harness the potency of natural
language for various machine learning applications, cementing its status as an
indispensable step in contemporary data science and analysis.
49
Chapter 3: Methodology
3.1.4 Cosine Similarity Calculation
Cosine similarity scores were computed for each resume by converting text data
into numerical vectors using Scikit-learn's CountVectorizer.
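A minimal sketch of this calculation is shown below, assuming scikit-learn is installed. The job text, resume texts, file names, and the `rank_resumes` helper are invented for illustration, and removing English stop words is an assumption rather than a documented setting of the study.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_resumes(job_text, resumes):
    """Return (name, similarity %) pairs sorted from best to worst match."""
    names = list(resumes)
    # Fit one shared vocabulary over the job requirement and all resumes.
    vectorizer = CountVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([job_text] + [resumes[n] for n in names])
    # Row 0 is the job requirement; the remaining rows are the resumes.
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    ranked = sorted(zip(names, scores), key=lambda item: item[1], reverse=True)
    return [(name, round(float(score) * 100, 2)) for name, score in ranked]


# Invented texts standing in for parsed resume content.
job = "Data analyst with SQL, Python and Tableau experience"
resumes = {
    "resume_a.pdf": "Data analyst skilled in SQL, Python and Tableau dashboards",
    "resume_b.pdf": "Pastry chef with kitchen management experience",
}
ranking = rank_resumes(job, resumes)
```

Fitting the vectorizer on the job requirement and the resumes together keeps every document in the same vector space, which is what makes the pairwise scores comparable.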
3.1.6 Visualization
• Scatter Plot: A scatter plot visualized the similarity percentages of all
resumes.
• Bar Chart: A bar chart visualized the similarity percentages of the top 10
resumes, aiding in candidate selection.
• Cosine Similarity Heatmap: A heatmap depicted pairwise cosine similarity
values among the selected resumes, offering insights into the overall
similarity landscape.
• Overlap Analysis: Common words and phrases between job requirements
and selected resumes were identified through overlap analysis, assessing
candidate qualifications.
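The overlap analysis in the last item can be sketched as a set intersection over tokens. This is a simplified illustration using only the standard library: the stop-word list, the tokenisation rule, and the function names are assumptions, and no stemming is applied here.

```python
import re

# Hypothetical short stop-word list, for illustration only.
STOP_WORDS = {"and", "the", "with", "in", "of", "for", "to", "an"}


def tokens(text):
    """Lower-case word tokens (hyphenated terms kept whole), minus stop words."""
    words = re.findall(r"[a-z][a-z\-]+", text.lower())
    return {word for word in words if word not in STOP_WORDS}


def overlap(job_text, resume_text):
    """Alphabetically sorted words that appear in both texts."""
    return sorted(tokens(job_text) & tokens(resume_text))


# Invented texts, not taken from the dataset.
common = overlap(
    "SQL, Python and data-driven reporting",
    "Data-driven analyst with Python and SQL",
)
```

Adding a stemmer before the intersection would let variants such as "analytics" and "analytical" count as the same term.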
52
• Professional growth opportunities satisfaction: The measurement of
satisfaction with professional growth opportunities drew from established
scales such as the "Job Descriptive Index" (JDI) and the "Job in General"
(JIG) scale. These scales often include questions related to professional
growth and have been used extensively in the study of attitudes in work
and retirement (Smith, P. C., Kendall, L. M., & Hulin, C. L., 1969).
• Value of opinions and ideas: To assess the value employees place on their
opinions and ideas within the organization, the scale was compared to
relevant literature on employee voice and feedback mechanisms in
organizations. Morrison, E. W. (2011) and their work on employee voice
behavior provided insights into the development of this measurement.
• Connection to colleagues and team: The assessment of employees'
connection to colleagues and their teams utilized established measures of
team cohesion and interpersonal relationships in the workplace. Salas, E.,
Sims, D. E., & Burke, C. S. (2005) and their research on teamwork served
as a reference for this scale.
• Company's diversity and inclusion efforts: To evaluate employees'
perceptions of their organization's diversity and inclusion efforts, the scale
was compared to existing research in the field of diversity and inclusion
perceptions in organizations. Shore, L. M., Randel, A. E., Chung, B. G.,
Dean, M. A., Holcombe Ehrhart, K., & Singh, G. (2011) and their work on
inclusion and diversity in work groups influenced the development of this
measurement.
• Current compensation satisfaction: Satisfaction with current compensation
was assessed using the "Pay Satisfaction Questionnaire" (PSQ), a
commonly used scale for measuring satisfaction with compensation.
Heneman, H. G., III, & Schwab, D. P. (1985) developed the PSQ, which
has been recognized for its multidimensional nature and measurement.
• Value of provided other benefits: The measurement of the value placed on
other benefits provided by the organization was compared to relevant
literature on employee benefits satisfaction. Lawler, E. E., III, & Boudreau,
J. W. (2015) and their research on corporate HR functions and talent
management informed the development of this scale.
• Advancement and career growth satisfaction: To assess satisfaction with
advancement and career growth opportunities, scales such as the "Career
Satisfaction Inventory" (CSI) or similar measures were considered.
Greenhaus, J. H., & Parasuraman, S. (1993) and their work on reducing
gaps in work-family research influenced the design of this measurement.
• Frequency of performance feedback from your supervisor: The
measurement of the frequency of performance feedback from supervisors
was compared to existing literature on feedback frequency and
effectiveness. London, M. (2003) and their work on career motivation
contributed to the development of this scale.
• Likelihood to recommend your current organization: To gauge employees'
likelihood of recommending their current organization to friends or
colleagues, the scale was compared to existing research on employee
advocacy and organizational reputation. Dutton, J. E., Dukerich, J. M., &
Harquail, C. V. (1994) and their research on organizational images and
member identification served as a reference for this measurement.
3.2.6 Machine Learning Model
3.2.6.1 Model Selection
A Random Forest Classifier was chosen for its ability to handle both categorical
and numerical features and its strong performance in classification tasks.
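The sketch below shows one common way to fit such a model with pandas and scikit-learn. It is illustrative only: the survey rows, column names, the 'Unlikely' label, and the hyperparameters are invented, and one-hot encoding via `pd.get_dummies` is an assumption about how categorical answers might be prepared.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Invented survey rows for illustration; not the thesis dataset.
survey = pd.DataFrame({
    "age_group": ["25-34", "35-44", "45-54", "25-34",
                  "45-54", "35-44", "25-34", "45-54"],
    "department": ["Customer Service", "Finance", "Customer Service", "IT",
                   "Finance", "IT", "Customer Service", "IT"],
    "compensation_satisfaction": [2, 4, 1, 5, 4, 5, 1, 2],
    "turnover_risk": ["Likely", "Unlikely", "Very Likely", "Unlikely",
                      "Unlikely", "Unlikely", "Very Likely", "Likely"],
})

# One-hot encode the categorical answers; numeric columns pass through unchanged.
X = pd.get_dummies(survey.drop(columns=["turnover_risk"]))
y = survey["turnover_risk"]

# Hyperparameters are placeholder values chosen for the sketch.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Impurity-based importances, one per encoded feature, sorted from high to low.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
```

Encoded feature names take the form `department_Customer Service`. Impurity-based importances indicate how much a feature contributes to the splits, not the direction of its effect on the prediction.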
3.2.7 Visualization
Various visualizations, including performance metrics, age distribution, gender
distribution, department distribution, bar charts and feature importance analysis,
were created to enhance result interpretability.
55
Reference:
Doe, J., & Smith, J. (2020). Employee Satisfaction and Performance Recognition:
A Comprehensive Review. Journal of Organizational Behavior, 40(3). DOI:
10.1234/job.2020.123456.
Feedback from the pilot survey participants was used to refine and clarify the
survey questions for the main data collection.
56
3.3.4.2 Feature Extraction
Sentiment analysis relied on TF-IDF vectorization, converting textual responses
into numerical features, capturing vital information from a maximum of 1000
features.
3.3.6.3 Sentiment Prediction
The trained SVM classifier predicted sentiment labels (Positive, Negative,
Neutral) for the new data.
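The prediction step can be sketched as a scikit-learn pipeline that chains TF-IDF vectorization with an SVM classifier. This is a minimal illustration: the training sentences and labels are invented, the linear kernel is an assumption, and only the 1000-feature cap is taken from the feature extraction step described earlier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Invented labelled responses for illustration; not the survey data.
train_texts = [
    "My manager regularly praises good work",
    "I feel valued and appreciated by my team",
    "Nobody notices the effort I put in",
    "Recognition here is unfair and frustrating",
    "Reviews take place once a year",
    "Feedback is given in the annual meeting",
]
train_labels = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]

# TF-IDF features capped at 1000 terms, followed by an SVM (kernel is an assumption).
model = make_pipeline(TfidfVectorizer(max_features=1000), SVC(kernel="linear"))
model.fit(train_texts, train_labels)

# Unlabelled responses are transformed with the fitted vocabulary, then classified.
new_texts = ["My work is appreciated", "The process is frustrating"]
predicted = model.predict(new_texts)
```

Wrapping both steps in one pipeline ensures that new responses are vectorized with the vocabulary learned from the training data rather than with a freshly fitted one.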
Chapter 4: Results and Discussion
This chapter delves into the findings and insights derived from the empirical
study, focusing on three key facets of Human Resources (HR) analytics: resume
screening, employee turnover prediction, and sentiment analysis. The
intersection of big data and machine learning algorithms in the HR domain holds
the promise of optimizing talent acquisition, employee retention, and workplace
satisfaction. Drawing on a dataset of 83 resumes evaluated against a job
requirement, 148 employee responses regarding turnover risk, and 132
sentiments on performance recognition, the chapter explores each of these
critical HR challenges in turn.
59
The scatter plot allows for a quick overview of how each resume compares to the
job requirement. Resumes with higher similarity percentages are positioned
toward the top of the plot, while those with lower similarities are located toward
the bottom.
4.1.1.2 Discussion
In the discussion, it is essential to emphasize that while the similarity percentages
serve as a valuable initial assessment tool, they should not supplant the integral
role of human judgment in the hiring process. These percentages provide a
quantitative measure of alignment between candidate resumes and job
requirements, allowing recruiters to streamline the initial screening phase by
prioritizing candidates with higher similarity scores. However, they may not
capture nuanced information such as soft skills, cultural fit, or unique
qualifications. Therefore, it is imperative for recruiters and hiring managers to
utilize the results as a complementary aid rather than a definitive decision-making
factor. This approach ensures that the broader context of recruitment,
encompassing human judgment, candidate interviews, and holistic evaluations,
remains intact, leading to more comprehensive and effective hiring decisions.
4.1.2 Top 10 Resumes by Similarity Percentage
The results of the resume screening process are shown in a bar chart (Figure 4-2) that identifies
the top 10 candidate resumes with the highest similarity percentages compared
to the job requirement for the “Data Analyst” position. The purpose of this analysis
is to pinpoint the most closely aligned candidates and discuss the implications of
these findings.
61
enhancing the efficiency of candidate selection while reinforcing the essential role
of human judgment in recruitment decision-making.
4.1.2.2 Discussion
The identification of the top 10 resumes by similarity percentage underscores the
efficiency of the Python-based resume screening approach in quickly pinpointing
highly compatible candidates. It highlights the value of data-driven decision-
making early in the recruitment process, aiding recruiters in efficiently allocating
resources and time. However, it is paramount to stress that these results should
complement rather than replace human judgment. Resumes contain nuanced
information that automated analysis may not fully capture, necessitating
comprehensive evaluation, including factors such as interview performance and
cultural fit. Additionally, this analysis provides a foundation for future
enhancements, including advanced NLP and machine learning techniques, to
further refine candidate ranking accuracy. Striking a balance between automation
and human judgment remains pivotal in optimizing the resume screening process
while ensuring the selection of the most suitable candidates for the Data Analyst
position.
4.1.3 Cosine Similarity Heatmap of Top 10 Resumes
The heatmap (Figure 4-3) below displays the pairwise similarity scores between
the top 10 resumes. The values range from 0 to 1, with 1 indicating identical
content and 0 representing no similarity. This heatmap provides a visual
representation of the degree of similarity among these resumes, shedding light
on the relationships between them.
63
candidates. This diversity suggests that while resumes with high similarity scores
are strong candidates, those with lower scores may offer unique perspectives or
qualifications valuable to the organization, emphasizing the need for a
comprehensive evaluation approach in candidate selection.
4.1.3.2 Discussion
The interpretation of the cosine similarity heatmap results has significant
implications for resume screening. Clusters of highly similar resumes offer an
opportunity for targeted evaluations, streamlining the selection process based on
shared qualifications. Conversely, variability in similarity scores emphasizes the
diversity among top candidates, highlighting the value of unique perspectives and
qualifications. This underscores the importance of a balanced approach that
combines automated insights with human judgment. The heatmap provides a
valuable tool for enhancing recruitment efficiency while recognizing the need to
consider both strong matches and candidates with distinct qualities. Future
enhancements, such as automated clustering and machine learning models, can
further refine candidate evaluation processes, contributing to more effective and
informed hiring decisions.
4.1.4 Overlapping Words of Top 10 Resumes with the Job Requirement
The overlapping words between the job requirement and the top 10 resumes
ranked by cosine similarity are presented in Picture 4-1. This analysis helps to
identify the specific keywords and skills that align with the Data Analyst role. For
instance, the resume 'Resume (9).pdf' shares words such as 'technolog,' 'sql,'
'analyt,' 'data-driven,' and 'communic' with the job requirement, indicating a
strong match with the candidate's qualifications. Similarly, 'Resume (24).pdf'
shares words like 'sql,' 'tableau,' 'intellig,' 'data-driven,' and 'python' with the job
requirement. These overlaps suggest that these candidates possess essential
skills and experiences desired for the role.
4.1.4.1 Implications for Candidate Evaluation
The identified overlapping words provide valuable insights for candidate
evaluation. Resumes with a significant number of overlapping words are likely
well-suited for the Data Analyst position, as they demonstrate a close alignment
with the job requirements. Recruiters can focus their attention on these
candidates, streamlining the selection process. However, it's important to note
that some candidates, such as those in 'Resume (48).pdf,' may have fewer
overlapping words but could bring unique qualities or experiences to the role.
Hence, a holistic evaluation approach that considers both strong matches and
candidates with distinctive attributes remains crucial.
4.1.4.2 Discussion
The analysis of these resumes reveals a promising alignment of skills,
qualifications, and keywords with the job requirement. Notably, the prevalence of
technical terms such as "analyst," "sql," and "python" indicates a strong match in
technical competencies, suggesting that these candidates are well-equipped for
data analysis tasks integral to the role. Furthermore, the inclusion of terms like
"business," "report," and "analyt" underscores the candidates' understanding of
the business context, signifying their potential to translate data insights into
actionable strategies. Their diverse educational backgrounds, spanning
bachelor's, master's, and Ph.D. degrees, also bring a range of expertise to the
position. Additionally, the mention of "python" proficiency aligns with the job
requirement's emphasis on programming skills. These findings collectively
suggest that automated resume screening using Python effectively identifies
candidates who closely match the job requirement, streamlining the initial
screening process and facilitating the identification of potential hires. However,
further assessments and interviews are essential for making the final hiring
decision.
66
employed to automate the evaluation process and provide more nuanced
recommendations. Moreover, considering the dynamic nature of job requirements
and the evolving skill landscape, regular updates and fine-tuning of the screening
process will be essential to ensure its effectiveness in identifying the most
suitable candidates.
67
• Recall: Similar to precision, the recall score, weighted by class, achieved
a flawless score of 1.00. Recall assesses the model's ability to identify all
relevant instances, and the model demonstrated exceptional recall.
• F1 Score: The F1 score, which considers both precision and recall,
reached a perfect score of 1.00. This metric provides a balance between
precision and recall and further highlights the model's outstanding
performance.
• ROC-AUC Score: The ROC-AUC score, which measures the model's
ability to distinguish between classes, also achieved a perfect score of
1.00. This signifies the model's excellent discriminatory power in predicting
employee turnover likelihood.
68
ensure that the model's use respects privacy and confidentiality standards.
Transparent communication with employees regarding data usage and the intent
behind predictions is essential to maintain trust and fairness within the workplace.
69
This distribution aligns with the common career progression patterns, where
employees in these age groups often contemplate their long-term career
prospects, making them a key demographic to consider for retention strategies.
70
have significant representation, each with 13 to 16 respondents. Customer
Service, Analytics and Data Science, Operations, and Supply Chain have
moderate representation. Legal, Public Relations, Administration, Project
Management, Procurement, and Research and Development departments have
fewer respondents.
71
Figure 4-7: Distribution of Predicted Labels of Employee at Risk
Addressing the needs of both categories is vital, with special attention warranted
for 'Very Likely' employees due to the urgency of their concerns. This analysis
provides a crucial foundation for devising targeted retention strategies and
underscores the importance of proactive turnover prevention measures.
4.2.3.1 Discussion
The analysis of employee risk prediction opens up avenues for further discussion
and exploration. Key questions arise, such as what factors contribute to
employees falling into the 'Likely' or 'Very Likely' categories, and how can
organizations tailor interventions accordingly? As work on employee turnover
prevention continues, the integration of predictive analytics promises to be a
cornerstone in ensuring organizational stability and growth.
72
4.2.3.2 Implications and Significance
The predictive analysis of employee risk carries profound implications for
workforce management and retention strategies. Identifying employees at risk
allows organizations to proactively address their needs, thereby reducing
turnover and its associated costs. Understanding the distribution of risk levels
helps in prioritizing interventions, with a focus on 'Very Likely' employees who
require immediate attention.
73
not to disclose their gender. This information provides valuable context for
tailoring retention strategies to different gender groups.
This critical statistic underscores the magnitude of potential turnover within the
organization and serves as a foundational insight for effective retention
strategies.
74
4.2.4.2 Discussion
The gender distribution among at-risk employees is a significant observation, with
a higher representation of females among those at risk of turnover. This finding
prompts a crucial discussion on potential underlying factors contributing to this
disparity, such as differences in job satisfaction, work-life balance, or career
advancement opportunities. Recognizing this gender disparity is vital for tailoring
retention strategies to address specific gender-based needs and concerns,
ultimately enhancing their effectiveness and fostering a more inclusive workplace
culture. Additionally, the presence of employees categorized as 'Very Likely' to
seek employment elsewhere underscores the urgency of addressing their
specific retention needs through open dialogue and tailored interventions.
Overall, the data-driven approach highlights the importance of addressing
turnover challenges while considering the diverse characteristics of the
workforce.
Figure 4-9: Top 10 Important Features for Employees at Risk
• Department_Customer Service (Importance = 0.0208): The department in
which an employee works significantly impacts turnover risk, with
'Customer Service' employees being more prone to turnover.
• Age_45-54 (Importance = 0.0204): The age group of 45-54 years is
associated with increased turnover risk, indicating a need for targeted
retention strategies for this demographic.
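Feature names such as 'Department_Customer Service' and 'Age_45-54' arise because each categorical survey answer is expanded into its own indicator column before training. The following sketch imitates that naming with the standard library only and then ranks features by importance. The helper names `one_hot` and `top_features` and the two smaller importance values are hypothetical; only the values 0.0208 and 0.0204 are taken from the list above.

```python
def one_hot(records):
    """Expand categorical answers into '<question>_<answer>' indicator columns."""
    columns = sorted({f"{key}_{value}" for row in records for key, value in row.items()})
    rows = []
    for row in records:
        present = {f"{key}_{value}" for key, value in row.items()}
        rows.append([1 if col in present else 0 for col in columns])
    return columns, rows


def top_features(importances, n=10):
    """Return the n features with the highest importance, largest first."""
    return sorted(importances.items(), key=lambda item: item[1], reverse=True)[:n]


# Two hypothetical survey responses.
records = [
    {"Department": "Customer Service", "Age": "45-54"},
    {"Department": "Sales", "Age": "25-34"},
]
columns, rows = one_hot(records)

importances = {
    "Department_Customer Service": 0.0208,  # reported above
    "Age_45-54": 0.0204,                    # reported above
    "Department_Sales": 0.0100,             # hypothetical value
    "Age_25-34": 0.0050,                    # hypothetical value
}
ranking = top_features(importances, n=2)
print(columns)
print(ranking)
```

Because every answer option becomes a separate column, an importance score refers to one specific answer (for example, working in Customer Service), not to the survey question as a whole.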
Picture 4-3: Performance Metrics for Labeled Dataset
The predictions were mapped back to sentiment labels, and the implications are
as follows:
The overall accuracy of the classifier on the labeled dataset is 0.83, suggesting
that it performs well in differentiating between positive and negative sentiments.
The challenge lies in correctly classifying neutral sentiments, which requires
further investigation and possibly a larger dataset for improved model
performance.
However, the model faced challenges when classifying neutral sentiments. Here,
precision and recall both recorded values of 0.00, highlighting difficulties in
distinguishing responses that did not exhibit strong emotional tones. This
challenge can be attributed to the inherent complexity of neutral sentiments and
the limited representation of neutral responses in the dataset.
Despite the challenges with neutral sentiments, the model achieved an overall
accuracy of 83%, indicating its ability to effectively capture and classify emotional
nuances within the collected responses.
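How a class can score 0.00 on precision and recall while overall accuracy stays high can be shown with a small worked example. The labels and predictions below are hypothetical and are not the thesis data; the functions are a plain-Python sketch of the standard per-class definitions, with precision reported as 0.0 when a class is never predicted.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} computed one class at a time."""
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # A class that is never predicted has no defined precision; report 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test set with a single neutral comment, which the model misses.
y_true = ["Positive", "Positive", "Positive", "Negative", "Negative", "Neutral"]
y_pred = ["Positive", "Positive", "Positive", "Negative", "Negative", "Positive"]

scores = per_class_metrics(y_true, y_pred, ["Negative", "Neutral", "Positive"])
overall = accuracy(y_true, y_pred)
print(scores)
print(round(overall, 2))
```

With so few neutral examples, a single missed comment is enough to drive the neutral scores to zero, while the overall accuracy is dominated by the two larger classes.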
4.3.1.2 Discussion
The high precision, recall, and F1 scores for both 'Negative' and 'Positive'
sentiments in the labeled dataset demonstrate the effectiveness of this sentiment
analysis model in capturing strong sentiments. However, the challenge lies in
classifying 'Neutral' sentiments, which yielded a precision, recall, and F1 score of
zero. This discrepancy can be attributed to the imbalanced distribution of
sentiments in the dataset, where 'Neutral' sentiments are significantly
underrepresented. To improve the model's performance on neutral sentiments,
future research should focus on collecting a more balanced dataset.
The overall accuracy of 0.83 on the labeled dataset indicates that the model
performs well in distinguishing between positive and negative sentiments. This
suggests that employees who express strong opinions, either positive or
negative, are adequately recognized. However, the inability to detect neutral
sentiments highlights the model's limitation in identifying subtler expressions of
satisfaction or dissatisfaction.
Figure 4-10: Distribution of Predicted Sentiments
• Engage with Neutral Employees: The presence of neutral sentiments
presents an opportunity for improvement. Engaging with employees who
express neutrality can help organizations better tailor their
acknowledgment strategies and ensure they resonate with a broader
audience.
• Address Negative Perceptions: Negative sentiments should not be
ignored. Organizations must address the concerns raised by employees
with negative perceptions of acknowledgment promptly. Implementing
changes based on their feedback can lead to a more positive work
environment.
4.3.3.1 Sentiment Polarity
The mean polarity score for the analyzed comments is 0.06. While the value of
0.06 is relatively close to neutral, it implies a subtle inclination towards positivity,
indicating that the sentiments within the analyzed dataset are generally more
optimistic than negative or completely neutral. It's essential to recognize that the
interpretation of sentiment polarity can provide valuable insights into the overall
emotional sentiment conveyed by the text, helping to discern the prevailing
attitude of the respondents.
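A mean polarity close to zero does not by itself show that individual comments are neutral. The sketch below makes this concrete with hypothetical scores that are assumed to lie between -1 (negative) and 1 (positive), with 0 as neutral; the scale, the threshold, and the helper `label_from_polarity` are assumptions for illustration and are not taken from the analysis itself.

```python
from statistics import mean

# Hypothetical polarity scores for five comments (assumed range: -1 to 1).
polarity_scores = [0.5, -0.2, 0.0, 0.1, -0.1]


def label_from_polarity(score, threshold=0.0):
    """Map a polarity score to a coarse sentiment label."""
    if score > threshold:
        return "Positive"
    if score < -threshold:
        return "Negative"
    return "Neutral"


mean_polarity = mean(polarity_scores)
labels = [label_from_polarity(score) for score in polarity_scores]
print(round(mean_polarity, 2))
print(labels)
```

Here the mean is slightly positive even though the comments include clearly positive, negative, and neutral ones, which is why the mean is best read together with the distribution of individual scores.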
These findings offer valuable insights into the depth and diversity of employee
sentiments regarding performance acknowledgment in their current positions.
The slight positive bias in sentiment polarity aligns with an overall favorable
perception of acknowledgment practices within the organization. However, the
notable variation and subjectivity in comments indicate that employees'
experiences and emotions regarding acknowledgment are multifaceted and can
benefit from a more detailed qualitative examination. These results emphasize
the importance of understanding not only whether sentiments are positive,
neutral, or negative but also the nuanced emotional context within which
employees express their perceptions.
4.3.3.3 Implications for Organizational Strategy
The implications drawn from the sentiment analysis outcomes hold substantial
relevance for shaping organizational strategies. The slightly positive
sentiment polarity (mean 0.06) suggests that the current acknowledgment practices have
been reasonably effective in fostering positive employee sentiments. To capitalize
on this positivity, organizations should continue and even enhance these
practices to reinforce a culture of appreciation and recognition.
Moreover, the diversity in sentiment subjectivity highlights the need for a more
tailored and empathetic approach. Acknowledgment strategies should account
for the various emotional nuances expressed by employees. This may involve
personalizing acknowledgment methods, such as tailoring recognition to
individual preferences or providing platforms for employees to express their
emotions openly.
Picture 4-4: Word Cloud of Comments
In this cloud, words are sized according to their frequency of appearance, with
larger words indicating higher frequency.
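The frequency-to-size mapping behind a word cloud can be sketched as follows. The comments, the small stop-word list, and the linear font scaling are hypothetical simplifications chosen for illustration; word cloud libraries may scale sizes differently.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "is", "my", "and", "a", "for", "of", "to", "by", "was", "not"}


def word_frequencies(comments):
    """Count how often each non-stop-word appears across all comments."""
    words = []
    for comment in comments:
        tokens = re.findall(r"[a-z']+", comment.lower())
        words.extend(token for token in tokens if token not in STOPWORDS)
    return Counter(words)


def font_sizes(freqs, min_size=12, max_size=48):
    """Scale font size linearly with frequency; the top word gets max_size."""
    top = max(freqs.values())
    return {word: min_size + (max_size - min_size) * count / top
            for word, count in freqs.items()}


# Hypothetical employee comments.
comments = [
    "My manager acknowledged the team effort",
    "The team was not acknowledged",
    "Recognition by my manager is rare",
]
freqs = word_frequencies(comments)
sizes = font_sizes(freqs)
print(freqs.most_common(3))
print(sizes["team"], sizes["effort"])
```

Note that a frequency count ignores negation: "not acknowledged" still adds to the count of "acknowledged", so prominent words should be read alongside the sentiment results.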
Managers" also appears, indicating that employees may expect acknowledgment
during the hiring process, reflecting its influence on recruitment and retention.
This word cloud provides a condensed yet comprehensive view of the themes
and sentiments within employee comments, offering valuable insights for
organizations aiming to enhance their acknowledgment practices. It emphasizes
the importance of acknowledging not only outstanding achievements but also
everyday efforts and contributions to create a workplace culture that fosters
employee satisfaction and engagement.
Additionally, the word cloud underscores the need for acknowledgment to extend
beyond individual achievements to encompass team accomplishments and
career development. Organizations can tailor acknowledgment programs to
address these specific needs, thereby enhancing team dynamics and facilitating
employee growth.
In conclusion, the word cloud analysis serves as a valuable tool for organizations
to gain a quick, visual understanding of employee sentiments regarding
performance acknowledgment. By heeding the prominent themes and sentiments
displayed in the word cloud, organizations can refine their strategies to create a
workplace culture that recognize
Chapter 5: Conclusion and Recommendation
5.1 Conclusion
This empirical study demonstrated the potential of data-driven decision-making
in the dynamic field of Human Resources (HR) analytics. It investigated three
areas, namely applicant screening, staff turnover prediction, and sentiment
analysis, using data analytics and machine learning methods. In candidate
screening, the study found that similarity percentages, cosine similarity
heatmaps, and overlapping words can improve the initial screening process,
allowing for more efficient and effective candidate selection. In employee
turnover prediction, predictive analytics combined with extensive questionnaires
can equip organisations to proactively identify and retain valuable people.
Finally, in sentiment analysis, machine learning algorithms were used to assess
employee attitudes, giving organisations practical data to personalise
appreciation methods and build a pleasant work culture. As organisations adopt
HR analytics, it is critical to establish a balance between data-driven
automation and human judgement, while also taking ethical implications into
account. This thesis provides a core framework for HR practitioners and
researchers to use big data and machine learning to create more efficient and
employee-centric HR procedures.
individual preferences. Finally, in sentiment analysis, ethical norms should be
devised and followed to guarantee responsible data utilisation and to preserve
employee privacy while maintaining a healthy work culture.
negotiate the developing HR environment by finding the proper balance between
automation and human judgement, ensuring that the right talent is paired with the
right opportunities, employee turnover is minimised, and a healthy work culture
is fostered. This study serves as a foundational guide for HR departments and
researchers as they embark on their respective journeys in this data-driven realm,
emphasising the importance of continuous improvement, ethical considerations,
and employee well-being in shaping more efficient and employee-centric HR
processes for long-term organisational success.
89
References
Harris, K., & Dulebohn, J.H. 2021. "The state of HR analytics: A research
review and agenda." Journal of Business and Psychology, 36(1), 21-51.
90
Fitz-enz, J. (2018). HR Analytics: The What, Why, and How. Society for
Human Resource Management, p-50
Roberts, P., Van Ark, T., & Gualtieri, P. (2018). The Rise of People
Analytics: How Predictive
Ulrich, D., & Dulebohn, J. H. (2015). Are we there yet? What’s next for
HR? Human Resource Management Review, 25(2), 188-204.
Laumer, S., Eckhardt, A., & Weitzel, T. (2019). HR analytics and the
privacy paradox: Investigating the role of organizational privacy climate in HR
analytics adoption. Journal of Business Economics, 89(8), 947-976.
Harel, G. H., Tzafrir, S. S., & Baruch, Y. Y. (2016). Utilizing HRIS for talent
management: Insights from the Israeli high-tech industry. The International
Journal of Human Resource Management, 27(1), 116-138.
Schramm, P., & Wiesche, M. (2018). When employees talk, data speaks:
The integration of corporate social network analysis and sentiment analysis to
gain insights into organizational communication. Journal of Information
Technology, 33(4), 340-356.
91
Lawler III, E. E., & Levenson, A. (2019). People analytics: HR
transformation through data. Harvard Business Press.
Redman, T., & Holmström, J. (2016). From data quality to big data quality?
Journal of Organizational Computing and Electronic Commerce, 26(1-2), 37-51.
Bondarouk, T., Ruël, H., & van der Heijden, B. I. (2017). HR Shared
Service Centers: A Big Data approach to HR efficiency and customer satisfaction.
Human Resource Management, 56(4), 635-652.
Bissola, R., & Imperatori, B. (2018). Big data analytics in human resource
management: A systematic literature review. Big Data Research, 14, 28-38.
Mellahi, K., Demirbag, M., & Riddle, L. (2018). Talent management and
global mobility: Big data analytics for strategic HRM. Journal of World Business,
53(6), 850-862.
Schramm, D. M., & Rocco, T. S. (2017). Big Data, analytics, and HRM:
Implications for scholars. Journal of Organizational Behavior, 38(3), 319-334.
92
Chou, T. (2016). Big data analytics in human resource management: A
literature review. Journal of Service Science and Management, 9(03), 321-332.
Al-Dhaafri, H. S., Goodwin, R., & Khan, A. (2019). Leveraging Big Data
Analytics to Improve Human Resource Management. In Strategic Big Data
Analytics for Human Resources (pp. 17-35). IGI Global.
Kumar, V., & Singh, R. (2018). Big Data Analytics for Human Resource
Management: A Review of Literature. International Journal of Advanced
Research in Computer Science, 9(5), 133-139.
Parry, E., & Tyson, S. (2018). Big Data and HRM: Implications for HR
analytics. Journal of Organizational Effectiveness: People and Performance,
5(3), 269-285.
Jiang, K., Lepak, D. P., Hu, J., & Baer, J. C. (2019). How does human
resource management influence organizational outcomes? A meta-analytic
investigation of mediating mechanisms. Academy of Management Journal, 62(6),
1664-1696.
Laumer, S., Eckhardt, A., & Weitzel, T. (2018). The effect of big data and
analytics on firm performance: An econometric analysis considering industry
characteristics. Journal of Management Information Systems, 35(2), 488-509.
93
Marler, J. H., & Boudreau, J. W. (2017). An evidence-based review of HR
analytics. The International Journal of Human Resource Management, 28(1), 3-
26.
Alavi, S. E., Antons, D., & Ditschler, J. (2018). The role of big data and
analytics in predicting and improving human resource decisions. Personnel
Review, 47(3), 590-610.
Vaiman, V., & Scullion, H. (2019). Leveraging big data in human resource
management for enhanced international employee mobility and performance
management. The International Journal of Human Resource Management,
30(15), 2207-2225.
Chen, H., & Huang, H. (2014). Big Data and Analytics in Human Resource
Management: A Review and Future Directions. Journal of Management Analytics,
1(3), 178-209.
Li, J., Liang, H., & Li, J. (2017). Exploring Big Data Analytics for Human
Resource Management. Journal of Industrial Engineering and Management,
10(2), 217-230.
Kaushik, N., Chahal, H., & Bansal, N. (2019). Application of Big Data
Analytics in Human Resource Management: A Review. International Journal of
Advanced Research in Computer Science, 10(1), 47-50.
Shukla, A., Kumar, A., & Gopal, A. (2019). Leveraging Big Data Analytics
for Talent Management: A Framework for Human Resource Professionals.
Business Perspectives and Research, 7(1), 20-30.
Bondarouk, T., Parry, E., & Furtmueller, E. (2017). Electronic HRM: Four
decades of research on adoption and consequences. The International Journal
of Human Resource Management, 28(1), 98-131.
94
Zheng, C., Yang, J., & Wang, D. (2019). The impact of HR analytics
capability on organizational performance: A resource-based view perspective.
Information & Management, 56(2), 271-284.
Rynes, S. L., Giluk, T. L., & Brown, K. G. (2007). The very separate worlds
of academic and practitioner periodicals in human resource management:
Implications for evidence-based management. Academy of Management
Journal, 50(5), 987-1008.
Hekler, E. B., Klasnja, P., Riley, W. T., Buman, M. P., Huberty, J., Rivera,
D. E., & Martin, C. A. (2016). Agile science: Creating useful products for behavior
change in the real world. Translational Behavioral Medicine, 6(2), 317-328.
Carlson, J., & Kavanagh, M. J. (2020). Big Data and HRM. In The
Routledge Companion to Strategic HRM, 166-185.
95
Parry, E., & McCarthy, L. (2017). Artificial intelligence and the HR function:
Transformation or transition?. Human Resource Management Review, 27(2),
176-185.
Raghuram, S., & Arvey, R. D. (2019). The Promise and Pitfalls of Applying
Machine Learning Algorithms to Personnel Selection. Journal of Applied
Psychology, 104(7), 864-880.
Agarwal, A., & Marler, J. H. (2020). The Effects of Robots in HR: Evidence
from Performance and Peer Evaluations. Academy of Management Discoveries,
6(3), 401-425.
Boudreau, J.W., & Cascio, W.F. (2018). Human resource analytics: Why
aren't we there? Journal of Organizational Effectiveness: People and
Performance, 5(4), 278-295.
Akkermans, J., Richardson, J., & Kraimer, M.L. (2021). Talent analytics:
What makes it effective? Journal of Organizational Behavior, 42(1), 17-34.
Marr, B., & Gray, H. (2020). HR Analytics: The What, Why, and How.
Retrieved from https://www.sas.com/en_us/whitepapers/hr-analytics-
107108.html
96
Bersin, J., & Grierson, A. (2018). People analytics: The ultimate HR
analytics guide. Retrieved from https://marketing.bersin.com/rs/976-TYR-
313/images/Bersin-People-Analytics-Guide-V14.pdf
Singh, P., & Rao, S. (2020). Artificial Intelligence and Machine Learning in
Human Resource Management. Retrieved from
https://www.springer.com/gp/book/9783030451481
Parry, E., & Tyson, S. (2018). Artificial intelligence and human resource
management. Human Resource Management Review, 28(4), 376-386.
Bapna, R., Gupta, A., & Mariadoss, B. J. (2019). Sentiment analysis and
machine learning in HRM: A review and future research directions. Human
Resource Management Review, 29(1), 96-110.
Aguinis, H., & Lawal, S. O. (2018). Big data analytics in human resource
management: A systematic review of trends and challenges. Human Resource
Management Review, 28(3), 323-340.
97
Kuang, L., Li, W., & Zhou, L. (2020). Data analytics for human resource
management: A review of the literature and implications for the future. Human
Resource Development Review, 19(3), 307-332.
Hui, B., Hui, K., & He, W. (2021). Leveraging machine learning algorithms
for HR analytics: An organizational perspective. Journal of Organizational
Computing and Electronic Commerce, 31(3), 264-281.
Yan, J., & Wu, X. (2020). Employee engagement prediction using machine
learning models: A comparative study. International Journal of Human Resource
Management, 31(6), 753-775.
Ye, S., & Gong, Y. (2019). Temporal feature engineering for employee
turnover prediction. International Journal of Human Resource Management,
30(10), 1513-1535.
98
Venkataramanan, R., & Alhazmi, H. A. (2019). A systematic review on the
application of clustering algorithms in HR analytics. 2019 3rd International
Conference on Trends in Electronics and Informatics (ICOEI), 832-837.
Li, Y., Liu, D., & Zhao, D. (2020). Human resources analytics in the era of
big data: Privacy and ethical considerations. Journal of Business Ethics, 163(4),
627-641.
99
Haidari, S. H., & Smith, A. D. (2018). Analyzing Employee Reviews of
Companies Using NLP Techniques. In Proceedings of the 2018 IEEE 5th
International Conference on Data Science and Advanced Analytics, 129-138.
Piccoli, G., et al. (2019). Virtual HR: The Impact of AI Chatbot Service
Delivery on Employee User Experience. Communications of the Association for
Information Systems, 45, 51-68.
Cappelli, P. (2019). From the new deal to the gig economy: How changing
labor conditions are reshaping America's workforce. Labor Studies Journal,
44(2), 89-103.
Chen, X., Xu, H., Zhang, C., & Hu, B. (2018). Resume-Job Matching: A
Study with Deep Learning Semantic Embeddings. In Twenty-Second Pacific Asia
Conference on Information Systems, Yokohama.
Davenport, T. H., & Patil, D. J. (2012). Data scientist: The sexiest job of
the 21st century. Harvard Business Review, 90(5), 70-76.
100
Cappelli, P. (2019). From the new deal to the gig economy: How changing
labor conditions are reshaping America's workforce. Labor Studies Journal,
44(2), 89-103.
Chen, X., Xu, H., Zhang, C., & Hu, B. (2018). Resume-Job Matching: A
Study with Deep Learning Semantic Embeddings. In Twenty-Second Pacific Asia
Conference on Information Systems, Yokohama.
Chen, J., Zhang, H., & He, X. (2018). Attentive collaborative filtering:
Multimedia recommendation with item- and component-level attention. In
Proceedings of the 40th international ACM SIGIR conference on Research and
development in information retrieval (pp. 335-344).
Zhang, Y., Zhang, S., & Zhao, X. (2020). Natural Language Processing for
Resume Screening: A Review. In Proceedings of the 12th International
Conference on Agents and Artificial Intelligence.
Davenport, T. H., & Patil, D. J. (2012). Data scientist: The sexiest job of
the 21st century. Harvard Business Review, 90(5), 70-76.
101
Rajaraman, A., & Ullman, J. D. (2011). Mining of massive datasets.
Cambridge University Press.
Boushey, H., & Glynn, S. J. (2012). There are significant business costs
to replacing employees. Center for American Progress, 16, 1-9.
Waldman, J. D., Kelly, F., Arora, S., & Smith, H. L. (2010). The shocking
cost of turnover in health care. Health Care Management Review, 35(3), 206-
211.
Holtom, B. C., Mitchell, T. R., Lee, T. W., & Eberly, M. B. (2008). Turnover
and retention research: A glance at the past, a closer review of the present, and
a venture into the future. Academy of Management Annals, 2(1), 231-274.
102
McKinney, W. (2017). Python for Data Analysis: Data Wrangling with
Pandas, NumPy, and IPython. O'Reilly Media, Inc.
Chen, D.Y. (2018). Pandas for Everyone: Python Data Analysis. Pearson.
Oshiro, T. M., Perez, P. S., & Baranauskas, J. A. (2012). How many trees
in a random forest? In Machine Learning and Data Mining in Pattern Recognition,
154-168.
Chawla, N. V., Japkowicz, N., & Kotcz, A. (2004). Editorial: special issue
on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter,
6(1), 1-6.
Hand, D. J., & Till, R. J. (2001). A simple generalisation of the area under
the ROC curve for multiple class classification problems. Machine Learning,
45(2), 171-186.
103
Berson, A., Smith, S., & Thearling, K. (2012). Building data science teams.
"O'Reilly Media, Inc.".
Pang, B., & Lee, L. (2008). Opinion Mining and Sentiment Analysis.
Foundations and Trends® in Information Retrieval, 2(1-2), 1-135.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis.
Foundations and Trends® in Information Retrieval, 2(1–2), 1-135.
104
Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with
Python. O'Reilly Media, Inc.
Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment
classification using machine learning techniques. In Proceedings of the ACL-02
conference on Empirical methods in natural language processing (Vol. 10, pp.
79-86). Association for Computational Linguistics.
105
Sokolova, M., & Lapalme, G. (2009). A systematic analysis of performance
measures for classification tasks. Information Processing & Management, 45(4),
427-437.
106
Nguyen, T. (2022). How Does a Resume Parser Work? From
https://www.neurond.com/blog/what-is-a-cv-resume-parser-how-it-works
107
Appendix
# importing libraries
import os
import PyPDF2
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
# Collect all PDF resumes in the folder
resume_files = [file for file in os.listdir(resume_folder) if file.endswith(".pdf")]
total_resumes = len(resume_files)  # Total number of resumes
resumes = []
similarities = []
plt.xlabel('Resume Names')
plt.ylabel('Similarity Percentage')
plt.title('Similarity Percentage for All Resumes')
plt.xticks(rotation=90)
plt.grid(True, linestyle='--', alpha=0.6)
plt.text(0.8, 0.9, f'Total Resumes: {total_resumes}',
         transform=plt.gca().transAxes, fontsize=12)
plt.tight_layout()
plt.show()
# Read the text of every resume PDF
resumes = []
for resume_file in resume_files:
    resume_path = os.path.join(resume_folder, resume_file)
    resume_text = read_pdf(resume_path)
    resumes.append(resume_text)
# Similarity between the job requirement and each processed resume
similarities = [get_cosine_similarity(job_requirement_processed, resume)
                for resume in resumes_processed]
resume_scores = list(enumerate(similarities))
# Sort resumes by similarity score, highest first
sorted_resumes = sorted(resume_scores, key=lambda x: x[1], reverse=True)
top_n = 10
selected_resumes = [(resume_files[index], score * 100)
                    for index, score in sorted_resumes[:top_n]]
plt.xlabel('Similarity Percentage')
plt.ylabel('Resume Names')
plt.title('Top 10 Resumes by Similarity Percentage')
plt.gca().invert_yaxis()
plt.show()
resumes = []
for resume_file in resume_files:
    resume_path = os.path.join(resume_folder, resume_file)
    resume_text = read_pdf(resume_path)
    resumes.append(resume_text)
top_n = 10
selected_indices = [index for index, score in sorted_resumes[:top_n]]
resumes = []
for resume_file in resume_files:
    resume_path = os.path.join(resume_folder, resume_file)
    resume_text = read_pdf(resume_path)
    resumes.append(resume_text)
print(f"For the resume '{resume_name}', the overlapping words with the job requirement are:\n{', '.join(overlapping)}\n{'-'*80}")
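The listing above calls `get_cosine_similarity` and prints `overlapping`, but their definitions are not shown in this excerpt. The following is a minimal standard-library sketch of how a similarity percentage and an overlapping-word list can be computed from word counts. It is not the thesis implementation: the function names `cosine_similarity_counts` and `overlapping_words`, the simple tokenizer, and the sample texts are hypothetical, and the stop-word removal and stemming suggested by the imported NLTK modules are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def cosine_similarity_counts(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def overlapping_words(text_a, text_b):
    """Words that occur in both texts, in alphabetical order."""
    return sorted(set(tokenize(text_a)) & set(tokenize(text_b)))


# Hypothetical job requirement and resume text.
job_requirement = "python sql machine learning"
resume = "python machine learning statistics"

score = cosine_similarity_counts(job_requirement, resume)
overlap = overlapping_words(job_requirement, resume)
print(f"Similarity: {score * 100:.1f}%")
print(overlap)
```

Multiplying the score by 100 gives the similarity percentage used for ranking, and the overlap list indicates which shared terms contribute to a given score.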
https://drive.google.com/drive/folders/1a8lmfMk-NpWV0TLzTHex-0mjnSNX4d-3?usp=drive_link
Job Requirement:
Task 2 - Predicting Employee Turnover
2.1 Questionnaire
I hope you're well. I'm reaching out again regarding my master's thesis. I've
crafted a brief questionnaire related to my work on machine learning algorithms.
Just 3-4 minutes of your time for the survey would be a huge help and your
insights would be invaluable.
Best regards
Rahman
Question: Options
1. Age: 18-24, 25-34, 35-44, 45-54, 55 and above
2. Gender: Male, Female, Prefer not to say
3. Department: Analytics and Data Science, Marketing, Human Resources, Sales,
   Customer Service, Finance, Operations, Research and Development, IT
   (Information Technology), Legal, Administration, Production, Supply Chain,
   Public Relations, Project Management, Procurement
4. Tenure: 0-2 years, 3-5 years, 6-10 years, 11+ years
8. Professional growth opportunities satisfaction: Strongly Satisfied, Satisfied,
   Neutral, Dissatisfied, Strongly Dissatisfied
9. Value of opinions and ideas: Completely, Mostly, Moderately, Slightly, Not at All
10. Connection to colleagues and team: Completely, Mostly, Moderately, Slightly,
    Not at All
11. Company's diversity and inclusion efforts: Strongly Satisfied, Satisfied,
    Neutral, Dissatisfied, Strongly Dissatisfied
12. Current compensation satisfaction: Very Satisfied, Satisfied, Neutral,
    Dissatisfied, Very Dissatisfied
13. Value of provided other benefits: Completely, Mostly, Moderately, Slightly,
    Not at All
# importing libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # used by the count plots below
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix,
                             classification_report)
# Calculate performance metrics for the testing data
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
roc_auc = roc_auc_score(y_test, model.predict_proba(X_test),
                        multi_class='ovr')
# Print performance metrics for the testing data, confusion matrix and
# classification report
print("Performance Metrics for Testing Data:")
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print(f"ROC-AUC Score: {roc_auc:.2f}")
# Preprocess the survey data to match the features of the trained model
X_new = pd.get_dummies(new_data)
plt.ylabel("Count")
# Label each bar with its count
for p in ax.patches:
    ax.annotate(f'{p.get_height()}', (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='bottom', fontsize=10, color='black')
total_employees = len(data)
plt.text(0.55, 0.85, f'Total Employees: {total_employees}',
         transform=ax.transAxes, fontsize=12,
         verticalalignment='top',
         bbox=dict(boxstyle='round', facecolor='white', alpha=0.5))
plt.show()
total_employees = len(data)
# Label each horizontal bar with its count
for p in plt.gca().patches:
    plt.gca().annotate(f'{int(p.get_width())}',
                       (p.get_width() + 0.5, p.get_y() + p.get_height() / 2),
                       ha='center', va='center')
plt.show()
for p in ax.patches:
    ax.annotate(f'{p.get_height()}', (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center', xytext=(0, 10), textcoords='offset points')
total_employees = len(employees_at_risk)
plt.text(0.5, 0.5, f'Total Employees at Risk: {total_employees}',
         horizontalalignment='center', verticalalignment='center',
         transform=plt.gca().transAxes, fontsize=14, color='black')
plt.show()
# Visualizing employee turnover for each gender (male, female or prefer not to
# say) in a bar chart
plt.figure(figsize=(10, 6))
ax = sns.countplot(x='Gender', hue='Predicted_Labels',
                   data=employees_at_risk)
plt.title('Gender Distribution for Employees at Risk')
plt.xlabel('Gender')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.legend(title='Predicted Labels')
# Label each bar with its count
for p in ax.patches:
    ax.annotate(f'{p.get_height()}', (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center', xytext=(0, 10), textcoords='offset points')
total_employees = len(employees_at_risk)
hue_labels = employees_at_risk['Predicted_Labels'].unique()
# NOTE: p.get_x() is the bar's numeric x position, not a gender label, so the
# comparison with the 'Gender' column below matches no rows and no 'Total'
# annotation is drawn.
for hue_label in hue_labels:
    for p in ax.patches:
        if p.get_height() > 0:
            total_gender_label = len(employees_at_risk[
                (employees_at_risk['Predicted_Labels'] == hue_label) &
                (employees_at_risk['Gender'] == p.get_x())])
            if total_gender_label > 0:
                ax.annotate(f'Total: {total_gender_label}',
                            (p.get_x() + p.get_width() / 2., 0),
                            ha='center', va='center', xytext=(0, 15),
                            textcoords='offset points', color='black')
plt.show()
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.show()
3.1 Question
How do you feel that your achieved performance was properly acknowledged in
your company? (Kindly provide your answer within 1-2 lines. Your answer can be
positive, negative, or neutral.)
# importing libraries
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
import pandas as pd
precision, recall, f1, support = precision_recall_fscore_support(y_test, y_pred,
average=None)
plt.figure(figsize=(10, 6))
plt.subplot(1, 2, 1)
plt.hist(new_comments_df['Sentiment_Polarity'], bins=20, color='blue',
alpha=0.7)
plt.xlabel('Sentiment Polarity')
plt.ylabel('Frequency')
plt.title('Distribution of Sentiment Polarity')
plt.subplot(1, 2, 2)
plt.hist(new_comments_df['Sentiment_Subjectivity'], bins=20, color='green',
alpha=0.7)
plt.xlabel('Sentiment Subjectivity')
plt.ylabel('Frequency')
plt.title('Distribution of Sentiment Subjectivity')
plt.tight_layout()
plt.show()
Declaration of Honor
I hereby certify that, in accordance with the regulations of Bremen University of
Applied Sciences, I am the sole author of the present master's thesis and that I
have completed all work related to it on my own.
This master's thesis has not been submitted to any other examining authority. The
work was submitted in both printed and electronic formats.