
The Effect of Bad Password Habits on Personal Data Breach

International Journal of Emerging Trends in Engineering Research

Users tend to choose bad or weak passwords with memorable characteristics, such as simple dictionary words and easy-to-remember number sequences taken from birthdays. Poor password habits compromise personal data privacy and allow hackers to gain unauthorised access to these passwords and use them for criminal and fraudulent cyber activities. The purpose of this research is to examine the impact of password habits among Malaysians on their personal data breaches. The study provides insights into the behaviour of users concerning their password use. It adopted a positivist paradigm and a quantitative approach, and used a convenience sampling technique to collect data from 297 Malaysian respondents. IBM SPSS AMOS 24 was used to conduct the analysis. The results show that 'lacking the use of second-factor authentication' has a significant and positive impact on personal data breaches. Based on this finding, it can be concluded that the lack of second-factor authentication is an essential factor that significantly impacts personal data breach. This finding offers a different perspective from the previous literature, which usually identifies bad password habits such as weak password length and combination, easy-to-guess passwords and common password reuse as the main contributing factors to personal data breaches. The contribution of this research is empirical evidence that emphasises the need to continually strengthen one's own security by correctly using second-factor authentication across individual accounts. Doing so is crucial to curb personal data breaches.

ISSN 2347 - 3983
Volume 8. No. 10, October 2020
Praveen Raj Santhira Rajah et al., International Journal of Emerging Trends in Engineering Research, 8(10), October 2020, 6950 - 6960
Available Online at http://www.warse.org/IJETER/static/pdf/file/ijeter538102020.pdf
https://doi.org/10.30534/ijeter/2020/538102020

Praveen Raj Santhira Rajah1, Omkar Dastane2, Kinn Abass Bakon3, Zainudin Johari4
1 Anglia Ruskin University, United Kingdom, [email protected]
2 Curtin University, Malaysia, [email protected]
3 The National University of Malaysia (UKM), Malaysia, [email protected]
4 FTMS College, Malaysia, [email protected]

Key words: Password habits, Bad passwords, Data breach.

1. INTRODUCTION

The European Data Protection Supervisor (EDPS) [1] defined personal data breach as a violation or breach of security that leads to accidental or unlawful damage, modification, destruction, loss or unauthorized disclosure of, or access to, personal or individual data transmitted, processed or stored under the responsibility of the controller or owner of the data. The term data breach existed even before companies or associations began storing protected and confidential data digitally. Data breaches have existed for as long as individuals or companies have stored or maintained records of private information in any form, including paper [2].

A password is a group of characters, symbols and numbers used for authentication, to gain access to a resource or to prove one's identity [3]. Poor password habits are prone to lead to compromise of personal data privacy and to criminal and fraudulent activities online [4]. It is common for users to follow bad or weak password security practices, which may leave their accounts vulnerable or exposed to attack [5]. Users tend to choose weak passwords with memorable characteristics, for example simple dictionary words or easy-to-remember number sequences such as birthday dates or a month and year [6]. It is common for a user to pick an easy-to-remember password that still meets the minimum password complexity required by websites by combining a name, a date of birth or a simple dictionary word. This example illustrates how easily these passwords can be targeted for compromise. Users apply this bad password habit across their multiple accounts [5].

Password fatigue is another challenge that affects many users, owing to the difficulty of managing and remembering numerous passwords [8]. It is a common and well-known practice for attackers on the Internet who attempt to access other users' accounts to guess passwords, or to retrieve data about a particular user, for example his favourite soccer team; this information is usually easy to obtain from the user's social networking sites such as Facebook, Instagram or LinkedIn [9]. Research on American consumers showed that 61% of respondents tend to reuse similar passwords across multiple websites and 54% were found to have fewer than five passwords [10].

Attackers and intruders are most often seen to exploit vulnerabilities to compromise a system account or access [11]. A vulnerability or weakness such as weak passwords can be associated with either an internal or an external security threat [12]. The forms of password attack include brute force attacks (e.g. searching for poor hashes to crack weak passwords), dictionary attacks and rainbow table attacks, which generate data upfront to enable hashes to be looked up [13]. Verizon's 2017 Data Breach Investigations Report (DBIR) highlighted that over 81% of current data breaches are attributed to hacked, stolen or weak passwords [14]; [15]. This problem is relevant now because data breaches are becoming more frequent.
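As a rough illustration of why brute force attacks succeed against short passwords, the sketch below estimates the size of the search space an attacker must cover. It is a simplified model and not part of the study: it assumes every character is chosen uniformly at random, which overstates the strength of real, human-chosen passwords. The function name `search_space_bits` and the alphabet sizes (94 printable ASCII symbols, 26 lowercase letters) are illustrative assumptions.

```python
import math


def search_space_bits(alphabet_size: int, length: int) -> float:
    """Return log2 of the number of candidate passwords (alphabet_size ** length).

    Assumes each character is drawn uniformly at random; human-chosen
    passwords are far more predictable than this upper bound suggests.
    """
    if alphabet_size < 2 or length < 1:
        raise ValueError("alphabet_size must be >= 2 and length >= 1")
    return length * math.log2(alphabet_size)


# Hypothetical comparison: a short "complex" password vs. a long simple one.
complex_8 = search_space_bits(94, 8)    # 8 chars from 94 printable ASCII symbols
simple_15 = search_space_bits(26, 15)   # 15 chars from lowercase letters only

print(f"8 chars, 94 symbols : {complex_8:.1f} bits")
print(f"15 chars, 26 letters: {simple_15:.1f} bits")
```

Under these assumptions the longer lowercase-only password has the larger search space (roughly 70 bits against roughly 52 bits), which suggests why length matters so much against exhaustive guessing.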
Some of the most significant data breaches have involved Yahoo, with 3 billion records in 2013; Facebook, with 540 million users in 2019; and FriendFinder Networks, with 412 million accounts in 2016 [16]. This has become an important research topic, and password habits in particular require the attention of researchers.

The purpose of this research is to study and examine the impact of password habits among Malaysians on personal data breach. This research provides insights into how users handle their online password security, focusing primarily on the context of Malaysian users. The objectives of this research are to examine: (1) the impact of weak password length and combination on personal data breach; (2) the impact of easy to guess password dictionary on personal data breach; (3) the impact of common password use on personal data breach; (4) the impact of lacking use of second-factor authentication on personal data breach.

Following from the research objectives above, the research questions for this study are: (1) What is the impact of weak password length on personal data breach? (2) What is the impact of easy to guess password dictionary on personal data breach? (3) What is the impact of common password use on personal data breach? (4) What is the impact of lacking use of second-factor authentication on personal data breach?

2. REVIEW OF KEY THEORIES AND CONCEPTS

2.1. Definition of Key Concepts and Terms

2.1.1. Password Habits

Password habits can be classified into four key concepts: weak password length, easy to guess password dictionary, common password reuse and lacking use of second-factor authentication.

A. Weak Password Length and Combination

Password length refers to the size of a password, measured as the number of characters drawn from its character sets [5]. The SANS Institute [17], a renowned security research institute, describes a weak password as one containing fewer than fifteen characters. Most global organizations and standard practices are inclined to set passwords to at least eight characters. Recent research has shown that a focus on increasing password length is a more promising alternative than password complexity with minimal password length [18].

B. Easy to Guess Password Dictionary

The SANS Institute [17] also defines a weak password as one containing one or more words that can be found in a dictionary, whether in English or in another language. Narayanan & Shmatikov [19] explained that the distribution of letters in easily guessable passwords is likely to be similar to the distribution of letters in the user's native language. This makes easy to guess passwords relatively simple to exploit in a dictionary attack. Passwords with easily guessable diction contain a major weakness because scanning software can be programmed to run ordered and systematic dictionary attacks automatically [20].

C. Common Password Use

Common password use is the scenario in which users maintain the same passwords across multiple websites or account logins to cope with the difficulty of remembering too many passwords [5]. In current society, the number of passwords, or of accounts protected with a password, keeps growing. This strains human memory capacity and the capability to remember a multitude of passwords; thus, this behaviour tends to lead to password reuse [21]. Password reuse is a security weakness because an attacker who has successfully compromised one of the victim's passwords can use the same password on the victim's other protected accounts or website logins, which become easy targets for personal data breach [22].

D. Lack of Use of Second-Factor Authentication

Second-factor authentication is a consumer technology that has been around for a long time to improve digital security; it is either optional or mandatory depending on the environment in which it is used [23]. Two-factor authentication is defined as a means of authenticating users with two separate sets of information or ways of identification: the first factor is normally the standard account password, and the second factor is a one-time password generated by a third-party authenticator mechanism, either a soft or a hard token.

2.1.2. Personal Data Breach

The European Data Protection Supervisor (EDPS) [1] has defined personal data breach as a violation or breach of security that leads to accidental or unlawful damage, modification, destruction, loss or unauthorized disclosure of, or access to, personal or individual data transmitted, processed or stored under the responsibility of the controller or owner of the data. It can also be related to a breach of the three known security principles: 'Confidentiality', 'Availability' and/or 'Integrity'. The breach may occur for several possible reasons: through negligence, as a result of an accident, or due to an intentional wrongful act by a person or threat actor [1]. Laws have been enacted around the globe to protect personal or private data. In Malaysia, the Personal Data Protection Act 2010 came into full effect on Nov 15, 2013; it is intended to prevent misuse of an individual's personal data for wrongful or commercial purposes [24].
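To make the one-time password mentioned under second-factor authentication concrete, here is a minimal sketch of how a soft token can derive a time-based code. It follows the published HOTP/TOTP scheme (RFC 4226 and RFC 6238) using only the Python standard library. The paper does not specify any particular scheme, so the choice of TOTP, the 30-second step, the 6-digit length and the function names are assumptions made for illustration.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the time-step counter."""
    return hotp(secret, int(unix_time) // step, digits)


def verify_second_factor(secret: bytes, submitted: str, unix_time: float) -> bool:
    """Compare a submitted code with the code for the current time step."""
    return hmac.compare_digest(totp(secret, unix_time), submitted)


# Shared secret taken from the RFC 4226 / RFC 6238 test vectors.
secret = b"12345678901234567890"
print(totp(secret, 59))  # code for time step 1 of the RFC test data
```

With such a scheme, a stolen or guessed password alone is not enough to log in: the attacker would also need the shared secret, or the device holding it, to produce the current code.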
The first data breach to become publicly known occurred in 2004, when an AOL worker was arrested for stealing the company's subscriber list and selling it for personal financial gain [25]. In 2005, DSW Shoe Warehouse suffered the first data breach to compromise more than 1 million credit card records [26]. Statista [27] released statistics showing that personal data breaches have risen significantly since 2005, from 157 million records reported in 2005 to a peak of 1.6 billion records in 2017.

2.2. Review of Key Theories

2.2.1 Markov Model

The Markov model is used to study stochastic processes and is useful for modelling random changes in a system or in the behaviour of a subject [28]. It is applied to learn the probability of a future state by recognizing patterns in the present state through statistical analysis of sequential data [29]. Over the years, the Markov model has proven itself in password security. Narayanan & Shmatikov [19] showed that password habits enable passwords to be cracked using the Markov model. With a construction based on the Markov model, password strength can be tested by understanding the insecurity and potential leakage of the password itself, adding a finite amount of noise for running a test [30]. A Markov model constructed for checking passwords provides high accuracy and a fine-grained measurement of password strength [31]. Tansey [32] explained a layered approach to using the Markov model.

2.2.2. Construal Level Theory

Construal Level Theory, widely known as CLT, was developed by Trope & Liberman [33]. It is a theory commonly used in social psychology to describe the relationship between psychological distance and the extent of people's thinking. Psychological distance in CLT comprises spatial, temporal, hypothetical and social distance as its most common and best-understood dimensions [34]. Conceptual differences refer to the data that is perceived by the mind, whereas perceptual differences concern how the data or information is processed; this relationship may reflect high or low abstract levels of construing objects or events [35]. It is, therefore, a trade-off between feasibility and desirability: strong emphasis is placed on desirability when considering events in the distant future, and on feasibility when considering events in the near future [36]. In the context of password management or habits, users are expected to place stronger emphasis on security (desirability) when considering the consequences of possible future events such as a data breach or leakage, whereas users apply weaker or bad passwords when the emphasis is on feasibility, because in the near future there is no immediate threat of such consequences [37].

2.2.3. Agent-Based Modelling

Agent-based modelling is a form of multi-agent simulation. It seeks explanations of and insights into the collective behaviour of agents that follow the rules of their natural system [38]. Applied to password habits, agent-based modelling highlights the radical difference between security in practice and security in the abstract. Kothari et al. [39] highlight that users tend to circumvent password security policy; applying agent-based modelling provides a clearer understanding of aggregate security, measured to include circumventions, risks and costs, for better judgement and decisions.

2.2.4. Critical Review of Theories

The Markov model uses a stochastic approach, which can be exemplified in password modelling. Understanding user behaviour makes password cracking more effective. Although the n-gram model provides high accuracy, it tends to affect usability when strength is wrongly estimated [31]. Construal Level Theory, on the other hand, is used to determine the desirability of a future event, such as a password breach resulting from poor password habits. This theory depends on the time-frame effect, but its weakness is that it does not focus on the quality of the attribute, i.e. password quality [37]. Agent-based modelling revolves around an agent's beliefs or cognitive burden [39], which is relevant to users' circumvention of password security owing to their limited cognitive capacity to manage stringent and complex passwords.

2.3. Research Gap

In previous research and empirical studies, most work has addressed password habits through variable factors such as weak password length or combination, password reuse across multiple sites or easy password dictionary. There are also international and local (Malaysian) studies concerning data breaches. The first gap identified is that, at the international level, no studies link password habits and their variable factors to personal data breach. The second gap, in the local Malaysian context, is that no clear and distinct studies have been made of password habits and their relationship to personal data breach. These gaps provide an opportunity to conduct research linking password habits to personal data breach among Malaysians, which could prove beneficial in understanding these attributes and constructs.

2.4. Conceptual Framework

The conceptual framework for this research is displayed in Figure 1. Four critical components of password habits (weak password length, easy to guess password dictionary, password reuse and lacking second-factor authentication) are the independent variables. Personal data breach is the dependent variable.

Figure 1. Password Habits Effect on Personal Data Breach Conceptual Framework

2.5. Hypothesis Development

A past study has shown that 81% of data breaches are due to weak passwords; this is normally associated with user negligence [40]. In a previous user study of 294 participants, 30% reported having at least one account breached or compromised due to poor or weak password length [18]. Ablon et al. [41], in a survey of 2618 adults in the United States, stated that 26% of respondents had received a data breach notice from prominent and popular online services that had been compromised, such as LinkedIn, MySpace, Adobe, Dropbox and Yahoo. Researchers at Preempt Firewall Company looked at the LinkedIn personal data breach and discovered that 65% of the leaked or compromised passwords could be attributed to weak password combination and length. These passwords were easily cracked by brute force using off-the-shelf cracking tools such as John the Ripper and Hashcat [42]. InfoSecurity Group [42] conducted a study that revealed the top 10 most common weak password lengths, with the number of credentials found at each rank, from the earlier breach of Myspace users' personal accounts [14]. Therefore, this study proposes this hypothesis:

H1: Weak password length and combination has a significant positive impact on personal data breach

Easy dictionary words are easily cracked or compromised by tools with dictionary listings. It is usually advised, when constructing passwords, to obfuscate dictionary words with numerals or special characters, for example changing 'Winter' to 'W1nt3r' [43]. Empirical research shows that common password dictionary choices considered weak include nouns, birthday dates, family or pet names, and anniversary dates [44]. A past comparative analysis examined different types of password attack, and the dictionary attack was one of the main breach issues compared with others such as brute force, shoulder surfing, replay, keylogging and phishing attacks [13]. Therefore, this study proposes the following hypothesis:

H2: Easy to guess password dictionary has a significant positive impact on personal data breach

Das et al. [22] studied the password strategies of end-users who appeared in multiple related credential leaks and estimated that 43% of passwords were re-used. The impact of password reuse across multiple sites, and of weak practices leading to security or personal data breaches, was researched by comparing password reuse across 21 top universities in the United States [21]. An empirical study found that the larger domino effects of password reuse have bigger implications. For example, in one case study, credentials stolen or leaked from a rival in South Korea were used to make illegal trades amounting to $22 million in losses [45]. [46] supports the notion that common password reuse influences an individual's vulnerability to data breach. Therefore, this study proposes this hypothesis:

H3: Common Password use has a significant positive impact on personal data breach

There have been known breaches in the past, involving not just users but also organizations, that forced the decision to move to second-factor authentication.
Duo Security [47] reported that organizations and service providers such as Bitly, Twitter, Buffer, Hootsuite, Tumblr and many more had turned to providing second-factor authentication not just for their internal users but for their consumers too. There is a difference in the adoption of newer technology among US internet users between 2010 and 2017 to deal with newer advances in personal breaches [47]. This shows that second-factor authentication is becoming increasingly important in helping users mitigate the weakness of single password authentication, or shield against a potential breach if a single password is compromised. Second-factor authentication elevates the security of an individual's account and is intended to overcome the shortfalls that arise when only a single-factor password is used. Therefore, this study proposes this hypothesis:

H4: Lacking use of second-factor authentication has a significant positive impact on personal data breach

3. RESEARCH METHODOLOGY

This research adopted positivism as it involves realism of participative context, which is arguably a world perception, and it is more fulfilling and certain [48]. A quantitative approach was applied, and a convenience sampling technique was used to collect data from 297 Malaysian respondents aged between 18 and 80. The questionnaire was designed in English as well as Bahasa Melayu. The first part of the questionnaire measured the demographic makeup of the respondents. The second part was made up of subsections designed to investigate the four independent variables, (a) Weak Password Length and Combination, (b) Easy to Guess Password Dictionary, (c) Common Password Reuse and (d) Lacking Use of Second-Factor Authentication, and the dependent variable, (e) Personal Data Breach. In total, there were 32 questions. Respondents were asked to select one point on a seven-point Likert scale: "1 - Strongly Disagree", "2 - Disagree", "3 - Somewhat Disagree", "4 - Neither Disagree nor Agree", "5 - Somewhat Agree", "6 - Agree" and "7 - Strongly Agree".

The four conventional statistical approaches begin with descriptive statistics, a method of describing the variables that is used to measure central tendency and variability [49]. The demographic analysis is also part of this deliverable. Next, reliability and normality testing are conducted: reliability assessment checks the stability of the findings, and normality assessment tests whether the data follow a normal distribution [50]. Confirmatory Factor Analysis (CFA) is part of the validity assessment, which addresses the truthfulness of the data findings. Finally, Structural Equation Modelling (SEM) is used; it combines factor and path analysis and is also known as covariance structure modelling [51]; [52]; [53]. Two main statistical tools are used for the analysis: IBM SPSS 24 for descriptive analysis and the assessment of reliability and normality, and IBM SPSS AMOS 24 for validity testing through CFA and SEM. Hypothesis testing is then conducted.

4. ANALYSIS

4.1. Demographic Analysis

In this sample of 297 respondents, various demographic details, i.e. gender, age, ethnicity, education qualification, and web surfing frequency and content, were acquired in the survey. The survey targeted Malaysian respondents in order to study the pattern of password habits among users of this nationality. Of the 297 respondents, the largest group is of Malay ethnicity at 36%, followed by Chinese at 33% and Indians at 26%, with others making up the remaining 5%. By gender, 38% (113 respondents) are female and 58% (173 respondents) are male. The largest age group is 35-44 years old, with 35% of total respondents (105 out of 297). The second-largest group is 25-34 years old, with 92 respondents, followed by 45-54 years old with 55 respondents. In terms of education, 47%, or close to half of the respondents, have a tertiary education background. The second-largest group comprises 81 respondents with a Master's degree. From this, we can understand that the majority of respondents have an educational background with a high literacy rate.

About 76% of respondents confidently claim to use the Internet frequently. This behaviour supports the importance of password security for online accounts, where lapses may lead to unwanted incidents, i.e. personal data breach. 38% of respondents spend more than 8 hours online each day, and 27% spend between 6 and 8 hours. This shows high daily Internet usage, which supports the analysis in this research. The online services considered are those with an associated login account, signifying that a password is required. 91% of respondents use email, the most common online service, followed by social media at 78% and online or mobile banking at 72%. This information shows the distribution of online services that require a password or login, which helps this research further understand the password habits of the respondents.

4.2. Normality Assessment

For small to medium-sized samples (i.e. n < 300), the normality test would typically include a formal normality test using skewness and kurtosis [54]. Skewness measures the asymmetry of the variable's distribution, whereas kurtosis looks at the peakedness of the distribution [55]. As a rule of thumb, Gravetter & Wallnau [56] illustrated that acceptable skewness and kurtosis values lie between -2 and +2. The results for all question items fall within the range of -2 to +2 and are therefore acceptable. This indicates a fairly normal distribution of the data collected for this research.

4.3. Reliability Assessment

Reliability testing is conducted to measure internal consistency; in this case, the reliability of the questionnaire instrument using the Likert scale (1 to 7). Cronbach [57] introduced Cronbach's Alpha test, which is mainly used for reliability testing in the development of scales for measuring attitudes and other constructs [58]. Cronbach's Alpha coefficient range was developed to measure the consistency of the items from the questionnaire [59]. An alpha value above 0.9 is deemed excellent, whereas a value below 0.5 is regarded as unacceptable. Based on the results, the overall Cronbach's Alpha for this research's questionnaire items is 0.755, which is acceptable. The highest value is 0.877 for the first variable, and the lowest is 0.580 for the third variable. The lowest value is, however, not deemed unacceptable for this research. The remaining variables are above 0.7, which is acceptable based on the coefficient range. For the dependent variable, Cronbach's Alpha is 0.682, which is still within the safe level of the coefficient range for internal consistency and reliability.
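The reliability coefficient reported above can be computed directly from item responses. The sketch below uses the standard formula for Cronbach's Alpha, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The study itself used IBM SPSS for this step; the function name and the small response matrix here are hypothetical and only illustrate the calculation.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's Alpha for a respondents-by-items matrix of Likert scores."""
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers from five respondents to three 7-point Likert items.
sample = [
    [6, 7, 6],
    [5, 5, 4],
    [2, 3, 2],
    [7, 6, 7],
    [3, 3, 4],
]
print(f"alpha = {cronbach_alpha(sample):.3f}")
```

Items that move together across respondents push alpha towards 1, while items that vary independently pull it towards 0, which is why the coefficient is read as a measure of internal consistency.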
4.3.4 Confirmatory Factor Analysis

CFA is performed in two phases, namely an initial run that loads all items of the questionnaire and a final run after identifying the best-fitting questionnaire items and removing any redundancy or noise. According to Hair et al. [64] and others [60], [61], [62], [63], acceptable fit values for the goodness of fit index (GFI), comparative fit index (CFI), incremental fit index (IFI) and Tucker-Lewis index (TLI) are above 0.900. Finally, the root mean square error of approximation (RMSEA) analyzes the discrepancy between the hypothesized model, with its parameter estimates, and the population covariance; RMSEA ranges from 0 to 1, with 0.08 or less indicating an acceptable fit [65]. Unfit questionnaire items are selected and dropped when their manifest variables have a factor loading of less than 0.5 [66]. Another criterion is to review the Modification Indices (MI). A high value, MI > 15, indicates that redundant items exist in the model and therefore need to be removed [67]. Figure 2 below depicts the new CFA Path Diagram with the revised mapping of the relationship with each variable.

Figure 2 - CFA Path Diagram (Final Run)

The default model value of χ²/DF (CMIN/DF) is 2.710, which is deemed acceptable as it is below the accepted value of 3.00. The GFI, CFI, IFI and TLI values are 0.882, 0.914, 0.914 and 0.894 respectively. Two of the values, CFI and IFI, are above 0.900 and are therefore accepted as fit. However, GFI and TLI are just on the borderline of the value 0.900. Finally, the RMSEA value for the CFA final run is 0.076 and is deemed an acceptable fit since it is less than 0.08. The summary of results from the final run likewise shows GFI and TLI on the borderline, very close to 0.900.

4.4 Structural Equation Modelling

Structural Equation Modelling (SEM) is related to CFA and is often used to analyse latent constructs using one or more observed variables; it provides structural modelling to estimate the relationships between the latent variables [68]. Figure 3 below depicts the SEM Path Diagram with the revised mapping of the relationship with each variable. This is also in sync with the list of questionnaire items dropped to improve the model.

Figure 3 - Structural Equation Modelling Path Diagram

The default model value of χ²/DF (CMIN/DF) is 2.710, which is deemed acceptable as it is below the accepted value of 3.00. The GFI, CFI, IFI and TLI values are 0.882, 0.914, 0.914 and 0.894 respectively. Two of the values, CFI and IFI, are above 0.900 and are therefore accepted as fit. However, GFI and TLI are just on the borderline of the value 0.900. Finally, the RMSEA value for the SEM run is 0.076 and is deemed an acceptable fit since it is less than 0.08. The summary of results from the SEM model fit analysis, as depicted in Table 6 below, shows that 4 out of 6 criteria are met and within the acceptable level. However, GFI and TLI are on the borderline, very close to 0.900. It is also indicative that the CFA and SEM results are identical.

4.7 Hypothesis Testing Result

Hypothesis testing was run in AMOS, and the results are taken from the regression weights and standardized regression weights. As illustrated in Table 7 below, hypotheses H1, H2 and H3 were not found to have a significant relationship with personal data breach. However, hypothesis four (H4) was found to have a significant positive impact on personal data breach.

Table 7. Summary of Hypothesis Testing Result
Dependent variable for all hypotheses: Personal Data Breach (PDB)

Hypothesis  Independent Variable                           β       p      Decision
H1          Weak Password Length and Combination (WPLC)    0.159   0.088  Not Supported
H2          Easy to Guess Password Dictionary (EPD)        -0.403  0.057  Not Supported
H3          Common Password Reuse (CPR)                    0.242   0.263  Not Supported
H4          Lack of Second Factor Authentication (SFA)     0.881   ***    Supported

*p<0.05, **p<0.01, ***p<0.001

5. RESULT AND DISCUSSION

H1: Weak password length and combination has a significant positive impact on personal data breach

The hypothesis above is not supported, as shown in Table 7, because β = 0.159 and p = 0.088 (p > 0.05, the significance threshold). This means that weak password length and combination does not have a significant impact on personal data breach for Malaysian Internet users, owing to the insignificant p-value. In terms of the relationship between the two variables, there is an increase of only 15.9% in personal data breach when the weak password length factor increases by 100%. However, no conclusion can be drawn from results that are not significant. There might be another statistical reason behind such findings.

H2: Easy to guess password dictionary has a significant positive impact on personal data breach

The hypothesis is not supported because β = -0.403 and p = 0.057 (p > 0.05, the significance threshold). The results show that easy to guess password dictionary has a negative impact on personal data breach. The p-value is rather close to 0.05 but is still not significant because it is greater than 0.05. Personal data breach decreases by 40.3% in relation to easy to guess passwords, a negative relationship. This means that respondents do not necessarily agree that an easy to guess password dictionary greatly leads to an individual or personal security breach. This could be due to respondents splitting into two extremes on the questions about simple dictionary words and passwords containing the names of known people, with the two largest groups either agreeing or disagreeing. There is therefore not enough evidence to support the hypothesized positive impact. This contradicts previous empirical research by [44], which found that a common password dictionary, or weak passwords that use nouns, family names, birth dates, pet names or even anniversary dates, significantly affects personal data breach.

H3: Common password reuse has a significant positive impact on personal data breach

This hypothesis, too, is not supported. As illustrated in Table 7, the values for this hypothesis are β = 0.242 and p = 0.263 (p > 0.05, the significance threshold). This implies that common password reuse does not have a significant positive impact on personal data breach, owing to the insignificant p-value, which is more than 0.05. In terms of the relationship between the two variables, there is an increase of only 24.2% in personal data breach when common password reuse increases by 100%. This finding is the opposite of what past literature suggests, namely that common password reuse is a common denominator in many reported security breaches. TheStar [69] reported that not many companies or individuals in Malaysia are compelled to report or disclose data breaches, and that there is an increasing need for laws that mandate such reporting. A 2018 Global Password Security Report shows that a staggering 50% of users tend to use the same password for their work and personal accounts; this was further amplified by Google research in 2019 identifying 65% doing so, with compromised passwords responsible for 81% of hacking-related breaches [70]. Many Malaysian users are probably not familiar with this, which affects their perception of the impact of common password reuse on personal data breach.

H4: Lacking use of second-factor authentication has a significant positive impact on personal data breach

The hypothesis is supported, as β = 0.881 and p < 0.001, which is highly significant. This signifies that there can be an increase of 88.1% in personal data breach when lacking use of second-factor authentication increases by 100%. Technology Visionaries [71] reported that even when users tend to have bad password habits, which include easy to guess passwords or common password reuse for multiple accounts, second-factor authentication has significantly helped to protect users from stolen credentials or from having their personal logins hacked. This is a significant insight among Malaysian respondents: second-factor authentication supersedes any other possible weak factors, such as password length or combination, easy to guess passwords and/or common password reuse across multiple logins, that have a severe impact on personal data breach. The standardized regression weights indicate a β value of 0.881 with a desirable p-value under 0.001. This is in line with a study by Duo Security suggesting that, between 2010 and 2017, tech-savvy US internet users moved towards second-factor authentication to deal with newer advances in personal breaches [47]. This research also implies that most respondents in Malaysia agree that this is the single biggest factor in securing against personal data breaches. This study confirms the finding by Albayram et al. [72] that Internet users are willing to try second-factor authentication after they are exposed to both the Risk and Self-efficacy themes, and corroborates the experiment by Siadati et al. [73], which found that second-factor authentication prevents personal data breaches and social engineering attacks.

In regards to individual users, each individual should seriously consider enabling second-factor authentication wherever possible. This can be adopted by enforcing the use of a One-Time-Pin (OTP). It can be enforced for online banking transactions, payment gateways, recovery of a forgotten password, or recovery of an account lost due to a long period of inactivity. For online accounts that are integrated with other second-factor mechanisms such as an authenticator feature, e.g. Google Authenticator or Microsoft Authenticator, users should start to activate or move to this feature to strengthen their credential access against password leaks or account hacks. For online accounts or websites with a minimal feature, such as using email or other simple techniques as a way of second-factor authentication, it should be considered a minimum viable security measure for securing login credentials.

Next, in regards to Internet Content Providers, not all websites or internet content providers with logon requirements have created a minimum viable solution using the various features of second-factor authentication. This ought to be addressed by the providers to ensure that they give their users a reliable and trusted avenue to protect themselves from becoming victims of a personal data breach. There are many free or commercial solutions for enabling a second-factor authentication feature so that users can feel safe using their services. Search engine providers like Google, Bing, etc. should consider tagging every website in terms of reliability, including the availability of second-factor authentication features. This, in return, provides a confidence index that helps users understand the risks of any website to which they entrust their login credentials.

Finally, looking at the role of regulations and law enforcement, there are current state-wise laws like the Data Privacy Act, GDPR and National Security laws to protect the public from criminal or espionage activities such as breaches of users' personal data. These laws should consider imposing stricter penal codes on data processors like Internet Content Providers to enforce second-factor authentication features, providing public assurance of data integrity, confidentiality and security.

6. CONCLUSION

Of the four proposed hypotheses, hypothesis 4 was found to have a significant positive impact on Personal Data Breach. Based on this finding, it can be concluded that the lack of second-factor authentication is a major factor that significantly impacts personal data breach. This study provides a different perspective on the usual connection, made in the past literature, of bad password habits such as weak password length and combination, easy to guess passwords and common password reuse as the main contributing factors of personal data breach. The contribution of this research is the provision of evidence indicating that beefing up personal security by using second-factor authentication across online accounts is very important to curb personal data breach.

The first limitation of this research is the data sampling: the total number of respondents was 297. Future researchers should consider increasing the size of the data sample to increase the predictive power of the findings. This research only collected data from a single location in Malaysia; future researchers should try to collect data from all regions of Malaysia to provide wide geographical coverage and a proper representation of the population. The final limitation of this study is the number of variables used. Future researchers should explore variables such as back doors, application vulnerabilities, malware, social engineering, insider threats and physical attacks, as that could lead to a better understanding of the factors influencing data breach.

REFERENCES
1. The European Data Privacy Services or EDPS (2018) [Online] Available at https://op.europa.eu/webpub/edps/2018-edps-annual-report/en/ [Accessed 15.02.2020]
2.
J. De Groot, “The History of Data Breaches”, Digital Guardian, 2019 [Online] Available at https://digitalguardian.com/blog/history-data-breaches [Accessed 30.01.2020]
3. Z. Liu, Y. Hong, and D. Pi, “A Large-Scale Study of Web Password Habits of Chinese Network Users”, JSW, vol. 9, no. 2, pp. 293-297, 2014.
4. E.H. Spafford, “Preventing Weak Password Choices”, Computers & Security, vol. 11, no. 3, pp. 273-278, 1992.
5. R. Wash, E. Rader, R. Berman, and Z. Wellmer, “Understanding password choices: How frequently entered passwords are re-used across websites”, Twelfth Symposium on Usable Privacy and Security, SOUPS, pp. 175-188, 2016.
6. J.S. Vorster and R.P. Van Heerdeen, “A Study Of Perceptions of Graphical Passwords”, Journal of Information Warfare, vol. 14, no. 3, pp. 75-85, 2015.
7. T. Hussain, K. Atta, N.Z. Bawany, and T. Qaamr, “Passwords and User Behaviour”, Journal of Computers, vol. 13, no. 6, pp. 692-704, 2018.
8. H. Sanchez, D. Sanchez, and J. Murray, “Putting Your Passwords on Self Destruct Mode: Beating Password Fatigue”, 2016, [Online] Available at https://www.usenix.org/system/files/conference/soups2016/wsf_paper_sanchez_final.pdf [Accessed 20.02.2019]
9. A. MacGibbon and N. Phair, “Cyber White Paper”, 2011, [Online] Available at http://www.canberra.edu.au/cis/storage/CentreforInternetSafetyCyberWhitePaperSubmission.pdf [Accessed 21.02.2019]
10. C. Gredler, “Consumer Survey: Password Habits”, 2012, [Online] Available at https://www.csid.com/2012/09/consumer-passwordhabits-unveiled/ [Accessed 17.02.2019]
11. K. Nayak, D. Marino, P. Efstathopoulos, and T. Dumitraş, “Some Vulnerabilities Are Different Than Others”, In International Workshop on Recent Advances in Intrusion Detection, Springer, Cham, pp. 426-446, 2014.
12. G. Elahi, E. Yu, and N. Zamone, “A Vulnerability-Centric Requirements Engineering Framework”, 2010, [Online] Available at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.%20421.1569&rep=rep1&type=pdf [Accessed 21.02.2019]
13. M. Raza, M. Iqbal, M. Sharif, and W. Haider, “A Survey on Password Attacks and Comparative Analysis on Methods of Secure Authentication”, World Applied Sciences Journal, vol. 19, no. 4, pp. 439-440, 2012.
14. K. Thomas, F. Li, A. Zand, J. Barrett, J. Ranieri, L. Invernizzi, and D. Margolis, “Data breaches, phishing, or malware? Understanding the risks of stolen credentials”, In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 1421-1434.
15. Verizon, “Data Breach Investigations Report”, 2017 [Online] Available at https://www.knowbe4.com/hubfs/rp_DBIR_2017_Report_execsummary_en_xg.pdf [Accessed 21.02.2019]
16. K. Kiesnoski, “5 of the biggest data breaches ever”, CNBC, 2019, [Online] Available at https://www.cnbc.com/2019/07/30/five-of-the-biggest-data-breaches-ever.html [Accessed 12.11.2019]
17. SANS Institute, “Password Policy”, 2014 [Online] Available at https://www.sans.edu/student-files/projects/password-policy-updated.pdf [Accessed 19.04.2019]
18. R. Shay, I. Ion, R.W. Reeder, and S. Consolvo, “My Religious Aunt Asked Why I Was Trying To Sell Her Viagra: Experiences With Account Hijacking”, In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2014, pp. 2657-2666.
19. A. Narayanan and V. Shmatikov, “Fast Dictionary Attacks On Passwords Using Time-Space Tradeoff”, 12th ACM Conference on Computer and Communications Security, 2005, pp. 364-372.
20. B. Pinkas and T. Sander, “Securing Passwords Against Dictionary Attacks”, 9th ACM Conference on Computer and Communications Security, 2002, pp. 161-170.
21. J. Abott, D. Calarco, and L. Jean Camp, “Factor Influencing Password Reuse: A Case Study”, TPRC46: Research Conference on Communications, Information and Internet Policy, 2018.
22. A. Das, J. Bonneau, M. Caesar, N. Borisov, and X. Wang, “The Tangled Web of Password Reuse”, In NDSS, vol. 14, pp. 23-26, 2014.
23. A.F. Pomputius, “A Review of Two-Factor Authentication: Suggested Security Effort Moves to Mandatory”, Medical Reference Services Quarterly, vol. 37, no. 4, pp. 397-402, 2019.
24. BorneoPost, “Personal Data Protection Act comes into force on Nov 15”, 2013, [Online] Available at https://www.theborneopost.com/2013/11/27/personal-data-protection-act-comes-into-force-on-nov-15/ [Accessed 10.11.2019]
25. CNN, “AOL Worker Arrested In Spam Scheme”, CNN Money, 2004 [Online] Available at https://money.cnn.com/2004/06/23/technology/aol_spam/ [Accessed 30.01.2020]
26. NortonLifeLock, “A Brief History of Data Breaches”, 2018 [Online] Available at https://www.lifelock.com/learn-data-breaches-history-of-data-breaches.html [Accessed 30.01.2020]
27. Statista, “Personal Data Breaches and disclosures”, 2017, [Online] Available at https://www.statista.com/statistics/273550/data-breachesrecorded-in-the-united-states-by-number-of-breaches-and-records-exposed/ [Accessed 30.01.2020]
28. P.A. Gagniuc, “Markov Chains: From Theory To Implementation And Experimentation”, USA, NJ: John Wiley & Sons, 2017.
29. L.R. Rabiner, “A Tutorial On Hidden Markov Models And Selected Applications In Speech Recognition”, Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
30. C.D. Manning and H. Schutze, “Foundations Of Statistical Natural Language Processing”, MIT Press, Cambridge, MA, 1999.
31. C. Castelluccia, M. Dürmuth, and D. Perito, “Adaptive Password-Strength Meters from Markov Models”, In NDSS, 2012.
32. W. Tansey, “Improved models for password guessing”, University of Texas, Tech. Rep., 2011.
33. Y. Trope and N. Liberman, “Construal-Level Theory Of Psychological Distance”, 2010.
34. Y. Bar-Anan, N. Liberman, and Y. Trope, “The Association Between Psychological Distance and Construal Level: Evidence From An Implicit Association Test”, Journal of Experimental Psychology: General, vol. 135, no. 4, pp. 609–622, 2006.
35. N. Liberman and J. Förster, “The Effect Of Psychological Distance On Perceptual Level Of Construal”, Cognitive Science: A Multidisciplinary Journal, vol. 33, no. 7, pp. 1330–1341, 2009.
36. T. Eyal, N. Liberman, Y. Trope, and E. Walther, “The pros and cons of temporally near and distant action”, Journal of Personality and Social Psychology, vol. 86, no. 6, 781, 2004.
37. L. Tam, M. Glassman, and M. Vandenwauver, “The Psychology Of Password Management: A Tradeoff Between Security And Convenience”, Journal of Behaviour & IT, vol. 29, pp. 233-244, 2010.
38. V. Grimm and S.F. Railsback, “Individual-based Modeling and Ecology”, Princeton University Press, 2005.
39. V. Kothari, J. Blythe, S.W. Smith, and R. Koppel, “Measuring The Security Impacts Of Password Policies Using Cognitive Behavioral Agent-Based Modeling”, In Proceedings of the 2015 Symposium and Bootcamp on the Science of Security, 2015, pp. 13.
40. TraceSecurity, “Data Breaches Due to Poor Passwords”, 2017 [Online] Available at https://www.tracesecurity.com/blog/articles/81-of-company-data-breaches-due-to-poor-passwords [Accessed 23.04.2019]
41. L. Ablon, P. Heaton, D.C. Lavery, and S. Romanosky, “Consumer Attitudes Toward Data Breach Notifications And Loss Of Personal Information”, In Proceedings of the Workshop on Economics of Information Security (WEIS), 2016.
42. Infosecurity Group, “Linkedin Breach: Weak Passwords Are The Norm”, 2017, [Online] Available at https://www.infosecurity-magazine.com/news/linkedin-breach-weak-passwords/ [Accessed 21.04.2019]
43. Forbes, “The Worst Passwords Of 2018 Show The Need For Better Practices”, 2018, [Online] Available at https://www.forbes.com/sites/kateoflahertyuk/2018/12/14/these-are-the-top-20-worst-passwords-of-2018/#4b17d3e14541 [Accessed 19.04.2019]
44. J.A. Cazier and B.D. Medlin, “Password Security: An Empirical Investigation Into Ecommerce Passwords And Their Crack Times”, Information Systems Security: the (ICS)2 Journal, vol. 15, no. 6, pp. 45-55, 2006.
45. B. Ives, K.R. Walsh, and H. Schneider, “The Domino Effect Of Password Reuse”, Communications of the ACM, vol. 47, no. 4, pp. 75-78, 2004.
46. K. Helkala and T.H. Bakås, “National Password Security Survey: Results”, In EISMC, pp. 23-33, 2013.
47. Duo Security Summit, 2015. [Online] Available at https://duo.com/blog/2015-duo-security-summit-recap-adventures-in-sf
48. A.A. Aliyu, M.U. Bello, R. Kasim, and D. Martin, “Positivist and Non-Positivist Paradigm in Social Science Research: Conflicting Paradigms or Perfect Partners?”, Journal of Management and Sustainability, vol. 4, no. 3, pp. 79-95, 2014.
49. P. Patel, “Introduction to Quantitative Methods”, In Empirical Law Seminar, 2009.
50. D.L. Altheide and J.M. Johnson, “Criteria for Assessing Interpretive Validity in Qualitative Research”, In N.K. Denzin & Y.S. Lincoln (Eds.), Handbook of Qualitative Research, pp. 485-499, 1994.
51. N.J. Gogtay and U.M. Thatte, “Principles Of Correlation Analysis”, Journal of the Association of Physicians of India, vol. 65, pp. 78-81, 2017.
52. J.J. Hox and T.M. Bechger, “An Introduction To Structural Equation Modeling”, Family Science Review, vol. 11, pp. 354-373, 1998.
53. R. Teclaw, M.C. Price, and K. Osatuke, “Demographic Question Placement: Effect On Item Response Rates And Means Of A Veterans Health Administration Survey”, Journal of Business and Psychology, vol. 27, no. 3, pp. 281-290, 2012.
54. H.Y. Kim, “Statistical Notes For Clinical Researchers: Assessing Normal Distribution (2) Using Skewness And Kurtosis”, Restorative Dentistry & Endodontics, vol. 38, no. 1, pp. 52-54, 2013.
55. S.G. West, J.F. Finch, and P.J., “Structural Equation Models With Nonnormal Variables: Problems And Remedies”, In: R.H. Hoyle (Ed.), Structural equation modeling: Concepts, issues and applications, Newbury Park, CA: Sage, pp. 56-75, 2013.
56. F.J. Gravetter and L.B. Wallnau, “Essentials Of Statistics For The Behavioral Sciences”, Cengage Learning, 2020.
57. L.J. Cronbach, “Coefficient Alpha and The Internal Structure of Tests”, Psychometrika, vol. 16, no. 3, pp. 297–334, 1951.
58. N. Schmitt, “Uses And Abuses Of Coefficient Alpha”, Psychological Assessment, vol. 8, no. 4, 350, 1996.
59. K.S. Taber, “The Use Of Cronbach’s Alpha When Developing And Reporting Research Instruments In Science Education”, Research in Science Education, vol. 48, no. 6, pp. 1273-1296, 2018.
60. C. Huber-Carol, N. Balakrishnan, M. Nikulin, and M. Mesbah (Eds.), “Goodness-Of-Fit Tests And Model Validity”, Springer Science and Business Media, 2012.
61. P.M. Bentler, “Comparative Fit Indexes In Structural Models”, Psychological Bulletin, vol. 107, no. 2, pp. 238, 1990.
62. K.A. Bollen, “A New Incremental Fit Index For General Structural Equation Models”, Sociological Methods & Research, vol. 17, no. 3, pp. 303-316, 1989.
63. D.A. Kenny, “Measuring Model Fit”, 2015 [Online] Available at http://davidakenny.net/cm/fit.htm
64. J.F. Hair, W.C. Black, B.J. Babin, R.E. Anderson, and R.L. Tatham, “SEM: An Introduction”, Multivariate Data Analysis: A Global Perspective, vol. 5, no. 6, pp. 629-686, 2010.
65. T. Brown, “Confirmatory Factor Analysis for Applied Research”, New York London: The Guilford Press, 2015.
66. W.W. Chin, “The Partial Least Squares Approach To Structural Equation Modeling”, Modern Methods For Business Research, vol. 295, no. 2, pp. 295-336, 1998.
67. L.M. Ahmad, S.A. Ahmad, and Z. Smith, “U.S. Patent No. 9,299,238”, Washington, DC: U.S. Patent and Trademark Office, 2016.
68. M.C. Shelley, “Structural Equation Modeling”, Encyclopedia of Educational Leadership and Administration, 2006.
69. TheStar, “Making It Mandatory To Declare Data Breaches”, 2018. [Online] Available at https://www.thestar.com.my/tech/tech-news/2018/07/02/making-it-mandatory-to-declare-data-breaches/ [Accessed 15.02.2020]
70. HelpNetSecurity, “The Password Reuse Problem Is A Ticking Time Bomb”, 2019. [Online] Available at https://www.helpnetsecurity.com/2019/11/12/password-reuse-problem/ [Accessed 10.02.2020]
71. Technology Visionaries, “Why Two-Factor Authentication Can Significantly Reduce Your Chance of a Data Breach”, 2019. [Online] Available at https://www.technologyvisionaries.com/two-factorauthentication-reduce-breach/ [Accessed 20.02.2020]
72. Y. Albayram, M.M.H. Khan, and M. Fagan, “A study on designing video tutorials for promoting security features: A case study in the context of two-factor authentication (2FA)”, International Journal of Human–Computer Interaction, vol. 33, no. 11, pp. 927-942, 2017.
73. H. Siadati, T. Nguyen, P. Gupta, M. Jakobsson, and N. Memon, “Mind your SMSes: Mitigating Social Engineering In Second Factor Authentication”, Computers & Security, vol. 65, pp. 14-28, 2017.