
MEASURING PRIVACY IN ONLINE SOCIAL NETWORKS

International Journal of Security, Privacy and Trust Management (IJSPTM), Vol 4, No 2, May 2015
https://doi.org/10.5121/ijsptm.2015.4201


Swathi Ananthula, Omar Abuzaghleh, Navya Bharathi Alla, Swetha Prabha Chaganti, Pragna Chowdary Kaja, Deepthi Mogilineedi
Department of Computer Science and Engineering, University of Bridgeport, Bridgeport, USA

ABSTRACT

Online social networking has gained tremendous popularity among the masses. Users of Online Social Networks (OSNs) routinely share information with friends, but in doing so they give up privacy. Privacy has therefore become an important concern in online social networks, and users are often unaware of the privacy risks involved when they share sensitive information in the network [1]. One of the fundamental challenges is the measurement of privacy: without a practical and effective way to quantify, measure, and evaluate privacy, it is hard for social networking sites and their users to set and adjust privacy settings. In this paper we discuss the Privacy Index (PIDX), which measures a user's privacy exposure in a social network. We also describe and calculate the Privacy Quotient (PQ), a metric that measures the privacy of a user's profile using the naive approach [2]. Users should be aware of their privacy quotient and should know where they stand on the privacy measurement scale. Finally, we propose a model to ensure privacy in unstructured data. It uses the Item Response Theory model to measure privacy leaks in the messages and text posted by users of online social networking sites.

KEYWORDS

Online Social Networks (OSN), Privacy Measurement, Privacy Index

1. INTRODUCTION

Information diffusion is the process by which a new idea or activity spreads through communication channels; online social networks are now the most widely used channel for this [3]. The area has been studied extensively by sociologists, marketers, and epidemiologists. As a running example, consider an OSN connecting independent IT consultants who use the network for a variety of purposes, such as sharing information, publicizing new opportunities, and finding new colleagues. The OSN may also include companies that wish to make use of the services offered by the consultants; clearly, such companies should have only selective access to OSN resources. Nodes can also form smaller networks or groups. In Figure 1, the OSN is modeled as a directed labeled graph, where every node represents a network member and each edge denotes a relationship between two members. Specifically, the initial node of an edge indicates the member who established the relationship, while the terminal node denotes the member who accepted it. Each edge is labeled with the type of the relationship established and the corresponding trust level, which represents how much the user who established the relationship trusts the other user with respect to that specific relationship. The OSN model in the figure comprises three companies, C1, C2, and C3, while the remaining nodes (A, B, E, F) represent agents [4].

Figure 1: OSN model

The principal purpose of an OSN is to establish relationships with other users and to exploit those relationships for sharing resources of various kinds.
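To make the graph model concrete, the following minimal Python sketch represents an OSN as a directed labeled graph whose edges carry a relationship type and a trust level. The class and method names are our own illustration under the model's assumptions, not part of any cited system.

from dataclasses import dataclass, field

@dataclass
class Relationship:
    rel_type: str  # e.g. "friend", "colleague", "client"
    trust: float   # trust level assigned by the member who established the edge

@dataclass
class OSNGraph:
    # edges[(u, v)] is the relationship that member u established with member v
    edges: dict = field(default_factory=dict)

    def establish(self, initiator, acceptor, rel_type, trust):
        # the initial node is the member who established the relationship,
        # the terminal node the member who accepted it
        self.edges[(initiator, acceptor)] = Relationship(rel_type, trust)

    def trust_of(self, initiator, acceptor):
        rel = self.edges.get((initiator, acceptor))
        return rel.trust if rel else 0.0

# Example in the spirit of Figure 1: agent A establishes relationships
osn = OSNGraph()
osn.establish("A", "C1", "client", 0.7)
osn.establish("A", "B", "friend", 0.9)
print(osn.trust_of("A", "C1"))  # 0.7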
Since every relationship carries a type and a trust level, it is generally acknowledged that any access control model for OSNs should be relationship-based [5].

2. SECURITY ISSUES IN ONLINE SOCIAL NETWORKS

Users voluntarily provide an astonishing amount of personal information, and OSN service providers store it. Three principal parties interact in an OSN: the service provider, the users, and third-party applications.

2.1 Breaches from Service Providers

The client-server architecture of today's OSNs inherently dictates that users must trust the service provider to protect all the personal data they have uploaded. Service providers, however, can clearly benefit from inspecting and sharing this data, for advertising purposes for instance. Since service providers can use such data however they wish, researchers have raised serious concerns about this power imbalance and have proposed alternative OSN architectures as defenses [6]. These proposals suggest that end users should manage fine-grained policies governing who may see their data [7].

Service providers themselves have also done some work on protecting private information and enhancing security features within social network sites. The most prominent solution is Lockr, which rests on two observations. First, social relationships should be used to describe access control policies [8]. Second, social networks should be decoupled from content delivery and sharing. The idea is that Lockr gives social network users access control over their distributed data by hiding it and mapping it onto third-party storage; images, for example, could be hidden in a storage server such as Picasa. The central caveat of the Lockr approach is its dependence on trusted third-party storage for the hidden information.

2.2 Breaches from Other Users

OSNs exist to facilitate interaction among friends. While fulfilling this purpose, service providers must shield users' privacy from unauthorized access. As a trade-off, all major OSNs by default let a user's friends access the personal data that the user has uploaded to his or her profile, while blocking other users from doing so. Being "friends" in an OSN, however, is merely a social connection that two users have agreed to establish in that OSN, regardless of their actual offline relationship. This disparity provides a potential channel for stealing personal data by befriending users in OSNs: for instance, 75,000 out of 250,000 random Facebook users contacted by an automated script accepted the script's request to become a Facebook friend [9]. Leyla Bilge and her colleagues have demonstrated two more complex attacks. The first is same-site profile cloning: an attacker copies a user's profile within the same OSN and uses the clone to send friend requests to the user's friends. Believing the request has come from a familiar person, the unalerted friends may accept it and thereby expose their personal data to the attacker. The second attack is cross-site profile cloning: the attacker identifies a user in OSN A, along with that user's friend list.
The attacker then copies the profile to OSN B, where the user has not yet joined, and sends friend requests on OSN B to the target's friends who have also registered there. Cross-site profile cloning is potentially more dangerous than same-site cloning because it is less likely to arouse suspicion [10][11]. At present, no defense can fully protect against such attacks; however, Leyla Bilge and her colleagues recommend raising users' vigilance about which friend requests they accept, and strengthening CAPTCHAs can help prevent large-scale profile-cloning attacks that rely on automated scripts.

2.3 Breaches from Third-Party Applications

As OSNs expand their services, third-party applications are flourishing in response to user demand for additional functionality. Although these applications live on the OSN platform, they are developed by outside parties and are therefore essentially untrusted. Moreover, users must grant an application access to their personal information before they can install it, because such access is necessary for some applications to function; a horoscope application, for example, must know the user's birthday. Unfortunately, neither the service provider nor the users know precisely which pieces of information an application genuinely needs, so they must trust applications to honestly declare the data they require. Worse, there is no mechanism to monitor how applications handle personal data [12], which invites applications to abuse it.

3. RELATED WORK

3.1 Social Network Privacy Measurement Techniques

3.1.1 Privacy Quotient

Unstructured data poses a problem for privacy score evaluation. The focus here is on evaluating a user's privacy risk in exchanging, sharing, publishing, and disclosing unstructured data, namely text messages. A text message may contain sensitive information about the user, so the message is first checked for sensitive information such as the user's phone number, address, email, or location, and is then classified as sensitive or non-sensitive by a naive binary classifier [13]. Each sensitive part of the message is treated as an "item" that has some sensitivity. If a particular user j has shared information about profile item i, then R(i,j) = 1, i.e. the information i has been made public by user j; if user j has not shared information about profile item i, then R(i,j) = 0, i.e. the information i has been kept private by user j. The privacy quotient is measured along two dimensions: the sensitivity of the information and its visibility [14].
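The paper does not spell out the classifier's features, so purely as an illustration of the sensitive-information check described above, here is a toy rule-based detector in Python. The regular expressions and function names are our own assumptions, not the authors' implementation.

import re

# Toy stand-in for the naive binary classifier: flag a message as
# sensitive if it contains a phone number, an email address, or a
# location phrase. (Illustrative patterns only.)
PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "location": re.compile(r"\b(?:at|in|near)\s+[A-Z][a-z]+"),
}

def sensitive_items(message):
    """Return the types of sensitive items detected in a message."""
    return [name for name, pat in PATTERNS.items() if pat.search(message)]

def is_sensitive(message):
    return bool(sensitive_items(message))

print(sensitive_items("Call me at 203-555-0142, I'm in Bridgeport"))
# -> ['phone', 'location']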
A. Calculation of Sensitivity

Sensitivity is the property of a piece of information that makes it private. As sensitivity increases, the privacy risk involved in sharing the item also increases, and hiding such information makes the user more private. Over N users, the sensitivity βi of an item i can be calculated as

βi = (N − |Ri|) / N

where |Ri| = ∑j R(i,j), i.e. the sum of the cells of the column of profile item i in which the item has been made public. On the basis of the data we collected, we calculated the sensitivity of each profile item in this way [15].

B. Calculation of Visibility

Visibility is the property of information that captures the popularity of an item in the network: the wider the spread of the information, the more visible it is. The visibility V(i,j) of a profile item i for user j is calculated as

V(i,j) = Pr[R(i,j) = 1] × 1 + Pr[R(i,j) = 0] × 0
V(i,j) = Pr[R(i,j) = 1]

where Pr[R(i,j) = 1] is the probability that R(i,j) = 1 and Pr[R(i,j) = 0] is the probability that R(i,j) = 0.

C. Calculation of Privacy Quotient

If βi is the sensitivity of profile item i and V(i,j) is the visibility of profile item i for user j, then the privacy quotient PQ(i,j) for profile item i and user j is calculated as [16]

PQ(i,j) = βi × V(i,j)

and the overall privacy quotient of user j is

PQ(j) = ∑i PQ(i,j) = ∑i βi × V(i,j), where i ranges over 1 ≤ i ≤ n.
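The naive computation above can be summarized in a few lines of Python. This is a minimal sketch under the formulas just given; the 3 × 4 response matrix is invented for illustration, and the observed response R(i,j) is used as the point estimate of Pr[R(i,j) = 1].

# R[j][i] = 1 means user j has made profile item i public
R = [
    [1, 0, 1, 1],  # user 0
    [1, 0, 0, 1],  # user 1
    [0, 1, 0, 1],  # user 2
]
N, n = len(R), len(R[0])

# Sensitivity: beta_i = (N - |R_i|) / N with |R_i| = sum_j R(i,j)
shared = [sum(R[j][i] for j in range(N)) for i in range(n)]
beta = [(N - shared[i]) / N for i in range(n)]

# Privacy quotient of user j: PQ(j) = sum_i beta_i * V(i,j), where the
# visibility V(i,j) = Pr[R(i,j) = 1] is estimated here by R(i,j) itself
def pq(j):
    return sum(beta[i] * R[j][i] for i in range(n))

for j in range(N):
    print(f"PQ(user {j}) = {pq(j):.3f}")
# PQ(user 0) = 1.000, PQ(user 1) = 0.333, PQ(user 2) = 0.667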
3.1.2 Privacy Armor: The Proposed Model to Ensure Privacy in Unstructured Data

We now propose a model that measures privacy in the unstructured data of OSNs. Status updates, tweets, and posts are all unstructured in nature. To calculate the percentage of privacy leakage in such data sets we propose Privacy Armor, a model that warns users when they are intentionally or unintentionally sharing sensitive content online [17].

Figure 2: Proposed model of Privacy Armor

3.1.2.1 Crowdsourcing and Data Collection

Initially, using crowdsourcing, we gather information about the items being shared on users' profiles. If a user has willingly shared an item we record a 1; otherwise we treat it as a private entry and record a 0. The result is an N × n dichotomous response matrix, where N is the number of users and n is the number of profile items [18].

3.1.2.2 Selecting a Model and Calculating the Privacy Quotient

The advantage of the naive approach is that it is fairly simple and easy to follow, and it has clear practical implications. The disadvantage is that the estimated sensitivity values are significantly biased by the user population: if the users are by nature introverts who do not like to share much information, the estimated sensitivity will be high; if they are extroverts, it will be low [9]. Real-world data is too messy for the naive approach to fit effectively, which is why Liu et al. calculated privacy scores using the Item Response Theory (IRT) model. To measure a trait of a person, there must be a measurement scale [16]. The assumption made here is that every individual has some attitude, i.e. the individual is either an extrovert or an introvert, so every user has an attitude score, denoted θ, that places them somewhere on the attitude scale. The probability that the jth individual, with attitude θj, will share sensitive item i is denoted P(θij). Plotting θ on the x axis against P(θij) on the y axis yields a smooth S-shaped curve called the Item Characteristic Curve [19]. This curve has two parameters: the sensitivity, denoted β, and the discrimination constant, denoted α. The privacy quotient is then calculated exactly as in Section 3.1.1. Using the IRT model, the visibility V(i,j) is calculated as

V(i,j) = Pr[R(i,j) = 1] = 1 / (1 + e^(−αi (θj − βi)))

where βi is the sensitivity of the ith profile item, αi is the discrimination constant of the ith profile item, and θj is the attitude of the jth user. The parameter values obtained this way, such as sensitivity and visibility, are highly intuitive, and the computation can be parallelized using MapReduce, which further increases the performance of the algorithm. After calculating sensitivity and visibility, we can compute the privacy quotient of each user using the formula in Section 3.1.1.

Sharing messages in the form of status updates, tweets, and the like is very common nowadays, and such messages may contain sensitive information about the user [17][21]. Some users share it intentionally, whereas others are unaware of the privacy risks that follow. Privacy Armor warns such users and sends them an alert showing the privacy leakage percentage. As shown in Figure 2, a message posted by the user is first analyzed by Privacy Armor to check for sensitive information such as phone numbers, email, address, or location. A binary classifier then labels the post as sensitive or not sensitive, and a privacy leakage percentage is shown to the user. The privacy leakage ϑ is calculated as

ϑ = (σ / β) × 100

where σ = ∑i σi over the k sensitive items in the post, σi is the sensitivity of the ith profile item, and β is the total sensitivity of all n items. For example, if the user shares "Having lunch with Congress supporters" [22], the user is revealing a political view, and a certain amount of privacy leakage is associated with this post. As calculated by the naive approach, the sensitivity of political views is 0.6833 and the overall sensitivity is 4.183, so

ϑ = (0.6833 / 4.183) × 100 = 16.33%

The privacy leakage associated with this post is 16.33%. People with a low privacy quotient are likely to share information without considering the privacy risk; sharing information with such users is therefore risky and carries a high percentage of privacy leakage.

3.1.3 Privacy Index (PIDX)

The Privacy Index (PIDX) describes an entity's privacy exposure factor based on known attributes in the actor model. We extend PIDX to the social network model to measure one actor's privacy exposure to another [23][24]. The privacy index PIDX(i,j) describes actor j's privacy exposure to actor i, based on the attributes of j that are visible to i. A high PIDX value indicates high privacy exposure; the value lies between 0 and 100 and can be used for privacy monitoring and risk control [23][25]. PIDX is defined as the ratio of the sum of the privacy impact factors of the published items, the set K, to the sum of the privacy impact factors of all the items, the set I:

PIDX = (∑k∈K Sk / ∑i∈I Si) × 100

The PIDX computation for a user is the same as the computation of a message's privacy leakage in the Privacy Quotient approach, because the privacy impact factor of an item i is its sensitivity, Si = βi [25].
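As a worked sketch of the formulas above, the following Python fragment computes the logistic IRT visibility for sample parameters and reproduces the leakage percentage of the "Congress supporters" example. The θ and α values are invented for illustration; the sensitivities 0.6833 and 4.183 are the paper's naive estimates. By the equivalence just noted, the same ratio computation yields a PIDX-style index when sensitivities are read as privacy impact factors.

import math

# IRT visibility: V(i,j) = 1 / (1 + exp(-alpha_i * (theta_j - beta_i)))
def visibility(alpha_i, beta_i, theta_j):
    return 1.0 / (1.0 + math.exp(-alpha_i * (theta_j - beta_i)))

# Invented parameters: item of sensitivity 0.6833, discrimination 1.5,
# viewed for a mildly extroverted user with attitude theta = 1.0
print(round(visibility(1.5, 0.6833, 1.0), 3))  # 0.617

# Privacy leakage of a post: leak = (sigma / beta_total) * 100, where
# sigma sums the sensitivities of the k sensitive items in the post
def leakage_pct(item_sensitivities, beta_total):
    return 100.0 * sum(item_sensitivities) / beta_total

# "Having lunch with Congress supporters" exposes one item (political
# views, naive sensitivity 0.6833) out of a total sensitivity of 4.183
print(leakage_pct([0.6833], 4.183))  # 16.335..., i.e. the ~16.33% above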
3.2 SONET Model

An online social network may attract millions of users, linked together through friendship ties. Every user can be described by a profile, privacy settings, and a friend list. A profile comprises the personal information of a user; privacy settings describe how the user wants to convey that information; and the friend list is the group of people the user is connected with. The friend list can be further divided into groups, for example friends, friends of friends, or the public, and privacy settings can be specified per group [26]. The SONET model provides an effective and practical way to model privacy in social networks. Figure 3 shows a mapping between a social network and the SONET model; only two users are shown in the figure [27].

Figure 3: SONET Model for Social Networks

In the SONET model, users are represented as actors, and a profile is captured by the actor's attribute list; the attributes are further extended with hidden data and virtual attributes. The friend list is captured by a degrees-of-separation function h(x, y), and privacy settings are captured by an attribute visibility function v(x, y), which returns a numeric value indicating whether a specific attribute is visible to another actor. PIDX can then be evaluated to reflect a user's privacy exposure. The SONET model also supports events such as attribute value changes, privacy settings updates, and friend additions or deletions, so it can be used to simulate privacy changes in a social network and to assess the privacy impact when user accounts are compromised.
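Below is a minimal Python sketch of the SONET-style actor model described above. The exact symbols of SONET's separation and visibility functions were lost in this text, so the names and the sample impact factors here are our own assumptions.

from dataclasses import dataclass

@dataclass
class Actor:
    name: str
    # attribute -> (value, privacy impact factor)
    attributes: dict
    # attribute -> set of actor names allowed to see it
    visible_to: dict

def visible(actor, attr, viewer):
    """Attribute visibility function: 1 if attr of actor is visible to viewer."""
    return 1 if viewer in actor.visible_to.get(attr, set()) else 0

def pidx(actor, viewer):
    """PIDX: impact factors of attributes visible to viewer over all
    impact factors, scaled to the 0-100 range."""
    total = sum(w for _, w in actor.attributes.values())
    exposed = sum(w for attr, (_, w) in actor.attributes.items()
                  if visible(actor, attr, viewer))
    return 100.0 * exposed / total

alice = Actor(
    name="alice",
    attributes={"birthday": ("1990-01-01", 0.5), "phone": ("555-0142", 0.9)},
    visible_to={"birthday": {"bob"}},
)
print(round(pidx(alice, "bob"), 1))  # 35.7: only birthday (0.5 of 1.4) is exposed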
3.3 PrivAware

PrivAware is a tool for detecting and reporting unintended information loss in social networks. It calculates a privacy score as the ratio of the number of attributes visible to third-party applications to the total number of attributes of a user; it does not consider the sensitivity of attributes.

3.4 Privometer

Privometer measures the amount of sensitive information leakage in a user's profile and indicates the leakage with a numerical value. The Privometer model also considers the substantially larger amount of information that a potentially malicious application installed in the user's friend realm can access [28].

4. CONCLUSION

Privacy measurement is a complex concern in social networks. In this paper we covered the SONET model, which supports privacy measurement in social networks [28], and we proposed using PIDX(x, y) to measure actor y's privacy exposure to actor x. We measure privacy by taking the sensitivity and visibility of attributes into consideration. The SONET model gives online users and social networking sites an experimental and efficient way to measure privacy.

REFERENCES

[1] R. Gross and A. Acquisti, "Information revelation and privacy in online social networks," in Proceedings of the 2005 ACM Workshop on Privacy in the Electronic Society. ACM, 2005, pp. 71–80.
[2] L. Backstrom, C. Dwork, and J. Kleinberg, "Wherefore art thou r3579x?: Anonymized social networks, hidden patterns, and structural steganography," in Proceedings of the 16th International Conference on World Wide Web. ACM, 2007, pp. 181–190.
[3] J. DeCew, "Privacy," in The Stanford Encyclopedia of Philosophy, E. N. Zalta, Ed., 2012.
[4] Y. Altshuler, Y. Elovici, N. Aharony, and A. Pentland, Security and Privacy in Social Networks. Springer, 2013.
[5] M. Huber, M. Mulazzani, E. Weippl, G. Kitzler, and S. Goluch, "Exploiting social networking sites for spam," in Proceedings of the 17th ACM Conference on Computer and Communications Security. ACM, 2010, pp. 693–695.
[6] P. Gundecha and H. Liu, "Mining social media: A brief introduction."
[7] B. Fung, K. Wang, R. Chen, and P. S. Yu, "Privacy-preserving data publishing: A survey of recent developments," ACM Computing Surveys (CSUR), vol. 42, no. 4, p. 14, 2010.
[8] R. Agrawal and R. Srikant, "Privacy-preserving data mining," in ACM SIGMOD Record, vol. 29, no. 2. ACM, 2000, pp. 439–450.
[9] K. Liu and E. Terzi, "A framework for computing the privacy scores of users in online social networks," in Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on. IEEE, 2009, pp. 288–297.
[10] L. Fang and K. LeFevre, "Privacy wizards for social networking sites," in Proceedings of the 19th International Conference on World Wide Web. ACM, 2010, pp. 351–360.
[11] A. Braunstein, L. Granka, and J. Staddon, "Indirect content privacy surveys: Measuring privacy without asking about it," in Proceedings of the Seventh Symposium on Usable Privacy and Security. ACM, 2011, p. 15.
[12] S. Guo and K. Chen, "Mining privacy settings to find optimal privacy-utility tradeoffs for social network services," in Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom). IEEE, 2012, pp. 656–665.
[13] J. L. Becker and H. Chen, "Measuring privacy risk in online social networks," Ph.D. dissertation, University of California, Davis, 2009.
[14] J. Anderson, "Privacy engineering for social networks," 2013.
[15] H. R. Lipford, A. Besmer, and J. Watson, "Understanding privacy settings in Facebook with an audience view," UPSEC, vol. 8, pp. 1–8, 2008.
[16] F. Drasgow and C. L. Hulin, "Item response theory," Handbook of Industrial and Organizational Psychology, vol. 1, pp. 577–636, 1990.
[17] H. Mao, X. Shuai, and A. Kapadia, "Loose tweets: An analysis of privacy leaks on Twitter," in Proceedings of the 10th Annual ACM Workshop on Privacy in the Electronic Society. ACM, 2011, pp. 1–12.
[18] J. Becker, "Measuring privacy risk in online social networks," Design, vol. 2, p. 8, 2009.
[19] E. M. Maximilien, T. Grandison, T. Sun, D. Richardson, S. Guo, and K. Liu, "Privacy-as-a-Service: Models, algorithms, and results on the Facebook platform," in Web 2.0 Security and Privacy Workshop, 2009.
[20] K. Liu and E. Terzi, "A framework for computing the privacy scores of users in online social networks," ACM Transactions on Knowledge Discovery from Data, vol. 5, no. 1, pp. 1–30, 2010.
[21] N. Talukder, M. Ouzzani, A. K. Elmagarmid, H. Elmeleegy, and M. Yakout, "Privometer: Privacy protection in social networks," VLDB Endowment, vol. 1, no. 2, 2010, pp. 141–150.
[22] C. Akcora, B. Carminati, and E. Ferrari, "Privacy in social networks: How risky is your social graph?," in 2012 IEEE 28th International Conference on Data Engineering, 2012, pp. 9–19.
[23] J. Bonneau and S. Preibusch, "The privacy jungle: On the market for data protection in social networks," in The Eighth Workshop on the Economics of Information Security, 2009, pp. 1–45.
[24] R. N. Kumar and Y. Wang, "SONET: A SOcial NETwork model for privacy monitoring and ranking," in The 2nd International Workshop on Network Forensics, Security and Privacy, 2013.
[25] Y. Wang and R. N. Kumar, "Privacy measurement for social network actor model," in The 5th ASE/IEEE International Conference on Information Privacy, Security, Risk and Trust, 2013.
[26] M. S. Ackerman, L. F. Cranor, and J. Reagle, "Privacy in e-commerce: Examining user scenarios and privacy preferences," in Proceedings of the 1st ACM Conference on Electronic Commerce, 1999, pp. 1–8.
[27] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee, "Measurement and analysis of online social networks," in Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (IMC '07), 2007, p. 29.
[28] L. Sweeney, "Uniqueness of simple demographics in the U.S. population," Data Privacy Lab White Paper Series LIDAP-WP4, 2000.