
Privacy in Times of Digital Communication and Data Mining

2014, Anglistik


DANIELA WAWRA, Passau

1. The Challenge of Datability

Privacy has become a major topic in societal (media) discourses in the United States and Germany as well as in other countries. This is because the still relatively young 'digital age,' which is characterized by digital communication and data mining, poses new challenges to our societies, especially when it comes to our private sphere and our personal data. Digital communication can be defined as the "electronic transmission of information that has been encoded digitally [i.e. expressed in discrete numerical form] (as for storage and processing by computers)" (thefreedictionary 2014). Data mining is "data processing using sophisticated data search capabilities and statistical algorithms to discover patterns and correlations in large preexisting databases; a way to discover new meaning in data" (thefreedictionary 2014) "that [might also] be used to predict future behavior" (webopedia 2014). Data in these definitions refer to "[n]umerical or other information represented in a form suitable for processing by computer" (thefreedictionary 2014). The technological innovations that allow companies and governments to collect and analyze data to an extent never seen before are often perceived as threats to our privacy. The major problem is, to quote Marwick, that "[t]he technology is developing far more rapidly than our consumer protection laws, which in many cases are out of date and difficult to apply to our networked world" (2014, 24). In the wake of the 'big data'¹ gathering, demands for better data protection procedures and laws have become louder. The recent NSA scandal has fueled societal debates, especially in Germany and the United States. The "[t]op theme at CeBIT² 2014" – "the world's largest and most international computer expo" (Anon 2014, wikipedia) – for example is "the future of Big Data" (Anon 2014, CeBIT).
The catchy term "datability," a blend of "data," "ability," "responsibility" and "sustainability" (Wiegand 2014), has been coined to describe the challenge companies face to reach the "ability to use large volumes of data sustainably and responsibly" (Anon 2014, CeBIT). In a more general sense, one could extend the concept of datability to also include the ability of individuals to handle their personal data responsibly. It will be crucial for the future development of our societies that we find solutions to the privacy issues we are confronted with so that the technological innovations will really turn out to be blessings rather than curses. This will be expanded on in what follows. However, before we return to current debates, we will trace the historical development of privacy and the issues connected with it. We will first take a closer diachronic look at the semantics – based on dictionary definitions – and collocations of privacy. This will lead us to discussions about the cognitive concept as it is reflected in how we speak of privacy issues in both English and German in current and recent debates in print media and on the web.

¹ 'Big data' is a "catch-phrase, used to describe a massive volume of both structured and unstructured data […]". 'Big data analytics' "refers to the process of collecting, organizing and analyzing large sets of data to discover patterns and useful information" (webopedia 2014).
² CeBIT is an "acronym for 'Centrum für Büroautomation, Informationstechnologie und Telekommunikation', which would literally translate as 'Center for Office Automation, Information Technology and Telecommunication'" (wikipedia 2014).

Anglistik: International Journal of English Studies 25.2 (September 2014): 11-38. Anglistik, Volume 25 (2014), Issue 2 © 2014 Universitätsverlag WINTER GmbH Heidelberg
With a 'culturomic' approach, a comparative quantitative perspective will be included here as well as a diachronic one that gives us further insight into the semantic development of privacy and its related concepts and their circulation in English and German since 1800. Also, the question will be addressed whether the 'big data' analyses that we run allow for cautious conclusions about statistical tendencies in prevailing cognitive patterns about privacy in Germany and the USA. The article concludes with reflections on the value of privacy and on the consequences digital communication and data mining can have for autonomous personhood and how we might deal with this challenge.

2. The (Cognitive) Semantics and Collocations of Privacy in English and German

The word stem of 'privacy' and 'Privatheit' can be traced back to Latin "privare," meaning "bereave, deprive" (Oxford Dictionary of English Etymology 1966; New Shorter Oxford English Dictionary 1993). The derived Latin participle "privatus" means "withdrawn from public life" (Oxford Dictionary of English Etymology 1966; Oxford English Dictionary ³2010; Online Etymology Dictionary 2014). To sketch the historical semantics of 'privacy,' we will concentrate on English here and highlight meanings that are relevant to this context. 'Private' has been used since about 1400 in the sense of "not open to the public" and since about 1500 (first documented in 1483) it has also meant "not holding a public position" (Oxford Dictionary of English Etymology 1966; Oxford English Dictionary 1978). In 1560, it is used in connection with communication to designate confidential talk and in 1586 in relation to people's minds and thoughts (Oxford English Dictionary 1978). While 'the private' started out as something that is deprived of the public, of public life, of a public position, privacy later changes into something positive that might be threatened and must therefore be defended.
From 1814 onwards, privacy is the choice or – even stronger – the "right" not to be bothered and not to be the object of "public attention" (Oxford English Dictionary ²1989). In this context, privacy has also been defined as a civil liberty: privacy as the right to be free "from interference or intrusion" (Oxford English Dictionary ²1989) and from "unauthorized oversight and observation" (Webster's Third New International Dictionary 1976). In 1933, privacy was used for the first time in connection with technological innovation: For "[o]verseas radio telephone services operated by the Post Office," "privacy equipment" was advertised (Oxford English Dictionary ²1989). Starting with definitions of privacy as a right, the concept implies the possibility of a violation of this right and consequently the need to protect and defend it. Obviously, privacy is seen as something valuable that is worthy of protection.

3. Definitions of Privacy in Academic Research

Let us take a look at definitions of privacy in academic research. According to Rössler (2001, 19), we characterize as private: (1) ways of acting and behaving, (2) certain knowledge or information and (3) spaces. 'Access' and 'control' are central parameters of privacy (cf. Rössler 2001, 21; 23; Bok 1983, 10f.): Who has access to – or influence on – and control over persons (i.e. their behavior, their body, their thoughts and feelings) and spaces (i.e. your home, buildings, rooms, natural spaces)? Accordingly, Bok defines privacy "as the condition of being protected from unwanted access by others – either physical access, personal information or attention" (Bok 1983, 10f., cited in Rössler 2001, 23).
When claiming privacy, you claim control over access (cf. Bok 1983, 11, cited in Rössler 2001, 23). Starting from the central concept of 'access,' Rössler defines three kinds of privacy:

- Decisional privacy is the right to be protected from unwanted access, i.e. unwanted interference and heteronomy, when it comes to decisions and actions.
- Informational privacy is the right to be protected from unwanted access to personal data.
- Local privacy is – in a literal sense – the right to be safe from other people's access to spaces. (2001, 25)

While violations of privacy can occur in all three areas, we will concentrate on the first two here. Rössler also speaks of a dividing line ("Trennlinie," Rössler 2001, 25) – one could also use the term 'border' or 'boundary' instead – between the public and the private. This dividing line is constructed and not fixed. In liberal societies, it is open to debate, according to Rössler (2001, 25). Where we draw the line between public and private is up to us and, of course, the cultural influence on what is considered to be genuinely private must not be neglected (cf. also Rössler 2001, 26). The cultural standards with regard to what is considered to be private can and do change. We observe this currently, especially with regard to informational privacy, where the new technological possibilities of personal data distribution and collection have at least challenged existing cultural privacy norms. We are currently at a crossroads in our societies and face the difficult task of negotiating privacy boundaries that protect us individually and society as a whole from harm.

Rössler states that there are very few comparative studies concerning the question whether there are different cultural interpretations of privacy in modern liberal Western societies (cf. Rössler 2001, 33). She draws a short comparison between Germany and the USA. Warren and Brandeis defined privacy as the "right to be let alone" (Warren and Brandeis 1984, 76 [orig.
1890]) at the end of the 19th century. This perspective on privacy is still very influential in the USA, according to Rössler (2001, 34): It emphasizes the right to be let alone by the state and society, to decide for oneself and act by oneself according to one's own free will. The autonomy of a person in his or her decisions and actions is closely linked to the concept of decisional privacy introduced above, which Rössler considers to be more prominent in the USA than in Germany, where this concept is hardly used. In Germany it is mainly freedom from insights into one's everyday life that is claimed and less freedom from intrusions upon individuals' decisions and actions (cf. Rössler 2001, 34). It is thus informational privacy that matters more to Germans than to US-Americans: People in Germany and also in Europe feel more threatened by voyeurs, i.e. the state and individuals, and – I would add – companies, and claim privacy rights above all in such contexts (e.g. Google Street View). In addition, there is no law in the USA that corresponds to EU legislation with regard to informational data protection (cf. Rössler 2001, 35). To sum up, Rössler sees the following difference in the accentuation of privacy in Germany and the USA: In Germany, privacy claims to control the authenticity of self-presentations are prevalent, i.e. informational privacy; in the USA it is the claim to privacy with regard to the protection of autonomous actions and decisions, i.e. decisional privacy, which is dominant (cf. Rössler 2001, 36). We will investigate whether this still applies to the current prominent privacy discourses.

4. Negotiations of Privacy in Recent and Current Print Media and Web Discourses on Privacy: a Qualitative Approach

A closer look at selected prominent³ English and German privacy discourses will give an impression of typical negotiations of privacy that circulate currently, particularly in the USA and Germany. In these discourses, privacy is also often regarded as a value that must be protected and has become harder to protect in times of digital communication and data mining, as the following examples will show.

4.1 USA: Privacy as a Civil Liberty and Invasions of Privacy

The USA has recently been harshly criticized because of the NSA affair. The American Civil Liberties Union (ACLU), for example, criticizes the fact that privacy legislation lags behind the new technologies and technological possibilities (Anon 2013; see also Marwick 2014, 24). They demand the protection of civil liberties in the digital age and emphasize that privacy is a central democratic right. They view the American government as their major opponent and as a major threat to society. They accuse the government of 'invading' our privacy rights. They refer to the American science fiction author Philip K. Dick and claim that the futuristic scenario has finally become reality in which we are all seen and treated as suspects, despite the fact that we are completely innocent.

Mark Weinstein, in an article published in the Huffington Post in 2013, asks whether privacy is dead. The "murderers" are, according to Weinstein, the big technological companies Google and Microsoft. Privacy is thus seen as something vital here, something that we essentially need to live as human beings, and Google and Microsoft are depicted as violent aggressors who have committed a crime in taking it away. Weinstein speaks of a "salvo war," a "technology invasion into our lives," of an "invasion of our privacy" (Weinstein 2013), as well as of data theft that is committed by Google.
This clearly communicates that boundaries are crossed and property rights are injured. Like the American Civil Liberties Union, he uses space metaphors. Weinstein also sees privacy as an inalienable civil right in democracies. It is ironic – according to the author – that while globally there seems to be a trend towards more democracy, "individual privacy is being eroded." As the beginning of the decay of privacy he identifies a time "about 15 years ago" when the internet was already in widespread use and when companies applied data mining to learn about their clients' preferences and make individually tailored offers. The next step that led to the slow death of privacy were social networking websites that were "sexy" and "fun." According to Weinstein "[w]e were broadcasting to the world." Individuals voluntarily sacrificed their privacy to the public. Weinstein describes this development as leading to an addiction to being noticed by the public and as being "exciting" at first because there was "a whole new world to explore" (Weinstein 2013). These semantics imply a careless, irresponsible application of the new technical possibilities. Many were not aware of the potential dangers. To be social and private is no contradiction, according to Weinstein.

³ 'Prominent' is used here mainly because the articles were displayed on the first few pages by Google among the most relevant hits when searching for discussions about privacy on the web or because they were published in major newspapers with a large circulation like Süddeutsche Zeitung or Frankfurter Allgemeine Zeitung.
Thus, the author portrays social media and their usage as fashion trends that were used irresponsibly, without thinking much about them, until the first problems arose and were recognized, for instance the fact that data, once published in the digital world, could not be erased or controlled. A central question, then, is whether the free market can or even wants to protect our privacy rights or whether we need the government to do so. Weinstein classifies governmental intervention as negative, as being forced on the people. This critical attitude towards governmental intervention is consistent with the traditional skepticism many Americans have towards government regulation. Weinstein is convinced that the free market is perfectly able to protect our privacy. Companies cannot but meet their clients' wish for more privacy protection. The choice of a considerable number of clients will regulate the market, so no government is needed in this respect. Weinstein demands that the companies be transparent: They must show which information they collect, how they get it and how they use it. The author himself has founded a website (Sgrouples.com) that provides a Privacy Bill of Rights for its members. Through this and further arguments, Weinstein adds a historical dimension and emphasizes that the American nation has always been built on the protection of privacy rights. This has always been one of the major reasons why immigrants came to the United States from countries in which privacy rights were hardly respected or not respected at all.

In other US-American privacy discourses, 9/11 is mentioned regularly. The terrorist threat is still seen as very real, and big data collection and analysis are justified on the grounds of the war against terrorism. A contribution by Bruce Stokes (2013) on the internet platform foreignpolicy, for example, is titled "Trading privacy for security."
In fact, most US-Americans would allow restrictions on their personal privacy to enhance their security, as polls show. In a survey that was initiated by the Washington Post, 57% of the respondents said that threats by terrorists had to be looked into, even if this led to invasions of ordinary citizens' privacy. Generally, respondents with a college degree were more worried about their privacy than others. This suggests that the fear of terrorism is still strong in the USA and that this fear pushes privacy concerns into the background. Stokes (2013) also criticizes that, as a consequence, the USA endangers its status as "protector of civil liberties" abroad. Thus, the author also categorizes privacy as a central civil liberty.

David Simon, the author of the much praised US-American HBO series The Wire,⁴ wrote a blog post in June 2013 that has received wide attention and offers quite a differentiated web discussion around the topics of data collection and data protection. In his blog, Simon argues that the collection of data is necessary to fight crime. For him, it is beyond doubt that data have to be collected. The point is to make sure that they are not used in a way that violates "individual liberties and […] personal privacy." The legislative power is to make sure of that. Simon weighs the risk of terrorism against potential privacy invasions. As a solution he suggests the surveillance of pay phones as it was practised in Baltimore during the 1990s to fight drug dealing: Collect big data, identify potential targets (i.e. suspects), analyze their data in more detail but not the data of normal citizens.

⁴ This television series is about data collection and surveillance in the fight against drugs and drug-related crime, which draws on Simon's insights from his years as a police reporter (cf. Eschkötter 2012, 9).
The framing of the US-American discourses presented above often contrasts security and privacy as two opposing interests between which you have "to find the right balance" (Lynch 2013). This is understandable when you conceive of privacy "as a political or legal concept" only (Lynch 2013). But I agree with Lynch and others when they say that privacy is more than that: Privacy is necessary "to be an autonomous person." You have "privileged access" to your mental life, i.e. your thoughts (Lynch 2013) and feelings. This is also consistent with the "Cartesian concept of the self:" Your "mind is essentially private" (Lynch 2013). The unrestricted access to your thoughts is personal, nobody else has it. You can and should control how much access others have to your thoughts. And I would like to add: It also means to be in control of who gets how much access to your mental life. You are the gatekeeper to your informational and decisional privacy. When you allow other people access to your inner self, you have to decide carefully how much access you grant them, depending on how much you trust them. Because if somebody violates that trust and has gained access to a major part of your thoughts and feelings, that person can control you. This can also have negative political consequences in that governments might reign as autocrats (Lynch 2013). Thought intrusion and control are popular topics in science fiction and fantasy literature: Just think of George Orwell's 1984 or Harry Potter and Lord Voldemort, who can invade and read each other's minds at various points in the story and thus gain immense power over the other. I think that this is an archetypical fear of human beings. Consequently, the "loss of privacy" means the loss of autonomy, the "loss of freedom" (Lynch 2013), the loss of personhood. 
Distinct personhood is endangered when privacy is endangered: "To the extent we risk the loss of privacy we risk […] the loss of our very status as subjective, autonomous persons" (Lynch 2013). Derman, for example, equates freedom and privacy when he explicates freedom as freedom from interference, control and observation (Derman 2013, 25). You are in danger of becoming an "object to be manipulated," "dehumanized" (Lynch 2013). The insight that the loss of privacy leads to dehumanization is not a new one, because this already happened in concentration camps. When we lose "privileged access to our psychological information […] we literally lose our selves." "[P]rivacy of thought" is necessary for "autonomous personhood" (Lynch 2013).

I think that these examples of US-American web discourses on data collection and privacy issues demonstrate a recent change of mentality in US-American society in the wake of the NSA affair: Informational privacy, the right to protect access to personal data, has now become a prominent topic of societal discourse and has become more important for US-Americans as well. Thus, Rössler's (2001) statement from a few years back, that informational privacy is more important to Germans and less important to US-Americans (see above), has been challenged. We will look for further – rather large-scale – evidence below to substantiate this hypothesis.

4.2 Germany: The German Paradox

Grehsin (2014, 35) considers it to be the first responsibility of a nation under the rule of law such as Germany to provide the prerequisites for a self-determined life. He deduces this duty from the inviolable dignity of the human being.
This also requires the government to ensure that it is possible for the individual to protect himself or herself from unwanted intrusion into his or her private sphere, in order to be able to live the life s/he wants (cf. Grehsin 2014, 35). However, there is also a countermovement that has its origins in Germany and proclaims the end of privacy (cf. Deyhle 2013, 1). This 'post-privacy'⁵ movement basically argues that we can no longer protect our privacy in the digital age and should therefore not pursue a lost cause. Jarvis (2010, cited in Deyhle 2013) points out a German Paradox⁶: We highly value our privacy in Germany, which was evident in our greater resistance towards Google Street View, Google Analytics and our protest against unsatisfactory possibilities to protect personal data on Facebook. This also supports Rössler's (2001) hypothesis that informational privacy plays a greater role in Germany than in the USA (see above). On the one hand, we want to control other people's access to our personal information. On the other hand, the radical post-privacy movement has its origins in Germany. The concept is hardly used in the USA according to Deyhle (cf. Deyhle 2013, 1). Whether this can be upheld will be investigated below. Despite Eric Schmidt, Steve Jobs and Mark Zuckerberg having declared privacy to be "dead" in 2010, no real movement has ever developed out of it (cf. Deyhle 2013, 2).

Christian Heller, who invented the concept of post-privacy, thinks that it is a lost cause to try to keep control over data that have already been collected and saved. He sees great potential in 'big data' analyses and advocates the collection and connection of as many data as possible. They could provide completely new insights into our complex world (cf. Heller 2011, cited in Deyhle 2013, 4f.). The cultural studies specialist Seeman is also a proponent of the post-privacy movement. He considers it a great human achievement that we can collect large amounts of data.
Everybody can then decide for him- or herself whether a piece of information is worth saving and then filter the data accordingly (cf. Seeman 2011, cited in Deyhle 2013, 7). This, however, requires a high level of competence and knowledge. Otherwise, there is the danger of information overload and a lack of distinction between important and peripheral information. The post-privacy movement objects that privacy is not clearly defined and cannot be justified from a social-historical perspective. Privacy would divide society (cf. Deyhle 2013, 9). To unite societies, an open-data philosophy would be necessary: All data should be freely accessible. In the end, everybody would profit: A higher information flow between people would improve society as a whole (cf. Deyhle 2013, 6).

⁵ The concept of 'post-privacy' was used for the first time in 2007 by Christian Heller (cf. Deyhle 2013, 2).
⁶ There is also the "Privacy Paradox," which refers to the contradiction that privacy is often highly valued in theory but carelessly dealt with in practice (Bojaryn 2013, V1).

I disagree here: First of all, I think that privacy is relatively clearly defined (see, for example, Rössler's definitions above) and the importance of privacy protection can be justified from a social-historical perspective. Privacy is something essentially human, and we cannot live autonomously without it. Privacy does not divide society but is essential for a democratic society. I will expand on this below. Second, the open-data philosophy is – in my view – a utopian one: It presupposes an ideal human being that is philanthropic, without envy, tolerant and much more. Heller (2011, cited in Deyhle 2013, 6), for example, makes a case for the publication of all tax data.
He thinks that this would make it more difficult to exploit employees, challenge norms and break down walls. The publication of tax data is already a reality in Sweden and Norway, for example. There are, however, serious drawbacks, such as burglars using the data as a source of information for their crimes. Sweden has therefore "changed the rules so that tax information, rather than being posted online, was made available only to people who demonstrated that they had a 'legitimate need' for it. Similar changes are being discussed in Norway" (Hannan 2012). Thus, access to this private information has been restricted.

4.3 Metaphors of Privacy

To conclude this section with prominent examples of current privacy discourses in Germany and the USA, a closer look at metaphorical references to privacy is in order, as it gives us an insight into our way of thinking about the concept. It is characteristic of metaphors to describe abstract concepts with more concrete ones and thus make them more graspable and understandable. In the discourses rendered above we find metaphors like the "violation" and the "invasion" of privacy. Privacy is thus conceived of as a spatial concept, a 'territory' that has borders enclosing this abstract entity. Here, the container metaphor is used for privacy: There are "insides" and "outsides" (Goatly 2007, 15), there are core and peripheral areas, they can be open or closed, full or empty (cf. Brandstetter 2009, 10). The choice of 'violation' and 'invasion' in connection with 'privacy' also suggests a comparison to war, which has sometimes even been explicitly mentioned, and the disrespect for a territory's or country's borders. By aggressive and forceful acts (both 'violation' and 'invasion' imply aggression and force), these boundaries are disrespected. When something is considered to be worth violating and invading, this also means that there must be something within the boundaries that is valuable.
The territory or country that is privacy can be interpreted in three ways:

1. Literally and materially, as a space that is invaded such as, for example, your home.
2. Materially, as a body that is violated, for example, when somebody touches you but you do not want to be touched.
3. Immaterially, such as your thoughts and feelings that are accessed by others against your will.

All three interpretations that ensue from our thinking about privacy can also be found in the definitions of the concept cited above. At this point, a brief look at the corresponding German expressions for the 'violation' and 'invasion' of privacy will help to clarify whether we typically conceive of privacy in the same way as English native speakers do. In German we can speak of 'Verletzung der Privatsphäre' or 'Privatheit.' 'Verletzung' is one of several possible translations of 'violation.' For 'invasion' of privacy we can use 'Eindringen' or 'Invasion' in German. 'Eindringen' has a less violent connotation than 'Invasion' and also suggests that it is on a lesser scale. Thus, in German, we can use two different terms for the English 'invasion' in relation to privacy.

A number of questions arise when we want to pursue the comparative perspective in more detail and also add a diachronic dimension: How prominent have the topics of 'privacy' and 'Privatheit,' 'Privatsphäre,' respectively, and related concepts been in societal discourses in Germany and the USA over the centuries? Which of the possible terms for the concept of privacy and its violation have been used more often in German and English? What are the most prominent collocations of privacy in English and German? And what might this tell us about how we think about the concepts in the German and English language, in German and US-American society?
These questions will be addressed in the following section with the help of 'big data' that are freely accessible on the internet.

5. The Historical 'Culturomics' of Privacy: a Quantitative Approach with Google Ngrams and Google Advanced Search

5.1 Culturomics Introduced

Let us first address the question of the circulation of 'privacy,' 'Privatheit/Privatsphäre' and related concepts over the centuries. We will use what Zimmer (2012) has called a "Big Data approach to historical analysis with the label 'culturomics,'" i.e. the Google Books Ngram corpus and viewer (2013). This tool provides a quantitative content analysis of millions of books that have been digitized by Google and displays the results in a graph.⁷ According to googleresearch (Orwant 2012) they "[had] scanned 20 million books [by then], […] [i.e.] approximately one-seventh of all the books published since Gutenberg invented the printing press." Zimmer also states that "[t]he new edition extracts data from more than eight million out of the 20 million books that Google has scanned. That represents about six percent of all books ever published, according to Google's estimate. The English portion alone contains about half a trillion words" (Zimmer 2012; see also Lin et al. 2012, 169-170). Up to the present day, however, searches on the Google Books Ngram corpus still render results only up to the year 2008.

'N-grams' are sequences of n words: 'privacy' would be a '1-gram' or 'unigram,' 'privacy violation' would be a '2-gram' or 'bigram.' Google "included only ngrams that appear over 40 times across the corpus."⁸ When you search for an Ngram in the Google Books Ngram corpus, the additional viewer-tool "displays a graph showing how those phrases have occurred in a corpus of books (e.g., 'British English,' 'English Fiction,' 'French') over the selected years"⁹ (Michel et al. 2011). The English corpus contains 4,541,626 volumes and 468,491,999,592 tokens.
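The notion of an n-gram and the inclusion threshold described above can be illustrated with a short sketch. It is a minimal illustration in Python and not Google's actual pipeline: the whitespace tokenizer, the function names `ngrams` and `count_ngrams`, the parameter `more_than` and the toy sentences are all assumptions made for this example, and the threshold is lowered from 40 to 1 so that the tiny sample yields a result.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, n, more_than=40):
    """Count n-grams across texts; keep those occurring more than `more_than` times.

    Tokenization is a naive lowercase whitespace split (an assumption for
    illustration; a real corpus pipeline is considerably more elaborate).
    """
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n))
    return {gram: c for gram, c in counts.items() if c > more_than}


# Invented toy corpus, not taken from the Google Books data.
sample = [
    "privacy violation is a privacy violation",
    "the right to privacy",
]

unigram_counts = count_ngrams(sample, 1, more_than=0)    # keep every 1-gram
frequent_bigrams = count_ngrams(sample, 2, more_than=1)  # 2-grams seen at least twice

print(unigram_counts[("privacy",)])
print(frequent_bigrams)
```

With the threshold of 40 used for the published corpus, rare word combinations never reach the viewer at all, which is one reason why a flat line at zero does not prove that a phrase was never used.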
The size of the American English corpus, which was also chosen for this study, is unfortunately not indicated in Lin et al. (2012). Goldberg and Orwant (2014, 4) however state that the American English corpus contains 1,400,000 books published in the USA and 146,200,000,000 tokens. The American English corpus is a sub-corpus of the English corpus. The German corpus consists of 657,991 volumes and 64,784,628,286 tokens (Lin et al. 2012, 170). Michel et al. state that the corpora allow for quantitative investigations of "cultural trends" (2011, 176), i.e. culturomics research. While Google's Ngram tool can be useful for discovering broad linguistic and cultural trends, it has been criticized for various methodological flaws, which must be taken into account when interpreting the results of an analysis.

7 More extensively, culturomics is defined on wikipedia as "a form of computational lexicology that studies human behavior and cultural trends through the quantitative analysis of digitized texts. Researchers data mine large digital archives to investigate cultural phenomena reflected in language and word usage. The term is an American neologism first described in a 2010 Science article called 'Quantitative Analysis of Culture Using Millions of Digitized Books,' co-authored by Harvard researchers Jean-Baptiste Michel and Erez Lieberman Aiden. Michel and Aiden helped create the Google Labs project Google Ngram viewer which uses Ngrams to analyze the Google Books digital library for cultural patterns in language use over time" (Anon 2014, wikipedia).
8 http://storage.googleapis.com/books/ngrams/books/datasetsv2.html (28 April 2014).
9 https://books.google.com/ngrams/info (28 April 2014).
On the English Language & Usage Stack Exchange site10, for example, "a question and answer site for linguists, etymologists, and serious English language enthusiasts" (Anon 2014, s.v. stack exchange), the following criticism has been brought forward, which should above all be kept in mind when interpreting the Ngram-viewer displays in the following section:

1. There is the danger that people might "ignore the scale on the Y-axis, which reports the differences in a range that may be only a few cases in tens of millions. And very often NGram answers garner upvotes because people don't think about the data behind the charts and simply look at the pretty line graphs and upvote the answer." (@MrHen 2014)
2. A serious problem is faulty data: "I am working on a project that was originally going to use some Google Books data, but in-depth analysis seems to indicate that dates are way off (as in, 25% of the pre-1800 tokens I have looked at so far seem to be off on their publication dates by an average of ~100 years)." (@Kosmonaut 2014)

I thus agree with the following user who wrote:

For some time now, contributors to EL&U have offered NGrams in support of their arguments. Now, there is nothing wrong with this practice per se: I have done so myself, and have seen others do it in a way that acknowledges the margin for error inherent in a flawed system. When done well it is done in a spirit of inquiry, citing the NGram as possible evidence; when it is done poorly, it is trumpeted as absolute proof of someone's contention. (@Kosmonaut 2014)

Google Ngram can thus provide data for formulating hypotheses that can then be tested further by collecting other evidence or counter-evidence. A final caveat is in order in the context of this study: Up to the present day, the Ngram viewer still only displays results up to the year 2008.
The NSA scandal happened in 2013, and it is to be expected that around that time or with some delay, the circulation of 'privacy' and related terms will have increased considerably. This should be reflected in the results of queries on the Google Books Ngram corpus once books from the years 2013 and following are included in the corpus. Therefore, this hypothesis will have to be checked in the future.

10 See http://meta.english.stackexchange.com/questions/2469/should-we-allow-google-ngrams-to-be-presented-as-statistical-evidence-without-qu (6 May 2014).

5.2 Empirical Study (1): Google Ngrams

The following queries on the Google Books Ngram corpus are to provide first hypotheses as to how prominent privacy and related concepts have been in their circulation in English in general, in American English specifically, as well as in German since 1800. The general English corpus was chosen for the analysis as it allows for cautious general conclusions about differences between the English and German language. The American English corpus, a sub-corpus of the English corpus, contains only books that were published in the United States (see above; Goldberg and Orwant 2014, 4) and was chosen to formulate and give first evidence for or against hypotheses regarding the circulation of privacy topics in US society.
The results for the American English corpus can also be compared with the German corpus, assuming that most (if not all) books of the latter were published in Germany and thus allow for derivations with regard to Germany. This might allow for cautious conclusions about which ways of thinking about privacy have been dominant in English and German and in the USA and Germany, respectively, whether they have been more or less similar or different and whether these have changed in the course of time. The underlying assumption for the formulation of hypotheses about how prominent privacy and related concepts have been in the USA and Germany and in which ways the concepts were approached, is as follows: The number of instances of the concepts in the respective corpora is an indicator of their prominence in societal discourse.
One has to be aware, however, that the data could be biased if, for example, a lot of instances of a particular lemma or phrase occur in just a few books that were not widely read in the culture of their origin or if the corpora that are compared are very uneven in this respect. Therefore, further evidence will be needed and the results of the following analyses can only provide first trends. The following queries are based on the findings of the previous sections.

Private, Privacy, Private Sphere – Privat, Privatheit, Privatsphäre

The following graphs11 display the results of the quantitative analysis of the frequency of occurrence of the terms 'private,' 'privacy,' 'private sphere' and their corresponding German terms in the English and German Google Books Ngram corpora:

The trends for the English and American English corpus are very similar. The displays for 'privacy,' for example, show that of all unigrams in the English corpus (E), about 0.00160% (0.001594 to be exact) are 'privacy' in 2008.
Almost the same frequency – 0.001590% – occurs in the American English corpus (AE). In 1800, in comparison, it was just a bit more than 0.00020% in both corpora (0.00026% E, 0.00028% AE). The graphs demonstrate that the term 'privacy' has been used more often over the centuries than 'private sphere,' both in English generally and in American English specifically. 'Private sphere' has been hardly used in either corpus. Especially the usage of 'privacy' has continuously increased since about the 1960s in the English and the American English corpus, reaching a peak in 2004 for the English corpus and in 2003 for the American English corpus. Privacy thus seems to have become a more and more important topic of US-American societal discourse over the course of time.

11 For reasons of space, not all graphs of the searches that were run on the Google Ngram corpora can be rendered below. Often only the graph for the search on the English or American English corpus is displayed.
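Because the percentages on the Y-axis are so small, it can help to express the change between two years as a ratio. The following minimal sketch does this for the shares of 'privacy' reported above for 1800 and 2008. The function and variable names are hypothetical, and the figures are simply those quoted in the text, not values retrieved from the corpus.

```python
def relative_frequency_percent(count, total):
    """Share of one n-gram among all n-grams of the same length, in percent."""
    return 100.0 * count / total


def growth_factor(earlier_percent, later_percent):
    """How many times larger the later share is than the earlier one."""
    return later_percent / earlier_percent


# Percentages for the unigram 'privacy' as quoted in the text
# (E = English corpus, AE = American English corpus).
privacy_share = {
    "E": {1800: 0.00026, 2008: 0.001594},
    "AE": {1800: 0.00028, 2008: 0.001590},
}

growth = {
    corpus: growth_factor(shares[1800], shares[2008])
    for corpus, shares in privacy_share.items()
}

for corpus, factor in growth.items():
    print(f"{corpus}: share of 'privacy' grew by a factor of about {factor:.1f}")
```

On these figures the share of 'privacy' is roughly six times higher in 2008 than in 1800 in both corpora, while still amounting to only a few occurrences per hundred thousand words.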
When we compare the results for the English corpora with the German ones, which are displayed in the graph below, we immediately see that in German the term 'Privatsphäre' ('private sphere') is the one that is used more often than 'Privatheit' ('privacy'). It is also since around the 1960s that the usage of the terms – also including 'privat' – has continuously increased, reaching temporary peaks in 2007 (for 'privat' and 'Privatsphäre') and 2002 ('Privatheit'). Even if we add up frequencies of usage for 'Privatsphäre' and 'Privatheit' we do not reach the percentages of the English term 'privacy' in the English and American English corpus. Also, the German 'privat' has been used less frequently than the (American) English 'private' according to the Ngram viewer. Might this indicate that the topic has been more prominent in US-American society? We would need further evidence to support this claim.
Kinds of Privacy

In accordance with Rössler's (2001) claim that decisional privacy is mainly important in the USA (see above), we do not get any hits for 'dezisionale Privatheit' ('decisional privacy') in the German corpus but only hits for 'informationelle Privatheit' ('informational privacy'). In the English as well as in the American English corpus, 'informational privacy' occurs much more often than 'decisional privacy.' The first hit for 'informational privacy' dates back to the year 1943 in both English corpora, and it increased considerably in usage from 1968 onwards in both corpora. Decisional privacy occurs for the first time in the English Google corpora in 1984. This could be interpreted as a contradiction to Rössler's (2001) claim that decisional privacy has always mattered more in the USA than informational privacy in comparison to Germany. If we look at the frequencies of the term in the German and English corpora and take that as an indicator of how prominently it is discussed, we could draw the conclusion that informational privacy has always been more of a topic in English than in German in general and more in the USA than in Germany specifically. In the English corpus its frequency reached a peak in 2005 (0.0000032084%), in the American English corpus in 2000 (0.0000025711), in the German corpus in 2007 (0.0000010377). It is also noteworthy that the first occurrence of 'informationelle Privatheit' ('informational privacy') in the German corpus only dates back to the year 1990, i.e. it occurs much later than in the English corpora.

'Mental privacy' is only used in the English corpora. There is no display for 'mentale Privatheit/Privatsphäre' in the German corpus. We need to investigate further whether the concept has not or rarely been current in German.
The frequency of use of 'mental privacy' reaches a peak in 1981 for both the English and American English corpus. A difference is that in the American corpus there is a first considerable increase in the use of the concept in 1837 which is far less pronounced in the English corpus.

Collocates of Privacy (1): Privacy and * – Privatheit/Privatsphäre und *

For English, the display shows that – in descending order of frequency – 'privacy' has been mentioned most often in connection with 'security,' followed by 'confidentiality,' 'freedom,' 'comfort,' 'quiet,' 'seclusion' and 'retirement.' While the collocates 'security' and 'confidentiality' have strongly increased in their circulation from about 1991 (security) and 1989 (confidentiality) onwards, the collocate 'freedom' had once been very frequent and reached a peak in about 1805. Afterwards – as a tendency – it decreased in usage. Since around the early 1960s, its frequency has increased again but it has never again been nearly as high as at the beginning of the 19th century.
In the American English corpus, 'security,' 'confidentiality' and 'freedom' are also the most frequent collocates of 'privacy.' The fact that 'security' is the strongest collocate of 'privacy' reinforces the hypothesis and the impression of the qualitative discourse analysis above that the fear of terrorism is still prominent, specifically in the USA. Also, in comparison, there are no displays at all for the corresponding German phrase 'Privatheit und Sicherheit' and the frequency of 'Privatsphäre und Sicherheit' is considerably lower than 'privacy and security' in the English corpora. The collocation has been in use in the English corpora since 1804 (E) and 1808 (AE), has constantly increased in frequency since around the 1970s and reached a first peak in 1982 in both English corpora. Therefore, it seems to have been an important issue already before 9/11. The first occurrence and following strong increase of the concept happened much later in the German corpus, only in 1992. For both the German 'Privatsphäre' and 'Privatheit,' the most frequent collocate has been 'Öffentlichkeit,' increasing in frequency since around the mid-1950s and reaching a temporary peak around 2000. All other collocates are considerably less frequent – in the order of decreasing frequency they are 'Intimität,' 'Politik,' 'Individualität,' 'Subjektivität,' 'Autonomie' (for 'Privatheit') and 'Ämter' (for 'Privatsphäre'). 'Publicity,' 'transborder,' 'autonomy' have also been noteworthy collocates for the American English corpus. 'Privacy and autonomy' and the corresponding German phrase have increased considerably in frequency since the 1960s in all corpora. Since then, the concept has been circulating much more in English than in German. 
Another noteworthy difference between the English and the German corpora is that 'freedom' occurs as a collocate in English, but not in German, while 'individuality' ('Individualität'), and 'subjectivity' (Subjektivität) occur in the German as well as in both English corpora. However, they are not among the most frequent collocates in the English corpora and occur on a lower scale there, which is comparable to that of the German display. Therefore, 'individuality' and 'subjectivity' are not displayed as collocates of privacy in the graphs for the English corpora. As for the collocates 'privacy and freedom' and 'privacy and civil liberty/liberties,' which have been prominent in the selected current US-American web discourses, we can observe that while they have been used more frequently in the English corpora over the course of time, the corresponding collocates 'Privatheit/-sphäre und Freiheit(srecht/e)' are so rare in the German corpus that there is no display for them in the Ngram viewer. The connection made between 'privacy' and 'freedom' and 'privacy' and 'civil liberty' thus seems to be a typical cognitive pattern of the English language. The link between 'privacy' and 'freedom' strongly increased between 1960 and 1980 in both corpora. The frequency reached a peak in 1980 (E) and 1976 (AE). Thinking 'privacy' and 'civil liberty' together reached a first peak at the end of the 1970s/beginning of the 1980s and has increased steeply since around 2000. 
Collocates of Privacy (2): Invasion/Violation/Breach of Privacy – Invasion/Verletzung/Eindringen in die/der Privatheit/Privatsphäre

'Invasion of privacy' is used considerably more often in the English corpora than 'violation' or 'breach' of privacy. In German, there are no hits at all for 'Invasion' and 'Bruch der Privatheit/Privatsphäre.' Here, 'Verletzung' is used more often than 'Eindringen in die Privatsphäre.' Until now, 'Invasion of privacy' was used most frequently in 1994 in both English corpora, 'Verletzung der …' and 'Eindringen in die Privatsphäre' reached a peak in 2008. 'Invasion of privacy' is used much more often in the English corpora than even 'Verletzung der Privatsphäre' and 'Eindringen in die Privatsphäre' taken together.
If you compare the frequencies of use for 'violation of privacy,' 'breach of privacy' and their more or less direct German translations 'Verletzung der Privatsphäre' and 'Eindringen in die Privatsphäre' (there are no hits for 'Missachtung/Bruch/Verstoß der/gegen die Privatheit/-sphäre'), these phrases are used more often in German than 'violation' and 'breach' in English.

Collocates of Privacy (3): Privacy protection/protection of privacy/protection of the private sphere – Schutz der Privatheit/Privatsphäre

In both the English and the German corpora, privacy protection has consistently increased as a topic of discussion since around the 1960s. It reaches a peak in the English corpora in 2004 and in the German corpus in 2005. It is the only item that occurs considerably more frequently in the German corpus. This could indicate that – at least until 2008 – privacy has been more of a topic – and therefore worry? – in Germany than in the USA, statistically speaking.

Collocates of Privacy (4): Post-Privacy and Privacy is Dead/The End of Privacy

'Post(-)privacy' occurs neither in the German nor in the English corpora. One has to be careful not to count 'post privacy' in phrases like "to post privacy policies [on a web site]." As the post-privacy movement started around the year 2009 according to wikipedia (Anon 2014) and at the end of 2007 at the earliest (cf. Deyhle 2013, 2), it is not surprising that the concept does not occur in the Google corpus until 2008. We will have to check in a few years whether the concept has reached a circulation that is high enough to render a display in the Ngram viewer. The phrase 'privacy is dead' occurs for the first time in 1983 in the English corpus only, i.e. not in the American English one,12 hitting a high in 1999.
There are no hits for either the original English version or for possible translations of the English phrase 'privacy is dead' (die 'Privatheit/Privatsphäre ist tot/gestorben,' 'Tod der Privatheit/Privatsphäre') in the German corpus. '[The] end of privacy' is used in the English corpus for the first time in 1843 and increased steeply in frequency from 1992 until 2008. The first occurrence in the American English corpus is in 1885, a peak occurring in 2001 and since then the frequency has diminished again. There are no hits for '[the] end of the private sphere' in either of the English corpora. Das 'Ende der Privatheit/Privatsphäre' (the 'end of privacy/of the private sphere') occurs for the first time in 1973, reaching a peak in 2002. The phrase 'Ende der Privatheit' (the 'end of privacy') reached a much higher frequency in the German corpus in 2002 than has ever been reached in any of the English corpora.

12 There is no display for 'privacy is dead' in the American English corpus as the phrase occurs not more than 40 times (see above) there.

5.3 Empirical Study (2): Google Advanced Search

In this section the results of the empirical study (1) above will be partly tested by running a Google Advanced Search. This will reinforce or weaken conclusions that were drawn from the above findings. It is important to note, however, that web searches do not comprise all existing web pages13 and that the number of search string matches reported by search engines is only an estimate.

For example, Google will only calculate the actual number of matches once the user navigates through all result pages, to the last one, and even then it places restrictions on the figure. […] For search terms that return many results, Google uses a process that eliminates results which are "very similar" to other results listed, both by disregarding pages with substantially similar content and by limiting the number of pages that can be returned from any given domain. […] Further, Google's list of distinct results is constructed by first selecting the top 1000 results and then eliminating duplicates without replacements. Hence the list of distinct results will always contain fewer than 1000 results regardless of how many webpages actually matched the search terms.
(Anon 2014, "Wikipedia: Search Engine Test")

With Google Advanced Search it is possible to look for exact phrases and to restrict the search to English documents from the USA only and to German documents from Germany only, respectively. This procedure was included in the following analysis. As Google's estimate can change and is more reliable when navigating "through all result pages" (see quotation above), this was done and differing estimates were noted in the tables below. Calculations were always made with the second, more reliable number in the respective columns. It should be noted as well that the World Wide Web is only 25 years old (cf. Owen 2014) and that it is likely to be biased towards current and recent documents, although it does not exclude older books and documents that were scanned, for example. In a way, empirical study (2) complements empirical study (1) in that the database includes documents about privacy issues from 2008 up to the present day, which are not part of the Google Ngram corpus that was used in empirical study (1).

We ran Google Advanced Searches with the settings 'English' and 'German,' 'all regions' to be able to generally compare frequencies for items of the English and German language. The numbers resulting from these queries are followed by the abbreviations '(E)' and '(G),' respectively. We also ran Google Advanced Searches with the settings 'English' and '[region] USA' (indicated as '(E/USA)'), as well as 'German' and '[region] Germany' '(G/G).' This rendered results that included websites in English or German only and which were published in the USA or Germany, respectively. With this setting we hope to be able to make estimates about the circulation of certain concepts in the USA and Germany. One also has to take into account, however, that there are more web pages in English and from the United States than German web pages and sites from Germany.
Based on an empirical study that included about 2 billion webpages, Ebbertz (2002)14 states that 56.4% of the contents of webpages are in English, 7.7% in German. According to W3Techs,15 whose figures are updated daily, English contents made up 55.6% of all webpages and German contents 6% on March 7, 2014.16 What would be estimates for the percentage of web pages from the USA and Germany, respectively? Such estimates are difficult to find. Pingdom (2012) comes to the conclusion – based on an empirical study – that about 43% ("of the world's top 1 million websites") "are hosted in the United States" and about 8% – the second largest number – are hosted in Germany. This is the best estimate we could find and it seems plausible in comparison to the percentages given for websites written in English or German (see above). The respective relative numbers for all calculations are given in the fourth column in the tables below.

13 "Many, probably most, of the publicly available web pages in existence are not indexed. Each search engine captures a different percentage of the total. Nobody can tell exactly what portion is captured. The estimated size of the World Wide Web is at least 11.5 billion pages, but a much deeper (and larger) Web, estimated at over 3 trillion pages, exists within databases whose contents the search engines do not index. […] Google, as all search engines should, follows the robots.txt protocol and can be blocked by sites that do not wish their content to be indexed or cached by Google" (Anon 2014, "Wikipedia: Search Engine Test").

14 http://www.netz-tipp.de/sprachen.html and http://www.netz-tipp.de/languages.html (7 March 2014).
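The normalization behind the relative numbers (spelled out in footnote 17, with worked figures in footnotes 19 and 20) can be sketched as follows. This is an illustration of the stated arithmetic only; the names `SHARE_PERCENT` and `relative_hits` are hypothetical, and the shares are the W3Techs and Pingdom percentages quoted above.

```python
# Shares of the web in percent, as quoted in the text:
# W3Techs (7 March 2014) for languages, Pingdom (2012) for hosting countries.
SHARE_PERCENT = {
    "E": 55.6,      # English-language content
    "G": 6.0,       # German-language content
    "E/USA": 43.0,  # websites hosted in the United States
    "G/G": 8.0,     # websites hosted in Germany
}


def relative_hits(absolute_hits: int, corpus: str) -> int:
    """Divide an absolute hit count by the corpus share, rounded to whole hits."""
    return round(absolute_hits / SHARE_PERCENT[corpus])


print(relative_hits(16_000_000, "E"))     # 287770, as in footnote 19
print(relative_hits(3_290_000, "E/USA"))  # 76512, as in footnote 20
```

Dividing by the percentage share puts the larger English and US corpora on the same scale as the smaller German ones, so the resulting figures are comparable with one another but are not hit counts in any absolute sense.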
Finally, it has to be considered that the numbers given in the tables below are based on Advanced Google Searches that were (re-)run from mid-April until mid-May 2014. As new websites are constantly added, the results can vary when the analyses are repeated later. All of this generally shows how careful one must be when using the Google corpora. The basic problem is that we would need much more and more precise data about the make-up of the web corpora Google draws upon and also better possibilities to check the correctness of Google's claims.

The Google Advanced Search for the exact phrases 'invasion of privacy,' 'Invasion der Privatheit,' 'Invasion der Privatsphäre' as well as 'violation of privacy,' 'Verletzung der Privatheit,' 'Verletzung der Privatsphäre,' renders the following absolute results which are displayed in the second and third column of the following table.

Results for Google Advanced Search

phrase | absolute number of hits | absolute number of hits for German phrases added up | relative number of hits [17]
invasion of privacy (E) | 9,080,000/16,000,000 | | 287,770 [19]
invasion of privacy (E/USA) | 3,290,000/- [18] | | 76,512 [20]
Invasion der Privatheit | | |
Invasion des Privaten (G) | 9,910/9,990 | |
Invasion des Privaten (G/G) | 3,400/- [21] | |
Invasion der Privatsphäre (G) | 19,100/- | 29,090 | 4,848
Invasion der Privatsphäre (G/G) | 10,300/- [22] | 13,710 | 1,714
Eindringen in die Privatheit (G) | 293/299 | |
Eindringen in die Privatheit (G/G) | 286 | |
Eindringen in die Privatsphäre (G) | 84,800/84,300 | 84,599 | 14,100
Eindringen in die Privatsphäre (G/G) | 56,200/56,900 | 57,186 | 7,148

15 "W3Techs is a division of Q-Success Web-based Services. The goal is to collect information about the usage of various types of technologies used for building and running websites, and to produce and publish surveys that give insights into that subject. […] [the] company has no affiliation with any of the technology providers, which […] [they] cover in […] [their] surveys" (W3Techs 2014, http://w3techs.com/about).
16 http://w3techs.com/technologies/overview/content_language/all.

17 Here, one needs to take into account that English contents make up 55.6% of all webpages and German contents 6% and that websites from the USA have a share of about 43% of all websites and those from Germany about 8%. So, for example, the absolute number of hits for "invasion of privacy" is divided by 55.6 to get the relative number of hits for this phrase in the English corpus.

18 When I ran the search again a few days later, the search led to different numbers: 3,440,000/7,220,000. This underlines the above cited criticism that the results can vary considerably when there is a large number of hits. In the context of this study I decided to go with the most conservative estimate and thus the lowest number. It still shows that "invasion of privacy" is a phrase that is much more used in (American) English than in German.

19 16,000,000 : 55.6 = 287,769.78 ≈ 287,770

20 3,290,000 : 43 = 76,511.627 ≈ 76,512

21 When I checked the count a few days later, I got 3,330/3,450, which demonstrates that the discrepancy is not that pronounced for a lower number of hits (see criticism above).

22 Here, the count was the same a few days after the first one.

The quantitative analysis shows that the metaphor of 'invasion' is used almost 60 times more often in English than in German in connection with privacy and about 45 times more often on websites from the USA than from Germany. This suggests that it is a more widespread metaphor and pattern of thought in English and in the USA than in German and in Germany. This is consistent with the Google Ngram analysis presented above. The German term 'Eindringen' lacks the military meaning component of 'Invasion' (cf.
for example duden.de 2013) and can be used neutrally in the sense of "sich einen Weg bahnend in etwas dringen, hineingelangen" [roughly: 'to penetrate into something by forcing a path, to get inside'] (duden.de 2013), as well as in the senses of forcing one's way into a place one is not permitted to enter and of threatening somebody as a means to that end (cf. duden.de 2013). Still, 'Eindringen' is used about 20 times less often than English 'invasion.' Therefore, according to the web analysis, the thought pattern of a smaller-scale intrusion into private space seems to be more widespread in German than the mental image of a large-scale military operation. Taken together, German 'Invasion' and 'Eindringen' are used 15 times less often than the English term 'invasion.'

With these results, the hypothesis can be formulated that in English intrusions into privacy are typically regarded as planned and coordinated actions. An 'invasion' in the literal sense of the word is commanded by a government, and therefore one could assume that the government is regarded as the main enemy in this respect, the one most likely to 'invade' one's privacy. In German, this fear seems to be much less widespread. 'Eindringen' is the more frequent collocation of 'Privatheit' in comparison to German 'Invasion.' This suggests that here any individual and any group could be intruding into one's private space.

Let us now take a closer look at the usage of 'violations' and 'Verletzungen' of privacy in English and German. The frequency of use for 'violation of (the) privacy/private sphere' in the English corpus without regional specification is about 24 times higher than the frequency for the corresponding German phrase 'Verletzung der Privatheit/-sphäre' in the regionally unspecified German corpus. The frequency of occurrence is about 12 times higher in the US corpus than in the German corpus which is restricted to websites from Germany.
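The ratios quoted above for 'invasion' and 'Eindringen' follow directly from the relative hit counts in the first table. A minimal sketch of that arithmetic is given here; the variable names are labels chosen for this illustration and do not appear in the study.

```python
# Relative hit counts from the first table (fourth column).
invasion_e = 287_770     # invasion of privacy (E)
invasion_usa = 76_512    # invasion of privacy (E/USA)
invasion_g = 4_848       # German 'Invasion' phrases added up (G)
invasion_gg = 1_714      # German 'Invasion' phrases added up (G/G)
eindringen_g = 14_100    # German 'Eindringen' phrases added up (G)

# How many times more frequent the English phrase is than each German group.
print(round(invasion_e / invasion_g, 1))                   # 59.4 -> "almost 60 times"
print(round(invasion_usa / invasion_gg, 1))                # 44.6 -> "about 45 times"
print(round(invasion_e / eindringen_g, 1))                 # 20.4 -> "about 20 times"
print(round(invasion_e / (invasion_g + eindringen_g), 1))  # 15.2 -> "15 times"
```

All four ratios compare relative, share-adjusted numbers, so they inherit the uncertainty of the share estimates as well as that of Google's hit counts.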
Results for Google Advanced Search

phrase | absolute number of hits | number of hits for English, respectively German phrases added up | relative number of hits
violation [23] of privacy (E) | 3,680,000/15,300,300 | |
violation of privacy (E/USA) | 2,310,000/2,310,000 | |
violation of the private sphere (E) | 880,000/610,000 | 15,910,300 | 286,157
violation of the private sphere (E/USA) | 592,000/605,000 | 2,915,000 | 67,791
Verletzung der Privatheit (G) | 792/788 | |
Verletzung der Privatheit (G/G) | 435/1,190 [24] | |
Verletzung der Privatsphäre (G) | 438,000/70,500 | 70,935 | 11,823
Verletzung der Privatsphäre (G/G) | 283,000/44,100 | 45,290 | 5,661

Other English and German expressions that can render the idea of a violation of privacy are 'breach of privacy' and 'Missachtung/Bruch/Verstoß der Privatheit/-sphäre' in German. Interpreting the table below, 'breach of privacy' is used three times as frequently in English as 'Missachtung/Bruch/Verstoß der Privatheit/-sphäre' in German. It is used 16 times more frequently in the US corpus than in the German corpus.

23 In German also "Verstoß" and "Bruch" (see following table).

24 I will use the first calculation here (435) as it seems to be more consistent with the results for the German corpus without regional specification.
Results for Google Advanced Search

phrase | absolute number of hits | number of hits for English, respectively German phrases added up | relative number of hits
breach of privacy [25] (E) | 3,380,000/3,400,000 | |
breach of privacy (E/USA) | 1,830,000/1,450,000 | |
breach of the private sphere (E) | 7/- | 3,400,007 | 61,151
breach of the private sphere (E/USA) | 7/- | 1,450,007 | 181,251
Missachtung [26] der Privatheit (G; G/G) | 7/- | |
Missachtung der Privatsphäre (G) | 37,300/32,600 | |
Missachtung der Privatsphäre (G/G) | 21,900/- | |
Bruch der Privatheit (G; G/G) | 2/- | |
Bruch der Privatsphäre (G) | 22,300/19,000 | |
Bruch der Privatsphäre (G/G) | 12,300/12,000 | |
Verstoß gegen die Privatheit (G; G/G) | 7/- | |
Verstoß gegen die Privatsphäre (G) | 63,700/- | 115,316 | 19,219
Verstoß gegen die Privatsphäre (G/G) | 56,200/55,500 | 89,416 | 11,177

Adding up the results for 'breach' and 'violation' of privacy/the private sphere (= 347,308) and the corresponding German phrases (= 31,042), we can deduce that the violation of privacy is a topic that has a circulation about 11 times stronger in English than in German. When we examine the regionally specified corpora we can see that the frequency in the US corpus (= 249,042) is almost 15 times higher than in the corpus based on the webpages from Germany (= 16,838). How current, then, is the concept of the protection of privacy in German and English?
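Before turning to that question, the aggregate figures just quoted can be retraced from the relative hit counts in the 'violation' and 'breach' tables. The sketch below is an illustration only; the names `reported` and `combined` are hypothetical. Its last lines cross-check one table entry against the shares stated in footnote 17: dividing the US total of 1,450,007 by the 43% share gives about 33,721, whereas the reported 181,251 matches a division by 8. Whether this is a slip in the table or an artifact of how the table has been transmitted cannot be decided here, but under the 43% reading the combined US/Germany ratio would be closer to 6 than to 15.

```python
# Relative hit counts as reported in the 'violation' and 'breach' tables.
reported = {
    ("violation", "E"): 286_157,
    ("violation", "E/USA"): 67_791,
    ("violation", "G"): 11_823,
    ("violation", "G/G"): 5_661,
    ("breach", "E"): 61_151,
    ("breach", "E/USA"): 181_251,
    ("breach", "G"): 19_219,
    ("breach", "G/G"): 11_177,
}


def combined(corpus: str) -> int:
    """Sum the 'violation' and 'breach' relative hits for one corpus."""
    return reported[("violation", corpus)] + reported[("breach", corpus)]


english, german = combined("E"), combined("G")
usa, germany = combined("E/USA"), combined("G/G")

print(english, german, round(english / german, 1))  # 347308 31042 11.2
print(usa, germany, round(usa / germany, 1))        # 249042 16838 14.8

# Cross-check of the 'breach' (E/USA) entry against the stated shares.
breach_usa_absolute = 1_450_007
by_us_share = round(breach_usa_absolute / 43)      # 33721
by_german_share = round(breach_usa_absolute / 8)   # 181251, the reported value
print(by_us_share, by_german_share)

# Combined US/Germany ratio if the 43% share is applied to the 'breach' total.
usa_alternative = reported[("violation", "E/USA")] + by_us_share
print(round(usa_alternative / germany, 1))  # 6.0
```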
Results for Google Advanced Search

phrase | absolute number of hits | number of hits for English, respectively German phrases added up | relative number of hits
protection of privacy (E) | 951,000/948,000 | |
protection of privacy (E/USA) | 251,000/- | |
protection of the private (E) | 88,000,000/139,000,000 | |
protection of the private (E/USA) | 38,100,000/42,900,000 | |
protection of the private sphere (E) | 1,770,000/1,880,000 | 141,828,000 | 2,550,863
protection of the private sphere (E/USA) | 1,050,000/- | 44,201,000 | 1,027,930
Privatheitsschutz (G) | 7,430/- | |
Privatheitsschutz (G/G) | 7,030/6,970 | |
Schutz der Privatheit (G) | 38,800/- | |
Schutz der Privatheit (G/G) | 27,900/27,600 | |
Schutz des Privaten (G) | 60,800/57,200 | |
Schutz des Privaten (G/G) | 272,000/- [27] | |
Schutz der Privatsphäre (G) | 5,450,000/673,000 | 991,230 | 165,205
Schutz der Privatsphäre (G/G) | 2,950,000/296,000 | 602,570 | 75,321

25 I did not include the searches for "breach of the private" and the corresponding German phrases as they also render hits where "private" is used as an adjective like in the hits "breach of the private pleasure warranty/agreement/tracker community's trust" or in the German phrases "Missachtung des privaten Rundfunks/Parkplatzes/Überfahrtrechts."

26 The hits include the spelling "Mißachtung."

The results rendered in the table above demonstrate clearly that the protection of privacy is debated about 15 times more in the English discourses than in the German ones and about 14 times more on the US websites than on those from Germany. This contradicts the results of the Ngram analyses above, where the topic 'protection of privacy' occurred more in the German than in the English corpora. This was interpreted as an indicator that – "at least until 2008 – privacy has been more of a topic – and therefore worry? – in Germany rather than in the USA, statistically speaking" (see page 26 of this article). The results of the Google Advanced Search might indicate
that this has changed and that privacy protection has recently been debated more in the USA than in Germany.

27 The number of hits for the German corpus with regional specification to Germany exceeds that of the German corpus without regional specification. This, however, cannot be, so there must be some fault in Google's calculation (which always remained the same when repeated). I will therefore go with the number 272,000 (G/G) as it seems more plausible.

Results for Google Advanced Search

phrase | absolute number of hits | relative number of hits
post-privacy (E) | 235,000/221,000 | 3,975
post-privacy (E/USA) | 272,000/187,000 | 4,349
post-privacy (G) | 24,700/23,400 | 3,900
post-privacy (G/G) | 16,100/15,800 | 1,975

The relative number of hits for 'post-privacy' is almost identical for the German and English corpus, and the term is about twice as frequent in the US corpus as in the corpus of websites from Germany. This contradicts Deyhle's (2013) claim that it is mainly a German phenomenon.

A general caveat is in order at the end of this section with regard to the results of the Google Advanced Search and the conclusions that can be drawn from it. The data can give an impression of the statistical distribution of privacy collocations on webpages in English and German and on webpages from the USA and Germany. They allow for first conclusions about the cognitive framing of the concept of privacy in English and German and about the circulation of certain concepts. The results suggest, statistically speaking, that Germans and US Americans differ in their perspectives on privacy in certain aspects. However, we cannot draw conclusions as to whether or not the results of the web analyses are representative for the discourses taking place in German and US society, respectively.

6. Conclusion and Outlook: The Value of Privacy

To summarize the major results of this study: first of all, the study of the semantic-historical development of privacy has shown that it has developed from a concept that lacks something – i.e. the public dimension – to a right that is worth protecting and defending against intrusion, observation and disturbance by individuals, groups, society and the government. The metaphors used in the privacy discourses reflect our thinking on the topic and they depict privacy – in both the German and US-American discourses – as something valuable and mainly as a protected space with boundaries that can be violated and must be protected. The post-privacy movement sees this differently and does not think that privacy needs protection.

The Google Ngram analysis, which renders frequencies for the occurrence of items in the Google corpora between 1800 and 2008, showed that 'privacy' and 'private sphere' have been used more often in the English corpora than 'Privatsphäre' (which is the more common term in German) and 'Privatheit' in the German corpus. 'Private' has also been used more often in the English corpora in comparison to German 'privat.' This could be interpreted as an indicator that the topic has always been more prominent in US-American society. We would need further research to support this claim. 'Informational privacy' is considerably more frequent in both English corpora than 'decisional privacy.' This contradicts Rössler (2001), who claimed that decisional privacy is more important in the USA than informational privacy. Again, we must emphasize that further evidence is needed to substantiate this hypothesis, as pure quantity of occurrence in Google Books is not yet a sure sign of how important each kind of privacy is for a society.
'Decisional privacy' does not occur28 in the German corpus or it is at least so rare that it is not displayed in the Ngram viewer. This is consistent with Rössler's (2001) assumption that it has not been very important for Germans. In addition, a noteworthy difference is that we get the first hit for 'informational privacy' in 1943 in the English corpora, but only in 1990 for the German corpus. So at least the technical term seems to have been in use much earlier in the USA. Also, the concept of 'mental privacy' does not occur in the German corpus. The most frequent collocates of 'privacy' are 'security,' 'confidentiality' and 'freedom' in both English corpora. The collocate 'security' has strongly increased in its circulation from around 1991 onwards. Therefore, privacy and security have not just been a prominent issue in the USA since 9/11 (2001) but already earlier. However, the findings do not contradict the claim concerning the discourses presented in the qualitative analysis that the fear of terrorism has been especially strong in the USA since 9/11. 'Privatheit und Sicherheit' does not occur in the German corpus (or is so rare that it is not displayed in the Ngram viewer, see above). 'Freedom' and 'civil liberty,' which are also collocates in the English corpora, do not occur in the German corpus either. For German, the most frequent collocate of privacy ('Privatheit/ Privatsphäre') is 'Öffentlichkeit.' Both the empirical analyses (1) and (2) show that 'invasion' is used considerably more often in the English corpora than the corresponding German 'Invasion' in the German corpus. This indicates that breaches of privacy are conceived of as planned and coordinated large-scale military operations in English, which makes the government the first suspect when it comes to violations of privacy. In contrast, the German terms that are used instead of 'Invasion' in connection with privacy suggest less aggressive trespasses on a minor scale. 
Also, all the German collocates for the idea of an invasion or violation of privacy taken together are less frequent than the English collocates (in both corpora). This is also a consistent result of both empirical analyses (1) and (2). The Ngram analysis in contrast shows a higher frequency of the concept protection of privacy in German until 2008, whereas the Advanced Search renders a higher frequency for the English corpora. It might be that the Advanced Search corpus is biased towards more recent documents and that this could be interpreted as an indicator that the protection of privacy has become a more important topic in the USA in recent years.

28 When I write that something does not occur in one of the Ngram corpora it is shorthand for 'does not occur at all or is not displayed by the Ngram viewer as the frequency of the item equals or is below 40' (see above).

In a follow-up study, the dates that attracted attention in the Google Ngram analysis should be looked into, i.e. the dates that marked first occurrences or the beginnings of steep increases of item frequencies. It was repeatedly noted, for example, that privacy and some related concepts increased considerably in the corpora from the 1960s onwards. In the late 1960s, "[t]he first laws that addressed the protection of personal data were adopted in the United States […]. Since that time, the Federal government and the States have passed hundreds of laws and regulations that pertain to the collection, processing, sharing or use of personal data" (IT Law Group 2013). When former Vice President Lyndon B. Johnson became president in 1963, he "pressed for civil rights legislation" (Anon 2014, "1960s"). In Germany, in Hesse, the first data protection law worldwide was passed in 1970. This had been preceded by societal debates – mainly led by jurists – about the new possibilities for administrations and also the threats to privacy that the latest technological developments brought with them, i.e. computers that allowed for the storage of large amounts of data. Critics argued for the protection of personal data and – taking up debates from US society – pointed out the dangers of central data gathering and processing by the government and by companies. Furthermore, they already warned of the potential risks to the privacy of the individual and to democracy (cf. Berlinghoff 2013). Therefore, if the socio-historical context were investigated in more depth, we should find explanations for significant changes in frequencies of items.

Let us conclude with some general reflections on the value of privacy in our times of digital communication and data mining. We should keep in mind – as Weinstein (2013) argues – that we are social beings and that discretion is a natural part of the social sphere. Privacy is an essential feature of personhood. It is valuable because we regard autonomy as something precious, and because autonomy can only be achieved when we can protect our privacy. Kant, Mill, Locke – they all proclaim that individual civil rights and liberties are necessary to protect our modern idea of autonomy and freedom against governments and society (cf. Rössler 2001, 28). The negotiation of the borders of privacy is difficult where individual freedom is opposed to the collective good (cf. Rössler 2001, 28). In the case of data protection, it is the goal of fighting crime or terrorism. In the case of genome sequencing, it is the promise to be able to treat better, cure or even prevent diseases in the future.
Whenever we communicate, we partly decide how much access we grant other people to our thoughts and feelings; partly, we grant insights by accident or unconsciously. In the modern digital age it has become more difficult to protect our privacy. McLuhan has described media as extensions of the human senses or as extensions of our self (cf. McLuhan 1968, 13; 1997, 112). Language is a medium that allows for the communication of our most personal or private thoughts and feelings. Digital media allow for more and more far-reaching extensions of our selves: We can easily communicate with hundreds and thousands of people globally using basic technological equipment. The problem of communicating things about us involuntarily has now increased as we have no control over automatic big data collection and analyses by internet companies. As Marwick states: "Big data is made up of 'little data,' and these little data may be deeply personal" (2014, 22). We have already partly lost control over what we communicate to whom, how, when and in what context. The more the medial extension grows, the more difficult it becomes to control what parts of our selves we communicate and what kind of image we establish. On the one hand, one could say that this might enhance authenticity. On the other hand, it can also lead to the opposite: We are in danger of monitoring and filtering much more than ever before to make sure that we do not communicate anything about us that might be perceived as negative or that might not be 'liked' by the majority. I think this is reflected, for example, in political correctness that has gained considerable momentum in our societies in recent years. So the more publicity you get, the greater the possibility that you will encounter criticism and the greater the possibility that you will censor yourself. Consequently, you present of yourself what you assume will be liked by (most) others. This can lead to a loss of autonomy and to conformity in thinking.
I agree with Grehsin (2014), for example, who states that we cannot form an opinion without communication, which is then a prerequisite for a diversity of opinions and for the possibility to change one's opinion. This is true as long as we are free in our communication: Surveillance and control of communication can easily lead to self-restriction and ultimately to the end of the democratic state (cf. Grehsin 2014, 35). Galison (2014) cites Freud's insight that political censorship leads to people censoring themselves before official censorship takes place. They avoid 'dangerous' statements. Even inner censorship can occur in that non-conformist thoughts are stopped or fought (Galison 2014, 9). It is thus consistent to conceive of data protection as protection of our individual personalities and as protection of our freedom of opinion (cf. also Thüsing 2013, 7).

Gelernter (2013) warns us of an age of 'robotism' in which individuality and subjective perspectives are lost and a society based on 'mass psychology' prevails, in which we are 'dogs with iPhones' ["Hunde mit iPhones"]. He calls for a new subjective humanism (cf. Gelernter 2013, 34). Welzer warns us of an 'informational totalitarianism' that has already developed and led to the total transparency of individual existences, which is the prerequisite for gaining complete control over a person's behavior. He observes that a transformation of our social system has already taken place in that general rules of everyday life have changed as well as standards about acceptable social behavior. He sees freedom and self-determination as threatened, as well as the social preconditions for democracy. As a counter-strategy, he advocates the withdrawal of information.
He suggests, for example, using neither the internet nor the telephone when you want to communicate something you consider to be important. In addition, he proposes that one might generate an abundance of meaningless information to make it more difficult or impossible to extract useful data (cf. Welzer 2014, 9). The first suggestion, however, seems to be unrealistic, especially for younger people, and the second proposal probably underestimates the technological means and possibilities. Personal information that is particularly sensitive is information that might hurt you as a person: religion, sexual orientation, political inclination, personal activities (what you buy, your finances, for example) (cf. Anon 2014, "privacy"), as well as health-related data. These data are considered by most to be particularly in need of protection, especially in professional contexts, as they can easily lead to discrimination, damages to people's reputation and embarrassment (Anon 2014, "privacy"). These areas of sensitive data seem to be quite consistent to me over time and space, historically and cross-culturally, while the degree of sensitiveness can vary considerably. Lynch (2013) poses the question of how technologies change the way we think about the self. As I think that this self is more endangered than ever, a major question is whether it is still possible to exert individual control over access to personal thoughts and feelings without help from the market, the government, and legislation. If we want to regain control over our privacy we need to know at all times who has access to information and what the information is used for (cf. Weinstein 2013; see above). Schaar warned us already in 2007 that our information society must not become a surveillance society and that we must remain the subject, not the object of information (cf. Schaar 2007, 230). This is currently the major challenge we must confront. 
The boundaries of our informational self-determination – you could also say privacy – must be drawn. This is the responsibility of the government, the economy, and the individual. Data-gatherers must restrict themselves and take responsibility for the processing of the collected data (cf. Schaar 2007, 230f.) and they should make clear what the data are used for (cf. Bernau 2013, 17). In addition, it is necessary that each individual deals responsibly with their personal data. This includes being informed about the gathering of personal data and self-regulation, i.e. one should be careful to whom one gives which data. Data protection should be included in the planning, development and design of technical systems from the very beginning (cf. Schaar 2007, 231f.). Privacy protection should be a global concern like the protection of our environment as data cross national borders (cf. Schaar 2007, 237; see also Berg and Mausbach 2013, 7). Schaar wrote in 2007 that he hopes we do not need a privacy catastrophe (in analogy to the environmental catastrophe in Chernobyl) before this insight develops (cf. Schaar 2007, 237). The NSA scandal might be categorized as such a catastrophe and it has already led to new insights in the USA. It is to be hoped that further catastrophes can be prevented by a responsible handling of privacy.

Works Cited

Anon. "Big Data." 2014. webopedia. 23 April 2014 <http://www.webopedia.com>.
—. "Big Data Analytics." 2014. webopedia. 23 April 2014 <http://www.webopedia.com>.
—. "CeBIT." 19 March 2014. wikipedia. 25 April 2014 <http://en.wikipedia.org/wiki/CeBIT>.
—. "Culturomics." 22 May 2014. wikipedia. 23 May 2014 <http://en.wikipedia.org/wiki/Culturomics>.
—. "Data." 2014. thefreedictionary. 23 April 2014 <http://www.thefreedictionary.com>.
—. "Data Mining." 2014.
thefreedictionary. 23 April 2014 <http://www.thefreedictionary.com>.
—. "Data Mining." 2014. webopedia. 23 April 2014 <http://www.webopedia.com>.
—. "Digital." 2014. thefreedictionary. 23 April 2014 <http://www.thefreedictionary.com>.
—. "Digital Communication." 2014. thefreedictionary. 23 April 2014 <http://www.thefreedictionary.com>.
—. "English Language & Usage Stack Exchange Meta." 2014. English Language & Usage Stack Exchange. 6 May 2014 <questions/2469/should-we-allow-googlengrams-to-be-presented-as-statistical-evidence-without-qu>.
—. "Google Books Ngram Corpus and Viewer." Ed. Google. 2013. 18 June 2014 <https://books.google.com/ngrams>.
—. "Much More than just a Numbers Game." 2014. CeBIT. 25 April 2014 <http://www.cebit.de/en/news-trends/trends/datability/index.xhtml>.
—. "Post-Privacy." 14 February 2014. wikipedia. 18 May 2014. <http://de.wikipedia.org/wiki/Post-Privacy>.
—. "Privacy." 16 May 2014. wikipedia. 24 January 2014. <http://en.wikipedia.org/wiki/Privacy>.
—. "Protecting Civil Liberties in the Digital Age." 2013. ACLU. 23 April 2014 <https://www.aclu.org/protecting-civil-liberties-digital-age>.
—. "Wikipedia: Search Engine Test." 15 April 2014. wikipedia. 15 May 2014. <http://en.wikipedia.org/wiki/Wikipedia:Search_engine_test>.
—. "1960s." 21 May 2014. wikipedia. 22 May 2014. <http://en.wikipedia.org/wiki/1960ies>.
Berg, Manfred and Wilfried Mausbach. "Wie der Prinz in seinem Schloss?" Frankfurter Allgemeine Zeitung. 9 September 2013. 7.
Berlinghoff, Marcel. "Computerisierung und Privatheit – Historische Perspektiven." 3 April 2013. Aus Politik und Zeitgeschichte (APUZ 15-16). 18 May 2014 <http://www.bpb.de/apuz/157542/computerisierung-und-privatheit-historischeperspektiven?p=all>.
Bernau, Varinia. "Der Mensch in Zahlen." Süddeutsche Zeitung. 26 August 2013. 17.
Bok, Sissela.
Secrets: On the Ethics of Concealment and Revelation. New York, NY: Vintage, 1983.
Bojaryn, Jan. "Das Paradox Privatsphäre." Frankfurter Allgemeine Zeitung. 10 December 2013. V1.
Brandstetter, Barbara. "Behälter, Clubs, Kreise und verschiedene Geschwindigkeiten: Metaphern für die Konstruktion Europas." metaphorik.de 17 (2009): 7-25.
Derman, Emanuel. "Ich möchte meine Freiheit zurück." Frankfurter Allgemeine Zeitung. 13 June 2013. 25.
Deyhle, Ruben. "Post Privacy." 18 March 2013. sprachkonstrukt.de. 25 April 2014 <http://sprachkonstrukt.de/files/2012/08/deyhle_sicheresysteme_postprivacy.pdf>.
Duden.de. Ed. Bibliographisches Institut. 2013. 18 April 2014 <http://www.duden.de>.
Ebbertz, Martin. "Das Internet spricht Englisch … und neuerdings auch Deutsch: Sprachen und ihre Verbreitung im World-Wide-Web." 2002. Scribd. 23 April 2014 <http://de.scribd.com/doc/214373212/Ebbertz-M-2002-Das-Internet-SprichtEnglisch>.
Eschkötter, Daniel. The Wire. Zürich: diaphanes, 2012.
Galison, Peter. "Wir werden uns nicht mehr wiedererkennen." Frankfurter Allgemeine Zeitung. 8 April 2014. 9.
Gelernter, David. "Der Robotismus als soziale Krankheit." Frankfurter Allgemeine Zeitung. 17 September 2013. 34.
Goatly, Andrew. Washing the Brain: Metaphor and Hidden Ideology. Amsterdam: John Benjamins, 2007.
Goldberg, Yoav and Jon Orwant. "A Dataset of Syntactic-ngrams over time from a very Large Corpus of English Books." 17 May 2014 <http://commondatastorage.googleapis.com/books/syntactic-ngrams/syntngrams.final.pdf>.
Google, ed. Google Books Ngram Viewer. 2013. 15 May 2014. <https://books.google.com/ngrams>.
Grehsin, Malte. "Deutschland hat angefangen." Frankfurter Allgemeine Zeitung. 4 January 2014. 35.
Hannan, Daniel. "The Case for Swedish-Style Tax Transparency." 13 April 2012. The Telegraph. 5 April 2014 <http://blogs.telegraph.co.uk/news/danielhannan/100150768/if-tax-transparency-turns-us-into-scandinavians-so-much-the-better/>.
Heller, Christian.
Post-Privacy: Prima leben ohne Privatsphäre. München: C.H. Beck, 2011. IT Law Group, ed. "Compliance with Privacy Laws." 2013. 18 May 2014. <http://www. itlawgroup.com/ compliance-with-privacy-laws>. Jarvis, Jeff. "The German Paradox: Privacy, Publicness, and Penises." 14 April 2010. re:publica. 28 January 2013 <http://re-publica.de/10/event-list/the-german-para dox/>. Anglistik, Volume 25 (2014), Issue 2 © 2014 Universitätsverlag WINTER GmbH Heidelberg Powered by TCPDF (www.tcpdf.org) PRIVACY IN TIMES OF DIGITAL COMMUNICATION AND DATA MINING 37 Lin, Yuri, Jean-Baptiste Michel, Erez Lieberman Aiden, Jon Orwant, Will Brockman, and Slav Petrov. "Syntactic Annotations for the Google Books Ngram Corpus." Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. Jeju: Association for Computational Linguistics, 2012. 169-174. 17 June 2014 <http://www.petrovi.de/data/acl12b.pdf>. Lynch, Michael. "Privacy and the Threat to the Self." 22 June 2013. The New York Times Opinionator. 22 April 2014 <http://opinionator.blogs.nytimes.com/2013/06/ 22/ privacy-and-the-threat-to-the-self/?php=true&_type =blogs&_r=0>. Marwick, Alice. "How your Data are being Deeply Mined." The New York Review of Books 61.1 (2014): 22-24. McLuhan, Marshall. Die magischen Kanäle: Understanding media. Düsseldorf: Econ, 1968. —. Medien verstehen: Der McLuhan-Reader. Eds. Martin Baltes et al. Mannheim: Bollmann, 1997. Michel, Jean-Baptiste, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, William Brockman, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. "Quantitative Analysis of Culture using Millions of Digitized Books. Science 331.6014 (2011): 176-182. New Shorter Oxford English Dictionary. Ed. Lesley Brown. Oxford: Clarendon, 1993. Online Etymology Dictionary. Ed. Douglas Harper. 2001-2014. 
23 April 2014 <http://www.etymonline.com/index.php?allowed_in_frame=0&search=private&se archmode=none>. Orwant, Jon. "Ngram Viewer 2.0." 18 October 2012. googleresearch. 26 April 2014 <http://googleresearch. blogspot.de/2012/10/ngram-viewer-20.html>. Owen, Jonathan. "25 Years of the World Wide Web." 12 March 2014. The Independent. 15 May 2014 <http://www.independent.co.uk/life-style/gadgets-and-tech/ news/25years-of-the-world-wide-web-the-inventor-of-the-web-tim-bernerslee-explains-howit-all-began-9185040.html>. Oxford Dictionary of English Etymology. Ed. C.T. Onions. Oxford: Clarendon, 1966. Oxford English Dictionary. Ed. Robert Burchfield. Oxford: Clarendon, 1978. Oxford English Dictionary. Eds. J.A. Simpson and E.S.C. Weiner. 2nd ed. Oxford: Clarendon, 1989. Oxford English Dictionary. Ed. Angus Stevenson. 3rd ed. Oxford: Oxford University Press, 2010. Pingdom. "The US Hosts 43% of the World's Top 1 Million Websites." 2 July 2012. 15 May 2014 <http://royal.pingdom.com/2012/07/02/united-states-hosts-43-per cent-worlds-top-1-million-websites/>. Rössler, Beate. Der Wert des Privaten. Frankfurt am Main: Suhrkamp, 2001. Schaar, Peter. Das Ende der Privatsphäre: Der Weg in die Überwachungsgesellschaft. München: C. Bertelsmann Verlag, 2007. Seemann, Michael. "Vom Kontrollverlust zur Filtersouveränität." 6 April 2011. 28 January 2013 <http://carta.info/39625/vom-kontrollverlust-zur-filtersouveranitat/>. Simon, David. "We are shocked, shocked ... ." 7 June 2013. The Audacity of Despair: Collected prose, links and occasional venting from David Simon. 4 March 2014 <http://davidsimon.com/we-are-shocked-shocked/>. Anglistik, Volume 25 (2014), Issue 2 © 2014 Universitätsverlag WINTER GmbH Heidelberg Powered by TCPDF (www.tcpdf.org) 38 DANIELA WAWRA, Passau Stokes, Bruce. "Trading Privacy for Security." 4 November 2013. foreignpolicy.com. 22 April 2014 <http://www.foreignpolicy.com/articles/2013/11/04/trading_ privacy_for_security>. Thüsing, Gregor. 
"Datenschutz als Persönlichkeitsschutz." Frankfurter Allgemeine Zeitung. 2 September 2013. 7. Warren, Samuel and Louis Brandeis. 1890. "The Right to Privacy." Philosophical Dimensions of Privacy: An Anthology. Ed. Ferdinand Schoeman. Cambridge, MA: Cambridge University Press, 1984. 75-103. Webster's Third New International Dictionary. Ed. Philip Babcock Gove. Springfield, MA: G. & C. Merriam Company, 1976. Weinstein, Mark. "Is Privacy Dead?" 24 April 2013. The Huffington Post. 24 April 2014 <http://www.huffingtonpost.com/mark-weinstein/internet-privacy_b_314045 7.html>. Welzer, Harald. "Wenn man etwas merkt, ist es zu spät." Frankfurter Allgemeine Zeitung. 23 April 2014. 9. Wiegand, Dorothee. "Data-was? Das CeBIT-Motto 'Datability' – ein Erklärungsversuch." 22 February 2014. c't magazin 6/14. 25 April 2014 <http://www.heise.de/ ct/heft/2014-6-Das-CeBIT-MottoDatability-ein-Erklaerungsversuch-2118710.html>. Zimmer, Ben. "Bigger, Better Google Ngrams: Brace Yourself for the Power of Grammar." 18 October 2012. The Atlantic. 23 April 2014 <http://www.theatlantic. com/technology/archive/2012/10/bigger-better-google-ngrams-brace-yourself-forthe-power-of-grammar/263487/>. Anglistik, Volume 25 (2014), Issue 2 © 2014 Universitätsverlag WINTER GmbH Heidelberg Powered by TCPDF (www.tcpdf.org)