
Exploring Ancient Networks

2022, H2D|Revista de Humanidades Digitais

A small archive of texts from ancient Iraq is used to demonstrate an approach to network analysis in which traditional close reading and computational text analysis go hand-in-hand. The computational methods produce tables and graphs that link back to online editions of the primary material, enabling the user to check the results.

Niek Veldhuis, UC Berkeley, United States of America
https://doi.org/10.21814/h2d.3508
ISSN: 2184-562X

How to cite: Veldhuis, N. (2021). Exploração de Redes Antigas. H2D|Revista De Humanidades Digitais, 3(1). https://doi.org/10.21814/h2d.3508

Keywords: Network analysis; Natural Language Processing; Sumerian

1. Introduction

In this article, I will discuss the possibilities of using interactive visualizations for exploring social networks, based on ancient Sumerian texts from the period between ca. 2100 and 2000 BCE (the so-called Ur III period). Building such networks allows students and researchers to quickly get an idea of the important actors in the archive (or text group) that is being studied and to move from studying a single text in detail to a bird’s-eye view of the entire corpus (see note 1).

Humanities research and teaching have traditionally put a lot of weight on carefully analyzing (or close reading) original texts and objects.
Computational analysis (or distant reading, in Franco Moretti’s words) has the potential of displacing that emphasis and replacing it with conclusions that derive from analyses of large corpora and that provide insights into broader patterns of an ancient culture. There are many reasons, however, to doubt such a scenario. First, ancient data sets are rarely ‘large’ in the current sense of that adjective. Even if we muster all 100,000 Ur III documents (possibly the largest corpus in the ancient world), this still is no match for the magnitude of data used to produce GloVe, FastText (https://fasttext.cc/), and similar models. Second, digital representations of ancient texts are likely to contain errors. Errors are, of course, everywhere and in all data sets, but in smaller data sets such errors may gain more weight, and it is undesirable when unlikely outcomes cannot be tracked back to the source data.

Most importantly, however, computational approaches should not displace the current research practice, but rather support and extend it by providing tools and concepts that can be understood from within traditional Humanities research. Rather than distant reading, what I will argue for is an approach that enables going back and forth between bird’s-eye views in a visualization and access to the actual data (warts and all) on which the visualization is based. Avoiding neural networks (which largely scramble the connection between data and result), I will use old-fashioned rule-based techniques to select nodes, assign roles, and create (directed) edges. The processes of assigning roles and creating edges are themselves accompanied by interactive displays that allow the user to check the validity of the results.

Figure 1 is an example of a visualization of the Treasure Archive from Puzriš-Dagan (Paoletti, 2012). The names of the most important actors (those with the highest degree) are labeled.
The coloring of the node represents its eigenvector centrality. The article will discuss how this visualization was built and how further analyses and visualizations may be employed to explore the network.

2. Building the Graph: Nodes, Roles, and Edges

The graph is built by first acquiring the text corpus, filtering for proper nouns (the nodes), assigning a role to each node, and defining (directed) edges between the nodes. The data is freely available through the Open Richly Annotated Cuneiform Corpus (ORACC). The script that does the work is available as a Jupyter notebook in Chapter 4.1 of the Computational Assyriology (Compass) project (see note 2). Before going into more details of this process, we will first briefly discuss the background of the Treasure Archive.

Figure 1: Treasure Archive Network

2.1. The Treasure Archive

The Treasure Archive is a relatively small group (< 300 documents) of administrative texts that date to the 21st century BCE, dealing with valuable objects (made of metals and valuable stones), leather products (shoes, boots, etc.) and weapons. The archive was studied in detail by Paoletti (2012), a publication to which we will come back for interpreting the visualizations.

The Ur III period (ca. 2100-2000 BCE) was only the second time in the history of Babylonia (present-day southern Iraq) when all the former city states were united under a single king. According to native historiography, this was the third time that kingship over Sumer and Akkad resided in the city of Ur; hence the modern moniker for the period: “Ur III.” The period is known for its extraordinary number of texts (over 100,000), primarily administrative documents that, in large measure, come from a time span of only fifty years. One of the important find spots is the site of Drehem, ancient Puzriš-Dagan, which housed the royal administration of domestic animals.
Some 15,000 administrative texts have been assigned to Puzriš-Dagan, distinguished by their use of a specific calendar, by prosopographical connections, and by other aspects of text, spelling, writing conventions, subject matter, etc. The great majority of these texts (more than 99%) were looted from the site and reached museums and private collections all over the world through the antiquities market.

All these texts are written in Sumerian in cuneiform writing on clay tablets; most of them have a date that may include day, month, and year (but not all date elements are present in each document). Many tablets are small and may have no more than 6 to 10 lines; bigger, multi-column tablets may represent monthly or yearly summaries. Damage to clay tablets is very common (in particular for larger ones) and presents challenges for their interpretation.

The Treasure Archive is a small subset of the finds from Puzriš-Dagan. Thanks to Paoletti’s very thorough study of all aspects of this archive, the results of our efforts may easily be compared to the results of more traditional methodologies. A somewhat random example of a text from this archive (AUCT 1, 502) is the following:

1. 10 ma-na kug-babbar -> 10 mina of silver (5 kg)
2. niŋ-sa10-ma kug-sig17 10-ta-še3 -> purchase price for gold at a 10:1 ratio
3. i3-lal3-lum -> Ilalum
4. mu-kux(DU) -> delivered.
5. puzur4-er3-ra -> Puzur-Erra
6. šu ba-ti -> received.
7. šag4 puzur4-iš-{d}da-gan -> In Puzriš-Dagan.
8. iti še-sag11-kud -> Month 12.
9. mu en {d}Nanna ba-huŋ -> Year: the en priest of Nanna was installed.
left edge: 10 ma-na -> 10 mina

Figure 2: Obverse of AUCT 1 502

The text documents that Ilalum brought in 10 mina (or about 5 kg) of silver for buying 1 mina of gold and that this silver was received by Puzur-Erra. In the network graph this should result in two nodes (Ilalum and Puzur-Erra) with a directed edge Ilalum -> Puzur-Erra.
Note that the year name (line 9) contains the name of the god Nanna, but this god is not part of the transaction. Nanna, therefore, should not become part of the network.

2.2. Acquiring Data

The documents that belong to the Treasure Archive are all available in three parallel databases: the Database of Neo-Sumerian Texts (http://bdtns.filol.csic.es/), the Cuneiform Digital Library Initiative (http://cdli.ucla.edu), and the electronic Pennsylvania Sumerian Dictionary (ePSD2) in the Open Richly Annotated Cuneiform Corpus (http://oracc.org) project. Each of these repositories provides open access to the data. For our purposes we will use the ePSD2 data: first, because in ePSD2 the texts are lemmatized, and second, because the data can be acquired in JSON format, which is easier to process than the raw text data provided by BDTNS and CDLI.

The ORACC JSON is quite complex in structure (see Oracc JSON Data, http://oracc.museum.upenn.edu/doc/opendata/json/). Since the JSON structure is identical for all ORACC data (which covers all of cuneiform), the complexity of the JSON is a minor issue. A standard parser, developed by the present writer, can be used (and, where necessary, adapted) to represent the text in the desired units (signs, words, lines, sentences, or texts). The parser transforms the data into a Pandas data frame, selecting the relevant data elements from the JSON.

Each word in the ORACC JSON data representation has a part-of-speech (POS) field. The POS field currently does not follow any particular international standard but is customized for the various cuneiform languages represented in ORACC. Proper nouns have their own POS abbreviations, such as PN (Personal Name), RN (Royal Name), DN (Deity Name), etc. This data representation makes it straightforward to filter for the human beings and gods who participated in the transactions documented in the Treasure Archive.
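
The filtering step just described can be sketched in a few lines of Pandas. This is a minimal illustration, not the code of the Compass notebook: the column names (`id_text`, `lemma`, `pos`), the text ID, and the sample rows are assumptions invented for the example; only the POS abbreviations PN, RN, and DN are taken from the description above.

```python
import pandas as pd

# Hypothetical rows standing in for the parsed ORACC data; the column names
# and the lemma spellings are assumptions made for this illustration.
words = pd.DataFrame(
    [
        {"id_text": "T1", "lemma": "kugbabbar[silver]N", "pos": "N"},
        {"id_text": "T1", "lemma": "Ilalum[0]PN", "pos": "PN"},
        {"id_text": "T1", "lemma": "PuzurErra[0]PN", "pos": "PN"},
        {"id_text": "T1", "lemma": "Enkik[1]DN", "pos": "DN"},
    ]
)

# POS abbreviations for proper nouns that may become nodes in the network.
NAME_POS = {"PN", "RN", "DN"}


def select_names(df):
    """Keep only the rows whose POS tag marks a personal, royal, or deity name."""
    return df[df["pos"].isin(NAME_POS)].reset_index(drop=True)


names = select_names(words)
print(names)
```

Keeping the text ID in every row is what later makes it possible to link each name instance back to its edition.
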
Year names are marked explicitly in the JSON and may thus be excluded from consideration. The lemmatization, finally, makes it possible to deal with the key words that indicate roles (such as recipient or intermediary) in documents of this time.

2.3. Name Role Activity Document

Each name instance in the documents is considered an NRAD instance: a Name in a Role in an Activity in a Document. The NRAD model was developed by Patrick Schmitz and Laurie Pearce for the Berkeley Prosopography Services. The NRAD model draws attention to different aspects of a name instance. A name instance appears in a Document, which has metadata, such as a document ID number, a date and/or a provenance. An Activity (such as Selling, Receiving, or Issuing) implies a particular set of roles that are necessary or possible. The Activity may have its own set of attributes, for instance the goods that are being received or issued, or the date of the transaction. Finally, the attributes of the Name may include spelling and normalization (if different spellings of the same name are known) as well as attributes directly expressed in the text (profession, or familial relationships).

The NRAD model was developed as an abstract way of representing a name instance and collecting the information needed for disambiguation. In the Berkeley Prosopography Services model, the attributes of Name, Role, Activity, and Document together provide the data for a probabilistic model of disambiguation in which one may assert that a person called Ur-Enlil (Name) who receives dead sheep (Role/Activity) in 2035 BCE (Document) is likely the same person as the Ur-Enlil who receives dead lambs in 2033 BCE. In this article I will not go into the complex issues of disambiguation. The set of documents that we will use as an example comes from a small office that was active during a restricted period of time, and the issue of namesakes is of minor importance here.
The NRAD model also helps, however, in developing a model for creating edges for a social network. We may distinguish between different types of Documents that represent different Activities. Within each Activity we expect a limited number of Roles, and edges are drawn between those Roles according to specified patterns, for instance intermediary -> recipient (in the Delivery activity) but not recipient -> intermediary.

2.4. Name, Activity, and Role

The Treasure Archive has three types of documents that are marked by key words towards the end of the text: Income, Expenditure, and Transfer. The essential difference between Expenditure and Transfer is that in the former the goods leave the Puzriš-Dagan organization, whereas in the latter the goods are transferred from one official (or office) to another.

The Roles in a document are explicitly indicated by key words that appear either before or after the name. In the example above we saw mu-kux(DU) (marking the deliverer) and šu ba-ti (a two-word key phrase that marks the recipient). There are a good number of such key words, marking not only deliverer and recipient, but also intermediaries of various kinds, producers (of valuable goods), senders, and offerers. A Name may be followed immediately by a qualifier: a profession, such as “general,” or a familial relationship (PN1 son of PN2). In such cases the key word will follow the qualifiers. Finally, there are participants who are not marked explicitly for a particular role. In Income texts those are deliverers, whereas in Expenditure texts those are recipients.

Putting all these rules together, the script will scan the documents to assign appropriate roles. This process results in a Pandas data frame in which all name instances are listed, with the assigned role and attributes (the qualifiers) and with a link that takes the user directly to the ePSD2 edition of the text, with the name instance highlighted.
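
A much-reduced sketch of such rule-based role assignment is given here. It is not the actual script: it only knows the two key phrases quoted above, it only looks at key words that directly follow a name (qualifiers and key words preceding a name are ignored), and the token format, the POS tags `NU`, `N`, and `V`, and the function name `assign_roles` are assumptions made for the illustration. AUCT 1, 502 is treated as an Income text, which is also an assumption.

```python
# POS tags that identify a name instance.
NAME_POS = {"PN", "RN", "DN"}

# Key phrases (as tuples of word forms) and the role they mark.
# Only the two phrases quoted in the text are included in this sketch.
KEY_PHRASES = {
    ("mu-kux(DU)",): "deliverer",
    ("šu", "ba-ti"): "recipient",
}

# Role of participants without an explicit key word, by document type.
DEFAULT_ROLE = {"Income": "deliverer", "Expenditure": "recipient"}


def assign_roles(tokens, doc_type):
    """Return (name, role) pairs for a list of (form, pos) tokens."""
    roles = []
    for i, (form, pos) in enumerate(tokens):
        if pos not in NAME_POS:
            continue
        following = [f for f, _ in tokens[i + 1:]]
        role = None
        for phrase, phrase_role in KEY_PHRASES.items():
            # Does the key phrase follow the name immediately?
            if tuple(following[:len(phrase)]) == phrase:
                role = phrase_role
                break
        if role is None:
            # Unmarked participant: fall back on the document type.
            role = DEFAULT_ROLE.get(doc_type, "unknown")
        roles.append((form, role))
    return roles


# Simplified token list for AUCT 1, 502 (lines 1 and 3-6 only).
tokens = [
    ("10", "NU"), ("ma-na", "N"), ("kug-babbar", "N"),
    ("i3-lal3-lum", "PN"), ("mu-kux(DU)", "V"),
    ("puzur4-er3-ra", "PN"), ("šu", "N"), ("ba-ti", "V"),
]
roles = assign_roles(tokens, "Income")
print(roles)
```

Keeping the key phrases in a plain dictionary means that a different archive can be handled by editing the table rather than the scanning logic.
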
Widgets allow a user to see more or fewer rows, or to filter for a particular text or for a particular role. This is important, because the set of rules that is applied is complex and prone to errors. Different archives within the Ur III corpus will need slightly different rules, and the interactive presentation of the outcome of the process allows a user to find errors, adjust the code, and reevaluate the result.

Figure 3: Roles

2.5. Edges

Edges and directionality are assigned by looking forward in the text. In an Expenditure text, when the script runs into a Deliverer, it will look for a Recipient or an Intermediary, whichever comes first. A set of rules, comparable to the rules that assign roles, is thus responsible for creating edges. As before, the edges are displayed in a Pandas data frame with links to the editions of the texts in which the edges appear, allowing the user to check the validity of the outcome and, where necessary, to adjust the rules.

2.6. Building the Graph

Finally, the graph is built in the NetworkX package. An initial visualization of the graph is used, once again, for checking validity. A set of widgets is employed to create an interactive visualization that allows the user to select a single text and see the results for that one document, or to choose a single node and see its Ego network. The example shows the directed Ego network of Abumbašti. The button “Open Edition in ORACC” creates a link to the seven documents that attest this Ego network.

Once the user is satisfied that the nodes and edges in the network accurately represent the corpus, the graph of the Treasure Archive has been created. It is important to keep in mind that the graph is a mathematical object that consists of nodes and edges and that does not coincide with any of its visualizations.
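
The step from an edge table to a graph object, and from there to an Ego network, may be sketched as follows. The sketch assumes that each edge carries a list of document IDs in an attribute called `docs`; that attribute name, the document IDs, and the node `PersonA` are invented for the illustration. Only the edge Ilalum -> PuzurErra reflects the example text discussed earlier.

```python
import networkx as nx

# Toy edge list: (source, target, document ID). The document IDs and
# "PersonA"/"PersonB" are placeholders, not data from the archive.
edges = [
    ("Ilalum", "PuzurErra", "doc-001"),
    ("PersonA", "PuzurErra", "doc-002"),
    ("Ilalum", "PuzurErra", "doc-003"),
    ("PersonA", "PersonB", "doc-004"),
]

G = nx.DiGraph()
for source, target, doc_id in edges:
    if G.has_edge(source, target):
        # Same pair attested again: record the additional document.
        G[source][target]["docs"].append(doc_id)
    else:
        G.add_edge(source, target, docs=[doc_id])

# Ego network of one node: the node itself plus its direct neighbors.
# undirected=True collects neighbors regardless of edge direction,
# while the returned subgraph keeps the directed edges.
ego = nx.ego_graph(G, "PuzurErra", radius=1, undirected=True)

print(sorted(ego.nodes))
print(G["Ilalum"]["PuzurErra"]["docs"])
```

Storing the document IDs on the edges is a design choice that keeps the path from any edge back to its source texts open.
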
Figure 4: Edges

Figure 5: Ego

For Humanities scholars, a common step at this point in the process is to import the data into Gephi, a powerful open-source visualization program for exploring and manipulating networks. Gephi has been around for a long time, has a large user base, and creates attractive and flexible visualizations by means of a graphical user interface. Although Gephi has many strengths and has facilities for computing numerous attributes of the nodes, the edges, and the graph as a whole, the graphical user interface makes it impossible to create a reproducible workflow. In the next section, therefore, we will discuss using Python packages (in particular NetworkX and hvPlot) for exploring and visualizing the graph.

3. Exploring the Graph

3.1. A Static Visualization

Once the graph is built, it can be explored in a variety of ways. Figure 1 is a visualization of the graph in a spring layout, with labels on the nodes with the highest degree (see note 3). Node color indicates eigenvector centrality, ranging from yellow (high) to blue (low). The labeled nodes include AmarSuen, the king, Abisimti, the queen, and AradNannak, the prime minister. The most central figure in the graph is Ludiŋirak, who was in charge of the goldsmiths and was succeeded by PuzurErra. The other names included in the visualization may also be recognized as performing important functions in the organization. Ea’ili and Šu’Eštar were responsible for the administration of luxury shoes and boots in consecutive periods. Dayyanummišar was in charge of weapons, including luxury weapons for display purposes. Lugalkugzu fulfilled an important coordinating position between the various offices within the treasury and between the (much larger) administration of domestic animals and the treasury. The information about the identity of the actors is not available in the graph itself but may be retrieved from Paoletti (2012).
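
The two measures used in Figure 1, degree and eigenvector centrality, as well as the spring layout, are all available in NetworkX. The sketch below runs them on an invented toy graph with placeholder node names. Computing eigenvector centrality on an undirected copy of the graph is an assumption made here for simplicity; the text does not say how the directed edges were handled for that measure.

```python
import networkx as nx

# Toy directed graph with placeholder node names.
G = nx.DiGraph()
G.add_edges_from(
    [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b"), ("c", "d")]
)

# Degree: for a directed graph this counts incoming plus outgoing edges.
degree = dict(G.degree())

# Eigenvector centrality, computed on an undirected copy (an assumption).
centrality = nx.eigenvector_centrality(G.to_undirected(), max_iter=1000)

# The nodes that would receive a label: the three with the highest degree.
top_by_degree = sorted(degree, key=degree.get, reverse=True)[:3]

# Node positions for a static plot; the seed makes the layout repeatable.
positions = nx.spring_layout(G, seed=42)

for node in top_by_degree:
    print(node, degree[node], round(centrality[node], 3))
```

Fixing the seed of the layout matters for reproducibility, since a spring layout otherwise starts from random positions on every run.
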
The persons with the highest degree (those labeled in the graph) include important political figures (the royal family and the prime minister), figures in leading positions in the institution, and people who mediate between different branches.

3.2. Interactive Visualization

One drawback of the visualization in Figure 1 is the absence of labels for the lower-ranked nodes. Printing all the labels is certainly possible but leads to an illegible graph. We may improve the visualization by adding interactivity and displaying additional information about the nodes in tooltips. A Python package that was written for such interactive purposes is Bokeh. We can transform Figure 1 into the interactive visualization in Figure 6 (see note 4). The Bokeh plot can be saved as an image, but also as a free-standing HTML file, in which the interactivity is preserved. Hovering over a node with the mouse turns that node and all its edges green and displays some basic information about the node in tooltips (name, degree, and eigenvector centrality). Clicking on the node will highlight the node and its edges, but it will turn all other graph elements gray. The tools in the toolbar may be used to zoom in, to save (as .png), or to reset.

Figure 6: Interactive

We can now move around in the visualization and discover the names of the smaller nodes. Doing so, we will also find gods (marked with DN) who are the recipients of valuable objects.

3.3. Cliques and Clique Communities

The graph allows for a variety of further inquiries, based on the mathematics of graph theory. Cliques are sets of nodes that are all connected to each other. That is, in a clique of four nodes, A, B, C, and D, all six possible connections exist. A related concept is the k-clique community. A k-clique community consists of adjacent cliques of at least k members, where adjacent cliques share at least k-1 nodes. As we will see, k-clique communities tend to overlap.
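
The definition, and the way communities can overlap, may be illustrated with NetworkX on a toy graph with invented node names. NetworkX computes cliques on undirected graphs only, so the sketch assumes that a directed graph such as ours is first converted with `to_undirected()`; that conversion is an assumption, not something stated above.

```python
from collections import Counter
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import k_clique_communities

G = nx.Graph()
# Two 4-cliques that share three nodes (B, C, D): adjacent for k=4,
# so they merge into a single community.
G.add_edges_from(combinations("ABCD", 2))
G.add_edges_from(combinations("BCDE", 2))
# A third 4-clique that shares only node E with the others:
# not adjacent, so it forms a community of its own.
G.add_edges_from(combinations("EFGH", 2))

communities = [set(c) for c in k_clique_communities(G, 4)]

# Nodes that belong to more than one community connect the communities.
membership = Counter(node for community in communities for node in community)
overlap = {node for node, count in membership.items() if count > 1}

print(communities)
print(overlap)
```

In this toy graph node E plays the part of an actor who belongs to two communities and thereby links them.
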
That is, there are powerful actors who belong to multiple such communities and connect them to each other. If we compute the k-clique communities (k=4) for the present graph, we get the following names:

Clique 1: Amarsuenak[1]RN, AradNannak[0]PN, Ayaŋu[0]PN, Dada[0]PN, Dayyanummišar[0]PN, Enkik[1]DN, Lisin[0]PN, Ludiŋirak[0]PN, Lunanna[0]PN, PuzurErra[0]PN, Šulgir[1]RN
Clique 2: AradNannak[0]PN, Ludiŋirak[0]PN, Tahišatal[0]PN, Šušulgir[0]PN
Clique 3: Ludiŋirak[0]PN, Lugalkugzu[0]PN, PuzurErra[0]PN, Utamišaram[0]PN
Clique 4: Ludiŋirak[0]PN, Lugalkugzu[0]PN, Ribagada[0]PN, Tahišatal[0]PN

We see several of the names that also appeared as high-ranking nodes in Figures 1 and 6, but we also find several new names, including a god (Enkik) and a second king (Šulgir). Quite a few names appear in more than one k-clique community. The visualization in Figure 7 shows the nodes that belong to at least one k-clique community, plus the nodes with a degree of at least 8. The nodes are colored according to the k-clique community to which they belong: black represents nodes that do not belong to any k-clique community, and red nodes (not accidentally the larger ones) belong to multiple such communities. Hovering over a node will highlight the edges of that node in the color of the k-clique community. Clicking on Ludiŋirak will show that he participates in all four k-clique communities and is also connected to Abisimti, a black node.

Figure 7: K-clique-com

This last visualization may invite further investigation of the people who do or do not show up among the k-clique communities.

4. Further Thoughts

The scripts that produce the graph and the visualizations were developed for this particular set of 300 documents, a tiny drop in the ocean of ancient texts. These scripts, in other words, cannot be used as out-of-the-box tools for visualizing and analyzing any data set.
Most specific is the part of the script that selects nodes, assigns roles, and creates edges, that is, the actual creation of the graph. ORACC data will have part-of-speech annotations that will make it relatively easy to filter out personal names, royal names, gods, and/or place names. The assignment of roles, however, very much depends on the activities recorded in the archive at hand and how these activities were formulated. Even within the Ur III period, the set of key words developed for this text group is not necessarily valid for any other text group. I do hope that the steps formulated and exemplified in the scripts will help to do similar things for other text groups. Most administrative traditions will use key words to mark the most important actors in transactions. If it turns out that such an approach is not feasible, one may always collect edges by hand. For a corpus of several hundred documents, that is labor-intensive but doable (and may produce fewer errors than the approach discussed here). Manual entry of edges may be somewhat more prone to inconsistencies, and the various utilities for checking nodes, roles, and edges as proposed here may still fulfill important functions.

Once the graph has been constructed the possibilities are almost endless; the examples shown here are, indeed, just examples. I hope they will inspire other researchers to use interactive visualizations for similar (or very different!) analyses.

One may ask: did our scripts produce new knowledge? Since we did not start with a research question, there is no hypothesis that was proven or refuted at the end of this story. The approach was, from the start, an exploratory one and so the question should be: does this approach, or one like this, help in exploring an ancient dataset? For the interpretation of the graph we have relied heavily on the more traditional approach in Paoletti (2012). In fact, it would be difficult to make much sense of all of this without such work.
The network analysis, therefore, is not going to replace traditional close reading of texts. I propose that graphs and interactive visualizations may play a role, first, in working with students who are new to the material. They show rather quickly who the central people are and how they are (or are not) connected to each other. Second, for researchers and specialists the visualizations may add another layer to the understanding of the dynamics of the social network attested in the documents at hand. A five-hundred-page book gives insights that cannot be reproduced computationally. On the other hand, the bird’s-eye view provided by the various visualizations shows, for instance, the (relative) importance of the royal family in the day-to-day business of the treasury office. Figure 7 not only includes two kings among the most important nodes (AmarSuenak and Šulgir) but also two queens (Abisimti and Kubatum). Since this is a royal archive that deals with royal property, it is not surprising to see the king showing up, but the prominence of the king himself (and his wife) in such operations is unexpected. The most central figures, those who participate in more than one k-clique community, however, are the high administrators and the prime minister, AradNannak.

The visualizations shown here do not take into account the dates of the documents, and therefore we see people in the same graph who were not active at the same time. Dates, however, are part of the metadata in the NRAD model, attached to the Document. In a future incarnation we may well utilize such dates for restricting the documents to a range of dates. Similarly, one of the edge attributes in the graph discussed here contains the document IDs (the document or documents in which these edges occur), which can easily be turned into the URLs of their online editions.
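
Both ideas can be sketched briefly, under clearly stated assumptions. The edge attribute name `docs`, the document IDs, the node `PersonA`, and the mapping of documents to a sortable year number are invented for the illustration, and `BASE_URL` is a placeholder rather than the real address pattern of the online editions.

```python
import networkx as nx

# Placeholder address; the real URL pattern of the editions is not given here.
BASE_URL = "https://example.org/editions/"

G = nx.DiGraph()
G.add_edge("Ilalum", "PuzurErra", docs=["doc-001", "doc-003"])
G.add_edge("PersonA", "PuzurErra", docs=["doc-002"])

# Hypothetical date metadata: one sortable year number per document.
doc_year = {"doc-001": 1, "doc-002": 5, "doc-003": 9}


def restrict_to_years(graph, years, first, last):
    """Build a new graph from the documents dated between first and last."""
    restricted = nx.DiGraph()
    for source, target, docs in graph.edges(data="docs"):
        kept = [d for d in docs if first <= years[d] <= last]
        if kept:
            # Keep the edge only if at least one document falls in the range.
            restricted.add_edge(source, target, docs=kept)
    return restricted


def edge_urls(graph, source, target):
    """Turn the document IDs stored on an edge into edition URLs."""
    return [BASE_URL + doc_id for doc_id in graph[source][target]["docs"]]


early = restrict_to_years(G, doc_year, 1, 5)
print(edge_urls(early, "Ilalum", "PuzurErra"))
```
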
It is not far-fetched, therefore, to imagine an interactive graph where clicking on an edge opens the editions of these documents, further enabling the exploration. The possibilities for explorative research seem endless.

Notes

1) This article is based on my Compass (Computational Assyriology) project (http://github.com/niekveldhuis/compass), Chapter 4. This chapter profited from the Sumerian Networks project (http://github.com/niekveldhuis/sumnet). Sumerian Networks is a Data Science Discovery project that was coordinated from 2017-2021 by Dr. Adam Anderson. The undergraduate UC Berkeley students who participated in the project for shorter or longer periods of time are Yashila Bordag, Colman Bouton, Jennie Chen, Tiffany Chien, Dalton Do, Zekai Fan, Kimberly Kao, Jason Kha, Anya Kulikov, Rachel Lim, Dominic Liu, Harini Rajan, Max Sullivan, Anjali Unnithan, and Lucie Yoonsun Choi. In addition, Aleksi Sahala, visiting graduate student from the University of Helsinki, participated for one semester.

2) The Compass project is still under construction. The code for data acquisition, analysis, and visualization is functional, but at the time of writing not all code is sufficiently documented.

3) The code for this visualization was written by Colman Bouton. The code for Figure 5 reuses his functions for sizing and placing the nodes.

4) The code for Figure 5 was written in HvPlot, a package that uses HoloViews and Bokeh for plotting data from a variety of Python libraries, including NetworkX. A good introduction to the combination of NetworkX and Bokeh is Chapter 6 of Introduction to Cultural Analytics & Python by Melanie Walsh (2021).

Submitted 2021-07-06 | Published 2021-10-31

References

Paoletti, P. (2012). Der König und sein Kreis. Das Staatliche Schatzarchiv der III. Dynastie von Ur. Biblioteca del Próximo Oriente Antiguo 10. Consejo Superior de Investigaciones Científicas.