The ETCBC Database of the Hebrew Bible

This is the handout of my contribution to the seminar "New Directions in the Computational Analysis of Biblical Hebrew Grammar", IOSOT Congress, Stellenbosch, September 2016. (Apologies for Word format; will be replaced by PDF)

Wido van Peursen, Eep Talstra Centre for Bible and Computer, Amsterdam
Seminar IOSOT Congress 2016: New Directions in the Computational Analysis of Biblical Hebrew Grammar

History
1977: Start of the Werkgroep Informatica Vrije Universiteit (WIVU)
2013: WIVU becomes Eep Talstra Centre for Bible and Computer (ETCBC)
2014: Database becomes online accessible and searchable in SHEBANQ

Corpus and contents
Complete Hebrew Bible
Some inscriptions, Qumran documents (1QS, 1QH), Peshitta (Kings, Ben Sira, PrMan, EpBar, Judges), Targum (Jonathan on Judges), other Syriac texts (Book of the Laws of the Countries)
Linguistic levels of word, phrase, clause, sentence, text

Methodological principles
Encoding rather than tagging
Form to function: registration of textual data before interpretation
Bottom-up: from smaller to larger linguistic units

Versions and representations
Core database, on which other databases depend (server ETCBC)
  Interactive procedures for text encoding based on pattern recognition
For Hebraists and Biblical scholars: SHEBANQ
  Direct access to Hebrew text and query-saver
  Serendipity
  Persistent identifiers for reference
For those who have affinity with IT: LAF-Fabric
  A tool for running Python notebooks with access to the information in a LAF resource
  People can also download it and use it for their own purposes at DANS EASY
  E.g. Joshua Berman and Moshe Koppel, Bar Ilan University: Tiberias Project
For students: Bible Online Learner
  Useful educational tool: Text as Tutor (Nicolai Winther-Nielsen).
  Now also with Portuguese interface
For Bible Translations: Paratext (in preparation)
  Intuitive query creator

Datamodel
Extended Monad-dot-Feature Model (EMdF):
  Monad (integer)
  Objects (set of integers)
  Object Types
  Features

Example: בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

Clause  1
Phrase  1          2    3      4
Words   B   R>CJT  BR>  >LHJM  >T  H   CMJM  W   >T  H   >RY
Monad   1   2      3    4      5   6   7     8   9   10  11

Object Type  Features      Values
Word         lexeme        e.g. 'BR>'
             Lexeme_utf8   בָּרָא
             gender        Masc, Fem
             person        1, 2, 3
             Verbal stem   Qal, Nifal etc.
             Verbal Tense  Perfect, Imperfect etc.

LAF: Linguistic Annotation Framework: a data format for stand-off annotations (unlike TEI etc., which are inline); advantage: overlapping hierarchies.

Some examples
When the complement of NTN is an indirect object, the verb means “give”; when the complement is a locative, the verb means “put, place”. As a first step, all patterns of NTN with a single object and complement are collected.

This query finds all cases in which the same direct speech introduction (same subject [e.g. Abraham] speaks to the same complement [e.g. to Abimelech]) is repeated after the initial direct speech. The pattern we are looking for looks like this:
  X speaks to Y: "bla bla bla"
  X speaks to Y: "bla bla bla"
After the initial X speaks to Y: "bla bla bla", one would expect the construction Y speaks/answers to X: "bla bla bla" to follow. We are therefore searching for a rather uncommon pattern. The query was inspired by Genesis 20:9-10, where Abimelech initiates a direct speech twice without Abraham responding after Abimelech’s first speech.

Screenshot from the Notebook “Kings and Parallels” (Dirk Roorda and Martijn Naaijer)

Questions from seminar organizers
Which linguistic theories are accommodated within the database?
The database is used in various theoretical frameworks (e.g. Government and Binding theory; Rhetorical Structure Analysis; Role and Reference Grammar).
This is possible due to the focus on linguistic signals in the text and the bottom-up approach.

What kind of grammatical data can be retrieved?
Information at all levels of encoding: from, e.g., realizations of the prefix 3ms in the imperfect, to the various clause functions (e.g. attribute clauses) and clause relations (text hierarchy).

To what extent can features that exhibit variation be retrieved (style, oral vs. written etc.)?
We distinguish the domains Narrative, Quotation and Discourse. There are some ongoing projects that investigate differences between, e.g., narrative texts and direct speech sections, or between narrative and poetic books.

To what extent can questions of diachrony of BH be addressed?
In the project “Does Syntactic Variation Reflect Language Change? Tracing Syntactic Diversity in Biblical Hebrew Texts” (Janet Dyk, Wido van Peursen, Dirk Bakker, Marianne Kaajan, Martijn Naaijer) we investigate distribution patterns for linguistic phenomena to find out the factors that account for them (this may be language development, but also transmission, genre, syntactic context).

@shebanq /etcbc shebanq.ancient-data.org
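
Illustration of the data model. The EMdF model listed under "Datamodel" (monads as integers, objects as sets of monads, object types, features) can be sketched in a few lines of Python. This is only a minimal sketch for orientation and not the API of Emdros, LAF-Fabric or SHEBANQ: the names EmdfObject, span, build_genesis_1_1 and contained_in are hypothetical, the transliterated forms from the Words row are stored under the feature name "lexeme" for convenience, and the phrase boundaries (monads 1-2, 3, 4, 5-11) are an assumption read off the example table for Genesis 1:1.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class EmdfObject:
    """Hypothetical EMdF-style object: a typed set of monads plus features."""
    object_type: str                     # e.g. "word", "phrase", "clause"
    monads: FrozenSet[int]               # the word positions the object covers
    features: Dict[str, str] = field(default_factory=dict)


def span(first: int, last: int) -> FrozenSet[int]:
    """Return the set of consecutive monads first..last (inclusive)."""
    return frozenset(range(first, last + 1))


# Transliterated forms from the "Words" row of the example (Genesis 1:1).
WORDS = ["B", "R>CJT", "BR>", ">LHJM", ">T", "H", "CMJM", "W", ">T", "H", ">RY"]

# Assumed phrase boundaries, expressed as (first monad, last monad).
PHRASE_SPANS = [(1, 2), (3, 3), (4, 4), (5, 11)]


def build_genesis_1_1() -> List[EmdfObject]:
    """Build word, phrase and clause objects over monads 1..11."""
    objects = [
        EmdfObject("word", span(i, i), {"lexeme": form})
        for i, form in enumerate(WORDS, start=1)
    ]
    objects += [EmdfObject("phrase", span(a, b)) for a, b in PHRASE_SPANS]
    objects.append(EmdfObject("clause", span(1, len(WORDS))))
    return objects


def contained_in(objects: List[EmdfObject], object_type: str,
                 container: EmdfObject) -> List[EmdfObject]:
    """Objects of a given type whose monads are a subset of the container's."""
    found = [
        o for o in objects
        if o.object_type == object_type and o.monads <= container.monads
    ]
    return sorted(found, key=lambda o: min(o.monads))


if __name__ == "__main__":
    objects = build_genesis_1_1()
    clause = next(o for o in objects if o.object_type == "clause")
    # Walk the hierarchy top-down: clause -> phrases -> words.
    for number, phrase in enumerate(contained_in(objects, "phrase", clause), 1):
        words = contained_in(objects, "word", phrase)
        print(number, sorted(phrase.monads),
              " ".join(w.features["lexeme"] for w in words))
```

Because every object is just a set of monads, containment is a subset test and no object needs to be nested inside another in the stored data. This is the same property that makes stand-off annotation suitable for overlapping hierarchies.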