
Using Prerequisites to Extract Concept Maps from Textbooks

2016, Proceedings of the 25th ACM International Conference on Information and Knowledge Management


Using Prerequisites to Extract Concept Maps from Textbooks

Shuting Wang, Alexander G. Ororbia II, Zhaohui Wu, Kyle Williams, Chen Liang, Bart Pursel, C. Lee Giles
Computer Science and Engineering; Information Sciences and Technology; Teaching and Learning with Technology
Pennsylvania State University, University Park, PA 16802, USA

CIKM'16, October 24-28, 2016, Indianapolis, IN, USA. © 2016 ACM. ISBN 978-1-4503-4073-1/16/10. DOI: http://dx.doi.org/10.1145/2983323.2983725

ABSTRACT

We present a framework for constructing a specific type of knowledge graph, a concept map, from textbooks. Using Wikipedia, we derive prerequisite relations among these concepts. A traditional approach for concept map extraction consists of two sub-problems: key concept extraction and concept relationship identification. Previous work for the most part had considered these two sub-problems independently. We propose a framework that jointly optimizes these sub-problems and investigates methods that identify concept relationships. Experiments on concept maps that are manually extracted in six educational areas (computer networks, macroeconomics, precalculus, databases, physics, and geometry) show that our model outperforms supervised learning baselines that solve the two sub-problems separately. Moreover, we observe that incorporating textbook information helps with concept map extraction.

Categories and Subject Descriptors
I.2.6 [Learning]: Knowledge acquisition; Concept learning; I.7.5 [Document and Text Processing]: Document Capture—Document Analysis; H.3.3 [Information Storage And Retrieval]: Information Search and Retrieval

Keywords
Open education; concept maps; textbooks; Web knowledge

1. INTRODUCTION

A knowledge graph organizes knowledge by linking entities with their relationships and is applicable to many NLP tasks such as question answering [43] and knowledge acquisition [10]. While recent work has addressed reasoning in knowledge graphs (DBpedia [1] and YAGO [33]) and for real-world facts [26], there has been little effort to organize knowledge for educational purposes, even though such knowledge structures are widely used in teaching and learning assessment [17]. There are many interesting challenges in extracting knowledge graphs for education. In some cases, nodes in an educational knowledge graph are scientific and mathematical concepts, such as "Lasso" and "Regularization", instead of typical entities such as individuals, locations, or organizations. As such, instead of using general concept relationships such as "is-a" and "part-of", we focus on the prerequisite dependencies among concepts. A prerequisite dependency requires that one concept be learned before the next; for instance, we need basic knowledge of "Regularization" in order to learn "Lasso" (L1-regularized regression).

We present a method for constructing a specific type of knowledge graph, a concept map, which is widely used in the learning sciences [41]. In such a directed graph, each node is a scientific concept and directed links between nodes encode their prerequisite dependencies. Figure 1 shows an example of an extracted concept map in economics, where each node is an economic concept such as "Gross domestic product" or "Consumer price index" and links indicate prerequisite dependencies relating these concepts (from prerequisites to subsequents).

[Figure 1: Example of an extracted concept map in economics; nodes include Investment, Consumption, Government Spending, Export, Gross Domestic Product, Unemployment, Unemployment Rate, Price Level, and Consumer Price Index.]

Traditional approaches to knowledge graph extraction generally consist of two separate steps: 1) extracting key concepts and 2) identifying relationships between key concepts. While these two common information extraction tasks have been well studied [7, 33, 2], solving them independently for educational content poses problems. We argue that the two problems are strongly coupled: the results of one affect the results of the other, so solving the sub-problems independently can lead to sub-optimal performance. For example, in educational resources a concept is often presented by first introducing its prerequisites, so the order in which two concepts appear in a document can help identify their prerequisite relation. If a concept in this ordered chain is not correctly extracted, its prerequisite relations to other concepts are lost; furthermore, if that concept is the prerequisite of many others, an important key concept may be missed entirely. Leveraging information from existing educational resources, we propose a concept map extraction model that jointly optimizes these two sub-problems: it uses identified prerequisites to refine the extracted key concepts and vice versa, producing a set of related key concepts together with the prerequisite relations among them. There are many educational resources from which one could build concept maps.
For this work, we focus on textbooks, since they often provide a comprehensive list of domain concepts and are used as major educational resources in schools, colleges, and universities. Educational resources such as textbooks and slides provide implicit knowledge structures for knowledge graph extraction; structural information such as the table of contents (TOC) of a textbook, for example, can be very useful for identifying concept relationships. We believe this method can be readily generalized to other educational resources with structured information, such as slides and courses. We then augment "inside-the-book" knowledge with web content (for now, Wikipedia), enriching the content of a specific book with complementary information. As described in Section 4.4, we empirically verify that such complementary resources can provide quality concept information at the secondary school and undergraduate levels. In summary, our contributions are:

• The first attempt, to the best of our knowledge, to use textbooks to extract concept maps with explicit prerequisite relationships among the concepts.

• A set of principled methods that utilize both Web knowledge (Wikipedia) and the rich structure of textbooks to identify prerequisite relationships among domain concepts.

• An optimization framework that jointly solves the two sub-problems of concept map extraction and linkage.

• The generation of datasets from books in six different educational domains to show how our methods work.

Related work is introduced in Section 2. The joint optimization model for concept map extraction is presented in Section 3. We discuss the data preparation and baseline models in Section 4 and the experimental results in Section 5. A case study on the subject of geometry is presented in Section 6, followed by conclusions and future work.

2. RELATED WORK

Early work on the problem of identifying knowledge graphs [6] inferred knowledge bases from a collection of noisy facts.
More recently, ontologies were used in the construction of knowledge graphs [13, 27]: [13] refined a knowledge base using relations and candidate facts found in an ontology, and, building on that work, a probabilistic soft modeling framework [27] was used to jointly infer entities, categories, and relations. Knowledge graph completion uses external sources (e.g., free text) to extract textual patterns for certain relationships [3, 23, 29, 12, 34, 45, 8, 37]. In this line of work, relational facts are considered among existing entities, with a focus mainly on completing and extending an existing knowledge base. Our work differs in that we consider scientific and mathematical concepts and the prerequisite dependencies between them.

Other related research is key phrase detection using Wikipedia. Early work [4] explored Wikipedia as a resource for detecting key phrases in open text and used a statistical disambiguation method that compares the lexical context around an ambiguous named entity with the content of candidate Wikipedia pages. Later work identified key phrases by considering the interdependence between Wikipedia candidates [9, 11, 19, 22, 15, 28] to obtain coherent key phrases for documents. Most closely related to ours, concepts have been extracted from textbooks [38] and the textbook structure was used to organize them, but without considering explicit relationships between concepts. Instead of solely extracting entities from documents, our work constructs a concept map with both key concepts and their relationships; moreover, our optimization model reinforces the mutual importance of key concept extraction and prerequisite relationship identification and jointly optimizes the two sub-problems.

For concept prerequisite inference, [35] utilized PageRank and random walk with restart scores.
The difference between reference links in any two Wikipedia articles [16] was also considered, and a learning-to-rank method [44] was used to construct a concept graph with prerequisite relationships between courses. Extracting concept maps with prerequisite relationships has also been studied in e-learning [36, 14, 39]. Concept maps were derived from ontologies [36] by translating instances and the interactions among them into concept relationships in a knowledge graph. Association rule mining [14] was applied to learners' test records to derive concept maps with prerequisite relations. [39] explored prerequisite relationships among concepts by looking at the topic coverage of each concept.

3. JOINT KNOWLEDGE GRAPH EXTRACTION FROM TEXTBOOKS

Here we introduce our notation and describe how we jointly extract key concepts and the prerequisite relations among them. We write c ∈ C for a concept, where C is a set of Wikipedia concepts, and s ∈ S for a subchapter of the textbook. The term "subchapter" refers to all headings in the TOC; for instance, both 1.1 and 1.1.1 are subchapters. A key concept of a subchapter is a concept that is not only mentioned but also discussed and studied in that subchapter. The input to our extractor is a digital book B with titles, chapter numbers, and contents for all of its chapters; each chapter contains one or more key concepts. The output is a concept map G, represented as a set of triples {(c1, c2, r) | c1, c2 ∈ C, r ∈ R}, where R = {0, 1} is the prerequisite relationship: r = 0 when c1 and c2 have no prerequisite relation, and r = 1 when c1 is c2's prerequisite. We use CS = {cs_ip ∈ {0, 1} | 1 ≤ i ≤ |C|, 1 ≤ p ≤ |S|} to indicate concept appearance in subchapters, where cs_ip = 1 when the i-th concept is a key concept of the p-th subchapter and cs_ip = 0 otherwise. Our goal is to optimize CS and R jointly in order to obtain a global concept map.

3.1 Concept Map Extraction

3.1.1 Key Concept Extraction

Intuitively, if concept c is a key concept of subchapter s, it should have the following properties. 1) Local relatedness: c should be strongly related to s; for instance, the concept and the subchapter share similar topics. 2) Global coherence (less redundancy): chapters do not all discuss the same concepts, so the information overlap between concepts selected in different chapters should be minimized. For instance, given a geometry textbook, if subchapter 2.1 covers "Triangle" in detail, subchapter 3.1 should not cover it in detail again. Note that this involves both concept-concept and concept-chapter relatedness; we denote all relatedness by one symmetric similarity function f(·, ·), which can take both concepts and chapters as arguments and whose definition is discussed in Section 3.2. Given f(·, ·), the following objective function is proposed to derive the concept-subchapter matrix CS from these properties:

P1(CS) = α1 Σ_{i=1..|C|} Σ_{p=1..|S|} cs_ip f(c_i, s_p)   (1)
       − α2 Σ_{i,j=1..|C|, i≠j} Σ_{p≠q} cs_ip cs_jq f(c_i, c_j),   (2)

where the αs are term weights. The first term (Equation 1) captures the local relatedness between candidates and subchapters and should be maximized so that we select candidates similar to the subchapter. The second term (Equation 2) reduces redundancy: the pairwise similarity between concepts selected in different subchapters measures the redundancy of the extracted concept map, and this value should be minimized.

3.1.2 Prerequisite Relationship

Identifying a prerequisite relation between two topically related concepts requires knowing whether one concept is basic while the other is advanced. We denote the complexity level of a concept as l(·) and discuss its definition later. Given l(·), we define the following optimization function for R:

P2(R) = α3 Σ_{i,j=1..|C|} r_ij f(c_i, c_j) + α4 Σ_{i,j=1..|C|} r_ij (l(c_i) − l(c_j)).   (3)

The first term corresponds to the Topically Related attribute and should be maximized. The second term measures the Complexity Level Difference between two concepts, and we also want this value to be maximized.

3.1.3 Joint Modeling

To reinforce the mutual benefit between the two sub-problems, we propose 5) Order coherence: concepts should not be discussed before their prerequisites are introduced, i.e., given a concept, its prerequisite concepts should be introduced before it and its subsequent concepts after it. The following function derives this mutual-benefit property:

P3(CS, R) = α5 Σ_{i,j=1..|C|} Σ_{p,q=1..|S|} I(p < q) cs_ip cs_jq r_ij,   (4)

where I(·) ∈ {1, −1} is an indicator function that returns 1 if the statement holds and −1 otherwise.

In summary, the global objective function

Λ(CS, R) = P1(CS) + P2(R) + P3(CS, R) − β1 ||CS||_1 − β2 ||R||_1

consists of P1 for key concept extraction, P2 for prerequisite relationship extraction, P3 for modeling their mutual benefit, and L1 regularization terms to control model complexity; Λ is maximized.

3.1.4 Optimization

We maximize Λ to obtain the optimal concept map by adopting a Metropolis–Hastings scheme that updates CS and R alternately. For each cs ∈ CS, we compute Λ at the current value cs and at the flipped value cs′, and accept the flip according to:

σ_CS(cs, cs′) = 1, if Λ(R^(n), CS^(n), cs′) ≥ Λ(R^(n), CS^(n), cs);
σ_CS(cs, cs′) = exp(−β (Λ(R^(n), CS^(n), cs) − Λ(R^(n), CS^(n), cs′))), otherwise.

Similarly, for each r ∈ R we perform updates according to:

σ_R(r, r′) = 1, if Λ(R^(n), CS^(n), r′) ≥ Λ(R^(n), CS^(n), r);
σ_R(r, r′) = exp(−β (Λ(R^(n), CS^(n), r) − Λ(R^(n), CS^(n), r′))), otherwise.
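The alternating flip-and-accept search of Section 3.1.4 can be sketched on a toy problem as follows. This is a minimal illustration, not the paper's implementation: the similarity scores, complexity levels, term weights, and the inverse temperature are all made-up stand-ins for the quantities defined in Section 3.2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: 4 concepts, 3 subchapters. All scores below are
# hypothetical stand-ins for f(.,.) and l(.).
f_cs = rng.random((4, 3))            # f(c_i, s_p): concept-subchapter relatedness
f_cc = rng.random((4, 4))
f_cc = (f_cc + f_cc.T) / 2           # f(c_i, c_j): symmetric concept relatedness
lvl = rng.random(4)                  # l(c_i): concept complexity level
a = [1.0, 0.5, 1.0, 1.0, 1.0]        # assumed term weights alpha_1..alpha_5
b1, b2, beta = 0.1, 0.1, 5.0         # L1 weights and inverse temperature

def objective(CS, R):
    """Lambda(CS, R) = P1 + P2 + P3 - L1 penalties, to be maximized."""
    p1 = a[0] * np.sum(CS * f_cs)
    red = sum(CS[i, p] * CS[j, q] * f_cc[i, j]           # redundancy term
              for i in range(4) for j in range(4) if i != j
              for p in range(3) for q in range(3) if p != q)
    p2 = a[2] * np.sum(R * f_cc) + a[3] * np.sum(R * (lvl[:, None] - lvl[None, :]))
    p3 = sum((1 if p < q else -1) * CS[i, p] * CS[j, q] * R[i, j]  # order coherence
             for i in range(4) for j in range(4)
             for p in range(3) for q in range(3))
    return p1 - a[1] * red + p2 + a[4] * p3 - b1 * CS.sum() - b2 * R.sum()

def mh_step(CS, R):
    """Flip one random entry of CS and one of R; accept each flip with
    probability 1 if it raises Lambda, else exp(-beta * drop)."""
    for M in (CS, R):
        idx = tuple(rng.integers(0, n) for n in M.shape)
        old = objective(CS, R)
        M[idx] = 1 - M[idx]
        new = objective(CS, R)
        if new < old and rng.random() >= np.exp(-beta * (old - new)):
            M[idx] = 1 - M[idx]      # reject the flip: revert

CS = np.zeros((4, 3), dtype=int)     # concept-subchapter matrix
R = np.zeros((4, 4), dtype=int)      # prerequisite matrix
best = start = objective(CS, R)
for _ in range(300):
    mh_step(CS, R)
    best = max(best, objective(CS, R))
```

Occasionally accepting objective-decreasing flips is what lets this search escape local optima that a greedy hill-climber would get stuck in.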
Two attributes determine whether a pair of concepts can have a prerequisite relationship. 3) Topically Related: if two concepts cover different topics, it is unlikely that they have a prerequisite relationship. 4) Complexity Level Difference: not all pairs of concepts with similar topics have prerequisite relationships; for example, "isosceles triangle" and "right angled triangle" cover similar topics but have no learning dependency. Thus, given two concepts, it is necessary to identify whether one concept is basic while the other is advanced.

3.2 Representation Schemes

We explore different schemes for representing book chapter and concept content, and from these representations derive measures for the concept/book-chapter similarity f(·, ·) and the concept complexity level l(·). If multiple measures are derived for the same attribute, we use an equally weighted sum of the measures as the value of that attribute.

3.2.1 Word Based Similarity

We represent each chapter by the words appearing in it, and each concept by a bag-of-words representation of the content of its Wikipedia page. Standard text preprocessing and weighting procedures are applied, including case folding, stop-word removal, and term frequency-inverse document frequency (TF-IDF). Based on this representation, we define the concept-chapter similarity function f(·, ·) (applied in Equation 1) as a combination of the following measures:

• Title match: This feature measures the relatedness between a Wikipedia candidate title and a chapter/concept title. Given a book chapter/concept title tb and a Wikipedia candidate title tw, TitleMatch(tb, tw) = 1 if tw occurs in tb or tw equals tb, and 0 otherwise.

• Content cosine similarity: The cosine similarity between the TF-IDF word vectors of the chapter/concept contents.

• Title Jaccard similarity: The Jaccard similarity between the chapter/concept titles.
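These word-based measures need nothing beyond tokenization and counting. A self-contained sketch over a toy corpus (a real pipeline would add stemming and stop-word removal):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF vectors (raw term count x idf) for a tiny corpus."""
    df = Counter()
    for d in docs:
        df.update(set(d.split()))
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d.split()).items()}
            for d in docs]

def cosine(u, v):
    """Cosine similarity of two sparse dict vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def title_match(tb, tw):
    """1 if Wikipedia candidate title tw equals or occurs in title tb."""
    return 1 if tw == tb or tw in tb else 0

# Hypothetical subchapter text vs. two Wikipedia concept pages
chapter = "the natural logarithm and properties of the logarithm function"
pages = ["logarithm base exponent logarithm function",
         "triangle angle side geometry"]
v_ch, v_log, v_tri = tfidf_vectors([chapter] + pages)
```

Here cosine(v_ch, v_log) exceeds cosine(v_ch, v_tri), mirroring how the content cosine feature prefers topically matching candidates.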
• Sustained periods in subchapters: The sustained period of a concept in a subchapter is the span from its first appearance to its last appearance. The longer the sustained period of a candidate concept in a subchapter, the more likely that the concept is important in that subchapter.

• Relational strength in textbook/Wikipedia: Relational strength RS(w_i, w_j) measures the semantic relatedness between two concepts using concept co-occurrence and within-sentence distance [5]:

RS(w_i, w_j) = log( (n_ij / max(n)) / (avg_d²_ij / max(avg_d²)) ),   i ≠ j,

where n_ij is the number of times concepts i and j co-occur within a sentence, and avg_d²_ij = (Σ_{m=1..n_ij} d²_m) / n_ij is the sum of the squared distances between the two keywords divided by the number of times they appear in the same sentence. If two concepts frequently appear close together within sentences, their relationship is stronger than that of other pairs.

• Concept co-occurrences: One additional measure for concept-concept similarity, obtained by counting the co-occurrences of two concepts within a sentence, from either a book chapter or a Wikipedia page. Together with the measures above, it is used for concept-concept similarity and applied in Equation 2 and the first term of Equation 3.

A further link-based measure of concept-concept similarity computes semantic relatedness from Wikipedia's cross-page links [42]:

sim(w_i, w_j) = 1 − (max(log |Q_i|, log |Q_j|) − log |Q_i ∩ Q_j|) / (log W_all − min(log |Q_i|, log |Q_j|)),

where Q_i is the set of Wikipedia concepts that link to w_i and W_all is the total number of concepts in Wikipedia.

We also derive the following measure for a Wikipedia concept's complexity level and use it in the second term of Equation 3:

• Number of in-links/out-links: The number of in-links/out-links on the concept's Wikipedia page.
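The link-based relatedness above depends only on in-link sets and the collection size. A small sketch with hypothetical link data (the zero-overlap guard is an implementation choice, not from the paper):

```python
import math

def link_similarity(q_i, q_j, n_wiki):
    """Wikipedia link-based semantic relatedness, following the normalized
    link-distance form above. q_i, q_j: sets of pages linking to each
    concept; n_wiki: total number of concepts in Wikipedia (W_all)."""
    common = len(q_i & q_j)
    if common == 0:
        return 0.0  # guard: no shared in-links, treat as unrelated
    li, lj = math.log(len(q_i)), math.log(len(q_j))
    dist = (max(li, lj) - math.log(common)) / (math.log(n_wiki) - min(li, lj))
    return 1.0 - dist

# Hypothetical in-link sets for two related concepts in a 10^6-page wiki
q_a = {1, 2, 3, 4, 5, 6, 7, 8}
q_b = {4, 5, 6, 7, 8, 9, 10, 11}
sim = link_similarity(q_a, q_b, 10**6)  # close to 1: heavy in-link overlap
```

Because the measure uses only set intersections and sizes, it scales to a full Wikipedia link graph with an inverted in-link index.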
• Supportive relationship in concept definition: A is likely to be B's prerequisite if A is used in B's definition. We take the first sentence of a concept's Wikipedia page as its definition and set Supportive(A, B) = 1 if A appears in B's definition. For instance, "Logarithm" is used to define "Natural logarithm", whose definition is "The natural logarithm of a number is its logarithm to the base e...", so Supportive(Logarithm, Natural logarithm) = 1.

• RefD: [16] defines a metric that measures the prerequisite relationship between two concepts using Wikipedia links: if most related concepts of A refer to B but few related concepts of B refer to A, then B is more likely to be a prerequisite of A:

RefD(A, B) = (Σ_{i=1..|W|} v(w_i, B) · u(w_i, A)) / (Σ_{i=1..|W|} u(w_i, A)) − (Σ_{i=1..|W|} v(w_i, A) · u(w_i, B)) / (Σ_{i=1..|W|} u(w_i, B)),

where W = {w_1, ..., w_|W|} is the concept space and |W| is its size; u(w_i, A) weights the importance of w_i to A; and v(w_i, A) is an indicator showing whether w_i has a Wikipedia link to A.

3.2.2 Word Embeddings

This method maps concepts from the vocabulary to vectors of real numbers in a low-dimensional space [21]. We use word2vec, which learns such vectors with a two-layer neural network from the contexts in which concepts appear. Concept similarity is defined as the cosine similarity of two concepts' embeddings and is used in Equation 2 and the first term of Equation 3.

3.2.4 Textbook Structure

The TOC of a textbook contains implicit prerequisite relationships between concepts, since textbooks usually introduce concepts according to their learning dependencies. We therefore define the TOC distance between two concepts as the distance between their subchapter numbers. This feature measures the complexity level difference between concepts and is applied in the second term of Equation 3. Given two concepts A and B, let a_i and b_i denote their chapter number arrays.
For example, if A is in chapter 1.1, then a_1 = 1 and a_2 = 1. We define the TOC distance in textbooks between A and B as

TOCdistance(a, b) = (a_i − b_i) / β^{i−1},

where i is the smallest index such that a_i ≠ b_i, and β is a pre-specified decay parameter that we empirically set to 2. For instance, given the concept "HTTP" from chapter 2.3.1 and "HTTP message body" from chapter 2.3.2, the TOC distance between them is 0.25, and "HTTP" could be "HTTP message body"'s prerequisite. Notice that a concept can serve as a key concept in multiple subchapters; the value of the TOC distance feature between two concepts is then the average TOC distance over all pairs of the TOC positions of the two concepts. This measure is used in the second term of Equation 3.

3.2.3 Wikipedia Anchors

Besides content information, the millions of cross-page links in Wikipedia are also useful for detecting concept relatedness and concept complexity levels. Given two concepts, we calculate the following measures as their similarity and use them in Equation 2 and the first term of Equation 3.

• Wikipedia link based Jaccard similarity: Given two concepts, this feature computes the Jaccard similarity of the in-links/out-links of their Wikipedia pages.

• Wikipedia link based semantic similarity: This feature computes the semantic relatedness of two concepts based on their Wikipedia links [42].

Local Features: As local features for concept extraction, we use the features defined in Section 3.2 that capture the relatedness between concepts and book subchapters, i.e., Title match, Content cosine similarity, Title Jaccard similarity, and Sustained periods in subchapters.

Global Features: Global features include two subsets: redundancy features and order coherence features.

Redundancy Features: This set of features measures the information overlap that a candidate c_i can possibly bring into the extracted concept set.
Given the i-th candidate in the p-th subchapter, we calculate the similarity between this candidate and the selected candidates in other subchapters as the value of its redundancy feature:

Red(cs_ip) = Σ_{j=1..|C|} Σ_{q≠p} cs_jq f(c_i, c_j),

where f(c_i, c_j) is the similarity between candidates c_i and c_j. Section 3.2 defines several semantic relatedness measurements, and all of them can be applied to calculate redundancy features.

Order Coherence Features: Besides the less-redundancy attribute, we also expect a consistent learning order among the concepts extracted from a book: given a concept in subchapter k, we expect all of its prerequisites to appear in subchapters before k and all of its subsequent concepts to appear in subchapters after k. For candidate c_i, we define the feature orderCorr to capture the global learning order of the extracted concepts:

orderCorr(c_i) = ( Σ_{j=1..|C|} Σ_{p,q=1..|S|} I(p < q) cs_ip cs_jq r_ij ) / ( Σ_{j=1..|C|} Σ_{p,q=1..|S|} cs_ip cs_jq |r_ij| ),

where I(·) ∈ {1, −1} is an indicator function that returns 1 if the statement holds and −1 otherwise. This feature computes the percentage of concepts that are appropriately ordered with respect to c_i's prerequisite relationships.

4. EXPERIMENT SETTINGS

4.1 Dataset

In order to build a test bed for concept map extraction, we manually construct concept maps using six widely used textbooks: computer networking¹, macroeconomics², precalculus³, databases⁴, physics⁵, and geometry⁶. To construct the final dataset, we first manually label key concepts: 1) extract all Wikipedia concepts that appear in each book chapter; 2) given a candidate concept c_i with title tw, select it as a key candidate of subchapter j if TitleMatch(tw, tb_j) = 1, where tb_j is the title of subchapter j, or if c_i is ranked within the top 30 among all candidates of subchapter j by the Content cosine similarity feature; 3) label the candidates as "key concept" or "not key concept" to obtain a set of key concepts for the area. Then, for each pair of key concepts A and B, we manually label "A is B's prerequisite", "B is A's prerequisite", or "no prerequisite relationship". Table 1 shows the characteristics of the dataset. For each area, three graduate students with the corresponding background knowledge were recruited to label the data, and we take a majority vote of the annotators to create the final labels. We achieve an average 79% correlation for the key concept labeling task and an average 83% correlation for the concept relationship labeling task.

4.2 Baseline - Key Concept Extraction

4.2.1 TextRank

TextRank is widely used for key sentence and keyphrase extraction [20]. The general procedure is to build a graph with candidate key concepts as vertices and the co-occurrence of two candidates within a sentence as the weight of the edge between them; the algorithm then iterates over the graph until convergence and sorts the vertices by their final scores to identify key concepts.

4.2.2 Wikify

Wikify detects significant Wikipedia concepts within unstructured text. We use the Wikipedia Miner developed in [22] to link book contents with Wikipedia concepts.

4.2.3 Supervised Key Concept Extraction (Supervised KCE)

Based on the local relatedness and global coherence attributes proposed in Section 3.2, we use the local features and the global (redundancy and order coherence) features described above for key concept learning from each subchapter. We use SVMrank to predict rankings of Wikipedia candidates for each subchapter, with the data from one book as testing data and the data from the other five as training data.

4.3 Baseline - Prerequisite Relationship Identification

4.3.1 Hyponym-Hypernym

A hyponym is a concept whose semantic field is included within that of another concept (its hypernym). In this work, we use hyponym-hypernym extraction as a baseline method for deriving prerequisite relationships.
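Pattern-based hyponym extraction of this kind can be sketched with regular expressions. The patterns below are illustrative stand-ins, not the exact ten adopted from [40], and the noun-phrase regex is a crude substitute for real NP chunking:

```python
import re

# Illustrative lexico-syntactic patterns (NP2 = prerequisite/hypernym,
# NP1 = subsequent/hyponym); group order records which side is which.
NP = r"([a-z][a-z ]*?)"
PATTERNS = [
    (re.compile(rf"{NP} such as {NP}\b"), (1, 2)),    # NP2 such as NP1
    (re.compile(rf"{NP} is an? {NP}\b"), (2, 1)),     # NP1 is (a|an) NP2
    (re.compile(rf"{NP} and other {NP}\b"), (2, 1)),  # NP1 and other NP2
]

def extract_pairs(sentence):
    """Return (prerequisite NP2, subsequent NP1) pairs for any pattern hit."""
    pairs = []
    for rx, (pre, sub) in PATTERNS:
        for m in rx.finditer(sentence.lower()):
            pairs.append((m.group(pre).strip(), m.group(sub).strip()))
    return pairs
```

For example, extract_pairs("regularization methods such as lasso") yields the pair ("regularization methods", "lasso"), treating the hypernym as the prerequisite.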
Lexico-syntactic pattern based extraction methods are popular for extracting hyponym relationships between concepts because they provide an effective way to process raw text. We adopt the 10 lexico-syntactic patterns selected for hyponymy-hypernymy pattern matching in [40], as shown in Table 2.

Table 2: Extracted lexico-syntactic patterns. NP1 represents a subsequent Noun Phrase (NP) and NP2 represents a prerequisite Noun Phrase (NP).

  NP2 such as NP1               NP2, includ(s|es|ing) NP1
  such NP2 as NP1               NP1, one of NP2
  NP1 is (a|an) NP2             NP2 consist(s) of NP1
  NP1 (is|are) called NP2       NP2 (like|, specially) NP1
  NP1 (and|or) other NP2        NP1 (in|belong to) NP2

1 Kurose, James F. (2005). Computer networking: a top-down approach featuring the Internet. Pearson Education India.
2 Mankiw, N. Gregory (2014). Principles of macroeconomics. Cengage Learning.
3 Stewart, James, Lothar Redlin, and Saleem Watson (2015). Precalculus: Mathematics for calculus. Cengage Learning.
4 Ramakrishnan, Raghu, and Johannes Gehrke (2000). Database management systems.
5 Mark Horner, Samuel Halliday, Sarah Blyth, Rory Adams, and Spencer Wheaton (2008). Textbooks for High School Students Studying the Sciences.
6 Dan Greenberg, Lori Jordan, Andrew Gloag, Victor Cifarelli, Jim Sconyers, and Bill Zahner. CK-12 Basic Geometry.

Table 1: Physical characteristics of the books. # labeled pairs is the number of candidate concept pairs labeled as to whether the two concepts have a prerequisite relationship; # pairs with relationships is the number of concept pairs with prerequisite relationships among all the labeled pairs.

  Domain        # subchapters   # key concepts   # candidate concepts   # labeled   # pairs with
                                per chapter      per subchapter         pairs       relationships
  Network        98             3.72             70.53                  1500        257
  Economics      37             4.54             84.16                   877        157
  Precalculus    23             3.09             70.21                  1171        222
  Geometry       48             3.16             55.89                  1305        186
  Database       90             1.41             49.17                   529         96
  Physics       152             3.54             68.94                  1517        208

4.3.2 Supervised Relationship Identification (Supervised RI)

For concept relationship extraction, we utilize the Topically Relatedness features and the Complexity Level Difference features introduced in Section 3.2 to identify concept prerequisite relationships. The Topically Relatedness measures include Title match, Content cosine similarity, Title Jaccard similarity, Wikipedia link based Jaccard similarity, Wikipedia link based semantic similarity, and Relational strength in textbook/Wikipedia. The Complexity Level Difference features include Supportive relationship in concept definition, RefD, Number of in-links/out-links, and TOC distance. We then perform binary classification using an SVM to identify prerequisite relationships, with five books as training data and one book as testing data.

4.4 Wikipedia Coverage of Concepts

Wikipedia has previously been utilized as a controlled vocabulary for topic indexing [18, 19] and key phrase extraction [24]. A few studies have examined Wikipedia's coverage of academic topics [25, 30, 31, 32]. Though some work showed that Wikipedia does not fully cover academic content at the frontier of science, previous studies [25, 32] have demonstrated that Wikipedia's coverage of topics is comprehensive for secondary school and undergraduate education. To further validate the coverage of the extracted concept maps, we conducted the following experiment. For each book, three graduate students with the corresponding background knowledge were recruited to manually extract all concepts from each subchapter (randomly sampled from the book) and to label whether these concepts have a corresponding Wikipedia page. We found that 88% of the concepts (Computer network: 85%, Macroeconomics: 86%, Precalculus: 91%, Geometry: 97%, Physics: 85%, Database: 89%) in the books are covered by Wikipedia, which provides some empirical evidence of reasonable coverage for the extracted concept maps.

4.5 Parameter Selection

As shown in Equation 2 and Equation 3, our concept maps are shaped by the parameters α = {α_i, i = 1, 2, 3, 4} and β = {β_j, j = 1, 2}, where the αs are the term weights in the optimization function and the βs are the weights of the L1-regularization. We test the different methods in a "leave one book out" manner, i.e., when testing on one book, we train our model using the other five books to select the optimal combination of parameters.

4.6 Model Initialization

• Concept-Subchapter Matrix Initialization: To initialize CS(·), we use the two features Title match and Content cosine similarity proposed in Section 3.2, which measure the local similarity between a candidate and a book chapter. We set cs_ij = 1, i.e., candidate c_i is a key concept in subchapter j, if TitleMatch(c_i, tb_j) = 1, where tb_j is the title of subchapter j, or if c_i is ranked within the top 5 based on the cosine similarity between chapter and concept contents.

• Concept Relationship Matrix Initialization: To initialize the concept relationship matrix R(·), given two concepts c_i and c_j, we set r_ij = 1 if their complexity level difference is higher than a threshold t1 and their topical relatedness is higher than a threshold t2. Empirically, t1 is set to the mean value of the overall complexity level difference and t2 to the mean value of the overall topical relatedness.

5. EXPERIMENTAL RESULTS

5.1 Effect of Textbook Information

In this section, we present how textbook structure helps concept map extraction. Figure 2 shows the ranking precision of key concept extraction on the six books. For the baseline methods presented, we needed to manually decide the number of key concepts in each subchapter; we thus report the performance of the top-1, top-3, and top-5 candidates from the concept extraction phase respectively. As shown, we test different combinations of features: local features derived from different aspects of the relatedness between a book subchapter and Wikipedia candidates, and global features that consider the global coherence of the book structure.

[Figure 2: Precision@n (n=1,3,5) for key concept extraction from six textbooks — (a) Computer network, (b) Macroeconomics, (c) Precalculus, (d) Geometry, (e) Database, (f) Physics. Local refers to the supervised learning model using the local features defined in Section 4.2.3 with the same experimental settings as Supervised KCE.]

The results show that incorporating our proposed global features (see "Supervised KCE" in Figure 2) into the extractor achieves significantly higher precision than the methods that do not consider book structure (TextRank, Wikify, and Local features). In Table 3, we present the F-1 score of concept relationship identification using the top-1, top-3, and top-5 candidates from the concept extraction phase respectively. The results show that both the features derived from Wikipedia and the textbook features achieve significantly higher F-1 scores than the hyponym-hypernym patterns do. Moreover, we observe that the textbook features outperform the Wikipedia features.
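The Precision@n and F-1 numbers reported in this section are standard ranking and classification metrics. A minimal sketch of how they can be computed (function names and the toy data are ours, not from the paper):

```python
def precision_at_n(ranked_candidates, gold_concepts, n):
    """Fraction of the top-n ranked candidates that are gold key concepts."""
    top = ranked_candidates[:n]
    return sum(1 for c in top if c in gold_concepts) / n

def f1_score(predicted_pairs, gold_pairs):
    """F-1 over predicted prerequisite pairs; each pair is (prereq, subsequent)."""
    predicted, gold = set(predicted_pairs), set(gold_pairs)
    tp = len(predicted & gold)          # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy example with hypothetical geometry concepts:
ranked = ["Angle", "Ray", "Polygon", "Line", "Area"]
gold = {"Angle", "Ray", "Line"}
print(precision_at_n(ranked, gold, 3))  # 2 of the top 3 are gold -> 0.666...
print(f1_score({("Ray", "Angle"), ("Line", "Angle")}, {("Ray", "Angle")}))
```

Averaging precision_at_n over all subchapters of a book yields one bar of the kind plotted in Figure 2.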
(Textbook features include the Concept co-occurrence in book chapters, Relational strength in book contents, and TOC distance measures; Wikipedia features include the Concept co-occurrence in Wiki pages, Content cosine similarity, RefD, Wikipedia link based semantic similarity, Relational strength in Wikipedia, and Supportive relationship in concept definition measures.) This is because textbooks are designed for educational purposes and provide a better knowledge structure than a web knowledge base does. Another potential reason is that the features that utilize TOC information (the TOC distance feature) consider not only the complexity level difference between two concepts but also their semantic relatedness: if two concepts are introduced in neighboring subchapters (say, subchapters 1.1 and 1.2), this feature reveals that the two concepts are likely to be semantically similar, since authors usually place related concepts in neighboring chapters.

Table 3: F-1 score for concept relationship prediction. Hyponym refers to the hyponym-hypernym baseline method. Textbook/Wikipedia are supervised learning methods using textbook/Wikipedia features with the same experimental settings as Supervised RI. ∗ indicates when the textbook features are statistically significantly better (p < 0.01) than the Wikipedia features.

  # candidates    Hyponym               Wiki                    Textbook
                  1      3      5       1      3      5         1      3      5
  Network         0.21   0.29   0.17    0.45   0.56∗  0.54      0.49   0.6∗   0.61
  Economics       0.32   0.42   0.38    0.55   0.56   0.5       0.55   0.57   0.59
  Precalculus     0.19   0.36   0.44    0.52   0.56   0.47      0.44   0.58   0.53
  Geometry        0.25   0.28   0.23    0.48   0.56∗  0.53      0.55   0.52   0.52
  Database        0.36   0.36   0.37    0.49   0.55   0.45      0.5    0.52   0.58
  Physics         0.3    0.41   0.44    0.5    0.55   0.58∗     0.6    0.57∗  0.62∗

5.2 Joint Optimization

Figure 3 shows the precision of key concept extraction for the baseline methods and the joint optimization model. The proposed optimization model often outperforms all others, the only exception being precision@1 on the database book, as a trade-off against the performance of concept relationship prediction shown in Table 4. Our joint optimization model consistently outperforms the strongest baseline in F1-score on all six textbooks. In addition, the proposed model can decide the number of key concepts in each subchapter automatically by optimizing the proposed objective function, while the baseline models depend on a manually chosen value.

[Figure 3: Precision@n (n=1,3,5) for key concept extraction from six textbooks with/without joint optimization — (a) Computer network, (b) Macroeconomics, (c) Precalculus, (d) Geometry, (e) Database, (f) Physics; methods compared: TextRank, Wikify, Supervised KCE, Joint Opt.]

Table 4: F-1 score for concept relationship prediction with/without joint optimization. ∗ indicates when the joint optimization model is statistically significantly (p < 0.01) better.

  # candidates      Supervised RI            Joint Opt
                    1      3      5
  Network           0.52   0.62   0.57       0.63
  Macroeconomics    0.55   0.67∗  0.63       0.7∗
  Precalculus       0.54   0.61   0.63∗      0.66∗
  Geometry          0.52   0.64   0.67∗      0.71
  Database          0.51   0.58   0.49       0.55
  Physics           0.58   0.65∗  0.62       0.69∗

5.3 Measurement Importance

In this section, we develop some insights into feature importance by reporting the performance of the concept extractor and the relationship identifier under different feature combinations.
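Several of the features above rely on the book's table of contents. The TOC distance intuition — concepts introduced in nearby numbered subchapters are likely related — can be sketched as follows (an illustrative metric with our own naming, not necessarily the paper's exact formula):

```python
def toc_distance(sec_a, sec_b):
    """Distance between two TOC section numbers such as '1.2' and '3.1'.

    Compares the numbering level by level, weighting coarser levels
    (chapters) more heavily than finer ones (subchapters), so concepts
    introduced in neighboring subchapters (e.g. '1.1' and '1.2') get a
    small distance. Illustrative only; the paper's definition may differ.
    """
    a = [int(x) for x in sec_a.split(".")]
    b = [int(x) for x in sec_b.split(".")]
    depth = max(len(a), len(b))
    a += [0] * (depth - len(a))  # pad so '1' and '1.1' are comparable
    b += [0] * (depth - len(b))
    return sum(abs(x - y) * 10 ** (depth - i - 1)
               for i, (x, y) in enumerate(zip(a, b)))

print(toc_distance("1.1", "1.2"))  # neighbors within a chapter -> 1
print(toc_distance("1.2", "3.1"))  # across chapters -> much larger
```

A small value of such a distance can then serve both as a relatedness signal and, via its sign or direction in the TOC, as a complexity-ordering signal.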
Table 5: Precision@n (n=1,3,5) for key concept extraction from six textbooks using different feature combinations. Title/Content features include the local and global features derived from title/content information as defined in Section 4.2.3.

  # candidates      Title features          Content similarity features
                    1      3      5         1      3      5
  Network           0.32   0.29   0.19      0.61   0.64   0.65
  Macroeconomics    0.43   0.39   0.32      0.72   0.65   0.54
  Precalculus       0.41   0.36   0.27      0.66   0.68   0.59
  Geometry          0.46   0.38   0.32      0.79   0.76   0.7
  Database          0.24   0.18   0.11      0.66   0.69   0.58
  Physics           0.3    0.22   0.16      0.68   0.62   0.59

Table 6: F1-score@n (n=1,3,5) for relationship prediction using different measures.

  # candidates      Topically Relatedness   Concept Complexity Level
                    1      3      5         1      3      5
  Network           0.41   0.49   0.46      0.56   0.58   0.5
  Macroeconomics    0.47   0.52   0.43      0.55   0.6    0.51
  Precalculus       0.53   0.58   0.61      0.55   0.6    0.62
  Geometry          0.42   0.45   0.53      0.49   0.53   0.56
  Database          0.52   0.59   0.56      0.53   0.59   0.61
  Physics           0.3    0.22   0.16      0.64   0.62   0.59

Table 5 shows the ranking precision of the measurements using title information and those using content information. As shown, the content similarity features outperform the title features: compared to subchapter titles, subchapter contents contain more information and achieve a much higher recall in key concept extraction. Table 6 shows the performance of relationship prediction using the Topically Relatedness features and the Complexity Level Difference features defined in Section 4.3.2 respectively. We observe that the complexity level difference features perform better than the topical relatedness features. This suggests that by capturing which concept is more basic, we can achieve better performance than by only considering the topical relatedness between concepts. We also observe that more fundamental subjects such as precalculus, geometry, and physics show better performance than advanced subjects such as computer networks and databases. A potential reason is that for those fundamental domains, different textbooks provide very similar learning and TOC structures, while for advanced subjects textbooks organize knowledge quite differently. However, this remains to be determined.

6. CASE STUDY

Here we present a case study on the concept maps extracted for geometry.

[Figure 4: Case study for geometry concept maps — (a) TextRank + Hyponym-hypernym; (b) Wikify + Hyponym-hypernym; (c) Supervised KCE + Supervised RI; (d) Joint optimization model. The nodes are geometry concepts such as Point, Line, Line Segment, Ray, Angle, Right Angle, Midpoint, Triangle, Right Triangles, Congruence, and Bisection; directed edges point from prerequisites to subsequents.]

From Figures 4c and 4d, we can observe that by considering both lexical similarity and semantic relatedness, the Supervised KCE + Supervised RI method (hereafter, the supervised learning method) and the joint optimization model achieve better performance in both concept extraction from book chapters and relationship identification than the TextRank + Hyponym-hypernym method (Figure 4a) and the Wikify + Hyponym-hypernym method (Figure 4b). Moreover, the supervised method and the joint optimization model consider book structure information, which reaffirms the effectiveness of textbook structure for key concept extraction and concept relationship identification. By capturing the mutual dependencies between the two subproblems, the joint optimization model achieves better prediction accuracy than the supervised learning method. For instance, the supervised learning method fails to extract "Ray" and the prerequisite dependency between "Ray" and "Angle", while the joint optimization model makes the correct prediction.
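A concept map like those in Figure 4 is a small directed graph, so a concept's full prerequisite chain can be read off with a standard backward traversal. A sketch over a hypothetical adjacency list with a few geometry concepts from the figure (not the exact extracted map):

```python
# Directed edges point from a prerequisite concept to its subsequent concept.
concept_map = {
    "Point": ["Line", "Line Segment"],
    "Line": ["Angle", "Line Segment"],
    "Ray": ["Angle"],
    "Line Segment": ["Midpoint", "Triangle"],
    "Angle": ["Triangle"],
    "Triangle": ["Congruence"],
}

def prerequisites_of(concept, graph):
    """All concepts that must be learned before `concept` (transitive closure)."""
    # Invert the edges, then walk backwards from `concept`.
    parents = {}
    for prereq, subsequents in graph.items():
        for s in subsequents:
            parents.setdefault(s, []).append(prereq)
    seen, stack = set(), list(parents.get(concept, []))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

print(sorted(prerequisites_of("Triangle", concept_map)))
# -> ['Angle', 'Line', 'Line Segment', 'Point', 'Ray']
```

Such a traversal is also a convenient way to check an extracted map for errors like the missing "Ray" -> "Angle" edge discussed here.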
A possible reason is that "Ray" is not extracted as a key concept by the supervised method because its content is not very similar to the content of the book chapter (compared to the other candidates in that chapter). However, the joint optimization model identifies that "Ray" is likely to be a prerequisite of "Angle", which is highly ranked as a key concept in some book chapters.

7. CONCLUSION AND FUTURE WORK

We describe measures that identify prerequisite relationships between concepts. We propose a joint optimization model for concept map extraction from textbooks that utilizes the mutual interdependency between key concept extraction and the identification of related concepts from knowledge structures in Wikipedia. Experiments on six concept maps created manually from six different textbooks show that the proposed method gives promising results. To our knowledge, this is the first work that utilizes the implicit prerequisite relationships embedded in textbook tables of contents (TOCs) for prerequisite relationship extraction. Future directions would be to construct concept maps from multiple books in the same area, or to use similar textbooks to refine each other's concept maps. Another direction would be to develop a semi-automatic method for building large-scale concept maps for education.

8. REFERENCES

[1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. DBpedia: A nucleus for a web of open data. Springer, 2007.
[2] M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. Open information extraction for the web. In IJCAI, volume 7, pages 2670–2676, 2007.
[3] A. Bordes, J. Weston, R. Collobert, and Y. Bengio. Learning structured embeddings of knowledge bases. In Conference on Artificial Intelligence, 2011.
[4] R. C. Bunescu and M. Pasca. Using encyclopedic knowledge for named entity disambiguation. In EACL, pages 9–16, 2006.
[5] N.-S. Chen, C.-W. Wei, H.-J. Chen, et al. Mining e-learning domain concept map from academic articles. Computers & Education, 50(3):1009–1021, 2008.
[6] W. W. Cohen, H. Kautz, and D. McAllester. Hardening soft information sources. In SIGKDD, pages 255–259. ACM, 2000.
[7] O. Etzioni, M. Cafarella, D. Downey, S. Kok, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates. Web-scale information extraction in KnowItAll (preliminary results). In WWW, pages 100–110. ACM, 2004.
[8] M. Fan, D. Zhao, Q. Zhou, Z. Liu, T. F. Zheng, and E. Y. Chang. Distant supervision for relation extraction with matrix completion. In ACL, pages 839–849, 2014.
[9] P. Ferragina and U. Scaiella. TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). In CIKM, pages 1625–1628, 2010.
[10] S. E. Gordon, K. A. Schmierer, and R. T. Gill. Conceptual graph analysis: Knowledge acquisition for instructional system design. Human Factors: The Journal of the Human Factors and Ergonomics Society, 35(3):459–481, 1993.
[11] X. Han and J. Zhao. Named entity disambiguation by leveraging Wikipedia semantic knowledge. In CIKM, pages 215–224. ACM, 2009.
[12] R. Hoffmann, C. Zhang, X. Ling, L. Zettlemoyer, and D. S. Weld. Knowledge-based weak supervision for information extraction of overlapping relations. In ACL, pages 541–550, 2011.
[13] S. Jiang, D. Lowd, and D. Dou. Learning to refine an automatically extracted knowledge base using Markov logic. In ICDM, pages 912–917. IEEE, 2012.
[14] R. Kavitha, A. Vijaya, and D. Saraswathi. An augmented prerequisite concept relation map design to improve adaptivity in e-learning. In PRIME, pages 8–13. IEEE, 2012.
[15] S. Kulkarni, A. Singh, G. Ramakrishnan, and S. Chakrabarti. Collective annotation of Wikipedia entities in web text. In SIGKDD, pages 457–466. ACM, 2009.
[16] C. Liang, Z. Wu, W. Huang, and C. L. Giles. Measuring prerequisite relations among concepts. In EMNLP, 2015.
[17] J. R. McClure, B. Sonak, and H. K. Suen. Concept map assessment of classroom learning: Reliability, validity, and logistical practicality. Journal of Research in Science Teaching, 36(4):475–492, 1999.
[18] O. Medelyan, E. Frank, and I. H. Witten. Human-competitive tagging using automatic keyphrase extraction. In EMNLP, pages 1318–1327, 2009.
[19] O. Medelyan, I. H. Witten, and D. Milne. Topic indexing with Wikipedia. In AAAI, pages 19–24, 2008.
[20] R. Mihalcea and P. Tarau. TextRank: Bringing order into texts. In ACL, 2004.
[21] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[22] D. Milne and I. H. Witten. Learning to link with Wikipedia. In CIKM, pages 509–518. ACM, 2008.
[23] M. Mintz, S. Bills, R. Snow, and D. Jurafsky. Distant supervision for relation extraction without labeled data. In ACL, pages 1003–1011, 2009.
[24] C. Q. Nguyen and T. T. Phan. An ontology-based approach for key phrase extraction. In ACL-IJCNLP, pages 181–184, 2009.
[25] C. Okoli, M. Mehdi, M. Mesgari, F. Å. Nielsen, and A. Lanamäki. Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership. JASIST, 65(12):2381–2403, 2014.
[26] J. Pujara, H. Miao, L. Getoor, and W. Cohen. Knowledge graph identification. In ISWC, pages 542–557, 2013.
[27] J. Pujara, H. Miao, L. Getoor, and W. Cohen. Large-scale knowledge graph identification using PSL. In AAAI, 2013.
[28] L. Ratinov, D. Roth, D. Downey, and M. Anderson. Local and global algorithms for disambiguation to Wikipedia. In ACL, pages 1375–1384, 2011.
[29] S. Riedel, L. Yao, and A. McCallum. Modeling relations and their mentions without labeled text. In ECML PKDD, pages 148–163. Springer, 2010.
[30] E. K. Rush and S. J. Tracy. Wikipedia as public scholarship: Communicating our impact online. Journal of Applied Communication Research, 38(3):309–315, 2010.
[31] A. Samoilenko and T. Yasseri. The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics. EPJ Data Science, 3(1):1–11, 2014.
[32] N. J. Schweitzer. Wikipedia and psychology: Coverage of concepts and its use by undergraduate students. Teaching of Psychology, 35(2):81–85, 2008.
[33] F. M. Suchanek, G. Kasneci, and G. Weikum. YAGO: a core of semantic knowledge. In WWW, pages 697–706. ACM, 2007.
[34] M. Surdeanu, J. Tibshirani, R. Nallapati, and C. D. Manning. Multi-instance multi-label learning for relation extraction. In EMNLP, pages 455–465, 2012.
[35] P. P. Talukdar and W. W. Cohen. Crowdsourced comprehension: predicting prerequisite structure in Wikipedia. In Building Educational Applications Using NLP, pages 307–315. Association for Computational Linguistics, 2012.
[36] S.-S. Tseng, P.-C. Sue, J.-M. Su, J.-F. Weng, and W.-N. Tsai. A new approach for constructing the concept map. Computers & Education, 49(3):691–707, 2007.
[37] Q. Wang, B. Wang, and L. Guo. Knowledge base completion using embeddings and rules. In IJCAI, 2015.
[38] S. Wang, C. Liang, Z. Wu, K. Williams, B. Pursel, B. Brautigam, S. Saul, H. Williams, K. Bowen, and C. L. Giles. Concept hierarchy extraction from textbooks. In DocEng, 2015.
[39] S. Wang and L. Liu. Prerequisite concept maps extraction for automatic assessment. In WWW, pages 519–521, 2016.
[40] B. Wei, J. Liu, J. Ma, Q. Zheng, W. Zhang, and B. Feng. Motif-RE: motif-based hypernym/hyponym relation extraction from Wikipedia links. In NIPS, pages 610–619. Springer, 2012.
[41] R. T. White. Learning Science. Basil Blackwell, 1988.
[42] I. Witten and D. Milne. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In AAAI, pages 25–30, 2008.
[43] S. Yang, F. Han, Y. Wu, and X. Yan. Fast top-k search in knowledge graphs. In ICDE, 2016.
[44] Y. Yang, H. Liu, J. Carbonell, and W. Ma. Concept graph learning from educational data. In WSDM, pages 159–168, 2015.
[45] X. Zhang, J. Zhang, J. Zeng, J. Yan, Z. Chen, and Z. Sui. Towards accurate distant supervision for relational facts extraction. In ACL, pages 810–815, 2013.