
ARCA. Semantic exploration of a bookstore

2020, Proceedings of the International Conference on Advanced Visual Interfaces

In this demo paper, we present ARCA, a visual-search-based system that enables the semantic exploration of a bookstore. By navigating a domain-specific knowledge graph, students and researchers alike can start from any specific concept and reach any other related concept, discovering associated books and information. To achieve this paradigm of interaction, we built a prototype system, flexible and adaptable to multiple contexts of use, that extracts semantic information from the contents of a corpus of books and builds a dedicated knowledge graph linked to external knowledge bases. The web-based user interface of ARCA integrates text-based search, visual knowledge graph navigation, and a linear visualization of filtered books (ordered according to multiple criteria) in a comprehensive coordinated view aimed at exploiting the underlying data while avoiding information overload and unnecessary clutter. A proof-of-concept of ARCA is available online at http://arca.diag.uniroma1.it

CCS CONCEPTS
• Human-centered computing → Web-based interaction; • Information systems → Digital libraries and archives; Search interfaces.

Eleonora Bernasconi, DIAG, Sapienza Università di Roma, Rome, Italy
Miguel Ceriani, Dipartimento di Informatica, Università di Bari Aldo Moro, Bari, Italy
Massimo Mecella, DIAG, Sapienza Università di Roma, Rome, Italy
Tiziana Catarci, DIAG, Sapienza Università di Roma, Rome, Italy
Maria Cristina Capanna, DSA, Sapienza Università di Roma, Rome, Italy
Clara Di Fazio, DSA, Sapienza Università di Roma, Rome, Italy
Roberto Marcucci, L’Erma di Bretschneider, Rome, Italy
Erik Pender, L’Erma di Bretschneider, Rome, Italy
Fabio Maria Petriccione, TSP - Tecnologie e Servizi Professionali s.r.l., Rome, Italy
KEYWORDS
knowledge graph, books’ catalog, visual search interface

ACM Reference Format:
Eleonora Bernasconi, Miguel Ceriani, Massimo Mecella, Tiziana Catarci, Maria Cristina Capanna, Clara Di Fazio, Roberto Marcucci, Erik Pender, and Fabio Maria Petriccione. 2020. ARCA. Semantic exploration of a bookstore. In International Conference on Advanced Visual Interfaces (AVI ’20), September 28-October 2, 2020, Salerno, Italy. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3399715.3399939

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
AVI ’20, September 28-October 2, 2020, Salerno, Italy
© 2020 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-7535-1/20/09.
https://doi.org/10.1145/3399715.3399939

1 INTRODUCTION
Presenting the results of a search over the content of books can be very hard if the data are not organized and shown in a meaningful way. Digital libraries and publishing companies need to make their book catalogs easier to navigate in order to draw more readers to them. Hence there is a need to attach semantics to unstructured information such as the text of the books, which is possible with Named Entity Recognition (NER), Named Entity Linking (NEL) and Knowledge Graph (KG) techniques. In this way, the static pieces of a book’s text become interactive, explorable and interconnected concepts. Some of the authors previously designed a pipeline for the semantic enrichment and visualization of a text corpus [3].
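The chain of techniques named above (NER, then NEL, then a KG) can be illustrated with a toy end-to-end sketch. It is not the ARCA pipeline of [3]; it is a minimal illustration under stated assumptions: a small hard-coded gazetteer (the hypothetical `GAZETTEER`) stands in for a trained NER model, a hand-written lookup table (the hypothetical `LINKS`) stands in for an entity-linking service targeting DBpedia, and the knowledge graph is reduced to a set of (subject, predicate, object) triples recording which book mentions which entity and which entities co-occur in a sentence. All function names, URIs, predicates and the sample text are illustrative.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer standing in for a trained NER model (surface form -> type).
GAZETTEER = {
    "Augustus": "PERSON",
    "Rome": "PLACE",
    "Forum Romanum": "PLACE",
    "Ara Pacis": "MONUMENT",
}

# Hypothetical linking table standing in for an NEL service (surface form -> DBpedia URI).
LINKS = {
    "Augustus": "http://dbpedia.org/resource/Augustus",
    "Rome": "http://dbpedia.org/resource/Rome",
    "Ara Pacis": "http://dbpedia.org/resource/Ara_Pacis",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def recognize(sentence):
    """NER stand-in: return (mention, type) pairs found as whole words in the sentence."""
    return [
        (name, label)
        for name, label in GAZETTEER.items()
        if re.search(r"\b" + re.escape(name) + r"\b", sentence)
    ]


def link(mentions):
    """NEL stand-in: map each mention to a URI; unknown mentions get a local id."""
    return [LINKS.get(name, "local:" + name.replace(" ", "_")) for name, _ in mentions]


def build_graph(book_id, text):
    """KG stand-in: 'mentions' triples per book plus sentence-level co-occurrence edges."""
    triples = set()
    mention_counts = Counter()
    for sentence in split_sentences(text):
        entities = sorted(set(link(recognize(sentence))))
        mention_counts.update(entities)  # counts sentences mentioning each entity
        for uri in entities:
            triples.add((book_id, "mentions", uri))
        for a, b in combinations(entities, 2):
            triples.add((a, "co_occurs_with", b))
    return triples, mention_counts


text = (
    "Augustus restored many temples in Rome. "
    "The Ara Pacis was dedicated to Augustus. "
    "Trade flowed through the Forum Romanum."
)
triples, counts = build_graph("book:1", text)
for triple in sorted(triples):
    print(triple)
```

Swapping the gazetteer for a statistical NER model and the table for a real linking service would leave the shape of the chain unchanged: sentences in, linked entities out, and a graph of book-to-entity and entity-to-entity edges that an interface can then navigate.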
After the text is extracted from a book and split into sentences, NER discovers the concepts present in each sentence, such as names of people, organizations and places; NEL disambiguates the identified concepts and establishes the relationships among them; finally, the KG makes the entities and their connections explorable through a search interface. The pipeline has been implemented in a tool called ARCA; in this demo paper, we focus on its visual user interface. ARCA was tested on a selected catalog of 110 books from L’Erma di Bretschneider, a publishing company in the field of archaeology and ancient Roman history. To visualize the knowledge graph, ARCA uses the free visualization tool Ontodia [6]. The other components, such as the books’ catalog component, were developed with the React framework (https://reactjs.org/). Development took place in parallel with an evaluation carried out by a small focus group of six researchers in archaeology and history. This helped to refine details and features of the interface that improved its usefulness and usability, such as the snippet component, which shows fragments of text that contain the selected concept.

2 RELATED WORK
The strength and innovation of ARCA lie in its flexibility and adaptability to any type of textual content. Other interfaces for the visualization and exploration of knowledge graphs have been investigated, but most of the analyzed tools are either monolithic systems, like Yewno Discover [1], or not-ready-to-use systems, like Apache Stanbol (https://stanbol.apache.org) or the GLOBDEF system [7], which are sets of components offering various services for semantic enrichment and metadata management but need to be integrated and composed before they become usable.

3 ARCA INTERFACE
In the following, we briefly outline the main ARCA interface, referring to Figure 1. There are two main expected types of users: researchers (mainly in the humanities) who want to discover and connect multifarious information to find the books that can help them in their research activities, and lay users who are interested in discovering new things. Before starting to search and explore the contents, users can view the legend (a), the tutorials (b) and help pages, which explain how to perform the activities that ARCA supports.

Figure 1: The ARCA visual search interface

In Figure 1, entities appear in four different colors: green, light green, red and blue. The green entities (h) represent the concepts extracted from the books’ corpus; the light green entities (i) represent the words that NER identifies as entities in a sentence; the red entities (g) represent the books of the catalog (e.g., those of L’Erma); and the blue entities (f) are concepts present in the DBpedia knowledge base [4]. Starting from the search for a concept, users can observe the results and choose the topic to explore by dragging and dropping it onto the central board.

Only green entities (h) are connected with the books’ catalog, which is accessible from the book button (c) at the top of the screen. The books’ catalog (e) shows the results of a query that selects only the books for which the selected concept is a "concept" or a "top concept": a "concept" is an extracted entity that appears more than 20 times in the book, while the "top concepts" are the 10 most frequent entities extracted from the book. The books can be ordered by relevance (concept/top concept) or by year of publication. To view more information about a book, users can select the info button (d) at the bottom right of the book table and read the sentences of the book that contain the selected entities, referred to as "snippets". Furthermore, books can be dragged from the right panel to the central board to view them as nodes in the knowledge graph, revealing all the related concepts as connections.

4 CONCLUSIONS AND FUTURE WORK
This demo illustrates the ARCA interface, a concrete tool for enhancing access to a catalog of books through knowledge graph-based exploration. ARCA allows users to discover new things in the books’ catalog through semantic search and the free exploration of content and connections. In future developments, the authors envision new components to further enable multi-faceted content exploration, such as a storytelling component that connects geographic places and metadata to maps [2] and software that extracts images and captions from books [5] and integrates them into the ARCA knowledge graph.

ACKNOWLEDGMENTS
This research has been partly supported by project ARCA (POR FESR Lazio 2014–2020, public call “Creatività 2020”, application prot. no. A0128-2017-17189).

REFERENCES
[1] Manisha Bolina. 2019. Yewno Discover. Nordic Journal of Information Literacy in Higher Education 11, 1 (2019).
[2] Cecilia Cadenas. 2014. Geovisualization: Integration and Visualization of Multiple Datasets Using Mapbox. (June 2014).
[3] Miguel Ceriani, Eleonora Bernasconi, and Massimo Mecella. 2020. A Streamlined Pipeline to Enable the Semantic Exploration of a Bookstore. In Digital Libraries: The Era of Big Data and Data Science, Michelangelo Ceci, Stefano Ferilli, and Antonella Poggi (Eds.). Springer International Publishing, Cham, 75–81.
[4] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, et al. 2014. DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web Journal 5 (2014), 1–29.
[5] Pengyuan Li, Xiangying Jiang, and Hagit Shatkay. 2018. Extracting Figures and Captions from Scientific Publications. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy) (CIKM ’18). Association for Computing Machinery, New York, NY, USA, 1595–1598. https://doi.org/10.1145/3269206.3269265
[6] Dmitry Mouromtsev, Dmitry Pavlov, Yury Emelyanov, Alexey Morozov, Daniil Razdyakonov, and Mikhail Galkin. 2015. The simple, Web-based tool for visualization and sharing of semantic data and ontologies.
[7] Maria Nisheva-Pavlova and Asen Alexandrov. 2018. GLOBDEF: A Framework for Dynamic Pipelines of Semantic Data Enrichment Tools. In Proc. of MTSR 2018. Springer, 159–168.