
PUTTING HISTORICAL DATA IN CONTEXT: HOW TO USE DSPACE-GLAM

We will talk about…
1. Theoretical and methodological foundations of the DSpace-GLAM project
2. Managing digital objects with DSpace
3. Extending the DSpace data model with DSpace-GLAM
4. Integrating DSpace and DSpace-GLAM entities
5. Accessing and sharing digital cultural resources with add-ons
6. Dataset analysis with CKAN
7. Conclusions

The BIG DATA age
• For several years the term "Big Data" has been bursting into the world of Information Technology,
• promising potential related to a new generation of technologies and architectures able to extract value from the enormous amount of data continuously produced in the most diverse fields

A new scientific paradigm?
In the science domain "Big Data" is seen as an even bigger opportunity
The "data deluge" will make obsolete some of the fundamental concepts on which the scientific method has been based so far
No more theories? No more hypotheses? No more models? Numbers speak for themselves?
Certainly new opportunities…
• Being able to manipulate and analyze massive amounts of data represents important progress for science
• It won't abolish the need to build, refine and verify theories
• It will make it possible to formulate hypotheses and test them infinitely more rapidly and on an infinitely larger sample than in the past
Source: http://bouache.com/blog/big-data/

…also for humanities
No data deluge, but… a growing amount of data
• Databases
• Electronic journals
• Digitization
• Tools for data extraction
• …
A variety of multidisciplinary data are related to Cultural Heritage and History
Different in: typology, format, structure, scale

More and more complexity
In the humanities most of the data are created or collected by people (not measured by instruments)
They are affected by individuals, place, time
They are fragmentary, partial, biased
Source: http://www.asianscientist.com/2016/07/print/body-as-a-source-of-big-data/

Putting data in context
Digital Cultural Data have to be analyzed together with all the contextual information, digital and non-digital, needed to answer research questions, such as:
• the (cultural, social, economic, technological…) production context of a document/monument
• the formation processes of an archaeological record
• contextual associations at different levels and scales (according to the different dimensions of variation)
Source: https://ddd.uab.cat/pub/expbib/2006/terradefoc/10.pdf

A Digital Humanities approach is fundamental…
Dimensions: technological, cultural, environmental, social, economic
Such an approach, with its focus on relationships, can help in identifying the important dimensions of variation (the CONTEXT)
It can help in analyzing primary sources as evidence of a network of heterogeneous systems, which can be studied through them by means of a global (holistic) and multidimensional analysis
Source: Hodder I. 2016, Studies in Human-Thing Entanglement, p. 28

…within a Digital Library Management System
To move such an approach from theory to practice we need infrastructures and tools for integration, analysis and storage of digital data and resources.
Today most cultural digital resources and data are in Digital Libraries or Repositories.
It is Digital Libraries and Repositories that must provide tools for:
• modeling, visualising and analysing information, both qualitatively and quantitatively, as well as working on it collaboratively
• highlighting the relationships between data at different scales
• explaining interpretations about the important dimensions of variation and about the network of contextual relations in which historical sources are involved
Only then can they enter the daily workflow of historians, archaeologists and humanities scholars.

Why DSpace?
To achieve the outlined goals and build a state-of-the-art Digital Library Management System, open source software is preferable.
Building on open source software is an effective way to create Digital Library Management Systems with a small financial investment.
Looking specifically at sustainability, among the most widely used open source Digital Library Management Systems, we chose DSpace.
Out of the box, DSpace allows you to:
• capture and describe digital material using a submission workflow module, or a variety of batch ingest options
• distribute digital assets over the web through a search and retrieval system
• preserve digital assets over the long term
The system is based on the specifications of the OAIS (Open Archival Information System) for long-term preservation and is able to manage the whole "life-cycle" of a digital object in terms of "Digital Curation", by means of:
• metadata creation according to different standards
• SIP (Submission Information Package) import and validation
• AIP (Archival Information Package) creation
• AIP export
• storage management
• dissemination of digital resources (also by means of OAI-PMH)
• digital object history management and integrity check

There are over 2200 digital repositories and libraries worldwide using the DSpace application for a variety of digital archiving and dissemination needs.
DSpace is often used as an institutional repository to provide access to research outputs, scholarly publications, library collections, educational material and more.
It is also used as a digital library to store, preserve and disseminate digital cultural heritage.
A fairly large part of the world's cultural and scientific heritage is already managed, accessed and preserved using DSpace
It makes sense to enhance a system already widely used rather than propose migrating data to new platforms

DSpace Data Model

Communities & Collections
• Communities and collections are entities useful to aggregate DSpace items by:
  • Provenance and responsibility >>> Communities
  • Metadata, workflow, curation >>> Collections
• They both aggregate the items but they are conceptually different things!
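The distinction between the two containers can be sketched as a small data structure. This is only an illustrative Python model, not DSpace's actual classes: the names Community, Collection, Item and all_items, as well as the sample values, are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Item:
    """A digital object described by flat metadata (field name -> values)."""
    handle: str
    metadata: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Collection:
    """Groups items that share metadata forms, workflow and curation."""
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    """Groups collections by provenance and responsibility."""
    name: str
    collections: List[Collection] = field(default_factory=list)

    def all_items(self) -> Iterator[Item]:
        """Yield every item reachable through this community's collections."""
        for collection in self.collections:
            yield from collection.items


# Hypothetical sample data, for illustration only.
finds = Collection(name="Archaeological finds")
finds.items.append(
    Item(handle="1234/22", metadata={"dc.title": ["Pottery distribution"]})
)
museum = Community(name="Museum", collections=[finds])

titles = [item.metadata["dc.title"][0] for item in museum.all_items()]
print(titles)
```

In this sketch an item belongs to a collection and is reached from a community only through that collection, which mirrors the idea that both levels aggregate items while answering different questions (who is responsible versus how the items are described and curated).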
Communities
Create your Community

Collections
Create your Collection

Workflow

Curating items

User management
E-People and Groups are the way DSpace identifies application users for the purpose of granting privileges

DSpace metadata
Out of the box DSpace can support multiple flat metadata schemas
You can configure multiple schemas by means of the Metadata Schema Registry and select metadata fields from a mix of configured schemas to describe your items
Communities and collections have some simple descriptive metadata (a name, and some descriptive prose)

The submission process

Defining the submission form
• Configure the submission form by means of the input-form.xml file
• You can configure different forms for different collections
• You can create internal vocabularies for the input fields

input-form.xml: field elements
• dc-schema (Required): Name of the metadata schema employed, e.g. dc for Dublin Core. This value must match the value of the schema element defined in the Metadata Schema Registry
• dc-element (Required): Name of the element
• dc-qualifier: Qualifier of the element entered, e.g. when the field is contributor.advisor the value of this element would be advisor. Leaving this out means the input is for an unqualified element.
• repeatable: Value is true when multiple values of this field are allowed, false otherwise. When you mark a field repeatable, the UI servlet will add a control to let the user ask for more fields to enter additional values.
• label (Required): Text to display as the label of this field, describing what to enter, e.g. "Your Advisor's Name".
• input-type (Required): Defines the kind of interactive widget to put in the form to collect the Dublin Core value.
• hint (Required): Content is the text that will appear as a "hint", or instructions, next to the input fields. Can be left empty, but it must be present.
• required: When this element is included with any content, it marks the field as a required input. If the user tries to leave the page without entering a value for this field, that text is displayed as a warning message. For example, <required>You must enter a title.</required> Note that leaving the required element empty will not mark a field as required, e.g.: <required></required>

input-form.xml: dropdown menus
To create an internal flat vocabulary you have to:
• use the «dropdown», «qualdrop_value» or «list» value within the <input-type> element
• populate the <value-pairs> element

Hierarchical Taxonomies and Controlled Vocabularies
DSpace also offers a way of structuring and managing more complex, hierarchical controlled vocabularies
• Managed in a separate file
• Taxonomies are described in XML
• Vocabularies are invoked from input-form.xml, using the <vocabulary> tag within the related <field>

Batch submission process
Requires the creation of a DSpace Simple Archive:
• A directory for each item to import, containing:
  • the files that make up the item
  • an XML file where each metadata element has its own entry within a <dcvalue> tagset. There are currently three tag attributes available in the <dcvalue> tagset:
    • element: the Dublin Core element
    • qualifier: the element's qualifier
    • language: (optional) ISO language code for the element
  • a contents file, with the enumeration of the files
  • an (optional) collection file with the information about the collection(s) the item belongs to

    <dublin_core>
      <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
      <dcvalue element="date" qualifier="issued">1990</dcvalue>
      <dcvalue element="title" qualifier="alternative" language="fr">J'aime les Printemps</dcvalue>
    </dublin_core>

UI Batch Import
You have to:
• Compress the item directories into a zip file
• Place the zip file at a publicly accessible URL, such as Dropbox or Google Drive or wherever you have access to do so
• Then log in as Administrator and fill in the form

Batch metadata editing
DSpace provides a batch metadata editing tool.
The batch editing tool lets the user perform the following:
• Batch editing of metadata by means of a comma-delimited file in CSV format
• Batch additions of metadata (e.g. add an abstract to a set of items)
• Batch find and replace of metadata values (e.g. correct a misspelled surname across several records)
• Mass move of items between collections
• Mass deletion, withdrawal, or re-instatement of items
• Batch addition of new items (without bitstreams) via a CSV file
• Re-ordering of the values in a list (e.g. authors)

Extending DSpace
Cultural Institutions in the «Big Data Age» ask for:
• Complex and multidimensional metadata structures
• Complex data models
• Relationships management between different entities
• Tools for digital data and resources visualization, analysis and interpretation
Why not use an extended version of DSpace to meet these relevant needs?

DSpace-GLAM (Galleries, Libraries, Archives, Museums)
• Built by 4Science on top of DSpace to meet the needs of Cultural Heritage institutions
• Flexible and extensible data model inherited from DSpace-CRIS (our RIMS) to manage relevant metadata standards and specific conceptual models
• With dedicated add-ons for curating, accessing and sharing digital objects
• Also an add-on for datasets visualization and analysis
• DSpace-GLAM is free, open source, compliant with open standards
• Add-ons are mainly distributed following a new business model (crowdsourcing)
• Provides institutions with a sustainable and effective tool to manage and analyze Cultural Heritage Information

Weakness of DSpace metadata management
• Flat metadata model
• Weak support for technical and structural metadata
• All information is stored as strings at the database level, with minimal support (and validation) for data entry in the UI
• DSpace-GLAM improves the metadata at the item level providing:
  - Additional input types for data entry (number, year and regex validation)
  - Partial support for nested metadata
  - Support for technical and structural metadata

DSpace-GLAM: interoperability
• Connect to VIAF records and Getty Vocabularies for precise identification of persons, artists and places
• It has been reported to work nicely with «plain» DSpace, with the authority implementation. The plan is to include it out of the box in DSpace 7

Extending the DSpace Data Model
DSpace-GLAM can manage all the entities important to contextualize digital cultural heritage:
• Persons
• Families
• Fonds
• Events
• Places
• Concepts
• …
Entities can be created to integrate different metadata standards and conceptual models

Extending the Data Model: pre-defined entities
• Persons
• Projects
• Organizations
are pre-defined entities inherited from DSpace-CRIS
… but you are not required to use all of them.
• you can define additional entities
• you can define your own relationships between entities, including the ones that you have defined

Defining other entities

Entities components
• Tabs
• Boxes
• Fields

DSpace-GLAM visibility and security (data model configuration)
• Each DSpace-GLAM entity instance has a status flag
  • Public: the details page is visible to anyone and it will be linked where appropriate. The record is included in public search results
  • Private: only administrators can access the details page. The entity is indexed only for use as an authority entry
• Each property/attribute value has an edit mode:
  • Editable
  • Visibility flag only
  • Only Administrators
  • Read only
• A field becomes visible when included in a publicly visible tab/box
• Visibility of a tab or box can be restricted to:
  • System administrators
  • Only RP owner
  • Admins and RP owner
  • Specific users and groups related to the entity instance
• To restrict the visibility of a box or tab to specific groups or users, one or more properties must be indicated containing the users and/or groups that have access to the protected box/tab

Data model configuration
• It can be performed via the UI and exported to XLS
• It can be imported from XLS files

Creating entities relationships

Creating inverse relationships between entities
DSpace-CRIS can use the SOLR indexes to reverse a relation
• Documents are linked to the person
• But you can also list the documents under a specific person
Relations are defined in the Spring configuration file cris-relationpreference.xml and characterized by:
• A name
• The target entity (a CRIS Entity or a DSpace Item)
• The SOLR query, with {0}, {1} placeholders to be replaced with the CRIS-ID or the uuid of the source CRIS instance

Creating inverse relationships between entities (cris-relationpreference.xml)
The slide labels three parts of this bean: the name (relationName), the target entity (relationClass and type) and the Solr query (query).

    <bean id="relationINTERPRETATIONVSEVENTSConfiguration"
          class="org.dspace.app.cris.configuration.RelationConfiguration">
      <property name="relationName" value="crisinterpretation.events" />
      <property name="relationClass" value="org.dspace.app.cris.model.ResearchObject" />
      <property name="type" value="crisevents" />
      <property name="query">
        <value>crisevents.eventsrelatedinterpretation_authority:{0}</value>
      </property>
    </bean>

Inverse relations can be
• Visualized
• Used to show aggregated statistics
To be visualized, relations are embedded in components (see cris-components.xml)

Creating inverse relationships between entities (cris-components.xml)
Each entry key is the name of the related box for visualizing data.

    <!-- Dynamic object component -->
    <bean id="doComponentsService"
          class="org.dspace.app.cris.integration.CrisComponentsService">
      <property name="components">
        <map>
          <entry key="journalspublications" value-ref="publicationlistforjournals" />
          <entry key="eventsdocuments" value-ref="publicationlistforevents" />
          <entry key="placesevents" value-ref="eventlistforplaces" />
          <entry key="eventsperson" value-ref="personlistforevents" />
          <entry key="fondschild" value-ref="fondschildforfonds" />
          <entry key="fondspublications" value-ref="publicationlistforfonds" />
          <entry key="conceptdocuments" value-ref="publicationlistforconcept"/>
          <entry key="conceptperson" value-ref="personlistforconcept"/>
        </map>
      </property>
    </bean>

    <!-- Person list for Events dynamic entity -->
    <bean id="personlistforevents"
          class="org.dspace.app.webui.cris.components.CRISRPConfigurerComponent">
      <property name="relationConfiguration" ref="relationEVENTSVSRPConfiguration" />
      <property name="commonFilter">
        <util:constant static-field="org.dspace.app.webui.cris.util.RelationPreferenceUtil.HIDDEN_FILTER" />
      </property>
      <property name="target" value="org.dspace.app.cris.model.ResearchObject" />
      <property name="facets" ref="facetsRPforComponentConfiguration" />
      <property name="types">
        <map>
          <entry key="all" value-ref="allObjectsComponent" />
        </map>
      </property>
    </bean>

Integrating DSpace and DSpace-GLAM (dspace.cfg)
• All the GLAM entities can be linked with DSpace Items and used as authorities for the item's metadata
• This can be done by adding some configuration to the dspace.cfg file

    ##### Authority Control Settings #####
    plugin.named.org.dspace.content.authority.ChoiceAuthority = \
        org.dspace.app.cris.integration.ORCIDAuthority = RPAuthority,\
        org.dspace.content.authority.ItemAuthority = PublicationAuthority,\
        org.dspace.content.authority.ItemAuthority = DataSetAuthority,\
        org.dspace.app.cris.integration.DOAuthority = EVENTAuthority,\
        org.dspace.app.cris.integration.DOAuthority = FONDSAuthority,\
        org.dspace.app.cris.integration.DOAuthority = CONCEPTAuthority,\
        org.dspace.app.cris.integration.DOAuthority = INTERPRETATIONAuthority,\

For each metadata field the slide labels the settings as follows:
• choices.plugin: authority name
• choices.presentation: display mode for authority values
• authority.controlled: the authority has its own ID
• cris.DOAuthority.….filter: origin for authority values
• cris.DOAuthority.….new-instances: entity to populate with new values
• ItemCrisRefDisplayStrategy.publicpath: path to use to link the entity

    choices.plugin.dc.relation.conference = EVENTAuthority
    choices.presentation.dc.relation.conference = suggest
    authority.controlled.dc.relation.conference = true
    cris.DOAuthority.dc_relation_conference.filter = resourcetype_authority:events
    cris.DOAuthority.dc.relation.conference.new-instances = events
    ItemCrisRefDisplayStrategy.publicpath.dc.relation.conference = events

    choices.plugin.dc.relation.concept = CONCEPTAuthority
    choices.presentation.dc.relation.concept = suggest
    authority.controlled.dc.relation.concept = true
    cris.DOAuthority.dc.relation_concept.filter = resourcetype_authority:concept
    cris.DOAuthority.dc.relation.concept.new-instances = concept
    ItemCrisRefDisplayStrategy.publicpath.dc.relation.concept = concept

    choices.plugin.dc.relation.fond = FONDSAuthority
    choices.presentation.dc.relation.fond = suggest
    authority.controlled.dc.relation.fond = true
    cris.DOAuthority.dc_relation_fond.filter = resourcetype_authority:crisfonds AND crisfonds.fondsleaf:true
    ItemCrisRefDisplayStrategy.publicpath.dc.relation.fond = fonds

Creating inverse relationships between items and entities

Clustering of related objects
Component implementations are available out of the box to allow configurable rendering of inverse relations for each entity (DSpace items or DSpace-GLAM entities)
It is possible
• to configure which facets to show in the component
• to apply filters to the relation
• to enable clustering using custom categories defined by facet queries
The component is aware of the preference expressed for the relationships

Managing hierarchical archival structures
Extending the data model makes the system able to manage the hierarchical metadata structure required by archival standards such as ISAD(G) and EAD
DSpace-GLAM can also manage the production and preservation context of the archive required by ISAAR(CPF), EAC-CPF and ISDIAH
• Creating and managing Archival Fonds at different levels
• Relating an Archival Unit (Item) to a Fonds
• Visualizing hierarchical archival structures

Overview of the DSpace-GLAM data model

Pointing out Social Networks
The system is able to draw graphs based on relationships between Persons using data from the different entities and from the DSpace Items
In particular it draws relationships between persons who:
• Co-authored the same items
• Participated in the same event(s)
• Participated in event(s) in the same place(s)
• Are related to the same concept(s)

Visualizing relationships between historical figures

Network configuration (network.cfg)
• Networks are implemented by plugins
• You can write your own implementation, typically starting from the default ones
• You can configure the network layout (colors, number of nodes, levels)

Formalizing and analysing interpretations
Interpretations are logical processes which start from data and/or assumptions and, through logical reasoning and by connecting persons, events, documents, etc., arrive at one or more conclusions
Often, in the humanities, such processes are merged and hidden within natural language narratives
To make such processes explicit, we have to decompose them into different components and atomic propositions and display such elements

Linking interpretations to entities
With DSpace-GLAM you can link an interpretation to the items, the events and the persons it is related to
Moreover, you can link different interpretations to the same entity

Contextualizing historical data
• Interpretation: Ronchey
• Painting: The Flagellation
• Painter: Piero della Francesca
• Concept: Renaissance
• Concept: Humanism
• Concept: Neoplatonism
• Event: Council of Ferrara (AD 1438)
• Event: Council of Mantua (AD 1459)
• Person: Emperor John VIII Palaiologos
• Place: Ferrara
• Place: Mantua

Ready for Linked Open Data
By linking and relating the created entities with other authorities, the institution is ready to be part of the Linked Data Graph
We are now working to include the additional entities in the DSpace RDF management features

Global search across the whole Digital Library
• Navigation
• Infographics
• Top objects using several criteria

Faceted Search
Facets

Customizable Browse indexes

DSpace-GLAM use cases
• Cultural Heritage image files (digitized manuscripts, paintings, monuments, archaeological finds, rare books, etc.) need to be consulted online, discussed and commented on / annotated
• IIIF protocols and formats allow you to meet these requirements in a standard way that is understandable for both humans and machines
• High-quality scanned books typically have images of over 100MB for each page
• The structure of image sequences is complex and relevant (sequences of pages, of the phases of a historical event, of a cycle of frescoes, etc.)
• The same requirements apply to audio and video content:
  - Streaming
  - Internal structure
  - Annotation / commenting / transcript
• Adopt an open standard: the MPEG-DASH format allows adaptive streaming over a simple HTML client with full support for multiple tracks, ToC, subtitles

4Science IIIF Image Viewer Addon
IIIF compliant:
1. Presentation API
2. Image API
3. Search API
4. Authentication API (soon)
• DSpace item with «see online» option
• Offering an integrated Universal Viewer player
• The IIIF Image API allows a smooth interaction with the image files
• The IIIF Presentation API output is generated on the fly using the metadata of the item and the bitstreams
• Bitstreams metadata
• An example from a PDF document offered as a complex package of page images
• Hierarchical ToC
• Link images with their textual transcription / OCR
• Indexing of a standard format (hOCR) in a web annotation server to supply the IIIF Search API
• Side by side: image vs text using an additional OCR panel
• An example in Arabic characters: https://dspace-glam.4science.it/handle/1234/24

IIIF Image Viewer: share and reuse
Share images with other scholars/users without waiving proper attribution, e.g. using the «manifest» JSON file:
https://dspace-glam.4science.it/json/iiif/1234/11/30/manifest
in another IIIF Image Viewer: http://projectmirador.org/demo/

Audio/Video streaming
Full open source stack:
1. Transcoding
2. Adaptive streaming
3. MPEG-DASH standard
Example: https://dspace-glam.4science.it/explore?bitstream_id=1841&handle=1234/7&provider=video-streaming
Allows transcoding of audio/video formats into a format and encoding appropriate to the adopted media server (adaptive video streaming)
Using the DASH standard protocol allows sharing video with other scholars/users without waiving proper attribution, e.g. using the «manifest» XML file:
https://dspace-glam.4science.it/avstream/1841/ch/0/29/94/83/manifest.mpd
in another DASH client: http://dashif.org/reference/players/javascript/v2.4.1/samples/dash-if-reference-player/index.html

Visualizing and analysing datasets
4Science has released a free and open source integration with CKAN, the world's leading open-source data management platform
Using an extensible viewer framework you can now offer data discovery, exploration, preview, sampling and visualization from your DSpace repository
CKAN makes open web services for tabular data available: https://ckan.org/
We look at DSpace-GLAM not only as a tool for management and preservation, but also for analysis
Our integration with CKAN allows the visualization and analysis of repertoires and inventories by means of grids, graphs or maps
Datasets can also be related to items and other entities
• Archaeological finds geolocalization: https://dspace-glam.4science.it/handle/1234/15
• Pottery distribution: https://dspace-glam.4science.it/explore?bitstream_id=1971&handle=1234/22&provider=ckan-recline

Why do I need DSpace-GLAM?
• DSpace-GLAM is a powerful extension of DSpace created by 4Science to meet the needs of Galleries, Libraries, Archives and Museums
• to be able to manage, analyze and preserve digital objects
• together with historical, archaeological or other cultural datasets,
• relating them with other entities such as persons, events, places, concepts, etc.
• to describe the context of cultural objects and data, according to different granularity levels, and to different interpretations
• using worldwide adopted, cutting-edge, open-source software and open standards

How do I get DSpace-GLAM?
• Every institution can install DSpace-GLAM or upgrade its DSpace installation to DSpace-GLAM, extending document management by creating new entities
• Your publications will be safely managed as before, with the added advantage of linking them to relevant information such as authors, datasets, events, concepts, networks and much more

When can I move to DSpace-GLAM?
• Now: any moment is appropriate to enhance your Digital Library, to better support research activities and make your service more relevant
• Upgrading from DSpace to DSpace-GLAM or installing a brand-new extended DLMS does not take much extra effort and it is largely rewarded by the extraordinary results that you can get
• As an extra safeguard (if you already have a DSpace repository), DSpace-GLAM does not alter the structure of the current objects managed by DSpace, so you can go back from DSpace-GLAM to DSpace at any time just by dropping (a lot of) extra tables… but we are confident that you will not want to do that

Data Science in a Digital Humanities Framework
• Our goal is to provide an environment for integrating the traditional hermeneutic and interpretative work of historical sciences with data visualization and analysis
• In this way, we hope, there may be a fundamental change in the way digital cultural heritage is experienced, analyzed and contributed to by the whole scientific community

Thanks for your attention
Andrea Bollini <[email protected]>, mobile: +39 333 934 1808, skype: a.bollini, orcid: 0000-0002-9029-1854
Claudio Cortese <[email protected]>, mobile: +39 333 9340846, skype: claudio.cortese74, orcid: 0000-0003-4572-9711
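As a practical complement to the batch submission process described earlier, the following Python sketch builds one item directory of a DSpace Simple Archive: the item's files, an XML file with one <dcvalue> entry per metadata element, and a contents file enumerating the files. It is a minimal illustration under stated assumptions, not a DSpace tool: the function name write_saf_item, the file names dublin_core.xml and contents, the one-file-name-per-line layout of the contents file, and the sample file tale.pdf are all assumptions to be checked against the DSpace documentation for your version.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# One metadata entry: (element, qualifier, language or None, value)
DcValue = Tuple[str, str, Optional[str], str]


def write_saf_item(item_dir: Path, metadata: List[DcValue],
                   files: Dict[str, bytes]) -> Path:
    """Write a single item directory (hypothetical helper, assumed layout)."""
    item_dir.mkdir(parents=True, exist_ok=True)

    # Metadata file: one <dcvalue> element per metadata entry.
    root = ET.Element("dublin_core")
    for element, qualifier, language, value in metadata:
        attrs = {"element": element, "qualifier": qualifier}
        if language is not None:
            attrs["language"] = language  # optional ISO language code
        ET.SubElement(root, "dcvalue", attrs).text = value
    ET.ElementTree(root).write(str(item_dir / "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)

    # The files that make up the item.
    for name, data in files.items():
        (item_dir / name).write_bytes(data)

    # Contents file: one file name per line (assumed format).
    listing = "".join(f"{name}\n" for name in files)
    (item_dir / "contents").write_text(listing, encoding="utf-8")
    return item_dir


# Usage with the metadata values shown in the batch submission slide.
archive_root = Path(tempfile.mkdtemp())
item = write_saf_item(
    archive_root / "item_000",
    [("title", "none", None, "A Tale of Two Cities"),
     ("date", "issued", None, "1990"),
     ("title", "alternative", "fr", "J'aime les Printemps")],
    {"tale.pdf": b"placeholder bytes"},
)
print(sorted(p.name for p in item.iterdir()))
```

The resulting directories would then be compressed into a zip file and made available at a URL for the UI Batch Import, as outlined in the slides.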