Versioning
Recent papers in Versioning
Cloud computing offers a powerful abstraction that provides a scalable, virtualized infrastructure as a service where the complexity of fine-grained resource management is hidden from the end-user. Running data analytics applications in... more
Databases and documents are commonly kept separate within organizations, managed by Database Management Systems (DBMSs) and Information Retrieval Systems (IRSs), respectively. This separation has... more
The present paper aims to shed light on the experiences of racism faced by second-generation black immigrants in Canada and explore their quest for identity in the context of a multicultural Canada. David Chariandy's novel Brother,... more
In unstructured distributed P2P systems there is no logical structure to control the peers joining and leaving the network, which can occur at any time due to mobility. Thus, consistent data exchange and data availability are very... more
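As a purely illustrative aside (not taken from the paper above), one common way to keep replicated data consistent among peers that come and go freely is to tag each replica with a version vector; the hypothetical sketch below shows how comparing two vectors distinguishes ordered updates from concurrent ones.

    # Minimal version-vector sketch (illustrative only): each replica keeps a
    # counter per peer; comparing vectors tells whether one update happened
    # before another or whether the two updates are concurrent.
    def dominates(a, b):
        """True if vector a reflects every event that vector b reflects."""
        keys = set(a) | set(b)
        return all(a.get(k, 0) >= b.get(k, 0) for k in keys)

    def compare(a, b):
        if dominates(a, b) and dominates(b, a):
            return "equal"
        if dominates(a, b):
            return "a-after-b"
        if dominates(b, a):
            return "b-after-a"
        return "concurrent"          # conflicting replicas must be reconciled

    # Peer 1 and peer 2 update the same object independently:
    v1 = {"peer1": 2, "peer2": 0}
    v2 = {"peer1": 1, "peer2": 1}
    print(compare(v1, v2))           # -> "concurrent"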
This paper considers a two-stage development problem for information goods with costless quality degradation. In our model, a seller of information goods faces customers that are heterogeneous with regard to both the marginal willingness... more
It is commonly believed that piracy of information goods leads to lower profits, which translate to lower incentives to invest in innovation, and eventually to lower quality products. Manufacturers, policy-makers, and researchers, all... more
With falling storage costs, it becomes more and more feasible, and popular, to retain past versions of documents and data. While being able to undo changes is valuable in itself, retained versions become even more valuable if the data is queryable. Nowadays, there are two... more
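To make the idea of queryable version histories concrete, here is a minimal sketch that answers an "as of" query over retained versions; it assumes a simple per-key list of timestamped values and is not a description of any system from the abstract.

    # Toy versioned store: keep every (timestamp, value) pair per key and
    # answer "what did this key hold at time ts?" queries.
    class VersionedStore:
        def __init__(self):
            self._versions = {}                      # key -> list of (ts, value)

        def put(self, key, value, ts):
            self._versions.setdefault(key, []).append((ts, value))

        def get_as_of(self, key, ts):
            """Return the latest value of `key` written at or before `ts`."""
            best = None
            for vts, value in sorted(self._versions.get(key, [])):
                if vts <= ts:
                    best = value
            return best

    store = VersionedStore()
    store.put("doc", "draft", ts=1)
    store.put("doc", "final", ts=5)
    print(store.get_as_of("doc", 3))                 # -> "draft"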
The maintenance of materialized views in large-scale environments composed of numerous information sources (ISs), such as in the WWW, is complicated by ISs not only continuously modifying their contents but also their capabilities... more
The trajectory data warehouse (TDW) view definitions are constructed from heterogeneous mobile information source schemas that are increasingly independent. In fact, these sources frequently change their content due to perpetual transactions... more
File-sharing semantics are used by file systems to share data among concurrent client processes in a consistent manner. Session semantics is a widely used file-sharing semantics in Distributed File Systems (DFSs). The main... more
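For readers unfamiliar with session semantics, the toy sketch below shows its defining property: a client's writes become visible to other clients only when the writing session is closed. This is an assumption-laden illustration, not the DFS mechanism from the paper.

    # Toy session-semantics model: open() hands out a private copy, write()
    # changes only that copy, and close() publishes it for everyone else.
    class SessionFS:
        def __init__(self):
            self.files = {}                              # committed contents

        def open(self, name):
            return {"name": name, "data": self.files.get(name, "")}

        def write(self, session, data):
            session["data"] = data                       # private copy only

        def close(self, session):
            self.files[session["name"]] = session["data"]   # publish on close

    fs = SessionFS()
    a = fs.open("notes.txt")
    fs.write(a, "hello from A")
    print(fs.open("notes.txt")["data"])   # "" - A has not closed yet
    fs.close(a)
    print(fs.open("notes.txt")["data"])   # "hello from A"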
Hadoop Distributed File System (HDFS) is the core component of the Apache Hadoop project. In HDFS, the computation is carried out in the nodes where the relevant data is stored. Hadoop also implements a parallel computational paradigm known as... more
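The MapReduce style that HDFS enables can be sketched with a word count written in the Hadoop Streaming manner; this is only an illustration (in real use, mapper and reducer would be separate scripts passed to the streaming jar), not code from the paper.

    from itertools import groupby

    def mapper(lines):
        # Emit (word, 1) for every word; in Hadoop the input split lives on
        # the same node that runs this function.
        for line in lines:
            for word in line.split():
                yield word, 1

    def reducer(pairs):
        # Hadoop delivers mapper output to reducers sorted and grouped by key.
        for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
            yield word, sum(count for _, count in group)

    if __name__ == "__main__":
        text = ["versioning in hdfs", "hdfs stores blocks"]
        print(dict(reducer(mapper(text))))   # {'blocks': 1, 'hdfs': 2, ...}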
Background: Provenance is a critical ingredient for establishing trust of published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim... more
Taking as its reference the editorial model being developed for the poetry of Pedro Homem de Mello (1904-1984), this work reflects on the problem of the critical attribution of authority in cases where there are documented... more
Immeasurable thanks go to the Almighty God for granting me the grace and direction to complete this thesis. My gratitude goes to my parents, Mr. and Mrs. Daniel Ani, and my siblings, my family members, friends and my dear Sarah Garba... more
User-Defined Functions (UDF) allow application programmers to specify analysis operations on data, while leaving the data management tasks to the system. This general approach enables numerous custom analysis functions and is at the heart... more
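As a small, hedged illustration of the UDF idea, the snippet below registers a custom function with a database engine and lets the engine handle storage and scanning; SQLite's Python driver is used only because it is readily available, since the abstract does not name a specific system.

    import sqlite3

    def normalize(s):
        # Custom analysis logic the engine itself does not provide.
        return s.strip().lower()

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (title TEXT)")
    conn.execute("INSERT INTO docs VALUES ('  Versioning Survey  ')")
    conn.create_function("normalize", 1, normalize)       # register the UDF

    # The query calls the UDF; storage and iteration stay with the engine.
    print(conn.execute("SELECT normalize(title) FROM docs").fetchone()[0])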
The ubiquity of Big Data has greatly influenced the direction and the development of storage technologies. To meet the needs of storing and analyzing Big Data, researchers and administrators have turned to parallel and distributed storage... more
In recent years, the amount of data generated by information systems has exploded. It is not only the quantities of information, now measured in exabytes, but also the variety of these data, which are more and more structurally... more
Modern-day systems are facing an avalanche of data, and they are being forced to handle more and more data-intensive use cases. These data come in many forms and shapes: sensors (RFID, Near Field Communication, weather sensors),... more
Web repositories are large-scale warehouses of data downloaded from the Web, needed by applications that summarize that data to produce results that help people use information. Time is a central dimension of Web data, because the Web is... more
A massive volume of data is currently being produced by the most varied kinds of data sources. Easy access to these data opens up new opportunities; however, choosing which data sources are most suitable for... more
Emergency management is a complex task, since it involves communication and collaboration among many different organizations and their systems. Data integration and system interoperability are among the biggest challenges in this area. As... more
As data volumes increase at a high speed in more and more application fields of science, engineering, information services, etc., the challenges posed by data-intensive computing gain an increasing importance. The emergence of highly... more
The use of ontology is widely spread among software engineering groups as a way to represent, structure, share and reuse knowledge. As projects progress, the ontological understanding of the domain may change and evolve. New domain concepts... more
With the emergence of Cloud Computing, the amount of data generated in different fields such as physics, medical, social networks, etc. is growing exponentially. This increase in the volume of data and their large scale make the problem... more
In this era of developing technologies, one of the most promising is cloud computing, which has been in use for years by individuals and large enterprises to provide different kinds of services to the world. Cloud computing... more
Record deduplication (RD) aims to identify instances that represent the same real-world entity in data repositories. In the government setting, the RD process makes it easier to identify irregularities and... more
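A minimal sketch of record deduplication follows; the records, threshold, and similarity measure are assumptions for illustration, not the paper's method. Pairs of records whose string similarity exceeds a threshold are flagged as candidate duplicates of the same real-world entity.

    from difflib import SequenceMatcher
    from itertools import combinations

    records = [
        {"id": 1, "name": "Maria da Silva"},
        {"id": 2, "name": "Maria Silva"},
        {"id": 3, "name": "Joao Souza"},
    ]

    def similarity(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    THRESHOLD = 0.8
    duplicates = [
        (r1["id"], r2["id"])
        for r1, r2 in combinations(records, 2)
        if similarity(r1["name"], r2["name"]) >= THRESHOLD
    ]
    print(duplicates)    # [(1, 2)] - records 1 and 2 likely denote one entity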
Hardware transactional memory (HTM) systems have been studied extensively along the dimensions of speculative versioning and contention management policies. The relative performance of several design policies has been discussed at length... more
The production and availability of unstructured information on the Web grows daily. This abundance of unstructured information poses a major challenge for the acquisition of knowledge that can be processed by... more
Ontologies evolve continuously throughout their lifecycle to respond to different change requirements. Several problems emanate from ontology evolution: capturing change requirements, change representation, change impact analysis and... more
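To make "change representation" concrete, here is a deliberately minimal sketch that treats each ontology version as a set of triples and reports what was added and removed between versions; this is an assumption for illustration, not the paper's change model.

    # Two ontology versions as sets of (subject, predicate, object) triples.
    v1 = {
        ("Car", "subClassOf", "Vehicle"),
        ("Car", "hasPart", "Engine"),
    }
    v2 = {
        ("Car", "subClassOf", "Vehicle"),
        ("Car", "hasPart", "Engine"),
        ("ElectricCar", "subClassOf", "Car"),
    }

    added, removed = v2 - v1, v1 - v2
    print("added:", added)      # the new ElectricCar axiom
    print("removed:", removed)  # empty set in this example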
To distribute software, commercial vendors of proprietary software have the opportunity to use a dual licensing (DL) strategy, i.e., to provide their software under two different licensing terms (proprietary and open source). We... more
Resource management is a key factor in the performance and efficient utilization of cloud systems, and many research works have proposed efficient policies to optimize such systems. However, these policies have traditionally managed the... more
The NSDL Metadata Registry is designed to provide humans and machines with the means to discover, create, access and manage metadata schemes, schemas, application profiles, crosswalks and concept mappings. This paper describes the general... more
Almost Home is an enthralling examination of the pursuit of belonging, the contradictions of black kinship and the contestations of colonial institutions. Ruma Chopra exposes the fraught nature of black mobility and freedom as it sits in... more
Quality of data plays a very important role in any scientific research. In this paper we present some of the challenges that we face in managing and maintaining data quality for a terabyte scale biometrics repository. We have developed a... more
Product Data Management (PDM) and Software Configuration Management (SCM) are the disciplines of building and controlling the evolution of complex artifacts, either physical or software. Surprisingly, these two fields have evolved... more
Data warehouse systems integrate data from heterogeneous sources. These sources are autonomous in nature and change independently of a data warehouse. Owing to changes in data sources, the content and the schema of a data warehouse may... more
In WebDAV: Next-Generation Collaborative Web Authoring, Lisa Dusseault thoroughly describes the WebDAV protocol and the rationale behind the current version (see Y. Goland et al., HTTP Extensions for Distributed Authoring WebDAV... more
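To give a flavour of the protocol the book covers, the snippet below issues a raw WebDAV PROPFIND request from Python; the URL and credentials are placeholders, and this is only an illustration, not material from the review.

    import requests

    # PROPFIND lists a collection's members and their properties; Depth: 1
    # asks for the collection itself plus its immediate children.
    response = requests.request(
        "PROPFIND",
        "https://dav.example.org/docs/",
        headers={"Depth": "1", "Content-Type": "application/xml"},
        auth=("user", "password"),
    )
    print(response.status_code)     # 207 Multi-Status on success
    print(response.text[:200])      # XML describing each resource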
This paper addresses the design and implementation of an adaptive document version management scheme. Existing schemes typically assume: (i) a priori expectations for how versions will be manipulated and (ii) fixed priorities between... more
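One way to picture what a version-management scheme has to trade off is the mix of full copies and deltas. The sketch below is a toy under assumed policies, not the adaptive scheme the paper proposes: it keeps every third version whole and stores the rest as line-level forward deltas replayed from the nearest full copy.

    def make_delta(old, new):
        # Record only the lines of `new` that differ from `old`, plus its length.
        changes = [(i, line) for i, line in enumerate(new)
                   if i >= len(old) or old[i] != line]
        return changes, len(new)

    def apply_delta(old, delta):
        changes, length = delta
        new = list(old)[:length] + [""] * max(0, length - len(old))
        for i, line in changes:
            new[i] = line
        return new[:length]

    class VersionStore:
        def __init__(self, full_every=3):
            self.full_every, self.entries = full_every, []

        def commit(self, lines):
            if len(self.entries) % self.full_every == 0:
                self.entries.append(("full", list(lines)))
            else:
                prev = self.version(len(self.entries) - 1)
                self.entries.append(("delta", make_delta(prev, lines)))

        def version(self, i):
            base = i - i % self.full_every        # nearest preceding full copy
            lines = list(self.entries[base][1])
            for j in range(base + 1, i + 1):
                lines = apply_delta(lines, self.entries[j][1])
            return lines

    store = VersionStore()
    store.commit(["a", "b", "c"])
    store.commit(["a", "B", "c"])
    store.commit(["a", "B", "c", "d"])
    print(store.version(2))                       # ['a', 'B', 'c', 'd']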
In the rapidly evolving Cloud market, the amount of data being generated is growing continuously and as a consequence storage as a service plays an increasingly important role. In this paper, we describe and compare two new approaches,... more
Active Storage provides an opportunity for reducing the bandwidth requirements between the storage and compute elements of current supercomputing systems, and leveraging the processing power of the storage nodes used by some modern file... more
The capability of taking snapshots is approaching ubiquity as a feature of file systems and data storage arrays. Here, we present an approach to structuring and managing snapshots in a storage space that provides for rapid creation and... more
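As a rough mental model of how such snapshot facilities behave (a toy with invented names, not the structure presented in the paper), the sketch below makes snapshot creation a constant-time capture of the current block map, while later writes leave captured maps untouched.

    class Volume:
        def __init__(self):
            self.blocks = {}                  # block number -> data
            self.snapshots = {}               # snapshot name -> frozen block map

        def write(self, block_no, data):
            # Replace the map instead of mutating it, so any snapshot that
            # captured the old map continues to see the old contents.
            self.blocks = dict(self.blocks)
            self.blocks[block_no] = data

        def snapshot(self, name):
            self.snapshots[name] = self.blocks    # O(1): capture a reference

        def read(self, block_no, snapshot=None):
            source = self.snapshots[snapshot] if snapshot else self.blocks
            return source.get(block_no)

    vol = Volume()
    vol.write(0, "v1")
    vol.snapshot("before-upgrade")
    vol.write(0, "v2")
    print(vol.read(0))                            # "v2"
    print(vol.read(0, snapshot="before-upgrade")) # "v1"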
The judicial system is made up of countless documents related to legal proceedings. These documents may contain relevant information to support decision-making in future cases. However, collecting this... more
In this paper we discuss several features of XP we have used in developing curricula and courses at Duke University and the University of Northern Iowa. We also discuss those practices of XP that we teach as part of the design and... more
We present NeST, a flexible software-only storage appliance designed to meet the storage needs of the Grid. NeST has three key features that make it well-suited for deployment in a Grid environment. First, NeST provides a generic data... more