MapReduce
2,330 Followers
Recent papers in Mapreduce
HDT is a binary RDF serialization format aiming at minimizing the space overheads of traditional RDF formats, while providing retrieval features in compressed space. Several HDT-based applications, such as the recent Linked Data Fragments... more
In this paper, we analytically derive, implement, and empirically evaluate a solution for maximizing the execution rate of Map-Reduce jobs subject to power constraints in data centers. Our solution is novel in that it takes into account... more
In the last two decades, the continuous increase in computational power and recent advances in web technology have produced large amounts of data, which calls for large-scale data processing mechanisms to handle this volume. MapReduce is a... more
Pretty much every part of life now results in the generation of data. Logs are documentation of events or records of system activities and are created automatically through IT systems. Log data analysis is a process of making sense of... more
Twitter produces a massive amount of data due to its popularity that is one of the reasons underlying big data problems. One of those problems is the classification of tweets due to use of sophisticated and complex language, which makes... more
Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed... more
The emergence of big data analytics as a way of deriving insights from data brought excitement to mathematicians, statisticians, computer scientists and other professionals. However, the absence of a mathematical foundation for analytics... more
Data outsourcing allows data owners to keep their data at untrusted clouds that do not ensure the privacy of data and/or computations. One useful framework for fault-tolerant data processing in a distributed fashion is MapReduce, which... more
Even though many systems already implement customer analytics, it is still an emerging and largely unexplored market with great potential for further advancement. Big data is one of the fastest-rising technologies... more
The computer industry is being challenged to develop methods and techniques for affordable data processing on large datasets at optimum response times. The technical challenges in dealing with the increasing demand to handle vast... more
Big Data is large-volume of data generated by public web, social media and different networks, business applications, scientific instruments, types of mobile devices and different sensor technology. Data mining involves knowledge... more
A flexible, efficient and secure networking architecture is required in order to process big data. However, existing network architectures are mostly unable to handle big data. As big data pushes network resources to the limits it results... more
Data Mining refers to the process of mining useful data over large datasets. The discovery of interesting association relationships among large amounts of business transactions is currently vital for making appropriate business decisions.... more
In today’s age of information technology, processing data is a very important issue. Even terabyte- and petabyte-scale storage is no longer sufficient for large databases. The data is too big, moves too fast, or doesn’t... more
Map Reduce has gained remarkable significance as a prominent parallel data processing tool in the research community, academia and industry with the spurt in volume of data that is to be analyzed. Map Reduce is used in different... more
Twitter, a micro-blogging service, has been generating a large amount of data every minute as it gives people chance to express their thoughts and feelings quickly and clearly about any topics. To obtain the desired information from these... more
In distributed systems, database migration is not an easy task. Companies encounter challenges moving data, including legacy data, to the big data platform. This paper shows how to optimize migration from traditional databases to the... more
Accurate recognition and differentiation of the human facial expressions require substantial computational power, where the efficiency of algorithm plays a vital role. Recent advancement in the human computer interaction and object... more
The MapReduce framework has attracted profound attention from many different areas. It is presently a practical model for data-intensive applications due to its simple programming interface, high scalability, and ability to withstand... more
This document presents the handbook of BigDataBench (Version 3.1). BigDataBench is an open-source big data benchmark suite, publicly available from http://prof.ict.ac.cn/BigDataBench. After identifying diverse data models and... more
An effective technique for processing and analysing large amounts of data is the MapReduce framework. It is a programming model used to rapidly process vast amounts of data in a parallel and distributed mode... more
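The abstract above describes MapReduce's parallel processing model in general terms. As a generic illustration of the three phases (not code from the paper), a minimal pure-Python word-count sketch, where a local sort-and-group stands in for the framework's shuffle:

```python
from itertools import groupby

def mapper(lines):
    # Map phase: emit a (word, 1) pair for every token in the input.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Local stand-in for the framework's shuffle: sort pairs and group by key.
    return groupby(sorted(pairs), key=lambda kv: kv[0])

def reducer(grouped):
    # Reduce phase: sum the counts emitted for each word.
    for word, group in grouped:
        yield word, sum(count for _, count in group)

counts = dict(reducer(shuffle(mapper(["to be or not to be"]))))
print(counts)
```

In a real cluster the mapper and reducer run on many machines over input splits; the structure of the two user-supplied functions is the same.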
The efficiency and scalability of the cluster depend heavily on the performance of the single NameNode.
In order to make accurate and fast keywords and full text searches it is recommended to index the words in the corpus. One way to do this is to use an inverted index to maintain in a structured form the words occurrence in a set of... more
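As a generic illustration of the inverted-index idea this abstract describes (the corpus format and function names here are hypothetical, not from the paper), a minimal sketch mapping each term to a posting list of document ids, with keyword search as a posting-list intersection:

```python
from collections import defaultdict

def build_inverted_index(docs):
    # docs: {doc_id: text}. Map each term to the sorted list of doc ids
    # in which it occurs (its posting list).
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, *terms):
    # A conjunctive keyword query is an intersection of posting lists.
    postings = [set(index.get(t.lower(), ())) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = {1: "big data on Hadoop", 2: "Hadoop runs MapReduce", 3: "big graphs"}
index = build_inverted_index(docs)
```

Building such an index is itself a natural MapReduce job: map emits (term, doc_id) pairs, reduce collects each term's posting list.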
Graphs are everywhere in our lives: social networks, the World Wide Web, biological networks, and many more. The size of real-world graphs are growing at unprecedented rate, spanning millions and billions of nodes and edges. What are the... more
Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation). Other algorithms are designed for finding association rules in data having... more
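The level-wise Apriori search over transactions can be sketched generically (an illustration of the classic algorithm, not this paper's implementation): frequent k-itemsets are joined into (k+1)-candidates, which are pruned by downward closure (every subset of a frequent itemset must itself be frequent) before their support is counted.

```python
from itertools import combinations

def apriori(transactions, min_support):
    # Find all itemsets contained in at least min_support transactions.
    transactions = [frozenset(t) for t in transactions]
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    current = {s for s in items
               if sum(s <= t for t in transactions) >= min_support}
    k = 1
    while current:
        for s in current:
            frequent[s] = sum(s <= t for t in transactions)
        # Join step: merge frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Prune step: keep candidates whose every k-subset is frequent
        # and which meet the support threshold.
        current = {c for c in candidates
                   if all(frozenset(sub) in frequent
                          for sub in combinations(c, k))
                   and sum(c <= t for t in transactions) >= min_support}
        k += 1
    return frequent

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
frequent = apriori(transactions, min_support=2)
```

Association rules are then derived from the frequent itemsets by comparing the supports of an itemset and its subsets.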
Data has become an indispensable part of every economy, industry, organization, business function and individual. Big Data is a term used to identify datasets whose size is beyond the ability of typical database software tools to... more
In recent years, big data are generated from a variety of sources, and there is an enormous demand for storing, managing, processing, and querying on big data. The MapReduce framework and its open source implementation Hadoop, has proven... more
Bigdata is a horizontally scaled, open-source storage architecture for indexed data and a computing fabric supporting optional transactions and very high concurrency; it operates in both single-machine and cluster modes. The bigdata... more
The main aim of this paper is to reduce the burden of the single reducer per map. Nowadays, MapReduce performance improvement is very significant for big data processing. In previous works, we tried to reduce the network traffic cost for... more
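One standard way to cut both reducer load and shuffle traffic, which this abstract's goal evokes, is a combiner: each map task pre-aggregates its own output locally so it emits one (key, partial count) pair per distinct key instead of one pair per occurrence. A generic sketch (an illustration of the combiner idea, not this paper's method):

```python
from collections import Counter

def map_with_combiner(lines):
    # Map phase with a local combiner: pre-sum counts inside the map task,
    # so only one (word, partial_count) pair per distinct word is shipped.
    local = Counter()
    for line in lines:
        for word in line.split():
            local[word.lower()] += 1
    return list(local.items())

def reduce_counts(partials):
    # Reduce phase: merge the partial counts from all map tasks.
    total = Counter()
    for word, count in partials:
        total[word] += count
    return dict(total)

# Two map tasks over separate input splits:
split_a = ["big data big data", "big cluster"]
split_b = ["data cluster data"]
emitted = map_with_combiner(split_a) + map_with_combiner(split_b)
result = reduce_counts(emitted)
```

Here `split_a` produces 6 raw (word, 1) pairs but the combiner ships only 3, shrinking what crosses the network to the reducer.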
Distributed denial of service (DDoS) attacks continue to grow as a threat to organizations worldwide. From the first known attack in 1999 to the highly publicized Operation Ababil, DDoS attacks have a history of flooding the victim... more
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google’s... more
In the rebuilding of economic competence, data is everything and everything is data. However, data depends on a world that is chaotic, unpredictable, and sentimental. The acceleration of so-called big data and the... more
The Apache Hadoop framework is an open source implementation of MapReduce for processing and storing big data. However, getting the best performance from it is a big challenge because of its large number of configuration parameters. In this... more
PageRank evaluates the importance of Web pages with link relations. However, there is no direct method of evaluating the meaning of links in a hyperlink-based Web structure. This feature may cause problems in that pages containing many... more
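The link-based ranking this abstract builds on can be sketched as the basic PageRank power iteration (a generic illustration, not the paper's proposed method): each page keeps a small base rank and receives a damped share of the rank of every page linking to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    # Power-iteration PageRank on an adjacency dict {page: [outlinks]}.
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with uniform rank
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # base rank
        for p, outs in links.items():
            if outs:
                # Split p's damped rank evenly over its outlinks.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank uniformly.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

Note the limitation the abstract points at: the iteration weighs every link identically, with no way to express what a link means.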
At present, the scale of data in many cloud applications increases tremendously in accordance with the Big Data trend, thereby making it a challenge for commonly used software tools to capture, manage and process such large-scale data... more
MapReduce is a simple and powerful programming model which enables the development of scalable parallel applications to process large amounts of data scattered across a cluster of machines. The original implementations of MapReduce... more
The size of data is increasing day by day with the use of social sites. Big Data is a concept for managing and mining large sets of data. Today the concept of Big Data is widely used to mine the insight data of organizations as well... more
Databases storing abundant data usually suffer from slow query and data-manipulation performance. This thesis presents a model and methodology for faster data manipulation (insert/delete) of mass data rows stored in a big table.... more
Big data is an assemblage of large and complex data that is difficult to process with the traditional DBMS tools. The scale, diversity, and complexity of this huge data demand new analytics techniques to extract useful and hidden value... more