MapReduce is a simple and powerful programming model that enables the development of scalable parallel applications to process large amounts of data scattered across a cluster of machines. The original implementations of the MapReduce framework had limitations that much follow-up research has since addressed. The model has attracted considerable attention in both the research and industrial communities because of its capacity to process large volumes of data. This review paper discusses how the MapReduce framework is used in different applications and for different purposes, and analyzes the MapReduce architecture from several research perspectives.
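To make the programming model concrete, the sketch below shows the classic word-count example written against Hadoop's org.apache.hadoop.mapreduce API. It is a minimal illustrative sketch, not code from the reviewed papers, and the class names are our own.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emits (word, 1) for every word in an input line.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reducer: sums all counts emitted for the same word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

The user writes only these two functions; splitting the input, shuffling intermediate pairs, and re-running failed tasks are left to the framework.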
The MapReduce model has become an important parallel processing model for large-scale data-intensive applications such as data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely applied to support cluster computing jobs that require low response time. This paper discusses several issues in Hadoop and the solutions proposed for them in the papers studied by the author. Hadoop is not an easy environment to manage. The current Hadoop implementation assumes that the computing nodes in a cluster are homogeneous, and recent Hadoop research has ignored network delays caused by data movement at run time. Unfortunately, both the homogeneity and data locality assumptions are optimistic at best and unachievable at worst, and they introduce performance problems in virtualized data centers. One studied work analyzes the single point of failure (SPOF) in Hadoop's critical nodes and proposes a metadata replication based solution to enable high availability. Heterogeneity can be addressed by a data placement scheme that distributes and stores data across multiple heterogeneous nodes according to their computing capacities. Analysts have also noted that using the technology to aggregate and store data from multiple sources can create a whole range of problems related to access control and ownership, and applications that analyze merged data in a Hadoop environment can produce new datasets that may also need to be protected.
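To illustrate the capacity-aware placement idea, the sketch below (not taken from any of the surveyed papers) assigns a file's blocks to heterogeneous nodes in proportion to an assumed, externally measured computing capacity; the node names and capacity figures are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: splits a file's blocks across heterogeneous nodes in
// proportion to each node's computing capacity, so faster nodes hold
// (and later process) more local data.
public class CapacityAwarePlacement {

    public static Map<String, Integer> assignBlocks(Map<String, Double> nodeCapacity,
                                                    int totalBlocks) {
        double totalCapacity = nodeCapacity.values().stream()
                .mapToDouble(Double::doubleValue).sum();

        Map<String, Integer> quota = new LinkedHashMap<>();
        int assigned = 0;
        for (Map.Entry<String, Double> e : nodeCapacity.entrySet()) {
            int blocks = (int) Math.floor(totalBlocks * e.getValue() / totalCapacity);
            quota.put(e.getKey(), blocks);
            assigned += blocks;
        }
        // Hand any rounding leftovers to the fastest node.
        String fastest = nodeCapacity.entrySet().stream()
                .max(Map.Entry.comparingByValue()).get().getKey();
        quota.merge(fastest, totalBlocks - assigned, Integer::sum);
        return quota;
    }

    public static void main(String[] args) {
        Map<String, Double> capacity = new LinkedHashMap<>();
        capacity.put("node-a", 4.0);   // relative map-task throughput (assumed)
        capacity.put("node-b", 2.0);
        capacity.put("node-c", 1.0);
        System.out.println(assignBlocks(capacity, 70)); // {node-a=40, node-b=20, node-c=10}
    }
}
```

In this scheme a node that is four times faster receives four times as many blocks, which reduces the run-time data movement that homogeneous placement would otherwise force.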
Traffic congestion on road networks is characterized by slower speeds, longer trip times and increased vehicular queuing. Vehicle volumes are growing exponentially day by day, but the road infrastructure is not keeping pace.
Journal of Global Research in Computer Science, Jan 1, 2011
In this paper the author concentrates on the architecture of a cluster computer and how clusters operate in the context of parallel paradigms. The author's interest is in guaranteeing that a node works efficiently and that its data is available at any time so that tasks can run in parallel. Applications may face resource faults during execution, so an application must dynamically prepare for, and recover from, an expected failure. Typically, checkpointing is used to minimize the loss of computation. Checkpointing is a purely local strategy, but it can be very costly. Most checkpointing techniques require central storage for the checkpoints, which creates a bottleneck, severely limits scalability, and proves too expensive for dedicated checkpointing networks and storage systems. The author therefore suggests replication, which has been studied for parallel databases in general. The author has worked on parallel execution of a task on a node; if the node fails, a self-protecting feature should take over. Self-protecting in this context means that computer clusters should detect and handle failures automatically with the help of replication.
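A minimal sketch of the replication idea follows. It is illustrative only and assumes an in-memory peer store; a real cluster would replicate task state over the network and pair it with a failure detector.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: instead of writing checkpoints to central storage, a
// worker pushes copies of its task state to a peer node; if the primary
// fails, the cluster "self-protects" by resuming from the peer's replica.
public class ReplicatedTaskState {

    // taskId -> last known state, held by a peer node (in-memory here for brevity).
    private final Map<String, String> replica = new ConcurrentHashMap<>();

    // The primary node calls this after every unit of work instead of checkpointing centrally.
    public void replicate(String taskId, String state) {
        replica.put(taskId, state);
    }

    // A failure detector calls this when the primary stops responding.
    public String recover(String taskId) {
        return replica.getOrDefault(taskId, "RESTART_FROM_BEGINNING");
    }

    public static void main(String[] args) {
        ReplicatedTaskState peer = new ReplicatedTaskState();
        peer.replicate("task-42", "processed 3 of 10 input splits");
        // ... the primary node crashes ...
        System.out.println("Resuming task-42 from: " + peer.recover("task-42"));
    }
}
```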
UID is being implemented in India, providing valuable information about citizens, and various facilities are to be provided to the masses on the basis of that information. The paper also shows how UID can be used to establish a person's genuine identity and thus improve security for various e-governance applications. It studies the idea of UID and authentication request types such as biometrics, and suggests the use of the Hadoop architecture for analyzing these request types, since biometrics is one of the strongest methods of authentication. In the Aadhaar Card System, a resident's photograph, fingerprints and iris scans are collected. We suggest how Hadoop, a distributed file system architecture, can be used to develop this model. Hadoop implements the MapReduce framework, which makes it easy to process large amounts of data in the cloud or from the various nodes where the data is scattered.
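As a hedged sketch of how such authentication data could be processed with MapReduce, the example below counts fingerprint authentication requests per UID using Hadoop's API. The input format, field layout and class names are assumptions for illustration and are not part of the Aadhaar system.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative only: counts fingerprint authentication requests per UID.
// Assumes each input line looks like "uid,requestType,result".
public class AuthRequestMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split(",");
        if (fields.length >= 2 && "FINGERPRINT".equals(fields[1])) {
            context.write(new Text(fields[0]), ONE);   // key by UID
        }
    }
}

class AuthRequestReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text uid, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable c : counts) total += c.get();
        context.write(uid, new IntWritable(total));
    }
}
```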
MapReduce is a programming model introduced by Google. In this model, a user specifies the computation with two functions, Map and Reduce. The underlying MapReduce library automatically parallelizes the computation and handles complicated issues such as data distribution, load balancing and fault tolerance. Massive input spread across many machines needs to be processed in parallel; the framework moves the data and provides scheduling and fault tolerance. The original MapReduce implementation by Google, as well as its open-source counterpart Hadoop, is aimed at parallelizing computation in large clusters of commodity machines. MapReduce has gained great popularity because it achieves fault tolerance gracefully and automatically; it handles the gathering of results across the multiple nodes and returns a single result or result set.
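A minimal driver sketch, assuming the word-count mapper and reducer shown earlier, illustrates how little the user specifies beyond the two functions: once the job is configured, input splitting, data-local scheduling, the shuffle, and retry of failed tasks are handled by the framework.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        // The framework splits the input, schedules map tasks near the data,
        // shuffles intermediate (word, count) pairs, and retries failed tasks.
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```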