2011, Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11)
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
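To make the storage/computation split concrete, the sketch below writes a file through the standard org.apache.hadoop.fs.FileSystem client API: the client contacts the NameNode only for metadata and block allocation, while the bytes themselves are pipelined to the DataNodes that hold the replicated blocks. The NameNode address hdfs://namenode:8020 and the path are placeholders; this is a minimal illustration under those assumptions, not code from the paper.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; normally picked up from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/user/demo/hello.txt");
            // create() obtains block allocations from the NameNode; the data
            // itself streams directly to the DataNodes holding the replicas.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("replication factor: "
                    + fs.getFileStatus(path).getReplication());
        }
    }
}
```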
International Journal of Engineering Research and Technology (IJERT), 2014
https://www.ijert.org/hadoop-distributed-file-system-introduction-and-usage-on-different-workloads https://www.ijert.org/research/hadoop-distributed-file-system-introduction-and-usage-on-different-workloads-IJERTV3IS20600.pdf A conventional distributed file system is not able to hold very large amounts of data; this can be managed with the help of the Hadoop Distributed File System (HDFS), in which files are stored in multiple locations (nodes) across various servers. This paper provides a step-by-step introduction to the Distributed File System and the Hadoop Distributed File System (HDFS). The paper also analyzes Hadoop on different workloads, run on three clusters (Open Cloud, M45, Web Mining) that have different hardware and software configurations and range in size from 9 nodes (Web Mining) to 64 nodes (Open Cloud) and 400 nodes (M45). Hadoop clusters are used to improve interactivity, the effectiveness of authoring, the efficiency of workloads, and automatic optimization.
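As a concrete example of the kind of workload such clusters run, the sketch below is the canonical word-count Map/Reduce job written against the org.apache.hadoop.mapreduce API, reading input from and writing output to HDFS directories passed on the command line. It is a generic illustration, not one of the workloads measured on the Open Cloud, M45 or Web Mining clusters.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map: emit (word, 1) for every token in the input split.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                ctx.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```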
2016
The Hadoop Distributed File System (HDFS) is a file system designed to handle large files (gigabytes or terabytes in size) with streaming data access patterns, running on clusters of commodity hardware. However, big data may also arrive as a huge number of small files: in biology, astronomy and other domains, some applications generate 30 million files with an average size of 190 KB. Unfortunately, HDFS cannot handle this kind of fragmented big data well, because the single Namenode becomes a bottleneck when managing a large number of small files. In this paper, we present a new structure for HDFS (HDFSX) to avoid the higher memory usage, network flooding, request overhead and centralized point of failure (single point of failure, "SPOF") of the single Namenode.
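The small-files pressure on the single Namenode comes from the fact that every file, directory and block is tracked in Namenode memory (a commonly cited rule of thumb is on the order of 150 bytes of heap per namespace object, so tens of millions of tiny files quickly consume gigabytes of metadata). The sketch below shows one common mitigation available in stock HDFS: packing many local small files into a single SequenceFile so the Namenode tracks one large file instead of millions of tiny ones. This is a generic workaround, not the HDFSX design proposed in the paper, and the paths are placeholders.

```java
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
    // Pack each local small file into one SequenceFile record (filename -> bytes),
    // so the Namenode tracks a single large file instead of millions of small ones.
    public static void pack(File[] smallFiles, String targetHdfsPath) throws Exception {
        Configuration conf = new Configuration();
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(new Path(targetHdfsPath)),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (File f : smallFiles) {
                byte[] data = Files.readAllBytes(f.toPath());
                writer.append(new Text(f.getName()), new BytesWritable(data));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Example: pack all files from a local directory into one HDFS SequenceFile.
        pack(new File(args[0]).listFiles(), args[1]);
    }
}
```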
Distributed file systems have enabled the efficient and scalable sharing of data across networks. These systems were designed to address several technical problems associated with networked data, such as the reliability and availability of data, the scalability of the storage infrastructure, the high cost of gaining access to data, and the costs of maintenance and expansion. In this paper, we compare the key technical building blocks that form the mainstay of distributed file systems such as Hadoop FS, Google FS, Lustre FS, Ceph FS, Gluster FS, Oracle Cluster FS and TidyFS. The paper aims to elucidate the basic concepts and techniques employed in the above-mentioned file systems, and we explain their different architectures and applications.
International Journal of Computer Applications, 2017
We are in the twenty-first century, also known as the digital era, in which nearly everything generates data, whether mobile phones, signals, day-to-day purchasing, and much more. This rapid increase in the amount of data means that big data has become a current and future frontier for researchers. In big data analysis, computation is performed on massive heaps of data sets to extract intelligent, knowledgeable and meaningful information, and at the same time storage must be readily available to support the concurrent computation. Hadoop is designed for this complex but meaningful work. HDFS (the Hadoop Distributed File System) is highly fault-tolerant and is designed to be deployed on low-cost hardware. This paper presents the benefits of HDFS for large data sets, the HDFS architecture, and its role in Hadoop.
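As a small illustration of the fault-tolerance model described above, the sketch below reads a file back through the HDFS client API; because each block is replicated on several DataNodes (three by default), the client can transparently switch to another replica if the machine serving a block fails mid-read. The NameNode URI and path are placeholders, and this is a minimal sketch rather than a definitive recipe.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder address
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/user/demo/hello.txt"))) {
            // If the DataNode serving a block becomes unreachable, the client
            // retries the same block from another replica.
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}
```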
2002
As Linux clusters have matured as platforms for low-cost, high-performance parallel computing, software packages to provide many key services have emerged, especially in areas such as message passing and networking. One area devoid of support, however, has been parallel file systems, which are critical for high-performance I/O on such clusters. We have developed a parallel file system for Linux clusters, called the Parallel Virtual File System (PVFS). PVFS is intended both as a high-performance parallel file system that anyone can download and use and as a tool for pursuing further research in parallel I/O and parallel file systems for Linux clusters.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2010
pCFS is a highly available, parallel, symmetrical (nodes perform both compute and I/O work) cluster file system that we have designed to run in medium-sized clusters. In this paper, using exactly the same hardware and Linux version across all nodes, we compare pCFS with two distinct configurations of PVFS: one using internal disks, and therefore unable to provide any tolerance against disk and/or I/O node failures, and another where PVFS I/O servers access LUNs in a disk array and thus provide high availability (named HA-PVFS in the following). We start by measuring the I/O bandwidth and CPU consumption of the PVFS and HA-PVFS setups; then, the same set of tests is performed with pCFS. We conclude that, when using the same hardware, pCFS compares very favourably with HA-PVFS, offering the same or higher I/O bandwidths at a much lower CPU consumption.
Apache's Hadoop is already quite capable, but there is scope for extensions and enhancements. A large number of improvements have been proposed for Hadoop, which is an open-source implementation of Google's Map/Reduce framework. It enables distributed, data-intensive and parallel applications by decomposing a massive job into smaller tasks and a massive data set into smaller partitions, such that each task processes a different partition in parallel. Hadoop uses the Hadoop Distributed File System (HDFS), an open-source implementation of the Google File System (GFS), for storing data. Map/Reduce applications mainly use HDFS for storage. HDFS is a very large distributed file system that uses commodity hardware and provides high throughput as well as fault tolerance. Many big enterprises believe that within a few years more than half of the world's data will be stored in Hadoop. HDFS stores files as a series of blocks, which are replicated for fault tolerance. Strategic data partitioning, processing, layout, replication and placement of data blocks can increase the performance of Hadoop, and a lot of research is going on in this area. This paper reviews some of the major enhancements suggested for Hadoop, especially in data storage, processing and placement.
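Because block placement drives both fault tolerance and task locality, it is worth noting that HDFS exposes placement information to applications. The sketch below asks the NameNode where each block of a file lives, which is the same kind of information a scheduler uses to place map tasks near their input. The class name and the command-line argument are illustrative only; this is a generic example, not one of the enhancements reviewed in the paper.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationDump {
    // Print which hosts hold each block of a file; task schedulers use this
    // information to run computation close to the data it reads.
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            FileStatus status = fs.getFileStatus(new Path(args[0]));
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (int i = 0; i < blocks.length; i++) {
                System.out.printf("block %d: offset=%d length=%d hosts=%s%n",
                        i, blocks[i].getOffset(), blocks[i].getLength(),
                        String.join(",", blocks[i].getHosts()));
            }
        }
    }
}
```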
The size of the data used in today's enterprises has been growing at exponential rates over the last few years.