Skip to main content

Hadoop provides High Availability services by distributing processing of large data sets across clusters of machines.

Hadoop from Apache, includes the modules:

  • Hadoop Common: support utilities
  • Hadoop Distributed File System (HDFS)
  • Hadoop YARN: job scheduling and resource management
  • Hadoop MapReduce: parallel processing of large data sets

It is used by by such heavyweights as Facebook and Twitter.

Apache projects based on Hadoop should probably use their own tag, namely Ambari, Avro, Cassandra, Chukwa, HBase, Hive, Mahout, Pig, Spark, Tez, and ZooKeeper.