Apache HIVE
• Why HIVE
• Limitations of HIVE
What is HIVE?
• HIVE is a SQL-style query interface built on top of Hadoop’s native Map-Reduce
• HIVE lets users write SQL-style queries in its own language, the Hive Query Language (HQL, also written HiveQL)
• The HIVE execution engine compiles scripts written in HQL into Map-Reduce jobs (packaged as JAR files) that run on the cluster
• The table’s schema information (table metadata) is stored in the HIVE metastore, which is backed by an RDBMS (embedded Derby is the default database)
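To make the HQL-to-Map-Reduce translation above concrete, here is a minimal sketch in plain Python of how a simple aggregate query maps onto map, shuffle and reduce phases. It is a conceptual illustration only, not HIVE's actual query planner; the `sales` table, its columns and all function names are hypothetical.

```python
from collections import defaultdict

# Conceptual equivalent of the HQL query (hypothetical table and columns):
#   SELECT region, SUM(amount) FROM sales GROUP BY region;

# Hypothetical rows of the "sales" table: (region, amount)
ROWS = [("east", 10), ("west", 5), ("east", 7), ("west", 1), ("north", 4)]


def map_phase(rows):
    """Emit (key, value) pairs; the GROUP BY column becomes the key."""
    for region, amount in rows:
        yield region, amount


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate (SUM) to the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(rows):
    """Chain the three phases, mimicking one Map-Reduce job."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))
```

The point of the sketch is the division of labour: the user states only the query, and the engine decides which column becomes the shuffle key and which aggregate runs in the reducer.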
Why HIVE
• Map-Reduce does not provide user-friendly libraries or interfaces for handling unstructured data
• Using the Map-Reduce framework directly creates a very tight dependency on Java
• An operation such as a left outer join can need around 200-300 lines of Java Map-Reduce code, whereas in SQL it takes just a couple of lines
• Analysts with SQL experience from the RDBMS and DW/BI worlds often cannot program in Java, which direct use of Map-Reduce requires
• To let SQL developers exploit the power of Hadoop, an abstraction interface was developed on top of native Map-Reduce
• This interface (engine) was called HIVE; it was originally developed at Facebook, with its initial release in 2010
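The join comparison in the list above can be illustrated with a small sketch. The Python below hand-writes a reduce-side join in the Map-Reduce style, assuming a left outer join between two hypothetical datasets (`CUSTOMERS` and `ORDERS`); the equivalent query is shown as a comment. It runs in memory and omits everything a real Java Map-Reduce job would also need (job configuration, serialization, input and output formats), which is where most of the extra lines come from.

```python
from collections import defaultdict

# The same join in HQL takes a couple of lines (hypothetical tables):
#   SELECT c.name, o.amount
#   FROM customers c LEFT OUTER JOIN orders o ON c.id = o.customer_id;

# Hypothetical input data
CUSTOMERS = [(1, "Asha"), (2, "Ben"), (3, "Chen")]   # (id, name)
ORDERS = [(1, 250), (1, 40), (3, 99)]                # (customer_id, amount)


def map_phase(customers, orders):
    """Tag each record with its source so the reducer can tell them apart."""
    for cust_id, name in customers:
        yield cust_id, ("C", name)
    for cust_id, amount in orders:
        yield cust_id, ("O", amount)


def shuffle(pairs):
    """Bring all records that share a join key together."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Pair every customer with its orders; keep customers without orders."""
    joined = []
    for cust_id in sorted(groups):
        names = [v for tag, v in groups[cust_id] if tag == "C"]
        amounts = [v for tag, v in groups[cust_id] if tag == "O"]
        for name in names:
            if amounts:
                joined.extend((name, amount) for amount in amounts)
            else:
                joined.append((name, None))  # left side kept, right side NULL
    return joined


def left_outer_join(customers, orders):
    return reduce_phase(shuffle(map_phase(customers, orders)))


if __name__ == "__main__":
    for row in left_outer_join(CUSTOMERS, ORDERS):
        print(row)
```

Even in this stripped-down form, the tagging, grouping and pairing logic has to be written and tested by hand, whereas the SQL version only states the intent.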
Architectural overview
Working of HIVE
• HIVE provides a way to project a table structure onto data that already resides in HDFS (structured data in HDFS)
• In reality, the data is not loaded into a separate place where HIVE tables are created; the table is a schema definition over files that stay in HDFS
[Figure: a HIVE table projected onto HIVE data, both residing in HDFS]
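The idea of projecting a table structure onto data that stays where it is can be sketched as follows. This is a minimal illustration in plain Python, not HIVE code: `RAW_FILE` stands in for a delimited file in HDFS, `SCHEMA` stands in for the entry kept in the metastore, and all names are hypothetical. In HIVE itself, a comparable effect is typically achieved with a `CREATE EXTERNAL TABLE ... LOCATION ...` statement.

```python
import csv
import io

# Stands in for a comma-delimited file already sitting in HDFS (hypothetical).
RAW_FILE = "1,Asha,250\n2,Ben,40\n3,Chen,99\n"

# Stands in for the table definition stored in the metastore (hypothetical):
# column names and types only, no data.
SCHEMA = [("id", int), ("name", str), ("amount", int)]


def read_table(raw_text, schema):
    """Apply the schema at read time; the raw text itself is never modified."""
    for fields in csv.reader(io.StringIO(raw_text)):
        yield {col: cast(value) for (col, cast), value in zip(schema, fields)}


if __name__ == "__main__":
    for row in read_table(RAW_FILE, SCHEMA):
        print(row)
```

Because the schema is applied only when the data is read, dropping or redefining the table definition leaves the underlying file untouched.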
Limitations of HIVE
• Not suited for frequent updates and inserts (row-level inserts and updates are supported in recent releases of HIVE)