Apache Hive
Apache Hive
Apache Hive
Apache Hive
Hive provides a SQL like interface for users to extract data from the Hadoop system.
Resource Management
Storage
HDFS HBase
Features of Hive
Uses HiveQL
The organization analyzes positive, negative, and neutral reviews using Hive.
Hive Architecture
Hive Architecture
The major components of Hive architecture are: Hadoop core components, Metastore, Driver, and Hive clients.
JDBC/ODBC
Drivers
Parse Planner
Metastore
r
Execution Optimizer
MapReduce
RDBM
S
HDF
S
Job Execution Flow in Hive
1 Parse HiveQL
2 Make optimizations
3 Plan execution
Submit job(s) to cluster
4
Monitor progress
5
Process data in MapReduce or Apache Spark
6
• Hue Web UI
– Hive Query Editor
• Metastore Manager
– ODBC / JDBC
Connecting with Hive
beeline –u … -f
To execute file using the –u option sample.hql
To use HiveQL directly from the command line using the beeline –u ... -e
-e option 'SELECT * FROM users‘
beeline –u … -
To continue running script even after an error force=TRUE
Running Hive query
SQL commands are terminated with a semicolon (;)
• String:
STRING
Changing Table Data Location
• Use LOCATION to specify the directory where you want to reside your data in HDFS
posted TIMESTAMP
)
ROW FORMAT DELIMITED
FIELD TERMINATED BY ‘ , ’
LOCATION '/bda/sample';
External Managed Table
SE
Q T
X
T
sqoop import \
-connect jdbc:mysql://localhost/sample \
-username training \
-password training \
hive-import creates a table accessible
in Hive.
-fields-terminated-by '\t' \
-table employees \
-hive-import
Hive DDL and DML
Defining Database and Table
• Databases and tables are created and managed using the DDL (Data Definition Language) of HiveQL
/user/hive/warehouse/dbname.db/tablename
Table Creation: Example
• Hive Tables are stored in HDFS and the relevant metadata is stored in the Metastore
What Is Hive Metastore?
The Metastore is the component that stores the system catalog which
contains metadata about tables, columns, and partitions.
Use of Metastore in Hive
Query
2. Query actual data
Hive Server
(Data in HDFS
files)
Data Warehouse Directory Structure
/user/hive/warehouse
• Each table is a directory within the default location having one or more files
Customers Table
customer_id name country /user/hive/warehouse/customers
001 Alice us
001 Alice us
002 Bob ca 002 Bob ca
File 01
003 Carlos mx 003 Carlos mx
…. …. ….
004 Dieter de
392 Maria it
392 Maria it
393 Nigel uk 393 Nigel uk
File 02
394 Ophelia dk
394 Ophelia dk
…. …. ….
…. …. ….
In HDFS, Hive data can be split into more than one file.