Hive and Impala
Hive and Impala
Hive and Impala
Batch Interactive
Processing SQL
Hive was developed by Facebook around 2007. Impala was developed by Cloudera around
2012.
It has a high level abstraction layer on top of It has a high performance dedicated SQL
MapReduce and Apache Spark.
engine.
It uses HiveQL to query the structured data in a
metastore. It uses Impala SQL for ad hoc queries.
Hive Impala
• Is highly extensible
• Used mainly for interactive
queries and data analysis
• Used mostly for batch
processing • Accommodates many
concurrent users
Relational Databases vs. Hive vs. Impala
Use Case: Hive and Impala
Hive and Impala are commonly used to analyze social media coverage.
Basics of Hive and Impala
Topic 2: Interacting with Hive and Impala
Executing a Query in Hive and Impala
Process data—
6
MapReduce or Apache Spark
• Command-line shell:
– Impala: Impala shell
– Hive: Beeline
Impala Query Editor
• Hue Web UI:
– Hive Query Editor
– Impala Query Editor
• Metastore Manager:
– ODBC/JDBC
Impala Lab Access Details
Step 1 Step 2
Demonstrate the method to start and connect to the Impala lab from command.
Impala Lab Access Details
Connecting with Hive and Impala Shell
To check all options of Impala using the help option: Impala-shell –help
Impala-shell –q ‘select *
To run direct queries from shell using the –q option: from simple’
Impala-shell –d
To issue a use database on startup using the –d option: Simplilearn
Demo
Connecting with Hive and Impala Shell
Demonstrate the method to connect with Hive and Impala shell, along with some basic
operations.
Sample Queries
SELECT version();
To explore a new Impala instance:
SELECT current_database();
To specify the location where the database is CREATE DATABASE IF NOT EXISTS
to be created: database_name LOCATION hdfs_path;
Sample Queries
impala-shell -i <impala-daemon-uri> -f
To migrate from SQL:
<filename>.sql;
Sample Queries
SELECT stockprice.open_price,
MAX(stockprice.stock_volume),
MIN(stop_loss.stock_current_rate)
To aggregate and join: FROM stop_loss JOIN stockprice USING
(stock_id)
GROUP BY high_price ORDER BY 1
LIMIT 5;
• Interactive mode:
SELECT count(*) FROM stockprice;
Demonstrate the method to connect with Beeline and execute basic queries.
Running Beeline from Command Line
beeline –u … -f
To execute file using the –u option: simplilearn.hql
To use HiveQL directly from the command line using the -e beeline –u ... -e 'SELECT *
option: FROM users‘
Demonstrate the method to connect Hive and Impala shell using Hue.
Hive and Impala Editors in Hue
Diagram 1 Diagram 2
Key Takeaways
Hive and Impala are tools to perform SQL queries on data residing on HDFS
or HBase.
Hive and Impala are easy to learn for experienced SQL developers.
Hive and Impala solve the Big Data problem but cannot replace a traditional
RDBMS.
Impala uses a very fast specialized SQL engine that is faster than MapReduce.
Quiz
QUIZ
Which of the following components can be used to accept command inputs from users?
1
b. Query compiler
c. Execution engine
d. Thrift server
QUIZ
Which of the following components can be used to accept command inputs from users?
1
b. Query compiler
c. Execution engine
d. Thrift server
a. Impala editor
b. Hive Editor
c. File browser
d. YARN UI
QUIZ
Hive can be accessed from Hue using ________.
2
a. Impala editor
b. Hive Editor
c. File browser
d. YARN UI
a. Impala editor
b. Hive Editor
c. File browser
d. YARN UI
QUIZ
Impala can be accessed from Hue using ________.
3
a. Impala editor
b. Hive Editor
c. File browser
d. YARN UI
a. Impala
b. Hive
c. RDBMS
a. Impala
b. Hive
c. RDBMS
a. RDBMS
b. Hive
c. Impala
a. RDBMS
b. Hive
c. Impala