Apache HIVE

Hive is a data warehouse infrastructure built on top of Hadoop. It allows users to query data stored in HDFS using SQL-like queries. Hive converts these queries into MapReduce jobs for execution. It provides a mechanism to define tables and schemas to impose structure on data in HDFS. The schema information is stored separately in a metastore. Hive enables SQL users to analyze large datasets using familiar SQL queries without needing to use MapReduce programming directly. However, Hive is not suited for real-time queries or transactions due to its batch processing nature.


Uploaded by

palanivel

Apache HIVE

Image courtesy : https://en.wikipedia.org/wiki/Apache_Hive

Agenda
• What is Apache HIVE

• Why HIVE

• How does HIVE fit into the Hadoop technology landscape

• Limitations of HIVE

What is HIVE?
• HIVE is a query interface on top of Hadoop’s native Map-Reduce

• HIVE is a data warehouse

• HIVE allows users to write SQL-style queries in its own query language, Hive Query Language (HQL, also written HiveQL)

• The HIVE execution engine compiles queries written in HQL into Map-Reduce jobs that execute on the cluster

• HIVE reads data from HDFS

• Allows creation of tables to operate on structured data

• The table’s schema information (table metadata) is saved in the HIVE metastore, which is backed by an RDBMS (Derby is the default database)

• HIVE is not an RDBMS
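
To make the "query becomes Map-Reduce" idea concrete, here is a minimal sketch in plain Python of how a GROUP BY count could be broken into map, shuffle and reduce phases. It is only an illustration of the concept, not HIVE's actual execution plan; the `employees` rows, the HQL string and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept)
rows = [("asha", "sales"), ("ben", "sales"), ("chen", "hr")]

def map_phase(records):
    """Map: emit one (dept, 1) pair per input record."""
    for _name, dept in records:
        yield dept, 1

def shuffle(pairs):
    """Shuffle: group the emitted values by key between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# The SQL-style query a user would write instead (illustrative only)
hql = "SELECT dept, COUNT(*) FROM employees GROUP BY dept"

counts = reduce_phase(shuffle(map_phase(rows)))
print(hql, "->", counts)
```

The user writes only the one-line query; the three phases are what the engine has to arrange on the cluster.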

Why HIVE ?
• Hadoop is known for its Map-Reduce engine for parallelizing data processing operations using HDFS as its
native file storage system

• Map-Reduce does not provide user-friendly libraries or interfaces for handling unstructured data

• Using the Map-Reduce framework creates a very tight dependency on Java

• An operation like a join would need around 200-300 lines of code in Java Map-Reduce, whereas in SQL it would take just a couple of lines

• Analysts with SQL experience from the RDBMS and DW/BI world often cannot program in Java in order to use Map-Reduce

• To enable SQL developers to exploit the power of Hadoop, an abstraction interface was developed on top
of native Map-Reduce

• This interface (engine) was called HIVE; it was originally developed by Facebook, and its initial release was in 2010
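
The gap between hand-written Map-Reduce and SQL can be sketched with a small reduce-side inner join in plain Python. This is a simplified stand-in for a Java Map-Reduce job, not real Hadoop code; the `customers` and `orders` data, the tags "C" and "O", and the SQL string are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs that share a customer id as the join key
customers = [(1, "asha"), (2, "ben"), (3, "chen")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]

def map_tagged(records, tag):
    """Map: emit (join_key, (source_tag, payload)) so the reducer can tell sources apart."""
    for key, payload in records:
        yield key, (tag, payload)

def reduce_join(key, tagged_values):
    """Reduce: pair every customer row with every order row that shares the key."""
    names = [value for tag, value in tagged_values if tag == "C"]
    items = [value for tag, value in tagged_values if tag == "O"]
    return [(key, name, item) for name in names for item in items]

def inner_join(left, right):
    """Run map, shuffle and reduce in sequence for a two-table inner join."""
    grouped = defaultdict(list)  # shuffle: collect tagged values per key
    for key, value in list(map_tagged(left, "C")) + list(map_tagged(right, "O")):
        grouped[key].append(value)
    joined = []
    for key in sorted(grouped):
        joined.extend(reduce_join(key, grouped[key]))
    return joined

# The same operation as a single SQL-style statement (illustrative only)
equivalent_sql = (
    "SELECT c.id, c.name, o.item "
    "FROM customers c JOIN orders o ON c.id = o.customer_id"
)

result = inner_join(customers, orders)
print(result)
```

Even this toy version needs tagging, grouping and pairing logic written by hand, while the SQL form states only the intent.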
Architectural overview

Working of HIVE
• Hive allows a way to project a table structure on the data in HDFS (structured data in HDFS)

• The table meta data is saved separately from the data

• In reality, creating a HIVE table does not copy the data into a separate database; the data stays as files in HDFS

• HIVE table information (table metadata) is saved in the metastore

[Diagram: a HIVE TABLE definition kept in the METASTORE, mapped onto HIVE DATA kept as files in HDFS]
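
The separation of table metadata from data can be sketched as follows. This is a toy model in plain Python, assuming a dictionary as the metastore and another dictionary as a stand-in for HDFS; the table name, path, column list and `read_table` helper are hypothetical and are not HIVE APIs.

```python
# Hypothetical metastore: table name -> schema and data location (metadata only)
metastore = {
    "employees": {
        "columns": [("name", str), ("dept", str), ("salary", int)],
        "delimiter": ",",
        "location": "/data/employees",  # hypothetical HDFS path
    }
}

# Stand-in for raw delimited files sitting in HDFS (data only, no schema)
hdfs = {"/data/employees": ["asha,sales,5000", "ben,hr,4200"]}

def read_table(table_name):
    """Project the stored schema onto the raw lines at read time."""
    meta = metastore[table_name]
    for line in hdfs[meta["location"]]:
        fields = line.split(meta["delimiter"])
        yield {col: cast(val) for (col, cast), val in zip(meta["columns"], fields)}

rows = list(read_table("employees"))
print(rows)
```

The raw lines never change; the table structure exists only in the metadata and is applied when the data is read.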

How does HIVE fit into the Hadoop ecosystem

Things HIVE cannot do efficiently
• Ad hoc real time queries

• OLTP (Online Transaction Processing)

• Full ACID transactions (ACID support is limited)

• Not suited for frequent updates and inserts (inserts and updates are allowed in recent releases of HIVE)

• Not recommended for small data sets

• Not meant for unstructured data analysis

DEMO
