Silabus Pentaho Data Integration For Data Warehouse
Silabus Pentaho Data Integration For Data Warehouse
Silabus Pentaho Data Integration For Data Warehouse
22 – 23 Juli 2023
DESKRIPSI
Pentaho Data Integration (PDI) atau Kettle adalah software dari Pentaho
yang dapat digunakan untuk proses ETL (Extraction, Transformation dan Loading).
PDI dapat digunakan untuk migrasi data, membersihkan data, loading dari file ke
database atau sebaliknya dalam volume besar. PDI menyediakan graphical user
interface dan drag-drop komponen yang memudahkan user. Elemen utama dari PDI
adalah Transformation dan Job. Transformation adalah sekumpulan instruksi untuk
merubah input menjadi output yang diinginkan (input-proses-output). Sedangkan
Job adalah kumpulan instruksi untuk menjalankan transformasi. Ada tiga
komponen dalam PDI: Spoon, Pan dan Kitchen. Spoon adalah user interface untuk
membuat Job dan Transformation. Pan adalah tools yang berfungsi membaca,
merubah dan menulis data. Sedangkan Kitchen adalah program yang mengeksekusi
job.
TUJUAN
Setelah mengikuti training ini, peserta diharapkan dapat:
1. Memahami fungsi Kettle sebagai tools ETL (Extract, Transform, Loading)
Data
2. Memahami komponen dasar Kettle
3. Memahami pembuatan parameters dan variables pada kettle
4. Mampu menggunakan komponen-komponen dasar kettle (Step, Hop, Job)
MATERI
1. Introduction to Data Warehouse
Data Warehouse
Online Transaction Processing (OLTP) and Online Analytical Processing
(OLAP)
Data Warehouse and OLAP
Delivering Solution with ETL (Extract, Transform, Load) Tool
2. Installation and configuration
Java Runtime Environment / Java Development Kit
Pentaho Data Integration
XAMPP package (Apache HTTP Server and MySQL)
SQLYog – a GUI based mysql client
Data and Script samples
3. Short introduction to MySQL
MySQL Storage Engines
Administering MySQL via PHPMyAdmin
PHI-Minimart sample database installation
4. Pentaho Data Integration (Kettle)
Introducing Kettle as Pentaho’s ETL Suite
Architecture
Components
Spoon : Graphical UI Designer for job / transformation steps
Pan : Command line batch script for transformation execution
Kitchen : Command line batch script for transformation execution
Carte : Cluster server
Job / Transformation
Step and Hop
Row and Meta Data
Relation between job and transformation
5. Getting started with spoon
File system and RDBMS based Repository
Spoon Development Environment
Database Connections
Job and Transformation
Creating job
Creating transformation
Calling transformation from job
Configuring Log
6. Wulti dimensional modelling
Normalized versus Multi Dimensional Model
Fact and Dimension Tables
Star Schema and Snowflake Schema
Tasks :
Create a Kettle transformation to map PHI-Minimart transactional
database sample to dimensional modeling database
Create logs for each steps
7. Change Data Capture (CDC)
What is CDC ?
Why CDC is so hard that heavily relied on data source ?
SQL Server 2008’s CDC feature demonstration
Tasks :
Create a Kettle transformation to map PHI-Minimart transactional
database sample to dimensional modeling database
Create logs for each steps
8. Slowly Changing Dimensio (SCD)
Slowly Changing Dimension to solve master data historical problems
SCD Types
Use of Kettle’s step to solve several SCD types with several schema :
Insert / Update
Punch Through
9. Orphan/Late Arrival
What is Late Arrival Dimension?
Typical Situations where Late Arrival occurs
Best practice of Late Arrival’s handling
10. OLAP Vied of multidimensional data (Mondrian/JPivot)
Mondrian Installation
Creating scheme based on our fact and dimension tables
View and navigate our Cube using Web Browser
11. Data staging
What is Data Staging?
Background : Physical I/O versus In-Memory Processing
Task :
Create a transformation to join from 3 data sources : text file, Excel
spreadsheet, and RDBMS
Create a currency staging table to solve sequential dependence problem
12. Advance controls
Environment Variables
Shared Objects
Error Handling
Email job results
Task :
Create a dynamic tables dump using variable and looping control
Refining existing transformations to use email alert
13. Automation
Using Windows Task Scheduler to schedule ETL running job and
transformation