Data Engineering Roadmap
• Programming Languages
• Python (Recommended)
• Java
• Scala
• NumPy
• Matplotlib
• Shell Scripting
• Cron Jobs
• Strings
• Linked List
• Stack
• Queue
• Tree (Basics)
• Graph (Basics)
• Dynamic Programming
• Searching
• Sorting
• Schema Types
• ER Diagram
• ACID Properties
• Transactions
• Concurrency Control
• Deadlock
• Indexing
• Hashing
• Normalization Forms
• Views
• Stored Procedures
• SQL
• Basics of DDL, DML, DCL
• Subqueries
• Group By
• Case-When Statement
• Window Functions
• Pivoting
• Big Data Terminologies
• What is Big Data?
• 5 V’s of Big Data
• Distributed Computation
• Distributed Storage
• Commodity Hardware
• Clusters
• File Formats
a. CSV
b. JSON
c. Avro
d. Parquet
e. ORC
• Types of Data
a. Structured
b. Unstructured
c. Semi-structured
• Data Warehousing
• OLAP vs OLTP
• Dimension Tables
• Fact Tables
• Star Schema
• Snowflake Schema
• Big Data Frameworks
• Apache Hadoop (Architecture Understanding Most Important)
a. HDFS
b. MapReduce
c. YARN
• Apache Hive
• How to load data in different file formats
• Internal Tables
• External Tables
• Partitioning
• Bucketing
• Map-Side Join
• Sort-Merge Join
• UDFs in Hive
• SerDe in Hive
• Spark SQL
• Spark Streaming
• Apache Sqoop
• Apache NiFi
• Apache Flume
• Schedulers/Workflow Managers
• Apache Airflow
• Apache NiFi
• Azkaban
• NoSQL Databases
• HBase
• Cassandra
• Elasticsearch
• MongoDB
• Messaging Queue
• Apache Kafka
• Power BI
• Grafana
• Kibana
• Access Management
• AWS IAM
• AWS Athena
• Serverless
• AWS Lambda
• ETL Services
• AWS Glue
• Scheduler
• AWS CloudWatch
• Messaging Queue
• AWS SNS
• AWS SQS