etl-pipeline

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

python data-science machine-learning etl pandas orchestration data-engineering data-analysis software-engineering feature-engineering dataframe hacktoberfest dag lineage etl-framework etl-pipeline rag mlops llmops

Updated Nov 14, 2024
Jupyter Notebook

airscholar / e2e-data-engineering

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

docker big-data cassandra apache-spark data-storage postgresql data-engineering apache-kafka data-processing data-pipeline real-time-analytics containerization apache-zookeeper apache-airflow etl-pipeline

Updated Oct 5, 2023
Python

martandsingh / ApacheSpark

This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.

sql database spark hive hadoop etl pyspark data-engineering spark-streaming data-analysis databricks datalake spark-sql timetravel apachespark etl-pipeline deltalake

Updated Jul 28, 2024
Python

Mmodarre / AzureDataFactoryHOL

Azure Data Factory Hands-On Lab: a comprehensive step-by-step tutorial on Azure Data Factory and Mapping Data Flows

azure azure-data-factory hands-on-lab azure-key-vault etl-pipeline adf-pipeline filter-activity lookup-activity foreach-activity metadata-activity mapping-dataflows hands-on-azure-data-factory azure-data-factory-tutorial azure-modern-data-warehous web-activity foreach-loop-activity

Updated Apr 27, 2021

restarone / violet_rails

An app engine for your business. Seamlessly implement business logic with a powerful API. Out-of-the-box CMS, blog, forum, and email functionality. Developer-friendly and easily extensible for your next SaaS/XaaS project. Built with Rails 6, Devise, Sidekiq, and PostgreSQL.

Updated Oct 13, 2024
Ruby

imsanjoykb / Data-Science-Regular-Bootcamp

Regular practice in data science, machine learning, and deep learning, including solving ML project problems and analytical issues, to keep building up my knowledge. The goal is to help learners with learning resources in the data science field.

Updated Jan 29, 2023
Jupyter Notebook

usc-isi-i2 / dig-etl-engine

Download DIG to run on your laptop or server.

search-engine crawling information-extraction information-visualization etl-framework etl-pipeline

Updated Jan 9, 2019

stitchfix / hamilton

A scalable, general-purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

python data-science machine-learning etl numpy pandas data-engineering data-platform software-engineering feature-engineering dataframe dag hamiltonian etl-framework hamilton featurization etl-pipeline stitch-fix

Updated Jul 3, 2023
Python

Wittline / uber-expenses-tracking

The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift, and Power BI.