
Sahith

Sr. Data Engineer

PROFESSIONAL SUMMARY
 Over 10 years of hands-on experience as a Senior Data Engineer specializing in Database Development, ETL Development, Data Modeling, Report Development, and Big Data Technologies.
 Proficient in programming languages like Python (Pandas, NumPy, PySpark, scikit-learn, PyTorch), SQL (including PL/SQL for Oracle), Scala, and PowerShell.
 Extensive experience with cloud platforms including AWS (S3, Redshift, RDS, DynamoDB, EMR, Glue, Data Pipeline, Kinesis, Athena, QuickSight, Lambda, CloudFormation, CodePipeline), Azure (ADF, SQL Server, Cosmos DB, Databricks, HDInsight, Blob Storage, Data Lake Storage), and Google Cloud Platform (BigQuery, Dataflow, Dataproc, Cloud Pub/Sub, Cloud Storage, Cloud SQL, Cloud Datastore, Apache Beam).
 Proficient in Apache Spark, Apache Airflow, Hadoop, Hive, Sqoop, Kafka, Impala, and Apache Beam for large-scale data processing and analytics.
 Experience with cloud data warehouses like Redshift, Snowflake, and BigQuery for scalable data storage and retrieval.
 Skilled in data visualization tools such as Tableau, Power BI, Google Data Studio, and QuickSight for creating insightful reports and dashboards.
 Hands-on experience with data integration tools like Informatica, Talend, and SSIS for seamless data flow across systems.
 In-depth knowledge of various database systems including SQL Server, Cosmos DB, Oracle, PostgreSQL, Cassandra, MySQL, and DynamoDB for efficient data storage and retrieval.
 Proficient in handling data formats like JSON, XML, and Avro for data interchange and storage.
 Familiarity with containerization technologies like Docker and orchestration tools like Kubernetes for scalable and manageable deployments.
 Experience with CI/CD pipelines using Azure DevOps, Jenkins, and AWS CodePipeline for automated software delivery and deployment.
 Proficient in advanced Excel functions, pivot tables, and VLOOKUPs for data analysis and reporting.
 Hands-on experience with AWS Glue and AWS Data Pipeline for ETL workflows and data processing.
 Familiarity with version control systems like Git, GitHub, and Bitbucket for collaborative development and code management.
 Strong understanding of security and access control principles including AWS IAM, AWS KMS, Azure Key Vault, Azure AD, SSL/TLS, and AES encryption standards.
 Proficient in project management and collaboration tools like Bugzilla, Confluence, SharePoint, and JIRA, and in Agile, Scrum, and Kanban methodologies for efficient project execution.
 Analyzed business needs, designed efficient processes, and managed development teams to deliver successful projects.
 Implemented data pipelines, ensured system integration, and adapted to new technologies for continuous improvement.
 Collaborated with stakeholders at all levels to align project goals and ensure informed decision-making.
 Tackled complex data challenges with strong analytical skills and a drive to contribute to a dynamic and innovative environment.

TECHNICAL SKILLS

Cloud Platforms: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP)
Big Data Processing and Analytics: Apache Spark, Apache Airflow, Hadoop, Hive, Sqoop, Kafka, Impala, Apache Beam
Programming and Scripting: Python (Pandas, NumPy, PySpark, scikit-learn, PyTorch), SQL (including PL/SQL for Oracle), Scala, PowerShell
Data Integration and ETL Tools: AWS Glue, AWS Data Pipeline, Informatica, Talend, SSIS
Containerization and Orchestration: Docker, Kubernetes
Version Control and Collaboration: Git, GitHub, Bitbucket
CI/CD: Azure DevOps, Jenkins, AWS CodePipeline
Data Warehousing and Database Management: Redshift, Snowflake, BigQuery, SQL Server, Cosmos DB, Oracle, PostgreSQL, Cassandra, MySQL, DynamoDB
Data Visualization and BI Tools: Tableau, Power BI, Google Data Studio, QuickSight
Security and Access Control: AWS IAM and AWS KMS, Azure Key Vault and Azure AD, SSL/TLS, AES encryption standards
Miscellaneous Tools and Technologies: JSON, advanced Excel functions, pivot tables, VLOOKUPs, Bugzilla, Confluence, SharePoint, JIRA, Agile, Scrum, Kanban
Operating Systems: Windows, Linux, UNIX, macOS

EDUCATION

Masters
Bachelors

WORK EXPERIENCE

NRG Energy, Houston, TX


Sr. Data Engineer | Oct 2022 - Present
 Led the software development lifecycle (SDLC) for data engineering projects, from requirements gathering to
deployment and maintenance, ensuring quality and efficiency throughout the process.
 Managed data storage and retrieval using Amazon S3, optimizing data storage and access patterns for scalability and
performance.
 Designed and implemented data warehouse solutions using Amazon Redshift, ensuring efficient data modeling and query
performance for analytics.
 Managed relational databases using Amazon RDS, ensuring data integrity, availability, and performance.
 Implemented NoSQL database solutions using Amazon DynamoDB, enabling scalable and flexible data storage for various
use cases.
 Utilized Amazon EMR for big data processing and analytics, leveraging HDFS, MapReduce, Hive, and Pig for distributed
computing tasks.
 Implemented ETL processes using AWS Glue and AWS Data Pipeline, ensuring seamless data integration and
transformation.
 Managed real-time data streams using Amazon Kinesis, enabling stream processing and real-time analytics.
 Utilized Amazon Athena for interactive query processing, enabling ad-hoc analysis of data stored in S3.
 Developed interactive dashboards and reports using Amazon QuickSight, providing business intelligence insights to
stakeholders.
 Leveraged serverless computing with AWS Lambda for event-driven data processing and automation.
 Utilized Apache Spark for distributed data processing and analytics, optimizing data workflows and performance.
 Orchestrated data workflows using Apache Airflow, ensuring automation and scheduling of data pipelines.
 Applied SQL, Python, and PySpark for data manipulation, analysis, and machine learning model development, enhancing
data processing capabilities.
 Utilized Scala for Spark programming, optimizing Spark code for performance and scalability.
 Managed and analyzed data using Pandas and NumPy, ensuring efficient data processing and analysis workflows.
 Implemented columnar storage using Apache Parquet, optimizing data storage and query performance.
 Processed and transformed XML data formats, enabling structured data processing and integration.
 Utilized ERwin for data modeling and database design, ensuring data integrity and consistency.
 Managed access control and encryption using AWS IAM and AWS KMS, ensuring data security and compliance.
 Implemented encryption standards including SSL/TLS and AES, ensuring data protection during transmission and storage.
 Implemented data anonymization techniques and data governance policies, ensuring data privacy and compliance with
regulations.
 Monitored and managed AWS resources using AWS CloudWatch and AWS CloudTrail, ensuring performance
optimization and security.
 Managed code repositories and collaborated with teams using Git, ensuring version control and code quality.
 Managed project workflows and tasks using JIRA, ensuring collaboration and alignment with project goals and timelines.
 Automated infrastructure deployment using AWS CloudFormation, ensuring consistent and scalable infrastructure
configurations.
 Implemented continuous integration and continuous deployment (CI/CD) pipelines using AWS CodePipeline, ensuring
automated and reliable software delivery.
 Containerized applications using Docker, enabling scalable and portable deployment of data solutions.
 Orchestrated containerized applications using Kubernetes, ensuring efficient management and scaling of containerized
workloads.
 Contributed to Agile methodologies, participating in Scrum ceremonies and sprint planning to deliver data solutions
iteratively and efficiently.
Tech Stack: AWS, Redshift, DynamoDB, EMR, AWS Glue, Kinesis, Athena, QuickSight, AWS Lambda, HDFS, MapReduce, Hive, Pig, Spark, Airflow, SQL, Python, PySpark, Scala, Parquet, XML, ERwin, IAM, KMS, CloudWatch, CloudTrail, Git, JIRA, AWS CloudFormation, Docker, Kubernetes, Agile (Scrum).
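
A minimal PySpark sketch of the kind of S3 batch transform described above (reading raw files from S3, cleaning them, and writing partitioned Parquet for downstream Redshift loads); the bucket names, paths, and columns are hypothetical placeholders, not details from the actual project.

# Hypothetical PySpark batch job: read raw CSV from S3, clean it, and write
# partitioned Parquet for downstream Redshift loads. Bucket names, paths, and
# columns are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3_batch_transform").getOrCreate()

# Read raw files from an S3 landing zone (placeholder path).
raw = spark.read.option("header", True).csv("s3://example-landing/meter_readings/")

# Basic cleanup: cast types, derive a partition column, drop rows missing key fields.
cleaned = (
    raw.withColumn("reading_kwh", F.col("reading_kwh").cast("double"))
       .withColumn("read_ts", F.to_timestamp("read_ts"))
       .withColumn("read_date", F.to_date("read_ts"))
       .dropna(subset=["meter_id", "read_ts"])
)

# Write partitioned Parquet to a curated zone, ready to be loaded into Redshift
# (for example via COPY or a Glue-cataloged Spectrum table).
(cleaned.write
        .mode("overwrite")
        .partitionBy("read_date")
        .parquet("s3://example-curated/meter_readings/"))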

First American, CA
Data Engineer | Nov 2020 - Sep 2022
 Designed and implemented data integration workflows using Azure Data Factory (ADF), ensuring seamless data
movement and transformation across on-premises and cloud environments.
 Managed and optimized SQL Server databases, ensuring data integrity, performance, and availability for business
operations.
 Implemented Azure Cosmos DB for globally distributed and scalable NoSQL database solutions, ensuring high availability
and low-latency data access.
 Utilized Snowflake for cloud-based data warehousing, enabling scalable and flexible analytics solutions.
 Implemented data processing and analytics workflows using Azure Databricks, leveraging Apache Spark for distributed
computing and machine learning.
 Managed and optimized big data clusters using Azure HDInsight, ensuring efficient data processing and analytics
capabilities.
 Utilized Azure Blob Storage and Azure Data Lake Storage for scalable and cost-effective data storage solutions.
 Automated tasks and workflows using PowerShell, streamlining data management and operations.
 Applied Python with Pandas, NumPy, and PyTorch for data manipulation, analysis, and machine learning model
development, enhancing data processing capabilities.
 Implemented data processing pipelines using Spark, handling large-scale data processing and analytics tasks.
 Developed serverless functions using Azure Functions, enabling event-driven data processing and automation.
 Managed and optimized Hadoop clusters for distributed data processing and analytics, ensuring scalability and
performance.
 Implemented Kafka for real-time data streaming and processing, enabling real-time analytics and event-driven
architectures.
 Managed secrets and access control using Azure Key Vault and Azure Active Directory (Azure AD), ensuring data security
and compliance.
 Processed and analyzed JSON data formats, enabling structured data processing and integration.
 Managed code repositories and collaborated with teams using Bitbucket, ensuring version control and code quality.
 Utilized Impala for interactive SQL queries and analytics on Hadoop-based data platforms.
 Implemented continuous integration and continuous deployment (CI/CD) pipelines using Azure DevOps, ensuring
automated and reliable software delivery.
 Monitored and managed Azure resources using Azure Monitor and Azure Log Analytics, ensuring performance
optimization and troubleshooting.
 Automated infrastructure deployment and management using Terraform, ensuring consistent and scalable infrastructure
configurations.
 Containerized applications and services using Docker, enabling scalable and portable deployment of data solutions.
 Orchestrated containerized applications using Kubernetes, ensuring efficient management and scaling of containerized
workloads.
 Developed and deployed interactive data visualizations using Power BI, enabling data-driven insights and decision-
making.
 Contributed to Agile methodologies, participating in Scrum ceremonies and sprint planning to deliver data solutions
iteratively and efficiently.
 Managed project workflows and tasks using JIRA, ensuring collaboration and alignment with project goals and timelines.
Tech Stack: ADF, SQL Server, Azure Cosmos DB, Snowflake, Azure Databricks, Azure HDInsight, Azure Blob Storage, Azure
Data Lake Storage, PowerShell, Python, Spark, Azure Functions, Hadoop, Kafka, JSON, Bitbucket, Impala, Azure DevOps,
Azure Monitor, Terraform, Docker, Kubernetes, Power BI, JIRA.
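
A minimal sketch of the Kafka-to-Data-Lake streaming pattern described above, written as a Spark Structured Streaming job of the sort run on Azure Databricks; the broker address, topic, event schema, and storage paths are hypothetical placeholders, and the job assumes the Spark Kafka connector is available on the cluster.

# Hypothetical Spark Structured Streaming job (e.g. on Azure Databricks):
# consume JSON events from Kafka and land them in Azure Data Lake Storage as
# Parquet. Broker, topic, schema, and paths are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka_to_adls").getOrCreate()

event_schema = StructType([
    StructField("policy_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read the Kafka topic as a stream and parse the JSON payload.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker.example.com:9092")
         .option("subscribe", "policy-events")
         .load()
         .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
         .select("e.*")
)

# Append parsed events to ADLS Gen2 as Parquet, with a checkpoint for recovery.
query = (
    events.writeStream.format("parquet")
          .option("path", "abfss://curated@examplelake.dfs.core.windows.net/policy_events/")
          .option("checkpointLocation", "abfss://curated@examplelake.dfs.core.windows.net/_chk/policy_events/")
          .outputMode("append")
          .start()
)
query.awaitTermination()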

Johnson & Johnson, Santa Clara, CA


Data Engineer | May 2019 - Oct 2020
 Utilized Google Cloud Platform (GCP) services including BigQuery, Dataflow, and Dataproc for data processing, analysis,
and orchestration, optimizing performance and scalability.
 Implemented Pub/Sub and Cloud Storage for real-time data ingestion and storage, ensuring reliable and scalable data
pipelines.
 Managed Cloud SQL and Cloud Datastore for structured and unstructured data storage, maintaining data integrity and
accessibility.
 Designed and implemented data streaming pipelines using Cloud Pub/Sub, Apache Beam, and Apache Kafka, enabling
real-time data processing and analytics.
 Leveraged Apache Spark and Hadoop for distributed data processing and analytics, handling large-scale datasets
efficiently.
 Orchestrated data workflows using Apache Airflow, ensuring automation and scheduling of data pipelines for timely
processing.
 Integrated data sources using Sqoop and Informatica, facilitating seamless data extraction, transformation, and loading
processes.
 Utilized Python with Pandas and NumPy for data manipulation, analysis, and modeling, enhancing data processing
capabilities.
 Developed data visualizations and dashboards using Data Studio and Google Analytics, providing actionable insights to
stakeholders.
 Managed code repositories and collaborated with teams using GitHub, ensuring version control and code quality.
 Orchestrated containerized applications using Docker and Kubernetes, ensuring scalability and reliability of deployed
data solutions.
 Automated infrastructure deployment and management using Terraform, optimizing resource utilization and cost
efficiency.
 Implemented continuous integration and deployment pipelines using Jenkins, ensuring seamless delivery of data
solutions.
 Utilized ELK Stack (Elasticsearch, Logstash, Kibana) for log analysis and monitoring, ensuring data visibility and
troubleshooting capabilities.
 Handled data serialization using Avro, ensuring efficient data storage and processing.
 Managed relational databases including PostgreSQL and NoSQL databases like Cassandra, ensuring data availability and
performance.
 Developed and deployed machine learning models using TensorFlow, enhancing data analytics and predictive
capabilities.
 Utilized VS Code for code development and debugging, ensuring code efficiency and reliability.
 Contributed to Agile and Kanban methodologies, participating in sprint planning, daily stand-ups, and backlog grooming
to deliver data solutions efficiently.
 Collaborated and documented technical specifications using Confluence, ensuring knowledge sharing and
documentation of data solutions.
Tech Stack: GCP, BigQuery, Apache Beam, Apache Spark, Apache Airflow, Hadoop, Kafka, Sqoop, Informatica, Python, Scala, Data Studio, Google Analytics, GitHub, Terraform, Jenkins, ELK Stack, Avro, PostgreSQL, Cassandra, TensorFlow, VS Code, Agile, Kanban, Confluence.
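
A minimal Apache Beam (Python SDK) sketch of the Pub/Sub-to-BigQuery streaming pattern described above; the project, topic, and table names are hypothetical placeholders, and a production run would typically use the Dataflow runner.

# Hypothetical Apache Beam streaming pipeline: read JSON messages from
# Cloud Pub/Sub and write them to BigQuery. Project, topic, and table names
# are illustrative placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True, project="example-project", region="us-central1")

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/example-project/topics/device-events")
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="example-project:analytics.device_events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )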

Mercury Insurance, San Antonio, TX


Data Engineer | Feb 2018 - Apr 2019
 Utilized Hadoop, Spark, and Hive to process and analyze large volumes of data, optimizing performance and scalability
for big data applications.
 Implemented Sqoop for efficient data transfer between Hadoop and relational databases, ensuring seamless data
integration and synchronization.
 Developed complex SQL and PL/SQL queries to extract, transform, and load data from diverse sources into data
warehouses, improving data accessibility and analysis capabilities.
 Managed AWS resources including EC2 instances, S3 buckets, RDS databases, and Lambda functions, leveraging cloud
services for scalable and cost-effective data solutions.
 Utilized Python with NumPy and Pandas for data manipulation, statistical analysis, and machine learning model
development, enhancing data processing workflows.
 Maintained version control and collaborated with teams using Git, ensuring code quality, and facilitating efficient project
management.
 Designed and implemented ETL processes using SSIS (SQL Server Integration Services), ensuring data quality and
consistency across data pipelines.
 Managed bug tracking and issue resolution using Bugzilla, ensuring data integrity and timely resolution of data-related
issues.
 Contributed to Agile and Kanban methodologies, participating in sprint planning, daily stand-ups, and backlog grooming
to deliver data solutions efficiently.
 Developed interactive dashboards and visualizations using Tableau, providing stakeholders with actionable insights and
data-driven decision-making capabilities.
 Collaborated with SharePoint for document management and collaboration, ensuring data governance and compliance
with organizational standards.
 Implemented data security measures and access controls, ensuring data privacy and compliance with regulatory
requirements.
 Conducted performance tuning and optimization of database queries and processes, improving data processing
efficiency and reducing latency.
 Participated in data architecture design and data modeling activities, ensuring scalability, flexibility, and performance of
data solutions.
 Provided technical expertise and support to cross-functional teams, contributing to the successful delivery of data
projects and initiatives.
Tech Stack: Hadoop, Spark, Hive, Sqoop, SQL, PL/SQL, AWS, EC2, S3, RDS, Lambda, Python (NumPy, Pandas), Git, SSIS,
Bugzilla, Agile, Kanban, Tableau, SharePoint.
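
A minimal PySpark-on-Hive sketch of the kind of aggregation used to feed the Tableau reporting described above; the database, table, and column names are hypothetical placeholders.

# Hypothetical PySpark-on-Hive aggregation feeding a reporting dashboard.
# Database, table, and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("claims_daily_rollup")
    .enableHiveSupport()  # read tables registered in the Hive metastore
    .getOrCreate()
)

claims = spark.table("insurance_dw.claims")

# Daily closed-claim counts and paid amounts per line of business.
rollup = (
    claims.where(F.col("status") == "CLOSED")
          .groupBy("line_of_business", F.to_date("closed_ts").alias("closed_date"))
          .agg(F.count("*").alias("claim_count"),
               F.sum("paid_amount").alias("total_paid"))
)

# Persist the rollup back to Hive for downstream Tableau dashboards.
rollup.write.mode("overwrite").saveAsTable("insurance_dw.claims_daily_rollup")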

Optum, Hyderabad, India


Data Analyst/Engineer | Jun 2012 - Dec 2015
 Utilized Python, pandas, NumPy, and scikit-learn for data cleaning, preprocessing, analysis, and machine learning model
development, resulting in improved data quality and predictive accuracy.
 Leveraged SQL to query, manipulate, and extract insights from large datasets stored in Oracle databases, ensuring data
integrity and optimizing data retrieval performance.
 Demonstrated expertise in advanced Excel functions, pivot tables, and VLOOKUPs to create interactive dashboards and
reports for stakeholders, facilitating data-driven decision-making processes.
 Implemented Apache Spark for big data processing and analytics, handling large-scale datasets efficiently and performing
distributed computing tasks for faster data processing.
 Designed and implemented data integration workflows using Talend, ensuring seamless data flow between
heterogeneous systems and maintaining data consistency across platforms.
 Managed version control and collaborated with teams using Git, tracking changes and resolving issues efficiently to
maintain code quality and project progress.
 Utilized Bugzilla for bug tracking and issue management, ensuring timely resolution of data-related issues and
maintaining data accuracy.
 Applied data engineering techniques in Hadoop ecosystem, including HDFS, Hive, and HBase, to store, process, and
analyze large volumes of structured and unstructured data.
 Collaborated with cross-functional teams to develop and deploy data pipelines and ETL processes, ensuring data
availability and reliability for business analytics and reporting.
 Contributed to data governance initiatives by establishing data quality standards, monitoring data quality metrics, and
implementing data cleansing and enrichment strategies.
Tech Stack: Python, pandas, NumPy, scikit-learn, SQL, Excel (advanced functions, pivot tables, VLOOKUPs), Spark, Talend, Oracle, Hadoop, Hive, HBase, Git, Bugzilla.
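
A minimal pandas and scikit-learn sketch of the cleaning-and-modeling workflow described above; the file name, columns, and target variable are hypothetical placeholders.

# Hypothetical pandas + scikit-learn workflow: clean a raw extract and train a
# simple baseline classifier. File name, columns, and target are illustrative
# placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load and clean the raw extract.
df = pd.read_csv("claims_extract.csv")
df = df.dropna(subset=["age", "num_visits", "total_cost", "readmitted"])
df["total_cost"] = df["total_cost"].clip(lower=0)

# Train and evaluate a simple baseline model.
X = df[["age", "num_visits", "total_cost"]]
y = df["readmitted"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))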
