Dinesh Kumar Pullepally

[email protected]
SUMMARY:
 10 years of experience in the IT industry. Skilled in Azure services, big data components such as Hadoop,
Hive, and MapReduce, and dimensional data modeling schemas such as Star and Snowflake.
Extensive experience with Informatica.
 Developed comprehensive data pipeline solutions using a wide range of technologies, including Azure
Databricks, ADF, Azure Synapse, Azure Functions, Logic Apps, ADLS Gen2, Hadoop, StreamSets, PySpark,
MapReduce, Hive, HBase, Python, Scala, and Snowflake.
 Used Apache Sqoop to synchronize data across HDFS and Hive; configured and managed workflows with
Apache Oozie, Control-M, and Microsoft Purview to schedule, manage, and govern Hadoop processes
efficiently.
 Designed and developed complex data transformations using mapping data flows in Azure Data Factory and
Azure Databricks, optimizing data processing and enhancing overall efficiency.
 Built production data pipelines using Apache Airflow, YAML, and Terraform scripts, and incorporated
Microsoft Purview for automated metadata management and governance.
 Demonstrated expertise in developing efficient data pipelines, utilizing technologies such as Delta Lake, Delta
Tables, Delta Live Tables, Data Catalogs, and Delta Lake API.
 Proficient in Apache Kafka-driven real-time streaming analytics with Spark Streaming, enabling efficient
processing and analysis of high-velocity streaming data, using Kafka as a fault-tolerant data pipeline
integrated with Microsoft Purview for comprehensive data lineage tracking.
 Developed cloud-based data warehouse solutions using Snowflake, optimizing schemas, tables, and views for
efficient data storage and retrieval. Implemented SQL Analytical Functions & Window Functions.
 Leveraged AWS services including EMR, Glue, and Lambda for transforming and moving data and for
automating processes in an AWS-based environment.
TECHNICAL SKILLS:
 Azure Services: Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Blob Storage, Logic
Apps, Function Apps, Azure Data Lake Gen2, Azure SQL Database, Azure Key Vault, Azure DevOps.
 AWS Services: EC2, S3, Glue, Lambda functions.
 Big Data Technologies: MapReduce, Hive, Tez, HDFS, YARN, PySpark, Hue, Kafka, Spark Streaming, Oozie,
Sqoop, ZooKeeper, Airflow.
 Hadoop Distribution: Cloudera, Hortonworks
 Languages: SQL, PL/SQL, Hive Query Language, Azure Machine Learning, Python, Scala, Java.
 Web Technologies: JavaScript, JSP, XML, RESTful, SOAP, FTP, SFTP
 Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS.
 Build Automation tools: Ant, Maven, Toad, AutoSys
 Version Control: Git, GitHub.
 IDE & Build Tools, Design: Eclipse, IntelliJ IDEA, Visual Studio, SSIS, Informatica, Erwin, Tableau, Business
Objects, Power BI
 Databases: MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle
11g/12c, Cosmos DB, MongoDB, Cassandra, HBase.
EXPERIENCE:
09/22 to Present  KOANTEK – Remote.
Senior Data Governance Engineer
 Architected comprehensive data governance frameworks using Purview, ensuring robust data cataloging,
lineage tracking, and metadata management.
 Led the implementation of data stewardship programs, promoting effective data asset management and
governance.
 Defined and enforced data quality rules, enhancing data accuracy and reliability across business processes.
 Engineered automated data classification workflows using Purview, improving data governance and compliance
with privacy regulations.
 Conducted data lineage analysis, identifying and resolving data quality issues and enhancing data
trustworthiness.
 Orchestrated data governance initiatives, fostering a culture of data accountability and stewardship within
the organization.
 Configured and managed data sensitivity labels, ensuring proper handling of sensitive and confidential data.
 Developed data governance dashboards, providing real-time insights into data quality, usage, and
compliance metrics.
 Established data governance best practices, enhancing data management processes and decision-making
capabilities.
 Implemented a multi-node cloud cluster on AWS EC2, utilizing CloudWatch and CloudTrail for monitoring and
logging with versioned S3 storage.
 Developed Spark applications for extensive data processing and employed Matillion ETL for pipeline design
and maintenance.
 Enabled real-time data movement using Spark Structured Streaming, Kafka, and Elasticsearch, along with
Tableau refresh via AWS Lambda.
 Mapped and documented data assets, ensuring comprehensive visibility into data sources, flows, and transformations.
 Developed and executed data classification strategies, enhancing data privacy and protection measures.
 Designed data governance training programs, promoting data literacy and governance awareness across the
organization.
Environment: Azure Databricks, Azure Data Factory, Azure Blob Storage, Azure Synapse Analytics, Azure Data
Lake, Azure Event Hubs, Azure DevOps, Logic Apps, Function Apps, MS SQL, Python, Snowflake, PySpark, Kafka,
Power BI.
09/20 to 08/22  US BANK - Minneapolis, MN.
Azure Data Governance Engineer
 Designed and implemented comprehensive data governance frameworks using Microsoft Purview, ensuring
robust data cataloging, lineage tracking, and metadata management.
 Developed and enforced data privacy and protection policies leveraging Risk and Compliance features, ensuring
adherence to GDPR, CCPA, and other regulatory requirements.
 Engineered automated data classification and labeling workflows, enhancing data governance and regulatory
compliance.
 Implemented end-to-end data lineage tracking with Microsoft Purview, providing complete visibility into data
transformations and flows.
 Established a unified data catalog, facilitating easy data discovery and enhancing collaboration among data users.
 Implemented role-based access control (RBAC), safeguarding sensitive data and ensuring appropriate access
levels.
 Conducted data risk assessments using Risk and Compliance tools, identifying potential data privacy and
security risks.
 Orchestrated data governance initiatives using Microsoft Purview, fostering a culture of data accountability and
stewardship within the organization.
 Developed advanced metadata search capabilities in Microsoft Purview, enabling efficient data discovery and
retrieval for analytics and reporting.
 Configured and managed data sensitivity labels, ensuring proper handling of sensitive and confidential data.
 Designed and implemented data audit trails, ensuring transparent data governance and regulatory compliance.
 Streamlined data governance workflows, improving efficiency and collaboration across data teams.
 Managed data lifecycle processes, ensuring proper data retention, archiving, and disposal practices.
 Integrated Microsoft Purview with enterprise data sources, ensuring consistent metadata management and data
governance across diverse systems.
 Conducted impact analysis and data lineage assessments, supporting efficient change management and data
governance.
 Implemented data policy enforcement mechanisms, ensuring compliance with organizational and regulatory
data governance standards.
 Implemented data governance monitoring and alerting, enabling timely identification and resolution of data
issues.
 Partnered with cross-functional teams on data governance workflows, ensuring alignment with
business objectives and regulatory requirements.
 Developed comprehensive data governance reports, providing stakeholders with actionable insights and
governance status updates.
 Implemented data governance best practices, enhancing data management processes and decision-making
capabilities.
 Conducted data risk and compliance audits, identifying and mitigating potential data governance issues.
Environment: Azure Databricks, Azure Data Factory, Azure Data Lake Storage Gen2, Azure Synapse Analytics,
Logic Apps, Azure SQL Database, Oracle, Snowflake, PySpark, Power BI.
12/18 to 08/20  OPTION CARE - Brecksville, OH
Data Engineer
 Designed and implemented end-to-end data pipelines using AWS services for efficient data ingestion,
transformation, and loading (ETL) into Snowflake data warehouse.
 Utilized AWS EMR and Redshift for large-scale data processing, transforming and moving data into and out
of AWS S3.
 Developed and maintained ETL processes with AWS Glue, migrating data from various sources into AWS
Redshift.
 Implemented serverless computing with AWS Lambda, executing real-time Tableau refreshes and other
automated processes.
 Utilized AWS SNS, SQS, and Kinesis for efficient messaging and data streaming, enabling event-driven
communication and message queuing.
 Designed and orchestrated workflows with AWS Step Functions, automating intricate multi-stage data
workflows.
 Implemented data movement with Kafka and Spark Streaming for efficient real-time data ingestion and
transformation.
 Integrated and monitored ML workflows with Apache Airflow, ensuring smooth task execution on Amazon
SageMaker.
 Leveraged Hadoop ecosystem tools, including Hadoop, MapReduce, Hive, Pig, and Spark for big data
processing and analysis.
 Managed workflows with Oozie, orchestrating effective coordination and scheduling in big data projects.
 Utilized Sqoop for data import/export between Hadoop and RDBMS, importing normalized data from staging
areas to HDFS and performing analysis using Hive Query Language (HQL).
 Maintained version control of the codebase and configurations with Git/GitHub.
 Automated deployment of applications and data pipelines with Jenkins and Terraform.
 Worked with various databases, including SQL Server, Snowflake, and Teradata, for efficient data storage and
retrieval.
 Performed data modeling with Python, SQL, and Erwin, implementing Dimensional and Relational Data
Modeling with a focus on Star and Snowflake Schemas.
 Implemented and optimized Apache Spark applications for batch processing, making extensive use of Spark
DataFrames, the Spark SQL API, and the Spark Scala API.
 Collaborated with business users for Tableau dashboards, facilitating actionable insights based on Hive tables.
 Optimized complex data models in PL/SQL, improving query performance by 30% in high-volume
environments.
 Developed predictive analytics reports with Python and Tableau, visualizing model performance and prediction
results.
Environment: AWS S3, AWS Redshift, HDFS, Amazon RDS, Apache Airflow, Tableau, AWS CloudFormation,
AWS Glue, Apache Cassandra, Terraform.
07/17 to 11/18  CHANGE HEALTHCARE - Nashville, TN
Big Data Developer
 Developed an ETL framework utilizing Sqoop, Pig, and Hive to seamlessly extract, transform, and load data
from diverse sources, making it readily available for consumption.
 Processed HDFS data and established external tables using Hive, while also crafting scripts for table ingestion
and repair to ensure reusability across the entire project.
 Engineered ETL jobs utilizing Spark and Scala to efficiently migrate data from Oracle to new MySQL tables.
 Employed Spark (RDDs, DataFrames, Spark SQL) and Spark-Cassandra Connector APIs for a range of tasks,
including data migration and the generation of business reports.
 Worked with Apache Hive, Apache Pig, HBase, Apache Spark, ZooKeeper, Flume, Kafka, and Sqoop.
 Played a significant role in crafting combiners, implementing partitioning, and leveraging distributed cache to
boost the performance of MapReduce jobs.
 Designed and implemented a Spark Streaming application for real-time sales analytics.
 Analyzed source data, proficiently managed data type modifications, and employed Excel sheets, flat files, and
CSV files to generate ad-hoc reports using Power BI.
 Successfully addressed intricate technical challenges throughout the development process.
 Analyzed SQL scripts and designed solutions using Spark and Scala.
 Utilized Sqoop to extract data from diverse sources and load it into Hadoop Distributed File System (HDFS).
 Managed data import from diverse sources, executed transformations using Hive and MapReduce, and loaded
processed data into Hadoop Distributed File System (HDFS).
 Developed detailed requirements specification for a new system using use cases and data flow diagrams.
 Optimized complex SQL queries for improved performance and reduced execution time,
employing Snowflake query profiling and optimization techniques.
 Utilized Sqoop to extract data from MySQL and efficiently load it into Hadoop Distributed File System
(HDFS).
 Implemented automation for deployments using YAML scripts for streamlined builds and releases.
 Utilized Git and GitHub repositories to maintain the source code and enable version control.
Environment: Hadoop, MapReduce, Hive, Hue, Spark, Scala, Sqoop, Spark SQL, Machine Learning, Snowflake,
Shell scripting, Cassandra, ETL.
07/12 to 05/16  BIRLA SOFT - Hyderabad, India.
Data Warehouse Developer
 Developed source data profiling and analysis processes, thoroughly reviewing data content and metadata. This
facilitated data mapping and validation of assumptions made in the business requirements.
 Created and maintained databases for Server Inventory and Performance Inventory.
 Operated within the Agile Scrum Methodology, engaging in daily stand-up meetings. Proficient in utilizing
Visual SourceSafe for Visual Studio 2010 and project tracking through Trello.
 Created Drill-Through and Drill-Down reports with dropdown menu options, implemented data sorting, and
defined subtotals for enhanced data exploration.
 Utilized the Data Warehouse to craft a Data Mart feeding downstream reports. Additionally, engineered a User
Access Tool empowering users to generate ad-hoc reports and run queries, facilitating data analysis within the
proposed Cube.
 Developed logical and physical designs of databases and ER Diagrams for both Relational and Dimensional
databases using Erwin.
 Designed and implemented comprehensive data warehousing solutions, including the development of ETL
processes and dimensional data models, resulting in improved data accessibility and analytical capabilities.
 Engineered and optimized SQL queries and stored procedures for efficient data extraction, transformation, and
loading (ETL) processes in support of data warehousing initiatives.
 Proficient in Dimensional Data Modeling for Data Mart design, adept at identifying Facts and Dimensions, and
skilled in developing fact tables and dimension tables using Slowly Changing Dimensions (SCD).
 Developed a Business Objects dashboard that facilitated the tracking of marketing campaign performance for a
company.
 Deployed SSIS packages and established jobs to ensure the efficient execution of the packages.
 Involved in System Integration Testing (SIT) and User Acceptance Testing (UAT).
 Crafted intricate mappings utilizing Source Qualifier, Joiners, Lookups (Connected and Unconnected),
Expression, Filters, Router, Aggregator, Sorter, Update Strategy, Stored procedure, and Normalizer
transformations.
 Proficient in crafting ETL packages using SSIS to extract data from heterogeneous databases, followed by
transformation and loading into the data mart.
 Implemented and maintained data integration workflows using ETL tools like Informatica, SSIS, or Talend,
facilitating seamless data movement across the data warehouse.
 Engaged in the creation of SSIS jobs to automate the generation of reports and refresh packages for cubes.
 Experienced in utilizing SQL Server Reporting Services (SSRS) for authoring, managing, and delivering both
paper-based and interactive web-based reports.
 Designed and implemented stored procedures and triggers to ensure consistent and accurate data entry into the
database.
Environment: Informatica 8.6.1, SQL Server 2005, RDBMS, Fast load, FTP, SFTP, Windows server, MS SQL
Server 2014, SSIS, SSAS, SSRS, SQL Profiler, Dimensions, Performance Point Server, MS Office, SharePoint.
EDUCATION:
Bachelor’s degree in Computer Science
Andhra University.
CERTIFICATION:
 Microsoft Certified: Azure Data Engineer Associate
