Dinesh Kumar Pullepally
SUMMARY:
10 years of experience in the IT industry. Skilled in Azure services, big data components such as Hadoop,
Hive, and MapReduce, and cloud data modeling with Snowflake and Star schemas.
Extensive experience with Informatica.
Developed comprehensive data pipeline solutions using a wide range of technologies, including Azure Databricks,
ADF, Azure Synapse, Azure Functions, Logic Apps, ADLS Gen2, Hadoop, StreamSets, PySpark, MapReduce, Hive,
HBase, Python, Scala, and Snowflake.
Used Apache Sqoop to synchronize data across HDFS and Hive, and configured and managed workflows with
Apache Oozie, Control-M, and Microsoft Purview to efficiently schedule, manage, and govern Hadoop
processes.
Designed and developed complex data transformations using mapping data flows in Azure Data Factory and
Azure Databricks, optimizing data processing and improving overall efficiency.
Built production data pipelines using Apache Airflow, YAML, and Terraform scripts, and incorporated Microsoft
Purview for automated metadata management and governance.
Demonstrated expertise in developing efficient data pipelines, utilizing technologies such as Delta Lake, Delta
Tables, Delta Live Tables, Data Catalogs, and Delta Lake API.
Proficiency in Apache Kafka-driven real-time streaming analytics in Spark Streaming, enabling efficient
processing and analysis of high-velocity streaming data, while utilizing Kafka as a fault-tolerant data pipeline
integrated with Microsoft Purview for comprehensive data lineage tracking.
Developed cloud-based data warehouse solutions using Snowflake, optimizing schemas, tables, and views for
efficient data storage and retrieval. Implemented SQL Analytical Functions & Window Functions.
Leveraged AWS services, including EMR, Glue, and Lambda, to transform and move data and to automate
processes in an AWS-based environment.
TECHNICAL SKILLS:
Azure Services: Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Blob Storage, Logic
Apps, Function Apps, Azure Data Lake Gen2, Azure SQL Database, Azure Key Vault, Azure DevOps.
AWS Services: EC2, S3, Glue, Lambda functions.
Big Data Technologies: MapReduce, Hive, Tez, HDFS, YARN, PySpark, Hue, Kafka, Spark Streaming, Oozie,
Sqoop, ZooKeeper, Airflow.
Hadoop Distribution: Cloudera, Hortonworks
Languages: SQL, PL/SQL, Hive Query Language, Azure Machine Learning, Python, Scala, Java.
Web Technologies: JavaScript, JSP, XML, RESTful, SOAP, FTP, SFTP
Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS.
Build Automation tools: Ant, Maven, Toad, AutoSys
Version Control: Git, GitHub.
IDE & Build Tools, Design: Eclipse, IntelliJ IDEA, Visual Studio, SSIS, Informatica, Erwin, Tableau, Business
Objects, Power BI
Databases: MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle
11g/12c, Cosmos DB, MongoDB, Cassandra, HBase.
EXPERIENCE:
09/22 to Present: KOANTEK – Remote.
Senior Data Governance Engineer
Architected comprehensive data governance frameworks using Purview, ensuring robust data cataloging,
lineage tracking, and metadata management.
Led the implementation of data stewardship programs to support effective data asset management and
governance.
Defined and enforced data quality rules, enhancing data accuracy and reliability across business processes.
Engineered automated data classification workflows using Purview, improving data governance and compliance
with privacy regulations.
Conducted data lineage analysis, identifying and resolving data quality issues to enhance data
trustworthiness.
Orchestrated data governance initiatives, fostering a culture of data accountability and stewardship within
the organization.
Configured and managed data sensitivity labels, ensuring proper handling of sensitive and confidential data.
Developed data governance dashboards, providing real-time insights into data quality, usage, and
compliance metrics.
Established data governance best practices, enhancing data management processes and decision-making
capabilities.
Implemented a multi-node cloud cluster on AWS EC2, utilizing CloudWatch and CloudTrail for monitoring and
logging with versioned S3 storage.
Developed Spark applications for extensive data processing and employed Matillion ETL for pipeline design
and maintenance.
Enabled real-time data movement using Spark Structured Streaming, Kafka, and Elasticsearch, along with
Tableau refresh via AWS Lambda.
Mapped and documented data assets, ensuring comprehensive visibility into data sources, flows, and transformations.
Developed and executed data classification strategies, enhancing data privacy and protection measures.
Designed data governance training programs, promoting data literacy and governance awareness across the
organization.
Environment: Azure Databricks, Azure Data Factory, Azure Blob Storage, Azure Synapse Analytics, Azure Data
Lake, Azure Event Hubs, Azure DevOps, Logic Apps, Function Apps, MS SQL, Python, Snowflake, PySpark, Kafka,
Power BI.
EDUCATION:
Bachelor’s degree in Computer Science
Andhra University
CERTIFICATION:
Microsoft Certified: Azure Data Engineer Associate