
SRILAKSHMI MANNEMALA

(AWS Snowflake Data Engineer)


Email: [email protected]

Professional Summary:

 Experienced AWS Data Engineer with expertise in deploying and managing AWS services including EC2, Lambda, Elastic Beanstalk, Batch, ECS, S3, EFS, Glacier, CloudTrail, CloudWatch, Athena, EMR, Glue, Redshift, SNS, SQS, Step Functions, CodeDeploy, and CodePipeline.
 Strong experience in designing, building, and maintaining data pipelines, data lakes, and data warehouses on the AWS platform using services such as Glue, Athena, EMR, Redshift, and Kinesis.
 Proficient in writing complex SQL queries and creating ETL scripts using Python or Spark to process and transform
large datasets stored in S3 or other data sources.
 Skilled in data modelling, schema design, and performance optimization for big data applications.
 Experience in configuring and monitoring data workflows using AWS services such as CloudTrail, CloudWatch, SNS,
and SQS.
 Designed, implemented, and tested machine learning models to extract valuable insights from large and complex datasets.
 Developed data models using Star and Snowflake schemas, ensuring optimal performance and flexibility for
analytical queries.
 Familiar with best practices for security, compliance, and data governance on the AWS platform.
 Demonstrated ability to work in a fast-paced and collaborative environment, and to communicate effectively with
cross-functional teams.
 AWS certified in one or more relevant areas such as AWS Certified Big Data - Specialty, AWS Certified Solutions
Architect - Associate, or AWS Certified Developer - Associate.
 Experienced AWS Data Engineer with expertise in Python object-oriented programming, PySpark, and data storage
technologies such as RDS and DynamoDB.
 Experience building out machine learning algorithms for fraud detection and risk analysis.
 Developed data models and implemented ETL processes using Ab Initio and Snowflake, ensuring efficient data
integration, transformation, and loading from various sources.
 Developed ETL pipelines in and out of the data warehouse using a combination of Python, Snowflake's SnowSQL, and SQL queries against Snowflake (see the sketch after this summary).
 Proficient in programming languages such as Python, R, and Java, with experience in machine learning libraries (e.g., TensorFlow, PyTorch, scikit-learn).
 Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining, data acquisition, data preparation, data manipulation, feature engineering, machine learning, validation, visualization, and reporting solutions that scale across massive volumes of structured and unstructured data.
 Skilled in designing and implementing robust data pipelines using AWS services such as S3, Glue, and Lambda
functions.
 Proficient in developing RESTful APIs and integrating with third-party services, utilizing JSON as the primary data
exchange format.
 Proficient in GCP core services, including Compute Engine, Storage, and Networking.
 Proven track record of delivering end-to-end data solutions that meet business requirements, from data ingestion
to visualization and reporting.
 Strong analytical and problem-solving skills, with the ability to identify and resolve data-related issues quickly and
efficiently.
 Excellent communication and collaboration skills, able to work effectively with cross-functional teams and
stakeholders to ensure project success.
 Experienced AWS Data Engineer with a strong background in SQL, Airflow, Avro, Parquet, SequenceFile, JSON, ORC, Kafka, Spark, HDFS, Hadoop, HiveQL, and HBase.
 Skilled in data modeling, ETL development, data warehousing, and data integration using various tools and
technologies such as Python, PySpark, Scala, and Java.
 Used Apache Airflow in the GCP Cloud Composer environment to build data pipelines, using various Airflow operators such as the BashOperator.
 Demonstrated ability to work with cross-functional teams to understand business requirements and translate them
into technical solutions that deliver value to the business.
 Strong understanding of data security, compliance, and governance frameworks, including GDPR, CCPA, HIPAA, and
PCI.
 Experience in performance tuning, optimization, and troubleshooting of data pipelines and workflows to ensure
timely and accurate data delivery.
 Excellent communication, collaboration, and leadership skills with a track record of delivering high-quality projects
on time and within budget.
 Experienced AWS data engineer with expertise in designing, building, and maintaining scalable data solutions in the
cloud.
 Skilled in data warehousing, data lakes, data marts, and big data technologies such as Hadoop, Spark, and EMR.
 Proficient in data migration, cloud migration, and ETL processes using tools such as AWS Glue, AWS Data Pipeline,
and Apache Nifi.
 Utilized Snowflake for data warehousing, implementing SnowSQL, Snowpipe, Streams, Tasks, Shares, Data Sharing, Zero-Copy Cloning, Materialized Views, and Time Travel for optimized data processing and analysis.
 Familiar with SCD1 and SCD2 techniques for handling slowly changing dimensions in data warehouses.
 Skilled in GCP integration, data infrastructure design, and ETL processes. Strong collaborator and problem-solver
with a focus on delivering high-quality solutions. Continuously updated on cloud technologies.
 Knowledgeable in testing frameworks such as JUnit and JMeter for ensuring data quality and performance.
 Expertise in ELT (Extract, Load, Transform) processes. Skilled in designing and implementing efficient data pipelines for optimal data extraction, loading, and transformation. Proficient in AWS technologies and a collaborative team player focused on delivering high-quality data solutions. Up-to-date with the latest advancements in ELT methodologies and cloud technologies.
 Proficient in project management tools such as Jira for agile development methodologies.
 Experienced in version control tools such as GitHub and Bitbucket for managing code repositories and collaborating
with team members.
 Experienced professional proficient in leveraging Alation to streamline data management, enhance metadata organization, and facilitate data-driven decision-making.
 Analyzed complex business challenges and devised innovative solutions to improve efficiency, productivity, and profitability.
 Proficient in leveraging PostgreSQL and Alation to streamline data organization, enhance metadata management,
and facilitate data-driven decision-making for optimal business outcomes.
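
A minimal sketch of the Python + SnowSQL ETL pattern referenced above, assuming the snowflake-connector-python package; the account details and table names (STG_ORDERS, DW_ORDERS) are hypothetical placeholders rather than actual project objects.

    import snowflake.connector

    # Connection parameters and object names below are illustrative placeholders.
    conn = snowflake.connector.connect(
        account="example_account",
        user="etl_user",
        password="********",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="PUBLIC",
    )

    # A SnowSQL-style transformation pushed down into Snowflake:
    # load today's staged rows into a reporting table.
    transform_sql = """
        INSERT INTO DW_ORDERS (order_id, order_date, amount_usd)
        SELECT order_id, TO_DATE(order_ts), amount * fx_rate
        FROM STG_ORDERS
        WHERE load_dt = CURRENT_DATE()
    """

    cur = conn.cursor()
    try:
        cur.execute(transform_sql)
        print(f"Loaded {cur.rowcount} rows into DW_ORDERS")
    finally:
        cur.close()
        conn.close()
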
Education:

• Bachelor’s in Electronics and Communications, JNTU Anantapur.

Technical skills:

AWS Services: AWS S3, Redshift, EMR, SNS, SQS, Athena, Glue, CloudWatch, IAM
Big Data Technologies: HDFS, Sqoop, PySpark, Hive, MapReduce, Spark, Spark Streaming, HBase
Hadoop Distribution: Cloudera, Hortonworks
Languages: SQL, PL/SQL, Python, HiveQL, Scala
Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
Databases: Teradata, Oracle, SQL Server
Scheduling: Control-M, Oozie, Airflow
Version Control: Git, GitHub, VSS
Methodology: Agile, Scrum, Jira
IDE & Build Tools, Design: Eclipse, Visual Studio
Cloud Computing Tools: AWS, Snowflake

Work Experience:

Client: Marvellous Technologies, Gurnee, Illinois.


Role: AWS Snowflake Data Engineer | May 2023 to present
Responsibilities:
 Designed and implemented Snowflake stages to efficiently load data from various sources into Snowflake tables.
 Created and managed different types of tables in Snowflake, such as transient, temporary, and persistent tables.
 Optimized Snowflake warehouses by selecting appropriate sizes and configurations to achieve optimal performance and
cost efficiency.
 Developed complex SnowSQL queries to extract, transform, and load data from various sources into Snowflake.
 Implemented partitioning techniques in Snowflake to improve query performance and data retrieval.
 Assisted in analyzing sales data, identifying potential growth opportunities and inefficiencies.
 Created interactive Tableau visualizations to represent regional sales performance and market trends.
 Configured and managed multi-cluster warehouses in Snowflake to handle high-concurrency workloads effectively.
 Defined roles and access privileges in Snowflake to ensure proper data security and governance.
 Implemented Snowflake caching mechanisms to improve query performance and reduce data transfer costs.
 Utilized Snowpipe for real-time data ingestion into Snowflake, ensuring continuous data availability and automated data loading processes.
 Leveraged Snowflake's time travel features to track and restore historical data for auditing and analysis purposes.
 Implemented regular expressions in Snowflake for pattern matching and data extraction tasks.
 Developed Snowflake scripting solutions to automate data pipelines, ETL processes, and data transformations.
 Migrated legacy data warehouses to the Snowflake data warehouse; created databases, schemas, and tables in Snowflake and loaded the raw data from AWS S3 (see the sketch at the end of this section).
 AWS Cloud Data Engineering:
 Designed and implemented data ingestion and storage solutions using AWS S3, Redshift, and Glue.
 Developed ETL workflows using AWS Glue to extract, transform, and load data from various sources into Redshift.
 Integrated AWS SNS and SQS for real-time event processing and messaging.
 Implemented AWS Athena for ad-hoc data analysis and querying on S3 data.
 Utilized AWS CloudWatch for monitoring and managing resources, setting up alarms, and collecting metrics.
 Designed and implemented data streaming solutions using AWS Kinesis for real-time data processing.
 Managed DNS configurations and routing using AWS Route53 for efficient application and service deployment.
 Developed data processing pipelines using Hadoop, including HDFS, Sqoop, Hive, MapReduce, and Spark.
 Implemented Spark Streaming for real-time data processing and analytics.
 Implemented scheduling and job automation using IBM Tivoli, Control-M, Oozie, and Airflow.
 Designed and configured workflows for data processing and ETL pipelines.
 Designed and developed database solutions using Teradata, Oracle, and SQL Server.
 Managed and maintained PostgreSQL databases, including installation, configuration, and performance tuning.
Implemented backup and recovery strategies, monitored database health, and optimized query performance to ensure
optimal data availability and system efficiency.
 Implemented and administered Alation, a data catalog software platform, to enable efficient data discovery, metadata management, and data governance within the organization.
Environment: AWS, AWS S3, Redshift, EMR, SNS, SQS, Athena, Glue, CloudWatch, Kinesis, Route53, IAM, Sqoop, MySQL, HDFS, Apache Spark, Hive, Cloudera, Kafka, Zookeeper, Oozie, PySpark, Ambari, JIRA, IBM Tivoli, Control-M, Airflow, Teradata, Oracle, SQL, Alation, PostgreSQL
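
A minimal sketch of the S3-to-Snowflake load pattern described in this role (stage creation followed by a bulk COPY), again assuming snowflake-connector-python; the stage, storage integration, bucket, and table names are hypothetical.

    import snowflake.connector

    # Stage, integration, bucket, and table names are illustrative placeholders.
    load_statements = [
        # External stage over the raw S3 bucket (storage integration assumed to exist)
        """CREATE STAGE IF NOT EXISTS RAW_S3_STAGE
             URL = 's3://example-raw-bucket/events/'
             STORAGE_INTEGRATION = S3_INT
             FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)""",
        # Target table for the raw load
        """CREATE TABLE IF NOT EXISTS RAW_EVENTS (
             event_id STRING, event_ts TIMESTAMP_NTZ, amount NUMBER(12,2))""",
        # Bulk load from the stage into the table
        "COPY INTO RAW_EVENTS FROM @RAW_S3_STAGE ON_ERROR = 'CONTINUE'",
    ]

    conn = snowflake.connector.connect(account="example_account", user="etl_user",
                                       password="********", warehouse="LOAD_WH",
                                       database="RAW", schema="PUBLIC")
    cur = conn.cursor()
    try:
        for stmt in load_statements:
            cur.execute(stmt)
    finally:
        cur.close()
        conn.close()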

Client: Sureminds Solutions Pvt. Ltd.


Role: Snowflake Engineer Jan 2022 to Apr 2023
Responsibilities:
 Implemented advanced partitioning techniques in Snowflake to significantly enhance query performance and expedite
data retrieval.
 Created and managed various types of Snowflake tables, including transient, temporary, and persistent tables, to cater
to specific data storage and processing needs.
 Defined robust roles and access privileges within Snowflake to enforce strict data security and governance protocols.
 Implemented regular expressions in Snowflake for seamless pattern matching and data extraction tasks.
 Developed and implemented Snowflake scripting solutions to automate critical data pipelines, ETL processes, and data transformations.
 Developed and optimized ETL workflows using AWS Glue to extract, transform, and load data from diverse sources into Redshift for efficient data processing (see the sketch at the end of this section).
 Configured and fine-tuned Redshift clusters to achieve high-performance data processing and streamlined querying.
 Involved in the development of the Hadoop System and improving multi-node Hadoop Cluster performance.
 Worked on analyzing the Hadoop stack and different big data tools including Pig, Hive, HBase database, and Sqoop.
 Worked on the Hadoop ecosystem, including Hive, HBase, Oozie, Pig, Zookeeper, Spark Streaming, and MCS (MapR Control System), with the MapR distribution.
 Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
 Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
 Worked with different data sources like Avro data files, XML files, JSON files, SQL server and Oracle to load data into
Hive tables.
 Used Spark to create structured data from large amounts of unstructured data from various sources.
 Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
 Used AWS Lambda to implement real-time data analytics, such as log analysis, user behavior analysis, and predictive
analytics.
 Designed and implemented Glue workflows to incrementally update a large data set, reducing the processing time and
resource utilization required.
 Created interactive dashboards using Tableau functionality such as parameters, filters, calculated fields, sets, groups, and hierarchies.
 Developed Python scripts to find SQL injection vulnerabilities in SQL queries.
 Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
 Specified the cluster size, resource pool allocation, and Hadoop distribution by writing the specification files in JSON format.
 Wrote shell scripts that run multiple Hive jobs to incrementally update Hive tables used to generate Tableau reports for business use.
 Imported weblogs and unstructured data using Apache Flume and stored the data in a Flume channel.
 Exported event weblogs to HDFS by creating an HDFS sink that directly deposits the weblogs in HDFS.
 Used RESTful web services with MVC for parsing and processing XML data.
 Managed the OpenShift cluster, including scaling the Amazon Web Services application nodes up and down.
 Developed and maintained complex SQL queries, stored procedures, functions, and triggers in Oracle, SQL Server, and other databases.
 Communicated analysis results to decision makers by presenting actionable insights through visualization charts and dashboards in Amazon QuickSight.
 Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape.
 Proven experience in creating complex dashboards with Tableau focusing on interactive visualizations and data
exploration.
 Worked on various data modelling concepts, such as star schema and snowflake schema, in the project.
 Integrated AWS SNS and SQS to enable real-time event processing and efficient messaging.
 Implemented AWS Athena for ad-hoc data analysis and querying on data stored in AWS S3.
 Designed and implemented data streaming solutions using AWS Kinesis, enabling real-time data processing and analysis.
Environment: AWS, AWS S3, Redshift, EMR, SNS, SQS, Athena, Glue, CloudWatch, Kinesis, Route53, IAM, Sqoop, MySQL, HDFS, Apache Spark, Hive, Cloudera, Kafka, Zookeeper, Oozie, PySpark, Ambari, JIRA, IBM Tivoli, Control-M, Airflow, Teradata, Oracle, SQL
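
A hedged sketch of the Glue-to-Redshift ETL pattern described in this role, using the standard AWS Glue PySpark job structure; the job argument, catalog database/table, Glue connection name, and S3 temp path are assumptions rather than actual project values.

    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the source table registered in the Glue Data Catalog (placeholder names)
    source = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db", table_name="raw_orders")

    # Rename and cast columns before loading
    mapped = ApplyMapping.apply(
        frame=source,
        mappings=[("order_id", "string", "order_id", "string"),
                  ("amount", "double", "amount_usd", "double")])

    # Write to Redshift through a pre-defined Glue connection (hypothetical name)
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=mapped,
        catalog_connection="redshift-conn",
        connection_options={"dbtable": "public.orders", "database": "dw"},
        redshift_tmp_dir="s3://example-temp-bucket/glue/")

    job.commit()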

Client: Webmatics Solutions Pvt. Ltd., Hyderabad, IN


Role: Big Data Developer Feb 2018 to Jan 2022
Responsibilities:
• Developed ETL jobs using Spark-Scala to migrate data from Oracle to new MySQL tables (see the sketch at the end of this section).
• Extensively used Spark-Scala (RDDs, DataFrames, Spark SQL) and the Spark-Cassandra Connector APIs for various tasks (data migration, business report generation, etc.).
• Developed Spark Streaming application for real time sales analytics.
• Prepared an ETL framework with the help of Sqoop, Pig, and Hive to frequently bring in data from the source and make it available for consumption.
• Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can
be reused across the project.
• Analyzed the source data and handled it efficiently by modifying the data types. Used Excel sheets, flat files, and CSV files to generate Power BI ad-hoc reports.
• Analyzed the SQL scripts and designed the solution to implement using PySpark.
• Extracted the data from other data sources into HDFS using Sqoop
• Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data
into HDFS.
• Extracted the data from MySQL into HDFS using Sqoop
• Implemented automation for deployments by using YAML scripts for massive builds and releases
• Worked with Apache Hive, Apache Pig, HBase, Apache Spark, Zookeeper, Flume, Kafka, and Sqoop.
• Implemented Data classification algorithms using MapReduce design patterns.
• Extensively worked on combiners, partitioning, and distributed cache to improve the performance of MapReduce jobs.
• Used Git to maintain source code in Git and GitHub repositories.
Environment: Hadoop, Hive, Spark, PySpark, Sqoop, Spark SQL, Cassandra, YAML, ETL.
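
A minimal PySpark sketch of the Oracle-to-MySQL migration pattern referenced above (the original jobs used Spark-Scala); the JDBC URLs, credentials, and table names are placeholders.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("oracle-to-mysql-migration")
             .getOrCreate())

    # Read the source table from Oracle over JDBC (placeholder connection details)
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")
              .option("dbtable", "SALES.ORDERS")
              .option("user", "etl_user").option("password", "********")
              .option("driver", "oracle.jdbc.OracleDriver")
              .load())

    # Light cleanup before landing the data in MySQL
    cleaned = orders.dropDuplicates(["ORDER_ID"]).filter("STATUS IS NOT NULL")

    # Write to the new MySQL table
    (cleaned.write.format("jdbc")
     .option("url", "jdbc:mysql://mysql-host:3306/sales")
     .option("dbtable", "orders")
     .option("user", "etl_user").option("password", "********")
     .option("driver", "com.mysql.cj.jdbc.Driver")
     .mode("append")
     .save())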
Client: Wipro Technologies – Hyderabad, IN
Role: Hadoop Developer Jan 2017 to Jan 2018
Responsibilities:
• Imported data from MySQL to HDFS on a regular basis using Sqoop for efficient data loading.
• Performed aggregations on large volumes of data using Apache Spark and Scala, and stored the results in the Hive data
warehouse for further analysis.
• Worked extensively with Data Lakes and big data ecosystems, including Hadoop, Spark, Hortonworks, and Cloudera.
• Loaded and transformed structured, semi-structured, and unstructured data sets efficiently.
• Developed Hive queries to analyze data and meet specific business requirements.
• Leveraged HBASE integration with Hive to build HBASE tables in the Analytics Zone.
• Utilized Kafka and Spark Streaming to process streaming data for specific use cases (see the sketch at the end of this section).
• Developed data pipelines using Flume and Sqoop to ingest customer behavioral data into HDFS for analysis.
• Utilized various big data analytic tools, such as Hive and MapReduce, to analyze data on Hadoop clusters.
• Implemented a data pipeline using Kafka, Spark, and Hive for ingestion, transformation, and analysis of data.
• Wrote Hive queries and used HiveQL to simulate MapReduce functionalities for data analysis and processing.
• Migrated data from RDBMS (Oracle) to Hadoop using Sqoop for efficient data processing.
• Developed custom scripts and tools using Oracle's PL/SQL language to automate data validation, cleansing, and
transformation processes.
• Implemented CI/CD pipelines for building and deploying projects in the Hadoop environment.
• Utilized JIRA for issue and project workflow management.
• Utilized PySpark and Spark SQL for faster testing and processing of data in Spark.
• Used Spark Streaming to process streaming data in batches for efficient processing.
• Leveraged Zookeeper to coordinate, synchronize, and serialize servers within clusters.
• Utilized the Oozie workflow engine for job scheduling in Hadoop.
• Utilized PySpark in SparkSQL for data analysis and processing.
• Used Git as a version control tool to maintain the code repository.

Environment: Sqoop, MySQL, HDFS, Apache Spark, Scala, Hive, Hadoop, Cloudera, Kafka, MapReduce, Zookeeper, Oozie, Data Pipelines, RDBMS, Python, PySpark, Ambari, JIRA.
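
A hedged sketch of consuming Kafka events with Spark and landing them in HDFS for Hive, along the lines of the Kafka/Spark Streaming pipeline referenced above; it is written with Structured Streaming for brevity, and the broker, topic, schema, and HDFS paths are illustrative placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StringType, TimestampType

    spark = SparkSession.builder.appName("clickstream-ingest").getOrCreate()

    # Assumed event schema for the placeholder "clickstream" topic
    schema = (StructType()
              .add("user_id", StringType())
              .add("page", StringType())
              .add("event_ts", TimestampType()))

    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "clickstream")
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Land micro-batches as Parquet in HDFS, where a Hive external table can read them
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/clickstream/")
             .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
             .outputMode("append")
             .start())
    query.awaitTermination()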

Client: Blueliner Software Services Private Limited, Hyderabad, IN


Role: Data Warehouse Developer Jan 2014 to Dec 2017
Responsibilities:
• Creating jobs, SQL Mail Agent, alerts, and scheduling DTS/SSIS packages for automated processes.
• Managing and updating Erwin models for logical/physical data modeling of Consolidated Data Store (CDS), Actuarial
Data Mart (ADM), and Reference DB to meet user requirements.
• Utilizing TFS for source controlling and tracking environment-specific script deployments.
• Exporting current data models from Erwin to PDF format and publishing them on SharePoint for user access.
• Developing, administering, and managing databases such as Consolidated Data Store, Reference Database, and
Actuarial Data Mart.
• Architected and managed Snowflake data warehouses.
• Writing triggers, stored procedures, and functions using Transact-SQL (T-SQL) and maintaining physical database structures (see the sketch at the end of this section).
• Deploying scripts in different environments based on Configuration Management and Playbook requirements.
• Creating and managing files and file groups, establishing table/index associations, and optimizing query and
performance tuning.
• Tracking and closing defects using Quality Center for effective issue management.
• Maintaining users, roles, and permissions within the SQL Server environment.

• Snowflake Hands On Essentials - Data Applications


• Snowflake Hands On Essentials - Data Warehouse

Environment: SQL Server 2008/2012 Enterprise Edition, SSRS, SSIS, T-SQL, Windows Server 2003, Performance Point Server 2007, Oracle 10g, Visual Studio 2010.
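
An illustrative Python sketch of automating a T-SQL stored procedure call of the kind referenced above, assuming the pyodbc driver; the server, database, and procedure names are hypothetical.

    import pyodbc

    # Placeholder server, database, and procedure; Windows authentication assumed.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=sql-host;DATABASE=CDS;Trusted_Connection=yes;")

    cur = conn.cursor()
    try:
        # Refresh the Consolidated Data Store load for a given business date
        cur.execute("{CALL dbo.usp_LoadCds (?)}", "2014-06-30")
        conn.commit()
    finally:
        cur.close()
        conn.close()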
