Working With Databricks Tables, Databricks File System (DBFS) Etc
Business Deliverable - Name of the application as defined in the AGGRIFY database. A single Business Deliverable can have
multiple SLT IDs, which lets the application team group its tasks and batches under separate but unique SLT IDs.
To group them on the report, however, teams can use the same Business Deliverable name for all their SLT IDs.
Environment: AWS, EMR, EC2, Hadoop 2.7.3, Spark 3.3, Python 3.8, Open-Source Delta Lake, Kafka, Oracle, VS Code.
Description: It is a digital platform that helps health plans better understand their populations, deliver the best care, and
reduce costs. It is a simple-to-use product used to analyze, monitor, intervene in, and improve the quality of care management.
Description: The Rx Surveillance project monitors invoice claims to ensure they are correctly processed based on a set of criteria or
rules; thousands of claims are identified as outliers every day. The business needs to verify that each claim has been adjudicated properly.
This tool works as a single repository for all outlier claims and allows users to filter claims and write back comments. Writeback allows
users to add a comment to a single claim or mass-update multiple claims. Comment updates are then saved to a database in real time.
Description: NSP (Network Service Personalization) is one of the critical system applications in the Network Personalization program
that creates an interface that is specific to the needs of the Network Repair Bureau (NRB) technicians and ties into the functions they
perform day-to-day.
NSP leverages scoring data produced by the Network Data Lake (NDL, a Hadoop platform) based on specific business-identified Key
Performance Indicators (KPIs) and presents the customer experience in the form of an interactive Grafana dashboard. NSP also combines
NRB ticket information, provisioning data, and record details in a single pane of glass used for network outage troubleshooting. It
includes automation of process streaming, removal of unnecessary troubleshooting steps, and downstream communication with other
applications to prevent redundant tickets from reaching the NRB.
The main objective of this project is to achieve better control over automation of Tickets/Customer Experience.
Environment: Hadoop 2.8, HDFS, Pig 0.14.0, Hive 1.2, Oozie, Spark 2.4, Scala 2.11, GitHub, JIRA, Screwdriver, IntelliJ, HUE,
Jenkins
Description: Aetna is a US-based health care company that sells traditional and consumer-directed health care insurance plans and
related services, such as medical, pharmaceutical, dental, behavioral health, long-term care, and disability plans. On average, Aetna
receives 1 million claims each day. The sheer number of providers, members, and plan types makes the pricing of these claims
incredibly complex. Through misinterpretation of provider contracts and human error, a small number of claims are paid improperly.
As part of the Data Science team's work, all data from critical domains such as Aetna Medicare, Traditional Group Membership, Member,
Plan, Claim, and Provider is migrated to the Hadoop environment. All demographic information is moved from MySQL to Hadoop, where
the analysis is done. Claims data is also moved from MQ to Hadoop; after the claims are processed, the response is sent back to MQ
and a history is built to track all changes to the processed claims.
Environment: Hadoop 2.7, HDFS, Pig 0.14.0, Hive 0.13.0, Sqoop, Flume, Apache NiFi, Oozie, Git, JIRA, PySpark, H2O, Netezza,
MySQL, Aginity.
Description: “iTunes OPS Reporting” is a near-real-time data warehouse and reporting solution for the iTunes Online Store and acts as a
reporting system for external and operational reporting needs. It also publishes data to downstream systems such as Piano (for Label
Reporting) and ICA (for Business Objects Reporting, Campaign List Pull, and Analytics). De-normalized data is used for publishing
various reports to users. In addition, this project caters to the needs of the ITS (iTunes Store) business user groups. The work
requires complex analytical expertise, including extensive domain knowledge, a detailed understanding of the features of iTunes and
its data flow, and the ability to measure the accuracy of the system in place.
Environment: Hadoop 2.x7, HDFS, Hive 0.13.0, Oozie, Git, JIRA, Java, Teradata