
A

Project Report
on
CLOUD-BASED RECOMMENDATION SYSTEM

Submitted in partial fulfillment of the requirements


for the award of the degree of

Bachelor of Technology
in

Computer Science and Engineering

by
Shyali Narayan (2100971540057)
Anshu Chauhan (2100971540011)
Saurabh Yadav (2100971540049)

Under the Supervision of


Prof. Vijay Prakash

Galgotias College of Engineering & Technology


Greater Noida, Uttar Pradesh
India-201306
Affiliated to

Dr. A.P.J. Abdul Kalam Technical University


Lucknow, Uttar Pradesh,
India-226031
December, 2024
GALGOTIAS COLLEGE OF ENGINEERING & TECHNOLOGY
GREATER NOIDA, UTTAR PRADESH, INDIA-201306

CERTIFICATE

This is to certify that the project report entitled “Cloud-Based Medicine Recommendation
System” submitted by Ms. Shyali Narayan (2100971540057), Mr. Anshu
Chauhan (2100971540011), and Mr. Saurabh Yadav (2100971540049) to the Galgotias College
of Engineering & Technology, Greater Noida, Uttar Pradesh, affiliated to Dr. A.P.J. Abdul
Kalam Technical University, Lucknow, Uttar Pradesh, in partial fulfillment for the award of the
Degree of Bachelor of Technology in Computer Science & Engineering, is a bona fide record
of the project work carried out by them under my supervision during the year 2024-2025.

Name (Mr. Vijay Prakash)        Dr. Pushpa Chaudhary
Professor                       Professor and Head
Dept. of CSE                    Dept. of CSE


ACKNOWLEDGEMENT
We have put considerable effort into this project. However, it would not have been possible
without the kind support and help of many individuals and organizations. We would like to
extend our sincere thanks to all of them.

We are highly indebted to Mr. Vijay Prakash for his guidance and constant
supervision. We are also thankful to him for providing the necessary
information regarding the project and for his support in completing it.

We are extremely indebted to Dr. Pushpa Chaudhary, HOD, Department of Computer
Science and Engineering, GCET, and Mr. Manish Kumar Sharma, Project Coordinator,
Department of Computer Science and Engineering, GCET, for their valuable
suggestions and constant support throughout our project tenure. We would also like to
express our sincere thanks to all faculty and staff members of the Department of
Computer Science and Engineering, GCET, for their support in completing this project
on time.

We also express gratitude towards our parents for their kind co-operation and
encouragement, which helped us in the completion of this project. Our thanks and
appreciation also go to our friends who assisted in developing the project and to all the
people who have willingly helped us with their abilities.

(Shyali Narayan)

(Anshu Chauhan)

(Saurabh Yadav)

ABSTRACT

The increasing demand for reliable alternatives to prescribed medications, driven by
issues like drug shortages, side effects, and patient-specific conditions, necessitates the
development of an intelligent system that can recommend suitable substitutes. This
research proposes a Cloud-based Medicine Recommendation System that utilizes
cosine similarity to suggest alternative medications based on a patient's symptoms and
the effects of various drugs. By vectorizing the symptoms and medication data, the
system calculates similarity scores to identify the most relevant alternatives. Hosted on
the cloud, the system offers scalability, accessibility, and real-time processing, allowing
healthcare professionals and patients to access it anytime and anywhere. The system
features a comprehensive drug database and incorporates machine learning models for
personalized recommendations, considering factors such as dosage, form, and
contraindications. In medical emergencies or when physicians and prescribed
medications are unavailable, the system serves as a valuable resource for quick and
informed decision-making. By recommending effective alternatives with minimal side
effects, the system helps reduce the risk of adverse drug reactions, improves treatment
outcomes, and enhances patient safety. Furthermore, it reduces healthcare professionals'
workload by automating the identification of alternative medicines. This cloud-based
solution has the potential to significantly improve patient care, streamline healthcare
processes, and ensure treatment continuity. Future developments will focus on expanding
the database, improving recommendation accuracy, and integrating real-time pharmacy
availability data to enhance the system’s effectiveness and global accessibility.

KEYWORDS: Cloud-Based, Medicine Recommendation System, Collaborative Filtering,
Content-Based Algorithms, Healthcare Data, Patient Care.

CONTENTS

CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT

CHAPTER 1: INTRODUCTION

CHAPTER 2: LITERATURE REVIEW

CHAPTER 3: PROBLEM FORMULATION

CHAPTER 4: PROPOSED WORK

CHAPTER 5: SYSTEM DESIGN

CHAPTER 6: IMPLEMENTATION

CHAPTER 7: RESULT ANALYSIS

CHAPTER 8: CONCLUSION, LIMITATION, AND FUTURE SCOPE

REFERENCE
LIST OF PUBLICATIONS
CONTRIBUTION OF PROJECT

1. INTRODUCTION
In modern healthcare, the vast array of treatment options—ranging from conventional
medications to complementary therapies like acupuncture, homeopathy, and herbal
remedies—often leaves patients and physicians facing the challenge of selecting the
most effective treatment for a given condition. Alternative medicine, when used as a
supplement to traditional therapies, has gained popularity for its ability to offer
personalized care with potentially fewer side effects. However, determining the best
course of action among countless options requires advanced analytical tools and
methodologies. A cloud-based medicine recommendation system offers a powerful
solution by combining the scalability of cloud computing with advanced algorithms like
cosine similarity to deliver precise, personalized, and evidence-based treatment
suggestions. Cosine similarity is a mathematical technique that measures the
similarity between two vectors by computing the cosine of the angle between them. In
the context of medicine recommendations, this approach enables the comparison of
patient symptoms with drug profiles, helping identify the most suitable treatments for
individual health conditions. By integrating cosine similarity into a cloud-based
platform, the system can process vast amounts of patient data and alternative treatment
options in real-time. This allows physicians to make informed decisions about treatment
plans, especially when conventional methods have proven ineffective. The use of cosine
similarity provides numerous benefits, such as improved accuracy in identifying
effective drugs, reduced adverse effects, and the ability to discover new treatment
insights through large-scale data analysis. Alternative medications may have fewer side
effects than conventional drugs, and by leveraging cosine similarity, doctors can
recommend treatments that are not only effective but also safe and compatible with a
patient's medical history. Challenges like data quality and the underutilization of
alternative treatments must be addressed to fully realize the potential of cosine similarity
in medicine recommendations. High-quality, comprehensive datasets are essential for
producing reliable results, and ensuring the safety and effectiveness of suggested
treatments is paramount. In conclusion, a cloud-based medicine recommendation
system, powered by cosine similarity, represents a transformative step in personalized
healthcare. This approach has the potential to revolutionize the integration of alternative
medicine into mainstream healthcare.
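
For two vectors a and b, cosine similarity is cos(theta) = (a . b) / (||a|| ||b||), which
ranges from 0 to 1 for non-negative count vectors. The following is a minimal sketch of how
symptom text could be matched against drug profiles, assuming a simple bag-of-words
encoding. The drug names and effect descriptions are made-up placeholders, and the helper
names (vectorize, recommend) are hypothetical rather than taken from the project code.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Lower-case, whitespace-split bag-of-words vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical drug profiles (placeholders, not real clinical data).
drug_profiles = {
    "drug_a": "relieves headache fever body pain",
    "drug_b": "relieves cough sore throat congestion",
    "drug_c": "relieves fever headache",
}


def recommend(symptoms: str, top_n: int = 2) -> list:
    """Rank drug profiles by cosine similarity to the symptom description."""
    query = vectorize(symptoms)
    scores = [
        (name, cosine_similarity(query, vectorize(profile)))
        for name, profile in drug_profiles.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]
```

Calling recommend("headache fever") ranks drug_c ahead of drug_a, because the shorter
profile shares the same two terms with less unrelated content. A production system would
likely need weighting (for example TF-IDF) and clinical safety checks, which this sketch
does not attempt.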
2. LITERATURE REVIEW
Recommender systems, which propose items based on user behavior and patterns, can
alleviate the modern problem of data overload and provide insight for improved
analytics. Recommendations can also draw on the reviews of those products or items. Building a
model for proposing medical items or treatments to patients with comparable
comorbidities is the subject of this study. This chapter is organized into five
subsections, 2.1 through 2.5.

2.1 Research into the use of recommender systems in various industries.

Recommender systems have proven to be an efficient method of reducing information
overload in the age of ever-increasing online data. Since recommender technologies
are widely used in web applications, their potential to alleviate many of the
difficulties associated with over-choice cannot be overstated. Many fields of
research, including computer vision and NLP, have recently seen a surge in interest in
deep learning because of its exceptional performance as well as its ability to learn
feature representations from scratch. Research on information retrieval and
recommender systems has lately demonstrated the usefulness of deep learning, and the
field of deep learning in recommender systems appears to be flourishing. (Zhang et
al.; 2019) provide a taxonomy of deep learning-based recommendation models together with
a comprehensive summary of the state of the art, evaluating the most
notable papers on deep learning-based recommender systems published
to date. The authors also discuss some of the most important issues in the field,
anticipated future developments, and new perspectives that expand on existing trends.
Deep learning and recommender systems have
both been hot research subjects for decades. (Stark et al.; 2019) describe how a drug
recommendation system can aid doctors and nurses in prescribing the proper
medication. Thanks to modern technology, it is feasible to create recommendation
systems that shorten the time needed to reach a decision. Several contemporary pharmaceutical
recommendation systems use their own algorithms. As a result, it is vital to understand
how these systems currently work, their benefits and drawbacks, and where more research
is needed. Their study examines and compares existing methods for medicine
recommendation systems and identifies future research targets.

The study provides a systematic literature review of medicine recommendation
engines. The authors searched six databases and found 13 studies that met their strict criteria.
Ontology-based and rule-based techniques dominate machine learning and data mining
research. The research evaluated parameters such as data storage, interface, data
collection, data preparation, and platform/technology/algorithm. Non-disease research
lacked data storage, interface, data collection, pre-processing, and custom algorithms.

A music recommendation system using a CNN module is discussed in (Elbir et al.; 2018),
where the authors extract acoustic features from music and use machine-learning techniques
to classify music genres and create music suggestions. Convolutional neural networks were
used to classify and recommend music genres, and the findings were compared. The project
uses convolutional neural networks and digital signal processing to categorize and
recommend songs. The study created a service that offers music based on user requests
after analyzing how features are obtained. Initially, features were extracted using
digital signal processing, and then a CNN was trained to do so. The acoustic qualities of
songs are then used to determine the best suggestion algorithm. SVM outperforms other
approaches in classification accuracy. Changing the window size or window type had a
negligible effect on performance.

2.2 Content-Based Filtering in Recommender Systems Research

Several kinds of filtering are used in recommender systems; the most widely used are
content-based filtering and collaborative filtering. Based on the user’s preferences and choices,
the Content-Based (CB) technique suggests items or products to the user. To suggest
new items with similar attributes, content-based filtering matches the
attributes of the content. As an alternative to traditional content-based filtering,
(Pal et al.; 2017) and (Zhao; 2019) present a simple method that finds
correlations between two features using set intersection and uses them to predict the
similarity between two items for recommendation. Naive Bayes and other text classifiers have
been used in content-based algorithms in the past. The algorithm is tested and
compared against the Pure CF and SVD algorithms, and the resulting MAE values
allow for accurate comparisons. Hybrid content recommendation achieved better MAE values
as the dataset’s sparsity was increased by 1 to 2 percent, although a larger dataset may
yield different results. Figure 2 shows that at a sparsity level of 98.5 percent
there is little difference between the outcomes of Pure CF and the authors’ technique,
but when the sparsity level is increased to 99 percent, the algorithm outperforms
Pure CF.

Figure 2: Comparison of MAE values with increasing sparsity for hybrid CF and Pure
CF. (Pal et al.; 2017)
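
To illustrate the idea of scoring item similarity through set intersection, the sketch
below uses the Jaccard index (size of the intersection divided by size of the union). This
is an assumed stand-in chosen for illustration; the text does not give the exact formula
used by (Pal et al.; 2017). The item names and attribute sets are hypothetical.

```python
def set_overlap_similarity(features_a: set, features_b: set) -> float:
    """Jaccard index: shared features divided by all distinct features."""
    union = features_a | features_b
    if not union:
        return 0.0  # two empty feature sets carry no evidence of similarity
    return len(features_a & features_b) / len(union)


# Hypothetical items described by sets of attributes.
items = {
    "item_1": {"tablet", "analgesic", "fever"},
    "item_2": {"tablet", "analgesic", "inflammation"},
    "item_3": {"syrup", "cough"},
}


def most_similar(target: str) -> str:
    """Return the other item whose attribute set overlaps most with the target."""
    others = (name for name in items if name != target)
    return max(
        others,
        key=lambda name: set_overlap_similarity(items[target], items[name]),
    )
```

Here item_1 and item_2 share two of four distinct attributes, giving a score of 0.5,
while item_3 shares none.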

2.3 Study of Recommender System Collaborative Filtering

Collaborative filtering is the most common technique behind advanced
recommendation engines. Netflix, Amazon, Facebook, and other major digital
companies employ this type of recommendation system across a wide range of industries.
In (Gurbanov and Ricci; 2017), the authors discuss the
techniques used for recommendation systems and the limitations of collaborative
filtering. The paper combines Sequence Mining (SM) and Collaborative
Filtering (CF) to forecast user actions. When a target user performs an unobserved
action on an object, the proposed model predicts it based on the users’ completed actions
up to that time. The model does not use any user or item information. Experiments on a
large real-world dataset show that hybrid models outperform standalone SM and CF. Any SM
and CF model can serve as a subcomponent of the suggested hybrid model. Their technique
considers time delays between actions and their sequential order. One limitation is that
there is no method to harness the interaction effect, that is, the impact of one action on
another. As a result, the SM component’s probability computations can be wrong for unusual
sequences. Finally, the proposed hybrid system has the same cold-start issue as any other CF
system. The authors plan to compare the proposed hybrid model to FPMC and CBCF
and test it on more datasets in the future. They also want to include information about
user and item attributes in the model to help with the cold-start issue. Similarly,
other researchers conclude that a large dataset can sometimes be difficult for a
collaborative filtering approach to handle, but that a hybrid model with matrix
factorization can resolve the problem and give better results. In (Dong et al.; 2017),
the authors illustrate that in recommender systems, collaborative filtering (CF) is a
commonly employed method for resolving a wide range of real-world issues. Users’
preferences for products are encoded in a matrix known as a user-item matrix in
traditional CF-based methods for learning to provide recommendations. Because rating
matrices in real-world applications are often sparse, the recommendation performance of
CF-based algorithms degrades dramatically. Data sparsity and cold-start issues
can be addressed with enhanced CF methods that make use of additional side information.
However, the sparseness of the user-item matrix and of the side information may mean
that the learnt latent factors are ineffective. To address this issue, the authors draw on
advances in learning effective representations in deep learning and propose a hybrid model
that uses side information and collaborative filtering from the rating matrix to jointly
perform deep user and item latent factor learning. Extensive testing on three
different datasets has shown that the hybrid model is superior to
other methods at leveraging side information, and this leads to better outcomes overall.
(Fararni et al.; 2021) employ Hadoop-based infrastructure and the MapReduce
algorithm to deal with the large datasets that are frequently
encountered in CF. The author in (Shaikh; 2020) develops a Top-N nearest-neighbor-based
movie recommender system. The research includes evaluating machine learning
models through cross-validation using K-fold and
LOO (Leave One Out). RMSE (Root Mean Squared Error) and MAE (Mean Absolute
Error) are used to assess accuracy. The models compared on RMSE and MAE include
KNNBaseline, SVD, and the item-based (Ib) and user-based (Ub) variants of KNNWithMean.
With SVDpp for matrix factorization and
KNNBaseline for content-based filtering, K-fold CV is 2 percent more accurate than
LOOCV. Sparsity is the term used to describe data scarcity. Further
iterations are planned on additional recommendation machine learning models (such as
restricted Boltzmann machines and autoencoders). SageMaker can use big data from AWS cloud
services.
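
As a minimal sketch of the user-item matrix idea described above, the example below
predicts a missing rating as a similarity-weighted average of other users' ratings
(user-based collaborative filtering). The ratings are invented for illustration, and the
use of cosine similarity on raw ratings is an assumption made to keep the example short;
mean-centered or baseline-adjusted variants such as those evaluated in (Shaikh; 2020)
typically behave better.

```python
import math

# Hypothetical user-item ratings on a 1-10 scale; absent keys mean "not rated".
ratings = {
    "u1": {"d1": 8, "d2": 6, "d3": 9},
    "u2": {"d1": 7, "d2": 5},
    "u3": {"d1": 2, "d2": 9, "d3": 3},
}


def user_similarity(u: str, v: str) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user: str, item: str):
    """Similarity-weighted average of other users' ratings for the item.

    Returns None when no other user has rated the item (a cold-start case).
    """
    weighted, total = 0.0, 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = user_similarity(user, other)
        weighted += sim * their_ratings[item]
        total += sim
    return weighted / total if total else None
```

For user u2 and item d3, the prediction falls between the two available ratings (9 from
u1 and 3 from u3) and leans towards u1, whose rating pattern is closer to that of u2.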

2.4 Recommender System Research Using Deep Learning

Deep neural networks and their application to recommender systems have
expanded rapidly in recent years, and the field is producing new ideas and methods at a
fast pace. The value of recommender systems is hard to overstate given their extensive use in
online applications and their capacity to address issues connected to over-choice. Deep
learning has recently gained popularity in numerous academic disciplines, including
computer vision and natural language processing, because of its superior performance
and its ability to build feature representations from scratch. Deep learning has lately
shown its effectiveness in information retrieval and recommender systems research.
(Wang et al.; 2014) note that CF is widely used in recommender systems and that
users’ ratings are the only source of information for conventional CF. Because
the ratings are often sparse, CF-based approaches perform poorly. Auxiliary data,
such as item content data, can help. Collaborative topic regression (CTR) combines two
components that learn from different sources of data. However, sparse auxiliary data may
render CTR’s latent representation ineffective. The authors propose collaborative deep
learning (CDL), a hierarchical Bayesian model that extends recent deep learning successes
from i.i.d. inputs to non-i.i.d. (CF-based) inputs. Experiments on three real-world
datasets show that CDL improves on the state of the art by
combining deep representation learning for content information with collaborative
filtering. According to the authors, it is the first hierarchical
Bayesian model that combines deep learning with RS. The researchers also developed
a sampling-based Bayesian back-propagation method for CDL. More powerful
alternatives may replace the bag-of-words model, and CDL also supports deep learning
models besides SDAE. Convolutional neural networks, for example, can take word
order into account, which may further improve performance. (Tran et al.; 2021)
observe that a large amount of clinical data is now scattered across various
websites, making it difficult for users to find useful information. The abundance of
medical information (e.g., on drugs, medical tests, and treatment suggestions) has made
it difficult for doctors to make patient-centered decisions. These issues highlight the
need for recommender systems in healthcare to help both end-users and medical
professionals make better health decisions. The article reviews existing research on
healthcare recommender systems. Unlike other related overview papers, it delves
into recommendation scenarios and approaches, such as food, drug, health status prediction,
healthcare service, and healthcare professional recommendations. The
authors also develop working examples to better explain recommendation
algorithms. Finally, they discuss future challenges in developing healthcare
recommender systems.
2.5 Recommender System Matrix Factorization Research

(Bhavana et al.; 2019) explain that most recommender systems employ matrix
factorization to reduce the number of dimensions in the underlying data set. As an
unsupervised machine learning technique, it uses Principal Component Analysis (PCA).
Cold start (meaning a new user has no preferences or reviews from which to compare or
propose items) and very sparse data (meaning items have no reviews or ratings from users
with which to build a user-item correlation matrix) are two of the key challenges and
limits of any recommender system. Our research is based on the above-mentioned
issues. In (Guan et al.; 2017) the authors note that matrix factorization algorithms
are gaining popularity due to their promising performance in recommender systems,
although some of them suffer from data sparsity. Active learning algorithms work well in
recommender systems because they ask users to rate items as they enter the system. The
paper proposes an enhanced SVD (ESVD) matrix factorization model that combines
standard matrix factorization methods with active-learning-inspired ratings completion.
A link between prediction accuracy and matrix density is also established to further
investigate its potential. To increase prediction accuracy, the authors suggest the
Multi-layer ESVD (MESVD). The Item-wise ESVD (IESVD) and User-wise ESVD
(UESVD) are provided to manage imbalanced datasets with considerably more users
than items. The approaches are tested on the Netflix and MovieLens datasets. Compared
with classic matrix factorization and active learning approaches, the results show that
they are more accurate and efficient. Most recommender systems struggle with a
shortage of data, and the authors suggest using classic matrix factorization methods
to best estimate a matrix with missing data. In particular, the overall ESVD model,
inspired by active learning, achieves high density through popular items and active users.
Moreover, because all ratings are added either simultaneously (ESVD, IESVD, and
UESVD) or iteratively over a preset number of repetitions (MESVD), the suggested methods
considerably reduce the computational cost. In (Bodhankar et al.; 2019),
the authors explain the methods used to overcome the difficulties of developing a
recommendation system based on a social network with user interest. This
overview study also presents several methods for constructing the recommendation system.
The described method identifies users based on user location, user interest, and
interpersonal interest in the social network, and uses social
matrix factorization and base matrix factorization to improve recommendation results.
However, this method has its drawbacks: factor analysis presupposes
a linear association between the factors and the variables used in calculating
correlations. As this extensive review shows, a recommender system is extremely
important for the healthcare business and will directly assist stakeholders in
making more informed judgments when recommending drugs to patients.
For data preparation, content-boosted collaborative filtering
might be employed. Combining these techniques with CNNs and matrix factorization
can help overcome the drawbacks of cold-start data and sparse datasets.
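
The idea of estimating a rating matrix with missing entries through matrix factorization
can be sketched as follows. This is a generic stochastic-gradient factorization, not the
ESVD family from (Guan et al.; 2017); the function names, hyperparameters (k, steps, lr,
reg), and the tiny rating list are all assumptions chosen for illustration.

```python
import random


def factorize(ratings, n_users, n_items, k=2, steps=500, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) approximates r.

    `ratings` is a list of observed (user_index, item_index, rating) triples;
    unobserved cells are simply absent, which is how sparsity is handled.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factor vectors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating for any (user, item) cell, observed or not."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def squared_error(ratings, P, Q):
    """Sum of squared errors over the observed ratings."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)


# Hypothetical sparse ratings: 6 of the 9 cells in a 3x3 matrix are observed.
observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
P, Q = factorize(observed, n_users=3, n_items=3)
```

After training, predict(P, Q, 0, 2) gives an estimate for a cell that was never observed,
which is the property that makes factorization useful on sparse data. A new user with no
ratings at all still has no learned factors, which is the cold-start limitation noted
above.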

3. PROBLEM FORMULATION
An alternative medicine recommendation system is essential to ensure continuity of care
when primary medicines are unavailable, ineffective, or contraindicated. Such systems
aim to provide safe, personalized, and efficient alternatives to prescribed medications
based on pharmacological properties and patient-specific factors.

3.1 NEED FOR RESEARCH


Research into cloud-based medicine recommendation systems using machine learning
(ML) is a growing field with transformative potential for personalized healthcare and
efficient drug delivery.

3.1.1 INTRODUCTION AND BACKGROUND


3.1.1.1 Overview of Medicine Recommendation Systems
• Significance: Personalized medicine recommendation systems aim to suggest
effective and safe medication tailored to individual patients, addressing
variations in medical conditions, age, allergies, and comorbidities.
• Challenges: Current systems face limitations like insufficient data integration,
poor accessibility, and inefficiency in addressing diverse patient needs.
3.1.1.2 Current Challenges in Traditional Systems
• Reliance on manual prescriptions, which may lead to errors or inefficiencies.
• Lack of integrated platforms to access patient history, drug interactions, or side
effects.
• Fragmentation of data between pharmacies, hospitals, and healthcare providers.

3.1.2 MACHINE LEARNING FUNDAMENTALS


3.1.2.1. ML Basics
• Key Concepts: Familiarization with supervised and unsupervised learning,
collaborative filtering, content-based filtering, and hybrid approaches.
• Model Evaluation: Understanding metrics like precision, recall, F1-score, and
Mean Absolute Error (MAE).
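
The evaluation metrics named above can be computed as in the following sketch. It assumes
that a recommendation run is summarized as a set of recommended items and a set of truly
relevant items, and that MAE is taken over paired actual and predicted ratings; the
function names are hypothetical.

```python
def precision_recall_f1(recommended: set, relevant: set):
    """Return (precision, recall, F1) for one set of recommendations."""
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall; zero when there are no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def mean_absolute_error(actual, predicted) -> float:
    """Average absolute difference between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

For example, recommending four items of which two are among three relevant ones gives a
precision of 2/4, a recall of 2/3, and an F1-score of 4/7.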
3.1.2.2. Algorithms
• Focus on algorithms such as Decision Trees, Support Vector Machines (SVM),
and deep learning models like Recurrent Neural Networks (RNNs) or
Transformer-based architectures for personalized recommendations.

3.1.3. DATA COLLECTION AND PREPROCESSING


3.1.3.1. Datasets
• Integration of datasets like electronic health records (EHR), prescription
databases, and patient feedback systems.
3.1.3.2. Preprocessing
• Techniques for handling missing values, normalizing data, and ensuring privacy
and security through anonymization.
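
A minimal sketch of these preprocessing steps is shown below, using only the Python
standard library. The record layout, field names, and salt value are hypothetical. Note
that replacing an identifier with a salted hash is pseudonymization rather than full
anonymization, so it is only one part of a privacy strategy.

```python
import hashlib

# Hypothetical patient records; None marks a missing value.
records = [
    {"patient_id": "P001", "age": 34, "rating": 8},
    {"patient_id": "P002", "age": None, "rating": 5},
    {"patient_id": "P003", "age": 58, "rating": 10},
]


def impute_mean(rows, field):
    """Replace missing values of `field` with the mean of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, field: mean if r[field] is None else r[field]} for r in rows]


def min_max_scale(rows, field):
    """Rescale `field` linearly to the range [0, 1]."""
    values = [r[field] for r in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid division by zero for constant columns
    return [{**r, field: (r[field] - low) / span} for r in rows]


def pseudonymize(rows, field, salt):
    """Replace an identifier with a truncated, salted SHA-256 digest."""
    def digest(value):
        return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]
    return [{**r, field: digest(r[field])} for r in rows]


clean = pseudonymize(
    min_max_scale(impute_mean(records, "age"), "age"),
    "patient_id",
    salt="demo-salt",
)
```

Each helper returns new rows instead of modifying its input, so the original records stay
available for auditing.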

3.1.4. FEATURE EXTRACTION AND SELECTION


3.1.4.1. Key Features
• Extract features such as patient demographics, medical history, symptoms,
allergies, and prescription trends.
3.1.4.2. Dimensionality Reduction
• Use Principal Component Analysis (PCA) and feature selection algorithms to
simplify large-scale data without compromising recommendation accuracy.

3.1.5. MODEL DEVELOPMENT


3.1.5.1. Model Selection
• Compare models like collaborative filtering for similar user behavior or deep
neural networks for understanding complex patient-drug relationships.
3.1.5.2. Training and Validation
• Implement cross-validation techniques and hyperparameter tuning to ensure
robust performance.
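
The cross-validation step can be sketched as a simple index splitter. This is an
illustrative helper with a hypothetical name, not the project's code; it assumes the data
has already been shuffled, since the folds are contiguous.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    The first n_samples % k folds receive one extra sample so that every
    sample appears in exactly one test fold.
    """
    base, extra = divmod(n_samples, k)
    sizes = [base + 1] * extra + [base] * (k - extra)
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

Setting k equal to n_samples yields leave-one-out cross-validation. Hyperparameter tuning
would then repeat this loop for each candidate setting and keep the one with the best
average validation score.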

3.2 SIGNIFICANCE OF PROPOSED RESEARCH

3.2.1. IMPROVED HEALTHCARE DELIVERY


• Personalized Recommendations: Tailoring drug suggestions to individual
patient profiles for optimized treatment.
• Safety Assurance: Proactively flagging potential drug interactions or
contraindications.

3.2.2. EFFICIENCY AND AUTOMATION


• Time-Saving: Reducing the time required for healthcare providers to prescribe
the most appropriate medication.
• Scalability: Enabling systems to support a growing number of patients and
prescriptions globally.

3.2.3. PATIENT EMPOWERMENT


• Informed Decisions: Empowering patients with transparent insights into
medicine choices.
• Accessibility: Providing recommendations remotely via cloud platforms,
improving accessibility in underserved areas.

3.2.4. ADVANCEMENTS IN MEDICAL RESEARCH


• Analyzing aggregated data to identify drug efficacy trends and uncover insights
into rare conditions.

3.3 POSSIBLE EFFECT ON HEALTHCARE SYSTEMS AND PATIENT CARE

3.3.1 Enhanced Patient Outcomes

• Real-time recommendations ensure timely medication, improving recovery rates
and patient satisfaction.

3.3.2 Streamlined Healthcare Systems

• Reducing the administrative burden on providers, allowing them to focus on
patient care.

3.3.3 Cost Efficiency

• By optimizing prescriptions and reducing trial-and-error methods, costs for
patients and healthcare systems can be significantly lowered.

4. PROPOSED WORK
The focus of the research project is to build a recommender system that
recommends OTC drugs based on the symptoms or disease of the patient. The CRISP-DM
approach is used for the initial exploration of the dataset, together with basic data
analysis techniques. Figure 3 shows the CRISP-DM (Cross Industry Standard Process for Data
Mining) approach followed as the data analytics methodology.

4.1 Stakeholder / Business Approach

Recommender systems are evolving rapidly, and their significance in the healthcare
industry has grown since the pandemic of 2020. Researchers show the usefulness of this
technology in (Khoie et al.; 2017), where they discuss how healthcare organizations
survey patients and staff about their experiences. Hospital administrators frequently use
graphs and charts to present patient satisfaction data, but such visualization is rarely
followed by a deeper data analysis to find the crucial aspects of patient satisfaction.
The researchers present an unsupervised method for analyzing patient satisfaction survey
data that identifies similar patient communities and the primary factors contributing to
their satisfaction. This helps hospitals identify the patient groups or clusters most
likely to be satisfied and take proactive steps to improve patient care. It also finds
links between patient demographics and satisfaction indicators using unsupervised
exploratory data analysis.

Figure 3: Methodology flow of CRISP-DM

4.2 Data Extraction

This dataset file was extracted from the UCI Machine Learning Repository and contains
a little over 200,000 observations with distinctive properties of OTC (Over-the-Counter)
drugs. It was created by the UCI Machine Learning Repository. It is a multivariate data
set with six attributes in total. It describes each drug along with the users/patients who
provided reviews and ratings out of ten stars, which represent the overall satisfaction
of the patient with the drug. In addition, by vectorizing the text, it is possible to
perform sentiment analysis on the review attribute of a given drug. The data contains no
personally identifiable information (PII) and is offered as
open source. Cleaning up the data is the essential first step. This includes removing
null values and removing any columns that are not needed. Starting with the data
analysis, the top 10 highly rated medicines for each symptom can then be plotted.
Table 1 shows details about the drugs dataset.

4.3 Data Pre-processing and Analysis

The CSV data file used for the project is kept in Microsoft Azure Blob Storage and can
be fetched from the cloud whenever required. The analysis of the dataset consists of data
cleaning by removing NA values and dropping duplicate rows, followed by factor analysis
to determine which features can be used for model building, with a few features modified
accordingly. Furthermore, using the attributes Rating and Useful count, the most reviewed
medicines were sorted and a graph was plotted. Using the Python regex library, unwanted
characters were removed from the reviews, which were then fed to a word vectorizer to
calculate the score of each individual word. For initial data visualization, a graph of
the top 10 drugs by number of user reviews was plotted.

Attributes    Description                       Values

UniqueID      Drug’s identification number
Drug Name     OTC drug names
Review        Patient’s symptoms or disease
Rating        Given by previous patients
              Users’ rating
              Upvote given to specific drugs
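
The cleaning and counting steps described above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names, the field names (drug_name, review, rating) and the sample rows are all hypothetical.

```python
import re
from collections import Counter


def clean_reviews(rows):
    """Drop rows with missing values, drop exact duplicates, and strip unwanted characters."""
    seen = set()
    cleaned = []
    for row in rows:
        # Skip rows that contain a missing value (the NA removal step).
        if any(value is None for value in row.values()):
            continue
        # Skip exact duplicate rows.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # Keep letters and whitespace only, then collapse whitespace and lowercase.
        text = re.sub(r"[^A-Za-z\s]", " ", row["review"])
        text = re.sub(r"\s+", " ", text).strip().lower()
        cleaned.append({**row, "review": text})
    return cleaned


def top_reviewed(rows, n=10):
    """Return the n drugs with the most reviews as (drug, count) pairs."""
    return Counter(row["drug_name"] for row in rows).most_common(n)


# Hypothetical sample rows, not taken from the real dataset.
sample = [
    {"drug_name": "DrugA", "review": "Worked &#039;great&#039; for my headache!!", "rating": 9},
    {"drug_name": "DrugA", "review": "Worked &#039;great&#039; for my headache!!", "rating": 9},
    {"drug_name": "DrugB", "review": None, "rating": 4},
    {"drug_name": "DrugA", "review": "No effect on cough.", "rating": 3},
    {"drug_name": "DrugC", "review": "Helped my fever", "rating": 8},
]
rows = clean_reviews(sample)
print(top_reviewed(rows, n=2))
```

In this sketch the duplicate row and the row with a missing review are dropped before the per-drug review counts are taken, so the ranking reflects only usable reviews.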

4.4 Recommender system Methodology


4.4.1 Matrix Factorization
In this project, matrix factorization played a crucial role in feature decomposition,
following the principles of PCA (Principal Component Analysis), an unsupervised
machine learning algorithm. After feature extraction and factor analysis, the dataset was
fed to matrix factorization and word vectorization.
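
For reference, one common way to write a matrix factorization of this kind is the rank-k (truncated) singular value decomposition. The symbols below are notation introduced here for illustration and do not come from the report: R is an m x n data matrix (for example, reviews by vocabulary terms) and k is the number of retained components.

```latex
R \approx U_k \Sigma_k V_k^{\top},
\qquad U_k \in \mathbb{R}^{m \times k},
\quad \Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),
\quad V_k \in \mathbb{R}^{n \times k},
\quad \sigma_1 \ge \dots \ge \sigma_k \ge 0
```

When the columns of R are mean-centred, the columns of V_k are the principal directions found by PCA, which is the link between the two techniques.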

4.4.2 Singular Value Decomposition


The reviews were converted into vectors, and the NLTK library was used to stem each
word to its root form in the English language. The SVD model was applied together with
the CountVectorizer class of the scikit-learn library, and the result was used to build a
vector matrix.
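
To make the count-vector step concrete, the following is a small pure-Python sketch of building a document-term count matrix. It is an illustration of the idea only, not the scikit-learn implementation used in the project; the function name and the sample reviews are hypothetical, and stemming is omitted.

```python
from collections import Counter


def build_count_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in document i."""
    # Lowercase and split on whitespace; a real pipeline would also stem the tokens.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Hypothetical, already-cleaned reviews.
reviews = ["relieved headache fast", "headache and fever", "fever fever gone"]
vocab, matrix = build_count_matrix(reviews)
print(vocab)
for row in matrix:
    print(row)
```

Each row of the matrix is the vector for one review, and each column corresponds to one vocabulary word. A matrix of this shape is what a factorization such as SVD would then decompose.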

4.5 Similarity Metrics used in Recommender System

4.5.1 Pearson Correlation


The Pearson correlation coefficient is quite sensitive to data values that are out of the
ordinary. A single value that is significantly different from the other values in a data set
might have a significant impact on the value of the correlation coefficient. A low
Pearson correlation coefficient does not necessarily imply that there is no relationship
between the variables in question. It is possible that the variables have a nonlinear
connection.
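
Both points can be illustrated with a short sketch. The function below is a plain-Python version of the standard Pearson formula, and the data values are made up for illustration. In practice a library routine would normally be used instead.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
linear = [2, 4, 6, 8, 10]          # perfectly linear in x
with_outlier = [2, 4, 6, 8, -40]   # same data with one extreme value

print(pearson(x, linear))          # perfect linear relationship
print(pearson(x, with_outlier))    # a single outlier flips the sign

# A clear but nonlinear (quadratic) relationship can still give a coefficient of zero.
u = [-2, -1, 0, 1, 2]
v = [value ** 2 for value in u]
print(pearson(u, v))
```

The second call shows how a single unusual value can change the coefficient from strongly positive to negative. The last call shows a low coefficient despite an exact functional relationship.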

4.5.2 Cosine Similarity


Using this similarity metric, the similarity between the vectorized words and the most
frequently used words was calculated and sorted in descending order. Jaccard’s
distance is inversely related to similarity: the greater the distance between two vectors,
the lower the similarity between them. This is the basic algorithm used to find the
recommendations.
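
A minimal sketch of this ranking step is shown below, assuming each drug is already represented by a count vector. The function names, drug names and vectors are hypothetical, and the code uses plain Python rather than the library call used in the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (name, score) pairs sorted by similarity to the query, highest first."""
    scores = [(name, cosine_similarity(query, vector)) for name, vector in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical count vectors over a three-word vocabulary.
candidates = {"DrugA": [1, 1, 0], "DrugB": [0, 1, 1], "DrugC": [0, 0, 1]}
query = [1, 1, 0]
ranking = rank_by_similarity(query, candidates)
for name, score in ranking:
    print(name, round(score, 3))
```

Candidates whose vectors point in the same direction as the query appear first, which is the descending order described above.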

4.5.3 Spearman’s Correlation

The Spearman correlation between two variables is equal to the Pearson correlation
between the rank values of those two variables. While Pearson’s correlation assesses
linear relationships, Spearman’s correlation assesses monotonic relationships (whether
linear or not). If there are no repeated data values, a perfect Spearman correlation of
+1 or -1 occurs when each variable is a perfect monotone function of the other. The
author of (Akoglu; 2018) has also discussed Spearman’s correlation coefficient and its
significance.
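
The relationship between the two coefficients can be sketched in a few lines: rank both variables, then apply the Pearson formula to the ranks. The helper names and the example data are illustrative assumptions. Tied values receive the average of their ranks, which is the usual convention.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank values."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [value ** 3 for value in x]   # monotone but not linear
print(pearson(x, y))
print(spearman(x, y))
```

For this monotone but nonlinear example the Spearman coefficient is exactly 1, while the Pearson coefficient is lower.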

4.5.4 Kendall’s Tau Correlation


It is also called Kendall’s tau coefficient. Kendall’s tau and Spearman’s rank
correlation coefficients both operate on data ranks. If the data violate one or more
assumptions of Pearson’s correlation, the non-parametric Kendall rank correlation can be
used instead. For small samples with many tied ranks, it can also be an alternative to the
non-parametric Spearman correlation. Kendall rank correlation tests how similarly two
sets of data are ordered. Rather than looking at individual observations, Kendall’s method
looks at patterns of concordance or discordance between pairs of observations.
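
The pairwise idea can be sketched as follows. This is an illustrative plain-Python version of the tau-b variant, which adjusts for ties. The function name and the sample rankings are assumptions made for the example. Returning 0.0 when one variable is constant is a choice made here, since the coefficient is undefined in that case.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b from counts of concordant, discordant and tied pairs."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        product = dx * dy
        if product > 0:
            concordant += 1    # both variables order this pair the same way
        elif product < 0:
            discordant += 1    # the variables order this pair in opposite ways
    total_pairs = len(x) * (len(x) - 1) // 2
    denominator = math.sqrt((total_pairs - ties_x) * (total_pairs - ties_y))
    if denominator == 0:
        return 0.0  # undefined when one variable is constant; reported as 0.0 here
    return (concordant - discordant) / denominator


# Two hypothetical rankings of four drugs that disagree on one pair.
rank_a = [1, 2, 3, 4]
rank_b = [1, 3, 2, 4]
print(kendall_tau_b(rank_a, rank_b))
```

With five concordant pairs and one discordant pair out of six, the coefficient is (5 - 1) / 6.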

4.5.5 Jaccard’s Similarity


The Jaccard Similarity algorithm can be used to determine how similar two objects are.
The computed similarity can then be used in a recommendation query. For example, the
Jaccard Similarity algorithm can be used to display the products purchased by
comparable customers, where customers are compared by the products they have
previously purchased, as also shown in (Fletcher and Islam; 2018).
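
As a small illustration of the purchase-history example, the sketch below computes the Jaccard similarity as the size of the intersection divided by the size of the union. The customer names and product sets are hypothetical. Treating two empty sets as having similarity 0.0 is a convention chosen for this sketch.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union of two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical purchase histories.
purchases = {
    "alice": {"DrugA", "DrugB", "DrugC"},
    "bob": {"DrugB", "DrugC", "DrugD"},
    "carol": {"DrugE"},
}
print(jaccard_similarity(purchases["alice"], purchases["bob"]))
print(jaccard_similarity(purchases["alice"], purchases["carol"]))

# The Jaccard distance is the complement of the similarity.
print(1 - jaccard_similarity(purchases["alice"], purchases["bob"]))
```

Here alice and bob share two of four distinct products, so they would be treated as comparable customers, while alice and carol share none.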

4.6 Distance Metrics used in Recommender System

4.6.1 Euclidean Distance


In mathematics, the Euclidean distance between two points in Euclidean space is the
length of the line segment between them. This distance may be determined using the
Pythagorean theorem, hence it is sometimes referred to as the Pythagorean distance.
When two medications are compared, the distance between them is calculated, and the
similarity between them is inversely related to this distance.

4.6.2 Manhattan Distance

The Manhattan Distance (also known as the Taxicab Distance or City Block Distance)
measures the distance between two real-valued vectors. It is best suited to vectors that
describe objects on a uniform grid, such as a chessboard or city blocks. The name
reflects the intuition behind the metric: it is the shortest path a taxicab would travel
between city blocks (coordinates on the grid). In an integer feature space, it may make
sense to calculate the Manhattan distance rather than the Euclidean distance between two
vectors.
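
Both distances can be sketched in a few lines of plain Python. The function names are illustrative. The conversion from distance to similarity, 1 / (1 + d), is only one possible choice and is an assumption made here, not a formula taken from the project.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Grid distance: sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity_from_distance(distance):
    """One possible mapping from distance to similarity (an assumption of this sketch)."""
    return 1.0 / (1.0 + distance)


p = [0, 0]
q = [3, 4]
print(euclidean_distance(p, q))   # 5.0, the hypotenuse of a 3-4-5 triangle
print(manhattan_distance(p, q))   # 7, three blocks across plus four blocks up
print(similarity_from_distance(euclidean_distance(p, q)))
```

The Manhattan distance is never smaller than the Euclidean distance between the same two points, because the grid path cannot be shorter than the straight line.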

5. System Design
This section discusses the proposed architecture of our recommendation engine.
Both content-based and user-based collaborative filtering form the foundation of
a recommendation system. The basic types of recommender system are shown in
Figure 4 below. These design specifications made the implementation of our
recommendation engine techniques possible, and they are discussed in the
following subsections.

Figure 4: Types of Recommender System

5.1 Content Based Filtering Recommendation System


Figure 5: Process followed in content based filtering

In this project, a content-based filtering recommender system was used, matching the
attributes of a drug against the reviews given to that drug by different patients, as shown
in Figure 5. The drug is also matched to its corresponding disease or symptoms given in
the dataset, and the relation is then calculated using the cosine similarity score.
Word-to-vector conversion is done using count vectorization in order to obtain a vector
for each word. The formula used to calculate the similarity, which was imported directly
from a Python library, is shown in the figure below.

Formula for cosine similarity calculation
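
For reference, the standard definition of cosine similarity between two n-dimensional vectors A and B is given below. The symbols are notation introduced here, and the expression is the textbook formula rather than a reproduction of the original figure.

```latex
\operatorname{similarity}(A, B) = \cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^{2}} \; \sqrt{\sum_{i=1}^{n} B_i^{2}}}
```

For count vectors, whose entries are non-negative, the value lies between 0 (no words in common) and 1 (identical direction).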

5.2 Content Boosted Collaborative Filtering (CBCF) - hybrid method

High sparsity and cold start problems in the data can be avoided using a hybrid method
that combines content-boosted and collaborative filtering. Figure 7 shows a flowchart
of how a patient’s symptoms and other patients’ reviews of a specific drug for that
symptom can be combined, with weighted sums calculated by the CB and CF algorithms,
to predict recommendations. Decomposition of the user-item rating matrix using the
Singular Value Decomposition (SVD) algorithm will be performed using a Convolutional
Neural Network. SVD is a matrix factorization technique based on the unsupervised
learning principle of PCA.

Figure 7: Diagram of Hybrid Method
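
The weighted-sum idea behind the hybrid method could be sketched as below. This is only an illustration under stated assumptions: the function name, the content_weight parameter, the scores and the drug names are all hypothetical, and the sketch does not include the SVD or neural network components.

```python
def hybrid_scores(content_scores, collaborative_scores, content_weight=0.5):
    """Combine content-based (CB) and collaborative (CF) scores by a weighted sum."""
    if not 0.0 <= content_weight <= 1.0:
        raise ValueError("content_weight must be between 0 and 1")
    drugs = set(content_scores) | set(collaborative_scores)
    combined = {}
    for drug in drugs:
        # A drug missing from one source contributes 0.0 from that source.
        cb = content_scores.get(drug, 0.0)
        cf = collaborative_scores.get(drug, 0.0)
        combined[drug] = content_weight * cb + (1.0 - content_weight) * cf
    # Highest combined score first; ties are broken alphabetically.
    return sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical scores; DrugC has no ratings yet, so it has no collaborative score.
content = {"DrugA": 0.9, "DrugB": 0.4, "DrugC": 0.7}
collaborative = {"DrugA": 0.5, "DrugB": 0.8}
ranking = hybrid_scores(content, collaborative, content_weight=0.6)
for drug, score in ranking:
    print(drug, round(score, 2))
```

In this example DrugC can still be ranked from its content score alone, which is how a hybrid of this kind softens the cold start problem.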
