Report
Project Report
on
CLOUD-BASED RECOMMENDATION SYSTEM
Bachelor of Technology
in
by
Shyali Narayan (2100971540057)
Anshu Chauhan (2100971540011)
Saurabh Yadav (2100971540049)
1 CERTIFICATE
This is to certify that the project report entitled “Cloud-Based Medicine Recommendation
System” submitted by Ms. Shyali Narayan (2100971540057), Mr. Anshu
Chauhan (2100971540011), and Mr. Saurabh Yadav (2100971540049) to the Galgotias College
of Engineering & Technology, Greater Noida, Uttar Pradesh, affiliated to Dr. A.P.J. Abdul
Kalam Technical University, Lucknow, Uttar Pradesh, in partial fulfillment for the award of
the Degree of Bachelor of Technology in Computer Science & Engineering is a bonafide record
of the project work carried out by them under my supervision during the year 2024-2025.
GALGOTIAS COLLEGE OF ENGINEERING & TECHNOLOGY
GREATER NOIDA, UTTAR PRADESH, INDIA - 201306.
2 ACKNOWLEDGEMENT
We have taken efforts in this project. However, it would not have been possible without
the kind support and help of many individuals and organizations. We would like to
extend our sincere thanks to all of them.
We are highly indebted to Mr. Vijay Prakash for his guidance and constant
supervision. We are also highly thankful to him for providing the necessary
information regarding the project and for his support in completing the project.
We also express gratitude towards our parents for their kind co-operation and
encouragement, which helped us in the completion of this project. Our thanks and
appreciation also go to our friends who helped in developing the project and to all the
people who have willingly helped us with their abilities.
(Shyali Narayan)
(Anshu Chauhan)
(Saurabh Yadav)
3 ABSTRACT
CONTENTS
Title Page
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
CONTENTS
LIST OF TABLES
LIST OF FIGURES
NOMENCLATURE
ABBREVIATIONS
CHAPTER 1: INTRODUCTION
CHAPTER 6: IMPLEMENTATION
REFERENCE
LIST OF PUBLICATIONS
CONTRIBUTION OF PROJECT
List of Tables
LIST OF FIGURES
NOMENCLATURE
ABBREVIATIONS
1. INTRODUCTION
In modern healthcare, the wide range of treatment options, from conventional
medications to complementary therapies such as acupuncture, homeopathy, and herbal
remedies, often leaves patients and physicians with the challenge of selecting the
most effective treatment for a given condition. Alternative medicine, when used as a
supplement to traditional therapies, has gained popularity for its ability to offer
personalized care with potentially fewer side effects. However, determining the best
course of action among so many options requires advanced analytical tools and
methodologies.
A cloud-based medicine recommendation system addresses this need by combining the
scalability of cloud computing with algorithms such as cosine similarity to deliver
personalized, evidence-based treatment suggestions. Cosine similarity is a mathematical
technique that measures how similar two vectors are by taking the cosine of the angle
between them. In the context of medicine recommendations, this approach enables the
comparison of patient symptoms with drug profiles, helping to identify the most suitable
treatments for individual health conditions. By integrating cosine similarity into a
cloud-based platform, the system can process large amounts of patient data and
alternative treatment options in real time. This allows physicians to make informed
decisions about treatment plans, especially when conventional methods have proven
ineffective.
The use of cosine similarity offers several potential benefits, such as improved accuracy
in identifying effective drugs, reduced adverse effects, and the ability to discover new
treatment insights through large-scale data analysis. Alternative medications often
have fewer side effects than conventional drugs, and by leveraging cosine similarity,
doctors can recommend treatments that are not only effective but also safe and
compatible with a patient's medical history.
Challenges such as data quality and the underutilization of alternative treatments must
be addressed to fully realize the potential of cosine similarity in medicine
recommendations. High-quality, comprehensive datasets are essential for producing
reliable results, and ensuring the safety and effectiveness of suggested treatments is
paramount. In conclusion, a cloud-based medicine recommendation system powered by
cosine similarity represents a transformative step in personalized healthcare. This
approach has the potential to revolutionize the integration of alternative medicine into
mainstream healthcare.
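As a brief illustration of the cosine similarity measure described in this chapter, the formula can be written as follows. The symbols a and b are illustrative assumptions that do not appear in the report: they stand for two n-dimensional vectors, for example a patient's symptom vector and a drug-profile vector.

```latex
\[
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}
\]
```

For vectors with non-negative entries, such as word counts, the value lies between 0 (no shared terms) and 1 (same direction).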
2. LITERATURE REVIEW
Systems that propose recommendations based on user behavior and patterns may
alleviate the long-standing problem of data overload and provide insight for improved
analytics. Such recommendations are also based on reviews of the products or items.
Building a model for proposing medical items or treatments to patients with comparable
comorbidities is the subject of this study. This chapter is organized into five
subsections, 2.1 through 2.5.
Recommender systems use different kinds of filtering, and the most widely used are
content-based filtering and collaborative filtering. Based on the user’s preferences and
choices, the content-based (CB) technique suggests items or products to the user. To suggest
new items with similar attributes, content-based filtering works by matching the
attributes of the content. In (Pal et al.; 2017) and (Zhao; 2019), the authors present, as an
alternative to traditional content-based filtering, a simple method that finds
correlations between two features using set intersection and uses them to predict the
similarity between two items for recommendation. Naive Bayes and other text classifiers have
been used in content-based algorithms in the past. The algorithm is also tested and
compared against the Pure CF and SVD algorithms, and the resulting MAE (mean absolute
error) values allow the methods to be compared directly. The hybrid content recommendation
achieved better MAE values as the dataset’s sparsity was increased by between 1 percent
and 2 percent, although a larger dataset may yield different results. Figure 2 shows that
at a sparsity level of 98.5 percent there is little difference between the outcomes of
Pure CF and the authors’ technique, but when the sparsity level is increased to 99 percent,
the algorithm outperforms Pure CF.
Figure 2: Comparison of MAE values with increasing sparsity for hybrid CF and Pure
CF. (Pal et al.; 2017)
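The MAE comparison discussed above relies on the mean absolute error between predicted and actual ratings. The following is a minimal sketch of that metric only; it is not the evaluation code of the cited authors, and the function name and sample ratings are hypothetical values invented for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical ratings, invented for illustration only.
true_ratings = [4, 3, 5, 2]
predicted_ratings = [3.5, 3, 4, 3]
mae = mean_absolute_error(true_ratings, predicted_ratings)
print(mae)
```

A lower MAE means the predicted ratings are closer to the ratings users actually gave, which is why it can be used to compare recommenders at different sparsity levels.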
Deep neural networks and their application to recommender systems have
expanded rapidly in recent years, and the field continues to produce new ideas and methods.
The value of recommender systems is hard to overstate, given their extensive use in
online applications and their capacity to address problems of over-choice. Deep
learning has recently gained popularity in numerous academic disciplines, including
computer vision and natural language processing, because of its superior performance
and its ability to learn feature representations from scratch. It has also
shown its effectiveness in information retrieval and recommender systems research.
The authors of (Wang et al.; 2014) note that collaborative filtering (CF) is widely used
in recommender systems. Conventional CF-based methods use users’ ratings as their only
source of information. Because the ratings are often sparse, CF-based approaches perform
poorly. Auxiliary data, such as item content data, can help. Collaborative topic
regression (CTR) combines two components that learn from different sources of data.
However, sparse auxiliary data may render CTR’s latent representation ineffective.
The authors propose collaborative deep learning (CDL), a hierarchical Bayesian model
that extends recent deep learning successes from independent and identically
distributed (i.i.d.) inputs to CF-based inputs. Experiments on three real-world
datasets show that CDL improves on the state of the art by jointly performing deep
representation learning for the content information and collaborative filtering for
the ratings. The authors describe CDL as the first hierarchical Bayesian model to
combine deep learning with recommender systems (RS). They also developed
a sampling-based Bayesian back-propagation method for CDL. As future directions, more
powerful alternatives may replace the bag-of-words model, and CDL can accommodate deep
learning models other than the stacked denoising autoencoder (SDAE). Convolutional neural
networks, for example, can take word order into account, which may further improve
performance.
In (Tran et al.; 2021), the authors note that a large amount of clinical data is now
scattered across various websites, making it difficult for users to find useful
information. The abundance of medical information (e.g., on drugs, medical tests, and
treatment suggestions) has also made it difficult for doctors to make patient-centered
decisions. These issues highlight the need for recommender systems in healthcare to help
both end users and medical professionals make better health decisions. The article
reviews existing research on healthcare recommender systems. Unlike other related
overview papers, it delves into recommendation scenarios and approaches, with examples
including food, drug, health status prediction, healthcare service, and healthcare
professional recommendations. The authors also develop working examples to better explain
recommendation algorithms. Finally, they discuss future challenges in developing
healthcare recommender systems.
2.5 Recommender System Matrix Factorization Research
3. PROBLEM FORMULATION
An alternative medicine recommendation system is essential to ensure continuity of care
when primary medicines are unavailable, ineffective, or contraindicated. Such systems
aim to provide safe, personalized, and efficient alternatives to prescribed medications
based on pharmacological properties and patient-specific factors.
patient care.
4. PROPOSED WORK
The focus of the research project is to build a recommender system that suggests
OTC drugs based on the symptoms or disease of the patient. The CRISP-DM
approach is used for the initial exploration of the dataset, together with basic data
analysis techniques. Figure 3 shows the CRISP-DM (Cross Industry Standard Process for Data
Mining) approach followed as the data analytics methodology.
Recommender systems are evolving rapidly, and their significance in the healthcare
industry has grown since the pandemic of 2020. Researchers show the usefulness of this
technology in (Khoie et al.; 2017), where they discuss how healthcare organizations
survey patients and staff about their experiences. Hospital administrators frequently use
graphs and charts to present patient satisfaction data, but such visualization is rarely
followed by a deeper data analysis to find the crucial aspects of patient satisfaction.
The researchers present an unsupervised method for analyzing patient satisfaction survey
data that identifies similar patient communities and the primary factors contributing to
their satisfaction. This helps hospitals identify the patient groups or clusters most
likely to be satisfied and take proactive steps to improve patient care. The method finds
links between patient demographics and satisfaction indicators using unsupervised
exploratory data analysis.
Figure 3: Methodology flow of CRISP-DM
The dataset file was extracted from the UCI Machine Learning Repository and contains
a little over 200,000 observations with distinctive properties of OTC (over-the-counter)
drugs. It was created by the UCI Machine Learning Repository. This is a multivariate data
set with six attributes in total. The data describes each drug and the users/patients who
have provided reviews and ratings out of ten stars, which represent the overall satisfaction
of the patient with the drug. In addition, by vectorizing the review text, it is possible to
perform sentiment analysis on the review attribute of a given drug. Personally
identifiable information (PII) is not contained within this data, which is offered as
open source. First and foremost, cleaning up the data is essential. This includes removing
null values and removing any columns that are not needed. Once the data analysis begins,
the top 10 highly rated medicines for each symptom can be plotted.
Table 1 shows details about the drugs dataset.
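The cleaning and ranking steps described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the project's actual code: the field names drugName, condition, and rating are assumed names for the dataset attributes, the helper functions are hypothetical, and the sample rows are invented. A data-frame library could perform the same steps.

```python
from collections import defaultdict

# Hypothetical rows; field names are assumed and the values are invented.
rows = [
    {"drugName": "DrugA", "condition": "Headache", "rating": 9.0},
    {"drugName": "DrugA", "condition": "Headache", "rating": 7.0},
    {"drugName": "DrugB", "condition": "Headache", "rating": 6.0},
    {"drugName": "DrugC", "condition": None, "rating": 8.0},        # null value
    {"drugName": "DrugB", "condition": "Headache", "rating": 6.0},  # duplicate row
]


def clean(rows):
    """Drop rows with missing values, then drop exact duplicate rows."""
    seen, cleaned = set(), []
    for row in rows:
        if any(value is None or value == "" for value in row.values()):
            continue
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


def top_rated(rows, n=10):
    """For each condition, return the n drugs with the highest mean rating."""
    ratings = defaultdict(lambda: defaultdict(list))
    for row in rows:
        ratings[row["condition"]][row["drugName"]].append(row["rating"])
    result = {}
    for condition, drugs in ratings.items():
        means = {drug: sum(r) / len(r) for drug, r in drugs.items()}
        ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
        result[condition] = ranked[:n]
    return result


cleaned_rows = clean(rows)
ranking = top_rated(cleaned_rows)
print(ranking)
```

In the real data each row also carries review text, so exact duplicates are far less likely than in this toy sample. The per-condition ranking is the kind of table from which a top-10 plot could be drawn.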
The CSV data file used for the project is kept in Microsoft Azure Blob Storage and can
be fetched from the cloud whenever required. The analysis of the dataset consists of data
cleaning by removing NA values and dropping duplicate rows, followed by factor analysis to
determine which features can be used for model building, with a few features modified
accordingly. Furthermore, using the attributes Rating and Useful count, the most reviewed
medicines are sorted and a graph is plotted. Using the Python regex library, some unwanted
characters were removed from the reviews, which were then fed to a word vectorizer to
calculate the score of
individual
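The regex-based review cleaning mentioned above might look like the following minimal sketch. The exact characters the project removed are not stated, so the rules here (decode HTML entities, lowercase, keep only letters and spaces) are assumptions, and clean_review is a hypothetical helper name.

```python
import html
import re


def clean_review(text):
    """Normalise one raw review string before it is vectorised."""
    text = html.unescape(text)                 # decode entities such as &#039;
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


# Hypothetical raw review, invented for illustration.
raw = '"It works GREAT!!! No side-effects &#039;so far&#039; :)"'
print(clean_review(raw))
```

Cleaning before vectorisation keeps punctuation and markup from being counted as separate tokens.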
5. System Design
This section discusses the proposed architecture of our recommendation engine.
Both content-based and user-based collaborative filtering form the
foundation of a recommendation system. The basic types of recommender system
are shown in Figure 4. These design specifications made the implementation of our
recommendation engine techniques possible. It is discussed in the
following section.
In this project, a content-based filtering recommender system was used, matching the
attributes of a drug with the reviews given by different patients for that drug, as shown in
Figure 5. The drug is also matched to its corresponding disease or symptoms given in the
dataset, and the relation is then calculated using the cosine similarity score.
Words are converted to vectors using count vectorization (CountVectorization) so that each
word has a vector. The formula used to calculate the similarity, which was imported directly
from a Python library, is shown in Figure
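The content-based matching described above can be sketched with the standard library alone. This is not the project's implementation, which imported the similarity function from a Python library. The helper names, the whitespace tokenisation, and the drug profiles below are all assumptions made for illustration.

```python
import math
from collections import Counter


def count_vector(text):
    """Bag-of-words term counts for one whitespace-tokenised document."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def recommend(query, profiles, top_n=2):
    """Rank drug profiles by cosine similarity to the query text."""
    query_vector = count_vector(query)
    scored = [
        (name, cosine_similarity(query_vector, count_vector(text)))
        for name, text in profiles.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical drug profiles, invented for illustration.
profiles = {
    "DrugA": "headache pain relief fever",
    "DrugB": "cough cold sore throat",
    "DrugC": "fever cold headache",
}
print(recommend("headache fever", profiles))
```

With these invented profiles, DrugC ranks ahead of DrugA. Both contain the two query terms, but DrugC's profile is shorter, so its vector points closer to the query's direction.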