Unit 4 - MLMM


Recommendation Systems

Recommender systems are systems designed to recommend items to users based on various factors. They take as input a user model (ratings, preferences, demographics, etc.) and items with their descriptions, compute a relevance score used for ranking, and finally recommend the items most relevant to the user.
Recommendations are commonly seen in e-commerce systems, connection suggestions on LinkedIn, friend recommendations on Facebook, song recommendations on Last.fm, news recommendations on Forbes.com, etc. Companies like Netflix and Amazon use recommender systems to help their users identify the right products or movies.
A recommender system deals with the large volume of available information by filtering out the most important pieces based on the data provided by the user and other factors that capture the user's preferences and interests. It finds matches between users and items and imputes the similarities between users and items for recommendation.

A recommendation system is a facility that involves predicting user responses to options in web applications.
Generally we see the following recommendations:
1. “You may also like these…”, “People who liked this also liked…”
2. If you download presentations from slideshare, it says “similar content you can save and
browse later”.
Use-Cases of Recommendation System
There are many use-cases of recommendation systems. Some are:
A. Personalized Content: Helps to improve the on-site experience by creating dynamic
recommendations for different kinds of audiences like Netflix does.

B. Better Product Search Experience: Helps to categorize products based on their features, e.g., material, season, etc.
Recommendation Algorithms Design

Steps to designing an effective recommendation algorithm

Recommendation algorithms are widely used in online platforms to provide personalized suggestions to users based on their preferences, behavior, and feedback. They can help increase user engagement, retention, and satisfaction, as well as generate revenue and insights for businesses. Here are some steps that guide us in designing an effective recommendation algorithm that meets our goals and challenges.

1. Define the problem

The first step is to define the problem we want to solve with our recommendation algorithm.
What is the purpose of our recommendation? Who are our target users? What are their needs
and expectations? What kind of data do we have access to? How will we measure the
performance and impact of our recommendation? These questions will help us narrow down
the scope and objectives of our problem and set the criteria for evaluating our solution.
2. Choose the approach

The next step is to choose the approach that best suits our problem and data. There are three
main types of recommendation algorithms: content-based, collaborative filtering, and hybrid.
Content-based algorithms recommend items that are similar to the ones the user has liked or
interacted with before, based on the features or attributes of the items. Collaborative filtering
algorithms recommend items that are popular or liked by other users who have similar
preferences or behavior to the user, based on the ratings or feedback of the users. Hybrid
algorithms combine both content-based and collaborative filtering methods to leverage the
strengths and overcome the limitations of each one.

3. Implement the algorithm

The third step is to implement the algorithm using the appropriate tools and techniques.
Depending on the complexity and scale of our problem, we may need to use different
programming languages, libraries, frameworks, or platforms to build and deploy our
algorithm. We may also need to apply various methods and algorithms to preprocess,
analyze, and model our data, such as data cleaning, feature extraction, dimensionality
reduction, clustering, classification, regression, or neural networks. We should also follow the
best practices and standards of coding, testing, debugging, and documenting our algorithm.

4. Evaluate the algorithm

The fourth step is to evaluate the algorithm using the metrics and criteria we defined in the
first step. We should test our algorithm on different datasets, such as training, validation, and
test sets, to measure its accuracy, precision, recall, coverage, diversity, novelty, serendipity,
or relevance. We should also compare our algorithm with other existing or baseline
algorithms to benchmark its performance and identify its strengths and weaknesses. We
should also collect and analyze the feedback and behavior of our users to assess the user
satisfaction and engagement with our algorithm.

5. Optimize the algorithm

The fifth step is to optimize the algorithm based on the results and insights we obtained from
the previous step. We should identify and address the issues and challenges that affect our
algorithm's performance and quality, such as data sparsity, cold start, scalability, or privacy.
We should also experiment with different parameters, features, methods, or models to
improve our algorithm's efficiency, effectiveness, or robustness. We should also update and
refine our algorithm based on the changing needs and preferences of our users and the
evolving trends and patterns of our data.
6. Deploy and monitor the algorithm

The final step is to deploy and monitor the algorithm in the real-world setting. We should
integrate our algorithm with our online platform and ensure its compatibility and
functionality with the existing systems and components. We should also monitor our
algorithm's performance and impact over time and track its key indicators and metrics. We
should also maintain and troubleshoot our algorithm and fix any errors or bugs that may arise.
We should also keep learning and improving our algorithm based on the new data and
feedback we collect and the new challenges and opportunities we encounter.
A Model for Recommendation System

What is Collaborative Filtering?

Most recommendation systems use collaborative filtering to find similar patterns or information about the users. This technique can filter out items that a user will like on the basis of the ratings or reactions of similar users. It uses community data from peer groups for recommendations, surfacing the things that are popular among the peers. Collaborative filtering systems recommend items based on similarity measures between users and/or items: the items recommended to a user are those preferred by similar users (community data). E.g., when we shop on Amazon, it recommends new products saying "Customers who bought this also bought".

Types of Collaborative Filtering

There are two types of Collaborative Filtering available:

 User-based Collaborative Filtering (UBCF)

 Item-based Collaborative Filtering (IBCF)

User-Based Collaborative Filtering (UBCF)


User-Based Collaborative Filtering is a technique used to predict the items that a user might
like on the basis of ratings given to that item by other users who have similar taste with that
of the target user. Many websites use collaborative filtering for building their
recommendation system.

It is based on the notion of users' similarity. E.g., consider three children A, B, and C, and four fruits: grapes, strawberry, watermelon, and orange. Suppose A purchased all four fruits, B purchased only strawberry, and C purchased strawberry as well as watermelon. Here A and C are similar kinds of users, so C will be recommended grapes and orange.
Steps for User-Based Collaborative Filtering:
Step 1: Finding the similarity of users to the target user U. Using Pearson's correlation in user-based collaborative filtering, the similarity of any two users works out to be:

sim(a, b) = Σp∈P (ra,p − r̄a)(rb,p − r̄b) / ( √Σp∈P (ra,p − r̄a)² · √Σp∈P (rb,p − r̄b)² )

where "a" and "b" are users; ra,p is the rating of user "a" for item "p"; "P" is the set of items that are rated by both "a" and "b"; and r̄a, r̄b are the average ratings of users "a" and "b".


Step 2: Prediction of the missing rating of an item using the nearest-neighbour technique.
The target user may be very similar to some users and not very similar to others. Hence, the ratings given to a particular item by more similar users should be given more weight than those given by less similar users. This is achieved with a weighted-average approach: we weight each neighbour's rating by the similarity factor calculated with the formula above. The similarity measure lies in the range −1 to 1, where −1 indicates complete dissimilarity and 1 indicates perfect similarity. The missing rating can be calculated as

pred(a, p) = r̄a + Σb∈N sim(a, b) · (rb,p − r̄b) / Σb∈N |sim(a, b)|

where N is the set of neighbours of user "a" who have rated item "p".

Example: Consider a matrix that shows four users, Alice, U1, U2, and U3, rating different news apps. The rating range is 1 to 5, based on how much each user likes the app; '?' indicates that the user has not rated the app. To calculate the similarity between Alice and all the other users, we first calculate the average rating of each user, excluding I5 since Alice has not rated it.
If we assume a threshold rating of 3.5, we recommend I5 to Alice because the predicted rating is 3.83. But a problem arises with a cold start: how do we recommend new items, and what do we recommend to new users? In the initial phase, the user can be requested to rate a set of items; otherwise, either demographic data or non-personalized data can be used.
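
A minimal sketch of both steps in Python, using a small illustrative ratings matrix (np.nan marks unrated items); the numbers are hypothetical, not the news-app table from the example above:

import numpy as np

# Rows are users (row 0 is the target user), columns are items; NaN = not rated
R = np.array([[5.0, 3.0, 4.0, 4.0, np.nan],
              [3.0, 1.0, 2.0, 3.0, 3.0],
              [4.0, 3.0, 4.0, 3.0, 5.0],
              [1.0, 5.0, 5.0, 2.0, 1.0]])

def pearson(a, b):
    # Pearson correlation over the items rated by both users
    mask = ~np.isnan(a) & ~np.isnan(b)
    da = a[mask] - np.nanmean(a)
    db = b[mask] - np.nanmean(b)
    return (da @ db) / (np.sqrt((da @ da) * (db @ db)) + 1e-12)

def predict(R, user, item):
    # Weighted-average prediction; assumes every neighbour rated the item
    others = [v for v in range(len(R)) if v != user]
    sims = np.array([pearson(R[user], R[v]) for v in others])
    devs = np.array([R[v, item] - np.nanmean(R[v]) for v in others])
    return np.nanmean(R[user]) + (sims @ devs) / np.abs(sims).sum()

print(round(predict(R, 0, 4), 2))  # predicted rating of the unrated item I5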

Item-Based Collaborative Filtering (IBCF)

Item-based collaborative filtering is a recommendation technique that uses the similarity between items, computed from the ratings given by users. It is a model-based algorithm for making recommendations.
Similarities between items
The similarity values between items are measured by observing all the users who have rated both items: the similarity between two items depends on the ratings given to the items by the users who have rated both of them.
Similarity measures
1. Build the model by finding the similarity between all item pairs. The similarity between item pairs can be found in different ways; one of the most common methods is cosine similarity. Cosine similarity measures the similarity between two vectors of an inner product space: it is the cosine of the angle between the two vectors.
Formula for cosine similarity:

sim(A, B) = cos(θ) = (A · B) / (‖A‖ · ‖B‖)

This formula yields a value between −1 and 1, where 1 indicates perfect similarity, 0 indicates no similarity, and −1 indicates perfect dissimilarity.
2. Execute the recommender: it uses the items already rated by the user that are most similar to the missing item to generate a predicted rating.

E.g., given below is a table that contains some items and the users who have rated those items. The ratings are explicit, on a scale of 1 to 5. Each entry in the table denotes the rating given by the i-th user to the j-th item. We need to find the missing ratings for the respective users.

1. Finding similarities of all the item pairs.

Sim(Item1, Item2):

In the table, we can see only User_2 and User_3 have rated for both items 1 and 2.

Thus, let I1 be vector for Item_1 and I2 be for Item_2. Then,

I1 = 5U2 + 3U3 and,

I2 = 2U2 + 3U3

Sim(Item2, Item3):

In the table we can see only User_3 and User_4 have rated for both the items 2 and 3.

Thus, let I2 be vector for Item_2 and I3 be for Item_3. Then,

I2 = 3U3 + 2U4 and,

I3 = 1U3 + 2U4
Sim(Item1, Item3):

In the table we can see only User_1 and User_3 have rated for both the items 1 and 3.

Thus, let I1 be vector for Item_1 and I3 be for Item_3. Then,

I1 = 2U1 + 3U3 and,

I3 = 3U1 + 1U3
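
These pairwise similarities follow directly from the cosine formula above; a quick check in Python:

import numpy as np

def cosine(u, v):
    # Cosine of the angle between two co-rating vectors
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(round(cosine(np.array([5, 3]), np.array([2, 3])), 3))  # Sim(Item1, Item2) ~ 0.904
print(round(cosine(np.array([3, 2]), np.array([1, 2])), 3))  # Sim(Item2, Item3) ~ 0.868
print(round(cosine(np.array([2, 3]), np.array([3, 1])), 3))  # Sim(Item1, Item3) ~ 0.789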

Rule-based System

What is a Rule-Based System?

A rule-based system is a computational framework that relies on a predefined set of explicit rules to make decisions or draw conclusions within a specific domain. In technical terms,
these rules are typically formulated as “if-then” statements, where specific conditions trigger
corresponding actions. The strength of rule-based systems lies in their transparency and ease
of interpretation. However, their drawback is the need for explicit rules, making them less
adaptable to complex scenarios or situations where patterns are not easily expressible in rule
form. Despite these limitations, rule-based systems remain valuable in various applications,
especially when dealing with well-defined problems and clear decision logic.
For example, in cybersecurity, a rule-based system might be employed to detect malicious
activities on a network. A rule could be defined as follows: “If a system receives more than a
specified number of connection requests within a short time frame (indicating a potential
cyberattack), then block that IP address.” In this scenario, the rule acts as a security measure
to protect the network from potential threats.
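
A minimal sketch of that connection-rate rule as an if-then check in Python; the threshold and time window are illustrative parameters:

# Illustrative rule: block an IP that exceeds a request threshold in a window
MAX_REQUESTS = 100       # hypothetical threshold
WINDOW_SECONDS = 10      # hypothetical time frame

def check_connection_rule(ip, request_times, now):
    # IF the IP sent more than MAX_REQUESTS in the last WINDOW_SECONDS ...
    recent = [t for t in request_times if now - t <= WINDOW_SECONDS]
    if len(recent) > MAX_REQUESTS:
        return "block " + ip     # ... THEN block that IP address
    return "allow"

print(check_connection_rule("10.0.0.5", [150] * 150, now=150))  # block 10.0.0.5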

Advantages of Rule-based system

 It provides a clear and understandable way to express logical relationships, enhancing transparency in decision-making.
 The explicit nature of rules enables users to trace the decision-making process,
creating transparency in system actions.
 Rule-based systems facilitate easy maintenance and debugging in the process.
 They are scalable and adaptable to changing requirements.

Limitations of Rule-based system

 Rule-based systems lack the ability to learn from experience, restricting their capacity
to adapt and improve over time.
 Rule-based systems may struggle with uncertain or ambiguous information, leading to
potential inaccuracies in decision-making.
 Managing a large number of rules can become complex, posing challenges in
organization.

Sequential Recommendation Systems

What is a Sequential Recommendation System?

Sequential recommendation systems try to understand user input over time and model it in sequential order. User-item interactions are essentially sequence-dependent: if a person books a flight, they also book a taxi to the destination and book a room, and this information is stored as a sequence. If another person books a flight and a taxi, the system will give recommendations for hotel or room booking. Both user preferences and item popularity are dynamic; they change over time.
For instance, we can see that more people are buying wireless earphones, and some people tend to buy a new phone every year. With these trends, the popularity of particular phones, earphones, or other products changes. Such dynamic profiling is of great significance for precisely profiling a user or an item for more accurate recommendations, and it can only be captured by sequential recommendation systems.

Sequential Recommendation System

How are Sequential recommendation systems different from others?

In traditional recommendation systems such as collaborative filtering and content-based filtering models, item interactions are static and capture only the user's general preferences. In a sequential recommendation system, the item interaction is a dynamic sequence.

When to use Sequential Recommendation Systems?

In a world where everything is going digital, a user's sequential behavior is a wealth of information. A sequential recommendation system can be used in e-commerce to capture a person's historical behavior and predict continuous changes in preference.
How Does it Work?

Generally, an SRS takes a sequence of user-item interactions as the input and tries to predict
the subsequent user-item interactions that may happen in the near future through modelling
the complex sequential dependencies embedded in the sequence of user-item interactions.
More specifically, given a sequence of user-item interactions, a recommendation list consisting of the top-ranked candidate items is generated by maximizing a utility function value (e.g., the likelihood):

R = arg max f(S)

where f is a utility function to output a ranking score for the candidate items, and it could be
of various forms, like a conditional probability [Wang et al., 2018], or an interaction score
[Huang et al., 2018]. S = {i1, i2, ..., i|S|} is a sequence of user-item interactions where each interaction ij = <u, a, v> is a triple consisting of a user u, the user's action a, and the
corresponding item v. In some cases, users and items are associated with some meta data
(e.g., the demographics or the features), while the actions may have different types (e.g.,
click, add to the cart, purchase) and happen under various contexts (e.g., the time, location,
weather). The output R is a list of items ordered by the ranking score.
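
One simple way to instantiate f is a first-order Markov model that scores candidate next items by empirical transition probabilities; a minimal sketch with illustrative sequences:

from collections import Counter, defaultdict

# Illustrative user-item interaction sequences (flight -> taxi -> hotel, etc.)
sequences = [
    ["flight", "taxi", "hotel"],
    ["flight", "taxi", "hotel"],
    ["flight", "hotel"],
]

# Count transitions item -> next item across all sequences
transitions = defaultdict(Counter)
for seq in sequences:
    for cur, nxt in zip(seq, seq[1:]):
        transitions[cur][nxt] += 1

def recommend_next(item, top_k=2):
    # The utility f here is the empirical transition probability
    counts = transitions[item]
    total = sum(counts.values())
    return [(nxt, c / total) for nxt, c in counts.most_common(top_k)]

print(recommend_next("taxi"))    # [('hotel', 1.0)]
print(recommend_next("flight"))  # taxi ~ 0.67, hotel ~ 0.33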
Data Characteristics and Challenges

Since customer behaviour (for example, shopping behaviour) is diverse and complex in real-world scenarios, different characteristics of the input data bring different challenges. Some of them are listed in the table below.

Applications of Sequential Recommendation System

Sequential recommendation systems add more value to many industries and the system itself
is very user-friendly as it recommends what next action should be taken.

It is used in the e-commerce industry to recommend the next relevant product to customers.

For example, if person A buys sports shoes, then the system will recommend socks. This will
help the company to generate more revenue as the recommended product is relevant. It is
used in the tourism/travel industry as well. While booking a ticket for a holiday the system
will recommend the next step like booking top-rated hotels in the area.
SVD for Recommender Systems
SVD stands for Singular Value Decomposition. It decomposes a matrix into constituent arrays of feature vectors corresponding to each row and each column. The SVD technique was introduced into the recommendation system domain by Brandyn Webb, more famously known as Simon Funk, during the Netflix Prize challenge.

Just as a number such as 24 can be decomposed into factors, 24 = 2×3×4, a matrix can also be expressed as a multiplication of other matrices. Because matrices are arrays of numbers, they have their own rules of multiplication; consequently, they have different ways of factorization, also known as decomposition. An example is singular value decomposition, which places no restriction on the shape or properties of the matrix to be decomposed.

Singular value decomposition assumes a matrix M (for example, an m×n matrix) is decomposed as

M = U Σ Vᵀ

where:
 U is an m×m orthogonal matrix.
 Σ is an m×n diagonal matrix containing the singular values of M.
 V is an n×n orthogonal matrix.

 Vectors are orthogonal to each other if their dot product is zero. A vector is a unit vector if its length is 1. An orthonormal matrix has the property that its transpose is its inverse: since U is an orthonormal matrix, Uᵀ = U⁻¹, i.e., U·Uᵀ = Uᵀ·U = I, where I is the identity matrix.
Implementation:

The SVD can be calculated by calling the svd() function. The function takes a matrix and
returns the U, Sigma and V^T elements. The Sigma diagonal matrix is returned as a vector of
singular values. The V matrix is returned in a transposed form.

The example below defines a 3×2 matrix and calculates the Singular-value decomposition.

# Singular-value decomposition
from numpy import array
from scipy.linalg import svd
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# SVD
U, s, VT = svd(A)
print(U)
print(s)
print(VT)

Running the example first prints the defined 3×2 matrix, then the 3×3 U matrix, 2 element
Sigma vector, and 2×2 V^T matrix elements calculated from the decomposition.
[[1 2]
 [3 4]
 [5 6]]

[[-0.2298477   0.88346102  0.40824829]
 [-0.52474482  0.24078249 -0.81649658]
 [-0.81964194 -0.40189603  0.40824829]]

[9.52551809 0.51430058]

[[-0.61962948 -0.78489445]
 [-0.78489445  0.61962948]]
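
Since Sigma is returned as a vector, it must be expanded into an m×n diagonal matrix before the original matrix can be reconstructed; a short sanity check using scipy.linalg.diagsvd, which performs that expansion:

# Reconstruct the original matrix from the SVD factors
from numpy import array
from scipy.linalg import svd, diagsvd

A = array([[1, 2], [3, 4], [5, 6]])
U, s, VT = svd(A)
Sigma = diagsvd(s, A.shape[0], A.shape[1])   # expand s into a 3x2 diagonal matrix
print(U.dot(Sigma).dot(VT))                  # recovers A up to floating-point error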

An Incremental approach in Singular Value Decomposition (SVD)

This is a method to compute the SVD of a matrix by iteratively adding columns (or rows) and
updating the decomposition as new columns (or rows) are added. This approach can be useful
when dealing with large matrices where computing the SVD directly may be computationally
expensive or memory-intensive.

Here's a simplified step-by-step explanation of the incremental approach in SVD:


1. Initialize: Start with an empty matrix or an initial approximation of the SVD, depending on
the specific algorithm or requirements.
2. Add a Column (or Row): Incrementally add one column (or row) of the matrix at a time.
This could be the next column (or row) of the original matrix being decomposed.
3. Update the SVD: As each new column (or row) is added, update the SVD using algorithms
designed for incremental updates. This typically involves updating the singular values and
corresponding singular vectors.
4. Repeat: Continue adding columns (or rows) and updating the SVD until all columns (or
rows) of the original matrix have been processed.
5. Finalize: Once all columns (or rows) have been added and the SVD has been updated
accordingly, the final SVD of the original matrix is obtained.

The incremental approach is particularly useful in scenarios where the matrix is too large to
fit into memory entirely, or when new data arrives over time, and we need to update the SVD
accordingly without recomputing it from scratch each time.

It's important to note that the incremental approach may introduce some approximation errors
compared to computing the SVD directly, but it offers a trade-off between accuracy and
computational efficiency in certain situations.
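
A minimal sketch of one such column update in numpy (in the style of Brand's incremental SVD), assuming the new column has a non-negligible residual; a production implementation would also truncate rank and handle tiny residuals:

import numpy as np

def svd_append_column(U, s, Vt, c):
    # Update the thin SVD U @ diag(s) @ Vt when a new column c is appended
    k = s.shape[0]
    p = U.T @ c                      # project c onto the current left basis
    r = c - U @ p                    # residual orthogonal to that basis
    r_norm = np.linalg.norm(r)
    # Small (k+1)x(k+1) core matrix whose SVD yields the update
    K = np.zeros((k + 1, k + 1))
    K[:k, :k] = np.diag(s)
    K[:k, k] = p
    K[k, k] = r_norm
    Uk, sk, Vkt = np.linalg.svd(K)
    # Rotate the expanded left/right bases by the core factors
    U_new = np.hstack([U, (r / r_norm)[:, None]]) @ Uk
    V_pad = np.zeros((Vt.shape[1] + 1, k + 1))
    V_pad[:-1, :k] = Vt.T
    V_pad[-1, k] = 1.0
    Vt_new = (V_pad @ Vkt.T).T
    return U_new, sk, Vt_new

# Start from the SVD of the first columns, then add one more
A = np.random.rand(6, 3)
c = np.random.rand(6)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U2, s2, Vt2 = svd_append_column(U, s, Vt, c)
print(np.allclose(U2 @ np.diag(s2) @ Vt2, np.column_stack([A, c])))  # True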

SVD Vs Matrix Factorization in Recommender Systems


SVD is a fancy way of factorizing a matrix into three other matrices (A = UΣVᵀ). The way SVD is done guarantees that those three matrices carry some nice mathematical properties. There are many applications of SVD; one of them is Principal Component Analysis (PCA), which reduces a dataset of dimension n to dimension k (k < n). In recommender systems, however, what is commonly called "SVD" differs from true SVD; the clearest evidence is that SVD creates three matrices while Funk's matrix factorization creates only two.

Matrix Factorization in Recommender Systems

Matrix decomposition (or matrix factorization) is an approximation of a matrix as a product of matrices. Such factorizations are used to implement efficient matrix algorithms.

"The general idea is that we are able to multiply all of the resulting 'factor' matrices together to obtain the original matrix."

Ex: 6 = 2 × 3, where 2 and 3 are treated as factors of 6, and we can generate 6 again as the product of those factors (i.e., 2 and 3).

In a similar fashion, A = B·C: A can be expressed as the product of two lower-dimensional matrices B and C. Here, k = the number of latent dimensions (a hyperparameter), so B is m×k and C is k×n.
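
A minimal sketch of this idea: factorizing a small ratings matrix A into factors B (users × k) and C (k × items) by stochastic gradient descent over the observed entries, in the spirit of Funk's matrix factorization; the learning rate, regularization, and epoch count are illustrative, untuned choices:

import numpy as np

# Ratings matrix (0 = unknown); rows are users, columns are items
A = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
m, n = A.shape
k = 2                                  # number of latent dimensions
rng = np.random.default_rng(0)
B = 0.1 * rng.random((m, k))           # user-factor matrix (m x k)
C = 0.1 * rng.random((k, n))           # item-factor matrix (k x n)
lr, reg = 0.01, 0.02
for epoch in range(5000):
    for u, i in zip(*A.nonzero()):     # iterate over observed ratings only
        err = A[u, i] - B[u] @ C[:, i]
        bu = B[u].copy()
        B[u] += lr * (err * C[:, i] - reg * bu)
        C[:, i] += lr * (err * bu - reg * C[:, i])
print(np.round(B @ C, 2))  # approximates the known entries, fills in the zeros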

a) Purpose and Ways to decompose a matrix

There are many different matrix decomposition techniques; each finds use among a particular class of problems.

To be brief, these are two broad classes of decomposition:

a) Some only apply to ‘n x n’ matrices

b) Some apply to more general ‘m x n’ matrices

1) Decompositions used to solve linear equations, where the matrix mostly decomposes into two parts:

 LU decomposition factorizes a matrix into a lower triangular and an upper triangular matrix.
 QR decomposition decomposes a matrix A into a product A = QR of an orthogonal matrix Q and an upper triangular matrix R.
 Cholesky decomposition, etc.
 Non-negative matrix factorization (NMF) decomposes a matrix A into two matrices that have non-negative elements (A must have non-negative elements too).

2) Eigenvalue-based decompositions, mostly applicable to square matrices, where the matrix decomposes into three parts (final rotation, scaling, initial rotation):

 PCA is a transform that uses eigendecomposition to obtain the transform matrix.
 Singular Value Decomposition (SVD) factorizes any matrix of any dimension into three parts, USVᵀ.

b) What are the benefits of decomposing a matrix?

 Matrix factorization separates a matrix into two other matrices that are typically much easier to work with than the original matrix. This not only makes the problem easier to solve, it also reduces the amount of time a computer needs to calculate the answer.

 Matrix decomposition is mostly used to solve linear systems easily and quickly.

 Matrix factorization reduces a computer's storage space for matrices: instead of storing the large unfactorized matrix A, we can use less storage for its factors B and C, which are sometimes much smaller when the rank of the matrix is small.

c) Applications

1. Imputation of missing/incomplete data.

2. Imaging: Segmentation and Noise Removal.


3. Text Mining/Topic Modeling.

4. Recommendations using Collaborative filtering.

5. Eigen faces.

Collaborative Filtering Algorithm Implementation

Collaborative filtering is a way recommendation systems filter information by using the preferences of other people. It uses the assumption that if person A has similar preferences to person B on items they have both reviewed, then person A is likely to have a similar preference to person B on an item only person B has reviewed. Collaborative filtering is used by many recommendation systems in various fields, including music, shopping, financial data, and social networks, and by various services (YouTube, Reddit, Last.fm). Any service that uses a recommendation system most likely employs collaborative filtering.

Overview

With the growth of the internet and faster computers, large amounts of data are being stored
by companies all over the world. This has resulted in the rise of a subfield within computer
science called big data, which focuses on the problems of storing and analyzing vast troves of
data. Collaborative filtering is a way of extracting useful information from this data, in a
general process called information filtering. The algorithm compares a user with other similar
users (in terms of preferences) and recommends a specific product or action based on these
similarities.

Specifically, a collaborative filtering scheme uses the following steps:

1. A user expresses preferences of items, usually by rating them.

2. The algorithm finds other users with similar tastes to the given user.

3. The system recommends items that the user has not yet rated (thus, likely being new
to the user) and that are highly rated by users similar to the given user.

A caveat: this process requires active participation by the users of the service. In order to be recommended an item, the user must have liked or disliked items in the past; otherwise, the system will not provide good recommendations.
Differences in Implementation

There are two main ways of implementing collaborative filtering - a memory-based approach
and a model-based approach.

The memory-based approach calculates the similarity s(i, j) between users i and j as a weighted sum of per-preference similarities:

s(i, j) = Σp∈P wp · Simil(pi, pj)

where:

 p is a given preference in the set of preferences P.

 wp is a weight associated with a given preference to determine its relevance. For example, two people rating a popular rock song highly could be more relevant than rating an obscure song highly and thus could be assigned a higher weight.

 Simil is a user-defined similarity function that takes in two user preferences (pi and pj)
and determines how similar the preferences are to each other (higher means more
similar). For example, a simple but effective similarity function is one that returns 1 if
the preferences are the same, and returns 0 if they are not.

The similarity calculation as a whole acts as a variant of the k-nearest neighbors algorithm by
determining which users are the closest neighbors to the given user by preferences. The
benefits of the memory-based approach are that it is easy to implement and very effective.
However, some issues include data sparsity, which occurs when there is not enough data to
make good recommendations, and scalability, since with a large dataset more computation is
required to determine similarity.
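
A minimal sketch of this memory-based similarity with the simple 0/1 Simil function described above; the preference vectors and weights are illustrative:

def simil(p_i, p_j):
    # Simple similarity function: 1 if the preferences match, else 0
    return 1 if p_i == p_j else 0

def similarity(prefs_i, prefs_j, weights):
    # Weighted sum of per-preference similarities
    return sum(w * simil(a, b) for a, b, w in zip(prefs_i, prefs_j, weights))

print(similarity([1, 0, 1], [1, 1, 1], [0.5, 1.0, 1.0]))  # 1.5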

The model-based approach uses machine learning techniques to find patterns in the dataset.
For example, neural networks can be used to find trends among item preferences. Advantages of this approach include easy scalability, but it can lead to expensive model building for the
neural network. Many collaborative filtering systems use a hybrid approach, which is a
combination of the memory-based and model-based approaches. Though such systems are
expensive and complex to implement, they overcome the shortcomings of each of the above
approaches.
Challenges

There are various known challenges regarding collaborative filtering. The following are some
notable problems.

 Lack of data - In order to start recommending, the system must first obtain a sufficient
number of a user's past preferences. This delays the usefulness of the recommendation
system until this number is met. Additionally, new products must be rated by a
sufficient number of people before being recommended. These issues cause a delay in
the usefulness of collaborative filtering to users.
 Scalability - With more and more people and more and more preferences, the
collaborative filtering algorithm becomes computationally intensive and can take a lot
of time to return a result. There are many ways to combat this issue, from using
subsets of data to returning a result after a certain similarity threshold is reached.
 Synonyms - Collaborative filtering may not be able to distinguish two products that
are the same (for example, different names for the same item on Amazon or a song
and a fan cover of the same song). In this instance, the algorithm may recommend the
same product unknowingly and will unnecessarily perform extra computation in
processing that item rather than avoiding it.
 Shilling attacks - Users can rate their own products highly and rate competitors'
products poorly - this can cause the filtering algorithm to prefer one product over its
competitors', a huge problem for services that guarantee fairness to its users.
Additionally, recommendations within a community will prefer one point of view
over another. Examples of this include Reddit, Facebook, and Buzzfeed - where the
top-rated links will be biased towards the community's preferences.
 Preferring old products - Because old products have more data associated with them than new products, the algorithm might recommend these old products rather than the new products users are looking for. This defeats the purpose for some applications, where the user might use the service to find new content.
 Gray sheep - Some users have preferences that do not consistently agree with any
group of people (gray sheep). These users do not find collaborative filtering
particularly helpful when determining their wants.

Implementation

The following is an implementation of a simple collaborative filtering algorithm.


# List of people mapped to their preferences
# 1 represents "like" and 0 represents "dislike"
# -1 represents an unknown preference
person_to_preferences = {
    "Alice": [1, 0, 1, 1],
    "Bob":   [0, 1, 1, 1],
    "Chris": [1, 1, 1, 0],
    "Devin": [1, 1, 1, -1]
}

# Person and preference index to determine
person_to_recommend = "Devin"
preference_index = 3

def collaborative_filtering():
    # Keep track of similarity score for each person
    person_to_score = {}
    person_to_recommend_preferences = person_to_preferences[person_to_recommend]
    for person in person_to_preferences:
        # Skip the person we are recommending to
        if person == person_to_recommend:
            continue
        # Initialize similarity score to 0
        person_to_score[person] = 0
        preferences = person_to_preferences[person]
        for i in range(len(preferences)):
            # Skip preferences unknown for the person to recommend to
            if person_to_recommend_preferences[i] == -1:
                continue
            # Add a "point" if the person shares this preference
            if preferences[i] == person_to_recommend_preferences[i]:
                person_to_score[person] += 1
    # Determine the person with the most similar preferences
    max_score = -1
    max_person = ""
    for person in person_to_score:
        if person_to_score[person] > max_score:
            max_score = person_to_score[person]
            max_person = person
    # Predict the unknown preference from the most similar person
    return person_to_preferences[max_person][preference_index]

print(collaborative_filtering())  # Chris is most similar (3 matches), so this prints 0

Online Recommendation Systems

Working Process of an Online Recommendation System

Introduction

Have you ever wondered how YouTube can pick up on your brand-new interest
immediately after you watched one video about it? Or how Amazon can recommend products
based on what you currently have in your shopping cart? The answer is online
recommendation systems (or recommender systems or recsys). These systems can
generate recommendations for users based on real-time contextual information, such as the
latest item catalog, user behaviors, and session context. While online recommendation
systems can vary substantially, they are all generally composed of the same high-level
components. Those are:

 Candidate generation

 Feature retrieval

 Filtering

 Model inference

 Pointwise scoring and ranking

 Listwise ranking
Candidate generation

Candidate generation is responsible for quickly and cheaply narrowing the set of possible
candidates down to a small enough set that can be ranked. For example, narrowing a billion
possible YouTube videos down to ~hundreds based on a user’s subscriptions and recent
interests. A good candidate generation system produces a relatively small number of diverse,
high-quality candidates for the rest of the system.

There are many different options that can make effective candidate generators:
 Query your existing operational database
o E.g., query Postgres for a user's recently purchased items.
 Create a dedicated "candidate database" in a key-value store.
o E.g., use Redis sorted sets to track popular items in a given locale, or
o run a daily batch job to generate a list of 500 candidates for every user and load those candidates into DynamoDB.
 Vector/Embedding similarity-based approaches
o Many different ML approaches can learn "embeddings" for users and items. These approaches can be combined with vector search technologies like Faiss, Annoy, Milvus, and Elasticsearch to scale to online serving.
o Collaborative Filtering/Matrix Factorization also falls into this category. For lower-scale or less mature systems, a basic collaborative filtering model may also be used directly in production.

Every approach will have its pros and cons with respect to complexity, freshness, candidate
diversity, candidate quality, and cost. A good practice is to use a combination of approaches
and then “union” the results in the online system. For example, collaborative filtering models
can be used to generate high-quality candidates for learned users and items, but collaborative
filtering models suffer from the cold-start problem. Your system could supplement a
collaborative filtering model with recently added, popular items fetched from an operational
database.
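
A minimal sketch of this union-of-sources practice; fetch_cf_candidates and fetch_popular_items are hypothetical stand-ins for a collaborative filtering model and an operational-database query:

def fetch_cf_candidates(user_id):
    # Hypothetical: candidates from a collaborative filtering model
    return ["item_12", "item_7", "item_42"]

def fetch_popular_items(locale):
    # Hypothetical: recently added, popular items from an operational DB,
    # covering cold-start gaps in the CF model
    return ["item_99", "item_7", "item_3"]

def generate_candidates(user_id, locale):
    seen = set()
    candidates = []
    # Union the sources, preserving order and dropping duplicates
    for source in (fetch_cf_candidates(user_id), fetch_popular_items(locale)):
        for item in source:
            if item not in seen:
                seen.add(item)
                candidates.append(item)
    return candidates

print(generate_candidates("user_1", "US"))
# ['item_12', 'item_7', 'item_42', 'item_99', 'item_3']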

Feature retrieval
At one or more points, the recommendation system will need to look up or compute
data/features for the user and the candidates being considered. This data will fall into three
categories:

 Item features

o E.g., product_category=clothing or average_review_rating_past_24h=4.21


 User features

o E.g., user_favorite_product_categories=[clothing,
sporting_goods] or user_site_spend_past_month=408.10
 User-Item cross features
o E.g., user_has_bought_this_product_before=False or product_is_in_user_favorite_categories=True

o Cross features like these can be very helpful with model performance without
requiring the model to spend capacity to learn these relationships.

Your system does not necessarily need to use a purpose-built “feature store” for feature
retrieval, but the data service needs to handle the following:

 Extremely high scale and performance

o Due to the user-to-item fanout, this service may receive 100-1000x the QPS of
your recommendations service, and long-tail latency will heavily impact your
overall system performance.

o Fortunately, in most recsys cases, query volume and cost can be significantly
reduced at the expense of data freshness by caching item features. Item
features (as opposed to user features) are usually well suited to caching
because of their lower cardinality and looser freshness requirements. (It’s
probably not an issue if your product_average_review_rating feature is a
minute or two stale.) Feature caching may be done at multiple levels; e.g., a
first-level, in-process cache and a second-level, out-of-process cache like
Redis.
 Online-offline parity

o Feature data fetched online must be consistent with the feature data that
ranking models were trained with. For example, if the
feature user_site_spend_past_month is pre-tax during offline training and
post-tax during online inference, you may get inaccurate online model
predictions.
 Feature and data engineering capabilities
o Your chosen data service should support serving the kinds of features and data
that your models and heuristics require. For example, time-windowed
aggregates
(e.g., product_average_review_rating_past_24h or user_recent_purchase_ids)
are a popular and powerful class of features, and your data service should have
a scalable approach to building and serving them.

o If your data service does not support these classes of features, then
contributors will look for escape hatches, which may degrade system
complexity, reliability, or performance.
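
A minimal sketch of a first-level, in-process item-feature cache with a TTL; fetch_item_features is a hypothetical call to the feature service:

import time

def fetch_item_features(item_id):
    # Hypothetical network call to the feature service
    return {"product_category": "clothing", "avg_review_rating_past_24h": 4.21}

_cache = {}              # item_id -> (expiry_timestamp, features)
CACHE_TTL_SECONDS = 60   # item features tolerate a minute of staleness

def get_item_features(item_id):
    now = time.time()
    hit = _cache.get(item_id)
    if hit and hit[0] > now:         # fresh cache hit, skip the network call
        return hit[1]
    features = fetch_item_features(item_id)
    _cache[item_id] = (now + CACHE_TTL_SECONDS, features)
    return features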

Filtering

Filtering is the process of removing candidates based on fetched data or model predictions.
Filters primarily act as system guardrails for bad user experience. They fall into the following
categories:

 Item data filters:

o E.g., an item_out_of_stock_filter that filters out-of-stock products.


 User-Item data filters:

o E.g., a recently_viewed_products_filter that filters out products that the user has viewed in the past day or in the current session.
 Model-based filters:
o E.g., an unhelpful_review_filter that filters out item reviews if a personalized
model predicts that the user would rate the review as unhelpful.

Even though filters act as guardrails and are typically very simple, they themselves may need
some guardrails. Here are some recommended practices:

 Filter limits

o Have checks in place to prevent a single filter (or a combination of filters) from filtering out all of the candidates. This is especially important for model-based filters, but even seemingly benign filters, like an item_out_of_stock_filter, can cause outages if the upstream data service has an outage.
 Explainability and monitoring
o Have monitoring and tooling in place to track filters over time and debug
changes. Just like model predictions, filter behavior can change based on data
drift and will need to be re-tuned.
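
A minimal sketch of applying filters with a filter-limit guardrail; the filter function and threshold are hypothetical:

# Candidates are dicts of item data; a filter returns True to keep an item
def item_out_of_stock_filter(item):
    return item.get("in_stock", True)        # hypothetical item-data filter

def apply_filters(candidates, filters, min_survivors=2):
    survivors = candidates
    for f in filters:
        kept = [c for c in survivors if f(c)]
        # Guardrail: if a filter would wipe out the candidate set (e.g. the
        # upstream data service is down), skip it instead of returning nothing
        if len(kept) < min_survivors:
            continue
        survivors = kept
    return survivors

items = [{"id": 1, "in_stock": True}, {"id": 2, "in_stock": False}, {"id": 3}]
print(apply_filters(items, [item_out_of_stock_filter]))  # items 1 and 3 survive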

Model inference
Finally, we get to the “normal” machine learning part of the system: model inference. Like
with other ML systems, online model inference boils down to sending feature vectors to a
model service to get predictions. Typically these models predict easily measurable
downstream events, like Click Through Rate, Probability of a Purchase, Video Watch Time,
etc. The exact ML method, model architecture, and feature engineering for these models are
huge topics, and fully covering them is outside of the scope of this blog post.

Instead, we will focus on a couple of practical topics that are covered less in data science
literature.

1. Incorporating fresh features into predictions.

o If you've decided to build an online recommendation system, then leveraging fresh features (i.e., features based on user or item events from the past several seconds or minutes) is probably a high priority for you.

o Some ML methods, like collaborative filtering, cannot incorporate fresh features like these into predictions. Other approaches, like hybrid content-collaborative filtering (e.g., LightFM) or pure content-based filtering (e.g., built on XGBoost or a neural net), can take advantage of fresh features. Make sure your Data Science team is thinking in terms of online serving and fresh features when choosing ML methods.
2. Model calibration
o Model calibration is a technique used to fit the output distribution of an ML
model to an empirical distribution. In other words, your model output can be
interpreted as a real-world probability, and the outputs of different model
versions are roughly comparable.

o To understand what this means, consider two example uncalibrated Click Through Rate models. Due to different training parameters (e.g., negative label dropout), one model's predictions vary between 0 and 0.2, and the second model's predictions vary between 0 and 1. Both models have the same test performance (AUC) because their relative ordering of predictions is the same. However, their scores are not directly comparable: a score of 0.2 means a very high CTR for the first model and a relatively low CTR for the second model.

o If production and experimental models are not calibrated to the same distribution, then downstream systems (e.g., filters and ranking functions) that rely on the model score distributions will need to be re-tuned for every experiment, which quickly becomes unsustainable. Conversely, if all of your model versions are calibrated to the same distribution, then model experiments and launches can be decoupled from changes to the rest of the recommendation system.
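
A minimal sketch of one common calibration technique, Platt scaling, which fits a sigmoid over raw scores so that different model versions map to comparable probabilities; the data, learning rate, and iteration count are illustrative:

import numpy as np

def platt_scale(scores, labels, lr=0.1, steps=2000):
    # Fit p = sigmoid(a * score + b) by gradient descent on the log loss
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        grad = p - labels                 # gradient of log loss w.r.t. logits
        a -= lr * np.mean(grad * scores)
        b -= lr * np.mean(grad)
    return a, b

# Calibrate an uncalibrated model whose raw scores sit in [0, 0.2]
raw = np.array([0.02, 0.05, 0.15, 0.19])
clicked = np.array([0.0, 0.0, 1.0, 1.0])
a, b = platt_scale(raw, clicked)
print(1.0 / (1.0 + np.exp(-(a * raw + b))))  # roughly calibrated probabilities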

Pointwise scoring and ranking

“Pointwise” ranking is the process of scoring and ranking items in isolation; i.e., without
considering other items in the output. The item pointwise score may be as simple as a single
model prediction (e.g., the predicted click through rate) or a combination of multiple
predictions and heuristics.

For example, at YouTube, the ranking score is a simple algebraic combination of many
different predictions, like predicted “click through rate” and predicted “watch time”, and
occasionally some heuristics, like a “small creator boost” to slightly bump scores for smaller
channels.
ranking_score = f(pCTR, pWatchTime, isSmallChannel) = pCTR^X * pWatchTime^Y * if(isSmallChannel, Z, 1.0)

Tuning the parameters X, Y, and Z will shift the system’s bias from one objective to another.
If your model scores are calibrated, then this ranking function can be tuned separately from
model launches. The parameters can even be tuned in a personalized way; e.g., new users
may use different ranking function parameters than power users.
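
A minimal sketch of this ranking function in Python; the exponent and boost values are illustrative, not YouTube's actual parameters:

def ranking_score(p_ctr, p_watch_time, is_small_channel,
                  x=1.0, y=0.5, z=1.1):
    # Algebraic combination of calibrated predictions plus a heuristic boost
    boost = z if is_small_channel else 1.0
    return (p_ctr ** x) * (p_watch_time ** y) * boost

print(ranking_score(0.04, 120.0, True))  # 0.04 * sqrt(120) * 1.1 ~ 0.48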

Listwise ranking

Have you ever noticed that your YouTube feed rarely has two similar videos next to each
other? The reason is that item diversity is a major objective of YouTube’s “listwise” ranking.
(The intuition being if a user has scrolled past two “soccer” videos, then it’s probably sub-
optimal to recommend another “soccer” video in the third position.)

Listwise ranking is the process of ordering items in the context of other items in the list. ML-
based and heuristic-based approaches can both be very effective for listwise optimization—
YouTube has successfully used both. A simple heuristic approach that YouTube had success
with is to greedily rank items based on their pointwise rank but apply a penalty to items that
are too similar to the preceding N items.

If item diversity is an objective of your final output, keep in mind that listwise ranking can
only be impactful if there are high-quality, diverse candidates in its input. This means that
tuning the upstream system, and in particular candidate generation, to source diverse
candidates is critical.
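
A minimal sketch of that greedy heuristic; the item_similarity function and penalty factor are illustrative assumptions:

def listwise_rank(items, scores, item_similarity, n_context=2, penalty=0.5):
    # Greedily pick items by pointwise score, penalizing items too
    # similar to the preceding n_context picks
    remaining = dict(zip(items, scores))
    ranked = []
    while remaining:
        best, best_score = None, float("-inf")
        for item, score in remaining.items():
            adjusted = score
            for prev in ranked[-n_context:]:
                adjusted -= penalty * item_similarity(item, prev)
            if adjusted > best_score:
                best, best_score = item, adjusted
        ranked.append(best)
        del remaining[best]
    return ranked

# Toy similarity: 1.0 if two items share a topic prefix, else 0.0
sim = lambda a, b: 1.0 if a.split(":")[0] == b.split(":")[0] else 0.0
print(listwise_rank(["soccer:1", "soccer:2", "cooking:1"], [0.9, 0.8, 0.7], sim))
# ['soccer:1', 'cooking:1', 'soccer:2'] -- the second soccer video is demoted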
