Module 5


MODULE 5

Recommender systems: types
Case study: Netflix Recommender system
Social media analytics: current trends, tools
Social media analytics for citizen-centric public services
Churn prediction – Case study
RECOMMENDER SYSTEMS
• People are influenced by recommendations in their daily decisions.
• Salespeople try to sell us the products we like.
• Restaurants are evaluated and rated.
• Recommender systems can support our online commercial activities by suggesting specific items from a wide range of options.
• Different techniques can be used to build a recommender system:
• Collaborative filtering
• Content‐based filtering
• Demographic filtering
• Knowledge‐based filtering
• Hybrid filtering.
Collaborative Filtering
• Also called social filtering.
• The most commonly used technique.
• Recommends items based on the opinions of other users.
• User‐based collaborative filtering
• Items will be recommended to a user based on how similar users rated these
items.
• Item‐based collaborative filtering
• Items will be recommended to a user based on how this user rated similar
items.
Similarity matrix
• One way to calculate similarity between users or items is to use a user‐item matrix that contains information on which user bought which item.
• Any similarity measure can then be used to create a similarity matrix, for example:
• Pearson correlation
• Cosine similarity
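A minimal sketch of building similarity matrices from a small, hypothetical binary user‐item matrix (rows are users, columns are items, 1 means "bought"). Cosine similarity is applied to the user rows, and Pearson correlation, computed here as the cosine of mean‐centred vectors over all items, is applied to the item columns. Real systems often compute Pearson only over co‐rated items; that refinement is omitted here.

```python
import math

# Hypothetical user-item matrix: rows = users (ann, bob, cat), columns = items 0..3.
matrix = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pearson(u, v):
    """Pearson correlation = cosine similarity of the mean-centred vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])

def similarity_matrix(rows, measure):
    """Pairwise similarity between all rows."""
    return [[measure(r, s) for s in rows] for r in rows]

# User-user similarity works on the rows ...
user_sim = similarity_matrix(matrix, cosine)
# ... item-item similarity on the columns (the transposed matrix).
item_sim = similarity_matrix([list(col) for col in zip(*matrix)], pearson)
```

The same `similarity_matrix` function serves user‐based and item‐based collaborative filtering; only the orientation of the matrix changes.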
Continued..
• To build a collaborative recommender system, ratings are required.
• These ratings form the link between a user and an item.
• There are three types of ratings:
• Scalar rating (a number or an ordinal rating)
• Binary rating (good or bad)
• Unary rating (indicates that a user has interacted with an item, such as a click on an item or a purchase)
Continued..
There are two methods for collecting ratings:
• Explicit ratings:
• Obtained by asking a user to rate a certain item.
• Implicit ratings:
• Obtained by associating a rating with a certain action, such as buying an item.
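The following sketch shows how the rating types and collection methods above might relate in practice. The interaction log, the action weights, and the threshold of 3 are all hypothetical choices made for illustration: implicit ratings are derived from actions, a scalar rating is reduced to a binary one, and a unary rating only records that an interaction happened.

```python
# Hypothetical interaction log: (user, item, action).
events = [("ann", "tv", "click"), ("ann", "tv", "purchase"), ("bob", "radio", "click")]

# Assumed mapping from actions to implicit scores (purely illustrative).
ACTION_WEIGHT = {"click": 1, "purchase": 5}

def implicit_ratings(events):
    """Implicit rating: keep, per (user, item), the strongest action observed."""
    ratings = {}
    for user, item, action in events:
        key = (user, item)
        ratings[key] = max(ratings.get(key, 0), ACTION_WEIGHT[action])
    return ratings

def to_binary(scalar, threshold=3):
    """Reduce a scalar rating (e.g. 1-5 stars) to a binary good/bad rating."""
    return scalar >= threshold

def to_unary(ratings):
    """Unary rating: only the fact that a user interacted with an item remains."""
    return set(ratings)
```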
Algorithm used
• Neighborhood‐based algorithms are applied:
1. A similarity measure is used to calculate the similarity between users or items.
2. A subset of users or items is selected that functions as the neighborhood of the active user or item.
3. A rating is predicted based on the active user's or item's neighborhood, typically giving the highest weight to the most similar neighbors.
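The three steps above can be sketched for user‐based collaborative filtering as follows. The ratings dictionary is hypothetical, cosine similarity over co‐rated items is one possible choice for step 1, and the neighborhood size `k` is an assumed parameter; the prediction in step 3 is a similarity‐weighted average of the neighbors' ratings.

```python
import math

# Hypothetical explicit ratings on a 1-5 scale; a missing key means "not rated".
ratings = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5, "d": 4},
    "cat": {"a": 1, "b": 5, "d": 2},
    "dan": {"a": 5, "b": 3, "d": 5},
}

def cosine_sim(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item, k=2):
    # Step 1: similarity between the active user and every user who rated the item.
    candidates = [(cosine_sim(ratings[user], ratings[o]), o)
                  for o in ratings if o != user and item in ratings[o]]
    # Step 2: the k most similar users form the neighborhood.
    neighborhood = sorted(candidates, reverse=True)[:k]
    # Step 3: similarity-weighted average of the neighbors' ratings.
    num = sum(sim * ratings[o][item] for sim, o in neighborhood)
    den = sum(abs(sim) for sim, _ in neighborhood)
    return num / den if den else None
```

For item‐based filtering the same idea applies with the roles swapped: similarities are computed between items, and the active user's own ratings of the most similar items are averaged.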
Advantages
• Collaborative filtering does not restrict the type of items to be
recommended.
• It manages to deliver recommendations to a user even when it is
difficult to find out which specific feature of the item makes it
interesting to the user or when there is no easy way to extract such a
feature automatically.
• Can recommend more unexpected (but equally valuable) items.
Disadvantages
• If items are not frequently bought by the users (e.g., recommending mobile
phones or apartments), it may indeed be difficult to obtain representative
neighborhoods, hence lowering the power of the technique.
• The cold start problem, which means that new items cannot easily be
recommended because they have not been rated yet.
• Items purchased a long time ago may have a substantial impact if few items have
been rated, which may lead to wrong conclusions in a changing environment.
• Privacy and trust: collaborative filtering needs data on users to give recommendations, and it can generate trust issues because a user cannot question the recommendation.
Content‐Based Filtering
• Recommends items based on two information sources:
• Features of products
• Ratings given by users.
• In the case of structured data, each item consists of the same attributes, and the possible values for these attributes are known.
• When only unstructured data are available, such as text, different
techniques have to be used in order to learn the user profiles.

• Because no standard attributes and values are available, typical problems arise, such as synonyms and polysemous words.
• Free text can then be translated into more structured data by using a
selection of free text terms as attributes.
• Techniques like TF‐IDF (term frequency–inverse document frequency) can then be used to assign weights to the different terms of an item.
• Data can also be semistructured, consisting of some attributes with restricted values and some free text.

• One approach to deal with this kind of data is to convert the text into
structured data.
• A classification algorithm is trained for each user based on his or her ratings on items and the attributes of those items.
• It predicts whether a user will like an item with a specific representation.
• Explicit or implicit ratings can be used.
Machine learning techniques used
• Logistic regression
• Neural networks
• Decision trees
• Association rules
• Bayesian networks
• Nearest neighbor methods
• Support vector machines
• Naïve Bayes.
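To illustrate one of the techniques listed above, here is a sketch of a per‐user Bernoulli Naïve Bayes classifier with Laplace smoothing. The item catalogue, its binary attributes, and the user's liked/disliked items are hypothetical; the model estimates the probability that this particular user likes an item with a given attribute set.

```python
import math

# Hypothetical items described by binary attributes.
ITEMS = {
    "m1": {"action", "scifi"},
    "m2": {"romance", "comedy"},
    "m3": {"action", "thriller"},
    "m4": {"comedy"},
}
ATTRIBUTES = sorted(set().union(*ITEMS.values()))

def train_user_model(liked, disliked):
    """Bernoulli Naive Bayes with Laplace smoothing, trained on one user's ratings."""
    model = {}
    for label, group in ((True, liked), (False, disliked)):
        prior = (len(group) + 1) / (len(liked) + len(disliked) + 2)
        probs = {a: (sum(a in ITEMS[i] for i in group) + 1) / (len(group) + 2)
                 for a in ATTRIBUTES}
        model[label] = (prior, probs)
    return model

def p_like(model, attrs):
    """Posterior probability that the user likes an item with the given attributes."""
    log_scores = {}
    for label, (prior, probs) in model.items():
        s = math.log(prior)
        for a in ATTRIBUTES:
            s += math.log(probs[a] if a in attrs else 1 - probs[a])
        log_scores[label] = s
    m = max(log_scores.values())  # subtract the max for numerical stability
    exp = {k: math.exp(v - m) for k, v in log_scores.items()}
    return exp[True] / (exp[True] + exp[False])

model = train_user_model(liked=["m1", "m3"], disliked=["m2", "m4"])
```

Because the model only uses the active user's own ratings, a brand‐new item can be scored as soon as its attributes are known, which matches the "no cold start for new items" advantage.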
Advantages
• There is no cold start problem for new items.
• Items can also be recommended to users that have unique
preferences.
• Possibility to give an explanation to the user about his or her
recommendations.
• Only ratings of the active user are used in order to build the profile.
Disadvantages
• Only suitable if the right data are available.
• Old ratings potentially influence the recommendation.
• Over‐specialization can be a problem because such techniques will
focus on items similar to the previously bought items.
Demographic Filtering
• Recommends items based on demographic information of the user.
• The main challenge is to obtain the data.
• This can be done explicitly by asking users for personal attributes such as age, gender, address, and so on.
• Alternatively, analytical techniques can be used to extract this information from the users' interactions with the system.
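A minimal sketch of demographic filtering, assuming a hypothetical segmentation rule (age band plus region) and hypothetical purchase data: a user is recommended the most popular items in his or her segment, so no personal rating history is needed, but an empty segment yields no recommendation.

```python
from collections import Counter

# Hypothetical users with demographic attributes and purchase histories.
users = {
    "u1": {"age_band": "18-25", "region": "north", "bought": ["phone", "headset"]},
    "u2": {"age_band": "18-25", "region": "north", "bought": ["phone", "game"]},
    "u3": {"age_band": "46-60", "region": "south", "bought": ["radio"]},
}

def segment(user):
    # Assumed segmentation rule: group users by age band and region.
    return (user["age_band"], user["region"])

def recommend(new_user, n=2):
    """Recommend the most popular items among users in the same demographic segment."""
    counts = Counter(item for u in users.values() if segment(u) == segment(new_user)
                     for item in u["bought"])
    return [item for item, _ in counts.most_common(n)]
```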
Advantages
• There is not always a need for a history of user ratings.
• Segments can be used in combination with user–item interactions in
order to obtain a high‐level recommender system.
Disadvantages
• The cold start problem for new users and new items.
• Difficulty in capturing the data, which is highly dependent on the
participation of the users.
Knowledge‐Based Filtering
• A recommender system is knowledge-based when it makes
recommendations based not on a user’s rating history, but on specific
queries made by the user.

• A first advantage of knowledge‐based recommender systems is that they can be used when there is only limited information about the user, hence avoiding the cold start problem.
• Expert knowledge is used in the recommender system.
Knowledge‐based recommender systems can be divided into two types:
• Constraint‐based recommenders
• Recommend items that meet a set of constraints imposed by the user and the item domain.
• A model of the customer requirements, the product properties, and other constraints that limit the possible requirements is first constructed and formalized.
• Case‐based recommenders
• The goal is to find the item that is most similar to the one the user requires.
• Similarity is often based on knowledge of the item domain.
• The system starts with an example provided by the user and generates a user profile based on it.
• Based on this user profile and additional knowledge sources, recommendations can then be proposed.
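The two kinds of knowledge‐based recommenders can be sketched on a hypothetical apartment catalogue. The constraint‐based function keeps only items that satisfy every user constraint; the case‐based function returns the item closest to an example supplied by the user. The attribute scales in `SCALE` and the penalty for a garden mismatch stand in for domain knowledge and are assumptions.

```python
# Hypothetical catalogue of apartments (a complex, infrequently bought item).
CATALOGUE = [
    {"id": "A", "rooms": 2, "price": 900, "garden": False},
    {"id": "B", "rooms": 3, "price": 1200, "garden": True},
    {"id": "C", "rooms": 3, "price": 1500, "garden": True},
    {"id": "D", "rooms": 4, "price": 1400, "garden": False},
]

def constraint_based(min_rooms, max_price, garden=None):
    """Return the ids of items that satisfy every constraint imposed by the user."""
    return [x["id"] for x in CATALOGUE
            if x["rooms"] >= min_rooms and x["price"] <= max_price
            and (garden is None or x["garden"] == garden)]

# Assumed domain knowledge: how large a difference in each attribute is.
SCALE = {"rooms": 1.0, "price": 500.0}

def case_based(example):
    """Return the id of the item most similar to an example item supplied by the user."""
    def distance(x):
        d = sum(abs(x[a] - example[a]) / s for a, s in SCALE.items())
        return d + (0 if x["garden"] == example["garden"] else 1)
    return min(CATALOGUE, key=distance)["id"]
```

If `constraint_based` returns an empty list, a real system could suggest which constraint to relax, in line with the advantage of explaining products or suggesting changes.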
Advantages
• They can be used when there is only limited information about the user.
• Expert knowledge is used in the recommender system.
• They can function in an environment with complex, infrequently bought items.
• They can help customers actively, for example, by explaining products or suggesting changes in case no recommendation is possible.
Disadvantages
• The system may require considerable effort for knowledge acquisition, knowledge engineering, and development of the user interface.
• When the number of items in the recommender system is very high, it can be difficult for the user to provide the system with an example.
• It may be difficult or impossible for the user to provide an example that fits the user's needs.
Hybrid Filtering
• Hybrid recommender systems combine the advantages of content‐based, knowledge‐based, demographic, and collaborative filtering recommender systems.
• One reason they were developed is to avoid the cold start problem.
• They combine two or more recommendation techniques to gain performance with fewer of the drawbacks of any individual technique.
• There are seven types of hybrid techniques.
Types of hybrid techniques
• Weighted hybrid filtering:
• The recommendation scores of several recommenders are combined by
applying specific weights.
• Switching:
• Recommendations are taken from one recommender at a time, but not
always the same one.
• Mixed:
• Recommendations from multiple recommenders are shown to the user together.
• Feature combination:
• Different knowledge sources are used to obtain features, and these are then given to the recommendation algorithm.
• Augmentation:
• A first recommender computes the features, while the next recommender computes the remainder of the recommendation.
• Cascade:
• Each recommender is assigned a certain priority; the lower‐priority recommenders are only decisive when the higher‐priority recommenders produce the same score (they break ties).
• Meta‐level hybrid:
• Consists of a first recommender that gives a model as output, which is used as input by the next recommender.
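Three of the hybrid types above can be sketched on top of two hypothetical recommenders whose item scores are given as dictionaries. The weights, the switching rule (use collaborative filtering only once a user has at least five ratings), and the number of items shown are assumptions for illustration.

```python
# Scores from two hypothetical recommenders for the same candidate items.
cf_scores = {"a": 0.9, "b": 0.4, "c": 0.7}       # e.g. collaborative filtering
content_scores = {"a": 0.2, "b": 0.8, "c": 0.7}  # e.g. content-based filtering

def weighted(scores_list, weights):
    """Weighted hybrid: combine the scores of several recommenders with fixed weights."""
    items = set().union(*scores_list)
    return {i: sum(w * s.get(i, 0.0) for s, w in zip(scores_list, weights)) for i in items}

def switching(user_num_ratings, threshold=5):
    """Switching hybrid: pick one recommender per user. Assumed rule: collaborative
    filtering once the user has enough ratings, content-based filtering otherwise."""
    return cf_scores if user_num_ratings >= threshold else content_scores

def mixed(n=2):
    """Mixed hybrid: show the top-n items of each recommender side by side."""
    def top(scores):
        return sorted(scores, key=scores.get, reverse=True)[:n]
    return {"collaborative": top(cf_scores), "content": top(content_scores)}
```

The switching rule illustrates how hybrids can sidestep the cold start problem: a new user with few ratings is served by the content‐based recommender.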
SOCIAL MEDIA ANALYTICS
• With the rising popularity of the web, people are more closely connected to each other than ever before.
• Demographic boundaries are fading away through recently trending online communication channels.
• Traditional word‐of‐mouth advertising is being replaced by the web.
• Web users have put billions of data items online on social media platforms, such as:
• Facebook
• Twitter
• Wikipedia (user‐generated encyclopedia)
• LinkedIn
• Reddit
• Instagram
• Users are no longer reluctant to share personal information about themselves, their friends, their colleagues, their idols, and their political preferences with anybody who is interested in them.
• Web users are connected 24/7 to all kinds of social media platforms, giving real‐time information about their whereabouts.
• A new, challenging research domain arises: social media analytics.
• These data sources offer invaluable knowledge and insights into customer behavior and enable marketers to more carefully profile, track, and target their customers.
• Crawling through such data sources is far from trivial because social media data can reach magnitudes never seen before.
From a sales‐oriented point of view:
• Social media offers advantages for both parties in the business–consumer relationship.
• People share thoughts and opinions on weblogs, microblogs, online forums, and review
websites, creating a strong effect of digital word‐of‐mouth advertising.
• Web users can use others’ experience to gain information and make purchase decisions.
• Consumers no longer fall for the transparent business tricks of a sales representative; they are well‐informed and make conscious choices like true experts.
• Companies are forced to keep offering high‐quality products and services, and even a small failure can have disastrous consequences for the future.
• Consumers can easily compare product and service characteristics of both local and
global competitors.
Continued..
• People trust social media platforms with their personal data and interests, making it an
invaluable data source for all types of stakeholders.
• Marketers who are searching for the most promising and profitable consumers to target
are now able to capture more concrete consumer characteristics, and hence develop a
better understanding of their customers.
• Social media is becoming a next‐generation business intelligence platform.
• Politicians and governmental institutions can get an impression of public opinion through the analysis of social media.
• Studies of election campaigns claim that political candidates with higher social media engagement got relatively more votes within most political parties.
• Social media is also a tool to acquire and propagate one's reputation.
Continued..
• Social media analytics is a multifaceted domain.
• Social networking sites are protective toward data sharing and offer built‐in
advertisement tools to set up personalized marketing campaigns.
Social Networking Sites: B2B Advertisement Tools
• A business‐to‐business industry has emerged around capturing users' information on social networking websites.
• It enables personalized advertising and offers services for budget and impact management.
• Facebook Advertising is a highly developed marketing tool with an extensive variety of facilities and services.
Facebook Advertising
• Calculates the impact and spread of the digital word‐of‐mouth advertising.
• Facebook Advertising is particularly suitable for Business‐to‐Consumer (B2C) marketing.
• It supports simple marketing campaigns as well as advanced options, such as increasing:
• the number of clicks to a website (click rate)
• page likes (like rate)
• reactions on messages posted by the user (comment and share rate)
• mobile app engagement (download and usage rate)
• website conversion (conversion rate), e.g., enrollment for a newsletter, leaving an email address, buying a product, or downloading a trial version
• Facebook measures conversion rates by including a conversion‐tracking pixel on the web page where the conversion takes place.
• A pixel is a small piece of code communicating with the Facebook servers and
tracking which users saw a web page and performed a certain action.
• As such, Facebook Advertising matches the users with their Facebook profile and
provides a detailed overview of customer characteristics and the campaign
impact.
• Allows users to create personalized ads and target a specific public by selecting
the appropriate characteristics in terms of demographics, interests, behavior, and
relationships.
Continued..
• Advertisements are displayed according to a bidding system, where the most
eye‐catching spots of a page are the most expensive ones.
• When a user opens his or her Facebook page, a virtual auction decides which ad
will be placed where on the page.
• Depending on the size and popularity of (part of) the chosen audience, Facebook suggests a bid amount.
• A safer solution is to fix a maximum bid amount in advance. The higher the bid, the higher the probability of getting a good ad placement.
• The winning advertiser does not necessarily have to pay the maximum bid amount.
• Only when many ads are competing do ad prices rise drastically. As such, the
price of an ad differs depending on the target user.
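The point that a winner need not pay its maximum bid can be illustrated with a generic second‐price auction. This sketch does not reproduce Facebook's actual auction, which is not described here; the bids, slot names, and the one‐cent increment are hypothetical. Each slot, most expensive first, goes to the next‐highest maximum bid, and the winner pays just above the following bid, capped at its own maximum.

```python
def second_price_auction(max_bids, slots, increment=0.01):
    """Generic second-price sketch: winners pay the next-highest bid plus an
    increment (capped at their own maximum), not their full maximum bid."""
    ranked = sorted(max_bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for pos, slot in enumerate(slots[:len(ranked)]):
        advertiser, bid = ranked[pos]
        next_bid = ranked[pos + 1][1] if pos + 1 < len(ranked) else 0.0
        price = min(bid, round(next_bid + increment, 2))
        results.append((slot, advertiser, price))
    return results

bids = {"shoes": 2.50, "travel": 1.20, "books": 0.80}
second_price_auction(bids, ["top", "side"])
```

With little competition the "shoes" advertiser pays well below its 2.50 maximum; when two close bids compete for the same slot, the price rises toward the maximum.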
LinkedIn
• The LinkedIn Campaign Manager allows the marketer to create personalized ads
and to select the right customers.
• Compared to Facebook, LinkedIn Campaign Manager offers services to target individuals based on the characteristics of the companies they work at and the job function they have.
Continued..
• LinkedIn Campaign Manager is aimed at advertisements for Business‐to‐Business (B2B) and Human Resource Management (HRM) purposes.
• The reader must be careful when deploying these advertisement tools since they
may be so user friendly that the user no longer realizes what he/she is actually
doing with them.
• Make sure that you specify a maximum budget and closely monitor all activities
and advertisement costs.
• A small error can result in a cost of thousands or even millions of dollars in only a
few seconds.
• Good knowledge of all the facilities is essential to pursue a healthy online
marketing campaign.
Sentiment Analysis
• Sentiment analysis and opinion mining focus on the analysis of text and on determining the overall sentiment of the text.
• The various steps involved are:
• Tag removal
• Tokenization
• Stopword removal
• Stemming
Step 1: Tag Removal
• Because text contains many irrelevant words and symbols, unnecessary tags, such as URLs and punctuation marks, are removed from the text.
Step 2: Tokenization
• This step converts the text into a stream of words.
Step 3: Stopword Removal
• Stopwords are detected and removed from the sentence.
• A stopword is a word in a sentence that has no informative meaning, like articles,
conjunctions, prepositions, and so forth.
• Using a predefined machine‐readable list, stopwords can easily be identified and
removed.
• A stoplist can be constructed manually, or words with an IDF (inverse document frequency) value close to zero can be added to the list automatically. These IDF values are computed based on the total set of text fragments that should be analyzed. The more fragments a word appears in, the lower its IDF value.
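A sketch of the automatic stoplist construction described above. The text fragments are hypothetical, IDF is taken as log(N / number of fragments containing the word), and the cut‐off of 0.1 for "close to zero" is an assumed value.

```python
import math

# Hypothetical corpus of text fragments to be analyzed.
fragments = [
    "the service was great",
    "the food was cold",
    "the staff was friendly and the food great",
]

def idf_stoplist(fragments, threshold=0.1):
    """Return the words whose IDF is close to zero (they occur in almost every fragment)."""
    n = len(fragments)
    docs = [set(f.lower().split()) for f in fragments]
    vocab = set().union(*docs)
    idf = {w: math.log(n / sum(w in d for d in docs)) for w in vocab}
    return sorted(w for w, v in idf.items() if v <= threshold)
```

Here "the" and "was" occur in every fragment, so their IDF is zero and they end up on the stoplist, while "great" (two of three fragments) does not.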
Step 4: Stemming
• Stemming converts each word back to its stem or root.
• All verb conjugations are transformed to the corresponding base verb, all nouns are converted to their singular form, and adverbs and adjectives are brought back to their base form.
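The four preprocessing steps can be chained into one pipeline. This is a minimal sketch: the stoplist is a small hypothetical one, and the stemmer is a deliberately naive suffix stripper, not the Porter stemmer or any library stemmer, so its output is only indicative.

```python
import re

# Small illustrative stoplist; a real system would use a larger predefined list.
STOPWORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "on", "to",
             "is", "was", "were", "this"}

# Deliberately naive suffix stripping (NOT the Porter stemmer).
SUFFIXES = ("ing", "ed", "ly", "s")

def remove_tags(text):
    """Step 1: drop URLs, HTML-like tags and punctuation."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"[^\w\s]", " ", text)

def tokenize(text):
    """Step 2: convert the text into a stream of lowercase words."""
    return text.lower().split()

def remove_stopwords(tokens):
    """Step 3: drop words without informative meaning."""
    return [t for t in tokens if t not in STOPWORDS]

def stem(token):
    """Step 4: strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [stem(t) for t in remove_stopwords(tokenize(remove_tags(text)))]

preprocess("The phones were charging quickly <br> https://example.com")
```

The example sentence reduces to the tokens `phone`, `charg`, and `quick`, which a subsequent sentiment classifier would then score.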
Network Analytics
• Network analytics focuses on the relationships between users on social media platforms.
• Five types of relationships can be distinguished:
Friends.
There is a mutual positive relationship between two users. Both users know each other, and
acknowledge the association between them.
Admirers.
A user receives recognition from another user, but the relationship is not reciprocal.
Idols.
A user acknowledges a certain positive connectedness with another user, but the relationship
is not reciprocal.
Neutrals.
Two users do not know each other and do not communicate with each other.
Enemies.
There is a negative relationship between two users. Both users know each other, but the atmosphere between them is negative.
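The five relationship types can be derived from directed edges, as in this sketch. The edge sets are hypothetical: a pair `(x, y)` in `positive` means that x positively acknowledges y (for example, x follows y), and enemies are modeled as a mutual negative relationship.

```python
# Hypothetical directed edges: (x, y) means x positively acknowledges y (e.g. follows y).
positive = {("ann", "bob"), ("bob", "ann"), ("cat", "ann")}
# Hypothetical mutual negative relationship.
negative = {("dan", "eve"), ("eve", "dan")}

def relationship(a, b):
    """Return what user b is to user a."""
    if (a, b) in negative and (b, a) in negative:
        return "enemy"
    ab, ba = (a, b) in positive, (b, a) in positive
    if ab and ba:
        return "friend"    # mutual positive relationship
    if ba:
        return "admirer"   # a receives recognition from b, not reciprocated
    if ab:
        return "idol"      # a acknowledges b, not reciprocated
    return "neutral"       # no relationship in either direction
```

On Twitter, for instance, a user's followers would map to "admirer" and the accounts the user follows to "idol".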
Continued…
• On most social networking sites, only friendship relationships are exploited.
• Twitter incorporates admirers (followers) and idols (followees) by enabling users to define the people they are interested in.
• Admirers receive the tweets of their idols.
• Enemy relationships are not common on social networking sites, with EnemyGraph as an exception.
• Link prediction is one subdomain of network analytics where one tries to predict
which neutral links are actually friendship, admirer, or idol relationships.
• Tie strength prediction is used to determine the intensity of a relationship
between two users.
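Link prediction can be sketched with one common heuristic, the Jaccard coefficient of two users' neighbor sets: unlinked (neutral) pairs that share many friends are ranked as the most likely hidden friendships. The friendship graph is hypothetical, and many other link prediction scores exist.

```python
# Hypothetical undirected friendship graph as adjacency sets.
graph = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "eve"},
    "dan": {"ann"},
    "eve": {"cat"},
}

def jaccard(u, v):
    """Share of the two users' combined neighbors that they have in common."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0

def predict_links(top=3):
    """Score every currently unlinked pair and return the most likely friendships."""
    users = sorted(graph)
    pairs = [(u, v) for i, u in enumerate(users) for v in users[i + 1:] if v not in graph[u]]
    return sorted(pairs, key=lambda p: jaccard(*p), reverse=True)[:top]
```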
Continued…
• Homophily, a concept from sociology, states that people tend to connect to other
similar people and they are unlikely to connect with dissimilar people.
• Similarity can be expressed in terms of the same demographics, behavior,
interests, brand affinity, and so on.
• People connected to each other are more likely to like the same product or
service.
• Customer acquisition projects should identify those high‐potential customers
based on the users’ neighborhoods and focus their marketing resources on them.
• A customer whose friends have churned to the competition is likely to be a churner as well, and should be offered additional incentives to prevent him or her from leaving.
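Homophily can be turned into a simple churn feature: the share of a customer's friends who have already churned. In this sketch the friend lists, the set of churned customers, and the 0.6 threshold for flagging a customer for retention incentives are all hypothetical.

```python
# Hypothetical friend lists of active customers and the customers who already churned.
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "eve": {"cat"},
}
churned = {"cat", "dan"}

def churned_neighbor_share(customer):
    """Fraction of a customer's friends that have churned (a homophily-based feature)."""
    neigh = friends.get(customer, set())
    return len(neigh & churned) / len(neigh) if neigh else 0.0

def flag_for_retention(threshold=0.6):
    """Active customers whose share of churned friends meets an assumed threshold."""
    return sorted(c for c in friends
                  if c not in churned and churned_neighbor_share(c) >= threshold)
```

In practice such a feature would typically be added to a churn prediction model alongside other customer characteristics rather than used on its own.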
