We consider the problem of optimizing the placement of stubborn agents in a social network in order to maximally influence the population. We assume the network contains stubborn users whose opinions do not change, and nonstubborn users who can be persuaded. We further assume the opinions in the network are in an equilibrium that is common to many opinion dynamics models, including the well-known DeGroot model. We develop a discrete optimization formulation for the problem of maximally shifting the equilibrium opinions in a network by targeting users with stubborn agents. The opinion objective functions we consider are the opinion mean, the opinion variance, and the number of individuals whose opinion exceeds a fixed threshold. We show that the mean opinion is a monotone submodular function, allowing us to find a good solution using a greedy algorithm. We find that on real Twitter social networks consisting of tens of thousands of individuals, a small number of stubborn agents can non-trivially influence the equilibrium opinions. Furthermore, we show that our greedy algorithm outperforms several common benchmarks. We then propose an opinion dynamics model where users communicate noisy versions of their opinions, communications are random, users grow more stubborn with time, and there is heterogeneity in how users' stubbornness increases. We prove that under fairly general conditions on the stubbornness rates of the individuals, the opinions in this model converge to the same equilibrium as the DeGroot model, despite the randomness and user heterogeneity in the model.
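The greedy step for the monotone submodular objective can be sketched as follows; the coverage-style objective `f`, the `reach` table, and the function names are illustrative stand-ins, not the paper's actual opinion dynamics.

```python
# Greedy selection for a monotone submodular objective. The coverage-style
# objective below is a hypothetical stand-in for the mean-opinion function.
def greedy_select(candidates, f, k):
    """Pick k candidates, each maximizing the marginal gain of f."""
    chosen = []
    for _ in range(k):
        best = max((c for c in candidates if c not in chosen),
                   key=lambda c: f(chosen + [c]) - f(chosen))
        chosen.append(best)
    return chosen

# Toy stand-in objective: number of users "reached" by the chosen targets.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
f = lambda S: len(set().union(*(reach[c] for c in S)) if S else set())

print(greedy_select(list(reach), f, 2))  # → ['a', 'b']
```

For monotone submodular objectives, this greedy procedure carries the classic (1 - 1/e) approximation guarantee.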
We consider the problem of evaluating the quality of startup companies. This can be quite challenging due to the rarity of successful startup companies and the complexity of factors which impact such success. In this work we collect data on tens of thousands of startup companies, their performance, the backgrounds of their founders, and their investors. We develop a novel model for the success of a startup company based on the first passage time of a Brownian motion. The drift and diffusion of the Brownian motion associated with a startup company are a function of features based on its sector, founders, and initial investors. All features are calculated using our massive dataset. Using a Bayesian approach, we are able to obtain quantitative insights about the features of successful startup companies from our model. To test the performance of our model, we use it to build a portfolio of companies where the goal is to maximize the probability of having at least one company achieve an exit (IPO or acquisition), which we refer to as winning. This picking winners framework is very general and can be used to model many problems with low-probability, high-reward outcomes, such as pharmaceutical companies choosing drugs to develop or studios selecting movies to produce. We frame the construction of a picking winners portfolio as a combinatorial optimization problem and show that a greedy solution has strong performance guarantees. We apply the picking winners framework to the problem of choosing a portfolio of startup companies. Using our model for the exit probabilities, we are able to construct out-of-sample portfolios which achieve exit rates as high as 60%, which is nearly double that of top venture capital firms.
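A standard closed form for the first-passage probability of a drifted Brownian motion gives a feel for this kind of success model; the parameterization below (drift `mu`, volatility `sigma`, exit barrier `b`, horizon `T`) is an assumption, not the paper's calibrated model.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def exit_prob(mu, sigma, b, T):
    """P(a Brownian motion with drift mu and volatility sigma hits b > 0 by time T).
    Hypothetical parameterization, used here only to illustrate the mechanics."""
    s = sigma * math.sqrt(T)
    return (norm_cdf((mu * T - b) / s)
            + math.exp(2.0 * mu * b / sigma ** 2) * norm_cdf((-mu * T - b) / s))

# Driftless sanity check: P(max by T >= b) = 2 * Phi(-b / (sigma * sqrt(T))).
print(round(exit_prob(0.0, 1.0, 1.0, 1.0), 4))  # → 0.3173
```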
We consider the problem of identifying coordinated influence campaigns conducted by automated agents or bots in a social network. We study several different Twitter datasets which contain such campaigns and find that the bots exhibit heterophily: they interact more with humans than with each other. We use this observation to develop a probability model for the network structure and bot labels based on the Ising model from statistical physics. We present a method to find the maximum likelihood assignment of bot labels by solving a minimum cut problem. Our algorithm allows for the simultaneous detection of multiple bots that are potentially engaging in a coordinated influence campaign, in contrast to other methods that identify bots one at a time. We find that our algorithm is able to more accurately find bots than existing methods when compared to a human labeled ground truth. We also look at the content posted by the bots we identify and find that they seem to have a coordinated agenda.
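The MAP labeling problem can be made concrete on a toy Ising-style energy; at scale the paper solves it via a minimum-cut reduction, but exhaustive search over a tiny instance shows the objective. The unary scores `theta` and the heterophily penalty are illustrative assumptions, not the authors' exact parameterization.

```python
from itertools import product

# Toy MAP bot labeling for an Ising-style energy by exhaustive search.
edges = [(0, 1), (1, 2), (2, 3)]   # who interacts with whom
theta = [1.2, -0.5, 1.0, -0.8]    # unary evidence: positive favors "bot"

def energy(labels):
    e = -sum(theta[i] * labels[i] for i in range(len(labels)))
    # heterophily: interacting pairs prefer *different* labels
    e += sum(0.7 * (labels[u] == labels[v]) for u, v in edges)
    return e

best = min(product([0, 1], repeat=4), key=energy)
print(best)  # → (1, 0, 1, 0): alternating bot/human labels along the chain
```

Because this pairwise energy is submodular in the right sense, the same minimizer can be recovered in polynomial time as a minimum s-t cut, which is what makes the approach scale.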
In events such as natural disasters, terrorist attacks, or war zones, one can gain critical situational awareness by monitoring what people on the ground are saying in social media. But how does one build a set of users in a specific location from scratch? In “Building a Location-Based Set of Social Media Users,” Christopher Marks and Tauhid Zaman present an algorithm to do just this. The algorithm starts with a small set of seed users in the location and then grows this set using an “expand–classify” approach. They apply the algorithm to diverse regions ranging from South America to the Philippines and in a few hours can collect tens of thousands of Twitter users in the target locations. The algorithm is language agnostic, making it especially useful for anyone trying to gain situational awareness in foreign countries.
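A minimal skeleton of the expand–classify loop might look like the following; `get_neighbors`, `in_location`, and the toy graph are hypothetical stand-ins for the Twitter API calls and the location classifier.

```python
# Hypothetical skeleton of the expand–classify loop.
def build_location_set(seeds, get_neighbors, in_location, rounds=3):
    members, frontier = set(seeds), set(seeds)
    for _ in range(rounds):
        candidates = set()
        for u in frontier:                      # expand
            candidates |= set(get_neighbors(u))
        frontier = {u for u in candidates - members if in_location(u)}  # classify
        members |= frontier                     # grow the located set
    return members

# Toy graph: user 5 is outside the location, so 6 is never reached through it.
graph = {1: [2, 3], 2: [4], 3: [5], 4: [], 5: [6], 6: []}
print(build_location_set([1], lambda u: graph[u], lambda u: u != 5))  # → {1, 2, 3, 4}
```

Note that the classifier gates the expansion: users judged outside the location are not expanded, which keeps the crawl focused on the target region.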
Online extremists in social networks pose a new form of threat to the general public. These extremists range from cyberbullies who harass innocent users to terrorist organizations such as the Islamic State of Iraq and Syria (ISIS) that use social networks to recruit and incite violence. Currently, social networks suspend the accounts of such extremists in response to user complaints. The challenge is that these extremist users simply create new accounts and continue their activities. In this work we present a new set of operational capabilities to deal with the threat posed by online extremists in social networks. Using data from several hundred thousand extremist accounts on Twitter, we develop a behavioral model for these users, in particular what their accounts look like and who they connect with. This model is used to identify new extremist accounts by predicting if they will be suspended for extremist activity. We also use this model to track existing extremist users as they create new accounts by identifying if two accounts belong to the same user. Finally, we present a model for searching the social network to efficiently find suspended users' new accounts based on a variant of the classic Pólya's urn setup. We find a simple characterization of the optimal search policy for this model under fairly general conditions. Our urn model and main theoretical results generalize easily to search problems in other fields.
Springer Proceedings in Business and Economics, 2022
In the United States, medical responses by fire departments over the last four decades increased by 367%. This has made it critical for decision makers in emergency response departments to use existing resources efficiently. In this paper, we model the ambulance dispatch problem as an average-cost Markov decision process and present a policy iteration approach to find an optimal dispatch policy. We then propose an alternative formulation using post-decision states that is shown to be mathematically equivalent to the original model, but with a much smaller state space. We present a temporal difference learning approach to the dispatch problem based on the post-decision states. In our numerical experiments, we show that our obtained temporal-difference policy outperforms the benchmark myopic policy. Our findings suggest that emergency response departments can improve their performance with minimal to no cost.
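A minimal average-cost TD(0) sketch on a toy two-state chain conveys the flavor of the temporal difference update; the chain, costs, and step sizes are illustrative and far simpler than the paper's post-decision-state formulation.

```python
# Average-cost TD(0) on a toy deterministic two-state chain: state A costs 1
# and moves to B; state B costs 3 and moves to A. The true average cost is 2.
def td_average_cost(steps=20_000, alpha=0.01, beta=0.001):
    V = {"A": 0.0, "B": 0.0}   # differential value estimates
    eta = 0.0                  # estimate of the long-run average cost
    s = "A"
    for _ in range(steps):
        s2 = "B" if s == "A" else "A"          # fixed policy's transition
        cost = 1.0 if s == "A" else 3.0
        delta = cost - eta + V[s2] - V[s]      # differential TD error
        V[s] += alpha * delta
        eta += beta * delta
        s = s2
    return eta

print(round(td_average_cost(), 2))  # converges toward the true average cost, 2
```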
Communities in social networks or graphs are sets of well-connected, overlapping vertices. The effectiveness of a community detection algorithm is determined by accuracy in finding the ground-truth communities and ability to scale with the size of the data. In this work, we provide three contributions. First, we show that a popular measure of accuracy known as the F1 score, which is between 0 and 1, with 1 being perfect detection, has an information lower bound of 0.5. We provide a trivial algorithm that produces communities with an F1 score of 0.5 for any graph! Somewhat surprisingly, we find that popular algorithms such as modularity optimization, BigClam and CESNA have F1 scores less than 0.5 for the popular IMDB graph. To rectify this, as the second contribution we propose a generative model for community formation, the sequential community graph, which is motivated by the formation of social networks. Third, motivated by our generative model, we propose the leader-follower algorithm...
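One common convention for the community F1 score (an assumption here; papers vary in the exact matching rule) averages each community's best-match F1 in both directions:

```python
def f1(a, b):
    """F1 overlap between two communities (vertex sets)."""
    inter = len(a & b)
    if inter == 0:
        return 0.0
    p, r = inter / len(a), inter / len(b)
    return 2 * p * r / (p + r)

def avg_f1(detected, truth):
    """Symmetric average-F1: match each community to its best counterpart
    in the other collection, then average the two directions."""
    d2t = sum(max(f1(d, t) for t in truth) for d in detected) / len(detected)
    t2d = sum(max(f1(t, d) for d in detected) for t in truth) / len(truth)
    return (d2t + t2d) / 2

truth = [{1, 2, 3}, {3, 4, 5}]
print(round(avg_f1([{1, 2, 3, 4, 5}], truth), 2))  # → 0.75
```

The example hints at the lower-bound phenomenon: even a crude detected community can score well above zero under this metric, which is why a 0.5 floor is achievable trivially.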
Bots Impact Opinions in Social Networks: Let’s Measure How Much. There is a serious threat posed by bots that try to manipulate opinions in social networks. In “Assessing the Impact of Bots on Social Networks,” Nicolas Guenon des Mesnards, David Scott Hunter, Zakaria el Hjouiji, and Tauhid Zaman present a new set of operational capabilities to detect these bots and measure their impact. They developed an algorithm based on the Ising model from statistical physics to find coordinating gangs of bots in social networks. They then created an algorithm based on opinion dynamics models to quantify the impact that bots have on opinions in a social network. They applied their algorithms to a variety of real social network data sets. They found that, for topics such as Brexit, the bots had little impact, whereas for topics such as the U.S. presidential debate and the Gilets Jaunes protests in France, the bots had a significant impact.
We present an integer programming approach to winning daily fantasy sports hockey contests which have top-heavy payoff structures (i.e., most of the winnings go to the top-ranked entries). Our approach incorporates publicly available predictions on player and team performance into a series of integer programming problems that compute optimal lineups to enter into the contests. We find that the lineups produced by our approach perform well in practice and are even able to come in first place in contests with thousands of entries. We also show through simulations how the profit margin varies as a function of the number of lineups entered and the points distribution of the entered lineups. Studies on real contest data show how different versions of our integer programming approach perform in practice. We find two major insights for winning top-heavy contests. First, each lineup entered should be constructed in a way to maximize its individual mean and, more importantly, its variance. Second...
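The variance-seeking insight can be illustrated with a toy stand-in for the integer program: choose a three-player lineup under a salary cap, scoring each lineup by mean plus a multiple of variance. All player numbers and the weight `LAM` are made up for illustration, and independence is assumed so lineup variances add.

```python
from itertools import combinations

# Toy stand-in for the lineup integer program (brute force over 3-subsets).
players = {  # name: (salary, mean_points, variance)
    "A": (9, 20, 30), "B": (8, 18, 50), "C": (7, 15, 10),
    "D": (6, 12, 40), "E": (5, 10, 60),
}
CAP, LAM = 21, 0.2

def score(lineup):
    return (sum(players[p][1] for p in lineup)
            + LAM * sum(players[p][2] for p in lineup))

feasible = [c for c in combinations(players, 3)
            if sum(players[p][0] for p in c) <= CAP]
print(max(feasible, key=score))  # → ('B', 'D', 'E'): cheap, high-variance picks
```

With `LAM = 0` the same search prefers the high-mean lineup; rewarding variance shifts it toward volatile players, which is exactly what top-heavy payoffs favor.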
We consider the problem of finding a specific target individual hiding in a social network. We propose a method for network vertex search that looks for the target vertex by sequentially examining the neighbors of a set of "known" vertices to which the target vertex may have connected. The objective is to find the target vertex as quickly as possible from amongst the neighbors of the known vertices. We model this type of search as successively drawing marbles from a set of urns, where each urn represents one of the known vertices and the marbles in each urn represent the respective vertex's neighbors. Using a dynamic programming approach, we analyze this model and show that there is always an optimal "block" policy, in which all of the neighbors of a known vertex are examined before moving on to another vertex. Surprisingly, this block policy result holds for arbitrary dependencies in the connection probabilities of the target vertex and known vertices. Furthermore...
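Under simplifying assumptions (urns independently contain the target, with the target uniformly placed among an urn's marbles — assumptions made here for illustration, not the paper's general setting), the expected draws of any block policy are cheap to evaluate, so small instances can be solved by enumerating block orders:

```python
from itertools import permutations

# Expected number of draws for a "block" policy that empties each urn in turn.
urns = {"u": (4, 0.5), "v": (10, 0.7), "w": (2, 0.2)}  # name: (marbles, P(has target))

def expected_draws(order):
    e, none_yet, spent = 0.0, 1.0, 0
    for name in order:
        n, p = urns[name]
        e += none_yet * p * (spent + (n + 1) / 2)  # target found in this urn
        none_yet *= 1 - p
        spent += n
    return e + none_yet * spent                    # target absent everywhere

print(min(permutations(urns), key=expected_draws))  # → ('u', 'w', 'v')
```

Note the best order is not simply by containment probability: the large urn `v` is searched last despite having the highest probability, because emptying it first is expensive.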
We consider the problem of selecting a portfolio of items of fixed cardinality where the goal is to have at least one item achieve a high return, which we refer to as winning. This framework is very general and can be used to model a variety of problems, such as pharmaceutical companies choosing drugs to develop, studios selecting movies to produce, or our focus in this work, which is venture capital firms picking startup companies in which to invest. We first frame the construction of a portfolio as a combinatorial optimization problem with objective function given by the probability of having at least one item in the selected portfolio win. We show that a greedy solution to this problem has strong performance guarantees, even under arbitrary correlations between the items. We apply the picking winners framework to the problem of choosing a portfolio of startups to invest in. This is a relevant problem due to recent policy changes in the United States which have greatly expanded th...
Choice decisions made by users of online applications can suffer from biases due to the users' level of engagement. For instance, low-engagement users may make random choices with no concern for the quality of items offered. This biased choice data can corrupt estimates of user preferences for items. However, one can correct for these biases if additional behavioral data is utilized. To do this, we construct a new choice engagement time model which captures the impact of user engagement on choice decisions and response times associated with these choice decisions. Response times are the behavioral data we choose because they are easily measured by online applications and reveal information about user engagement. To test our model, we conduct online polls with subject populations that have different levels of engagement and measure their choice decisions and response times. We have two main empirical findings. First, choice decisions and response times are correlated, with strong p...
We consider the problem of selecting a portfolio of entries of fixed cardinality for a winner-take-all contest such that the probability of at least one entry winning is maximized. This framework is very general and can be used to model a variety of problems, such as movie studios selecting movies to produce, drug companies choosing drugs to develop, or venture capital firms picking start-up companies in which to invest. We model this as a combinatorial optimization problem with a submodular objective function, which is the probability of winning. We then show that the objective function can be approximated using only pairwise marginal probabilities of the entries winning when there is a certain structure on their joint distribution. We consider a model where the entries are jointly Gaussian random variables and present a closed-form approximation to the objective function. We then consider a model where the entries are given by sums of constrained resources and present a greedy int...
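The effect of correlation on the probability that at least one entry wins can be checked by Monte Carlo (an illustrative sketch, not the paper's closed-form approximation): with equal means and variances, positive correlation lowers the chance that at least one of two entries clears the threshold.

```python
import math, random

# Monte Carlo estimate of P(at least one of two correlated Gaussian entries
# exceeds threshold t). All parameters are illustrative.
def p_win(mu, sigma, rho, t, n=200_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mu + sigma * z1
        y = mu + sigma * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)  # Cholesky
        hits += (x > t) or (y > t)
    return hits / n

# Positive correlation lowers the chance that at least one entry wins.
print(p_win(0, 1, 0.9, 1.5), p_win(0, 1, 0.0, 1.5))
```

In the independent case the answer is 1 - (1 - p)^2 with p = P(X > 1.5) ≈ 0.0668, i.e. about 0.129, which the simulation recovers; the correlated pair wins less often, which is why diversified (less correlated) portfolios are preferred.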
We consider the problem of optimizing the placement of stubborn agents in a social network in order to maximally influence the population. We assume individuals in a directed social network each have a latent opinion that evolves over time in response to social media posts by their neighbors. The individuals randomly communicate noisy versions of their latent opinion to their neighbors, causing them to update their opinions using a time-varying update rule that has them become more stubborn with time and be less affected by new posts. The noisy communicated opinion and dynamic update rule are novel components of our model and they reflect realistic behaviors observed in many psychological studies. We prove that under fairly general conditions on the stubbornness rates of the individuals, the opinions converge to an equilibrium in the presence of stubborn agents. What is surprising about this result is that the equilibrium condition depends only upon the network structure and the identi...
We consider the problem of selecting a portfolio of entries of fixed cardinality for contests with top-heavy payoff structures, i.e., most of the winnings go to the top-ranked entries. This framework is general and can be used to model a variety of problems, such as movie studios selecting movies to produce, venture capital firms picking start-up companies to invest in, or individuals selecting lineups for daily fantasy sports contests, which is the example we focus on here. We model the portfolio selection task as a combinatorial optimization problem with a submodular objective function, which is given by the probability of at least one entry winning. We then show that this probability can be approximated using only pairwise marginal probabilities of the entries winning when there is a certain structure on their joint distribution. We consider a model where the entries are jointly Gaussian random variables and present a closed-form approximation to the objective function. Building o...
Online extremists’ use of social media poses a new form of threat to the general public. These extremists range from cyberbullies to terrorist organizations. Social media providers often suspend the extremists’ accounts in response to user complaints. However, extremist users can simply create new accounts and continue their activities. In this work we present a new set of operational capabilities to address the threat posed by online extremists in social networks. We use thousands of Twitter accounts related to the Islamic State in Iraq and Syria (ISIS) to develop behavioral models for these users—in particular, what their accounts look like and with whom they connect. We use these models to track existing extremist users by identifying pairs of accounts belonging to the same user. We then present a model for efficiently searching the social network to find suspended users’ new accounts based on a variant of the classic Pólya’s urn setup. We find a simple characterization of the op...
Food adulteration poses a serious threat to public health. The U.S. Food and Drug Administration (FDA) has a major role in maintaining food safety in the U.S. through various activities including sampling of imported shipments and site inspections. However, resource constraints significantly limit these regulatory activities, making it essential to allocate resources in a risk-based manner. In this paper, we develop a data-driven, risk analytics approach to identify high-risk consignees (firms that import food). We apply the approach to consignees of shrimp, a product subject to frequent food safety problems. Leveraging supply chain analytics based on shipment and FDA regulatory information, we construct network features that model risk, specifically predicting which consignees are likely to fail FDA site inspections. Our main findings are that supply chain network complexity and website network engagement are predictive of risk. For instance, we measure the diversity of a firm's product portfolio using graph modularity on a product network graph, and find that firms with higher product diversity are more likely to fail site inspections. The results suggest that network-based risk analytics could significantly improve the effectiveness of regulatory activities related to food supply chains, and increase the "hit rate" for failed site inspections by 23.6%.
We consider the problem of detecting the source of a rumor which has spread in a network using only observations about which set of nodes are infected with the rumor and with no information as to when these nodes became infected. In a recent work (Shah and Zaman 2010), this rumor source detection problem was introduced and studied. The authors proposed the graph score function rumor centrality as an estimator for detecting the source. They established it to be the maximum likelihood estimator with respect to the popular Susceptible Infected (SI) model with exponential spreading times for regular trees. They showed that as the size of the infected graph increases, for a path graph (2-regular tree), the probability of source detection goes to 0, and for d-regular trees with d ≥ 3 the probability of detection, say αd, remains bounded away from 0 and is less than 1/2. However, their results stop short of providing insights for the performance of the rumor centrality estimator in more gen...
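For trees, rumor centrality has the product form R(v) = n! / Π_u T_u^v, where T_u^v is the size of the subtree rooted at u when the tree is rooted at v; a direct sketch:

```python
import math

# Rumor centrality on a tree: n! divided by the product of all subtree sizes
# when the tree is rooted at v. The argmax over v is the source estimate.
def rumor_centrality(adj, v):
    n, sizes = len(adj), []
    def subtree(u, parent):
        s = 1
        for w in adj[u]:
            if w != parent:
                s += subtree(w, u)
        sizes.append(s)
        return s
    subtree(v, None)
    return math.factorial(n) // math.prod(sizes)

# Star graph: the center is the most likely source.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print({v: rumor_centrality(adj, v) for v in adj})  # → {0: 6, 1: 2, 2: 2, 3: 2}
```

The score counts the number of infection orderings consistent with v being the source, which is why the center of the star dominates the leaves.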
A condensation phenomenon is often observed in social networks such as Twitter, where one "superstar" vertex gains a positive fraction of the edges while the remaining empirical degree distribution still exhibits a power law tail. We formulate a mathematically tractable model for this phenomenon that provides a better fit to empirical data than the standard preferential attachment model across an array of networks observed in Twitter. Using embeddings in an equivalent continuous time version of the process, and adapting techniques from the stable age-distribution theory of branching processes, we prove limit results for the proportion of edges that condense around the superstar, the degree distribution of the remaining vertices, the maximal non-superstar degree asymptotics, and the height of these random trees in the large network limit.
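A quick simulation conveys the condensation effect; the attachment rule below (each arriving vertex links to the superstar with probability p, otherwise preferentially among the rest) is a common formulation and an assumption here, not necessarily the paper's exact model.

```python
import random

# Simulation sketch of a superstar random tree grown one vertex at a time.
def superstar_fraction(n, p, seed=1):
    rng = random.Random(seed)
    superstar_deg = 1   # vertex 1 starts attached to the superstar
    stubs = [1]         # non-superstar vertex ids, repeated once per degree
    for t in range(2, n + 1):
        if rng.random() < p:
            superstar_deg += 1
        else:
            stubs.append(rng.choice(stubs))  # preferential endpoint
        stubs.append(t)                      # arriving vertex has degree 1
    return superstar_deg / n                 # fraction of edges at the superstar

# The superstar's edge fraction concentrates near p in the large-n limit.
print(superstar_fraction(50_000, 0.6))
```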
We consider the problem of optimizing the placement of stubborn agents in a social network in ord... more We consider the problem of optimizing the placement of stubborn agents in a social network in order to maximally in uence the population. We assume the network contains stubborn users whose opinions do not change, and nonstubborn users who can be persuaded. We further assume the opinions in the network are in an equilibrium that is common to many opinion dynamics models, including the well-known DeGroot model. We develop a discrete optimization formulation for the problem of maximally shifting the equilibrium opinions in a network by targeting users with stubborn agents. The opinion objective functions we consider are the opinion mean, the opinion variance, and the number of individuals whose opinion exceeds a xed threshold. We show that the mean opinion is a monotone submodular function, allowing us to nd a good solution using a greedy algorithm. We nd that on real social networks in Twitter consisting of tens of thousands of individuals, a small number of stubborn agents can non-trivially in uence the equilibrium opinions. Furthermore, we show that our greedy algorithm outperforms several common benchmarks. We then propose an opinion dynamics model where users communicate noisy versions of their opinions, communications are random, users grow more stubborn with time, and there is heterogeneity is how users' stubbornness increases. We prove that under fairly general conditions on the stubbornness rates of the individuals, the opinions in this model converge to the same equilibrium as the DeGroot model, despite the randomness and user heterogeneity in the model.
We consider the problem of evaluating the quality of startup companies. This can be quite challen... more We consider the problem of evaluating the quality of startup companies. This can be quite challenging due to the rarity of successful startup companies and the complexity of factors which impact such success. In this work we collect data on tens of thousands of startup companies, their performance, the backgrounds of their founders, and their investors. We develop a novel model for the success of a startup company based on the first passage time of a Brownian motion. The drift and diffusion of the Brownian motion associated with a startup company are a function of features based its sector, founders, and initial investors. All features are calculated using our massive dataset. Using a Bayesian approach, we are able to obtain quantitative insights about the features of successful startup companies from our model. To test the performance of our model, we use it to build a portfolio of companies where the goal is to maximize the probability of having at least one company achieve an exit (IPO or acquisition), which we refer to as winning. This picking winners framework is very general and can be used to model many problems with low probability, high reward outcomes, such as pharmaceutical companies choosing drugs to develop or studios selecting movies to produce. We frame the construction of a picking winners portfolio as a combinatorial optimization problem and show that a greedy solution has strong performance guarantees. We apply the picking winners framework to the problem of choosing a portfolio of startup companies. Using our model for the exit probabilities, we are able to construct out of sample portfolios which achieve exit rates as high as 60%, which is nearly double that of top venture capital firms.
We consider the problem of identifying coordinated influence campaigns conducted by automated age... more We consider the problem of identifying coordinated influence campaigns conducted by automated agents or bots in a social network. We study several different Twitter datasets which contain such campaigns and find that the bots exhibit heterophily-they interact more with humans than with each other. We use this observation to develop a probability model for the network structure and bot labels based on the Ising model from statistical physics. We present a method to find the maximum likelihood assignment of bot labels by solving a minimum cut problem. Our algorithm allows for the simultaneous detection of multiple bots that are potentially engaging in a coordinated influence campaign, in contrast to other methods that identify bots one at a time. We find that our algorithm is able to more accurately find bots than existing methods when compared to a human labeled ground truth. We also look at the content posted by the bots we identify and find that they seem to have a coordinated agenda.
In events such as natural disasters, terrorist attacks, or war zones, one can gain critical situa... more In events such as natural disasters, terrorist attacks, or war zones, one can gain critical situational awareness by monitoring what people on the ground are saying in social media. But how does one build a set of users in a specific location from scratch? In “Building a Location-Based Set of Social Media Users,” Christopher Marks and Tauhid Zaman present an algorithm to do just this. The algorithm starts with a small set of seed users in the location and then grows this set using an “expand–classify” approach. They apply the algorithm to diverse regions ranging from South America to the Philippines and in a few hours can collect tens of thousands of Twitter users in the target locations. The algorithm is language agnostic, making it especially useful for anyone trying to gain situational awareness in foreign countries.
Online extremists in social networks pose a new form of threat to the general public. These extre... more Online extremists in social networks pose a new form of threat to the general public. These extremists range from cyberbullies who harass innocent users to terrorist organizations such as the Islamic State of Iraq and Syria (ISIS) that use social networks to recruit and incite violence. Currently social networks suspend the accounts of such extremists in response to user complaints. The challenge is that these extremist users simply create new accounts and continue their activities. In this work we present a new set of operational capabilities to deal with the threat posed by online extremists in social networks. Using data from several hundred thousand extremist accounts on Twitter, we develop a behavioral model for these users, in particular what their accounts look like and who they connect with. This model is used to identify new extremist accounts by predicting if they will be suspended for extremist activity. We also use this model to track existing extremist users as they create new accounts by identifying if two accounts belong to the same user. Finally, we present a model for searching the social network to efficiently find suspended users' new accounts based on a variant of the classic Polya's urn setup. We find a simple characterization of the optimal search policy for this model under fairly general conditions. Our urn model and main theoretical results generalize easily to search problems in other fields.
Springer Proceedings in Business and Economics, 2022
In the United States, medical responses by fire departments over the last four decades increased ... more In the United States, medical responses by fire departments over the last four decades increased by 367%. This had made it critical to decision makers in emergency response departments that existing resources are efficiently used. In this paper, we model the ambulance dispatch problem as an average-cost Markov decision process and present a policy iteration approach to find an optimal dispatch policy. We then propose an alternative formulation using post-decision states that is shown to be mathematically equivalent to the original model, but with a much smaller state space. We present a temporal difference learning approach to the dispatch problem based on the post-decision states. In our numerical experiments, we show that our obtained temporal-difference policy outperforms the benchmark myopic policy. Our findings suggest that emergency response departments can improve their performance with minimal to no cost.
Communities in social networks or graphs are sets of well-connected, overlapping vertices. The ef... more Communities in social networks or graphs are sets of well-connected, overlapping vertices. The effectiveness of a community detection algorithm is determined by accuracy in finding the ground-truth communities and ability to scale with the size of the data. In this work, we provide three contributions. First, we show that a popular measure of accuracy known as the F1 score, which is between 0 and 1, with 1 being perfect detection, has an information lower bound is 0.5. We provide a trivial algorithm that produces communities with an F1 score of 0.5 for any graph! Somewhat surprisingly, we find that popular algorithms such as modularity optimization, BigClam and CESNA have F1 scores less than 0.5 for the popular IMDB graph. To rectify this, as the second contribution we propose a generative model for community formation, the sequential community graph, which is motivated by the formation of social networks. Third, motivated by our generative model, we propose the leader-follower algo...
Bots Impact Opinions in Social Networks: Let’s Measure How Much. There is a serious threat posed by bots that try to manipulate opinions in social networks. In “Assessing the Impact of Bots on Social Networks,” Nicolas Guenon des Mesnards, David Scott Hunter, Zakaria el Hjouiji, and Tauhid Zaman present a new set of operational capabilities to detect these bots and measure their impact. They developed an algorithm based on the Ising model from statistical physics to find coordinating gangs of bots in social networks. They then created an algorithm based on opinion dynamics models to quantify the impact that bots have on opinions in a social network. They applied their algorithms to a variety of real social network data sets. They found that, for topics such as Brexit, the bots had little impact, whereas for topics such as the U.S. presidential debate and the Gilets Jaunes protests in France, the bots had a significant impact.
We present an integer programming approach to winning daily fantasy sports hockey contests which have top-heavy payoff structures (i.e., most of the winnings go to the top-ranked entries). Our approach incorporates publicly available predictions on player and team performance into a series of integer programming problems that compute optimal lineups to enter into the contests. We find that the lineups produced by our approach perform well in practice and are even able to come in first place in contests with thousands of entries. We also show through simulations how the profit margin varies as a function of the number of lineups entered and the points distribution of the entered lineups. Studies on real contest data show how different versions of our integer programming approach perform in practice. We find two major insights for winning top-heavy contests. First, each lineup entered should be constructed to maximize its individual mean and, more importantly, its variance. Second...
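The mean/variance insight can be illustrated with a brute-force stand-in for the integer program. The paper solves an actual IP; exhaustive search over a tiny instance is shown here only for clarity, and the player tuples and parameter names are hypothetical.

```python
from itertools import combinations

def best_lineup(players, cap, size, lam=0.5):
    """Exhaustive search for the lineup maximizing mean + lam * variance
    subject to a salary cap (toy stand-in for the integer program).

    players: list of (name, salary, mean_points, variance) tuples, with
    player scores assumed independent so lineup variance is the sum.
    """
    best, best_val = None, float("-inf")
    for combo in combinations(players, size):
        if sum(p[1] for p in combo) > cap:
            continue  # violates the salary cap
        val = sum(p[2] for p in combo) + lam * sum(p[3] for p in combo)
        if val > best_val:
            best, best_val = combo, val
    return best, best_val
```

With a positive weight `lam`, a high-variance player can displace a slightly higher-mean, low-variance one, which is exactly the behavior that helps in top-heavy contests.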
We consider the problem of finding a specific target individual hiding in a social network. We propose a method for network vertex search that looks for the target vertex by sequentially examining the neighbors of a set of "known" vertices to which the target vertex may have connected. The objective is to find the target vertex as quickly as possible from amongst the neighbors of the known vertices. We model this type of search as successively drawing marbles from a set of urns, where each urn represents one of the known vertices and the marbles in each urn represent the respective vertex's neighbors. Using a dynamic programming approach, we analyze this model and show that there is always an optimal "block" policy, in which all of the neighbors of a known vertex are examined before moving on to another vertex. Surprisingly, this block policy result holds for arbitrary dependencies in the connection probabilities of the target vertex and known vertices. Furth...
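A toy calculation of the expected search cost of one block policy in the urn model might look as follows. This is our own simplified version of the setup: at most one urn contains the target, and the target is uniformly placed among that urn's marbles.

```python
def expected_block_cost(urns, order):
    """Expected number of draws for a block policy in a toy urn-search model.

    urns:  list of (p, n) pairs, where p is the probability the target is in
           that urn (at most one target overall) and n is its marble count.
    order: permutation of urn indices giving the block search order.
    """
    cost, drawn = 0.0, 0
    miss = 1.0 - sum(p for p, _ in urns)
    for i in order:
        p, n = urns[i]
        # if the target is here, it is found midway through the urn on average
        cost += p * (drawn + (n + 1) / 2)
        drawn += n
    cost += miss * drawn  # target absent: every marble gets drawn
    return cost
```

Comparing `expected_block_cost` across the orderings of a small instance shows why the search order over urns matters even when the block structure is fixed.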
We consider the problem of selecting a portfolio of items of fixed cardinality where the goal is to have at least one item achieve a high return, which we refer to as winning. This framework is very general and can be used to model a variety of problems, such as pharmaceutical companies choosing drugs to develop, studios selecting movies to produce, or our focus in this work, which is venture capital firms picking startup companies in which to invest. We first frame the construction of a portfolio as a combinatorial optimization problem with objective function given by the probability of having at least one item in the selected portfolio win. We show that a greedy solution to this problem has strong performance guarantees, even under arbitrary correlations between the items. We apply the picking winners framework to the problem of choosing a portfolio of startups to invest in. This is a relevant problem due to recent policy changes in the United States which have greatly expanded th...
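The greedy construction has a particularly simple form when item returns are independent. The paper's guarantees cover arbitrary correlations; independence is assumed here only to keep the sketch short, and the function name is our own.

```python
def greedy_portfolio(win_probs, k):
    """Greedily maximize P(at least one pick wins), assuming independent items.

    win_probs: list of per-item win probabilities.
    Returns the chosen indices and the portfolio's win probability.
    """
    chosen, p_none = [], 1.0
    remaining = dict(enumerate(win_probs))
    for _ in range(k):
        # marginal gain of item i is p_none * p_i: the reduction in P(no winner)
        best = max(remaining, key=lambda i: p_none * remaining[i])
        chosen.append(best)
        p_none *= 1.0 - remaining.pop(best)
    return chosen, 1.0 - p_none
```

Because the objective 1 - prod(1 - p_i) is monotone and submodular, this greedy routine inherits the classic (1 - 1/e) approximation guarantee for cardinality-constrained submodular maximization.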
Choice decisions made by users of online applications can suffer from biases due to the users' level of engagement. For instance, low engagement users may make random choices with no concern for the quality of items offered. This biased choice data can corrupt estimates of user preferences for items. However, one can correct for these biases if additional behavioral data is utilized. To do this we construct a new choice engagement time model which captures the impact of user engagement on choice decisions and response times associated with these choice decisions. Response times are the behavioral data we choose because they are easily measured by online applications and reveal information about user engagement. To test our model we conduct online polls with subject populations that have different levels of engagement and measure their choice decisions and response times. We have two main empirical findings. First, choice decisions and response times are correlated, with strong p...
We consider the problem of selecting a portfolio of entries of fixed cardinality for a winner-take-all contest such that the probability of at least one entry winning is maximized. This framework is very general and can be used to model a variety of problems, such as movie studios selecting movies to produce, drug companies choosing drugs to develop, or venture capital firms picking start-up companies in which to invest. We model this as a combinatorial optimization problem with a submodular objective function, which is the probability of winning. We then show that the objective function can be approximated using only pairwise marginal probabilities of the entries winning when there is a certain structure on their joint distribution. We consider a model where the entries are jointly Gaussian random variables and present a closed form approximation to the objective function. We then consider a model where the entries are given by sums of constrained resources and present a greedy int...
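A one-factor construction is a standard way to simulate equicorrelated jointly Gaussian entries. The Monte Carlo estimate below of the probability that at least one entry clears a threshold is our own illustration of the quantity being optimized, not the paper's closed-form approximation.

```python
import math
import random

def prob_at_least_one(n, rho, thresh, trials=20000, seed=1):
    """Monte Carlo estimate of P(max of n equicorrelated standard Gaussians
    exceeds thresh), using the one-factor model
    X_i = sqrt(rho) * Z + sqrt(1 - rho) * E_i with Z, E_i iid N(0, 1)."""
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1 - rho)
    hits = 0
    for _ in range(trials):
        z = rng.gauss(0, 1)  # common factor shared by all entries
        if any(a * z + b * rng.gauss(0, 1) > thresh for _ in range(n)):
            hits += 1
    return hits / trials
```

Sweeping `rho` in this sketch shows the qualitative effect the closed-form approximation captures: higher correlation among entries lowers the probability that at least one clears a high threshold.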
We consider the problem of optimizing the placement of stubborn agents in a social network in order to maximally influence the population. We assume individuals in a directed social network each have a latent opinion that evolves over time in response to social media posts by their neighbors. The individuals randomly communicate noisy versions of their latent opinion to their neighbors, causing them to update their opinions using a time-varying update rule that has them become more stubborn with time and be less affected by new posts. The noisy communicated opinion and dynamic update rule are novel components of our model and they reflect realistic behaviors observed in many psychological studies. We prove that under fairly general conditions on the stubbornness rates of the individuals, the opinions converge to an equilibrium in the presence of stubborn agents. What is surprising about this result is that the equilibrium condition depends only upon the network structure and the identi...
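A minimal simulation of dynamics in this spirit, with random noisy posts and a step size that shrinks as users grow more stubborn, might look as follows. This is a sketch under our own simplifying assumptions (uniform posting, 1/(n+1) step sizes, Gaussian post noise), not the paper's exact update rule.

```python
import random

def simulate_opinions(adj, stubborn, init, steps=5000, seed=0):
    """Toy opinion dynamics: each step a random user posts a noisy version of
    its latent opinion; each non-stubborn follower averages it in with a step
    size that decays as 1/(updates + 1), i.e. growing stubbornness.

    adj:      dict poster -> list of followers
    stubborn: set of users whose opinions never change
    init:     dict user -> initial opinion
    """
    x = dict(init)
    n_updates = {u: 0 for u in init}
    rng = random.Random(seed)
    users = list(init)
    for _ in range(steps):
        poster = rng.choice(users)
        post = x[poster] + rng.gauss(0, 0.1)  # noisy communicated opinion
        for f in adj.get(poster, []):
            if f in stubborn:
                continue
            n_updates[f] += 1
            w = 1.0 / (n_updates[f] + 1)      # step size shrinks over time
            x[f] = (1 - w) * x[f] + w * post
    return x
```

With two stubborn agents at opinions 0 and 1 both followed by one persuadable user, the persuadable user's opinion settles near the midpoint despite the noise, mirroring the convergence result stated in the abstract.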
We consider the problem of selecting a portfolio of entries of fixed cardinality for contests with top-heavy payoff structures, i.e., most of the winnings go to the top-ranked entries. This framework is general and can be used to model a variety of problems, such as movie studios selecting movies to produce, venture capital firms picking start-up companies to invest in, or individuals selecting lineups for daily fantasy sports contests, which is the example we focus on here. We model the portfolio selection task as a combinatorial optimization problem with a submodular objective function, which is given by the probability of at least one entry winning. We then show that this probability can be approximated using only pairwise marginal probabilities of the entries winning when there is a certain structure on their joint distribution. We consider a model where the entries are jointly Gaussian random variables and present a closed form approximation to the objective function. Building o...
Online extremists’ use of social media poses a new form of threat to the general public. These extremists range from cyberbullies to terrorist organizations. Social media providers often suspend the extremists’ accounts in response to user complaints. However, extremist users can simply create new accounts and continue their activities. In this work we present a new set of operational capabilities to address the threat posed by online extremists in social networks. We use thousands of Twitter accounts related to the Islamic State in Iraq and Syria (ISIS) to develop behavioral models for these users—in particular, what their accounts look like and with whom they connect. We use these models to track existing extremist users by identifying pairs of accounts belonging to the same user. We then present a model for efficiently searching the social network to find suspended users’ new accounts based on a variant of the classic Pólya’s urn setup. We find a simple characterization of the op...
Food adulteration poses a serious threat to public health. The U.S. Food and Drug Administration (FDA) has a major role in maintaining food safety in the U.S. through various activities including sampling of imported shipments and site inspections. However, resource constraints significantly limit these regulatory activities, making it essential to allocate resources in a risk-based manner. In this paper, we develop a data-driven, risk analytics approach to identify high-risk consignees (firms that import food). We apply the approach to consignees of shrimp, a product subject to frequent food safety problems. Leveraging supply chain analytics based on shipment and FDA regulatory information, we construct network features that model risk, specifically predicting which consignees are likely to fail FDA site inspections. Our main findings are that supply chain network complexity and website network engagement are predictive of risk. For instance, we measure the diversity of a firm's product portfolio using graph modularity on a product network graph, and find that firms with higher product diversity are more likely to fail site inspections. The results suggest that network-based risk analytics could significantly improve the effectiveness of regulatory activities related to food supply chains, and increase the "hit rate" for failed site inspections by 23.6%.
We consider the problem of detecting the source of a rumor which has spread in a network using only observations about which set of nodes are infected with the rumor and with no information as to when these nodes became infected. In a recent work (Shah and Zaman 2010), this rumor source detection problem was introduced and studied. The authors proposed the graph score function rumor centrality as an estimator for detecting the source. They established it to be the maximum likelihood estimator with respect to the popular Susceptible Infected (SI) model with exponential spreading times for regular trees. They showed that as the size of the infected graph increases, for a path graph (2-regular tree), the probability of source detection goes to 0, and for d-regular trees with d ≥ 3 the probability of detection, say αd, remains bounded away from 0 and is less than 1/2. However, their results stop short of providing insights for the performance of the rumor centrality estimator in more gen...
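For a tree, rumor centrality has a well-known closed form: n! divided by the product of the subtree sizes when the tree is rooted at the candidate source. A direct implementation of that formula (the adjacency encoding is our own):

```python
import math

def rumor_centrality(adj, root):
    """Rumor centrality of `root` in a tree: n! / prod of subtree sizes,
    where subtree sizes are taken with the tree rooted at `root`.

    adj: dict vertex -> list of neighbors (undirected tree).
    """
    n = len(adj)
    sizes = []

    def subtree(u, parent):
        s = 1
        for w in adj[u]:
            if w != parent:
                s += subtree(w, u)
        sizes.append(s)  # record this subtree's size (root contributes n)
        return s

    subtree(root, None)
    return math.factorial(n) // math.prod(sizes)
```

On a 3-vertex path the center scores higher than a leaf, matching the intuition that central vertices are likelier rumor sources.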
A condensation phenomenon is often observed in social networks such as Twitter, where one "superstar" vertex gains a positive fraction of the edges while the remaining empirical degree distribution still exhibits a power law tail. We formulate a mathematically tractable model for this phenomenon that provides a better fit to empirical data than the standard preferential attachment model across an array of networks observed in Twitter. Using embeddings in an equivalent continuous time version of the process, and adapting techniques from the stable age-distribution theory of branching processes, we prove limit results for the proportion of edges that condense around the superstar, the degree distribution of the remaining vertices, maximal nonsuperstar degree asymptotics, and the height of these random trees in the large network limit.
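A simplified simulation of the superstar mechanism illustrates the condensation: each arriving vertex attaches to the superstar with probability p, and otherwise attaches preferentially (proportionally to degree) among the non-superstar vertices. The attachment rule here is our simplification of the model in the abstract.

```python
import random

def superstar_pa_tree(n, p, seed=0):
    """Grow a toy superstar preferential attachment tree on n vertices.

    Vertex 0 is the superstar. Each new vertex attaches to it with
    probability p, else to a non-superstar chosen proportionally to degree.
    Returns the list of vertex degrees.
    """
    rng = random.Random(seed)
    deg = [1, 1]     # vertices 0 (superstar) and 1, joined by an edge
    endpoints = [1]  # non-superstar v appears once per edge it touches
    for v in range(2, n):
        if rng.random() < p:
            target = 0
        else:
            target = rng.choice(endpoints)  # degree-proportional choice
            endpoints.append(target)
        deg[target] += 1
        deg.append(1)
        endpoints.append(v)
    return deg
```

For large n the superstar absorbs roughly a fraction p of all edges, which is the condensation behavior the limit theorems make precise.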
Papers by Tauhid Zaman