We present a microblog recommendation system that can help monitor users, track conversations, and potentially improve diffusion impact. Given a Twitter network of active users and their followers, and historical activity of tweets, retweets and mentions, we build upon a prediction tool to predict the Top K users who will retweet or mention a focal user in the future [10]. We develop personalized recommendations for each focal user. We identify characteristics of focal users, such as the size of the follower network or the level of sentiment averaged over all tweets, that affect the quality of personalized recommendations. We use (high) betweenness centrality as a proxy for attractive users to target when making recommendations. Our recommendations identify a greater fraction of users with high betweenness centrality than is found in the overall distribution of betweenness centrality of a focal user's ground truth users.
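The abstract uses high betweenness centrality as a proxy for attractive users. As a minimal sketch, assuming an unweighted, undirected interaction graph (real follower networks are directed), the code below computes betweenness with Brandes' algorithm and measures what fraction of a user list reaches a chosen threshold, so that a recommended list can be compared with a ground truth list. The function names and the threshold are hypothetical illustrations, not the system's implementation.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to a set of neighbours. Returns raw (unnormalized)
    betweenness, with each unordered pair of endpoints counted once.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Single-source shortest paths by breadth-first search.
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: every pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def high_bc_fraction(users, bc, threshold):
    """Fraction of a user list whose betweenness is at least `threshold`."""
    if not users:
        return 0.0
    return sum(bc[u] >= threshold for u in users) / len(users)
```

For example, in the path graph a-b-c only b lies on a shortest path between two other users, so it is the only user with nonzero betweenness.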
We identify a set of features that are related to extremes of price changes of individual equities. Our hypothesis is that these extreme features may be used to isolate co-movements of prices for groups of equities, reflecting systematic risk. The equities are classified within industry sectors, and we create a three-mode tensor to represent the dataset; its dimensions correspond to the equity, the industry sector and the day on which the feature occurred. We use a method for non-negative tensor factorization (NOTF) to identify factors or communities that are composed of multiple equities and/or industry sectors. Our preliminary results indicate that our NOTF approach has the potential to identify such communities of price-related features that may experience co-movement across industry sectors and temporal intervals.
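The abstract does not spell out the NOTF algorithm. The sketch below is a generic stand-in, assuming a dense count tensor and a rank-R non-negative CP (PARAFAC) model fit with Lee-Seung-style multiplicative updates; it shows how (equity, sector, day) feature events could be arranged as a three-mode tensor and then factorized. All names are hypothetical.

```python
import random


def occurrence_tensor(events, equities, sectors, days):
    """Dense count tensor X[i][j][k] for (equity, sector, day) feature events."""
    ei = {e: n for n, e in enumerate(equities)}
    si = {s: n for n, s in enumerate(sectors)}
    di = {d: n for n, d in enumerate(days)}
    X = [[[0.0] * len(days) for _ in sectors] for _ in equities]
    for e, s, d in events:
        X[ei[e]][si[s]][di[d]] += 1.0
    return X


def gram(M):
    """R x R matrix M^T M for a factor matrix stored as a list of rows."""
    R = len(M[0])
    return [[sum(row[s] * row[r] for row in M) for r in range(R)] for s in range(R)]


def mu_update(F, G1, G2, numer, eps=1e-12):
    """Multiplicative update F <- F * numer / (F (G1 * G2)), elementwise."""
    R = len(F[0])
    H = [[G1[s][r] * G2[s][r] for r in range(R)] for s in range(R)]
    for a, row in enumerate(F):
        denom = [sum(row[s] * H[s][r] for s in range(R)) for r in range(R)]
        F[a] = [row[r] * numer[a][r] / (denom[r] + eps) for r in range(R)]


def ntf(X, R, iters=100, seed=0):
    """Rank-R non-negative CP model X[i][j][k] ~ sum_r A[i][r] B[j][r] C[k][r]."""
    rng = random.Random(seed)
    I, J, K = len(X), len(X[0]), len(X[0][0])
    A = [[rng.random() for _ in range(R)] for _ in range(I)]
    B = [[rng.random() for _ in range(R)] for _ in range(J)]
    C = [[rng.random() for _ in range(R)] for _ in range(K)]
    for _ in range(iters):
        num = [[sum(X[i][j][k] * B[j][r] * C[k][r] for j in range(J) for k in range(K))
                for r in range(R)] for i in range(I)]
        mu_update(A, gram(B), gram(C), num)
        num = [[sum(X[i][j][k] * A[i][r] * C[k][r] for i in range(I) for k in range(K))
                for r in range(R)] for j in range(J)]
        mu_update(B, gram(A), gram(C), num)
        num = [[sum(X[i][j][k] * A[i][r] * B[j][r] for i in range(I) for j in range(J))
                for r in range(R)] for k in range(K)]
        mu_update(C, gram(A), gram(B), num)
    return A, B, C


def sq_error(X, A, B, C):
    """Squared Frobenius reconstruction error of the CP model."""
    R = len(A[0])
    return sum((X[i][j][k] - sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))) ** 2
               for i in range(len(X)) for j in range(len(X[0])) for k in range(len(X[0][0])))
```

Each component r (column r of A, B and C) can be read as one candidate community: equities, sectors and days with large loadings on the same component tend to co-occur in the data. Multiplicative updates keep every factor entry non-negative as long as the initialization is non-negative.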
Understanding relationships among financial entities can provide insight into the behavior of complex financial ecosystems. In this demonstration paper, we consider datasets of financial documents that describe the activity or role played by a financial institution (FI), typically with respect to a financial product or another financial entity. We develop community models based on FIs and their behavior or activity, as described by their roles (Role). Our models are based on the intuitive assumption that FIs will form communities, and that FIs within a community are more likely to collaborate with other FIs in that community, and to play the same role, in other communities. Inspired by Latent Dirichlet Allocation (LDA) and topic models, we develop several probabilistic financial community models and use them to identify interesting financial communities in two datasets.
Drug-induced liver injury (DILI) is an uncommon but important and challenging adverse drug event that can develop following the use of both prescription and over-the-counter drugs. Early detection of DILI cases can greatly improve patient care, as discontinuing the offending drugs is essential to the care of DILI cases. An online resource, LiverTox, has been established to provide up-to-date, comprehensive clinical information on DILI in the form of case reports. In this study, we explored the use of the Human Phenotype Ontology (HPO) to annotate case reports with HPO terms and to predict a ranked list of phenotype categories (describing patient outcomes) that most closely match the HPO annotations attached to each case report. Prediction performance with our method was good to excellent for 67% of the case reports included in this study, i.e., the phenotype category assigned to the report was among the Top 3 predicted phenotype category descriptions. Future directions include incorporating other annotations and laboratory findings, and exploring other semantic-based methods for case report retrieval and ranking.
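One simple way to produce and score such a ranked list is sketched below, assuming each phenotype category is described by a set of HPO terms and using Jaccard overlap as the similarity. The study's actual similarity measure is not stated in this abstract and may be ontology-aware; the term labels and function names are placeholders.

```python
def jaccard(a, b):
    """Jaccard overlap between two term sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_categories(report_terms, categories):
    """Category names sorted by decreasing similarity to a report's terms.

    categories maps a category name to the set of HPO terms describing it.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    return sorted(categories, key=lambda c: (-jaccard(report_terms, categories[c]), c))


def top_k_hit_rate(reports, categories, k=3):
    """Fraction of (terms, assigned_category) pairs ranked within the top k."""
    hits = sum(assigned in rank_categories(terms, categories)[:k]
               for terms, assigned in reports)
    return hits / len(reports)
```

With k=3, top_k_hit_rate corresponds to the Top 3 criterion used in the abstract.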
Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences and health sciences, where concepts such as genes, proteins or clinical trials are annotated with controlled vocabulary terms from ontologies. We present a tool, PAnG (Patterns in Annotation Graphs), that is based on the complementary methodologies of graph summarization and dense subgraphs. Each element of a graph summary corresponds to a pattern, and its visualization can provide an explanation of the underlying knowledge. Scientists can use PAnG for exploration and to develop hypotheses.
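PAnG's own algorithms are not described in this abstract. As a hedged illustration of what a dense subgraph is, the sketch below uses Charikar's greedy peeling heuristic, a standard 2-approximation for the densest subgraph (maximizing |E(S)|/|S|), on an undirected edge list; an annotation graph could be supplied as concept-term edges. Names are hypothetical.

```python
def densest_subgraph(edges):
    """Greedy peeling (Charikar's 2-approximation) for the densest subgraph.

    edges: iterable of (u, v) pairs of an undirected graph.
    Returns (node_set, density), where density = |E(S)| / |S|.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = set(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    best, best_density = set(nodes), (m / len(nodes) if nodes else 0.0)
    while nodes:
        # Remove a minimum-degree node (ties broken by name) and its edges.
        u = min(nodes, key=lambda x: (len(adj[x]), str(x)))
        for w in adj[u]:
            adj[w].discard(u)
        m -= len(adj[u])
        nodes.remove(u)
        del adj[u]
        if nodes and m / len(nodes) > best_density:
            best, best_density = set(nodes), m / len(nodes)
    return best, best_density
```

Peeling low-degree nodes first strips away sparse fringes, such as a pendant path attached to a clique, and the best intermediate node set is kept.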
We consider an architecture of mediators and wrappers for Internet-accessible WebSources of limited query capability. Each call to a source is a WebSource Implementation (WSI) and is associated with both a capability and a (possibly dynamic) cost. The multiplicity of WSIs with varying costs and capabilities increases the complexity of a traditional optimizer, which must assign WSIs to each remote relation in the query while generating an (optimal) plan. We present a two-phase Web Query Optimizer (WQO). In a pre-optimization phase, the WQO selects one or more WSIs for a pre-plan; a pre-plan represents a space of query evaluation plans (plans) based on this choice of WSIs. The WQO uses cost-based heuristics to evaluate the WSI assignment in a pre-plan and to choose a good pre-plan. The WQO then uses the pre-plan to drive an extended relational optimizer to obtain the best plan for that pre-plan. A prototype of the WQO has been developed. We compare the effectiveness of the WQO, i.e., its ability to efficiently search a large space of plans and obtain a low-cost plan, with that of a traditional optimizer. We also validate the cost-based heuristics by experimental evaluation of queries in the noisy Internet environment.
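The sketch below illustrates only a WSI-assignment step of the kind a pre-optimization phase performs, under a hypothetical cost model: each WSI has a per-call cost and needs certain attributes bound (its capability), and each source charges a one-time setup cost. This is not the WQO's actual heuristic, which also drives an extended relational optimizer; all names and costs are illustrative.

```python
from itertools import product


def choose_preplan(relations, wsis, bound, setup_cost):
    """Exhaustively pick one WSI per remote relation under a toy cost model.

    relations:  remote relation names referenced by the query.
    wsis:       relation -> list of (wsi_name, source, required_bindings, cost),
                where required_bindings is the set of attributes the WSI needs bound.
    bound:      relation -> set of attributes the query can supply as bindings.
    setup_cost: source -> one-time cost paid once per distinct source used.
    Returns (assignment, total_cost), or None if a relation has no usable WSI.
    """
    options = []
    for rel in relations:
        # Capability check: a WSI is usable only if its required bindings are available.
        usable = [w for w in wsis[rel] if w[2] <= bound.get(rel, set())]
        if not usable:
            return None
        options.append(usable)
    best = None
    for combo in product(*options):
        sources = {w[1] for w in combo}
        total = sum(w[3] for w in combo) + sum(setup_cost[s] for s in sources)
        if best is None or total < best[1]:
            best = ({rel: w[0] for rel, w in zip(relations, combo)}, total)
    return best
```

Because setup costs are shared, reusing one source for two relations can beat choosing the cheapest WSI for each relation in isolation, which is why this sketch searches over combinations; exhaustive enumeration is only practical for small queries.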
This report presents the goals and outcomes of the 2019 Financial Entity Identification and Information Integration (FEIII) Challenge. We describe two challenge datasets and tasks. FEIII SHIP was a bill of lading dataset for incoming shipments to the United States, and the task was to identify the major shippers of a given product from a given country. FEIII CALI included state and local regulatory data from California, and the task was entity linkage for San Francisco restaurants. The report also summarizes plans for the 2020 Challenge and the Business Open Knowledge Network (BOKN).
This report presents the motivation and challenge tasks for Year One of the Financial Entity Identification and Information Integration (FEIII) Challenge. It summarizes the process and outcomes as well as lessons learned and future plans.
This report presents the goals and outcomes of the 2017 Financial Entity Identification and Information Integration (FEIII) Challenge. We describe the dataset and challenge task and the protocol to create labeled data. The report summarizes the process, outcomes and plans for the 2018 Challenge.
Proceedings of the International AAAI Conference on Web and Social Media, Aug 3, 2021
The phenomenal growth in both the scale and importance of social media such as blogs, micro-blogs and user-generated content has created a need for tools that monitor information diffusion and make recommendations within these platforms. An essential element of social media, particularly blogs, is the hyperlink graph that connects various pieces of content. There are two types of links within the blogosphere: from blog post to blog post, and from blog post to blog channel (an event stream of blog posts). These links can be viewed as a proxy for the flow of information between blog channels and as a reflection of influence. Under this assumption, the ability to predict future links can facilitate monitoring information diffusion, making recommendations, and word-of-mouth (WOM) marketing. We propose different methods for link prediction and evaluate them on an extensive blog dataset.
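As a hedged example of one common link prediction baseline (not necessarily one of the methods evaluated in the paper), the sketch below scores currently unlinked pairs by the Adamic-Adar index, which sums 1/log(degree) over shared neighbours, and returns the top k pairs. Treating the blog link graph as undirected is a simplifying assumption; function names are hypothetical.

```python
import math


def adamic_adar(adj, u, v):
    """Adamic-Adar score: sum over common neighbours z of 1 / log(deg(z)).

    A common neighbour of two distinct nodes has degree >= 2, so log(deg) > 0.
    """
    common = adj.get(u, set()) & adj.get(v, set())
    return sum(1.0 / math.log(len(adj[z])) for z in common)


def predict_links(adj, k):
    """Rank currently unlinked node pairs by Adamic-Adar score; return the top k."""
    nodes = sorted(adj)
    scored = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v not in adj[u]:
                score = adamic_adar(adj, u, v)
                if score > 0:
                    scored.append((score, u, v))
    # Highest score first; ties broken by node names for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(u, v) for _, u, v in scored[:k]]
```

Shared neighbours with few links of their own count more, on the intuition that a low-degree common neighbour is stronger evidence of a future link.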
Many classes of drugs, their interaction pathways and gene targets are known to play a role in drug-induced liver injury (DILI). Pharmacogenomics research to understand the impact of genetic variation on how patients respond to drugs may help explain some of the variability observed in the occurrence of adverse drug reactions (ADRs) such as DILI. The goal of this project is to combine rich genotype and phenotype data to better understand these scenarios. We consider similarities between drugs, similarities between drug targets, drug-pathway-gene interactions, etc. Links to patients will include patient drug usage, ADRs, disease outcomes, etc. We will develop appropriate protocols to create these rich datasets, and methods to identify patterns in graphs for explanation and prediction.
Understanding the behavior of complex financial supply chains is usually difficult due to a lack of data capturing the interactions between financial institutions (FIs) and the roles that they play in financial contracts (FCs). resMBS is an example supply chain corresponding to the US residential mortgage-backed securities that were critical in the 2008 US financial crisis. In this paper, we describe the process of creating the resMBS graph dataset from financial prospectuses. We use the SystemT rule-based text extraction platform to develop two tools, ORG NER and Dict NER, for named entity recognition of FI names. The resMBS graph comprises a set of FC nodes (one per prospectus) and the corresponding FI nodes that are extracted from each prospectus. A Role-FI extractor matches a role keyword, such as originator, sponsor or servicer, with FI names. We study the performance of the Role-FI extractor, ORG NER and Dict NER in constructing the resMBS dataset. We also present preliminary results of a clustering-based analysis to identify financial communities and their evolution in the resMBS financial supply chain.
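The abstract describes the Role-FI extractor only at a high level, and the real tool is built with SystemT rules. As a plain-Python approximation, assuming FI names have already been recognized and using a hypothetical role vocabulary, the sketch below pairs each role keyword with the nearest FI mention within a fixed character window on either side, which covers phrasings such as "XYZ Bank, as servicer".

```python
import re

# Hypothetical role vocabulary; the real extractor's keyword list is not given.
ROLES = ("originator", "sponsor", "servicer", "depositor", "trustee")


def extract_role_fi(sentence, fi_names, window=60):
    """Pair each role keyword with the nearest known FI mention.

    fi_names: FI names already recognized (e.g., by a dictionary NER step).
    window:   maximum number of characters between keyword and FI mention.
    """
    lowered = sentence.lower()
    fi_spans = [(f.start(), f.end(), name) for name in fi_names
                for f in re.finditer(re.escape(name), sentence)]
    pairs = []
    for role in ROLES:
        for r in re.finditer(r"\b" + re.escape(role) + r"\b", lowered):
            best = None
            for start, end, name in fi_spans:
                # Gap between keyword and FI mention, whichever side the mention is on.
                gap = start - r.end() if start >= r.end() else r.start() - end
                if 0 <= gap <= window and (best is None or gap < best[0]):
                    best = (gap, name)
            if best is not None:
                pairs.append((role, best[1]))
    return pairs
```

A character window is a crude proxy for syntactic attachment; rule-based systems typically add patterns such as "as <role>" to resolve ambiguous cases.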
We propose an algorithm for the non-negative factorization of an occurrence tensor built from heterogeneous networks. We use the ℓ0 norm to model sparse errors over discrete values (occurrences), and use the decomposed factors to model the embedded groups of nodes. An efficient splitting method is developed to optimize the nonconvex and nonsmooth objective. We study both synthetic problems and a new dataset built from financial documents, resMBS.
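The paper's splitting method is not given in this abstract. For context, a standard fact about ℓ0-penalized problems is that the proximal operator of λ‖e‖0 is elementwise hard thresholding at √(2λ), which is the usual closed-form step for a sparse-error block in splitting schemes. The sketch below shows that step on a flattened tensor; the function names and the assumption that the error block is fit to the residual X - M are illustrative.

```python
import math


def hard_threshold(values, lam):
    """Proximal operator of lam * ||e||_0, applied elementwise.

    Solves min_e 0.5 * (e - v)**2 + lam * [e != 0] for each entry v:
    keeping v costs lam, zeroing it costs 0.5 * v**2, so v is kept only
    when |v| > sqrt(2 * lam).
    """
    tau = math.sqrt(2.0 * lam)
    return [v if abs(v) > tau else 0.0 for v in values]


def sparse_error_step(observed, model, lam):
    """Sparse-error update E = H(X - M) on flattened tensor entries."""
    residual = [x - m for x, m in zip(observed, model)]
    return hard_threshold(residual, lam)
```

Unlike ℓ1 soft thresholding, hard thresholding leaves surviving entries unshrunk, which suits errors over discrete occurrence counts.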
There is a wealth of information about financial systems embedded in document collections. In this paper, we focus on a specialized text extraction task for this domain. The objective is to extract mentions of the names of financial institutions, or FI names, from financial prospectus documents, and to identify the corresponding real-world entities, e.g., by matching against a corpus of such entities. The tasks are Named Entity Recognition (NER) and Entity Resolution (ER); both are well studied in the literature. Our contribution is a rule-based approach that exploits lists of FI names for both tasks; our solutions are labeled Dict-based NER and Rank-based ER. Since FI names are typically represented by a root and a suffix that modifies the root, we use these lists of FI names to create specialized root and suffix dictionaries. To evaluate the effectiveness of our specialized solution for extracting FI names, we compare Dict-based NER with a general-purpose rule-based NER solution, ORG NER. Our evaluation highlights the benefits and limitations of specialized versus general-purpose approaches, and presents additional suggestions for tuning and customizing FI name extraction. To our knowledge, our proposed solutions, Dict-based NER and Rank-based ER, and the root and suffix dictionaries, are the first attempt to exploit specialized knowledge, i.e., lists of FI names, for rule-based NER and ER.
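A minimal sketch of the root-plus-suffix idea, assuming tiny hypothetical dictionaries and using a Python regular expression rather than SystemT: a mention is a dictionary root followed by zero or more dictionary suffixes, optionally separated by commas. The dictionaries and function names are illustrative only.

```python
import re

# Hypothetical dictionaries; in the paper these are derived from lists of FI names.
ROOTS = {"wells fargo", "goldman sachs", "countrywide home loans"}
SUFFIXES = {"bank", "n.a.", "inc.", "llc", "& co.", "mortgage corporation"}


def _alternation(phrases):
    # Longest phrases first so the regex prefers the longest dictionary entry.
    return "|".join(re.escape(p) for p in sorted(phrases, key=len, reverse=True))


FI_PATTERN = re.compile(
    r"(?<!\w)(?:%s)(?!\w)(?:,?\s+(?:%s)(?!\w))*" % (_alternation(ROOTS), _alternation(SUFFIXES)),
    re.IGNORECASE,
)


def find_fi_names(text):
    """Return FI name mentions: a dictionary root plus any trailing dictionary suffixes."""
    return [m.group(0) for m in FI_PATTERN.finditer(text)]
```

Splitting names into roots and suffixes lets a short root list cover many surface variants (for example "Bank", "Bank, N.A."), which is the motivation the abstract gives for the specialized dictionaries.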
The exponential growth of machine-readable data to record and communicate activities throughout the financial system has significant implications for macroprudential monitoring. The central challenge is the scalability of institutions and processes in the face of the variety, volume, and rate of the “big data” deluge. This deluge also provides opportunities in the form of new, rapidly available, valuable streams of information with finer levels of detail and granularity. A difference in scale can become a difference in kind, as legacy processes are overwhelmed and innovative responses emerge. Despite the importance and ubiquity of data in financial markets, processes to manage this crucial resource must adapt. This need applies especially to financial stability or macroprudential analysis, where information must be assembled, cleaned, and integrated from regulators around the world to build a coherent view of the financial system to support policy decisions. We consider the key challenges for systemic risk supervision from the expanding volume and diversity of financial data. The discussion is organised around five broad supervisory tasks in the typical life cycle of supervisory data.
Using Quality of Data Metadata for Source Selection and Ranking. George A. Mihaila, Department of Computer Science, University of Toronto; Louiqa Raschid, Smith School of Business and UMIACS ...
Social media, including blogs and microblogs, provide a rich window into users' online activity. Monitoring social media datasets can be expensive due to the scale and inherent noise of such data streams. Monitoring and prediction can provide significant benefit for many applications, including brand monitoring and making recommendations. Consider a focal topic and posts on multiple blog channels on this topic. Being able to target a few potentially influential blog channels that will contain relevant posts is valuable. Once these channels have been identified, a user can proactively join the conversation to encourage positive word-of-mouth and to mitigate negative word-of-mouth. Links between different blog channels, and retweets and mentions between different microblog users, are a proxy for information flow and influence. When trying to monitor where information will flow and who will be influenced by a focal user, it is valuable to predict future links, retweets and mentions. Predictions of users who will post on a focal topic or who will be influenced by a focal user can yield valuable recommendations.
Proceedings of the Second International Workshop on Data Science for Macro-Modeling
There is a growing interest in modeling and predicting the behavior of financial systems and supply chains. In this paper, we focus on the analysis of the resMBS supply chain, which is associated with the US residential mortgage-backed securities and subprime mortgages that were critical in the 2008 US financial crisis. We develop models based on financial institutions (FIs) and their participation, described by their roles (Role), in financial contracts (FCs). Our models are based on the intuitive assumption that FIs will form communities within an FC, and that FIs within a community are more likely to collaborate with other FIs in that community, and to play the same role, in another FC. Inspired by Latent Dirichlet Allocation (LDA) and topic models, we develop two probabilistic financial community models. In FI-Comm, each FC (document) is a mix of topics, where a topic is a distribution over FIs (words). In Role-FI-Comm, each topic is a distribution over Role-FI pairs (words). Experimental results over 5000+ financial prospectuses demonstrate the effectiveness of our models.
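FI-Comm maps naturally onto standard LDA: each FC is a document and each FI is a word. The abstract does not state the inference procedure, so the sketch below uses collapsed Gibbs sampling, one standard way to fit LDA, as an assumption. The same code runs unchanged for a Role-FI-Comm-style model if each token is a (role, FI) pair. Hyperparameter values and names are illustrative.

```python
import random


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of hashable tokens (FI names for an
    FI-Comm-style model, or (role, FI) pairs for a Role-FI-Comm-style model).
    Returns (doc_topic_counts, topic_token_counts).
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    n_dk = [[0] * num_topics for _ in docs]      # tokens per (document, topic)
    n_kw = [dict() for _ in range(num_topics)]   # tokens per (topic, word)
    n_k = [0] * num_topics                       # tokens per topic
    z = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] = n_kw[k].get(w, 0) + 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t].get(w, 0) + beta) / (n_k[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] = n_kw[k].get(w, 0) + 1
                n_k[k] += 1
    return n_dk, n_kw
```

After sampling, the FIs with the largest counts in n_kw[k] can be read as the members of community k, and each row of n_dk describes how an FC mixes communities.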
Proceedings of the Second International Workshop on Data Science for Macro-Modeling, 2016
This is a demonstration paper of the Karsha Explorer to visualize price changes in the S&P 500 Index dataset.
This demonstration paper illustrates the use of two tools, Ferret and semEP, to mine the sentence evidence and the annotation evidence, respectively, for cross-genome gene function discovery.
Papers by Louiqa Raschid