Papers by Alireza Javadian Sabet
arXiv (Cornell University), Apr 19, 2024
Higher education plays a critical role in driving an innovative economy by equipping students with knowledge and skills demanded by the workforce. While researchers and practitioners have developed data systems to track detailed occupational skills, such as those established by the U.S. Department of Labor (DOL), much less effort has been made to document skill development in higher education at a similar granularity. Here, we fill this gap by presenting a longitudinal dataset of skills inferred from over three million course syllabi taught at nearly three thousand U.S. higher education institutions. To construct this dataset, we apply natural language processing to extract from course descriptions detailed workplace activities (DWAs) used by the DOL to describe occupations. We then aggregate these DWAs to create skill profiles for institutions and academic majors. Our dataset offers a large-scale representation of college-educated workers and their role in the economy. To showcase the utility of this dataset, we use it to 1) compare the similarity of skills taught and skills in the workforce according to the US Bureau of Labor Statistics, 2) estimate gender differences in acquired skills based on enrollment data, 3) depict temporal trends in the skills taught in social science curricula, and 4) connect college majors' skill distinctiveness to salary differences of graduates. Overall, this dataset can enable new research on the source of skills in the context of workforce development and provide actionable insights for shaping the future of higher education to meet evolving labor demands, especially in the face of new technologies.
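The aggregation and comparison steps described in this abstract can be illustrated with a minimal sketch. It assumes each course has already been tagged with a list of DWAs; the tag strings, the frequency normalization, and the use of cosine similarity are illustrative assumptions, not the authors' stated method.

```python
from collections import Counter
from math import sqrt

def skill_profile(course_dwas):
    """Aggregate per-course DWA lists into a normalized frequency profile."""
    counts = Counter(dwa for dwas in course_dwas for dwa in dwas)
    total = sum(counts.values())
    return {dwa: n / total for dwa, n in counts.items()} if total else {}

def cosine_similarity(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(w * q.get(k, 0.0) for k, w in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

# Hypothetical DWA tags for the courses of one major (not real dataset values).
economics_courses = [
    ["analyze market data", "prepare reports"],
    ["analyze market data"],
]
# Hypothetical occupation profile over the same DWA vocabulary.
occupation_profile = {"analyze market data": 0.5, "prepare reports": 0.5}

major_profile = skill_profile(economics_courses)
similarity = cosine_similarity(major_profile, occupation_profile)
print(major_profile, round(similarity, 3))
```

Storing profiles as sparse dictionaries keeps the sketch independent of any fixed DWA vocabulary; a real pipeline would more likely use vectors over the full DWA list.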
arXiv, 2024
Zenodo (CERN European Organization for Nuclear Research), Mar 6, 2023
Online Social Networks and Media, Jul 1, 2021
In the last few years, thanks to the emergence of Web 2.0, social media has made the concept of online live events possible. Users participate more and more in long-running recurring events in social media by sharing their experiences and desires. This work introduces long-running live events (LRLEs) as a type of activity that spans physical spaces and digital ecosystems, including social media. LRLEs encompass several individuals, organizations, and brands collaborating or competing in the same event. This provides unprecedented opportunities to understand the dynamics and behavior of event-oriented participation through the collection and analysis of user behavior data enabled by the Web platform, where users leave most of their digital traces. What makes this setting interesting is that the traced behaviors are not focused on a single brand or organization, and thus allow one to understand and compare the respective roles and influence in a defined setting. In this paper we provide a high-level and multi-perspective roadmap to mine, model, and study LRLEs. Among the various aspects, we develop a multi-modal approach to the problem of post popularity prediction that exploits potentially influential factors within LRLEs. We employ two methods for feature selection, together with an automated grid search for optimizing hyper-parameters in various regression methods.
We present Grapevine, a user-controlled recommender that enables undergraduate and graduate students to find a suitable research advisor. The system combines ideas from exploratory search, user modeling, and recommender systems by employing state-of-the-art knowledge extraction, graph-based recommendation, and an intelligent user interface. In this paper, we demonstrate the system's key components and how they work as a whole.
Conference on Recommender Systems, 2021
Using carousels to present recommendation results has been widely adopted for consumer-focused applications such as recommending movies and music. Carousel-based interfaces engage users in the recommendation process, leaving it to the user to decide which category of items is most relevant to them, yet leaving it to AI to produce a ranking of both items and carousels. This paper explores the idea of using a carousel interface to offer two dimensions of user control, engaging users in both item ranking and carousel selection. We present an implementation of this idea for a recommender system that assists college students in finding relevant courses and advisors.
Usage frequency of the hashtags in Big Four's Fall/Winter 2018 fashion week. The x-axis lists the usage ranks of the hashtags, while the y-axis reports the logarithm of the frequency.
7.2 Top 15 most frequent hashtags in Big Four's Fall/Winter 2018 fashion week. The x-axis lists the hashtags ordered by their percentage of usage, while the y-axis reports the percentage of posts that contain those hashtags.
7.3 Word cloud representation of the most frequently used hashtags in Big Four's Fall/Winter 2018 fashion week.
7.4 Instagram users' responses to Big Four's Fall/Winter 2018 fashion week for the entire experiment period. The granularity is 1 hour. The signal for each city is shown in a different color, and the colored boxes in the background mark the official calendar of each sub-event.
7.5 Venn diagram representing the portion of Big Four's Fall/Winter 2018 fashion week posts that contain hashtags of different combinations of cities.
Personalization of user experience through recommendations involves understanding users' preferences and the context they are living in. In this work, we present a method to rank travel offers returned in response to a travel request made by a user. To give a sensible answer, we learn users' preferences over time and use them to understand travelers' needs. Our solution is based on a data-mining-based recommender system. We first design a database of historical traveler data and populate it with data generated according to rules mimicking the features of actual user profiles. These rules are then used as ground truth to validate the accuracy of the proposed learning algorithm. After data pre-processing, a knowledge base is set up by mining association rules from the database; these rules are then used along with the travel request to assign a score to each of the potential travel offers, thus ranking them. To test the proposed methodology, we generate synthesized data according to some distributions. The results of the experiments confirm the effectiveness of the proposed ranking mechanisms. Finally, we demonstrate the presentation of the ranked offers to the user via some mock-ups of the intended application.
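As a rough illustration of scoring offers with mined association rules, the sketch below mines single-item rules (one request feature implies one offer feature) with their confidence, then sums the confidences of the rules that fire for a given request and offer. The rule shape, the confidence threshold, the additive score, and all feature names are assumptions made for the example; the abstract does not specify them.

```python
from collections import Counter
from itertools import product

def mine_rules(history, min_confidence=0.6):
    """Mine rules (context item -> offer item) and keep those above a confidence threshold."""
    context_counts = Counter()
    pair_counts = Counter()
    for context, chosen in history:
        for c in context:
            context_counts[c] += 1
        for c, o in product(context, chosen):
            pair_counts[(c, o)] += 1
    rules = {}
    for (c, o), n in pair_counts.items():
        confidence = n / context_counts[c]  # estimate of P(offer item | context item)
        if confidence >= min_confidence:
            rules[(c, o)] = confidence
    return rules

def score_offer(rules, request, offer):
    """Sum the confidence of every rule matched by both the request and the offer."""
    return sum(conf for (c, o), conf in rules.items() if c in request and o in offer)

def rank_offers(rules, request, offers):
    """Return offer names sorted by descending score."""
    return sorted(offers, key=lambda name: score_offer(rules, request, offers[name]), reverse=True)

# Hypothetical history: (request context, features of the offer that was chosen).
history = [
    ({"business"}, {"train", "first_class"}),
    ({"business"}, {"train"}),
    ({"leisure"}, {"bus"}),
]
rules = mine_rules(history)
offers = {"offer_a": {"bus"}, "offer_b": {"train", "first_class"}}
ranking = rank_offers(rules, {"business"}, offers)
print(ranking)
```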
The rapid penetration of social media has been redefining every facet of traditional marketing and customer engagement tactics, not only for low-end and mass products but also for luxury brands. In this context, brands face the challenge of balancing mass marketing strategies with accentuating the exclusivity of their offerings. Social media can be considered a boon if brands employ them to reach the right audience and use the right platform by incorporating the right content. In this work, we propose a sector-specific, integrated, and holistic investigation of the social media strategies of luxury brands, together with the impact they generate in terms of the engagement level of the users as an indicator of their success. We provide empirical validation of the method in the sector of luxury fashion brands in the Italian market, providing qualitative and quantitative analysis of the content shared on social media, considering the type, ...
The growing societal dependence on social media and user generated content for news and information has increased the influence of unreliable sources and fake content, which muddles public discourse and lessens trust in the media. Validating the credibility of such information is a difficult task that is susceptible to confirmation bias, leading to the development of algorithmic techniques to distinguish between fake and real news. However, most existing methods are challenging to interpret, making it difficult to establish trust in predictions, and make assumptions that are unrealistic in many real-world scenarios, e.g., the availability of audiovisual features or provenance. In this work, we focus on fake news detection of textual content using interpretable features and methods. In particular, we have developed a deep probabilistic model that integrates a dense representation of textual news using a variational autoencoder and bi-directional Long Short-Term Memory (LSTM) networks with semantic topic-related features inferred from a Bayesian admixture model. Extensive experimental studies with 3 real-world datasets demonstrate that our model achieves comparable performance to state-of-the-art competing models while facilitating model interpretability from the learned topics. Finally, we have conducted model ablation studies to justify the effectiveness and accuracy of integrating neural embeddings and topic features both quantitatively by evaluating performance and qualitatively through separability in lower dimensional embeddings. MH and DA were funded by DA's University of Connecticut start-up research funds.
Fourth Workshop on Intelligent Textbooks (iTextbooks' 2022), 2022
One of the main directions for increasing the educational value of a digital textbook is its enrichment with interactive content. Such content can come from outside the textbook, from multiple existing repositories of educational resources. However, finding the right place for such external resources is not always a trivial task. There exist multiple sources of potential problems: from mismatching metadata to mutually contradicting prerequisite-outcome structures of underlying resources, from differences in granularity and coverage to ontological conflicts. In this paper, we make an attempt to categorize these problems and give examples from our recent experiment on automated assignment of smart interactive learning content to the chapters of an intelligent textbook in a programming domain.
Big Data and Cognitive Computing, 2022
One of travelers’ main challenges is that they have to spend a great effort to find and choose the most desired travel offer(s) among a vast list of non-categorized and non-personalized items.
Recommendation systems provide an effective way to solve the problem of information overload. In this work, we design and implement “The Hybrid Offer Ranker” (THOR), a hybrid, personalized recommender system for the transportation domain. THOR assigns every traveler a unique contextual preference model built using solely their personal data, which makes the model sensitive to the user’s choices. This model is used to rank travel offers presented to each user according to their personal preferences. We reduce the recommendation problem to one of binary classification that predicts the probability with which the traveler will buy each available travel offer. Travel offers are ranked according to the computed probabilities, hence to the user’s personal preference model.
Moreover, to tackle the cold start problem for new users, we apply clustering algorithms to identify groups of travelers with similar profiles, and build a preference model for each group. To test the system’s performance, we generate a dataset according to some carefully designed rules.
The results of the experiments show that the THOR tool is capable of learning the contextual preferences of each traveler and ranks offers starting from those that have the highest probability of being selected.
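The reduction of ranking to binary classification can be sketched as follows: a per-traveler classifier estimates the probability that each offer is bought, and offers are sorted by that probability. The abstract does not name the classifier, so the logistic regression below, its training settings, and the two offer features are assumptions chosen only to keep the example self-contained.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def predict_proba(weights, x):
    """Estimated purchase probability; weights[0] is the bias term."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

def train_logistic(samples, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights with plain batch gradient descent."""
    weights = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(samples, labels):
            error = predict_proba(weights, x) - y
            grads[0] += error
            for i, xi in enumerate(x):
                grads[i + 1] += error * xi
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grads)]
    return weights

def rank_offers(weights, offers):
    """Sort offer names by descending purchase probability."""
    return sorted(offers, key=lambda name: predict_proba(weights, offers[name]), reverse=True)

# Hypothetical history for one traveler; features are [is_direct, normalized_price].
past_offers = [[1, 0.2], [1, 0.8], [0, 0.3], [0, 0.9]]
bought = [1, 1, 0, 0]  # this traveler only bought direct connections

weights = train_logistic(past_offers, bought)
new_offers = {"direct_train": [1, 0.5], "bus_with_change": [0, 0.4]}
ranking = rank_offers(weights, new_offers)
print(ranking)
```

For the cold-start case mentioned above, the same training routine could be fitted on the pooled history of a cluster of similar travelers instead of a single traveler's history.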
Big Data and Cognitive Computing, 2022
Social media platforms offer their audience the possibility to reply to posts through comments and reactions. This allows social media users to express their ideas and opinions on shared content, thus opening virtual discussions. Most studies on social networks have focused only on user relationships or on the shared content, while ignoring the valuable information hidden in the digital conversations, in terms of the structure of the discussion and the relation between contents, which is essential for understanding online communication behavior. This work proposes a graph-based framework to assess the shape and structure of online conversations. The analysis was composed of two main stages: intent analysis and network generation. Users’ intention was detected using keyword-based classification, followed by machine learning-based classification algorithms for uncategorized comments. Afterwards, a human-in-the-loop step was used to improve the keyword-based classification. To extract essential information on social media communication patterns among the users, we built conversation graphs using a directed multigraph network, and we show our model at work in two real-life experiments. The first experiment used data from a real social media challenge and was able to categorize 90% of comments with 98% accuracy. The second experiment focused on COVID vaccine-related discussions in online forums and investigated stance and sentiment to understand how comments are affected by their parent discussion. Finally, the most popular online discussion patterns were mined and interpreted. We see that the dynamics obtained from conversation graphs are similar to traditional communication activities.
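The two stages named in this abstract, intent analysis and network generation, can be sketched together in a few lines. The keyword lists, the comment record layout, and the adjacency-list representation of the directed multigraph are all hypothetical choices for illustration; comments that match no keyword are left as "uncategorized", which is where the abstract's machine-learning classifier would take over.

```python
from collections import defaultdict

# Hypothetical keyword lists; substring matching is deliberately crude.
INTENT_KEYWORDS = {
    "question": ("?", "how", "why"),
    "thanks": ("thanks", "thank you"),
}

def classify_intent(text):
    """Return the first intent whose keywords appear in the text."""
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return "uncategorized"

def build_conversation_graph(comments):
    """Directed multigraph as adjacency lists: replier -> [(parent author, intent), ...].

    Parallel edges are kept, so two replies to the same author yield two edges.
    """
    authors = {c["id"]: c["author"] for c in comments}
    graph = defaultdict(list)
    for c in comments:
        parent = c.get("parent")
        if parent is not None:
            graph[c["author"]].append((authors[parent], classify_intent(c["text"])))
    return dict(graph)

# Hypothetical thread: one post and three replies.
comments = [
    {"id": 1, "author": "ana", "parent": None, "text": "New collection is out"},
    {"id": 2, "author": "ben", "parent": 1, "text": "How much is it?"},
    {"id": 3, "author": "ben", "parent": 1, "text": "Thanks for sharing"},
    {"id": 4, "author": "ana", "parent": 2, "text": "Prices are on the site"},
]
graph = build_conversation_graph(comments)
print(graph)
```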
Advances in Intelligent Systems and Computing, 2020
Providing personalized offers, and services in general, for the users of a system requires perceiving the context in which the users' preferences are rooted. Accordingly, context modeling is becoming a relevant issue and an expanding research field. Moreover, frequent changes of context may induce a change in the current preferences; thus, appropriate learning methods should be employed for the system to adapt automatically. In this work, we introduce a methodology based on the so-called Context Dimension Tree (a model for representing the possible contexts in the very first stages of application design), as well as an appropriate conceptual architecture to build a recommender system for travelers.
The deliverable describes the activities of RIDE2RAIL T2.1 related to the conceptualization of choice criteria and incentives for journey planning in the door-to-door multi-modal scenario addressed by Shift2Rail IP4. The deliverable describes the survey designed, administered and analysed by RIDE2RAIL to collect data from European travellers on their choice criteria when looking for a travel solution and on their potential interest in specific incentives for multi-modal travel offers. The deliverable applies clustering techniques to the collected data with the goal of finding mobility patterns.
Dataset reporting posts and users on Instagram about the 2018 fashion weeks.