Given a graph over which defects, viruses, or contagions spread, leveraging a set of highly correlated subgraphs is an appealing research area with many applications. However, challenges abound. First, an initial defect in one node can cause different defects in other nodes. Second, while time is the most significant medium for understanding diffusion processes, it is not clear when the members of a subgraph may change. Third, given a pair of nodes, a contagion can spread in both directions. Previous work considers only the sequential time window and supposes that the contagion may spread from one node to the other during a predefined time span. But the propagation can differ across various temporal dimensions (e.g., hours and days). Therefore, we propose a framework that takes both sequential and multi-aspect attributes of time into consideration. Moreover, we devise an empirical model to estimate how frequently the subgraphs may reshape. Experiments show that our framework can effectively leverage the reshaping subgraphs.
Personalized recommender systems have become an essential means to help people discover attractive and interesting items. We find that, when buying an item, a user is influenced not only by her intrinsic interests and temporal contexts, but also by the crowd sentiment toward this item. Users tend to reject recommended items whose reviews are mostly negative. In light of this, we propose a temporal-sentiment-aware user behavior model (TSAUB) to learn personal interests, temporal contexts (i.e., temporal preferences of the public) and crowd sentiment from user review data. Based on the knowledge learnt by TSAUB, we design a temporal-sentiment-aware recommender system. To improve the training efficiency of TSAUB, we develop a distributed learning algorithm for model parameter estimation using the Spark framework. Extensive experiments have been performed on four Amazon datasets, and the results show that our recommender system significantly outperforms state-of-the-art methods by making more effective and efficient recommendations.
Web Information Systems Engineering – WISE 2018, 2018
Friend and item recommendation in online social networks is a vital task, which benefits both users and platform providers. However, the extreme sparsity of the user-user and user-item matrices creates severe challenges, causing collaborative filtering methods to degrade significantly in their recommendation performance. Moreover, the factors that affect users’ preferences for items and friends are complex in social networks. For example, users may be influenced by their friends in addition to their own preferences when they choose items. To tackle these problems, we first construct two implicit graphs of users, according to the users’ shared neighbours in the friendship network and the users’ common items of interest in the interest network, to ease the data sparsity issue. Then we build on recent advances in embedding learning techniques and propose a unified graph-based embedding model, called UGE. UGE learns two implicit representations for each user from the implicit graphs, so that users can be represented as two weighted implicit representations which reflect the influence of friendship and interest. The weights and the items’ representations can be learnt jointly from the explicit friendship network and interest network. Experimental results on real-world datasets demonstrate the effectiveness of the proposed approach.
The rapid development of technologies for social networks, e-commerce and online education has contributed to an increasing amount of large-scale network data. Among all sorts of network analysis tasks, one basic task is to search for important nodes in a network. Closeness centrality is one of the popular metrics that measure the importance of a node in a network. Based on closeness centrality, this basic task is called top-k closeness centrality search. However, the existing exact approaches cannot process large-scale networks because of their polynomial time complexity. Recently, some approximation algorithms have been proposed, which achieve high performance by sacrificing the precision of results. But according to our study, the loss of precision is too large. To improve the precision of results while maintaining high performance, in this paper, we propose a Sketch-based approximation algorithm for fast searching top-k closeness centrality in a...
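For readers unfamiliar with the metric, the following is a minimal sketch of an exact top-k closeness centrality search on a small unweighted graph, i.e. the kind of baseline whose cost motivates approximation. It is not the sketch-based algorithm from the abstract above. The function names, the toy graph, and the normalization (number of reachable nodes minus one, divided by the sum of shortest-path distances) are assumptions chosen for illustration.

```python
from collections import deque

def closeness(adj, source):
    """Exact closeness of `source` in an unweighted graph {node: [neighbours]}."""
    # Breadth-first search gives shortest-path distances in an unweighted graph.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    # Assumed convention: (reachable - 1) / sum of distances; 0.0 for isolated nodes.
    return (len(dist) - 1) / total if total > 0 else 0.0

def top_k_closeness(adj, k):
    """Run one BFS per node, then keep the k highest-scoring nodes."""
    scores = {v: closeness(adj, v) for v in adj}
    # Ties are broken by node name so the result is deterministic.
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]

# Hypothetical toy input: the path graph a - b - c - d - e.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(top_k_closeness(path, 2))
```

Because one breadth-first search is run per node, the total work grows with the number of nodes times the number of edges, which is why exact search becomes impractical on large networks.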
2020 IEEE 36th International Conference on Data Engineering (ICDE), 2020
Effective information retrieval (IR) relies on the ability to comprehensively capture a user’s information needs. Traditional IR systems are limited to homogeneous queries that define the information to retrieve by a single modality. Support for heterogeneous queries that combine different modalities has been proposed recently. Yet, existing approaches for heterogeneous querying are computationally expensive, as they require several passes over the data to construct a query answer. In this paper, we propose an IR system that overcomes the computational challenges imposed by heterogeneous queries by adopting graph embeddings. Specifically, we propose graph-based models in which both data and queries incorporate information of different modalities. Then, we show how either representation is transformed into a graph embedding in the same space, capturing relations between information of different modalities. By grounding query processing in graph embeddings, we enable processing of heterogeneous queries with a single pass over the data representation. Our experiments on several real-world and synthetic datasets illustrate that our technique is able to return twice the amount of relevant information in comparison with several baselines, while being scalable to large-scale data.
Web Information Systems Engineering – WISE 2018, 2018
In the real world, a majority of facts are not static or immutable but highly ephemeral. Each fact is valid for only a limited amount of time, or it stands in temporal dependencies. In addition, facts with time information are usually accompanied by a real-valued weight which indicates how likely the fact is to hold. However, most existing Knowledge Graphs (KGs) focus on static data, thus impeding a comprehensive solution for the management of uncertain and temporal facts in KGs. To fill this gap, we emphasize the characteristics of time and propose a coherent management framework, ETC (Eliminate Temporal Conflicts), for temporal consistency. ETC is based on maximum weight clique to detect temporal conflicts in uncertain temporal knowledge graphs and eliminate them to achieve the most probable knowledge graph according to related constraints. Constraint graphs with detailed descriptions are first proposed to identify temporal constraints for conflict detection. Also, implicit constraints and weight conversion are proposed for conflict resolution. Experiments over two different temporal knowledge graphs demonstrate the high recall and precision of our framework.
2018 19th IEEE International Conference on Mobile Data Management (MDM), 2018
An alarm is raised due to a defect in a transportation system. Given a graph over which the alarms propagate, we aim to exploit a set of subgraphs with highly correlated nodes (or entities). The edge weight between each pair of entities can be computed using the temporal dynamics of the propagation process. We retrieve the top k edge weights, and each group of connected entities can consequently form a tightly coupled subgraph. However, challenges abound. First, the textual contents associated with alarms of the same type differ during the propagation process. Hence, in the absence of usable textual data, only the temporal information can be employed to compute the correlation weights. Second, in many scenarios, the same alarm does not propagate. Third, given a pair of entities, the propagation can occur in both directions. Most prior work considers only the time window and assumes that the propagation between a pair of entities occurs sequentially. However, the propagation process should be inferred using miscellaneous temporal features. Therefore, we devise a generative approach that, on the one hand, utilizes infinite temporal latent factors (e.g., hour, day, etc.) to compute the correlation weights, and on the other hand, analyzes how an alarm in one entity can cause a set of alarms in another. We also conduct an extensive set of experiments to compare the performance of the subgraph mining methods. The results show that our unified framework can effectively exploit the tightly coupled subgraphs.
With the rapid development of location-based social networks (LBSNs), spatial item recommendation has become an important way of helping users discover interesting locations to increase their engagement with location-based services. The availability of spatial, temporal, and social information in LBSNs offers an unprecedented opportunity to enhance spatial item recommendation. Many previous works studied spatial and social influences on spatial item recommendation in LBSNs. Due to the strong correlations between a user’s check-in time and the corresponding check-in location, which include the sequential influence and the temporal cyclic effect, it is essential for spatial item recommender systems to exploit the temporal effect to improve recommendation accuracy. Leveraging temporal information in spatial item recommendation is, however, very challenging, considering (1) when integrating sequential influences, users’ check-in data in LBSNs has a low sampling rate in both space and...
2021 IEEE 37th International Conference on Data Engineering (ICDE), 2021
Learning vector representations (i.e., embeddings) of nodes for graph-structured information networks has attracted vast interest from both industry and academia. Most real-world networks exhibit a complex and heterogeneous format, enclosing high-order relationships and rich semantic information among nodes. However, existing heterogeneous network embedding (HNE) frameworks are commonly designed in a centralized fashion, i.e., all the data storage and learning processes take place on a single machine. Hence, those HNE methods show severe performance bottlenecks when handling large-scale networks due to high consumption of memory, storage, and running time. In light of this, to cope with large-scale HNE tasks with strong efficiency and effectiveness guarantees, we propose the Decentralized Deep Heterogeneous Hypergraph (DDHH) embedding framework in this paper. In DDHH, we innovatively formulate a large heterogeneous network as a hypergraph, where its hyperedges can connect a set of semantically similar nodes. Our framework then intelligently partitions the heterogeneous network using the identified hyperedges. Then, each resulting subnetwork is assigned to a distributed worker, which employs the deep information maximization theorem to locally learn node embeddings from the partition received. We further devise a novel embedding alignment scheme to precisely project independently learned node embeddings from all subnetworks onto a public vector space, thus allowing for downstream tasks. As shown by our experimental results, DDHH significantly improves the efficiency and accuracy of existing HNE models, and can easily scale up to large-scale heterogeneous networks.
The growing popularity of storing large data graphs in the cloud has inspired the emergence of subgraph pattern matching on a remote cloud, which is usually defined in terms of subgraph isomorphism. However, it is an NP-complete problem and too strict to find useful matches in certain applications. In addition, there exists another important concern, i.e., how to protect the privacy of data graphs in subgraph pattern matching without undermining matching results. To tackle these problems, we propose a novel framework to achieve privacy-preserving subgraph pattern matching via strong simulation in the cloud. First, we develop a method based on the k-automorphism model to protect structural privacy in data graphs. Additionally, we use a cost-model-based label generalization method to protect label privacy in both data graphs and pattern graphs. Owing to the symmetry in a k-automorphic graph, subgraph pattern matching can be answered using the outsourced graph, which is only a subset of a k-automorphic graph. The efficiency of subgraph pattern matching can be greatly improved in this way. Extensive experiments on real-world datasets demonstrate the high efficiency and effectiveness of our framework.
Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021
Deep learning techniques have ushered in significant progress in large-scale multi-modal retrieval. Nevertheless, the advanced techniques may be used nefariously to conduct a search that violates the privacy of individuals. In this paper, we propose a novel PrIvacy Protection method (PIP) against malicious multi-modal retrieval models, which proactively transforms original data into adversarial data with quasi-imperceptible perturbations before releasing them. Consequently, unauthorized malicious parties are not able to use deployed deep models to extract the desired sensitive information from them. In addition to preserving privacy, PIP simultaneously learns an effective multi-modal retrieval model to facilitate authorized uses, endowed with strong resilience to the perturbations. To the best of our knowledge, this is the very first attempt to consider privacy issues in multi-modal retrieval, and to encapsulate both privacy protection against unauthorized retrieval and robust multi-modal learning for authorized uses into a unified framework. This work is conducted in the challenging no-box and unsupervised settings, where neither target malicious models nor supervised information is known. The optimization objective of our versatile PIP is achieved through a two-player game between different components, with both the intra- and inter-modality graph alignments and the domain distribution alignment considered. Besides, a high-level similarity matrix is developed to obtain reliable guidance for learning. Empirically, we apply the proposed PIP to hashing-based multi-modal retrieval scenarios and demonstrate its effectiveness on a range of benchmarks and tasks.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018
The efficiency of top-k recommendation is vital to large-scale recommender systems. Hashing is not only an efficient alternative to distributed computing but also complementary to it, and it is a practical and effective option in a computing environment with limited resources. Hashing techniques improve the efficiency of online recommendation by representing users and items by binary codes. However, the objective functions of existing methods are not consistent with the ultimate goals of recommender systems, and they are often optimized via discrete coordinate descent, which easily gets stuck in a local optimum. To this end, we propose a Discrete Ranking-based Matrix Factorization (DRMF) algorithm based on each user's pairwise preferences, and formulate it as binary quadratic programming problems to learn binary codes. Due to non-convexity and binary constraints, we further propose self-paced learning to improve the optimization by including pairwise preferences gradually from easy to complex. We finally evaluate the proposed algorithm on three public real-world datasets, and show that it outperforms the state-of-the-art hashing-based recommendation algorithms, and even achieves performance comparable to matrix factorization methods.
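To illustrate why binary codes make online recommendation cheap, here is a minimal sketch of top-k retrieval by Hamming distance. For codes of length r whose bits stand for +1 and -1, the inner product equals r minus twice the Hamming distance, so ranking items by inner product is the same as ranking them by Hamming distance, which needs only an XOR and a bit count. This shows only the retrieval step, not the DRMF learning procedure; the codes, item names, and function names are made up for illustration.

```python
def hamming(a, b):
    """Number of differing bits between two codes stored as Python ints."""
    return bin(a ^ b).count("1")

def inner_product(a, b, r):
    """Inner product of two r-bit codes when bit 1 means +1 and bit 0 means -1."""
    # Matching bits contribute +1 and differing bits -1: (r - d) - d = r - 2d.
    return r - 2 * hamming(a, b)

def top_k_items(user_code, item_codes, k):
    """Return the k items closest to the user in Hamming distance."""
    # Ties are broken by item name so the ranking is deterministic.
    ranked = sorted(item_codes, key=lambda i: (hamming(user_code, item_codes[i]), i))
    return ranked[:k]

# Hypothetical 8-bit codes for one user and four items.
R = 8
user = 0b10110010
items = {"i1": 0b10110011, "i2": 0b01001101, "i3": 0b10100010, "i4": 0b10110010}
print(top_k_items(user, items, 3))
```

Storing each code as a machine integer keeps memory small and lets the distance computation run without floating-point arithmetic, which is the efficiency argument the abstract relies on.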
IEEE Transactions on Knowledge and Data Engineering, 2021
Network alignment is the task of identifying topologically and semantically similar nodes across (two) different networks. It plays an important role in various applications ranging from social network analysis to bioinformatic network interactions. However, existing alignment models either cannot handle large-scale graphs or fail to leverage different types of network information or modalities. In this paper, we propose a novel end-to-end alignment framework that can leverage different modalities to compare and align network nodes in an efficient way. In order to exploit the richness of the network context, our model constructs multiple embeddings for each node, each of which captures one modality or type of network information. We then design a late-fusion mechanism to combine the learned embeddings based on the importance of the underlying information. Our fusion mechanism allows our model to be adapted to various types of structure of the input network. Experimental results show that our technique outperforms state-of-the-art approaches in terms of accuracy on real and synthetic datasets, while being robust against various noise factors.
IEEE Journal of Biomedical and Health Informatics, 2021
With the increasing availability of electronic medical records (EMRs), disease prediction has recently gained immense research attention, where an accurate classifier needs to be trained to map the input prediction signals (e.g., symptoms, patient demographics, etc.) to the estimated diseases for each patient. However, existing machine learning-based solutions rely heavily on abundant manually labeled EMR training data to ensure satisfactory prediction results, impeding their performance in the presence of rare diseases that are subject to severe data scarcity. For each rare disease, the limited EMR data can hardly offer sufficient information for a model to correctly distinguish its identity from other diseases with similar clinical symptoms. Furthermore, most existing disease prediction approaches are based on the sequential EMRs collected for every patient and are unable to handle new patients without historical EMRs, reducing their real-life practicality. In this paper, we introduce an innovative model based on Graph Neural Networks (GNNs) for disease prediction, which utilizes external knowledge bases to augment the insufficient EMR data, and learns highly representative node embeddings for patients, diseases and symptoms from the medical concept graph and patient record graph, constructed from the medical knowledge base and EMRs respectively. By aggregating information from directly connected neighbor nodes, the proposed neural graph encoder can effectively generate embeddings that capture knowledge from both data sources, and is able to inductively infer the embeddings for a new patient based on the symptoms reported in her/his EMRs to allow for accurate prediction on both general diseases and rare diseases. Extensive experiments on a real-world EMR dataset have demonstrated the state-of-the-art performance of our proposed model.
Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021
Recommender systems have played a vital role in online platforms due to their ability to incorporate users' personal tastes. Beyond accuracy, diversity has been recognized as a key factor in recommendation to broaden users' horizons as well as to promote enterprises' sales. However, the trade-off between accuracy and diversity remains a major challenge, and the data and user biases have not been explored yet. In this paper, we develop an adaptive learning framework for accurate and diversified recommendation. We generalize the recently proposed bilateral branch network in the computer vision community from image classification to item recommendation. Specifically, we encode domain level diversity by adaptively balancing accurate recommendation in the conventional branch and diversified recommendation in the adaptive branch of a bilateral branch network. We also capture user level diversity using a two-way adaptive metric learning backbone network in each branch. We conduct extensive experiments on three real-world datasets. Results demonstrate that our proposed approach consistently outperforms the state-of-the-art baselines.
In the modern tourism industry, next point-of-interest (POI) recommendation is an important mobile service as it effectively helps hesitant travelers decide the next POI to visit. Currently, most next POI recommender systems are built upon a cloud-based paradigm, where the recommendation models are trained and deployed on powerful cloud servers. When a recommendation request is made by a user via a mobile device, the current contextual information is uploaded to the cloud servers to help the well-trained models generate personalized recommendation results. However, in reality, this paradigm relies heavily on high-quality network connectivity, and is subject to a high energy footprint in operation and increasing privacy concerns among the public. To bypass these defects, we propose a novel Light Location Recommender System (LLRec) to perform next POI recommendation locally on resource-constrained mobile devices. To make LLRec fully compatible with the limited computing resources and memory space, we leverage FastGRNN, a lightweight but effective gated Recurrent Neural Network (RNN), as its main building block, and significantly compress the model size by adopting the tensor-train composition in the embedding layer. As a compact model, LLRec maintains its robustness via an innovative teacher-student training framework, where a powerful teacher model is trained on the cloud to learn essential knowledge from available contextual data, and the simplified student model LLRec is trained under the guidance of the teacher model. The final LLRec is downloaded and deployed on users’ mobile devices to generate accurate recommendations solely utilizing users’ local data. As a result, LLRec significantly reduces the dependency on cloud servers, thus allowing for next POI recommendation in a stable, cost-effective and secure way. Extensive experiments on two large-scale recommendation datasets further demonstrate the superiority of our proposed solution.
The popularity of storing large data graphs in the cloud has inspired the emergence of subgraph pattern matching on a remote cloud. Typically, subgraph pattern matching is defined in terms of subgraph isomorphism, which is an NP-complete problem and sometimes too strict to find useful matches in certain applications. Another important concern is how to protect the privacy of data graphs in subgraph pattern matching without undermining matching results. Thus, we propose a novel framework to achieve privacy-preserving subgraph pattern matching in the cloud. In order to protect structural privacy in data graphs, we first develop a method based on the k-automorphism model. Additionally, we use a cost-model-based label generalization method to protect label privacy in both data graphs and pattern graphs. During the generation of the k-automorphic graph, a large number of noise edges or vertices might be introduced into the original data graph. Thus, we use the outsourced graph, which is only a subset of a k-automorphic graph, to answer the subgraph pattern matching. The efficiency of the pattern matching process can be greatly improved in this way. Extensive experiments on real-world datasets demonstrate the high efficiency of our framework.
Given a graph over which defects, viruses, or contagions spread, leveraging a set of highly corre... more Given a graph over which defects, viruses, or contagions spread, leveraging a set of highly correlated subgraphs is an appealing research area with many applications. However, the challenges abound. Firstly, an initial defect in one node can cause different defects in other nodes. Second, while the time is the most significant medium to understand diffusion processes, it is not clear when the members of a subgraph may change. Third, given a pair of nodes, a contagion can spread in both directions. Previous works only consider the sequential time-window and suppose that the contagion may spread from one node to the other during a predefined time span. But the propagation can differ in various temporal dimensions (e.g. hours and days). Therefore, we propose a framework that takes both sequential and multi-aspect attributes of the time into consideration. Moreover, we devise an empirical model to estimate how frequently the subgraphs may reshape. Experiment show that our framework can effectively leverage the reshaping subgraphs.
Personalized recommender system has become an essential means to help people discover attractive ... more Personalized recommender system has become an essential means to help people discover attractive and interesting items. We find that to buy an item, a user is influenced not only by her intrinsic interests and temporal contexts, but also by the crowd sentiment to this item. Users tend to refuse to accept the recommended items whose most reviews are negative. In light of this, we propose a temporal-sentiment-aware user behavior model (TSAUB) to learn personal interests, temporal contexts (i.e., temporal preferences of the public) and crowd sentiment from user review data. Based on the learnt knowledge from TSAUB, we design a temporal-sentiment-aware recommender system. To improve the training efficiency of TSAUB, we develop a distributed learning algorithm for model parameter estimation using the Spark framework. Extensive experiments have been performed on four Amazon datasets, and the results show that our recommender system significantly outperforms the state-of-the-arts by making more effective and efficient recommendations.
Web Information Systems Engineering – WISE 2018, 2018
Friend and item recommendation in online social networks is a vital task, which benefits for both... more Friend and item recommendation in online social networks is a vital task, which benefits for both users and platform providers. However, extreme sparsity of user-user matrix and user-item matrix issue create severe challenges, causing collaborative filtering methods to degrade significantly in their recommendation performance. Moreover, the factors those affect users’ preference for items and friends are complex in social networks. For example, users may be influenced by their friends in addition to their own preferences when they choose items. To tackle these problems, we first construct two implicit graphs of users according to the users’ shared neighbours in friendship network and the users’ common interested items in interest network to ease data sparsity issue. Then we stand on recent advances in embedding learning techniques and propose a unified graph-based embedding model, called UGE. UGE learns two implicit representations for each user from implicit graphs, so that users can be represented as two weighted implicit representations which reflect the influence of friendship and interest. The weights and items’ representation can be learnt from explicit friendship network and interest network mutually. Experimental results on real-world datasets demonstrate the effectiveness of the proposed approach.
The advanced development of various technologies on social network, e-commerce and online educati... more The advanced development of various technologies on social network, e-commerce and online education has contributed to an increasing amount of large-scale network data. Among all sorts of network analysis tasks, one basic task is to search important nodes in a network. Closeness centrality is one of the popular metrics which measure the importance of a node in a network. Based on the closeness centrality, the basic task is called top-k closeness centrality search. However, the existing exact approaches cannot process large-scale networks because of their polynomial time complexity. Recently, some approximation algorithms are proposed, which achieve high performance by sacrificing the precision of results. But according to our study, we find that the loss of the precision of results is too much. To improve the precision of results while maintaining the high performance, in this paper, we propose a Sketch-based approximation algorithm for fast searching top-k closeness centrality in a...
2020 IEEE 36th International Conference on Data Engineering (ICDE), 2020
Effective information retrieval (IR) relies on the ability to comprehensively capture a user’s in... more Effective information retrieval (IR) relies on the ability to comprehensively capture a user’s information needs. Traditional IR systems are limited to homogeneous queries that define the information to retrieve by a single modality. Support for heterogeneous queries that combine different modalities has been proposed recently. Yet, existing approaches for heterogeneous querying are computationally expensive, as they require several passes over the data to construct a query answer.In this paper, we propose an IR system that overcomes the computational challenges imposed by heterogeneous queries by adopting graph embeddings. Specifically, we propose graph-based models in which both, data and queries, incorporate information of different modalities. Then, we show how either representation is transformed into a graph embedding in the same space, capturing relations between information of different modalities. By grounding query processing in graph embeddings, we enable processing of heterogeneous queries with a single pass over the data representation. Our experiments on several real-world and synthetic datasets illustrate that our technique is able to return twice the amount of relevant information in comparison with several baselines, while being scalable to large-scale data.
Web Information Systems Engineering – WISE 2018, 2018
In the real world, a majority of facts are not static or immutable but highly ephemeral. Each fac... more In the real world, a majority of facts are not static or immutable but highly ephemeral. Each fact is valid for only a limited amount of time, or it stands in temporal dependencies. In addition, facts with time information are usually accompanied by a real-valued weight which witnesses the possibility of a fact. However, most of existing Knowledge Graphs (KGs) focus on static data thus impeding the comprehensive solution for the management of uncertain and temporal facts in KGs. To fill this gap, we emphasize the characteristics of time and propose a coherent management framework ETC (Eliminate Temporal Conflicts) for temporal consistency. ETC is based on maximum weight clique to detect temporal conflicts in uncertain temporal knowledge graphs and eliminate them to achieve the most probable knowledge graph according to related constraints. Constraint graphs with detailed description have first been proposed to identify temporal constraints for the conflict detection. Also, implicit constraints and weight conversion have been propose for conflict resolution. Experiments over two different temporal knowledge graphs demonstrate the high recall rate and precision rate of our framework.
2018 19th IEEE International Conference on Mobile Data Management (MDM), 2018
An alarm is raised due to a defect in a transportation system. Given a graph over which the alarms propagate, we aim to exploit a set of subgraphs with highly correlated nodes (or entities). The edge weight between each pair of entities can be computed using the temporal dynamics of the propagation process. We retrieve the top k edge weights, and each group of connected entities consequently forms a tightly coupled subgraph. However, challenges abound. First, the textual contents associated with alarms of the same type differ during the propagation process. Hence, lacking usable textual data, only the temporal information can be employed to compute the correlation weights. Second, in many scenarios, the same alarm does not propagate. Third, given a pair of entities, the propagation can occur in both directions. Most prior work considers only the time window and assumes that the propagation between a pair of entities occurs sequentially. However, the propagation process should be inferred using miscellaneous temporal features. Therefore, we devise a generative approach that, on the one hand, utilizes infinite temporal latent factors (e.g., hour, day, etc.) to compute the correlation weights, and on the other hand, analyzes how an alarm in one entity can cause a set of alarms in another. We also conduct an extensive set of experiments to compare the performance of the subgraph mining methods. The results show that our unified framework can effectively exploit the tightly coupled subgraphs.
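The step of keeping the top k edge weights and grouping the connected entities can be illustrated with a minimal Python sketch. It assumes the correlation weights have already been computed (the abstract's generative model is not reproduced here); the function name `top_k_subgraphs` and the example weights are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_subgraphs(weighted_edges, k):
    """Keep the k heaviest edges and return the connected groups they induce.

    weighted_edges: iterable of (entity_u, entity_v, weight) tuples.
    The weights are assumed to be given; how they are learned is out of scope.
    """
    top_edges = heapq.nlargest(k, weighted_edges, key=lambda e: e[2])

    parent = {}  # union-find forest over the entities touched by the top edges

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in top_edges:
        parent[find(u)] = find(v)  # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Sort for a deterministic result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical correlation weights between entities A..F.
edges = [
    ("A", "B", 0.9),
    ("B", "C", 0.8),
    ("D", "E", 0.7),
    ("C", "D", 0.1),
    ("E", "F", 0.05),
]
subgraphs = top_k_subgraphs(edges, k=3)
```

With k = 3 the weak C-D and E-F edges are dropped, so the entities split into two coupled groups. Union-find is used here only because it keeps the grouping step short; any connected-components routine would serve.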
With the rapid development of location-based social networks (LBSNs), spatial item recommendation has become an important way of helping users discover interesting locations to increase their engagement with location-based services. The availability of spatial, temporal, and social information in LBSNs offers an unprecedented opportunity to enhance spatial item recommendation. Many previous works studied spatial and social influences on spatial item recommendation in LBSNs. Due to the strong correlations between a user’s check-in time and the corresponding check-in location, which include the sequential influence and the temporal cyclic effect, it is essential for a spatial item recommender system to exploit the temporal effect to improve recommendation accuracy. Leveraging temporal information in spatial item recommendation is, however, very challenging, considering (1) when integrating sequential influences, users’ check-in data in LBSNs has a low sampling rate in both space and...
2021 IEEE 37th International Conference on Data Engineering (ICDE), 2021
Learning vector representations (i.e., embeddings) of nodes for graph-structured information networks has attracted vast interest from both industry and academia. Most real-world networks exhibit a complex and heterogeneous format, enclosing high-order relationships and rich semantic information among nodes. However, existing heterogeneous network embedding (HNE) frameworks are commonly designed in a centralized fashion, i.e., all the data storage and learning processes take place on a single machine. Hence, those HNE methods show severe performance bottlenecks when handling large-scale networks due to high consumption of memory, storage, and running time. In light of this, to cope with large-scale HNE tasks with strong efficiency and effectiveness guarantees, we propose the Decentralized Deep Heterogeneous Hypergraph (DDHH) embedding framework in this paper. In DDHH, we innovatively formulate a large heterogeneous network as a hypergraph, where its hyperedges can connect a set of semantically similar nodes. Our framework then intelligently partitions the heterogeneous network using the identified hyperedges. Then, each resulting subnetwork is assigned to a distributed worker, which employs the deep information maximization theorem to locally learn node embeddings from the partition received. We further devise a novel embedding alignment scheme to precisely project independently learned node embeddings from all subnetworks onto a public vector space, thus allowing for downstream tasks. As shown by our experimental results, DDHH significantly improves the efficiency and accuracy of existing HNE models, and can easily scale up to large-scale heterogeneous networks.
The growing popularity of storing large data graphs in the cloud has inspired the emergence of subgraph pattern matching on a remote cloud, which is usually defined in terms of subgraph isomorphism. However, it is an NP-complete problem and too strict to find useful matches in certain applications. In addition, there exists another important concern, i.e., how to protect the privacy of data graphs in subgraph pattern matching without undermining matching results. To tackle these problems, we propose a novel framework to achieve privacy-preserving subgraph pattern matching via strong simulation in the cloud. Firstly, we develop a k-automorphism model based method to protect structural privacy in data graphs. Additionally, we use a cost-model based label generalization method to protect label privacy in both data graphs and pattern graphs. Owing to the symmetry in a k-automorphic graph, the subgraph pattern matching can be answered using the outsourced graph, which is only a subset of a k-automorphic graph. The efficiency of subgraph pattern matching can be greatly improved in this way. Extensive experiments on real-world datasets demonstrate the high efficiency and effectiveness of our framework.
Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021
Deep learning techniques have ushered in significant progress in large-scale multi-modal retrieval. Nevertheless, the advanced techniques may be used nefariously to conduct a search that violates the privacy of individuals. In this paper, we propose a novel PrIvacy Protection method (PIP) against malicious multi-modal retrieval models, which proactively transforms original data into adversarial data with quasi-imperceptible perturbations before releasing them. Consequently, unauthorized malicious parties cannot use deployed deep models to retrieve the desired sensitive information from the released data. In addition to preserving privacy, PIP simultaneously learns an effective multi-modal retrieval model for authorized uses, endowed with strong resilience to the perturbations. To the best of our knowledge, this is the very first attempt to consider privacy issues in multi-modal retrieval, and to encapsulate both privacy protection against unauthorized retrieval and robust multi-modal learning for authorized uses into a unified framework. This work is conducted in the challenging no-box and unsupervised settings, where neither target malicious models nor supervised information is known. The optimization objective of our versatile PIP is achieved through a two-player game between different components, with both the intra- and inter-modality graph alignments and the domain distribution alignment considered. Besides, a high-level similarity matrix is developed to obtain reliable guidance for learning. Empirically, we apply the proposed PIP to hashing-based multi-modal retrieval scenarios and demonstrate its effectiveness on a range of benchmarks and tasks.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018
The efficiency of top-k recommendation is vital to large-scale recommender systems. Hashing is not only an efficient alternative to distributed computing but also complementary to it, and it is a practical and effective option in a computing environment with limited resources. Hashing techniques improve the efficiency of online recommendation by representing users and items by binary codes. However, the objective functions of existing methods are not consistent with the ultimate goals of recommender systems, and are often optimized via discrete coordinate descent, easily getting stuck in a local optimum. To this end, we propose a Discrete Ranking-based Matrix Factorization (DRMF) algorithm based on each user's pairwise preferences, and formulate it into binary quadratic programming problems to learn binary codes. Due to non-convexity and binary constraints, we further propose self-paced learning to improve the optimization, including pairwise preferences gradually from easy to complex. We finally evaluate the proposed algorithm on three public real-world datasets, and show that the proposed algorithm outperforms the state-of-the-art hashing-based recommendation algorithms, and even achieves comparable performance to matrix factorization methods.
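To make concrete why binary codes speed up online recommendation, here is a minimal Python sketch of the lookup stage only: items are ranked by Hamming distance to the user's code, which reduces to an XOR and a bit count. It does not implement DRMF or its self-paced learning; the codes, item names, and the function `top_k_items` are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


def top_k_items(user_code: int, item_codes: dict, k: int) -> list:
    """Return the k items whose codes are closest to the user's code.

    Ties are broken by item name so the result is deterministic.
    """
    ranked = sorted(
        item_codes,
        key=lambda item: (hamming(user_code, item_codes[item]), item),
    )
    return ranked[:k]


# Hypothetical 4-bit codes; real systems typically use much longer codes.
user_code = 0b1011
item_codes = {
    "i1": 0b1011,  # distance 0
    "i2": 0b1001,  # distance 1
    "i3": 0b0100,  # distance 4
    "i4": 0b1111,  # distance 1
}
recommended = top_k_items(user_code, item_codes, k=2)
```

For codes with entries in {-1, +1} and length r, the inner product equals r minus twice the Hamming distance, so ranking by smallest Hamming distance matches ranking by largest inner product.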
IEEE Transactions on Knowledge and Data Engineering, 2021
Network alignment is the task of identifying topologically and semantically similar nodes across (two) different networks. It plays an important role in various applications ranging from social network analysis to bioinformatic network interactions. However, existing alignment models either cannot handle large-scale graphs or fail to leverage different types of network information or modalities. In this paper, we propose a novel end-to-end alignment framework that can leverage different modalities to compare and align network nodes in an efficient way. In order to exploit the richness of the network context, our model constructs multiple embeddings for each node, each of which captures one modality or type of network information. We then design a late-fusion mechanism to combine the learned embeddings based on the importance of the underlying information. Our fusion mechanism allows our model to be adapted to input networks with various types of structure. Experimental results show that our technique outperforms state-of-the-art approaches in terms of accuracy on real and synthetic datasets, while being robust against various noise factors.
IEEE Journal of Biomedical and Health Informatics, 2021
With the increasing availability of electronic medical records (EMRs), disease prediction has recently gained immense research attention, where an accurate classifier needs to be trained to map the input prediction signals (e.g., symptoms, patient demographics, etc.) to the estimated diseases for each patient. However, existing machine learning-based solutions rely heavily on abundant manually labeled EMR training data to ensure satisfactory prediction results, impeding their performance in the presence of rare diseases that are subject to severe data scarcity. For each rare disease, the limited EMR data can hardly offer sufficient information for a model to correctly distinguish it from other diseases with similar clinical symptoms. Furthermore, most existing disease prediction approaches are based on the sequential EMRs collected for every patient and are unable to handle new patients without historical EMRs, reducing their real-life practicality. In this paper, we introduce an innovative model based on Graph Neural Networks (GNNs) for disease prediction, which utilizes external knowledge bases to augment the insufficient EMR data, and learns highly representative node embeddings for patients, diseases and symptoms from the medical concept graph and the patient record graph, constructed from the medical knowledge base and the EMRs respectively. By aggregating information from directly connected neighbor nodes, the proposed neural graph encoder can effectively generate embeddings that capture knowledge from both data sources, and is able to inductively infer the embeddings for a new patient based on the symptoms reported in her/his EMRs, allowing for accurate prediction on both general diseases and rare diseases. Extensive experiments on a real-world EMR dataset have demonstrated the state-of-the-art performance of our proposed model.
Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021
Recommender systems have played a vital role in online platforms due to their ability to incorporate users' personal tastes. Beyond accuracy, diversity has been recognized as a key factor in recommendation to broaden users' horizons as well as to promote enterprises' sales. However, the trade-off between accuracy and diversity remains a major challenge, and the data and user biases have not been explored yet. In this paper, we develop an adaptive learning framework for accurate and diversified recommendation. We generalize the recently proposed bilateral branch network in the computer vision community from image classification to item recommendation. Specifically, we encode domain-level diversity by adaptively balancing accurate recommendation in the conventional branch and diversified recommendation in the adaptive branch of a bilateral branch network. We also capture user-level diversity using a two-way adaptive metric learning backbone network in each branch. We conduct extensive experiments on three real-world datasets. Results demonstrate that our proposed approach consistently outperforms the state-of-the-art baselines.
In the modern tourism industry, next point-of-interest (POI) recommendation is an important mobile service as it effectively aids hesitating travelers to decide the next POI to visit. Currently, most next POI recommender systems are built upon a cloud-based paradigm, where the recommendation models are trained and deployed on powerful cloud servers. When a recommendation request is made by a user via a mobile device, the current contextual information is uploaded to the cloud servers to help the well-trained models generate personalized recommendation results. However, in reality, this paradigm relies heavily on high-quality network connectivity, and is subject to a high energy footprint in operation and increasing privacy concerns among the public. To bypass these defects, we propose a novel Light Location Recommender System (LLRec) to perform next POI recommendation locally on resource-constrained mobile devices. To make LLRec fully compatible with the limited computing resources and memory space, we leverage FastGRNN, a lightweight but effective gated Recurrent Neural Network (RNN), as its main building block, and significantly compress the model size by adopting the tensor-train composition in the embedding layer. As a compact model, LLRec maintains its robustness via an innovative teacher-student training framework, where a powerful teacher model is trained on the cloud to learn essential knowledge from available contextual data, and the simplified student model LLRec is trained under the guidance of the teacher model. The final LLRec is downloaded and deployed on users’ mobile devices to generate accurate recommendations solely utilizing users’ local data. As a result, LLRec significantly reduces the dependency on cloud servers, thus allowing for next POI recommendation in a stable, cost-effective and secure way. Extensive experiments on two large-scale recommendation datasets further demonstrate the superiority of our proposed solution.
With the growing popularity of storing large data graphs in the cloud, subgraph pattern matching on a remote cloud has emerged. Typically, subgraph pattern matching is defined in terms of subgraph isomorphism, which is an NP-complete problem and sometimes too strict to find useful matches in certain applications. Another important concern is how to protect the privacy of data graphs in subgraph pattern matching without undermining matching results. Thus, we propose a novel framework to achieve privacy-preserving subgraph pattern matching in the cloud. In order to protect the structural privacy in data graphs, we first develop a k-automorphism model based method. Additionally, we use a cost-model based label generalization method to protect label privacy in both data graphs and pattern graphs. During the generation of the k-automorphic graph, a large number of noise edges or vertices might be introduced to the original data graph. Thus, we use the outsourced graph, which is only a subset of a k-automorphic graph, to answer the subgraph pattern matching. The efficiency of the pattern matching process can be greatly improved in this way. Extensive experiments on real-world datasets demonstrate the high efficiency of our framework.
Papers by Hongzhi Yin