The learning-enhanced relevance feedback has been one of the most active research areas in content-based image retrieval in recent years. However, few methods using relevance feedback are currently available to process relatively complex queries on large image databases. In the case of complex image queries, the feature space and the distance function of the user's perception are usually different from those of the system. This difference leads to the representation of a query with multiple clusters (i.e., regions) in the feature space. Therefore, it is necessary to handle disjunctive queries in the feature space. In this paper, we propose a new content-based image retrieval method using adaptive classification and cluster-merging to find the multiple clusters of a complex image query. When the measures of a retrieval method are invariant under linear transformations, the method can achieve the same retrieval quality regardless of the shapes of the clusters of a query. Since our method uses such measures, it achieves the same high retrieval quality regardless of the cluster shapes. Extensive experiments show that the result of our method converges quickly to the user's true information need, and that, in MARS, the retrieval quality of our method is about 22% better in recall and 20% better in precision than that of the query expansion approach, and about 34% better in recall and 33% better in precision than that of the query point movement approach.
This paper investigates the MaxRS problem in spatial databases. Given a set O of weighted points and a rectangular region r of a given size, the goal of the MaxRS problem is to find a location of r such that the sum of the weights of all the points covered by r is maximized. This problem is useful in many location-based applications, such as finding the best place for a new franchise store with a limited delivery range and finding the most attractive place for a tourist with a limited reachable range. However, the problem has been studied mainly in theory, particularly in computational geometry. The existing algorithms from the computational geometry community are in-memory algorithms that do not guarantee scalability. In this paper, we propose a scalable external-memory algorithm (ExactMaxRS) for the MaxRS problem, which is optimal in terms of I/O complexity. Furthermore, we propose an approximation algorithm (ApproxMaxCRS) for the MaxCRS problem, which is a circle version of the MaxRS problem.
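The MaxRS definition above can be illustrated with a small in-memory brute force (a sketch only, not the paper's external-memory ExactMaxRS algorithm): some optimal rectangle can be slid left and down until its left and bottom edges pass through input points, so it suffices to try every pair of point coordinates as a candidate bottom-left corner.

```python
def max_rs(points, w, h):
    """Brute-force MaxRS in O(n^3).

    points: list of (x, y, weight) tuples; w, h: rectangle size.
    Returns (best_weight, (x0, y0)) where (x0, y0) is the bottom-left
    corner of a best placement (boundaries are inclusive).
    """
    best = (0, None)
    xs = {p[0] for p in points}  # candidate left-edge positions
    ys = {p[1] for p in points}  # candidate bottom-edge positions
    for x0 in xs:
        for y0 in ys:
            total = sum(wt for (x, y, wt) in points
                        if x0 <= x <= x0 + w and y0 <= y <= y0 + h)
            if total > best[0]:
                best = (total, (x0, y0))
    return best
```

This scans all candidate placements in memory, which is exactly what stops scaling once the point set exceeds memory, motivating the external-memory algorithm.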
This paper describes a decision tree model and a 3-dimensional representation of information retrieved from various weblogs in relation to argumentative logics. The weblogs are considered as datasets that show significant correlations between the queries applied to them. We have extracted a compact set of rules to support the dataset with the queries and employed effective evaluation metrics to evaluate the weighted average of the weblogs categorized into different types. The opinions from the weblogs are retrieved and represented in an object-oriented 3-dimensional system. The goal of our approach is to generate rules from rough sets and to represent them in a 3-dimensional interactive program, Blog Cosmos. We used rough set theory as a candidate framework for query refinement.
Efficient query processing for complex spatial objects is one of the most challenging requirements in non-traditional applications such as geographic information systems, computer-aided design, and multimedia databases. The performance of spatial query processing can be improved by decomposing a complex object into a small number of simple components. This paper investigates the natural trade-off between the number and the complexity of decomposed components. In particular, we propose a new object decomposition method that can control the number of components using a parameter. This method enables the user to select the optimal trade-off by controlling the parameter. The proposed method is compared with traditional decomposition methods by an analytical study and experimental measurements. These comparisons show that our decomposition method outperforms traditional decomposition methods.
Proceedings of the 23rd International Conference on World Wide Web - WWW '14 Companion
The Semantic Web is a promising future Web environment. In order to realize the Semantic Web, semantic annotations should be widely available. Existing studies on generating semantic annotations do not provide a solution to the 'document evolution' requirement, which is to maintain the consistency between semantic annotations and Web pages. In this paper, we propose an efficient solution to this requirement, namely to generate the long-term annotation and the short-term annotation separately. The experimental results show that our approach outperforms an existing approach that is the most efficient among the automatic approaches based on static Web pages.
With the advances in multimedia databases on the World Wide Web, it becomes more important to provide users with the capability of searching distributed multimedia data. While there have been many studies on database selection and collection fusion for text databases, multimedia databases on the Web have autonomous and heterogeneous properties and mainly use content-based retrieval. The collection fusion problem of multimedia databases is concerned with merging the results retrieved by content-based retrieval from heterogeneous multimedia databases on the Web. This problem is crucial for search in distributed multimedia databases; however, it has not been studied yet. This paper provides novel algorithms for processing the collection fusion of heterogeneous multimedia databases on the Web. We propose two heuristic algorithms for estimating the number of objects to be retrieved from local databases and an algorithm using linear regression. Extensive ex...
Proceedings of the 23rd International Conference on World Wide Web, 2014
Betweenness centrality is a measure of the relative participation of a vertex in the shortest paths of a graph. In many cases, we are interested only in the k vertices with the highest betweenness centrality rather than in all the vertices of a graph. In this paper, we study an efficient algorithm for finding the exact k highest betweenness centrality vertices.
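For intuition, exact betweenness on an unweighted graph can be computed with Brandes' algorithm, after which the top-k vertices fall out of a sort. This baseline (not the paper's method) computes scores for every vertex, which is the cost an efficient top-k algorithm seeks to avoid.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for unweighted graphs.
    adj: dict node -> iterable of neighbours. Returns node -> score;
    each undirected path is counted in both directions, which scales
    all scores uniformly and leaves rankings unaffected."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, recording shortest-path counts and predecessors
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:
            v = q.popleft(); stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; pred[w].append(v)
        # accumulate dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

def top_k_bc(adj, k):
    bc = betweenness(adj)
    return sorted(bc, key=bc.get, reverse=True)[:k]
```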
With respect to the Semantic Web, proposed to overcome the limitations of the Web, OWL has been recommended as the ontology language used to give a well-defined meaning to diverse data. OWL is the representative ontology language suggested by the W3C. Efficient retrieval of OWL data requires a well-constructed storage schema. In this paper, we propose a storage schema construction technique that supports more efficient query processing. A retrieval technique corresponding to the proposed storage schema is also introduced. OWL data includes inheritance information of classes and properties, so hierarchy information should be considered when OWL data is extracted. For this reason, an additional XML document is created to preserve the hierarchy information and is stored in an XML database system. An existing numbering scheme is utilized to extract ancestor/descendant relationships, and the order information of nodes is added as attribute values of elements in the XML document. Thus, it is possible to ...
Proceedings of the 23rd International Conference on World Wide Web
For predicting the diffusion process of information, we introduce and analyze a new correlation between the information adoptions of users sharing a friend in online social networks. Based on this correlation, we propose a probabilistic model that estimates the probability of a user's adoption using the naive Bayes classifier. Next, we build a recommendation method using the probabilistic model. Finally, we demonstrate the effectiveness of the proposed method with data from Flickr and MovieLens, which are well-known Web services. For all cases in the experiments, the proposed method is more accurate than the comparison methods.
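The naive Bayes step can be sketched as follows, with hypothetical probability values (in the paper these would be estimated from the observed adoption correlations): under conditional independence, the posterior adoption probability combines a prior with one likelihood term per observed friend.

```python
def adoption_probability(p_adopt, p_f_given_a, p_f_given_not_a, friends_adopted):
    """Naive Bayes posterior P(user adopts | observed friend adoptions).

    p_adopt: prior probability the user adopts the item.
    p_f_given_a / p_f_given_not_a: dicts mapping each friend to the
    probability that the friend adopted, conditioned on the user
    adopting / not adopting (hypothetical, learned from data in practice).
    friends_adopted: dict friend -> True/False observed evidence.
    """
    num = p_adopt          # unnormalized P(adopt, evidence)
    den = 1.0 - p_adopt    # unnormalized P(not adopt, evidence)
    for f, adopted in friends_adopted.items():
        pa = p_f_given_a[f] if adopted else 1 - p_f_given_a[f]
        pn = p_f_given_not_a[f] if adopted else 1 - p_f_given_not_a[f]
        num *= pa
        den *= pn
    return num / (num + den)
```

A recommendation method can then rank items for a user by this posterior.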
Content-Based Image Retrieval (CBIR) stores and retrieves images using feature descriptions of image contents. In order to support more accurate image retrieval, it has become necessary to develop features that can effectively describe image contents. The commonly used low-level features, such as color, texture, and shape features, may not map directly to human visual perception. In addition, such features cannot effectively describe a single image that contains multiple objects of interest. As a result, research on feature descriptions has shifted its focus to higher-level features, which support representations more similar to human visual perception, such as spatial relationships between objects. Nevertheless, prior works on the representation of spatial relations still have shortcomings, particularly with respect to supporting rotational invariance. Rotational invariance is a key requirement for a feature description to provide robust and accurate retrieval of ima...
We present ONTOMS2, an efficient and scalable ONTOlogy Management System with incremental reasoning. ONTOMS2 stores an OWL document and processes OWL-QL and SPARQL queries. In particular, ONTOMS2 supports SPARQL Update queries with incremental instance reasoning of inverseOf, symmetric, and transitive properties.
Proceedings of the 23rd International Conference on World Wide Web
Based on the propagation of information in social networks, social network services have been considered an effective viral marketing platform. To maximize the benefit of viral marketing in a social network, influence maximization has been introduced, and many studies have proposed methods to approximate it efficiently in large networks. In this paper, we approximate the influence maximization problem in two steps: extracting candidates and finding approximately optimal seeds. For the first step, we investigate how to remove unnecessary nodes for influence maximization based on heuristics about the local influence of optimal seeds. For the second step, we devise a new simulated annealing method with a fast fitness function. In our experiments, we evaluate our proposed method with real-life datasets and compare it with recent existing methods. From our experimental results, the proposed method is at least an order of magnitude faster than the existing methods while achieving high accuracy. In addition, we demonstrate that our candidate extraction method is very effective at excluding uninfluential nodes, so it can be used to make any algorithm for influence maximization much faster.
This article studies I/O-efficient algorithms for the triangle listing problem and the triangle counting problem, whose solutions are basic operators in dealing with many other graph problems. In the former problem, given an undirected graph G, the objective is to find all the cliques involving 3 vertices in G. In the latter problem, the objective is to report just the number of such cliques without having to enumerate them. Both problems have been well studied in internal memory, but they remain difficult challenges when G does not fit in memory, making it crucial to minimize the number of disk I/Os performed. Although previous research has attempted to tackle these challenges, the state-of-the-art solutions rely on a set of crippling assumptions to guarantee good performance. Motivated by this, we develop a new algorithm that is provably I/O and CPU efficient at the same time, without making any assumption on the input G at all. The algorithm uses ideas drastically dif...
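For reference, the classical internal-memory baseline counts triangles by orienting each edge along a (degree, id) order and intersecting out-neighbour sets. The sketch below assumes the graph fits in memory, which is precisely the assumption the article's external-memory algorithm removes.

```python
from collections import defaultdict

def count_triangles(edges):
    """Count 3-cliques in an undirected graph given as an edge list.
    Each edge is oriented from lower to higher (degree, id) rank,
    so every triangle is counted exactly once."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    count = 0
    for v in adj:
        for w in out[v]:
            # common out-neighbours of v and w close a triangle
            count += len(out[v] & out[w])
    return count
```

Listing (rather than counting) replaces the intersection-size step with an enumeration of the common out-neighbours.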
Mobile sensor networks consist of a number of sensor nodes that are capable of sensing, processing, communicating, and moving. These mobile sensor nodes move around and explore their surrounding areas. Top-k queries are useful in many mobile sensor network applications. However, the mobility of sensor nodes incurs new challenges in addition to the problems of static sensor networks (i.e., resource constraints). Since mobile sensor nodes tend to move continuously, the network condition changes frequently, and they consume considerably more energy than static sensor nodes. In this paper, we propose an efficient top-k query processing framework for mobile sensor network environments, called mSensor. To construct an efficient routing topology, we devise a mobility-aware routing method. Using the semantics of the top-k query, we develop a filter-based data collection method that reduces energy consumption and provides more accurate query results. We also devise a data compression method for disconnected sensor nodes to deal with the limited memory space of sensor nodes. The performance of our proposed approach is extensively evaluated using synthetic and real data sets. The results show the effectiveness of our approach.
IEEE Transactions on Knowledge and Data Engineering
Influence maximization was introduced to maximize the profit of viral marketing in social networks. Its weakness is that it does not distinguish specific users from others, even though some items can be useful only for those specific users. For such items, it is a better strategy to focus on maximizing the influence on the specific users. In this paper, we formulate an influence maximization problem as query processing to distinguish specific users from others. We show that the query processing problem is NP-hard and that its objective function is submodular. We propose an expectation model for the value of the objective function and a fast greedy-based approximation method using the expectation model. For the expectation model, we investigate a relationship of paths between users. For the greedy method, we develop an efficient incremental update of the marginal gain to our objective function. We conduct experiments to evaluate the proposed method with real-life datasets and compare the results with those of existing methods adapted to the problem. From our experimental results, the proposed method is at least an order of magnitude faster than the existing methods in most cases while achieving high accuracy.
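The greedy method with incremental marginal-gain updates can be sketched as a lazy greedy (CELF-style) loop over a monotone submodular objective. Here `influence` is a hypothetical stand-in for the paper's expectation model; submodularity guarantees a stale marginal gain only overestimates, so a popped entry whose gain was recomputed in the current round is safe to pick.

```python
import heapq

def lazy_greedy(candidates, influence, k):
    """Pick k seeds greedily, recomputing marginal gains lazily.
    influence(S) returns the objective value of seed set S."""
    seeds, current = [], 0.0
    # max-heap of (-gain, node, round in which the gain was computed)
    heap = [(-influence({v}), v, 0) for v in candidates]
    heapq.heapify(heap)
    for round_no in range(1, k + 1):
        while True:
            neg_gain, v, stamp = heapq.heappop(heap)
            if stamp == round_no:          # gain is fresh and maximal
                seeds.append(v)
                current -= neg_gain
                break
            # stale gain: recompute against the current seed set
            gain = influence(set(seeds) | {v}) - current
            heapq.heappush(heap, (-gain, v, round_no))
    return seeds, current
```

The stamp trick is the incremental update: most nodes never have their gain recomputed after the first round.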
Community detection in social networks is one of the most active problems, with many applications. Most of the existing works on the problem have focused on detecting communities considering only the closeness between community members. In the real world, however, it is also important to consider bad relationships between members. In this paper, we propose a new variant of the community detection problem, called friendly community search. In the proposed problem, for a given graph, we aim not only to find a densely connected subgraph that contains a given set of query nodes but also to minimize the number of nodes involved in bad relationships in the subgraph. We prove that the problem is NP-hard and develop two novel algorithms, called Greedy and SteinerSwap, that return near-optimal solutions. Experimental results show that the two proposed algorithms outperform an algorithm adapted from an existing algorithm for the optimal quasi-clique problem.
The volume of spatio-textual data is increasing drastically these days, which makes it more and more essential to process such large-scale spatio-textual datasets. Even though numerous works have studied answering various kinds of spatio-textual queries, methods for analyzing spatio-textual data have rarely been considered so far. Motivated by this, this paper proposes a k-means based clustering algorithm specialized for massive spatio-textual data. One of the strong points of the k-means algorithm lies in its efficiency and scalability, implying that it is appropriate for large-scale data. However, it is challenging to apply the normal k-means algorithm to spatio-textual data, since each spatio-textual object has non-numeric attributes, that is, a textual dimension, as well as numeric attributes, that is, a spatial dimension. We address this problem by using the expected distance between a random pair of objects rather than constructing the actual centroid of each cluster. Based on our experimental results, we show that the clustering quality of our algorithm is comparable to those of other k-partitioning algorithms that can process spatio-textual data, and that its efficiency is superior to those competitors.
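A minimal sketch of the expected-distance idea, under assumptions not taken from the paper: a hypothetical combined distance (weighted Euclidean on coordinates plus Jaccard on keyword sets) and seed-based initialisation. Because the textual dimension admits no numeric centroid, an object is scored against a cluster by its average distance to the cluster's members, i.e., its expected distance to a uniformly random member.

```python
import math

def st_distance(o1, o2, alpha=0.5):
    """Combined spatio-textual distance; an object is ((x, y), keywords).
    alpha weights the spatial part against the textual (Jaccard) part."""
    (x1, y1), k1 = o1
    (x2, y2), k2 = o2
    spatial = math.hypot(x1 - x2, y1 - y2)
    textual = 1 - len(k1 & k2) / len(k1 | k2) if (k1 | k2) else 0.0
    return alpha * spatial + (1 - alpha) * textual

def st_cluster(objects, k, iters=10):
    """k-partitioning without centroids: assign each object to the
    cluster minimizing its expected distance to a random member.
    The first k objects serve as initial seeds (a simplification)."""
    assign = [min(range(k), key=lambda c: st_distance(o, objects[c]))
              for o in objects]
    for _ in range(iters):
        members = [[i for i, a in enumerate(assign) if a == c]
                   for c in range(k)]
        assign = [
            min(range(k),
                key=lambda c: (sum(st_distance(o, objects[j])
                                   for j in members[c]) / len(members[c])
                               if members[c] else float('inf')))
            for o in objects
        ]
    return assign
```

The average-to-members cost is quadratic per iteration, which is what the paper's expected-distance formulation is designed to compute more cheaply at scale.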
The learning-enhanced relevance feedback has been one of the most active research areas in conten... more The learning-enhanced relevance feedback has been one of the most active research areas in content-based image retrieval in recent years. However, few methods using the relevance feedback are currently available to process relatively complex queries on large image databases. In the case of complex image queries, the feature space and the distance function of the user's perception are usually different from those of the system. This difference leads to the representation of a query with multiple clusters (i.e., regions) in the feature space. Therefore, it is necessary to handle disjunctive queries in the feature space. In this paper, we propose a new content-based image retrieval method using adaptive classification and clustermerging to find multiple clusters of a complex image query. When the measures of a retrieval method are invariant under linear transformations, the method can achieve the same retrieval quality regardless of the shapes of clusters of a query. Our method achieves the same high retrieval quality regardless of the shapes of clusters of a query since it uses such measures. Extensive experiments show that the result of our method converges to the user's true information need fast, and the retrieval quality of our method is about 22% in recall and 20% in precision better than that of the query expansion approach, and about 34% in recall and about 33% in precision better than that of the query point movement approach, in MARS.
This paper investigates the MaxRS problem in spatial databases. Given a set O of weighted points ... more This paper investigates the MaxRS problem in spatial databases. Given a set O of weighted points and a rectangu-lar region r of a given size, the goal of the MaxRS problem is to find a location of r such that the sum of the weights of all the points covered by r is maximized. This problem is use-ful in many location-based applications such as finding the best place for a new franchise store with a limited delivery range and finding the most attractive place for a tourist with a limited reachable range. However, the problem has been studied mainly in theory, particularly, in computational ge-ometry. The existing algorithms from the computational geometry community are in-memory algorithms which do not guarantee the scalability. In this paper, we propose a scalable external-memory algorithm (ExactMaxRS) for the MaxRS problem, which is optimal in terms of the I/O com-plexity. Furthermore, we propose an approximation algo-rithm (ApproxMaxCRS) for the MaxCRS problem that is a circle vers...
This paper describes a decision tree model and 3-dimensional representation of information retrie... more This paper describes a decision tree model and 3-dimensional representation of information retrieved from various weblogs in relation to argumentative logics. The weblogs are considered as datasets that show significant correlations between the queries applied to them. We have extracted a compact set of rules to support the dataset with the queries and employed effective evaluation metrics to evaluate the weighted average of the weblogs categorized into different types. The opinions from the weblogs are retrieved and represented as an object oriented 3-Dimensional system. The goal of our approach is to generate rules from rough sets and to represent them in a 3-dimensional interactive program, Blog Cosmos. We used rough set theory as a candidate framework for query refinement.
Efficient query processing for complex spatial objects is one of the most challenging requirement... more Efficient query processing for complex spatial objects is one of the most challenging requirements in non-traditional applications such as geographic information systems, computer-aided design, and multimedia databases. The performance of spatial query processing can be improved by decomposing a complex object into a small number of simple components. This paper investigates the natural trade-off between the number and the complexity of decomposed components. In particular, we propose a new object decomposition method that can control the number of components using a parameter. This method enables the user to select the optimal trade-off by controlling the parameter. The proposed method is compared with traditional decomposition methods by an analytical study and experimental measurements. These comparisons show that our decomposition method outperforms traditional decomposition methods.
The learning-enhanced relevance feedback has been one of the most active research areas in conten... more The learning-enhanced relevance feedback has been one of the most active research areas in content-based image re-trieval in recent years. However, few methods using the rel-evance feedback are currently available to process relatively complex queries on large image databases. In the case of complex image queries, the feature space and the distance function of the user’s perception are usually different from those of the system. This difference leads to the represen-tation of a query with multiple clusters (i.e., regions) in the feature space. Therefore, it is necessary to handle disjunc-tive queries in the feature space. In this paper, we propose a new content-based image retrieval method using adaptive classification and cluster-merging to find multiple clusters of a complex image query. When the measures of a retrieval method are invariant under linear transformations, the method can achieve the same re-trieval quality regardless of the shapes of clusters of a query. Our method a...
Proceedings of the 23rd International Conference on World Wide Web - WWW '14 Companion
The semantic Web is a promising future Web environment. In order to realize the semantic Web, the... more The semantic Web is a promising future Web environment. In order to realize the semantic Web, the semantic annotation should be widely available. The studies for generating the semantic annotation do not provide a solution to the 'document evolution' requirement which is to maintain the consistency between semantic annotations and Web pages. In this paper, we propose an efficient solution to the requirement, that is to separately generate the longterm annotation and the short-term annotation. The experimental results show that our approach outperforms an existing approach which is the most efficient among the automatic approaches based on static Web pages.
With the advances in multimedia databases on the World Wide Web, it becomes more important to pro... more With the advances in multimedia databases on the World Wide Web, it becomes more important to provide users with the search capability of distributed multimedia data. While there have been many studies about the database selection and the collection fusion for text databases. The multimedia databases on the Web have autonomous and heterogeneous properties and they use mainly the content based retrieval. The collection fusion problem of multimedia databases is concerned with the merging of results retrieved by content based retrieval from heterogeneous multimedia databases on the Web. This problem is crucial for the search in distributed multimedia databases, however, it has not been studied yet. This paper provides novel algorithms for processing the collection fusion of heterogeneous multimedia databases on the Web. We propose two heuristic algorithms for estimating the number of objects to be retrieved from local databases and an algorithm using the linear regression. Extensive ex...
Proceedings of the 23rd International Conference on World Wide Web, 2014
The betweenness centrality is a measure for the relative participation of the vertex in the short... more The betweenness centrality is a measure for the relative participation of the vertex in the shortest paths in the graph. In many cases, we are interested in the k-highest betweenness centrality vertices only rather than all the vertices in a graph. In this paper, we study an efficient algorithm for finding the exact k-highest betweenness centrality vertices.
With respect to the Semantic Web proposed to overcome the limitation of the Web, OWL has been rec... more With respect to the Semantic Web proposed to overcome the limitation of the Web, OWL has been recommended as the ontology language used to give a well-defined meaning to diverse data. OWL is the representative ontology language suggested by W3C. An efficient retrieval of OWL data requires a well-constructed storage schema. In this paper, we propose a storage schema construction technique which supports more efficient query processing. A retrieval technique corresponding to the proposed storage schema is also introduced. OWL data includes inheritance information of classes and properties. When OWL data is extracted, hierarchy information should be considered. For this reason, an additional XML document is created to preserve hierarchy information and stored in an XML database system. An existing numbering scheme is utilized to extract ancestor/descendent relationships, and order information of nodes is added as attribute values of elements in an XML document. Thus, it is possible to ...
Proceedings of the 23rd International Conference on World Wide Web
For predicting the diffusion process of information, we introduce and analyze a new correlation b... more For predicting the diffusion process of information, we introduce and analyze a new correlation between the information adoptions of users sharing a friend in online social networks. Based on the correlation, we propose a probabilistic model to estimate the probability of a user's adoption using the naive Bayes classifier. Next, we build a recommendation method using the probabilistic model. Finally, we demonstrate the effectiveness of the proposed method with the data from Flickr and Movielens which are well-known web services. For all cases in the experiments, the proposed method is more accurate than comparison methods.
Content Based Image Retrieval (CBIR) is to store and retrieve images using the feature descriptio... more Content Based Image Retrieval (CBIR) is to store and retrieve images using the feature description of image contents. In order to support more accurate image retrieval, it has become necessary to develop features that can effectively describe image contents. The commonly used low-level features, such as color, texture, and shape features may not be directly mapped to human visual perception. In addition, such features cannot effectively describe a single image that contains multiple objects of interest. As a result, the research on feature descriptions has shifted to focus on higher-level features, which support representations more similar to human visual perception like spatial relationships between objects. Nevertheless, the prior works on the representation of spatial relations still have shortcomings, particularly with respect to supporting rotational invariance, Rotational invariance is a key requirement for a feature description to provide robust and accurate retrieval of ima...
Efficient query processing for complex spatial objects is one of the most challenging requirement... more Efficient query processing for complex spatial objects is one of the most challenging requirements in non-traditional applications such as geographic information systems, computer-aided design, and multimedia databases. The performance of spatial query processing can be improved by decomposing a complex object into a small number of simple components. This paper investigates the natural trade-off between the number and the complexity of decomposed components. In particular, we propose a new object decomposition method that can control the number of components using a parameter. This method enables the user to select the optimal trade-off by controlling the parameter. The proposed method is compared with traditional decomposition methods by an analytical study and experimental measurements. These comparisons show that our decomposition method outperforms traditional decomposition methods.
We present ONTOMS2, an efficient and scalable ONTOlogy Management System with incremental reasoning. ONTOMS2 stores an OWL document and processes OWL-QL and SPARQL queries. In particular, ONTOMS2 supports SPARQL Update queries with incremental instance reasoning of inverseOf, symmetric, and transitive properties.
Proceedings of the 23rd International Conference on World Wide Web
Based on the propagation of information in social networks, social network services have been considered an effective viral marketing platform. To maximize the benefit of viral marketing in a social network, influence maximization has been introduced, and many studies have proposed to approximate it efficiently in large networks. In this paper, we approximate the influence maximization problem in two steps: extracting candidates and finding approximately optimal seeds. For the first step, we investigate how to remove nodes unnecessary for influence maximization, based on heuristics about an optimal seed's local influence. For the second step, we devise a new simulated annealing method with a fast fitness function. In our experiments, we evaluate the proposed method with real-life datasets and compare it with recent existing methods. The experimental results show that the proposed method is at least an order of magnitude faster than the existing methods while achieving high accuracy. In addition, we demonstrate that our candidate extraction method is very effective at excluding uninfluential nodes, so it can be used to make any algorithm for influence maximization much faster.
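The second step above searches the candidate space with simulated annealing. The sketch below shows the generic shape of such a search, assuming a size-k seed set, a swap-one-node neighborhood, and a caller-supplied `fitness` function standing in for the paper's fast fitness function; all names and parameters here are illustrative, not the paper's actual implementation.

```python
import math
import random

def simulated_annealing_seeds(graph, k, fitness, t0=1.0, t_min=0.01,
                              alpha=0.9, iters_per_temp=20):
    """Simulated-annealing search for a size-k seed set (illustrative sketch).

    graph:   dict mapping node -> set of neighbours.
    fitness: callable taking a seed set and returning an influence estimate
             (the paper uses a fast approximate fitness; here it is a parameter).
    """
    nodes = list(graph)
    current = set(random.sample(nodes, k))
    best, best_score = set(current), fitness(current)
    t = t0
    while t > t_min:
        for _ in range(iters_per_temp):
            # Neighbouring solution: swap one seed for a non-seed node.
            out_node = random.choice(sorted(current))
            in_node = random.choice([n for n in nodes if n not in current])
            candidate = (current - {out_node}) | {in_node}
            delta = fitness(candidate) - fitness(current)
            # Accept improvements always; worse moves with Boltzmann probability.
            if delta >= 0 or random.random() < math.exp(delta / t):
                current = candidate
                score = fitness(current)
                if score > best_score:
                    best, best_score = set(current), score
        t *= alpha  # geometric cooling schedule
    return best
```

Candidate extraction would shrink `graph` before this search runs, which pays off because every annealing step calls `fitness` on the remaining nodes.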
This article studies I/O-efficient algorithms for the triangle listing problem and the triangle counting problem, whose solutions are basic operators in dealing with many other graph problems. In the former problem, given an undirected graph G, the objective is to find all the cliques involving 3 vertices in G. In the latter problem, the objective is to report just the number of such cliques without having to enumerate them. Both problems have been well studied in internal memory, but remain difficult challenges when G does not fit in memory, making it crucial to minimize the number of disk I/Os performed. Although previous research has attempted to tackle these challenges, the state-of-the-art solutions rely on a set of crippling assumptions to guarantee good performance. Motivated by this, we develop a new algorithm that is provably I/O- and CPU-efficient at the same time, without making any assumption on the input G at all. The algorithm uses ideas drastically dif…
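For concreteness, the following is a minimal in-memory triangle-counting baseline, not the article's external-memory algorithm: it orients each undirected edge from the lower-ranked endpoint to the higher-ranked one (rank = degree, ties broken by id) and intersects out-neighbour sets, so each triangle is counted exactly once.

```python
from collections import defaultdict

def count_triangles(edges):
    """Count 3-cliques in an undirected graph given as (u, v) edge pairs.

    In-memory sketch only; the article's contribution is doing this when
    the graph does not fit in memory, with provable I/O bounds.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Rank vertices by (degree, id); orient edges toward the higher rank.
    rank = {u: (len(adj[u]), u) for u in adj}
    out = {u: {v for v in adj[u] if rank[v] > rank[u]} for u in adj}
    count = 0
    for u in out:
        for v in out[u]:
            # Each common out-neighbour w of u and v closes one triangle.
            count += len(out[u] & out[v])
    return count
```

Listing instead of counting would enumerate the vertices in `out[u] & out[v]` rather than summing their number, which is exactly the listing/counting distinction drawn in the abstract.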
Mobile sensor networks consist of a number of sensor nodes capable of sensing, processing, communicating, and moving. These mobile sensor nodes move around and explore their surrounding areas. Top-k queries are useful in many mobile sensor network applications. However, the mobility of sensor nodes incurs new challenges in addition to the problems of static sensor networks (i.e., resource constraints). Since mobile sensor nodes tend to move continuously, the network condition changes frequently, and they consume considerably more energy than static sensor nodes. In this paper, we propose an efficient top-k query processing framework for mobile sensor network environments, called mSensor. To construct an efficient routing topology, we devise a mobility-aware routing method. Using the semantics of the top-k query, we develop a filter-based data collection method which saves energy and provides more accurate query results. We also devise a data compression method for disconnected sensor nodes to deal with their limited memory space. The performance of our proposed approach is extensively evaluated using synthetic and real data sets, and the results show its effectiveness.
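The filtering idea behind top-k data collection can be sketched as follows. This is a hypothetical simplification, not the mSensor protocol: the sink keeps the current k-th highest value as a filter, and a node transmits a reading only if it beats that filter, so filtered readings cost no transmission energy.

```python
import heapq

def collect_topk(readings_by_node, k):
    """Simulate filter-based top-k collection over a stream of node readings.

    readings_by_node: iterable of (node_id, value) pairs as they arrive.
    Returns the top-k (value, node_id) pairs and how many transmissions
    the filter allowed through.
    """
    topk = []          # min-heap of the k highest (value, node_id) seen so far
    transmitted = 0
    for node, reading in readings_by_node:
        # The filter is the current k-th highest value (or -inf until full).
        threshold = topk[0][0] if len(topk) == k else float("-inf")
        if reading > threshold:          # passes the filter -> transmit
            transmitted += 1
            heapq.heappush(topk, (reading, node))
            if len(topk) > k:
                heapq.heappop(topk)
        # Readings <= threshold are suppressed at the node: no transmission.
    return sorted(topk, reverse=True), transmitted
```

In a real network the sink would have to disseminate the updated threshold back down the routing tree; mobility makes maintaining that tree the hard part, which is what the mobility-aware routing method addresses.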
IEEE Transactions on Knowledge and Data Engineering
Influence maximization was introduced to maximize the profit of viral marketing in social networks. Its weakness is that it does not distinguish specific users from others, even though some items may be useful only for specific users. For such items, it is a better strategy to focus on maximizing the influence on those specific users. In this paper, we formulate an influence maximization problem as query processing to distinguish specific users from others. We show that the query processing problem is NP-hard and that its objective function is submodular. We propose an expectation model for the value of the objective function and a fast greedy-based approximation method using the expectation model. For the expectation model, we investigate the relationship of paths between users. For the greedy method, we work out an efficient incremental update of the marginal gain to our objective function. We conduct experiments to evaluate the proposed method with real-life datasets and compare the results with those of existing methods adapted to the problem. Our experimental results show that the proposed method is at least an order of magnitude faster than the existing methods in most cases while achieving high accuracy.
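Submodularity is what makes greedy methods with cached marginal gains safe. The sketch below shows the standard CELF-style lazy greedy pattern that exploits it; the paper's incremental update is more specialized, so treat this only as an illustration of why submodularity matters, with `gain` as a caller-supplied marginal-gain function.

```python
import heapq

def lazy_greedy(candidates, k, gain):
    """Lazy greedy selection for a submodular objective.

    gain(node, seeds) returns the marginal gain of adding node to seeds.
    Submodularity guarantees cached gains can only shrink as seeds grow,
    so a stale cached value that is still the largest entry must be the
    true maximum for this round.
    """
    seeds = []
    # Max-heap of (negated cached gain, node, round-when-computed).
    heap = [(-gain(c, []), c, 0) for c in candidates]
    heapq.heapify(heap)
    while len(seeds) < k and heap:
        neg_g, node, when = heapq.heappop(heap)
        if when == len(seeds):      # cached gain is fresh for this round: pick it
            seeds.append(node)
        else:                       # stale: recompute once and push back
            heapq.heappush(heap, (-gain(node, seeds), node, len(seeds)))
    return seeds
```

Most candidates never get recomputed in a given round, which is where the order-of-magnitude speedups for greedy influence maximization typically come from.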
Community detection in social networks is one of the most active problems, with many applications. Most existing works on the problem have focused on detecting a community by considering only the closeness between community members. In the real world, however, it is also important to consider bad relationships between members. In this paper, we propose a new variant of the community detection problem, called friendly community search. In the proposed problem, for a given graph, we aim not only to find a densely connected subgraph that contains a given set of query nodes but also to minimize the number of nodes involved in bad relationships in the subgraph. We prove that the problem is NP-hard, and we develop two novel algorithms, called Greedy and SteinerSwap, that return near-optimal solutions. Experimental results show that the two proposed algorithms outperform an algorithm adapted from an existing algorithm for the optimal quasi-clique problem.
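To give the search problem some shape: a common pattern for dense-subgraph search with must-keep query nodes is greedy peeling. The sketch below is an assumption-based simplification, not the paper's Greedy algorithm; it optimizes only average degree and ignores the bad-relationship term, and it assumes a non-empty query set.

```python
def friendly_greedy(graph, query):
    """Peel low-degree nodes while protecting query nodes; return the
    densest (highest average degree) intermediate subgraph seen.

    graph: dict node -> set of neighbours; query: non-empty set of nodes
    that must remain in the answer.
    """
    adj = {u: set(graph[u]) for u in graph}
    nodes = set(graph)

    def avg_degree(ns):
        return sum(len(adj[u] & ns) for u in ns) / len(ns)

    best, best_score = set(nodes), avg_degree(nodes)
    while len(nodes) > len(query):
        # Remove the minimum-degree non-query node (ties broken by id).
        u = min((n for n in nodes if n not in query),
                key=lambda n: (len(adj[n] & nodes), n))
        nodes.discard(u)
        score = avg_degree(nodes)
        if score > best_score:
            best, best_score = set(nodes), score
    return best
```

The paper's objective additionally penalizes nodes involved in bad relationships, which would enter the peeling criterion alongside degree.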
The volume of spatio-textual data is drastically increasing these days, which makes it more and more essential to process large-scale spatio-textual datasets. Even though numerous works have studied answering various kinds of spatio-textual queries, methods for analyzing spatio-textual data have rarely been considered so far. Motivated by this, this paper proposes a k-means based clustering algorithm specialized for massive spatio-textual data. One of the strong points of the k-means algorithm lies in its efficiency and scalability, making it appropriate for large-scale data. However, it is challenging to apply the normal k-means algorithm to spatio-textual data, since each spatio-textual object has non-numeric attributes (a textual dimension) as well as numeric attributes (a spatial dimension). We address this problem by using the expected distance between a random pair of objects rather than constructing the actual centroid of each cluster. Our experimental results show that the clustering quality of our algorithm is comparable to that of other k-partitioning algorithms that can process spatio-textual data, and its efficiency is superior to those competitors.
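The core difficulty (keyword sets have no numeric mean, so a cluster has no true centroid) can be made concrete with a small sketch. This is not the paper's expected-distance construction, which avoids computing all pairwise distances; instead, this naive variant assigns each object to the cluster with the smallest average distance to its current members, mixing Euclidean spatial distance with Jaccard textual distance via a hypothetical weight `w`.

```python
import math

def spatio_textual_kmeans(objects, k, w=0.5, iters=10):
    """k-partitioning of spatio-textual objects (naive illustrative variant).

    objects: list of ((x, y), keyword_set) pairs.
    Returns a cluster label per object.
    """
    def dist(a, b):
        (xa, ya), ta = a
        (xb, yb), tb = b
        spatial = math.hypot(xa - xb, ya - yb)
        union = ta | tb
        textual = 1 - len(ta & tb) / len(union) if union else 0.0
        return w * spatial + (1 - w) * textual

    # Seed: assign each object to its nearest of the first k objects.
    labels = [min(range(k), key=lambda c: dist(o, objects[c])) for o in objects]
    for _ in range(iters):
        clusters = [[o for o, l in zip(objects, labels) if l == c]
                    for c in range(k)]
        new_labels = []
        for o in objects:
            # No centroid exists, so use the average distance to members.
            costs = [sum(dist(o, m) for m in clusters[c]) / len(clusters[c])
                     if clusters[c] else float("inf")
                     for c in range(k)]
            new_labels.append(costs.index(min(costs)))
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

The average-distance step is quadratic per iteration, which is exactly what the paper's expected-distance formulation is designed to sidestep at scale.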
Papers by Chin-Wan Chung