Purpose - To identify the pros and cons of Google Scholar. Design/methodology/approach - Chronicles the recent history of the Google Scholar search engine from its inception in November 2004 and critiques its merits and demerits. Findings - Finds that there are massive content omissions at present but that, with future changes in its structure, Google Scholar will become an excellent free tool for scholarly information discovery and retrieval. Originality/value - Presents a useful analysis for potential users of the Google Scholar site.
... REVIEWED BY PABLO FERNANDEZ. First Scene: California, 1998. ... Departamento de Matemáticas, Facultad de Ciencias, Universidad Autónoma de Madrid, Ciudad Universitaria de Cantoblanco, 28049 Madrid, Spain. e-mail: [email protected] Mathematical Form: ...
Objective: To determine how often searching with Google (the most popular search engine on the world wide web) leads doctors to the correct diagnosis. Design: Internet-based study using Google to search for diagnoses; researchers were blind to the correct diagnoses.
Several approaches to collaborative filtering have been studied, but studies have seldom been reported for large (several million users and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. We combine recommendations from different algorithms using a linear model. Our approach is content agnostic and consequently domain independent, making it easily adaptable for other applications and languages with minimal effort. This paper will describe our algorithms and system setup in detail, and report results of running the recommendations engine on Google News.
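The MinHash idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of SHA-256 as a stand-in for random permutations, and the sample click histories are all assumptions made for illustration. The sketch computes a short signature per user from the set of items they clicked; users whose signatures agree in many positions are likely to have a high Jaccard overlap, which is what makes signatures usable as cluster keys.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    # Hypothetical stand-in for a random permutation: a seeded 64-bit hash.
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: set, num_hashes: int = 16) -> tuple:
    """Return one minimum hash value per seed for a non-empty item set."""
    if not items:
        raise ValueError("items must be non-empty")
    return tuple(min(_hash(seed, item) for item in items)
                 for seed in range(num_hashes))


def estimated_jaccard(sig_a: tuple, sig_b: tuple) -> float:
    """Fraction of signature positions that agree (estimates Jaccard overlap)."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


# Made-up click histories, used only to exercise the functions.
alice = {"story-1", "story-2", "story-3", "story-4"}
bob = {"story-2", "story-3", "story-4", "story-5"}
print(jaccard(alice, bob), estimated_jaccard(minhash_signature(alice),
                                             minhash_signature(bob)))
```

With more hash functions the estimate tends toward the exact Jaccard value; a clustering scheme would typically group users by a few concatenated signature positions rather than compare every pair.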
IEEE Transactions on Knowledge and Data Engineering, 2007
Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of 'society' is 'database,' and the equivalent of 'use' is 'way to search the database.' We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts we use the world-wide-web as database, and Google as search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the world-wide-web using Google page counts. The world-wide-web is the largest database on earth, and the context information entered by millions of independent users averages out to provide automatic semantics of useful quality. We give applications in hierarchical clustering, classification, and language translation. We give examples to distinguish between colors and numbers, cluster names of paintings by 17th century Dutch masters and names of books by English novelists, the ability to understand emergencies, and primes, and we demonstrate the ability to do a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification using support vector machines to learn categories based on our Google distance, resulting in a mean agreement of 87% with the expert-crafted WordNet categories.
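As a rough illustration of how a distance can be derived from page counts, the sketch below computes the normalized Google distance in its commonly cited form, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) is the number of pages containing x, f(x, y) the number containing both terms, and N the total number of indexed pages. The function name `ngd`, the choice to return infinity when a count is zero, and the sample counts are assumptions for illustration; no search engine is queried.

```python
import math


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google distance from page counts.

    fx, fy: pages containing each term; fxy: pages containing both;
    n: total number of indexed pages (assumed larger than fx and fy).
    """
    if min(fx, fy, fxy) <= 0:
        # Convention chosen for this sketch: terms that never co-occur
        # (or never occur) are treated as infinitely far apart.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Made-up page counts, not real search results.
print(ngd(fx=1000, fy=1000, fxy=1000, n=10**6))  # terms always co-occur
print(ngd(fx=1000, fy=1000, fxy=10, n=10**6))    # terms rarely co-occur
```

Smaller values indicate terms that tend to appear on the same pages; the resulting pairwise distances can then feed a clustering or classification step such as those the abstract describes.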
Current approaches to object category recognition require datasets of training images to be manually prepared, with varying degrees of supervision. We present an approach that can learn an object category from just its name, by utilizing the raw output of image search engines available on the Internet. We develop a new model, TSI-pLSA, which extends pLSA (as applied to visual words) to include spatial information in a translation- and scale-invariant manner. Our approach can handle the high intra-class variability and large proportion of unrelated images returned by search engines. We evaluate the models on standard test sets, showing performance competitive with existing methods trained on hand-prepared datasets.
We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user-outlined object in a video. The object is represented by a set of viewpoint-invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject unstable regions and reduce the effects of noise in the descriptors.
We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.
Papers by Ouful SaiYf