Ricardo Baeza-Yates

Pompeu Fabra University, DTIC, Faculty Member

Papers by Ricardo Baeza-Yates

The Components and Impact of Sponsored Search

by Jim Jansen, Theresa Clarke, and Ricardo Baeza-Yates

Computer, 2000

Sponsored search advertising has dramatically impacted search engines, consumers, and organizatio...


A model and a visual query language for structured text

Proceedings. String Processing and Information Retrieval: A South American Symposium (Cat. No.98EX207), 1998

We present a new model to query document databases by content and structure. The main merits of the model are: it allows rich structure in the documents; the query algebra is intuitive (moreover, complemented by a visual query language) and powerful; it is efficiently implementable; it can be built on top of a traditional indexing system or even with no index at all; it is strongly oriented to user-definable relevance ranking instead of boolean logic; and it allows flexible visualization of results in terms of structure, contents and highlighting of user-defined important parts in the query.


Modern Information Retrieval: Addison Wesley

Web Spam Challenge

by Ricardo Baeza-Yates and Brian D Davison

Evolución de la Web Chilena 2001-2002

Evolution of the Web Structure

World Wide Web Conference Series, 2003

Concurrent query processing using distributed inverted files

by Ricardo Baeza-Yates and N. Ziviani

String Processing and Information Retrieval, 2001

Capacity Planning for Vertical Search Engines

by Ricardo Baeza-Yates and N. Ziviani

Computing Research Repository, 2010

Vertical search engines focus on specific slices of content, such as the Web of a single country or the document collection of a large corporation. Despite this, like general open web search engines, they are expensive to maintain, expensive to operate, and hard to design. Because of this, predicting the response time of a vertical search engine is usually done ...


A model for visualizing large answers in WWW retrieval

Proceedings SCCC'98. 18th International Conference of the Chilean Society of Computer Science (Cat. No.98EX212), 1998

In this paper we present a model for visualizing large collections of documents in World Wide Web retrieval, independently of the retrieval system. Our proposal eases the use of visualization tools, which partially solve the problem of data overload on the Internet. We present a specific software architecture to separate the user interface from the retrieval ...

A model and software architecture for search results visualization on the WWW

Proceedings Seventh International Symposium on String Processing and Information Retrieval. SPIRE 2000, 2000

We analyze the dependency problem of the user interface with the information retrieval software. Our approach allows the separation of the user interface from the retrieval component. This is useful when the user wants to select an interface or visualization metaphor that could not always be available for different information retrieval systems. We present a model for visualizing large collections ...

Alternative implementation techniques for Web text visualization

Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726), 2003

We present an approach for building text visualizations that avoids using plug-ins or clients based on languages like Java. Instead we propose to make the search engine application more aware of the visualization process and use the web browser standard features to do the rendering work. We demonstrate the ideas with a text visualization metaphor implementation that is part of ...

New Models and Algorithms for Multidimensional Approximate Pattern Matching

Journal of Discrete Algorithms, 2000

We focus on how to compute the edit distance (or similarity) between two images and the problem of approximate string matching in two dimensions, that is, to find a pattern of size in a text of size with at most errors (character substitutions, insertions and deletions). Pattern and text are matrices over an alphabet of size . We present ...

A language for queries on structure and contents of textual databases

Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '95, 1995


The impact of caching on search engines

Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07, 2007

In this paper we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the ...

The effect of links on networked user engagement

by Ricardo Baeza-Yates, G. Dupret, and Pinard Donmez

Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion, 2012

In the online world, user engagement refers to the phenomena associated with being captivated by a web application and wanting to use it longer and more frequently. Nowadays, many providers operate multiple content sites, very different from each other. Due to their extremely varied content, these are usually studied and optimized separately. However, user engagement should be examined not only within individual sites, but also across sites, that is, across the entire content provider network. In previous work, we investigated networked user engagement, by defining a global measure of engagement that captures the effect that sites have on the engagement on other sites within the same browsing session. Here, we look at the effect of links on networked user engagement, as these are commonly used by online content providers to increase user engagement.


Load-Balancing and Caching for Collection Selection Architectures

by Ricardo Baeza-Yates and Diego Puppin

Proceedings of the 2nd International ICST Conference on Scalable Information Systems, 2007

To address the rapid growth of the Internet, modern Web search engines have to adopt distributed organizations, where the collection of indexed documents is partitioned among several servers, and query answering is performed as a parallel and distributed task. Collection selection can be a way to reduce the overall computing load, by finding a trade-off between the quality of results retrieved and the cost of solving queries. In this paper, we analyze the relationship between the collection selection strategy, the effect on load balancing and on the caching subsystem, by exploring the design-space of a distributed search engine based on collection selection. In particular, we propose a strategy to perform collection selection in a load-driven way, and a novel caching policy able to incrementally refine the effectiveness of the results returned for each subsequent cache hit. The combination of load-driven collection selection and incremental caching strategies allows our system to retrieve two thirds of the top-ranked results returned by a baseline centralized index, with only one fifth of the computing workload.


Admission Policies for Caches of Search Engine Results

Lecture Notes in Computer Science, 2007

This paper studies the impact of the tail of the query distribution on caches of Web search engines, and proposes a technique for achieving higher hit ratios compared to traditional heuristics such as LRU. The main problem we solve is that of identifying infrequent queries, which reduce the hit ratio because caching them often does not lead to hits. To mitigate this problem, we introduce a cache management policy that employs an admission policy to prevent infrequent queries from taking the space of more frequent queries in the cache. The admission policy uses either stateless features, which depend only on the query, or stateful features based on usage information. The proposed management policy is more general than existing policies for caching of search engine results, and it is fully dynamic. The evaluation results on two different query logs show that our policy achieves higher hit ratios when compared to previously proposed cache management policies.


Query-sets

Proceeding of the 17th international conference on World Wide Web - WWW '08, 2008

Website Privacy Preservation for Query Log Publishing

by Ricardo Baeza-Yates and Myra Spiliopoulou

Lecture Notes in Computer Science, 2008

In this paper we study privacy preservation for the publication of search engine query logs. In particular, we introduce a new privacy concern, which is that of website privacy (or business privacy). We define the possible adversaries that could be interested in disclosing website information and the vulnerabilities found in the query log, from which they could benefit. In this work we also detail anonymization techniques to protect website information, and explore the different types of attacks that an adversary could use. We then present a graph-based heuristic to validate the effectiveness of our anonymization method, and perform an experimental evaluation of this approach. Our experimental results show that the query log can be appropriately anonymized against a specific attack for website exposure, by only removing approximately 9% of the total volume of queries and clicked URLs.


Very fast and simple approximate string matching

... Yates and G. Navarro, Faster approximate string matching. Algorithmica 23 2 (1999), pp. 1271...
