Papers by Michael Hahsler
Journal of Business Economics, 2016
Association rule mining is one of the most popular data mining methods. However, mining association rules often results in a very large number of found rules, leaving the analyst with the task of going through all the rules to discover the interesting ones. Sifting manually through large sets of rules is time-consuming and strenuous. Visualization has a long history of making large amounts of data more accessible using techniques like selecting and zooming. However, most association rule visualization techniques still fall short when it comes to a large number of rules. In this paper we present a new interactive visualization technique that lets the user navigate through a hierarchy of groups of association rules. We demonstrate how this new visualization technique can be used to analyze large sets of association rules with examples from our implementation in the R package arulesViz.
* 42nd Symposium on the Interface: Statistical, Machine Learning, and Visualization Algorithms (Interface 2011)
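The idea of navigating a hierarchy of rule groups can be illustrated with a small Python sketch. This is not the grouping method implemented in arulesViz. As a simplifying assumption, the sketch forms the top level by collecting rules that share the same consequent. It then orders the groups by mean lift and orders the rules within each group by lift. The `Rule` class and the toy rules are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: frozenset  # antecedent items
    rhs: str        # single consequent item
    support: float
    lift: float


def group_by_consequent(rules):
    """Return (consequent, mean lift, member rules) groups, strongest group first."""
    groups = defaultdict(list)
    for rule in rules:
        groups[rule.rhs].append(rule)
    summary = []
    for rhs, members in groups.items():
        mean_lift = sum(r.lift for r in members) / len(members)
        # Order rules inside each group so a user can "zoom in" on the strongest ones.
        members.sort(key=lambda r: r.lift, reverse=True)
        summary.append((rhs, mean_lift, members))
    # Top level of the two-level hierarchy: groups ordered by mean lift.
    summary.sort(key=lambda g: g[1], reverse=True)
    return summary


# Hypothetical toy rules for illustration only.
rules = [
    Rule(frozenset({"bread"}), "butter", 0.10, 1.8),
    Rule(frozenset({"bread", "jam"}), "butter", 0.05, 2.4),
    Rule(frozenset({"beer"}), "chips", 0.08, 1.2),
]
for rhs, mean_lift, members in group_by_consequent(rules):
    print(f"{len(members)} rules => {rhs} (mean lift {mean_lift:.2f})")
```

Grouping by consequent keeps the sketch short. A real tool would also need to merge similar antecedents, so that a group with many rules still fits on screen.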
myVU is a next-generation recommender system based on observed consumer behavior and interactive evolutionary algorithms. It implements customer relationship management and one-to-one marketing in the educational and scientific broker system of a virtual university. myVU provides a personalized, adaptive WWW-based user interface for all members of a virtual university and delivers routine recommendations for frequently used scientific and educational Web sites.
Recent advances in metagenomics and the human microbiome present a complex landscape in which a multitude of genomes must be handled at once. One of the many challenges in this field is classifying the genomes present in a sample. Effective metagenomic classification and diversity analysis require complex representations of taxa. With this package we develop a suite of tools based on novel quasi-alignment techniques. These tools rapidly classify organisms on a laptop computer instead of several multiprocessor servers. This approach will facilitate the development of fast and inexpensive devices for microbiome-based health screening in the near future.
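Alignment-free classification can be sketched by comparing k-mer (length-k substring) count profiles. This is a generic stand-in chosen for illustration, not the quasi-alignment technique of the package described above. The following choices are all assumptions:

- k = 3;
- cosine similarity as the comparison measure;
- assignment to the nearest reference;
- the toy reference sequences.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Count overlapping k-mers; k-mers containing non-ACGT symbols are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed feature order: all 4**k possible k-mers over the ACGT alphabet.
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]


def cosine(a, b):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(read, references, k=3):
    """Assign a read to the reference taxon whose k-mer profile is most similar."""
    profile = kmer_profile(read, k)
    return max(references,
               key=lambda taxon: cosine(profile, kmer_profile(references[taxon], k)))


# Hypothetical reference sequences for two taxa.
references = {"taxon_A": "ATATATATATATATAT", "taxon_B": "GCGCGCGCGCGCGCGC"}
print(classify("TATATATA", references))
```

Because profiles have a fixed length of 4**k regardless of sequence length, comparisons stay cheap. This is the general reason alignment-free methods can run on modest hardware.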
ePubWU is an electronic platform for scholarly publications of the Wirtschaftsuniversität Wien, where research-related publications of WU are made accessible in full text via the WWW. ePubWU is operated as a joint project of the University Library of the Wirtschaftsuniversität Wien and the Department of Information Business (Abteilung für Informationswirtschaft). ePubWU currently collects two types of publications: working papers and dissertations. This contribution presents experiences from more than two years of the project's operation, covering areas such as acquisition, workflows, cataloguing, and dissemination.
Association rule mining is a popular data mining method available in R as the extension package arules. However, mining association rules often results in a very large number of found rules, leaving the analyst with the task of going through all the rules to discover the interesting ones. Sifting manually through large sets of rules is time-consuming and strenuous. Visualization has a long history of making large data sets more accessible using techniques like selecting and zooming. In this paper we present the R extension package arulesViz, which implements several known and novel visualization techniques to explore association rules. Using examples, we show how these visualization techniques can be used to analyze a data set.
J Stat Softw, 2005
Mining frequent itemsets and association rules is a popular and well researched approach to discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.
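The level-wise Apriori search, rule generation, and the maximal and closed itemset definitions can be sketched in plain Python. This is neither Borgelt's C implementation nor the arules interface. It rescans every transaction to count support and is only meant for toy data, such as the hypothetical baskets below. Two definitions are used throughout:

- **Support** is the fraction of transactions containing an itemset.
- **Confidence** of X => Y is supp(X ∪ Y) / supp(X).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while level:
        # Count the size-k candidates and keep those meeting the threshold.
        current = {c: support(c) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items ...
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # ... prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


def generate_rules(frequent, min_confidence):
    """Rules X => Y with confidence supp(X u Y) / supp(X) >= min_confidence."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), supp, conf))
    return out


def maximal(frequent):
    """Frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


def closed(frequent):
    """Frequent itemsets that have no proper superset with the same support."""
    return {s for s in frequent
            if not any(s < t and frequent[t] == frequent[s] for t in frequent)}


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_support=0.5)
for lhs, rhs, supp, conf in generate_rules(freq, min_confidence=0.6):
    print(sorted(lhs), "=>", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```

With min_support = 0.5 these four baskets yield five frequent itemsets. Of these, {bread, milk} and {bread, butter} are maximal. {butter} is not closed, because its only frequent superset, {bread, butter}, has the same support.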
Today, users of Internet information services such as Yahoo! or AltaVista often experience high search costs. One important reason for this is the need to browse long reference lists manually because of the well-known problems of relevance ranking. A possible remedy is to complement the references with automatically generated labels that provide valuable information about the referenced information source. Presenting suitably labelled lists of references to users aims at improving the clarity, and thus the comprehensibility, of the information offered, and at reducing search costs. In the following, we survey several dimensions for labelling (time, frequency of usage, region, language, subject, industry, and preferences) and the corresponding classification problems. For each problem, we sketch a pragmatic mix of machine learning methods to solve it automatically and report selected results.
Amazon.com paved the way for several large-scale, behavior-based recommendation services as an important value-added expert advice service for online book shops. In this contribution we discuss the effects of such services, including possible reductions of transaction costs. We also investigate how such a value-added service can be implemented in the context of scientific libraries. For this purpose we present a new, recently developed recommender system based on a stochastic purchase incidence model. We describe the underlying stochastic model from repeat-buying theory and analyze whether its assumptions about consumer behavior also hold for users of scientific libraries. We analyzed the log files of the Web-based online public access catalog (OPAC) of the library of the Universität Karlsruhe (TH), comprising approximately 85 million HTTP transactions since January 2001, and performed several diagnostic checks. A test prototype is already operational and is currently being evaluated. The recommender service will be fully operational within the library system of the Universität Karlsruhe (TH) by the end of June 2002.
In this contribution we transfer a customer purchase incidence model for consumer products, which is based on Ehrenberg's repeat-buying theory, to Web-based information products. Ehrenberg's repeat-buying theory successfully describes regularities in a large number of consumer product markets. We show that these regularities also exist in electronic markets for information goods. We further show that purchase incidence models provide a well-founded theoretical basis for recommender and alert services. The article consists of two parts. In the first part, Ehrenberg's repeat-buying theory and its assumptions are reviewed and adapted for Web-based information markets. In the second part, we present the empirical validation of the model based on data collected from the information market of the Virtual
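One way to turn the repeat-buying model into a recommender can be sketched as follows. The sketch assumes that, for a given target item, the co-purchase counts of the other items follow a logarithmic series distribution (LSD). Each count is the number of sessions in which that item was bought together with the target item. The LSD arises in Ehrenberg's framework as a limiting case of the zero-truncated negative binomial distribution. Items whose counts are implausibly large under the fitted LSD are flagged as candidates.

The moment-matching fit via bisection and the tail-probability threshold `alpha` are illustrative choices. They are not necessarily the procedure used in the papers above, and the item names and counts are hypothetical.

```python
import math


def lsd_pmf(r, q):
    """Logarithmic series distribution: P(R = r) = -q**r / (r * ln(1 - q)), r >= 1."""
    return -q ** r / (r * math.log1p(-q))


def lsd_mean(q):
    """Mean of the LSD: -q / ((1 - q) * ln(1 - q)); increases from 1 towards infinity."""
    return -q / ((1 - q) * math.log1p(-q))


def fit_lsd(mean):
    """Moment fit: find q in (0, 1) with lsd_mean(q) == mean by bisection."""
    if mean <= 1:
        raise ValueError("the LSD mean must exceed 1")
    lo, hi = 1e-12, 1 - 1e-12
    for _ in range(200):
        mid = (lo + hi) / 2
        if lsd_mean(mid) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def recommend(co_counts, alpha=0.01):
    """Flag items whose co-purchase count r has P(R >= r) < alpha under the fitted LSD.

    co_counts maps each other item to the number of sessions in which it was
    bought together with the target item (all counts >= 1).
    """
    q = fit_lsd(sum(co_counts.values()) / len(co_counts))
    flagged = []
    for item, r in co_counts.items():
        tail = 1 - sum(lsd_pmf(k, q) for k in range(1, r))  # P(R >= r)
        if tail < alpha:
            flagged.append(item)
    return flagged


# Hypothetical counts: many items co-occur once or twice by chance, "x" stands out.
counts = {f"i{j}": 1 for j in range(10)}
counts.update({f"j{j}": 2 for j in range(4)})
counts["k"] = 3
counts["x"] = 20
print(recommend(counts))
```

The fit uses only the sample mean, which keeps the sketch short. A diagnostic check of the kind the abstracts mention would additionally compare observed and expected frequencies of each count r, for example with a chi-square test.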
Recommender systems make an important contribution to shaping eMarketing activities. We begin with a discussion of input/output characteristics for describing such systems, which already allow a suitable distinction between practically relevant forms. From there, we motivate why such a characterization must be enriched by incorporating methodological aspects from marketing research. We present a recommender system based on the theory of repeat-buying behavior, as well as a system that generates recommendations by analyzing the navigation behavior of site visitors. Using the Amazon site as an example, we illustrate the marketing opportunities offered by recommender systems. To round off the discussion, we refer to further literature on recommender systems. An outlook indicates the directions in which further developments are planned.
Series: Special issue "E-Marketing". URL: http://vahlen.becksche.de/zeitschriften/
Uh, 2001
This contribution examines the role of recommender systems and their potential in the teaching, learning, and research environment of a virtual university. The main idea is to exploit the information-aggregation capabilities of recommender systems to automatically improve tutoring and advisory services in a virtual university. This personalizes the supervision and advising of students and makes these services available to a larger number of participants, while at the same time reducing the workload of teaching staff. The second part describes the recommender services of myVU, the collection of personalized services of the Virtual University (VU) of the Wirtschaftsuniversität Wien, together with their non-personalized variants. These services are based essentially on observed user behavior. In the personalized variant, they additionally rely on self-selection through self-assessment of experience in a subject area. Finally, the innovative use of such systems is discussed and illustrated with several scenarios.
PDF: http://michael.hahsler.net/research/unternehmenhochschule2001/uh2001.pdf. URL: http://www.gi-ev.de/
This paper describes the ecosystem of R add-on packages developed around the infrastructure provided by the package arules. The packages provide comprehensive functionality for analyzing interesting patterns, including frequent itemsets, association rules, and frequent sequences, and for building applications such as associative classification. After discussing the ecosystem's design, we illustrate the ease of mining and visualizing rules with a short example.
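Associative classification, one of the applications listed above, can be sketched as a CBA-style rule list. Class association rules are sorted by confidence and then by support. The first rule whose antecedent is contained in a transaction predicts the class. This plain-Python sketch is not the API of any arules extension package. The `ClassRule` type, the shorter-antecedent tie-breaker, the default class, and the weather-style toy rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassRule:
    lhs: frozenset      # items that must all be present
    label: str          # predicted class
    confidence: float
    support: float


def build_classifier(rules, default_label):
    """Return a predict function using a CBA-like rule precedence."""
    # Higher confidence first, then higher support, then shorter antecedent (assumed tie-breaker).
    ordered = sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.lhs)))

    def predict(transaction):
        items = frozenset(transaction)
        for rule in ordered:
            if rule.lhs <= items:
                return rule.label  # first (highest-precedence) matching rule wins
        return default_label       # no rule matches

    return predict


weather_rules = [
    ClassRule(frozenset({"outlook=sunny", "humidity=high"}), "no", 1.0, 0.2),
    ClassRule(frozenset({"outlook=overcast"}), "yes", 1.0, 0.3),
    ClassRule(frozenset({"windy=false"}), "yes", 0.75, 0.4),
]
predict = build_classifier(weather_rules, default_label="no")
print(predict({"outlook=sunny", "humidity=high", "windy=false"}))
```

In the usage line, both the sunny/high-humidity rule and the windy=false rule match, and the higher-confidence rule decides the class. Keeping the ordered list as the model makes each prediction traceable to a single rule.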
Analyzing stock market trends and sentiment is an interdisciplinary area of research undertaken by many disciplines, such as finance, computer science, statistics, and economics. It is well established that real-time news plays a strong role in the movement of stock prices. With the advent of electronic and online news sources, analysts have to deal with enormous amounts of real-time, unstructured streaming data. In this paper, we present an automated, text-mining-based approach that aggregates news stories from diverse sources into a news corpus. The corpus is filtered down to relevant sentences and analyzed using natural language processing (NLP) techniques. We propose a sentiment metric, called NewsSentiment, that uses the counts of positive- and negative-polarity words to measure the sentiment of the overall news corpus. We used various open-source packages and tools to develop both the news collection and aggregation engine and the sentiment evaluation engine. Extensive experiments were performed using news stories about various stocks. The time variation of NewsSentiment shows a very strong correlation with the actual stock price movement. Our proposed metric has many applications in analyzing current news stories and predicting stock trends for specific companies and sectors of the economy.
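The sentence filtering and the word-count metric can be sketched in Python. The abstract does not give the exact NewsSentiment formula. This sketch therefore assumes the normalized difference (P - N) / (P + N) of positive and negative word counts. The regex-based sentence splitting, the keyword filter, and the tiny polarity lexicons are hypothetical stand-ins for the NLP tools and word lists used in the paper.

```python
import re


def relevant_sentences(articles, keyword):
    """Split articles into sentences and keep those mentioning the keyword (case-insensitive)."""
    sentences = []
    for text in articles:
        # Naive splitter: break after ., ! or ? followed by whitespace.
        sentences.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip())
    return [s for s in sentences if keyword.lower() in s.lower()]


def news_sentiment(sentences, positive, negative):
    """Assumed metric (P - N) / (P + N) in [-1, 1]; 0.0 if no polarity word occurs."""
    p = n = 0
    for sentence in sentences:
        for token in re.findall(r"[a-z']+", sentence.lower()):
            if token in positive:
                p += 1
            elif token in negative:
                n += 1
    return (p - n) / (p + n) if p + n else 0.0


# Hypothetical lexicons and articles.
positive = {"beat", "gain", "strong"}
negative = {"loss", "miss", "weak"}
articles = ["Acme beat estimates. Weather was weak.",
            "Rival posted a loss! Acme outlook strong."]
relevant = relevant_sentences(articles, "Acme")
print(relevant, news_sentiment(relevant, positive, negative))
```

Computing the score per day over that day's relevant sentences would give the kind of time series that can be compared with price movements.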