Data Mining Tools
AgentBase/Marketeer
AgentBase/Marketeer is, according to its designers, the industry's first second-generation data-mining product. It is based on emerging intelligent-agent technology. The system comes with a group of wizards that guide a user through the different stages of data mining, which makes it easy to use. AgentBase/Marketeer is primarily aimed at marketing applications. It uses several data-mining methodologies whose results are combined by intelligent agents. It can access data from all major sources, and it runs on Windows 95, Windows NT, and the Solaris operating system.
Autoclass III
Autoclass is an unsupervised Bayesian classification system for independent data. It seeks a maximum posterior probability to provide a simple approach to problems such as classification, clustering, and general mixture separation. It works on Unix platforms.
BusinessMiner
BusinessMiner is a single-strategy, easy-to-use tool based on decision trees. It can access data from multiple sources including Oracle, Sybase, SQL Server, and Teradata. BusinessMiner runs on all Windows platforms, and it can be used standalone or in conjunction with OLAP tools.
CART
CART is a robust data-mining tool that automatically searches for important patterns and relationships in large data sets and quickly uncovers hidden structures even in highly complex data sets. It works on the Windows, Mac, and Unix platforms.
Clementine
Clementine is a comprehensive toolkit for data mining. It uses neural networks and rule-induction methodologies. The toolkit includes data manipulation and visualization capabilities. It runs on Windows and Unix platforms and accepts data from Oracle, Ingres, Sybase, and Informix databases. A recent version offers sequence association and clustering for Web-data analyses.
DataEngine
DataEngine is a multiple-strategy data-mining tool for data modeling, combining conventional data-analysis methods with fuzzy technology, neural networks, and advanced statistical techniques. It works on the Windows platform.
Data Surveyor
Data Surveyor is a single-strategy (classification) tool. It consists of two components: a front-end and a back-end. The front-end is responsible for data mining using the tree-generation methodology. The back-end consists of a fast, parallel database server into which the data are loaded from a user's databases. The back-end runs on parallel Unix servers, and the front-end works with Unix and Windows platforms.
DataMind
DataMind's architecture consists of two components: DataCruncher for server-side data mining and DataMind Professional for client-side specification and viewing of results. It can implement classification, clustering, and association-rule technologies. DataMind can be set up to mine data locally or on a remote server, where data are organized using any of the major relational databases.
Datasage
Datasage is a comprehensive data-mining product whose architecture incorporates a data mart in its data-mining server. The user accesses Datasage through an interface operating as a thin client, using either a Windows client or a Java-enabled browser client.
DBMiner
DBMiner is a publicly available tool for data mining. It is a multiple-strategy tool and it supports methodologies such as clustering, association rules, summarization, and visualization. DBMiner uses Microsoft SQL Server 7.0 Plato and runs on different Windows platforms.
Decision Series
Decision Series is a multiple-strategy tool that uses artificial neural networks, clustering algorithms, and genetic algorithms to perform data mining. It can operate on scalable, parallel platforms to provide speedy solutions. It runs on standard industry platforms such as HP, SUN, and DEC, and it supports most of the commercial, relational database-management systems.
Decisionhouse
Decisionhouse is a suite of tightly integrated tools that primarily support classification and visualization processes. Various aspects of data preparation and reporting are included. It works on the Unix platform.
Delta Miner
Delta Miner is a multiple-strategy tool supporting clustering, summarization, deviation-detection, and visualization processes. A common application is the analysis of financial controlling data. It runs on Windows platforms, and it integrates new search techniques and "business intelligence" methodologies into an OLAP front-end.
Emerald
Emerald is a publicly available tool still used as a research system. It consists of five different machine-learning programs supporting clustering, classification, and summarization tasks.
Evolver
Evolver is a single-strategy tool. It uses genetic-algorithm technology to solve complex optimization problems. This tool runs on all Windows platforms and it is based on data stored in Microsoft Excel tables.
GainSmarts
GainSmarts uses predictive-modeling technology that can analyze past purchases and demographic and lifestyle data to predict the likelihood of response and other characteristics of customers.
IBM Datajoiner
Datajoiner allows the user to view multivendor databases (relational and nonrelational, local and remote, geographically distributed) as local databases, and to access and join tables without knowing the source locations.
KATE
KATE is a single-strategy, rule-based tool consisting of four components: KATE-editor, KATE-CBR, KATE-Datamining, and KATE-Runtime. It runs on Windows and Unix platforms, and it is applicable to several databases.
Kensington 2000
Kensington 2000 is an Internet-based knowledge-discovery and -management platform for the analysis of large and distributed data sets.
Kepler
Kepler is an extensible, multiple-strategy data-mining system. The key element of its architecture is extensibility through a "plug-in" interface, which lets external tools be added without redeveloping the system core. The tool supports data-mining tasks such as classification, clustering, regression, and visualization. It runs on Windows and Unix platforms.
Knowledge Seeker
Knowledge Seeker is a single-strategy desktop or client/server tool relying on a tree-based methodology for data mining. It provides a nice GUI for model building and for letting the user explore data. It also allows users to export the discovered data model as text, an SQL query, or a Prolog program. It runs on Windows and Unix platforms, and accepts data from a variety of sources.
MATLAB NN Toolbox
This MATLAB extension implements an engineering environment (i.e., a computer-based environment that helps engineers solve their common tasks) for research in neural networks and for their design, simulation, and application. It offers various network architectures and different learning strategies. Classification and function approximation are typical data-mining problems that can be solved using this tool. It runs on Windows, Mac, and Unix platforms.
Marksman
Marksman is a single-methodology tool based on artificial neural networks. It provides a number of useful data-manipulation features, which are very important in preprocessing. Its design is optimized for the database-analysis needs of direct-marketing professionals, and it runs on PC/Windows platforms.
MARS
MARS is a logistic-regression tool for binary classification. It automatically handles missing values, detection of interaction between input variables, and transformation of variables.
MineSet
MineSet is a comprehensive tool for data mining. Its features include extensive data manipulation and transformation capabilities, various data-mining approaches, and powerful visualization capabilities. MineSet supports client/server architecture and runs on Silicon Graphics platforms.
NETMAP
NETMAP is a general purpose, information-visualization tool. It is most effective for large, qualitative, text-based data sets. It runs on Unix workstations.
Neuro Net
NeuroNet is publicly available software for experimentation with different artificial neural-network architectures and types.
NeuroSolutions V3.0
NeuroSolutions V3.0 provides a modular, icon-based environment for artificial neural-network design, and it solves data-mining problems such as classification, prediction, and function approximation. Its implementations are based on advanced learning techniques such as recurrent backpropagation and backpropagation through time. The tool runs on all Windows platforms.
OC1
OC1 is publicly available software for data mining. It is specially designed as a decision-tree induction system for applications where the samples have continuous feature values.
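Decision-tree induction over continuous feature values comes down, at each node, to choosing a numeric test that separates the classes well. The following is a minimal sketch of that idea, not the tool's own algorithm. It assumes a single feature, a test of the form value <= threshold, and Gini impurity as the score. The function names and the sample data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best test `value <= threshold`.

    Candidate thresholds are the midpoints between consecutive distinct
    sorted values; returns (None, inf) if no split is possible.
    """
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between two equal values
        threshold = (lo + hi) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical samples: one continuous feature and a class label per sample.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_threshold(values, labels))
```

A full tree builder would repeat this search over every feature and recurse on the two resulting subsets.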
OMEGA
OMEGA is a system for developing, evaluating, and implementing predictive models using the genetic-programming approach. It is suitable for the classification and visualization of data. It runs on all Windows platforms.
Partek
Partek is a multiple-strategy data-mining product. It is based on several methodologies including statistical techniques, neural networks, fuzzy logic, genetic algorithms, and data visualization. It runs on UNIX platforms.
Scenario
Scenario is a single-strategy tool that uses the tree-based approach to data mining. The GUI relies on wizards to guide a user through different tasks, and it is easy to use. It runs on Windows platforms.
Sipina-W
Sipina-W is publicly available software that includes different traditional data-mining techniques such as CART, Elisee, ID3, C4.5, and some new methods for generating decision trees.
SNNS
SNNS is publicly available software. It is a simulation environment for research on and application of artificial neural networks. The environment is available on Unix and Windows platforms.
SPIRIT
SPIRIT is a tool for exploration and modeling using Bayesian techniques. The system allows communication with the user in the rich language of conditional events. It works on Windows platforms.
SPSS
SPSS is one of the most comprehensive integrated tools for data mining. It has data-management and data-summarization capabilities and includes tools for both discovery and verification. The complete suite includes statistical methods, neural networks, and visualization techniques. It is available on a variety of commercial platforms.
S-Plus
S-Plus is an interactive, object-oriented programming language for data mining. Its commercial version supports clustering, classification, summarization, visualization, and regression techniques. It works on Windows and Unix platforms.
STATlab
STATlab is a single-strategy tool that relies on interactive visualization to help a user perform exploratory data analysis. It can import data from common relational databases and it runs on Windows, Mac, and Unix platforms.
STATISTICA-Neural Networks
STATISTICA-Neural Networks is a single-strategy tool that includes a standard backpropagation-learning algorithm and iterative procedures such as Conjugate Gradient Descent and Levenberg-Marquardt. It runs on all Windows platforms.
Strategist
Strategist is a tool based on Bayesian-network methodology to support different dependency analyses. It provides the methodology for integration of expert judgments and data-mining results, which are based on modeling of uncertainties and decision-making processes. It runs on all Windows platforms.
Syllogic
Syllogic Data Mining Tool is a toolbox that combines many data-mining methodologies and offers a variety of approaches to uncover hidden information. It includes several data-preprocessing and -transformation functions. It is available on Windows NT and Unix platforms and it supports most of the commercial relational databases.
TiMBL
TiMBL is publicly available software. It includes several memory-based learning techniques for discrete data. A representation of the training set is explicitly stored in memory, and new cases are classified by extrapolation from the most similar cases.
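The memory-based approach described above can be illustrated with a minimal sketch. This is not the tool's own code or interface. It assumes the simplest setup: training cases are stored verbatim, similarity is the number of mismatching discrete feature values (an overlap metric), and the label is chosen by majority vote among the k nearest stored cases. All names and data are hypothetical.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions where two discrete-valued cases differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify(memory, query, k=1):
    """Label `query` by majority vote among its k most similar stored cases."""
    ranked = sorted(memory, key=lambda case: overlap_distance(case[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: (features, label) pairs kept verbatim in memory.
memory = [
    (("sunny", "hot", "high"), "no"),
    (("sunny", "mild", "normal"), "yes"),
    (("rainy", "mild", "high"), "yes"),
    (("rainy", "cool", "normal"), "yes"),
    (("overcast", "hot", "high"), "yes"),
]

print(classify(memory, ("sunny", "hot", "high"), k=1))
print(classify(memory, ("sunny", "hot", "high"), k=3))
```

There is no training step: all the work happens at classification time, which is why the stored representation of the training set matters.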
TOOLDIAG
TOOLDIAG is a publicly available tool for data mining. It consists of several programs in C for statistical pattern recognition of multivariate numeric data. The tool is primarily oriented toward classification problems.
WINROSA
WINROSA is a software tool that complements many other tools available for building fuzzy logic systems. It automatically generates fuzzy rules from the available data set. It works on Windows platforms.
ViscoverySOMine
This single-strategy data-mining tool is based on self-organizing maps and is uniquely capable of visualizing multidimensional data. ViscoverySOMine supports clustering, classification, and visualization processes. It works on all Windows platforms.
Weka (2.2)
Weka is a software environment that integrates several machine-learning tools within a common framework and a uniform GUI. Classification and summarization are the main data-mining tasks supported by the Weka system.
WUM
WUM 6.0 is a publicly available integrated environment for Web-log preparation, querying, and visualization of summarized activities on a Web site.