-
WEKA
WEKA is an open-source data mining package containing a full collection of machine learning algorithms for solving various
data mining problems. It is written in Java and runs on almost any
platform. The algorithms can either be applied directly to a dataset or
called from your own Java code. It includes implementations of schemes
for classification, association rule discovery, clustering, and prediction,
among others. The full distribution of WEKA, as well as additional information and supporting material, can be found at the official
WEKA Web site. The site also includes
additional data sets (from the UCI data repository) already converted into the ARFF format
used by WEKA.
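For orientation, an ARFF file is plain text: a header that names the relation and declares each attribute, followed by a data section with one comma-separated row per instance. The sketch below reads a tiny ARFF string in Python. The sample data and the helper name parse_arff are invented for illustration, and the parser covers only the simplest subset of the format (no quoted values, no sparse rows); it is not a substitute for WEKA's own loader.

```python
def parse_arff(text):
    """Parse a very small subset of ARFF; returns (relation, attributes, rows)."""
    relation = None
    attributes = []   # list of (name, type) pairs, in declaration order
    rows = []         # each row is a list of raw string values
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and % comments
            continue
        lower = line.lower()                   # ARFF keywords are case-insensitive
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, typ = line.split(None, 2)
            attributes.append((name, typ))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Hypothetical toy data set, made up for this example.
SAMPLE = """\
% toy data, invented for illustration
@relation weather
@attribute outlook {sunny,overcast,rainy}
@attribute temperature numeric
@attribute play {yes,no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, attributes, rows)
```

Nominal attributes list their allowed values in braces, while numeric attributes are declared with the keyword numeric; the parser above keeps both as raw strings.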
-
Clustering and Profile Generation Tools
This is a set of programs developed here for clustering and for generating
profiles based on the clustering results. The set also includes programs that
help characterize the generated clusters. Documentation for each program and
some example data sets are included in the distribution. All of the programs
and the documentation are packaged in a single Zip archive.
-
ODBCMine
ODBCMine is a data mining tool for classification that generates decision trees from ODBC
databases using the C4.5 classification algorithm. It analyzes the data in any ODBC
data source and creates graphical decision trees in Scalable Vector Graphics (SVG) format.
-
Magnum Opus
Magnum Opus is a tool for finding association rules from data. It uses
a highly efficient search algorithm for fast association rule
discovery and does not rely on sparse data for efficient processing.
More information on Magnum Opus, as well as an evaluation version
for download, is available from
G.I. Webb & Associates.
-
See5/C5.0
See5 is the commercial version of the C4.5 decision tree algorithm
developed by Ross Quinlan. See5/C5.0 classifiers are expressed as
decision trees or sets of if-then rules. RuleQuest provides C source
code so that classifiers constructed by See5/C5.0 can be embedded in
your own systems.
-
More information on See5/C5.0, as well as an evaluation version
for download, is available from the RuleQuest site.
-
The evaluation version can also be downloaded locally from here (zip archive). Note that the
evaluation version is limited to 200 cases. Please read the Help
files for the program to become familiar with the data format (the
distribution includes sample data sets). Also available locally (and
from the RuleQuest site) is the file
see5-public.zip, which contains
public source code and binary programs for applying the classifier
(obtained after running See5) to new cases. Please read the
documentation to see how these programs can be used.
-
Cubist
Yet another program from RuleQuest. Cubist builds rule-based predictive
models that output numeric values, complementing See5/C5.0, which predicts
categories. For instance, See5/C5.0 might classify the yield from some
process as "high", "medium", or "low", whereas Cubist would output a
number such as 73%.
-
CBA
CBA is a data mining tool for the discovery of association rules and
for classification. The classification technique used in CBA is based
on a subset of the discovered association rules. CBA implements two
versions of the Apriori algorithm (one using a single minimum support
parameter, and another using multiple minimum supports at different
levels). It also includes features for visualizing association rules
using a tree structure. A paper describing the multiple minimum support
feature can be found here in
PostScript format, and another paper describes the Association Based Classification method.
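To make the single-minimum-support variant concrete, here is a minimal sketch of the textbook Apriori frequent-itemset search in Python. It is not CBA's code: the function name apriori, the toy transactions, and the 0.5 threshold are all assumptions chosen for illustration, and the multiple-minimum-support variant is not covered.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        candidate = frozenset([item])
        s = support(candidate)
        if s >= min_support:
            current[candidate] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep the candidates that meet the threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market-basket data, made up for this example.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent = apriori(baskets, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With this threshold, each of the four single items is frequent, and the only frequent pair is bread with milk, which appears in three of the five baskets. Association rules would then be generated from these frequent itemsets in a separate step.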