- Clustering and Profile Generation Tools
This is a set of programs developed here for clustering and for generating profiles based on the clustering results. The set also includes some programs to assist in characterizing the generated clusters. The documentation for each program and some example data sets are included in the distribution. All of these programs and the documentation are packaged in a single zip archive.
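The programs in the archive are not reproduced here. As an illustration only, one common way to derive profiles from a clustering is to run k-means and treat each cluster centroid as a profile, keeping the attributes whose centroid weight meets a threshold. The sketch below assumes numeric feature vectors (e.g. pageview weights); the function names, the farthest-first initialization, and the 0.5 threshold are hypothetical choices, not taken from the distribution.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic initialization: start from the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Plain k-means; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def profiles(centroids, attribute_names, threshold=0.5):
    """Hypothetical profile: (attribute, weight) pairs whose centroid weight
    is at least `threshold`, heaviest first."""
    result = []
    for centroid in centroids:
        profile = [(name, w) for name, w in zip(attribute_names, centroid) if w >= threshold]
        profile.sort(key=lambda item: -item[1])
        result.append(profile)
    return result
```

For example, six sessions over the pages `home`, `products`, `support`, and `faq` that split into a shopping group and a help group yield two profiles, one dominated by `home`/`products` and one by `faq`/`support`.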
- WEKA
Weka is a collection of machine learning algorithms for solving various data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code. It includes several implemented schemes for classification, association rule discovery, clustering, prediction, etc.
- The latest beta version, as well as the stable version (3.0), can be downloaded from the WEKA Web site. The site also includes additional data sets (from the UCI data repository) already converted into the ARFF format used by Weka.
- You can also download the beta version (which includes a Java-based GUI) locally from here (in Netscape, use SHIFT-CLICK to download a jar file). Note that in this version the GUI requires Swing components, which are part of the Java 2 platform. Once you have downloaded the jar file, you can extract it with the command "jar -xf <filename>.jar".
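ARFF is a plain-text format: an `@relation` line naming the data set, one `@attribute` line per column (declared `numeric` or as a set of nominal values in braces), then `@data` followed by comma-separated rows, with `?` marking a missing value. The sketch below writes your own data in that layout so it can be used with Weka; the helper name `to_arff` and the sample weather rows are hypothetical, and it assumes attribute names and values contain no spaces, commas, or quotes (which would otherwise need quoting).

```python
def to_arff(relation, attributes, rows):
    """Serialize rows as ARFF text (hypothetical helper).

    attributes: list of (name, kind), where kind is "numeric" or a list of
    nominal values. None in a row is written as "?" (a missing value).
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: the allowed values go inside braces.
            lines.append(f"@attribute {name} {{{','.join(kind)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff(
        "weather",
        [("outlook", ["sunny", "overcast", "rainy"]),
         ("temperature", "numeric"),
         ("play", ["yes", "no"])],
        [["sunny", 85, "no"], ["overcast", None, "yes"]],
    ))
```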
- Magnum Opus
Magnum Opus is a tool for finding association rules from data. It uses the highly efficient OPUS search algorithm for fast association rule discovery and does not rely on sparse data for efficient processing.
- More information on Magnum Opus, as well as an evaluation version for download, can be found at the RuleQuest site.
- The evaluation version can also be downloaded locally from here (self-extracting archive). Note that the evaluation version is limited to 1000 cases (transactions). Be sure to read the program's Help files to become familiar with the data format (the distribution includes a sample data set).
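The OPUS search itself is not reproduced here. The sketch below only shows how the standard measures for judging an association rule X → Y are computed from transactions: support (fraction of transactions containing X and Y), confidence (of those containing X, the fraction also containing Y), lift (confidence divided by the support of Y), and leverage (support minus the value expected if X and Y were independent). The function name and the small basket data are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence, lift and leverage of the rule antecedent -> consequent."""
    sets = [set(t) for t in transactions]
    n = len(sets)
    x, y = set(antecedent), set(consequent)
    count_x = sum(1 for t in sets if x <= t)
    count_y = sum(1 for t in sets if y <= t)
    count_xy = sum(1 for t in sets if (x | y) <= t)

    support = count_xy / n
    confidence = count_xy / count_x if count_x else 0.0
    sup_x, sup_y = count_x / n, count_y / n
    lift = confidence / sup_y if sup_y else 0.0
    # Leverage: observed co-occurrence minus what independence would predict.
    leverage = support - sup_x * sup_y
    return {"support": support, "confidence": confidence, "lift": lift, "leverage": leverage}


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    print(rule_measures(baskets, {"bread"}, {"milk"}))
```

In this toy data, bread → milk has confidence 2/3 but a lift below 1 (and negative leverage), because milk is already bought in 3 of the 4 baskets; a high-confidence rule is not necessarily an interesting one.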
- See5/C5.0
See5 is the commercial version of the C4.5 decision tree algorithm developed by Ross Quinlan. See5/C5.0 classifiers are expressed as decision trees or sets of if-then rules. RuleQuest provides C source code so that classifiers constructed by See5/C5.0 can be embedded in your own systems.
- More information on See5/C5.0, as well as an evaluation version for download, can be found at the RuleQuest site.
- The evaluation version can also be downloaded locally from here (zip archive). Note that the evaluation version is limited to 200 cases. Please read the program's Help files to become familiar with the data format (the distribution includes sample data sets).
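C4.5, on which See5/C5.0 is based, chooses the attribute to split on using the gain ratio: the information gain of the split divided by the split information (the entropy of the partition sizes themselves), which penalizes attributes with many distinct values. The sketch below covers nominal attributes only; See5/C5.0's handling of numeric attributes, missing values, pruning, and rule generation is not shown, and the function names and sample rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attr_index, labels):
    """Gain ratio of splitting `rows` on the nominal attribute at `attr_index`."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    # Expected entropy after the split, weighted by partition size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels):
    """Index of the attribute with the highest gain ratio."""
    return max(range(len(rows[0])), key=lambda i: gain_ratio(rows, i, labels))
```

For instance, with rows `[["sunny", "a"], ["sunny", "b"], ["rainy", "a"], ["rainy", "b"]]` and classes `["no", "no", "yes", "yes"]`, the first attribute separates the classes perfectly (gain ratio 1.0) while the second carries no information (gain ratio 0.0), so the first would be chosen for the root split.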
- Cubist
Yet another program from RuleQuest. Cubist builds rule-based predictive models that output numeric values, complementing See5/C5.0, which predicts categories. For instance, See5/C5.0 might classify the yield from some process as "high", "medium", or "low", whereas Cubist would output a number such as 73%.
- Information on Cubist and the evaluation version for download can be found at the RuleQuest site.
- A local download of the evaluation version (limited to 200 cases) is available here (zip archive).
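To make the contrast with See5/C5.0 concrete, the sketch below shows one way a rule-based model can output a number: each rule pairs a condition with a small linear model, and a case's prediction is the average of the outputs of all rules whose conditions it satisfies. This is a simplified illustration under that assumption, not Cubist's actual model-building or prediction procedure; the `Rule` class, the `temp` attribute, and all coefficients are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Rule:
    """Hypothetical rule: if `condition(case)` holds, predict with a linear model."""
    condition: Callable[[Dict[str, float]], bool]
    intercept: float
    coefficients: Dict[str, float] = field(default_factory=dict)

    def output(self, case):
        return self.intercept + sum(w * case[name] for name, w in self.coefficients.items())


def predict(rules, case, default=None):
    """Average the outputs of every rule that covers `case`; `default` if none do."""
    outputs = [r.output(case) for r in rules if r.condition(case)]
    if not outputs:
        return default
    return sum(outputs) / len(outputs)


if __name__ == "__main__":
    # Made-up yield model: yield rises with temperature up to 50, then falls.
    yield_rules = [
        Rule(lambda c: c["temp"] <= 50, 40.0, {"temp": 0.5}),
        Rule(lambda c: c["temp"] > 50, 90.0, {"temp": -0.2}),
    ]
    print(predict(yield_rules, {"temp": 40}))
```

Where a classifier would map the same cases to "low", "medium", or "high", this model returns a numeric yield directly.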
- CBA
CBA is a data mining tool for discovering association rules and for classification. The classification technique used in CBA is based on a subset of the discovered association rules. CBA implements two versions of the Apriori algorithm (one using a single minimum support parameter, and another using multiple minimum supports at different levels). It also includes features for visualizing association rules using a tree structure. A paper describing the multiple minimum support feature can be found here in PostScript format, and another paper describes the association-based classification method.
- More information on CBA can be found at the DMII site.
- A local download of the full educational version of CBA is available here (zip archive).
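The single-minimum-support version of Apriori mentioned above works level by level: find the frequent single items, then repeatedly join frequent (k-1)-itemsets into k-item candidates, prune any candidate that has an infrequent (k-1)-subset, and count the survivors against the data. The sketch below follows that scheme in plain Python; it is not CBA's implementation, does not cover the multiple-minimum-support variant, and the function name and basket data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least `min_support`."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def frequent_among(candidates):
        # One pass over the data counts every candidate of the current size.
        counts = {c: 0 for c in candidates}
        for t in sets:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    frequent = frequent_among({frozenset([item]) for t in sets for item in t})
    result = dict(frequent)
    size = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (size-1)-itemsets with exactly `size` items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, size - 1))
        }
        frequent = frequent_among(candidates)
        result.update(frequent)
        size += 1
    return result
```

With the baskets `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}`, `{milk}` and a minimum support of 0.5, this finds the three single items plus `{bread, milk}` and `{bread, butter}`; `{milk, butter}` appears in only one basket, so the three-item candidate is pruned without being counted.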
- CViz
CViz is a visualization tool for clustering and analysis of multi-dimensional data sets, developed at the IBM Almaden Research Center. CViz is most valuable in situations where little or no information is known about the relationships between attributes and class, or among the attributes themselves. It gives the data analyst a unique tool for quickly viewing the entire set of data points across the most interesting dimensions. The download distribution comes with an excellent tutorial demonstrating its various features.
- More information on CViz and other tools can be found at the IBM AlphaWorks Web site.
- A local download of the evaluation version of CViz is available here (self-extracting archive).
- MLC++
MLC++ is a library of C++ classes for supervised machine learning tasks such as classification. It is used as the main engine for server-side data mining in SGI's MineSet product. A set of utilities built on the library, compiled for NT (and other platforms), is also available.
- More information on MLC++, including the source code and documentation, can be found at the SGI MLC++ site.
- Locally, you can download the source code as well as the set of MLC++ utilities for NT (zip archives). Be sure to read the "readme" files included with the distribution, as well as the documentation and compilation instructions available from the Web site above.
- ODBCMine
ODBCMine is a shareware program that generates decision rules from ODBC databases using the C4.5 classification algorithm. It analyzes the data in any ODBC data source and writes decision rules in ASCII to standard output. The current release is beta-test software and has not yet been fully tested.
- More information on ODBCMine can be found here.
- Locally you can download the program here (zip archive). The distribution includes a sample Access database used to demonstrate the program.