Instructor:
Bamshad Mobasher
Email: mobasher@cs.depaul.edu
Office: Loop Campus, CTI Building, Room 655
Phone: (312) 362-5174
Office Hours: Monday 2-3 PM; Wednesday 5-6:30 PM (or by appointment)
Course Syllabus in MS Word format.
Description and Objectives:
Web data mining has recently become one of the hottest areas in Computer
Science because of its direct applications in e-commerce, information
retrieval/filtering, and Web information systems. This course will provide
comprehensive coverage of various topics in Web data mining, including
Web usage mining, Web content mining, and Web data management and warehousing.
Specifically, we will consider techniques from machine learning, statistics,
databases, and information retrieval to extract useful knowledge from Web data
which could be used for business intelligence, site management, personalization,
and user profiling. We also discuss how to harness semi-structured and
heterogeneous data on the Web through techniques based on text mining and
meta-data representation and manipulation using XML. The course will be
self-contained and will provide a basic overview of relevant techniques from
machine learning, data mining, and information retrieval. This course will
count as a level II course for both the AI and the Database concentrations.
Textbooks and Reading Material:
Prerequisites:
CSC 416 and CSC 449; or permission of instructor.
Grading Policy:
The final grade will be determined (tentatively) based on the following components:
Midterm Exam = 30%
Assignments = 40%
Final Project = 30%
The general grading scheme will be based on a curve, but the grade cutoffs will
be no higher than: A = 90-100%, B = 80-89%, C = 65-79%, D = 50-64%, F = 0-49%.
At the end of the quarter, some adjustments may be made based on overall class
performance as well as signs of individual effort.
Plusses and minuses will be given at the high/low ends of each grade range.
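As a rough illustration of how the component weights above combine, here is a minimal Python sketch. The weights and cutoffs are taken from the tentative policy above; the function names and the example scores are hypothetical, and the sketch assumes each component is scored as a percentage from 0 to 100. It applies the stated upper-bound cutoffs directly and does not model the curve, end-of-quarter adjustments, or plusses and minuses, so actual grades may differ.

```python
# Component weights from the (tentative) grading policy.
WEIGHTS = {"midterm": 0.30, "assignments": 0.40, "project": 0.30}

# Upper-bound cutoffs from the policy: lowest percentage for each letter.
# Assumption: the curve can only lower these, so this is a worst case.
CUTOFFS = [(90, "A"), (80, "B"), (65, "C"), (50, "D")]


def weighted_score(scores):
    """Combine component percentages (0-100) into one overall percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percent):
    """Map an overall percentage to a letter using the stated cutoffs."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"


# Hypothetical scores, for illustration only.
example = {"midterm": 80, "assignments": 90, "project": 85}
total = weighted_score(example)
print(round(total, 1), letter_grade(total))
```

Because the cutoffs are described as upper bounds, the letter returned here should be read as the lowest grade the stated scale would assign for a given overall percentage.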
Exams/Assignments:
The midterm exam will cover material from the textbook as well as material from other
assigned readings, class discussions, and lectures. The tentative date of
the exam is Thursday, October 26. The exam will be open-book/notes. In general,
make-up exams will only be given with prior approval or in cases of emergencies.
There will be 2-3 assignments throughout the quarter involving the concepts and
techniques discussed in class. The assignments may involve experimenting with various
tools, as well as other written or problem-oriented exercises. These
assignments must be done individually. Late assignments will be penalized 10% per day
(with weekends counting as one day). The late penalty will start accruing
on the weekend following the due date for the assignment.
Course Project:
For the class project, students will have a choice of either an implementation project
(involving the implementation, improvement, combination and/or application of one or more
of the techniques discussed in class) or a term paper (involving either a detailed
survey of one or more topics related to class material or an in-depth case study examining
the application of Web data mining techniques in a particular domain).
Projects may be done individually or in groups of 2 people (depending on the complexity of the
project). Each group or individual will submit a specific project proposal for approval
no later than Monday, October 16. More detail about the possible project options is
available in the Project section.
Tentative List of Topics:
The following issues and topics will be covered throughout the course. Many of these
topics will be revisited several times during the course in a variety of contexts.
Data Mining and Knowledge Discovery
- The KDD process and methodology
- Data preparation for knowledge discovery
- Overview of data mining techniques
- Market basket analysis
- Classification and prediction
- Clustering
- Memory-based reasoning
- Evaluation and interpretation
Web Usage Mining
- Data collection and sources of data
- Data preparation for usage mining
- Mining navigation patterns
- Dealing with e-commerce data
- User tracking and profiling
- e-Metrics: measuring success in e-commerce
- Privacy issues
Web Content Mining
- Basic concepts from information retrieval and filtering
- Information retrieval models and techniques
- Text mining and knowledge discovery from semi-structured data sources
- Knowledge representation and meta-data
- Mining structural information on the Web
- Web agents for information filtering
- Web content servers and adaptive hypertext
Web Mining Applications and Other Topics
- Data integration for e-commerce
- Web data warehousing
- Web personalization and recommender systems
- Review of tools, applications, and systems