DePaul University DePaul CTI Homepage

General Information


Instructor:

Bamshad Mobasher
Email: mobasher@cs.depaul.edu
Office: Loop Campus, CTI Building, Room 655
Phone: (312) 362-5174
Office Hours: Monday 2-3 PM; Wednesday 5-6:30 PM (or by appointment)

Course Syllabus in MS Word format.

Description and Objectives:

Web data mining has recently become one of the hottest areas in Computer Science because of its direct applications in e-commerce, information retrieval/filtering, and Web information systems. This course will provide comprehensive coverage of various topics in Web data mining, including Web usage mining, Web content mining, and Web data management and warehousing. Specifically, we will consider techniques from machine learning, statistics, databases, and information retrieval to extract useful knowledge from Web data, which can be used for business intelligence, site management, personalization, and user profiling. We will also discuss how to harness semi-structured and heterogeneous data on the Web through techniques based on text mining and meta-data representation and manipulation using XML. The course will be self-contained and will provide a basic overview of relevant techniques from machine learning, data mining, and information retrieval. This course will count as a level II course for both the AI and the Database concentrations.

Textbooks and Reading Material:

Prerequisites:

CSC 416 and CSC 449; or permission of instructor.

Grading Policy:

The final grade will be determined (tentatively) based on the following components:

    Midterm Exam = 30%
    Assignments = 40%
    Final Project = 30%

The general grading scheme will be based on a curve, but the grade cutoffs will be no higher than: A = 90-100%, B = 80-89%, C = 65-79%, D = 50-64%, F = 0-49%. At the end of the quarter, some adjustments may be made based on overall class performance as well as signs of individual effort. Plusses and minuses will be given at the high/low ends of each grade range.
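As an illustration of the arithmetic only, the weighting and the upper-bound cutoffs above can be sketched in Python. The function names and the sample scores are hypothetical, and the sketch assumes each component is already expressed as a percentage; it does not model the curve, end-of-quarter adjustments, or plusses and minuses.

```python
# Component weights in percent, taken from the (tentative) grading policy.
WEIGHTS = {"midterm": 30, "assignments": 40, "project": 30}

# Upper bounds on the cutoffs; a curve could lower them but not raise them.
CUTOFFS = [(90, "A"), (80, "B"), (65, "C"), (50, "D")]


def weighted_score(scores):
    """Combine component percentages (0-100) into one overall percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Map an overall percentage to a letter using the uncurved cutoffs."""
    for minimum, letter in CUTOFFS:
        if score >= minimum:
            return letter
    return "F"


# Hypothetical student, for illustration only.
example = {"midterm": 85, "assignments": 92, "project": 78}
total = weighted_score(example)
print(total, letter_grade(total))
```

Because the cutoffs are stated as maximums, a letter computed this way is a lower bound on the grade actually assigned.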

Exams/Assignments:

The midterm exam will cover material from the textbook as well as material from other assigned readings, class discussions, and lectures. The tentative date of the exam is Thursday, October 26. The exam will be open-book/notes. In general, make-up exams will only be given with prior approval or in cases of emergency.

There will be 2-3 assignments throughout the quarter involving the concepts and techniques discussed in class. The assignments may involve experimenting with various tools, as well as other written or problem-oriented exercises. These assignments must be done individually. Late assignments will be penalized 10% per day (with weekends counting as one day). The late penalty will start accruing on the weekend following the due date for the assignment.

Course Project:

For the class project, students will have a choice of either an implementation project (involving the implementation, improvement, combination and/or application of one or more of the techniques discussed in class) or a term paper (involving either a detailed survey of one or more topics related to class material or an in-depth case study examining the application of Web data mining techniques in a particular domain). Projects may be done individually or in groups of 2 people (depending on the complexity of the project). Each group or individual will submit a specific project proposal for approval no later than Monday, October 16. More details about the possible project options are available in the Project section.

Tentative List of Topics

The following issues and topics will be covered throughout the course. Many of these topics will be revisited several times during the course in a variety of contexts.

  • Data Mining and Knowledge Discovery
    • The KDD process and methodology
    • Data preparation for knowledge discovery
    • Overview of data mining techniques
      • Market basket analysis
      • Classification and prediction
      • Clustering
      • Memory-based reasoning
    • Evaluation and interpretation

  • Web Usage Mining
    • Data collection and sources of data
    • Data preparation for usage mining
    • Mining navigation patterns
    • Dealing with e-commerce data
    • User tracking and profiling
    • e-Metrics: measuring success in e-commerce
    • Privacy issues

  • Web Content Mining
    • Basic concepts from information retrieval and filtering
    • Information retrieval models and techniques
    • Text mining and knowledge discovery from semi-structured data sources
    • Knowledge representation and meta-data
    • Mining structural information on the Web
    • Web agents for information filtering
    • Web content servers and adaptive hypertext

  • Web Mining Applications and Other Topics
    • Data integration for e-commerce
    • Web data warehousing
    • Web personalization and recommender systems
    • Review of tools, applications, and systems




Copyright © 2000, Bamshad Mobasher, School of CTI, DePaul University.