Assignment 1

Due Date: Saturday, April 19

Consider the data collected by a hypothetical video store for 50 regular customers. This data consists of a table which, for each customer, records the following attributes: Gender, Income, Age, Rentals (total number of video rentals in the past year), Avg. per visit (average number of video rentals per visit during the past year), Incidentals (whether the customer tends to buy incidental items such as refreshments when renting a video), and Genre (the customer's preferred movie genre). This data is available as an Excel spreadsheet.

Perform each of the following data preparation tasks (each task applies to the original data):

  a. Use smoothing by bin means to smooth the values of the Rentals attribute. Use a bin depth of 4. Illustrate your steps.
  b. Use min-max normalization to transform the values of the Income attribute onto the range [1, 5].
  c. Use z-score normalization to standardize the values of the Age attribute.
  d. Discretize the Age attribute based on the following categories: Young = 1-20; MidAge = 21-40; Old = 41+.
  e. Convert the original data into the standard spreadsheet format (note that this requires creating multiple attributes, one for each possible value of a categorical attribute).
  f. Using the standardized data set (from part e), perform basic correlation analysis among the attributes. Construct a complete correlation matrix (please read the document Basic Correlation Analysis for more detail and an example). Discuss your results by indicating any strong correlations (positive or negative) among pairs of attributes. Can you observe any "significant" patterns among groups of two or more variables? Explain.
  g. Perform a cross-tabulation of the two "gender" variables versus the three "genre" variables. Show this as a 2 x 3 table with entries representing the total counts. Then, use a graph or chart that provides the best visualization of the relationships between these sets of variables (see Chapter 17 of the text for an example). Can you draw any significant conclusions?
  h. Select all "good" customers with a high value for the Rentals attribute (greater than or equal to 30). Then, create a summary (e.g., using means, medians, and/or other statistics) of the selected data with respect to all other attributes. Can you observe any significant patterns that characterize this segment of customers? Explain. Note: to know whether the patterns you observe in the target group are significant, you need to compare them with the general population using the same metrics.
  i. Suppose that, because of the high profit margin, the store would like to increase the sales of incidentals. Based on your observations in the previous parts, discuss how this could be accomplished (e.g., should customers with specific characteristics be targeted? Should certain types of movies be preferred?). Explain your answer based on your analysis of the data.
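
The four data preparation techniques named in the list above (smoothing by bin means, min-max normalization, z-score normalization, and discretization) can be sketched in a few lines of Python. This is only an illustrative sketch, not a required method: the function names and the small sample lists are hypothetical and are not taken from the course spreadsheet, and the z-score uses the population standard deviation, which is an assumption (the sample standard deviation is an equally common choice).

```python
from statistics import mean, pstdev


def smooth_by_bin_means(values, depth=4):
    """Equal-depth binning: sort the values, split them into bins of
    `depth` values each, and replace every value with its bin's mean.
    The last bin may hold fewer than `depth` values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    smoothed = [0.0] * len(values)
    for start in range(0, len(order), depth):
        bin_idx = order[start:start + depth]
        bin_mean = mean(values[i] for i in bin_idx)
        for i in bin_idx:
            smoothed[i] = bin_mean  # keep the original row order
    return smoothed


def min_max(values, new_min=1.0, new_max=5.0):
    """Linearly rescale values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    scale = new_max - new_min
    return [new_min + (v - lo) * scale / (hi - lo) for v in values]


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1.
    Assumption: population standard deviation (pstdev)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def discretize_age(age):
    """Map an age to the categories given in the assignment."""
    if age <= 20:
        return "Young"
    if age <= 40:
        return "MidAge"
    return "Old"


# Hypothetical sample values, for illustration only.
rentals = [27, 9, 12, 48, 30, 22, 16, 41]
incomes = [20000, 45000, 70000]
ages = [18, 25, 40, 41, 56]

print(smooth_by_bin_means(rentals))
print(min_max(incomes))
print([round(z, 2) for z in z_scores(ages)])
print([discretize_age(a) for a in ages])
```

With a bin depth of 4, the eight sample rental counts fall into two bins after sorting, so every value is replaced by one of two bin means. The same steps can be carried out directly in the spreadsheet by sorting a copy of the column and averaging each block of four cells.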

Note: You can give the final results of parts (a) through (d) as a single table that includes the original data plus one added column for each of these parts. The results of part (e) should be a separate table. For the correlation analysis (part f), give your correlation matrix: the rows and columns of the matrix are the attributes, and each entry is the correlation value for a pair of attributes (e.g., "Income" versus "Age"). Your analyses for the various parts can be added to the same spreadsheet file or included in a separate document (e.g., a text or MS Word file).
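
As a rough illustration of the standard spreadsheet format and the correlation matrix described in this note, the sketch below expands one categorical attribute into 0/1 indicator columns and then computes Pearson correlations for every pair of columns. The helper names (`one_hot`, `pearson`, `correlation_matrix`) and the four sample records are hypothetical, and Pearson's coefficient is assumed to be the intended correlation measure.

```python
from math import sqrt


def one_hot(records, column):
    """Replace a categorical column with one 0/1 column per distinct value."""
    categories = sorted({r[column] for r in records})
    rows = []
    for r in records:
        row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            row[f"{column}={c}"] = 1 if r[column] == c else 0
        rows.append(row)
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.
    Assumes neither list is constant (otherwise the denominator is zero)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(rows):
    """Nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(rows[0])
    cols = {name: [row[name] for row in rows] for name in names}
    return {a: {b: pearson(cols[a], cols[b]) for b in names} for a in names}


# Hypothetical records, for illustration only.
customers = [
    {"Gender": "M", "Age": 25, "Rentals": 27},
    {"Gender": "F", "Age": 33, "Rentals": 12},
    {"Gender": "F", "Age": 20, "Rentals": 48},
    {"Gender": "M", "Age": 47, "Rentals": 16},
]

table = one_hot(customers, "Gender")
matrix = correlation_matrix(table)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

The matrix is symmetric with ones on the diagonal, and the two indicator columns produced from a two-valued attribute are perfectly negatively correlated with each other, so that pair should not be reported as a finding.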



Copyright © 2007-2008, Bamshad Mobasher, School of CTI, DePaul University.