
Data Sets and Evaluation Metrics

For the movie data set we used the ratings data from the MovieLens recommendation system (www.movielens.org). This data set contains 100,000 ratings on 1682 movies from 943 users. Each user has rated at least 20 movies on a scale of 1 to 5. We used our own wrapper agent to extract movie instances from the Internet Movie Database (www.imdb.com) based on the movie ontology depicted in Figure 1. Specifically, each instance was populated with semantic attributes, including movie title, release year, director(s), cast, genre, and plot.

The extracted instances were then converted into a binary table in standard spreadsheet format, where each row represents a movie and each column represents a unique attribute value. For attributes involving continuous data types (such as "price" and "year") we performed discretization to generate a set of intervals as attributes. Similarly, for attributes involving a concept hierarchy, each concept node was represented as a unique attribute. This process resulted in a table representing each movie as an attribute vector with 2762 dimensions. Prior to computing the semantic similarity among movies, singular value decomposition was performed on the data, using different SVD dimensions, resulting in the corresponding semantic similarity matrices. The generated similarity matrices were then used in our experiments along with the rating similarities among movies, computed from the original ratings data.
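The encoding and reduction steps described above can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the item format (a dictionary with a `values` list and a `year`), the function names, the interval boundaries, and the use of cosine similarity in the reduced space are all assumptions made for illustration.

```python
import numpy as np

def binary_attribute_table(items, year_bins):
    """Encode items as 0/1 rows: one column per attribute value or year interval.

    `items` is a hypothetical format: dicts with "values" (categorical
    attribute values) and "year" (a continuous attribute to discretize).
    """
    # One column for every distinct categorical value seen in the data.
    values = sorted({v for item in items for v in item["values"]})
    col = {v: j for j, v in enumerate(values)}
    n_bins = len(year_bins) - 1
    table = np.zeros((len(items), len(values) + n_bins))
    for i, item in enumerate(items):
        for v in item["values"]:
            table[i, col[v]] = 1.0
        # Discretization: map the year to the interval that contains it,
        # clamping out-of-range years to the first or last interval.
        b = int(np.digitize(item["year"], year_bins)) - 1
        b = min(max(b, 0), n_bins - 1)
        table[i, len(values) + b] = 1.0
    return table

def semantic_similarity(table, k):
    """Item-item cosine similarity in a rank-k SVD space (assumed measure)."""
    u, s, _ = np.linalg.svd(table, full_matrices=False)
    reduced = u[:, :k] * s[:k]          # item coordinates in k dimensions
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    norms[norms == 0] = 1.0             # avoid division by zero
    unit = reduced / norms
    return unit @ unit.T

# Toy usage with made-up items; repeating the call with several values
# of k yields one similarity matrix per SVD dimension.
items = [
    {"values": ["genre:Comedy", "director:A"], "year": 1995},
    {"values": ["genre:Comedy", "director:B"], "year": 1996},
    {"values": ["genre:Drama", "director:C"], "year": 1975},
]
table = binary_attribute_table(items, year_bins=[1900, 1980, 2000, 2020])
sim = semantic_similarity(table, k=2)
```

In this toy example the two comedies from the 1990s share attribute columns and therefore end up more similar to each other than either is to the 1975 drama.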

To measure the accuracy of the recommendations we computed the standard Mean Absolute Error (MAE) between ratings and predictions in the test data sets. Specifically, given the set of actual/predicted rating pairs $\langle a_i, p_i \rangle$ for all the $n$ movies in the test set, the MAE is computed as:

\begin{displaymath}
MAE = \frac{\sum^{n}_{i=1} \vert a_i - p_i\vert}{n}.
\end{displaymath}

Note that lower MAE values represent higher recommendation accuracy. In this case, the ratings are based on a discrete scale of 1 (lowest) to 5 (highest). Thus, the maximum possible value for MAE is 4 (indicating a maximum possible error on all predictions).
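As a small worked illustration of the MAE formula above, the following sketch computes it for lists of actual and predicted ratings. The function name and the sample ratings are illustrative only and do not come from the experiments.

```python
def mean_absolute_error(actual, predicted):
    """MAE = sum(|a_i - p_i|) / n over paired actual/predicted ratings."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up ratings on the 1-to-5 scale: errors are 1, 0 and 2, so MAE = 3/3.
print(mean_absolute_error([5, 3, 1], [4, 3, 3]))  # 1.0
# Worst case on this scale: every prediction is off by 4.
print(mean_absolute_error([1, 1], [5, 5]))        # 4.0
```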

In the case of the real estate data, we started with the raw Web usage data from the server logs of a local affiliate of a national real estate company. The primary function of the Web site is to allow prospective buyers to visit various Web pages containing information related to some 300 residential properties. The portion of the Web usage data during the period of analysis contained approximately 24,000 user sessions from 3800 unique users. The preprocessing phase for this data focused on extracting, for each user, a full record of the properties she visited. This required performing the necessary aggregation operations on pageviews in order to treat a property as the atomic unit of analysis. In addition, the visit frequency for each user-property pair was recorded, since the number of times a user comes back to a property listing is a good measure of that user's interest in the property. Finally, the data was filtered to limit the final data set to those users that had visited at least three properties. In our final data matrix, each row represented a user vector with properties as dimensions and visit frequencies as the corresponding dimension values.
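The preprocessing just described can be sketched roughly as follows. This is only an illustration under stated assumptions: the log is assumed to be already reduced to `(user, session, page)` tuples, `page_to_property` is a hypothetical lookup from page URLs to property identifiers, and a "visit" is assumed to mean one session in which the user viewed at least one page of the property.

```python
from collections import Counter, defaultdict

def visit_frequency_vectors(pageviews, page_to_property, min_properties=3):
    """Aggregate pageviews into per-user {property: visit frequency} vectors."""
    # Collapse all pageviews of one property within a session into one visit,
    # so that the property (not the page) is the unit of analysis.
    visits = set()
    for user, session, page in pageviews:
        prop = page_to_property.get(page)
        if prop is not None:  # skip pages not tied to any property
            visits.add((user, session, prop))

    # Count, for each user, the number of sessions that included each property.
    counts = defaultdict(Counter)
    for user, _session, prop in visits:
        counts[user][prop] += 1

    # Keep only users who visited at least `min_properties` distinct properties.
    return {u: dict(c) for u, c in counts.items() if len(c) >= min_properties}

# Toy usage with made-up log entries.
pages = {"/p1/photos": "p1", "/p1/map": "p1", "/p2": "p2", "/p3": "p3"}
log = [
    ("u1", "s1", "/p1/photos"), ("u1", "s1", "/p1/map"),
    ("u1", "s2", "/p1/map"), ("u1", "s2", "/p2"), ("u1", "s2", "/p3"),
    ("u2", "s3", "/p1/map"), ("u2", "s3", "/home"),
]
vectors = visit_frequency_vectors(log, pages)
```

Each entry of the result corresponds to one row of the data matrix, with properties the user never visited implicitly having frequency zero.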


Figure 2: Portion of the ontology for the class "Property" in the real estate Web site
\begin{figure}
\centerline{\psfig{file=realty-ontology.eps,width=3.5 in}}
\end{figure}

To automatically extract semantic information about the properties, we used a reference ontology for the domain, depicted in Figure 2. In this case, our ontology contained only a single class called "property." The figure shows only a subset of the attributes associated with "property" that were used for computing semantic similarities. An example of an instance of this class is also depicted in Figure 2 (dotted arrows show the mapping between each attribute and the corresponding attribute value in the extracted instance). Using a wrapper agent based on this reference ontology, the attribute values for each property instance were extracted directly from pages related to that property on the Web site. The discretization and normalization process described above for the movie data was also applied in this case, resulting in a final set of 120 unique attribute dimensions for each property vector. We then applied singular value decomposition to generate different semantic similarity matrices that were used in our experiments.

In contrast to the movie data set, this usage data does not involve item ratings. Thus, the standard MAE measure is not an appropriate way to determine the accuracy of predictions. Instead we use the notion of hit ratio in the context of top-$N$ recommendations. For each user, we randomly held out one visited property as test data and used the rest as training data. The recommendation algorithm then generates the top $N$ recommended properties for each user in the test set. If the held-out property appears in the recommendation set, this is considered a hit. We defined the Hit Ratio as the total number of hits divided by the total number of users in the test set.
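The hit ratio protocol can be sketched as follows. The `recommend` callable stands in for whatever recommendation algorithm is being evaluated; its signature `(training_items, n)`, the function name `hit_ratio`, and the fixed random seed are assumptions made for this illustration.

```python
import random

def hit_ratio(user_items, recommend, n, seed=0):
    """Fraction of users whose held-out item appears in their top-n list."""
    rng = random.Random(seed)  # fixed seed so the hold-out split is repeatable
    hits = 0
    users = 0
    for _user, items in user_items.items():
        items = sorted(items)
        if len(items) < 2:
            continue  # need at least one training item besides the held-out one
        held_out = rng.choice(items)                     # test item
        training = [i for i in items if i != held_out]   # remaining visits
        top_n = list(recommend(training, n))[:n]
        users += 1
        if held_out in top_n:
            hits += 1
    return hits / users if users else 0.0

# Toy usage: a placeholder recommender that lists unseen catalog items in order.
catalog = ["a", "b", "c", "d"]
def unseen_first(training, n):
    return [i for i in catalog if i not in training][:n]

visited = {"u1": {"a", "b", "c"}, "u2": {"b", "c", "d"}}
ratio = hit_ratio(visited, unseen_first, n=4)
```

Calling `hit_ratio` with several values of `n` gives the hit ratio as a function of the number of recommendations.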

Note that the hit ratio increases as the value of $N$ (the number of recommendations) increases. Thus, in our experiments, we pay particular attention to smaller numbers of recommendations (between 1 and 10) that result in good hit ratios.


Bamshad Mobasher 2004-03-09