Figure 3 depicts the prediction accuracy of our semantically enhanced recommendations in contrast to those produced by standard item-based collaborative filtering. Here the MAE has been plotted with respect to the number of neighbors (similar items) in the
-nearest-neighbor algorithm. In both cases, the MAE converges between 80 and 100 neighbors, however, the semantically enhanced approach results in an overall improvement in accuracy.
A more telling picture emerges when we compare the range of values for the semantic combination parameter
. Recall that
is the parameter determining the degrees to which the semantic and rating similarities are used in the generation of neighbors. When
, then only semantic similarity among items is used, while
represents the other side of the spectrum where only rating similarity is used (i.e., standard item-based recommendations). Figure 4 serves two purposes. First, it shows the impact of
on MAE, and secondly, it shows the impact of performing singular value decomposition (in this case, 100 dimensions) on the semantic data prior to computing similarities.
Applying SVD provides a two-fold advantage. On the one hand, SVD generally results in much better computational performance during tasks such as similarity computations or clustering. On the other hand, as clearly indicated by these results, it results in a general improvement in recommendation accuracy (most likely due to a reduction in noise). In the SVD case, the optimum value of
is around 0.40 which is also the point at which performing SVD has the largest impact. Note that at
, results for SVD-100 and no SVD are the same, since in that case the semantic similarity matrix is not taken into account. Interestingly, the results also show that in this data set using only semantic attributes (
) results in recommendations whose quality are in par with (or better) than recommendations based on rating similarities. However, it is clear that the combination of semantic and rating similarities provides an advantage over both of these boundary conditions.
![]() |
As noted earlier, one of the problems associated with traditional collaborative filtering algorithms emanate from the sparsity of data sets to which they are applied. This sparsity has a negative impact on the accuracy and predictability of recommendations. This is one area in which, we believe, the integration of semantic knowledge with ratings data can provide significant advantage. To test this hypothesis, we created multiple training/test data sets in which the proportion of the training data to the complete ratings data set was changed from 90% to 10%. These proportions have a direct correspondence with the level of sparsity in the ratings data. In the case of each of the combination parameter values, we created five random training and test data sets and computed average MAE's over the five folds. We then computed the average improvement in MAE achieved by the semantically enhanced method over the standard item-based CF approach.
Figure 5 depicts these results for the SVD-100 data using a combination parameter
. While the overall recommendation accuracy drops as the proportion of training data is reduced (not shown), the results indicate that, generally, for sparser data sets, the semantic approach achieves larger improvements. As might be expected, this improvement starts to converge to 0 for very sparse data sets. This is because for very small training sets, neither approach can generate a reasonable number of recommendations. However, for up to a training ratio of 30%, the semantic approach provides improvements of up to 20% in MAE scores.
As a final experiment with the movie data set, we focused our attention on another common problem with CF-based approaches, namely, the "new item" ("cold start") problem: since there are no ratings for new items, standard item-based algorithm cannot find item neighbors using rating similarity and fail to give predictions. Our goal here was to determine the degree to which semantic information from the domain can help produce accurate recommendations in the absence of any available ratings data for new items.
To achieve this goal we chose all movies which only received one rating and held these ratings as the test data. The actual movies in the selected data set were predominantly those that were very recently released (relative to the last date captured by the data). Thus, the sample closely modeled the conditions under which newly added items are considered for recommendation. In the training data, these movies received no ratings at all, and thus they were considered to be "new items". We compared our algorithm to a baseline algorithm, in which each user's average rating from training data was used as the prediction for "new items" in test data. These results are depicted in Figure 6. They show that, at all neighbor size levels, our algorithm can provide more accurate predictions than the baseline case.