In this paper we have extended the item-based collaborative filtering (CF) framework by integrating structured semantic information about items into the similarity computation. We have used domain-specific reference ontologies to automatically extract these semantic features from the Web and populate class instances. Our enhanced similarity measure combines domain-based semantic item similarities with item similarities based on the user-item mappings. Our experimental results show that the semantically enhanced approach improves prediction accuracy while maintaining the computational advantages of item-based CF. In the context of Web usage and e-commerce data, the improvements are even more significant, particularly when focusing on a small number of recommendations.
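As an illustration of the combined measure, the following minimal sketch assumes that the two similarities are blended linearly by a single weight and that plain cosine similarity is used for both components. The function names, the parameter name `alpha`, and the choice of cosine similarity are assumptions made for this example, not a definitive statement of the implementation.

```python
import numpy as np


def cosine_similarity_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows get similarity 0 instead of NaN
    unit = m / norms
    return unit @ unit.T


def combined_similarity(ratings, semantic_features, alpha):
    """Blend semantic and rating-based item similarities.

    ratings:           users x items matrix (0 = no rating); hypothetical layout
    semantic_features: items x attributes matrix; hypothetical layout
    alpha:             assumed combination weight in [0, 1];
                       0 = ratings only, 1 = semantics only
    """
    rate_sim = cosine_similarity_matrix(ratings.T)          # item-item, from usage
    sem_sim = cosine_similarity_matrix(semantic_features)   # item-item, from semantics
    return alpha * sem_sim + (1.0 - alpha) * rate_sim
```

Under these assumptions, an item with no ratings has zero rating-based similarity to every other item but can still obtain a nonzero combined similarity through its semantic attributes whenever `alpha` is greater than zero.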
Applying latent semantic analysis to the extracted semantic features, which reduces noise in the data, further improves the results when the hybrid approach is compared to usage-only or semantic-only recommendations. Furthermore, we have shown experimentally that our approach can produce reasonably accurate recommendations for new, unrated items, thus alleviating the "new item problem" associated with standard collaborative filtering. Our experiments also suggest that the integrated approach provides better-quality predictions when ratings or usage data are very sparse.
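The noise-reduction step can be sketched as a truncated singular value decomposition of the item-attribute matrix, which is the standard way to carry out latent semantic analysis. The function name `lsa_reduce`, the matrix layout, and the parameter `k` are assumptions made for this example; the sketch does not prescribe how the number of retained dimensions should be chosen.

```python
import numpy as np


def lsa_reduce(semantic_features, k):
    """Project items onto the top-k latent dimensions of the attribute space.

    semantic_features: items x attributes matrix (hypothetical layout)
    k:                 number of latent dimensions to keep (assumed parameter)
    """
    # Thin SVD: semantic_features = u @ diag(s) @ vt, singular values descending.
    u, s, vt = np.linalg.svd(semantic_features, full_matrices=False)
    k = min(k, len(s))
    # Item coordinates in the reduced space; the dimensions that are dropped
    # correspond to the smallest singular values.
    return u[:, :k] * s[:k]
```

Semantic item similarities would then be computed on the rows of the reduced matrix rather than on the raw attribute vectors.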
An interesting direction for current and future work is to use the characteristics of the domain, together with machine learning techniques, to automatically determine the semantic combination parameter (i.e., the degree to which the semantic similarity is combined with the item similarities based on ratings or usage). We will also further study the impact of other approaches to measuring semantic similarity that take into account the structure of the underlying domain ontologies. Of particular relevance in this context are the work of Ganesan et al. [9] on using hierarchical structures in computing similarities and that of Hotho et al. [13] on ontology-based text clustering.