The semantic similarity measure $\mathrm{SemSim}(i_p, i_q)$ for a pair of items $i_p$ and $i_q$ is computed using the standard vector-based cosine similarity on the reduced semantic space. This process can be viewed as multiplying the matrix of item vectors in the reduced space by its transpose and dividing each entry of the product by the norms of the corresponding row and column vectors. The result is a square matrix in which entry $(i, j)$ is the semantic similarity of items $i$ and $j$.
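The matrix view of this computation can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in this work: the function name `cosine_similarity_matrix`, the toy vectors, and the choice to return 0 for zero-length vectors are all assumptions made for the example.

```python
import math

def cosine_similarity_matrix(item_vectors):
    """Return the item-item cosine similarity matrix.

    item_vectors: list of equal-length lists, one row per item
    (hypothetical coordinates in the reduced semantic space).
    """
    n = len(item_vectors)
    # Gram matrix: all pairwise dot products, i.e. the matrix times its transpose.
    gram = [[sum(a * b for a, b in zip(u, v)) for v in item_vectors]
            for u in item_vectors]
    # The diagonal of the Gram matrix holds the squared norms.
    norms = [math.sqrt(gram[i][i]) for i in range(n)]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = norms[i] * norms[j]
            # Assumption: a zero vector has no direction, so its similarity is 0.
            sim[i][j] = gram[i][j] / denom if denom > 0 else 0.0
    return sim

# Toy data: items 0 and 1 point the same way, item 2 is orthogonal to both.
items = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
sem_sim = cosine_similarity_matrix(items)
```

Here `sem_sim[i][j]` plays the role of the semantic similarity of items `i` and `j`; the matrix is symmetric, so only half of it needs to be stored in practice.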
Similarly, we compute item similarities based on the user-item matrix. As noted in Section 2, in the case of usage data, we use the cosine similarity measure. In the case of ratings data (such as movie ratings), we employ the adjusted cosine similarity in order to take into account the variances in user ratings. We denote the rating (or usage) similarity between two items $i_p$ and $i_q$ as $\mathrm{RateSim}(i_p, i_q)$.
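For ratings data, the adjusted cosine measure can be sketched as follows. The sketch assumes the common formulation in which each rating is offset by that user's mean rating and only users who rated both items contribute; the function name `adjusted_cosine`, the nested-dictionary layout, and the toy ratings are hypothetical.

```python
import math

def adjusted_cosine(ratings, p, q):
    """Adjusted cosine similarity between items p and q.

    ratings: {user: {item: rating}} (hypothetical layout).
    Each rating is centered on the user's own mean before comparison.
    """
    num = den_p = den_q = 0.0
    for user_ratings in ratings.values():
        # Only users who rated both items contribute.
        if p in user_ratings and q in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            dp = user_ratings[p] - mean
            dq = user_ratings[q] - mean
            num += dp * dq
            den_p += dp * dp
            den_q += dq * dq
    denom = math.sqrt(den_p) * math.sqrt(den_q)
    # Assumption: with no co-rating users or no variation, report 0.
    return num / denom if denom > 0 else 0.0

# Toy ratings: users agree on items "a" and "b" and disagree with "c".
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 1, "b": 2, "c": 5},
}
sim_ab = adjusted_cosine(ratings, "a", "b")
sim_ac = adjusted_cosine(ratings, "a", "c")
```

Centering on the user mean means that a generous rater and a strict rater who rank the items the same way still count as agreeing, which is the motivation for preferring this measure on ratings data.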
Finally, we combine these two similarity measures to get CombinedSim as their linear combination:

$$\mathrm{CombinedSim}(i_p, i_q) = \alpha \cdot \mathrm{SemSim}(i_p, i_q) + (1 - \alpha) \cdot \mathrm{RateSim}(i_p, i_q)$$

where $\alpha$ is a semantic combination parameter specifying the weight of semantic similarity in the combined measure. If $\alpha = 0$, then $\mathrm{CombinedSim}(i_p, i_q) = \mathrm{RateSim}(i_p, i_q)$; in other words, we have standard item-based filtering. On the other hand, if $\alpha = 1$, then only the semantic similarity is used, which essentially results in a form of content-based filtering. Finding an appropriate value for $\alpha$ is not a trivial task, and it usually depends heavily on the characteristics of the data. We choose the value by performing sensitivity analysis on particular data sets in our experimental section below.
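The linear combination and its two boundary cases can be checked with a short sketch. It assumes the combination parameter (called `alpha` here) lies in [0, 1] and weights the semantic similarity, with the remaining weight going to the rating similarity; the function name `combined_sim` and the sample similarity values 0.8 and 0.2 are made up for illustration.

```python
def combined_sim(sem_sim, rate_sim, alpha):
    """Linear combination of semantic and rating similarity.

    alpha is the weight of the semantic similarity (assumed in [0, 1]).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sem_sim + (1.0 - alpha) * rate_sim

# Hypothetical similarities for one item pair.
SEM, RATE = 0.8, 0.2

# Sweep alpha from 0 to 1 in steps of 0.1, as a sensitivity analysis would.
sweep = {a / 10: combined_sim(SEM, RATE, a / 10) for a in range(11)}
```

At `alpha = 0` the sweep returns the rating similarity alone and at `alpha = 1` the semantic similarity alone. In an actual sensitivity analysis, each value of `alpha` would be scored by prediction accuracy on held-out data rather than by inspecting the similarities themselves.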
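In order to compute predicted ratings or recommendations, we use the weighted sum approach discussed in Section 2. Specifically,

$$P(u_a, i_t) = \frac{\sum_{p=1}^{k} r_{u_a, i_p} \cdot \mathrm{CombinedSim}(i_p, i_t)}{\sum_{p=1}^{k} \mathrm{CombinedSim}(i_p, i_t)}$$

where $P(u_a, i_t)$ denotes the prediction value of target user $u_a$ on target item $i_t$, $r_{u_a, i_p}$ is the rating of $u_a$ on item $i_p$, and the sums run over the $k$ items most similar to $i_t$.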
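A weighted-sum prediction of this kind can be sketched as below. This is an illustration under stated assumptions rather than the procedure used in the experiments: the function name `predict`, the dictionary inputs, the restriction to positive similarities, and returning `None` when no neighbor is available are all choices made for the example.

```python
def predict(user_ratings, sims_to_target, k=None):
    """Weighted-sum prediction for one user and one target item.

    user_ratings:   {item: rating} for the target user.
    sims_to_target: {item: combined similarity to the target item}.
    k:              number of most similar rated items to use (None = all).
    """
    # Keep items the user has rated that have a positive similarity
    # to the target (assumption: non-positive similarities are ignored).
    neighbors = [(sims_to_target[item], rating)
                 for item, rating in user_ratings.items()
                 if sims_to_target.get(item, 0.0) > 0.0]
    # Most similar items first, then truncate to the k nearest.
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    if k is not None:
        neighbors = neighbors[:k]
    total_sim = sum(sim for sim, _ in neighbors)
    if total_sim == 0:
        return None  # no usable neighbors, so no prediction
    return sum(sim * rating for sim, rating in neighbors) / total_sim

# Toy example: the user rated "a" and "b"; "a" is far more similar to the target.
user_ratings = {"a": 5, "b": 3}
sims_to_target = {"a": 0.9, "b": 0.1}
prediction = predict(user_ratings, sims_to_target)
```

Because the similarities act as weights that are normalized by their sum, the prediction always falls between the lowest and highest ratings among the neighbors used, and it is pulled toward the ratings of the items most similar to the target.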