Next: Experimental Evaluation Up: Integrating Semantic Similarity with Previous: Using Latent Semantic Analysis

Predictions Based on a Combined Similarity Measure.

The semantic similarity measure $SemSim(i_p, i_q)$ for a pair of items $i_p$ and $i_q$ is computed using the standard vector-based cosine similarity on the reduced semantic space. This process can be viewed as multiplying the matrix $S'$ by its transpose and normalizing each entry by the norms of the corresponding pair of item vectors. The result is an $n \times n$ square matrix in which entry $(i,j)$ is the semantic similarity of items $i$ and $j$.
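
As a minimal sketch of this step, the following Python code builds the similarity matrix from a small made-up matrix. It assumes that each row of the input holds the reduced-space vector of one item; the names `cosine_similarity_matrix`, `S_reduced`, and `sem_sim` are illustrative, and mapping all-zero vectors to similarity 0 is a convention chosen here, not something the text specifies.

```python
import math

def cosine_similarity_matrix(rows):
    """Return the n x n matrix of cosine similarities between row vectors.

    Assumption: rows[i] is the reduced-space vector of item i (a row of S').
    """
    n = len(rows)
    # Gram matrix: entry (i, j) is the dot product of item vectors i and j,
    # which is entry (i, j) of S' multiplied by its transpose.
    gram = [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]
    norms = [math.sqrt(gram[i][i]) for i in range(n)]
    # Divide each entry by the norms of its two item vectors.
    # All-zero vectors get similarity 0 (a convention assumed for this sketch).
    return [
        [gram[i][j] / (norms[i] * norms[j]) if norms[i] and norms[j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]

# Toy data: 3 items in a 2-dimensional reduced space (made up for illustration).
S_reduced = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
sem_sim = cosine_similarity_matrix(S_reduced)
```

Here items 0 and 2 have orthogonal vectors, so their similarity is 0, while item 1 lies at 45 degrees to both.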

Similarly, we compute item similarities based on the user-item matrix $M$. As noted in Section 2, in the case of usage data, we use the cosine similarity measure. In the case of ratings data (such as movie ratings), we employ the adjusted cosine similarity in order to account for differences in how individual users rate. We denote the rating (or usage) similarity between two items $i_p$ and $i_q$ as $RateSim(i_p, i_q)$.
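
The adjusted cosine measure can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of Section 2: it uses the common formulation in which each user's mean rating is subtracted before the cosine is taken, only users who rated both items contribute, and unrated entries are marked with `None`. The function name and the toy matrix are hypothetical.

```python
import math

def adjusted_cosine(ratings, p, q):
    """Adjusted cosine similarity between items p and q.

    Assumption: ratings[u][i] is user u's rating of item i, or None if unrated.
    """
    num = sum_p = sum_q = 0.0
    for row in ratings:
        if row[p] is None or row[q] is None:
            continue  # only users who rated both items contribute
        rated = [r for r in row if r is not None]
        mean = sum(rated) / len(rated)  # this user's average rating
        dp, dq = row[p] - mean, row[q] - mean
        num += dp * dq
        sum_p += dp * dp
        sum_q += dq * dq
    denom = math.sqrt(sum_p) * math.sqrt(sum_q)
    # Return 0 when there is no co-rating information (a convention assumed here).
    return num / denom if denom else 0.0

# Toy user-item matrix: 3 users (rows) by 3 items (columns), made up for illustration.
M = [
    [5, 3, None],
    [4, 2, 3],
    [1, 5, 4],
]
rate_sim_01 = adjusted_cosine(M, 0, 1)
```

In this toy matrix every user rates items 0 and 1 on opposite sides of their own average, so `rate_sim_01` comes out strongly negative.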

Finally, we combine these two similarity measures to get CombinedSim as their linear combination:

\begin{displaymath}
CombinedSim(i_p,i_q) = \alpha\cdot SemSim(i_p,i_q) + (1-\alpha)\cdot RateSim(i_p,i_q)
\end{displaymath}

where $\alpha$ is a semantic combination parameter specifying the weight of semantic similarity in the combined measure. If $\alpha = 0$, then $CombinedSim(i_p, i_q) = RateSim(i_p, i_q)$; in other words, we have standard item-based filtering. On the other hand, if $\alpha = 1$, then only the semantic similarity is used, which essentially results in a form of content-based filtering. Finding the appropriate value for $\alpha$ is not a trivial task, and it is usually highly dependent on the characteristics of the data. We choose the value by performing a sensitivity analysis on particular data sets in the experimental section below.
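
The linear combination and its two limiting cases can be illustrated with a short Python sketch. The similarity matrices below are made-up values, the name `combined_sim` is hypothetical, and restricting $\alpha$ to the interval $[0, 1]$ is an assumption suggested by the two extremes discussed above.

```python
def combined_sim(sem_sim, rate_sim, alpha):
    """Entry-wise linear combination of two similarity matrices of equal shape."""
    if not 0.0 <= alpha <= 1.0:
        # Assumption: alpha is a weight between the two extremes 0 and 1.
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * s + (1.0 - alpha) * r for s, r in zip(sem_row, rate_row)]
        for sem_row, rate_row in zip(sem_sim, rate_sim)
    ]

# Toy 2 x 2 similarity matrices (illustrative values only).
sem = [[1.0, 0.8], [0.8, 1.0]]
rate = [[1.0, 0.2], [0.2, 1.0]]

item_based = combined_sim(sem, rate, alpha=0.0)     # reduces to rate
content_based = combined_sim(sem, rate, alpha=1.0)  # reduces to sem
blended = combined_sim(sem, rate, alpha=0.3)        # off-diagonal: 0.3*0.8 + 0.7*0.2
```

A sensitivity analysis of the kind described above would repeat the prediction step over a grid of `alpha` values and compare the resulting accuracy.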

In order to compute predicted ratings or recommendations, we use the weighted sum approach discussed in Section 2. Specifically,

\begin{displaymath}
M_{a,t} = \frac{\sum\limits_{j = 1}^{k} \left( M_{a,j} \times CombinedSim(i_j,i_t) \right)}{\sum\limits_{j = 1}^{k} CombinedSim(i_j,i_t)},
\end{displaymath}

where $M_{a,t}$ denotes the predicted value for target user $u_a$ on target item $i_t$, and the sum runs over the $k$ items $i_j$ most similar to $i_t$.
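
A minimal sketch of the weighted sum for a single user and a single target item is given below. Several details are assumptions made for illustration: neighbors are taken to be the $k$ items rated by the user that are most similar to the target, unrated items are marked with `None`, the weighted sum is normalized by the sum of the neighbors' similarities, and `None` is returned when that sum is zero. The function name and the data are hypothetical.

```python
def predict(user_ratings, sims_to_target, k):
    """Weighted-sum prediction for one user and one target item.

    user_ratings[j] is the user's rating of item j, or None if unrated;
    sims_to_target[j] is the combined similarity between item j and the target.
    """
    rated = [(sims_to_target[j], r)
             for j, r in enumerate(user_ratings) if r is not None]
    # Keep the k rated items most similar to the target
    # (this neighbor selection rule is an assumption of the sketch).
    neighbors = sorted(rated, key=lambda pair: pair[0], reverse=True)[:k]
    denom = sum(s for s, _ in neighbors)
    if denom == 0:
        return None  # no usable neighbors, so no prediction is made
    return sum(s * r for s, r in neighbors) / denom

# Toy data: the user has not rated item 1, which is the target item.
ratings_a = [5, None, 2, 4]
sims = [0.9, 1.0, 0.1, 0.6]  # similarity of each item to item 1 (made up)
prediction = predict(ratings_a, sims, k=2)
```

With `k=2` the neighbors are items 0 and 3, so the prediction is $(0.9 \times 5 + 0.6 \times 4) / (0.9 + 0.6) = 4.6$.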


Bamshad Mobasher 2004-03-09