Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa, Yuqing Sun, Jim Wiltshire
School of Computer Science, Telecommunications, and Information Systems
DePaul University, Chicago, Illinois, USA
Abstract: Web usage mining, possibly used in conjunction with standard approaches to personalization such as collaborative filtering, can help address some of the shortcomings of these techniques, including reliance on subjective user ratings, lack of scalability, and poor performance in the face high-dimensional and sparse data. However, the discovery of patterns from usage data by itself is not sufficient for performing the personalization tasks. The critical step is the effective derivation of good quality and useful (i.e., actionable) "aggregate usage profiles" from these patterns. In this paper we present and experimentally evaluate two techniques, based on clustering of user transactions and clustering of pageviews, in order to discover overlapping aggregate profiles that can be effectively used by recommender systems for real-time personalization. We evaluate these techniques both in terms of the quality of the individual profiles generated, as well as in the context of providing recommendations as an integrated part of a personalization engine.
Several recent proposals have explored Web usage mining as an enabling mechanism to overcome some of the problems associated with more traditional techniques [MCS99, Mob99, Yu99, NFJK99] or as a mechanism for improving and optimizing the structure of a site [PE98, CTS99, SPF99]. Data mining techniques, such as clustering, have also been shown to improve the scalability and performance of collaborative filtering techniques [OH99]. In general, Web usage mining systems [ZXH98, CMS99, BM99, SF99] run any number of data mining algorithms on usage or clickstream data gathered from one or more Web sites in order to discover interesting patterns in the navigational behavior of users. For an up-to-date survey of Web usage mining techniques and systems see [SCDT00].
However, the discovery of patterns from usage data, such as association rules, sequential patterns, and clusters of user sessions or pages, by itself is not sufficient for performing the personalization tasks. The critical step is the effective derivation of good quality and useful (i.e., actionable) "aggregate profiles" from these patterns. The discovery of aggregate usage profiles, through clustering as well as other Web mining techniques, has been explored by several research groups [YJGD96, SZAS97, SKS98, PE98, NFJK99]. However, in all of these cases, the frameworks proposed for the discovery of profiles have not been extended to show how these profiles can be used as an integrated part of recommender systems. In the case of [PE98], aggregate usage profiles were discovered using an algorithm called PageGather which uses as its basis clustering of pages based the Clique (complete link) clustering technique. While the generated profiles were not integrated as part of a recommender system, they were used to automatically synthesize alternative static index pages for a site.
In this paper we present and experimentally evaluate two Web usage mining
techniques, each with its own characteristics, for the discovery of aggregate
usage profiles that can be effective in Web personalization. The first
technique, called PACT (Profile Aggregations based on Clustering Transactions),
is based on the derivation of overlapping profiles from user transactions
clusters. A preliminary version of this technique was first introduced
in the context of a generalized framework for usage-based Web personalization
in [Mob99]. The second technique, originally introduced
in [MCS99], uses Association Rule Hypergraph Partitioning
[HKKM97, HKKM98] to directly
derive overlapping aggregate profiles from pageviews (rather than from
user transactions). Each of these techniques generates overlapping profiles
which capture aggregate views of the behavior of subsets of site users
based their interests and/or information needs. The primary focus of this
paper is the experimental evaluation of these techniques based on real
usage data. To this end, we compare and evaluate both the quality of generated
profiles, as well as the effectiveness the techniques when used as part
of a recommender system for Web personalization. We also compare our techniques
with the Clique-based clustering technique used in [PE98],
described above. Finally, based on the experimental results we draw some
conclusions as to the circumstances under which each technique is most
appropriately used.
Finally, the transaction file can be further filtered by removing very low support or very high support pageview references (i.e., references to those pageviews which do not appear in a sufficient number of transactions, or those that are present in nearly all transactions). This type of support filtering can be useful in eliminating noise from the data, such as that generated by shallow navigational patterns of "non-active" users, and pageview references with minimal knowledge value for the purpose of personalization.
The above preprocessing tasks result in a set of n pageview records, P = {p1, p2, …, pn}, appearing in the transaction file with each pageview record uniquely represented by its associated URL; and a set of m user transactions, T = {t1, t2,…, tm}, where each ti ÎT is a subset of P. To facilitate various data mining operations such as clustering, we view each transaction t as an n-dimensional vector over the space of pageview references, i.e.,
t = <w(p1, t), w(p2, t), …, w(pn, t)>,
where w(pi, t) is a weight, in the transaction t, associated with the pageview represented by piÎP. The weights can be determined in a number of ways, for example, binary weights can be used to represent existence or non-existence of a product-purchase or a documents access in the transaction. On the other hand, the weights can be a function of the duration of the associated pageview in order to capture the user's interest in a content page. The weights may also, in part, be based on domain-specific significance weights assigned by the analyst.
The transaction file obtained in the data preparation stage is used as the input to the profile generation methods. Ideally, profiles capture an aggregate view of the behavior of subsets of users based their interests and/or information needs. In particular, to be effective for personalization, aggregate profiles must exhibit three important characteristics:
Preliminary results [Mob99] have identified one potentially effective method for the derivation of profiles from transaction clusters. To obtain aggregate profiles from transaction clusters, we employ a technique analogous to concept indexing methods used to extract document cluster summaries in information retrieval and filtering [KE00]. We call this method PACT (Profile Aggregations based on Clustering Transactions). For each transaction cluster c ÎTC, we compute the mean vector mc. The mean value for each pageview in the mean vector is computed by finding the ratio of the sum of the pageview weights across transactions in c to the total number of transactions in the cluster. The weight if each pageview within a profile is a function of this quantity thus obtained. In generating the usage profiles, the weights are normalized so that the maximum weight in each usage profile is 1, and low-support pageviews (i.e. those with mean value below a certain threshold m?) are filtered out. Thus, a usage profile associated with a transaction cluster c, is the set of all pageviews whose weight is greater than or equal to m. In particular, if we simply use binary weights for pageviews, and the threshold m is set at 0.5, then each profile will contain only those pageviews which appear in at least 50% of transactions within its associated transaction cluster.
The primary difference between PACT and the concept indexing method proposed in [KE00] is that we start with clusters of transactions (rather than document clusters), and that the weights associated with items (in this case pageviews) are obtained differently.
To summarize, given a transaction cluster c, we construct a usage profile prc as a set of pageview-weight pairs:
![]()
where the significance weight, weight(p, prc), of the pageview p within the usage profile prc is given by
![]()
and w(p, t) is the weight of pageview p
in transaction t Î c. Each
profile, in turn, can be represented as vectors in the original
n-dimensional
space.
However, traditional clustering techniques, such as distance-based methods generally cannot handle this type clustering. The reason is that instead of using pageviews as features, the transactions must be used as features, whose number is in tens to hundreds of thousands in a typical application. Furthermore, dimensionality reduction in this context may not be appropriate, as removing a significant number of transactions as features may result in losing too much information.
We have found that the Association Rule Hypergraph Partitioning (ARHP) technique [HKKM97, HKKM98] is well-suited for this task since it can efficiently cluster high-dimensional data sets without requiring dimensionality reduction as a preprocessing step. Furthermore, the ARHP provides automatic filtering capabilities, and does not require distance computations. The ARHP has been used successfully in a variety of domains, including the categorization of Web documents [HBG+99].
Association rules capture the relationships among items based on their patterns of co-occurrence across transactions. In the case of Web transactions, association rules capture relationships among pageviews based on their co-occurrence in navigational patterns of users. The association rule discovery methods such as the Apriori algorithm [AS94], initially find groups of items (which in this case are the URLs appearing in the transaction file) occurring frequently together in many transactions. Such groups of items are referred to as frequent item sets.
Given a set IS = {I1, I2, …, Ik} of frequent itemsets, the support of Ii is defined as

Generally, a support threshold is specified before mining and is used by the algorithm for pruning the search space. The itemsets returned by the algorithm satisfy this minimum support threshold.
In the ARHP, the set IS of large frequent itemsets are used as hyperedges to form a hypergraph H = <V, E>, where VÍP and EÍIS. A hypergraph is an extension of a graph in the sense that each hyperedge can connect more than two vertices. The weights associated with each hyperedge can be computed based on a variety of criteria such as the confidence of the association rules involving the items in the frequent itemset, the support of the itemset, or the "interest" of the itemset. In our experiments, we weight each hyperedge using a function of the interest of the itemset which is defined as:

The hypergraph H is then partitioned into a set of clusters C. Each partition is examined to filter out vertices that are not highly connected to the rest of the vertices of the partition. The connectivity of vertex v (a pageview appearing in the frequent itemset) with respect to a cluster c is defined as:

A high connectivity value suggests that the vertex has strong edges connecting it to other vertices in the partition. The vertices with connectivity measure greater than a given threshold value are considered to belong to the partition, and the remaining vertices are dropped from the partition.
The hypergraph is recursively partitioned until a stopping criterion for each partition is reached. The stopping criterion is determined according to a threshold on the ratio of the weights of the cut edges to the weights of uncut edges in the partition. Once the partitioning is completed, vertices can be "added back in" to clusters depending on the user defined overlap parameter. For each partial edge that is left in a cluster, if the percentage of vertices from the original edge that are still in the cluster exceed the overlap percentage, the removed vertices are added back in. This will allow some vertices to belong to more than one cluster. In the ARHP method, additional filtering of non-relevant items can be achieved using the support criteria in the association rule discovery components of the algorithm.
The connectivity value of an item (pageviews) defined above is important
also because it is used as the primary factor in determining the weight
associated with that item within the profile. As noted earlier, the weights
associated with pageviews in each profile are used as part of the recommendation
process when profiles are matched against an active user session (see below).
Maintaining a history depth may be important because most users navigate several paths leading to independent pieces of information within a session. In many cases these sub-sessions have a length of no more than 3 or 4 references. In such a situation, it may not be appropriate to use references a user made in a previous sub-session to make recommendations during the current sub-session. We capture the user history depth within a sliding window over the current session. The sliding window of size n over the active session allows only the last n visited pages to influence the recommendation value of items in the recommendation set. The notion of a sliding session window is similar to that of N-grammars discussed in [Cha96]. Structural characteristics of the site or prior domain knowledge can also be used to associate an additional measure of significance with each pageview in the user's active session. For instance, the site owner or the site designer may wish to consider certain page types (e.g., content versus navigational) or product categories as having more significance in terms of their recommendation value. In this case, significance weights can be specified as part of the domain knowledge.
Usage profiles, obtained using any of the techniques described in the previous section, are represented as sets of pageview-weight pairs. This will allow for both the active session and the profiles to be treated as n-dimensional vectors over the space of pageviews in the site. Thus, given a usage profile C, we can represent C as a vector
![]()
where
![]()
Similarly, the current active session S is also represented as a vector S = ás1, s2, … , snñ, where si is a significance weight associated with the corresponding pageview reference, if the user has accessed pi in this session, and si = 0, otherwise. In our experiments, discussed in the next section, we simply used binary weighting for the active session. We compute the profile matching score using the normalized cosine similarity measure for vectors:

Note that the matching score is normalized for the size of the clusters and the active session. This corresponds to the intuitive notion that we should see more of the user's active session before obtaining a better match with a larger cluster representing a user profile. Given a profile C and an active session S, a recommendation score, Rec(S, p), is computed for each pageview p in C as follows:
![]()
If the pageview p is in the current active session, then its recommendation value is set to zero. We obtain the usage recommendation set, UREC(S), for current active session S by collecting from each usage profile all pageviews whose recommendation score satisfies a minimum recommendation threshold r, i.e.,
![]()
where UP is the collection of all usage profiles. Furthermore,
for each pageview that is contributed by several usage profiles, we use
its maximal recommendation score from all of the contributing profiles.
For the PACT method, we used multivariate k-means clustering to partition the transaction file. Overlapping aggregate profiles were generated from transaction clusters using the method described earlier. For Association Rule Hypergraph Partitioning, the frequent itemsets were found using the Apriori algorithm [AS94]. Each pageview serves as a vertex in the hypergraph, and each edge represents a frequent itemset with the weight of the edge taken as the interest for the set. Since interest increases dramatically with the number of items in a rule, the log of the interest is taken in order to prevent the larger rules from completely dominating the clustering process.
As mentioned in the Introduction section, for comparison purposes, we also generated usage profiles using the Clique-based clustering technique used in [PE98]. We used a similarity threshold of 0.5 to form the similarity graph among pairs of pageviews. Profiles were then generated from the completely connected components of the graph. The weight of items in each Clique profile was determined by measuring the similarity of the item vector (a vector of transactions) to the cluster centroid.
In all cases, the weights of pageviews were normalized so that the maximum weight in each profile would be 1. In the case of PACT and Hypergraph, the maximum overlap among any pairs of profiles was already less than 50%, however, the Clique method tends to generate a large number of highly overlapping clusters, often differing by only 1 or 2 items. In order to rectify this situation we employed the overlap reduction method discussed in [PE98].
The profiles were ranked according to average similarity of items within the profiles, and then the lower ranking profiles which had more than 50% overlap with a previous profile were eliminated.
As an example of aggregate profiles, Table 1 depicts 3 of the profiles generated using the PACT method for the ACR site. Only pageview URLs with weights of at least 0.5 have been shown in each profile. The first profile in Table 1 represents the activity of users who are primarily interested in general ACR sponsored conferences. The second profile, while containing some overlap with the first, seems to capture the activity of users whose interests are more focused on specific conferences or journals related to marketing and consumer behavior. Finally, the third profile captures the activity of users interested in news items as well as specific columns that appear in the "Online Archives" section of the ACR site.
Table 1. Examples of aggregate usage profiles obtained using the PACT method
![]()
The (weighted) average visit percentage (WAVP) is this average divided by the total weight of items within the profile pr:

Profiles generated by each method ranked according to their WAVP. Figure
1 depicts the comparison of top ranking profiles.
![]() |
| Figure 1. Comparison of top profiles based on Weighted Average Visit Percentage |
The top ranking profiles generated by the Hypergraph method perform
quite well under this measure, however, beyond the top 2 or 3 profiles,
both Hypergraph and the Clique methods seems to perform similarly. On the
other hand the PACT method, overall, performs consistently better than
the other techniques. It should be noted that, while WAVP provides a measure
of the predictive power of individual profiles, it does not necessarily
measure the "usefulness" of the profiles. For instance, the Hypergraph
method tends to produce highly cohesive clusters in which potentially "interesting"
items, such as pageviews that occur more deeply within the site graph,
dominate. This is verified by our experiments on the recommendation accuracy
of the method as a whole, discussed below.
In order to evaluate the recommendation effectiveness for each method, we measured the error rate to compute the accuracy of the recommendation set produced for each transaction in the evaluation set. The basic methodology used is as follows. For a given transaction t, and an active session window size n, we randomly chose | t |-n+1 groups of items from the transaction as the surrogate active session window. For each of these active sessions, we produced a recommendation set and compared the set to the remaining items in the transaction by computing the percentage of visited pages for which a recommendation was produced. The final score for transaction t is the mean score over all of the | t |-n+1 surrogate active sessions. Finally, the mean over all transactions in the training set was computed as the evaluation score. To determine a recommendation set based on an active session, we varied the recommendation threshold from 0.2 to 0.9. A page is included in the recommendation set only if it has a recommendation score above this threshold.
Clearly, fewer recommendations are produced at higher thresholds, while higher evaluations scores are achieved at lower thresholds (with larger recommendation sets). Ideally, we would like the recommendation engine to produce few but highly relevant recommendations. Table 2 shows the results produced by the recommendation engine for the 3 profile generation methods using a session window size of 2. For example, at a threshold of 0.7, the PACT method produced an evaluation score of 0.82 with an average recommendation set size of 10 over all trials. Roughly speaking, this means that on average 82% of unique pages actually visited by users in the evaluation set transactions matched the top 10 recommendations produced by the system.
In order to compare the overall relative recommendation accuracy among all 3 methods, the evaluation score percentage was divided by the size of the recommendation set. Thus, a higher number according to this measure corresponds to better overall performance by the recommendation engine. The results for session window sizes of 2 and 3 are depicted in Figures 2 and 3, respectively. It is clear from these results that the PACT method provided better overall performance, especially for higher threshold values. The Hypergraph method tended to give better relative performance for lower threshold values when session window size was smaller. Also, as expected, all 3 methods did better when the window size used by the recommendation engine was increased to 3, however, the improved performance due to larger window size was more dramatic for PACT than the other two methods. Despite the fact that the Hypergraph method scored lower in than PACT in these experiments, casual observation of the recommendation results show that the Hypergraph methods tends to produce more "interesting" recommendations. In particular, this method more often gives recommended pages that occur more deeply in the site graph as compared to top level (and more frequently visited pages). This is in part due to the fact that interest of the itemsets was used to compute the weights for the hyperedges.
Intuitively, we may consider a recommended object (e.g., a page or a product) more interesting or useful if a larger amount of user navigational activity is required to reach the object without the recommendation engine. In our experimental data set, these objects correspond to content pages that are located deeper in the site graph (as opposed to top level navigational pages).
In order to evaluate the effectiveness of the 3 profile generation methods in this context, we filtered out the top-level navigational pages in both the training and the evaluation sets and regenerated the aggregate profiles from the filtered data set. All other parameters for profile generation and the recommendation engine were kept constant. Figure 4 depicts the relative performance of the 3 methods on the filtered evaluation set.
As these results indicate, filtering the data set resulted in better performance for all 3 methods. There was moderate improvement for Clique, while the improvement was much more dramatic for Hypergraph and (to a lesser degree) PACT. In particular, the Hypergraph method performed consistently better that the other two methods in these experiments, supporting our conjecture that it tends to produce more interesting recommendations.
To see the impact of filtering more clearly, Figure
5 depicts the relative improvement of the PACT and Hypergraph methods
when comparing the results for filtered and unfiltered data sets.
In comparing PACT and Hypergraph, it is clear that PACT emerges as the overall winner in terms of recommendation accuracy. However, as noted above, Hypergraph tends to perform better at lower recommendation thresholds when the session window size is smaller. Furthermore, Hypergraph does dramatically better when we focus on more "interesting" objects (e.g., content pages). In general, the Hypergraph method seems to produce a smaller set of high quality, and more specialized, recommendations, even when a smaller portion of the user's clickstream is used by the recommendation engine. On the other hand, PACT provides a clear performance advantage when dealing with all the relevant pageviews in the site, particularly as the session window size is increased.
Whether PACT or Hypergraph methods should be used in a given site depends,
in part, on the goals of personalization. Based on the above observations,
we conclude that, if the goal is to provide a smaller number of highly
focused recommendations, then Hypergraph may be a more appropriate method.
This is particularly the case if only specific portions of the site (such
as product related pages) are to be personalized. On the other hand, if
the goal is to provide a more generalized personalization solution integrating
both content and navigational pages throughout the whole site, then using
PACT as the underlying aggregate profile generation method seems to provide
clear advantages.
[BM99] A. Buchner and M. D. Mulvenna. Discovering internet marketing intelligence through online analytical Web usage mining. SIGMOD Record, (4) 27, 1999.
[Cha96] E. Charniak. Statistical language learning. MIT Press, 1996.
[CMS99] R. Cooley, B. Mobasher, and J. Srivastava. Data preparation for mining World Wide Web browsing patterns. Journal of Knowledge and Information Systems, (1) 1, 1999.
[CTS99] R. Cooley, P-T. Tan., and J. Srivastava. WebSIFT: The Web site information filter system. In Workshop on Web Usage Analysis and User Profiling (WebKKD99), San Diego, August 1999.
[HBG+99] E-H. Han, D. Boley, M. Gini, R. Gross, K. Hastings, G. Karypis, V. Kumar, B. Mobasher, and J. More. Document categorization and query generation on the World Wide Web using WebACE. Journal of Artificial Intelligence Review, January 1999.
[HKBR99] J. Herlocker, J. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. To appear in Proceedings of the 1999 Conference on Research and Development in Information Retrieval, August 1999.
[HKKM97] E-H. Han, G. Karypis, V. Kumar, and B. Mobasher. Clustering based on association rule hypergraphs. In Proccedings of SIGMOD’97 Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD’97), May 1997.
[HKKM98] E-H. Han, G. Karypis, V. Kumar, and B. Mobasher. Hypergraph based clustering in high-dimensional data sets: a summary of results. IEEE Bulletin of the Technical Committee on Data Engineering, (21) 1, March 1998.
[KE00] G. Karypis, E-H. Han. Concept indexing: a fast dimensionality reduction algorithm with applications to document retrieval and categorization. Technical Report #00-016, Department of Computer Science and Engineering, University of Minnesota, March 2000.
[KMM+97] J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, and J. Riedl. GroupLens: applying collaborative filtering to usenet news. Communications of the ACM (40) 3, 1997.
[MCS99] B. Mobasher, R. Cooley, and J. Srivastava. Creating adaptive web sites through usage-based clustering of urls. In IEEE Knowledge and Data Engineering Workshop (KDEX'99), 1999.
[Mob99] B. Mobasher. A Web personalization engine based on user transaction clustering. In Proceedings of the 9th Workshop on Information Technologies and Systems (WITS'99), December 1999.
[NFJK99] O. Nasraoui, H. Frigui, A. Joshi, R. Krishnapuram. Mining Web access logs using relational competitive fuzzy clustering. To appear in the Proceedings of the Eight International Fuzzy Systems Association World Congress, August 1999.
[OH99] M. O'Conner, J. Herlocker. Clustering items for collaborative filtering. In Proceedings of the ACM SIGIR Workshop on Recommender Systems, Berkeley, CA, 1999.
[PE98] M. Perkowitz and O. Etzioni. Adaptive Web sites: automatically synthesizing Web pages. In Proceedings of Fifteenth National Conference on Artificial Intelligence, Madison, WI, 1998.
[SF99] M. Spiliopoulou and L. C. Faulstich. WUM: A Web Utilization Miner. In Proceedings of EDBT Workshop WebDB98, Valencia, Spain, LNCS 1590, Springer Verlag, 1999.
[SPF99] M. Spiliopoulou, C. Pohle, and L. C. Faulstich. Improving the effectiveness of a Web site with Web usage mining. In Workshop on Web Usage Analysis and User Profiling (WebKKD99), San Diego, August 1999.
[SCDT00] J. Srivastava, R. Cooley, M. Deshpande, P-T. Tan. Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. SIGKDD Explorations, (1) 2, 2000.
[SKS98] S. Schechter, M. Krishnan, and M. D. Smith. Using path profiles to predict HTTP requests. In Proceedings of 7th International World Wide Web Conference, Brisbane, Australia, 1998.
[SM95] U. Shardanand, P. Maes. Social information filtering: algorithms for automating "word of mouth." In Proceedings of the ACM CHI Conference, 1995.
[SZAS97] C. Shahabi, A. Zarkesh, J. Adibi, and V. Shah. Knowledge discovery from users Web-page navigation. In Proceedings of Workshop on Research Issues in Data Engineering, Birmingham, England, 1997.
[YJGD96] T. Yan, M. Jacobsen, H. Garcia-Molina, and U. Dayal. From user access patterns to dynamic hypertext linking. In Proceedings of the 5th International World Wide Web Conference, Paris, France, 1996.
[Yu99] P. S. Yu. Data mining and personalization technologies. In Int'l Conference on Database Systems for Advanced Applications (DASFAA99), April 1999, Hsinchu, Taiwan.
[ZXH98] O. R. Zaiane, M. Xin, and J. Han. Discovering
web access patterns and trends by applying OLAP and data mining technology
on web logs. In
Advances in Digital Libraries, pp. 19-29, Santa
Barbara, 1998.