Association rule discovery techniques [AS94,HS95,SON95,SA95] are generally applied to databases of transactions, where each transaction consists of a set of items. In such a framework, the problem is to discover all associations and correlations among data items where the presence of one set of items in a transaction implies (with a certain degree of confidence) the presence of other items. In the context of Web mining, this problem amounts to discovering correlations among a given client's references to the various files available on the server. Each transaction consists of the set of URLs accessed by a client in one visit to the server. For example, using association rule discovery techniques we can find correlations such as the following:
Because such transaction databases usually contain extremely large amounts of data, current association rule discovery techniques prune the search space according to the support of the items under consideration. Support measures how often a set of items occurs in the transaction logs, typically expressed as the fraction of user transactions that contain it.
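The support-based pruning described above can be illustrated with a minimal sketch. It assumes that each transaction is the set of URLs from one client visit and that support is the fraction of transactions containing an itemset. The function names, the URLs, and the 0.5 threshold are hypothetical, and the level-wise search is a simplified illustration rather than the algorithm of any of the cited papers.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Level-wise search: only itemsets meeting min_support are extended."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    level = []
    for item in items:
        s = support([item], transactions)
        if s >= min_support:
            frequent[frozenset([item])] = s
            level.append(frozenset([item]))
    for size in range(2, max_size + 1):
        surviving = sorted({item for s in level for item in s})
        level = []
        for combo in combinations(surviving, size):
            # Prune: skip a candidate if any smaller subset is infrequent.
            if any(frozenset(sub) not in frequent
                   for sub in combinations(combo, size - 1)):
                continue
            s = support(combo, transactions)
            if s >= min_support:
                frequent[frozenset(combo)] = s
                level.append(frozenset(combo))
    return frequent


# Hypothetical visits; each set holds the URLs seen in one client visit.
visits = [
    frozenset({"/a.html", "/b.html"}),
    frozenset({"/a.html", "/b.html", "/c.html"}),
    frozenset({"/a.html"}),
    frozenset({"/c.html"}),
]
frequent = frequent_itemsets(visits, min_support=0.5)
```

With these four visits, the pairs containing /c.html fall below the threshold and are discarded, so larger itemsets built on them would never need to be counted.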
For organizations engaged in electronic commerce, the discovery of such
rules can help in the development of effective marketing strategies. In
addition, association rules discovered from WWW access logs can indicate
how best to structure the organization's Web space. For
example, if one discovers that 80% of the clients accessing
/company/products and
/company/products/file1.html also
accessed /company/products/file2.html, but only 30% of those who
accessed /company/products also accessed
/company/products/file2.html, then it is likely that some information in
file1.html leads clients to access file2.html. This
correlation might suggest that this information should be moved to a
higher level (e.g., /company/products) to increase access to
file2.html.
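The comparison in this example is a comparison of two rule confidences. The sketch below is a minimal illustration under stated assumptions: the 20 visits are synthetic, constructed only so that the two confidences come out at 80% and 30%, each visit is treated as one transaction, and the `confidence` helper is a hypothetical name.

```python
def confidence(antecedent, consequent, transactions):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_antecedent if n_antecedent else 0.0


P = "/company/products"
F1 = "/company/products/file1.html"
F2 = "/company/products/file2.html"

# Synthetic data (an assumption): 20 visits, all of which include P.
visits = (
    [frozenset({P, F1, F2})] * 4    # reached file2.html after file1.html
    + [frozenset({P, F1})] * 1      # saw file1.html but not file2.html
    + [frozenset({P, F2})] * 2      # reached file2.html without file1.html
    + [frozenset({P})] * 13         # saw only the products page
)

with_file1 = confidence({P, F1}, {F2}, visits)   # 4 of 5 visits
products_only = confidence({P}, {F2}, visits)    # 6 of 20 visits
```

The gap between `with_file1` and `products_only` is the signal discussed above: adding file1.html to the antecedent raises the confidence of reaching file2.html considerably.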