Web usage data is collected in
various ways, each mechanism collecting attributes relevant for its
purpose. There is a need to pre-process the data to make it easier to
mine for knowledge. Specifically, we believe the following issues need
to be addressed:
- Instrumentation & Data Collection: Clearly, improved data
quality can improve the quality of any analysis performed on it. A problem in the
Web domain is the inherent conflict between the analysis needs of the
analysts (who want more detailed usage data collected), and the privacy
needs of users (who want as little data collected as possible). This
has led to the development of cookie files on one side and
cache busting on the other. The emerging OPS standard on collecting
profile data may be a compromise on what can and will be collected.
However, it is not clear how much compliance with it can be expected.
Hence, there will be a continual need to develop better instrumentation
and data collection techniques, based on whatever is possible and
allowable at any point in time.
- Data Integration: Portions of Web usage data exist in sources
as diverse as Web server logs, referral logs, registration files, and
index server logs. Intelligent integration and correlation of
information from these diverse sources can reveal usage information
which may not be evident from any one of them. Techniques from data
integration [LHS 95] should be examined for this purpose.
- Transaction Identification: Web usage data collected in various
logs is at a very fine granularity. Hence, while it has the advantage of
being extremely general and fairly detailed, it also has the corresponding
drawback that it cannot be analyzed directly, since the analysis may
start focusing on micro trends rather than on the macro trends. On the
other hand, the issue of whether a trend is micro or macro depends on
the purpose of a specific analysis. Hence, we believe there is a need to
group individual data collection events into logical units, called
Web transactions [CMS97], before feeding them to the mining
system. While [MJHS96,CPY96,CMS97] have proposed
techniques to do so, more attention needs to be given to this issue.
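
To make the transaction identification step concrete, here is a minimal sketch in Python. It is not the method of [MJHS96], [CPY96], or [CMS97]. It uses a simple time-gap heuristic as a stand-in: requests from the same user are grouped into one transaction until the gap between consecutive requests exceeds a threshold. The function name `identify_transactions`, the `(user, timestamp, url)` event layout, the 30-minute threshold, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def identify_transactions(events, max_gap=timedelta(minutes=30)):
    """Group (user, timestamp, url) events into per-user transactions.

    A new transaction starts whenever the gap between two consecutive
    requests from the same user exceeds max_gap (an assumed threshold).
    """
    # Collect each user's requests; "user" here is whatever identifier
    # the log provides (an IP address in this hypothetical example).
    by_user = defaultdict(list)
    for user, ts, url in events:
        by_user[user].append((ts, url))

    transactions = []
    for user, visits in by_user.items():
        visits.sort(key=lambda v: v[0])  # order requests by timestamp
        current = [visits[0][1]]
        for (prev_ts, _), (ts, url) in zip(visits, visits[1:]):
            if ts - prev_ts > max_gap:
                # Gap too large: close the current transaction.
                transactions.append((user, current))
                current = []
            current.append(url)
        transactions.append((user, current))
    return transactions


# Hypothetical log events, already parsed from a server log.
t0 = datetime(1997, 7, 16, 2, 0, 0)
sample_events = [
    ("10.0.0.1", t0, "/index.html"),
    ("10.0.0.1", t0 + timedelta(minutes=2), "/papers.html"),
    ("10.0.0.2", t0 + timedelta(minutes=3), "/index.html"),
    ("10.0.0.1", t0 + timedelta(hours=2), "/index.html"),
]
result = identify_transactions(sample_events)
```

The threshold controls granularity: a small `max_gap` yields many short transactions (micro trends), a large one yields fewer, longer transactions (macro trends), so it should be chosen to suit the purpose of the analysis.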
Bamshad Mobasher
Wed Jul 16 02:08:33 CDT 1997