
Agent-Based Approach

The agent-based approach to Web mining involves developing sophisticated AI systems that act autonomously or semi-autonomously on behalf of a particular user to discover and organize Web-based information. Agent-based Web mining systems generally fall into the following three categories:

  1. Intelligent Search Agents
    Several intelligent Web agents have been developed that search for relevant information, using characteristics of a particular domain (and possibly a user profile) to organize and interpret what they discover. For example, agents such as Harvest [BDH94], FAQ-Finder [HBML95], Information Manifold [KLSS95], OCCAM [KW96], and ParaSite [Spe97] rely either on pre-specified, domain-specific information about particular types of documents or on hard-coded models of the information sources to retrieve and interpret documents. Other agents, such as ShopBot [DEW96] and ILA (Internet Learning Agent) [PE95], attempt to interact with and learn the structure of unfamiliar information sources. ShopBot retrieves product information from a variety of vendor sites using only general information about the product domain. ILA, on the other hand, learns models of various information sources and translates these into its own internal concept hierarchy.

  2. Information Filtering/Categorization
    A number of Web agents use information retrieval techniques [FBY92] and characteristics of open hypertext Web documents to automatically retrieve, filter, and categorize those documents [CH97,BGMZ97,MS96,WP97,WVS96]. For example, HyPursuit [WVS96] uses semantic information embedded in link structures, as well as document content, to create cluster hierarchies of hypertext documents and to structure an information space. BO (Bookmark Organizer) [MS96] combines hierarchical clustering techniques and user interaction to organize a collection of Web documents based on conceptual information.

  3. Personalized Web Agents
    Another category of Web agents includes those that obtain or learn user preferences and discover Web information sources that correspond to these preferences, and possibly to those of other individuals with similar interests (using collaborative filtering). Recent examples of such agents include WebWatcher [AFJM95], PAINT [OPW94], Syskill & Webert [PMB96], and others [BSY95]. Syskill & Webert, for instance, uses a user profile and learns to rate Web pages of interest using a Bayesian classifier.
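
The idea of learning to rate pages with a Bayesian classifier, as mentioned for Syskill & Webert, can be illustrated with a minimal naive Bayes sketch. This is not the Syskill & Webert implementation: the class name NaiveBayesRater, the labels "hot" and "cold", the word tokenizer, the add-one smoothing, and the tiny training set are all assumptions made purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRater:
    """Hypothetical page rater: multinomial naive Bayes, add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()             # label -> pages seen
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.vocabulary = set()                   # all words seen in training

    def train(self, text, label):
        """Record one page the user has rated with the given label."""
        self.class_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocabulary.add(word)

    def log_scores(self, text):
        """Return the unnormalized log-posterior of each label for a page."""
        total_pages = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        words = [w for w in tokenize(text) if w in self.vocabulary]
        scores = {}
        for label, n_pages in self.class_counts.items():
            score = math.log(n_pages / total_pages)  # class prior
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def rate(self, text):
        """Return the most probable label for an unseen page."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical user profile: pages the user liked ("hot") or disliked ("cold").
rater = NaiveBayesRater()
rater.train("jazz concert review saxophone album", "hot")
rater.train("new jazz album release and tour dates", "hot")
rater.train("stock market report quarterly earnings", "cold")
rater.train("market earnings forecast and tax news", "cold")

print(rater.rate("review of a new saxophone album"))
print(rater.rate("quarterly tax report"))
```

With this toy profile, the first page is rated "hot" and the second "cold". A real agent would draw its training pages from the user's explicit ratings and would likely select a subset of informative words rather than using the full vocabulary.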






Bamshad Mobasher
Wed Jul 16 02:08:33 CDT 1997