
Database Approach

The database approaches to Web mining have generally focused on techniques for integrating and organizing the heterogeneous, semi-structured data on the Web into more structured, higher-level collections of resources, such as relational databases, and on using standard database querying mechanisms and data mining techniques to access and analyze this information.

  1. Multilevel Databases
    Several researchers have proposed a multilevel database approach to organizing Web-based information. The main idea behind these proposals is that the lowest level of the database contains primitive semi-structured information stored in various Web repositories, such as hypertext documents. At the higher level(s), metadata or generalizations are extracted from the lower levels and organized in structured collections such as relational or object-oriented databases. For example, Han et al. [ZH95] use a multi-layered database in which each layer is obtained via generalization and transformation operations performed on the lower layers. Khosla et al. [KKS96] propose the creation and maintenance of meta-databases at each information-providing domain and the use of a global schema for the meta-database. King & Novak [KN96] propose the incremental integration of a portion of the schema from each information source, rather than relying on a global heterogeneous database schema. The ARANEUS system [PA97] extracts relevant information from hypertext documents and integrates it into higher-level derived Web hypertexts, which are generalizations of the notion of database views.

  2. Web Query Systems
    Many Web-based query systems and languages have been developed recently that attempt to utilize standard database query languages such as SQL, structural information about Web documents, and even natural language processing to accommodate the types of queries used in World Wide Web searches. We mention a few examples of these Web-based query systems here. W3QL [KS95] combines structure queries, based on the organization of hypertext documents, with content queries, based on information retrieval techniques. WebLog [LSS96] is a logic-based query language for restructuring information extracted from Web information sources. Lorel [QRS95] and UnQL [BDS95, BDHS96] query heterogeneous and semi-structured information on the Web using a labeled graph data model. TSIMMIS [CGMH94] extracts data from heterogeneous and semi-structured information sources and correlates them to generate an integrated database representation of the extracted information.
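
The two ideas above can be illustrated together with a minimal sketch. It is not the method of any of the cited systems. It assumes a hypothetical lowest layer (`LAYER0`, a dictionary of raw hypertext documents with made-up URLs), extracts a small amount of metadata from each document into a relational table (the hypothetical higher layer `doc`), and then answers a query that mixes a content condition (a word in the title) with a structure condition (the number of outgoing links) using ordinary SQL. The table layout, the helper names, and the choice of SQLite are all assumptions made for illustration.

```python
import sqlite3
from html.parser import HTMLParser

# Layer 0 (hypothetical data): raw semi-structured documents keyed by URL.
LAYER0 = {
    "http://example.org/a.html":
        "<html><head><title>Data Mining Notes</title></head>"
        "<body><a href='b.html'>next</a><a href='c.html'>refs</a></body></html>",
    "http://example.org/b.html":
        "<html><head><title>Web Query Languages</title></head>"
        "<body><a href='a.html'>back</a></body></html>",
}


class MetaExtractor(HTMLParser):
    """Collects the title text and outgoing link targets of one document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def build_layer1(layer0):
    """Generalize layer 0 into a relational layer holding per-document metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc (url TEXT PRIMARY KEY, title TEXT, n_links INTEGER)"
    )
    for url, html in layer0.items():
        parser = MetaExtractor()
        parser.feed(html)
        parser.close()
        conn.execute(
            "INSERT INTO doc VALUES (?, ?, ?)",
            (url, parser.title.strip(), len(parser.links)),
        )
    conn.commit()
    return conn


conn = build_layer1(LAYER0)

# Content condition (title mentions "Mining") combined with a
# structure condition (at least two outgoing links).
rows = conn.execute(
    "SELECT url, title FROM doc WHERE title LIKE ? AND n_links >= ? ORDER BY url",
    ("%Mining%", 2),
).fetchall()
print(rows)
```

Once the metadata sits in a relational layer, the query itself needs no Web-specific machinery. The systems surveyed above go well beyond this, for example by supporting graph-structured data models or incremental schema integration.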






Bamshad Mobasher
Wed Jul 16 02:08:33 CDT 1997