The database approaches to Web mining have generally focused on techniques
for integrating and organizing the heterogeneous and semi-structured data
on the Web into more structured, higher-level collections of resources,
such as relational databases, and on using standard database querying
mechanisms and data mining techniques to access and analyze this
information.
- Multilevel Databases
Several researchers have proposed a multilevel database approach to
organizing Web-based information. The main idea behind these proposals is
that the lowest level of the database contains primitive semi-structured
information stored in various Web repositories, such as hypertext
documents. At the higher level(s), metadata or generalizations are
extracted from the lower levels and organized in structured collections such
as relational or object-oriented databases. For example, Han et al. [ZH95]
use a multi-layered database where each layer is obtained via
generalization and transformation operations performed on the lower
layers. Khosla et al. [KKS96] propose the creation and maintenance of
meta-databases at each information-providing domain and the use of a
global schema for the meta-database. King & Novak [KN96] propose the
incremental integration of a portion of the schema from each
information source, rather than relying on a global heterogeneous database
schema. The ARANEUS system [PA97] extracts relevant information
from hypertext documents and integrates it into higher-level
derived Web hypertexts, which are generalizations of the notion of
database views.
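The layering idea can be illustrated with a minimal sketch. It is not the design of any of the cited systems: the sample pages, the table names `layer1` and `layer2`, and the rule that takes the first URL path segment as a "topic" are all assumptions made for illustration. Layer 0 holds raw hypertext, layer 1 holds metadata extracted from it into a relational table, and layer 2 is a generalization computed from layer 1.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urlparse


class TitleAndLinks(HTMLParser):
    """Collects the <title> text and the href targets found in one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Layer 0: primitive semi-structured documents (hypothetical sample pages).
LAYER0 = {
    "http://example.org/db/intro.html":
        "<html><title>Intro to Databases</title>"
        "<a href='sql.html'>SQL</a></html>",
    "http://example.org/db/sql.html":
        "<html><title>SQL Basics</title>"
        "<a href='intro.html'>Intro</a><a href='../ir/ranking.html'>IR</a></html>",
    "http://example.org/ir/ranking.html":
        "<html><title>Ranking</title></html>",
}


def build_layers(pages):
    """Build layer 1 (extracted metadata) and layer 2 (generalization)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE layer1 "
        "(url TEXT PRIMARY KEY, title TEXT, n_links INTEGER, topic TEXT)"
    )
    for url, html in pages.items():
        parser = TitleAndLinks()
        parser.feed(html)
        parser.close()
        # Assumption: the first path segment stands in for a topic label.
        topic = urlparse(url).path.split("/")[1]
        conn.execute(
            "INSERT INTO layer1 VALUES (?, ?, ?, ?)",
            (url, parser.title.strip(), len(parser.links), topic),
        )
    # Layer 2 is derived only from layer 1, never from the raw pages.
    conn.execute(
        "CREATE TABLE layer2 AS "
        "SELECT topic, COUNT(*) AS n_docs, SUM(n_links) AS n_links "
        "FROM layer1 GROUP BY topic"
    )
    return conn


conn = build_layers(LAYER0)
summary = conn.execute(
    "SELECT topic, n_docs, n_links FROM layer2 ORDER BY topic"
).fetchall()
```

Once the higher layers exist, ordinary SQL can be run against them without touching the raw documents, which is the practical benefit the multilevel proposals aim for.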
- Web Query Systems
Many Web-based query systems and languages have been developed
recently that attempt to utilize standard database query languages such as
SQL, structural information about Web documents, and even natural language
processing to accommodate the types of queries that are used in World
Wide Web searches. We mention a few examples of these Web-based query
systems here. W3QL [KS95] combines structure queries, based on
the organization of hypertext documents, and content queries, based on
information retrieval techniques. WebLog [LSS96] is a logic-based query
language for restructuring extracted information from Web information
sources. Lorel [QRS+95] and UnQL [BDS95,BDHS96] query
heterogeneous and semi-structured information on the Web using a labeled
graph data model. TSIMMIS [CGMH+94] extracts data from
heterogeneous and semi-structured information sources and correlates them
to generate an integrated database representation of the extracted
information.
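To make the labeled graph data model more concrete, the following is a minimal sketch of such a graph with a simple path query. It does not reproduce the syntax or semantics of Lorel or UnQL; the class `LabeledGraph`, its methods, and the sample data are hypothetical and serve only to show how edge labels take the place of a fixed schema.

```python
from collections import defaultdict


class LabeledGraph:
    """Nodes are opaque ids; edges carry labels; leaves may hold values."""

    def __init__(self):
        self.edges = defaultdict(list)  # node id -> [(label, target id)]
        self.values = {}                # leaf node id -> atomic value

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def set_value(self, node, value):
        self.values[node] = value

    def follow(self, start, path):
        """Return every node reachable from start along the label path."""
        frontier = [start]
        for label in path:
            frontier = [
                target
                for node in frontier
                for (edge_label, target) in self.edges.get(node, [])
                if edge_label == label
            ]
        return frontier

    def select(self, start, path):
        """Return the atomic values stored at the ends of the label path."""
        return [self.values[n] for n in self.follow(start, path)
                if n in self.values]


# Hypothetical data: two "paper" objects with different sets of attributes.
g = LabeledGraph()
g.add_edge("root", "paper", "p1")
g.add_edge("root", "paper", "p2")
g.add_edge("p1", "title", "t1")
g.set_value("t1", "Paper A")
g.add_edge("p1", "year", "y1")
g.set_value("y1", 1997)
g.add_edge("p2", "title", "t2")
g.set_value("t2", "Paper B")   # p2 has no "year" edge at all

titles = g.select("root", ["paper", "title"])
years = g.select("root", ["paper", "year"])
```

The second paper simply lacks a `year` edge, so the path query skips it rather than failing. Tolerating this kind of irregularity is what makes a graph model a natural fit for semi-structured sources.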
Bamshad Mobasher
Wed Jul 16 02:08:33 CDT 1997