PhD Thesis Title: Towards a Generalized Theory for Deductive Databases with Uncertainty

Abstract:

Uncertainty management has been identified as one of the important future challenges in database research: ``Further research in uncertainty is essential, as we must learn not only to cope with data of limited reliability , but do so efficiently , with massive amounts of data"[Silberschatz et al '91]. Practical considerations
dictate that the framework used for management of uncertain knowledge admit efficient implementation and efficient computations. Logic database programming, with its advantages of modularity and its powerful top-down and bottom-up query processing techniques has attracted the attention of researchers, and numerous frameworks
have been proposed in recent years for deductive databases with uncertainty. These frameworks differ in
(i) their underlying notion of uncertainty,
(ii) the way in which uncertainties are manipulated, and
(iii) the way in which uncertainty is associated with the facts and rules of a program.
On the basis of (iii), we can classify these frameworks into implication based (IB) and annotation based (AB) frameworks. In this research, we propose a generic framework called the parametric framework as a unifying umbrella for known IB frameworks. We developed the declarative, fixpoint, and proof-theoretic semantics of programs in the parametric framework and showed their equivalence. The semantics developed uses multisets as the underlying structure. Using the parametric framework as a basis, we studied query optimization problem of containment of conjunctive queries in this framework, and established necessary and sufficient conditions for containment for several classes of parametric conjunctive queries . Our results yield tools for use in the query optimization for large classes of query programs in known IB deductive databases with uncertainty. We also used this language as a basis for studying termination and data complexity of evaluating programs in the parametric framework. We have identified large classes of programs whose evaluation is similar to datalog programs, i.e. PTIME in the size of the input database.

In terms of expressive power , we compare the two approaches to uncertainty classified in the above. It is shown that the annotated logics are in general more powerful than the IB frameworks. For instance, the generalized annotated logic programming framework (GAP) proposed by Kifer and Subrahmanian '92 can simulate the language proposed by van Emden '86, through the annotations. As the semantics developed for GAP is based on (ideal of) lattices, we note that it cannot simulate an IB framework
whose semantics is based on multisets. We also note that the parametric framework with its unifying capability lacks two useful operations of selection by certainty and join by certainty . To incorporate these operations, we observed that allowing certainty constraints is all that is needed. We introduce the extended parametric framework (EP, for short). We established in [7] that the EP framework is equivalent to a generic AB framework (GAB) extended with constraints, provided annotation functions allowed in GAB are subject to certain postulates similar to those imposed on the combination functions in the (extended) parametric framework. The proof is based on simulation. We remark that the GAB framework introduced is strictly more powerful than the existing AB frameworks. This is because of the stronger form of the selection by join operation provided by the certainty constraints. Finally, we have studied efficient fixpoint computations with uncertainty. We proposed a semi-naive method and established its equivalence with the naive method.

August 1997 -- Concordia University, Dept. of Computer Science