The OBDA Paradigm
Ontology-based Data Access (OBDA) is a new paradigm, based on the use of knowledge representation and reasoning techniques, for governing the resources (data, meta-data, services, processes, etc.) of modern information systems. The OBDA approach is the result of more than a decade of research carried out by the DASILab group at the Department of Computer, Control, and Management Engineering Antonio Ruberti at Sapienza University of Rome, under the guidance of Prof. Maurizio Lenzerini, and now also by OBDA Systems. In recent years this research has transformed into work in the field, in collaboration with leading companies in the Italian and international public and private business sectors.
OBDA: A three-level architecture
The key idea of OBDA is to provide users with access to the information in their data sources through a three-level architecture, consisting of the ontology, the sources, and the mapping between the two. The ontology is a formal description of the domain of interest and is the heart of the system. Through this architecture, OBDA provides a semantic end-to-end connection between users and data sources, allowing users to directly query data spread across multiple distributed sources using the familiar vocabulary of the ontology: the user formulates SPARQL queries over the ontology, and these are transformed, through the mapping layer, into SQL queries over the underlying relational databases.
- The Ontology Layer
The Ontology layer in the architecture is the means of pursuing a declarative approach to information integration and, more generally, to data governance. The domain knowledge of the organization is specified through a formal, high-level description of both its static and dynamic aspects, represented by the ontology. By making the representation of the domain explicit, we gain reusability of the acquired knowledge, which is not achieved when the global schema is simply a unified description of the underlying data sources.
- The Mapping Layer
The Mapping layer connects the Ontology layer with the Data Source layer by defining the relationships between the domain concepts on the one hand and the data sources on the other. These mappings are not only used for the operation of the information system, but can also be a significant asset for documentation purposes in cases where the information about data is scattered across separate pieces of documentation that are often difficult to access and rarely conform to common standards.
- The Data Source Layer
The Data Source layer consists of the existing data sources of the organization.
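The three layers can be illustrated with a toy sketch. It is not how any actual OBDA system is implemented: it assumes a hypothetical ontology with a single class `Employee`, a hypothetical source table `hr_staff`, and a mapping that is nothing more than one SQL query per ontology class. An in-memory SQLite database stands in for an existing data source.

```python
import sqlite3

# Data Source layer: an in-memory SQLite table stands in for an existing
# source. The table and column names are hypothetical.
def build_source():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hr_staff (badge_no INTEGER, full_name TEXT)")
    conn.executemany(
        "INSERT INTO hr_staff VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    return conn

# Ontology layer: the vocabulary users query with (hypothetical class name).
ONTOLOGY_CLASSES = {"Employee"}

# Mapping layer: each ontology class is associated with an SQL query
# over the source that retrieves its instances.
MAPPINGS = {
    "Employee": "SELECT full_name FROM hr_staff",
}

def instances_of(conn, ontology_class):
    """Answer 'give me all instances of this class' via the mapping."""
    if ontology_class not in ONTOLOGY_CLASSES:
        raise KeyError(f"unknown ontology class: {ontology_class}")
    sql = MAPPINGS[ontology_class]
    return sorted(row[0] for row in conn.execute(sql))
```

The caller only ever names `Employee`; the table name `hr_staff` and its columns stay hidden behind the mapping.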
Advantages of the approach
In OBDA, the client of the information system interacts with the system by means of an abstract representation of the domain, removing the need for an IT expert to act as intermediary. Users can pose queries in terms of the concepts of the domain, rather than the structures of the data sources. By taking into account the ontology and the mappings to the data sources, the OBDA system is in charge of translating the original query into a query to be evaluated at the sources.
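The translation step can be sketched in a very simplified form. The example below assumes a hypothetical ontology containing only subclass axioms (`Manager` is a kind of `Employee`, `Employee` is a kind of `Person`) and hypothetical tables `staff` and `managers`. It first expands the queried class using the ontology, then unfolds each resulting class into SQL through the mappings. Real systems handle far richer ontologies and queries; this only shows the shape of the idea.

```python
import sqlite3

# Hypothetical ontology: each class lists its direct superclasses.
SUBCLASS_OF = {"Manager": ["Employee"], "Employee": ["Person"]}

# Hypothetical mappings: ontology class -> SQL queries over the sources.
# Note that "Person" has no mapping of its own.
MAPPINGS = {
    "Employee": ["SELECT name FROM staff"],
    "Manager": ["SELECT mgr_name AS name FROM managers"],
}

def subclasses_of(target):
    """Return the target class plus all of its transitive subclasses."""
    result = {target}
    changed = True
    while changed:
        changed = False
        for sub, supers in SUBCLASS_OF.items():
            if sub not in result and any(s in result for s in supers):
                result.add(sub)
                changed = True
    return result

def translate(target):
    """Rewrite a query for `target` into one SQL string over the sources."""
    queries = []
    for cls in sorted(subclasses_of(target)):
        queries.extend(MAPPINGS.get(cls, []))
    if not queries:
        raise ValueError(f"no mapping reaches class {target}")
    return " UNION ".join(queries)

def answer(conn, target):
    """Evaluate the translated query at the source."""
    return sorted(row[0] for row in conn.execute(translate(target)))
```

A query for `Person` is answered even though no table is mapped to `Person` directly: the ontology says that employees and managers are persons, so their mappings are used instead.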
OBDA can be seen as a form of information integration, where the usual global schema is replaced by the conceptual model of the application domain, formulated as an ontology. With this approach, the integrated view that the system provides to information consumers is not merely a data structure accommodating the various data at the sources, but a semantically rich description of the relevant concepts in the domain of interest, as well as the relationships between such concepts.
The OBDA approach does not require the data sources to be fully integrated at once. Rather, after building even a rough skeleton of the domain model, one can incrementally add new data sources, or new elements within them, as they become available or as they are needed, thus amortising the cost of integration. The overall design can therefore be regarded as an incremental process of understanding and representing the domain, the available data sources, and the relationships between them. The goal is to support the evolution of both the ontology and the mappings in such a way that the system continues to operate while evolving.
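As a rough illustration of this incremental style, the sketch below models the mapping layer as a registry to which new sources can be attached over time. The class `MappingLayer`, the concept name `Customer`, and the in-memory "sources" are all hypothetical stand-ins; the point is only that the query a user poses does not change when a source is added.

```python
class MappingLayer:
    """Toy mapping registry: concept name -> callables that fetch instances."""

    def __init__(self):
        self._assertions = {}

    def add(self, concept, fetch):
        """Attach one more source to a concept without touching the others."""
        self._assertions.setdefault(concept, []).append(fetch)

    def instances(self, concept):
        """Collect instances of a concept from every source mapped so far."""
        found = set()
        for fetch in self._assertions.get(concept, []):
            found.update(fetch())
        return sorted(found)


# Start with a single (hypothetical) source mapped to the concept Customer.
crm_rows = [{"cust": "Acme"}]
layer = MappingLayer()
layer.add("Customer", lambda: [row["cust"] for row in crm_rows])
before = layer.instances("Customer")

# Later, a second source becomes available and is mapped to the same concept.
billing_rows = [("Acme",), ("Globex",)]
layer.add("Customer", lambda: [row[0] for row in billing_rows])
after = layer.instances("Customer")
```

The call `layer.instances("Customer")` is identical before and after the second source is mapped; only the mapping layer grew.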
The ontology and the corresponding mappings to the data sources provide a common ground for the documentation of all the data in the organisation, with obvious advantages for the governance and the management of the information system.