Current Challenges For Data Integration
Alon Y. Halevy
University of Washington
Integration of data from multiple sources is one of the longest-standing problems facing the Database and AI research communities. The problem arises in large enterprises, and research on it has also been fueled by the promise of integrating data on the WWW. In the past few years, we have made significant progress on data integration, from its conceptual and algorithmic aspects to its systems and product aspects. This talk will briefly review our successes in data integration and will describe some significant current challenges. In particular, I will describe peer-data management systems, a novel architecture that enables ad hoc, large-scale sharing of data, and discuss recent work on semi-automatically finding a semantic mapping between a pair of schemas. For the latter, I will describe an approach to schema matching that is based on analyzing a large corpus of database schemas and learning properties of how terms are used in database structures. The talk will discuss some work in progress and will also highlight opportunities for future research.