Towards New Models and Languages for Data Mining and Integration

e-Science Institute Public Lecture
Peter Brezany

Data mining refers to extracting or "mining" knowledge from large amounts of data. It has enjoyed great popularity and success in recent years, especially in commercial applications. Several models of the data mining process were first proposed by vendors of data mining tools, such as SEMMA (Sample, Explore, Modify, Model, Assess) by SAS. A later effort to design a standard model resulted in the specification of CRISP-DM (CRoss-Industry Standard Process for Data Mining), which was published in 2000.

Modern business and e-Science applications typically produce large, geographically distributed, and heterogeneous data sets. These place new requirements on data mining and integration (DMI) processes and, consequently, pose new challenges for research and development. In response to these requirements, the new EU project ADMIRE (Advanced Data Mining and Integration Research for Europe) is developing a novel DMI infrastructure.

Part of this effort is the specification of a new multi-level model called CRISP-DMI. The proposed model defines the life cycle of DMI processes in detail. In this talk, the features of CRISP-DMI will be illustrated with a medical application.

The model's concepts strongly influence the design of the DMI language. The talk will characterize the state of the art in DMI languages, present the first proposal for the ADMIRE DMI language, and outline the architecture of the processor for this language. Both forms of the language, graphical and textual, are considered.

We conclude with a short discussion of the impact the model and language have on the architecture of the ADMIRE platform.

Coffee and biscuits will be served after the talk.

Date and time: Wednesday, 13 August 2008, 16:00 (60 minutes)