Principal goal: By way of a real case study in the life sciences, this project aims to: 1) understand data-parallel processing with the MapReduce model as a means of addressing performance issues in data-intensive applications; 2) investigate how to adapt data mining algorithms to the MapReduce model; 3) prototype a solution and compare its performance with other frameworks that support data-intensive applications.
Performance is an open issue in data-intensive applications, e.g., distributed data mining and integration. The MapReduce programming model, originally developed by Google, has recently become a popular paradigm because of its simplicity and its scalability at low cost. It can distribute processing across large-scale data centres with thousands of computing nodes and handle data at terabyte and petabyte scales, thereby improving system performance. The model provides a simple interface of two functions that lets developers parallelise data processing tasks: the map function performs grouping, producing intermediate data sets, and the reduce function performs aggregation, combining those intermediate data sets into smaller data sets.
This project will apply the MapReduce model to a real data mining use case in the life sciences, EURExpress-II [2, 3], which aims to automatically annotate the anatomical components in an image with the corresponding terms stored in an ontology database. Performance will be evaluated through a comparison study between the prototype built in this project and the prototypes of the ADMIRE project, which is conducting research into architectures for large-scale and long-running data-intensive computations.
Through this project, the student will learn at three levels: 1) At the conceptual level, understanding the concepts behind data-parallel frameworks that support large-scale data mining and integration applications. 2) From an algorithmic perspective, investigating how data mining algorithms can be adapted to the MapReduce model. 3) From a practical point of view, gaining programming skills through the architectural implementation and learning to think critically through the comparison study.
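To make the two-function interface described above concrete, the following is a minimal single-process sketch in Python. It is not the project's prototype and does not use any real MapReduce framework: the record layout, the term names, and the helpers `map_fn`, `reduce_fn`, and `run_mapreduce` are all hypothetical, chosen only to show how a map step emits intermediate key/value pairs and a reduce step aggregates them.

```python
from collections import defaultdict

# Hypothetical toy records of the form (image_id, annotated terms).
# These are illustrative values only, not EURExpress-II data.
RECORDS = [
    ("img-1", ["heart", "liver"]),
    ("img-2", ["heart"]),
    ("img-3", ["liver", "brain", "heart"]),
]


def map_fn(record):
    """Map step: emit one intermediate (term, 1) pair per annotated term."""
    _image_id, terms = record
    for term in terms:
        yield term, 1


def reduce_fn(key, values):
    """Reduce step: aggregate all values for one key into a single count."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Run map, group-by-key, and reduce sequentially in one process.

    A real framework would run the mapper and reducer on many nodes;
    here the grouping is a plain dictionary, purely for illustration.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


counts = run_mapreduce(RECORDS, map_fn, reduce_fn)
print(counts)
```

In this sketch the three input records reduce to one count per term, which illustrates how the reduce step turns the intermediate data into a smaller data set. Adapting an actual data mining algorithm would mean expressing its per-record work in the mapper and its aggregation in the reducer.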