
Create Parallel Data Mining Algorithms for Cloud Computing

Tantana Saengngam

Principal goal: to take an existing data mining algorithm and make it parallel in a cloud computing environment, following Google's MapReduce approach.

Research at Google, in combination with its vast computational resources, has led to interesting ways of making algorithms parallel with the aim of making them faster on problems with large amounts of input data [1]. Data mining is one area where the same principle can apply, assuming its algorithms can be run in parallel in a similar fashion. It is important to note that not only the algorithm itself but also the processes in which it is embedded are distributed: for example, the data may need to be integrated, cleaned and transformed before being supplied to the data mining algorithm.
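To make the MapReduce principle concrete, here is a minimal sketch (not part of the project's codebase) of the "summation form" idea from [1]: each mapper computes partial statistics over its chunk of the data, and a reducer combines them with an associative operation, so the map phase can run on many machines at once. The partition data below is invented toy input.

```python
from functools import reduce

def map_chunk(chunk):
    """Run on each worker: partial sum and count for one data partition."""
    return (sum(chunk), len(chunk))

def reduce_stats(a, b):
    """Combine two partial results; associative, so combine order is irrelevant."""
    return (a[0] + b[0], a[1] + b[1])

# Toy data split into three partitions, as if distributed across workers.
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
total, count = reduce(reduce_stats, map(map_chunk, partitions))
mean = total / count  # 3.5
```

Any algorithm whose training step can be expressed as such sums over the data (and [1] shows this covers many standard machine-learning algorithms) parallelises the same way.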

In this project, you will take an algorithm used in a specific project whose aim is to automatically classify anatomical components that exhibit gene expression patterns. These patterns are extracted from images of stained embryo sections. Your task is then to make the data mining algorithm parallel using the MapReduce principle and to execute your implementation on a cloud computing infrastructure such as Eucalyptus [2].
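As a hypothetical illustration of how such a classification task might be cast as MapReduce (the actual algorithm and features are determined by the project): mappers emit per-class feature statistics from their share of labelled image feature vectors, and the reduce phase merges the partial statistics by class label, yielding per-class feature means of the kind a simple naive Bayes-style classifier would use. The class names and feature vectors below are invented toy data.

```python
from collections import defaultdict

def map_partition(examples):
    """Mapper: emit {label: [count, per-feature sums]} for one partition."""
    out = defaultdict(lambda: [0, None])
    for label, features in examples:
        entry = out[label]
        entry[0] += 1
        entry[1] = (list(features) if entry[1] is None
                    else [s + f for s, f in zip(entry[1], features)])
    return dict(out)

def reduce_by_class(partials):
    """Reducer: merge partial [count, sums] pairs keyed by class label,
    then divide to obtain per-class feature means."""
    merged = defaultdict(lambda: [0, None])
    for partial in partials:
        for label, (count, sums) in partial.items():
            entry = merged[label]
            entry[0] += count
            entry[1] = (sums if entry[1] is None
                        else [a + b for a, b in zip(entry[1], sums)])
    return {label: [s / count for s in sums]
            for label, (count, sums) in merged.items()}

# Two partitions of (anatomical-class, feature-vector) training pairs.
p1 = [("brain", [1.0, 0.0]), ("limb", [0.0, 2.0])]
p2 = [("brain", [3.0, 2.0])]
means = reduce_by_class([map_partition(p1), map_partition(p2)])
# means["brain"] == [2.0, 1.0]; means["limb"] == [0.0, 2.0]
```

The grouping-by-key step in `reduce_by_class` corresponds to the shuffle phase that a framework such as Hadoop would perform automatically between the map and reduce stages.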

Project status: 
Degree level: 
Supervisors @ NeSC: 
Subject areas: 
Algorithm Design
Computer Architecture
Distributed Systems
Machine Learning/Neural Networks/Connectionist Computing
Student project type: 
[1] C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, and K. Olukotun. Map-Reduce for machine learning on multicore. In B. Schölkopf, J. C. Platt, and T. Hoffman, editors, NIPS, pages 281–288. MIT Press, 2006.

[2]