Note: Andrei's thesis was awarded 91% and received the Best Undergraduate Project award, sponsored by Microsoft.
To perform data mining, in the form of association rule mining, on scientific data from a microarray study on cystic fibrosis, with the objective of evaluating and improving the mining algorithm.
The cystic fibrosis group in the Molecular Medicine Centre (MMC) at the Western General are developing gene therapies to treat cystic fibrosis, a life-threatening hereditary disease. To understand which genes are responsible for the disease, and which genes may have diagnostic or prognostic value, they go through a long process of laboratory experiments and analysis, involving microarray experiments and several types of statistical tests. One of their studies aims to find out which genes may be good targets for therapy, and a system is now in place that captures the data from this study in a standard (MySQL) database.
In this project, you will take the data from the target study and apply association rules, a particular type of data mining, to it. The aim is to find relationships between genes that help predict which processes should be changed in order to stop cystic fibrosis from affecting a patient, or to reduce its effects. The advantage for the biologists is that they may discover more complex relationships, which cannot be found using the typical statistical procedures they currently depend on. This is an exciting project, as you will work with real scientific data.
The scope of the project includes testing variants of the original association rule algorithm, such as extending it with quantified gene expression levels, and clustering within rules to understand how rules vary between patients and non-patients. Our first hypothesis is that we can find more interesting gene interaction patterns using association rules than with the statistical procedures currently in use. Other hypotheses can be investigated by comparing different variants of the association rule mining algorithm.
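For orientation, a basic association rule of the form "gene A is over-expressed, so gene B is over-expressed" is usually scored by its support (how often A and B occur together) and its confidence (how often B occurs when A does). The following is a minimal sketch of that idea only, not the algorithm used in the study. The gene labels, the toy samples, the thresholds and the function names are all hypothetical, and it assumes the expression levels have already been discretised into "over-expressed" or not.

```python
from itertools import combinations

# Hypothetical toy data: each sample is the set of genes flagged as
# over-expressed after discretising the microarray intensities.
samples = [
    {"G1", "G2", "G3"},
    {"G1", "G2"},
    {"G1", "G3"},
    {"G2", "G3"},
    {"G1", "G2", "G3"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return single-gene rules (antecedent, consequent, support, confidence)
    that meet both thresholds. Only pairs are considered, to keep it short."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # infrequent pair: no rule can be built from it
        for antecedent, consequent in ((a, b), (b, a)):
            # confidence(A -> B) = support(A and B) / support(A)
            confidence = pair_support / support({antecedent}, transactions)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, pair_support, confidence))
    return rules


if __name__ == "__main__":
    for antecedent, consequent, s, c in mine_pair_rules(samples):
        print(f"{antecedent} -> {consequent}  support={s:.2f}  confidence={c:.2f}")
```

The variants mentioned above would change different parts of this sketch: quantified expression levels would replace the binary "over-expressed" items with level-specific items, and comparing patients with non-patients would mean running the mining on each group of samples separately and comparing the resulting rules.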
You will work closely with three people: Jano van Hemert (National e-Science Centre) will provide support on data mining, Rob Kitchen (National e-Science Centre) will provide support on the microarray data management, and Varrie Ogilvie (Molecular Medicine Centre) will provide support on the interpretation of microarray data.