Student projects

Below is a list of project descriptions for students. Some of the projects are finished, some are in progress, and some are still available to students who want to do a UG4, MSc or PhD project.

If you want to do an MSc or PhD with us, you need to go through the application procedures set by the School of Informatics. Make sure you first discuss your research proposal with Malcolm Atkinson or David Robertson. Note that you need to apply under Intelligent Systems & their Applications.

List of projects

Runoff prediction from a Hydrologic Spatio-Temporal Database

Student: 
Charalampos Sfyrakis
Grade: 
first

Present-day instrumentation networks in rivers provide huge quantities of multi-dimensional data. Although there are numerous machine learning tools that can extract trends, find patterns and predict future states from given data, it is crucial to optimize these techniques according to the semantic content of the data. Hydrology is a data-intensive science, which requires efficient mining of trajectories of measurements taken at different time points and positions.

Project status: 
Finished
Degree level: 
MSc
Background: 
data mining
Supervisors @ NeSC: 
Subject areas: 
e-Science
Machine Learning/Neural Networks/Connectionist Computing
Projects: 
Student project type: 

Connecting Rapid with the jclouds multi-cloud framework

Principal goal: to extend Rapid, a tool for developing web portals for scientific computing, to operate with jclouds.

This project is part of the Google Summer of Code 2010 (see http://www.omii.ac.uk/wiki/RapidJclouds)

Project status: 
Finished
Degree level: 
NR
Background: 
Java, XML
Supervisors @ NeSC: 
Subject areas: 
e-Science
Projects: 
Student project type: 

Extension of Rapid to the Hadoop Framework

Student: 
Harika Yasa

Principal goal: to extend Rapid, a tool for developing web portals for scientific computing, to operate with Apache Hadoop.

This project is part of the Google Summer of Code 2010 (see http://www.omii.ac.uk/wiki/RapidHadoop)

Project status: 
Finished
Degree level: 
NR
Background: 
Java, XML
Supervisors @ NeSC: 
Subject areas: 
e-Science
Projects: 
Student project type: 

Accelerating Genome-Wide Association Studies with Graphics Processors

Student: 
Jeff Poznanovic
Grade: 
first

Principal goal: to substantially improve the performance of the data-intensive analysis for genome-wide association studies (GWAS) by using graphics processing units (GPUs).

Project status: 
Finished
Degree level: 
MSc
Supervisors @ NeSC: 
Other supervisors: 
Dave Liewald, Centre for Cognitive Ageing and Cognitive Epidemiology
Gail Davies, Centre for Cognitive Ageing and Cognitive Epidemiology
Subject areas: 
e-Science
Bioinformatics
Computer Architecture
Distributed Systems
Parallel Programming
References: 
NIH National Human Genome Research Institute, "Genome-wide association studies," http://www.genome.gov/20019523
PLINK, http://pngu.mgh.harvard.edu/~purcell/plink
CUDA, http://www.nvidia.com/object/cuda_home.html
OpenCL, http://www.khronos.org/opencl/
Student project type: 

Parameter fitting of cosmological models using billions of galaxies

Student: 
Martha Axiak
Grade: 
first

Principal goal: to develop, test and make available to the cosmology community a parameter estimation method for models that explain our dark Universe.

Project status: 
Finished
Degree level: 
MSc
Background: 
Evolutionary computation, optimisation, machine learning and/or statistics are all desirable.
Supervisors @ NeSC: 
Other supervisors: 
Tom Kitching, Institute for Astronomy, Edinburgh; tdk@roe.ac.uk, tom.kitching@googlemail.com
Subject areas: 
Genetic Algorithms/Evolutionary Computing
Machine Learning/Neural Networks/Connectionist Computing
WWW Tools and Programming
References: 
There is a good review of statistical methods used in cosmology, with some further references suggested (http://xxx.lanl.gov/abs/0911.3105); chapter 13 discusses the Monte Carlo methods we use. The standard tool for cosmological parameter estimation is CosmoMC (http://cosmologist.info/cosmomc/). The original paper for this is http://arxiv.org/abs/astro-ph/0205436 and the first application is http://arxiv.org/abs/astro-ph/0302306 . A slightly more advanced nested sampling method, MultiNest, is described in http://xxx.lanl.gov/abs/0809.3437 . A general discussion of the current status of cosmology is http://xxx.lanl.gov/abs/astro-ph/0610906 , though be warned that it contains some technical details (and a lot of acronyms).
Student project type: 

Data mining to identify small molecules with bioactivity

Student: 
Gideon Jansen Van Vuuren
Grade: 
first

Principal goal: to apply machine learning to identify small molecules that are likely to have relevant bioactivity, as candidates for follow-up wet-lab experiments.

Project status: 
Finished
Degree level: 
MSc
Background: 
Machine learning essential, biology/bioinformatics desirable.
Supervisors @ NeSC: 
Other supervisors: 
Jan Wildenhain, Tyers Lab, School of Biological Sciences (http://tyerslab.bio.ed.ac.uk/lisa/indPage.php?id=jwil315)
Michaela Spitzer, Tyers Lab, School of Biological Sciences
Subject areas: 
Bioinformatics
Machine Learning/Neural Networks/Connectionist Computing
Student project type: 

Optimising Data-Streaming Elements in Distributed Data Mining

Principal goal: to evaluate the existing data-streaming implementation, formulate a model that predicts streaming performance for a given buffering strategy, and then optimise data streaming with a dynamic buffering implementation.

Project status: 
Finished
Degree level: 
MSc
Background: 
Java programming essential. Distributed/parallel computing desirable.
Subject areas: 
Computer Architecture
Distributed Systems
Parallel Programming
Performance Modelling and Simulation
System Level Integration
Student project type: 
Projects: 
References: 
[1] M. Atkinson, P. Brezany, O. Corcho, L. Han, J. van Hemert, L. Hluchý, A. Hume, I. Janciak, A. Krause, and D. Snelling. ADMIRE White Paper: Motivation, Strategy, Overview and Impact. Technical Report version 0.9, ADMIRE, EPCC, University of Edinburgh, January 2009.
[2] A. Jacobs. The pathologies of big data. Commun. ACM, 52(8):36–44, 2009.

Improving Data Placement Strategy in Data-intensive Computations

Student: 
Yue Ma
Grade: 
third

Principal goal: to investigate existing data placement strategies and build a decision model that improves data placement when enacting data-intensive workflows.

Project status: 
Finished
Degree level: 
MSc
Background: 
Distributed/parallel computing, databases desirable. Java programming essential.
Supervisors @ NeSC: 
Subject areas: 
Computer Architecture
Distributed Systems
System Level Integration
Projects: 
References: 
[1] M. Atkinson, P. Brezany, O. Corcho, L. Han, J. van Hemert, L. Hluchý, A. Hume, I. Janciak, A. Krause, and D. Snelling. ADMIRE White Paper: Motivation, Strategy, Overview and Impact. Technical Report version 0.9, ADMIRE, EPCC, University of Edinburgh, January 2009.
[2] T. Hey, S. Tansley, and K. T. (Editors). The Fourth Paradigm: Data-Intensive Scientific
Student project type: 

Large-scale data mining of chemical-genetic data sets

Primary objective: to perform data mining on a real-world data set from a biology lab in the School of Biological Sciences, with the aim of extracting patterns that lead to hypotheses about the mode of action of compounds and the function of genes.

Project status: 
Finished
Degree level: 
MSc
Background: 
Data mining / machine learning / data exploration essential. Distributed computing a major advantage. Experience with biology/bioinformatics desirable, but not essential as you can lean on the biologists' expertise.
Supervisors @ NeSC: 
Other supervisors: 
Jan Wildenhain, Tyers Lab, School of Biological Sciences (http://tyerslab.bio.ed.ac.uk/lisa/indPage.php?id=jwil315)
Michaela Spitzer, Tyers Lab, School of Biological Sciences
Subject areas: 
Bioinformatics
Distributed Systems
Machine Learning/Neural Networks/Connectionist Computing
WWW Tools and Programming
Student project type: 

Detecting Web Spam using Machine Learning

Student: 
Andrejs Mironovs
Grade: 
second1

Primary goal: to develop a classification algorithm to detect Web Spam.

Web Spam refers to a set of techniques intended to increase the ranking of a page in a search engine. From the point of view of search engine providers and Web users, Web Spam decreases the quality of information search on the Web [1] [2] [3]. Web Spam can be broadly classified into two types: content spam and link spam. Detecting Web Spam is a critical and challenging task, and successful detection has high commercial value for industry.

Project status: 
Finished
Degree level: 
MSc
Background: 
Machine learning, knowledge of databases, programming in Java or other languages
Supervisors @ NeSC: 
Subject areas: 
Machine Learning/Neural Networks/Connectionist Computing
Student project type: 
References: 
* [1] Z. Gyongyi, H. Garcia-Molina and J. Pedersen. Combating Web Spam with TrustRank. In VLDB 2004.
* [2] L. Becchetti, C. Castillo, D. Donato, R. Baeza-Yates, S. Leonardi. Link Analysis for Web Spam Detection. ACM Transactions on the Web (TWEB), 2(1) (2008) 2.1-2.45
* [3] H. Najada and I. Himeidi. Web Spam detection using Machine Learning in Specific Domain Features. Journal of Information Assurance and Security. 3 (2008) 220-229
* [4] WEBSPAM-UK2007, http://barcelona.research.yahoo.net/webspam/datasets/uk2007/