Principal goal: to substantially improve the performance of the data-intensive analyses involved in genome-wide association studies (GWAS) by using graphics processing units (GPUs).
Description of the project: Genome-wide association studies are performed to detect genetic variations associated with a particular disease. To conduct a study, researchers obtain the genomes of two large groups of participants: people who have the disease of interest (cases) and people who do not (controls). By performing massive-scale genetic comparisons between the two groups, variations that occur significantly more frequently in the case group can be "associated" with the disease. The knowledge gained from such studies can help to pinpoint the genetic factors that cause the disease.
Due to the extremely large datasets involved in GWAS, it is essential to use the most efficient processing methods currently available. Although GPUs are not suited to every computing task, their massively parallel architecture allows them to significantly speed up many types of data-intensive applications. Only within the last few years have programmers been able to utilise GPUs for non-graphical applications: frameworks such as CUDA and OpenCL now allow GPU code to be written in high-level, general-purpose languages.
PLINK is a popular open-source whole-genome association analysis toolset. This project aims to port PLINK's most time-consuming code regions into CUDA and OpenCL, in order to run those code regions on a graphics processor. The steps to complete this goal include the following:
1) Profile the PLINK software and identify the performance "hotspots" among the tasks required to perform GWAS
2) Design and implement GPU versions of the hotspots in CUDA and OpenCL, which will likely require a significant amount of code refactoring/redesign
3) Attempt various architecture-specific optimisations and compare the hardware/software limitations of the GPU against other forms of parallelisation and distributed systems
4) Compare the performance results of the CUDA and OpenCL implementations using real-world GWAS data