Josep is a postgraduate student at the Computer Science Department. He is visiting our group as part of the HPC-EUROPA2 programme, where he will draw on our expertise in web portals and parallel computing to work on the Parallel-TCoffee sequence alignment software.
PERFORMING A COMPREHENSIVE STUDY OF THE SCALABILITY OF THE PARALLEL VERSION OF THE APPLICATION TCOFFEE.
We need to scale up the number of alignments performed on our cluster, so the application must be tested on a larger cluster. This study will analyze how the application's performance scales as the problem grows (aligning a larger number of sequences, or longer sequences). Previous studies of the underlying TCoffee algorithm have shown that the two main factors limiting its scalability are the data dependencies imposed by the guide tree, which defines the order in which sequences are aligned, and the huge memory requirements of the extended library used to maintain consistency in the alignment. Both factors give the algorithm a quadratic cost, O(N^2 L^2), in the number of sequences (N) and their length (L), which greatly hinders its scalability. In this framework, we plan to carry out a set of performance experiments that vary the size of the problem, the number of processors and the memory available. In each experiment we will test the original version of the TCoffee algorithm and some alternative proposals for improving the efficiency of the parallel version (BGT-Coffee). From these experiments, we will analyze the current scalability limits of the application and put forward new changes to the algorithm to improve its efficiency and scalability.
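To give a feel for what the quadratic cost means in practice, the short Python sketch below computes how the relative cost grows when the problem is scaled. It assumes only the O(N^2 L^2) bound stated above and ignores constant factors; the function names and the example problem sizes are hypothetical and are not taken from TCoffee itself or from measured results.

```python
def relative_cost(n_sequences: int, seq_length: int) -> int:
    """Cost proportional to N^2 * L^2 (constant factors ignored)."""
    return (n_sequences ** 2) * (seq_length ** 2)


def growth_factor(base: tuple, scaled: tuple) -> float:
    """How many times more expensive the scaled problem is than the base one.

    Each argument is an (N, L) pair: number of sequences and sequence length.
    """
    return relative_cost(*scaled) / relative_cost(*base)


if __name__ == "__main__":
    # Hypothetical baseline: 100 sequences of length 500.
    base = (100, 500)
    for scaled in [(200, 500), (1000, 500), (1000, 1000)]:
        factor = growth_factor(base, scaled)
        print(f"N={scaled[0]}, L={scaled[1]}: {factor:.0f}x the baseline cost")
```

Under this model, doubling the number of sequences quadruples the cost, and a tenfold increase in N raises it a hundredfold, which is why the experiments have to vary the problem size, the processor count and the available memory together.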
TO PROVIDE EASY ACCESS TO THE BIOTECHNOLOGY COMMUNITY TO AVAILABLE HPC RESOURCES TO CONDUCT LARGE-SCALE ALIGNMENTS BY MEANS OF A GUI ALIGNMENT TOOL.
Nowadays, researchers in the field of genomics increasingly require greater computational resources to perform their experiments.
Until recently, a desktop computer was enough to align a few tens of sequences, and it could produce a satisfactory result in a few hours of work.
But as the problems grow, the need for a large-scale alignment tool, capable of processing thousands or even tens of thousands of sequences, becomes more evident. The computational requirements of these problems (months of computing time and terabytes of memory) call for HPC systems with thousands of processors. However, the difficulty of using HPC platforms has limited the use of these resources in the field of biotechnology.
For this reason, we propose as part of this project to develop a portal or web application that gives the alignment tool (Parallel TCoffee) easy, efficient and controlled access to the computing resources (cluster or grid) needed to perform large-scale alignments. To achieve this objective, we will use a specific tool called Rapid. This tool is a cost-effective and efficient way of designing and delivering portal interfaces to applications that require remote compute resources. The aim of Rapid is to make completing these tasks as simple as ordering a DVD or booking a flight on the web. These customised interfaces allow tasks to be performed without referring to terminology about the underlying computational infrastructure. Moreover, the system allows only particular features of an application to be exposed, so as not to overwhelm the user.