Open Research Questions

Our research throws up many questions that we cannot address immediately. Some of these are listed below. We would be delighted to hear from others who would like to join us in tackling them or already have the answers.

Extension of Rapid for submitting jobs to the best available computational resource, depending on the features of the job.

In the EFFORT project, and in others, there are different kinds of jobs that must be submitted to a computational resource. Depending on the characteristics of the job, the best computational resource may be EDIM1 (when the job requires working with a huge volume of data), a typical cluster such as ECDF (when the job requires high-performance computing), or the esciences1-8 machines (for small and quick computations).
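The routing rule described above can be sketched as a small dispatch function. This is a hypothetical illustration, not part of Rapid: the `Job` fields and the thresholds are assumptions; only the resource names (EDIM1, ECDF, esciences1-8) come from the text.

```python
# Hypothetical sketch of routing a job to a resource by its features.
# Thresholds and Job fields are illustrative assumptions, not Rapid's API.
from dataclasses import dataclass

@dataclass
class Job:
    data_volume_gb: float   # size of the data the job reads/writes
    core_hours: float       # estimated compute demand

def choose_resource(job: Job) -> str:
    if job.data_volume_gb > 1000:   # huge data volume -> data-intensive machine
        return "EDIM1"
    if job.core_hours > 100:        # high-performance computing -> cluster
        return "ECDF"
    return "esciences1-8"           # small and quick computation
```

For example, `choose_resource(Job(data_volume_gb=5000, core_hours=10))` selects `"EDIM1"`.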

Subject areas: 
Computer Communication/Networking

Intelligent aggregator pattern for Collective I/O operations.

Many applications use collective I/O operations to read/write data from/to disk. One of the most widely used is the Two-Phase I/O technique, extended by Thakur and Choudhary in ROMIO. Two-Phase I/O proceeds in two phases: a data-redistribution (exchange) phase and an I/O phase. In the first phase, small file requests are grouped into larger ones by means of communication. In the second phase, contiguous transfers are performed to or from the file system.
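The core idea of the first phase can be modelled as merging many small (offset, length) requests into maximal contiguous ranges. This is a toy model of the concept only; real implementations such as ROMIO operate on MPI datatypes and file domains, and the function below is an illustrative assumption.

```python
# Toy model of Two-Phase I/O's exchange phase: small, non-contiguous file
# requests from several processes are merged into larger contiguous ranges
# that aggregators can serve with a few large transfers.
def merge_requests(requests):
    """Merge (offset, length) requests into maximal contiguous ranges."""
    merged = []
    for off, length in sorted(requests):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            prev_off, prev_len = merged[-1]
            # Extend the previous range to cover this request.
            merged[-1] = (prev_off, max(prev_len, off + length - prev_off))
        else:
            merged.append((off, length))
    return merged

# Four processes each ask for a small strided piece of the file:
merge_requests([(0, 4), (4, 4), (8, 4), (16, 4)])  # -> [(0, 12), (16, 4)]
```

Three adjacent 4-byte requests collapse into one 12-byte transfer, which is exactly the saving the I/O phase exploits.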

Degree level: 
NR
Subject areas: 
Computer Architecture

PRAcTICaL-MPI: Portable Adaptive Compression library for MPI implementations.

Message Passing Interface (MPI) is the message-passing library most widely used to provide communications in clusters. There are several MPI implementations, such as MPICH, CHIMP, LAM and Open MPI. We have developed a library called PRAcTICaL-MPI (PoRtable AdapTIve Compression Library) that reduces the volume of data exchanged among processes by using lossless compression.
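The underlying idea can be illustrated with a lossless round trip over a message buffer. Here `zlib` merely stands in for whichever compressor the library would (adaptively) choose; this sketch is not PRAcTICaL-MPI's API, which plugs into MPI communication calls.

```python
# Illustration of compressing a message buffer losslessly before it is
# handed to the communication layer, and decompressing it on receipt.
import zlib

def compress_message(payload: bytes) -> bytes:
    return zlib.compress(payload)

def decompress_message(data: bytes) -> bytes:
    return zlib.decompress(data)

payload = b"0.0 " * 10000        # highly redundant numeric data compresses well
wire = compress_message(payload)
restored = decompress_message(wire)   # lossless: restored == payload
```

For redundant scientific data the compressed buffer is far smaller than the original, which is where the reduction in communication volume comes from.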

Degree level: 
NR
Subject areas: 
Computer Architecture

DISPEL on the Cloud

DISPEL is a language designed for describing and organising data-intensive processing.
Cloud systems, such as OSDC and Microsoft's Azure, are intended to provide easily accessible and economical data-intensive computation.
The challenge is that DISPEL is a streaming technology that potentially can handle large volumes of data as well as continuous streams of data.
This streaming needs computational nodes that can access disks and that can communicate with one another, e.g. stream data to one another.
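The streaming model described above can be pictured with a toy pipeline of processing elements, each consuming an input stream and yielding an output stream, so data flows between nodes without materialising the whole volume. This is a Python analogue of the concept, not DISPEL syntax.

```python
# Toy analogue of streaming processing elements connected into a pipeline:
# items flow one at a time from source to sink.
def source(n):
    for i in range(n):
        yield i

def transform(stream):
    for x in stream:
        yield x * x          # a per-item processing element

def sink(stream):
    total = 0
    for x in stream:
        total += x           # consume the stream incrementally
    return total

result = sink(transform(source(5)))   # 0 + 1 + 4 + 9 + 16 = 30
```

On a cloud platform each element would run on its own node, with the generator connections replaced by network streams between nodes that can also access disks.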

Degree level: 
NR
Subject areas: 
e-Science
Computer Architecture
Computer Communication/Networking
Databases
Distributed Systems

Distributed implementation of brain imaging analysis applications

Brain images are used in a variety of multi-disciplinary studies, including medicine, psychology and linguistics.
These studies use a range of image types generated with different equipment (PET, SPECT, EEG, MEG, MR and CT)
and with different parameters, producing very different data sizes and numbers of images.

Degree level: 
NR
Subject areas: 
e-Science
Distributed Systems
Neuroinformatics

Benchmark comparisons of EDIM1 for tuned Map-Reduce

The EDIM1 data-intensive architecture is intended to accelerate data-intensive processing.
One good way to organise this processing is with a map-reduce model.
We can consider two candidate implementations: Hadoop and the Sector/Sphere combination from Grossman et al.
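The workload both candidates would be benchmarked on follows the map-reduce pattern, which can be stated in a few lines. This in-memory word count is a minimal model of the pattern only; Hadoop and Sector/Sphere distribute the same phases across the nodes of a machine like EDIM1.

```python
# Minimal in-memory model of map-reduce: a map phase emits key/value pairs,
# then a reduce phase groups by key and combines the values.
from collections import defaultdict

def map_phase(documents):
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    counts = defaultdict(int)
    for key, value in pairs:   # shuffle/group by key, then sum
        counts[key] += value
    return dict(counts)

docs = ["data intensive data", "intensive computing"]
reduce_phase(map_phase(docs))  # {'data': 2, 'intensive': 2, 'computing': 1}
```

A benchmark comparison would measure how each framework's distributed shuffle and I/O behave on EDIM1's architecture, since that is where the implementations differ.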

Degree level: 
NR
Subject areas: 
e-Science
Computer Architecture
Computer Communication/Networking
Distributed Systems

Light-weight distributed workflow

Processing large amounts of data across a set of nodes in a cluster like EDIM1 requires deploying and running a workflow, together with a set of processing elements and libraries, across all the nodes.
The complexity of the problem and the size of the data imply that the execution of the workflow is often an exploratory and iterative process.

Degree level: 
NR
Subject areas: 
e-Science
Computer Communication/Networking
Distributed Systems
Parallel Programming
Programming Languages and Functional Programming
Programming Language Semantics
Software Engineering

Large data storage

Scientific laboratories produce large amounts of data, often stored as files in hierarchical folders. File systems do not scale well with large numbers of files. In particular, access to data becomes hard if query criteria do not match storage criteria.
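One common remedy, sketched below, is to keep the files on disk but index their metadata in a database, so that queries need not follow the folder hierarchy. The table layout and metadata fields here are illustrative assumptions, not a description of any particular laboratory's system.

```python
# Sketch: a metadata catalogue over files, so query criteria (instrument)
# need not match storage criteria (year/run folder hierarchy).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE files (
    path TEXT PRIMARY KEY,
    instrument TEXT,
    acquired TEXT)""")
conn.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    ("/lab/2011/run1/a.dat", "MR",  "2011-03-01"),
    ("/lab/2011/run2/b.dat", "PET", "2011-04-12"),
    ("/lab/2012/run1/c.dat", "MR",  "2012-01-20"),
])

# Query by a criterion that cuts across the storage layout:
paths = [p for (p,) in conn.execute(
    "SELECT path FROM files WHERE instrument = 'MR' ORDER BY path")]
# -> ['/lab/2011/run1/a.dat', '/lab/2012/run1/c.dat']
```

The open question is how such indexing scales when the catalogue itself must be distributed alongside the data.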

Degree level: 
NR
Subject areas: 
e-Science
Computer Architecture
Computer Communication/Networking
Databases
Distributed Systems
Parallel Programming
Software Engineering

Privacy protection for medical data in clouds

Data protection is a great concern when dealing with medical data because it contains sensitive personal information.
Nevertheless, medical research could profit greatly from researchers being able to share data safely across
institutional borders. There is a trade-off between privacy protection and research interests, which should avoid extreme data removal or …

Degree level: 
NR
Subject areas: 
Neuroinformatics

Data-streaming experiments on cloud platforms

Data streaming is a strategy for scalable or continuous data processing. We have developed a high-level notation for describing distributed and heterogeneous data-streaming workflows, called DISPEL, and have a substantial body of applications described in DISPEL. An implementation based on OGSA-DAI exists, and at least two other implementations are partially constructed. Several open questions need to be investigated via a series of experiments.

Degree level: 
NR
Subject areas: 
e-Science
Computer Architecture
Databases
Distributed Systems
