DISPEL on the Cloud

DISPEL is a language designed for describing and organising data-intensive processing.
Cloud systems, such as OSDC and Microsoft's Azure, are intended to provide easily accessed and economical data-intensive computation.
The challenge is that DISPEL is a streaming technology that can potentially handle both large volumes of data and continuous streams of data.
Streaming requires computational nodes that can access disks and communicate with one another, e.g. to stream data to one another.
A gateway is also needed that accepts requests encoded in DISPEL and distributes them to those nodes, partitioning the enactment graph to balance
the load on each node and co-locating the parts of the graph that have large data volumes to send to one another.
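
The partitioning task described above can be illustrated with a minimal sketch. It is not DISPEL's or any existing gateway's algorithm; it is one simple greedy heuristic, assuming that each graph element has an estimated processing cost and each stream between elements has an estimated data volume. The names `partition`, `loads`, `edges` and `slack` are hypothetical and introduced only for this illustration.

```python
from collections import defaultdict


def partition(loads, edges, n_workers, slack=1.2):
    """Greedy placement sketch (illustrative only, not an existing gateway's method).

    loads: {element: estimated processing cost}          (assumed input)
    edges: {(a, b): estimated data volume streamed a->b}  (assumed input)
    Returns {element: worker index}.
    """
    # A cluster may grow up to a fair share of the total load, times some slack.
    capacity = slack * sum(loads.values()) / n_workers
    parent = {e: e for e in loads}
    size = dict(loads)

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Step 1: co-locate the endpoints of the heaviest streams first,
    # as long as the merged cluster still fits on a single worker.
    for (a, b), _volume in sorted(edges.items(), key=lambda kv: -kv[1]):
        ra, rb = find(a), find(b)
        if ra != rb and size[ra] + size[rb] <= capacity:
            parent[rb] = ra
            size[ra] += size[rb]

    # Step 2: balance load by placing clusters, largest first,
    # on whichever worker currently has the least load.
    clusters = defaultdict(list)
    for e in loads:
        clusters[find(e)].append(e)
    worker_load = [0.0] * n_workers
    placement = {}
    for root in sorted(clusters, key=lambda r: -size[r]):
        w = min(range(n_workers), key=worker_load.__getitem__)
        worker_load[w] += size[root]
        for e in clusters[root]:
            placement[e] = w
    return placement


# Hypothetical four-element pipeline: read -> filter -> join -> write.
example_loads = {"read": 2, "filter": 2, "join": 3, "write": 1}
example_edges = {("read", "filter"): 100, ("filter", "join"): 10, ("join", "write"): 50}
example_placement = partition(example_loads, example_edges, n_workers=2)
```

In this example the two heaviest streams stay inside a worker, and only the lightest stream (filter to join) crosses between workers. A real system would need measured rather than guessed costs and volumes, which is part of what the project phases below would investigate.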

The first project phase would develop a deployment step that sets up the gateway and intercommunicating nodes, rather like a pilot job in other distributed computing systems. These nodes would then be allocated work as it arrives as DISPEL requests at the gateway: initially with each whole DISPEL graph assigned to a single node,
and then with a preconfigured distribution over the nodes. Measuring the performance or costs of this base-level system would complete this phase.

The second, and more ambitious, phase would dynamically partition the incoming graphs over the available nodes, autonomically optimising the combined system to deliver results as quickly as possible. This would be taken further by considering alternative resources that might be used, building on the research of Chee Sun Liew. This phase would finish by measuring the effectiveness of various optimisation strategies.

The third phase would consider a continuous stream of DISPEL requests arriving at one or more gateways. It would consider two forms of dynamic optimisation, in addition to and in conjunction with those above: (1) delegating all or part of a request to another gateway, and (2) adjusting the number or type of nodes behind a gateway, increasing their number as the load increases and reducing it as the load decreases. The measurements would now show how these optimisations affect the individual jobs and how they affect the cost ($ charged, energy, etc.) of running the workload.
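
As an illustration of optimisation (2), the following is a minimal sketch of a threshold rule for adjusting the number of nodes behind a gateway. It is only one possible policy under stated assumptions: load is measured as queued requests per node, and the function name `target_node_count` and the thresholds `high`, `low`, `min_nodes` and `max_nodes` are hypothetical values chosen for the example, not taken from any existing system.

```python
def target_node_count(current, queued_requests,
                      high=4.0, low=1.0, min_nodes=1, max_nodes=16):
    """Return how many nodes the gateway should run next (illustrative policy).

    current:          number of nodes currently running (must be >= 1)
    queued_requests:  requests waiting at the gateway (assumed load measure)
    high / low:       hypothetical per-node queue thresholds
    """
    per_node = queued_requests / current
    if per_node > high:
        wanted = current + 1   # load is rising: add a node
    elif per_node < low:
        wanted = current - 1   # load is falling: release a node to save cost
    else:
        wanted = current       # within the comfortable band: no change
    # Never go below the minimum or above the budgeted maximum.
    return max(min_nodes, min(max_nodes, wanted))
```

Stepping by one node at a time keeps the rule simple, and the gap between `low` and `high` avoids oscillation. When the node count is already at `max_nodes` and the queue keeps growing, delegation to another gateway, optimisation (1), would be the remaining option.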

Parts of this might be undertaken during OSDC PIRE research visits; it might also be developed into a PhD.
Contact: Malcolm Atkinson, Paul Martin or Chee Sun Liew.

Degree level: 
NR
Subject areas: 
e-Science
Computer Architecture
Computer Communication/Networking
Databases
Distributed Systems