
Optimising Distributed Data Integration and Data Mining Service through Transformation of Data Workflow into Parallel Stream

Chee Sun Liew

Over the past decades, running large-scale experiments with computational tools has become common in modern science. The data processing steps involved in such experiments are usually complex and compute-intensive. A challenge arises in large collaborative projects that run computations across institutions and continents, where the data and machines are located at distributed sites. A common way to make such experiments more manageable is to execute the processing steps as a workflow, using domain-specific or generic workflow management systems. These systems map a scientific workflow onto the available resources for execution. The resources can be heterogeneous and geographically distributed. Thus, the success of scientific workflow execution depends on data integration, resource mapping and the process execution itself. In my thesis, I propose to improve the overall performance of scientific workflows by optimising the resource integration and mapping mechanism, and by transforming data workflows into parallel streams to speed up process execution.
