A Generic Parallel Processing Model for Facilitating Data Mining and Integration

Title: A Generic Parallel Processing Model for Facilitating Data Mining and Integration
Publication Type: Journal Article
Year of Publication: 2011
Authors: Han, L; Liew, CS; van Hemert, J; Atkinson, M
Journal Title: Parallel Computing
Volume: 37
Issue: 3
Pages: 157-171
Keywords: Data Mining and Data Integration (DMI); Life Sciences; OGSA-DAI; Parallelism; Pipeline Streaming; workflow
Abstract

To facilitate Data Mining and Integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be held in memory, buffered on disk, or carried as inter-computer data-flows. This makes it possible to build arbitrary DAGs with pipelining and with both data and task parallelism, which provides room for performance enhancement. We have applied this approach to a real DMI case in the Life Sciences and implemented a prototype. To demonstrate the feasibility of the modelled DMI task and assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental results show that linear speedup is achieved as the number of distributed computing nodes increases in this case study.
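
The streaming composition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' prototype: it covers only a linear chain of PEs (the simplest DAG), uses in-memory streams only, and every name in it (`run_pe`, `run_pipeline`, `buffer_size`) is hypothetical. Each PE runs in its own thread and is linked to its neighbours by bounded queues, so downstream PEs can start working before upstream PEs have finished.

```python
import queue
import threading

_END = object()  # sentinel marking the end of a stream (an assumption of this sketch)


def run_pe(func, inbox, outbox):
    """Run one Processing Element: apply func to each item arriving on inbox."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)  # propagate end-of-stream to the next PE
            return
        outbox.put(func(item))


def run_pipeline(source, funcs, buffer_size=4):
    """Chain PEs into a linear pipeline linked by bounded in-memory streams."""
    # One stream before the first PE, one after each PE.
    streams = [queue.Queue(maxsize=buffer_size) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=run_pe, args=(f, streams[i], streams[i + 1]))
        for i, f in enumerate(funcs)
    ]

    def feed():
        # Push source items into the first stream, then signal end-of-stream.
        for item in source:
            streams[0].put(item)
        streams[0].put(_END)

    feeder = threading.Thread(target=feed)
    for t in workers + [feeder]:
        t.start()

    # Drain the final stream while the upstream PEs are still running.
    results = []
    while True:
        item = streams[-1].get()
        if item is _END:
            break
        results.append(item)

    for t in workers + [feeder]:
        t.join()
    return results


if __name__ == "__main__":
    # Two hypothetical PEs: add one, then double.
    print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))
```

Because each stage is a single thread reading from a FIFO queue, item order is preserved. Data parallelism in the sense of the abstract would require several instances of the same PE sharing a stream, which this sketch leaves out.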

DOI: 10.1016/j.parco.2011.02.006