Principal goal: to extend Rapid, a tool for developing web portals for scientific computing, to operate with Apache Hadoop.
This project is part of the Google Summer of Code 2010 (see http://www.omii.ac.uk/wiki/RapidHadoop).
Rapid is a unique way of quickly designing and delivering web portal interfaces to applications that require computational resources, such as utility computing infrastructures or high-performance computing facilities. It focuses on the requirements of the end-user: customised user interfaces for domain-specific applications let users accomplish particular tasks from the comfort of their own web browser.
The task is to add a module to Rapid, a technology for quickly creating web portal interfaces that execute applications on remote compute resources, so that it can communicate with the Apache Hadoop framework. Currently, Rapid can submit to several job submission engines, such as Sun Grid Engine, Condor and PBS. You will extend Rapid with code that supports job submission to, job monitoring on, and data handling with the Hadoop framework.
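Such a module would typically wrap Hadoop's command-line client, much as the existing engines are driven through their own submission commands. A minimal sketch of that approach follows; the function names and the example jar and class are illustrative only, not part of Rapid or Hadoop:

```python
# Sketch of a submission-module backend that drives the Hadoop CLI.
# submit_command, status_command and the example jar/class names are
# hypothetical; only the "hadoop jar" / "hadoop job -status" commands
# themselves come from the Hadoop command-line client.
import subprocess

def submit_command(jar, main_class, *args):
    """Build the argv list for submitting a MapReduce job via 'hadoop jar'."""
    return ["hadoop", "jar", jar, main_class, *args]

def status_command(job_id):
    """Build the argv list for polling a job's status."""
    return ["hadoop", "job", "-status", job_id]

def run(argv):
    """Invoke the Hadoop CLI; requires a configured Hadoop installation."""
    return subprocess.run(argv, capture_output=True, text=True)

# The portal would assemble commands like this and hand them to run():
cmd = submit_command("wordcount.jar", "org.example.WordCount",
                     "/input", "/output")
```

Keeping command construction separate from invocation makes the module testable without a live cluster, which matters when the portal targets several engines.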
Currently, Rapid works with Grid and High-Performance Computing infrastructures. Your task is to adapt Rapid so that it can generate intuitive interfaces that submit jobs to several cloud infrastructures, for example Amazon's Elastic Compute Cloud (EC2), Eucalyptus, Rackspace, Linode and GoGrid. Preferably you will look into existing solutions that can handle several of these infrastructures at once via a standard library, such as libcloud.
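The core idea behind a library like libcloud is a single driver interface over provider-specific APIs, so portal code never depends on one vendor. A stdlib-only sketch of that pattern, with class and method names that are illustrative rather than libcloud's actual API:

```python
# Minimal sketch of the driver-abstraction pattern that a library such
# as libcloud uses to talk to many cloud providers through one
# interface. All class names here are illustrative.

class NodeDriver:
    """Common interface that every provider driver implements."""
    def list_nodes(self):
        raise NotImplementedError

class EC2Driver(NodeDriver):
    def list_nodes(self):
        # A real driver would call the EC2 web service here.
        return ["ec2-node-1"]

class RackspaceDriver(NodeDriver):
    def list_nodes(self):
        # A real driver would call the Rackspace API here.
        return ["rackspace-node-1"]

DRIVERS = {"ec2": EC2Driver, "rackspace": RackspaceDriver}

def get_driver(provider):
    """Look up a driver class by provider name and instantiate it."""
    return DRIVERS[provider]()

# Portal code stays provider-agnostic:
nodes = get_driver("ec2").list_nodes()
```

With this shape, adding Eucalyptus, Linode or GoGrid support means writing one more driver class, leaving the generated interfaces unchanged.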
Cloud computing provides several advantages over other distributed computing approaches such as Grid and high-performance computing. However, it also brings several problems, such as expensive data movement and the potential waste of resources when virtual machines run idle. In this project you will investigate solutions that use Apache Hadoop to better organise the computation and make efficient use of compute resources.
* Learn how Rapid is used
* Learn how Rapid works internally, especially its modules for communicating with various computing infrastructures
* Plan and design how to add a module for Apache Hadoop
* Write the code with documentation
* Transfer the code to us