Currently, the analysis of gene expression data generated by microarray and real-time PCR experiments is slow, often taking several weeks to process data from a small number of patients. In anticipation of a new clinical trial expected to involve 200 patients, it is necessary to automate these analyses so that meaningful biological data can be returned on a much shorter time scale.
The current procedure involves numerous stages of extracting small amounts of data from machine-formatted, comma-separated text files. The useful data are identified, processed according to established standard operating procedures, and passed to dedicated software for more complex statistical analyses. All of these steps are currently performed manually in Excel spreadsheets and, because the tasks are repetitive and time-consuming, are vulnerable to error. It has been recognised that the entire process is well suited to automation: through the creation of specialised wrapper scripts, databases, and a workflow that can be invoked remotely via a web service, there is the potential to reduce the time required to perform each analysis by several orders of magnitude.
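As an illustration of the extraction stage described above, the following is a minimal sketch in Python, not the project's actual implementation. It assumes a hypothetical export layout in which a few lines of instrument metadata precede a header row. The column names (`Sample`, `Gene`, `Ct`), the function name `extract_records`, and the example file content are all invented for illustration; real instrument exports will differ.

```python
import csv
import io

# Hypothetical column names; a real instrument export will use its own.
WANTED_COLUMNS = ("Sample", "Gene", "Ct")


def extract_records(lines, wanted=WANTED_COLUMNS, numeric="Ct"):
    """Pull the wanted columns out of a machine-formatted CSV export.

    Rows before the header (instrument metadata) are skipped, and rows
    whose `numeric` column cannot be parsed as a number are dropped.
    """
    header = None
    records = []
    for row in csv.reader(lines):
        cells = [cell.strip() for cell in row]
        if header is None:
            # Assumption: the header is the first row naming every wanted column.
            if all(name in cells for name in wanted):
                header = cells
            continue
        if len(cells) < len(header):
            continue  # blank or truncated line
        record = {name: cells[header.index(name)] for name in wanted}
        try:
            record[numeric] = float(record[numeric])
        except ValueError:
            continue  # non-numeric entry, e.g. a well with no reading
        records.append(record)
    return records


# Invented example content standing in for an instrument export.
EXAMPLE_EXPORT = """\
Instrument,Hypothetical PCR System
Run,example

Well,Sample,Gene,Ct
A1,P001,GAPDH,18.2
A2,P001,TNF,24.9
A3,P002,TNF,Undetermined
"""

records = extract_records(io.StringIO(EXAMPLE_EXPORT))
for record in records:
    print(record)
```

A function of this kind could replace the manual copy-and-paste step in a spreadsheet; the subsequent processing by the standard operating procedures is deliberately left out, since its details are not specified here.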
The aim of this project is to outline the analysis process and then to design, test, and implement the scripts, databases, workflows, and web services required by the end user. A possible extension is a portal to facilitate interaction with the workflow. Furthermore, given sufficient time, the data output from existing software, such as R or GeneSpring, could be compared in order to verify the suitability of these packages for the analyses.