
TaWeka: from biological web services to data mining

Speaker: 
Luna De Ferrari

TaWeka (Taverna-to-Weka) is a Java application that connects data gathering in Taverna with data mining in Weka. The aim is to speed up the creation and comparison of biological classifiers while making them easier to share and reuse.

TaWeka:

  1. Uses Taverna workflows to fetch data from web services.
  2. Writes an Attribute-Relation File Format (ARFF) file according to the user's specifications.
  3. Calls Weka (locally or through web services) to run machine learning experiments on the data, e.g. classification.
  4. Stores the trained classifier together with its success rate, so that it can be used to process new instances.
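Step 2 produces an ARFF file, Weka's plain-text input format: an @relation line, one @attribute line per column, then comma-separated rows after @data, with "?" marking a missing value. The sketch below shows what writing such a file from tabular rows might look like. It is a minimal illustration under stated assumptions, not TaWeka's code: the class ArffSketch, the method toArff, and the example attributes (length, hydrophobicity, an enzyme/non_enzyme class) are all hypothetical, and in v0.1 the rows would presumably come from the user's SQL queries rather than from an in-memory list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: serialises rows into ARFF text (not TaWeka's actual code).
public class ArffSketch {

    // Missing values become "?"; values containing whitespace, commas,
    // quotes, braces or '%' are wrapped in single quotes with escaping.
    static String formatValue(String v) {
        if (v == null) {
            return "?";
        }
        if (v.matches(".*[\\s,'\"{}%].*")) {
            return "'" + v.replace("\\", "\\\\").replace("'", "\\'") + "'";
        }
        return v;
    }

    // attributes: ordered map of attribute name -> ARFF type
    // (e.g. "numeric" or a nominal set such as "{yes,no}").
    static String toArff(String relation, Map<String, String> attributes, List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            sb.append("@attribute ").append(a.getKey()).append(' ').append(a.getValue()).append('\n');
        }
        sb.append("\n@data\n");
        for (String[] row : rows) {
            List<String> cells = new ArrayList<>();
            for (String cell : row) {
                cells.add(formatValue(cell));
            }
            sb.append(String.join(",", cells)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative attributes and rows only; not real data.
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("length", "numeric");
        attrs.put("hydrophobicity", "numeric");
        attrs.put("class", "{enzyme,non_enzyme}");

        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"350", "0.42", "enzyme"});
        rows.add(new String[] {"210", null, "non_enzyme"});

        System.out.print(toArff("proteins", attrs, rows));
    }
}
```

Running main prints a small ARFF document whose second data row reads "210,?,non_enzyme". Keeping the attribute map ordered (LinkedHashMap) matters because ARFF matches data columns to attributes by position. A file like this could then be loaded by Weka for the classification experiments in step 3.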

TaWeka v0.1 uses SQL queries as the user specifications (step 2 above). In a benchmark abstracting five data mining scenarios, v0.1 runs into difficulties when moving from simple classifiers to genomic learning. To address this, I am now developing a new version of TaWeka that incorporates a lightweight semantic layer to better support the data collection narrative, which is where researchers spend most of their time.

Date and time: 
Wednesday, 14 November, 2007 - 11:00
Length: 
45 minutes
Location: 
Leith