Towards Supporting Data-Intensive Research

CISA Seminar Series
Jano van Hemert

Science, society and business are witnessing a data revolution. Data are now generated by ever faster and cheaper physical technologies, software tools and digital collaborations; examples include satellite networks, simulation models and social networks. To successfully transform these data into information, then into knowledge and finally into wisdom, we need new forms of computational thinking.

These may be enabled by building "instruments" that make data comprehensible to the "naked mind", much as telescopes make outer space visible to the naked eye. These new instruments must be grounded in well-founded principles to ensure they have the capacity to transform complex and large-scale data into comprehensible forms; this demands new data-intensive methods.

Current methods and tools fail to address data-intensive challenges effectively, for several reasons, all of which are aspects of scalability. Current tools for data-intensive processing are too difficult for stakeholders to use directly and hence rely on expert intermediaries, who are scarce. Current computational methods are not directly suitable for data-intensive processing; they need to be embedded in a computational framework that facilitates scaling on large computational resources. Computational platforms change continuously, so methods and people must adapt to them, which costs valuable time in re-implementation and training.

To cope with the future expansion in interactions between people and computational platforms, data models and formats, and computational methods and implementations, we need methodologies and technologies that scale in all these respects. Solving one particular data-intensive challenge is always possible given sufficient resources. However, this point-solution approach does not generalise well and leads to an inefficient cottage-industry approach. To address data-intensive challenges cost-effectively, we need principles that apply in general. Currently, very few principles have been derived, and those only for certain aspects in isolation, such as data access or programming models. Other attempts at articulating principles have led to broad and vague guidelines.

In this talk I will (a) introduce the framework I intend to use to extract, articulate and evaluate these principles and (b) show examples of the current progress the National e-Science Centre has made towards scalable solutions.

Date and time: Tuesday, 13 October 2009, 13:00
60 minutes