
Efficient Analytical Data Processing with In-Memory SQL/MR

16th September: Makoto

I will talk about analytical data processing on shared-nothing machines,
particularly on memory-rich clusters. As a visiting post-doc at CWI, I
designed and implemented MonetDB/MR, a parallel shared-nothing database
built on MonetDB, with Prof. Martin Kersten and Peter Boncz. A key idea
behind MonetDB/MR is memory-resident MapReduce processing, whereas
traditional MapReduce is disk-resident. MonetDB/MR exploits MonetDB's
memory-mapped columnar storage and avoids on-the-fly data shuffling, so
that most of the work within a single MapReduce job is done in memory.
As a result, our system runs 3.1 to 19.9 times faster than Hive/Hadoop
on TPC-H at scale factor 100 (SF=100).
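The general pattern of doing most of the work locally in memory and exchanging only small partial results, rather than shuffling raw rows between nodes, can be illustrated with a minimal Python sketch. This is a generic illustration of per-partition (combiner-style) aggregation, not MonetDB/MR's actual implementation. The partition layout, the column names `returnflag` and `quantity` (loosely modeled on TPC-H `lineitem`), and the functions `local_aggregate` and `merge_partials` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical partitions: each "node" holds its rows column-wise in memory
# (one Python list per column), loosely mimicking a columnar layout.
partitions = [
    {"returnflag": ["A", "N", "A"], "quantity": [10, 5, 7]},
    {"returnflag": ["N", "R"], "quantity": [3, 4]},
    {"returnflag": ["A", "R", "R"], "quantity": [1, 2, 6]},
]


def local_aggregate(part):
    """Map + combine step for one node: SUM(quantity) GROUP BY returnflag,
    computed entirely over that node's in-memory columns."""
    sums = defaultdict(int)
    for flag, qty in zip(part["returnflag"], part["quantity"]):
        sums[flag] += qty
    return dict(sums)


def merge_partials(partials):
    """Reduce step: merge the small per-node results once, instead of
    shuffling every raw row to a reducer chosen by key."""
    total = defaultdict(int)
    for partial in partials:
        for flag, qty in partial.items():
            total[flag] += qty
    return dict(total)


# In a real cluster each local_aggregate call would run on its own node;
# here they simply run one after another.
result = merge_partials(local_aggregate(p) for p in partitions)
print(sorted(result.items()))  # [('A', 18), ('N', 8), ('R', 12)]
```

Only one small dictionary per node crosses the (simulated) network here, which is the kind of saving that avoiding a full shuffle aims for. The actual benefit for a given query depends on how the data is partitioned.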

While the above is not specific to scientific applications, I recently
started working with geological researchers at my institute and began
managing large geological and satellite datasets in a MonetDB/PostGIS
cluster. I will briefly introduce the issues and challenges in managing
these data.