MapD: In-Memory Database

MapD (Massively Parallel Database) is an analytics database being built by Todd Mostak and Prof. Sam Madden at MIT that allows interactive querying of big datasets.

It takes advantage of the immense computational power and memory bandwidth available in commodity, off-the-shelf multicore architectures, such as the Intel Xeon Phi platform and graphics processing units (GPUs), which were originally designed to accelerate the drawing of 3D graphics to a computer screen. These architectures form the backbone of a vertically integrated data processing and visualization engine that marries the data processing and querying features of a traditional database system with advanced analytic, visualization, and machine learning features.

MapD is designed to run on hardware configurations ranging from GPU-equipped laptops to dedicated 16-card GPU clusters, achieving speedups of many orders of magnitude for common workloads.  Furthermore, it leverages a hybrid execution framework to run queries simultaneously on all available hardware, including CPUs, to achieve maximal performance.  Some of MapD's optimizations, such as the on-the-fly compilation of end-to-end query execution plans, the processing of data at the block level instead of the tuple level, and the batching of similar queries, are advantageous on both CPUs and GPUs and allow the database to fully saturate the respective architecture’s memory and compute bandwidth.
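To make two of these optimizations concrete, the following is a minimal sketch of block-level processing combined with query batching. It is not MapD's code: it is plain Python, and the names `BLOCK_SIZE`, `blocks`, and `batched_counts` are hypothetical, chosen only for illustration. The idea is that a column is read one block at a time, and every query in the batch is evaluated against each block while that block is still in fast memory, so the data is scanned once rather than once per query.

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical block size for illustration; a real system would use far larger blocks.
BLOCK_SIZE = 4


def blocks(column: Sequence[int], size: int = BLOCK_SIZE) -> Iterator[Sequence[int]]:
    """Yield consecutive fixed-size slices (blocks) of a column."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def batched_counts(
    column: Sequence[int],
    predicates: List[Callable[[int], bool]],
) -> List[int]:
    """Answer several 'COUNT(*) WHERE predicate' queries in a single scan.

    Each block is loaded once and all predicates are applied to it before
    moving on, instead of rescanning the whole column for every query.
    """
    counts = [0] * len(predicates)
    for block in blocks(column):
        for i, pred in enumerate(predicates):
            counts[i] += sum(1 for value in block if pred(value))
    return counts


ages = [23, 35, 41, 18, 67, 52, 29, 30, 44]
queries = [
    lambda v: v >= 30,        # query 1: age >= 30
    lambda v: v < 25,         # query 2: age < 25
    lambda v: 40 <= v <= 60,  # query 3: age between 40 and 60
]
print(batched_counts(ages, queries))  # prints [6, 2, 3]
```

On a GPU the inner per-block work would be spread across many threads; the sketch only shows the access pattern, in which memory traffic is shared by all queries in the batch.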

Even on a small GPU server, MapD can query and visualize billions of streaming data points in real time (i.e., with latencies measured in milliseconds). Furthermore, by batching queries, the system can support hundreds of simultaneous users.  As the name suggests, MapD has been designed to allow for on-the-fly geospatial exploration of large datasets, including the generation of point maps, heat maps, and choropleths that can be consumed by any web client.  In addition, modules are currently being developed that will allow MapD to accelerate standard OLAP datacube and histogramming operations, such as those provided by systems like Teradata and Vertica, making them real time as well.
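As an illustration of the kind of computation behind a heat map, the sketch below bins points into a fixed grid and counts how many fall into each cell, which is a two-dimensional histogram. It is an assumption-laden toy in plain Python, not MapD's implementation; the function name `heatmap` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def heatmap(
    points: Sequence[Tuple[float, float]],
    bounds: Tuple[float, float, float, float],
    cols: int,
    rows: int,
) -> List[List[int]]:
    """Count points per grid cell inside bounds = (min_x, min_y, max_x, max_y).

    Returns a rows x cols grid; grid[row][col] is the number of points whose
    coordinates fall in that cell. Points outside the bounds are ignored.
    """
    min_x, min_y, max_x, max_y = bounds
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if not (min_x <= x < max_x and min_y <= y < max_y):
            continue  # skip points outside the visible region
        # Scale each coordinate to a cell index; clamp to guard against rounding.
        col = min(cols - 1, int((x - min_x) / (max_x - min_x) * cols))
        row = min(rows - 1, int((y - min_y) / (max_y - min_y) * rows))
        grid[row][col] += 1
    return grid


pts = [(0.5, 0.5), (1.5, 0.5), (1.7, 1.2), (1.9, 1.9), (3.0, 3.0)]
print(heatmap(pts, (0.0, 0.0, 2.0, 2.0), cols=2, rows=2))  # prints [[1, 1], [0, 2]]
```

Each point is handled independently of the others, which is why this style of binning maps naturally onto massively parallel hardware; a web client would then only need the small grid of counts, not the raw points.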

See MapD example: