BIG DATA AND BLINKDB: A QUERY ENGINE FOR BIG DATA

September 23, 2013

BlinkDB is a new query processing system that lets users run interactive SQL queries over tens of terabytes of data, with response times comparable to the blink of an eye ("blink-time"). That was the goal when CSAIL and the UC Berkeley AMPLab began collaborating on BlinkDB.

BlinkDB lets users trade off query accuracy for response time using two key ideas: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional samples from the original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query’s accuracy and/or response time requirements. To deliver this performance, BlinkDB addresses several issues: choosing which data to include in the samples, supporting a wide range of queries without requiring them to be declared in advance, and recognizing when an approximation is unreliable.
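The second idea, choosing a sample size from an accuracy requirement, can be illustrated with a small sketch. This is not BlinkDB's implementation: the function names (`estimate_mean`, `select_sample`), the synthetic data, the plain uniform samples, and the normal-approximation confidence interval are all assumptions made for illustration. It only shows the general principle that a smaller sample answers faster with a wider error bound, so the engine can stop at the smallest sample that meets the requested bound.

```python
import math
import random
import statistics


def estimate_mean(sample, z=1.96):
    """Return (mean, error), where error is the half-width of an
    approximate 95% confidence interval (normal approximation)."""
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, z * stderr


def select_sample(samples, max_error):
    """Return (size, mean, error) for the smallest sample meeting max_error.

    `samples` is a hypothetical list of pre-built samples ordered from
    smallest to largest. If none meets the bound, the result computed on
    the largest sample is returned.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    for sample in samples:
        mean, error = estimate_mean(sample)
        if error <= max_error:
            break
    return len(sample), mean, error


# Synthetic "table" column and a few uniform samples of increasing size.
rng = random.Random(0)
data = [rng.gauss(100.0, 15.0) for _ in range(200_000)]
samples = [rng.sample(data, n) for n in (100, 1_000, 10_000, 100_000)]

size, mean, error = select_sample(samples, max_error=0.5)
print(f"used {size} rows: mean = {mean:.2f} +/- {error:.2f}")
```

With a looser `max_error`, the same call settles on a smaller sample and touches less data; a tighter bound pushes it toward the larger samples. A real engine would also take a response time requirement into account, which this sketch leaves out.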

BlinkDB's performance has been evaluated against the response times of queries in Hive on Hadoop and Hive on Spark. In these comparisons, BlinkDB proved more efficient because it needs considerably less data to compute a reasonably accurate result.

BlinkDB is being developed by Sameer Agarwal, Henry Milner, Aurojit Panda and Ion Stoica at the University of California, Berkeley, in collaboration with Barzan Mozafari and Samuel Madden at the Massachusetts Institute of Technology.

For more information, please see http://blinkdb.org/ and http://istc-bigdata.org/index.php/blinkdb-a-massively-parallel-query-engine-for-big-data/

Publications:
Sameer Agarwal, Barzan Mozafari, Aurojit Panda, Henry Milner, Samuel Madden, Ion Stoica. BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data. In ACM EuroSys 2013, Prague, Czech Republic (Best Paper Award).
Sameer Agarwal, Aurojit Panda, Barzan Mozafari, Anand P. Iyer, Samuel Madden, Ion Stoica. Blink and It’s Done: Interactive Queries on Very Large Data. In PVLDB 5(12): 1902-1905, 2012, Istanbul, Turkey.