More data has been collected in the last two years than in all of prior human history. We collect data much faster than it can be transformed into valuable information, and we are often forced into hasty decisions about which parts to discard, potentially throwing away valuable data before it has been fully exploited. The reason is that query processing, the mechanism that squeezes information out of data, becomes slower as datasets grow larger. At the same time, the ever-increasing number of hardware contexts slows processing down further, as keeping all cores busy with useful computation is difficult. Today's query engines harness only a fraction of the potential of new hardware platforms. Is it possible to decouple query processing efficiency from the data growth curve?
This talk advocates a departure from the traditional "create a database, then run queries" paradigm. Instead, data analysts should run queries directly on raw data, while a database is built on the side. In fact, the database should become an implementation detail, imperceptible to the user. To achieve this paradigm shift, query processing should be decoupled from specific data storage formats. Ad-hoc primitives and dynamically synthesized operators are key to just-in-time query optimization and processing. Finally, exploitation of compute and memory resources should be seamless and guided by hardware hints; extreme vertical integration is an enemy of forward compatibility.
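The paradigm above can be illustrated with a minimal sketch: a query runs directly over raw CSV text, and a parsed, cached representation (the "database") materializes as a side effect of the first query, so later queries skip re-parsing. All names here (`JustInTimeTable`, `sum_where`) and the toy data are hypothetical, invented for this illustration; real systems in this space build far richer structures, such as positional maps and adaptive indexes.

```python
import csv
import io

RAW = """id,region,amount
1,EU,10
2,US,25
3,EU,7
"""

class JustInTimeTable:
    """Answers queries directly over raw CSV text; the parsed form
    is built lazily, on first use, as an imperceptible side effect."""

    def __init__(self, raw_text):
        self.raw = raw_text
        self.cache = None  # the "database", built on the side

    def _rows(self):
        if self.cache is None:
            # First query pays the parsing cost; later queries reuse the cache.
            reader = csv.DictReader(io.StringIO(self.raw))
            self.cache = [dict(row) for row in reader]
        return self.cache

    def sum_where(self, key, value, field):
        """Sum `field` over rows where `key` equals `value`."""
        return sum(int(r[field]) for r in self._rows() if r[key] == value)

table = JustInTimeTable(RAW)
print(table.sum_where("region", "EU", "amount"))  # 10 + 7 = 17
```

The design choice mirrored here is that the user never issues a "load" step: the storage-side structure appears only because a query needed it, and its format is an internal detail that could be swapped without changing the query interface.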