Big Data integration, the task of combining multiple diverse data sets, is one of the key problems organizations face when working with Big Data. The challenges are myriad. Different data sets may come from different sources (internal and external), may have been created by different people, may be structured in different ways, and may contain different data types (e.g., text, images, maps, and database tables). In addition, data sets may represent different levels of temporal or spatial granularity (e.g., real-time, per-transaction data vs. historical monthly reports). They may also use different terms to describe the same object or entity, contain duplicates, and so on. Solving these challenges today requires substantial manual effort. As organizations scale up both the volume and the variety of the data they work with, they will need new approaches that minimize the human effort involved.
The goal of this workshop is to discuss these challenges and to describe new tools and technologies aimed at addressing them.
WORKSHOP #1: Big Data Integration
Date: Thursday, April 4, 2013
Location: MIT CSAIL