BigData@CSAIL Lecture Series: SAP HANA Evolution from High Performance Enterprise Data Management to Big Data Platform
Wednesday, December 9, 2015 - 4:00pm
Anil Goel, Vice President and Chief Architect, HANA Data Platform, SAP

SAP's HANA data management platform was architected from the ground up to leverage modern hardware technologies including large main memories, multi-core parallelism, SIMD architectures and vector processing, and to exploit software-hardware co-innovation.

BigData@CSAIL Lecture Series: The Future of Analytics
Friday, November 6, 2015 - 2:30pm
Joseph Sirosh, Corporate Vice President, Microsoft

A new era of analytics is being engendered by cloud computing. The cloud gives us the power to collect and integrate data from an enormous variety of sources, to process big data at amazing scale and economics, to dramatically simplify development and deployment, and offer amazing intelligent APIs and applications as hosted services.

BigData@CSAIL Lecture Series: Running a start up inside the world’s largest privately owned technology company
Wednesday, October 28, 2015 - 4:00pm
Aidan O'Brien, Senior Director, Strategic Big Data Initiative, EMC Corporation

Aidan O’Brien is running EMC’s Strategic Big Data Initiative. This program has been designed to both establish EMC as a leader in the big data market and accelerate the transformation of how the 30-year-old storage giant does business.

BigData@CSAIL Lecture Series with Jeremy Freeman
Thursday, September 24, 2015 - 2:30pm
Jeremy Freeman, Janelia Farm, HHMI

Computation + neuroscience

Future Architecture Research - Big Data environment
Wednesday, July 29, 2015 - 2:30pm
Uri Weiser, Professor, Electrical Engineering department of the Technion IIT

The era of Big Data Computing is already here. Data centers today are reaching 1 million square meters each, while the power required to operate such a center reaches close to 100 MW. The electrical cost of such centers dominates operating expenses.

Reverse-engineering online tracking for privacy, transparency, and accountability
Monday, December 8, 2014 - 4:00pm to 5:00pm
Arvind Narayanan, Assistant Professor, Princeton University

When we browse the web, data about us is collected, traded and put to use in creative ways. The utter lack of transparency makes web tracking problematic. The Princeton Web Transparency and Accountability Project aims to reverse engineer online data collection and data-driven personalization.

Extracting Declarative and Procedural Knowledge from Documents and Videos on the Web
Tuesday, December 2, 2014 - 2:00pm to 3:00pm
Kevin Murphy, Research Scientist, Google

We describe how we built a very large probabilistic database of declarative facts, called "Knowledge Vault", by applying "machine reading" to the web. This approach extends previous work, such as NELL and YAGO, by leveraging existing knowledge bases as a form of "prior".

Running with Scissors: Fast Queries on Just-in-time Databases
Wednesday, October 15, 2014 - 4:00pm to 5:00pm
Anastasia Ailamaki, Professor and Lab Director, EPFL

The amount of data collected in the last two years is higher than the amount of data collected since the dawn of time. We collect data much faster than it can be transformed into valuable information and are often forced into hasty decisions on which parts to discard, potentially throwing away valuable data before it has been exploited fully.

Deep Learning: Overview and Trends
Thursday, September 25, 2014 - 11:00am to 12:00pm
Andrew Ng, Associate Professor of Computer Science at Stanford University

Deep learning is the leading approach to many problems in computer vision, speech recognition, NLP, and other areas.  In this presentation, I will give a broad overview of deep learning.  I will discuss the key reasons for its success, and the important role that scalability plays.  I will also describe unsupervised learning approaches to deep learning--such as the "Google cat" r

Data Cleaning from Theory to Practice
Wednesday, September 17, 2014 - 4:00pm to 5:00pm
Ihab F. Ilyas, Professor, University of Waterloo

With decades of research on the various aspects of data cleaning, multiple technical challenges have been tackled and interesting results have been published in many research papers. Example quality problems include missing values, functional dependency violations and duplicate records. Unfortunately, very little success can be claimed in adopting any of these results in practice.

From Answering Questions to Questioning Answers (and Questions): Toward Computational Fact-Checking
Thursday, May 15, 2014 - 4:00pm to 5:00pm
Jun Yang, Associate Professor, Duke University

Our news is saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim "cherry-picking"? In this talk, I will describe a framework that we developed recently for checking facts based on queries over structured dat

Scuba: Diving into Data at Facebook
Wednesday, April 23, 2014 - 4:00pm to 5:00pm
Janet Wiener, Software Engineer, Facebook

Facebook engineers query multiple databases to monitor and analyze Facebook products and services. The fastest of these databases is Scuba, which achieves sub-second query response time and latencies of under a minute from events occurring (a client request on a phone, a bug report filed, a code change checked in) to graphs showing those events on engineers’ monitors.

NSA Surveillance and What To Do About It
Thursday, February 6, 2014 - 5:00pm to 6:00pm
Bruce Schneier, Fellow, Berkman Center for Internet and Society

Edward Snowden has given us an unprecedented window into the NSA's surveillance activities.  Drawing from both the Snowden documents and revelations from previous whistleblowers, this talk describes the sorts of surveillance the NSA conducts and how it conducts it.  The emphasis will be on the technical capabilities of the NSA, and not the politics or legality of their actions. 

Very Small Arrays: Data Graphics at the New York Times
Tuesday, November 12, 2013 - 1:00pm to 2:30pm
Amanda Cox, New York Times

Journalism has very little in common with big data. (The data in journalism is almost entirely tiny.) But there may be some similarities, at least in spirit: we both want to know things we shouldn't be able to know, depend heavily on asking the right questions and quick iteration, and prefer way more detail than we actually need, at least at the beginning.

NSA Mass Surveillance: Turning the Fourth (and First) Amendment on its Head
Monday, October 28, 2013 - 2:00pm to 3:00pm
Cindy Cohn, Legal Director, Electronic Frontier Foundation

Cindy Cohn, Legal Director of the Electronic Frontier Foundation, will walk us through the NSA Mass Surveillance activity that has now been confirmed by the government in light of the revelations of Edward Snowden.