Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

A data 'sketch' is a small representation of a large dataset that allows us to approximately solve data analysis and machine learning problems on the original dataset. In the past, we have studied sketching algorithms for problems such as large-scale linear regression and graph approximation. Recently, our work has focused on sketches for clustering and low rank approximation on very high-dimensional data.
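As a rough illustration of what a sketch can look like, the snippet below uses a Gaussian random projection, a standard dimensionality reduction technique, to map high-dimensional points to fewer dimensions while approximately preserving pairwise distances. It is a minimal sketch under stated assumptions and not necessarily the construction used in this project; the function names `random_projection_sketch` and `squared_distance` are hypothetical.

```python
import math
import random


def random_projection_sketch(points, target_dim, seed=0):
    """Project each d-dimensional point down to target_dim dimensions.

    Assumption: a dense Gaussian projection stands in for the sketch here;
    the project may use a different construction.
    """
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    # One shared d x target_dim matrix with i.i.d. N(0, 1/target_dim) entries.
    proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)]
            for _ in range(d)]
    return [
        [sum(p[i] * proj[i][j] for i in range(d)) for j in range(target_dim)]
        for p in points
    ]


def squared_distance(a, b):
    """Squared Euclidean distance, the quantity the k-means cost is built from."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(1)
    # Five synthetic points in 1000 dimensions, reduced to 200 dimensions.
    data = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(5)]
    sketch = random_projection_sketch(data, target_dim=200)
    original = squared_distance(data[0], data[1])
    approx = squared_distance(sketch[0], sketch[1])
    print(f"original: {original:.1f}, sketched: {approx:.1f}")
```

A clustering algorithm can then be run on the smaller `sketch` rows instead of the original points, trading a small approximation error for less computation.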

MYLAR: Secure Web Applications

Mylar is a platform for building secure web applications. Mylar protects data confidentiality even when servers are compromised. Mylar addresses three challenges in making this approach work. First, Mylar allows the server to perform keyword search over encrypted documents, even if the documents are encrypted with different keys. Second, Mylar allows users to share keys and data securely in the presence of an active adversary. Finally, Mylar ensures that client-side application code is authentic, even if the server is malicious...

CryptDB: SQL Over Encrypted Data

CryptDB is a system that provides practical and provable confidentiality in the face of these attacks for applications backed by SQL databases. It works by executing SQL queries over encrypted data using a collection of efficient SQL-aware encryption schemes. CryptDB can also chain encryption keys to user passwords, so that a data item can be decrypted only by using the password of one of the users with access to that data...
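The idea of chaining a key to a user password can be illustrated with a toy example: a per-item data key is stored only in wrapped form, and unwrapping it requires a key derived from the user's password. This is a minimal sketch under stated assumptions, not CryptDB's actual scheme. The helper names are hypothetical, and the XOR wrap is a stand-in for the authenticated encryption a real system would use.

```python
import hashlib
import os


def derive_user_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, 100_000, dklen=32
    )


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def wrap_data_key(data_key: bytes, password: str, salt: bytes) -> bytes:
    """Toy wrap: hide the data key under the password-derived key.

    Assumption: XOR is used only for illustration and provides no integrity.
    """
    return xor_bytes(data_key, derive_user_key(password, salt))


def unwrap_data_key(wrapped: bytes, password: str, salt: bytes) -> bytes:
    """Recover the data key; a wrong password yields unrelated bytes."""
    return xor_bytes(wrapped, derive_user_key(password, salt))


if __name__ == "__main__":
    salt = os.urandom(16)        # stored alongside the wrapped key
    data_key = os.urandom(32)    # key that encrypts one data item
    wrapped = wrap_data_key(data_key, "alice-password", salt)
    print(unwrap_data_key(wrapped, "alice-password", salt) == data_key)
    print(unwrap_data_key(wrapped, "wrong-password", salt) == data_key)
```

Storing one wrapped copy of the data key per authorized user would let any one of those users' passwords unlock the item, which matches the access pattern described above.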

User-controlled Privacy of Personal Data

The MIT Living Lab mobile apps (called labs) aim to make MIT more data-driven. However, users cannot currently specify their privacy preferences for data collection, contribution, and use by the labs. Through my research, I constructed a suite of tools that gives users better control over their personal data. It incorporates a three-pronged privacy mechanism comprising context definition, opt-in-by-choice settings, and preference enforcement. We are building the MIT-FIT lab to validate these tools.

Who? What? When? Why? And Where?: Towards Transparent Web Systems

HTTP with Accountability (HTTPA) provides an end-to-end accountability infrastructure that can be used to enable and determine appropriate use of data. Data providers can attach usage restrictions to their data, which are communicated to data consumers, and the data providers can then 'audit' how their data is used.

Big Data for Effective Marketing

This work shows how a principled approach to big data can improve customer segmentation...

Learning Connections in Financial Time Series

This research presents a machine learning-based method for building a connectedness matrix, with the goal of quantifying the connections between equities and avoiding risk during investment...

Using Algorithmic Attribution Techniques to Determine Authorship in Unsigned Judicial Opinions

This is an analysis of judicial opinions that are published without indicating individual authorship. It aims to provide an unbiased, quantitative, and computer scientific answer to a problem that has long plagued legal commentators...

Learning To Detect Patterns of Crime

Many crimes can happen every day in a major city, and figuring out which ones are committed by the same individual or group is an important and difficult data mining challenge. To do this, we propose a pattern detection method called Series Finder. Series Finder incorporates both the common characteristics of all patterns and the unique aspects of each specific pattern. This is joint work between MIT and the Cambridge Police Department...
