You are here

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

PI name: 
Student: 
Project Description: 
A data 'sketch' is a small representation of a large dataset that allows us to approximately solve data analysis and machine learning problems on the original dataset. In the past, we have studied sketching algorithms for problems such as large scale linear regression and graph approximation. Recently, our work has focused on sketches for clustering and low rank approximation on very high dimensional data. We have achieved a number of new theoretical bounds showing that sketching algorithms based on random projection, principal component analysis, and feature sampling can be used to significantly reduce dimension, while still maintaining enough information to find near optimal clusterings and low rank approximations. In the future, we hope to evaluate the performance of these algorithms in practice, and extend our techniques to other relevant data analysis problems
Team: 
Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, Madalina Persu