

On the Computational and Statistical Interface and "Big Data"
October 21, 2013

The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the statistical and computational sciences. Classical perspectives from these fields are not adequate for emerging problems in "Big Data," as their sharply divergent views at an elementary level make clear: in computer science, growth in the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the same growth is a source of "simplicity," in that inferences are generally stronger and asymptotic results can be invoked. Indeed, if data are a data analyst's principal resource, why should more data be burdensome in some sense? Shouldn't it be possible to exploit the increasing inferential strength of data at scale to keep computational complexity at bay? I present three research vignettes that pursue this theme: the first involves deploying resampling methods such as the bootstrap on parallel and distributed computing platforms; the second involves large-scale matrix completion; and the third introduces a methodology of "algorithmic weakening," whereby hierarchies of convex relaxations are used to control statistical risk as data accrue. [Joint work with Venkat Chandrasekaran, Ariel Kleiner, Lester Mackey, Purna Sarkar, and Ameet Talwalkar].
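The abstract does not say how the bootstrap is distributed. As an illustration only, the sketch below follows the Bag of Little Bootstraps idea, which is associated with some of the co-authors listed above: draw s small subsets of size b = n^gamma, and within each subset simulate full-size (size-n) resamples as multinomial counts over the b points, so no worker ever stores n data points. Each subset could be handled by a separate machine, and the per-subset error estimates are averaged at the end. The function name `blb_stderr_of_mean`, the defaults `gamma=0.7`, `s=10`, `r=50`, and the choice of the sample mean as the estimator are hypothetical choices for this sketch, not details from the talk.

```python
import math
import random
import statistics
from collections import Counter


def blb_stderr_of_mean(data, gamma=0.7, s=10, r=50, seed=0):
    """Bag of Little Bootstraps estimate of the standard error of the mean.

    Illustrative sketch: gamma, s, and r are hypothetical defaults.
    """
    rng = random.Random(seed)
    n = len(data)
    b = min(n, max(2, int(n ** gamma)))  # size of each "little" subset
    subset_errors = []
    for _ in range(s):
        # Each small subset could be shipped to a separate worker.
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(r):
            # A size-n resample from the b-point subset, stored only as counts
            # (at most b distinct keys), never as n materialized points.
            counts = Counter(rng.choices(range(b), k=n))
            weighted_mean = sum(c * subset[i] for i, c in counts.items()) / n
            estimates.append(weighted_mean)
        # Quality assessment for this subset: spread of the r estimates.
        subset_errors.append(statistics.stdev(estimates))
    # Combine by averaging the per-subset assessments.
    return statistics.mean(subset_errors)


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(5000)]
print(blb_stderr_of_mean(data))                      # BLB estimate
print(statistics.stdev(data) / math.sqrt(len(data)))  # plug-in sigma / sqrt(n)
```

For the sample mean, the BLB estimate should land close to the classical plug-in value sigma/sqrt(n), which makes it a convenient sanity check. This stdlib version still draws n indices per resample; a vectorized multinomial sampler (for example, NumPy's `Generator.multinomial`) would avoid that cost in practice.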



Bio: Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley. He received his Master's in Mathematics from Arizona State University and earned his PhD in Cognitive Science in 1985 from the University of California, San Diego. He was a professor at MIT from 1988 to 1998. His research in recent years has focused on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, variational methods, kernel machines and applications to problems in statistical genetics, signal processing, computational biology, information retrieval and natural language processing. Prof. Jordan is a member of the National Academy of Sciences, a member of the National Academy of Engineering and a member of the American Academy of Arts and Sciences. He is a Fellow of the American Association for the Advancement of Science. He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He is an Elected Member of the International Statistical Institute. He is a Fellow of the AAAI, ACM, ASA, CSS, IMS, IEEE and SIAM.