Reactive Point Processes for Power Grid Maintenance

Reactive point processes (RPPs) are a new statistical model designed for predicting discrete events, incorporating self-exciting, self-regulating, and saturating components. RPPs can saturate when too many events or inspections occur close together, which ensures that the probability of an event stays within a realistic range...


beatDB aims to enable automated, large-scale, rapid prediction of physiological conditions by exploiting untapped archival sensor data from humans...

Compressive Genomics: Making Sense out of Massive Data

The principle of this research is to take advantage of the redundancy in the system: computation is done on the unique parts of the data rather than repeated across the similar parts. The data are compressed in a way that allows computation without decompressing the whole dataset, only the results of interest...

Quantification and Analysis of Large Multimodal Clinical Image Studies: Application to Stroke

This project's goal is to understand and analyze large clinical imaging data sets in an effort to understand the underlying causes of the disease...

GenBase – Complex Analytics Benchmark for Genomics

This benchmark represents a data management and analytics workload that is meant to scale to large dataset sizes and multiple nodes across a cluster...


ICU Acuity: Real-time Models versus Daily Models

The aim of this research is to explore the feasibility of real-time mortality risk assessment for ICU patients. This study used retrospective analysis of mixed medical/surgical intensive care patients in a university hospital...

A Study in Transfer Learning: Leveraging Data from Multiple Hospitals to make Hospital-Specific Prediction

This research investigates three approaches to learning hospital-specific predictions about the risk of hospital-associated infection with Clostridium difficile, and compares the value of different ways of using external data to enhance hospital-specific predictions. In conclusion, it shows how external data from other hospitals can be successfully and efficiently incorporated into hospital-specific models...

SciDB: Open Source Data Management and Analytics Software

The vast majority of machine learning, statistical, and scientific operations can be expressed via a small number of linear algebra operations. SciDB is a database system designed to support scalable linear algebra over massive arrays stored on disk across a large cluster of machines. It is much faster than relational databases on these types of workloads, and scales to much larger datasets than main-memory, matrix-oriented systems like MATLAB and R.

ExpertSourcing: Enterprise Question Collection and Classification

This research focuses on mining big data through meaningful question answering, going beyond keyword search. The aim is to develop a question feature extraction system...

MOOCdb: A Platform for Massive Open Online Courses

The MOOCdb project aims to advance MOOC data science. It includes a platform-agnostic functional data model for data exhaust from MOOCs, a collaborative, open-source, open-access data visualization framework, a crowdsourced knowledge discovery framework, and a privacy-preserving software framework...

