

We partner with different organizations to make data sets available for research here at MIT: for exploring new ideas; for testing theories and new algorithms, systems, and tools; for student projects and challenges; and, ultimately, for demonstrating the impact of Big Data with real-world data.



Data Resources: MGH and the Laboratory for Quantitative Medicine, MGH/Harvard Medical School, have built a number of very large databases on patients. Last June, we were tasked by the MGH Cancer Center to build a database on all of the 173,301 Massachusetts General Hospital cancer patients, 167,814 of whom were diagnosed between 1968 and 2010. The database contains 559,921 pathology reports, 575,204 discharge reports, 10,938,444 encounter notes, 304,211 operative reports, 22,009,527 procedure notes, 9,159,232 radiology reports, ~1,700,000 aggregated medical bills, and ~250,000 images. It also contains all-cause survival information from the Social Security Administration Death Master File (which provides information on all deaths of persons issued Social Security numbers since 1937) and cause-of-death information from the Massachusetts Death Certificate Database (which contains International Classification of Diseases cause-of-death information on 1,984,790 people who died in the state of Massachusetts between 1970 and 2008). The database is linked to the MGH SNAPSHOT gene sequence dataset, providing a wealth of genetic data on a large number of patients.

As far as we are aware, in terms of the total mass of data, this database is the largest source of clinical information on cancer in the world.

Additional Information on Data Resources and Applications, by James Michaelson, PhD

Seminar Series on Quantitative Medicine, Spring 2013, featuring speakers from Harvard, MIT, CSAIL, and others.