A Survey of Systemic Risk Analytics

Andrew Lo
Dimitrios Bisias, Mark Flood, Stavros Valavanis

We provide a survey of 31 quantitative measures of systemic risk in the economics and finance literature, chosen to span key themes and issues in systemic risk measurement and management. We motivate these measures from the supervisory, research, and data perspectives in the main text, and present concise definitions of each risk measure—including required inputs, expected outputs, and data requirements—in an extensive appendix.

Big Data for a Better Life: The Trento Smart City Project

Alex Pentland

We have deployed a data sensing and data sharing architecture in the city of Trento in order to "mash up" government, company, and individual mobile data. The goal is to validate the value, monetization, and privacy/ownership issues of Big Data in running a "smart city." Joint with Telefonica, Telecom Italia, Government of Trento, European Inst. Technology.


Sam Madden
Benjamin Letham, Katherine A. Heller

BlinkDB is a database system that runs on top of Hadoop (MapReduce), running SQL queries and translating them into MapReduce jobs. The key idea is that rather than running queries over the entire data set, it runs queries on a random (precomputed) sample of the data, and uses sampling theory to estimate the true query answer.
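The sampling idea can be illustrated with a minimal sketch. This is not BlinkDB's implementation: the function name `approximate_mean`, the synthetic `population`, and the use of a normal-approximation 95% interval are all assumptions made for illustration. It estimates an average from a uniform random sample and reports an error bound alongside the estimate.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, half_width), where half_width is an approximate
    95% confidence half-width from the normal approximation.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sample without replacement
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, 1.96 * std_err


# Synthetic data standing in for a large table column (an assumption).
population = [float(i % 100) for i in range(100_000)]

# Answer "SELECT AVG(x)" from 1% of the rows instead of all of them.
estimate, half_width = approximate_mean(population, sample_size=1_000)
true_mean = statistics.fmean(population)
```

The estimate touches only 1,000 of the 100,000 values, and the half-width shrinks roughly with the square root of the sample size, which is the trade-off between speed and accuracy that a sampling-based engine exposes.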

Building the Next Generation of Search Engines: Growing a List of Almost Anything

Cynthia Rudin

The next generation of search engines should not simply retrieve URLs, but should aim at retrieving information. We designed a system that leads into this next generation, leveraging information from across the Internet to grow an authoritative list on almost any topic. Our method starts from a small seed of examples, and intelligently grows a list of items relevant to the seed. 

Cloud-scale, Flexible Scaling and Factoring of Machine Learning Algorithms: The FlexGP Project

Una May O'Reilly


In a nutshell, the goal of the FlexGP project is scalable machine learning using genetic programming (GP). Genetic programming is a mature, robust multi-point search technique, inspired by evolution, that supports readable and flexibly specified learning representations, which can readily express linear or non-linear data relationships. It is well suited to parallelization and machine learning, and it has a strong record in real-world domains.

Consumer credit-risk models via machine-learning algorithms

Andrew Lo
Amir E. Khandani, Adlar J. Kim

We apply machine-learning techniques to construct nonlinear nonparametric forecasting models of consumer credit risk. By combining customer transactions and credit bureau data from January 2005 to April 2009 for a sample of a major commercial bank’s customers, we are able to construct out-of-sample forecasts that significantly improve the classification rates of credit-card-holder delinquencies and defaults, with linear regression R² values of forecasted/realized delinquencies of 85%.

Declarative, Graphical Construction of Complex Report Queries

David Karger
Eirik Bakke

Many use cases for business-oriented databases involve the creation of tailor-made summaries known as "reports". Report development is tedious because multiple SQL queries may be required to generate a single report, because queries may include complex combinations of formulas and aggregate functions (e.g. averages of totals), and because the visual output layout of non-tabular results must be manually defined through the use of templating languages or a graphical form editor.
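To make the "averages of totals" case concrete, here is a small sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and its rows are invented for illustration and do not come from the project. It shows why even a simple report needs a nested query: totals are computed per customer first, then averaged per region.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, region TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('alice', 'east', 10.0), ('alice', 'east', 30.0),
        ('bob',   'east', 20.0),
        ('carol', 'west', 50.0);
""")

# Inner query: total per customer. Outer query: average of those totals
# per region. This is an aggregate of an aggregate.
query = """
    SELECT region, AVG(total) AS avg_total
    FROM (SELECT region, customer, SUM(amount) AS total
          FROM orders
          GROUP BY region, customer)
    GROUP BY region
    ORDER BY region
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('east', 30.0), ('west', 50.0)]
```

A full report would typically combine several such queries and then lay out the results by hand, which is the tedium the paragraph above describes.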

Dynamic Reduction of Query Result Sets for Interactive Visualization

Michael Stonebraker
Leilani Battle, Remco Chang

Modern database management systems (DBMS) have been designed to efficiently store, manage and perform computations on massive amounts of data. In contrast, many existing visualization systems do not scale seamlessly from small data sets to enormous ones. We have designed a three-tiered visualization system called ScalaR to deal with this issue. ScalaR dynamically performs resolution reduction when the expected result of a DBMS query is too large to be effectively rendered on existing screen real estate.
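A rough sketch of the resolution-reduction idea follows. It is not ScalaR's code: the function `reduce_resolution`, the `max_points` threshold, and the choice of bin averaging as the reduction strategy are assumptions for illustration. The result is returned unchanged when it is small enough to render and is aggregated into bins otherwise.

```python
def reduce_resolution(points, max_points):
    """Return `points` unchanged if they fit the budget, else bin-average.

    `points` is a list of (x, y) pairs sorted by x. `max_points` stands in
    for what the screen can usefully render. Hypothetical helper.
    """
    if len(points) <= max_points:
        return list(points)
    # Ceiling division so the number of bins never exceeds max_points.
    bin_size = -(-len(points) // max_points)
    reduced = []
    for start in range(0, len(points), bin_size):
        chunk = points[start:start + bin_size]
        mean_x = sum(p[0] for p in chunk) / len(chunk)
        mean_y = sum(p[1] for p in chunk) / len(chunk)
        reduced.append((mean_x, mean_y))
    return reduced


# A result set of 1,000 points reduced to fit a 100-point budget.
points = [(i, i * 2) for i in range(1000)]
reduced = reduce_resolution(points, max_points=100)
print(len(reduced), reduced[0])  # 100 (4.5, 9.0)
```

In a tiered system the decision would more plausibly be made from the expected result size before the query runs, so that the reduction happens inside the database rather than on the client.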

Energy-Efficient Algorithms

Erik Demaine

The new field of energy-efficient algorithms aims to develop new techniques for solving computational problems with vastly reduced energy consumption—for some problems, by several orders of magnitude—in exchange for a small increase in time and memory requirements.  Specifically, we explore how to algorithmically exploit reversible computation, an idea that has been around since the 1970s and has just started to become a practical reality in the latest AMD chips, but for which we have only just begun understanding how to design efficient algorithms.  Our preliminary investigations

Energy: Hydrocarbon Exploration

Piotr Indyk, Tommi Jaakkola, Bill Freeman
Tomaso Poggio

In this project, the goal is to identify boundaries between different types of underground rocks using seismic sensors. Such boundaries are of interest in hydrocarbon exploration because oil is often present there. These sensors produce massive streams of data that need to be mined to locate the boundaries. Researchers are working on these mining algorithms, as well as on advanced compression and encoding techniques to compactly summarize these data streams.