Natural Language Interface for Big Data

As we develop storage and compute platforms that scale to big data, along with practical algorithms for processing it efficiently, we will also need new ways to access and interact with massive-scale data. A comprehensive solution to the problem of dealing with large amounts of Web and sensor data involves not only analysis strategies but also access strategies. For a given large dataset, there may well be hundreds, if not thousands, of distinct types of queries that can be applied to the data. Data visualization techniques can help reduce cognitive overload as analysis results are presented to a human user, but the availability of these techniques itself widens the range of requests a user may make of a big data system. Natural language interaction provides a powerful mechanism for pairing specific user needs with available analysis, query, and visualization techniques.

In our START information access system, developed over the past 25 years, we have demonstrated the power of natural language interaction to help users find and invoke individual analysis and querying techniques from among thousands of parameterized possibilities, simply by expressing their information needs compactly as questions and commands. Because requests are matched against natural language annotations, the user can indirectly engage a broad range of resources: segments of static data, multimedia information, fragments of text, Web pages, database queries, short segments of executable code, GUI operations, and more. In this project, we will demonstrate these capabilities with respect to big data.
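To make the idea of matching requests to annotations concrete, here is a minimal sketch in Python. It is not START's implementation: it substitutes simple template matching with regular expressions for real language analysis, and every name in it (`Annotation`, `dispatch`, the `readings`-style handlers, the slot names) is a hypothetical chosen for illustration. Each annotation pairs a natural language template with a parameterized action; a request that fits the template fills in the parameters and invokes the action.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Annotation:
    """A natural language template paired with a parameterized action (hypothetical)."""
    template: str                 # e.g. "show average {metric} for sensor {sensor_id}"
    action: Callable[..., Any]    # invoked with the slot values as keyword arguments

    def match(self, request: str) -> Optional[Dict[str, str]]:
        # Split the template into literal text (even indices) and slot names (odd indices).
        parts = re.split(r"\{(\w+)\}", self.template)
        regex = "".join(
            f"(?P<{part}>.+?)" if i % 2 else re.escape(part)
            for i, part in enumerate(parts)
        )
        # Ignore case and trailing punctuation in the user's request.
        cleaned = request.strip().rstrip("?.!")
        m = re.fullmatch(regex, cleaned, flags=re.IGNORECASE)
        return m.groupdict() if m else None


# Hypothetical actions: each returns a description of the operation to run.
def average_query(metric: str, sensor_id: str) -> Dict[str, str]:
    return {"op": "average", "metric": metric, "sensor_id": sensor_id}


def plot_request(metric: str, period: str) -> Dict[str, str]:
    return {"op": "plot", "metric": metric, "period": period}


ANNOTATIONS: List[Annotation] = [
    Annotation("show average {metric} for sensor {sensor_id}", average_query),
    Annotation("plot {metric} over the last {period}", plot_request),
]


def dispatch(request: str, annotations: List[Annotation] = ANNOTATIONS) -> Optional[Any]:
    """Invoke the action of the first annotation that matches the request."""
    for annotation in annotations:
        slots = annotation.match(request)
        if slots is not None:
            return annotation.action(**slots)
    return None  # no annotation covers this request


if __name__ == "__main__":
    print(dispatch("Show average temperature for sensor 42?"))
    print(dispatch("Plot humidity over the last week"))
```

In this sketch, supporting a new analysis, query, or visualization technique means adding one annotation rather than changing the dispatcher, which is the property that lets a single language interface front many parameterized operations.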

More info:  START Natural Language Question Answering System

Investigators: