A Data-Efficient Learning System for General Purpose

Recent advances in machine learning have led to dramatic improvements in accuracy in many problem domains, including visual recognition and natural language understanding. These techniques, however, require large amounts of annotated training data, which is often collected through crowdsourcing. This requirement can be problematic in defense applications where training data is classified, or when annotations can only be provided by a few experts who have limited time. Consider, for example, an ecologist trying to track populations of a specific bird species with camera traps in the wild, or an expert trying to automatically quantify building damage from aerial footage in the wake of a natural disaster. In neither case can one expect millions of labeled training images.

Bharath Hariharan and Claire Cardie, Computer Science, are inventing new machine learning architectures and algorithms for learning from limited labeled data. Prior work on the problem has focused on a few narrowly defined settings, such as carefully constructed image classification benchmarks. In contrast, Hariharan and Cardie are working toward a truly general approach: given any problem in any domain, they want to quickly train a machine learning model with very little labeled training data.

The researchers are exploring this idea in the context of computer vision and natural language problems, building on a significant body of their past work at Cornell. The result will be a general-purpose learner: an automated system that, when presented with a new task and a minimal labeled dataset, outputs an effective model for that task. It can be used by laypeople without requiring knowledge of machine learning, and it will unlock many applications where labeled data is in short supply, whether because training data is restricted and cannot be crowdsourced or because an organization has limited resources. This work also aims to spur the broader research community to consider the problem of building unified learners for a wider array of tasks.

Funding Received

$1.4 million over three years
