Google’s AI scientists showed that you can get better image recognition results by feeding machines a whole lot more data

Photo: Google CEO Sundar Pichai delivers the keynote address at the Google I/O 2017 Conference at Shoreline Amphitheater on May 17, 2017 in Mountain View, California. (Justin Sullivan/Getty Images)

A recent artificial intelligence (AI) experiment Google conducted in partnership with Carnegie Mellon University (CMU) showed that it’s possible to get far better image recognition results simply by feeding algorithms a lot more data, according to a Wired report.

Google released a new paper last week outlining the principles behind the experiment.

Machine learning algorithms learn to better perform a particular task by munching through enormous quantities of data, but so far the amount of data used in the major AI experiments has remained mostly unchanged.

Image recognition software is usually trained on a collection of about 1 million pictures, and the AI scientists asked whether more data, rather than merely tweaking the machine learning algorithms, could return better, more accurate results.

“While both GPUs and model capacity have continued to grow, datasets to train these models have remained stagnant. Even a 101-layer ResNet with significantly more capacity and depth is still trained with 1M images from ImageNet circa 2011,” reads the paper. “Why is that? Have we once again belittled the importance of data in front of deeper models and computational power?”

So with this new experiment, they took a wholly different strategy, based purely on quantity, and fed the machines 300 million photographs. According to the paper, the image processing system produced “state of the art results” on a series of standard image recognition tests (like object detection), and performance on each test improved when the model was trained on the bigger dataset.

Google and CMU’s data experts ultimately concluded that there is a clear relationship between training on a vastly larger quantity of data and markedly better results. “The general consensus seems to be that everyone expects some gain in performance numbers if the dataset size is increased dramatically,” the paper reads.