Words by Mark Healy
design by Martin Flores
Every time you prove you’re human to captcha, are you helping Google’s bots build a smarter self-driving car?
CGP Grey, 2017
How Machines Learn
“That’s not how our tech works.”
So, it’s confirmed. Google does use reCaptcha to teach its self-driving Waymo cars to label images so they can, for example, tell the back of an Escalade from an empty patch of asphalt. Not only did this confirm our cynicism about data and the three companies who seem to be harvesting it all, but as data intrusions go, it’s kind of a welcome one. It’s helping train two-thousand pounds of unsupervised sheet metal not to run us over. Sure, I’ll pick all the tiles with ducks on them to get my concert tickets (and thereby ensure Google’s future domination of the transportation business) if it also prevents my family from getting hit by a self-driving bus.
But then we called Waymo to confirm. They declined, saying only that there are a lot of different methods they use to do the labeling. Their spokesman did not elaborate on what those methods were, but another veteran vision engineer suggested that the key to understanding Waymo was understanding DNNresearch, the company Google bought in 2013 whose algorithms now propel much of Google’s visual classification. DNN’s algorithms learn so rapidly and so accurately (by generating their own synthetic training data) that they make human confirmations essentially unnecessary.
Michael Cutter, a PhD in computer engineering who is director of computer vision at the ag tech startup Tortuga, is confident the reCaptcha data is being used in some way. “I couldn’t imagine wasting human effort like that. Training data is too valuable to modern computer vision techniques. You’d want to do something with it.”
But Cutter also recognizes a middle ground where the data is being used as a safeguard. “What I suspect is they’re using it to double check their classifiers,” he says. “To make sure that when their classifiers are making a decision, humans agree with it.” Even as a front-runner in the race to build a fleet of trustworthy autonomous cars, Google faces huge challenges in an AI enterprise whose accuracy will be scrutinized like few businesses before it ever have been. “They’re trying to solve a very hard problem,” says Cutter. “The company that has the best data has an advantage.”
We weren’t the first to ask. There was a YouTube clip from last year that explained machine learning and included an aside linking reCaptcha data to autonomous cars. “Seeing lots of questions about driving lately?” the narrator muses as a cartoon car coasts on screen. “Hmmm. What could that be building a test for?” But there’s also the simple fact that one of the few things we understood about machine learning is that it required a staggering amount of human-labeled visual data to train the Optical Character Recognition (OCR) of the artificial intelligence. And, given the number and variety of images a self-driving car would need to classify, the demand for that data could be insatiable. Plus, Google had already done it: they used reCaptcha data to train the vision for Google Books and Google Street View—first with text and then with house numbers and street signs.
We first floated our suspicions past a senior engineer at a luxury car brand which has plans to launch an autonomous car. “This absolutely makes sense,” he replied. “Google has a history of using humans to verify and improve their OCR recognition through reCaptcha in the past. Applying the same resources to verify and improve their image processing, machine learning, and AI algorithms is the next logical step. Autonomous driving is all about big data and AI, and this is likely part of their program.”
By now, only the most naive consumer would think that a massive, ubiquitous tech conglomerate would collect billions of precious, human-generated data points and somehow fail to find a good use for them. So when the idea surfaced that Alphabet (corporate parent of Google, Waymo, Waze, reCaptcha, and dozens of other unavoidable tech companies) was using reCaptcha answers to teach Waymo's autonomous cars to distinguish, say, an actual cow from the side of a Ben & Jerry’s truck, it instantly made sense. Here was the perfect confluence of Big Tech conspiracy theory and ninja-level corporate synergy.
We’d already noticed that, more and more, we were being asked to identify reCaptcha images that you’d typically see from the driver’s seat of a car: motorcycles, stop signs, SUV bumpers. And we also knew that the two dominant narratives coming out of the Tech Coast were the rampant misuse of personal data and the race to build the smartest self-driving car. Was it that paranoid to connect reCaptcha’s seemingly benign visual confirmations and Google’s need for billions of human-labeled images to educate Waymo’s robo drivers?
“Google has a history of using humans to verify and improve their OCR recognition through reCaptcha in the past.”
