Raters of the world, unite —

The secret lives of Google raters

They spent years testing Google's algorithms—then everything changed.

Things are on fire, as usual. That's Moss from the IT Crowd, who sometimes feels like a rater.
The IT Crowd/Channel 4
Something disturbing has been happening to Google's advertising algorithms. These are the programs responsible for placing ads in appropriate contexts: serving up travel-related ads to people searching for hotels, or music-related ads to people watching the latest Beyoncé video. But in the UK, government ads for the Royal Navy, the Home Office, and Transport for London recently ran before YouTube videos featuring Holocaust-denying pastor Steven Anderson, who enthusiastically endorsed the man who killed 49 people at Pulse, a gay nightclub in Florida. According to the UK government, its taxpayer-funded ads also ran on videos from "rape apologists" and on white supremacist speeches from David Duke.

Google's business immediately took a hit: prominent European ad agencies cut ties with the company, while AT&T and Verizon pulled all of their video ad buys. Acknowledging the gravity of the problem, Google assured advertisers and users that no ads would run alongside "upsetting-offensive" content. The company said it was unleashing its army of more than 10,000 raters, people who work around the clock to make sure Google's algorithms don't return results that are unhelpful, offensive, or downright horrific.

Who are these raters? They're carefully trained and tested workers who can spend 40 hours per week logged into a system called Raterhub, which is owned and operated by Google. Every day, the raters complete dozens of short but exacting tasks that produce invaluable data about the usefulness of Google's ever-changing algorithms. They contribute significantly to several Google and Android projects, from search and voice recognition to photos and personalization features.

Few people realize how much these raters contribute to the smoothly functioning act we call “Googling.” Even the Google engineers who work with rater data don't know who these people are. But some raters would now like that to change. That's because, earlier this month, thousands of them received an e-mail saying their hours would be cut in half, partly due to changes in Google's staffing policies.

Though Google boasts about its army of raters, the raters are not Google employees. Instead, they are employed by firms that contract them out to Google, full time, for years on end. These raters believe that Google has reaped significant benefits from their labor without ensuring that their jobs are secure and stable. That’s why 10 raters came to Ars Technica to tell the story of what their lives are really like.

Leapforce

While a handful of companies supply raters to Google on a contract basis, Leapforce is one of the biggest. Its offices in Pleasanton, California, lie just outside Silicon Valley, a few miles from Lawrence Livermore National Laboratory, the famous Cold War nuclear weapons lab. Not that the location matters to most raters—few will ever work at the Leapforce offices. Most are contractors who work from home; they meet in chat rooms and hold weekly virtual meetings on Leapforce servers. Their managers go by pseudonyms like LFAdmin, DarkSosu, and LFEditorCat, and the raters generally use pseudonyms too. It’s not uncommon for a rater at Leapforce to go years without ever learning the real names of managers and colleagues.

All work at Leapforce is task-based, much like the crowdsourcing model popularized by Amazon's Mechanical Turk (MTurk). To get a task, raters log into Raterhub and see what's available. Some days, plenty of tasks exist; on others, a rater might wait hours and be offered nothing. Rating might seem like a job where workers set their own hours, but in reality they are at the mercy of task availability.

A typical task takes anywhere from 30 seconds to 15 minutes, and the amount of time the rater can bill for the task is pre-determined by Google. This can cause problems when Raterhub is slow, which is a relatively common occurrence.

"Say you have a five minute task but it takes two minutes to load. You now have three billable minutes left," one rater said. Others said that sometimes a task takes more time to load than has been allotted for completing the task itself.

"I have referred nine people to this job. Every one of them failed the exam."

The tasks themselves are widely varied. Some ask raters to evaluate whether a search result is useful or an audio file has been transcribed correctly, while others solicit feedback on the behavior of Android apps. According to raters, some tasks can feel "creepy." These are usually tasks related to personalization services, which require raters to first give Google access to their e-mail, chats, photos, and other Google services they use. Google then turns the rater's personal data into tasks that let raters give feedback on how well its personalization algorithms work. "I don't like the photo tasks," one rater said. "They will show you pictures you have taken and have you rate them." Other raters expressed discomfort at giving Google access to their personal accounts.

Though each task is brief, a rater's work isn't easy. Before they begin at Leapforce, all raters must pass a series of rigorous exams to make sure they understand the 160-page book of guidelines that Google provides to raters. "It's hard to pass," one rater told Ars. "I have referred nine people to this job. Every one of them failed the exam."

For those who do pass, the testing doesn't end. Every few months, raters have to familiarize themselves with important updates to the guidelines, like the recent "upsetting-offensive" flag rules. Plus, each week brings new kinds of tasks or tweaks to what counts as a right answer on old tasks.

"The learning curve is steep," one rater said. Raters are encouraged to take weekly quizzes to keep up to date with changes and to make sure their task responses are in line with other raters. They say they are not paid for this re-training or testing, even though it can take a few hours every week.

At any time, raters may find themselves assigned a job called a "review task." In reality, it's a performance evaluation. Google has already figured out the right answers to the task and uses the review to make sure each rater gives answers that are calibrated with what the company expects. If a rater is too far off the mark, he or she is limited to one hour per day of work until scores improve.

This setup highlights one of the many contradictions embedded in rater work. On the one hand, raters are supposed to represent average users, providing feedback that will help Google craft algorithms that serve the general public. On the other, raters have to stick with Google's interpretation of what an average user is—or risk getting their hours cut. One rater noted that the right answer on a task "often doesn't fit our experiences as real users outside of work."

Sometimes a bot will spot-check raters' work. If the bot finds some kind of problem, it will automatically lock the rater out of the task assignment system. Raters call this "being botted." One rater told Ars that the bots can be buggy and will occasionally lock them out for no reason. In that case, a rater's only recourse is to write to the generic "admin" e-mail address that is the Leapforce raters' main connection to their managers. Replies often take days, and it's nearly impossible to make up any hours lost.

Still, despite the frustrations, the raters who talked to Ars say they aren't interested in quitting. They generally like their jobs, which can pay up to $17.40 an hour for "preferred agents" with excellent scores. Regular raters get $13.50, still comfortably above the US federal minimum wage. Having sampled other kinds of online work, such as MTurk, the raters we spoke with vastly prefer what Leapforce offers.

The work isn't fun, necessarily, but raters do have a sense that they are doing something meaningful. "We actually do make a difference and we're integral to Google's main form of business," one said. Another said it was nice to have a job that involved thinking rather than just clicking.

Then came the e-mail.

All that seamless Googling doesn't happen because of machine magic alone.
Mark Walton

No more full-time work

On Monday, April 3, thousands of US Leapforce raters received an e-mail from “The Leapforce Team,” the same generic moniker that sends them updates on nearly everything. “Effective June 1, 2017, [Raters in the US] can work up to 26 hours in a calendar week (Sunday to Saturday),” it said. “We understand and appreciate that this will have a significant impact on a percentage of our Rater community. That is why we are trying to provide you with as much time as possible about the upcoming change [sic].” The roughly 20 percent of US raters who work full time had just received a massive pay cut.

The e-mail was greeted with what one rater described to Ars as “chaos and panic.” In an internal Leapforce chat log we obtained, one rater called the situation a “nightmare.” Another responded to the news with: “weeps silently.” A significant portion of the Leapforce workers Ars spoke with are disabled, live in remote areas, or take care of young children, so it’s not easy for them to find full-time work outside the home.

Minutes after the e-mail went out, a Leapforce manager who goes by LFAdmin entered the chat room to address about 600 raters. Though LFAdmin doesn't use his real name, most Leapforce workers had figured out that he is Leapforce founder and CEO Daren Jackson. (After Ars talked to Jackson for this story, LFAdmin jumped into Leapforce chat to tell everyone for the first time that he is Daren Jackson and that he is "real.")

“Hi guys... this is not a change we are able to control,” Jackson typed. “We are not looking forward to this.” He added that the change was due to “risk mitigation” related to “regulations,” but Jackson would not elaborate. In the e-mail, however, the Leapforce Team claimed the change was driven by “circumstances that are somewhat out of our control, but will ensure Leapforce is compliant with federal and state regulations including the FLSA [Fair Labor Standards Act] and the ACA [Affordable Care Act].”

However, the key provisions of FLSA and ACA don't apply to contract workers, which is what the raters were at the time. "If they are independent contractors, [Leapforce] has no ACA obligations," pointed out Lorie Maring, an Atlanta-based attorney at Fisher & Phillips who works with companies needing to comply with ACA regulations. That made the change even more confusing.

A manager named LFEditorCat told the raters in chat that the pay cut had come at the behest of “Big G’s lawyers,” referring to Google. Later, a rater asked Jackson, “If Google made this change, can Google reverse this change, in theory?” Jackson replied, “The chances of this changing are less than zero IMO.”

What really stung for a lot of raters was that this abrupt reduction in their hours came a couple of weeks after a Google engineer named Paul Haahr had raved in the media about how terrifically helpful raters are to the company. “We’ve only been able to improve ranking as much as we have over the years because we have this really strong rater program that gives us real feedback on what we’re doing,” Haahr told Danny Sullivan at Search Engine Land.

Why would Leapforce cut their hours when Google was so pleased with their work?
