May 31, 2017 3:23 PM

Inside Google's Global Campaign to Shut Down Phishing

It's not easy keeping billions of devices safe from phishing attacks. Here's how Google pulls it off.

At the beginning of May, a phishing scam flooded the web, disguised as a typical Google Docs request. Some of the emails even appeared to come from acquaintances. If victims clicked through and granted seemingly innocuous permissions, they exposed their entire Gmail account to whoever was behind the scam. It was an explosive scheme. And Google responded in kind.

"We convened what we call a war room," says Mark Risher, Google's director of counter-abuse technology. "Basically we pulled people together in a physical room here in Mountain View, California, and we also had experts from many other offices around the company that quickly came together. Each specialty gets called in."

Unfortunately, that sort of crisis response is all too common for Google. Its massive user base and footprint on the web make its services and customers prominent targets for every imaginable phishing attack, not to mention all manner of other hacks and assaults. But phishing presents an especially tricky problem. Campaigns are hard to spot by design, and they evolve rapidly.

"The bad guys try hard, so we are motivated to try even harder," says Sri Somanchi, a project manager in the Gmail anti-abuse team. "We keep going because we know that any little slip up on our side is going to have a huge cost for users."

That response can take many forms. And if they're doing their job right, you barely even notice.

Phish Fry

When the Google Docs phish spiked—affecting 0.1 percent of Gmail users, or about 1 million accounts—Google anti-abuse teams started by sharing information, and hammering out shifts across Google offices around the world to ensure 24-hour coverage.

"There’s a team that’s working specifically on Gmail inbounds, they’re trying to make sure that the email messages are not getting spread," Risher says. "There’s another team that’s working on account abuse patterns, and they’re trying to look at who is using the credentials that have been accessed. There’s a third team that’s looking at the spread of this message."

Within a few hours, Google had stopped the phishing attempt from spreading further. Within a day, Google rolled out expanded anti-phishing security warnings for Gmail on Android.

That joins a handful of other anti-phishing and threat-warning tools that Google has debuted over the last several years, like the Chrome extension Password Alert, which cautions you if it thinks you just entered your Google username and password into an imposter login page. The company also announced new phishing protections targeted at business users on Wednesday, including warnings when enterprise users attempt to send data outside their company, and additional ransomware protections.

Google works to make it as easy as possible for users to make safe choices and avoid scams, but the company's focus is on tech solutions that are meant to work seamlessly with minimal user buy-in. Some phishing specialists believe that emphasizing user training is the real key to stopping phishing, but as Aaron Higbee, the CTO of the user-training firm PhishMe, puts it, "We need technology to do as much as it can. For Google they have to pursue that." Focusing on technological solutions plays to Google's strengths.

One example is Google's Safe Browsing infrastructure, which displays warning messages across Chrome, Android, Search, and Gmail if you try to access a potentially malicious site or download. Google also makes Safe Browsing available to third-party developers—browsers like Mozilla's Firefox and Apple's Safari incorporate it. And Google uses Safe Browsing in its Ads service to catch ads that attempt to promote malicious content. In all, the company says that Safe Browsing benefits two billion devices per day.

Safe Browsing warnings give a boost to internet users who may feel overwhelmed by the daily barrage of digital threats. But for Google, the service reflects a long-term investment that originated over a decade ago. As Google crawls the internet for its flagship search engine, it uses that data to flag malicious pages that host social engineering attacks, malware, phishing campaigns, and more.

"For a lot of users their main interaction with Safe Browsing is a big red warning page," says Allison Miller, who leads the Safe Browsing team and also works with Google's Threat Analysis Group. That belies how much work happens behind the scenes to generate those warnings in the first place—and the huge scale of the challenge.

"We have to guard every door, every window, every opening, and the bad guys just need to get through one," Risher says. "It can be tough when most of your moments of fame and glory are something bad happened and you were able to stop it before it got too bad."

First Line of Defense

Hundreds of Google employees work on security and anti-abuse at every level. Their approach hinges on numerous layers of protection, a strategy known as defense in depth.

The first layer of defense between phishers and your Gmail account is an automated bulk filtering process, which draws on Safe Browsing and other blacklisting tools to block a huge amount of junk. Actually, huge might not cover it. Google blocks as much as 90 percent of the email volume sent to Gmail before it ever reaches users. And no, that doesn't include what lands in your spam folder.

"A lot of this is achieved through maintaining the reputation of every sender of the world," Somanchi says. "As we keep receiving emails we compute reputation on thousands of email attributes, and then we use these reputations to pre-determine if a sender is legitimate or shady." Google also scans for bad links, meaning if you accidentally sent a known phishing link to a friend from Gmail, it won't deliver it no matter how sterling your email rep.

It gets even more granular from there. Google subjects messages that make the first cut to even more intense filtering, during which Gmail looks for impersonations and forgeries as it decides whether to plop a message into your spam folder or inbox. In cases of uncertainty, it'll let the message through, but add a banner warning that it may have been sent from a compromised account.

Similarly, if Gmail doesn't have enough information to make a final determination about an email, it may deliver it but with protections in place, like adding warnings and disabling the email's links or attachments. Rather than rely on one or two cure-alls, Gmail provides layers.
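To make the layering concrete, here is a minimal sketch of how sender reputation and tiered filtering could fit together. It is not Gmail's implementation, which the article does not detail. The class and function names, the attribute keys, the smoothing priors, and the thresholds are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class ReputationTracker:
    """Toy reputation: the smoothed share of good mail seen per attribute."""

    def __init__(self, prior_good=1.0, prior_bad=1.0):
        # Assumed priors, so a never-seen attribute scores a neutral 0.5.
        self.prior_good = prior_good
        self.prior_bad = prior_bad
        self.good = defaultdict(float)
        self.bad = defaultdict(float)

    def record(self, attributes, was_abusive):
        """Update the counts for every (name, value) pair of one message."""
        counts = self.bad if was_abusive else self.good
        for key in attributes.items():
            counts[key] += 1

    def score(self, attributes):
        """Average the per-attribute scores; 1.0 is best, 0.0 is worst."""
        scores = []
        for key in attributes.items():
            good = self.good.get(key, 0.0) + self.prior_good
            bad = self.bad.get(key, 0.0) + self.prior_bad
            scores.append(good / (good + bad))
        return sum(scores) / len(scores) if scores else 0.5


class Action(Enum):
    REJECT = "block before delivery"
    SPAM = "deliver to the spam folder"
    INBOX_WITH_WARNING = "deliver with a banner and disabled links"
    INBOX = "deliver normally"


@dataclass
class Message:
    attributes: dict                  # e.g. {"domain": ..., "ip": ...}
    has_known_bad_link: bool = False  # URL appears on a blocklist
    looks_forged: bool = False        # impersonation or forgery signal
    enough_signal: bool = True        # False when the filter cannot decide


def triage(msg, tracker, reject_below=0.2, spam_below=0.4):
    """Apply the layers in order; both thresholds are illustrative."""
    reputation = tracker.score(msg.attributes)
    # Layer 1: bulk filtering. A known bad link is blocked no matter
    # how good the sender's reputation is.
    if msg.has_known_bad_link or reputation < reject_below:
        return Action.REJECT
    # Layer 2: closer inspection for impersonation and forgery.
    if msg.looks_forged or reputation < spam_below:
        return Action.SPAM
    # Layer 3: uncertain mail is delivered, but with protections.
    if not msg.enough_signal:
        return Action.INBOX_WITH_WARNING
    return Action.INBOX
```

In this sketch, a sender with a long record of clean mail still gets rejected if the message carries a known phishing link, which mirrors the behavior described above. Each later layer only sees what the earlier layers let through.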

"Even if the bad guys have collected your password they still should not be able to use it, and even if they are able to use it what they do in the account is part of our continuous risk-based authentication," Risher says. "It’s very cumulative."

The Gray Area

Googlers say that the biggest challenge in phishing defense and anti-abuse in general is handling content that may or may not be malicious. "You have to moderate your response to ensure that you’re not blocking everyone who is in this gray area," Risher says. "It’s really important to understand the badness that’s taking place, but also to moderate that against the good activity—the regular, legitimate activity—and not go to the other extreme and end up harming the broad [Google] ecosystem."

One form of proactive defense? Notifying webmasters and email senders when they haven't taken steps to secure and authenticate their sites and activity, or when something happens on their site that they may not know about. "It’s a rising trend for bad actors to not just create their own websites for the sole purpose of hosting malware and phishing, but to compromise others, and then leverage that hacked site to host their own content," says Safe Browsing's Miller. Doing so lets bad actors leverage another party's solid reputation. Google's feedback can make webmasters and email senders aware that they've been compromised and that they're unintentionally engaging in abusive behavior.

It also wouldn't be a Google effort without a little machine learning thrown in. Somanchi says that upwards of 95 percent of all spam and phishing identification comes from machine learning. And in the past couple of years, these Gmail mechanisms have evolved to combine traditional supervised learning, in which algorithms are trained on large sets of labeled examples, with newer unsupervised learning techniques, in which algorithms infer patterns from unlabeled data to help tell legitimate inputs from malicious ones. "Whereas regular computers are making very hard black or white decisions, deep learning opens up the possibility to be more subjective and get closer to approximating, essentially, would humans fall for this?" Risher says.
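For a sense of what supervised learning means in this setting, the sketch below trains a tiny naive Bayes text classifier on labeled messages and returns a graded probability instead of a hard yes or no. Naive Bayes is a classic stand-in chosen for brevity. It is not the deep learning approach Risher refers to, and the class name, labels, and training sentences are invented for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes over lowercase word tokens."""

    def __init__(self):
        self.word_counts = {"phish": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        """Learn from one labeled example; label is "phish" or "ham"."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def _log_score(self, words, label):
        # log P(label) plus the sum of log P(word | label), with
        # add-one smoothing so unseen words do not zero out the score.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for word in words:
            count = self.word_counts[label][word] + 1
            score += math.log(count / (total_words + len(self.vocab)))
        return score

    def phish_probability(self, text):
        """Return P(phish | text). Assumes both labels have been trained."""
        words = text.lower().split()
        log_phish = self._log_score(words, "phish")
        log_ham = self._log_score(words, "ham")
        # Turn the two log scores into a probability between 0 and 1.
        return 1.0 / (1.0 + math.exp(log_ham - log_phish))


model = NaiveBayes()
model.train("verify your account password now", "phish")
model.train("click here to open shared document", "phish")
model.train("lunch meeting moved to noon", "ham")
model.train("here are the notes from the meeting", "ham")
```

With this toy training set, `model.phish_probability("verify your password")` comes out above 0.5 and `model.phish_probability("meeting notes")` comes out below it. A filter can act on that score in tiers, which is one way to avoid purely black-or-white decisions.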

Google emphasizes that deploying machine learning also helps preserve user privacy. All that data scanning and user behavior modeling can feel plenty invasive. But Miller says that the goal is to limit access to sensitive information, keeping it siloed and therefore less vulnerable. "The systems operate on aggregated data, and the work is done without any human visibility into potentially private information," Miller says. "We focus our attention on clustering and identifying attackers and their methods. Systems like Safe Browsing and our anti-phishing defenses identify commonalities in attack patterns."

Scams like the Google Docs phish still slip past Google's protections. But what the anti-abuse teams learn from these types of attacks helps them prepare for next time. "When something goes wrong we really are not focused on who screwed up," Risher says. "We’re looking at what changes we ought to make so this can’t happen again."