Mar 3, 2011 1:36 PM

TED 2011: The 'Panda' That Hates Farms: A Q&A With Google's Top Search Engineers

LONG BEACH, California -- Google announced a new update last week to its search engine that addressed the growing complaint that low-quality content sites (derisively referred to as content farms) were ranked higher than higher-quality sites that seemed to be more important to users. This major change affects almost 12 percent of all search results, and the web is still buzzing about its implications, which include dramatic losses for some companies (Mahalo, Suite 101), and gains by some established sites known for high-quality information.

The change comes at a time when critics are wondering whether Google’s search quality has flagged. I delved into the mysteries of the search engine for my upcoming book, In the Plex, and this week had breakfast at the TED conference with the Google engineers who wrote the blog item announcing the change: the company’s search-quality guru Amit Singhal and Matt Cutts, Google’s top search-spam fighter.

Here's an edited transcript.

Wired.com: What’s the code name of this update? Danny Sullivan of Search Engine Land has been calling it "Farmer" because its apparent target is content farms.

Amit Singhal: Well, we named it internally after an engineer, and his name is Panda. So internally we called it big Panda. He was one of the key guys. He basically came up with the breakthrough a few months back that made it possible.

Wired.com: What was the purpose?

Singhal: So we did Caffeine [a major update that improved Google’s indexing process] in late 2009. Our index grew so quickly, and we were just crawling at a much faster speed. When that happened, we basically got a lot of good fresh content, and some not so good. The problem had shifted from random gibberish, which the spam team had nicely taken care of, into somewhat more like written prose. But the content was shallow.

Matt Cutts: It was like, "What’s the bare minimum that I can do that’s not spam?" It sort of fell between our respective groups. And then we decided, okay, we’ve got to come together and figure out how to address this.

Wired.com: How do you recognize a shallow-content site? Do you have to wind up defining low quality content?

Singhal: That’s a very, very hard problem that we haven’t solved, and it’s an ongoing evolution how to solve that problem. We wanted to keep it strictly scientific, so we used our standard evaluation system that we’ve developed, where we basically sent out documents to outside testers. Then we asked the raters questions like: "Would you be comfortable giving this site your credit card? Would you be comfortable giving medicine prescribed by this site to your kids?"

Cutts: There was an engineer who came up with a rigorous set of questions, everything from, "Do you consider this site to be authoritative? Would it be okay if this was in a magazine? Does this site have excessive ads?" Questions along those lines.

Singhal: And based on that, we basically formed some definition of what could be considered low quality. In addition, we launched the Chrome Site Blocker [allowing users to specify sites they wanted blocked from their search results] earlier, and we didn’t use that data in this change. However, we compared and it was 84 percent overlap [between sites blocked by the Chrome blocker and downgraded by the update]. So that said that we were in the right direction.

Wired.com: But how do you implement that algorithmically?

Cutts: I think you look for signals that recreate that same intuition, that same experience that you have as an engineer and that users have. Whenever we look at the most blocked sites, it did match our intuition and experience, but the key is, you also have your experience of the sorts of sites that are going to be adding value for users versus not adding value for users. And we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. And you can really see mathematical reasons ...

Singhal: You can imagine in a hyperspace a bunch of points, some points are red, some points are green, and in others there’s some mixture. Your job is to find a plane which says that most things on this side of the plane are red, and most of the things on that side of the plane are the opposite of red.
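
[Singhal's picture of a plane dividing red points from green ones can be sketched with a textbook perceptron, one of the simplest ways to find a separating hyperplane. This is an illustration only: the interview does not say which classifier Google built or which signals it uses, and the points, labels and function names below are invented for the example.]

```python
# Toy illustration only: a perceptron searching for a separating hyperplane.
# Nothing here reflects Google's actual classifier or signals.

def train_perceptron(points, labels, epochs=100, lr=1.0):
    """Return (weights, bias) describing the hyperplane w.x + b = 0.

    labels are +1 ("green") or -1 ("red").
    """
    dim = len(points[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # point is on the wrong side (or on the plane)
                # Nudge the plane toward classifying this point correctly.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:  # every training point is on its correct side
            break
    return weights, bias


def side(weights, bias, x):
    """Report which side of the hyperplane a point falls on."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "green" if score > 0 else "red"


# Hypothetical 2-D feature vectors standing in for site-quality signals.
green = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0)]
red = [(-1.0, -2.0), (-2.0, -1.5), (-1.5, -3.0)]
points = green + red
labels = [1] * len(green) + [-1] * len(red)

weights, bias = train_perceptron(points, labels)
```

[Calling side(weights, bias, point) then reports which side of the learned plane a new point falls on. The perceptron only settles on a clean split when the two groups can be fully separated; Singhal's "most things" suggests real data is mixed, where a method that tolerates some points on the wrong side would be needed.]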

Wired.com: Do you feel that this update has done what you wanted it to do?

Cutts: I would say so. I got an e-mail from someone who wrote out of the blue and said, “Hey, a couple months ago, I was worried that my daughter had pediatric multiple sclerosis, and the content farms were ranking above government sites.” Now, she said, the government sites are ranking higher. So I just wanted to write and say thank you.

Singhal: It’s really doing what we said it would do.

Cutts: Which isn’t to say we won’t look at feedback.

Wired.com: I spoke to someone yesterday who runs a site called Suite 101. His rankings have tanked, and his keyword traffic is down 94 percent. He says that it’s not fair, since he commissions and curates his own articles and contends the quality is high.

Cutts: Oh, yes. Suite 101, I’ve known about it for years.

Wired.com: So why did this guy take a much bigger hit than Demand Media, which has a reputation as the classic site that wins high rankings for low-quality content?

Cutts: I feel pretty confident about the algorithm on Suite 101.

Singhal: I won’t call out any site by name. However, our classifier that we built this time does a very good job of finding low-quality sites. It was more cautious with mixed-quality sites, because caution is important.

Wired.com: So you would say to this guy, “Sorry, but we’ve figured out what a low-quality site is, and that’s you”?

Cutts: In some sense when people come to Google, that’s exactly what they’re asking for -- our editorial judgment. They’re expressed via algorithms. When someone comes to Google, the only way to be neutral is either to randomize the links or to do it alphabetically. If we don’t have the ability to change how we rank things to try to improve the search engine, that goes right to the crux of everything. [Cutts is referring to the “search-neutrality argument” proposed by Google’s foes, which contends the company should accept oversight to make sure it doesn’t play favorites.]

Wired.com: Some people say you should be transparent, to prove that you aren’t making those algorithms to help your advertisers, something I know that you will deny.

Singhal: I can say categorically that money does not impact our decisions.

Wired.com: But people want the proof.

Cutts: If someone has a specific question about, for example, why a site dropped, I think it’s fair and justifiable and defensible to tell them why that site dropped. But for example, our most recent algorithm does contain signals that can be gamed. If that one were 100 percent transparent, the bad guys would know how to optimize their way back into the rankings.

Singhal: There is absolutely no algorithm out there which, when published, would not be gamed.

Cutts: I have to think, I have to hope, I have to aspire, there’s some algorithm out there that we could publish as open source but couldn’t be gamed. We haven’t found it yet.

Wired.com: Can we talk about the recent New York Times story that revealed unearned high results for J.C. Penney on some common queries? After the article you made some changes to address this. How did you guys miss that for so long?

Cutts: Essentially, that article was saying this team didn’t totally do their job. I think the right analogy is if you’re talking about the size of the solar system -- this little pebble is the Earth, then Pluto is 8 miles away. That kind of thing. A lot of people don’t realize the scale of the web. There’s over a billion searches a day, so that particular article was about a relatively small number of queries.

Wired.com: But some of those queries were pretty generic ...

Cutts: Some of them were generic, like dresses and things like that, absolutely. This was one of the few areas within Google where we were willing to take manual action. We had actually seen J.C. Penney two or three times in the past, and I think our takeaway was, "Look, after three or four times you’ve got to escalate."

Wired.com: So it was already sort of a low-level arms war, and you didn’t bring the bigger guns until now?

Cutts: Think about the main story in 2010 with Google: It was sites like eJustice or Foundem complaining to Europe that they were punished too harshly, right? [Those companies have complained to the EU that their low Google ranking was due to competitive bias.] So it’s a very strange situation where on the one hand we’re hearing people saying Google is being way too harsh, and then more recently, “Oh, Google needs to take stronger action.”

Wired.com: This does seem to be a period where Google is getting more criticism of its search practices and quality.

Cutts: I’m a bit of a connoisseur of Google criticism. If you look at the historical landscape, there’s this meme that goes in waves that says, “Google sucks,” or “Google has bad quality.” But it’s almost like the Seattle windshield-pitting incident, where the newspaper reported that there was more pitting on windshields and suddenly there was a huge spike, because no one had looked at their windshield before, and a couple weeks later everybody was like back to normal. I tend to hear two or three things coming through. What we heard was scrapers were sometimes outranking original sites, and we actually made a change to improve that. We heard complaints about what the outside world called content farms; we had a change that we were working on for months and months that just launched.

Singhal: People expect that we will do a good job, and that’s appropriate. The criticism is a good thing because that means that they really want us to do an even better job, which we’ll go next week and do exactly that.

Cutts: We’re lucky to have the criticism, because that means people care enough to tell us what they want.

Composite photo of Amit Singhal (left) and Matt Cutts. (Singhal: singhal.info; Cutts: Jolie O'Dell/Flickr)
