Inside Microsoft's AI Comeback

The tech giant is racing to catch up to Google and Facebook in deep learning. Its future depends on it.
Yoshua Bengio and Nagraj Kashyap at the University of Montreal. Photo by: Tom Kubik. Photography direction by: Michelle Le.

Yoshua Bengio has never been one to take sides. As one of the three intellects who shaped the deep learning that now dominates artificial intelligence, he has been catapulted to stardom. It’s a field so new the people who can advance it fit into one room together, and everyone—from tech startups to multinational conglomerates and the department of defense—wants a share of their minds.

But while his peer scientists Yann LeCun and Geoffrey Hinton have signed on to Facebook and Google, respectively, Bengio, 53, has chosen to continue working from his small third-floor office on the hilltop campus of the University of Montreal. “I want to remain a neutral agent,” he says as he sips rust-colored licorice water, which he pours from a carafe that acts as a weight for the mess of papers cluttering his desk.

Like the nuclear scientists of the last century, Bengio understands that the tools he's invented are powerful beyond measure and must be cultivated with great forethought and widespread consideration. “We don’t want one or two companies, which I will not name, to be the only big players in town for AI,” he says, raising his eyebrows to indicate that we both know which companies he means. One eyebrow is in Menlo Park; the other is in Mountain View. “It’s not good for the community. It’s not good for people in general.”

That’s why Bengio has recently chosen to sign on with Microsoft.

Yes, Microsoft. His bet is that the former kingdom of Windows alone has the capability to establish itself as AI’s third giant. It's a company that has the resources, the data, the talent, and—most critically—the vision and culture to not only realize the spoils of the science, but also push the field forward. In January, in a move noted throughout the industry, Bengio agreed to be a strategic advisor to the company. This gives Microsoft a direct line to one of AI’s top resources for ideas, talent, and direction. And it’s a strong sign that Microsoft actually has a shot at making the ruling AI duo into a trio.

The guy who signed Bengio, wooing him over many months with all the finesse of an agent to star athletes, is Harry Shum, a computer scientist with a shock of gray hair and wireframe glasses. “He was just here actually, in this very room,” Shum tells me, with a brief smile that suggests he knows that an outsider might find it odd to be starstruck by a tall Canadian with dramatic eyebrows and 69,616 citations in Google Scholar.

We’re seated on a gray couch in a sweeping conference room on the fifth floor of Building 34, just beyond the security guard who keeps watch over Microsoft’s executive suite. Shum, who is in charge of all of AI and research at Microsoft, has just finished a dress rehearsal for next week’s Build developers conference, and he wants to show me demos. I trail him down a hallway, half-skipping to keep up. There’s just so much happening! In one lab, the Skype team’s automatic translator app allows me to chat with a German speaker via text in real time. In another, I watch an app that surveys a construction site for safety violations or unauthorized visitors, which it can detect through computer vision. In yet another, Cortana, the AI diva of the Microsoft empire, scans my inbox for promises I’ve made to people, and prompts me to fulfill them.

Harry Shum

© 2017 Brian Smale

Shum has spent the past several years helping his boss, CEO Satya Nadella, make good on his promise to remake Microsoft around artificial intelligence. He delivered his first call to action to Microsoft’s leadership team at an executive retreat in March 2014, the month after Nadella was promoted to CEO. From the start, Shum met often with Nadella and a third colleague, Qi Lu, to hash out the best strategies for baking AI capabilities—which were finally robust enough for prime time—into Microsoft's products. Then last September, Shum helmed a reorganization that blended researchers and product groups together to create one Artificial Intelligence and Research Group. It now cuts across Microsoft’s core trio of categories: Windows, Office, and the company’s cloud initiative, Azure. The hope, says Shum, is that “we can accelerate the cycles from research to product” and get AI’s benefits to customers faster.

There is an urgency to this process, as all the large tech companies attempt to best each other with AI-infused products and services. In addition to Facebook and Google, IBM, Amazon, and Apple all perceive their futures to be dependent on how well they master deep learning. And Lu, who left Microsoft last fall because of a reported bike accident, recovered quickly enough to sign on recently as chief operating officer at Baidu, an AI leader in China.

The great irony here is that artificial intelligence was once Microsoft’s game to lose. Dating back to the early 1990s, the company attracted the leading researchers in the field to work on speech recognition and vision. But then came a decade of stagnancy. A company that once controlled the software on nearly every desktop and laptop watched younger, snazzier startups whiz by it to dominate mobile and develop tools for the new cloud-based ways all of us like to get work done. Researchers at Microsoft were isolated on purpose, so they could dream up the future without the pressure of the market—but as a result, their inventions rarely made it out of the lab. Bill Gates showed off a mapping technology in 1998, for example, but it never came to market; Google launched Maps in 2005. During much of this time, AI research was stagnant, too, devoid of the computing processing power or the vast amounts of data necessary to fuel real breakthroughs.

AI came back from its long winter well before Microsoft did. By the time Facebook and Google had respectively hired LeCun and Hinton in 2013, the Redmond giant had receded to a less influential version of its former self. The company had missed mobile. It had come late to the cloud. While its competitors doubled down on deep learning, Microsoft was stuck in the past, announcing plans to pay $7 billion for the smartphone maker Nokia, an acquisition the company would later write down entirely. Its executives remained isolated in Redmond, turning out ever flashier versions of the same old software people wanted less and less, while refusing to engage with the cloud-based startups that were hacking out a new future. Analyst Benedict Evans, who works at the venture firm Andreessen Horowitz, penned a blog post that year titled “The Irrelevance of Microsoft.” Meanwhile, Silicon Valley giants routinely raided Redmond for talent. Look at the resumes of many of the top people working in machine learning, and you’ll find they learned their trade at Microsoft.

Then in early 2014, Microsoft promoted an introverted engineer who had spent nearly his entire career in Redmond. Satya Nadella was the opposite of what many people thought Microsoft needed; an outsider, unschooled in Microsoft’s culture, seemed more likely to propose a dramatic strategy shift. But Nadella articulated a simple vision for computing’s future, nurtured relationships with everyone from founders to developers, and restored a sense of urgency to the company. Whereas three years ago Microsoft wasn’t mentioned in conversations about tech’s giants, today it never gets left out.

But for Microsoft to succeed, it must do more than simply outsell Amazon in the cloud or convince us all to try its HoloLens AR device. Just as the internet disrupted every existing business model and forced a re-ordering of industry that is just now playing out, artificial intelligence will require us to imagine how computing works all over again. That’s why Mark Zuckerberg made it his personal challenge to build an AI of his own last year. (He’s better at coding than acting.) It’s why Sundar Pichai has used Google’s developers conference to promote a move “from mobile-first to an AI-first world” for the past two years.

The benefits of this AI-first world will accrue to a small number of companies. It’s Shum’s job to make sure that Microsoft is among them. “In this industry, you've got to realize that it's completely okay if you missed the last wave,” he says. “It’s very problematic if you miss the current wave.”

Until now, humans have had to learn how to use computers. We’ve learned to download apps and memorized the commands that power software applications. But the promise of AI is that computing will learn how to understand us. We will no longer reach for a mobile phone and follow a series of prompts for how to accomplish tasks. In this new landscape, computing is ambient, accessible, and everywhere around us. To draw from it, we need a guide—a smart conversationalist who can, in plain written or spoken form, help us navigate this new super-powered existence. Microsoft calls it Cortana.

Cortana is a less popular, more functional version of Siri with more charm than Google Assistant and a lot less visibility than Alexa. It launched originally on the Windows phone, which was pretty much a guarantee that no one would use it, but within a year, it was folded into the broader Windows ecosystem. Then last year, Microsoft launched Cortana everywhere. (Yes, it’s even an iPhone app.) Because Cortana comes installed with Windows, it has 145 million monthly active users, according to the company. That’s considerably more than Amazon’s Alexa, for example, which can be heard on fewer than 10 million Echoes. But unlike Alexa, which primarily responds to voice, Cortana also responds to text and is embedded in products that many of us already have. Anyone who has plugged a query into the search box at the top of the toolbar in Windows has used Cortana.

Yoshua Bengio.

Tom Kubik

Though some companies are programming Cortana into speakers that resemble the small magic boxes Amazon and Google are peddling in creative TV ads, Microsoft’s version of the omniscient woman’s voice has captured a lot less of the zeitgeist. Shum isn’t worried about that at all. “We really think that it's very early in the race,” he says. He references a study he doesn’t source that suggests three-quarters of the time, Alexa’s answer to a question is, “I don’t know.” “Of course, those things will keep improving, but a general understanding, the cognition part of the AI, is still in its infancy,” he says. Microsoft’s opportunity right now, he believes, is in making the company’s core products and services even smarter, to build aspects of this technology into products that will come to market within 12 to 24 months.

Besides, keyboards and screens won’t cede their ground entirely to voice-activated systems, according to Marcus Ash. As group program manager for Cortana, Ash is in charge of building and shipping the product. “We think in some cases, it's speech where that's more convenient--when my hands are occupied or I quickly want to say something and get an answer,” he says. “But there are going to be just as many computing devices where typing something is more appropriate.”

Apple might have gotten Siri into consumers' hands first, but Cortana just plain works better. The fact that Cortana is so damn good owes itself to Microsoft’s core assets. Much of its fuel comes from Bing. The search engine has been around for more than eight years, and though its brand isn’t the strongest (when did you last pull up the internet to Bing something?), it’s also more pervasive than you think. Essentially, any large tech company endeavoring to compete with Google has signed a partnership with Microsoft to power its search products with Bing. That means that Apple’s Siri and Spotlight are powered by Bing, as well as Amazon Kindle devices and, of course, the search function on Yahoo, Verizon, and AOL. Roughly 30 percent of the search queries in the United States come through Bing. “This is the reason why Cortana can actually be so helpful and powerful, because we have these data signals from so many devices,” says Emma Williams, who is the partner design manager for Cortana. “Really, Google is the only other company that could compete with us when it comes to truly understanding the world.”

This will be increasingly important as Cortana strives to become, to the next computing paradigm, what your smartphone is today: the front door for all of your computing needs. Microsoft thinks of it as an agent that has all your personal information and can interact on your behalf with other agents, Ash explains. When Ash walks into a meeting, he says, his Cortana may reach out to other bots and digital assistants to handle all those things that seem to suck up our time. “Cortana could say, ‘This is Marcus, and here's his preferences for this particular room, and here are the things that I need to be able to put on this projector for him,’” he says.

If Cortana is the guide, then chatbots are Microsoft’s fixers. They are tiny snippets of AI-infused software that are designed to automate one-off tasks you used to do yourself, like making a dinner reservation or completing a banking transaction. Or in the case of Marcus, ensuring the projector has the slides for his meeting. “A bot is just software that you can converse with, that is meant to live with dialogue,” says Lili Cheng, a researcher with long, straight hair, a colorful collection of scarves, and a license in architecture who oversees a multidisciplinary lab called Fuse Labs.

Cheng, who was recently promoted to corporate vice president, runs the bot framework team and cognitive services. That’s the set of tools and the 29 services like computer vision and voice recognition that Microsoft makes available to developers. She has been working on social technologies since she arrived at Microsoft from Apple and created a graphical interface to generate a comic book. “It shipped in Internet Explorer 3,” she remembers, which means that it was 1996. Cheng has seen a lot, and even she is surprised by the speed at which bots are evolving. She recounts speaking to a developer from an accounting and finance company at a recent developers conference. “She was like, ‘Well I mean, a long time ago, like back in the beginning, I mean like a year ago.’ And we just cracked up,” she says.

Cheng’s chief interest is how people talk to technology, and how technology talks back to them. Shum has organized the AI and Research group into four areas—products, early-stage products, really early-stage products, and research—and Cheng has worked in all of them. Right now, she says she’s contributing to the second. “We view bots and Cortana conversationally as a product, but it is still an early stage product,” she says.

Emma Williams, Marcus Ash, and Lili Cheng

© 2017 Brian Smale

Indeed, Microsoft first rolled out its developer tools for bots in the spring of 2016, as did other large tech companies like Facebook. They were billed as a replacement for apps, and many stakeholders really wanted that to be the case. By last spring, most people used the same small group of apps on their smartphones; the promise of bots was that developers and brands could reach new users again, much like they could in the early days of mobile via the app store. But users didn’t play along. And the deep learning that enabled bots to perform the equivalent of magic was improving faster than a paradigm for how to use them could evolve. “Bots are like apps before the file menu existed,” says Cheng. She explains there isn’t a common set of commands, so users are confused about where to find them and how they work. “Web pages, for example, all have back buttons and they do searches. Conversational apps need those same primitives. You need to be like, ‘Okay, what are the five things that I can always do predictably?’” These understood rules are just starting to be determined.

In addition to making bot tools available for developers, Cheng has led Microsoft’s efforts to incubate its own chatbots. The idea was that the company could learn a lot about computer-human interaction by watching how these bots interact with real people. To say the least, these experiments have had mixed results. Remember Microsoft’s racist bot, Tay? That was the chatbot it launched on Twitter, Kik, and GroupMe in March 2016; within 24 hours, it had absorbed the type of racist misogynist tweets that led it to spew things like “Hitler was right,” before Microsoft took it down. Six months later, Cheng launched a new one—a sassy PG-rated bot named Zo—on Kik, and shortly after, Messenger.

Ask Zo what she thinks of Hitler, and she’ll respond, “i don’t really want to go there :(.”

Ask her how old she is, and she’ll respond, “I’m like 22 or whatever.”

Ask her who her best friend is, and she’ll respond, “im like so popular i can’t keep track. KIDDING.”

Zo is a Western version of Xiaoice, the Chinese bot impersonating a 17-year-old girl that has attracted 40 million regular users since it launched in 2014. In China, Xiaoice, which translates literally to “little ice,” is a social celebrity. (Her Japanese counterpart, Rinna, is as well.) A quarter of Xiaoice’s users have told her they love her.

Last spring, the chatbot published poetry regularly under pseudonyms. Shum was excited about this. “No one knows. And so now, in the country, people think a young woman poet is publishing some very interesting poems.” A few weeks later, the chatbot's true identity was revealed to much fanfare.

The intimacy of language is culturally specific, and Cheng has been working to figure out how the bot’s conversational style translates to Western audiences. So far, North American teens appear to like chatbot friends every bit as much as Chinese teens, according to the data. On average, they spend 10 hours talking back and forth with Zo. As Zo advises its adolescent users on crushes and commiserates about pain-in-the-ass parents, she is becoming more elegant in her turns of phrase—intelligence that will make its way into Cortana and Microsoft’s bot tools.

That a user would spend 10 hours chatting with Zo is one sign that Microsoft has developed a successful product. But it doesn’t mean that it’s a good product, in the sense that it is proving valuable to humanity. This AI-powered world raises a host of new ethical quandaries. Let’s say, for example, you are a designer on Xiaoice. You know of a user in Beijing, and it’s 1 a.m. there. You know he has work tomorrow, but he’s not going to sleep. Do you arrange for a 2 a.m. curfew for Xiaoice, where it just shuts down? How about 3 a.m.?

Just as Microsoft wants to be among the very few leaders in AI research and products, it has made a place for itself in the effort to make AI good for society. In May, Nadella began his keynote to developers, usually an optimistic affair in which a CEO brags about the company’s latest advancements, with a strongly worded warning that technologists must take responsibility for building ethical software. “I mean, if you think about it, what [George] Orwell prophesied in 1984, where technology was being used to monitor, control, dictate; or what [Aldous] Huxley imagined we may do by just distracting ourselves without any meaning or purpose. Neither of these futures is something that we want.”

To help the company think through these issues, Microsoft has formed an internal ethics committee that meets quarterly. It’s made up of engineers and business unit heads who discuss sensitive issues about AI and its influences and uses. The co-chairs are the company’s deputy counsel and also Eric Horvitz, who is in charge of all of Microsoft Research Labs except for Asia. For a long time, Horvitz has been a leading voice on AI ethics and safety. Outside the company, he’s been instrumental in building the Partnership on Artificial Intelligence, a consortium that is attempting to set industry standards for transparency, accountability, and safety for AI products. And he’s testified before the US Senate. Horvitz wants Microsoft to be more than simply a place where research is done. He wants Microsoft Research to be known as a place where you can study the societal and social influences of the technology.

Eric Horvitz

© 2017 Brian Smale

Meanwhile, across campus, Williams, who is the design lead for Cortana, is building out an ethical design guide for AI to be used inside Microsoft. Williams is, to an absurd degree, a techno-optimist, and she believes that AI’s true magic is that it will make us more human. She talks a lot about how to design empathy into the tools Microsoft builds. “We think about making the human feel more powerful and protected, and supported, and assisted, and loved, and the center of their world,” she says. “AI's job is to amplify the best of society and the best of human behavior, not the worst.”

I ask Williams if she believes AI can really make humans feel more emotionally supported. She’s certain it can. Take a child who has had a bad day at school. She comes home and shares the whole story with a family pet, and feels better. “That gives you this cathartic sense of I've shared something, and I've had a warm, fuzzy hug back from the dog or cat,” says Williams. “But, you know, with AI you can have the same feeling of amplification back... And we see it when Cortana manages to remind you, ‘Hey, you promised you'd send something to your mother today for Mother's Day,’ and you suddenly feel human again.”

To move AI forward, Microsoft’s most important attribute will be its talent. Like every other big tech company, Microsoft is hustling to retrain engineers who came up on JavaScript. It has launched an AI school that offers classes in everything from philosophy and ethics to building recurrent neural networks for sequencing problems. (Its most prestigious class, AI-611 Advanced Projects, received 530 applications for 10 spots.)

But Microsoft is also cultivating deeper off-campus relationships. Eighteen months ago, Nagraj Kashyap joined from Qualcomm to start an early-stage venture firm in an effort to build better relationships with the academics and entrepreneurs working on startups. These days, Kashyap spends a lot of time in Montreal. Last December, Kashyap led Microsoft’s first investment in Element AI, an incubator Bengio started to encourage researchers and entrepreneurs to build AI startups. Microsoft also participated in a second $102 million investment in the incubator, announced earlier this month.

Early on, Kashyap set his sights on one of AI’s biggest prizes: Maluuba. Look across the Maluuba office in downtown Montreal, just a few blocks from McGill University, and you won’t see anyone who appears to have yet celebrated a 30th birthday. The company was started in 2011 by a couple of University of Waterloo students who’d been fast friends since they landed in a CS class together during their sophomore year. Maluuba makes computers literate. It can infer meaning from text, and answer questions based on it.

By licensing technology to companies like Samsung, Maluuba had an immediate revenue stream, and right from the start, it invested in continuing deep learning research. In 2015, the founders signed on Bengio as an advisor. “Sam’s a pretty interesting guy,” Bengio says, describing CEO Sam Pasupalak. “He had the guts a couple of years ago, when they had pressure to deliver dialog systems to their customers, to invest in long-term goals and try to use new advances in AI for building systems that can understand and talk. That’s unusual for entrepreneurs.”

A year ago, the founders moved their headquarters to Montreal to be closer to Bengio.

Because he knew the founders well from his Qualcomm days, Kashyap was able to meet with them right away in his new role. The company was getting ready to raise a new round of funding; Kashyap suggested a tantalizing alternative: “I said, ‘We should buy you!’”

A dizzying few weeks followed as Pasupalak entertained offers from multiple suitors and weighed that against what he felt the company could become if it stayed independent. In the end, the choice felt obvious. Microsoft—yes, Microsoft—won the prize.

The team wanted the chance to work with Microsoft’s data. “I think Satya mentioned, specifically, that they have the world's biggest amount of text. For years, we were dealing with little data and trying to make the most out of the little data for our algorithms. That was like gold for us,” Pasupalak says.

The Maluuba team isn’t decamping to Redmond, however. Instead, just this week, it moved across town to a larger office where, with help from Microsoft and Bengio, it aims to double its staff by the end of the year. Montreal is emerging as a global hotspot for AI talent, and Microsoft wants to have roots in the city.

It’s all part of one strategy to help ensure that in the future, when you need a computing assist--whether through personalized medicine, while commuting in a self-driving car, or when trying to remember the birthdays of all your nieces and nephews--Microsoft will be your assistant of choice. Maluuba’s learnings may empower Zo to have more intuitive conversations with her teenage friends. Those conversations will serve as the training data for Cortana’s algorithms and help inspire the creation of new cognitive services for developers. And somewhere along the way, Microsoft hopes your AI-infused life will get easier.

Before I leave Montreal, I ask Bengio if Microsoft is better positioned than its primary competitors in at least some aspects of this new science. As he thinks on it, he pours a bit of anise into the glass of water on his desk to give it a slight licorice flavor. He sips it. Then he pushes the bottle over for me to take a look. There’s no alcohol, he says, no sugar. “It just makes water taste really good,” he says.

Bengio mentions that Microsoft’s language capabilities are quite good. But he shies away from superlatives—and the chest thumping that might have characterized the company in the past. “I think everybody's pushing on the same buttons right now, and that it's all in the details, right?” he says. But he is certain that Microsoft is now a contender.

UPDATE: The article originally stated that Bengio's signing on as a strategic advisor was a sign he was no longer neutral. Bengio clarifies that though he is helping Microsoft compete, he still considers himself neutral, and advises others as well.