The One Who Co-created Siri

Published in Coder stories

07 Dec. 2020


Anne-Laure Civeyrac

Tech Editor @ WTTJ

Co-creator of the voice assistant Siri and currently Chief Technology Officer and Senior Vice President of Samsung Strategy and Innovation Center, Luc Julia explains why human-computer interaction first piqued his interest, discusses how the back end for Siri was built at Apple, and shares his vision of the future of AI.

Playing with personal computers

So, when I was a kid, I started to play with electronics first, and then computers came when I was about 15. So it started then. I began to play with computers—the very first computers at the time, the personal computers—and, you know, I started and did not stop. At the time, only BASIC was available, but I was programming in assembly language.

Human-computer interaction

What was interesting was discovering whether computers could talk to humans and humans could talk to computers. And so I did what was called HCI [human-computer interaction] at the time, and I had to do some signal processing, because when you want computers to talk to you, you need to have the computer grasping, grabbing the signals, understanding the signals, and so that was the best way to do that. And speech recognition and gesture recognition and everything recognition was my specialty. And then, of course, it became AI, because when you start to understand the signals, this is AI. So I did my PhD here in France, but I did my postdoc at MIT and Stanford. Some people at the lab decided that it was time to create a company, because the technology seemed good enough. So SRI [Stanford Research Institute] is a spin-off factory! I was working on what is called multimodality, which is basically trying to put all the modalities together in order to create messages for the computer, but speech was kind of the easiest one because it was easier at the time to have a microphone in a computer than a camera. But that doesn’t mean it was easy to recognize speech, because speech recognition, even today, is really tricky—not the recognition of the words, but the recognition of the meaning, which is called natural language. Anything not verbalized, basically, is very difficult to grasp.

The Internet as a playground

At SRI we were doing some applied research, much more applied than anybody else in the group. So we launched CHIC [the Computer Human Interaction Center], an applied lab where we created a bunch of products, and in 2000, I mean 1999, I decided that it was good enough and we could now enter the real world to create start-ups doing pretty much the same thing, but testing the products with real people. The nice thing about 1999 and 2000 is that, again, it was the early days of the Internet and it was easier to test products with real people because you only had to put them on the Internet for people to play with them.

The creation of Siri

Back in ’97, it was the beginning of the Internet, really—the public Internet, right? So it was starting to grow, and there was no Google at the time, there was no way, really, to find something on the Internet. And we thought it was a good idea to see if you could find something using speech. That was the idea of the Assistant—it was assisting us in finding the right information. It took a long, long time—the product wasn’t ready until 2007, which is when Adam Cheyer created his company Siri. I wasn’t part of it because I was involved with some other companies, because in 2000, I decided to create other start-ups. But so, in 2007, they created a product around this assistant. The iPhone also launched in 2007, so it was a natural, basically, to use speech with it, because there was a microphone in this phone, and so Steve Jobs had the vision to say that it was something that was going to be very interesting—to have an assistant in the phone—and Apple decided to buy Siri in 2010, though it only became available to the public in 2011.

Building a back end for Siri

Cheyer actually asked me to come back, because I mean, it was our baby that we were going to develop, right? So Siri, the company, had about 150,000 users. 150,000 users! I mean, you can basically do that in your home with a server, so that’s fine. Working with Apple, we knew that we were going to have 300 million users, which is a bit more difficult than 150,000. So the challenge was to be sure that the system was going to be scalable and that we were going to be able to deliver the performance we needed, because when you have a speech recognizer, you also need something that is always going to give you a good response time. I had the chance to be in charge of everything on the Siri side. Everything meaning, of course, we only see the part that is the interface, the voice interface, but there are a lot of servers and there’s a lot of back end. It’s a very complex system with a huge, huge back end, right? And so one thing that they didn’t have at the time was a back end, even though they had iTunes and stuff like that. I mean, not the same kind of back end, anyway. iTunes is basically “just” storage, and we needed a lot of computing power in order to run Siri as well. So we actually had to build the full back end from scratch. We knew the architecture, in a way, and we built an architecture that we knew would work, but we had to build it. And physically building it (buying the servers, buying the actual data centers themselves) meant that there were challenges everywhere. What wasn’t that great was having to explain to a big company, and especially a company that can get cold feet, which Apple does, that you want to and you need to add some new services to the product you are developing. When you look at Siri, the company, it had about 17 services. And when you look at Siri at Apple, it had about 5 services, and of course, we wanted to add more and more and more. But adding a service means adding a partner, and those large companies, sometimes they don’t like to work with partners, they prefer to do things themselves. We saw that with Maps, for instance: they wanted to do Maps themselves, and we saw that it was a disaster.

Introducing Agile methodology at HP

The difference between a start-up and a large company is the means you have at your disposal. And so, when I went to HP [Hewlett-Packard], the means were huge. On the very first day, they gave me 250 people, 250 engineers, which is about 10 times what I used to have. So managing that is different, and again, this was in 2010, right? So in 2010, 10 years ago, at a large company, what we call today the Agile methodology wasn’t there. The company was using the waterfall methodology, meaning you plan for six months in advance, which is exactly what you don’t want to happen in modern software, right? So one of the things that I introduced there was Agile, and so having 250 people who were not used to it was a challenge, but it was a very interesting challenge. And we also built a new product that wasn’t like anything HP was doing. HP is very good at printers and stuff, but it didn’t have any idea of what a connected printer could be. So we created what is now called ePrint, the first connected printers, and we actually sold something like 80 million in the first year.

Opening Samsung’s AI lab in Paris

So there is something called SmartThings today, which Samsung bought a few years ago, and it also has a back end. The back end is basically something that allows for there to be a cloud—a cloud that is an IoT cloud—and the IoT cloud allows you to connect Samsung devices as well as some non-Samsung devices. So this is a very heterogeneous environment, so you need to be sure that all the devices can communicate with each other. And we worked on that for 5 years. Also, I wanted to open something here in Paris, and so we opened Samsung’s AI lab in June 2018. I really believe that French engineers are very good. And so I really wanted to take advantage of the new era that France entered back in 2012, 2013, when entrepreneurship became cool again. And so, because there are so many start-ups in France, in Paris in particular now, I wanted to take advantage of that, because when I was a student, after school, engineers were all just going directly to work for large companies and, you know, working for them forever. I heard recently that 60% of students at those engineering schools want to create or go to work for start-ups now. So there’s a more dynamic environment, and it also means that those people are going to see something—they are going to be, I wouldn’t say more capable, but they are going to be exposed to a lot more things. So they are going to be more flexible in terms of the projects we propose, so that’s why I like French engineers.

The ecological cost of AI

AI is too easy, in the sense that you can just throw a lot of data at a bunch of CPUs and GPUs, or whatever PUs, and so it’s easy and you can get some results. You can try, you can fail, and you try again, and you actually end up using a lot of resources, both in terms of storage, because of the data you’re using, and in computing power because of the different trials that you’re doing. So I’m noticing that we’re not really paying attention to that, because those CPUs and those resources, they’re out of sight in data centers, you don’t feel them yourself. What I mean is sometimes, when you have your own computers next to you, you realize that they are actually working, and they create kind of a warm environment, so you’re aware of them. But when you’re using AWS or Microsoft Azure, or whatever, you don’t notice that. So it’s too easy, and you don’t realize how much energy you’re actually using to do what you are doing, and so at some point, you need—we need—to wake up to that. That’s why I’m always trying to explain that we need to be very careful, and I hope people will become more careful. And I think that because, frankly, we are being increasingly encouraged to use those resources, at some point, regulations will need to be implemented.

The future of AI

The very first assistant we are talking about, back in the ’90s, was a speech assistant—but now we’re going to get an everything assistant. It’s going to be a bunch of assistants, because there isn’t going to be a generic, general AI. I don’t believe that will happen. We’re just using math the way we are using it today, but there will be a bunch of assistants in every domain, right? So we are going to see assistants in the medical domain, we are going to see assistants in the transportation domain. We are going to see AIs everywhere. So, the way it’s going to evolve… I mean, there are some obvious things, like in medicine—of course, you know that DNA, for instance, is something that is statistically very, very interesting. So we are going to see DNA being decoded so it can be used to help to fix things through machine learning, through deep learning, because it’s a lot of data. It’s a lot of interesting statistical data. So of course all the domains will see more and more of these assistants. Some people might be thinking about something else—we are thinking about doing physics with quantum computing. We are thinking about doing biology with bio computing. So the reality is that those things are still in the very early stages, right? So quantum computing, maybe a little bit less early, but it’s still very, very young. Biology, we are nowhere. If we are to believe that, one day, we are going to create something that is going to be close to our own intelligence, that is going to be this general intelligence that some people are talking about, it won’t be achieved through math. I don’t think it will be with physics, because the quantum is actually about applying math in another dimension. It might be biology, because biology is actually the closest thing to our brain, but we don’t know. We don’t even understand our own brain, so building one is going to take a very long time.

This article is part of Behind the Code, the media for developers, by developers. Discover more articles and videos by visiting Behind the Code!

Illustration by WTTJ
