As agentic AI continues to rise, many wonder when Apple will give Siri an AI overhaul. Siri is perhaps the most ubiquitous and well-known voice assistant, the one that got us comfortable talking to our phones. Without Siri, who knows what kind of interface tools like ChatGPT would have? On this Pioneers of AI episode, Rana el Kaliouby talks to Siri co-founder Adam Cheyer about why the platform is as relevant as ever – and how it stacks up against current AI agents. Plus, more on Cheyer’s dynamic entrepreneurial journey (from machine learning to Change.org), a bold prediction on the next big technological innovation, and his life hack to manifest change. Then, a conversation with the human who gave Siri her signature sound, voice actor Susan Bennett.
About Adam
- Co-founded Siri; sold to Apple and led Siri AI & server-side engineering
- Started 5 successful companies, incl. Viv, Sentient, GamePlanner.ai, Change.org
- Founding member of Change.org; petition platform with 570M+ members
- 60 publications and 50 patents in AI and related technologies
- Award-winning magician; performed for presidents and on Penn & Teller
Table of Contents:
- How a 1993 vision laid the groundwork for Siri
- Why today’s AI still falls short of true assistants
- What memory and personalization really need to be useful
- Where empathy fits into the future of AI interactions
- How Siri’s voice was built before modern voice AI
- A contrarian blueprint for building and exiting startups
- How to spot the next major technology paradigm shift
- Using verbally stated goals to create meaningful life changes
- What Siri’s human voice reveals about creative labor in AI
- Episode Takeaways
Transcript:
Hey Siri, who made you?
ADAM CHEYER: I love magic as just a creative, inventive, scientific discipline. I was a magician from age 10 to 14, then did nothing for 35 years. But then my son became a teen and it was something that we could do together.
RANA EL KALIOUBY: If you’re like Adam Cheyer, and lucky enough to find a way to bond with your teen – you hold onto it. But what started as a fun hobby, grew into something a lot bigger.
CHEYER: We’d perform at fundraiser parties, at birthday parties, now I can say I’ve performed on all the major TV shows.
EL KALIOUBY: Shows like Penn and Teller’s Fool Us.
Adam moonlights as a magician, but what he’s really known for is no less magical.
He’s one of the masterminds behind Siri. But since creating the now ubiquitous voice assistant, he’s gone on to found other companies, like Change dot org and GamePlanner.
CHEYER: I see huge parallels between entrepreneurship and magic, because if you think about it, when you’re an entrepreneur, you have to have a vision. An impossible future. You’re not gonna start a new company to create something that exists already. You have to reach far and imagine what if this could become true, and then you work backwards and you figure out the math and science to make it a reality.
I have known Adam for over a decade, and I find all of my conversations with him to be truly inspiring. And this conversation was no different.
Adam reflects on his entrepreneurial journey – and reveals his bold prediction on the next big innovation that will reshape the world. But my favorite part of this interview is how he manifests change in his life … and how you can, too.
And definitely stick around until the end of this episode, because you might just hear from Siri herself.
I’m Rana el Kaliouby and this is Pioneers of AI, a podcast taking you behind-the-scenes of the AI revolution.
[THEME MUSIC]
EL KALIOUBY: Hi, Adam. Welcome to Pioneers of AI. I am so excited for our conversation.
CHEYER: Thank you. Me too.
EL KALIOUBY: So we go way back. I think we first met almost like a decade and a half ago in Palo Alto, ’cause we shared common investors. And I have always been a big fan of you and followed your journey for many years. So yeah, I’m just excited that we get to do this.
CHEYER: Absolutely. It’s a small world in the tech world and there’s not that many people who’ve been working on AI for 15 years, not just the last five. So I think we are some of the early ones.
How a 1993 vision laid the groundwork for Siri
EL KALIOUBY: Yes, we’re the OGs. So walk us quickly through that history and how that all culminated in you starting Siri.
CHEYER: Yeah. So I guess 33 years ago — everyone had a PC on their desk and you would load it up with software from CDs and floppy discs, and you’d put the software on your computer and that’s what you’d use to compute. There was no web yet in 1993, no web browser.
I said, someday there will be content and services spread around computers all around the world. Different providers who have content and services to offer, and we’ll need a way to discover them and to interact with them. And I never imagined a hyperlinked multimedia document framework as the way to do that.
I thought everyone would have an assistant and you could just say, I wanna know this, or I want to do that, or some combination of the two. And the assistant’s job would be to understand you, know where everything was, all the content and services, and it would orchestrate and automate. Solving that task for you by routing the information to the right servers, gathering it back, presenting to the user, learning from the interactions and helping the user get the job done.
So I built it in 1993 with collaborators David Martin and Luc Julia, and it was amazing.
And that’s something I still don’t see today. We’re beginning to get it with agentic systems, which are starting to orchestrate across services that don’t know about each other in advance. So that was the original idea. And I kind of sat on that idea for 15 years, until the iPhone came out.
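The orchestration loop described here (discover providers, route the request, gather results, present them) could be sketched minimally like this. The provider names and the intent-matching scheme are illustrative assumptions, not the actual 1993 system:

```python
# Minimal sketch of an assistant orchestrating across independent providers:
# find the providers that can serve an intent, fan the query out, gather the
# answers. Provider names and handlers below are purely illustrative.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Provider:
    name: str
    capabilities: set[str]          # intents this provider can serve
    handler: Callable[[str], str]   # returns content for a request

def orchestrate(intent: str, query: str, providers: list[Provider]) -> list[str]:
    """Discover matching providers, route the query, gather the results."""
    results = []
    for p in providers:
        if intent in p.capabilities:
            results.append(f"{p.name}: {p.handler(query)}")
    return results

providers = [
    Provider("weather-svc", {"weather"}, lambda q: f"forecast for {q}"),
    Provider("dining-svc", {"restaurants"}, lambda q: f"tables near {q}"),
]

print(orchestrate("weather", "Palo Alto", providers))
```

The providers don’t know about each other; only the orchestrator knows who can serve what, which is the property Cheyer says is still missing.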
I said, this is gonna be huge.
Two years from now, Apple will have flipped the game. Every handset manufacturer and telco will be desperate to compete or collaborate with the iPhone. And this Siri idea I’d been plinking along with on the side for 15 years could be just the thing to one-up the iPhone, to uplevel the iPhone or whatever competitive phone it was built on.
Why today’s AI still falls short of true assistants
EL KALIOUBY: Yeah, so Siri is arguably the OG conversational interface. But I would say it doesn’t have agentic capabilities. So first, do you agree with that statement? And second, to your point, we still don’t have that incredible chief of staff that you can ask to do things and delegate stuff to. Why is it such a complex undertaking? And maybe also for our listeners who perhaps don’t have that distinction clear in their minds — what is the difference between an assistant like Siri and an agentic assistant?
CHEYER: Oh boy. So I’ve got strong views and maybe contrarian views to the entire industry. So this’ll be interesting.
So yes, I’m very proud that Siri was the OG conversational assistant. It was the first voice assistant. Many came after — there was Google Now, then Google Assistant, Microsoft came out with Cortana.
Alexa came three years later. So it was the first. I would say if I have one superpower, it’s timing things to market, because not only did I help create Siri, the first voice assistant — I was a founding member and first developer of Change.org, which today has 570 million members.
I was a founder of Sentient, which was the first large scale compute platform for machine learning. We had more than 2 million CPUs and GPUs back around 2009, more than all of Amazon AWS, more than all of Microsoft Cloud, as a startup. And later on, Viv Labs, I think, was the first AI that could code on the fly for a user request.
You mentioned agentic and you said well, Siri wasn’t agentic, but today we have agentic systems. I would say somewhat differently — that Siri and Viv Labs were more agentic than any system we have today, and we can get into that. They were early and they didn’t do everything you’d want.
But they did many things that no system does today and that we need today. First of all, I say that AI today is missing three important things. The first is the right user experience. Most AI systems come up with primarily a chat interface.
And maybe they put in a little carousel at the bottom that scrolls off the page.
I can tell you there’s a reason that most websites, most apps are not chat interfaces. They’re graphical interfaces. And if you think about most tasks, like planning an all-inclusive holiday vacation.
Chat can help a little bit, but travel is complicated.
You wanna look at the photos, it’s emotional.
So there are so many things missing in today’s AI interface of just request-response, conversational chat.
No real rich GUI, no rich way to deliberate, no way to collaborate. Things are missing. So the interface is not right.
I would argue that Siri and Viv, which eventually sold to Samsung and deployed on half a billion devices of all sorts from wristwatches, TVs, phones, refrigerators, air purifiers, speakers — it had a better interface because it would seamlessly combine language and GUI.
So that’s the first thing that does not exist today, that’s missing — the right interface. The second thing that’s missing is the original Siri vision. I want a Siri that can know and do — I wanna know this, or I want to do that.
LLMs are incredible at knowing things. They know everything, it seems. But they can’t do much at all.
EL KALIOUBY: True. Yeah.
CHEYER: And so people say, oh, ChatGPT is so much better than that OG Siri. I’m like, oh, really?
I said, with Siri, I say tell my wife I’m gonna be late. And it sends her a message. Does ChatGPT do that? Oh no, it doesn’t send her a message.
EL KALIOUBY: Doesn’t.
CHEYER: Okay. With Siri, I can say play this playlist. Does ChatGPT? Oh no, it doesn’t play the playlist, but it’ll tell you how you can go to Spotify and play the playlist.
It knows things, but it can’t do things. And the architectures for knowing and doing are very different. Knowing is about cached knowledge primarily. Think about Google search or ChatGPT — it’s trained on knowledge up to a point.
There’s an expiration date on the knowledge in a large language model, right?
And then it can call a few tools. Maybe it calls to web. But doing — it’s not about cached knowledge, search and generation, personalization. That’s knowing. It’s about authentication. It’s about transactions. Back in Siri days, 2010 when we launched, before it came out on Apple, I could say book a four-star restaurant in Palo Alto for three people, 7:00 PM. Oh, make it 8:00 PM. So it was conversational.
We weren’t the knowledge engine that an LLM is. We didn’t know a lot of things, but we could do a lot, and much better than any LLM today. And for me, the third thing that’s required — that’s missing — okay, we have agentic that can crawl a webpage a little and automate it, but I hate that whole approach.
We can go into that more, but the problem is there’s no business model. There’s no ecosystem model that works. Now, if I’m a company, say I’m a travel provider — do I want some unknown agentic agent to scrape my site, automate my site, cut me out basically of the whole process, so the user will never come see me? That’s not gonna work. And from the user point of view, they’re not happy that I could just say, get me a ride and it’s crawling this site or that site.
I wanna be able to control my preferences. I like Lyft and not Uber. I like Uber and not Lyft, et cetera. And so the model for how this works — if I’m a content provider, how do I make money and brand? You’re cutting me out of the advertising.
So to me, in order to solve these things, it’s not just a technical thing.
You need to solve three things together: the right interface; a platform that can actually do transactions, with live APIs so I can know if the ticket exists, all in a scalable way; and a business model that supports both the needs of the provider and of the user.
And it all needs to come together, because that provider needs to be able to differentiate on the user interface and handle login and transactions. You have to solve it together. And I don’t see anyone doing that. The original Siri did all of that. Viv Labs did all of that. And so I would argue that those two OG systems are more agentic than anything we have today.
What memory and personalization really need to be useful
EL KALIOUBY: Two other parameters to add to your three things. One is this idea of memory as a moat. For these AI systems to work, they really have to have a concept not just of short-term contextual information — for example, it just booked a restaurant, so it knows that you’re referencing that booking — but maybe a longer-term horizon of your preferences. It understands your relationships and your networks and travel preferences and all of that. Did Siri have that? And what would it take to implement a memory system that is actually sophisticated, not just a dump of everything that it’s ever conversed with you about?
CHEYER: So, memory is a tricky, big topic, but I’m gonna cleave down the middle into knowing and doing.
EL KALIOUBY: That is a theme. So today, ChatGPT will start to remember things — like it remembers that I like buttery, oaky chardonnays.
CHEYER: In fact, the other day I was asking a question — I came across some food and I didn’t know this type.
I said, what is this? Tell me about it. And it answered me and said, it pairs really well with the buttery, oaky chardonnays that you like. I’m like.
EL KALIOUBY: Cool.
CHEYER: Personalized. It made me feel good. It gave me useful information that was just specific to me. So I do think there’s a lot that can be done with personalization and memory on the knowing side. Knowing is about cached data and being able to search and generate based on that, and to personalize. I put personalization, a lot of it, there. However, there’s a different form of memory for transactions for doing. And I gave the example — if I’ve booked that reservation through the assistant, it should know about it and I should be able to say, modify my reservation to 8:00 PM, and it should know what I mean, right?
It should know my credentials. For all of it — for memory and personalization — you need to offer transparency and control.
EL KALIOUBY: That would be awesome.
CHEYER: It was awesome. It existed in 2010.
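The transactional memory Cheyer describes — the assistant remembering a booking so that "modify my reservation to 8:00 PM" resolves against it — might be sketched like this. The reservation structure and function names are illustrative assumptions, not the actual Siri design:

```python
# Sketch of transactional memory for "doing": the assistant logs each
# completed action so a later request can refer back to it.

transactions: list[dict] = []  # assistant-side log of completed actions

def book(restaurant: str, party: int, time: str) -> dict:
    txn = {"type": "reservation", "restaurant": restaurant,
           "party": party, "time": time}
    transactions.append(txn)
    return txn

def modify_last(field: str, value) -> dict:
    """'Make it 8 PM' refers to the most recent transaction."""
    txn = transactions[-1]
    txn[field] = value
    return txn

book("four-star place in Palo Alto", party=3, time="7:00 PM")
print(modify_last("time", "8:00 PM")["time"])
```

A real system would also need the transparency and control Cheyer mentions: letting the user inspect and delete what the assistant has remembered.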
Where empathy fits into the future of AI interactions
EL KALIOUBY: What about empathy? Because as you know, I’ve spent 20 years of my life building emotional intelligence into machines. What’s your view on that?
CHEYER: So back in the old days, for doing, everything was really about live APIs, structured data. And we were trying to fit this fuzzy natural language to that, and maybe more, right? It would be nice if it knew your emotional state, which affects how you’re interacting.
You could give it input that not only am I asking this, I am perturbed or I’m happy, or whatever.
EL KALIOUBY: Or frustrated or in a rush or.
CHEYER: But largely the doing side was about structured data. And there, APIs are pretty rigid — you kind of have to work with them even if you have AI that can code to those APIs, which Viv did. There’s not that much room in those technology stacks for empathy. It was hard back then. Now LLMs come along — they have learned how people communicate.
They’ve learned subtleties not only about what we communicate about, but how we do it. It’s amazing, right?
There are studies that show that in many cases, LLMs are more empathetic than humans. In one example, they took people talking to their doctors through a messaging system and compared the doctors’ responses with responses written by an LLM, and then they scored them not only on accuracy, but on empathy.
EL KALIOUBY: Mm-hmm. LLMs are great at saying, well, I really feel your pain. Does it really feel its pain? No. Has it ever felt pain? No. But it knows how to be empathetic, which means to respond in a way that’s nuanced and takes into account many subtle cues about that interaction and accounts for them and tailors responses appropriately to that situation.
CHEYER: So LLMs, I think, do have the capacity — for the first time, we never had this back in the old days — to create simulations of empathy. We need more ways to give more signal into these systems and to have them actually respond even better.
EL KALIOUBY: Yeah, absolutely. To be able to see the facial expressions or listen to the nuances in the voice. Okay. Obviously Siri is powered and built on machine learning and natural language processing and all of that, but the voice itself is a human voice, so I wanna talk about that for a second.
How Siri’s voice was built before modern voice AI
So Siri is of course powered by AI and machine learning. But behind the voice of Siri is an actual human voice. It’s the voice of Susan Bennett — she’s a voice actor. Can you walk us through the process of training, like what it takes to train a voice model like Siri?
CHEYER: We’re talking really about what I call text-to-speech or voice generation. I’ve been through many chapters of this. In the early nineties, there was text-to-speech, but it sounded like a machine, and it literally was a machine. You get the Stephen Hawking-like sound — no human was involved in the making of Stephen Hawking’s voice.
But later on, probably mid-nineties, something called concatenative speech emerged. The idea was to take real snippets of human voice and string them together and then try to smooth them out into prosody and natural-sounding voice. So you would get voice actors who had to read some very strange wording.
And the goal was to get snippets of sound of someone saying “ah,” someone saying “b,” someone saying “cut,” but in different ways — a “but” might sound different when followed by certain types of words versus other types of words. So you create these nonsensical phrases that a voice actor would spend hours on.
It was not easy work, saying words that didn’t make sense. And the goal was to capture as many of the sounds as possible. And then they would usually record common phrases that an assistant would use — if you’re doing a navigation system, you kind of want it to say “turn left” a lot, right?
So you want them to actually say, “turn left,” and you’ll have a snippet that’ll be a perfect match. So the process of generating Siri’s text-to-speech voice was concatenative. If I have some words I want to make sounds for, take the longest span that fits. If I wanna say “turn left” and I have someone recorded saying “turn left,” just use that.
But maybe if I want “turn left,” but I only had them say “turn” — I’ll use “turn” and then I’ll form “left” from the sounds L, E, F, T, and I’ll take those sounds from wherever I can find them. So that’s how text-to-speech used to work. That was what I call the second phase of text-to-speech.

Today it’s still human-based, but these voice-to-voice models are neural nets that are trained differently, where now you can capture a recording and it will do this in context — not only of the sounds, but of the meaning and intent and emotion of the words. And it learns all of that. ’Cause if I say “look out,” for instance — in the old days, you would take the word “look” and the word “out.” The latest models know that when I say “look out!” that phrase is gonna be expressed differently than, say, “hello.”

And so the process has changed yet again, from the Siri days of text-to-speech to what I would call version three, where now there’s so much data on the web of how people speak different phrases and words that you can produce voice-to-voice models that learn all of it — not only how you sound, but how to phrase it, how to emote appropriately for that kind of expression, et cetera.
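The concatenative approach described here (take the longest recorded span that fits, then fall back to smaller units) can be sketched as a greedy lookup. The snippet inventory and clip names below are purely illustrative:

```python
# Greedy longest-span unit selection, as in concatenative synthesis:
# prefer a whole recorded phrase ("turn left"), fall back to shorter
# recorded units, down to individual letter sounds.

RECORDED = {                      # phrase -> (imaginary) audio clip id
    "turn left": "clip_042",
    "turn": "clip_007",
    "l": "clip_l", "e": "clip_e", "f": "clip_f", "t": "clip_t",
}

def synthesize(text: str) -> list[str]:
    """Cover `text` with the longest recorded spans available."""
    words = text.split()
    clips, i = [], 0
    while i < len(words):
        # try the longest remaining span of words first
        for j in range(len(words), i, -1):
            span = " ".join(words[i:j])
            if span in RECORDED:
                clips.append(RECORDED[span])
                i = j
                break
        else:
            # no recorded word: fall back to per-letter units
            clips.extend(RECORDED.get(ch, f"<missing:{ch}>") for ch in words[i])
            i += 1
    return clips

print(synthesize("turn left"))   # whole-phrase match, one clip
print(synthesize("left"))        # no whole-word recording, so letter units
```

Real concatenative systems also smooth the joins between units for prosody; this sketch only shows the span-selection idea.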
EL KALIOUBY: We’ll hear from the voice actor behind Siri – Susan Bennett – at the end of this episode. But first … Adam reveals his blueprint for founding companies. Plus he shares his not-so-secret sauce to success. That’s after a short break, stay with us.
[AD BREAK]
A contrarian blueprint for building and exiting startups
EL KALIOUBY: Okay. You are a serial entrepreneur and you’ve done this many times, all very successfully. And I believe you have a blueprint for how you start and exit companies, which is again, a little contrarian to the meme that’s out there. So I would love for you to share your blueprint.
CHEYER: Yeah, look, there are many ways to be successful in companies, many ways to be successful in technologies. I’ve done it a few times now. So I’ve been associated with five companies — three in a major way and two in a minor way. So I have a weird pattern, but it’s worked for me because all five of those have been successful at some level, right? Change.org is more than half a billion members. Siri was on 2 billion devices. Viv was on half a billion devices. Sentient reached a big valuation, et cetera. GamePlanner AI, my last company, we exited to Airbnb, so it had some success. So here are things that I do that I see entrepreneurs doing wrong.
First of all, valuations. I think of the game of entrepreneurship as steps, right? I use the metaphor of a merry-go-round that spins twice as fast every time it goes around. At some point you’re holding on, at some point you’re gonna be flung off. And the game, like a merry-go-round, is to grab the brass ring.
So what does this mean? The brass ring means exit. So either you IPO or you get acquired, right? You have a positive, good exit, or profitability — you can become profitable where you control your own destiny.
So when you raise money at a valuation, say you’re worth $50 million, you raise 10 million at $50 million — well that’s great. That buys you time. And the goal is, you have to be worth double. You have to be worth a hundred million.
But the next round you’re gonna have to be worth 200, then 400, then a billion. It’s an exponential curve.
If you’ve noticed, three of my companies, Siri, Viv, and GamePlanner all exited at around 200 million.
And you’re like, well, it’s only 200 million. It’s not a unicorn. It’s not a billion. Yeah, exactly. That’s the whole point.
I’ve had success every time. For a big company, 200 million is reasonable. Like one VP, senior VP can say, I like this. We should get it. If it’s a billion, it’s a much bigger discussion, right?
One person who doesn’t like it can say no. So being more modest with your valuations increases the likelihood that you’ll exit.
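As a quick worked example of that exponential treadmill, using the hypothetical $50 million starting valuation from the conversation:

```python
# The valuation treadmill: each round needs roughly double the previous
# valuation. The $50M starting point is the hypothetical from the interview.

def next_round(valuation_m: int) -> int:
    return valuation_m * 2

v = 50          # $M at the first raise
path = []
for _ in range(5):
    v = next_round(v)
    path.append(v)
print(path)     # five rounds later you need a well-past-unicorn valuation
```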
And the second thing is I’ve purposefully not wanted to make money on my startups in those early raises. Revenue puts you in a valuation box — okay, we have a million dollars of revenue. What are you worth? 10x revenue, 7x revenue.
EL KALIOUBY: It’s not gonna be a hundred x revenue, right?
CHEYER: It’s not gonna be a hundred x revenue, typically. Because we said the valuation requirements are to go exponential, and once you have revenue, it’s clear what you’re worth in some capacity — and getting exponential revenue is hard. If you’re a tech platform kind of company, revenue puts you in a box and now you have to support users. Siri was a free app in the App Store. If we had sold it for a dollar, we would’ve made a million or a couple of million dollars.
But it’s not a lot. And then Apple would’ve had to keep it around ’cause someone just spent a dollar on Siri. You can’t just discontinue it, right? Free app — they’re like, all right, we’re shutting down the old Siri, we’re coming out with the new Siri. Easy. Revenue can be a constraint if you’re gonna go an acquisition route.
How to spot the next major technology paradigm shift
EL KALIOUBY: Okay. I have a few more questions for you. You’ve talked about timing the market and predicting what’s coming next. I would love for you to share your 10-plus theory of paradigm shifts and perhaps give us your views on what the next paradigm shift is gonna be.
CHEYER: So my 10-plus theory, or conjecture — I’ll call it Cheyer’s conjecture — 1984 was the personal computing revolution. We had windows and a mouse for the first time, with the Macintosh and Apple, et cetera.
If you go 1984, 10-plus-one years later is 1995. The year Netscape and Internet Explorer came out. Mosaic was ’94, but that was the birth of the internet.
10-plus-two years later — you can see where I’m going. 10-plus-one, 10-plus-two — 2007, the iPhone, the birth of the true mobile revolution. It’s not that things didn’t exist before, but that’s when it kind of went mainstream. The following year, 2008, was the App Store. So I kind of put it right on that border. So I had been saying for more than a decade that 10-plus-three years after 2008 — so I picked 2021 — there would be a new paradigm for interaction, a conversational assistant. And if you split the difference between GPT-3, which was June 2020, and GPT-3.5, or ChatGPT, which was November 2022, that’s pretty spot on.
So what’s next? It’s 10-plus-four.
EL KALIOUBY: Four. Yep.
CHEYER: From 2021 — 2035. And this is the time we can pause and I want all the viewers to think about what they think is coming, what’s the prediction? I’ll give you my prediction — and it could be wrong. I think 2035 will be the year augmented reality goes mainstream. Not virtual — augmented, meaning we will all have either a pair of glasses or contacts that augment our visual field. And I say the same way that Siri awakened the desire and the possibility of a voice interface, but it wasn’t ready to do everything mainstream.
Things like the Oculus and Apple Vision Pro are the Siri of this shift. They introduce people to the concept of augmented reality. They create the desire, but they’re too big, they’re too heavy. I actually think it will take that long to solve the hardware problems — battery life — and there’s just a lot of hard physics. If you’re gonna have a pixel in the world, it has to be world-locked, not eye-locked.
I need to look at it from different positions. We don’t know how to do that yet, right? There are physics problems to solve in many ways. But I think 2035 will be the year some aspiring entrepreneur — hint, hint — out there brings it all together, the hardware, the software, the physics to make this a reality and go mainstream.
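As a worked check of the arithmetic, the 10-plus sequence of anchor years stated in this conversation can be laid out in a few lines:

```python
# Cheyer's "10-plus" conjecture: the nth paradigm shift lands 10+n years
# after an anchor year of the previous one, with the anchors he gives here.

def ten_plus(anchor: int, n: int) -> int:
    return anchor + 10 + n

shifts = [
    (1984, 1, "web browsers"),        # Macintosh era -> Netscape, 1995
    (1995, 2, "iPhone / mobile"),     # -> 2007
    (2008, 3, "conversational AI"),   # App Store -> ChatGPT era, 2021
    (2021, 4, "augmented reality?"),  # predicted -> 2035
]
for anchor, n, label in shifts:
    print(f"{anchor} + 10+{n} = {ten_plus(anchor, n)}: {label}")
```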
EL KALIOUBY: What do you think about the AI-native glasses that we’re seeing today? Meta has this collaboration with Ray-Ban and it’s AI-ish. And then there’s a Harvard spin-out, Mirror glasses, and they don’t have a camera, it’s mostly audio. It’s—
CHEYER: —getting there.
EL KALIOUBY: —sort of there.
CHEYER: Yeah, sure. Just like Siri was kind of there.
EL KALIOUBY: Right.
CHEYER: It was there, it was useful. And all of these are useful.
But is it augmented reality? Not really.
But give it about 10 years, 2035.
Using verbally stated goals to create meaningful life changes
EL KALIOUBY: Okay. So one of the things that I love the most about you is your whole idea of verbally stated goals. I do a lot of reflection and intentionality and goal setting, but I just love how you approach your equivalent of goal setting. So tell us all about your verbally stated goals and how you do that.
CHEYER: This is my greatest secret and as a magician and entrepreneur, I had lots of secrets. Siri means secret in Swahili.
EL KALIOUBY: Ah.
CHEYER: So there’s a little factoid. But this is my greatest secret, not only for business, but for life. It starts with this idea that came to me — it’s not rocket science, but life is precious.
And the time we have spent on this planet — I don’t know what comes after — is a gift of the ultimate value. So you might ask, well, how do I live a good life? How do I not waste this gift? Well, the first is what is the meaning of life? My view — it’s up to you to decide.
You get to define what’s meaningful in your life. And you’re not going to be the same person at age 20 as at age 40, or married with a kid versus not. You will be a different person on this journey.

So what I do is I say, life is like a book. Your job is to co-author the best story, and every good story, every good book, has chapters. And there will be times when you know you’re coming to a chapter change — maybe you get fired or your company closes, maybe you graduate from university and you’re like, now what do I do?

When that happens, go back to rule one. The time we have here is precious. You are obligated — if you are in a place where you’re not satisfied and fulfilled — to make a change. So then the question is, well, what do I do? I’m at a chapter change. What next? And how do you do that? So what I do is, at that time, you will be a unique person that you’ve never been before.
You focus on what are the core emotions that I’m feeling, and you let that truth boil into your chest until you feel it. And then you take all of that emotion and you turn it into words. A mission statement — what I call a verbally stated goal. Every word has to be meaningful, it has to be compact and fully capture what is at the core essence of your emotion.
EL KALIOUBY: Huh?
CHEYER: But once you have that mission statement, you then go out and you tell everyone you meet — this is what I’m going to do next. Even if you have no idea how to do it.
EL KALIOUBY: How? Okay.
CHEYER: No idea how you’re gonna do it. You say, I’m gonna do this. You manifest it. You put it out into the world. And that does two things. It commits you and then people start to help you.
One of my verbally stated goals was, can I fall in love?
I had had plenty of girlfriends that checked every box — pretty, right age, right religion, right education, whatever. But I didn’t feel like this is someone I could love forever. And I thought, I don’t know if I can feel that. I don’t know if anyone can feel it. Is it a hoax? Is love a hoax? But it became important to me to know.
I made it a verbally committed goal that I wanted to explore this, and it led me in directions that I would never have gone otherwise — to my soulmate, my wife who I will love forever. Wrong age, wrong religion, wrong education background, wrong everything.
But we have chemistry and compatibility. Her value system — we’re very aligned. And she’s smart and she’s beautiful, and we have chemistry. I’m crazy about her. Those were the two things that mattered. None of the other things did. So it led me in directions I couldn’t have gone, ’cause I knew I should explore this piece.
EL KALIOUBY: Do you have one for the new year or are you still working on one?
CHEYER: I was VP of AI experience at Airbnb after selling my last company. I quietly stepped away six months ago. Took a break that I was needing, traveled and did some bucket list items and I’m now just starting to think about my next chapter. So I’m thinking broad and I’m not making any fast decisions.
EL KALIOUBY: That is amazing.
CHEYER: Thank you.
EL KALIOUBY: Thank you for joining us. This was so awesome and fun.
CHEYER: Yeah, I enjoyed it. Thank you for having me. This was great.
EL KALIOUBY: I loved talking with Adam. And I have a feeling we’ll have another heartfelt check-in soon. But the Siri story isn’t over. One of the iconic aspects of this platform is indeed … the voice.
SUSAN BENNETT: I was created by Adam Cheyer, Tom Gruber and Dag Kittlaus, who first developed the technology through the Stanford Research Institute.
EL KALIOUBY: Siri – and the human voice who brought it to life – Susan Bennett. That’s after a short break.
[AD BREAK]
What Siri’s human voice reveals about creative labor in AI
EL KALIOUBY: Hey Siri, where do you come from?
BENNETT: I was designed by Apple in California.
EL KALIOUBY: And who gave you your voice?
BENNETT: Voice is modeled after voice actress Susan Bennett.
EL KALIOUBY: And what is Susan Bennett up to now?
BENNETT: Well, she’s on Pioneers of AI, pretending to be Siri as a fun way into the conversation.
EL KALIOUBY: Well, in case that wasn’t clear, I am in fact speaking with the human behind the ubiquitous Apple voice assistant Siri, Susan Bennett. Welcome to the show.
BENNETT: Thank you.
EL KALIOUBY: So you may not have heard her real name or seen her face, but her voice has given rise to the voice assistant revolution. So, Susan — you’re a voice actor. How did you even get this assignment? Walk us through it — transport us back to this time.
BENNETT: Well, the thing is that I had no idea that I was Siri, and I had no idea that I was recording something that would become Siri, because it was so many years ago.
Close to 20 years ago I recorded some scripts. I was working for a company that I still work for, a messaging company located here in Atlanta. And they said, we’ve got this project for you. It’s gonna last about four months. And so I spent four months reading sentences — and I’ll have to read you some of these sentences, you won’t believe it. They were created to get all of the sound combinations in the language. So there were a lot of really ridiculous things to read. Such as: “Say the shredding again. Say the media ding again. Say the zloty ding again. Say the word initiate ding again.” You get the picture. I think I lost some gray matter during those years.
After the recordings were done, they were gonna go in and just extract diphthongs and syllables and words and put it all together. So everything had to be consistent. And it was a bit challenging actually.
EL KALIOUBY: So then when did you realize that your voice made it into this technology that millions and millions of people use every day?
BENNETT: October 4th, 2011, to be specific. A fellow voiceover person emailed me — some guy that I don’t really know, somewhere in Delaware or something. And he emailed me and he said, have you heard this Siri voice? I think this is you. And I thought, what? No. And so I went onto the Apple site and listened and I went, oh my God.
EL KALIOUBY: What is the business model for voice actors, and once you realized you were the voice of Siri, how did that change your expectations?
BENNETT: Oh, well, I was quite concerned that I was gonna lose a lot of work because of being the voice of Siri, because it’s so ubiquitous. People say, oh, that sounds like Siri. Now let’s not use her. So it was a kind of a double-edged sword being the voice of Siri. When it first happened — and I actually owned up to it in 2013 — it actually opened up a whole new stage for me. It was really fun. I did a lot of traveling doing speaker events, telling people all about Siri and the voiceover business and the kind of recordings I had to do that became Siri.
EL KALIOUBY: So Susan, last year there was a lot of discussion around OpenAI releasing a voice that many felt sounded like Scarlett Johansson’s. She of course voiced the AI in the movie Her. And she ended up taking legal action against OpenAI over it. Is that something you had come across, and what are your thoughts on that?
BENNETT: Well, I think the situation is different for her because she’s a celebrity. And I think that for me, the type of work that I have done mostly is kind of under the radar. No one knows who I am. And I guess it never really occurred to me that I could sue someone about that, because I agreed to do the recordings and I was paid well for the recordings. I just wasn’t paid for the usage.
EL KALIOUBY: I think that’s a great way to capture it actually. There’s the compensation for the time you spend recording it, but there ought to be a business model that compensates you as this thing grows.
BENNETT: Oh, absolutely. I agree, because it’s so ubiquitous. When you think of the millions of people who have heard my voice, it’s just weird. And I think that it may have affected my ability to get other voiceover jobs.
EL KALIOUBY: These are the types of conversations that I don’t think are really top of mind, unfortunately, in the AI world.
BENNETT: The AI world, I don’t think they even question the humans that are involved in it.
EL KALIOUBY: Yeah, I hope that this is a call to action to build more thoughtful business models and to compensate people fairly.
BENNETT: Yeah, that’s the biggest thing. If someone does a job, pay them. I mean, come on. What’s complicated about that?
EL KALIOUBY: I’m so grateful for Susan’s perspective on this – it’s a great window into how technology changes creative work, and how compensation can and should reflect that. But I had one last question for her. I am curious — do you use Siri?
BENNETT: No.
EL KALIOUBY: You don’t use Siri? I mean, that must be like a cognitive dissonance.
BENNETT: Every time I’m talking on the phone, it’s like I’m talking to myself. It’s just too weird.
EL KALIOUBY: I learned so much talking to both Susan and Adam. Siri really did pave the way for tools like ChatGPT, Claude … and whatever next AI agent comes down the pipeline. It set a high bar for AI tools that feel natural to interact with and can act on our behalf.
Plus I love Adam’s verbally stated goals. It’s his playbook for making meaningful shifts in your life. He connects with his emotions, and this process leads him to a clear mission statement. But then the true magic is telling everyone and anyone he knows about those intentions. Inevitably, it brings the right people into his orbit.
Take this as a call to action. Don’t feel stuck for too long. What is one big goal you’re working on? Tell the world – including us! Find me on LinkedIn, I’d love to hear from you … and maybe I can help.
Episode Takeaways
- Adam Cheyer opens with an unlikely throughline: his love of magic, which he says mirrors entrepreneurship by starting with an impossible vision and then engineering it into reality.
- Tracing Siri back to 1993, Adam Cheyer says his original dream was not just a chatbot but a true assistant that could understand requests, connect services, and get things done.
- Cheyer argues today’s AI boom is still missing the essentials he cares about most: a richer interface, real transactional capability, and a business model providers will actually support.
- On building companies, he shares a deliberately unglamorous playbook: keep valuations modest, avoid boxing yourself in with early revenue, and optimize for a realistic exit over hype.
- Looking ahead, Adam Cheyer predicts augmented reality will be the next great platform shift by 2035, and says his bigger life lesson is to manifest change through ‘verbally stated goals.’
- In the closing coda, Siri voice actor Susan Bennett recounts how she unknowingly recorded the assistant’s voice years earlier, then reflects on the human labor and compensation AI still too often overlooks.