How Microsoft is democratizing safe AI

Microsoft’s Sarah Bird

Responsible AI is more than a buzzword, it’s a guiding principle at Microsoft. Sarah Bird, Chief Product Officer of Responsible AI, leads the company’s efforts to build and deploy AI products with responsibility at the core. She and her team bring standards across the company’s AI efforts, and have played a central role in building Microsoft Copilot and GitHub Copilot. Bird joins Pioneers of AI to share how Microsoft aligns its tools with an ethical framework, why safety and security must be addressed proactively, and what the rise of red team roles reveals about the future of AI created jobs.

How Microsoft is democratizing safe AI — with Microsoft’s Sarah Bird and Pioneers of AI’s Rana el Kaliouby

0:00 / 0:00

About Sarah

Leads Microsoft's responsible AI engineering as CPO of Responsible AI
Central to building Microsoft Copilot and GitHub Copilot
Created Azure AI Content Safety, Prompt Shields & Responsible AI Dashboard
Founding member of Microsoft's first responsible AI research group; serves on AETHER
Ph.D. in computer science from UC Berkeley under Turing Award winner Dave Patterson

What responsible AI actually means in practice
How fairness shows up in real AI systems
Why human oversight must stay at the center
How Microsoft turns emerging AI risks into product safeguards
Why responsible AI has to start at the beginning
What AI red teams do and why these roles are growing
How startups can build safe AI without massive teams
Why responsible AI is a product quality issue not a tradeoff
How to keep agents useful autonomous and accountable
Episode Takeaways

Transcript:

How Microsoft is democratizing safe AI

SARAH BIRD: The thing that’s amazing about working in this space is that it feels like every six months we have some sort of amazing breakthrough. We first had kind of the huge jump with general purpose models, starting with GPT-4, but then in the last year we’ve started seeing huge improvements in reasoning models, as well as agents. And so in some ways I think I’m just along for the ride to see what’s coming next.

RANA EL KALIOUBY: That’s Sarah Bird. And as Chief Product Officer of Responsible AI at Microsoft, being along for the ride doesn’t mean sitting back. It means being responsive – to ensure the safety and security of Microsoft’s AI products.

BIRD: The attackers are innovating as much as everyone else. We have to respond to the new threats we’re seeing as well. And so I kind of just wake up every day and say, let’s see what happens today. And that’s kind of some of the fun of it.

EL KALIOUBY: We talk a lot about responsible AI on this podcast. And this week, we dig into what exactly that means. We’re taking a deep dive into how Microsoft makes their AI products safe, how innovation doesn’t mean sacrificing ethics, and why red team jobs are on the rise.

I’m Rana el Kaliouby and this is Pioneers of AI – a podcast taking you behind the scenes of the AI Revolution.

[THEME MUSIC]

Hi Sarah. Thank you for joining us on Pioneers of AI. I’m so excited for our conversation.

BIRD: Thank you so much for having me. It’s great to be here.

EL KALIOUBY: So, before we dive into your role at Microsoft and responsible AI, I’d love to ask our guests and leaders in the AI space how they’re actually using AI in their everyday lives.

BIRD: I also know you’re a big runner. And I wear Whoop religiously and I’ve started to run again. I used to a few years ago, and then I took a break. But yeah, do you use any wearables or technologies to track your running?

I got really into running in the pandemic because I live in New York City and staying in a tiny New York apartment for a year and never leaving was maybe a bit much. And so running was kind of a great way to be able to see the city and do something healthy. So I wear an Apple Watch for that.

It’s really helpful to have data, right? To be able to see that you’re going faster than last time or you’re going further. But I also don’t want to have kind of too much of it. I think something that’s nice about running is just being able to almost meditate, right?

And kind of just not think about things. And so I also try to be a little disconnected when I’m doing it.

EL KALIOUBY: So you’re not like continuously looking at the metrics, you try not to.

BIRD: Maybe if it’s getting really painful, I’m just staring at the watch. If I made it another mile, how many more miles to go? But I try not to be doing that.

Copy LinkWhat responsible AI actually means in practice

EL KALIOUBY: Yeah. That’s awesome. Okay, so let’s talk about responsible AI. A lot of the big tech companies have teams that are focused in strategizing around responsible AI, but at this point it’s really become a buzzword. So how do you define responsible?

BIRD: I think responsible AI is about making AI systems that are trustworthy. So it’s not just that we do trust them, because people trust technology that they shouldn’t, but putting in all of the controls in place so that people can actually trust the outcome they’re getting from this.

And so to do that in practice, it encompasses a wide range of things. We have to look at fairness and we have to look at how we implement human oversight, and we have to look at security and we have to look at privacy. And so I think that’s one of the reasons it becomes kind of such a nebulous term because it is really about everything you need to do to make a system work the way you expect every time in, for example, high stakes applications.

Copy LinkHow fairness shows up in real AI systems

EL KALIOUBY: Yeah. I wanna kind of double click on that a little bit and maybe you can walk us through how you define each of these terms, because I think that’s very important to our listeners. So what do you mean by fairness?

BIRD: So Microsoft’s responsible AI standard kind of breaks fairness into three types, and the types of fairness are about how the system behaves. So the system could, for example, not work as well for two different types of people. So your speech to text system could transcribe better for someone with one type of accent than another.

And so that would be what we call a quality of service fairness issue. You could also have a system that makes different decisions on how it allocates resources, so it could choose to offer loans to people like me, but not offer loans to people like you. And that could be viewed as unfair.

And that’s the decision the system is making. And then the last type, which we see a ton in generative AI, is how people are represented in the AI system. So for example, if it puts out stereotyping comments about women, or when it’s asked to sort of produce a list and it totally ignores a whole group of people.

Then those would be representation harms. And so those are the three types we look at in how the system behaves. And to be clear, that says nothing about the bigger picture of whether there are disparate impacts if an AI system is applied. So you could have, for example, using AI to reinforce traditional power structures, and that could result in it impacting different groups differently. And some people could say that that’s unfair, but when we look at implementing responsible AI, we’re really focused on fairness in the system behavior itself. And so those are the three types that I just mentioned.

Copy LinkWhy human oversight must stay at the center

EL KALIOUBY: Very cool. What about human oversight?

BIRD: Yeah. So our kind of last principle, or foundational principle, is accountability. And that’s really important because humans ultimately need to be accountable for the technology that they’re developing and the technology that they’re using. And it’s really important that we don’t forget that because a lot of times we think about AI, especially as we start getting into these exciting technologies like agents that are more automated.

We think of AI as kind of having a life of its own and, oh, it’s going and making the decisions, but ultimately we have to design it in a way that humans are accountable for the technology’s behavior. And so when we think about human oversight, there’s a lot of techniques we use for that.

So in the era of, for example, the chat systems, the chat system puts out a response and you, the human user, can actually review that and decide, does it have a mistake? Should I use this? Should I trust it? And so that’s a case where we have a human in the loop in every single interaction.

But that’s not the only type of human oversight we wanna have. And so, for example, when we think about agents, which are gonna do much longer running tasks and will be more automated, we wanna make sure that we have human oversight in the development and the testing of it so that we ensure that you’ve tested it and it works as intended so that you can trust it to go do that longer task.

And we can also insert human oversight in those kinds of things as the system’s running. So maybe if your agent is gonna do a particularly high stakes task, it goes back to the human user for approval or it goes to the system administrator for approval. And so there are many different techniques that we have for human oversight that we implement throughout the lifecycle, but it’s a core foundational principle that we need to apply to every system that we develop and every system that we’re using.

EL KALIOUBY: Yeah. This is sparking two thoughts for me. One is, at my company, Affectiva, we were in the automotive space. We did a lot of work in the semi-autonomous and fully autonomous vehicle space. And accountability’s a big thing there, right? And we noticed that people were starting to just delegate responsibility to the car — the car did this, the car decided to do this — and it’s so important, to your point, to keep the accountability on the humans.

BIRD: Every person has a role to play and they have a different role to play. And so, for example, the model developer should be looking at whether they’re imbuing this with dangerous new capabilities that are gonna change the external landscape. They should be building safeguards into the model itself. At the platform level, we build in additional controls like real time guardrails or monitoring for abuse.

Then the examples we were just talking about are the user looking at the output of the system and deciding if it’s appropriate to use or if there’s a mistake or if they need to make an adjustment. And we also have things like, we look at academics and regulators to help set the standard for the world on how these things should be used.

And so we really need many, many different people to contribute to actually making this work. It’s not just one single person that’s accountable for an AI system. It actually takes all of these roles playing the part that they’re supposed to play to get to trustworthy AI.

Copy LinkHow Microsoft turns emerging AI risks into product safeguards

EL KALIOUBY: Yeah, so you’re Chief Product Officer of Responsible AI at Microsoft. What does this role entail?

BIRD: Yeah, it’s a great question. I really look at how we put this into practice. So a part of my team works on identifying the risk that we see emerging in AI. So for example, a couple years ago was the first time we started seeing prompt injection attacks, or some people call these jailbreaks, or we started seeing hallucinations.

EL KALIOUBY: Prompt injection attack? I’ve not heard that.

BIRD: Yeah, this is the more technical term we use for jailbreaks. It’s named after a SQL injection attack, if you’re familiar with that. But the idea is really simple, which is to say that the sort of data coming in the prompt confuses the AI system and results in it behaving in a way that is not aligned with its programming. So maybe the developer wrote a system prompt that said you should only answer coding questions. But if the user puts in an input that confuses it and it responds to a non-coding question, we would call that a prompt injection attack. And so it’s a class of techniques that people are using to basically get the AI system to do something different than intended.

And it can come both through the user interface, but it can also come through tools that an agent calls or data that something consumes. You can hide these types of attacks in a website and so it’s a really important new risk with AI that we kind of have to defend against. And so exactly at this point when this first came out, we had to figure out, okay, what is this risk? What does it really mean? How do we define it? But then we also had to figure out, okay, now what do we do about it? So one of the first things we do is develop testing systems to figure out how we really test for this. Where is it occurring? Where is it not occurring? What does it look like when it happens?

But then we wanna build mitigations. We wanna build a defense in depth approach to say, well, what do we do about this risk? We don’t wanna just test it. We wanna reduce it as much as possible, mitigate it. And so that part of my team works a ton with Microsoft research, with external academics, with our red teamers to really find these early patterns.

But once we’ve developed the patterns, we don’t wanna just publish a paper and call it a day. We wanna make it easy for everyone to do this. So a lot of my team then focuses on taking those ideas and turning them into production tools and technologies. So, for example, we build out at scale testing systems that allow us to test our AI for these different risks before we ship things. But we also wanna make it easy for our customers to do that. And so we integrate these in our AI platforms that our customers can also use. And so that same testing system is something developers who are building AI applications can leverage themselves too.

EL KALIOUBY: We’re going to take a short break. When we come back, why Microsoft brings in their Responsible AI team at the start of new projects and how AI is creating new kinds of jobs. Stay tuned.

[AD BREAK]

Copy LinkWhy responsible AI has to start at the beginning

EL KALIOUBY: I’ve seen so many cases where the responsible AI team is brought in at the very end of the building process, right? So you’re building an AI product, you’ve ideated, you started building, and then it’s like validation or testing. And now it’s the time to think about ethics and responsible AI.

Which, you can hear the bias in my voice. I don’t believe that that’s the right way to do it. But how do you ensure that the thinking around responsible AI is brought in from the get go, and does that happen?

BIRD: Yeah. So certainly the example of bringing it in at the end is the anti-pattern. And it’s something that we’ve worked really hard in our practice at Microsoft to not do.

And so one of the things that Microsoft does is when we get a new AI technology, there are many people that wanna start using it because they want to start seeing how they can integrate it in all these different applications. What can I do? Should I start a new business line? But actually we prioritize the very first people getting access — and that’s actually our AI red team — because we want the responsible AI work to start before, or at the same time as, all of the other exploration so that we have the maximum amount of time to understand the risks, build the testing tools, build these mitigations, and co-develop all of the responsible AI alongside the product development.

But Microsoft ships thousands of different AI features every year. And so we don’t have experts working hand in hand with every single product because not all of them need that level of expertise. And so your point around how do we train people is really important here, because if someone is just shipping kind of a standard AI feature with a known pattern, we want them to be able to do all of the best practices, all of the required testing, put in all of the mitigations on their own.

And so then we might only see them at the end because we’ll run a release review process to make sure that they did the right things and they didn’t miss anything.

Yeah, very cool. Like you’re democratizing access basically to how to build AI responsibly. Like you don’t have to be there. That is my entire life’s goal, right? I want everyone to be able to do this, and if I have to be there to get them to do it, it’s really not gonna work. And so we try to run ahead and figure out the patterns, but then we wanna figure out a way to make it easy for everybody to do it.

Copy LinkWhat AI red teams do and why these roles are growing

EL KALIOUBY: I love that you’ve mentioned the red team a few times now. So for some of our listeners who are not familiar with what that means or looks like, can you kind of visualize this for us?

BIRD: Yeah. So this is really, I think, fun and kind of amazing work. Red teams are a concept in security that’s been around for a long time, and the idea of the red team is the team that goes in and tries to break stuff and just see what they can get a system to do. And then we use that information to shore up our defenses.

And so a traditional security red team was just looking for security vulnerabilities. For an AI red team, we need to look much broader because there’s a much wider range of how a system can misbehave. Going back to those principles we talked about, you can have issues with it producing stereotyping results. It could be leaking sensitive data. We’re starting to prepare for new types of risk around ability to produce chemical or biological weapon information. And so we’ve built an AI red team that uses the same techniques of trying to break the system, but looks at a much wider range of risk.

And so this includes a core team that we’ve built that’s dedicated to doing this. So these are the hacker mindsets, just with many, many more different types of risk in mind.

And then of course we build the mitigations so that we can ideally remove the risk from the end system.

EL KALIOUBY: There’s a lot of conversation around the impact of AI on jobs. But I believe that the red team example is an example of a type of job that’s really new because of AI.

BIRD: No, you’re exactly right. We have hundreds of people working in responsible AI. Most of those were not jobs even a few years ago. And so it’s absolutely right that it changes what is needed. I can tell you, I am just horrible at red teaming.

I can never get the system to do anything interesting. It’s always just producing like rainbows and kittens when I talk to it. And so these are skilled experts who are really creative and figuring out how to break systems.

EL KALIOUBY: What kind of skillset, or background or education, do you need to have to be a member of the red team?

BIRD: I don’t think you have to have anything. If you are really good at breaking the AI systems, we probably wanna hear from you, right?

The interesting thing is we actually also use AI to help assist with that. So if a Red Teamer has an idea, we have a tool called Pirate that helps augment that idea with many different variants or other attack techniques so that you can test many more things at scale and helps you get more diverse inputs. Maybe it’s not quite as clever as our expert AI red teamers, but it’s pretty good because we can use it every day, and I can actually use it and break the system. And so maybe if I don’t wanna wake the red teamers up in the middle of the night, I can at least use the agent to get started.

Very cool. Are there any products that you’ve had to sunset because they did not meet Microsoft’s responsible AI standards?

So we’ve made changes to products. I think our first responsible AI standard sort of came into effect about 2018. But if you’re following the timelines of AI, deep learning really started taking off before that, like in 2013. And so there was actually a really important kind of landmark study in external academia that was showing that there were problems in these systems that people weren’t thinking about.

So I’m thinking for example of the Gender Shades study that showed facial recognition systems did not work as well for dark skinned women as everybody else. And so that work and these known problems are what helped catalyze things like Microsoft’s responsible AI program as well as others. As our program got more mature, we did go back and look at products we had on the market and said, would we do this differently today? So one of the things we did is make adjustments to our facial recognition system. And so our facial recognition system originally would also, not just identify whether this face matches this face, but would also try to predict someone’s gender and also try to predict someone’s emotion. And I think what we’ve learned is that gender prediction is something we think just doesn’t make sense because looking at someone’s outward appearance and then trying to predict their internal identity.

While that might work well for many people, it doesn’t work for everyone, and so it’s just not something we think that makes sense for AI to be doing. And I think the same thing applies even more so to emotion detection, where just because I’m smiling, that doesn’t mean that my inner state is happy.

And so while it might be reasonable to use AI to detect that someone’s smiling, it’s not the same to detect that they’re happy. And so we moved these kinds of things like emotion detection into a sort of unsuitable use category for AI at Microsoft. And then we retired the features we had doing that.

And so we absolutely expect that we’re gonna continue needing to make adjustments and adapt as we learn and the world learns and we all kind of learn together.

And so yeah, we’ve done it and we’ll continue to do it as we need to.

EL KALIOUBY: Yeah. I’ve spent my entire career basically building facial expression recognition technology, not identity detection, but emotion. My PhD thesis, which I did at Cambridge over 20 years ago, was exactly that. Like, you cannot assume that because I’m smiling, that I’m happy.

And in fact, there are hundreds of different types of smiles. But when you reduce it to smile equals happy, eyebrow raise equals surprise, it’s simplistic. It’s not how humans work.

BIRD: Right, exactly. And I think then it’s misleading to people, because people have a tendency to want to trust a machine or something that they don’t understand as well. It must be some sort of oracle. It must be right. And actually there’s no way it can be right in that case. And so it’s really important, we think, to either make sure that we provide the right amount of transparency and education so people know what this tool actually does and use it appropriately. Or in that case, we looked and found there are certainly some valid use cases.

EL KALIOUBY: Microsoft has the luxury of a big budget and an army of people working on responsible AI. But the reality is, smaller businesses also need to invest in implementing safe AI. After a short break, Sarah gives some guidance on how to exactly do that.

[AD BREAK]

Copy LinkHow startups can build safe AI without massive teams

EL KALIOUBY: I spend a lot of time investing in early stage startups in the AI space. I would love to hear your thoughts. A lot of these startups are early in their journey. They can’t afford to employ an incredible responsible AI team. So what is your advice to these founders with regards to how they can still implement responsible AI without necessarily building out a big team?

BIRD: Yeah. So my advice for startups is you do need to invest in having some people who are going to understand the best practices and apply them. So as much as possible, leverage existing tools. We have great ones at Microsoft. There’s open source communities producing tools, and so use what’s already there.

But you still can’t just set it and forget it. You have to use your brain and think about what is specific to your domain, right? And so a lot of the capabilities that Microsoft develops are the general purpose capabilities, but you still need to customize them for your specific domain, or you need to augment them for your specific domain.

Like I’m not building tools that are gonna look for very specific kinds of financial risk or something. And so you wanna build on top of that foundation, and then you wanna put all of your energy into what is unique about your application, and that’s where you should invest your innovation in AI, like responsible AI or security.

So you can’t avoid it. But definitely make sure your team is using what’s already out there, also, because what’s already out there is changing very quickly and so you might build a great version yourself, but the external community’s gonna move faster than you because it’s a whole community.

And so making sure you’re building on top of something so you can stay up to date with the latest is really important.

Copy LinkWhy responsible AI is a product quality issue not a tradeoff

EL KALIOUBY: Yeah, great segue to my next question. So there’s this inherent tension between innovation and kind of the speed of bringing stuff out, shipping product, and of course doing so safely. And I love Reid Hoffman’s example. He talks about how seat belts were only introduced into cars after vehicles became available on the market and we started to see, okay, these are the dangerous situations for which we need seat belts. So how do you balance bringing products to consumers quickly, but also doing that safely?

BIRD: Yeah, this is definitely probably the number one question I’m asked. And I would say it’s a really important question and topic, but I also wanna say I feel like the framing’s kind of wrong because to your point about the anti-pattern earlier, if we do everything and then we add responsible AI or security at the end, of course it looks like there’s a trade off between shipping quickly and not, because you’ve put it at the end.

You’ve increased the time by definition. But when you’re designing alongside and doing that investment, it doesn’t have to change your speed to shipping. And so it’s not trivial to achieve that. A lot of what we’ve done to successfully invest in responsible AI at Microsoft is make sure that we’re building our responsible AI systems to be scalable platforms that we can adapt really quickly so that we can move agilely and at pace and at scale so that we can ship AI very quickly.

And it took us years to build that up. That was not something we just woke up on like the day ChatGPT was released and started working on. We had already had at scale generative AI, responsible AI systems running in production, for example, for GitHub Copilot, and we had to adapt them to new risk, but we did not just start from scratch.

But the other thing I’ll say, most customers I talk to — enterprise customers, but also consumers, startups — if your AI system regularly makes mistakes, if your agent is going off task, or if your system is producing harmful content that’s not aligned with your brand, I think most people say the product doesn’t seem finished. It doesn’t seem like it’s actually working. And so you’re not shipping a high quality product.

And so sure you can ship quickly without responsible AI, but it’s not gonna be what people want. And so that’s where I think it’s a false trade off because you have some idea that it’s just extra and it’s not about the quality of the product, but it’s so fundamental to how AI systems behave.

That you really just aren’t even done until you’ve done that. And customers will call you on that and they will demand more, which I think is a fantastic thing about the ecosystem these days — the world has very high expectations that we do this well.

EL KALIOUBY: Yeah. I love this answer and I love this framing because building AI responsibly isn’t just the right thing to do. It’s actually good for business, to your point. And I think that’s so important.

All right. There’s a lot of competition right now in the AI space, and one particular example — Elon Musk recently announced Macro Hard, a company trying to compete directly with Microsoft. What do you think of this announcement and in general, how do you think about competition?

BIRD: Yeah. I don’t know if I have particular feelings about that announcement, since I spend my time mostly on the responsible AI security side of the house. But I would say that competition is really important in the ecosystem. It pushes all of us to be better. And I think the other thing that’s really important is choice.

So part of Microsoft’s approach to how we’re building our AI platforms is about giving developers and customers choice. For example, a particular model might be great for Microsoft’s use cases, but it might not make sense for an application that’s being made for rural farmers in India, because it might use a different language or there’s a different technical domain or something.

And so what we try to do with Foundry, which is our AI platform, is offer as many different models as we can. And so these models have different trade-offs in terms of quality, in terms of cost, in terms of safety.

We put all of that information in front of the customers so that they can pick, as well as tools that then allow them to adapt it to their application.

And so I think that if we’re really gonna change the world with AI and power every application and every profession, we need a lot of people working on this. And we need a lot of different types of ideas. So generally I think the competition is great and is making the field better and it’s gonna result in more impact.

I hope that we continue to see an ecosystem like that where people are competing, but also collaborating to help everyone get better at this.

Copy LinkHow to keep agents useful autonomous and accountable

EL KALIOUBY: So I wanna talk about agents, because there’s this inherent tension between giving these agents autonomy, but also still having control over what they do. How do you think about responsible AI in the agent world? And what are some of the biggest concerns that you’re trying to tackle?

BIRD: Yeah. I think agents, first of all, are just incredibly exciting because I think they represent a lot of the real promise of the technology. As fun as it is to have a chat system attached to everything you do now, I don’t wanna just chat with something, I want to assign it a task and I want it to just go get it done, right? Like that’s what’s really gonna multiply my impact. And so we’ve really in the last couple months crossed a critical threshold where this technology is starting to be good enough for these agent use cases. But it does bring new challenges for us.

So, for example, I mentioned prompt injection attacks earlier. In a chat system, you may just see those coming in through the user interface or maybe some of the data coming from the search engine. Now those can be coming through all of those different tool calls, so we have a bigger surface area that we need to secure. Or you can have new types of risks, like an agent can get confused and go off task.

And so we have to build new guardrails around things like task adherence. And so with the expanded risk portfolio we have to take everything that we’ve built and extend it further. I think the other challenge, which we also talked about a little bit earlier, is that for a lot of the AI systems that are in production today, people are still using humans as a key mitigation in the inner loop. But what’s so exciting about agents or even something like vibe coding is that they’re sort of by definition not about having the user do the oversight.

And so that’s where we have to invest a lot more in automated testing tools or automated checks that help ensure that the agent stays on task or ensure that the agent completes the task every single time. And so we’re redesigning what those human oversight mechanisms look like and what those interfaces look like.

So that we still have accountability. We still have humans in control, but the way they’re in control is not just checking every single output of the system, because that would defeat the point with automation.

EL KALIOUBY: And I imagine one important design factor here is how do you build trust with the user, right? So that the user can actually let go of this control and let the agent do its thing.

BIRD: Yeah, I think that’s exactly right. And it’s not just building trust like, okay, you can print out the execution plan, but also if something goes wrong, what’s the impact of that and how do you debug what went wrong so you understand and you can adjust it for next time.

And so I think there’s research to figure out how to show that type of information effectively to end users. And actually that’s part of a Microsoft research project that we’ve released called Magenta UI, which is exactly designed to allow researchers to experiment with these different user interfaces and figure out what patterns are really gonna empower users to understand and feel in control versus what’s kind of overwhelming and they’re just gonna ignore it the way we all just okay all the license boxes or all of the privacy cookies.

EL KALIOUBY: Okay, last question, and it’s a question I ask of all my guests. In this world of AI, what do you think it means to be human?

BIRD: It’s a question people are certainly asking. I guess I feel like in some ways we just already know the answer, right? It still feels different to be human and connect with other humans than it does to connect with an AI system. And so I think there’s going to be fascinating research and art and all of this really exploring this topic. I love people digging into it, but I don’t see it as creating some existential risk for us. I think this is a tool and we’re gonna use it like many other tools that humanity has already used.

And I think that’s gonna make our lives richer, but also we have to go into it eyes open knowing that there are a lot of things we need to think about and a lot of risk that, like with each new technology, we have to address.

EL KALIOUBY: Yeah. I love that. Thank you Sarah for joining us. This was great.

BIRD: No, I really enjoyed the conversation. Thanks for having me.

EL KALIOUBY: With AI on the rise, there are a lot of new terms entering our shared orbit. Some of them don’t have clear definitions – terms like AGI. But there’s one word that should be crystal clear: responsibility. I really admire how Microsoft takes responsible AI so seriously from the get-go of new products. It’s not an afterthought, or an add-on for brownie points. It’s a critical part of their agenda. In the world of AI, some terms don’t have clear definitions – for example people don’t agree on what AGI actually means. But there’s one word that should be crystal clear: responsibility.

In my experience, a lot of companies building AI products think about the ethics or safety aspects at the very last step … just before shipping the project. And what I admire about what Sarah and her team are doing at Microsoft is that they take responsible AI so seriously from the get-go. It’s not an afterthought, or an add-on for brownie points. It’s integrated into every step of product development.

But what I find especially compelling is how they’re democratizing access to responsible AI. They’re training their partners, and creating tools for other developers to implement AI safety into their workflows. To me, that shows a real commitment to having AI that works for everyone.

Episode Takeaways

Microsoft’s Sarah Bird says responsible AI is really about building systems people can actually trust, with fairness, privacy, security, and human accountability baked in.
Bird breaks fairness into three buckets: uneven quality of service, unfair decisions like loan approvals, and representation harms such as stereotypes or missing groups.
She explains that Microsoft’s job is to spot emerging risks like prompt injection and hallucinations early, then turn that research into practical testing and safety tools.
A big theme here is timing: Bird argues responsible AI has to start at day one, with red teams often getting first access so safety work grows alongside the product.
Looking ahead to agents, Bird says the promise is real, but so are the new risks, which means better guardrails, smarter oversight, and interfaces that help users stay in control.